Field Storage Tests with Drupal 7
I had some spare time this weekend and decided to run some tests against Drupal's field storage layer. I mainly wanted to reproduce the results Moshe Weitzman published a while back, and to see what the best numbers I could get were.
The software and versions used for testing were:
- EC2 EBS-backed Large instance (8GB of memory) in the US-EAST region
- Ubuntu 12.04 (ami-fd20ad94, as listed in the official Ubuntu AMIs)
- MySQL 5.5.28
- PostgreSQL 9.2
- MongoDB 2.0.4
- Drupal 7.17
- Drush 5.1
- Migrate 2.5
I ran tests against both MySQL and PostgreSQL with their default settings, and then again with both systems' configurations tuned for write-heavy workloads.
The configuration options I specified for MySQL when tuning it were:
    innodb_flush_log_at_trx_commit=0
    innodb_doublewrite=0
    log-bin=0
    innodb_support_xa=0
    innodb_buffer_pool_size=6G
    innodb_log_file_size=512M
The configuration options I specified for PostgreSQL when tuning it were:
    fsync = off
    synchronous_commit = off
    wal_writer_delay = 10000ms
    wal_buffers = 16MB
    checkpoint_segments = 64
    shared_buffers = 6GB
The dataset used for the tests comes from the migrate_example_baseball module that ships as part of the Migrate module. It contains a box score for every Major League Baseball game from 2000 through 2009, with each year's data in its own CSV file. The various components of each box score are stored in separate fields, which exercises field storage heavily.
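For reference, the imports themselves can be driven from Drush. This is a rough sketch of the workflow; the exact Drush Migrate command names and options for your versions may differ:

```shell
# Enable the Migrate module and the baseball example that ships with it.
drush en -y migrate migrate_example_baseball

# Run every migration the example defines (one per season's CSV file).
drush migrate-import --all
```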
Average throughput numbers for the various configurations I tested are shown in the table below.
| Configuration | Throughput |
|---|---|
| Default MySQL | 1932 nodes / minute |
| Default PostgreSQL | 1649 nodes / minute |
| Tuned MySQL | 3024 nodes / minute |
| Tuned PostgreSQL | 1772 nodes / minute |
| Default MySQL with MongoDB | 4609 nodes / minute |
| Default PostgreSQL with MongoDB | 4810 nodes / minute |
| Tuned MySQL with MongoDB | 7671 nodes / minute |
| Tuned PostgreSQL with MongoDB | 5911 nodes / minute |
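To put the table in perspective, here is a small Python sketch that computes each configuration's speedup over the default MySQL baseline (the numbers are copied from the table above):

```python
# Average throughput (nodes/minute) from the results table above.
throughput = {
    "Default MySQL": 1932,
    "Default PostgreSQL": 1649,
    "Tuned MySQL": 3024,
    "Tuned PostgreSQL": 1772,
    "Default MySQL with MongoDB": 4609,
    "Default PostgreSQL with MongoDB": 4810,
    "Tuned MySQL with MongoDB": 7671,
    "Tuned PostgreSQL with MongoDB": 5911,
}

baseline = throughput["Default MySQL"]

# Print configurations fastest-first, with the speedup over the baseline.
for config, rate in sorted(throughput.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{config:35s} {rate:5d} nodes/min  ({rate / baseline:.2f}x default MySQL)")
```

The tuned MySQL + MongoDB combination comes out to roughly a 4x improvement over a stock MySQL install.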
The image below shows the results graphically for the different environments I tested. The Y axis is throughput (nodes per minute) and the X axis is the CSV file (corresponding to an MLB year) being imported.
It's pretty obvious from glancing at the results above that using MongoDB for field storage gives the best throughput. Tuned MySQL with MongoDB field storage gave me the best results, which is consistent with what Moshe reported in his original article.
What I found most interesting were the PostgreSQL numbers. The overhead of having a table per field with the default SQL field storage seems to be very high in PostgreSQL, and it's striking how much better a tuned PostgreSQL does once MongoDB handles field storage.
After performing these tests, one experiment I really want to try now is to create a field storage module for PostgreSQL that uses the JSON data type included in the 9.2 release. Hopefully, I will get some spare time in the coming week or two to work on that.
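The idea, roughly: instead of one table per field (and one INSERT per field every time a node is saved), all of a node's field values would be serialized into a single JSON document and written with one INSERT into a json column. A minimal Python sketch of the serialization side — the field names and values here are made up for illustration:

```python
import json

# Hypothetical field values for one baseball box-score node. With the default
# SQL field storage, each of these keys would live in its own table
# (field_data_field_home_runs, field_data_field_errors, ...), costing one
# INSERT per field per node.
node_fields = {
    "field_home_runs": [{"value": 3}],
    "field_errors": [{"value": 1}],
    "field_attendance": [{"value": 41823}],
}

# A JSON-backed field storage module would instead collapse them into a single
# document, written once into a PostgreSQL 9.2 json column.
document = json.dumps(node_fields)

# Reading the node's fields back is a single SELECT plus one decode.
restored = json.loads(document)
print(restored["field_attendance"][0]["value"])
```

One INSERT per node instead of one per field is exactly the kind of write amplification the MongoDB backend avoids, so a json-column backend might close some of the gap seen above.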