BlueStore is the new storage engine for Ceph and the default in the community edition. BlueStore performance numbers are not included in our current Micron Accelerated Ceph Storage Solution reference architecture because BlueStore is not yet supported in Red Hat Ceph Storage 3.0. I ran performance tests against the community edition of Ceph Luminous (12.2.4) on our Ceph reference architecture hardware, and in this blog I compare the results to the FileStore performance we achieved in RHCS 3.0.
BlueStore makes a big difference with large object operations:
- 4MB object read throughput increases by 48% while average latency decreases by 33%.
- 4MB object write throughput increases by 83% while average latency decreases by 46%.
This solution is optimized for block performance but still performs well for large object operations.
With 10 drives per storage node, this architecture has a usable storage capacity of 232TB that can be scaled out by adding additional 1U storage nodes.
Reference Design – Hardware
Test Results and Analysis
Ceph Test Methodology
Red Hat Ceph Storage 3.0 (12.2.1) is configured with FileStore with 2 OSDs per Micron 9200 MAX NVMe SSD. A 20GB journal was used for each OSD.
Ceph Luminous Community (12.2.4) is configured with BlueStore with 2 OSDs per Micron 9200 MAX NVMe SSD. RocksDB metadata and the write-ahead log (WAL) are stored on the same partition as the object data.
In both configurations there are 10 drives per storage node and 2 OSDs per drive, 80 total OSDs with 232TB of usable capacity. The Ceph storage pool tested was created with 8192 placement groups and 2x replication.
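As a sketch, a pool like the one described above could be created with the standard Ceph CLI. The pool name `cephpool` is illustrative; the actual pool name and any additional settings used in testing are not given in this post.

```shell
# Create a replicated pool with 8192 placement groups (pg_num and pgp_num).
ceph osd pool create cephpool 8192 8192 replicated

# Set 2x replication on the pool.
ceph osd pool set cephpool size 2
```

These commands require a running Ceph cluster with enough OSDs to satisfy the placement group count.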
4MB object performance is measured using RADOS Bench. This represents best-case object performance, without taking RADOS Gateway configuration and overhead into account. Writes are measured by scaling up the number of clients writing to Ceph at a fixed 16 threads per client. Reads are measured using all 10 load-generation clients and scaling up the number of threads per RADOS Bench run.
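Assuming a pool named `cephpool`, the RADOS Bench invocations for this methodology would look roughly like the following (the 600-second run time, pool name, and thread counts shown are illustrative, not the exact test parameters):

```shell
# 4MB object writes, 16 threads per client; keep the objects for the read test.
rados bench -p cephpool 600 write -b 4M -t 16 --no-cleanup

# 4MB object sequential reads, scaling -t (threads) up across runs.
rados bench -p cephpool 600 seq -t 32
```

The write test would be run from an increasing number of clients simultaneously; the read test from all 10 clients with an increasing thread count.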
RADOS Bench 4MB Object Read Performance: FileStore vs. BlueStore
BlueStore 4MB object read throughput increases by 48% while average latency decreases by 33%. The Ceph reference architecture is tuned for small random block performance in both the FileStore and BlueStore cases.
With FileStore, higher object read throughput may be achievable by altering ceph.conf, though doing so would most likely reduce 4KB random block performance.
BlueStore is network limited at 100GbE, even when tuned for small block performance.
RADOS Bench 4MB Object Write Performance: FileStore vs. BlueStore
BlueStore 4MB object write throughput increases by 83% while average latency decreases by 46%.
This large differential is due to FileStore's journaling mechanism versus BlueStore's metadata handling. With FileStore on a 2x replicated pool, a single 4MB object is written to two separate OSD journals and then de-staged to disk on both OSDs, so a single 4MB object becomes 16MB of writes. With BlueStore, the 4MB object is written to two OSDs along with a small amount of metadata, roughly 8MB to disk. This explains the almost 2x performance improvement in 4MB object writes with BlueStore.
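The write-amplification arithmetic above can be sketched as a quick shell calculation (nominal data written only; BlueStore's small metadata overhead is ignored):

```shell
#!/bin/sh
# Nominal data written to disk for one 4MB object on a 2x replicated pool.
OBJ_MB=4
REPLICAS=2

# FileStore: each replica is written twice (journal, then data partition).
FILESTORE_MB=$((OBJ_MB * REPLICAS * 2))

# BlueStore: each replica is written once, plus a small amount of metadata.
BLUESTORE_MB=$((OBJ_MB * REPLICAS))

echo "FileStore: ${FILESTORE_MB}MB"
echo "BlueStore: ~${BLUESTORE_MB}MB"
```

Running this prints 16MB for FileStore and ~8MB for BlueStore, matching the roughly 2x gap observed.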
Would You Like to Know More?
RHCS 3.0 + the Micron 9200 MAX NVMe SSD on the Intel Purley platform is super fast. The latest reference architecture for Micron Accelerated Ceph Storage Solutions is available now. My previous blog post discussed FileStore vs. BlueStore IOPS and latency; you can find it here. I presented details about the reference architecture and other Ceph tuning and performance topics during my session at OpenStack Summit 2018. A recording of my session is available here.
Have additional questions about our testing or methodology? Leave a comment below or email us at email@example.com.