RHCS 3.0 is based on Ceph Luminous (12.2.1) and provides optimizations that boost performance. I tested using FileStore to compare directly against our previous Reference Architecture, which was based on RHCS 2 (Jewel).
|Red Hat Ceph Storage 3 works great with NVMe:|
|+10% Higher 4KB Random Read IOPS*|
|+5% Higher 4KB Random Write IOPS*|
|+20% Higher 4MB Object Write Throughput*|
|*Improvement over our previous Reference Architecture w/ RHCS 2|
Performance is improved over our initial reference architecture due to optimizations in Red Hat Ceph Storage 3.0 and the performance boost of the Micron 9200 MAX NVMe SSD.
This solution is optimized for block performance. Random small-block testing using the Rados Block Driver in Linux will saturate the highest-bin 22-core Intel Broadwell processors in a 2-socket storage node.
Object workload performance is limited by 50GbE network throughput on reads and by Ceph FileStore overhead on writes.
With 6 drives per storage node, this architecture has a usable storage capacity of 138TB that can be scaled out by adding up to 4 additional drives per storage node or by adding additional storage nodes.
Quick Caveat: These tests were run with the beta release of Red Hat Ceph Storage 3.0. The GA release of Red Hat Ceph Storage 3.0 is available now. Though the performance should be unaffected, take these numbers with a small grain of salt.
The full hardware and software details can be found in our updated Reference Architecture Document.
Test Results and Analysis
Ceph Test Methodology
Ceph is configured using FileStore with 2 OSDs per Micron 9200 MAX NVMe SSD and a 20GB journal for each OSD. With 6 drives per storage node and 2 OSDs per drive, Ceph has 48 total OSDs with 138TB of usable capacity. The Ceph pool tested was created with 8192 placement groups and 2x replication.
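The pool setup described above can be sketched with the standard Ceph CLI. This is a minimal sketch, not our exact deployment scripts; the pool name `rbd` is an assumption:

```shell
# Create a replicated pool with 8192 placement groups (pool name is illustrative).
ceph osd pool create rbd 8192 8192 replicated

# Set 2x replication on the pool.
ceph osd pool set rbd size 2

# Luminous requires tagging a pool with its application before use.
ceph osd pool application enable rbd rbd
```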
4KB random block performance was measured using FIO against the Rados Block Driver. 100 RBD images were created at 50GB each, resulting in a dataset of 5TB (10TB with 2x replication). 4MB object performance was measured using the Rados Bench tool included in Ceph.
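Creating the test dataset can be sketched as a simple loop; the image names and the `rbd` pool here are assumptions:

```shell
# Create 100 RBD images of 50GB each (5TB total before replication).
for i in $(seq 0 99); do
    rbd create --pool rbd --size 50G "image${i}"
done
```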
RBD FIO 4KB Random Read Performance
4KB random read performance is measured against all 100 RBD images to ensure the entire dataset is under test. The queue depth per FIO client was scaled from 1 to 32.
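A single FIO invocation against one image might look like the following, using FIO's built-in RBD ioengine; the client name, pool, image name, and runtime are assumptions, not our exact job files:

```shell
# 4KB random read against one RBD image at queue depth 16.
fio --name=rbd-randread \
    --ioengine=rbd --clientname=admin --pool=rbd --rbdname=image0 \
    --rw=randread --bs=4k --iodepth=16 \
    --time_based --runtime=300
```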
We hit maximum 4KB random read performance of 1.3 million IOPS at a client queue depth of 16. Average CPU utilization on the storage nodes is above 90%, limiting performance.
RBD FIO 4KB Random Write Performance
4KB random write performance is measured by scaling up the number of FIO clients writing to a unique RBD image per client. The FIO clients are evenly spread across 10 load generation servers. FIO random write tests were run at a queue depth of 32.
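Each write client can be sketched as one FIO process against its dedicated image at a fixed queue depth of 32; as above, the names and runtime are assumptions:

```shell
# 4KB random write from one client against its dedicated image.
fio --name=rbd-randwrite \
    --ioengine=rbd --clientname=admin --pool=rbd --rbdname=image0 \
    --rw=randwrite --bs=4k --iodepth=32 \
    --time_based --runtime=300
```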
4KB random write performance reaches an optimal mix of IOPS and latency at 30 FIO clients: 254k IOPS at 3.8ms average latency. At this point, the average CPU utilization on the Ceph storage nodes is over 90%, limiting performance.
Rados Bench 4MB Object Read Performance
4MB object read performance is tested by reading from a 5TB dataset across 8 clients, scaling the number of threads used by Rados Bench.
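A sequential-read pass with Rados Bench might look like the following; it assumes a prior write pass left objects in the pool (see `--no-cleanup` below), and the pool name and runtime are assumptions:

```shell
# Read previously written 4MB objects from the pool with 16 threads for 10 minutes.
rados bench -p rbd 600 seq -t 16
```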
4MB object read performance is network limited. At 12 Rados Bench threads and up, read throughput is over 20 GB/s, very close to the theoretical maximum bandwidth of the 4x 50GbE NICs in the storage nodes. Storage node CPU usage is minimal (~20%).
Rados Bench 4MB Object Write Performance
4MB object write performance is tested by scaling up the number of clients running Rados Bench (8 threads) against the Ceph Cluster.
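A single write client can be sketched as below; pool name and runtime are assumptions:

```shell
# Write 4MB objects (4194304 bytes, the rados bench default) with 8 threads
# for 10 minutes, keeping the objects so a later read test can use them.
rados bench -p rbd 600 write -b 4194304 -t 8 --no-cleanup
```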
4MB object write performance is limited by overhead in Ceph FileStore. With 2x replication, each 4MB object written causes 2 journal writes and 2 flushes to the OSD data partitions, roughly 16MB written per 4MB object. Storage node CPU usage is minimal (~30%).
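The write amplification above is simple arithmetic: two replicas, each written twice (once to the journal, once flushed to the data partition):

```shell
# Back-of-the-envelope FileStore write amplification for one 4MB object.
object_mb=4        # client object size in MB
replicas=2         # 2x replicated pool
writes_per_copy=2  # one journal write plus one flush to the data partition
echo $(( object_mb * replicas * writes_per_copy ))   # prints 16 (MB written per object)
```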
Good News Everyone!
Through software optimization and the added horsepower of the Micron 9200 MAX, Red Hat Ceph Storage 3.0 beta performs better than our previous Reference Architecture.
I’m currently testing the GA version of RHCS 3.0 + the Micron 9200 MAX NVMe SSD on the Intel Purley platform. A new Micron / Red Hat / Supermicro Reference Architecture is underway. Stay tuned!