Apache Cassandra is an open-source NoSQL database (also offered in branded and bundled distributions) designed from the ground up to support a high-performance, fault-tolerant, distributed deployment model, whether distributed across nodes or across geographies. Cassandra is widely adopted in big data analytics, automatic product recommendation systems, online catalog displays, messaging platforms, query analytics and a host of other real-time and near real-time applications. As data volumes expand and user and system performance demands grow, IT is left searching for the ‘next level’ of database responsiveness.
This technical marketing brief highlights the performance we measured with the Micron 9100 MAX PCIe NVMe SSD in a Cassandra cluster using the Yahoo Cloud Serving Benchmark (YCSB). Across several common Cassandra workloads (YCSB workloads A-D and F), we measured high throughput (up to 80,000 database operations per second with workload A) and low, consistent latency.
Users and Systems Demand More, IT Needs Simpler
Many Cassandra implementation teams are pulled in several directions. The growing demands of users and applications put performance and responsiveness pressure on traditional designs, while deployment and support teams need a simpler way to build and grow clusters.
Moving to the Micron 9100 MAX SSD enables Cassandra platform designers to rethink how they manage these demands. Performance storage like the 9100 MAX can offer these clear and compelling benefits:
- Improved cluster performance
- Smaller, simpler clusters
- Fewer nodes
- Less required rack space
- More power-efficient clusters
With the introduction of NVMe-based SSDs like Micron’s 9100 MAX, this migration to next-generation Cassandra database platforms is coming to the forefront. NVMe SSDs like the 9100 MAX bring data closer to the CPUs for faster processing, lower latency and better performance.
Analyzing an entire Cassandra cluster in a single step can be expensive and time consuming. Each node has to be designed, the number of nodes estimated and the entire system deployed before the first performance test can run on the cluster. To reduce this up-front cost, we recommend analyzing the performance of a smaller cluster (two nodes). This gives insight into overall cluster sizing requirements and potential performance for a given workload without the challenges of a larger cluster.
For our performance testing, we used YCSB to measure Cassandra database performance (in operations per second) and responsiveness (average latency in milliseconds) of a two-node Cassandra cluster. Each node used one 9100 MAX (1.2TB) SSD, for a total of two SSDs in the cluster.
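The sizing idea above can be sketched as a simple extrapolation. This is a minimal illustration, not a sizing method from this brief: the function name `estimate_nodes` and the `scaling_efficiency` parameter are hypothetical, and the sketch assumes throughput scales roughly linearly with node count, discounted by an efficiency factor that you would have to choose for your own workload.

```python
import math


def estimate_nodes(target_ops_per_sec, measured_ops_per_sec,
                   measured_nodes=2, scaling_efficiency=0.8):
    """Estimate the node count needed to reach a target throughput.

    Assumption (not from the brief): throughput scales linearly with node
    count, reduced by a user-chosen scaling_efficiency between 0 and 1.
    """
    if target_ops_per_sec <= 0 or measured_ops_per_sec <= 0:
        raise ValueError("throughput values must be positive")
    if measured_nodes <= 0:
        raise ValueError("measured_nodes must be positive")
    if not 0 < scaling_efficiency <= 1:
        raise ValueError("scaling_efficiency must be in (0, 1]")

    # Throughput each node contributed in the small test cluster.
    per_node = measured_ops_per_sec / measured_nodes
    # Discount for coordination overhead in a larger cluster.
    effective_per_node = per_node * scaling_efficiency
    return math.ceil(target_ops_per_sec / effective_per_node)


if __name__ == "__main__":
    # Illustrative inputs only; substitute your own measurements.
    print(estimate_nodes(target_ops_per_sec=100_000,
                         measured_ops_per_sec=50_000,
                         measured_nodes=2,
                         scaling_efficiency=0.5))
```

The result is only a starting estimate. Replication factor, consistency level and data volume per node all affect how a real cluster scales, so a larger validation run is still advisable before deployment.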
In the sections below, we organize results by workload (A-D and F). Each section shows measured Cassandra performance in operations per second along with average and 99th percentile latency. We tested a broad range of thread counts, from 48 to 1024, covering both common use and performance characterization. In each performance figure, taller is better; in each latency figure, lower is better for both average and 99th percentile latency.
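For readers less familiar with the two latency metrics, the sketch below computes an average and a 99th percentile from a list of per-operation latencies. It uses the nearest-rank percentile definition, which is an assumption for illustration; YCSB's own measurement code may bucket or interpolate differently. The function name `summarize_latency` is hypothetical.

```python
def summarize_latency(samples_ms):
    """Return (average, 99th percentile) for a list of latencies in ms.

    Uses the nearest-rank definition: the p99 value is the smallest sample
    such that at least 99% of all samples are less than or equal to it.
    """
    if not samples_ms:
        raise ValueError("at least one latency sample is required")

    ordered = sorted(samples_ms)
    n = len(ordered)
    average = sum(ordered) / n

    # ceil(0.99 * n) computed with integers to avoid float rounding.
    rank = (99 * n + 99) // 100
    p99 = ordered[rank - 1]
    return average, p99


if __name__ == "__main__":
    # Synthetic samples: mostly fast operations with a few slow outliers.
    samples = [1.0] * 98 + [5.0, 40.0]
    avg, p99 = summarize_latency(samples)
    print(f"average={avg:.2f} ms, p99={p99:.2f} ms")
```

The average hides the slow tail, while the 99th percentile exposes it, which is why the figures in this brief report both.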
Workload A: Session Action/Recording
Workload A is an update-heavy workload, with 50% of the total I/Os writing data. At the application level, this workload is very similar to recording recent session actions. In Figure 1a, the 9100 MAX cluster performance is along the vertical axis (shown in average operations per second), with taller being better. Figure 1b shows average and 99th percentile latency data for the same test data set (average latency on the left-most vertical axis, the solid line, and 99th percentile latency along the right-most vertical axis, the dashed line). The thread count in both Figures 1a and 1b ranges across the horizontal axis from 48 to 1024.
The 9100 MAX two-node cluster completed between 39,000 and nearly 80,000 database operations per second as the thread count ranged from 48 up to 1024, with the highest performance seen as the system gets busier (more than 240 threads), as shown in Figure 1a. The average latency shows a smooth increase as the thread count increases, while the 99th percentile latency remains well controlled, as seen in Figure 1b.
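The relationship between thread count, throughput and average latency can be approximated with Little's Law. The sketch below assumes a closed-loop benchmark in which each client thread has exactly one operation outstanding at a time and no think time; under that assumption, average latency is roughly the thread count divided by throughput. The function name and the input values are hypothetical and are not measurements from this brief.

```python
def implied_average_latency_ms(threads, ops_per_sec):
    """Approximate average latency from Little's Law (L = X * W).

    Assumes a closed loop: each thread keeps one operation in flight,
    so concurrency L equals the thread count and W = L / X.
    """
    if threads <= 0 or ops_per_sec <= 0:
        raise ValueError("threads and ops_per_sec must be positive")
    # Multiply first to convert seconds to milliseconds.
    return threads * 1000.0 / ops_per_sec


if __name__ == "__main__":
    # Illustrative inputs only, not results from the test cluster.
    for threads, ops in [(50, 25_000), (500, 50_000), (1000, 50_000)]:
        latency = implied_average_latency_ms(threads, ops)
        print(f"{threads} threads at {ops} ops/s -> ~{latency:.1f} ms")
```

This explains the general shape described above: once throughput levels off, adding threads mostly adds queueing, so average latency keeps rising steadily with thread count.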
Workload B: Adding Metadata/Tags
Workload B is an update-light, read-mostly workload, with 5% of the total I/Os writing data. At the application level, this workload is very similar to adding metadata to existing content, such as tagging photographs or articles.
The 9100 MAX two-node cluster completed between 29,000 and 53,000 database operations per second as the thread count ranged from 48 up to 1024, with the highest performance again seen with a busy platform (Figure 2a). Similar to Workload A, the Workload B average latency shows a smooth increase as the thread count increases, while the 99th percentile latency remains well controlled, as seen in Figure 2b.
Workload C: Static Data Cache
Workload C is a read-only workload (100% of the total I/Os read data; there is no write traffic). At the application level, this workload is very similar to reading user profiles or other static data where profiles are constructed elsewhere.
The 9100 MAX two-node cluster completed between 28,000 and 50,000 database operations per second as the thread count ranged from 48 up to 1024, with performance increasing as the thread count increased, apart from a small drop-off when the thread count reaches 1024 (Figure 3a). Workload C average latency shows a familiar smooth increase as the thread count increases, while the 99th percentile latency remains well controlled across the tested thread counts.
Workload D: Recent Statistics Tracking
Workload D reads the latest entries (most recent records are the most popular). At the application level, this workload is very similar to reading user status updates (where users want to read the most recent entries). Examples of this workload include social media, frequently changing or updated product literature, or software development repositories.
The 9100 MAX cluster completed between 34,000 and 57,000 database operations per second as the thread count ranged from 48 up to 1024, with the highest performance again seen with the busier platform (there is little change beyond 240 threads), as seen in Figure 4a. Workload D average and 99th percentile latencies show a trend similar to the other tested workloads, with the expected steady but well-controlled increases at higher thread counts, as seen in Figure 4b.
Workload F: User Record Changes
Workload F is a read/modify/write workload in which records are read, changed and written back. At the application level, this workload is very similar to users reading and changing data or tracking user activity.
The 9100 MAX cluster completed between 24,000 and 53,000 database operations per second as the thread count ranged from 48 up to 1024, with the highest performance seen again with higher workload (greater thread count), as can be seen in Figure 5a. Workload F average and 99th percentile latencies are both low and well controlled, as seen in Figure 5b.
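The five workload mixes described in the sections above can be summarized in a small table. The sketch below uses the property names from the YCSB core workload files. The proportions for Workloads A, B and C follow the descriptions in this brief; the 95/5 read/insert split for Workload D, the 50/50 read/read-modify-write split for Workload F, and the request distributions are the stock YCSB definitions and are assumed here, since the brief does not state them. The helper names are hypothetical.

```python
# Operation mixes keyed by YCSB workload letter.
# Assumption: D, F and all requestdistribution values are stock YCSB
# defaults, not settings confirmed by this brief.
WORKLOAD_MIXES = {
    "A": {"readproportion": 0.50, "updateproportion": 0.50,
          "requestdistribution": "zipfian"},
    "B": {"readproportion": 0.95, "updateproportion": 0.05,
          "requestdistribution": "zipfian"},
    "C": {"readproportion": 1.0,
          "requestdistribution": "zipfian"},
    "D": {"readproportion": 0.95, "insertproportion": 0.05,
          "requestdistribution": "latest"},
    "F": {"readproportion": 0.50, "readmodifywriteproportion": 0.50,
          "requestdistribution": "zipfian"},
}

# Operation types that write data to the database.
WRITE_KEYS = ("updateproportion", "insertproportion",
              "readmodifywriteproportion")


def write_fraction(workload):
    """Return the fraction of operations that write data."""
    mix = WORKLOAD_MIXES[workload]
    return sum(mix.get(key, 0.0) for key in WRITE_KEYS)


def to_properties(workload):
    """Render one workload mix as key=value lines (properties style)."""
    mix = WORKLOAD_MIXES[workload]
    return "\n".join(f"{key}={value}" for key, value in mix.items())


if __name__ == "__main__":
    for name in WORKLOAD_MIXES:
        print(f"Workload {name}: {write_fraction(name):.0%} writes")
        print(to_properties(name))
        print()
```

Viewed side by side, the mixes show why the workloads stress storage differently: Workload A is the most write-intensive of the plain read/update mixes, while Workload C issues no writes at all.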
This brief looks at Cassandra database performance for a small, two-node cluster equipped with Micron’s 9100 MAX (1.2TB) PCIe NVMe SSD (one SSD per node). We used YCSB running several common Cassandra workloads to measure small cluster performance and latency. Because analyzing a large cluster in one step can be expensive and time consuming, these small cluster results can help evaluate a full Cassandra cluster prior to deployment. Analyzing smaller cluster (two nodes) performance gives insight into overall cluster sizing requirements and potential performance for a given workload.
Equipped with one 9100 MAX SSD per node, the two-node test cluster reached up to 80,000 operations per second (YCSB Workload A with more than 240 threads, a ‘busy’ platform) while demonstrating low average latency and well-controlled 99th percentile latency at high thread counts.
The 9100 MAX brings clear benefits to Cassandra deployments: better performance and lower latency for more responsive applications. The 9100 MAX is the energy boost for your Cassandra clusters.