High-Performance, Balanced Core Count: AMD 8-Core and 16-Core (7FX2) + Micron 7300 NVMe on Microsoft SQL Server 2019

By Dilim Nwobu, Ryan Meredith - 2020-04-14

AMD has unleashed its EPYC™ 7FX2 line of CPUs, which combine high clock frequencies and large L3 caches with 8, 16 or 24 cores. Our team in the Micron Austin performance lab tested these new CPUs with Microsoft SQL Server 2019 and Micron’s new mainstream data center NVMe drive, the 7300.

Enterprise applications like SQL Server can stress the beefiest systems, so why use a CPU with fewer cores when the EPYC family offers higher-core-count options? Efficiency is the key: either performance per core or performance per watt can determine which hardware platform delivers the best total cost of ownership (TCO).

For this technical blog, we compared two of the three new CPUs, the AMD EPYC 7F32 and 7F52, which we chose because they have the largest L3 cache per core (16MB). We expected the 24-core 7F72 to perform similarly but did not have time to test it before launch. Table 1 lists the specs of all three CPUs.

Table 1: AMD EPYC new processor offerings

Model  Core Count  TDP   L3 Cache  L3 Cache/Core  Base Freq  Max Boost
7F32   8           180W  128MB     16MB           3.7GHz     3.9GHz
7F52   16          240W  256MB     16MB           3.5GHz     3.9GHz
7F72   24          240W  192MB     8MB            3.3GHz     3.8GHz
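
The selection rationale above comes down to simple division of the Table 1 figures. The short Python sketch below (the function names are our own, for illustration only) computes the L3 cache per core for each model and returns the models that tie for the highest value.

```python
# Core counts and total L3 cache (MB) copied from Table 1
CPUS: dict[str, tuple[int, int]] = {
    "7F32": (8, 128),
    "7F52": (16, 256),
    "7F72": (24, 192),
}


def l3_per_core_mb(cores: int, l3_mb: int) -> float:
    """Return the L3 cache available per core, in MB."""
    return l3_mb / cores


def highest_l3_per_core(cpus: dict[str, tuple[int, int]]) -> list[str]:
    """Return every model that ties for the highest L3-per-core value."""
    per_core = {model: l3_per_core_mb(*spec) for model, spec in cpus.items()}
    best = max(per_core.values())
    return sorted(model for model, value in per_core.items() if value == best)


for model, (cores, l3_mb) in CPUS.items():
    print(f"{model}: {l3_per_core_mb(cores, l3_mb):.0f}MB L3 per core")
print("Highest L3 per core:", highest_l3_per_core(CPUS))
```

Running it reports 16MB per core for the 7F32 and 7F52 and 8MB for the 7F72.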

We also wanted to see how well Micron’s mainstream NVMe SSD, the 7300, could sustain performance in a test that stresses every component in the system (Table 2).

Table 2: Micron 7300 performance characteristics

Micron 7300 PRO (3.84TB)       Performance
4KB random read                520K IOPS
4KB random write               70K IOPS
4KB random 70/30 read/write    160K IOPS
128KB sequential read          3.0GB/s
128KB sequential write         1.9GB/s
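
The IOPS and bandwidth figures in Table 2 are two views of the same data rate. As a rough, illustrative conversion (a sketch assuming "4KB" means 4,096 bytes and GB/s means 10^9 bytes per second), 520K random 4KB reads per second move about 2.1GB/s, below the 3.0GB/s sequential read rating.

```python
def iops_to_gb_per_s(iops: float, block_kib: float) -> float:
    """Convert IOPS at a fixed block size to throughput in GB/s (10**9 bytes per second)."""
    # Assumption: "4KB" means 4KiB (4,096 bytes)
    return iops * block_kib * 1024 / 1e9


# Random 4KB figures from Table 2
print(f"4KB random read:  {iops_to_gb_per_s(520_000, 4):.2f} GB/s")  # 2.13 GB/s
print(f"4KB random write: {iops_to_gb_per_s(70_000, 4):.2f} GB/s")   # 0.29 GB/s
```

Small random I/O tends to be limited by IOPS rather than bandwidth, which is why both sets of numbers matter for a mixed workload like SQL Server.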

With these initial tests, we wanted to provide some insight into whether the 8-core or the 16-core option works best. The answer depends on what matters most to the user and on the workload being run.

How We Tested

Hardware

We used a Dell PowerEdge R7515 server for our testing. The BIOS was tuned with NUMA (non-uniform memory access) nodes per socket (NPS) set to 4, as recommended by AMD. We used the current production BIOS with AGESA (AMD Generic Encapsulated Software Architecture) version 1.0.0.5. All other BIOS settings were left at their defaults. Table 3 summarizes the configurations of the database server (the system under test) and the load generation server shown in Figure 1.

Table 3: Test configuration

Component          Test Systems                                        Loadgen
Processor(s)       1x AMD EPYC 7F32 or 7F52                            2x Intel Xeon Platinum 8168
Memory             512GB                                               384GB
Storage            4x Micron 7300 PRO 3.84TB NVMe SSD (LVM RAID 10)    N/A
Network            25Gbps LOM                                          25Gbps LOM
Operating System   CentOS 8.1                                          CentOS 7.7
Application        Microsoft SQL Server 2019 for Linux                 Py-TPCC
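
For capacity planning, the RAID 10 layout in Table 3 trades half of the raw capacity for mirroring. A minimal sketch of that arithmetic, assuming a standard two-way mirror and ignoring LVM and filesystem metadata (the function name is hypothetical):

```python
def raid10_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Usable capacity of a RAID 10 set: every block is mirrored, so half the raw capacity.

    Simplification: ignores LVM/filesystem metadata and assumes an even number
    of identical drives, at least four.
    """
    if drive_count < 4 or drive_count % 2 != 0:
        raise ValueError("expected an even number of drives, at least 4")
    return drive_count * drive_tb / 2


usable_tb = raid10_usable_tb(4, 3.84)
print(f"Usable capacity: {usable_tb:.2f}TB")  # 7.68TB
```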

Figure 1: Test configuration overview

Workload and Dataset

Our test workload was Py-TPCC, a custom benchmarking application developed internally in Python. The implementation was very similar to HammerDB and provided comparable, but not identical, performance. It was based on the Transaction Processing Performance Council’s (TPC) TPC-C online transaction processing (OLTP) benchmark specification, with a few modifications to load the entire system more fully and to ensure that the entire dataset was accessed during the testing period. To measure performance, we recorded the number of TPC-C transactions (stored procedures) completed per minute, referred to simply as TPM.
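
TPM itself is a simple rate. The Python sketch below is illustrative only and is not Py-TPCC’s code; it assumes each transaction’s completion time is logged and counts only completions inside the measurement window, so activity during the ramp-up phase of the test procedure does not inflate the result.

```python
def transactions_per_minute(completion_times_s: list[float],
                            window_start_s: float,
                            window_end_s: float) -> float:
    """Return TPM for transactions completed in the window [start, end).

    Only completions inside the measurement window count, so ramp-up
    activity before window_start_s does not inflate the result.
    """
    if window_end_s <= window_start_s:
        raise ValueError("measurement window must have positive length")
    completed = sum(1 for t in completion_times_s if window_start_s <= t < window_end_s)
    return completed / ((window_end_s - window_start_s) / 60)


# Toy timestamps (seconds since test start), not real Py-TPCC output
times = [30.0, 90.0, 150.0, 200.0]
print(transactions_per_minute(times, 60.0, 180.0))  # 2 transactions in 2 minutes -> 1.0
```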

We created a 1TB dataset so that the target database would not fit in the server’s 512GB of memory. The resulting 2-to-1 dataset-to-memory ratio pushed a large share of the I/O to disk, producing a write-intensive disk workload.
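
The reasoning behind the dataset size can be made concrete. In this sketch (the helper names are hypothetical, and 1TB is taken as 1,024GB), even in the best case where all 512GB held database pages, at least half of the dataset would have to stay on disk at any moment, so steady disk traffic is unavoidable.

```python
def dataset_to_memory_ratio(dataset_gb: float, memory_gb: float) -> float:
    """How many times larger the dataset is than installed memory."""
    return dataset_gb / memory_gb


def min_uncached_fraction(dataset_gb: float, memory_gb: float) -> float:
    """Lower bound on the share of the dataset that cannot be memory-resident at once.

    Optimistic: assumes every byte of memory could cache data, ignoring the
    OS, SQL Server's own allocations and other memory consumers.
    """
    return max(0.0, 1.0 - memory_gb / dataset_gb)


# Assumption: 1TB counted as 1,024GB, matching the stated 2-to-1 ratio
print(dataset_to_memory_ratio(1024, 512))  # 2.0
print(min_uncached_fraction(1024, 512))    # 0.5
```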

Test Procedure

Below is a simplified outline of how tests were executed and measured:

  1. Restored dataset, replacing any existing database
  2. Applied load
    1. Ramped up to get to steady state: 20 minutes
    2. Began test measurement period
    3. Continued applied load: 30 minutes
    4. Stopped test

We repeated this process on both test systems, steadily increasing the load applied, until a predefined stop condition was met. In this testing, we stopped increasing load once the resulting TPM reached a performance plateau.
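
The load sweep and its stop condition can be expressed as a small loop. This is a sketch, not our actual test harness: the plateau threshold is not specified here, so `min_gain` (a 2% minimum TPM improvement) is an assumption, and `run_test` and `fake_run` are hypothetical stand-ins for one full restore, ramp-up and measurement iteration.

```python
from typing import Callable


def sweep_until_plateau(run_test: Callable[[int], float],
                        loads: list[int],
                        min_gain: float = 0.02) -> list[tuple[int, float]]:
    """Apply increasing load levels until TPM stops improving by at least min_gain.

    run_test is a hypothetical callable standing in for one full iteration:
    restore the dataset, ramp up, measure, stop, and return the measured TPM.
    """
    results: list[tuple[int, float]] = []
    for load in loads:
        tpm = run_test(load)
        results.append((load, tpm))
        if len(results) >= 2:
            previous_tpm = results[-2][1]
            if tpm < previous_tpm * (1 + min_gain):
                break  # plateau (or regression) reached: stop increasing load
    return results


# Toy stand-in for a test run: throughput saturates at 50,000 TPM
def fake_run(users: int) -> float:
    return float(min(users * 1_000, 50_000))


for load, tpm in sweep_until_plateau(fake_run, [10, 20, 40, 60, 80, 100]):
    print(load, tpm)
```

With the toy model, the sweep stops at a load of 80, the first step where TPM fails to improve, and never runs the final load level.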

Our Results

Predictably, the 16-core 7F52 CPU supported a higher TPC-C TPM than the 8-core 7F32 (Figure 2). Both CPUs delivered low average response times (Figure 3) and low 99.9th-percentile response times (Figure 4), with the 16-core CPU showing lower latency than the 8-core.

Figure 2: Py-TPCC transactions per minute comparison

Figure 3: Py-TPCC average response time per transaction

Figure 4: Py-TPCC response times for 99.9% of transactions

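A 99.9th-percentile response time is the latency that 99.9% of transactions met or beat, so it captures the slowest 0.1% that an average hides. Py-TPCC’s exact percentile method is not described here; the sketch below assumes the common nearest-rank definition.

```python
import math
from fractions import Fraction


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the nearest-rank percentile: the smallest sample such that at
    least pct% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction keeps the rank exact; float math such as 99.9 / 100 can pick up rounding error
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Toy latencies of 1 to 1,000 ms, not measured data
latencies_ms = [float(ms) for ms in range(1, 1001)]
print(percentile_nearest_rank(latencies_ms, 99.9))  # 999.0
print(percentile_nearest_rank(latencies_ms, 50))    # 500.0
```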

These results and quick response times depended on the Micron 7300 PRO NVMe SSDs. Microsoft SQL Server has a challenging I/O profile that mixes 64KB reads and writes with smaller 4KB and 8KB I/O. In testing, the logical volume manager (LVM) volume built from four 3.84TB 7300 PRO NVMe SSDs kept the CPUs busy while adding minimal latency (Figure 5).

Figure 5: Micron 7300 I/O performance

TPM Performance per Core

Looking at application performance efficiency, we saw that the 8-core (7F32) clocked a higher TPM per core than the 16-core (7F52) (Figure 6).

Figure 6: Py-TPCC transaction per minute per CPU core

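The per-core result follows from simple normalization: when doubling the core count raises total TPM by less than 2x, TPM per core drops even though total throughput rises. A sketch with hypothetical helper names, using placeholder numbers rather than the measured results:

```python
def tpm_per_core(tpm: float, cores: int) -> float:
    """Normalize a TPM result by the number of CPU cores that produced it."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return tpm / cores


def scaling_efficiency(tpm_small: float, cores_small: int,
                       tpm_large: float, cores_large: int) -> float:
    """Fraction of ideal linear scaling achieved when moving to more cores.

    1.0 means TPM grew in proportion to core count; below 1.0 means the
    larger CPU delivers less TPM per core.
    """
    ideal = tpm_small * cores_large / cores_small
    return tpm_large / ideal


# Hypothetical placeholder values, NOT the measured results in Figure 6
print(tpm_per_core(100_000, 8))                      # 12500.0
print(tpm_per_core(160_000, 16))                     # 10000.0
print(scaling_efficiency(100_000, 8, 160_000, 16))   # 0.8
```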

Power Utilization and Efficiency

The 16-core (7F52) had a higher overall system power draw, as expected given the CPU’s higher thermal design power (TDP) and the fact that it was processing more transactions than the 8-core (Figure 7).

Figure 7: Average system power consumed per test iteration

Measuring the power consumed per Py-TPCC transaction, we saw that the 16-core was more power-efficient per operation than the 8-core (Figure 8).

Figure 8: Power consumed per Py-TPCC transaction

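Because power is a rate (joules per second), a per-transaction figure like the one in Figure 8 is effectively energy per transaction. The sketch below assumes it is derived by dividing average system power by the transaction rate (the exact method is not stated here) and shows how a system with higher total power draw can still use less energy per transaction when its throughput grows faster than its power. The function name and all values are hypothetical.

```python
def joules_per_transaction(avg_power_w: float, tpm: float) -> float:
    """Energy per transaction: watts are joules per second, TPM counts per 60 seconds."""
    if tpm <= 0:
        raise ValueError("tpm must be positive")
    return avg_power_w * 60 / tpm


# Hypothetical placeholder values, NOT the measurements in Figures 7 and 8
low_power = joules_per_transaction(400, 100_000)   # lower power, lower throughput
high_power = joules_per_transaction(500, 160_000)  # higher power, higher throughput
print(f"{low_power:.4f} J vs {high_power:.4f} J per transaction")  # 0.2400 J vs 0.1875 J
```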

Conclusion

Microsoft SQL Server 2019 can place heavy demands on system resources. Both the AMD 8-core (7F32) and 16-core (7F52) CPUs can meet enterprise SQL Server performance requirements, thanks to their high clock speeds and large L3 cache per core. In a high-transaction environment like an OLTP solution, where maximizing transactions completed per minute and overall efficiency are the most important success criteria, the 16-core (7F52) CPU is a great fit. If maximum performance per core is the primary success criterion, the 8-core 7F32 CPU may be the better fit.

With either configuration, fast, cost-effective storage is key. Micron’s 7300 PRO data center NVMe SSD is the perfect fit for enterprise use cases like this, where a balance of cost, efficiency and performance will guide architecture decisions.

Would You Like to Know More?

Check out more of what Micron and AMD can do together. Performance does not have to break the bank, and AMD EPYC and Micron 7300 mainstream NVMe SSDs are a match made in the data center.

Through Micron Accelerated Solutions, we also offer a wide variety of workload-optimized, ready-to-build solutions for enterprise storage-centric workloads.

Dilim Nwobu

Dilim Nwobu started his career at Dell Technologies where he worked on storage, custom solutions and server platform development. At Micron, Dilim is a storage solution engineer where he focuses on testing Micron products with Microsoft technologies, namely SQL Server and Azure Stack HCI.

Ryan Meredith

Ryan Meredith is director of Data Center Workload Engineering for Micron's Storage Business Unit, testing new technologies to help build Micron's thought leadership and awareness in fields like AI and NVMe-oF/TCP, along with all-flash software-defined storage technologies.
