In order to understand how the performances of databases can be bench marked it helps to understand how they can be organized for the purposes of classification. In terms of their structure, databases can be considered to be able to run two broad types of queries, namely,
- Online Transaction Processing (OLTP) queries — These are typically the queries that are run day in, day out in businesses such as Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems where for example invoice values are inserted, updated or deleted in databases based on end user interaction.
- Online Analytical Processing (OLAP) queries- These are queries that are run for the purposes of reporting, usually by the Business Intelligence Team of a company for KPI evaluation and tracking of metrics by the mid and senior level management of the company. Another feature to note that the database is usually structured as Fact and Dimension tables using something known as a star schema design, to ensure faster processing of data.
So now that you know that these are broadly the two main types of workloads the next question is how to benchmark them. If you read the sales pitch decks that database vendors pitch it would seem that each of their own database is the ‘supposed’ best based on their own tests. This is where the TPC comes in. The TPC as stated on the mission statement on their website is “ a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry.”
In terms of database benchmarks that are relevant the two most relevant benchmarks to this day are TPC-E and TPC-H. TPC-E is an OLTP benchmark while TPC-H can be considered to be an OLAP benchmark.
TPC-E is defined as follows on the TPC website: “Approved in February of 2007, TPC Benchmark E is an on-line transaction processing (OLTP) benchmark. TPC-E is more complex than previous OLTP benchmarks such as TPC-C because of its diverse transaction types, more complex database and overall execution structure. TPC-E involves a mix of twelve concurrent transactions of different types and complexity, either executed on-line or triggered by price or time criteria. The database is comprised of thirty-three tables with a wide range of columns, cardinality, and scaling properties. TPC-E is measured in transactions per second (tpsE). While the benchmark portrays the activity of a stock brokerage firm, TPC-E is not limited to the activity of any particular business segment, but rather represents any industry that must report upon and execute transactions of a financial nature.”
TPC-H is defined as follows : “The TPC Benchmark™H (TPC-H) is a decision support benchmark. It consists of a suite of business oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions. The performance metric reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric (QphH@Size), and reflects multiple aspects of the capability of the system to process queries. These aspects include the selected database size against which the queries are executed, the query processing power when queries are submitted by a single stream, and the query throughput when queries are submitted by multiple concurrent users. The TPC-H Price/Performance metric is expressed as $/QphH@Size.”
TPC-H results are grouped by database size and it is interesting to note Microsoft SQL Server featuring consistently across the different database sizes as of 7th July 2020. Refer — http://www.tpc.org/tpch/results/tpch_perf_results5.asp?resulttype=all
For TPC-E results check out — (Microsoft SQL Server scores consistently high across several appliances in this space as well ) http://www.tpc.org/tpce/results/tpce_perf_results5.asp?resulttype=all