In-Memory Databases: An Introduction (Open Source & Proprietary) - Part I

Nuzhi Meyen
3 min read · Jan 17, 2017


Before I begin, I have to say that the title is a little misleading: I will focus more on one particular proprietary database (SAP HANA, as I have worked with it) than on the other open source and proprietary databases out there.

Relational Database Management Systems (RDBMSs) in their current form owe their origin to the seminal paper (https://goo.gl/HFV1xf) by E.F. Codd of IBM Research, published in 1970, in which he proposed a relational model of data as a more efficient way for an enterprise to retrieve information. Once commercial vendors adopted this relational form for declaring and manipulating data, and SQL was introduced as the language for querying it, enterprises soon began to appreciate how simple it had become to make sense of the data they stored.

Fast forward to now. With the explosion in data volume, velocity and variety, business leaders realized that they needed access to real-time information, both to gain a competitive advantage over one another and to react to the dynamic opportunities and threats they face. In-memory databases were born out of this need, coupled with the falling cost of main memory. In-memory databases are generally much faster than disk-optimized databases, since memory access is faster than disk access and the internal optimization algorithms are simpler and execute fewer CPU instructions.

Figure 1: Increase in uncertainty with data volume, velocity and variety (Courtesy: SAP HANA: An Introduction)

Anyone who has worked with distributed systems will have heard of the CAP (Consistency, Availability & Partition tolerance) theorem posited by Eric Brewer. Simply put, it states that a distributed system can achieve two of these three characteristics relatively well, while having to compromise to a certain degree on the third. RDBMSs usually achieve Availability and Consistency while compromising on Partition tolerance.

Figure 2: Where different databases appear on the CAP continuum (Courtesy: Cassandra: The Definitive Guide)

RDBMSs typically support two main types of workloads: Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). The two differ mainly in how they store data. OLTP uses normalized tables, while OLAP stores data in fact tables and dimension tables that are related to each other in star-schema-like designs.
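To make the difference between the two table designs concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table names, columns and sample rows are all made up for illustration, and SQLite is used only as a convenient sandbox; it says nothing about how SAP HANA or any other product lays out its data.

```python
import sqlite3

# An in-memory SQLite database, used here only as a convenient sandbox.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP-style design: normalized tables, one row per business entity.
# All table and column names below are hypothetical.
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date TEXT,
    amount REAL
);
INSERT INTO customers VALUES (1, 'Asha', 'Colombo'), (2, 'Ben', 'Kandy');
INSERT INTO orders VALUES
    (10, 1, '2017-01-05', 120.0),
    (11, 1, '2017-01-17', 80.0),
    (12, 2, '2017-02-02', 50.0);
""")

# A typical OLTP query touches a single record, looked up by its key.
oltp_row = cur.execute(
    "SELECT amount FROM orders WHERE id = ?", (11,)
).fetchone()

# OLAP-style design: one fact table surrounded by dimension tables.
cur.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Asha', 'Colombo'), (2, 'Ben', 'Kandy');
INSERT INTO dim_date VALUES (201701, 2017, 1), (201702, 2017, 2);
INSERT INTO fact_sales VALUES
    (1, 201701, 120.0), (1, 201701, 80.0), (2, 201702, 50.0);
""")

# A typical OLAP query scans the fact table, joins the dimensions
# and aggregates over many rows.
olap_rows = cur.execute("""
    SELECT c.city, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY c.city, d.month
    ORDER BY c.city, d.month
""").fetchall()

print(oltp_row)
print(olap_rows)
```

The OLTP query reads one row by primary key, whereas the OLAP query has to read every row in the fact table to produce its totals. That difference in access pattern is why the two workloads have traditionally favoured different designs.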

It used to be the case that OLTP and OLAP were handled by RDBMSs in separate systems, with OLTP handled by RDBMSs integrated with business processes and OLAP handled by the traditional data warehouse. This changed with the emergence of databases like SAP HANA. By enabling the use of column storage for OLTP (column storage technology had earlier been developed solely for OLAP), SAP was able to provide a database platform that could handle both OLTP and OLAP queries. IMHO, this was a key differentiator that set SAP apart from its competitors in database systems at the time.
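For readers who have not come across column storage before, the toy sketch below contrasts a row-oriented layout with a column-oriented one using plain Python lists. The data, the variable names and the helper functions are all hypothetical, and this is only a mental model of the two layouts, not a description of how SAP HANA implements its column store.

```python
# Illustrative records: (id, city, amount). The data is made up.
COLUMNS = ("id", "city", "amount")

# Row-oriented layout: all the values of one record sit together.
row_store = [
    (1, "Colombo", 120.0),
    (2, "Kandy", 50.0),
    (3, "Colombo", 80.0),
]

# Column-oriented layout: all the values of one attribute sit together.
column_store = {
    "id": [1, 2, 3],
    "city": ["Colombo", "Kandy", "Colombo"],
    "amount": [120.0, 50.0, 80.0],
}


def total_amount_rows(store):
    """Analytical query on the row layout: every record is visited in full."""
    return sum(record[2] for record in store)


def total_amount_columns(store):
    """Analytical query on the column layout: only one column is read."""
    return sum(store["amount"])


def fetch_record_rows(store, position):
    """Transactional lookup on the row layout: one contiguous record."""
    return store[position]


def fetch_record_columns(store, position):
    """Transactional lookup on the column layout: one value per column."""
    return tuple(store[name][position] for name in COLUMNS)


print(total_amount_rows(row_store), total_amount_columns(column_store))
print(fetch_record_rows(row_store, 1), fetch_record_columns(column_store, 1))
```

Both layouts return the same answers. What differs is how much data each query has to touch: the aggregate reads a single list in the column layout, while reassembling one full record means visiting every column. This is the trade-off that made column storage a natural fit for OLAP in the first place.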

In my next post, I hope to go into more detail on in-memory RDBMSs as well as NoSQL databases.

Written by Nuzhi Meyen

Co-founder of Helios P2P. Sri Lankan. Interested in Finance, Advanced Analytics, BI, Data Visualization, Computer Science, Statistics, and Design Thinking.
