The recent developments around in-memory analytics and processing are very promising, but GAIL SHAW believes it is worth looking at the underlying technology before jumping on the bandwagon.
Lately, it has been hard to ignore the growing attention paid to ‘in-memory’ technology and the question of whether it is going to change everything. While recent developments in in-memory analytics and processing are indeed very promising, it is worth looking at the underlying technologies before jumping on the bandwagon.
‚ÄòBig data’ is in itself a buzzword. 5GB of relational banking transactional data a week is not big data but on the other hand, CERN’s Large Hadron Collider’s experimental results, at 6GB/second, is much more in the big data category. At its core, big data refers to data that is too large, too fast and/or too varied to be processed with traditional data-processing methods. It is not just about size, big data implies that there is a lot of data, that it is accumulating quickly and that it is unstructured.
Big data may be processed using in-memory techniques, but because of the volumes involved, the data itself is unlikely to be stored in memory. Aggregations of it may be held in memory, temporarily or permanently, but the raw data is not.
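To make the distinction concrete, here is a minimal sketch, assuming a hypothetical `read_transactions()` source that yields one (account, amount) row at a time, as a reader over files or a stream would. Only the per-account running totals and counts stay in memory; the rows themselves pass through and are discarded.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def read_transactions() -> Iterator[Tuple[str, float]]:
    """Hypothetical row source. In practice this would read from disk or a
    network stream; here a small sample stands in for the real data."""
    sample = [
        ("acc-1", 120.0),
        ("acc-2", 35.5),
        ("acc-1", -20.0),
        ("acc-3", 99.9),
        ("acc-2", 4.5),
    ]
    yield from sample  # rows are produced one at a time, never all held at once


def aggregate(rows: Iterable[Tuple[str, float]]) -> Dict[str, dict]:
    """Keep only the aggregates (total and count per account) in memory."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for account, amount in rows:
        totals[account] += amount
        counts[account] += 1
    return {acc: {"total": totals[acc], "count": counts[acc]} for acc in totals}


summary = aggregate(read_transactions())
print(summary)
```

The memory used grows with the number of distinct accounts, not with the number of transactions, which is why aggregates can live in memory when the underlying data cannot.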
When talking about in-memory, it is important to distinguish which kind of memory is meant. Flash memory is a high-speed, non-volatile storage technology; it is what solid-state drives (SSDs) are built from. While significantly faster than traditional magnetic media, flash memory is still storage, not main memory. It can, however, be used to extend a server’s main memory, as SQL Server 2014’s Buffer Pool Extension feature does.
Generally, flash memory is used to reduce or remove IO latency in applications that store and retrieve large amounts of data, such as relational database engines. Since flash storage in the form of SSDs appears to the operating system as a regular drive, applications using it need no special code or changes.
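The idea behind extending main memory with flash can be illustrated with a toy two-tier cache. This is a minimal sketch, not SQL Server’s implementation: the class `TwoTierBufferPool` and its parameters are hypothetical. Pages are read into a small LRU cache in RAM; pages evicted from RAM spill into a larger flash tier, so a later read can be served from flash instead of the slow magnetic disk. The sketch handles reads only, so every page it caches is an unmodified copy of what is on disk.

```python
from collections import OrderedDict


class TwoTierBufferPool:
    """Hypothetical toy model: RAM cache -> flash 'extension' -> disk.
    Both cache tiers use least-recently-used (LRU) eviction."""

    def __init__(self, disk: dict, ram_pages: int, flash_pages: int):
        self.disk = disk                  # stands in for the data files
        self.ram = OrderedDict()          # page_id -> page, most recently used last
        self.flash = OrderedDict()        # pages evicted from RAM land here
        self.ram_pages = ram_pages
        self.flash_pages = flash_pages
        self.stats = {"ram": 0, "flash": 0, "disk": 0}

    def read(self, page_id):
        if page_id in self.ram:
            self.ram.move_to_end(page_id)         # refresh LRU position
            self.stats["ram"] += 1
            return self.ram[page_id]
        if page_id in self.flash:
            page = self.flash.pop(page_id)        # promote back into RAM
            self.stats["flash"] += 1
        else:
            page = self.disk[page_id]             # slowest path
            self.stats["disk"] += 1
        self._put_in_ram(page_id, page)
        return page

    def _put_in_ram(self, page_id, page):
        self.ram[page_id] = page
        if len(self.ram) > self.ram_pages:
            old_id, old_page = self.ram.popitem(last=False)  # evict LRU page from RAM
            self.flash[old_id] = old_page                     # spill it to flash
            if len(self.flash) > self.flash_pages:
                self.flash.popitem(last=False)                # drop it; disk still has a copy


disk = {i: f"page-{i}" for i in range(10)}
pool = TwoTierBufferPool(disk, ram_pages=2, flash_pages=4)
for pid in [0, 1, 2, 3, 0, 1, 0]:
    pool.read(pid)
print(pool.stats)
```

Pages 0 and 1 are evicted from RAM early on, yet their second reads come from the flash tier rather than from disk. As with a real SSD, the application calling `read` does not need to know which tier served the page.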
In-memory analytics allows for very fast querying and analysis of data. It encompasses tools such as Tableau and the PowerPivot add-in for Microsoft Excel. In-memory analytics tools often store data in columnar format, as opposed to the row-based storage traditionally used by relational databases.
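The difference between row-based and columnar layout can be sketched in a few lines. The sample sales records and the `to_columns` helper are hypothetical, and Python lists only model the layout; an analytics engine would store each column as a contiguous, typed block. The point is that a query over one attribute reads just that attribute in the columnar layout, while the row layout makes it walk every full record.

```python
from typing import Dict, List

# Row-oriented layout: each record's fields are stored together.
rows: List[dict] = [
    {"date": "2014-06-02", "region": "Gauteng", "amount": 120},
    {"date": "2014-06-02", "region": "Gauteng", "amount": 80},
    {"date": "2014-06-03", "region": "Western Cape", "amount": 45},
    {"date": "2014-06-03", "region": "Western Cape", "amount": 60},
    {"date": "2014-06-04", "region": "Western Cape", "amount": 95},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Convert row-oriented records into one list per column."""
    columns: Dict[str, list] = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one attribute still touches every whole record.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: the same query reads a single column and nothing else.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```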
Columnar storage enables very high compression ratios, because the values in a single column share a type and often repeat. That compression is what makes high-speed analytics possible: large data sets shrink enough to be worked on in memory. Typically, the data is kept on durable storage and processed in memory.
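One reason columns compress so well is that repeated values sit next to each other. Run-length encoding (RLE), shown below, is one simple encoding that exploits this; it is used here as an illustration only, since columnar engines combine several techniques (dictionary encoding and bit-packing among them) and the article does not say which any particular product uses. The 10,000-row `region` column is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Run-length encode a column: each run of identical consecutive
    values becomes a single (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# A sorted column of 10,000 values with only three distinct entries.
region = ["Eastern Cape"] * 2000 + ["Gauteng"] * 5000 + ["Western Cape"] * 3000

runs = rle_encode(region)
print(len(region), "values ->", len(runs), "runs")
assert rle_decode(runs) == region  # the encoding is lossless
```

In a row-oriented layout the same region values would be interleaved with dates and amounts, so there would be no long runs to collapse.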
In-memory transactional processing
With in-memory OLTP engines, the data is stored in volatile memory and persisted at intervals to durable storage (flash memory or traditional magnetic drives). In-memory OLTP allows for extremely fast transaction processing, making it suitable for applications that need very high insert rates or execute huge numbers of small transactions.
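The pattern of keeping data in volatile memory and persisting it at intervals can be sketched with a toy key-value store. The `InMemoryStore` class, its `checkpoint_every` parameter and the JSON snapshot file are all hypothetical; this is not how SQL Server’s In-Memory OLTP or TimesTen work internally. Each commit only updates a Python dictionary, and every few commits the whole dictionary is written to a file. Production engines typically also log each commit so that work done after the last checkpoint survives a crash; this sketch deliberately omits that, which makes the trade-off visible.

```python
import json
import os
import tempfile


class InMemoryStore:
    """Hypothetical toy store: all data lives in a dict (volatile memory),
    and a snapshot goes to durable storage every `checkpoint_every` commits."""

    def __init__(self, path: str, checkpoint_every: int = 3):
        self.path = path
        self.checkpoint_every = checkpoint_every
        self.data = {}
        self.commits_since_checkpoint = 0
        if os.path.exists(path):              # recovery: reload the last snapshot
            with open(path) as f:
                self.data = json.load(f)

    def commit(self, key: str, value) -> None:
        self.data[key] = value                # the transaction itself only touches RAM
        self.commits_since_checkpoint += 1
        if self.commits_since_checkpoint >= self.checkpoint_every:
            self.checkpoint()

    def checkpoint(self) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)            # swap in the new snapshot in one rename
        self.commits_since_checkpoint = 0


path = os.path.join(tempfile.mkdtemp(), "orders.json")
store = InMemoryStore(path, checkpoint_every=3)
for i in range(5):
    store.commit(f"order-{i}", {"qty": i + 1})

# Simulate a crash: throw away the in-memory store and recover from disk.
recovered = InMemoryStore(path)
print(sorted(recovered.data))
```

After recovery only the first three orders are present: the last two commits happened after the most recent checkpoint and existed only in memory.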
Examples include stock market trading, telephone exchanges and online stores. Solutions in this area include the In-Memory OLTP feature of Microsoft SQL Server 2014 (developed under the code name Hekaton) and Oracle’s TimesTen In-Memory Database.
Whichever form of in-memory solution is being considered, it is critically important that it not be implemented as a ‘solution looking for a problem’. The various in-memory solutions solve very different problems, and there is nothing to gain from deploying one for its own sake. Any solution must be carefully considered in light of the problem it is expected to solve and the gains it is expected to deliver. As with most areas of technology, there is unfortunately no magic solution to every problem here.
*Gail Shaw, Technical Lead at Entelect Software
* Follow Gadget on Twitter at @GadgetZA