September 2013 – Database Fog Blog

More on the Oracle 12c In-Memory Option

In-memory DBMS in a jar (Photo credit: Wikipedia)

I recently listened to a session by Juan Loalza of Oracle on the 12c In-memory option. Here are my notes and comments in order…

There is only one copy of the data on disk and that is in row format (“the column format does not exist on disk”). This has huge implications. Many of the implications come up later in the presentation… but consider: on start-up or recovery the data has to be loaded from the row format and converted to a columnar format. This is a very expensive undertaking.

The in-memory columnar representation is a fully redundant copy of the row format OLTP representation. Note that this should not impact performance as the transaction is gated by the time to write to the log and we assume that the columnar tables are managed by MVCC just like the row tables. It would be nice to confirm this.

Data is not as compressed in-memory as in the hybrid-columnar-compression case. This is explained in my discussion of columnar compression here.

Oracle claims that an in-memory columnar implementation (level 3 maturity by my measure see here) is up to 100X faster than the same implementation in a row-based form. Funny, they did not say that the week before OOW? This means, of course, that HANA, xVelocity, BLU, etc. are 100X faster today.

They had a funny slide talking about “reporting” but explained that this is another word for aggregation. Of course in-memory vector aggregation is faster in-memory.

There was a very interesting discussion of Oracle ERP applications. The speaker suggested that there is no reporting schema for these apps and that users therefore place indexes on the OLTP database to provide performance for reporting and analytics. It was suggested that a typical Oracle E-Business table would have 10-20 indexes on it and it was not unusual to see 30-40 indexes. It was even mentioned that the Siebel application main table could require 90 indexes. It was suggested that by removing these indexes you could significantly speed up the OLTP performance, speed up reporting and analytics, cure cancer and end all wars (OK… they did not suggest that you could cure cancer and end war… this post was just getting a little dry).

The In-memory option is clusterable. Further, because a RAC cluster uses shared disk and the im-memory data is not written to disk there is a new shared-nothing implementation included. This is a very nice and significant architectural advance for Oracle. It uses the direct-to-wire infiniband protocol developed for Exadata to exchange data. Remember when Larry dissed Teradata and shared-nothingness… remember when Larry dissed in-memory and HANA… remember when Teradata dissed HANA as nonsense and said SAP was over-reacting to Oracle. Gotta smile :).

Admins need to reserve memory from the SGA for the in-memory option. This is problematic for Exadata as the maximum memory on a RAC node is 256GB… My bad… Exadata X3-8 supports 2TB of memory per node… this is only problematic for Exadata X3-2 and below… – Rob

It is possible to configure a table partition into memory while leaving other partitions in row format on disk.

It is suggested again that analytic indexes may be dropped to speed up OLTP.

The presenter talked about how the architecture, which does not change the row-based tables in any way, allows all of the high-availability options available with Oracle to continue to exist and operate unchanged.

But the beautiful picture starts to dull a little here. On start-up or first access… i.e. during recovery… the in-memory columnar data is unavailable and loaded asynchronously from the row format on disk. Note that it is possible to prioritize the order tables are loaded to get high-priority data in-memory first… a nice feature. During this time… which may be significant… all analytic queries are run against the un-indexed row store. Yikes! This kills OLTP performance and destroys analytic performance. In fact, the presenter suggested that maybe you might keep some indexes for reports on your OLTP tables to mitigate this.

Now the story really starts to unravel. Indexes are required to provide performance during recovery… so if your recovery SLA’s cannot be met without indexes then you are back to square one with indexes that are only used during recovery but must be maintained always, slower OLTP, and the extra requirement for a redundant in-memory columnar data image. I imagine that you could throttle some reports after an outage until the in-memory image is rebuilt… but the seamless operations during recovery using standard Oracle HA products is a bit of a stretch.

Let me raise again the question I asked in my post last week on this subject (here)… how are joins processed between the row format and the columnar format. The presenter says that joins are no problem… but there are really only two ways… maybe three ways to solve for this:

When a row-to-columnar join is identified the optimizer converts the columnar table to a row form and processes the join using the row engine. This would be very very slow as there would be no indexes on the newly converted columnar data to facilitate the join.
When a row-to-columnar join is identified the optimizer pushes down aggregation and projection to the columnar processing engine and converts the columnar result to a row form and processes the join using the row engine. This would be moderately slow as there would be no indexes on the newly converted columnar data to facilitate the join (i.e. the columnar fact would have no indexes to row dimensions or visa versa).
When a row-to-columnar join is identified the optimizer converts the row table to a columnar form and processes the join using the columnar engine. This could be fast.

Numbers two and three are the coolest options… but number three is very unlikely due to the fact that columnar data is sharded in a shared-nothing manner and the row data is not… and number two is unlikely because if it were the case Oracle would surely be talking about it. Number one is easy and the likely state in release 1… but these joins will be very slow.

Finally, the presenter said that this columnar processing would be implemented in Times Ten and then in the Exalytics machine. I do not really get the logic here? If a user can aggregate in their OLTP system in a flash why would they pre-aggregate data and pass it to another data stovepipe? If you had to offload workload from your OLTP system why wouldn’t you deploy a small, standard, Oracle server with the in-memory option and move data there where, as the presenter suggested, you can solve any query fast… not just the pre-aggregated queries served by Exalytics. Frankly, I’ve wondered why SAP has not marketed a small HANA server as an Exalytics replacement for just this reason… more speed… more agility… same cost?

There you have it… my half a cents… some may say cents-less evaluation.

I will end this with a question for my audience (Ofir… I’ll provide a link to your site if you post on this…)… how do we suppose the in-memory option supports bulk data load? This has implications for data warehousing…

Here, of course, is the picture I should have used above… labeled as the in-memory database of your favorite vendor:

Oracle 12c IMDB Announcement at OOW13

I changed the picture to show you the billboard SAP bought on US101N right across from the O HQ…

– Rob

Larry Ellison announced a new in-memory capability for Oracle 12c last night. There is little solid information available but taken at face value the new feature is significant… very cool… and fairly capable.

In short it appears that users have the ability to pin a table into memory in a columnar format. The new feature provides level 3 (see here) columnar capabilities… data is stored compressed and processed using vector and SIMD instruction sets. The pinned data is a redundant copy of the table in-memory… so INSERT/UPDATE/DELETE and data loads queries will use the row store and data is copied and converted to the in-memory columnar format.

As you can imagine there are lots of open questions. Here are some… and I’ll try to sort out answers in the next several weeks:

It seems that data is converted row-to-columnar in real-time using a 2-phased commit. This will significantly slow down OLTP performance. LE suggested that there was a significant speed-up for OLTP based on performance savings from eliminating indexes required for analytics. This is a little disingenuous, methinks… as you will most certainly see a significant degradation when you compare OLTP performance without indexes (or with a couple of OLTP-centric indexes) and with the in-memory columnar feature on to OLTP performance without the redundant copy and format to columnar effort. So be careful. The use case suggested: removing analytic indexes and using the in-memory column store is solid and real… but if you have already optimized for OLTP and removed the analytic indexes you are likely to see performance drop.
It is not clear whether the columnar data is persisted to disk/flash. It seems like maybe it is not. This implies that on start-up or recovery data is read from the row store on-disk tables and logs and converted to columnar. This may significantly impact start-up and recovery for OLTP systems.
It is unclear how columnar tables are joined to row tables. It seems that maybe this is not possible… or maybe there is a dynamic conversion from one form to another? Note that it was mentioned that is possible for columnar data to be joined to columnar data. Solving for heterogeneous joins would require some sophisticated optimization. I suspect that when any row table is mentioned in a query that the row join engine is used. In this case analytic queries may run significantly slower as the analytic indexes will have been removed.
Because of this and of item #2 it is unclear how this feature plays with Exadata. For lots of reasons I suspect that they do not play well and that Exadata cannot use the new feature. For example, there is no mention of new extended memory options for the Exadata appliance… and you can be sure that this feature will require more memory.

There was a new hardware system announced that uses this in-memory capability… If you add all of this up it may be that this is a system designed to run SAP applications. In fact, before the presentation on in-memory there was a long (-winded) presentation of a new Fujitsu system and the SAP SD benchmark was specifically mentioned. This was not likely an accident. So… maybe what we have is a counter to HANA for SAP apps… not a data warehouse at all.

As I said… we’ll see as the technical details emerge. If the architectural constraints 1-4 above hold then this will require some work for Oracle to compete with HANA for SAP apps or for data warehouse workloads…

Oracle promises 100x faster queries with new in-memory option for databases (zdnet.com)
Oracle’s Ellison talks up ‘ungodly speeds’ of in-memory database. SAP: *Cough* Hana *cough* (go.theregister.com)

Who Out-performs Who: A Story…

Performance Car Legends (Photo credit: Ginger Me)

In this blog I have stated explicitly and implied now and again that the big architectural features are what count… despite the fact that little features are often what are marketed. Here is a true story to reinforce this theme… and a reminder of the implications… a real-life battle between two vendors: we’ll call them NewVendor and LegacyVendor.

Four years ago, more or less, NewVendor sold a system to offload work from an existing LegacyVendor configuration. Winning the business was tough and the POC was a knife-fight. At that time the two vendors were architecturally similar with no major advantages on either side. In the end NewVendor won a fixed contract that provided 16 nodes and guaranteed to match the performance of LegacyVendor for a specific set of queries. The 16 node configuration was sized based on the uncompressed data in LegacyVendor’s system.

NewVendor sent in a team to migrate the data and the queries… what was expected to be a short project. But after repeated attempts and some outside effort by experts the queries were running 50% slower than the target. I was asked to have a look and could see no glaring mistakes that could account for such a large performance miss… I saw no obvious big tuning opportunities.

After a day or so of investigation I found the problem. LegacyVendor offered a nice dictionary-based compression scheme that shrunk the size of the data by… you guessed it… exactly 50%. Because the NewVendor solution had to read 50% more data with each query they were 50% slower. I recommended that NewVendor needed to supply eight more nodes to hit the performance targets.

In the course of making these recommendations I was screamed at, literally, by one technology executive and told that I was a failure by another. They refused to see the obvious, exact, connection between the performance and the compression. I was quickly replaced by another expert who spent four months on site tuning and tuning. He squeezed every last drop out of the database going so far as to reorder columns in every table to squeeze out gas where data did not align on word boundaries. His expert tuning managed to reduce the gap and in the end NewVendor purchased three fewer nodes than my recommendation. With those nodes they then hit the targets. But the cost of his time and expense (he was a contractor) exceeded the cost of the nodes he saved… and the extra four-month delay antagonized the customer such that the relationship never recovered.

In a world where the basics of query optimization and execution are known to all there are only big-ticket items that differentiate products. When all of the big-ticket, architectural, capabilities are the same the difference between any two mature RDBMS products will rarely be more than 10%-15% across a large set of queries. The big-ticket differentiators today are the application of parallelism, compression and column store, and I/O avoidance (i.e. in-memory techniques). The answer to the question who out-performs who can be found to a close approximation from looking at who is how parallel (here) and who is how columnar (here) and merging the two… with a dose of who best avoids I/O through effective use of memory. This is the first lesson of my story.

The second lesson is… throw hardware at tuning problems when there are no giant architectural mistakes. Even a fat server node costs around $15K… and you will be better off with faster hardware than with a warehouse or mart that is so finely tuned and fragile that the next change to the schema or the data volumes or the workload breaks it.

Epilogue

Soon after this episode NewVendor rolled out a Level 2 columnar feature. This provided them with a distinct advantage over LegacyVendor… an advantage almost exactly equal to their advantage in compression plus the advantage from columnar projection to reduce I/O… and for several years they did not lose a performance battle to LegacyVendor. Today LegacyVendor has a comparable capability and the knife-fight is on again… Architecture counts…

Share this:

Related articles

Share this:

Share this: