Buzzwords – Page 4 – Database Fog Blog

Using Teradata’s Appliance for Hadoop to Reduce TCO

Teradata has recently announced a very complete Teradata database-to-Hadoop integration. Is this note we’ll consider how a Teradata shop might effectively use these features to significantly reduce the TCO of any Teradata system.

The Teradata Appliance for Hadoop (here) offering is quite well thought out and complete… including a Teradata appliance, a Hadoop appliance, and the new QueryGrid capability to seamlessly connect the two… so hardware, software, support, and services are all available in very easy-to-consume bundles.

There is little published on the details of the QueryGrid feature… so I cannot evaluate where it stands on the query integration maturity curve (see here)… but it certainly provides a significant advance over the current offering (see here and Dan Graham’s associated comments).

I believe that there is some instant financial gratification to be had by existing Teradata customers from this Hadoop mashup. Let’s consider this…

Before the possibility of a Hadoop annex to Teradata, Teradata customers had no choice but to store cold, old, data in the Teradata database. If, on occasion, you wanted to perform year by year comparisons over ten years of data then you needed to keep ten years of data in the database at a rough cost of $50K/TB (see here) … even if these queries were rarely executed and were not expected to run against a high performance service level requirement. If you wanted to perform some sophisticated predictive analysis against this data it had to be online. If fact, the Teradata mantra… one which I wholeheartedly agree with… suggests that you really should keep the details online forever as the business will almost always find a way to glean value from this history.

This mantra is the basis of what the Hadoop vendors call a data lake. A data warehouse expert would quickly recognize a data lake as a staging area for un-scrubbed detailed data… with the added benefit that a Hadoop-based data lake can store and process data at a $1K/TB price point… and this makes it cost-effective to persist the staged data online forever.

So what does this mean to a Teradata EDW owner? Teradata has published numbers (here) suggesting that 92% of the queries in an EDW only touch 20% of the data. I would suggest that there is some sort of similar ratio that holds for 90% of the remaining queries… they may touch only another 40% of the data. This suggests that the 40% of the data remaining is online to service less than 1% of the queries… and I suggest that these queries can be effectively serviced from the $1K/TB Hadoop annex.

In other words, almost every Teradata shop can immediately benefit from Teradata’s new product announcements by moving 40% of their Teradata database data to Hadoop. Such a move would free Teradata disk space and likely take pressure off to upgrade the cluster. Further, when an upgrade is required, users can reduce the disk footprint of the Teradata database side of the system; add a Hadoop annex, and significantly reduce the TCO of the overall configuration.

Some time back I suggested that Teradata would be squeezed by Hadoop (here and here). To their credit Teradata is going to try and mitigate the squeeze. But the economics remain… and Teradata customers should seriously consider how to leverage the low $/TB of Teradata’s Hadoop offering to reduce costs. Data needs to reside in the lowest cost infrastructure that still provides the required level of service… and the Teradata Hadoop integration provides an opportunity to leverage a new, low-cost, infrastructure.

An Architecture for the IoT – Part 1

There are so many things in the Internet of Things (IoT) that might record data into your data fabric that a new approach may be required. Let’s think about this… define some terms, and see how these terms fit into current data fabric thinking, let’s consider how they fit into a more modern logical data warehouse architecture, and let’s think about whether the IoT might push us to a different approach.

I’m not going to go overboard on terms here… But we do need to distinguish between a sensor and a processor.

To my way of thinking a sensor is a thing. It creates, but does not necessarily process, data. A sensor has some means to communicate with a processor… but if there is no significant processing on the sensor other than communications then we will suggest that there is no “processor” in a meaningful sense. Let me give you four examples:

The first is courtesy of Ray Carnes, a chief architect at Boeing. Imagine a brake-pad in your car with 100,000 dust-sized RFID sensors randomly scattered as part of the pad. These sensors do nothing but signal on an interval that they are present. This allows a processor elsewhere to record the signals and determine how much of the brake pad has worn. If only 80,000 sensors report we can assume that 20% of the pad has worn away.
A Nest thermostat senses movement and temperature. It uses a network-connect to send the results of this sensing to the Nest mother-ship and performs little-or-no processing on site.
Sensors in my Audi detect rotation of the wheels. There is a network that sends the results to a small embedded anti-lock braking processor that monitors all four wheels as well as the pressure on the brake-pedal and sends signals to all five components to allow the car to brake evenly.
There is a sensor in the screen on the ATM I used yesterday that detects that I want to request service. This user interface communicates with a powerful general processor which then communicates with the Bank mother-ship to create and process banking transactions.

This last bullet is important… any device that takes user input is a sensor with an embedded processor. It is a “thing” just like the Nest thing. Today we tend to blur the line between sensor and processor as every thing has a powerful processor onboard. The IoT will change this assumption.

A processor then, is a computer that performs some analysis on the data generated by one or more sensors. A processor may also store data… a sensor will not.

Now let’s think about how we might combine sensors and processors in an architecture. To start lets consider the context of the data the processor can use for analysis:

If the processor has only the last data sensed we would say that the context is immediate and local to one sensor. The processor can see streamed data but can only operate on the last event. We would say that this sensor-processor configuration can provide a simple reflexive response. When you press the lock button in your car a sensor detects this event and signals to all four doors and the boot to lock it up.
Another configuration might allow the local processor to store more context from a single sensor over a longer period of time… so the context is historical and local. In the case of the anti-lock brakes… the processor receives signals from a group of sensors and stores a very short historical context. This grouped historical context is very powerful…
Another configuration might store the group context and then forward the event details to a bigger server that stores and analyzes a universal context of all things to look for patterns. Further, there could be a hierarchy of groups leading to a universal context.
Finally, a server with some group context could summarize the details for that group and pass on only a summary over time up to another group server or to a universal server.

I suspect that you can see where I’m going. There is a trade-off is this picture between the advantages of pushing analytic processing close to the sensor and the associated requirement for more analytic processors, the advantages of intermediate analysis requiring more data movement but fewer analytic processors, and the advantage of a central analytic mother ship where all data is stored and analyzed. In the next version of this thread I’ll try to tease apart the trade-offs.

Part 8 – How Hadooped is SQL Server PDW with Polybase?

Now for SQL Server… continuing the thread on RDBMS-Hadoop integration (Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7) I have suggested that we could evaluate integration architecture using three criteria:

How parallel are the pipes to move data between the RDBMS and the parallel file system;
Is there intelligence to push down predicates; and
Is there more intelligence to push down joins and other relational operators?

Before we start I will suggest a fourth criteria that will be more fully explored later when we consider networks and pipes… that is: how is data sharded/hashed/distributed as it moves from the distribution scheme in HDFS to an optimal, usually hashed, scheme in the target RDBMS. Consider Greenplum as an example… they move data in parallel as quickly as possible to the GPDB and then redistribute the data across GPDB segment nodes using scatter-gather, a very efficient distribution mechanism. We will consider how PDW Poybase manages this as part of our first criteria.

Also note… since I started this series Teradata has come out with a new capability: the QueryGrid. I will add a post to consider this separately… and in this note I will assume the older Teradata capability. This is a little unfair to Teradata and I apologize for that… but otherwise this post becomes too complex. I’ll make things right for Teradata ASAP.

Now on to Microsoft…

First, Polybase has effective parallel pipes to move data from HDFS to the parallel SQL Server instances in PDW. This matches the best capability of other products like Teradata and Greenplum in this category. But where Teradata and Greenplum move data and then redistribute it, pushing the data over a network twice, Poybase has pushed the PDW hash function down to the HDFS node so that data is distributed as it is sent. This very nice feature skips one full move of the data.

Our second criteria considers how smart the connector is in pushing down filters/predicates. Polybase uses a cost-based approach to determine whether is is less expensive to push predicates down or to move all of the data up to the PDW layer. This is a best-in-class capability.

For the 3rd criteria we ask does the architecture push down advanced functions like joins and aggregates… and does the architecture minimize data pulled up to join with semi-joins? Polybase again provides strong capabilities here pushing down joins and aggregates. Polybase does not use semi-joins, so there is room to improve here… but Microsoft clearly has this capability in their roadmap.

One final note… Polybase works with PDW but not with other SQL Server products. This limitation may be relevant in many cases.

PDW + Polybase is a strong offering… matching HANA in most aspects with HANA having a slight edge in push-down with semi-joins but with SQL Server matching this with the most sophisticated parallel data distribution capability.

References

Microsoft Gray Systems Lab: Polybase
SIGMOD: Split Query Processing in Polybase

Part 6: How Hadooped is HANA?

Now for HANA plus Hadoop… to continue this thread on RDBMS-Hadoop integration (Part 1, Part 2, Part 3, Part 4, Part 5) I have suggested that we could evaluate integration architecture using three criteria:

How parallel are the pipes to move data between the RDBMS and the parallel file system;
Is there intelligence to push down predicates; and
Is there more intelligence to push down joins and other relational operators?

As a preface I need to include a note/apology. As you will see HANA may well have the best RDBMS-Hadoop integration in the market. I try hard not to blow foam about HANA in this blog… and I hope that the objective criteria I have devised to evaluate all of the products will keep this post credible… but please look at this post harder than most and push back if you think that I overstep.

First… surprisingly, HANA’s first release has only a single pipe to the Hadoop side. This is worrisome but easily fixed. It will negatively impact performance when large tables/files have to be moved up for processing.

But HANA includes Hadoop as a full partner in a federated data architecture using the Smart Data Access (SDA) engine inside the HANA address space. As a result, HANA not only pushes predicates but it uses cost-based optimization to determine what to push down and what to pull up. HANA interrogates the Hadoop system to gather statistics and uses the HANA optimizer to develop smart execution plans with awareness of both the speed of in-memory and the limited memory resources. When data in HANA is joined with data in Hadoop SDA effectively uses semi-joins to minimize the data pulled up.

Finally, HANA can develop execution plans that executes joins in Hadoop. This includes both joins between two Hadoop tables and joins where small in-memory tables are pushed down to execute the joins in Hadoop. The current limitation is that Hadoop files must be defined as Hive tables.

Here is the HANA execution plan for TPC-H query 19. HANA has pushed down all of the steps behind the Remote Row Scan step… so in this case the entire query including a nested loop join was pushed down. In other queries HANA will push only parts of the plan to Hadoop.

So HANA possesses a very sophisticated integration with Hadoop… with capabilities that minimize the amount of data moved based on the cost of the movement. This is where all products need to go. But without parallel pipes this sophisticated capability provides only a moderate advantage (see Part 5),

Note that this is not the ultimate in integration… there is another level… but I’ll leave some ideas for extending integration even further for my final post in the series.

Next… Part 7… considering Greenplum…

Part 1: How Hadooped is Your RDBMS?

Sorry for the comic adjective “Hadooped”?

The next few blogs will try to evaluate the different approaches to integrating Hadoop and a standard RDBMS… so the first thing I’ll try in this post is to suggest a criteria based on some architectural choices for making the evaluation. Further, I’ll inject a little surprise and make the point by using the criteria to say something about a product that is not an integration of an RDBMS and Hadoop.

For the purposes of this let me clear that by “Hadoop” I mean at least HDFS plus MapReduce… so I will discuss integrating a parallel RDBMS with data stored in HDFS: a massively parallel file system with a programming capability included. By “integration” I mean that queries using the full set of SQL supported by the RDBMS must be available for processing queries that refer to data across the Hadoop-RDBMS divide.

Since we’ve assumed that all SQL functionality is supported the architectural issue left to solve is performance and this issue revolves on one topic: how do we minimize the cost of moving data between the two partners for a given query?

Now to get on with it…

The easiest, but not all that easy, problem involves using parallelism to move data from one system to the other… so the first criteria we will evaluate for each product will consider how parallel is their movement of data.

The next criteria involves intelligence in the RDBMS to push down some execution operators to the data layer. Of course the RDBMS must scan remote data… so in this part of the evaluation we will grade each product’s ability to push processing down to apply predicates and project the minimal amount of data up to the RDBMS.

Finally, a most intelligent product would push more than just predicates down… it would push down joins and aggregation… and the decisions around splitting processing would be fully optimized. A most intelligent product would fully federate the HDFS data into the RDBMS.

So there you have it… I will start evaluating RDBMS-Hadoop architecture by three criteria:

how parallel is the data movement between the RDBMS and Hadoop;
is there intelligence to minimize data movement by pushing the least data and the associated query plan to one system or another… this requires parallel pipes in both directions; and
is there intelligence to build an optimal query plan that splits steps across both systems to completely minimize the movement of data and/or optimize the compute.

And a final word on the relative strength of each criteria:

If we imagine a 10-node Hadoop cluster talking to a 10-node RDBMS with 10 parallel pipes and compared it to the same setup with only 1 pipe (not parallel) then we might suggest that the parallel pipes provide a 10X performance increase.
If we imagine intelligence that moved 100K rows rather than 10M then we might suggest that intelligent push down might provide a 100X performance increase…
If we had even more intelligence and further optimized processing then another 10X-100X might be possible.

So all three criteria are not equal… intelligent query planning trumps wide pipes…

Now for the surprise… in the next blog we’ll look at how Exadata’s architecture maps to these criteria… since it is a two-tiered architecture with an RDBMS tied to a parallel file system…

You can see the rest of the series here: Part 2, Part 3, Part 4, Part 5, Part 6, Part 7, Part 8.

HANA, BLU, Hekaton, and Oracle 12c vs. Teradata and Greenplum – November 2013

I would like to point out a very important section in the paper on Hekaton on the Microsoft Research site here. I will quote the section in total:

2. DESIGN CONSIDERATIONS

An analysis done early on in the project drove home the fact that a 10-100X throughput improvement cannot be achieved by optimizing existing SQL Server mechanisms. Throughput can be increased in three ways: improving scalability, improving CPI (cycles per instruction), and reducing the number of instructions executed per request. The analysis showed that, even under highly optimistic assumptions, improving scalability and CPI can produce only a 3-4X improvement. The detailed analysis is included as an appendix.

The only real hope is to reduce the number of instructions executed but the reduction needs to be dramatic. To go 10X faster, the engine must execute 90% fewer instructions and yet still get the work done. To go 100X faster, it must execute 99% fewer instructions. This level of improvement is not feasible by optimizing existing storage and execution mechanisms. Reaching the 10-100X goal requires a much more efficient way to store and process data.

This is important because it confirms the difference in a Level 3 and a Level 2 columnar implementation as described here. It is just not possible for a Level 2 implementation with a row-based join engine to achieve the performance of a Level 3 implementation. This will allow the Level 3 implementations: HANA, BLU, Hekaton, and Oracle 12c to distance themselves from the Level 2 products: Teradata and Greenplum; by more than 10X… and this is a very significant advantage.

Related articles

SAP’s HANA, Cloud Software Key to Improved Q3 Profits (eweek.com)
Oracle Plays Catch-Up With New ‘In-Memory’ Database Technology (readwrite.com)
Is the sky the limit for Flash and In-Memory Databases? (zdnet.com)
Teradata expands Hadoop portfolio, Hortonworks partnership (zdnet.com)

CPUs and HW for HANA, BLU, Hekaton, and Oracle 12c

This short post is intended to provide a quick warning regarding in-memory columnar and cpu requirements… with a longer post to follow.

When a row is inserted or bulk-loaded into a DBMS, if there are no indexes, the amount of cpu required is very small. The majority of the time is spent committing a transaction is the time to write a log record to persist the data.

When the same record is reformatted into a column the amount of processing required is significantly higher. The data must be parsed into columns, the values must be compressed, dictionaries may be updated, and the breadcrumbs that let the columnar data be regenerated into rows must be laid. Further, if the columnar structure are to be optimized then the data must be ordered… with a sort or some kind of index structure. I have seen academic papers that suggest that for an insert columnar processing may be 100X more than row processing… and you can see why this could be true (I apologize for not finding the reference… I’ll dig it up… as I recall I read it in a post some time back by Daniel Abadi).

Now let’s think about this… several vendors are suggesting that you can deploy their columnar features with no changes required… no new hardware… in-place. But this does not ring true if the new columnar feature requires 100X extra CPU cycles per row… or 50X… or 10X… unless you are running your database on an empty server.

This claim is a shot at SAP who, more honestly, suggests new hardware with high-end processors for their in-memory columnar product… but methinks it is marketing, not architecture, from these other folks.

Thoughts on in-memory columnar add-ons (dbms2.com)
Simple and fast with IBM BLU Acceleration (ibm.com)
Software Controls Cache Memory to Speed CPUs (spectrum.ieee.org)
Major in memory performance boosts for Oracle 12c (techcentral.ie)

BLU Meanies: Data In-memory

IBM is presenting a DB2 Tech Talk that compares the BLU Accelerator to HANA. There are several mistakes and some odd thinking in the pitch so let me address the issues as a way to explain some things about HANA and about BLU. This blog will consider what data needs to be in-memory.

IBM like several others, continues to repeat a talking point along the lines of: “We believe that you should not have to fit all of you active data in memory…”. Let’s think about this…

Note that in the current release HANA has a constraint that all of the data in a single column, the entire vector that represents the data in that column, must be in-memory before it can be operated on. If the table is partitioned and partition-elimination is applied then the data in the partition for the column must be loaded in-memory. This is a real constraint that will be removed in a subsequent release… but it is not a very severe constraint if you think about it.

But let’s be clear… HANA does not require all data to be in-memory… it will read data from peripheral devices in and out as required just as BLU does.

Now what does this mean? Let’s walk through some scenarios.

First, let’s imagine a customer with 10TB of user data, per the scenario IBM discusses. Let’s not get into a whose product compresses better discussion and assume that both BLU and HANA will get 4X compression… so there is 2.5TB of user data to be processed.

Now let’s imagine a system with only a very little memory available for data. In other words, let’s configure both BLU and HANA so that they are full columnar databases, but not in-memory databases. In this case BLU would operate by doing constant I/O without constraint and HANA would fail whenever it could not fit a required column in memory. Note that HANA might not fail at all… it would depend on whether there was a large single un-partitioned column that was required.

This scenario is really silly though… HANA is an in-memory database, designed to keep data in-memory from the start… so SAP would not support this imaginary configuration. The fact that you could make BLU work out of memory is not really relevant as nowhere does IBM position, or reference, BLU as a disk-based column store add-on… you would just use DB2.

Now let’s configure a system to IBM’s specification with 400GB of memory. IBM does not really say how much of this memory is available to BLU for data… but for the sake of argument let’s ignore the system requirements and assume that BLU uses one-half, 200GB, as work space to process queries so that 200GB is available to store data in-memory. As you will see it does not really matter in this argument whether I am spot on here or not. So using IBM’s recommendation there is now a 200GB cache that can be used as data is paged in and out. Anyone who has ever used a data warehouse knows that caching does not work well for BI queries as each query touches large enough volumes of the data to flush the cache… so BLU will effectively be performing I/O for most queries and is back to being an out-of-memory columnar database. Note that this flushing issue is why the in-memory capabilities from Oracle and Teradata pin certain tables into memory. In this scenario HANA will operate exactly as BLU does with the constraint that any single column that in a compressed form exceeds 200GB will not be able to be processed.

Finally let’s configure a system with 5TB of memory per SAP’s recommendation for HANA. In this case BLU and HANA both fit all of the data in-memory… with 2.5TB of compressed user data in and 2.5TB of work space… and there is no I/O. This is an in-memory DBMS.

But according to the IBM Power 770 spec (here) there is no way to get 5TB of memory on a single p770 node… so to match HANA and eliminate all I/O they would require two nodes… but BLU cannot be deployed on a cluster… so on they would have to deploy on a single node and perform I/O on 20% of the data. The latency for SSD I/O is 200Kns and for disk it is 10Mns… for DRAM it is 100ns and HANA loads full cache lines so that the average latency is under 20ns… so the penalty paid by BLU is severe and it will never keep up with HANA.

There is more bunk around recommendations for the number of cores but I can make no sense of it at all so I do not know where to begin to debunk it. SAP recommends high-end Intel servers to run HANA. In the scenario above we would recommend multiple servers… soon enough there will be Haswell servers with 6TB of DRAM and this case will run on one node.

I have stated repeatedly that anytime a vendor presents a slide comparing their product to their competitors you should immediately throw them out… it will always be twisted. Don’t trust them. And don’t trust me as I work for SAP. But hopefully you can see some logic in my case. If you need an IMDB then you need memory. If you are short of memory then the IMDB operates like a columnar RDBMS with a memory cache. If you are running a BI query workload then you need to pin data in the cache or the system will thrash. Because of this SAP recommends that you get enough memory to get all of the data in… we recommend that you operate our in-memory database product in-memory…

This really the point of the post. The Five Minute Rule informs us about what data should be in-memory (see here). An in-memory database is designed from the bottom up to manage hot data in-memory. The in-memory add-ons being offered over legacy systems are very capable and should not be ignored… and as the price of memory drops the Five Minute Rule will suggest that data in-memory will account for and ever larger percentage of your EDW. But to offer an in-memory capability and recommend that you should keep the bulk of the data on disk is silly… and to state that your product has a competitive advantage because you do not recommend that all of the data managed by your in-memory feature be kept in-memory is silliness squared.

More on the Oracle 12c In-Memory Option

In-memory DBMS in a jar (Photo credit: Wikipedia)

I recently listened to a session by Juan Loalza of Oracle on the 12c In-memory option. Here are my notes and comments in order…

There is only one copy of the data on disk and that is in row format (“the column format does not exist on disk”). This has huge implications. Many of the implications come up later in the presentation… but consider: on start-up or recovery the data has to be loaded from the row format and converted to a columnar format. This is a very expensive undertaking.

The in-memory columnar representation is a fully redundant copy of the row format OLTP representation. Note that this should not impact performance as the transaction is gated by the time to write to the log and we assume that the columnar tables are managed by MVCC just like the row tables. It would be nice to confirm this.

Data is not as compressed in-memory as in the hybrid-columnar-compression case. This is explained in my discussion of columnar compression here.

Oracle claims that an in-memory columnar implementation (level 3 maturity by my measure see here) is up to 100X faster than the same implementation in a row-based form. Funny, they did not say that the week before OOW? This means, of course, that HANA, xVelocity, BLU, etc. are 100X faster today.

They had a funny slide talking about “reporting” but explained that this is another word for aggregation. Of course in-memory vector aggregation is faster in-memory.

There was a very interesting discussion of Oracle ERP applications. The speaker suggested that there is no reporting schema for these apps and that users therefore place indexes on the OLTP database to provide performance for reporting and analytics. It was suggested that a typical Oracle E-Business table would have 10-20 indexes on it and it was not unusual to see 30-40 indexes. It was even mentioned that the Siebel application main table could require 90 indexes. It was suggested that by removing these indexes you could significantly speed up the OLTP performance, speed up reporting and analytics, cure cancer and end all wars (OK… they did not suggest that you could cure cancer and end war… this post was just getting a little dry).

The In-memory option is clusterable. Further, because a RAC cluster uses shared disk and the im-memory data is not written to disk there is a new shared-nothing implementation included. This is a very nice and significant architectural advance for Oracle. It uses the direct-to-wire infiniband protocol developed for Exadata to exchange data. Remember when Larry dissed Teradata and shared-nothingness… remember when Larry dissed in-memory and HANA… remember when Teradata dissed HANA as nonsense and said SAP was over-reacting to Oracle. Gotta smile :).

Admins need to reserve memory from the SGA for the in-memory option. This is problematic for Exadata as the maximum memory on a RAC node is 256GB… My bad… Exadata X3-8 supports 2TB of memory per node… this is only problematic for Exadata X3-2 and below… – Rob

It is possible to configure a table partition into memory while leaving other partitions in row format on disk.

It is suggested again that analytic indexes may be dropped to speed up OLTP.

The presenter talked about how the architecture, which does not change the row-based tables in any way, allows all of the high-availability options available with Oracle to continue to exist and operate unchanged.

But the beautiful picture starts to dull a little here. On start-up or first access… i.e. during recovery… the in-memory columnar data is unavailable and loaded asynchronously from the row format on disk. Note that it is possible to prioritize the order tables are loaded to get high-priority data in-memory first… a nice feature. During this time… which may be significant… all analytic queries are run against the un-indexed row store. Yikes! This kills OLTP performance and destroys analytic performance. In fact, the presenter suggested that maybe you might keep some indexes for reports on your OLTP tables to mitigate this.

Now the story really starts to unravel. Indexes are required to provide performance during recovery… so if your recovery SLA’s cannot be met without indexes then you are back to square one with indexes that are only used during recovery but must be maintained always, slower OLTP, and the extra requirement for a redundant in-memory columnar data image. I imagine that you could throttle some reports after an outage until the in-memory image is rebuilt… but the seamless operations during recovery using standard Oracle HA products is a bit of a stretch.

Let me raise again the question I asked in my post last week on this subject (here)… how are joins processed between the row format and the columnar format. The presenter says that joins are no problem… but there are really only two ways… maybe three ways to solve for this:

When a row-to-columnar join is identified the optimizer converts the columnar table to a row form and processes the join using the row engine. This would be very very slow as there would be no indexes on the newly converted columnar data to facilitate the join.
When a row-to-columnar join is identified the optimizer pushes down aggregation and projection to the columnar processing engine and converts the columnar result to a row form and processes the join using the row engine. This would be moderately slow as there would be no indexes on the newly converted columnar data to facilitate the join (i.e. the columnar fact would have no indexes to row dimensions or visa versa).
When a row-to-columnar join is identified the optimizer converts the row table to a columnar form and processes the join using the columnar engine. This could be fast.

Numbers two and three are the coolest options… but number three is very unlikely due to the fact that columnar data is sharded in a shared-nothing manner and the row data is not… and number two is unlikely because if it were the case Oracle would surely be talking about it. Number one is easy and the likely state in release 1… but these joins will be very slow.

Finally, the presenter said that this columnar processing would be implemented in Times Ten and then in the Exalytics machine. I do not really get the logic here? If a user can aggregate in their OLTP system in a flash why would they pre-aggregate data and pass it to another data stovepipe? If you had to offload workload from your OLTP system why wouldn’t you deploy a small, standard, Oracle server with the in-memory option and move data there where, as the presenter suggested, you can solve any query fast… not just the pre-aggregated queries served by Exalytics. Frankly, I’ve wondered why SAP has not marketed a small HANA server as an Exalytics replacement for just this reason… more speed… more agility… same cost?

There you have it… my half a cents… some may say cents-less evaluation.

I will end this with a question for my audience (Ofir… I’ll provide a link to your site if you post on this…)… how do we suppose the in-memory option supports bulk data load? This has implications for data warehousing…

Here, of course, is the picture I should have used above… labeled as the in-memory database of your favorite vendor:

Oracle 12c IMDB Announcement at OOW13

I changed the picture to show you the billboard SAP bought on US101N right across from the O HQ…

– Rob

Larry Ellison announced a new in-memory capability for Oracle 12c last night. There is little solid information available but taken at face value the new feature is significant… very cool… and fairly capable.

In short it appears that users have the ability to pin a table into memory in a columnar format. The new feature provides level 3 (see here) columnar capabilities… data is stored compressed and processed using vector and SIMD instruction sets. The pinned data is a redundant copy of the table in-memory… so INSERT/UPDATE/DELETE and data loads queries will use the row store and data is copied and converted to the in-memory columnar format.

As you can imagine there are lots of open questions. Here are some… and I’ll try to sort out answers in the next several weeks:

It seems that data is converted row-to-columnar in real-time using a 2-phased commit. This will significantly slow down OLTP performance. LE suggested that there was a significant speed-up for OLTP based on performance savings from eliminating indexes required for analytics. This is a little disingenuous, methinks… as you will most certainly see a significant degradation when you compare OLTP performance without indexes (or with a couple of OLTP-centric indexes) and with the in-memory columnar feature on to OLTP performance without the redundant copy and format to columnar effort. So be careful. The use case suggested: removing analytic indexes and using the in-memory column store is solid and real… but if you have already optimized for OLTP and removed the analytic indexes you are likely to see performance drop.
It is not clear whether the columnar data is persisted to disk/flash. It seems like maybe it is not. This implies that on start-up or recovery data is read from the row store on-disk tables and logs and converted to columnar. This may significantly impact start-up and recovery for OLTP systems.
It is unclear how columnar tables are joined to row tables. It seems that maybe this is not possible… or maybe there is a dynamic conversion from one form to another? Note that it was mentioned that is possible for columnar data to be joined to columnar data. Solving for heterogeneous joins would require some sophisticated optimization. I suspect that when any row table is mentioned in a query that the row join engine is used. In this case analytic queries may run significantly slower as the analytic indexes will have been removed.
Because of this and of item #2 it is unclear how this feature plays with Exadata. For lots of reasons I suspect that they do not play well and that Exadata cannot use the new feature. For example, there is no mention of new extended memory options for the Exadata appliance… and you can be sure that this feature will require more memory.

There was a new hardware system announced that uses this in-memory capability… If you add all of this up it may be that this is a system designed to run SAP applications. In fact, before the presentation on in-memory there was a long (-winded) presentation of a new Fujitsu system and the SAP SD benchmark was specifically mentioned. This was not likely an accident. So… maybe what we have is a counter to HANA for SAP apps… not a data warehouse at all.

As I said… we’ll see as the technical details emerge. If the architectural constraints 1-4 above hold then this will require some work for Oracle to compete with HANA for SAP apps or for data warehouse workloads…

Oracle promises 100x faster queries with new in-memory option for databases (zdnet.com)
Oracle’s Ellison talks up ‘ungodly speeds’ of in-memory database. SAP: *Cough* Hana *cough* (go.theregister.com)

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Related articles

Share this:

Share this:

Share this:

Related articles

Share this: