I posted some thoughts about HANA and OLTP here… it is pretty fair and straightforward, I hope… but as I always point out when I mention a work post on this blog: over there I do not promise to be objective… it is my job.
Author: Rob Klopp
IBM BLU and SAP HANA

As I noted here, I think that the IBM BLU Accelerator is a very nice piece of work. Readers of this blog know that in the software business any feature developed by one vendor can be developed in a relatively short period of time by any other… and BLU certainly moves DB2 forward in the in-memory database space led by HANA… it has narrowed the gap. But let’s look at the gap that remains.
First, IBM is touting the fact that BLU requires no proprietary hardware and suggests that HANA does. I do not really understand this positioning? HANA runs on servers from a long list of vendors and each vendor spins the HANA reference architecture a little differently. I suppose that the fact that there is a HANA reference architecture could be considered limiting… and I guess that there is no reference for BLU… maybe it runs anywhere… but let’s think about that.
If you decide to run BLU and put some data in-memory then certainly you need some free memory to store it. Assuming that you are not running on a server with excess memory this means that you need to buy more. If you are running on a blade that only supports 128GB of DRAM or less, then this is problematic. If you upgrade to a 256GB server then you might get a bit of free memory for a little data. If you upgrade to a fat server that supports 512GB of DRAM or more, then you would likely be within the HANA reference architecture set. There is no magic here.
One of the gaps is related: you cannot cluster BLU so the amount of data you can support in-memory is limited to a single node per the paragraphs above. HANA supports shared-nothing clustering and will scale out to support petabytes of data in-memory.
This limit is not so terribly bad if you store some of your data in the conventional DB2 row store… or in a columnar format on-disk. This is why BLU is an accelerator, not a full-fledged in-memory DBMS. But if the limit means that you can get only a small amount of data resident in-memory it may preclude you from putting the sort of medium-to-large fact tables in BLU that would benefit most from the acceleration.
You might consider putting smaller dimension tables in BLU… but when you join to the conventional DB2 row store the column store tables are materialized as rows and the row database engine executes the join. You can store the facts in BLU in columnar format… but they may not reside in-memory if memory is limited… and only those joins that do not touch the row store will use the BLU Level 3 columnar features (see here for a description of the levels of columnar maturity). So many queries will require I/O to fetch data.
When you pull this all together: limited available memory on a single node, with large fact tables paging in and out of disk storage, and joins pushed to the row store, you can imagine the severe constraint for a real-world data warehouse workload. BLU will accelerate some stuff… but the application has to be limited to the DRAM dedicated to BLU.
It is only software… IBM will surely add BLU clustering (see here)… and customers will figure out that they need to buy the same big-memory servers that make up the HANA reference architecture to realize the benefits… For analytics, BLU features will converge over the next 2-3 years to make it ever more competitive with HANA. But in this first BLU release the use of in-memory marketing slogans and of tests that might not reflect a real-world workload are a little misleading.
Right now it seems that HANA might retain two architectural advantages:
- HANA real-time support for OLTP and analytics against a single table instance; and
- the performance of the HANA platform: where more application logic runs next to the DBMS, in the same address space, across a lightweight thread boundary.
It is only software… so even these advantages will not remain… and the changing landscape will provide fodder for bloggers for years to come.
References
- Here is a great series of blogs on BLU that shows how joins with the row store materialize columns as rows…
Thinking about BI: Infographics is the next phase…

I have been thinking about BI… prompted by a friend, Frank Bien, who is the CTO of Looker (you are welcome, Frank, for the plug…) but this post is about a trend in BI that is worth exploring… and only maybe about Looker or any other tool.
BI was originally about reporting… in its very first iterations users coded SQL directly, or used a 4GL scripting language, and the BI tool was there for formatting the output. Then more focus was put on query-building to make it easier for developers to effectively get the required data.
There was lots of talk at this point about knowledge-workers and do-it-yourself BI… but it never really worked out that way. Business users requested canned reports and went to a query guru to request special reports as required.
The following was pretty normal: after finding some interesting fact in a tabular report a business user would pull the data and build a PowerPoint slide to present the results. As interfaces improved you could soon access the data directly and create Excel or PowerPoint charts without copying the data by hand. In other words data visualization was separated from reporting.
The BI vendors caught on to this, recognizing that data presentation is important, and soon all of the BI tools offered some charting options. But the next step was equally interesting. Charting is a bit of an art… so the BI tools programmed in some directives to help you select the chart that fit your data and that would be visually pleasing. So charting as visualization was built in, with some assistance for the simple art of presentation.
From here vendors went in two directions, one after the other.
First, dashboards were developed that were customizable and these applications, either semi-static or dynamic, caught your eye. Red lights and limits could be built into a heads-up display. The art of presentation was pretty crude and loud: bright colors and lots of moving dials and eye-candy won the day. But as far as presentation goes, dashboards are just multiple simple charts arranged on a screen. QlikView led this charge.
Next, a new set of visualization products were rolled out by vendors like Tableau and Pentaho. Users saw that some very powerful pictures could be drawn showing data in a time series… and showing the series changing over time. Since the presentation was more nuanced and more “artistic” the automated assistance required more sophistication and this is where the vendors are now fighting to differentiate themselves.
But an interesting thing was happening outside BI… and this is the point of this note. In the same way that PowerPoint led reporting to charting, a new presentation technique called infographics is emerging. It is the state of the art in data visualization… and PowerPoint… and art… rolled into one. And it is very impactful. I imagine that the next wave of BI tools will have to embrace this more advanced presentation technique.
Here is how I think this plays out.
The advanced data visualization vendors will provide a palette that directly accesses data, and big data, to allow very custom infographics… you can see some of this at Piktochart although it is more about templates than free form development. But since this is art… tools will be developed that help the art-impaired like me to build nice displays and they will do this by analyzing the data and recommending one or more meaningful infographic displays.
So, maybe in the same way that PowerPoint data presentation anticipated charting in BI tools… infographics anticipates the next data presentation facility for BI.
I suppose that this is not really a controversial conclusion… and I imagine that if I took the time I could find several start-ups who are way ahead of me on this… but sometimes it’s more fun to daydream it up on my own… and pretend that I’m out front…
Chuck McDevitt: An Obituary

My friend and a major contributor to the art of database architecture, Chuck McDevitt, died last week. Five years ago Chuck was diagnosed with an advanced cancer and given six months to live. He fought and endured and worked for most of those five years, teaching us all a little about how we might live our lives.
Chuck’s contributions to database architecture are not so well known. He was employee number fourteen at Teradata and developed, with Dan Holle, the Teradata version of SQL. Chuck invented several foundational parts of any parallel database system. He left Teradata and went to Cogit, a start-up that developed a very early parallel data mining tool. From there Chuck went to Ab Initio as a senior architect, and from there he went to Greenplum where he was the Chief Architect.
At Greenplum Chuck was the brains behind the development of their parallel version of Postgres. It is significant to note that Chuck’s architectural insights led to an extensible and powerful implementation that far exceeded the efforts of others trying to accomplish the same result from a Postgres starting point: Aster Data, Netezza, DataAllegro, and the Postgres community.
To convey Chuck’s contribution to Teradata let me tell a story.
After E. F. Codd published his ground-breaking papers on relational theory, two research projects began to develop relational databases. The UC Berkeley project developed a query language called QUEL, which was followed by an IBM Research project and a query language called Sequel. As Teradata entered the scene they developed a third query language, TEQUEL, selecting the best features of each. IBM then delivered DB2 and SQL/DS, and the Oracle Database appeared, all based on Sequel, which was by then called SQL. QUEL was forgotten as SQL dominated the market, leading to it becoming an ANSI standard in 1986.
One Friday in 1985, the Teradata sales management convened a meeting to request that Teradata support SQL in order to effectively compete in the emerging RDBMS market. They were told by Product Management that it would take three years to develop SQL support… that it was out of the question. Further, the Teradata CTO argued that TEQUEL was superior to SQL and that this was an advantage in any case.
Dan Holle heard about the meeting, called Chuck in and, convinced that the Teradata management team was headed in the wrong direction, started to work. What the technical execs must have known, but clearly did not appreciate, was that Dan and Chuck had developed the Teradata database using what was, at the time, a very advanced concept: compiling the TEQUEL query language into an intermediate language and then processing that. Working non-stop through the weekend Chuck developed an SQL parser that would generate the proper intermediate code for consumption by the optimizer and execution engine… and on Monday morning he demonstrated a functioning SQL version of Teradata.
It is fair to say that, but for Chuck McDevitt, Teradata would not exist… and it is likely that Greenplum would not exist.
As I repeatedly suggest in this blog… architecture counts… and you should all know that the database community lost a great architect last week. Chuck will be missed.
DB2 BLU vs. Netezza… and the Winner is…

I wondered here how IBM would position DB2 with BLU versus Netezza. Please have a look before you go on… and let me admit here and now that when I wrote this I chickened out. As I sat down this time I became convinced that I should predict the end of Netezza.
Why?
In the post here Bob Picciano, the general manager of IBM’s Information Management Software Division, made it nearly clear. He said that DB2 BLU is for systems “under 50 terabytes” only because BLU does not cluster. I suspect that if IBM converted all of the Netezza clusters with under 50TB of data to BLU it would knock out 70% or more of the Netezza install base. He states that “most data warehouses are in the under-10-terabyte range”… and so we can assume that Netezza, precluded from anything under 50TB, has a relatively small market left. He suggests that Netezza is for “petabyte-size collections”… but as I suggested here (check out the picture!), Hadoop is going to squeeze the top away from Netezza… while in-memory takes away the bottom… and IBM is very much into Hadoop so the take-away will not require a fight. Finally, we can assume, I think, that the BLU folks are working on a clustered version that will eat more from the bottom of Netezza’s market.
We should pay Netezza some respect as it fades. When they entered the market Teradata was undisputed. Netezza did not knock out the champ but, for the first time, they proved that it was possible to stay in the ring… and this opened the market for Exadata, Greenplum, Vertica and the rest.
Anti-FUD
Here is some anti-FUD from John Appleby titled “The Top 10 SAP HANA Myths Spread by Other Vendors”… well worth the time to read.
Thoughts on Oracle 12c…

Here are some quick thoughts on Oracle 12c…
First, I appreciate the tone of the announcements. They were sober and smart.
I love the pluggable database stuff. It fits into the trends I have discussed here and here. Instead of consolidating virtual machines on multi-core processors and incurring the overhead of virtual operating systems Oracle has consolidated databases into a single address space. Nice.
But let’s be real about the concept. The presentations make it sound like you just unplug from server A and plug into server B… no fuss or muss. But the reality is that the data has to be moved… and that is significant. Further, there are I/O bandwidth considerations. If database X runs adequately on A using 5GB/sec of read bandwidth then there had better be 5GB/sec of free bandwidth on server B. I know that this is obvious… but the presentations made it sound like magic. In addition 12c added heat maps and storage tiering… but when you plug in, the whole profile of what is hot for that server changes. This too is manageable but not magic. Still, I think that this is a significant step in the right direction.
I also like the inclusion of adaptive execution plans. This capability provides the ability to change the plan on-the-fly if the execution engine determines that the number of rows it is seeing from a step differs significantly from the estimate that informed the optimizer. For big queries this can improve query performance significantly… and this is especially the case because prior to 12c Oracle’s statistics collection capability was weak. This too has been improved. Interestingly the two improvements sort of offset. With better statistics it is less likely that the execution plan will have to adapt… but less likely does not mean unlikely. So this feature is a keeper.
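To make the idea concrete, here is a toy Python sketch of how an adaptive plan might switch join methods mid-flight… this is my illustration of the concept only, not Oracle’s implementation, and the names and the switch threshold are made up:

```python
def adaptive_join(outer, inner, key, estimated_outer_rows, switch_ratio=10):
    """Pick the join method from observed cardinality, not the estimate.

    A toy sketch of an adaptive execution plan: if the driving side turns
    out to be far bigger than the optimizer estimated, abandon the
    nested-loop plan and build a hash table on the inner side instead.
    """
    actual = len(outer)  # a real engine observes this mid-execution
    if actual > estimated_outer_rows * switch_ratio:
        # The estimate was badly off: switch to a hash join.
        ht = {}
        for r in inner:
            ht.setdefault(r[key], []).append(r)
        return [(o, m) for o in outer for m in ht.get(o[key], [])]
    # The estimate held: a nested-loop join is cheap for a few driving rows.
    return [(o, r) for o in outer for r in inner if r[key] == o[key]]
```

Both paths return the same result… the adaptive part is only about choosing the cheaper algorithm once real row counts are known.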
I do not see any of the 12c major features significantly changing Oracle’s competitive position in the data warehouse market. If you run a data warehouse flat-out you will not likely plug it elsewhere… the amount of data to move will be daunting. The adaptive execution plan feature will improve performance for a small set of big queries… but not enough to matter in a competitive benchmark. But for Oracle shops adaptive execution is all positive.
HANA Memory Utilization
The current release of HANA requires that all of the data required to satisfy a query be in-memory to run the query. Let’s think about what this means:
HANA compresses tables into bitmap vectors… and then compresses the vectors on write to reduce disk I/O. Disk I/O with HANA? Yup.
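For the curious, here is a toy Python sketch of the dictionary-encoding idea behind these compressed vectors… a simplification of my own, not HANA’s actual format:

```python
def dictionary_encode(column):
    """Encode a column as (sorted dictionary, vector of value IDs).

    A simplified sketch of columnar dictionary compression: each distinct
    value is stored once in the dictionary, and the column itself becomes
    a vector of small integers that compresses well and is cheap to scan.
    """
    dictionary = sorted(set(column))
    ids = {v: i for i, v in enumerate(dictionary)}
    vector = [ids[v] for v in column]
    return dictionary, vector

dictionary, vector = dictionary_encode(["DE", "US", "DE", "FR", "US", "DE"])
# dictionary == ["DE", "FR", "US"]; vector == [0, 2, 0, 1, 2, 0]
```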
Once this formatting is complete all tables and partitions are persisted to disk… and if there are updates to the tables then logs are written to maintain ACIDity and, at some interval, the changed data is persisted asynchronously as blocks to disk. When HANA cold starts no data is in-memory. There are options to pre-load data at start-up… but the default is to load data as it is used.
When the first query begins execution the data required to satisfy the query is moved into memory and decompressed into vectors. Note that the vector format is still highly compressed and the execution engine operates on this compressed vector data. Also, partition elimination occurs during this data move… so only the partitions required are loaded. The remaining data is on disk until required.
Let us imagine that after several queries all of the available memory is consumed… but there is still user data out-of-memory on peripheral storage… and a new query is submitted that requires this data. At this point HANA frees enough memory to satisfy the new query and processes it. Note that, in the usual DW case (write-once/read-many), the data flushed from memory does not need to be written back… the data is already persisted… otherwise HANA will flush any unwritten changed blocks…
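The flush behavior described above can be sketched as a simple least-recently-used cache of column partitions… again a toy illustration of my own, not HANA code: clean partitions are simply dropped because they are already persisted, and only dirty ones are written back before being freed.

```python
from collections import OrderedDict

class ColumnCache:
    """Toy LRU cache of column partitions (a sketch, not HANA's design)."""
    def __init__(self, capacity, disk):
        self.capacity = capacity   # how many partitions fit in "memory"
        self.disk = disk           # partition name -> persisted data
        self.mem = OrderedDict()   # partition name -> (data, dirty?)

    def load(self, name):
        if name in self.mem:
            self.mem.move_to_end(name)          # mark as recently used
        else:
            while len(self.mem) >= self.capacity:
                victim, (data, dirty) = self.mem.popitem(last=False)
                if dirty:
                    self.disk[victim] = data    # write back only if changed
            self.mem[name] = (self.disk[name], False)
        return self.mem[name][0]
```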
If a query is submitted that performs a cartesian product… or that requires all of the data in the warehouse at once… in other words where there is not enough memory to fit all of the vectors in memory even after flushing everything else out… the query fails. It is my understanding that this constraint will be fixed in a next release and data will stream into memory and be processed in-stream instead of in-whole. Note that in other databases a query that consumes all of the available memory may never complete, or will seriously affect all other running queries, or will lock the system… so the HANA approach is not all bad… but as noted there is room for improvement and the constraint is real.
This note should remove several silly arguments leveled by HANA’s competitors:
- HANA, like most in-memory databases, offers full ACID compliance. A system failure does not result in lost data.
- HANA supports more data than will fit in-memory and it pages data in-and-out in a smart fashion based on utilization. It is not constrained to only data that fits in-memory.
- HANA is not useless when it runs out of memory. HANA has a constraint when there is more data than memory… it does not crash the system… but let’s be real… if you page data to disk and run out of disk you are in trouble… and we’ve all seen our DBMSs hit this wall. If you have an in-memory DBMS then you need to have enough memory to support your workload… if you have a DB2 system you had better not run out of temp space or log space on disk… if you have Teradata you had better not run out of spool space.
I apologize… there is no public reference I know of to support the features I described. It is available to HANA customers in the HANA Blue Book. It is my understanding that a public version of the Blue Book is being developed.
Who is How Columnar? Exadata, Teradata, and HANA – Part 3: Post Script

This post will tie up some loose ends around columnar processing and consider some other players I left out. Part 1 and Part 2 of this series may be found here and here, respectively.
Other DBMS
Sybase IQ
How could I forget Sybase IQ? IQ is a mature Level 3 column store with vector optimizations… and several other advanced index optimizations.
See:
- SAP Community Network: SAP Sybase IQ Indexes and Indexing Techniques
- Sybase IQ 15.3 Technical Details & Pricing/Packaging
Infobright
Infobright is a Level 1 column store. That is, they use column orientation to achieve compression but then store multiple columns in a data pack. The cool thing they do is capture detailed statistics for each data pack, which allows them to answer some queries from the stats alone… and to retrieve and decompress only the packs that seem relevant to the query. It is sort of like IBM’s skip processing… only with more intelligent skipping.
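The data-pack idea is simple enough to sketch in a few lines of Python… my own toy version, not Infobright’s implementation: keep min/max stats per pack, answer from the stats where the pack is wholly in or out, and decompress only the "suspect" packs.

```python
def build_packs(values, pack_size=4):
    """Split a column into packs, recording min/max stats for each."""
    packs = []
    for i in range(0, len(values), pack_size):
        chunk = values[i:i + pack_size]
        packs.append({"min": min(chunk), "max": max(chunk), "data": chunk})
    return packs

def count_greater_than(packs, threshold):
    """Count rows > threshold, touching data only in suspect packs."""
    total = 0
    for p in packs:
        if p["min"] > threshold:        # all rows qualify: stats alone answer
            total += len(p["data"])
        elif p["max"] <= threshold:     # no rows qualify: skip the pack
            continue
        else:                           # suspect pack: must scan the data
            total += sum(1 for v in p["data"] if v > threshold)
    return total
```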
See:
- The Database Revolution: Welcome to the Machine: How InfoBright Harnesses Big Data
SQL Server xVelocity
SQL Server provides a column-oriented index over base tables. This is a non-updateable redundant data structure, not a columnar DBMS… but it acts like a Level 2 column store for query processing. That is, only the columns required for a query are projected/read.
See:
- MSBI xVelocity Columnstore Index Intro
- SearchSQL Server xVelocity Columnstore Indexes in SQL Server 2012
Vectorwise
Vectorwise uses PAX and is a Level 1 column store in this regard. It provides a MINMAX index that performs the same function as a zone map or skip processing from IBM. It contains a fundamentally different execution engine from other PAX systems… it is not row/tuple-based… but neither is it a fully columnar engine. Although it sounds odd, I believe that the Vectorwise architecture has the advantages of a Level 1 column store and some of the advantages of a Level 3 column store without the advantages of a Level 2 column store.
OLTP & Data Loading
First, you should be able to see clearly from these posts why column stores are poor at OLTP… the data comes in as a row and has to be decomposed into bit-mapped, compressed columns. This is like incurring the overhead for an index update and compression at the same time. The same problem has to be overcome for data loading.
The paper referenced in Part 1 on C-Store outlined an approach to get around the issue for data loading… and Vertica, the commercial offspring of the C-Store project, uses this approach. But the C-Store tuple-mover architecture still includes too much overhead for OLTP due to constraints tied to I/O… so HANA implemented the approach in-memory, which results in a single table instance that supports both write-optimized (OLTP) and read-optimized (analytic) workloads.
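The delta-store idea can be sketched like this… a toy of my own making, not the actual HANA or C-Store design: cheap appends land in an uncompressed delta, a merge step rebuilds the compressed main store, and reads see both.

```python
class HybridTable:
    """Toy write-optimized delta + read-optimized main store."""
    def __init__(self):
        self.main_dict, self.main_vector = [], []   # dictionary-encoded main
        self.delta = []                             # uncompressed recent writes

    def insert(self, value):
        self.delta.append(value)                    # OLTP-cheap append

    def merge(self):
        """The 'tuple mover': fold the delta into the compressed main."""
        values = [self.main_dict[i] for i in self.main_vector] + self.delta
        self.main_dict = sorted(set(values))
        ids = {v: i for i, v in enumerate(self.main_dict)}
        self.main_vector = [ids[v] for v in values]
        self.delta = []

    def scan(self):
        """Analytic reads see main and delta as one table."""
        return [self.main_dict[i] for i in self.main_vector] + self.delta
```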
JOINs
Level 1 and Level 2 columnar databases join using a row engine… that is they convert columns to rows and then join. This misses some powerful opportunities for optimization from joining native column structures.
Hybrid systems like HANA and DB2 BLU support a mixture of row and column tables and allow joins across the types. Note that early materializing, Level 2 column stores like Teradata and Greenplum also support a mix… but they materialize the columns as rows early. DB2 BLU currently converts column-oriented tables to a row orientation and joins using the row engine. In this case BLU operates like a Level 2 maturity column store. I would expect that this will change over time.
The HANA optimizer has some limited ability to join mixed tables natively. Otherwise the optimizer has the ability to decide whether to convert row-oriented tables to column orientation or to convert column-oriented tables to rows. Note that with support for a single table for both OLTP and analytics HANA users typically deploy only columnar tables and avoid orientation translation during runtime.
Analytics
If you materialize rows late in a Level 3 mature column store then you get significant performance benefits. Using SIMD instructions provides an 8X performance advantage over using conventional ADD instructions. Super-computing vector processing provides at least that much again. Loading compressed columns instead of rows reduces CPU stalls to almost nothing providing another 2X boost… and getting data from the cache instead of accessing DRAM provides a 5X-20X (average of 15X) improvement. These benefits are why BLU and HANA are so compelling when compared to early materializing, L2, implementations like Teradata and Greenplum.
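Late materialization is easy to sketch… here is a toy Python illustration of my own (names and data made up, no vendor’s code): the predicate runs over the compressed region vector alone, and only the qualifying positions of the price column are ever touched… rows are never built.

```python
def late_materialized_sum(price_col, region_vector, region_dict, want):
    """Filter and aggregate column-at-a-time without building rows.

    The predicate is evaluated against the small integer value-IDs of
    the region column (contiguous, cache-friendly), and only qualifying
    positions of the price column are fetched for the aggregate.
    """
    want_id = region_dict.index(want)           # predicate on the dictionary
    positions = [i for i, rid in enumerate(region_vector) if rid == want_id]
    return sum(price_col[i] for i in positions)  # touch only what qualifies
```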
Other Points
- Note that HANA does not gain a huge I/O advantage from columnar compression… this is because HANA only performs large-scale I/O at startup. The gains from compression for an IMDB come through efficient use of memory as compared to a Level 2, early materializing, columnar implementation.
- IBM has criticized the HANA delta store/OLTP+Analytics implementation (can’t find the post for some reason… but I know that I read it)… but I do not see the logic behind the criticism? Row orientation works best for OLTP and one way or the other you have to change the orientation if you want the benefits of columnar…
Who is How Columnar? Exadata, Teradata, and HANA – Part 2: Column Processing
In my last post here I suggested that there were three levels of maturity around column orientation and described the first level, PAX, which provides columnar compression. This apparently is the level Exadata operates at with its Hybrid Columnar Compression.
In this post we will consider the next two levels of maturity: early materialized column processing and late materialized column processing which provide more I/O avoidance and some processing advantages.
In the previous post I suggested a five-column table and depicted each of those columns oriented on disk in separate file structures. This orientation provides the second level of maturity: columnar projection.
Imagine a query that selects only four of the five columns in the table, leaving out the EmpFirst column. In this case the physical structure that stores EmpFirst does not have to be accessed; 20% less data is read, reducing the I/O overhead by the same amount. Somewhere in the process the magic has to be invoked that returns the columns to a row orientation… but just maybe that overhead costs less than the saving from the reduced I/O?
Better still, imagine a fact table with 100 columns and a query that accesses only 10 of the columns. This is a very common use case. The result is a 10X reduction in the amount of data that has to be read, and a comparable reduction in the cost of weaving columns into rows. This is columnar projection, and its impact far outweighs the small advantage offered by PAX (PAX may provide a 10%-50% compression advantage over full columnar tables). This is the advantage that lets most of the columnar databases beat Exadata in a fair fight.
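The arithmetic is easy to sketch… a toy Python illustration of my own, assuming equal-width columns stored as separate files:

```python
def bytes_read(column_sizes, needed):
    """Columnar projection: only files for referenced columns are read."""
    return sum(size for col, size in column_sizes.items() if col in needed)

table = {f"col{i}": 100 for i in range(100)}   # 100 equal-width columns
full = bytes_read(table, set(table))            # a row store reads it all
proj = bytes_read(table, {f"col{i}" for i in range(10)})  # query touches 10
# full // proj == 10: the 10X I/O reduction described above
```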
But Teradata and Greenplum stop here. After data is projected and selected the data is decompressed into rows and processed using their conventional row-based database engines. The gains from more maturity are significant.
The true column stores read compressed columnar data into memory and then operate on the columnar data directly. This provides distinct advantages:
- Since data remains compressed DRAM is used more efficiently
- Aggregations against a single column access data in contiguous memory improving cache utilization
- Since data remains compressed processor caches are used more efficiently
- Since data is stored in bit maps it can be processed as vectors using the super-computing instruction sets available in many CPUs
- Aggregations can be executed using multiplication instead of table scans
- Distinct query optimizations become possible when columnar dictionaries are available
- Column structures behave as built-in indexes, eliminating the need for separate index structures
These advantages can provide 10X-50X performance improvements over the previous level of maturity.
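The "aggregation using multiplication" bullet above deserves a sketch… my own toy illustration, assuming a dictionary-encoded column: count occurrences of each value ID, then multiply distinct values by their counts rather than adding row by row.

```python
def sum_via_dictionary(vector, dictionary):
    """Sum a dictionary-encoded column by multiplication, not row-adds.

    One pass tallies how often each value ID occurs; the sum is then a
    handful of multiplies over the (small) dictionary instead of an
    addition per row.
    """
    counts = [0] * len(dictionary)
    for vid in vector:
        counts[vid] += 1
    return sum(value * count for value, count in zip(dictionary, counts))

# sum_via_dictionary([0, 2, 0, 1], [10, 20, 30]) == 10 + 30 + 10 + 20 == 70
```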
Summary
- Column Compression provides approximately a 4X performance advantage over row compression (10X instead of 2.5X). This is Column Maturity Level 1.
- Columnar Projection includes the advantages of Column Compression and provides a further 5X-10X performance advantage (if your queries touch 1/5-1/10 of the columns). This is Column Maturity Level 2.
- Columnar Processing provides a 10X+ performance improvement over just compression and projection. This is Column Maturity Level 3.
Of course your mileage will vary… If your workload tends to touch more than 80% of the columns in your big fact tables then columnar projection will not be useful… and Exadata may win. If your queries do not do much aggregation then columnar processing will be less useful… and a product at Level 2 may win. And of course, this blog has not addressed the complexities of joins and loading and workload management… so please do not consider this as a blanket promotion for Level 3 column stores… but now that you understand the architecture I hope you will be better able to call BS on the marketing…
Included is a table that outlines the maturity level of several products:
| Product | Columnar Maturity Level | Notes |
| --- | --- | --- |
| Teradata | 2 | Columnar tables, Row Engine |
| Exadata | 1 | PAX only |
| HANA | 3 | Full Columnar Support |
| Greenplum | 2 | Columnar tables, Row Engine |
| DB2 | 3 | BLU Hybrid |
| SQL Server | 2 | I think… researching… |
| Vertica | 3 | Full Columnar Support |
| Paraccel | 3 | Full Columnar Support |
| Netezza | n/a | No Columnar Support |
| Hadapt | 2 | I think… researching… |