Rob Klopp

How DBMS Vendors Admit to an Architectural Limitation: Part 1 – Oracle Exadata

Database vendors don’t usually admit to shortcomings… they protest that they have no shortcomings until the market suggests otherwise… then they make some sort of change that signals an admission. This post will explore three of these admissions: Oracle and the shared-nothing architecture, DB2 on the mainframe and the shared-nothing architecture, and Teradata and in-memory processing.

For years Oracle verbally thrashed Teradata in the market… proclaiming that the shared-nothing architecture was bunk. But in the data warehousing space Teradata acquired a large chunk of the market; and more importantly, they won more business as the size of the data warehouses grew. The reason for this is two-fold: the shared-nothing architecture lets you deliver more I/O bandwidth to the problem… and once you have read the disk it provides scalability to deliver more compute to process complex queries.

Finally Oracle had enough and they delivered Exadata, a storage engine attached to the conventional Oracle RAC that provided shared-nothing I/O bandwidth to the biggest part of the problem… the full file scan of big fact tables. This was an admission that they had been wrong all along.

Exadata was a tack-on… not a fundamental redevelopment of the Oracle database engine. They used the 80/20 rule to quickly get something to market and stem the trickle of Oracle customers who were out of gas on RAC and headed to shared nothing products: Teradata, Netezza, and Greenplum.

This was a very smart move and it worked. Even though the 80/20 approach meant that a significant number of queries were left unaddressed, namely the complex queries that need to process large working sets to execute joins, Exadata solved enough of the problem to keep devout Oracle shops in the church. Only the shops who felt that complex query performance was important enough to warrant the cost of a migration (for an existing DW that had grown up) or the lesser cost of introducing a new technology (for a new DW) would move.

So, while Exadata was a smart move… it is a clear admission that shared-nothing is the right architecture for data warehouses and marts. This admission makes it clear that it is silly to build a warehouse or mart on normal Oracle or on RAC unless you consider your database an inviolable part of a technological creed.

In my opinion selecting a database is an engineering process that does not require orthodoxy… we should be strong enough engineers to pick the better technology and learn it. Being an “Oracle shop” is lazy.

Note that the in-memory technologies provided in Oracle 12c are significant… and for warehouses and marts that will fit on a single node, 12c, as it matures, will be a fine choice for the orthodox Oracle shop and for others. For bigger data applications you will require Exadata and the limitations that come with it.

This provides a nice transition to the Part 2 post on Teradata and in-memory.

Other References

Thoughts on in-memory columnar add-ons (dbms2.com)
Major in memory performance boosts for Oracle 12c (techcentral.ie)
Oracle’s Ellison talks up ‘ungodly speeds’ of in-memory database. SAP: *Cough* Hana *cough* (theregister.com)
Oracle RAC (Wikipedia)
Kevin Closson Interview (Northern California Oracle Users Group Journal)

No Empathy for DevOps

Ugh.

I loved the concept of DevOps and talked it up in the companies I was associated with. Within the database world DevOps had a long history as database products, database ETL facilities, and end-user applications all became more dev-operable. The idea that infrastructure had to become code has been a part of the best DBAs’ mantra for years.

The cool thing is that you could walk into a first-rate shop and tell that DevOps was part of the infrastructure. You could see it work. If there were two database systems running side-by-side you could determine which system had DevOps components built in.

But somehow the DevOps concept has become a process rather than an outcome. DevOps is no longer an infrastructure as code… it has become a development process, a method, that has qualities like “empathy”. It is “a software development method that stresses communication, collaboration (information sharing and web service usage), integration, automation and measurement between software developers and Information Technology (IT) professionals”. It is a “culture”… that requires corporate management buy-in.

Ugh.

IMO DevOps is a set of software features that provide resiliency. These features are coded specifically for applications… or they are applications architected to be restartable within some larger, software-based infrastructure. Software defined machines (virtual machines), software defined networks, software defined storage sub-systems are all examples of infrastructure that could be coded to provide a self-healing business application. It is these features that we see when we recognize DevOps at work.

It may be true that there is a method that best supports the development of these features… but the evolution of the word “DevOps” from a set of engineered features to a method that focusses on people is not a positive development.

I would suggest that every DBA think about how to add DevOps capabilities to the processes that support your business applications. I suppose that these DBAs should also be empathetic and collaborate with the application developers… but empathy and collaboration are not the measure of a system that is built on the principles of DevOps.

References:

Wikipedia: DevOps
O’Reilly Radar: What is DevOps?
O’Reilly Radar: Revisiting What is DevOps…

A New Year Announcement

Let me start the New Year by announcing that I have a new job as the CTO of the US Social Security Administration. I am in an appointed position, part of the President’s administration, but with a technical, not a political or policy charter. It is daunting and exciting… The SSA IT staff are switched on and extremely competent… the idea that government IT staff are zombies is just bunk… and I expect we’ll do some good work together over the next two years.

Part of my confidence comes from the momentum generated by the healthcare.gov program. While the public discussion about the rollout was political… both parties, and especially the President, recognized that the Silicon Valley team led by Mikey Dickerson managed to turn it around very quickly… and that maybe there were more opportunities in other programs to move things along.

So wish me luck.

Rob

Key Values and Key-Value Stores and In-memory Databases

Back to more geeky topics… although my Mom loved the videos…

When very high performance is required to return key performance metrics derived from large volumes of data to a very large number of clients… in other words when volume and velocity are factors and the results are to be delivered to thousands of users (I suppose that I could conjure up a clever V here… but the cleverness would come from the silliness of the semantic stretch… so I’ll leave it to you all to have some pun); the conventional approach has been to pre-compute the results into a set of values that can be fetched by key. In other words, we build a pre-aggregated and pre-joined result table that provides answers to a single query template. Conventionally building this query-specific result table has been the only way to solve these big problems.

Further, we conventionally store these key-value results in a relational DBMS and fetch a row at a time providing pretty darn good performance. Sometimes pretty darn good is not good enough. So there are new options. Key-value data stores may well offer a solution that provides performance and scale at a price well below what has been conventional to date. This is well and good.

But I would like to challenge the conventional thinking a little. The process of joining and aggregating volumes of data into results is a BI process. For the last twenty years BI practitioners have been building pre-aggregated tables and data marts to solve these same problems… maybe not at scale… and this practice has proven to be very expensive, the opposite of agile, and unsustainable. The people costs to develop and support multiple pre-computed replicas are exorbitant. The lack of flexibility that comes from imposing a longish development project over what is essentially an aggregate BI query is constraining our enterprise and our customers.

A better approach is to use the new high performance database products (HANA, BLU, Oracle 12c, or maybe Spark) to aggregate on-demand. Included in this approach is a requirement to use the new high performance database computing platforms available to house the databases.

Consider this… an in-memory DBMS can aggregate 12M rows/sec/core. It can scan 3MB/msec/core. Companies like SGI and HP are ganging processors together so that you can buy a single node that contains 32, 64, or 128 cores… and this number will go up. A 64-core server will aggregate 768M rows/sec and scan 192GB/sec… and you can gang a small number of nodes together and scale out.
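To make the per-node arithmetic easy to re-run for other core counts, here is a minimal sketch. It assumes perfectly linear scaling across cores, which is an idealization, and the function and constant names are mine, not from any product. Note that 3MB/msec works out to 3GB/sec per core.

```python
# Per-core rates quoted in the post (ballpark figures, not benchmarks).
AGG_ROWS_PER_SEC_PER_CORE = 12_000_000  # 12M rows/sec/core
SCAN_MB_PER_MSEC_PER_CORE = 3           # 3 MB/msec/core


def node_throughput(cores: int) -> tuple[int, float]:
    """Return (rows aggregated/sec, GB scanned/sec) for a single node.

    Assumes linear scaling with core count (an idealization) and
    decimal units (1 GB = 1,000 MB).
    """
    rows_per_sec = AGG_ROWS_PER_SEC_PER_CORE * cores
    mb_per_sec_per_core = SCAN_MB_PER_MSEC_PER_CORE * 1000  # msec -> sec
    gb_per_sec = mb_per_sec_per_core / 1000 * cores         # MB -> GB
    return rows_per_sec, gb_per_sec


for cores in (32, 64, 128):
    rows, gb = node_throughput(cores)
    print(f"{cores:>3} cores: {rows / 1e6:,.0f}M rows/sec, {gb:,.0f} GB/sec")
```

Real systems will fall short of linear scaling as memory bandwidth saturates, so treat the output as an upper bound.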

Providing an extensible BI platform for big data is so much easier than building single-query key-value clusters… there is much less risk… and the agility and TCO make it close to a no-brainer. We just have to re-think the approach we’ve used for 20 years and let the new software and hardware do the work.

Some HANA and Intel Videos

Here are two videos of me speaking from the 2013 Intel Developer Forum FYI.

The first has some technical detail:

The second is more of a PR pitch about Intel Hadoop:

I’m working with Intel on a new video with a pretty interesting storyline (at least I hope that you find it interesting?)… so stay tuned.

Rob

Hadoop Squeezes Greenplum

For several years now I have been suggesting that Hadoop will squeeze the big data RDBMSs: Teradata, Exadata, Greenplum, and Netezza… squeezing them first out of the big data end of the market and then impinging on the high-end of the EDW space. Further I have suggested that there may be a significant and immediate TCO reduction from using Hadoop with your EDW RDBMS which squeezes these products’ market faster and further.

Originally I suggested that Greenplum and Netezza would feel the squeeze first since they were embracing Hadoop directly and at the expense of their RDBMS offerings. Greenplum took this further by trying to compete on price… cutting the price of the GPDB and then introducing HAWQ, basically GPDB on HDFS, at a Hadoop DBMS price point. These moves coupled with a neglect of the EDW market where Greenplum made its name apparently has allowed Hadoop to squeeze Greenplum out of the commercial market.

My network has been humming with rumors from reliable sources for 4+ weeks now… and I am now getting confirmation from both inside and outside Pivotal that the Greenplum software will move to open source in short order. The details are being worked out… and while there may still be a change of heart… it seems to be a done deal. The business plan that Greenplum embarked on prior to the EMC acquisition in 2010 has not been a commercial success.

No one is sorrier to see this than me. Greenplum had a real shot at success. It was a very solid piece of work, leading the space with strong architectural extensions like data flow shared nothingness and hybrid row/columnar capabilities, and with extensions into big data applications. The ORCA optimizer had the potential to change the game again.

Greenplum was nearly profitable in 2009 running hard at Teradata and Exadata and Netezza in the EDW space. The EDW market is tough… so we have to be fair and point out that pursuing this market may have led to the same result… but a small-market analytics play was followed by an open-source Hadoop play that could only end in squeezing Greenplum. There was never really a business plan with a win at the end.

Hopefully by open sourcing Greenplum some of the sound software will make it into PostgreSQL… but dishing Greenplum into the open source space with few developers and no community dishes it into the same space where Informix, Red Brick, and others sit. I know that I suggested open sourcing Greenplum over 18 months ago (see the wacky idea here)… but the idea then, as now, amounts to capitulation. I just declared what seemed to me to be inevitable a little sooner than Pivotal.

Teradata has now further embraced Hadoop… and they run the risk of repeating the Greenplum downturn. They have a much stronger market platform to work from… but in the long run this may also be a deadly embrace.

So here is another wacky idea. The only successful business model around open source software to date (which is not to say that there is not some other model to be discovered) generates revenue from support and services and just a little software around the edges. Teradata has a support team and a services business that knows big data and is embedded in the enterprise… Cloudera, Hortonworks, and MapR are not close here. Were Teradata to go after the Hadoop market with their own distribution (not much of a barrier to entry here.. just download the Apache stuff and build a team of committers… they might even be able to pick up the Pivotal team)… they would start from a spot way ahead of the start-ups in several respects… in several hard respects. Further they have Aster IP which could qualify as software around the edges. As a Hadoop player Teradata could more easily manage how Hadoop squeezes their business, mitigate risk, and emerge a big winner in the big data space.

More thinking on Specialized Databases

Recently I posted (here) some thinking that suggested that the cost of replicating data into specialized databases might outweigh the benefits of specialization. This post will present a counter view and try to sort out when a specialized database might make sense.

In the ZDNet post here: “Look at What Google and Amazon are doing with Databases: That’s your future” Toby Wolpe and Neo Technology CEO Emil Eifrem suggest that:

“The era of the one-size-fits-all database is over. It used to be when I grew up as a developer that for the architect in the project, when it came to choosing the bottom layer of the stack — the persistence layer — the choice was Microsoft, or IBM, or Oracle, or Sybase. It was a vendor choice.

They were all the same type of database. But that era has gone forever and it will never come back because data is just so big and so irregularly shaped now that you’re always going to be able to get a hundred times improvement, a thousand times improvement, a million times improvement if you get a data technology that is shaped like the shape of your data.”

I, by contrast, have suggested that a swiss army knife DBMS that solves many problems from a single data source… thereby eliminating the cost and complexity of data replication and data synchronization… might provide a sensible choice for most commercial applications.

Actually I agree with Eifrem and Wolpe in many respects… but there is a difference in our starting assumptions. Let me be clear first about where I strongly agree.

When data volumes grow to web-scale… to Google-scale or Amazon-scale… then the inefficiencies of one-size-fits-all amplify and become intractable… so with a specialized DBMS you might indeed see 100X, 1000X, or more performance advantage and gain a competitive edge from replication and specialization.

But a lot of core data is not Big Data. This is where we do not seem to agree. While our companies all aspire to have a customer database that is in the petabyte range… it is just not usually the case. Likewise we aspire to have a transaction database requiring petabyte scale… but it just is not the case in most businesses even if you keep years and years of history.

Let’s consider graph databases… Maybe customer data should be in a graph database to specialize it for processing relationships. But this is likely to make it sub-optimal for many other processes… in fact it is the thesis of the ZDNet article that it will make it sub-optimal for many other processes… and so replication to more specialized databases is the only alternative.

How might we handle this relationship problem in a generalized DBMS? HANA, for example, can form graphs in-memory from data shaped into columns… unfolding the graphDB blade from the swiss army knife when required but storing the data in a generalized shape otherwise.

It may be true that there could be orders of magnitude advantage for big data shaped into a specialized graphDB form… But if your customer database is in the terabyte range or less, then the advantage may be negligible… or at least the advantage may not justify the cost of replication into two forms.

And think about the implications of specializing big data. Google replicates tens of petabytes of data into multiple shapes to gain competitive advantage… and ten petabytes specialized and replicated ten times is really really big data.

So I agree with parts of the ZDNet post… big data companies are likely to be pushed by the competition to store the data multiple times in specialized replicated big databases… and for this you will look to Google, Amazon, Netflix, and the like for database technology. But most enterprises will be able to store core data in generalized databases… and will extend into big data realms only as machine-to-machine transactions and/or the Internet of Things drive them there… and then they will extend their data architectures rather than replicate again and again.

Some Database Performance Concepts

I’m working on a new idea… it may or may not pan out… but here are some concepts for your consideration… with some thoughts on their performance implications.

First a reminder… a reality check. In my experience if you POC two databases at about the same price point…and one is 30% faster than the other, 1.3X, then 50% of the time the faster DBMS will win the business. If one DBMS is 2X faster… then it will win the business 90% of the time. The 10% where the faster product loses will be because of internal politics or, for an existing application, due to the migration costs. Note that IMO it is silly to do a POC if you know up front that you will not pick the winner.

Now to the concepts… Note that these are ballpark numbers to help you think about trade-offs…

The latency to start fetching data from DRAM is 100 ns… from disk it is 10M ns. If we assume that a smart RDBMS pre-fetches 80% of the data into DRAM then we can assume that an in-memory DBMS has a 200,000X performance advantage over a disk-based system.

The latency to a Flash/SSD device is 100K-200K ns. With the same 80% pre-fetch assumption an in-memory DBMS will be 20,000X faster.

Note that neither of these models include data transfer times which will favor in-memory databases even more.

If we have a hybrid system with both disk and SSD and we assume that 90% of the reads hit the SSD and that both layers in the storage hierarchy achieve 80% pre-fetch then the in-memory system will be 38,000X faster. If fewer than 90% of the reads hit the SSD, then the latency goes up quickly.
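The hybrid figure is just the two earlier multipliers weighted by where the reads land. Here is a minimal sketch of that blend, taking the 20,000X (SSD) and 200,000X (disk) ballpark multipliers above as given inputs rather than re-deriving them. The function name and the list of hit rates are mine, for illustration only.

```python
# Ballpark in-memory advantage multipliers quoted above (inputs, not derived here).
SSD_ADVANTAGE = 20_000    # in-memory vs. Flash/SSD
DISK_ADVANTAGE = 200_000  # in-memory vs. spinning disk


def hybrid_advantage(ssd_hit_pct: int) -> float:
    """Blend the SSD and disk multipliers by the percentage of reads
    that are served from the SSD tier (the rest fall through to disk)."""
    disk_pct = 100 - ssd_hit_pct
    return (ssd_hit_pct * SSD_ADVANTAGE + disk_pct * DISK_ADVANTAGE) / 100


for pct in (95, 90, 80, 50):
    print(f"{pct}% of reads hit SSD: in-memory is {hybrid_advantage(pct):,.0f}X faster")
```

Running it for lower SSD hit rates shows how quickly the disk tier comes to dominate the blended number.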

These numbers form the basis for selecting in-memory caches like Teradata’s Intelligent Memory option as well as in-memory offerings from IBM, Microsoft, Oracle and SAP.

For typical data warehouse workloads column compression will provide around a 2.5X performance boost over row compression. This has two implications: you will get 2.5X better performance using column storage and you will get 2.5X more data into the faster levels of your storage hierarchy… more in SSD and more in-memory.

If we assume that a typical query only touches 10% of the columns in the tables addressed… then column projection provides a 9X performance boost over a row store. Exadata does not support column projection in the storage layer… and other hybrid row-or-column systems provide it only for columnar tables.

If we assume that the average latency from the processor caches is 10ns (.5ns L1, 7ns L2, 15ns L3) and the latency to DRAM is 100ns then an in-memory system which pre-fetches data effectively into the processor caches will be 10X faster than one which goes to DRAM. If we assume that a standard RDBMS which processes uncompressed standard data types (no vector processing) gets a 20% cache hit ratio then the advantage to a cache aware RDBMS which loads full cache lines is around 8X. HANA, BLU, and the Oracle in-memory products are cache aware and get this boost with some caveats.
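The "around 8X" figure falls out of a simple expected-latency calculation. Here is a minimal sketch using the 10ns and 100ns ballpark latencies above. Treating the cache-aware system as hitting cache on every access is my simplifying assumption, and the names are illustrative.

```python
CACHE_NS = 10   # average processor-cache latency assumed in the post
DRAM_NS = 100   # DRAM latency assumed in the post


def avg_latency_ns(cache_hit_pct: int) -> float:
    """Expected latency per access when cache_hit_pct percent of accesses
    are served from the processor caches and the rest go to DRAM."""
    miss_pct = 100 - cache_hit_pct
    return (cache_hit_pct * CACHE_NS + miss_pct * DRAM_NS) / 100


standard = avg_latency_ns(20)      # standard RDBMS: 20% cache hit ratio
cache_aware = avg_latency_ns(100)  # idealized: every access served from cache
speedup = standard / cache_aware

print(f"standard: {standard} ns, cache aware: {cache_aware} ns, "
      f"advantage: {speedup:.1f}X")
```

Plugging in a less-than-perfect hit ratio for the cache-aware system shrinks the advantage, which is one way to read the "with some caveats" above.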

BLU and the Oracle in-memory option are hybrid systems that often convert data to a row form for processing (see here for some data on Oracle). If we assume that they use the full columnar in-memory vector-based structures 50% of the time then these products will see a 4X performance boost. HANA recommends that all data be stored in a columnar form so it would often see the full 8x boost.

These vector-based processes also avoid the cost of decompression… and since they process compressed vector data they can fit more information into each cache line. There is another 20%-200% (1.2X-2X) boost here but I cannot estimate it closer than that.

Finally, the vector based processes use the high performance computing instruction sets (AVX2) offered on modern CPUs… and this provides another 10X+ boost. Again, BLU and Oracle will utilize the vector form less often than HANA so they will see a boost over products like Teradata… but not see as large a boost as HANA.

There are other features at play here… some products, like HANA, shard data in-memory to get all of the cores busy on each query. I have not been able to determine if BLU or Oracle in-memory are there yet. Note that this powerful feature will allow a single query to run as fast as possible… but the benefit is mitigated when there is a workload of multiple concurrent queries (if I have 4 cores and 4 queries running concurrently, one query per core, then the 4 queries will take only a little more time than if I run the 4 queries serially with each query using all 4 cores).

It is important to note that the Oracle In-memory option does not run in the storage component of an Exadata cluster… only on the RAC layer. It is unclear how this works with the in-memory option.

The bottom line is that in-memory systems are not all alike. You can sort of add up the multipliers to get in the ballpark on how much faster Teradata will be with the Intelligent Memory option… how much faster than that a hybrid row and vector-column system like BLU or Oracle In-Memory… and how much faster a pure in-memory system might be. One thing is for sure… these in-memory options will always make the difference in a POC… in every case including them or not will blow away the 2X rule I started with… and in every case the performance benefit will outweigh the extra cost… the price/performance is very likely to be there.

I know that I skipped my usual referencing so that you can see where I pulled these numbers from… but most of this information is buried in posts here and there on my blog… and as I stated up front… these are ballpark numbers. Hopefully you can see the sense behind them… but if you think I’m off please comment back and I’ll try to adjust…

A Modern Data Warehouse Architecture: Part 3 – Build an EDW Annex

In the first two posts of this series (here and here) I first suggested that Hadoop could be effectively used as the platform for staging and then suggested that a modern warehouse would have a federation layer that turned it into a logical data warehouse. Figure 3 depicts this extended architecture.

But if we have both Hadoop and a federation layer implemented… and we recognize the economics associated with moving data to Hadoop… Hadoop provides a 5X-50X price advantage over a commercial very large DBMS product… and we can move data from the expensive environment to the low-cost environment without impacting any applications… then we have the opportunity to move governed EDW data to Hadoop and place it into a Hadoop EDW Annex. Figure 4 shows this.

Now you might suggest that there is an impact… Hadoop will be significantly slower than a commercial EDW platform (for now…). But experienced EDW architects understand that in the classic architecture we had to co-locate data in a single database to join the data. So, we put all of the data, hot and cold data, in our EDW even though the service levels required for queries that touch old cold historical data did not justify the power and price of the EDW infrastructure. We had to but did not need to. We knew, if only implicitly, that most EDW queries touch a small subset of the data. Following the ratio suggested by Teradata (see the reference here) that 90% of the queries touch only 20% of the data we can imagine a system where 80% of the data resides in Hadoop to service 10% of the queries… and only that 10% experiences Hadoop performance.
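To put rough numbers on the trade-off, here is a minimal sketch that combines the 80/20 data split with the 5X-50X Hadoop price advantage mentioned earlier. It assumes cost scales linearly with the volume of data stored on each platform, which is my simplification, and the function name is illustrative.

```python
def relative_platform_cost(pct_in_hadoop: int, hadoop_price_advantage: float) -> float:
    """Cost of the EDW-plus-Annex layout relative to keeping all data in the EDW.

    Assumes cost is proportional to the data volume on each platform and that
    Hadoop is hadoop_price_advantage times cheaper per unit of data.
    """
    pct_in_edw = 100 - pct_in_hadoop
    return (pct_in_edw + pct_in_hadoop / hadoop_price_advantage) / 100


# 80% of the data (serving ~10% of the queries) moves to the Annex.
for advantage in (5, 50):
    cost = relative_platform_cost(80, advantage)
    print(f"Hadoop {advantage}X cheaper: total cost is {cost:.1%} of the all-EDW cost")
```

Under these assumptions the savings are bounded by the 20% of data that stays in the EDW, so past a certain point a bigger Hadoop price advantage matters less than how much data you can move.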

I suggested this approach for Teradata here… but an architecture with an EDW Annex to store cleansed governed historical data works for any expensive RDBMS that can federate with Hadoop: Exadata, Netezza, Teradata, or HANA.

This concludes this series… sort of. I’ll post soon to express more about how this architecture provides long-term strategic value. I think that these three concepts: Hadoop as an EDW staging area, federation and logical data warehousing, and Hadoop as an EDW Annex; provide the foundation for a modern EDW… and I imagine that over the next several years this will become the reference architecture most of us will build to.

A Modern Data Warehouse Architecture: Part 2 – Make it a Logical Data Warehouse

In the first post of this series (here) I suggested that Hadoop could be effectively used as the platform for staging. Figure 2 describes the result. In this post I will extend the architecture by adding a data federation feature and turning the entire picture into a logical data warehouse… and then we’ll consider the consequences.

Figure 3 shows this extension… but let’s quickly review what a federation fabric provides (for more detail on this please see the series that starts here).

First, the fabric allows any of the tables and files in the picture to be registered as “virtual” tables. This includes tables in EDWs, and any marts, in one or more Hadoop systems, the sandbox, and even in the source systems. From here the fabric is viewed by any programs in the BI, Analytics, Apps layer as a single relational database composed of all of the registered tables. This fabric would consume ODBC and JDBC queries at a minimum and provide some level of data translation, function translation, and query translation to allow all of the virtual tables to be queried through the single SQL dialect offered by the fabric. Finally, the fabric would provide some measure of optimization to reduce the overhead of accessing these distributed systems. It is this optimization that is the main topic of the series I suggested above.

One of the important implications of this… one that is often overlooked before implementation… is that the queries emitted from the fabric add workload to the underlying databases. If, in Figure 3, the top EDW is 100% busy servicing un-federated queries then adding a workload that joins that data to the 2nd EDW’s data will overburden the system. This is why it is not usually sound design to integrate source, OLTP systems into a fabric. The OLTP systems are not likely to be optimized in any way for the resulting workload.

But the upside of the fabric is significant. Consider:

If we just said “yes” and loaded ungoverned data into a sandbox the business users can immediately access that data and join it to dimensions and facts deployed elsewhere in the enterprise.
Rogue data marts can now be integrated back into the fold.
Redundant data deployed to allow joins within a single database instance can be eliminated and the joins can be federated. Note that federation is no silver bullet… there may be performance reasons for co-locating data… but you now can consider the trade-offs (I’ll post later on a way to federate and improve performance).
Database products can be retired without affecting the programs that access them. If you have an old data mart built on a product that you would rather not support or license… you can move the data and re-point the virtual tables without impacting the tools and applications at all.
Data can be relocated based on economics and/or for performance reasons… this will be the subject of the next blog in this series… but as a teaser, remember the economics… Hadoop costs $1K/TB (ok… $1K-$4K) hardware included and commercial databases cost much more.
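The re-pointing idea in the list above can be sketched in a few lines. This is a toy illustration only: the FederationCatalog class, the backend names, and the table names are all hypothetical, and a real federation product would also handle dialect translation and optimization. The point is simply that the application's query text never changes when the data moves.

```python
class FederationCatalog:
    """Hypothetical registry mapping virtual table names to physical locations."""

    def __init__(self) -> None:
        self._tables: dict[str, tuple[str, str]] = {}

    def register(self, virtual_name: str, backend: str, physical_name: str) -> None:
        """Register a virtual table, or re-point an existing one."""
        self._tables[virtual_name] = (backend, physical_name)

    def resolve(self, virtual_name: str) -> str:
        """Return the fully qualified physical name for a virtual table."""
        backend, physical_name = self._tables[virtual_name]
        return f"{backend}.{physical_name}"


def route(sql_template: str, catalog: FederationCatalog, virtual_name: str) -> str:
    """Substitute the current physical location into the application's query."""
    return sql_template.format(table=catalog.resolve(virtual_name))


catalog = FederationCatalog()
catalog.register("sales_history", "old_mart", "dw.sales")

# The application only ever knows this query and the virtual table name.
query = "SELECT region, SUM(amount) FROM {table} GROUP BY region"
before = route(query, catalog, "sales_history")

# Retire the old mart: move the data, then re-point the virtual table.
catalog.register("sales_history", "hadoop_annex", "edw.sales_history")
after = route(query, catalog, "sales_history")

print(before)
print(after)
```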

There is one final advantage to this that is strategic and important enough to deserve to not be buried in the previous bullet list… All of us have seen the database product market move in the last 6+ years from a place where the choices were DB2, Oracle, SQL Server, or Teradata… to a market with those products plus Exadata, plus two flavors of SQL Server, plus Netezza, plus Greenplum, plus Vertica… and then plus Impala, plus Hive, plus Tez, plus Spark, plus MongoDB, plus Cassandra, plus plus plus. I think that it is impossible to place database bets today with the confidence that these bets will pay off five years from now… and certainly there are no bets good for ten years. If you are betting on infrastructure to support the Internet of Things the bets are more risky still. In my opinion a federation layer provides critical insulation from this chaos. With federation you can build applications knowing that you can acquire and retire database products and not affect the queries. IMO this insulation is a strategic imperative…

Sorry… I got a little excited… I don’t usually spew foam. Let me try again. IMO you should seriously consider the benefits of federation in both your EDW architecture and in your enterprise data architecture.

Part 3 of this series considers a new extension to the architecture by adding an EDW Annex… you can see it here.

References

Gartner Blog: Mark Beyer, Father of the Logical Data Warehouse, Guest Post
Cisco: Data Virtualization Platform
HANA Blog: Smart Data Access
