Hadoop and Company Financial Performance

I have posted several times about the impact of the Hadoop eco-system on a several companies (here, here, here, for example). The topic cam up in a tweet thread a few weeks back… which prompts this quick note.

Fours years ago the street price for a scalable, parallel, enterprise data warehouse platform was $US25K-$US35K per terabyte. This price point provided vendors like Teradata, Netezza, and Greenplum reasonable, lucrative, margins. Hadoop entered the scene and captured the Big Data space from these vendors by offering 20X slower performance at 1/20th the price: $US1K-$US5K per terabyte. The capture was immediate and real… customers who were selecting these products for specialized, very large, 1PB and up deployments switched to Hadoop as fast as possible.

Now, two trends continue to eat at the market share of parallel database products.

First, relational implementations on HDFS continue to improve in performance and they are now 4X-10X slower than the best parallel databases at 1/10th-1/20th the street price. This puts pressure on prices and on margins for these relational vendors and this pressure is felt.

In order to keep their installed base of customers in the fold these vendors have built ever more sophisticated integration between their relational products and Hadoop. This integration, however, allows customers to significantly reduce expense by moving large parts of their EDW to an Annex (see here)… and this trend has started. We might argue whether an EDW Annex should store the coldest 80% or the coldest 20% of the data in your EDW… but there is little doubt that some older data could satisfy SLAs by delivering 4X-10X slower performance.

In addition, these trends converge. If you can only put 20% of your old, cold data in an Annex that is 10X slower than your EDW platform then you might put 50% of your data into an Annex that is only 4X slower. As the Hadoop relational implementations continue to add columnar, in-memory, and other accelerators… ever more data could move to a Hadoop-based EDW Annex.

I’ll leave it to the gamblers who read this to guess the timing and magnitude of the impact of Hadoop on the relational database markets and on company financial performance. I cannot see how it cannot have an impact.

Well, actually I can see one way out. If the requirement for hot data that requires high performance accelerates faster than the high performance advances of Hadoop then the parallel RDBMS folks will hold their own or advance. Maybe the Internet of Things helps here…. but I doubt it.

Trends in Systems Architecture and the HANA Platform

This post is a follow on to the video on Systems Architecture here (video). I’ll describe how HANA, the Platform not the Database, is cutting fat from the landscape. If you have not seen the video recently it will be worth a quick look as I will only briefly recap it here.

The video, kindly sponsored by Intel, suggests that the distributed architecture we developed to collect lots of very small microprocessors into a single platform in order to solve big enterprise sized problems is evolving as multi-core microprocessors become ever more capable. In my opinion, we are starting to collapse the distributed architecture into something different.

DBFB SAP Distributed Landscape v1
Figure 1. A Distributed SAP Landscape

Figure 1 shows how we might deploy a series of servers to implement an SAP software landscape. Note that if we removed a couple of pieces the picture represents a platform for just about any enterprise solution. In other words, it is not just about SAP ERP solutions. I have left this in so that companies with SAP in place can see where they may be headed.

Figure 2 shows the same landscape deployed in the HANA Platform. It is a very lean deployment with all of the fat of inter-operating systems combination replaced by processes communicating over lightweight thread boundaries. Squeezing the fat out will provide significant performance benefits over the distributed deployment… it will be 2-5 times faster easily… and 2-3 times faster even if you deployed the landscape on a single server in a series of virtual machines as you have to add in the overhead of a VM plus the cost of communication over virtual networks.

Figure 2. The Landscape Collapsed into threads on the HANA Platform

The HANA Platform is very powerful stuff… and if you had the choice between SAP and a competitor, the ability to match the performance on 1/2 the hardware might provide an overwhelming advantage to HANA.Note that in this evaluation I have not included any advantages from the performance kick due to the HANA Database… so the SAP story is better still.

Those of you who heard me describe HANA while I was at SAP know that this is not a new story. It was to be the opening of the shelved book on HANA. So this post satisfies my longstanding promise to get this story documented even though the book never generated much executive enthusiasm.

The trend in systems architecture outlined in the video is being capitalized by SAP as the HANA Platform… and the advantage provided is significant. You should expect SAP customers to rapidly move to HANA to take advantage of this tight, high-performance landscape.

But the software world does not stand still for long… and there are already options emerging that approximate this approach. Stay tuned…

Exit mobile version
%%footer%%