Rob Klopp

NoCOUG Referral

I would like to point you to two articles in the latest Northern California Oracle Users Group (NoCOUG) Journal here.

The first is an interview of Kevin Closson here. The interview is long and will take some time to get through… so set aside 30 minutes… it will be worth it as Kevin discusses Exadata, shared-nothingness, and other topics related to database hardware architecture.

The second article I would like to suggest (by the way there are several other excellent articles) is by Dr. Bert Scalzo. He reminds us that our job as engineers is to build the most cost-effective solution… not to build the perfect solution. He suggests that hardware should be treated as a dynamic resource that can be provisioned easily to solve performance problems.

I have argued that in a shared-nothing, scalable, architecture it is often cheaper to add another $20,000 fat server than to spend $100,000 of staff time to tune around a performance problem. This is especially true when the tuning involves building indexes and materialized views or pre-aggregated tables that make your warehouse fragile and more difficult to tune the next time. See here…

Back to Kevin’s interview and to tie the two articles together… Kevin suggests that as long as data flows into the CPUs fast enough then there is no reason to pick a shared-nothing architecture over a shared-everything architecture. He insists on symmetry and rightfully points out that a shared-everything system can be symmetrical. But it is more difficult to maintain symmetry as you scale up a shared-everything system… and easy scale is what is required to treat hardware as a dynamic resource. So… I remain convinced that shared-nothing is the way to go…

Numbers Everyone Should Know

Some of you have seen me build simple models to do a reality-check on architecture (see here, for example). Here are some metrics from a great presentation by Jeff Dean, a Google fellow.

Numbers Everyone Should Know

L1 cache reference	0.5 ns
Branch mispredict	5 ns
L2 cache reference	7 ns
Mutex lock/unlock	25 ns
Main memory reference	100 ns
Compress 1K bytes with Zippy	3,000 ns
Send 2K bytes over 1 Gbps network	20,000 ns
Read 1 MB sequentially from memory	250,000 ns
Round trip within same datacenter	500,000 ns
Disk seek	10,000,000 ns
Read 1 MB sequentially from disk	20,000,000 ns
Send packet CA->Netherlands->CA	150,000,000 ns

Of note is the 120X difference between the cost of reading 1MB from memory and the cost of reading 1MB from disk…

The entire presentation may be found here: http://www.odbms.org/download/dean-keynote-ladis2009.pdf

Happy Modeling…

Who is Massively Parallel? HANA vs. Teradata and (maybe) Oracle

I have promised not to promote HANA heavily on this site… and I will keep that promise. But I want to share something with you about the HANA architecture that is not part of the normal marketing in-memory database (IMDB) message: HANA is parallel from its foundation.

What I mean by that is that when a query is executed in-memory HANA dynamically shards the data in-memory and lets each core start a thread to work on its shard.

Other shared-nothing implementations like Teradata and Greenplum, which are not built on a native parallel architecture, start multiple instances of the database to take advantage of multiple cores. If they can start an instance-per-core then they approximate the parallelism of a native implementation… at the cost of inter-instance communication. Oracle, to my knowledge, does not parallelize steps within a single instance… I could be wrong there so I’ll ask my readers to help?

As you would expect, for analytics and complex queries this architecture provides a distinct advantage. HANA customers are optimizing price models sub-second in-real-time with each quote instead of executing a once-a-week 12-hour modeling job.

June 11, 2013: You can find a more complete and up-to-date discussion of this topic here… – Rob

As you would expect HANA cannot yet stretch into the petabyte range. The current HANA sweet spot is for warehouses or marts is in the sub-TB to 20TB range.

Real-time Analytics and BI: Part 1 – Singing for my Dinner

Several months ago I was invited to a dinner attached to a data science summit… with the price being that I had to deliver a 5 minute talk… I had to sing for my dinner. The result was this thinking on real-time analytics and the Toyota Prius.

Real-time analytics implies two things:

Changes in the data are evaluated continuously; and
The results of the analysis are used or displayed continuously.

In a Toyota Prius we can see two examples of real-time analytics.

The first is in the anti-lock braking system. There data reflecting the pressure on the brake pedal and on rotation of each wheel is sent to a computer that analyzes the results and adjusts the brake pressure on each wheel so that all four wheels turn at the same rate and the car stops in a straight line.

Note that the analytics are real-time and the results are used immediately without human intervention. This is important. It makes little sense to spend the money to capture and analyze data in real-time if the results are not actionable in near-real-time.

Think for a moment about the BI systems built over the last 20 years. First we captured and analyzed monthly data… and acted on that data within a 30-day window. Then we increased the granularity of the data to weekly and slightly adjusted the reports to reflect the finer granularity… and acted on the data within 7 days. Then we adjusted the data to daily and acted on the results each day. Then we adjusted the data to hourly and reacted even more quickly. These changes often did not fundamentally change the business processes driven by the data… they just made the processes more sensitive to the fine-grained information.

But if the data-driven business process takes ten minutes to complete… for example it takes ten minutes for staff to pick inventory, package the results, and load a delivery truck; could there be a return on the investment expense of developing a continuous, real-time analytic? I think not. There may, however, be ROI associated with a new robotic pick, package, and load process…

There is another possibility… If sometimes the pick, package, and load takes ten minutes and sometimes it takes fifteen minutes then the best solution is to perform the analytics on the current state on-demand… when there are resources to support the process. This maximizes the use of the resources without changing the business process.

The point here is that real-time requires a re-think… or at least a deep-think. The business process may have to change significantly to support real-time analytics.

The second real-time system in the Prius illustrates the problem. On the dashboard the Prius displays, in real-time, the state of the hybrid gas-electric system. It shows whether the battery is charging or discharging… it shows whether the car is being driven using the electric or the internal-combustion engine. It is one of the most beautiful dashboard displays you have ever seen… and executives everywhere must look at it and wonder why they cannot get such a beautiful display of the state of their business… after-all… BI dashboards are “the thing”.

But the Prius display is useless. There is no action you would take while driving based on this real-time display.From a decision-making view it represents useless and expensive flash (that helps to sell the Prius…).

So… approach real-time analytics with a deep-think. Look for opportunities like the anti-lock braking system where real-time analytics can be embedded into automatic business processes. Avoid flashy dashboards that do not present actionable data.

In-memory databases (IMDB) such as SAP HANA, Oracle TimesTen, and VMWare SQLFire promise to enable real-time analytics… and this promise is real… the opportunities can and will revolutionize the enterprise over time… but a revolution is not the same old BI at a finer granularity… it is much more significant than that. Heads will roll.

New Gig…

I am back from a short vacation no longer with EMC/Greenplum. I am now on the SAP HANA team.

This move will not change this blog much… I will keep posts here unbiased.

I have a couple queued up… look for them in the next few weeks…

And thanks for following this site…

– Rob

Cloud Computing and Data Warehousing: Part 4 – IMDB Data Warehouse in a Cloud

In the previous blogs on this topic (Part 1, Part 2, Part 3) I suggested that:

Shared-nothing is required for an EDW,
An EDW is not usually under-utilized,
There are difficulties in re-distributing sharded, shared-nothing data to provide elasticity, and
A SAN cannot provide the same IO bandwidth per server as JBOD… nor hit the same price/performance targets.

Note that these issues are tied together. We might be able to spread the EDW workload over so many shards and so many SANs that the amount of I/O bandwidth per GB of EDW data is equal to or greater than that provided on a DW Appliance. This introduces other problems as there are typically overhead issues with a great many nodes. But it could work.

But what if we changed the architecture so that I/O was not the bottleneck? What if we built a cloud-based shared-nothing in-memory database (IMDB)? Now the data could live on SAN as it would only be read at start-up and written at shut-down… so the issues with the disk subsystem disappear… and issues around sharing the SAN disappear. Further, elasticity becomes feasible. With an IMDB we can add and delete nodes and re-distribute data without disk I/O… in fact it is likely that a column store IMDB could move column-compressed data without re-building rows. IMDB changes the game by removing the expense associated with disk I/O.

There is evidence emerging that IMDB technology is going to change the playing field (see here).

Right now there are only a few IMDB products ready in the market:

TimeTen: which is not shared-nothing scalable, nor columnar, but could be the platform for a very small, 400GB or less (see here), cloud-based EDW;
SQLFire: which is semi-shared-nothing scalable (no joins across shards), not columnar, but could be the platform for a larger, maybe 5TB, specialized EDW;
ParAccel: which is shared-nothing scalable, columnar, but not fully an IMDB… but could be (see C. Monash here); or
SAP HANA: which is shared-nothing, IMDB, columnar and scalable to 100TB (see here).

So it is early… but soon enough we should see real EDWs in the cloud and likely on Amazon EC2, based on in-memory database technologies.

Cloud Computing and Data Warehousing: Part 3 – ParAccel on EC2

In the previous post here I suggested that a SAN-based, cloudy, EDW is about 4X the cost for the same performance over a data warehouse appliance.. and I described why. I have actually seen this comparison.

It is difficult to compare Amazon EC2 hardware to the hardware typically assembled in a shared-nothing EDW cluster whether the hardware is from HP, Dell, Sun, IBM, or Teradata. So let’s assume that Amazon gets a 20% edge due to huge volume purchases over your firm. Note that this is a significant edge since the hardware is a commodity. Further, lets assume that Amazon gets another 30% edge in TCO on system administration costs. This is the cost of staff to manage the Linux OS and the hardware components. This may also be generous to the Amazon side of the equation. The numbers are not important… you can put in whatever seems to model your situation best… if you work for a large efficient company the numbers may go down for EC2.

Lets also assume that you reserve and receive dedicated hardware on EC2. This will not be the case but lets continue to build a best-case scenario for EC2.

From these numbers we can assume that the EC2 configuration will be 3X the cost for the same performance as a dedicated purpose-built database cluster. Again this assumes that the EC2 hardware is dedicated so this number is optimistic.

So why would anyone do this? Because EC2 has no up-front capital expense associated… it is an operating expense. This is significant.

So what is the advantage of buying ParAccel on EC2? I’m unsure. ParAccel has not done particularly well in the marketplace… but it is not clear that this is a technology issue. The answer could lie in the fact that companies deploy ParAccel on EC2 for data mart or application-specific workloads that may not use 100% of the hardware resources provided?

I think that if you work through these three blogs you can get an idea of how to model the opportunity for yourself. If the ability to spend OPEX dollars with Amazon is important… even if you need 3X the hardware… then this is a very interesting way to go.

But do not imagine that you are getting the same performance with ParAccel on EC2 that you wold get with ParAccel on HP or Dell… for a fraction of the price. There is no architectural advantage in ParAccel on EC2 over Vertica or Greenplum or any other DBMS that can run on EC2… ParAccel is, however, trying something new and interesting… if you understand the trade-offs.

In the last blog of this series (here) I’ll discuss some new approaches that may change the game… including another interesting possibility for ParAccel going forward.

Cloud Computing and Data Warehousing: Part 2 – An Elastic Data Warehouse

In Part 1 of this topic (here) I suggested that cloud computing has the ability to be elastic… to expand and maybe contract the infrastructure as CPU, memory, or storage requirements change. I also suggested that the workload on an EDW is intense and static to point out that there was no significant advantage to consolidating non-database workloads onto an over utilized EDW platform.

But EDW workload does flex some with the business cycle… quarter end reporting is additive to the regular daily workload. So maybe an elastic stretch to add resources and then a contraction has value? It most probably does add value.

The reason shared-nothing works is because it builds on a sharded model that splits the data across nodes and lets the CPU and I/O bandwidth scale together. This is very important… the limiting factor in these days of multi-core CPUs is I/O bandwidth and many nodes plus shards provides the aggregate I/O bandwidth of all disk controllers in the cluster.

What does that mean with regards to building an elastic data warehouse? It means that with each elastic stretch the data has to be re-deployed across the new number of shards. And because the data to be moved is embedded in blocks it means that the entire warehouse, every block, has to be scanned and re-written. This is an expensive undertaking on disk… one that bottlenecks at the disk controller and one that bottlenecks worse if there are fewer controllers (for example in in a SAN environment). Then, when the configuration is to shrink it process is repeated. In reality the cost of th I/Oe resources to expand and contract does not justify the benefit.

So… we conclude that while it is technically possible to build an elastic EDW it is not really optimal. In every case it is feasible to build a cloud-based EDW… it is possible to deploy a shared-nothing architecture, possible to consolidate workloads, and possible to expand and contract… but it is sub-optimal.

The real measure of this is that in no case would a cloud-based EDW proof-of-concept win business over a stand-alone cluster. The price of the cloudy EDW would be 2X for 1/2 the performance… and it is unlikely that the savings associated with cloud computing could make up this difference (the price of SAN is 2X that of JBOD and the aggregate I/O bandwidth is 1/2… for the same number of servers… hence the rough estimates). This is why EMC offers a Data Computing Appliance without a SAN. Further, this 4X advantage assumes that 100% of the SAN-based cluster is dedicated to the EDW. If 50% of the cluster is shared with some other workload then the performance drops by that 50%.

In the next post (here) I’ll consider Paraccel on the Amazon Cloud…

Cloud Computing and Data Warehousing: Part 1 – The Architectural Issues

My apologies… I was playing with the iPad version of WordPress and accidentally published a very rough outline/first draft of this post. I immediately un-published it… but not before subscribers were notified that there was a new post.

I wonder about the idea that data warehousing is suited to operate in the cloud? This was prompted by Paraccel‘s venture to deploy on the Amazon EC2 cloud infrastructure. Lets work through the architectural implications…

Here are the assumptions I’ll take into this exploration:

A shared-nothing architecture is required to scale.
Cloud infrastructure is cost-effective when the infrastructure is under-utilized and workloads can be consolidated to achieve full utilization… and not so cost-effective when the infrastructure is highly utilized. This is because applications can easily share underutilized resources in the Cloud.
Cloud infrastructure is justified when the workload is inconsistent and either CPU or storage requirements fluctuate widely over the business cycle. This is because a Cloud is elastic and can easily flex as the requirements fluctuate. Cloud computing may not be well suited to static workload requirements.

You can probably see where I’m going with this from the assumptions.

In the end I’ll suggest that there is a database architecture that is suited to warehousing and cloud computing… but let me build to that.

Before I start let me also be clear that I am talking about the database infrastructure… not the application/BI infrastructure required for data warehousing. The BI and ETL components are perfectly suited to cloud computing… they reflect a workload that, in general, runs on under-utilized hardware with BI running during the day and ETL running at night. I have suggested this to my current employer… but alas, I am neither King nor a member of Court.

So in Part 1 let me discuss my first two assumptions and the implications… In Part 2 I’ll discuss data warehousing and elasticity… In Part 3 I’ll consider the Paraccel/Amazon collaboration and in Part 4 I’ll wrap up and consider several new things coming that may change the equations.
—————-
I’ll not work too hard to justify my first assumption… I think that it is well-understood that a shared-nothing architecture provides the best possible approach to scale out. Google and others use this approach to scale to hundreds of petabytes of data and Teradata, Greenplum, Netezza, Paraccel, SAP HANA, and others use it in the data warehouse space. Exadata uses a hybrid approach that scales I/O in a shared-nothing-like storage subsystem… but fails to scale as it passes data to the RAC layer (see Kevin Closson here on the subject).

But the implications are significant for our cloud discussion. First, cloud infrastructure is designed to support general client-server or web-server based commercial computing requirements. A shared-nothing database cluster is a specialized infrastructure optimized for database processing. Implementing the specialized problem on the generalized infrastructure is possible, but sub-optimal. Next, cloud computing requires, more or less, a shared storage subsystem. A shared-nothing architecture shares nothing. Implementing a shared-nothing database on a shared storage subsystem is possible, but sub-optimal.

I believe that the second assumption is also pretty straightforward. The primary rationale for cloud computing comes from the recognition that many data centers deployed applications on servers that were not fully utilized. By virtualizing the hardware on a cloud platform the data center could better service the applications with fewer hardware resources and therefore less cost.

So… in order for cloud computing to be a perfect fit we need to observe a data warehouse database workload with underutilized hardware infrastructure… You might ask yourself… are there underutilized hardware resources upon which my EDW is built? In most cases I believe that the answer to this question will be “no”. Almost every EDW I’ve seen is over-burdened… stretched… with users demanding more and more resource… more data, more users, more queries, deeper queries drive the resource requirements up exponentially. The database is swamped all day with queries and swamped all night by ETL and reporting tasks.

So let’s end this blog concluding that there is a problematic architectural mismatch between a shared cloud and a shared-nothing implementation… and that if your warehouse database platform is highly utilized then there may be little benefit from implementing a warehouse in the cloud.

See Part 2 here…

More on Exalytics Capacity…

I found myself wondering where did the rule-of-thumb for Exalytics that suggests that TimesTen can use 800GB of a 1TB memory space… and requires 400GB of that space for work tables leaving room for 400GB of user data… come from (it is quoted everywhere… here is an example… see question #13).

Sure enough, this rule has been around for a while in the TimesTen literature… in fact it predates Exalytics (see here).

Why is this important? The workspace per query for a TPC-A transaction is very small and the amount of time the memory is held by a TPC-A transaction is very short. But the workspace required by a TPC-H query is at least 10X the space required by a TPC-A query and the duration of a TPC-H query is at least 10X the duration of a TPC-A query. The result is at least 100X more pressure on memory utilization.

So… I suspect that the 600GB of user data I calculated here may be off by more than a little. Maybe Exalytics can support 300GB of user data or 100GB of user data or maybe 60GB?

Note that this is not bad… all of this pressure on memory is still moved to Exalytics from the Exadata RAC subsystem… where memory is dear.

As a side note… it is always important to remember that the pressure on memory is the amount of memory utilized times the duration of the utilization. This is why the data flow architecture used in modern databases like Greenplum are effective. Greenplum uses more memory per transaction but it holds the memory for less time by never (almost) writing it to disk. This is different from older database architectures like Teradata and Oracle which use disk to store intermediate results… lowering the overall amount of memory required but increasing the duration of the query. More on this here…

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: