Database Hardware – Page 3 – Database Fog Blog

More on Exalytics Capacity…

I found myself wondering where did the rule-of-thumb for Exalytics that suggests that TimesTen can use 800GB of a 1TB memory space… and requires 400GB of that space for work tables leaving room for 400GB of user data… come from (it is quoted everywhere… here is an example… see question #13).

Sure enough, this rule has been around for a while in the TimesTen literature… in fact it predates Exalytics (see here).

Why is this important? The workspace per query for a TPC-A transaction is very small and the amount of time the memory is held by a TPC-A transaction is very short. But the workspace required by a TPC-H query is at least 10X the space required by a TPC-A query and the duration of a TPC-H query is at least 10X the duration of a TPC-A query. The result is at least 100X more pressure on memory utilization.

So… I suspect that the 600GB of user data I calculated here may be off by more than a little. Maybe Exalytics can support 300GB of user data or 100GB of user data or maybe 60GB?

Note that this is not bad… all of this pressure on memory is still moved to Exalytics from the Exadata RAC subsystem… where memory is dear.

As a side note… it is always important to remember that the pressure on memory is the amount of memory utilized times the duration of the utilization. This is why the data flow architecture used in modern databases like Greenplum are effective. Greenplum uses more memory per transaction but it holds the memory for less time by never (almost) writing it to disk. This is different from older database architectures like Teradata and Oracle which use disk to store intermediate results… lowering the overall amount of memory required but increasing the duration of the query. More on this here…

Co-processing and Exadata

In my first blog (here) I discussed the implications of using co-processors to offload CPU. The point was that with multi-core processors it made more sense to add generalized processing hardware that could be applied to all parts of the query process than to add specialized processors that dealt with only part of the problem.

Kevin Closson has produced two videos that critically evaluate the architecture of Exadata and I strongly suggest that you view them here before you go on with this post… They are enlightening, irreverent, and make the long post I’ve been drafting on Exadata lightweight and unnecessary.

If you have seen Kevin’s post you understand that Exadata is asymmetric and unbalanced. But his post extends and generalizes my discussion of co-processing in a nice way. Co-processing is asymmetric by definition. The co-processor is not busy after it has executed on its part of the problem.

In fact, Oracle has approximately mirrored the Netezza architecture with Exadata but used commercial processors instead of FPGAs to offload I/O and predicate processing. The result is the same in both cases… underutilized processing capability. The difference is that Netezza wastes some power on relatively inexpensive FPGA processors while Exadata wastes general and expensive CPU resources that might actually be applied usefully elsewhere. And Netezza splits the processing within a shared-nothing architecture while Exadata mixes architectures adding to the inefficiency.

More on Exalytics: How much user data fits?

Sorry… this is a little geeky…

The news and blogs on Exalytics tend to say that Exalytics is an in-memory implementation with 1TB of memory. They then mention, often in the same breath, that the TimesTen product which is the foundation for Exalytics now supports Hybrid Columnar Compression which might compress your data 5X or more. This leaves the reader to conclude that an Exalytics Server can support 5TB of user data. This is not the case.

If you read the documentation (here is a summary…) a 1TB Exalytics server can allocate 800GB to TimesTen of which half may be allocated to store user data. The remainder is work space… so 400GB uncompressed is available for user data. You might now conclude that with 5X compression there is 2TB of compressed user data supported. But I am not so sure…

In Exadata Hybrid Columnar Compression is a feature of the Storage Servers. It is unknown to the RAC layer. The compression allows the Storage Servers to retrieve 5X the data with each read significantly improving the I/O performance of the subsystem. But the data has to be decompressed when it is shipped to the RAC layer.

I expect that the same architecture is implemented in TimesTen… The data is stored in-memory compressed… but decompressed when it moves to the work storage. What does this mean?

If, in a TimesTen implementation without Hybrid Columnar Compression, 400GB of work space in memory is required to support a “normal” query workload against 400GB of user data then we can extrapolate the benefits of 5X compressed data as follows:

x of user data compressed uses x/5 of memory plus x of work space in memory… all of which must fit into 800GB

This resolves to x = 667GB… a nice boost for sure… with some CPU penalty for decompressing.

So do not jump to the conclusion that Hybrid Columnar Compression in TimesTen of 5X allows you to put 5TB of user data on a 1TB Exalytics box… or even that it allows you to load 2TB into the 400GB user memory… the real number may be under 1TB.

Exalytics vs. HANA: What are they thinking?

I’ve been trying to sort through the noise around Exalytics and see if there are any conclusions to be drawn from the architecture. But this post is more about the noise. The vast majority of the articles I’ve read posted by industry analysts suggest that Exalytics is Oracle‘s answer to SAP‘s HANA. See:

But I do not see it?

Exalytics is a smart cache that holds a redundant copy of aggregated data in memory to offload aggregate queries from your data warehouse or mart. The system is a shared-memory implementation that does not scale out as the size of the aggregates increase. It does scale up by daisy-chaining Exalytics boxes to store more aggregates. It is a read-only system that requires another DBMS as the source of the aggregated data. Exalytics provides a performance boost for Oracle including for Exadata (remember, Exadata performs aggregation in the RAC layer… when RAC is swamped Exalytics can offload some processing).

HANA is a fully functional in-memory shared-nothing columnar DBMS. It does not store a copy of the data.. it stores the data. It can be updated. HANA replaces Oracle… it does not speed it up.

I’ll post more on Exalytics… and on HANA… but there is no Exalytics vs. HANA competition ahead. There will be no Exalytics vs. HANA POCs. They are completely different technologies solving different problems with the only similarity being that they both leverage the decreasing costs of RAM to eliminate the expense of I/O to disk or SSD devices. Don’t let the common phrase “in-memory” confuse you.

The Best Data Warehouse Spin of 2011

At this time of the year bloggers everywhere look back and reflect. Some use the timing to highlight significant achievements… and it is in the spirit that I would like to announce my choice for the best marketing in the data warehouse vendor space for 2011.

Marketing is a difficult task. Marketeers need to walk a line between reality and bull-pucky. They need to appeal to real and apparent needs yet differentiate. Often they need to generate spin to fuzz a good story by a competitors marketing or to de-emphasize some short-coming in their own product line.

Below is a picture taken on the floor of a prospect where we engaged in a competitive proof-of-concept. The customer requested that vendors ship a single rack configuration… and so we did.

But the marketing coup is that the vendor on the right, Teradata, told the customer that this is a single rack configuration and that they are in compliance. The customer has asked us if this is reasonable?

This creative marketing spin wins the 2011 award going away… against very tough competition.

I expect this marketing approach to start a trend in the space. Soon we will see warehouse appliance vendors claiming that 1TB = 50TB due to compression… or was that already done this year?

Sorry to be cynical… but I hope that the picture and story provide you with a giggle… and that the giggle helps you to start a happy holiday season.

– Rob Klopp

Greenplum and Teradata: Simliar Architecture, Different Strategies

Hardware systems, servers and network fabric, provide the foundation upon which all shared-nothing database management systems rest. Hardware systems are a major contributor to the overall price/performance and total cost of ownership for a data warehouse platform. This blog considers the hardware strategies of EMC/Greenplum: applying the idea of using common, off-the-shelf (COTS) components to build a competitive foundation; and Teradata: developing a proprietary hardware system by tightly integrating components.

Strategies

The Teradata hardware strategy is simple to describe. They expend R&D dollars to couple low-level technologies into a tightly integrated system. Their servers are custom-designed within a set of guidelines that allows both the LINUX and Microsoft Windows operating systems to execute there. Their network fabric is highly proprietary, using cycles within the fabric to offload sort/merge data processing from the server CPU.

In other words, Teradata believes that the time and effort required to engineer an integrated proprietary offering will improve the performance of their offering enough to offset the cost.

EMC and Greenplum have taken a different approach. They have elected a strategy that leverages off-the-shelf servers offered by hardware vendors like Dell or HP, and network switches from vendors like Brocade, Arista, and Cisco. They have elected to expend few dollars on hardware design and development and to leverage the R&D investments made by these other vendors. In other words, Greenplum believes that the advantages in price and performance provided by using off-the-shelf hardware provides a sustainable advantage.

Price

The lower costs associated with Greenplum’s strategy clearly provide an advantage. Greenplum does not have to expend to design and manufacture custom hardware. The manufacturing costs may not be significant, but the staff costs required by the Teradata strategy must affect the price. Clearly the Greenplum strategy provides an advantage on the price side.

Performance

The Teradata strategy has to be about performance… so lets speculate:

How much of a performance increase might their integration provide on the server-side?
How much of a performance increase might their integration provide on the network side?

In the days before there was a microprocessor based enterprise server market, Teradata could gain substantially here. Microprocessors were built for personal computing and not designed for the high-availability and high-performance requirements in an enterprise. Teradata had much to gain from building rather than buying server.

But today, there is little to gain from a highly customized design. The requirement to run standard LINUX and Windows operating systems limit their ability to innovate and the resulting servers have to be very similar to those built for off-the-shelf enterprise servers. There is little or no performance advantage here.

On the network side, there once was a distinct advantage to Teradata’s ByNet. It was both faster than available off-the shelf switches and it offloaded cycles from the under-powered CPU. Today, however, there are plenty of cheap, fast switches… so the speed advantage has disappeared. Worse still, the introduction of multi-core CPUs have eliminated the advantage of the in-the-switch sort/merge that makes ByNet unique. CPU is inexpensive these days.

The bottom line: it is unclear if the Teradata hardware strategy affords them a performance advantage.

Cost of Ownership

An argument could be made that supporting COTS hardware is inherently less expensive than supporting a Teradata cluster. But there is a more substantial savings that is clear.

Every 2-3 years, as newer Teradata technology obsoletes your currently installed cluster the value of the current hardware goes to zero and the cost of ownership goes up significantly. The costs of this are especially high when you are required to add several nodes to accommodate growth as Teradata refreshes their technology. You may have to buy servers that are already obsolete.

With Greenplum, your current cluster is built from general-purpose servers that are re-purposed with ease. In fact, since the nodes in a Greenplum cluster are usually high-end servers, customers often cycle new technology into their data warehouse and cycle the old servers out into their server farm. The result is a higher performance warehouse and full use of all of the server technology.

A Final Word

The words “proprietary hardware” are sometimes thrown around as an insult. But Teradata’s proprietary approach is based on the belief that a tightly integrated configuration adds benefit to offset the costs. Greenplum believes that today the enterprise server and the network switch vendors have matured their products to the point where off-the-shelf technology can match or exceed the performance of custom hardware… at a significantly reduced cost. You may have an opinion or you may wait to see how the benchmarks, the proof-of-concepts, and the market decide… but its interesting to understand the differing approaches.

Stop Tuning and Scan…

After years of tuning data warehouses, queries, data loads, and BI applications, I give up. In the long run it is not really possible anyway… and better still… no longer necessary. A better approach is to build your database and your hardware infrastructure to scan fast and smart. So here’s a blog on why it’s impossible to tune a warehouse… and on why it’s no longer necessary.

My argument against tuning is easy to grasp. By definition a data warehouse serves many constituencies: Marketing and Finance and Customer Support and Distribution; and these business units will each access the data from their unique perspective following a unique path through the warehouse. A designer cannot lay out the data effectively to support each access path… cannot index every column, cannot map more than one zone, cannot replicate the data again and again with aggregates and materialized views, cannot cache the entire warehouse. Even if you get it right changing business requirements will fracture your approach; or worse, the design will not support new queries and constrain your business.

Many readers will be skeptical at this point… suggesting that the software and hardware to eliminate tuning does not exist. So let’s build a model and test the state of the art.

Let us imagine and model a 25TB data warehouse with a 20TB fact table that holds 25 months of daily facts partitioned by day. The fact table is 100 columns wide and we will model two queries that reference 20 of the columns… One that touches every row and one that is date constrained and touches only 14 days of data.

Here are some hardware specs. A server with a single I/O controller can read about 1.5GB/second into the database. With two controllers can read around 2.7GB/sec. Note that these are not the theoretical limits of the hardware but real measurements taken from the current hardware on the market: Dell, HP, and SUN/Oracle.

Now let’s deploy our imaginary warehouse on a strong state of the market multi-core server with, to be conservative, a single controller. This server would scan our fact table in around 222 minutes. Partition elimination would allow the date constrained query to complete in just over 4 minutes. Note that these imaginary queries ignore the effort to join and/or aggregate data. Later I’ll have more to say on this…

If we deploy our warehouse on a shared-nothing cluster with 20 nodes the aggregate I/O bandwidth increases to 30GB/sec and the execution times for our two queries improves to 11 minutes and 12 seconds, respectively. This is the power of parallel I/O.

Now we have to factor in compression. Typical row-based compression yields approximately a 2.5x result… columnar compression varies wildly… But let’s assume 25X in our model. There is a cost to be paid to decompress the data… But since it is paid by everyone and CPU is a relatively inexpensive commodity, we’ll ignore it in our model.

For 2.5X row-based compression our big query now completes in 4.4 minutes and the smaller query completes in 4.8 seconds.

The model is a little more complicated when we throw in columnar compression so let’s consider two columnar models. For an implementation such as Exadata we get the benefit of columnar compression but not the benefit of columnar projection. 25X hybrid columnar compression will execute our two scans in 26 seconds and .5 seconds. Now we are talking! A more complete columnar implementation will only touch the columns required by our query, 20% of the data, providing another 5X improvement. This drops our scan queries to 5.2 seconds and .1 second, respectively. Smoking fast. Note that the more simple columnar compression approach will provide the same fast response when every column is touched and the more complex approach will slow down in that case… so you can make the trade off in your shop as required.

Let me remind you again… This is a full scan of 20TB with no tricks: no indices, no pre-aggregation, no materialized views, no cache and no flash, no pre-sorted zone maps. All that is required is a parallel implementation with partitioning, compression, and a columnar table type… and this implementation works. It is robust.

A note on joins… It is more difficult to model joins… and I’ll attempt a simple model in another post. But you can see that this fast scan approach has solved the costly part of the problem using parallel processing… and you can imagine that a shared-nothing massively parallel approach to joins may hold the key.

Co-processors and database machines: Intel, Netezza, and Teradata

Intel recently announced a change in direction regarding the use of special hardware co-processing to support RAID. This change is relevant to vendors of data warehouse appliances like Netezza, and to a smaller extent, Teradata, who both use co-processors to offload processing. A report on the Intel announcement is here: http://www.theregister.co.uk/2010/09/15/sas_in_patsburg/ . In short, Intel says that because multi-cores provide so much compute for so little cost it just does not make sense to build a special co-processor to support the RAID function.

What, you may ask, does this have to do with Netezza and Teradata. The answer is that both companies have built database machines that use co-processors more-or-less. And Intel’s conclusion that multi-cores obsolete co-processors significantly affects their value proposition to the market.

In the case of Netezza the significance is considerable. Netezza conceived the FPGA-based architecture when compute was expensive relative to I/O. A FPGA provided a way to add compute power at a low-cost point. But according to Intel this changed as multi-core technology matured. Further, as the Intel processing line adds even more cores, the value of a FPGA co-processor becomes less still.

In the case of Teradata the issue is more manageable. The original Teradata Y-Net and their current BYNET interconnect offloads the merge process from the CPU to the interconnect. Back when Teradata ran on 8086 cores from Intel this was significant and it formed a cornerstone of Teradata’s value proposition. Today, the value of BYNET is diminished and one could imagine Teradata dropping it altogether sometime in the future for a less proprietary network fabric.

Please do not think that I am suggesting that multi-cores make either Teradata or Netezza obsolete. They are both formidable technologies. But multi-cores will reduce the price/performance advantage that co-processing once afforded in favor of database servers that do not use co-processing like Exadata, Greenplum, and DB2.

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this: