Part 7 – How Hadooped is Greenplum, the Pivotal GPDB?

Now for Greenplum & Hadoop… To continue this thread on RDBMS-Hadoop integration (Part 1, Part 2, Part 3, Part 4, Part 5, Part 6), I have suggested that we could evaluate integration architecture using three criteria:

How parallel are the pipes to move data between the RDBMS and the parallel file system;
Is there intelligence to push down predicates; and
Is there more intelligence to push down joins and other relational operators?

The Greenplum interface is architecturally similar to the Teradata interface described in Part 4. Hadoop files are defined to the DBMS as external tables, and there are capable parallel pipes to move data from the HDFS side to GPDB. In addition, Greenplum uses its Scatter-Gather method to load data into GPDB efficiently.

There is no ability to push down predicates. When a query executes, all of the relevant data is sucked through the parallel pipes into the database segments for processing. This is very inefficient, and there is not even the crude capability to push down processing that Teradata provides.

Finally, there is no ability to push down joins or aggregation.
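To make the cost of missing pushdown concrete, here is a minimal Python sketch. It is a toy model, not Greenplum or Hadoop code: the block layout, the `(region, amount)` rows, and all three function names are hypothetical. It only counts how many rows would have to cross the pipes for one filtered sum under each approach.

```python
# Toy model only: nothing here calls Greenplum or Hadoop.
# It counts rows that cross the "pipes" for one filtered SUM query.

def make_blocks(num_blocks=4, rows_per_block=1000):
    """Build hypothetical file-system blocks of (region, amount) rows."""
    regions = ["east", "west", "north", "south"]
    return [
        [(regions[(b + i) % 4], i % 100) for i in range(rows_per_block)]
        for b in range(num_blocks)
    ]

def no_pushdown(blocks, region):
    """Ship every row to the database, then filter and sum there."""
    shipped = [row for block in blocks for row in block]
    total = sum(amount for r, amount in shipped if r == region)
    return len(shipped), total

def predicate_pushdown(blocks, region):
    """Filter on the storage side and ship only the matching rows."""
    shipped = [row for block in blocks for row in block if row[0] == region]
    total = sum(amount for _, amount in shipped)
    return len(shipped), total

def aggregate_pushdown(blocks, region):
    """Filter and sum inside each block; ship one partial sum per block."""
    partials = [sum(a for r, a in block if r == region) for block in blocks]
    return len(partials), sum(partials)

if __name__ == "__main__":
    blocks = make_blocks()
    for strategy in (no_pushdown, predicate_pushdown, aggregate_pushdown):
        moved, total = strategy(blocks, "east")
        print(f"{strategy.__name__}: rows moved={moved}, sum={total}")
```

All three strategies return the same sum; only the volume moved differs. The first function corresponds to the behavior described above, where everything is pulled into the database segments before any filtering or aggregation happens.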

Greenplum’s offering is not very advanced. To perform analytics with Greenplum, data must move between the two storage layers with no intelligence to mitigate the cost.

On to the last post in the series: Part 8, on SQL Server and Polybase.

4 thoughts on “Part 7 – How Hadooped is Greenplum, the Pivotal GPDB?”

I think you are not talking about the HAWQ engine, but can you confirm?


Rob Klopp says:

April 7, 2014 at 11:05 am

Nope… Talking about GPDB, dong Jiang. But HAWQ is similar… HAWQ is basically GPDB on HDFS… But since HAWQ lives alongside Hadoop, the lack of advanced features is slightly less painful (at the cost of performing 2X slower overall).

Rob

Pingback: How Hadooped is SQL Server PDW with Polybase? | Database Fog Blog

Pingback: Logical Data Warehouses and the Basics of Database Federation | Database Fog Blog
