Part 7 – How Hadooped is Greenplum, the Pivotal GPDB?

Now for Greenplum & Hadoop… To continue this thread on RDBMS-Hadoop integration (Part 1, Part 2, Part 3, Part 4, Part 5, Part 6), I have suggested that we could evaluate integration architecture using three criteria:

  1. How parallel are the pipes to move data between the RDBMS and the parallel file system;
  2. Is there intelligence to push down predicates; and
  3. Is there more intelligence to push down joins and other relational operators?

The Greenplum interface is architecturally similar to the Teradata interface described in Part 4. Hadoop files are defined to the DBMS as external tables, and there are capable parallel pipes to move data efficiently from the HDFS side to GPDB. In addition, Greenplum uses its Scatter-Gather method to load data into GPDB.
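
To make the "parallel pipes" idea concrete, here is a minimal Python sketch. It is a toy model, not Greenplum code: the block layout, the segment count, and the function names (`assign_blocks`, `segment_read`, `parallel_load`) are all assumptions made for illustration. The point is only that each database segment pulls its own share of the HDFS blocks, so no single node carries all of the traffic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical HDFS file: 6 blocks of 4 rows each (block_id, row_id).
hdfs_blocks = [[(b, i) for i in range(4)] for b in range(6)]
NUM_SEGMENTS = 3  # assumed number of database segments


def assign_blocks(blocks, num_segments):
    """Round-robin the blocks so every segment gets its own pipe."""
    plan = {s: [] for s in range(num_segments)}
    for idx, block in enumerate(blocks):
        plan[idx % num_segments].append(block)
    return plan


def segment_read(blocks):
    """One segment reads only the blocks assigned to it."""
    return [row for block in blocks for row in block]


def parallel_load(blocks, num_segments):
    """Run all segment reads concurrently; returns one row list per segment."""
    plan = assign_blocks(blocks, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        return list(pool.map(segment_read, plan.values()))


loaded = parallel_load(hdfs_blocks, NUM_SEGMENTS)
```

In this toy run each of the three segments ends up with two blocks. Threads stand in for network pipes here; a real system would move the data over the wire between the HDFS data nodes and the database segments.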

There is no ability to push down predicates. When a query executes, all of the relevant data is sucked through the parallel pipes into the database segments for processing. This is very inefficient, and there is not even the crude capability to push down processing that Teradata provides.
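
The cost of missing predicate push-down can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not a measurement of Greenplum: the table contents, the `predicate`, and both function names are hypothetical. It simply counts how many rows cross the pipes when the filter runs inside the database versus on the Hadoop side.

```python
# Hypothetical HDFS-resident table: (order_id, region, amount).
rows = [(i, "EU" if i % 10 == 0 else "US", i * 1.5) for i in range(1000)]


def predicate(row):
    """Stand-in for a WHERE clause such as region = 'EU'."""
    return row[1] == "EU"


def scan_without_pushdown(rows, predicate):
    """Every row crosses the pipes; the filter runs in the database segments."""
    moved = list(rows)
    result = [r for r in moved if predicate(r)]
    return result, len(moved)


def scan_with_pushdown(rows, predicate):
    """The filter runs on the Hadoop side; only matching rows are moved."""
    moved = [r for r in rows if predicate(r)]
    return moved, len(moved)


result_a, moved_a = scan_without_pushdown(rows, predicate)
result_b, moved_b = scan_with_pushdown(rows, predicate)
```

Both scans return the same 100 rows, but the first moves all 1,000 rows to get them and the second moves only the 100 that qualify. The more selective the predicate, the bigger the gap.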

Finally, there is no ability to push down joins or aggregation.
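
The same kind of toy model shows what aggregation push-down would buy. Again this is a sketch under assumptions: the block contents and function names are hypothetical, and it covers only a simple grouped sum, not joins. With push-down, each HDFS block is reduced to partial totals before anything moves, and the database only merges the partials.

```python
from collections import defaultdict

# Hypothetical data: 4 HDFS blocks of 100 (region, amount) rows each.
blocks = [
    [("EU" if (b + i) % 2 == 0 else "US", 1) for i in range(100)]
    for b in range(4)
]


def aggregate_in_db(blocks):
    """No push-down: every row is moved, then summed inside the database."""
    moved = [row for block in blocks for row in block]
    totals = defaultdict(int)
    for region, amount in moved:
        totals[region] += amount
    return dict(totals), len(moved)


def aggregate_with_pushdown(blocks):
    """Push-down: each block is pre-aggregated; only partial sums are moved."""
    partials = []
    for block in blocks:
        local = defaultdict(int)
        for region, amount in block:
            local[region] += amount
        partials.extend(local.items())
    totals = defaultdict(int)
    for region, subtotal in partials:
        totals[region] += subtotal
    return dict(totals), len(partials)


totals_db, moved_db = aggregate_in_db(blocks)
totals_pd, moved_pd = aggregate_with_pushdown(blocks)
```

The totals are identical either way, but the first path moves 400 rows and the second moves 8 partial sums (two regions per block).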

Greenplum’s offering is not very advanced. To perform analytics with Greenplum, data must move between the two storage layers with no intelligence to mitigate the cost.

On to the last post in the series, Part 8, on SQL Server and PolyBase.

4 thoughts on “Part 7 – How Hadooped is Greenplum, the Pivotal GPDB?”

  1. Dong Jiang says:

    I think you are not talking about the HAWQ engine, but can you confirm?

    1. Nope… Talking about GPDB, Dong Jiang. But HAWQ is similar… HAWQ is basically GPDB on HDFS… But since HAWQ lives alongside Hadoop, the lack of advanced features is slightly less painful (at the cost of performing 2X slower overall).

      Rob
