DRILL: A New Project added in Apache Incubation for Low Latency Query on Large Data Set in Hadoop

Just as Google’s MapReduce and Google File System (GFS) papers were the basis for Hadoop’s MapReduce and HDFS respectively, another Google paper has become the basis for a new Apache project. A project named DRILL was recently submitted to the Apache Incubator, mainly by developers from MapR. The inspiration for this project is Google’s paper on Dremel, a system for querying very large data sets. Dremel has been the basis for Google BigQuery for a while.

This project would provide a ‘low latency’ query capability over data in HDFS. Hadoop is a great platform for big data, but it is built around the MapReduce pattern and is best suited to offline, batch processing. Many customers, however, need a way to run real-time queries against data residing in Hadoop/HDFS. DRILL aims to address that need.

The project has only just started, and no code has been contributed yet. Even so, it should be an important addition to the Hadoop ecosystem. It will coexist with Hive, which also provides query access to data in Hadoop.

 

One thought on “DRILL: A New Project added in Apache Incubation for Low Latency Query on Large Data Set in Hadoop”

  1. According to the project’s description, Phoenix is used internally by Salesforce.com for low latency queries, on the order of milliseconds for simple queries or seconds when tens of millions of rows are processed. Phoenix is not meant for the MapReduce jobs that HBase supports, but rather for accessing HBase data through a standardized language.
