Just as Google’s MapReduce and Google File System papers became the basis for Hadoop’s MapReduce and HDFS respectively, another paper from Google has become the basis for a new Apache project. A project named Drill was recently submitted to Apache, mainly by developers from MapR. The inspiration for this project is Google’s paper on Dremel, its system for interactive queries over very large data sets. Dremel has been the basis for Google BigQuery for a while.
The project aims to provide ‘low latency’ querying of data stored in HDFS. Hadoop is a great platform for big data, but it is better suited to offline, batch processing based on the MapReduce pattern. Many customers, however, need a way to run real-time queries on data residing in Hadoop / HDFS. Drill is intended to address that need.
The project has just started, and the first code is yet to be contributed. Even so, it should be an important addition to the Hadoop ecosystem. It will coexist with Hive, which also provides query access.