At EMC World this week, there were a few announcements of Hadoop-based products. The number of announcements makes it apparent that Hadoop is growing in popularity among enterprises for processing very large volumes of data, typically unstructured data such as web logs, social media chatter, emails and similar text for analytics. Hadoop can scale to a very large number of nodes, which are typically commodity hardware. Hadoop is based on the MapReduce architecture: in the map phase, a job is split into tasks that run across the nodes, and in the reduce phase their intermediate results are combined. Though Hadoop runs on its own separate hardware, data still needs pre- and post-processing as it moves in and out of traditional SQL systems. That's where many solutions, as well as Hadoop ecosystem projects like Pig, Flume and HBase, come into the picture.
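To make the map/reduce split a little more concrete, here is a minimal in-memory Python sketch of the classic word-count example. It is not Hadoop's API (real Hadoop jobs are typically written against its Java MapReduce interfaces or run through Hadoop Streaming); the function names are hypothetical, and everything runs sequentially in one process rather than across nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(record: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record, e.g. one web-log line.
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group intermediate values by key; Hadoop does this between the two phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all values collected for one key into a single result.
    return key, sum(values)


def word_count(records: Iterable[str]) -> dict[str, int]:
    # In a real cluster each map call could run on a different node;
    # this hypothetical helper simply runs the phases one after another.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["GET /index.html", "GET /about.html", "POST /index.html"]
    print(word_count(logs))
```

The point of the structure is that `map_phase` looks at one record at a time and `reduce_phase` looks at one key at a time, so both can be spread over many machines independently.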
EMC itself announced Greenplum HD as both a distribution and an appliance. The EMC Hadoop distribution will be available around the third quarter in both community and commercial editions. The Greenplum HD appliance will combine the Greenplum database and the Enterprise Edition distribution of Hadoop on a single appliance. EMC announced this direction a few months ago and very recently partnered with Cloudera. Of course, with this announcement, the partnership with Cloudera would come under a cloud.
SnapLogic also announced Hadoop integration via SnapReduce, which runs SnapLogic's data integration pipelines as MapReduce tasks. This is a good way to offer Hadoop's scalability to SnapLogic's customers. Also, nice name, Gaurav: SnapReduce! SnapLogic is an open-source ETL solution; of course, commercial training, support and consulting are available from SnapLogic itself. The following video gives a good introduction to SnapReduce.
Since Hadoop and the cloud have good synergy, Mellanox, a data center connectivity company, announced accelerators for Hadoop and Memcached. Its Hadoop product, called Hadoop-Direct, runs on Mellanox's InfiniBand adapters and switches. For more information
There was another product release: Brisk, from a startup, DataStax. This is an interestingly controversial product. It combines Hadoop with a competing open-source NoSQL product called Cassandra. Traditional pure-play Hadoop vendors like Cloudera criticized the integration. However, it will be interesting to watch its adoption. The following diagram shows the integration.
BTW, I am eagerly waiting to hear more from Yahoo! about its Hadoop spinoff. Maybe it will be announced during next month's Hadoop Summit, which is organized mainly by Yahoo!