Facebook open sources Presto for interactive queries over petabytes of data (web scale)

Facebook today announced Presto, a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. Although Hive already provides a query interface over big data, Facebook developed Presto because it needed lower-latency interactive queries over its web-scale data. Judging by the results, Facebook engineering seems to have accomplished this mission. Presto is deployed in three (maybe more) geographical regions, and a single cluster has been scaled to 1,000 nodes. It is used by 1,000+ employees running more than 30,000 queries (up 3,000 since June, when Presto was first revealed at the WebScale event) over a petabyte of data.


So, what does it do differently?

1. An optimized query planner and scheduler that fires multiple stages

2. Stages are executed in parallel

3. Intermediate results are kept in memory rather than persisted to HDFS, which saves I/O cost

4. Key code paths are optimized by directly generating optimized Java bytecode

More importantly, in contrast to Hive, Presto does not use MapReduce for query processing.
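To make the "stages with in-memory intermediate results" idea concrete, here is a minimal sketch in Python. It is not Presto code: the stage names, the `(country, clicks)` row shape and the `run_query` helper are all made up for illustration, and the stages run in a single thread rather than in parallel across nodes. It only shows rows flowing from one stage to the next without any stage writing its full output to disk first.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Row = Tuple[str, int]  # (country, clicks): a hypothetical schema for this sketch


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    """Stage 1: stream rows from the source one at a time."""
    for row in rows:
        yield row


def filter_stage(rows: Iterable[Row], min_clicks: int) -> Iterator[Row]:
    """Stage 2: drop rows below the threshold as they arrive."""
    for country, clicks in rows:
        if clicks >= min_clicks:
            yield country, clicks


def aggregate(rows: Iterable[Row]) -> Dict[str, int]:
    """Stage 3: sum clicks per country, keeping only running totals in memory."""
    totals: Dict[str, int] = defaultdict(int)
    for country, clicks in rows:
        totals[country] += clicks
    return dict(totals)


def run_query(rows: Iterable[Row], min_clicks: int) -> Dict[str, int]:
    # The stages are chained as generators, so each row is handed to the next
    # stage in memory; nothing is materialized on disk between stages.
    return aggregate(filter_stage(scan(rows), min_clicks))
```

A MapReduce-style plan would instead write the output of each step to HDFS before the next step reads it back, which is the I/O cost point 3 refers to.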

One thing I liked about Presto is that it is built on a pluggable architecture. It can work with Hive, Scrub, and potentially your own storage. That opens up a good opportunity for its adoption. Of course, we need to compare this with Impala from Cloudera.

At the Web Scale conference at Facebook’s Menlo Park office back in June, it was said that Presto would explore probabilistic sampling for quicker results at the cost of some error (a concept that BlinkDB implemented). I am not sure where that stands; however, BlinkDB already supports Presto in addition to Hive and Shark.
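As a rough illustration of trading accuracy for speed through sampling, here is a small Python sketch. It is not how Presto or BlinkDB implement it; `approximate_sum` and its parameters are hypothetical names. It estimates a sum from a uniform random sample and reports a standard error alongside the estimate.

```python
import math
import random
from typing import List, Tuple


def approximate_sum(values: List[float], sample_fraction: float,
                    seed: int = 0) -> Tuple[float, float]:
    """Estimate sum(values) from a uniform random sample without replacement.

    Returns (estimate, standard_error). Illustrative only.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")

    # Sample at least two rows so that a variance can be computed.
    k = min(n, max(2, int(n * sample_fraction)))
    sample = random.Random(seed).sample(values, k)

    mean = sum(sample) / k
    variance = sum((x - mean) ** 2 for x in sample) / (k - 1)

    # Finite population correction: the error shrinks to zero as k approaches n.
    fpc = 1 - k / n
    standard_error = n * math.sqrt(variance / k * fpc)
    return n * mean, standard_error
```

Reading 10% of the rows gives an answer much sooner, and the standard error tells the user roughly how far off the estimate may be.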

 

Code and docs:

http://prestodb.io/

https://github.com/facebook/presto


Facebook Streams Changes through Wormhole

Facebook engineering described its in-house, web-scale change-propagation messaging infrastructure, called Wormhole. Essentially, a publisher tails DB / FS updates and propagates them as streams across data centers.

It seems to support some basic messaging properties:

1. Reliable In-Order Delivery

2. Atomicity

3. Low latency

with some web scale abilities:

1. Data Partitioning

2. Rewind in Time
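To picture how in-order delivery, data partitioning and rewind in time fit together, here is a toy Python sketch of a partitioned, append-only log. Wormhole is not open source, so none of this reflects its actual design or API: `PartitionedLog` and its methods are hypothetical, and reliability and atomicity are not modeled at all.

```python
import bisect
import zlib
from typing import List, Tuple


class PartitionedLog:
    """Toy append-only log split into partitions (hypothetical, not Wormhole's API)."""

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an ordered list of (timestamp, payload) entries.
        self._partitions: List[List[Tuple[float, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def partition_for(self, key: str) -> int:
        # A stable hash keeps all updates for one key in one partition,
        # which preserves their relative order.
        return zlib.crc32(key.encode("utf-8")) % len(self._partitions)

    def publish(self, key: str, timestamp: float, payload: str) -> int:
        """Append an update and return the partition it landed in."""
        partition = self.partition_for(key)
        self._partitions[partition].append((timestamp, payload))
        return partition

    def read(self, partition: int, offset: int = 0) -> List[str]:
        """Return payloads from the given offset onward, in publish order."""
        return [payload for _, payload in self._partitions[partition][offset:]]

    def offset_at(self, partition: int, timestamp: float) -> int:
        """First offset at or after the timestamp (assumes non-decreasing timestamps)."""
        timestamps = [ts for ts, _ in self._partitions[partition]]
        return bisect.bisect_left(timestamps, timestamp)
```

A consumer that needs to rewind looks up `offset_at` for an earlier time and reads again from there, while partitioning lets several consumers each handle a slice of the data.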

Facebook engineering listed the advantages as:

1. Volume: Wormhole processes over 1 trillion messages every day (significantly more than 10 million messages every second).

2. Ability to deal with failures

3. Ability to scale up / down

4. Integration with monitoring systems

5. Cache synchronization across data centers in a shorter span

6. Lower consumption of resources (quote: “Compared to the previous system, Wormhole reduced CPU utilization on UDBs by 40% and I/O utilization by 60%”)
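As a quick sanity check on the volume figure in point 1, the per-second rate implied by 1 trillion messages a day can be worked out directly. The constant and function names below are just for this sketch.

```python
MESSAGES_PER_DAY = 1_000_000_000_000  # "over 1 trillion messages every day"
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 seconds


def messages_per_second(messages_per_day: int) -> float:
    """Average message rate implied by a daily total."""
    return messages_per_day / SECONDS_PER_DAY


rate = messages_per_second(MESSAGES_PER_DAY)
print(f"{rate:,.0f} messages per second")
```

That works out to roughly 11.6 million messages per second on average, consistent with the "more than 10 million messages every second" claim.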

Wormhole is not yet open source.

Does Paul Ceglia Deserve 84% of Facebook?


It is possible that MarkZ made yet another grave mistake in 2003 by signing a contract with, or sending this email to, Paul Ceglia about Facebook ownership. If the email is true and legally valid, Ceglia may claim 84% of Facebook. Wow! The last time I checked, Facebook was valued at $65 billion. Now, keeping legality aside, do you think Paul should get such a share, or any share?

Vote in this poll if you want:

http://linkd.in/i3CYtH

Otherwise, comment on what you think is appropriate.

Now That Microsoft Sued Google, Which Is the New Evil Empire?


Ever since I landed in Silicon Valley in 1993, I have read about the aggressive tactics Microsoft followed, which were unfair to its competitors. There have always been antitrust / abuse-of-dominance cases against Microsoft. It is a common observation that dominance and power result in bullying. In other words, the empire blamed for bullying is the most powerful one and (maybe) an evil empire. As an example, Microsoft was sued any number of times in the browser wars.

Now that Microsoft has complained to the European Union about Google’s unfair bullying practices, the tables seem to have turned. Can we now affirmatively say that power has shifted to Google? Maybe. But how long will there be dominance by a single player? Unlike the previous near-monopoly by Microsoft, the current scenario is not as polarized as it seems. Apple has overtaken Microsoft to become the largest technology company. Google dominates the information world and ad revenues. But Facebook has overtaken Google in terms of users’ time spent on the site. Each one of them will be blamed for bullying. However, it is good to have healthy competition.

Google Finally Added Social Search via +1 (aka Like)


Finally, Google has launched a way to tag the search results that consumers like: a +1 counter. A liked result would be shared in the consumer’s social circle and could potentially be used publicly. This way, Google has added social concepts to its own search. Until now, social search was understood as searching social media sites for what your friends might have expressed or liked. Now you or your friends can share a liking on the search results themselves. The “liked” results appear on the consumer’s profile page, which can be created at https://profiles.google.com/. This may look simple, but do not take it lightly: over time Google may expand the use of the profile page. It may add images, videos, chat, etc., and that would be an attack on Facebook. BTW, this profile page is similar to the User’s Profile page we have in myBantu. Though Google did not say how the likes would impact search results, I am sure Google would rank liked results higher, just as I do in myBantu. It would also target Blekko’s attempt at friend-assisted search, just a little bit differently!

Applications Of Google Wave

Google Wave (large screen shot) seems to be more than just “Communication and Collaboration”. An article in CIO, A New Kind of Mega-Application, covers it nicely. According to the author, it has flavours of Facebook News Feeds, Twitter, instant messaging, etc. To quote directly, “Wave seems to embrace this streaming interface by using e-mail and messaging as a starting point. In one fluid view, a Wave homepage includes short messages (think: Twitter), communication with large groups (think: Facebook) and basic collaboration tools to engage with the content (think: instant messaging and e-mail).”

However, not everybody agrees that this runtime integration of communications like email, messaging and document sharing would fly immediately. For example, while agreeing that Wave is a neat idea, an author at BusinessInsider says, “Google Tries To Rewrite Email, Won’t Happen Soon”. According to the author, other vendors of email, instant messaging and social networking need to adopt it to make it wider and more meaningful.

Since Google Wave is open source and provides extensibility APIs, many business opportunities would come up. Innovators may come up with new collaboration tools for the enterprise market, hosting providers may come up with Wave hosting, and there could be extensions to existing apps, like a Facebook app. A blogger, Gabor Cselle, nicely presented his thoughts on the business opportunities around Google Wave.

From some established corners, there is a guarded welcome. Cisco says that Wave completes Cisco’s approach to Unified Communication! Wave validates their vision! It would be interesting to see how Facebook (social network / feeds), Twitter (microblog broadcast), Microsoft (Outlook), IBM (Lotus collaboration suites, etc.) and Oracle (WebCenter) respond.