My Sessions in Recent Conferences in June 2013

This month I spoke in two sessions. At the JAX Conference on June 5th at the Santa Clara Convention Center, I spoke on Java PaaS cloud platforms. I covered GAE, Amazon Elastic Beanstalk, Cloud Foundry, CloudBees, OpenShift, Heroku and Oracle’s PaaS platform. Here is the slide deck I presented.


Later, on the 10th, I spoke at the University of Washington, Seattle, on Big Data Ecosystem and Application Architectures. In this session, I mostly covered the Hadoop ecosystem and NoSQL-based MapReduce. I could only briefly mention Storm, Impala, etc. for near-real-time (NRT) and real-time (RT) querying. That may be a topic for a separate session.

I will be speaking at NoSQL Now! in August on data modeling using NoSQL databases. I will mainly cover user profiles and real-time analytics.


DRILL: A New Project Added to the Apache Incubator for Low Latency Queries on Large Data Sets in Hadoop

Just as Google’s MapReduce and Google File System papers were the basis for Hadoop’s MapReduce and HDFS respectively, another paper from Google has become the basis for a new project in Apache. A project named DRILL has recently been submitted to Apache, mainly by developers from MapR. The inspiration for this project is Google’s paper on Dremel, its system for querying very large data sets. Dremel has been the basis for Google BigQuery for a while.

This project will provide a ‘low latency’ query capability on HDFS. Hadoop is a great platform for big data, and it is best suited to offline / batch processing based on the MapReduce pattern. However, many customers need a way to run real-time queries on the data residing in Hadoop / HDFS. DRILL will address that need.

The project has just started and the first code is yet to be contributed. However, it will be an important addition to the Hadoop ecosystem. It will co-exist with Hive, which also provides query access.


GITPRO World 2012: Best Conference for Technology Professionals and Entrepreneurs

22 Jan 2012,

Cupertino, CA, USA


Global Indian Technology Professionals Association (GITPRO) is hosting a conference on “Emerging Technologies and Opportunities for Professionals and Entrepreneurs” on 18th Feb 2012 at Palo Alto, CA. With three parallel tracks focused on Technology, Career & Leadership and Startups, this conference is well suited to everybody from technology professionals to entrepreneurs.


Steve Blank, the iconic serial entrepreneur and entrepreneurship coach at Stanford University, and Anand Deshpande, the CEO of Persistent Systems, will be delivering keynotes at the conference.


The Technology track is full of experts on Big Data, Hadoop, Cloud, Mobile and Social Computing. They are coming from Greenplum, Cloudera, Hortonworks, Microsoft, IBM, ThisMoment, AdMaxim, and GloMantra.


The Startup Bootcamp at GITPRO World 2012 will cover everything that an entrepreneur should know, from launching a startup to a successful exit. Successful startup entrepreneurs, VCs, and sales & marketing executives will be guiding aspiring entrepreneurs.


GITPRO World 2012 has sessions specially focused on career and leadership topics, covering aspects like managing with influence, evolving from an individual role to manager and leader, mid-career accelerators and mid-career switches, and job opportunities in 2012.


This event will provide a unique opportunity for Indian technology professionals to network with fellow professionals, technical experts, industry leaders, entrepreneurs, career coaches, and VCs.


GITPRO is a global networking platform for Indian technology professionals, supporting their professional and self-development and their contribution back to the profession, society, and people of the US and India. GITPRO, started in 2009, has chapters in Silicon Valley, Contra Costa Valley, Seattle, DC and Denver in the US, and in Bangalore, Hyderabad and Pune in India.

Oracle Goes Big Way in Big Data Analytics, NoSQL and Hadoop

In his keynote at Oracle OpenWorld 2011, Mark Hurd announced the new Exalytics analytics appliance, which is geared to execute OLAP and MOLAP workloads, that is, online analytical processing and multi-dimensional online analytical processing, for deriving business intelligence. Cloud and big data are among the key themes of this year’s Oracle OpenWorld. Oracle’s co-president Safra Catz declared, “We are big data. We are also the cloud.” The push on cloud is all the more significant against the background of last year’s statements by Larry Ellison, who first ridiculed the use of the term cloud and then claimed that Oracle was already providing cloud. But this year brings the real delivery of cloud BI, cloud-based apps, etc.

Fitting its vision of end-to-end in a box, in addition to Exadata and Exalytics, Oracle announced the Big Data Appliance. The Oracle Big Data Appliance integrates Apache Hadoop, open source R, Oracle’s NoSQL Database, the ODI adapter for Hadoop and Oracle Loader for Hadoop on Linux and the Oracle Java VM in one big box. This combination provides a good platform for big data processing of unstructured / structured data. For more on Big Data Appliance:

With the advent of NoSQL databases and MapReduce infrastructures, I already thought that Oracle could not afford to be left behind by the NoSQL train. Hadoop is gaining significant traction in batch-oriented applications like unstructured data processing, warehousing, etc. Hadoop provides a way to distribute data and processing logic across the nodes of a server cluster. It takes the processing logic close to the data. Hadoop, which originated at Yahoo, is based on the MapReduce architecture introduced by Google. Anyway, I predict that the usage of Hadoop in the Oracle stack will go beyond adding it to the Big Data Appliance. Oracle may make some acquisitions in this area.
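To make the MapReduce idea above a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample lines are my own assumptions for illustration. Each inner list stands in for a block of data sitting on a different node, the mapper runs against each block, and the reducer adds up what the mappers emit.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(splits):
    """Run map, shuffle and reduce over all splits (sequentially, in this toy)."""
    mapped = []
    for split in splits:  # each split stands in for data held on one node
        for line in split:
            mapped.extend(map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample data, split into two "blocks".
splits = [
    ["big data on hadoop", "hadoop stores data in hdfs"],
    ["oracle adds hadoop to big data appliance"],
]
counts = run_job(splits)
print(counts["hadoop"], counts["big"])
```

In a real cluster the map calls would run in parallel on the nodes that hold each block, which is what "taking the processing logic close to the data" means in practice; this toy runs everything in one process.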

Social Media Monitoring by Governments, Intelligence Agencies and Technologies Used

Among the many important phenomena happening in the world this year, two significant people-related ones stand out. One is that the social media frenzy has caught on in almost every part of the world; the other is that social and political unrest is unfolding in various places. This year Facebook users surpassed Google users in terms of numbers and time spent. At the same time, unexpected social unrest and regime changes happened in the Middle East, starting from Tunisia and Egypt and spreading to Libya, Yemen, etc. No doubt the advent of social media fuelled these political upheavals, which were long due in many countries under dictatorial or single-person / single-party rule. Recognizing this correlation, many such undemocratic or pseudo-democratic countries started monitoring social media. However, it is no longer limited to those countries.

In recent weeks, when protests affected Britain, the British Prime Minister David Cameron also expressed the need to monitor (and if possible curb) social media (read comments by Facebook and others in an article in the Guardian). Interesting, right? Britain is one of the great old democracies. Around the same period, just before the anti-corruption movement brought large numbers of Indians onto the roads earlier this week, there was news that Indian government / security / intelligence agencies too wanted access to Facebook and Twitter. So, the largest democracy in the world also wanted to join social surveillance via social media monitoring. In my humble opinion, it is an undemocratic act.

However, I can understand the need for social media monitoring to follow social trends and to identify terrorism or anti-social activity. For example, security and intelligence agencies like the CIA monitor many online channels (read the news on the CIA’s investment arm investing in Visible for social media monitoring) and must be monitoring social media too. It would help in nabbing terrorists like Osama Bin Laden as well as detecting plots like the NY bombing.

How is it done? Social media like Twitter and Facebook, as well as various blogs and forums, contain a wealth of information in text form. The information is huge, heavily text-centric, completely unstructured and difficult to correlate. This need has renewed interest in text processing and text analytics, and it is made possible by big data computing, especially by leveraging Hadoop-like distributed computing, big file systems and databases.

Buzz around NonRelational DB

Last Saturday we (GITPRO, the Global Indian Tech Professionals Association) arranged a Tech Talk on NoSQL (non-relational, actually) DBs and scaling Hadoop. It was very well attended. In the general introduction session, many attendees mentioned their interest in Hadoop and NoSQL DBs when introducing themselves. It was nice to see a good-sized crowd sacrificing their Saturday evening to attend this informative session. It was even more surprising to see that many of them were actual users of these technologies.

We (at myBantu / GloMantra) are using MongoDB, which is a document-oriented database. We store XML documents (actually, once stored they are BSON in MongoDB), and queries use a scripting language for conditions. Another alternative in this class is CouchDB, which is more Web-like and gives REST-based access. The other famous non-relational (popularly called NoSQL) systems are of course Hadoop and Cassandra. Both are Apache projects with a few very good showcase implementations. However, when Digg recently had problems while using Cassandra, Cassandra got a bad name, which is not quite accurate.

Anyway, Hadoop and its database, HBase, are making more buzz. It was interesting news when Facebook moved their messaging system from Cassandra to HBase. It is interesting especially because Cassandra originally came from engineers at Facebook, who used it in their Inbox search. Some interesting work on Hadoop is happening at Facebook. They are the original contributors of Hive, a data manipulation add-on targeted at implementing warehousing on top of Hadoop. While MapReduce databases created a lot of buzz around NoSQL, it is interesting that Hive offers a SQL-like query language. So when folks say NoSQL, they actually mean non-relational databases. Another warehousing-related add-on to Hadoop is Pig (Apache Pig), which originally came out of Yahoo.
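For readers new to document databases, the "documents plus conditions" style of querying mentioned above can be sketched with a toy in-memory collection. This is plain Python, not MongoDB or its driver; the helper names (`matches`, `find`), the sample profiles and the small operator table are assumptions made for illustration, loosely modeled on the `$gt` / `$lt` style of condition.

```python
import operator

# Tiny, hypothetical operator table; a real document database supports many more.
OPERATORS = {"$gt": operator.gt, "$lt": operator.lt, "$eq": operator.eq}


def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"logins": {"$gt": 10}}
            for op_name, operand in condition.items():
                if value is None or not OPERATORS[op_name](value, operand):
                    return False
        elif value != condition:
            # Plain form, e.g. {"city": "Seattle"} means equality
            return False
    return True


def find(collection, query):
    """Return all documents in the collection that match the query."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share the same fields.
profiles = [
    {"name": "asha", "city": "Seattle", "logins": 42, "tags": ["hadoop"]},
    {"name": "ravi", "city": "Pune", "logins": 7},
    {"name": "meera", "city": "Seattle", "logins": 3, "employer": "startup"},
]

active_in_seattle = find(profiles, {"city": "Seattle", "logins": {"$gt": 10}})
print([doc["name"] for doc in active_in_seattle])
```

The point of the sketch is the schema-less shape of the data: each profile carries whatever fields it has, and a query is itself just a small document of conditions.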

Anyway, development in this space is interestingly rapid, and the major drive comes from the huge volume of user-generated data being handled at social networking giants like Facebook, Zynga, LinkedIn, etc. But the original credit for the concept of BigTable goes to Google, where the MapReduce approach was also introduced. The space is now getting its own ecosystem developed!