Databricks joins Amazon and Google in providing Big Data Analytics on the Cloud

Today at Spark Summit, Databricks CEO Ion Stoica announced the company's first product, Databricks Cloud. This was one of two important announcements from Databricks today. The other was that Databricks has raised a $33M Series B round from Andreessen Horowitz. But IMO the availability of Databricks Cloud is more significant.

A few months back, I had the opportunity to have a teleconference with Ion. When we discussed the business model, he put more emphasis on Databricks's Spark certification service. He was (rightly, though) vague about other developments and the business model. After the call, I discussed with a colleague a possible product built around an IDE for programmers and data scientists and monitoring of Spark clusters. But what we saw today was much better.

Databricks Cloud has put Apache Spark on the cloud. Big Data on the cloud is not new. However, what Databricks has provided is an interactive, SQL-based web tool for data scientists to play with data and see the output visually in different forms such as trends and charts. It also provides a powerful WYSIWYG dashboard builder.

By making Spark and Spark Streaming available on the cloud, Databricks joins Google and Amazon, both of which offer streaming services with an analytics stack on the cloud for building real-time analytics and dashboards. The key difference, however, is that Amazon and Google built those services for programmers to write streaming applications. In contrast, Databricks Cloud is aimed more at data scientists.

Databricks Cloud includes a web-based interactive tool called Databricks Notebook. Though details of the technology it is built on are not yet available (in fact, the Databricks website is still silent on the announcement), its concept and look and feel are astonishingly similar to IPython's Notebook. And the name is similar too!! Is it reusing the rich client from IPython's Notebook? Of course, it also seems to be different. A few big differences: the Databricks Notebook heavily demonstrates the power of using SQL on data, and it makes the power of machine learning available to data scientists. Anyway, I'm looking forward to seeing more details about the service and its pricing.

As for me, for the last couple of months I had been exploring the business viability of Spark Analytics as a Service on the cloud. That idea just got killed! Good that it happened sooner rather than later 🙂


Cloud-Based Service Outages Becoming an Issue

Last week was a bad week for Microsoft and Google cloud apps. Microsoft's online services infrastructure experienced an outage affecting some customers in North America, interrupting Office 365 and various Windows Live services for a few hours. Coincidentally, on the same day, Google's cloud productivity service, Google Docs, went offline for some time.

Google on the outage: The [Google Docs] outage was caused by a change designed to improve real time collaboration within the document list. Unfortunately this change exposed a memory management bug which was only evident under heavy usage … We have assembled a list of steps which will both reduce the chance of a future event, decrease the time required to notice and resolve a problem, and limit the scope which any single problem can affect.

Microsoft on the outage: Microsoft became aware of a Domain Name Service (DNS) problem causing service degradation for multiple cloud-based services. A tool that helps balance network traffic was being updated, and for a currently unknown reason, the update did not work correctly. As a result, the configuration was corrupted, which caused service disruption. We are continuing to review the incident.

Amazon: A few months back, Amazon faced similar outages. In the second week of August, an Amazon EC2 and RDS outage impacted Netflix. In April, Amazon was hit with serious outages for which it had to apologize.

Solution? So far, the outages have been infrequent and quickly addressed. So these outages are an issue that is being discussed, but they have not yet adversely affected cloud adoption. However, they do raise the need for a solution to mitigate such risk. Standardization of cloud platforms and keeping a standby on a private cloud or a secondary provider could be possible solutions.