The following are the key takeaways from the podcast:

  • Ion Stoica from UC Berkeley started Mesos as a classroom project
  • Its goal was cluster management that let multiple Hadoop frameworks share one cluster
  • The next question: what to build on top of Mesos?
  • The first commit was 2,000 lines of code
  • The workloads came first, and Hadoop was not good enough for them
  • Spark was positioned as a complementary component in the Hadoop ecosystem
  • A startup ran a real-time stack and a separate historical stack, and maintaining two separate code bases was difficult
  • Historical data analysis ran on Hadoop, which was used to recompute the metrics
  • The basic requirement was to enable real-time queries and iterative machine learning on top of Hadoop, which was essentially a distributed batch-processing engine
  • Work on Spark started in 2009
  • The first Spark Summit was held in 2014
  • Students made it possible
  • Databricks was launched as a company
  • Align the incentives of the students with the project
  • Spark became an Apache project
  • A Spark tutorial was given in 2012
  • Work with data software companies to sell Spark
  • Make it easy to develop on top of Spark
  • Added Scala, Java, and Python APIs, plus machine learning and graph libraries