We live in an era where computational resources are being rapidly commoditized. Cloud computing is pervasive and is democratizing IT, and Big Data, led by Apache Hadoop, is the leading edge of this revolution from an IT perspective. Just as automobiles changed travel in the last century, mobile is changing communication, and the web changed information access over the past decade, Big Data is on track to change the way enterprise IT is done. Big Data is the compute and storage layer underneath mobile, social, the Internet of Things, and other emerging ecosystems. They all have one thing in common: they produce ever-increasing amounts of data that require ever-increasing amounts of resources, be it compute or storage. Big Data, led by Hadoop and its scale-out architecture built on commodity hardware, is enabling this disruption. Today, a team of a few people in any remote corner of the world can viably disrupt an existing business. Exciting times indeed, and I feel very fortunate to be part of it.
Since its inception over 10 years ago, Hadoop has remained largely under-productized. The genesis was the use of MapReduce at the core of Hadoop 1.0. The first generation of Big Data technology attempted to do search indexing better. A noble endeavour, but in hindsight an underachievement, or not visionary enough. The real and pervasive disruption required a simpler question to be asked: “What can we do with massively distributed resources?” YARN (Hadoop 2.0) heralded the advent of next-generation Hadoop, post-MapReduce. YARN was still in alpha in 2012, but we knew this was the real disruption. We asked ourselves, “What would it take to productize Big Data? What would it take to commoditize the expertise needed to successfully launch Big Data projects?” Big Data applications need to be as mass market as mobile applications are today. They had to be easy to develop, easy to operate, and easy to integrate into the existing technology stack. Additionally, they had to meet business SLAs with low total cost of ownership and low time to market.
With this charter in mind, we proposed Apache Apex for incubation last year, and our proposal was accepted in August 2015. As part of incubation, we were happy to see Capital One, DirecTV (now AT&T), General Electric, and Silver Spring Networks among the enterprises that joined our open source community. Apache Apex was blessed with great mentors, namely Alan Gates, Chris Nauroth, Hitesh Shah, Justin McClean, Taylor Goetz, and Ted Dunning. The Apache Software Foundation provided a framework within which to develop a fabulous community, and the ASF welcomed Apache Apex with open arms as we learned the Apache way.
Today, Apache Apex is in production with customers, enabling use cases including log processing, billing, big data ingestion and movement, fast real-time streaming analytics, ETL, fast batch processing, database offload, alerts/monitoring, scoring of machine learning models, and real-time dashboards. Apache Apex is used for both streaming and batch use cases. Verticals include ad tech, the Internet of Things, financial services, and telecommunications.
This journey would not have been possible without high-calibre founding engineers and co-founders, namely Chetan Narsude and Thomas Weise. We also had a great initial team in Pramod Immaneni, David Yan, and Gaurav Gupta. Many contributors and community participants helped along the way; thank you all for making Apache Apex happen. This list would not be complete without Phu Hoang, co-founder and CEO, who helped navigate the business aspects for this relatively small team.
With the skyrocketing growth of Apex usage, meetups, and community throughout the world, I am excited to see what the future holds. Again, congratulations to Apache Apex on its journey to becoming a top-level project.