Part 1: Failure of open source technologies to deliver successful business outcomes
Part 2: High-level guidelines for achieving successful business outcomes with big data
Part 3: Development Pattern: Application stitched with loosely coupled big data services
Part 4: DataTorrent Apoxi Framework

Over the past decade, adopting big data using the open source software stack has become a gold rush. There is a top-down push from the C-suite to transform enterprises to be more data-driven using open source technologies. Transformational innovation in open source, along with the continual commoditization of hardware, opened up an opportunity for enterprises to gain a competitive advantage by being data-driven. There are some success stories, notably in Silicon Valley, and perhaps in Bengaluru or Beijing. The opportunities are real, and the gold is there for those who know how to mine it. But we are now more than a decade past the first adoption of open source driven big data projects, and there is no viable ROI in sight.

Enterprises continue to push hard on open source and have made it a must-have for any project. The decade-old mantra of cheap commodity hardware and free open source persists, but with no realistic, standardized delivery model. Today, big data products are mainly known for their failure to deliver business outcomes. Data-driven transformations have been very rare. The time has come to recalibrate this mad rush to be data-driven and actually find our pot of gold.

The journey toward this pot of gold is not optional. Those who do not master it risk becoming obsolete, even as ROI remains dismal. The gold to be mined in this rush is something without which the enterprise has no future. Enormous data growth has squeezed enterprises out of both unscalable legacy software and extremely costly proprietary big data software. Data is the new oil of the information revolution, fueling the future of any enterprise. Open source is, and will continue to be, the underlying technology that fuels this rush. The growth of the cloud, with Amazon and Azure leading the way, has alleviated some time-to-market pressure, but it has not resolved the fundamental disconnect.

This unstoppable force of data-driven transformation has met a very slow-moving object: the un-productizable open source software stack. In this clash, time to market has been the prime casualty. Thankfully, the slow-moving object was not an immovable one. Time to market can be addressed as the industry learns from its earlier failures. Enterprises are now in a hangover phase; the elixir of open source has begun to fade. Voices of failure are explicit and getting attention; those not adapting swiftly face the danger of becoming obsolete. At Gartner’s 2017 Data & Analytics Summit in Sydney, Gartner research director Nick Heudecker opened with the grim prediction that 70 percent of Hadoop and Spark projects will fail to deliver their planned business outcomes. If big data software were proprietary software developed by an internal team, a decade of failure would mean serious career consequences for that team.

The fact is that as much as open source is a must-have, it is not the ONLY thing to have in your IT stack. Open source by itself is “handle with care” software that needs a lot of tooling and glue logic, aka proprietary software, to succeed. Open source ecosystems are leading innovation in big data and cloud, and a lot of new ideas are coming to fruition, but this innovation must be harnessed to succeed in this gold rush.

The root cause of failure lies in the open source software community. Developers in open source have a habit of throwing raw code into the open, bothering neither about operability nor about productization. There are too many lab experiments running with a callous disregard for operational issues. Such approaches barely work even in areas densely populated with big data engineers, like Silicon Valley, Bengaluru, or Beijing. The failure rate is high enough in these locations, but outside these geographies it spikes, and time to market goes out the window.

This dichotomy has impeded open source software adoption and stopped it from becoming mass-market. Open source has under-delivered on the promise of a data-driven journey, and the gold rush has not delivered the promised riches. In big data, there is no equivalent of the LAMP stack that commoditized the internet front end: something an enterprise can rinse and repeat, a cookie cutter for stamping out successful big data products.

Blindly adopting open source has not worked historically, nor will it in the future. Chasing open source without a strategy makes the enterprise a guinea pig for new, raw, ever-changing, unhardened software. It is time to adapt, change, and pursue success realistically. Enterprises must ask the difficult questions: “How do I operationalize and productize this software?” and “How do I extract business value from big data software?”

For any product to deliver a viable business outcome, time to market is crucial. Viability means keeping total cost of ownership within the SLA, in both the short term and the long term. The ability to run the same software on-premises as well as in the cloud, aka being cloud-agnostic, is another extremely valuable feature. In this blog series, I will discuss the areas to consider and focus on to succeed in this data-driven transformation aided by open source software. I will discuss the blueprint of a big data framework designed to help enterprises succeed in their data-driven journey in a timely manner. Success in this journey is not optional: failure means the huge risk of a competitor succeeding in being data-driven first, i.e., failure means becoming obsolete.

Stay tuned for the second part of this blog series, where I will discuss high-level guidelines for big data products to succeed, i.e., what needs to be done to ensure the transformation to being data-driven happens in a timely manner.