This weekend wifey and I spent time cleaning a bookcase. Yup, I know it’s not the most exciting task, but some things just need to be done. Amongst the books on beekeeping, chicken breeding, and an idiot’s guide to raising Bulldogs were a bunch of business-related books.

Now, I am not a big subscriber to business or self-help books, but I stumbled on Malcolm Gladwell a number of years ago and lapped up The Tipping Point, couldn’t put Blink down, and found Outliers quite a revelation. Suffice it to say, I am a fan of his work.

This find triggered total recall of a meeting earlier in the week, where a prospective client was comparing Apache Apex to another stream-processing engine, suggesting that the other engine was cool but really didn’t have the same level of operational capability. The 10,000-hour rule from Outliers popped into my head, and frankly speaking, never was a truer word spoken.

This isn’t to say the comparable products aren’t good, but they are simply technology drops in a bucket, not designed as a scaled, “lights out,” production-ready product.

That production readiness is entirely due to the two founders of DataTorrent and creators of Apache Apex: Phu Hoang and Amol Kekre.

So where did they accrue those 10,000 hours? At Yahoo! in its formative days, when the company was one of the pioneers of Big Data and of the thrust that now drives the front line of analytics.

The two first met at a company called Escalade. Phu went to Yahoo! as engineer #5, becoming part of the original Yahoo! mafia, and brought Amol in a couple of years later. Phu ended up steering Search, Advertising, Commerce, Sports, News, and Finance for Yahoo!, while Amol birthed the real-time data architecture behind Yahoo! Finance.

During that tenure, Hadoop was born out of the many workloads across search and ads, finance demanded real-time streaming, and the website’s 24/7 operations instilled an always-on DNA.

Following this uncharted journey, which spawned many of the components we now take for granted as table stakes for Big Data analytics, the guys truly understood what it meant to stand up a scale-out environment fast, make it bulletproof, and run it as cost-effectively as possible.

Move that DNA into DataTorrent, and you get subtleties and nuances that all add up to a rocking platform for building your Big Data-in-motion analytics and action applications!

Rather than ramble on, here are a few examples to whet your appetite… at this point, I will just drop the mic and leave the building for the closing credits:

  • Deliver products and applications, not a toolkit of stuff
  • If you need professional services or contract work post-deployment, then we have failed you
  • Any change to any part of the code, or any added data service, must inherit the platform’s properties as a first-class citizen: fault tolerance, global schemes, UI, visualizations, security, etc.
  • Complete separation of functional and operational code, enabling updates without a hiccup to running production systems
  • Pre-built templates/pipelines and a solution factory to reduce development cost
  • Both a low-level API for operations and a high-level API for development (see the sketch after this list)
  • Security is built in and simple to integrate
  • Full benchmarking suite
  • An extensive library of hardened operators and connectors
  • Full schema support
  • Application and operations metrics, and built-in historical view/replay
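
For a taste of that high-level API, here is a minimal sketch of an Apex application, assuming a toy word-count pipeline. The WordSource and WordCounter operators and the “WordCountDemo” name are mine for illustration, not anything shipped with the product: the functional code lives in operators, the application just wires them into a DAG, and the platform wraps fault tolerance around whatever you write.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.DAG;
import com.datatorrent.api.DefaultInputPort;
import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.InputOperator;
import com.datatorrent.api.StreamingApplication;
import com.datatorrent.api.annotation.ApplicationAnnotation;
import com.datatorrent.common.util.BaseOperator;

// Illustrative source operator: emits a fixed rotation of words.
class WordSource extends BaseOperator implements InputOperator {
  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();
  private final String[] words = {"apex", "stream", "apex"};
  private int index;

  @Override
  public void emitTuples() {
    output.emit(words[index++ % words.length]);
  }
}

// Functional code only: the non-transient `counts` map is operator state,
// which the platform checkpoints automatically; fault tolerance is
// inherited, not hand-rolled.
class WordCounter extends BaseOperator {
  private final Map<String, Long> counts = new HashMap<>();

  public final transient DefaultOutputPort<String> output = new DefaultOutputPort<>();

  public final transient DefaultInputPort<String> input = new DefaultInputPort<String>() {
    @Override
    public void process(String word) {
      long count = counts.merge(word, 1L, Long::sum);
      output.emit(word + "=" + count);
    }
  };
}

// The application wires operators into a DAG. Operational concerns such as
// parallelism, memory, and checkpoint intervals are applied through
// configuration, without touching this code.
@ApplicationAnnotation(name = "WordCountDemo")
public class Application implements StreamingApplication {
  @Override
  public void populateDAG(DAG dag, Configuration conf) {
    WordSource source = dag.addOperator("source", new WordSource());
    WordCounter counter = dag.addOperator("counter", new WordCounter());
    dag.addStream("words", source.output, counter.input);
  }
}
```

Note what is absent: no checkpointing code, no recovery logic, no threading. That is the separation of functional and operational code in practice, and it is where those 10,000 hours show up.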