Back in 2014, I was sitting in a meeting room in Las Vegas at EMC World, working through a grueling schedule of back-to-back meetings. As part of the patter, the temptation was to talk about “their today” and “our collective future.” I specifically recall talking up a phrase David Goulden coined: the “sea of storage.” As the saying goes, “wherever I lay my hat.” I was all in on it, both feet firmly planted at EMC, when one particular client got rather agitated and, not being too shy, said:
“That’s so typical of EMC, always seeing storage at the center. It’s not a sea of storage but more a lake of data.” Aha, I thought, that sounds much cooler… a Data Lake. I’ll rob that phrase, make it my own, and stick it into EMC’s vernacular.
Eager for the gold stars and to pop a bottle of red red wine in celebration, I found a couple of my bosses, or as I like to call them, the “men at work,” and suggested the term. I can’t quite repeat their exact response, but suffice it to say it boiled down to “give it up!”
Oh, I so wish I could’ve kept my old emails, as I think I put my genius recommendation in writing. In return, I got an instant, classic, snarky Jeremy Burton response. I was damaged, I was hurt. They don’t know, but my inner voice was screaming, “please don’t make me cry.” I might not be one of those tough lads from a land down under, or someone brought up fist-fighting on the waterfront, but really, I suggested the term only for love.
To this very day, I have no idea which client it was, what the genesis of the term was, what cure it provided, or where he got it. But I still think I (almost) coined the phrase that now gives my clients such a headache.
Four years later, the stock response to any data, big data, or analytics-related question is: “first you build your data lake.” It’s as if a flock of lunatics with simple minds sneaks into CIOs’ rooms late at night and whispers “build a data lake” while the CIOs writhe around in a sleepy haze reminiscent of the safety dance, worrying about how to solve their data analytics quandary.
The data lake served companies fantastically well through the “data at rest” and “batch” era, but it’s rapidly becoming the Achilles’ heel of REAL real-time data analytics.
Parking data first and analyzing it later immediately puts companies at a massive disadvantage. When it comes to gaining insights and taking action as fast as compute allows, companies relying on stale event data suffer a total eclipse of visibility, action, and any chance of immediate remediation.
DataTorrent has been helping clients unlock the power of analytics on data in motion and embrace the philosophy of “shifting left,” that is, moving analytics upstream to where events first arrive. The fact is that a data lake is massively important, but building one shouldn’t be the first step.
Advances in compute and technology now let you run your analytics, gain insights, and take action on fresh data as events happen. THEN you can filter forward to a data lake of your making for posthumous analytics. The two are not mutually exclusive. In fact, the data lake and data-in-motion analytics should be holding hands and singing “Let’s Stay Together” as they bask in the bright sunlight of immediate insights.
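To make that “act first, park later” ordering concrete, here’s a minimal Python sketch. It assumes an in-memory list stands in for both the event stream and the data lake; the names `Event`, `process_stream`, `threshold`, and the “drop negative readings” filter rule are all hypothetical, chosen purely for illustration. It is not DataTorrent’s API and not a production streaming engine, just the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Event:
    source: str
    value: float


def process_stream(
    events: Iterable[Event],
    threshold: float,
    on_alert: Callable[[Event], None],
    lake: List[Event],
) -> None:
    """Analyze each event in motion, act on it immediately, then filter forward to the lake."""
    for event in events:
        # Step 1: insight and action on fresh data, as the event happens
        # (the simple threshold check is a hypothetical stand-in for real analytics).
        if event.value > threshold:
            on_alert(event)
        # Step 2: filter forward. Only well-formed readings are parked for later,
        # after-the-fact analysis (the non-negative rule is a hypothetical filter).
        if event.value >= 0:
            lake.append(event)


if __name__ == "__main__":
    stream = [Event("login-service", 3.0), Event("login-service", 97.0), Event("sensor-7", -1.0)]
    alerts: List[Event] = []
    lake: List[Event] = []
    process_stream(stream, threshold=90.0, on_alert=alerts.append, lake=lake)
    print([e.source for e in alerts])  # ['login-service']
    print(len(lake))  # 2
```

The design point is simply ordering: the alert fires inside the loop, on the event itself, before anything is written to the lake, so the lake becomes a downstream consumer rather than a prerequisite.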
So why aren’t people evolving their architecture, instead of becoming victims of cyberattacks or getting leapfrogged by competitors? Well, there are three main reasons, really:
- The mistaken belief that analytics at their current scale and complexity cannot run on an event stream in motion.
- Nobody in the company is challenging conventional wisdom; instead, everyone is subscribing to the notion of “fat, dumb, and happy” or “if it ain’t broke.”
- The team that manages the analytics is not the team that manages the streams, and it doesn’t want to lose control by moving its work upstream to a different group.
I predict that in the next couple of years this argument will be moot. Either companies will get with the program and recognize that there is a world outside their data lake, or they will be remembered only vaguely, via Wikipedia.
Oh, and for no special reason other than that I’m listening to this album right now, I’ve sprinkled 20 references from the very first “Now That’s What I Call Music” (Vol. 1, UK edition) throughout this blog post. I wonder if you can find them all?
That’s all, folks! Just don’t forget to give your insights that New Data Smell!