‘And te tide and te time þat tu iboren were, schal beon iblescet’
Yup, that makes absolutely no sense to me either, but it’s Middle English and is the first recorded version of “time and tide wait for no man.”
Things seem to be moving at an increasing pace now, and nothing epitomizes this more than the explosion of the internet. I heard a description of the evolution of the internet recently, and although everyone has an interpretation, it did make me go hmm…
And I quote:
“The internet was formed when people had a need to join companies together electronically e.g. daisy-chain or inter-company-network or perhaps an inter-network. We then iterated on the granularity of connections being computers then laptops then PDAs, and now pretty much everything can be a connected device.”
Not exactly earth-shattering, but it did give me a flash of an analogy to the Richter scale, where each advancement has a pervasive, invasive, and exponential impact.
We all spend inordinate amounts of time trying to make sense of things, both personally and professionally, with varying degrees of success. Sadly, the simple days are over: lying in a grassy field in the south of England listening to “Dark Side of the Moon” and wondering if I was going to be an astronaut or Evel Knievel. Kinda makes me wonder why I was so impatient to leave school and become an adult.
The fact remains that we live in a complicated world. From a professional perspective, we need to make sense of things faster than ever in an even faster paced, more verbose landscape. To do this, we have to perform analytics like our lives depend on it.
I signed up for the real-time streaming analytics movement figuring people need a decision in the NOW, based on data in the NOW. But this is only one leg of the stool.
The second leg is doing complex analytics on a small compute footprint: seeing clearly in the fog, where you can’t just chug down a big gulp of the Data Lake and have to process in-stream instead. The third leg is utilizing parallelization to run a dizzying amount of data slicing and crunching (dimensional compute) in a very, very short period of time.
I just sat with an engineer who concluded a successful POC for a client who wanted to run a data enrichment process to provide a current view of a healthcare customer-360. Their test ingested multiple streams of data from partners and enriched them with historical data to support downstream slicing and dicing. A record grows at each hop as more contextual information is added to it. The test involved supporting schema evolution, the ability to add and change rules, and loading the data into a NoSQL store so customers and business users get a unified view in real time. Their test data sets were 1 million and 67 million records.
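The enrichment hop described above can be sketched in a few lines. This is purely illustrative (the names `enrich_hop`, `HISTORY`, and the record fields are assumptions, not the client's actual schema): each hop merges historical context into the streamed record, which is why records grow as they move downstream.

```python
# Hypothetical sketch of one enrichment hop: a streamed record is merged
# with historical context, so it grows as it moves through the pipeline.

HISTORY = {
    "patient-1": {"prior_visits": 4, "region": "NW"},
}

def enrich_hop(record, history):
    """Merge historical context into a streamed record (one hop)."""
    context = history.get(record["patient_id"], {})
    # Dict merge: the record picks up every contextual field it didn't have
    return {**record, **context}

incoming = {"patient_id": "patient-1", "event": "lab_result"}
enriched = enrich_hop(incoming, HISTORY)
# enriched now carries both the stream event and the historical context
```

Chain a few of these hops together and a lean incoming event turns into a wide, context-rich record ready for downstream slicing, which is exactly where the compute cost starts to explode.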
The results showed our nearest competitor completed a million records in 57 seconds and DataTorrent in *nine* seconds. We completed 67 million records in 17 minutes and 17 seconds, and our competition simply could not complete.
So, worth a little brag, but in reality, data ingest and analytics explode from a compute perspective.
This one client wanted to ingest 67 million patient records, keep them updated with stream data on all the enrichment axes, run rules on them, and spit the results out to a NoSQL store, along with enabling in-stream visualizations.
With data explosion, most micro-batching systems do not keep up due to incremental scheduling and non-incremental recovery.
In a scale-out architecture, the philosophy of treating a batch as an unbounded stream just does not work.
At best, it leads to incremental lags as data volumes and cardinality go up; at worst, it leads to continuous failures triggering recovery from the last checkpoint state, sending an already overworked scheduler into a tizzy.
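A toy calculation shows why non-incremental recovery stings. This is a back-of-the-envelope model I'm assuming for illustration, not any particular engine's scheduler: when recovery replays everything since the last checkpoint, a single failure puts the whole interval back on the queue rather than just the lost batch.

```python
# Toy model (assumed, not vendor-specific): cost of recovering a failed
# micro-batch under checkpoint-replay vs. truly incremental recovery.

def replay_cost(batch_size, batches_since_checkpoint):
    """Records reprocessed after a failure, under each recovery strategy."""
    # Non-incremental recovery replays the whole interval since the checkpoint...
    checkpoint_replay = batch_size * batches_since_checkpoint
    # ...while incremental recovery would redo only the failed batch.
    incremental_replay = batch_size
    return checkpoint_replay, incremental_replay

# Failure on the 9th batch of 1M records since the last checkpoint:
full, incremental = replay_cost(1_000_000, 9)
print(full, incremental)  # 9000000 1000000
```

Nine million records re-queued instead of one million, on top of the live stream that kept arriving during the outage: that is the tizzy the scheduler gets sent into.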
Streaming data analytics is becoming part of our daily vernacular. For what it is worth, by the end of 2018, I predict that nobody will settle for just performing analytics on data at rest. Why have latency in insight when there are no drawbacks? Mind the Gap, people!