I have been working with Big Data/Hadoop for a while now. It has been fun. Frankly, it has been exhilarating. It is cool to talk about it at parties, with friends, and in my case even with my family. To see big data unfold and disrupt is an awesome experience. To be part of it is a huge added bonus. The commoditization of compute and storage is triggering a massive innovation cycle worldwide.
As fun as my job is, on a personal note I had a hard time explaining to my kids how big data applies to day-to-day life. Engineers at times struggle to relate any technology to day-to-day life, let alone big data. My previous work at Yahoo! Finance was easy to talk about. Everyone used it, and they all knew what stocks are and how they matter, especially in Silicon Valley. “People check stock prices to be able to trade them”: my kids got it. Hadoop, big data, ah! Now that was hard. It was geeky, techy, and my kids’ (Sharvil and Ishas) eyes would roll with the “dad is blabbering again” look. In general big data is hard to explain to folks, let alone my kids. They knew it was cool, they had heard the buzzword, but big data was not the slam dunk for non-techies that Yahoo! Finance was.
Then it happened one day.
It was a Thursday night a couple of months ago. At a startup, going home late is the norm. I reached home late at night to find Priti (my better half) in a foul mood. Internet and phone were not working. We used Comcast for both, and over the last 4 or 5 years it had worked fine. Power cycling the device once in a while sufficed. She had tried power cycling; she had even tried the reset pin. Nothing worked that day. We could live with the land line going down, as we all had cell phones. An internet outage we could not live with. In terms of importance, internet came right after air, water, and food.
The next day, she called Comcast customer service. They did their usual remote power cycle, then asked her to power cycle and reset. Nothing worked. She got bounced around between a few customer service agents. Some agents were experts, but a few hours down the road we still had no internet access. She was then asked to replace the device. She drove over to their warehouse in a nearby city, got the device, plugged it in, and set it up. No luck. After another hour of customer service calls, she finally got a Comcast service appointment set up for Monday morning at 9am. Wife in a foul mood, no internet: the weekend was not fun. Monday was my turn to stay at home. I waited for the Comcast service agent to arrive, and by 9:30am I decided that something had gone wrong. I called Comcast customer service, and was told that my phone and internet were working, so the onsite visit had been cancelled. I begged to differ, as I was still using my cell phone. The customer service agent was polite and we went through the routine of power cycle, reset, etc. After talking to at least 5 agents, I now had a first-hand view of why Priti was in a bad mood. With each agent I had to start afresh. Finally, after over an hour, they gave us an appointment for Friday. That meant we would be without internet for over a week. Ouch!
That week was bad. You really learn the value of a service during an outage. My family is addicted to the internet. We use it to work from home at times, access email, look up news, and share photos. My kids use it for homework, accessing school records, chat, etc. We could live without TV, but internet is a must-have. Moving all internet activity to mobile (cell phones) was not viable yet, and would have been too costly. I seriously evaluated switching to AT&T right away. But calmer sense prevailed and I decided to give Comcast one more chance.
The appointment was for Friday 8:30am. The agent arrived at 8:39am. He was very polite, and apologized for the delay. Apparently there were a lot of houses on my street with Comcast outages, and he had gone to the wrong house. I was just happy that he had shown up. He asked me what had happened, and I said we had lost internet and phone 8 days ago. He asked if I had ordered my Comcast service within the last year or two. I answered in the negative. He said he knew what the problem was. He went to the Comcast box outside our house, next to our garage. He opened the box and there it was: a metallic connector with a female connection on one side and a male one on the other. He removed it, connected the cables directly, went to his van to authorize the service, and it worked. 5 seconds of diagnosis, 5 minutes of work, and we had internet and phone back up. I asked him what had happened. He smiled, and said that years ago Comcast feared customers would use its cable for non-Comcast services. So they had installed a frequency blocker (the metallic filter). A year or two ago, Comcast executives realized that the filter was stopping Comcast itself from providing more services, and that their fear of piracy of cable bandwidth was unfounded. So they decided to start using other frequencies. Someone somewhere in the organization had forgotten to remove the filters. When the upstream server got upgraded to a new frequency, all of us who had installed Comcast more than 2 years ago lost access. He then proceeded to help the rest of the internet-less souls on our street. The remote power cycle had worked while the internet was down because they had not changed the frequency used for the remote power cycle.
I wondered how Comcast functioned as an organization. All the Comcast service agents that Priti and I interacted with were polite, did what they were supposed to do, and helped us as best they could. Except each one of them queried our issue on a LAMP-like architecture, where they only had access to our data. They did not see the pattern that the visiting Comcast agent saw and recognized instantly. I also realized that even if the customer service agents had had access to all the outages on that day, they would still have had a hard time narrowing the problem down. The pattern would only show up if all the dimensional combinations were computed. Dimensions as in the zip code, year of installation and other installation details, subscription plan combo, device type, date and time of outage, previous hardware or software upgrades, the upstream device type, the load on the upstream device, the history of upgrades on the upstream device, the DNS used, the load on that DNS, etc. If a software application continually computed these dimensions, the cause of the outage would have shown up immediately. Collecting all this data, analyzing it, and computing dimensional combinations is big data by definition. Though it is cheaper to compute with commodity hardware now, it is nonetheless “a big data problem”. More importantly, they could not afford to wait a day to find out with batch processing; this had to be done in real time and intercepted as quickly as possible. That night when I explained this to my kids, they seemed to get it. It was no longer the “dad is blabbering again” look. Too many things to look up and compute: you need computers to do it, and their single laptop would not be able to do all the computations. They got it.
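To make "computing all the dimensional combinations" a little more concrete, here is a minimal sketch in Python. The records, field names (`zip`, `install_era`, `device`), and values are all made up for illustration; I have no idea what Comcast's data actually looks like. It simply counts outage reports for every combination of one or more dimensions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical outage reports; every field name and value here is invented.
outages = [
    {"zip": "95014", "install_era": "2+ years", "device": "modem-A"},
    {"zip": "95014", "install_era": "2+ years", "device": "modem-B"},
    {"zip": "95014", "install_era": "2+ years", "device": "modem-B"},
    {"zip": "94087", "install_era": "recent", "device": "modem-A"},
]


def count_combinations(records, dimensions):
    """Count records for every combination of one or more dimensions."""
    counts = Counter()
    for record in records:
        for size in range(1, len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                # A key looks like (("zip", "95014"), ("install_era", "2+ years")).
                key = tuple((d, record[d]) for d in dims)
                counts[key] += 1
    return counts


counts = count_combinations(outages, ["zip", "install_era", "device"])
for key, n in counts.most_common(3):
    print(n, dict(key))
```

With n dimensions, each record contributes to 2^n - 1 combinations, so a dozen dimensions across millions of customers and events adds up quickly. That growth is why a single laptop does not cut it.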
Later in the day, I thought about what Comcast should have done. I concluded that the algorithm was not complicated. They had to aggregate all their data, including real-time customer service calls, and monitor it continuously (24×7). Computing all the dimensional combinations and comparing them with previous trends would have given them the correct answer. Software would have told them that all folks in a particular zip code, with a particular upstream server/cluster, who had ordered their plans more than 2 years ago were having outages. The customer service agents would have seen the cause as an alert on a particular zip code. Each of us in the neighbourhood had talked to at least 15 customer service agents. It was a waste of resources for a root cause that should have been easily identified. There is nothing more unproductive for a customer service agent than spending time with a customer without knowing what is going on. On top of that, it was a very bad customer experience. I did not switch to AT&T, but someone nearby probably did. Overall, upgrading the upstream server to a new frequency was a very costly operation for Comcast.
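Here is a rough sketch of the "compare with previous trends" step, again in Python. It assumes outage counts per dimensional combination have already been computed for the current window and for a typical past window. The function name `find_anomalies`, the thresholds, and all the numbers are my own assumptions for illustration, not anything Comcast uses.

```python
def find_anomalies(current, baseline, min_count=3, ratio=5.0):
    """Return combinations whose outage count is far above the usual level."""
    alerts = []
    for key, count in current.items():
        usual = baseline.get(key, 0)
        # Add 1 so combinations never seen before do not divide by zero.
        if count >= min_count and count / (usual + 1) >= ratio:
            alerts.append((key, count, usual))
    # Most specific combination (most dimensions) first, then largest count.
    alerts.sort(key=lambda a: (len(a[0]), a[1]), reverse=True)
    return alerts


# Made-up counts: outages per combination today versus a typical day.
current = {
    (("zip", "95014"),): 40,
    (("zip", "95014"), ("install_era", "2+ years")): 40,
    (("zip", "94087"),): 2,
}
baseline = {
    (("zip", "95014"),): 3,
    (("zip", "94087"),): 2,
}

alerts = find_anomalies(current, baseline)
for key, count, usual in alerts:
    print(dict(key), "today:", count, "usual:", usual)
```

Sorting the most specific combination first is a deliberate choice: "zip code plus installed more than 2 years ago" tells an agent far more than "zip code" alone. A real system would want something sturdier than a fixed ratio, but the idea is the same.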
I thought some more about what Comcast should do for customer delight. Their software should run every planned change against all dimensional combinations for all customers. Each customer's dimensional combination would be evaluated against the change and give back a status/impact. Comcast would then find out that this frequency change would block the internet for a given list of customers. They would then contact those customers and proactively remove the metallic filter, thereby getting happy customers and significantly reducing costs. They should also continue to proactively monitor meta events from each customer in real time and figure out outages via patterns. In my case this was a sudden drop in DNS queries from a set of devices with a given dimensional combination (zip code and year of installation).
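The "run the planned change against every customer first" idea could look something like the sketch below. Everything in it is hypothetical: the `Customer` and `PlannedChange` classes, the field names, and the frequency numbers are invented purely to show the shape of the check.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Customer:
    customer_id: str
    upstream_cluster: str
    # Band (low, high) in MHz blocked by the old filter; None if no filter.
    blocked_band_mhz: Optional[Tuple[float, float]]


@dataclass
class PlannedChange:
    upstream_cluster: str
    new_frequency_mhz: float


def impacted_customers(customers, change):
    """List customers who would lose service if the change went live."""
    impacted = []
    for c in customers:
        if c.upstream_cluster != change.upstream_cluster:
            continue  # the change does not touch this customer
        if c.blocked_band_mhz is None:
            continue  # no filter installed, nothing to block the signal
        low, high = c.blocked_band_mhz
        if low <= change.new_frequency_mhz <= high:
            impacted.append(c)
    return impacted


# Made-up customers and a made-up frequency change.
customers = [
    Customer("c1", "cluster-7", (600.0, 900.0)),  # old install, filter present
    Customer("c2", "cluster-7", None),            # newer install, no filter
    Customer("c3", "cluster-9", (600.0, 900.0)),  # different upstream cluster
]
change = PlannedChange("cluster-7", new_frequency_mhz=750.0)

impacted = impacted_customers(customers, change)
print([c.customer_id for c in impacted])
```

The output of a dry run like this is the list of houses that need a filter removed before the upgrade, rather than after a week of support calls.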
There you have it: big data == cost saving + good customer experience. Additionally, my kids got it. Win-win. With the current pace of big data disruption, I expect changes similar to the one discussed here to happen soon. Meanwhile, for my kids, big data is now relevant.