One of the most common use cases for Hadoop is for an organization to deliver personalized products, services, and offers to its customers and prospects. To do so, a 360-degree view of the customer needs to be created. The process usually starts with aggregating data across all the customer touch points in a Hadoop cluster and then creating on-demand data fusions from multiple input sources for analysis.

The input sources can vary widely, from CRM and ERP to other operational systems, but the key is to cover the end-to-end customer lifecycle, from acquisition to retention. For example, to find correlations between customer spend and customer service, customer sales transaction data needs to be fused with customer service data. The Hadoop ecosystem has evolved in terms of both visualization and analytical tools that enable the use of the fused data as input to iterative, machine-learning-based analysis. The results of this analysis are usually very revealing, and I have seen numerous instances where 'micro-segments' are uncovered.

There could be tens or hundreds of these micro-segments, each with very distinct characteristics. However, they all share one attribute: they exist only for a finite time period and are useless if they are not leveraged in that fleeting window of time.

This raises an important question:

Once the characteristics of the individual micro-segments are known, how do you identify and take action on each segment at the right time?

This question can be answered by a dynamic segmentation streaming analytics application that segments the incoming data stream (a grouping operation) based on a condition or pattern that characterizes the micro-segments.

The dynamic segmentation streaming application has three critical functions:

1. The triggering criteria: knowing WHAT characteristics constitute a micro-segment. The solution needs to support multiple mechanisms for triggering a new segmentation:

A. External triggers: This approach is usually used when the real-time data stream being analyzed does not contain enough information to constitute the defining criteria of the micro-segment. Hence, the micro-segmentation criteria need to be loaded into the streaming application from an external datastore. The criteria are then held in memory by the streaming application and are often refreshed as the offline analytics evolve. A great example of this is a streaming application analyzing a stream of sales transactions alongside segmentation criteria, defining a set of customers or products, that are dynamically injected into the application at runtime.
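To make this concrete, here is a minimal Java sketch of an externally triggered criteria cache. The `CriteriaStore` interface, the customer-ID criteria, and the refresh schedule are assumptions made for illustration, not DataTorrent RTS APIs:

```java
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical source of segmentation criteria (e.g. a lookup table maintained by offline analytics). */
interface CriteriaStore {
    Set<String> loadTargetCustomerIds();
}

/** Holds externally defined criteria in memory and refreshes them periodically. */
public class ExternalTriggerCache {
    private final AtomicReference<Set<String>> targetCustomers = new AtomicReference<>(Set.of());
    private final ScheduledExecutorService refresher = Executors.newSingleThreadScheduledExecutor();

    public ExternalTriggerCache(CriteriaStore store, long refreshSeconds) {
        // Reload the micro-segment definition as the offline analytics update it.
        refresher.scheduleAtFixedRate(
            () -> targetCustomers.set(store.loadTargetCustomerIds()),
            0, refreshSeconds, TimeUnit.SECONDS);
    }

    /** Called for every event on the sales-transaction stream. */
    public boolean matches(String customerId) {
        return targetCustomers.get().contains(customerId);
    }
}
```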

B. Rule-based triggers: This approach is used when the real-time data stream being analyzed contains all the data needed to look for the 'patterns' that define the micro-segments. Which patterns to look for is usually determined offline, and the patterns are predefined in the streaming application. The streaming application continuously analyzes the incoming data against the patterns to detect micro-segments, e.g., all product categories whose sales exceed $1M in a 5-minute window.
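A minimal sketch of that example rule, assuming a simple tumbling window keyed by product category (the window bookkeeping is deliberately simplified; a production stream processor would manage windows for you):

```java
import java.util.HashMap;
import java.util.Map;

/** Rule-based trigger: flag product categories whose sales exceed $1M in a 5-minute window. */
public class SalesThresholdRule {
    private static final long WINDOW_MS = 5 * 60 * 1000;
    private static final double THRESHOLD = 1_000_000.0;

    private final Map<String, Double> salesByCategory = new HashMap<>();
    private long windowStart = System.currentTimeMillis();

    /** Called for each incoming sale event; returns the category if it just crossed the rule, else null. */
    public String onSale(String category, double amount, long eventTimeMs) {
        if (eventTimeMs - windowStart >= WINDOW_MS) {
            salesByCategory.clear();          // start a new tumbling window
            windowStart = eventTimeMs;
        }
        double total = salesByCategory.merge(category, amount, Double::sum);
        return total >= THRESHOLD ? category : null;  // micro-segment triggered
    }
}
```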

C. Machine-learning-model-driven triggers: This is the most sophisticated approach. A machine learning model (e.g., an R-based model) is invoked with the real-time data, and its output score is used to determine which micro-segment a given data point falls into.
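A minimal sketch of this idea, where the `ScoringModel` interface stands in for a real model (e.g., one exported from an offline R pipeline) and the score thresholds and segment names are invented for the example:

```java
/** Hypothetical scoring model wrapping an offline-trained model. */
interface ScoringModel {
    double score(double[] features);
}

/** Maps each event's model score to a micro-segment label. */
public class ModelDrivenTrigger {
    private final ScoringModel model;

    public ModelDrivenTrigger(ScoringModel model) { this.model = model; }

    public String segmentFor(double[] features) {
        double score = model.score(features);
        if (score > 0.9) return "high-propensity";   // illustrative cutoffs
        if (score > 0.5) return "mid-propensity";
        return "default";
    }
}
```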

2. Segment creation: grouping the events into micro-segments on the fly

Once the segmentation criteria are known and micro-segmentation is triggered, all the events that belong to a particular micro-segment often need to be routed through a particular set of processing steps that prepare them to be acted on. These preparatory steps can and do vary from one micro-segment to another, which requires the streaming solution to dynamically adapt to each micro-segment's grouping and processing needs.
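One way to picture this is a router that holds a pipeline of steps per micro-segment. This sketch uses plain Java functions for the steps; the `Event` record and segment-keyed pipelines are assumptions for the example:

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** A minimal event shape; fields are illustrative. */
record Event(String segment, Map<String, Object> payload) {}

/** Routes each event through the preparatory steps registered for its micro-segment. */
public class SegmentRouter {
    private final Map<String, List<UnaryOperator<Event>>> pipelines;

    public SegmentRouter(Map<String, List<UnaryOperator<Event>>> pipelines) {
        this.pipelines = pipelines;
    }

    public Event process(Event e) {
        // Unknown segments fall back to an empty pipeline.
        for (UnaryOperator<Event> step : pipelines.getOrDefault(e.segment(), List.of())) {
            e = step.apply(e);
        }
        return e;
    }
}
```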

3. Invoking actions: taking actions that drive business value as soon as the insight is gained

All dynamic segments are useless until a business action is taken to capitalize on them. 'Taking action' can mean very different things to different organizations. Some of the common actions we have seen are listed below (a sketch of a simple action dispatcher follows the list):

a. Flagging the micro-segments on a dashboard for analyst action

b. Invoking an external business process

c. Sending an alert to an operational system
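A minimal sketch of dispatching a triggered micro-segment to one of these three actions; the `ActionSinks` interface and its methods are placeholders for whatever dashboard, BPM, or alerting integrations an organization actually uses:

```java
/** Hypothetical sinks for the three common action types described above. */
interface ActionSinks {
    void flagOnDashboard(String segment);
    void startBusinessProcess(String segment);
    void alertOperationalSystem(String segment);
}

/** Dispatches a triggered micro-segment to the action configured for it. */
public class ActionDispatcher {
    enum Action { DASHBOARD, BUSINESS_PROCESS, ALERT }

    private final ActionSinks sinks;

    public ActionDispatcher(ActionSinks sinks) { this.sinks = sinks; }

    public void act(String segment, Action action) {
        switch (action) {
            case DASHBOARD -> sinks.flagOnDashboard(segment);
            case BUSINESS_PROCESS -> sinks.startBusinessProcess(segment);
            case ALERT -> sinks.alertOperationalSystem(segment);
        }
    }
}
```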

Our customers are already using the DataTorrent RTS platform to enable dynamic segmentation. Here are three capabilities that are unique to the DataTorrent RTS platform and foundational in enabling dynamic segmentation as described above:

1. Auto-scaling and dynamic partitioning: As you start segmenting dynamically, the operators tracking the segments may have different memory requirements (every micro-segment's size can differ based on the segmentation criteria). As a result, you want the streaming solution to be able to dynamically partition and scale to accommodate the changing memory and processing needs of the micro-segments.
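DataTorrent RTS makes this decision for you; the sketch below only illustrates the kind of load-based decision involved, with an assumed per-partition memory budget and doubling/halving policy that are not the platform's actual algorithm:

```java
/** Illustrative repartition decision: split a partition when its tracked state grows too large. */
public class PartitionScaler {
    private static final long MAX_BYTES_PER_PARTITION = 512L * 1024 * 1024; // assumed budget

    /** Returns how many partitions a segment-tracking operator should have. */
    public int desiredPartitions(long trackedStateBytes, int currentPartitions) {
        if (trackedStateBytes > MAX_BYTES_PER_PARTITION * currentPartitions) {
            return currentPartitions * 2;   // scale out: state no longer fits the budget
        }
        if (currentPartitions > 1
                && trackedStateBytes < MAX_BYTES_PER_PARTITION * currentPartitions / 4) {
            return currentPartitions / 2;   // scale in: the segments shrank
        }
        return currentPartitions;
    }
}
```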

2. Dynamic modification: The triggers for the micro-segment criteria need to be updatable on the fly, because an external trigger can change the micro-segments, for example, from being based on transaction size to being based on channel type. The trigger would typically be executed in a 'partitioning operator' that initiates the partitioning of the input stream based on the micro-segmentation criteria defined by the trigger. The micro-segments are then routed to operators that perform various operations, such as aggregations. The time windows over which the aggregations are done can vary dramatically from one micro-segment to another.
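A minimal sketch of a partitioning step whose key extractor can be swapped at runtime, which is the essence of an on-the-fly trigger update. The event-as-map shape and `updateTrigger` method are assumptions for the example:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Partitions events by a key extractor that can be swapped while the stream is running. */
public class DynamicPartitioner {
    // E.g. switch from partitioning by transaction size to partitioning by channel type.
    private final AtomicReference<Function<Map<String, Object>, String>> keyExtractor;

    public DynamicPartitioner(Function<Map<String, Object>, String> initial) {
        this.keyExtractor = new AtomicReference<>(initial);
    }

    /** Invoked when an external trigger redefines the micro-segmentation criteria. */
    public void updateTrigger(Function<Map<String, Object>, String> newExtractor) {
        keyExtractor.set(newExtractor);
    }

    /** Routes an event to downstream aggregation operators by its current segment key. */
    public String partitionKey(Map<String, Object> event) {
        return keyExtractor.get().apply(event);
    }
}
```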

3. NxN dimension computation: This is an extremely sophisticated operator that is available out of the box in the DataTorrent platform. It allows users to identify various combinations of the fields from the input data stream upfront so the streaming application can continuously calculate various aggregations on those combinations. As an example, if the input event has fields such as product category, channel type, and sales amount, this operator allows users to continuously calculate total sales by product category as well as by channel type over the last 'X' minutes. These aggregations can then be evaluated against patterns or thresholds to dynamically trigger micro-segmentation.
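To show the idea (not the actual DataTorrent operator), here is a minimal sketch that sums a sales measure over several pre-declared dimension combinations; the key-encoding scheme and in-memory map are simplifications:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Continuously aggregates a measure over several pre-declared dimension combinations. */
public class DimensionAggregator {
    // Combinations declared upfront, e.g. [category], [channel], [category, channel].
    private final List<List<String>> combinations;
    private final Map<String, Double> totals = new HashMap<>();

    public DimensionAggregator(List<List<String>> combinations) {
        this.combinations = combinations;
    }

    /** Adds one event (e.g. {category=shoes, channel=web} with sales=25.0) to every combination. */
    public void add(Map<String, String> dims, double sales) {
        for (List<String> combo : combinations) {
            StringBuilder key = new StringBuilder();
            for (String d : combo) {
                key.append(d).append('=').append(dims.get(d)).append('|');
            }
            totals.merge(key.toString(), sales, Double::sum);
        }
    }

    /** Snapshot of current totals, to be checked against patterns or thresholds. */
    public Map<String, Double> snapshot() { return Map.copyOf(totals); }
}
```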

At Hadoop World NY we will demonstrate creating a dynamic segmentation application from scratch with just a few clicks! Come visit us at booth #511.