When it comes to buying things, we all love variety: variety in the ways we can make purchases, variety in the ways we can make payments, and variety in the devices we can use. We love this variety so much that omni-channel retail sales are estimated to reach $1.8 trillion this year.

While variety is good for consumers, it can be challenging for retailers and financial institutions when it comes to managing payment fraud. The specific challenge is preventing payment fraud in real time, before it happens, rather than merely detecting it after the fact.


DataTorrent is tackling the problem of payment fraud prevention with pre-built applications designed to ingest, transform, analyze, act on, and visualize data as soon as it is generated. The Omni-channel Payment Fraud Prevention Application addresses the challenge of preventing fraud in real time. Here's how.

The application is built as a directed acyclic graph (DAG) of operators on the DataTorrent RTS platform. The following image presents an architectural overview of the application.


Each incoming data tuple (event) passes through the following processing phases:

  1. Ingestion
    Data from all your input sources is pushed onto a message bus such as Apache Kafka. The application uses the platform's built-in capabilities to ingest data from the Kafka message bus into the rest of its pipeline.
  2. Filter
    Not every tuple you receive is of interest, and some may not be in the expected format. To reduce processing overhead, you can filter out such records as soon as they arrive, and this phase does exactly that. The filter condition is configurable externally.
  3. Transform
    Data arriving from diverse sources is rarely in the format your analytics expect. This phase is the first step in taking unstructured data and giving it structure for analysis later in the pipeline: it standardizes the data you receive from multiple sources. For example, a country may appear as "USA", "United States", or "US", while your processing units expect "US", or a numeric code so that later analysis can work with integers. Many other transformations, such as field mapping and feature calculation, can be applied in this phase. The transform expressions are configurable.
  4. Enrich
    Incoming tuples usually do not carry all the required information. For example, a transaction record will have a card ID, a customer ID, and product IDs, but not details such as the customer's home address, income, or behavior. These details are needed for in-depth pattern analysis and other operations, so the missing values are looked up in your database and added to the record, giving it all the data needed for analytics. The application supports multiple lookup sources for enrichment, such as JSON files and JDBC stores.
  5. Rules Execution
    All of the stages above prepare the data for analyzing behaviors or patterns that indicate an event worth acting on. The Omni-channel Payment Fraud Prevention Application uses the Apache-licensed Drools library for static rule evaluation. You write rules that reflect your business, and the application's Rule Execution Operator handles their execution. Traditionally, most rule execution engines have not supported parallel processing. Using the scale-out feature of DataTorrent RTS, the application provides partitioning for the Drools operator, so rules can be executed in parallel for better performance.
  6. Dimension computation
    Once the data is analyzed, the application computes dimensions on the results and stores them in a queryable format. The application includes several real-time dashboards that can be monitored in the DataTorrent RTS UI. These dashboards query data from the dimension computation store as well as the auto-metrics published by the application's DAG operators. The UI provides a unified view of multiple application-level widgets, which helps you take action in real time. For example, the application ships with dedicated dashboards for executives, IT operations, and analytics, so each user sees the metrics relevant to their role and can act on them in real time. Such a unified view of data is otherwise hard to obtain in distributed systems.
  7. Outputs
    Processed transactions are written to HDFS for future historical analysis. Transactions are also published to a message bus such as Kafka to drive real-time actions, such as triggering a corrective workflow or blocking a transaction. The application can also integrate with alerting systems to send real-time alerts to relevant users; email alerts are supported by default.
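As a rough illustration of the Filter phase described above, the following Python sketch (not DataTorrent RTS operator code) applies a filter condition that is read from configuration rather than hard-coded. The JSON condition format, the field names `amount` and `currency`, and the helpers `load_conditions`, `keep`, and `filter_stream` are all hypothetical.

```python
import json
import operator

# Hypothetical externally supplied filter config: a JSON list of
# [field, operator, value] conditions that must ALL hold for a tuple to pass.
FILTER_CONFIG = '[["amount", ">", 0], ["currency", "==", "USD"]]'

# Supported comparison operators, mapped to functions from the operator module.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def load_conditions(config_text):
    """Parse the JSON config into (field, compare_fn, value) triples."""
    return [(name, OPS[op], value) for name, op, value in json.loads(config_text)]


def keep(record, conditions):
    """Return True only if the record has every field and meets every condition."""
    for name, compare, value in conditions:
        if name not in record:
            return False  # missing field: not in the expected format
        try:
            if not compare(record[name], value):
                return False
        except TypeError:
            return False  # uncomparable types, e.g. "abc" > 0
    return True


def filter_stream(records, conditions):
    """Drop unwanted records before the more expensive phases run."""
    return [r for r in records if keep(r, conditions)]


if __name__ == "__main__":
    conditions = load_conditions(FILTER_CONFIG)
    events = [
        {"txn_id": 1, "amount": 25.0, "currency": "USD"},
        {"txn_id": 2, "amount": -5.0, "currency": "USD"},   # fails amount > 0
        {"txn_id": 3, "currency": "USD"},                   # missing amount
        {"txn_id": 4, "amount": "abc", "currency": "USD"},  # wrong type
    ]
    print([e["txn_id"] for e in filter_stream(events, conditions)])  # [1]
```

Records that lack a required field or carry a value of the wrong type are dropped instead of raising an error, which fits the goal of discarding malformed input before it reaches later phases.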
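The country example from the Transform phase above can be sketched in a few lines of Python. This minimal version assumes a hypothetical configuration made of a field-rename map (`FIELD_MAP`) and an alias table (`COUNTRY_ALIASES`); it normalizes country names to a two-letter code, adds the ISO 3166-1 numeric code so later analysis can work with integers, and computes one simple feature (the amount in integer cents). The field names and the `transform` function are illustrative only.

```python
# Hypothetical transform configuration. In practice this would be supplied
# from outside the code, like the configurable transform expressions.
FIELD_MAP = {"cntry": "country", "amt": "amount"}  # source field -> standard field
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "uk": "GB", "gb": "GB", "united kingdom": "GB",
}
COUNTRY_NUMERIC = {"US": 840, "GB": 826}  # ISO 3166-1 numeric codes


def transform(raw):
    """Standardize one raw tuple from any source into the shape analytics expect."""
    # Field mapping: rename source-specific keys to standard names.
    out = {FIELD_MAP.get(key, key): value for key, value in raw.items()}

    # Value normalization: "USA", "United States" and "us" all become "US".
    name = str(out.get("country", "")).strip().lower()
    out["country"] = COUNTRY_ALIASES.get(name, "UNKNOWN")
    out["country_code"] = COUNTRY_NUMERIC.get(out["country"], 0)  # 0 = unknown

    # Type normalization plus a simple feature: the amount in integer cents.
    out["amount"] = float(out.get("amount", 0))
    out["amount_cents"] = round(out["amount"] * 100)
    return out


if __name__ == "__main__":
    print(transform({"txn_id": 7, "cntry": "United States", "amt": "12.34"}))
```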
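The enrichment lookup described in the Enrich phase above could look roughly like the following Python sketch. It uses Python's built-in `sqlite3` module as a stand-in for a JDBC store; the `customers` table, its `home_zip` and `income` columns, and the `CustomerEnricher` class are hypothetical. Results are cached per customer so repeated transactions do not hit the database every time.

```python
import sqlite3


class CustomerEnricher:
    """Hypothetical enricher that fills in customer details a transaction lacks."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}  # customer_id -> details; unbounded here for simplicity

    def lookup(self, customer_id):
        """Fetch customer details once, then serve them from the cache."""
        if customer_id not in self.cache:
            row = self.conn.execute(
                "SELECT home_zip, income FROM customers WHERE customer_id = ?",
                (customer_id,),
            ).fetchone()
            self.cache[customer_id] = (
                {"home_zip": row[0], "income": row[1]} if row else {}
            )
        return self.cache[customer_id]

    def enrich(self, txn):
        """Return a copy of txn with missing customer fields filled in."""
        enriched = dict(txn)
        for key, value in self.lookup(txn["customer_id"]).items():
            enriched.setdefault(key, value)  # never overwrite a field already present
        return enriched


def make_demo_db():
    """In-memory SQLite database standing in for a JDBC lookup store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers "
        "(customer_id TEXT PRIMARY KEY, home_zip TEXT, income INTEGER)"
    )
    conn.execute("INSERT INTO customers VALUES ('c1', '94105', 85000)")
    return conn


if __name__ == "__main__":
    enricher = CustomerEnricher(make_demo_db())
    print(enricher.enrich({"txn_id": 1, "customer_id": "c1", "amount": 42.0}))
```

The cache is unbounded only to keep the sketch short; a long-running pipeline would need a bounded cache or an eviction policy so stale customer details are refreshed.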
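The idea behind partitioning rule execution, described in the Rules Execution phase above, can be illustrated without Drools. In this Python sketch, plain functions stand in for rules, and transactions are hash-partitioned by a hypothetical `card_id` field so that all transactions for one card land in the same partition; a stateful rule such as the hypothetical `velocity` check therefore still sees every transaction for its card. The rule thresholds are made up. The thread pool only shows the structure: because of Python's global interpreter lock it does not provide true CPU parallelism, whereas a platform such as DataTorrent RTS would run each partition as a separate operator instance.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


# Hypothetical rules standing in for Drools rules. Each receives a transaction
# and its partition's state, and returns a rule name if it fires, else None.
def high_amount(txn, state):
    return "HIGH_AMOUNT" if txn["amount"] > 5000 else None


def velocity(txn, state):
    # Stateful: counts transactions per card seen so far in this partition.
    count = state.get(txn["card_id"], 0) + 1
    state[txn["card_id"]] = count
    return "VELOCITY" if count > 3 else None


RULES = [high_amount, velocity]


def partition_of(card_id, num_partitions):
    # Stable hash (Python's built-in hash() of str is salted per process),
    # so a given card always lands in the same partition.
    return zlib.crc32(card_id.encode("utf-8")) % num_partitions


def run_partition(txns):
    """Evaluate every rule against every transaction in one partition, in order."""
    state = {}  # private to this partition: no card is shared across partitions
    results = []
    for txn in txns:
        fired = []
        for rule in RULES:
            name = rule(txn, state)
            if name:
                fired.append(name)
        results.append((txn["txn_id"], fired))
    return results


def execute_rules(txns, num_partitions=4):
    """Hash-partition by card_id, run partitions concurrently, merge the results."""
    partitions = [[] for _ in range(num_partitions)]
    for txn in txns:
        partitions[partition_of(txn["card_id"], num_partitions)].append(txn)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        per_partition = list(pool.map(run_partition, partitions))
    return {txn_id: fired for part in per_partition for txn_id, fired in part}


if __name__ == "__main__":
    txns = [{"txn_id": i, "card_id": "A", "amount": 10.0} for i in range(1, 5)]
    txns.append({"txn_id": 5, "card_id": "B", "amount": 9000.0})
    print(execute_rules(txns))
```

Because rule state is keyed by card and each card maps to exactly one partition, the results do not depend on the number of partitions, which is what makes this kind of scale-out safe for stateful rules.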
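One common way to implement the Dimension computation phase described above is to pre-aggregate results under every combination of the chosen dimensions, per time bucket, so dashboards can answer queries with a single lookup. The following Python sketch is an in-memory illustration, not the RTS dimension store; the `channel` and `country` dimensions, the one-minute buckets, the `ts`, `amount`, and `flagged` fields, and the `DimensionStore` class are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("channel", "country")  # hypothetical dimension keys
BUCKET_SECONDS = 60                  # hypothetical one-minute time buckets


def empty_aggregate():
    return {"count": 0, "amount": 0.0, "flagged": 0}


def dimension_combinations(dims):
    """Every subset of dims, from the overall total () to the full key."""
    for r in range(len(dims) + 1):
        yield from combinations(dims, r)


class DimensionStore:
    """Pre-aggregated counts and sums, queryable by time bucket and dimensions."""

    def __init__(self):
        self.aggregates = defaultdict(empty_aggregate)

    def add(self, txn):
        bucket = txn["ts"] // BUCKET_SECONDS * BUCKET_SECONDS
        # Update one aggregate per dimension combination (including the total).
        for combo in dimension_combinations(DIMENSIONS):
            key = (bucket, tuple((d, txn[d]) for d in combo))
            agg = self.aggregates[key]
            agg["count"] += 1
            agg["amount"] += txn["amount"]
            agg["flagged"] += int(bool(txn["flagged"]))

    def query(self, bucket, **dims):
        """Look up one pre-computed aggregate; unknown keys return zeros."""
        key = (bucket, tuple((d, dims[d]) for d in DIMENSIONS if d in dims))
        return dict(self.aggregates.get(key, empty_aggregate()))


if __name__ == "__main__":
    store = DimensionStore()
    store.add({"ts": 65, "channel": "web", "country": "US", "amount": 10.0, "flagged": False})
    store.add({"ts": 70, "channel": "pos", "country": "US", "amount": 20.0, "flagged": True})
    print(store.query(60, country="US"))
```

The trade-off is write amplification: each transaction updates 2^n aggregates for n dimensions, in exchange for dashboard queries that never scan raw transactions.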
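The fan-out described in the Outputs phase above can be sketched as a small router. In this Python sketch, in-memory lists stand in for HDFS, a Kafka topic, and an email alerter; the `OutputRouter` class and the choice of which rules trigger an alert (`alert_rules`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OutputRouter:
    """Hypothetical fan-out of processed transactions to three kinds of output."""

    archive: list = field(default_factory=list)  # stand-in for HDFS files
    actions: list = field(default_factory=list)  # stand-in for a Kafka topic
    alerts: list = field(default_factory=list)   # stand-in for email alerts
    alert_rules: frozenset = frozenset({"HIGH_AMOUNT"})

    def route(self, txn, fired_rules):
        record = {**txn, "fired_rules": list(fired_rules)}
        # Every processed transaction is kept for historical analysis.
        self.archive.append(record)
        # Any fired rule publishes the record so a downstream workflow can act.
        if fired_rules:
            self.actions.append(record)
        # Only selected rules also notify people directly.
        if self.alert_rules.intersection(fired_rules):
            self.alerts.append(f"Alert: txn {txn['txn_id']} fired {sorted(fired_rules)}")


if __name__ == "__main__":
    router = OutputRouter()
    router.route({"txn_id": 1}, [])
    router.route({"txn_id": 2}, ["HIGH_AMOUNT"])
    print(router.alerts)
```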

The beauty of this application is that it is designed for the lowest total cost of ownership (TCO) and a faster time to value (TTV). This means that, alongside a rich feature set, we focus on making big data products production-ready. Our customers have told us that putting big data products into production is the most difficult task they face, so our goal is to make life easier for developers and operations teams. For more details, please check the blog. The application is also meant to run on your commodity hardware, so there is no need to spend extra money on new hardware. Finally, the platform architecture makes it easy to scale in a fault-tolerant way as your data load grows over time, without any modifications to your application code.