I have published a series of blogs discussing how adopting open source technologies in a do-it-yourself manner has caused a lot of products to fail. In this blog, I will walk through technical details of what we are launching in the RTS 3.10 release. This release is geared to help enterprises deliver successful business outcomes in a heterogeneous environment. In the RTS 3.10 release, we are launching a series of applications, data services, and features geared towards reducing time to value for customers. This is the first release that includes basic components of our Apoxi framework and has data services and applications developed using Apoxi.
In RTS 3.10, we are launching the Application Backplane which includes essential components that are required to stitch applications together using loosely coupled data services. A module is available for endpoint connectors that help developers create their own services. This module wraps up the data serialization and schema for data to be published and subscribed from the service bus (Kafka). Data service developers no longer have to work with a plain vanilla message bus and will save time by being able to focus on the data services’ business logic. Mechanisms are provided to configure (define) the schema that is leveraged by this module. Applications and services can now communicate through Kafka. Two applications can communicate their results to one another. The ability to easily exchange this data creates more valuable products. Customers now have a basic support structure to create and connect data services to stitch them into an application to get to critical business outcomes quickly.
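To make the endpoint connector idea concrete, here is a minimal sketch of what such a module does for a service developer: it validates a record against a configured schema and handles serialization, so the service code only deals with plain records. This is not the Apoxi API. `InMemoryBus`, `EndpointConnector`, the schema format, and the topic name are all hypothetical, and an in-memory bus stands in for Kafka so the example runs on its own.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List

# Hypothetical schema format: field name -> required Python type.
Schema = Dict[str, type]


class InMemoryBus:
    """Stand-in for the service bus (Kafka in RTS): topic -> subscriber callbacks."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def publish(self, topic: str, payload: bytes) -> None:
        for callback in self._subscribers[topic]:
            callback(payload)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers[topic].append(callback)


class EndpointConnector:
    """Hypothetical connector that hides schema checks and (de)serialization."""

    def __init__(self, bus: InMemoryBus, topic: str, schema: Schema) -> None:
        self.bus, self.topic, self.schema = bus, topic, schema

    def _validate(self, record: Dict[str, Any]) -> None:
        for field, expected in self.schema.items():
            if field not in record:
                raise ValueError(f"missing field: {field}")
            if not isinstance(record[field], expected):
                raise ValueError(f"field {field!r} is not {expected.__name__}")

    def publish(self, record: Dict[str, Any]) -> None:
        self._validate(record)
        self.bus.publish(self.topic, json.dumps(record).encode("utf-8"))

    def subscribe(self, handler: Callable[[Dict[str, Any]], None]) -> None:
        def on_message(payload: bytes) -> None:
            record = json.loads(payload.decode("utf-8"))
            self._validate(record)
            handler(record)

        self.bus.subscribe(self.topic, on_message)


# Two services share a topic and a schema, and never touch raw bytes.
bus = InMemoryBus()
schema = {"txn_id": str, "amount": float}
received: List[Dict[str, Any]] = []
EndpointConnector(bus, "scored-transactions", schema).subscribe(received.append)
EndpointConnector(bus, "scored-transactions", schema).publish({"txn_id": "t-1", "amount": 42.5})
```

Because both sides are built from the same schema definition, a record that is missing a field is rejected at the publisher rather than surfacing later in a downstream service.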
Docker Support: We have included support for Docker in this release. Apoxi features include the ability to manage, monitor, and launch Docker-based services. Applications can now include data services that are Dockerized, allowing customers to use Docker to run their services. RTS Docker support is aimed at reducing both launch time and support cost for Docker-based services. Examples of Dockerized services in RTS 3.10 include the Apache Superset Visualization Service and the Drools Workbench Service.
Azure Support: RTS 3.10 includes full support for Azure technologies, including Azure Event Hubs and Blob storage. Operators that integrate with Azure are now part of the certification and hardening done for all supported operators. Tools and scripts are available for deployment on Azure HDInsight (HDI). From RTS 3.10 onwards, application developers can use Azure as part of their hybrid cloud.
These features in the Application Backplane will help enterprise customers reduce time to market as well as lower the total cost of ownership for big data products.
The RTS 3.10 release includes major new data services as well as new applications that leverage Apoxi. The data services are as follows:
Online Analytic Service (OAS): OAS is an Apoxi-enabled data service for real-time as well as historical analytics. Integration with other data services is provided by endpoint connectors that work with the Application Backplane. OAS provides OLAP-based analytics using Druid open source libraries. The Druid implementation in OAS is hardened, benchmarked, certified, and supported. The aim is to make OLAP consumable for users as a data service without having to spend time on common operational tasks. Druid is an open source (Apache 2.0 license) technology designed for sub-second queries on real-time and historical data. Druid handles low-latency ingestion and fast data aggregation. Druid is a column store that uses distributed resources to scale out.
OAS seamlessly merges Apex ingestion, scale, and operational support with Druid’s OLAP libraries and scale-out architecture. The historical data store is backed by HDFS (or any DFS-compliant store), and Apex checkpointing support is leveraged for full HA of Druid. OAS also aligns the sharding of Druid with the partitions of Apex so that the data inflow part of OAS matches the cube/key computations. OAS has no single point of failure (SPOF), as neither Apex nor Druid has one. Druid’s distributed store is leveraged as-is, with Apex providing efficient data flow support to enable Druid to scale linearly. OAS runs natively in YARN and will run natively in other distributed compute environments in the near future.
Druid supports roll-up at ingestion time via a pre-defined schema, which can greatly reduce memory requirements. This means that individual events no longer exist after ingestion, but this is a good trade-off for OLAP-based analytics. Do note that Druid is not meant for full-text search. Druid does not pre-compute all permutations of user queries, which helps it scale. Arbitrary exploration of data is still supported, because ad-hoc queries run over the rolled-up cubes without pre-computation. Druid is ideally suited for OLAP and has been battle-tested to power user-facing analytics.
A UI service based on Superset is part of RTS 3.10 and integrates tightly with OAS. The Superset service is embedded in RTS dashboards and is Dockerized. All aspects of OAS, including the UI, can be configured without code changes. Developers can customize the schema, the dimensions used to construct cubes, the amount of heap memory used for real-time ingestion and indexing before immutable historical segments are constructed, and so on.
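The roll-up trade-off described above can be illustrated with a small, self-contained sketch. It is plain Python, not Druid code or Druid's API: the event fields, the `roll_up` and `query_sum` helpers, and the hourly granularity are assumptions chosen for illustration. Events are collapsed into one aggregate row per time bucket and dimension combination, and an ad-hoc query is then answered from those rows.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Event = Dict[str, object]
Cube = Dict[Tuple, Dict[str, float]]


def roll_up(events: Iterable[Event], dimensions: Sequence[str],
            metric: str, granularity_s: int = 3600) -> Cube:
    """Collapse raw events into one row per (time bucket, dimension values)."""
    cube: Cube = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for event in events:
        # Truncate the timestamp to the start of its bucket.
        bucket = event["ts"] - event["ts"] % granularity_s
        key = (bucket, *(event[d] for d in dimensions))
        cube[key]["count"] += 1
        cube[key]["sum"] += event[metric]
    return dict(cube)


def query_sum(cube: Cube, dimensions: Sequence[str], group_by: str) -> Dict[object, float]:
    """Ad-hoc query over the rolled-up rows: total of the metric per dimension value."""
    idx = list(dimensions).index(group_by) + 1  # position 0 of the key is the time bucket
    totals: Dict[object, float] = defaultdict(float)
    for key, agg in cube.items():
        totals[key[idx]] += agg["sum"]
    return dict(totals)


dims = ("channel", "country")
events = [
    {"ts": 3600, "channel": "web", "country": "US", "amount": 10.0},
    {"ts": 3700, "channel": "web", "country": "US", "amount": 5.0},
    {"ts": 3800, "channel": "mobile", "country": "US", "amount": 7.5},
    {"ts": 7300, "channel": "web", "country": "US", "amount": 1.0},
]
cube = roll_up(events, dims, metric="amount")
by_channel = query_sum(cube, dims, group_by="channel")
```

Four events become three stored rows here. The individual 10.0 and 5.0 web transactions can no longer be told apart, which is the loss of individual events mentioned above, yet a query such as the total amount per channel is still answered without having been pre-computed.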
Developers who want to further customize a feature can always do so. Omni-channel support is available because, within any application, developers can ingest different data streams before the data is sent to OAS. We saw that many of our customers were already using Druid for OLAP, but each of them had spent time and money to operationalize it. OAS significantly reduces the time needed to develop fast data OLAP and lowers the support cost over the lifetime of this service. In a future release, DataTorrent will support user-specified pre-computations of certain cubes and enable alerts on these. This will enable alerting without human intervention. Since the pre-computed cubes will be developer-specified, a developer or DevOps team will be able to decide the trade-off between resources and the number of cubes to alert on.
Store and Replay: An Apoxi-enabled data service is available for storing and replaying data. This means that replay is no longer restricted to what a message bus can retain. Messages are stored in HDFS and are available for in-order replay. The amount of data retained is limited only by the size of the HDFS, S3, or Azure store, which means that storage can scale linearly as the grid scales in the future. This data service is very valuable in supporting a big data product. New versions of the application, or of data services within the application, can be certified by replaying old data and verifying the results. A consumer data service can be taken down, upgraded, and then catch up by first processing the data stored during the downtime. This data service is usable for any application that is Apoxi-enabled. It can also be used to sync up data on other grids, including on hybrid cloud. Another use case is to feed this data to another data service or application, as customers now have a golden source of data that can be replayed in order.
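The catch-up behavior can be sketched as an append-only log plus a consumer that remembers the next offset it needs. This is an illustration only, not the RTS implementation: `ReplayStore` and `Consumer` are hypothetical names, and a Python list stands in for files on HDFS, S3, or Azure storage.

```python
from typing import Iterator, List, Tuple


class ReplayStore:
    """Hypothetical append-only message store; a list stands in for durable storage."""

    def __init__(self) -> None:
        self._log: List[str] = []

    def append(self, message: str) -> int:
        """Store a message and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def replay(self, from_offset: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (offset, message) pairs in the order they were stored."""
        for offset in range(from_offset, len(self._log)):
            yield offset, self._log[offset]


class Consumer:
    """Hypothetical consumer that tracks the next offset it has not yet processed."""

    def __init__(self, store: ReplayStore) -> None:
        self.store = store
        self.next_offset = 0
        self.processed: List[str] = []

    def catch_up(self) -> None:
        for offset, message in self.store.replay(self.next_offset):
            self.processed.append(message)
            self.next_offset = offset + 1


store = ReplayStore()
consumer = Consumer(store)
store.append("m0")
store.append("m1")
consumer.catch_up()      # consumer is up to date

store.append("m2")       # consumer is down for an upgrade;
store.append("m3")       # messages keep accumulating in the store
consumer.catch_up()      # after restart, it processes the backlog in order

# Certifying a new version: replay everything from offset 0 into a fresh consumer.
candidate = Consumer(store)
candidate.catch_up()
```

Since the store, not the message bus, is the source for replay, the same call that lets an upgraded consumer catch up also lets a new version be run against the full history and compared with the old results.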
Machine Learning: In RTS 3.10, we are releasing support for machine scoring. This support includes full support for PMML and Python-based models. This feature integrates very well with machine learning infrastructure. DevOps can run multiple machine scoring data services covering different models or even different versions of the same model. The time it takes to roll out a PMML-based machine scoring data service is under a week for an average big data application developer. This code is operable and inherits all operational features available in DataTorrent RTS. PMML support is the first step towards full-fledged support for machine learning in DataTorrent products.
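As a rough illustration of running several models, or several versions of one model, side by side, the sketch below keeps scoring functions in a registry keyed by model name and version. It does not load PMML and is not the RTS scoring API. `ScoringRegistry`, the model name, the feature names, and the two toy scoring functions are all hypothetical stand-ins for real PMML or Python models.

```python
from typing import Callable, Dict, Tuple

Features = Dict[str, float]
Scorer = Callable[[Features], float]


class ScoringRegistry:
    """Hypothetical registry that routes records to a named model version."""

    def __init__(self) -> None:
        self._scorers: Dict[Tuple[str, str], Scorer] = {}

    def register(self, model: str, version: str, scorer: Scorer) -> None:
        self._scorers[(model, version)] = scorer

    def score(self, model: str, version: str, features: Features) -> float:
        return self._scorers[(model, version)](features)

    def score_all_versions(self, model: str, features: Features) -> Dict[str, float]:
        """Score one record with every registered version of a model."""
        return {
            version: scorer(features)
            for (name, version), scorer in self._scorers.items()
            if name == model
        }


registry = ScoringRegistry()
# Toy stand-ins for real models; a real service would wrap a PMML or Python model here.
registry.register("fraud", "v1", lambda f: min(1.0, f["amount"] / 1000.0))
registry.register("fraud", "v2", lambda f: min(1.0, f["velocity"] / 16.0))

record = {"amount": 500.0, "velocity": 4.0}
scores = registry.score_all_versions("fraud", record)
```

Scoring the same record with both versions is one way to compare a candidate model against the one in production before switching over.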
Drools Workbench: In RTS 3.10, we are including a Drools Workbench Service. This makes it easier to modify CEP rules and launch them into production. The UI and support for the life cycle of rules help with time to value and lower the support cost. The Drools Workbench is a Docker-based service.
Frame Dashboard Widgets: This feature allows customers to embed 3rd party UI and visualizations into a single dashboard. Superset UI is an example of an embedded UI. The ability to embed 3rd party UI enables developers and DevOps teams to have more data in a single dashboard. This ability greatly improves ease of use. The feature of i-framing UI widgets to embed them in third-party dashboards already existed.
Fast big data applications included in RTS 3.10 are as follows:
Omni-channel Payment Fraud Prevention Application v2: The previous version of our fraud application was not Apoxi-enabled, i.e. it was not constructed using loosely coupled data services. In RTS 3.10, the enhanced Fraud Prevention application (v2) was developed using Apoxi. The OAS data service is also integrated into the Fraud Prevention application. OLAP analytics are now available to business users for both real-time and historical analysis. Fraud v2 includes support for stateful rules, which enables prevention of fraud patterns that require knowledge of a user's recent transaction history. UI dashboards have been greatly improved with the addition of the Superset visualization service. The Drools Workbench service has been added to make it easier to create fraud rules in this version of the Fraud Prevention application. With support for machine scoring, customers will be able to run machine learning models to detect fraud.
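A stateful rule differs from a stateless one in that its verdict depends on earlier events for the same user. The sketch below shows one common example, a velocity check over a sliding time window. It is plain Python rather than a Drools rule, and the `VelocityRule` class, its thresholds, and the assumption that timestamps arrive in order per user are all illustrative choices, not part of the product.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class VelocityRule:
    """Flag a user who makes more than `max_txns` transactions within `window_s` seconds."""

    def __init__(self, max_txns: int, window_s: int) -> None:
        self.max_txns = max_txns
        self.window_s = window_s
        # Per-user state: timestamps of that user's recent transactions.
        self._history: Dict[str, Deque[int]] = defaultdict(deque)

    def check(self, user: str, ts: int) -> bool:
        """Record a transaction and return True if it should be flagged.

        Assumes timestamps arrive in non-decreasing order for each user.
        """
        recent = self._history[user]
        # Drop transactions that have fallen out of the window.
        while recent and ts - recent[0] >= self.window_s:
            recent.popleft()
        recent.append(ts)
        return len(recent) > self.max_txns


rule = VelocityRule(max_txns=3, window_s=60)
u1_flags = [rule.check("u1", ts) for ts in (0, 10, 20, 30, 200)]
u2_flag = rule.check("u2", 25)  # another user's history is kept separately
```

The fourth transaction for `u1` is flagged because it is the fourth within 60 seconds. The one at `ts=200` is not, since the earlier transactions have aged out of the window by then.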
Online Account Takeover Application: This is a new Apoxi-enabled application available in DataTorrent’s AppFactory. It enables customers to do omni-channel processing of multiple data streams in order to identify and prevent account takeover. Each of these streams is ingested, parsed, and enriched, and users can add more ingestion data services to include more sources. Account takeover rules are then run on this data to detect account takeover. This application includes the OAS data service, the Superset UI for query and drill-down, and integration with the Drools Workbench for editing and managing rules. The Account Takeover application was stitched together from data services similar to those in v2 of the Fraud Prevention application. This is an example of how quickly new applications can be created from available data services. The majority of the work was configuring the schema and services to make a complete application. Integration of data services was done using the endpoint connectors available in Apoxi. The store and replay data service is also available in this application.
Retail Recommender (reference architecture): We are including an Apoxi-enabled reference architecture for retail recommendations in RTS 3.10. This application demonstrates how machine learning can be used to make real-time retail offers on web and mobile channels. The application also shows how machine learning can be used to deliver more applications within the RTS platform by utilizing the Apoxi framework. This application showcases examples of UI dashboards and integrates with the Online Analytic Service (OAS, based on Druid) as well as the Superset visualization service. This architecture also showcases omni-channel processing of two data streams, both of which send data to the same OAS service. We collaborated with our partner, Mindstix Software Labs, to develop the reference architecture for a retail recommendation application. Mindstix will work with customers to provide software services for use cases similar to a retail application. Mindstix will also provide data science knowledge for a recommendation engine. Customers can then use RTS 3.10 and the included Apoxi framework to operationalize and launch these applications in a timely manner.
At DataTorrent, we relentlessly focus on reducing time to market for our customers and ensuring that, after launch, customers are able to operate their applications viably. RTS 3.10 is a significant step towards this vision. It has the first full release of Apoxi and includes services that cover all important parts of our fast big data stack. Customers will be able to put together a fast big data ETL application within a few weeks. They can reuse various ingestion data services from AppFactory or create their own. OAS provides ready-to-use OLAP analytics along with the Superset visualization service. Common IT data services like store and replay, cloud connectivity, full support for Azure and Amazon, and machine learning support make RTS 3.10 a very robust product for developing fast data applications. In future releases, we look forward to more data services and more applications being made available in our AppFactory.