Part 1: Failure of open source technologies to deliver successful business outcomes
Part 2: High-level guidelines for achieving successful business outcomes with big data
Part 3: Development Pattern: Application stitched with loosely coupled big data services
Part 4: DataTorrent Apoxi Framework

In previous parts of this blog series, we discussed how applications built from loosely coupled big data services help enterprises reduce time to market. In this fourth, and final, part of the blog series, we will evaluate a framework designed for developing applications by stitching together loosely connected data services.

Loosely coupled applications rely on a number of meta-services to be viable, such as glue logic, connectivity contracts and protocols, reuse of certified data services, and visibility into big data services. DataTorrent’s Apoxi framework is one such software framework. It includes the software and tooling required to ensure that an application stitched from loosely coupled big data services gets into production quickly and is viable from a total cost of ownership perspective. Apoxi operationalizes fast big data products to enable enterprise customers to achieve business outcomes. DataTorrent’s AppFactory contains numerous data services developed, hardened, and certified using Apoxi. I will cover details of DataTorrent’s AppFactory in a future blog.

Let’s look at the various parts of DataTorrent’s Apoxi framework that are required to put together a loosely coupled big data application. These parts include support services and repositories, and are as follows:

  • Message/Service Bus: At the core of an application put together from loosely connected data services is an efficient data exchange/transport service, commonly known as a message bus. A data service can continually publish data to the bus, and on the other side a consumer can subscribe to (consume) this data. It is possible that no consumer exists at a given time. A message bus is not something new. IT teams have operationalized message buses for multiple decades and have deep operational expertise in supporting them. With Apoxi, DataTorrent leverages a message bus to lower the time to market for big data products, and leverages the DevOps expertise that already exists in IT teams worldwide. A message bus is not the only data exchange mechanism needed, however. For example, some big data services may make web services (ports, protocols, formats) available to other big data services. Apoxi enforces the discipline and provides the support services required to make the data exchange seamless and operable. Apoxi uses the term “service bus” to denote the fact that some (though small) portions of data exchange may be done outside of a message bus. The service bus of the Apoxi framework includes the operational aspects referenced earlier.
  • Connection Endpoints: Application developers need a lot more than a plain, vanilla message bus for putting together big data applications. Apoxi includes support for connection endpoints and an API for accessing the service bus. These hide the nitty-gritty details of the message bus and enable developers to concentrate mainly on their business logic. The connection endpoints leverage other tools and services in Apoxi to make this a seamless experience. By providing a discipline and an API for the service bus, Apoxi takes away a lot of the operational issues that crop up in applications developed with a “do-it-yourself” approach. DataTorrent’s AppFactory includes connection endpoints for the cloud, with both Amazon and Azure core technologies covered.
  • Fault Tolerance and High-availability: These operational tasks are much easier when they can be reduced to launching a big data service that consumes or publishes data on a service bus. AppFactory includes data services that are pre-certified for fault tolerance and high-availability. Apoxi provides fault tolerant connectors (connection endpoints) that hide the service bus underneath and drastically reduce the time to develop a big data service. The ubiquitous presence of the message bus and web services in the IT stack means that DevOps already has the ability to operate a service bus and knows the nuances of high-availability protocols. Thus, Apoxi does not add any new complexity for the DevOps team to learn. In addition to basic support from Apoxi for high-availability, DataTorrent’s AppFactory includes a lot of highly available data services and applications that customers can leverage to reduce time to value (TTV). All artifacts in the AppFactory are hardened and certified for high-availability.
  • Hybrid Cloud: Another very powerful aspect of a service bus is that the big data application is not constrained to a single data center, now or in the future. It is relatively easy to construct a multi-data center big data application. Hybrid cloud is no longer an anomaly within the enterprise infrastructure. A service bus functions as a logical unit that enables loose connections between data services in different data centers. Given the big data requirements of the future, including not fully knowing which application or which data service (part) of the application will be on Amazon/Azure, a service bus is critical from a supportability and agility perspective. Apoxi includes a productized service bus at its core. Additionally, Apoxi and AppFactory are cloud agnostic, and have full coverage for Amazon and Azure. Users can easily decide which cloud to use for a data service. It is also relatively easy to migrate a data service from on-prem to the cloud, between two clouds, or from the cloud back to on-prem. Hybrid cloud is also critical for edge computing. Big data stacks with a data lake as the central computing hub are failing rapidly, and the next generation of big data stacks relies on hybrid cloud, which supports edge computing. Doing analytics close to where data is generated, rather than moving all raw data to data lakes, is critical for timely business outcomes. Apoxi is ideally suited to, and a strong enabler of, such a setup.
  • Service Repository: For any developer to reuse data services or to connect to them, these services must be discoverable. The consumer needs to know all the metadata required to successfully consume the data. This includes the service name, message bus topics, schema, version, latency, throughput, type, etc. All of this is stored in a repository, namely a “service repository.” Applications often fail to work because of a lack of knowledge about data services or errors in configuring their connectivity. A repository of all the services provides this data and helps remove these types of errors. It also serves as a source of truth and enables multiple teams to work together. A monolithic application relies heavily on all expertise being within one team. Meanwhile, an application made out of loosely connected data services enables enterprises to leverage the expertise of each team to get a better and faster outcome. A service repository enables multi-team product development by setting up contracts under which each team develops its data services for consumption by other teams.
  • Schema Repository: Data services ingest data and, upon transformation, make it available to other data services to consume. This means that both the consumer and the producer must be able to handle the schema of the data. Just as a service repository lets one know that a service exists, a schema repository lets each data service know the schema of data on the message bus or even on a web service. Apoxi includes tools and software that help automate this part, so a developer can focus on the business logic and not spend too much time on schema conversion and other schema issues. Apoxi helps with versioning and backward compatibility of schemas too.
  • Metrics Integration: DataTorrent’s Apoxi framework includes a metrics platform. This enables all data services to generate, store, and make application and system metrics available for analysis and real-time consumption. Metrics are the critical metadata of big data applications and serve as the eyes and ears for DevOps. This data is what allows DevOps to launch a product on time and keep the cost of support feasible throughout the life cycle of the product. It is therefore central to the supportability of big data products, and it directly makes business value extraction viable. First generation big data technologies were very weak on metrics, as metrics frequently came as an afterthought. The metrics platform has a simple read and write API, making data from every data service available as a time series. Tools are available for DevOps to integrate this data into their current monitoring stack. The ability to manage and monitor big data products with current monitoring tools and technology is an important facet of reducing the cost of operations and the time it takes to launch a product. Details of the metrics platform will be published in a later blog.
  • Staging Ability: Big data products are notorious for being very costly and inefficient to stage, i.e., DevOps teams find it very costly to certify a big data product for a production launch, which causes delays in the product launch and sometimes failures. Monolithic applications are not well-suited for staging, as they impact a lot of subsystems and too many tasks have to be done to stage them. Loosely coupled applications inherently help with staging, as you can stage the big data services (components) of the application independently. With data services, upgrades can be done in parts, and teams can mix and match services. A newer version can be staged and rolled out upon certification, and metrics can be compared before flipping the switch. The Apoxi framework includes tools and features to help with staging.
  • Backward Compatibility: Open source projects have a long history of not being backward compatible. It is normal for enterprises to spend a lot of effort just on certification when moving to a new version of an open source component. With 30+ such components in the stack, backward compatibility is a support nightmare. The Apoxi framework helps by first reducing the stack to a few hardened technologies. The framework then ensures that they integrate very well into the big data services. In doing so, Apoxi drastically reduces compatibility issues, as a lot of them now become internal to a big data service. Additionally, with various repositories and schema-aware connectors, data exchange is enabled across different versions of the data services. Apoxi thus provides the discipline and tools needed to manage an extremely difficult problem. DataTorrent’s AppFactory goes a step further and provides data services and applications that are certified to be backward compatible.
  • UI and Web Services: In all DataTorrent products, the user interface is a first-class citizen, which means that data is made available through versioned web services for UI consumption. Apoxi has that notion infused as well. This ability is very valuable, as UI and web services are the most critical aspects of productization. In a big data analytics product, there are three important personas: DevOps, Data Engineers, and Business Analysts. The first generation of big data grossly underserved DevOps and Business Analysts and made Data Engineers’ jobs very difficult. UI and web services are critical features that enable Data Engineers to develop features that help the other two personas and make viable products out of big data applications.
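
To make the service bus, service repository, and schema ideas above more concrete, here is a minimal Python sketch. It is not Apoxi's API: every name in it (ServiceRecord, ServiceRepository, InMemoryBus, the clickstream-enricher service, and the clicks.enriched topic) is hypothetical, and the in-memory bus is only a toy stand-in for a real message bus. It also assumes that a topic retains messages so that a consumer arriving later can still read them, which is one possible way to handle the case where no consumer exists at publish time.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical contract a producer publishes so consumers can find it."""
    name: str
    topic: str
    schema_version: int
    required_fields: Tuple[str, ...]


class ServiceRepository:
    """Hypothetical registry mapping a service name to its contract."""

    def __init__(self) -> None:
        self._records: Dict[str, ServiceRecord] = {}

    def register(self, record: ServiceRecord) -> None:
        self._records[record.name] = record

    def lookup(self, name: str) -> ServiceRecord:
        return self._records[name]


class InMemoryBus:
    """Toy message bus (assumption: topics retain every published message)."""

    def __init__(self) -> None:
        self._log: Dict[str, List[dict]] = defaultdict(list)
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def publish(self, topic: str, message: dict) -> None:
        # Publishing succeeds even if nobody has subscribed yet.
        self._log[topic].append(message)
        for handler in self._handlers[topic]:
            handler(message)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # A late subscriber first replays what is already on the topic.
        for message in self._log[topic]:
            handler(message)
        self._handlers[topic].append(handler)


def run_demo() -> List[dict]:
    repo = ServiceRepository()
    bus = InMemoryBus()

    # The producing team registers its contract, then publishes.
    record = ServiceRecord("clickstream-enricher", "clicks.enriched", 1,
                           ("user_id", "url"))
    repo.register(record)
    bus.publish(record.topic, {"user_id": 7, "url": "/home"})

    received: List[dict] = []

    def on_message(message: dict) -> None:
        # The consumer validates against the registered contract only,
        # so fields added by a newer producer version are tolerated.
        contract = repo.lookup("clickstream-enricher")
        missing = [f for f in contract.required_fields if f not in message]
        if missing:
            raise ValueError(f"message missing fields: {missing}")
        received.append(message)

    # The consuming team discovers the topic from the repository.
    bus.subscribe(repo.lookup("clickstream-enricher").topic, on_message)

    # A newer producer adds a field; the existing consumer still works.
    bus.publish(record.topic, {"user_id": 8, "url": "/cart", "referrer": "ad"})
    return received
```

In this sketch the producer and consumer never reference each other directly. They share only the registered contract, which is the loose coupling the list above describes. Checking required fields while ignoring unknown ones is one simple way to stay backward compatible; a real schema repository would likely apply richer compatibility rules.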

The Apoxi framework is geared towards a single goal: to enable customers to successfully get business outcomes in a timely manner, by using big data open source technologies. The tools, services, and features contained therein are measured in terms of how they reduce time to market and total cost of ownership.

In an upcoming blog series, I will delve deeper into how an enterprise can leverage DataTorrent’s AppFactory to get big data products faster and more successfully to market. To get started, you can download DataTorrent RTS or take a look at our pre-certified data services for various use case scenarios in DataTorrent’s AppFactory.