DataTorrent, Inc. has open sourced its entire library of Operators under the Apache 2.0 license. Released under the project name "Malhar", the library contains over 250 commonly used Operators, and that number will continue to grow over time. The code is available at https://github.com/DataTorrent/Malhar. A discussion group, "malhar-users", has been created on Google Groups. You may subscribe to this group at firstname.lastname@example.org.
DataTorrent is the world's first real-time stream processing platform for Big Data on Hadoop. As a YARN-native application, DataTorrent allows enterprises to leverage their existing Hadoop 2.0+ infrastructure to process data as it streams into the enterprise. Using this real-time processing engine, businesses can analyze incoming data streams, perform calculations or transformations, and act on the data in milliseconds. There is no need to wait hours or days for Hadoop jobs to complete. Businesses can get the results they need from their data in minutes, seconds, or less than a second.
The Java-based applications that run on the DataTorrent Platform are built by combining various Operators into a Directed Acyclic Graph (DAG). Over 250 of these Operators have been pre-built by DataTorrent and all are now available as open source.
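To make the idea of combining Operators into a DAG concrete, here is a minimal, self-contained Java sketch. It does not use the DataTorrent or Malhar API: the `Operator` interface, the `Dag` class, and the `addNode`, `addEdge`, and `push` methods are hypothetical names invented for this illustration, and the sketch runs in a single thread with no cycle checking, partitioning, or fault tolerance. It only shows the general shape of the approach: small operators wired together by streams, with tuples flowing from a source through two branches into a shared sink.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative only: none of these types belong to the DataTorrent platform.
class OperatorDagSketch {

    /** Hypothetical operator: consumes one tuple and may emit zero or more tuples. */
    interface Operator {
        void process(Object tuple, Consumer<Object> emit);
    }

    /** Hypothetical graph of named operators connected by directed streams. */
    static class Dag {
        private final Map<String, Operator> nodes = new LinkedHashMap<>();
        private final Map<String, List<String>> edges = new HashMap<>();

        Dag addNode(String name, Operator op) {
            nodes.put(name, op);
            return this;
        }

        Dag addEdge(String from, String to) {
            edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
            return this;
        }

        /** Delivers a tuple to the named operator and forwards whatever it emits downstream. */
        void push(String name, Object tuple) {
            List<String> downstream = edges.getOrDefault(name, Collections.emptyList());
            nodes.get(name).process(tuple, out -> {
                for (String next : downstream) {
                    push(next, out);
                }
            });
        }
    }

    /** Builds a diamond-shaped graph: parse -> (evens -> square | odds) -> collect. */
    static List<Object> run(List<String> lines) {
        List<Object> results = new ArrayList<>();
        Dag dag = new Dag()
            .addNode("parse", (t, emit) -> emit.accept(Integer.parseInt((String) t)))
            .addNode("evens", (t, emit) -> { if ((Integer) t % 2 == 0) emit.accept(t); })
            .addNode("odds", (t, emit) -> { if ((Integer) t % 2 != 0) emit.accept(t); })
            .addNode("square", (t, emit) -> { int n = (Integer) t; emit.accept(n * n); })
            .addNode("collect", (t, emit) -> results.add(t))
            .addEdge("parse", "evens")
            .addEdge("parse", "odds")
            .addEdge("evens", "square")
            .addEdge("square", "collect")
            .addEdge("odds", "collect");
        for (String line : lines) {
            dag.push("parse", line);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        Collections.addAll(input, "1", "2", "3", "4");
        System.out.println(run(input)); // prints [1, 4, 3, 16]
    }
}
```

In this sketch, even numbers are squared and odd numbers pass through unchanged, so each operator stays small and the application logic lives in how the operators are connected. A production platform would add the distribution, scaling, and recovery concerns that this toy version leaves out.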
The Operators are templatized, customizable, tested, and benchmarked with every release. They can be used as is or extended to further customize their functionality. They are designed for optimal performance, are instrumented for scalability and partitioning, and have proper verification built in. A version of the Malhar project will be bundled with each platform release.