The Next Generation of Big Data and Apache Apex

We live in an era where computational resources are being rapidly commoditized. Cloud is pervasive and is democratizing IT. Big Data, led by Apache Hadoop, is the leading edge of this revolution from an IT perspective.

Amol Kekre | 
April 25th, 2016|Technical|0 Comments

Latency Calculation in Apache Apex

In stream processing applications, data arrives continuously and needs to be processed expediently in order to keep up with the incoming flow. Latency is the primary metric with which the health of a streaming application is measured. High latency is typically an indication of problems. It can cause the application to be unable to keep [...]

David Yan | 
March 23rd, 2016|Technical|0 Comments

Apache Apex 2015: Climbed A First Peak

Since Apache Apex was accepted into Apache incubation on August 17th, we have seen significant momentum in our community growth. Here are a few stats: Apex Meetup membership grew to more than 1000 in less than 4 months since inception, with the highest weekly growth rate recorded at 26.73%. Our Meetup groups spread all over N. [...]

Desmond Chan | 
January 4th, 2016|Technical|0 Comments

GE Predix Supercharging IoT with Apache Apex

Years ago, when I was a data analyst, my responsibility was to analyze machine-generated data to troubleshoot system errors. I would present the findings based on the analysis so the engineering team could modify their design and the technicians could adjust their calibration. When I needed data, the technician had to dress up, go into [...]

Jie Wu | 
December 17th, 2015|Technical|0 Comments

An Accelerated, Simplified and Scalable Approach to Fuel Your Big Data Projects

Fast Big Data: To conduct meaningful analytics, first you need to collect all the necessary, relevant data to feed to your model. But gathering data from multiple disparate sources can be an extremely messy, time-consuming task. Big Data, often (in)famously characterized by its unprecedented speed, volume and format, adds more complexity to this [...]

Jie Wu | 
December 3rd, 2015|Technical|0 Comments

An introduction to checkpointing in Apache Apex

For successful launch of fast, big data projects, try DataTorrent’s AppFactory. Big data is evolving in a big way. As it booms, the issue of fault tolerance becomes more and more exigent. What happens if a node fails? Will your application recover from the effects of data or process corruption? In a conventional world, the simplest [...]

Gaurav Gupta | 
November 10th, 2015|Technical|1 Comment

Dimensions Computation (Aggregate Navigator) Part 2: Implementation

Overview: While the theory of computing the aggregations is correct, some more work is required to provide a scalable implementation of Dimensions Computation. As can be seen from the formulas provided in the previous post, the number of aggregations to maintain grows rapidly as the number of unique key values, aggregators, dimension combinations, and time buckets [...]

Tim Farkas | 
November 5th, 2015|Technical|0 Comments

Tracing DAGs from specification to execution

How Apex orchestrates the DAG lifecycle: Apache Apex (incubating) uses the concept of a DAG to represent an application's processing logic. This blog will introduce the different perspectives within the architecture, starting from specification by the user to execution within the engine. Understanding DAGs: A DAG, or Directed Acyclic Graph, expresses processing logic as operators (vertices) and streams (edges) [...]

Thomas Weise | 
October 1st, 2015|Technical|1 Comment