The DataTorrent Blog

Stay up-to-date on the latest from the DataTorrent team

SQL on Apache Apex

History: Big Data has an interesting history. In the past few years, massive amounts of data have been generated for processing and analytics, and enterprises have struggled to process ever-increasing data sizes. The initial answer was to scale up, but scaling up was costly and resulted [...]

Chinmay Kolhatkar | 
January 13th, 2017|Big Data in Everyday Life|2 Comments

Getting Stack Traces in Apache Apex Applications

A stack trace is one of the most commonly used aids for debugging an application. In the Java world, jstack is the most ubiquitous tool for getting a stack trace from a single JVM. When it comes to Apache Apex or any other distributed system, it is not easy to get a stack trace, as application [...]

Sandesh Hegde | 
December 21st, 2016|Big Data in Everyday Life|1 Comment

Machine Learning on Apache Apex with Apache SAMOA

Introduction: Apache SAMOA is an open source platform for mining big data streams. SAMOA features a Write-Once-Run-Anywhere (WORA) architecture, which allows multiple Distributed Stream Processing Engines (DSPEs) to be integrated into the framework. In this blog, we’ll describe the integration with Apache Apex, which is a YARN-native, unified batch and stream processing engine. Apache [...]

Bhupesh Chawda | 
December 9th, 2016|Big Data in Everyday Life|1 Comment

Deploy and Manage DataTorrent RTS using Ambari

Many organizations choose Apache Ambari to simplify their Hadoop operations. Ambari provides an easy way to configure and manage a Hadoop platform and its services. DataTorrent RTS, being a Hadoop-native platform, is a good candidate for this. With that in mind, DataTorrent has added support for Ambari, ambari-DataTorrent-service, to install and manage the DataTorrent RTS platform using Ambari. [...]

Priyanka Gugale | 
November 15th, 2016|Big Data in Everyday Life|0 Comments

Throughput, Latency, and Yahoo! Performance Benchmarks. Is there a winner?

Yahoo! benchmark: Over the last year, Big Data streaming computation engines such as Apache Apex, Apache Flink, Apache Spark (Spark Streaming), Google Dataflow and many others gained significant popularity among the software development community and business users, as such platforms provide additional capabilities that batch processing engines cannot deliver. There is a large number of use [...]

Vlad Rozov | 
November 12th, 2016|Big Data in Everyday Life|0 Comments

Fault-Tolerant File Processing

The majority of big data setups still use files, while streaming applications and platforms are a new concept.

Chandni Singh | 
May 17th, 2016|Big Data in Everyday Life|3 Comments

The Next Generation of Big Data and Apache Apex

We live in an era where computational resources are being rapidly commoditized. The cloud is pervasive and is democratizing IT. Big Data, led by Apache Hadoop, is the leading edge of this revolution from an IT perspective.

Amol Kekre | 
April 25th, 2016|Big Data in Everyday Life|0 Comments

Latency Calculation in Apache Apex

In stream processing applications, data arrives continuously and needs to be processed quickly in order to keep up with the incoming flow. Latency is the primary metric with which the health of a streaming application is measured. High latency is typically an indication of problems. It can cause the application to be unable to keep [...]

David Yan | 
March 23rd, 2016|Big Data in Everyday Life|0 Comments

End-to-end “Exactly-Once” with Apache Apex

Apache® Apex (http://apex.incubator.apache.org/), a stream processing platform that is currently incubating at the Apache Software Foundation, helps you build processing pipelines with fault tolerance and strong processing guarantees. Apex's architecture is equipped with the capability for low-latency processing, scalability, high availability, and operability. What's more, the Apex Malhar operator library comes with a wide [...]

Thomas Weise | 
March 3rd, 2016|Big Data in Everyday Life|1 Comment

Implementing Linear Road Benchmark in Apache Apex

Linear Road Benchmark - the basics [1, 2]: Linear Road is a benchmark inspired by the variable tolling systems observed on highways across the globe. The benchmark specifies a variable tolling system for a simulated urban expressway system, in which the tolls depend on factors such as congestion and accident proximity. Each vehicle [...]

Gaurav Gupta | 
January 14th, 2016|Big Data in Everyday Life, How-to|0 Comments