Linear Road Benchmark – the basics [1, 2]

Linear Road is a benchmark that is impelled by the variable tolling system on observed on highways across the globe. The benchmark underlines a variable tolling system for a simulated urban expressway system the tolls for which depend on factors such as congestion and accident proximity. Each vehicle comes with a sensor that identifies its location coordinates every 30 seconds. These position reports are used to generate statistics about traffic conditions on every segment of every expressway for every minute. The statistics, including average vehicle speed, number of vehicles and existence of accidents, are used to determine toll charges for the given segment of road that the vehicle is in. This tolling is designed to control the traffic flow roadways by discouraging drivers from using already congested roads through increased tolls. Alternatively, this may encourage the use of less congested roads through decreased tolls. [Source:]

Linear Road depends on Linear City, a fictional metropolitan area that contains X parallel expressways.  Each expressway has four lanes in each direction: 3 travel lanes (lanes #1-3),one lane devoted to entrance (lane #0), and exit (lane #4) ramps. Each expressway has 100 entrance ramps and 100 exit ramps in each direction, thereby dividing it into 100 mile-long segments . This figure is an example of such a segment.


Vehicles in Linear City are equipped with sensors that emit a position report  every 30 seconds for coordinate identification. Position reports are processed to generate statistics about traffic conditions for every segment of the expressways every minute. The statistics include average vehicle speed, number of vehicles, and accident occurrences. These statistics are used to determine the toll charges for variable tolling. In addition, vehicles can issue queries to determine their current account balance with the expressway system, total tolls assessed on a given day, and travel time estimates.

Why Linear Road Benchmark?

Linear Road Benchmark is designed to determine the performance metric of a stream processing system. It tests Stream Processing Systems by measuring how many expressways a system can support by giving accurate and timely results to four types of queries that fit two categories: continuous and historical.

Linear Road Benchmark as an Apex application

Factors determining the implementation

To implement the Linear Road Benchmark in Apex, two factors were taken into consideration:

  • Parallel processing

The implementation hinged on the idea of breaking components into smaller parts for parallel processing. For Linear Road, the average speed and  accident detection can be computed in parallel because they are mutually exclusive.

  • Pipelining

To implement Linear Road, pipelining is essential. The advantage of this approach is that no process remains idle. In the case of Linear Road, Accident Notifier and Toll Notifier are aligned  in pipeline along with Accident Detector and Average Speed Calculator. While Accident Detector and Average Speed Calculator process new data, Accident Notifier and Toll Notifier process data that Accident Detector and Average Speed Calculator already processed. This ensures optimum resource utilization.

The Linear Road DAG

Linear Road is implemented as a Directed Acyclic Graph or a DAG. Apex supports DAGs, as acyclic graphs of one or more operators, each performing a specific operation. For example, you can have a DAG that performs simple addition of integers. This DAG will require (at a minimum) an input operator that reads integers, an addition operator that performs the addition, and an output operator that displays the result.

The DAG for Linear Road comprises of the following operators:



This operator  reads historical data from HDFS files. After a read operation historicalInputReceiver  notifies  InitiateStreamData, the downstream operator, to initiate read operation of streaming data.


The time required for reading historical data is not a part of the variable tolling system simulation. That is why InitiateStreamData reads  data streams only after historical data is read. This operator:

  • Triggers the ingestion of data streams into the system
  • Simulates data bursts


Receiver operator reads data streams from HDFS files. The operator ingests data in bursts to best simulate a real-life scenario where data flow might not be constant, but rather, sporadic.


AccidentDetector detects and clears accidents in Linear City.


AverageSpeedCalculator calculates the average speed of  vehicles in a given segment for a given minute. It also calculates the speed of vehicles for a given minute on expressways.


AccidentNotifier sends accident notifications to vehicles that are in the vicinity of an accident location


TollNotifier operator calculates the toll depending on  factors such as congestion and accident proximity.


DailyBalanceStore operator stores the total toll for a given vehicle on a specified expressway on  a specified day. It responds to query type 3 that requests for the vehicle’s total tolls on a specified expressway on a specified day.


AccountBalanceStore operator stores the total toll for a given vehicle. It responds to query type 2 that requests for the current account balance.


Daily-Balance-Console writes the results of query type 3 to an HDFS file.


Account-Balance-Console writes the results of query type 2 to an HDFS file.


Accident-Notifier-Console writes accident notifications to an HDFS file.


Toll-Notifier-Console writes the toll notifications to an HDFS file.

The Linear Road configuration

For this example, the following configuration is used:

Number of Servers: 4

CPU on each server: Intel Xeon Processor E5-2670 2.6 GHz 16 cores

RAM: DDR3-1600 MHz, 112 GB

Challenges during implementation

  1. Storing and indexing GigaBytes of data for answering query type 2 and 3 in an effective way. Answering queries 2 and 3 within given time constraint of 5sec, we have to make sure that the data is stored and indexed such that it is easily searchable. For this, Datatorrent proprietary Scalable datastore is used.
  1. Reading input stream data as fast as possible. To achieve this, the stream data for each expressway is stored in different file and multiple instances of Receiver are created to  read the data from multiple files in parallel. Following diagram shows how the logical DAG is converted to physical DAG with 2 instances of Receiver. There are two parallel pipelines processing data in parallel.


The Linear Road Implementation

With this example, here are a couple observations:

Total cores used: 46

Total memory used: 360GB

Total expressways benchmarked for this solution on the given infrastructure are : 102

All the query results  were processed within 2 seconds from reception of the  input tuple




  • Download DataTorrent Sandbox here
  • Download DataTorrent Enterprise Edition here
  • Join Apache Apex Community  here