DataTorrent RTS is a set of tools that augment the Apache Apex engine (incubating at the time of this writing). One of those tools is the DataTorrent Console, a web-based interface that allows you to assemble, launch, debug, and monitor your Apex applications. The best way to get started with the Console is to download the sandbox from our website.
This blog post focuses on using the Console to debug an Apex application running on your cluster. For more information on Apex applications in general, take a look at Thomas Weise’s great post on Apex application architecture.
One of the most significant shortcomings of the Hadoop ecosystem is the lack of tools that make debugging and problem analysis easy. At DataTorrent, we are all about developer productivity and ease of use. I want to showcase four Console features built with these ideals in mind.
One final note before diving in: the first two features I’ll talk about are available in the free version of the Console, while the other two are only available in the Enterprise edition. For a more comprehensive overview of the Console, check out the guide.
1. Recording Tuples
Events moving through an Apex application are typically referred to as “tuples.” As a developer, it can be very helpful to see the actual content of the tuples coming through a running application. To do this using the Console, select the application you want to record, switch to the physical tab, choose the physical operator you would like to record tuples on, and press the “record a sample” button.
2. Events Widget
Another handy tool when developing and debugging a running app is the Events widget. To access it, go to an application page and make sure that the logical tab is selected. The Events widget should show up on the right side.
Some events have additional details that can be viewed by clicking the “details” button to the right of the event listing. This opens a dialog box containing the full details of that event.
If one of your operators throws an exception, a container goes down, or any of a host of other issues occurs, you’ll be able to see details such as the stack trace from this widget.
3. Log-level Setter
When running or debugging an application on your cluster, one of the most basic things you will do is view logs. By default, only log statements at the INFO level or higher get written to dt.log files (container logs). Often, you will want to see lower-priority messages such as DEBUG. You can use the Console to dynamically specify which logging level gets written to the dt.log files of a running application in one of two ways. The first way is through the “set logging level” button in the Application Overview widget.
You will then be presented with a dialog where you can specify either fully qualified class names or package identifiers with wildcards.
The second way to set logging levels is to select a logical operator in the Logical Operators List widget and use the provided dropdown.
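To make the idea of class names and wildcard package identifiers more concrete, here is a minimal Python sketch of how such patterns could map a logger to a threshold level. It is an illustration only, not the Console's actual matching logic: the package `com.example.app`, the helper names `effective_level` and `is_written`, and the "longest matching pattern wins" rule are all assumptions made for this example.

```python
from fnmatch import fnmatchcase

# Log levels ordered from lowest to highest priority.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR"]


def effective_level(class_name, rules, default="INFO"):
    """Return the threshold level that applies to class_name.

    rules maps a class name or wildcard pattern to a level.
    Assumption for this sketch: the longest matching pattern wins.
    """
    matches = [pattern for pattern in rules if fnmatchcase(class_name, pattern)]
    if not matches:
        return default  # INFO is the default threshold mentioned above
    return rules[max(matches, key=len)]


def is_written(message_level, threshold):
    """True if a message at message_level passes the threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(threshold)


# Hypothetical rules: DEBUG for a whole package, WARN for one noisy class.
rules = {
    "com.example.app.*": "DEBUG",
    "com.example.app.io.FileReader": "WARN",
}

for name in ("com.example.app.Parser",
             "com.example.app.io.FileReader",
             "org.other.Thing"):
    threshold = effective_level(name, rules)
    print(name, threshold, is_written("DEBUG", threshold))
```

With these rules, DEBUG messages from most classes under the hypothetical package would be written, while the single class pinned to WARN and anything outside the package would stay quiet.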
4. Container Log Viewer
To actually view the log files of a given container, select a container from the Containers List widget (default location of this widget is in the “physical” dashboard). Then click the logs dropdown and select the log you want to look at.
Once you are viewing a log file in the Console, there are a few tricks to traversing it. You can scroll to the top to fetch earlier content, scroll to the bottom for later content, grep for strings in the selected range or over the entire log, and click the “eye” icon to the far left of any line to jump to that location in the log.
We have numerous improvements in store for the DataTorrent Console, and user feedback is highly valued in our planning, so please share any feedback that would improve your experience with the tool!
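For readers who want to reproduce the "grep in a range or over the whole log" behavior on a downloaded log file, the following Python sketch shows the general idea. It is not how the Console implements the feature: the function name `grep_log`, its parameters, and the sample log lines are hypothetical and exist only for illustration.

```python
import re


def grep_log(lines, pattern, start=None, end=None):
    """Return (line_number, text) pairs whose text matches pattern.

    Line numbers are 1-based. start and end bound the search range
    inclusively; leaving both as None searches the entire log.
    """
    first = 1 if start is None else start
    last = len(lines) if end is None else end
    regex = re.compile(pattern)
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if first <= number <= last and regex.search(text)
    ]


# Made-up log content, used only to exercise the function.
sample_log = [
    "INFO  container started",
    "DEBUG processing window 1",
    "ERROR operator failed",
    "DEBUG processing window 2",
]

print(grep_log(sample_log, "DEBUG"))              # whole log
print(grep_log(sample_log, "DEBUG", start=3))     # lines 3 to the end
```

Returning the line number with each hit mirrors the idea of jumping from a search result back to its location in the log.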