Friday, February 28, 2014

OpenTSDB proxy

We use OpenTSDB to store the majority of our time series server and application statistics at Tumblr. We recently began a project to migrate OpenTSDB from an existing HBase cluster running an older version of HBase to a new cluster with newer hardware and running the latest stable version of Hbase.

We wanted a way to have some historical data in the new cluster before we switched to it. Within Tumblr we have a variety of applications generating these metrics and it was not very practical for us to change all of them to double write this data. Instead, we chose to replace the standard OpenTSDB listeners with a proxy that would do this double writing for us. While we could have used HBase copy table or written our own tool to backfill historical data from the old cluster, double writing for an initial period allowed us to avoid adding additional load on our existing cluster. This strategy also allowed us to move queries for recent data to new cluster earlier than the full cutover.

The tsd_proxy is written in Clojure and relies heavily on the Lamina and Aleph which in turn build on top of Netty. We have been using this in our production infrastructure for over two months now while sustaining writes at or above 175k/s (across the cluster) and it has been working well for us. We are open sourcing this proxy in the hope that others might find a use for this as well.

The tsd_proxy listens on a configurable port and can forward the incoming data stream to multiple end points. It also has the ability to filter the incoming stream and reject data points that don't match a (configurable) set of regular expressions. It also has the ability to queue the incoming stream and re-attempt delivery if one of the end points is down. It is also possible to limit the queue size so you don't blow through your heap. The README has some more information on how to set this up.