Sunday, October 4, 2015
Friday, February 28, 2014
We use OpenTSDB to store the majority of our time series server and application statistics at Tumblr. We recently began a project to migrate OpenTSDB from an existing HBase cluster running an older version of HBase to a new cluster with newer hardware running the latest stable version of HBase.
We wanted a way to have some historical data in the new cluster before we switched to it. Within Tumblr we have a variety of applications generating these metrics, and it was not practical for us to change all of them to double write this data. Instead, we chose to replace the standard OpenTSDB listeners with a proxy that would do this double writing for us. While we could have used HBase copy table or written our own tool to backfill historical data from the old cluster, double writing for an initial period allowed us to avoid adding load to our existing cluster. This strategy also allowed us to move queries for recent data to the new cluster earlier than the full cutover.
The tsd_proxy is written in Clojure and relies heavily on Lamina and Aleph, which in turn build on top of Netty. We have been using this in our production infrastructure for over two months now while sustaining writes at or above 175k/s (across the cluster), and it has been working well for us. We are open sourcing this proxy in the hope that others might find a use for it as well.
The tsd_proxy listens on a configurable port and can forward the incoming data stream to multiple end points. It can filter the incoming stream and reject data points that don't match a (configurable) set of regular expressions. It can also queue the incoming stream and re-attempt delivery if one of the end points is down, and the queue size can be limited so you don't blow through your heap. The README has some more information on how to set this up.
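The filtering and bounded queueing described above can be sketched in a few lines. This is not the tsd_proxy code (which is Clojure); it is a minimal Python illustration. The pattern list, the `BoundedBuffer` class and the choice to drop new lines when the buffer is full are all assumptions made for the example. The only thing taken as given is the OpenTSDB line format, `put <metric> <timestamp> <value> <tag=value> ...`.

```python
import re
from collections import deque

# Hypothetical allow-list; a real proxy would read its patterns from config.
ALLOWED = [re.compile(p) for p in (r"^put sys\.", r"^put app\.")]


def accepted(line, patterns=ALLOWED):
    """Keep a line only if it matches at least one allowed pattern."""
    return any(p.search(line) for p in patterns)


class BoundedBuffer:
    """Holds lines for an end point that is down, up to a fixed limit."""

    def __init__(self, limit):
        self.limit = limit
        self.lines = deque()
        self.dropped = 0

    def offer(self, line):
        # Assumption: when full, drop the new line rather than grow the heap.
        if len(self.lines) >= self.limit:
            self.dropped += 1
            return False
        self.lines.append(line)
        return True

    def drain(self, send):
        """Retry delivery in arrival order; stop at the first failure."""
        while self.lines:
            if not send(self.lines[0]):
                break
            self.lines.popleft()
```

A caller would run `accepted(line)` on each incoming line, try to send it to each end point, and `offer` it to that end point's buffer when the send fails. Calling `drain` periodically re-attempts delivery once the end point is back.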
Friday, March 2, 2012
This post is an expansion of my talk at a local HBase meetup. I am going to go into a little more detail on our HBase setup and cluster automation and will hopefully give you ideas on how to build/manage your HBase infrastructure.
- SuperMicro boxes
- Ubuntu Lucid running backported kernels
- 48 GB RAM (No swap)
- Six SATA 2T - Hitachi Deskstar 7K, 64MB cache
- Two quad core - Intel Xeon CPU L5630 - 2.13GHz
- Each of the machines uses a single gigabit uplink
Directory layout
All of our Hadoop/HBase processes run as the hadoop user. The configs for Hadoop and HBase are maintained in git and are distributed to the servers via Puppet. These are synced to the ~hadoop/hadoop_conf and ~hadoop/hbase_conf directories on the servers. One of our goals is to stay as close to the upstream release as possible, so we use the bits from the packaged binary builds directly. When we get new builds, the packaged binaries are expanded directly into the corresponding <product>-<rel>-<ver> directories. At any given time, the active build is symlinked to the corresponding product directory. We then symlink the <prod>/conf directories to the corresponding ~hadoop/<prod>_conf directories synced via Puppet. This is how the directory listing looks.
    ~$ ls -l | sed -e 's/ \(.*[0-9]\) / /'
    total 52
    lrwxrwxrwx hadoop -> hadoop-0.20.2-cdh3u3
    drwxr-xr-x hadoop-0.20.2-cdh3u3
    drwxr-xr-x hadoop_conf
    lrwxrwxrwx hbase -> /home/hadoop/hbase-0.90.5-a9d4c8d
    drwxr-xr-x hbase-0.90.5-a9d4c8d
    drwxr-xr-x hbase-0.90.5-fb2b8ca
    drwxr-xr-x hbase_conf
    drwxr-xr-x run
    ~$
    ~$ ls -l hbase-0.90.5-a9d4c8d | sed -e 's/ \(.*[0-9]\) / /'
    total 3784
    drwxr-xr-x bin
    -rw-r--r-- CHANGES.txt
    lrwxrwxrwx conf -> /home/hadoop/hbase_conf/
    -rwxr-xr-x hbase-0.90.5.jar
    -rwxr-xr-x hbase-0.90.5-tests.jar
    lrwxrwxrwx hbase.jar -> hbase-0.90.5.jar
    drwxr-xr-x hbase-webapps
    drwxr-xr-x lib
    -rw-r--r-- LICENSE.txt
    -rw-r--r-- NOTICE.txt
    -rw-r--r-- pom.xml
    -rw-r--r-- README.txt
    drwxr-xr-x src
    ~$
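The symlink layout above could be produced by something like the following sketch. It is an illustration only, not the code from our fabfile: the function name `activate_release`, the `conf.dist` rename of a packaged conf directory and the temporary-link swap are assumptions made for the example.

```python
import os


def activate_release(home, product, release_dir):
    """Point <home>/<product> at release_dir and link its conf directory."""
    release_path = os.path.join(home, release_dir)
    conf_src = os.path.join(home, product + "_conf")
    conf_link = os.path.join(release_path, "conf")

    # Assumption: if the tarball shipped a real conf directory, set it aside.
    if os.path.isdir(conf_link) and not os.path.islink(conf_link):
        os.rename(conf_link, conf_link + ".dist")
    if not os.path.lexists(conf_link):
        os.symlink(conf_src, conf_link)

    # Build the new link under a temporary name, then rename it over the
    # old one so the product link never goes missing during the switch.
    active_link = os.path.join(home, product)
    tmp_link = active_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, active_link)
    return active_link
```

For example, `activate_release("/home/hadoop", "hbase", "hbase-0.90.5-a9d4c8d")` would leave `hbase` pointing at that build and `hbase-0.90.5-a9d4c8d/conf` pointing at `/home/hadoop/hbase_conf`. Rolling back is the same call with the previous build directory.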
Deployment automation
One of our goals is to deploy new HBase releases with zero downtime. We use Fabric to automate almost all of this process, and it is currently mostly hands-off. Some parts still need manual intervention now and then, but it usually works pretty well. When we get a new HBase build to be deployed, the deployment step looks like this.
    fab prep_release:/home/stack/hbase-0.90.3-9fbaa99.tar.gz disable_balancer deploy_hbase:/home/stack/hbase-0.90.3-9fbaa99.tar.gz enable_balancer

This lays out the code (extracts it, makes the symlinks), pushes it to the regionserver machines and does a graceful restart of each node. After this, a restart of the HBase master is required to make it use the new code. This last step is currently manual.
A rolling restart of the cluster for new configs to take effect looks like this.
    fab -P -z 3 rolling_restart

With parallel Fabric (available since release 1.3), these rolling restarts bounce multiple nodes at once (the spread is controlled by the -z flag). The 0.92 HBase release introduces the notion of draining nodes: these are nodes that will not get any new regions. A node is marked as a draining node by creating an entry in ZooKeeper under the hbase_root/draining znode with the format "name,port,startcode", just like the regionserver entries under the hbase_root/rs znode. This makes it easier to gracefully drain multiple regionservers at the same time. This command puts regionservers "foo" and "bar" into the draining state.
    fab add_rs_to_draining:hosts="foo,bar"

Here is the list of tasks we can handle with our current Fabric setup.
    ~$ fab -l
    Available commands:

        add_rs_to_draining        Put the regionserver into a draining state.
        assert_configs            Check that all the region servers have the sam...
        assert_regions            Check that all the regions have been vacated f...
        assert_release            Check the release running on the server.
        clear_all_draining_nodes  Remove all servers under the Zookeeper /draini...
        clear_rs_from_draining    Remove the regionserver from the draining stat...
        deploy_hbase              Deploy the new hbase release to the regionserv...
        disable_balancer          Disable the balancer.
        dist_hadoop               Rsyncs the hadoop release to the region server...
        dist_hbase                Rsyncs the hbase release to the region servers...
        dist_release              Rsyncs the release to the region servers.
        enable_balancer           Balance regions and enable the balancer.
        hadoop_start              Start hadoop.
        hadoop_stop               Start hadoop.
        hbase_gstop               HBase graceful stop.
        hbase_start               Start hbase.
        hbase_stop                Stop hbase (WARNING: does not unload regions).
        jmx_kill                  Kill JMX collectors.
        list_draining_nodes       List all servers under the Zookeeper /draining...
        prep_release              Copies the tar file from face and extract it.
        reboot_server             Reboot the box.
        region_count              Returns a count of the number of regions in th...
        rolling_reboot            Rolling reboot of the whole cluster.
        rolling_restart           Rolling restart of the whole cluster.
        sync_puppet               Sync puppet on the box.
        thrift_restart            Re-start thrift.
        thrift_start              Start thrift.
        thrift_stop               Stop thrift.
        unload_regions            Un-load HBase regions on the server so it can ...
    ~$
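To make the draining znode convention concrete, here is a small sketch of how the "name,port,startcode" entries map to paths under the draining znode. It only builds strings and does not talk to ZooKeeper. The helper names, the `/hbase` root and the sample startcodes are made up for the example and are not taken from the fabfile.

```python
def parse_server_name(entry):
    """Split a 'name,port,startcode' entry into its three parts."""
    host, port, startcode = entry.split(",")
    return host, int(port), int(startcode)


def draining_znode(hbase_root, host, port, startcode):
    """Build the znode path that marks one regionserver as draining."""
    root = hbase_root.rstrip("/")
    return "{0}/draining/{1},{2},{3}".format(root, host, port, startcode)


def draining_paths(hbase_root, rs_entries, hosts):
    """Pick the live entries (children of <hbase_root>/rs) that belong to
    the given hosts and return the draining paths to create for them."""
    wanted = set(hosts)
    paths = []
    for entry in rs_entries:
        host, port, startcode = parse_server_name(entry)
        if host in wanted:
            paths.append(draining_znode(hbase_root, host, port, startcode))
    return paths
```

Given the children of the rs znode, `draining_paths("/hbase", children, ["foo", "bar"])` returns the paths a ZooKeeper client would need to create. Reading the entries from the rs znode first matters because the startcode changes every time a regionserver restarts.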
Additional Notes
The fabfile and other scripts to run all of this are on github.
- The latest version of the fabfile is meant for the 0.92 release. For older HBase releases look at this commit.
- The older fabfile is meant to be run on one node at a time in serial, so the -P flag (parallel mode) will not work correctly.
- I stole zkclient.py from here and added command line arguments to make it do some simple tasks I needed for manipulating ZooKeeper nodes. You will need the python-zookeeper libraries to make it work. I could not get the ZooKeeper cli_mt client to work correctly, which would have made zkclient.py unnecessary.
- Ideally, I would have liked to use the python-zookeeper libraries directly from Fabric. However, the python-zookeeper libraries need threading support and that doesn't play nice with Fabric's parallel mode. It works fine in the serial mode.
- It is important to drain HBase regions slowly when restarting regionservers. Otherwise, multiple regions go offline simultaneously as they are re-assigned to other nodes. Depending on your usage patterns, this might not be desirable.
- The region_mover.rb script is an extension of the standard region_mover.rb that ships with stock HBase. I hacked it a little to add slow balancing support and automatic region balancing while unloading regions from a server. This version is also aware of draining servers and avoids them during region assignment and balancing. Again, look for the older commit if you want to use this with 0.90.x HBase releases. The latest version is for the 0.92 release.
- We use Linux cgroups to contain the TaskTracker processes, so be aware of that if you plan on using this to manage your Hadoop cluster (remove that stuff if you don't need cgroups).
- We grant the hadoop user sudo permissions to run puppet on our cluster nodes. You will need to do something similar if you want to manage configuration through Puppet/Fabric. Your life will be a lot easier if you set up no-password ssh logins from the master (or wherever you run fab from) to your regionserver nodes.
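The note above about draining regions slowly boils down to moving a few regions at a time and pausing between batches. The sketch below shows only that pacing idea in Python. It is not how region_mover.rb is written, and `move_region` is a hypothetical callable standing in for whatever actually reassigns a region.

```python
import time


def unload_slowly(regions, move_region, batch_size=1, pause_seconds=5.0,
                  sleep=time.sleep):
    """Move regions off a server in small batches, pausing between batches
    so that only a few regions are in transition at any moment."""
    moved = 0
    for start in range(0, len(regions), batch_size):
        for region in regions[start:start + batch_size]:
            move_region(region)  # hypothetical: reassigns a single region
            moved += 1
        # No need to wait after the final batch.
        if start + batch_size < len(regions):
            sleep(pause_seconds)
    return moved
```

A smaller `batch_size` and a longer `pause_seconds` keep fewer regions offline at once, at the cost of a longer restart. The right values depend on your usage patterns.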
Hope this helps other folks with their HBase deployments.
Tuesday, January 31, 2012
- sudo apt-get install g++ git automake libtool libltdl-dev python-dev swig python-setuptools (not sure libltdl-dev is needed, but libtool suggested it, so I figured, why not!)
- git clone git://git.apache.org/mesos.git
- cd mesos
- ./configure --with-webui \
- make -j 2 ... ... at some point this will fail while building zookeeper.
- vi third_party/zookeeper-3.3.1/src/c/configure.ac
- comment out lines 25 to 44
- make -j 2 ... ... This will again fail with some libtool version compatibility problems. I don't know the auto* tools well enough to understand why. Posts on the mesos mailing list suggest autoreconf. That worked for me.
- autoreconf -fi
- make -j 2
Tuesday, September 13, 2011
I have been reading a parenting book by John Medina - Brain Rules for Baby (highly recommended), trying to morph into a good parent and everything... anyhoo.. I found this little nugget in the book. The author is describing some observations by two sociologists, Edward Jones and Richard Nisbett - "People view their own behaviors as originating from amendable, situational constraints, but they view others behaviours as originating from inherent, immutable personality traits". Thinking back to a lot of my everyday work, home and family experiences - this explains things so well.. every time I feel like I am right, or every time I think things ought to be done differently, maybe it's just my asymmetric brain failing to see the other side of the problem.
After reading that.. I couldn't help but admire these sociologists - reducing everything down to predictable, simple facts and observations.
anyways.. thought I'd share. Next time, a little more putting yourself in the other person's shoes and a little less of the "jump to conclusions" mat!
Wednesday, February 16, 2011
In my new role, I will be working as an operations engineer with a focus on developing tools to better understand and monitor Hadoop environments (at least initially). The goal would be to open source these tools and contribute them to the community.
Hope to see you on the other side..
Wednesday, April 7, 2010
Other tools like phpldapadmin and ldapsearch have their uses, but this is the most usable LDAP browsing and editing tool I have found so far. Figured someone else out there might find a use for it.. Thanks Mahlon E. Smith for shelldap!