Friday, December 16, 2011

Long running HBase clients

The default HBase client was not designed to be used in long running, multithreaded Application Servers.

An HTable object is not thread safe, so the application either has to cache HTables (possibly in an HTablePool) or create a new one for each use.
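To make the caching option concrete, here is a minimal stdlib-only sketch of the pooling idea. It does not use HTablePool's real API: `FakeTable` is a hypothetical stand-in for a non-thread-safe HTable, and `SimpleTablePool` is a hypothetical pool that hands each table to one thread at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a non-thread-safe HTable: writes go into an
// unsynchronized buffer, so an instance must only be used by one thread at a time.
class FakeTable {
    final String name;
    final List<String> writeBuffer = new ArrayList<>();

    FakeTable(String name) { this.name = name; }

    void put(String row) { writeBuffer.add(row); }
}

// Hypothetical pool in the spirit of HTablePool (not its real API): a thread
// borrows a table, uses it exclusively, then hands it back for reuse.
class SimpleTablePool {
    private final ConcurrentLinkedQueue<FakeTable> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<FakeTable> factory;
    final AtomicInteger created = new AtomicInteger();

    SimpleTablePool(Supplier<FakeTable> factory) { this.factory = factory; }

    FakeTable borrow() {
        FakeTable t = idle.poll();
        if (t == null) {               // nothing cached: pay the creation cost
            created.incrementAndGet();
            t = factory.get();
        }
        return t;
    }

    void giveBack(FakeTable t) { idle.offer(t); }
}

class PoolDemo {
    public static void main(String[] args) {
        SimpleTablePool pool = new SimpleTablePool(() -> new FakeTable("t1"));
        FakeTable t = pool.borrow();   // exclusive to this thread until returned
        t.put("row1");
        pool.giveBack(t);
        System.out.println("tables created: " + pool.created.get());
    }
}
```

A production pool would also cap its size and flush buffered writes before reuse; this sketch skips both to keep the borrow/return pattern visible.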

HTables are not lightweight objects. Under the hood an ExecutorService is created for each HTable to parallelize requests to multiple regionservers.
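The cost is easier to see in a small model. This is a hedged, stdlib-only sketch; `HeavyTable` is a hypothetical class, not HBase code. Like the HTable described above, every instance builds its own ExecutorService and uses it to fan a request out to several (simulated) regionservers, so every short-lived table object also means a thread pool created and torn down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical "heavyweight" table: every instance creates its own
// ExecutorService, which must be shut down again when the table is closed.
class HeavyTable implements AutoCloseable {
    static final AtomicInteger POOLS_CREATED = new AtomicInteger();
    private final ExecutorService pool;

    HeavyTable(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
        POOLS_CREATED.incrementAndGet();
    }

    // Sends one simulated request per regionserver in parallel and
    // returns the responses in the order the servers were given.
    List<String> multiGet(List<String> servers) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String server : servers) {
            futures.add(CompletableFuture.supplyAsync(() -> "result-from-" + server, pool));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join());
        }
        return results;
    }

    @Override
    public void close() { pool.shutdown(); }
}

class HeavyTableDemo {
    public static void main(String[] args) {
        // Ten short-lived table objects mean ten thread pools created and torn down.
        for (int i = 0; i < 10; i++) {
            try (HeavyTable t = new HeavyTable(4)) {
                t.multiGet(List.of("rs1", "rs2", "rs3"));
            }
        }
        System.out.println("thread pools created: " + HeavyTable.POOLS_CREATED.get());
    }
}
```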

To make matters even more complicated, the connection to an HBase cluster is managed by an HConnectionImplementation, which maintains a pool of TCP connections per regionserver. Each TCP connection managed by an HConnectionImplementation has its own thread.
HConnections are created and cached on demand on behalf of HTables.
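A rough model of the connection layer, again hedged and stdlib-only: `ServerConnection` and `ConnectionCache` are hypothetical classes, not the actual HConnectionImplementation internals, and the pool is simplified to one connection per regionserver. Connections are created lazily and cached per server, and each one owns a dedicated thread.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for one TCP connection to a regionserver. Each
// instance owns a dedicated daemon thread; here it just drains a queue
// as a placeholder for the connection's real I/O loop.
class ServerConnection {
    final String server;
    final Thread ioThread;
    private final BlockingQueue<String> outbox = new LinkedBlockingQueue<>();

    ServerConnection(String server) {
        this.server = server;
        this.ioThread = new Thread(() -> {
            try {
                while (true) {
                    outbox.take();     // placeholder for socket I/O
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "conn-" + server);
        ioThread.setDaemon(true);
        ioThread.start();
    }

    void send(String request) { outbox.offer(request); }
}

// Hypothetical connection cache: lazily creates exactly one
// ServerConnection (and therefore one thread) per regionserver.
class ConnectionCache {
    private final Map<String, ServerConnection> byServer = new ConcurrentHashMap<>();

    ServerConnection get(String server) {
        return byServer.computeIfAbsent(server, ServerConnection::new);
    }

    int connectionCount() { return byServer.size(); }
}

class ConnectionDemo {
    public static void main(String[] args) {
        ConnectionCache cache = new ConnectionCache();
        cache.get("rs1").send("get row1");
        cache.get("rs1").send("get row2");   // reuses the cached connection
        cache.get("rs2").send("get row3");
        System.out.println(cache.connectionCount() + " connections, one thread each");
    }
}
```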

So when you use HTablePool you get a pool of HTables, each maintaining its own pool of threads along with a (potentially shared) HConnection, which in turn maintains a pool of TCP connections, each with its own thread.

This setup quickly becomes inscrutable.

In HBASE-4805 I propose a different way of looking at this setup.

Instead of creating HTables (and, implicitly, thread pools and HConnections), you can now optionally create and manage your own HConnections to the HBase cluster, and create very lightweight HTable objects whenever a thread needs one for a few operations.

The Application Server would then create its HConnection(s) and ExecutorService(s) ahead of time and reuse them across many HTables:

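// once, when the Application Server starts
Configuration conf = HBaseConfiguration.create();
HConnection conn = HConnectionManager.createConnection(conf);
ExecutorService threadPool = ...

...

// in any thread that needs to perform a few operations
HTable t = new HTable(Bytes.toBytes("<tablename>"), conn, threadPool);
t.put(...);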
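Running the snippet above needs a live cluster, so here is a hedged, stdlib-only sketch of the same ownership model. `SharedConnection` and `LightTable` are hypothetical stand-ins for HConnection and HTable: the application creates the connection and the thread pool once, each table object merely holds references to them, and only the application shuts the pool down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for an HConnection created once at startup.
class SharedConnection {
    String send(String server, String request) {
        return server + ":" + request;   // placeholder for a real RPC
    }
}

// Hypothetical lightweight table: it owns nothing and only borrows the
// application-managed connection and pool, so construction is cheap.
class LightTable {
    private final String name;
    private final SharedConnection conn;
    private final ExecutorService pool;

    LightTable(String name, SharedConnection conn, ExecutorService pool) {
        this.name = name;
        this.conn = conn;
        this.pool = pool;
    }

    // Fans one request per server out over the shared pool.
    List<String> multiGet(List<String> servers) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String server : servers) {
            futures.add(CompletableFuture.supplyAsync(() -> conn.send(server, "get " + name), pool));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> f : futures) {
            results.add(f.join());
        }
        return results;
    }
    // Deliberately no close(): whoever created the connection and the
    // pool is responsible for releasing them.
}

class SharedResourcesDemo {
    public static void main(String[] args) {
        // Created once, e.g. when the Application Server starts.
        SharedConnection conn = new SharedConnection();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Each unit of work gets its own cheap table object.
            for (int i = 0; i < 3; i++) {
                LightTable t = new LightTable("users", conn, pool);
                System.out.println(t.multiGet(List.of("rs1", "rs2")));
            }
        } finally {
            pool.shutdown();   // the application, not the table, releases the threads
        }
    }
}
```

Because a `LightTable` is just three references, creating one per thread (or per request) costs about as much as any small object allocation, while the expensive resources stay shared.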

A quick test I performed suggests that creating an HTable this way is actually cheaper than retrieving one from an HTablePool.

Eventually the ExecutorService could be managed by the HConnection as well, and HTables would then simply be retrieved with HConnection.getHTable(<tableName>). But this is a start for those of us who use an HBase client inside an Application Server.
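The direction suggested above can be modeled in a few lines. This is a hedged sketch, not an HBase API: `ManagedConnection` is a hypothetical connection that owns its thread pool, and its `getTable` method is a hypothetical analogue of the proposed HConnection.getHTable(<tableName>), returning a cheap `ConnTable` bound to that pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical connection that owns its thread pool, so the application
// manages a single object instead of a connection plus an executor.
class ManagedConnection implements AutoCloseable {
    private final ExecutorService pool;

    ManagedConnection(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Hypothetical analogue of HConnection.getHTable(<tableName>): returns a
    // fresh, cheap table view bound to this connection's pool.
    ConnTable getTable(String name) {
        return new ConnTable(name, this);
    }

    ExecutorService pool() { return pool; }

    boolean isClosed() { return pool.isShutdown(); }

    @Override
    public void close() { pool.shutdown(); }   // one place to release the threads
}

// Hypothetical table view: holds only a name and a connection reference.
class ConnTable {
    final String name;
    private final ManagedConnection conn;

    ConnTable(String name, ManagedConnection conn) {
        this.name = name;
        this.conn = conn;
    }

    // Runs a simulated request on the connection-owned pool.
    CompletableFuture<String> get(String row) {
        return CompletableFuture.supplyAsync(() -> name + "/" + row, conn.pool());
    }
}

class ManagedConnectionDemo {
    public static void main(String[] args) {
        try (ManagedConnection conn = new ManagedConnection(4)) {
            ConnTable users = conn.getTable("users");
            System.out.println(users.get("row1").join());
        }
    }
}
```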
