Hector API v2

Update: This post is now closed for comments. If you have any questions, please feel free to subscribe and ask at hector-users@googlegroups.com

Update: The API has changed a bit since this post was written, mainly package renaming and interface extraction. To get a consistent snapshot of the API usage, have a look at github at the release tag, for example for 0.6.*:
http://github.com/rantav/hector/blob/0.6.0-17/src/main/java/me/prettyprint/cassandra/examples/ExampleDaoV2.java
and http://github.com/rantav/hector/blob/0.6.0-17/src/test/java/me/prettyprint/hector/api/ApiV2SystemTest.java

and for 0.7.*
http://github.com/rantav/hector/blob/0.7.0-18/src/main/java/me/prettyprint/cassandra/examples/ExampleDaoV2.java
and http://github.com/rantav/hector/blob/0.7.0-18/src/test/java/me/prettyprint/hector/api/ApiV2SystemTest.java

Hector is a high-level java client for the cassandra database. It was first released some six months ago and has since been dubbed the de-facto java client for cassandra.

There is a large community of users and companies who run their production high scale systems based on hector.

The main benefits hector adds to the existing thrift based interface are:

Since the time hector was first written, first by me, then with the good help of other community members (of note is Nathan McCall) it’s gained popularity even in the face of “competition” so to speak, in an open source manner.

However, one thing that I’ve always felt we can improve is the API. When writing the first version of hector the premise was that users are comfortable with the current level of the thrift API, so hector should maintain an API similar in spirit. Hector may make things more type safe (such as when replacing all the ColumnOrSuperColumn types with specific Column or SuperColumn typed methods) but in general I figured I should introduce as few new concepts as possible so that new users who are already familiar with the thrift way can easily learn the new library.

I was wrong.

As it turns out, users don’t learn the thrift API and then go use hector. Most users tend to just skip the thrift API and start with hector. Fair enough. But then I’m asked why I made such a funny API… They are right, users of hector should not suffer from the limitations of the thrift API.

Add to that the complexity of dealing with failover, which clients should not need to care about at the API level (but in the v1 API they did), plus some complex anonymous classes and the Command pattern that users needed to understand (if only we could have closures in java…), and we get a less than ideal API.

So the conclusion was clear: an API v2 was needed.

The new API uses the same proven hector implementation underneath but exposes a cleaner interface to the users. It also makes meta operations such as cluster management explicit.

We’re all coders, so enough talkin, let’s see some code…

// Create a cluster
Cluster cluster = HFactory.getOrCreateCluster("MyCluster", "cassandra1:9160");
// Choose a keyspace
KeyspaceOperator keyspaceOperator = HFactory.createKeyspaceOperator("Keyspace1", cluster);
// create a string extractor. I'll explain that later
StringExtractor se = StringExtractor.get();
// insert value
Mutator m = HFactory.createMutator(keyspaceOperator);
m.insert("key1", "ColumnFamily1", HFactory.createColumn("column1", "value1", se, se));

// Now read a value
// Create a query
ColumnQuery<String, String> q = HFactory.createColumnQuery(keyspaceOperator, se, se);
// set key, name, cf and execute
Result<HColumn<String, String>> r = q.setKey("key1").
        setName("column1").
        setColumnFamily("ColumnFamily1").
        execute();
// read value from the result
HColumn<String, String> c = r.get();
String value = c.getValue();
System.out.println(value);

Clusters are identified by their name, in this case MyCluster. A call to getOrCreateCluster(“MyCluster”, hostport) would create a new cluster if it doesn’t exist, but return a previously created one if it does. The Cluster object represents the client’s view of the cassandra cluster. A program may hold several Cluster instances, although typically one is sufficient.
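The get-or-create behaviour described above can be sketched without Hector at all. This is only an illustration of the idea, not Hector's implementation: ClusterHandle and ClusterRegistry are hypothetical names, and the sketch assumes the host/port passed on a later call for an existing name is simply ignored.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a client-side cluster handle; not Hector's Cluster class.
final class ClusterHandle {
    final String name;
    final String hostPort;

    ClusterHandle(String name, String hostPort) {
        this.name = name;
        this.hostPort = hostPort;
    }
}

// Hypothetical registry illustrating get-or-create semantics keyed by cluster name.
final class ClusterRegistry {
    private static final Map<String, ClusterHandle> CLUSTERS = new ConcurrentHashMap<>();

    private ClusterRegistry() {}

    // Returns the handle registered under this name, creating it on first use.
    // In this sketch the hostPort argument only matters on the first call for a name.
    static ClusterHandle getOrCreate(String name, String hostPort) {
        return CLUSTERS.computeIfAbsent(name, n -> new ClusterHandle(n, hostPort));
    }
}

class ClusterRegistryDemo {
    public static void main(String[] args) {
        ClusterHandle first = ClusterRegistry.getOrCreate("MyCluster", "cassandra1:9160");
        ClusterHandle second = ClusterRegistry.getOrCreate("MyCluster", "cassandra1:9160");
        // Both calls hand back the same object because the name is the identity.
        System.out.println(first == second);
    }
}
```

Keying on the name alone is what lets different parts of a program ask for "MyCluster" independently and still share one instance.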

A KeyspaceOperator is the object used to make operations (reads, writes) on specific keyspaces. A program can create many of those.

The StringExtractor is interesting… Hector provides type safety for column names and column values. Recall that in cassandra column names are byte[] and column values are byte[] too. So usually the programmer is required to translate those bytes back and forth to actual java objects. When designing the API we wanted to simplify this work and provide a type-safe API. Note that in the next lines we use HColumn<String, String>, which is a column whose name is of type String and whose value is also of type String, as well as ColumnQuery<String, String>, which is a query that returns columns with String names and String values. We could also have chosen to have columns with String names but Long values, HColumn<String,Long>, where it makes sense for the application.

To provide this type safety hector defines an Extractor<T> interface.

/**
 * Extracts a type T from the given bytes, or vice versa.
 *
 * In cassandra column names and column values (and starting with 0.7.0 row keys) are all byte[].
 * To allow type safe conversion in java and keep all conversion code in one place we define the
 * Extractor interface.
 * Implementors of the interface define type conversion according to their domains. A predefined
 * set of common extractors can be found in the extractors package, for example
 * {@link StringExtractor}.
 *
 * @author Ran Tavory
 *
 * @param <T> The type to which data extraction should work.
 */
public interface Extractor<T> {

  /**
   * Extract bytes from the obj of type T
   * @param obj
   * @return
   */
  public byte[] toBytes(T obj);

  /**
   * Extract an object of type T from the bytes.
   * @param bytes
   * @return
   */
  public T fromBytes(byte[] bytes);
}

Hector provides a set of default and commonly used implementations of extractors, such as StringExtractor and LongExtractor (see package me.prettyprint.cassandra.extractors). Users of Hector are expected to implement their own application-specific extractors. The interface is straightforward: you only need to implement two methods which convert your type to/from byte[]. Extractors are purely functional, which means that they don’t have side effects and have no state. Using extractors, the API adds simplicity, separation of concerns and type safety.
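As a rough idea of what an application-specific extractor might look like, here is a sketch for java.util.UUID. The Extractor interface is re-declared locally (mirroring the two methods shown above) so the example compiles without Hector on the classpath; UuidExtractor and the 16-byte big-endian layout are assumptions made for this illustration, not something Hector ships.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Local copy of the two-method interface from the post, so this sketch is self-contained.
interface Extractor<T> {
    byte[] toBytes(T obj);
    T fromBytes(byte[] bytes);
}

// Hypothetical application-specific extractor: a UUID stored as 16 big-endian bytes.
final class UuidExtractor implements Extractor<UUID> {

    // Stateless, so a single shared instance is enough.
    private static final UuidExtractor INSTANCE = new UuidExtractor();

    private UuidExtractor() {}

    static UuidExtractor get() {
        return INSTANCE;
    }

    @Override
    public byte[] toBytes(UUID obj) {
        return ByteBuffer.allocate(16)
                .putLong(obj.getMostSignificantBits())
                .putLong(obj.getLeastSignificantBits())
                .array();
    }

    @Override
    public UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long most = buffer.getLong();
        long least = buffer.getLong();
        return new UUID(most, least);
    }
}

class UuidExtractorDemo {
    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        byte[] wire = UuidExtractor.get().toBytes(original);
        UUID roundTripped = UuidExtractor.get().fromBytes(wire);
        // A correct extractor returns an equal object after a round trip.
        System.out.println(original.equals(roundTripped));
    }
}
```

Because the extractor holds no state, sharing one instance across threads is safe, which is presumably why the built-in ones are obtained through a static get().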

Next we create a mutator and insert a value:

Mutator m = HFactory.createMutator(keyspaceOperator);
m.insert("key1", "ColumnFamily1",
    HFactory.createColumn("column1", "value1", se, se));

The class HFactory (“Hector Factory”) has a set of many useful static factory methods. I usually just import static me.prettyprint.cassandra.model.HFactory.* and get all its public methods as short method names, so the previous lines can be written as:

Mutator m = createMutator(keyspaceOperator);
m.insert("key1", "ColumnFamily1", createColumn("column1", "value1", se, se));

Note again the use of the StringExtractor (se) when creating a column. The column gets a String name and a String value so it needs a StringExtractor to assist it in serializing and deserializing the strings to byte[]. As a matter of fact, we’ve noticed that it’s so common for us to use columns of type HColumn<String,String> that we decided to add a utility factory method: createStringColumn(name, value), which is a bit shorter than createColumn(name, value, nameExtractor, valueExtractor). You may create your own convenience factories as well and are welcome to contribute them back to hector.
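To make the relationship between the general factory and a convenience factory concrete, here is a small self-contained sketch. TypedColumn, Columns and Utf8Extractor are hypothetical names invented for this example and are not Hector classes; only the method names createColumn and createStringColumn are borrowed from the text above.

```java
import java.nio.charset.StandardCharsets;

// Local copy of the two-method interface from the post, so this sketch is self-contained.
interface Extractor<T> {
    byte[] toBytes(T obj);
    T fromBytes(byte[] bytes);
}

// Illustrative UTF-8 string extractor (assumption: Hector's StringExtractor plays this role).
final class Utf8Extractor implements Extractor<String> {
    @Override
    public byte[] toBytes(String obj) {
        return obj.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}

// Hypothetical typed column: keeps the typed name/value plus the extractors
// needed to turn them into byte[] when they go over the wire.
final class TypedColumn<N, V> {
    private final N name;
    private final V value;
    private final Extractor<N> nameExtractor;
    private final Extractor<V> valueExtractor;

    TypedColumn(N name, V value, Extractor<N> nameExtractor, Extractor<V> valueExtractor) {
        this.name = name;
        this.value = value;
        this.nameExtractor = nameExtractor;
        this.valueExtractor = valueExtractor;
    }

    N getName() { return name; }
    V getValue() { return value; }
    byte[] nameBytes() { return nameExtractor.toBytes(name); }
    byte[] valueBytes() { return valueExtractor.toBytes(value); }
}

// Hypothetical factory class showing a general factory and a shortcut built on it.
final class Columns {
    private static final Extractor<String> UTF8 = new Utf8Extractor();

    private Columns() {}

    // General form: the caller supplies both extractors.
    static <N, V> TypedColumn<N, V> createColumn(
            N name, V value, Extractor<N> nameExtractor, Extractor<V> valueExtractor) {
        return new TypedColumn<>(name, value, nameExtractor, valueExtractor);
    }

    // Convenience form for the common String/String case.
    static TypedColumn<String, String> createStringColumn(String name, String value) {
        return createColumn(name, value, UTF8, UTF8);
    }
}

class ColumnsDemo {
    public static void main(String[] args) {
        TypedColumn<String, String> column = Columns.createStringColumn("column1", "value1");
        System.out.println(column.getName() + " = " + column.getValue());
    }
}
```

The shortcut adds no new behaviour; it only fixes the two extractor arguments, which is the pattern to follow for your own convenience factories.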

Next, on to reading the value. We’d like to read a simple column value given by its key (key1) and column name (column1).

We create a ColumnQuery<N,V> where N is the type of the column name and V is the type of the column value (in this case, again it’s String, String)

ColumnQuery<String, String> q = createColumnQuery(keyspaceOperator, se, se);

Next we set the query attributes – key, column and column family, and execute it.

q.setKey("key1");
q.setName("column1");
q.setColumnFamily("ColumnFamily1");
Result<HColumn<String, String>> r = q.execute();

The 4 lines above can also be written more concisely as a single statement thanks to method chaining.

Result<HColumn<String, String>> r = q.setKey("key1").
        setName("column1").
        setColumnFamily("ColumnFamily1").
        execute();

What we see here is another feature of the API called method chaining. By convention all setters return a pointer to this so that it’s easy to setX(x).setY(y).setZ(z).
You can even call execute() on the same line, e.g. Result<T> r = q.setX(x).setY(y).execute().
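The chaining convention itself is easy to reproduce in your own classes. Below is a minimal sketch, assuming a hypothetical ChainedQuery backed by an in-memory map rather than a real cassandra connection; the setter names mirror the ones above, everything else is made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory query; it only illustrates setters that return `this`.
final class ChainedQuery {
    private final Map<String, String> store;
    private String key;
    private String name;
    private String columnFamily;

    ChainedQuery(Map<String, String> store) {
        this.store = store;
    }

    // Each setter records its argument and returns this, which enables chaining.
    ChainedQuery setKey(String key) {
        this.key = key;
        return this;
    }

    ChainedQuery setName(String name) {
        this.name = name;
        return this;
    }

    ChainedQuery setColumnFamily(String columnFamily) {
        this.columnFamily = columnFamily;
        return this;
    }

    // Looks up "columnFamily/key/name" in the backing map; returns null when absent.
    String execute() {
        return store.get(columnFamily + "/" + key + "/" + name);
    }
}

class ChainedQueryDemo {
    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("ColumnFamily1/key1/column1", "value1");

        String value = new ChainedQuery(store)
                .setKey("key1")
                .setName("column1")
                .setColumnFamily("ColumnFamily1")
                .execute();
        System.out.println(value);
    }
}
```

The order of the setters does not matter here, since nothing is read until execute() is called.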

execute() returns a typed Result<T> object. Note again the type safety. In this case we have Result<HColumn<String, String>> since the query is of type ColumnQuery<String, String>. In general we have ColumnQuery<N, V>.execute() => Result<HColumn<N, V>>

So far we’ve looked at the ColumnQuery<N,V>, but as a matter of fact there are many other types of queries. In this example we query a simple column, but the API defines all types of query use cases allowed by the thrift API, and all of them implement the Query<T> interface:

/**
 * The Query interface defines the common parts of all hector queries, such as {@link ColumnQuery}.
 *
 * The common usage pattern is to create a query, set the required query attributes and invoke
 * {@link Query#execute()} such as in the following example:
 *
 * Note that all query mutators, such as setName or setColumnFamily always return the Query object
 * so it's easy to write strings such as <code>q.setKey(x).setName(y).setColumnFamily(z).execute();</code>
 *
 * @author Ran Tavory
 *
 * @param <T> Result type. For example HColumn or HSuperColumn
 */
public interface Query<T> {
  <Q extends Query<T>> Q setColumnFamily(String cf);
  Result<T> execute();
}

There are ColumnQuery<N,V>, SuperColumnQuery<SN,N,V>, SliceQuery<N,V>, SuperSliceQuery<SN,N,V>, SubSliceQuery<SN,N,V>, RangeSlicesQuery<N,V> and more… All query objects are type safe, so they return only the type they should return and the compiler keeps you safe.

To read the value off of a result object just call Result.get(). Here again we provide type safety so in this case the result is of type HColumn<String,String>

// read value from the result
HColumn<String, String> c = r.get();

A Result, apart from holding the actual value, also has some nice metadata, such as getExecutionTimeMicro() and getQuery(). We plan to add more to that.
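One plausible way to build such a result object is to time the operation at the point where it is executed and wrap the value together with the measurement. The sketch below is an assumption about the general shape, not Hector's code: TimedResult, measure() and getQueryDescription() are hypothetical names, and only getExecutionTimeMicro() is borrowed from the text above.

```java
import java.util.function.Supplier;

// Hypothetical typed result: the value plus metadata about how it was obtained.
final class TimedResult<T> {
    private final T value;
    private final long executionTimeMicro;
    private final String queryDescription;

    private TimedResult(T value, long executionTimeMicro, String queryDescription) {
        this.value = value;
        this.executionTimeMicro = executionTimeMicro;
        this.queryDescription = queryDescription;
    }

    T get() { return value; }
    long getExecutionTimeMicro() { return executionTimeMicro; }
    String getQueryDescription() { return queryDescription; }

    // Runs the operation, measures the elapsed wall-clock time in microseconds,
    // and wraps the outcome together with a description of the query.
    static <T> TimedResult<T> measure(String queryDescription, Supplier<T> operation) {
        long start = System.nanoTime();
        T value = operation.get();
        long elapsedMicro = (System.nanoTime() - start) / 1_000;
        return new TimedResult<>(value, elapsedMicro, queryDescription);
    }
}

class TimedResultDemo {
    public static void main(String[] args) {
        TimedResult<String> r = TimedResult.measure("get key1/column1", () -> "value1");
        System.out.println(r.get() + " took " + r.getExecutionTimeMicro() + " microseconds");
    }
}
```

Keeping the metadata on the result, rather than on the query, means each execution carries its own timing even when the same query object is reused.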

Lastly we print out the string.

String value =  c.getValue();
System.out.println(value);

In this case value is of type String. If the column had been defined as HColumn<String, Long> then we’d have

Long value = c.getValue();

The new API has other nice additions such as an improved exception hierarchy (all exceptions extend HectorException which is a RuntimeException and there’s a translation b/w the thrift exception and the hector ones, see ExceptionTranslator), no dependency on thrift and more.
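The idea of translating checked, transport-level exceptions into an unchecked hierarchy can be sketched as follows. All class names here are hypothetical stand-ins (ClientException plays the role HectorException plays in the text), and the mapping rules are assumptions chosen for the example; standard java.io exceptions stand in for the thrift ones.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical unchecked base exception, playing the role HectorException plays in the post.
class ClientException extends RuntimeException {
    ClientException(Throwable cause) {
        super(cause);
    }
}

// Hypothetical subclass for transport-level failures.
class ClientTransportException extends ClientException {
    ClientTransportException(Throwable cause) {
        super(cause);
    }
}

// Hypothetical subclass for timeouts.
class ClientTimeoutException extends ClientException {
    ClientTimeoutException(Throwable cause) {
        super(cause);
    }
}

// Maps low-level checked exceptions onto the unchecked hierarchy, keeping the
// original exception as the cause so no information is lost.
final class Translator {
    private Translator() {}

    static ClientException translate(Exception original) {
        // SocketTimeoutException is itself an IOException, so it must be checked first.
        if (original instanceof SocketTimeoutException) {
            return new ClientTimeoutException(original);
        }
        if (original instanceof IOException) {
            return new ClientTransportException(original);
        }
        return new ClientException(original);
    }
}

class TranslatorDemo {
    public static void main(String[] args) {
        try {
            throw Translator.translate(new SocketTimeoutException("read timed out"));
        } catch (ClientException e) {
            // Callers may catch the base type, a subtype, or nothing at all.
            System.out.println(e.getClass().getSimpleName() + " caused by " + e.getCause());
        }
    }
}
```

Since everything extends RuntimeException, call sites that do not care about failures need no try-catch, while those that do can still catch a specific subtype.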

It’s important to note that the new API does not rely on thrift anymore, so users who want to use avro as their transport are able to do it without changing their implementation (after cassandra really supports avro, and we add that to hector).

Support for cassandra 0.7.0 is underway, and with the new API it should be relatively easy to add.

An extensive list of tests of the entire query/mutate API is available at ApiV2SystemTest

To sum up, here’s a short list of new concepts introduced by the v2 API and its main benefits

  • All previous functionality provided by hector remains. You still get connection pooling, JMX etc. The old API is still in place, untouched (except for some small exception hierarchy refinements) so if you have existing code already working with hector it won’t break. We do plan to phase out the older API just so we have only one concise API, but as of now it’s left untouched.
  • Clear and simple Mutator API calls. Mutator.insert(), Mutator.delete(), and for batch operations: Mutator.addInsertion().addInsertion().addDeletion().execute()
  • Extensive, yet very simple query support. The API supports all types of queries supported by cassandra as a simple and type-safe java API. You can query for columns, super-columns, sub-columns (subcolumns of supercolumn), ranges, slices, multigets etc.
  • Simple and concise exception hierarchy based on HectorException which extends a RuntimeException, so you don’t need to get your code dirty with try-catch where you don’t necessarily want to. You can still of course handle all exception types, the information is not lost, but code is much much cleaner when you don’t care to.
  • No dependency on thrift (or on avro). The v2 API is completely independent of the wire protocol. When avro is finally implemented by cassandra all you have to do is tell hector whether you want to use thrift/avro and that’s all, no other code changes. Hector provides its own (type safe) objects such as HColumn<N,V> or HSuperColumn<SN,N,V>
  • Type safety and separation of concerns. You implement (or reuse) a typed Extractor<T> and need not care about those byte[]s anymore.
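The batch style mentioned in the list, addInsertion().addInsertion().addDeletion().execute(), boils down to queueing operations and applying them in one go. Here is a minimal sketch of that pattern against an in-memory map; BatchSketch is a hypothetical class, the argument lists are simplified, and returning the number of applied operations from execute() is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical batch mutator: operations are queued and only applied on execute().
final class BatchSketch {
    private final Map<String, String> store;
    private final List<Runnable> pending = new ArrayList<>();

    BatchSketch(Map<String, String> store) {
        this.store = store;
    }

    // Queues an insertion of value under "key/name" and returns this for chaining.
    BatchSketch addInsertion(String key, String name, String value) {
        pending.add(() -> store.put(key + "/" + name, value));
        return this;
    }

    // Queues a deletion of "key/name" and returns this for chaining.
    BatchSketch addDeletion(String key, String name) {
        pending.add(() -> store.remove(key + "/" + name));
        return this;
    }

    // Applies the queued operations in order, clears the queue,
    // and returns how many operations were applied.
    int execute() {
        int applied = pending.size();
        for (Runnable operation : pending) {
            operation.run();
        }
        pending.clear();
        return applied;
    }
}

class BatchSketchDemo {
    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("key1/old", "stale");

        int applied = new BatchSketch(store)
                .addInsertion("key1", "column1", "value1")
                .addInsertion("key1", "column2", "value2")
                .addDeletion("key1", "old")
                .execute();
        System.out.println(applied + " operations applied, store = " + store.keySet());
    }
}
```

Nothing touches the store until execute() runs, which is the property that lets a real client send the whole batch in a single round trip.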

Lastly, we marked the new API as “beta” not because it’s not ready, but purely because we want to get your feedback. We’d like to leave it as beta for a few more weeks to get developers’ feedback and, if everyone’s happy, release it as final, so do feel free to let us know how you feel about it.
The API is available on the downloads page and is marked as 0.6.0-15.

27 Responses to “Hector API v2”

  1. Ah, great news. I was leaning towards pelops just because of its API but was a little nervous b/c Hector like you said was the defacto.

    Exciting to give it a try.

    By Salman on Aug 6, 2010

  2. Is the Cluster object thread-safe?
    I mean, can several threads own/use one Cluster object?

    By levy on Aug 11, 2010

  3. A Cluster is thread safe, yes, and is meant to be reused by different threads.
    I’ll add that to the class docs

    By Ran Tavory on Aug 11, 2010

  4. With API v2, how can I count columns in a column family or a super column? I can not find a query to do that.

    By gfremex on Aug 15, 2010

  5. That’s a use case I haven’t coded yet, it’s easy to do, just didn’t get to it yet…
    Do you have a suggestion how would you like to see the API defined?

    By Ran Tavory on Aug 15, 2010

  6. Maybe a CountQuery can do that.

    I tried to write a method to get columns count by using API V2. But I failed because of restricted access to some methods.

    For example, I want to call KeyspaceOperator.doExecute(KeyspaceOperationCallback koc) with a customized koc which can use a ks passed in to get columns count. However doExecute() is invisible in my code.

    Can you release a new API version as soon as possible or tell me how I could get columns count?

    By gfremex on Aug 16, 2010

  7. Hi, sorry for the delay and not adding this in the first round of the API, I’ve just committed this, check it out on http://github.com/rantav/hector/commit/501e4d79a86e07bc157c9bdff24b11e7908e81e6

    It’s in master as well as on 0.6.0 but I haven’t made a release with it yet, I’m waiting for a few more changes to go in

    By Ran Tavory on Aug 16, 2010

  8. Using the new API V2, how can I insert a column or columns into a supercolumn that already exists?
    In the new API I can only find how to create a new supercolumn and insert it into the table.

    By GongZhi on Aug 21, 2010

  9. Please post your questions to hector-users@googlegroups.com (and subscribe).
    When you add subcolumns to an existing supercolumn it will not remove the previous columns in that SC, so simply create a new SC with subcolumns and insert them, it’s additive.

    By Ran Tavory on Aug 21, 2010

  10. Thanks a lot!

    Now I can count columns/super columns in an easy way!

    By gfremex on Aug 23, 2010

  11. I have solved the problem today and subscribed to the group

    By GongZhi on Aug 23, 2010

  12. Couple of things.. and please correct me if I am wrong..

    I checked out the master branch.. and Extractor has been renamed as Serializer, StringExtractor as StringSerializer.

    By atin on Aug 24, 2010

  13. yes, this is correct, that’s part of the feedback we’re getting from the mailing list, you’re welcome to join http://groups.google.com/group/hector-users

    By Ran Tavory on Aug 24, 2010

  14. How do I do a multiColumnget query using the V2 api? I know for V1 I can do something like
    keyspace.multigetColumn(keys, columnpath).. but which Query does it map to in V2?

    By atin on Aug 25, 2010

  15. @atin for a standard column: MultigetSliceQuery and call setColumnNames(cName) if you only need a single column
    See the test here ApiV2SystemTest.testMultigetSliceQuery()

    If it’s a super then there are the MultigetSuperSliceQuery and MultigetSubSliceQuery, one is for getting a set of supercolumns and the other for getting a set of columns under one specific supercolumn

    By Ran Tavory on Aug 25, 2010

  16. I keep getting this error..

    me.prettyprint.cassandra.model.HectorTransportException: org.apache.thrift.protocol.TProtocolException: Required field ‘predicate’ was not present! Struct: multiget_slice_args(keyspace:SITP, keys:[USER_1282727830168], column_parent:ColumnParent(column_family:USERLINE), predicate:null, consistency_level:QUORUM)
    at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:24)
    at me.prettyprint.cassandra.service.KeyspaceImpl$14.execute(KeyspaceImpl.java:451)
    at me.prettyprint.cassandra.service.KeyspaceImpl$14.execute(KeyspaceImpl.java:438)
    at me.prettyprint.cassandra.service.Operation.executeAndSetResult(FailoverOperator.java:366)
    at me.prettyprint.cassandra.service.FailoverOperator.operateSingleIteration(FailoverOperator.java:175)
    at me.prettyprint.cassandra.service.FailoverOperator.operate(FailoverOperator.java:84)
    at me.prettyprint.cassandra.service.KeyspaceImpl.operateWithFailover(KeyspaceImpl.java:151)
    at me.prettyprint.cassandra.service.KeyspaceImpl.multigetSlice(KeyspaceImpl.java:455)
    at me.prettyprint.cassandra.model.MultigetSliceQuery$1.doInKeyspace(MultigetSliceQuery.java:44)
    at me.prettyprint.cassandra.model.MultigetSliceQuery$1.doInKeyspace(MultigetSliceQuery.java:38)
    at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:12)
    at me.prettyprint.cassandra.model.KeyspaceOperator.doExecute(KeyspaceOperator.java:47)
    at me.prettyprint.cassandra.model.MultigetSliceQuery.execute(MultigetSliceQuery.java:37)

    Any ideas??

    Sorry for the botheration

    By atin on Aug 25, 2010

  17. @atin it’s best to follow this on hector-users@googlegroups.com
    Could I ask you to register and post more details there?
    Can you pls send your code and I’ll try to repro it locally and fix?

    By Ran Tavory on Aug 25, 2010

  18. Are there any quick example usages available anywhere?
    I’ve been using Cassandra for about 6 months but I’ve always used the thrift interface, so if any examples are available please point out where I can get a hold of ’em.
    thanks

    By robinsonc494 on Sep 1, 2010

  19. You can see some public examples here: ExampleDaoV2 and here ApiV2SystemTest.
    If you need to see what other folks do I suggest you ask more specific questions at the mailing list hector-users

    By Ran Tavory on Sep 1, 2010

  20. Hi,
    Is it possible to set multiple columns(in the same column family) for a single key in a single insert method? or do i have to do two separate inserts with different column names ?

    By Karthik on Oct 25, 2010

  21. Can I use hector-0.7.* for Cassandra-0.6.5 server?

    By Sam on Nov 10, 2010

  22. @Sam no, you need to use hector 0.6.0-* for cassandra 0.6.*, just b/c of the thrift compatibility requirements.

    By Ran Tavory on Nov 10, 2010

  23. Hi,
    is only the Cluster thread safe, or is the Keyspace also thread safe?

    -Stephan

    By Stephan on Jan 20, 2011

  24. A cluster is thread safe, yes

    By Ran Tavory on Jan 20, 2011
