<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PrettyPrint.me &#187; Uncategorized</title>
	<atom:link href="http://prettyprint.me/category/uncategorized/feed/" rel="self" type="application/rss+xml" />
	<link>http://prettyprint.me</link>
	<description>by Ran Tavory</description>
	<lastBuildDate>Fri, 06 Aug 2010 11:39:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Hector API v2</title>
		<link>http://prettyprint.me/2010/08/06/hector-api-v2/</link>
		<comments>http://prettyprint.me/2010/08/06/hector-api-v2/#comments</comments>
		<pubDate>Fri, 06 Aug 2010 11:25:22 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=322</guid>
		<description><![CDATA[Hector is a high level java client for the cassandra database. It was first released some six months ago and has been dubbed the de facto java client for cassandra. There is a large community of users and companies who run their production high scale systems based on hector. The main benefits hector adds to the [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
			<a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fprettyprint.me%2F2010%2F08%2F06%2Fhector-api-v2%2F"><br />
				<img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fprettyprint.me%2F2010%2F08%2F06%2Fhector-api-v2%2F&amp;style=normal" height="61" width="50" /><br />
			</a>
		</div>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/08/Hector.jpg"><img class="alignright size-thumbnail wp-image-323" title="Hector" src="http://prettyprint.me/wp-content/uploads/2010/08/Hector-150x150.jpg" alt="" width="150" height="150" /></a><a href="http://github.com/rantav/hector">Hector</a> is a high level java client for the <a href="http://cassandra.apache.org/">cassandra database</a>. It was <a href="http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/">first released some six months ago</a> and has been dubbed the de facto java client for cassandra.</p>
<p>There is a large <a href="http://groups.google.com/group/hector-users">community</a> of users and companies who run their production high scale systems based on hector.</p>
<p>The main benefits hector adds to the existing thrift based interface are:</p>
<ul>
<li>connection pooling</li>
<li><a href="http://prettyprint.me/2010/04/03/jmx-in-hector/">extensive jmx support</a></li>
<li>failover</li>
<li><a href="http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/">simple load balancing</a></li>
<li>high level java based API</li>
</ul>
<p>Since hector was first written, first by me and then with the generous help of other community members (most notably Nathan McCall), it&#8217;s gained popularity even in the face of (friendly, open source) &#8220;competition&#8221;, so to speak.</p>
<p>However, one thing that I&#8217;ve always felt we could improve is the API. When writing the first version of hector the premise was that users are comfortable with the current level of the thrift API, so hector should maintain an API similar in spirit. Hector may make things more type safe (such as replacing all the ColumnOrSuperColumn types with methods specifically typed for Column or SuperColumn), but in general I figured I should introduce as few new concepts as possible so that new users who are already familiar with the thrift way can easily learn the new library.</p>
<p>I was wrong.</p>
<p>As it turns out, users don&#8217;t learn the thrift API and then go use hector. Most users tend to just skip the thrift API and start with hector. Fair enough. But then people ask me why I made such a funny API&#8230; They are right: users of hector should not suffer from the limitations of the thrift API.</p>
<p>Add to that the complexity of dealing with failover, which clients should not have to care about at the API level (but in the v1 API they did), plus some complex anonymous classes and the <a href="http://en.wikipedia.org/wiki/Command_pattern">Command pattern</a> that users need to understand (if only we had closures in java&#8230;), and we get a less than ideal API.</p>
<p>So the conclusion was clear: <strong>an API v2 was needed</strong>.</p>
<p>The new API uses the same proven hector implementation underneath but exposes a cleaner interface to the users. It also makes meta operations such as cluster management explicit.</p>
<p>We&#8217;re all coders, so enough talk, let&#8217;s see some code&#8230;</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// Create a cluster</span>
Cluster cluster <span style="color: #339933;">=</span> HFactory.<span style="color: #006633;">getOrCreateCluster</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;MyCluster&quot;</span>, <span style="color: #0000ff;">&quot;cassandra1:9160&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// Choose a keyspace</span>
KeyspaceOperator keyspaceOperator <span style="color: #339933;">=</span> HFactory.<span style="color: #006633;">createKeyspaceOperator</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Keyspace1&quot;</span>, cluster<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// create a string extractor. I'll explain that later</span>
StringExtractor se <span style="color: #339933;">=</span> StringExtractor.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// insert value</span>
Mutator m <span style="color: #339933;">=</span> HFactory.<span style="color: #006633;">createMutator</span><span style="color: #009900;">&#40;</span>keyspaceOperator<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
m.<span style="color: #006633;">insert</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;key1&quot;</span>, <span style="color: #0000ff;">&quot;ColumnFamily1&quot;</span>, HFactory.<span style="color: #006633;">createColumn</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;column1&quot;</span>, <span style="color: #0000ff;">&quot;value1&quot;</span>, se, se<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #666666; font-style: italic;">// Now read a value</span>
<span style="color: #666666; font-style: italic;">// Create a query</span>
ColumnQuery<span style="color: #339933;">&lt;</span>String, String<span style="color: #339933;">&gt;</span> q <span style="color: #339933;">=</span> HFactory.<span style="color: #006633;">createColumnQuery</span><span style="color: #009900;">&#40;</span>keyspaceOperator, se, se<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// set key, name, cf and execute</span>
Result<span style="color: #339933;">&lt;</span>HColumn<span style="color: #339933;">&lt;</span>String, String<span style="color: #339933;">&gt;&gt;</span> r <span style="color: #339933;">=</span> q.<span style="color: #006633;">setKey</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;key1&quot;</span><span style="color: #009900;">&#41;</span>.
        <span style="color: #006633;">setName</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;column1&quot;</span><span style="color: #009900;">&#41;</span>.
        <span style="color: #006633;">setColumnFamily</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ColumnFamily1&quot;</span><span style="color: #009900;">&#41;</span>.
        <span style="color: #006633;">execute</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #666666; font-style: italic;">// read value from the result</span>
HColumn<span style="color: #339933;">&lt;</span>String, String<span style="color: #339933;">&gt;</span> c <span style="color: #339933;">=</span> r.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">String</span> value <span style="color: #339933;">=</span>  c.<span style="color: #006633;">getValue</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>value<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p><strong>Clusters</strong> are identified by their name, in this case <em>MyCluster.</em> A call to getOrCreateCluster(&#8220;MyCluster&#8221;, hostport) would create a new cluster if it doesn&#8217;t exist, but return a previously created one if it does. The Cluster object represents the client&#8217;s view of the cassandra cluster. A program may hold several Cluster instances, although typically one is sufficient.</p>
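<p>The get-or-create behavior described above amounts to a registry keyed by cluster name. Here is a minimal sketch of that idea, not hector&#8217;s actual code: <em>ClusterRegistry</em> and <em>ClusterHandle</em> are hypothetical stand-ins for HFactory and Cluster, assuming one handle per cluster name for the lifetime of the JVM.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Cluster: remembers its name and initial host:port.
final class ClusterHandle {
  final String name;
  final String hostPort;

  ClusterHandle(String name, String hostPort) {
    this.name = name;
    this.hostPort = hostPort;
  }
}

// Hypothetical registry illustrating get-or-create semantics keyed by cluster name.
final class ClusterRegistry {
  private static final Map&lt;String, ClusterHandle&gt; CLUSTERS = new ConcurrentHashMap&lt;&gt;();

  // Returns the handle registered under clusterName, creating it atomically on first use.
  // In this sketch the hostPort argument of later calls for the same name is ignored.
  static ClusterHandle getOrCreateCluster(String clusterName, String hostPort) {
    return CLUSTERS.computeIfAbsent(clusterName, name -&gt; new ClusterHandle(name, hostPort));
  }
}</pre></div></div>

<p>With this sketch, two calls to <em>getOrCreateCluster(&#8220;MyCluster&#8221;, &#8230;)</em> return the very same object, so different parts of a program can share a single view of the cluster.</p>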
<p>A <strong>KeyspaceOperator</strong> is the object used to perform operations (reads, writes) on a specific keyspace. A program can create many of those.</p>
<p>The <strong>StringExtractor</strong> is interesting&#8230; <strong>Hector provides type safety</strong> for column names and column values. Recall that in cassandra column names are byte[] and column values are byte[] too, so the programmer usually has to translate those bytes back and forth to actual java objects. When designing the API we wanted to simplify this work and provide a type-safe API. Note that in the next lines we use HColumn&lt;String, String&gt;, which is a column whose name is of type String and whose value is also of type String, as well as ColumnQuery&lt;String, String&gt;, which is a query that returns columns with String names and String values. We could just as well have chosen columns with String names and Long values, HColumn&lt;String, Long&gt;, where that makes sense for the application.</p>
<p>To provide this type safety hector defines an Extractor&lt;T&gt; interface.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * Extracts a type T from the given bytes, or vice versa.
 *
 * In cassandra column names and column values (and starting with 0.7.0 row keys) are all byte[].
 * To allow type safe conversion in java and keep all conversion code in one place we define the
 * Extractor interface.
 * Implementors of the interface define type conversion according to their domains. A predefined
 * set of common extractors can be found in the extractors package, for example
 * {@link StringExtractor}.
 *
 * @author Ran Tavory
 *
 * @param &lt;T&gt;  The type to which data extraction should work.
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">interface</span> Extractor<span style="color: #339933;">&lt;</span>T<span style="color: #339933;">&gt;</span> <span style="color: #009900;">&#123;</span>
&nbsp;
  <span style="color: #008000; font-style: italic; font-weight: bold;">/**
   * Extract bytes from the obj of type T
   * @param obj
   * @return
   */</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">byte</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> toBytes<span style="color: #009900;">&#40;</span>T obj<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
  <span style="color: #008000; font-style: italic; font-weight: bold;">/**
   * Extract an object of type T from the bytes.
   * @param bytes
   * @return
   */</span>
  <span style="color: #000000; font-weight: bold;">public</span> T fromBytes<span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">byte</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> bytes<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Hector provides a set of default, commonly used extractor implementations, such as <em>StringExtractor</em> and <em>LongExtractor</em> (see <em>package me.prettyprint.cassandra.extractors</em>). Users of hector are expected to implement their own extractors for application-specific types. The interface is straightforward: you only need to implement two methods, which convert your type to and from byte[]. Extractors are <a href="http://en.wikipedia.org/wiki/Purely_functional">purely functional</a>, which means that they have no side effects and no state. Using extractors, the API adds simplicity, <a href="http://en.wikipedia.org/wiki/Separation_of_concerns">separation of concerns</a> and type safety.</p>
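<p>As an illustration, here is a sketch of an application-specific extractor for <em>Long</em> values, encoded as 8 big-endian bytes. The <em>Extractor&lt;T&gt;</em> interface is repeated from above so the example compiles on its own; the class name <em>BigEndianLongExtractor</em> is hypothetical, and this is not necessarily how hector&#8217;s own <em>LongExtractor</em> encodes values.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.nio.ByteBuffer;

// Same shape as the Extractor&lt;T&gt; interface shown above.
interface Extractor&lt;T&gt; {
  byte[] toBytes(T obj);

  T fromBytes(byte[] bytes);
}

// Hypothetical extractor: stateless and side-effect free, so one shared instance is enough.
final class BigEndianLongExtractor implements Extractor&lt;Long&gt; {
  private static final BigEndianLongExtractor INSTANCE = new BigEndianLongExtractor();

  static BigEndianLongExtractor get() {
    return INSTANCE;
  }

  @Override
  public byte[] toBytes(Long obj) {
    // ByteBuffer uses big-endian byte order by default.
    return ByteBuffer.allocate(Long.BYTES).putLong(obj).array();
  }

  @Override
  public Long fromBytes(byte[] bytes) {
    if (bytes == null || bytes.length != Long.BYTES) {
      throw new IllegalArgumentException("expected exactly " + Long.BYTES + " bytes");
    }
    return ByteBuffer.wrap(bytes).getLong();
  }
}</pre></div></div>

<p>Because the extractor holds no state, a single shared instance can serve every column, in the same way <em>StringExtractor.get()</em> is used in the first example.</p>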
<p>Next we create a mutator and insert a value:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">Mutator m <span style="color: #339933;">=</span> HFactory.<span style="color: #006633;">createMutator</span><span style="color: #009900;">&#40;</span>keyspaceOperator<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
m.<span style="color: #006633;">insert</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;key1&quot;</span>, <span style="color: #0000ff;">&quot;ColumnFamily1&quot;</span>,
    HFactory.<span style="color: #006633;">createColumn</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;column1&quot;</span>, <span style="color: #0000ff;">&quot;value1&quot;</span>, se, se<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The class HFactory (&#8220;Hector Factory&#8221;) has many useful static factory methods. I usually just <em>import static me.prettyprint.cassandra.model.HFactory.*</em> and get all its public methods as short method names, so the previous lines can be written as:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">Mutator m <span style="color: #339933;">=</span> createMutator<span style="color: #009900;">&#40;</span>keyspaceOperator<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
m.<span style="color: #006633;">insert</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;key1&quot;</span>, <span style="color: #0000ff;">&quot;ColumnFamily1&quot;</span>, createColumn<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;column1&quot;</span>, <span style="color: #0000ff;">&quot;value1&quot;</span>, se, se<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Note again the use of the StringExtractor (se) when creating a column. The column has a String name and a String value, so it needs a <em>StringExtractor</em> to serialize and deserialize the strings to and from byte[]. As a matter of fact, we&#8217;ve noticed that columns of type HColumn&lt;String,String&gt; are so common that we decided to add a utility factory method, <em>createStringColumn(name, value)</em>, which is a bit shorter than <em>createColumn(name, value, nameExtractor, valueExtractor)</em>. You may create your own convenience factories as well, and you are welcome to contribute them back to hector.</p>
<p>Next, on to reading the value. We&#8217;d like to read a simple column value given by its key (key1) and column name (column1).</p>
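<p>A convenience factory of this kind simply fixes the extractors so callers don&#8217;t have to pass them. The sketch below uses minimal stand-in types (<em>Extractor</em>, <em>Utf8Extractor</em>, <em>Column</em> and <em>Columns</em> are all hypothetical, not hector classes) to show the delegation pattern; with hector you would delegate to <em>HFactory.createColumn</em> instead.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.nio.charset.StandardCharsets;

interface Extractor&lt;T&gt; {
  byte[] toBytes(T obj);

  T fromBytes(byte[] bytes);
}

// Hypothetical String extractor using UTF-8.
final class Utf8Extractor implements Extractor&lt;String&gt; {
  static final Utf8Extractor INSTANCE = new Utf8Extractor();

  @Override
  public byte[] toBytes(String obj) {
    return obj.getBytes(StandardCharsets.UTF_8);
  }

  @Override
  public String fromBytes(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}

// Hypothetical typed column: the typed name/value plus the extractors that serialize them.
final class Column&lt;N, V&gt; {
  final N name;
  final V value;
  final Extractor&lt;N&gt; nameExtractor;
  final Extractor&lt;V&gt; valueExtractor;

  Column(N name, V value, Extractor&lt;N&gt; nameExtractor, Extractor&lt;V&gt; valueExtractor) {
    this.name = name;
    this.value = value;
    this.nameExtractor = nameExtractor;
    this.valueExtractor = valueExtractor;
  }

  byte[] nameBytes() {
    return nameExtractor.toBytes(name);
  }

  byte[] valueBytes() {
    return valueExtractor.toBytes(value);
  }
}

final class Columns {
  // General factory: the caller supplies both extractors.
  static &lt;N, V&gt; Column&lt;N, V&gt; createColumn(N name, V value, Extractor&lt;N&gt; ne, Extractor&lt;V&gt; ve) {
    return new Column&lt;&gt;(name, value, ne, ve);
  }

  // Convenience factory for the common String/String case.
  static Column&lt;String, String&gt; createStringColumn(String name, String value) {
    return createColumn(name, value, Utf8Extractor.INSTANCE, Utf8Extractor.INSTANCE);
  }
}</pre></div></div>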
<p>We create a ColumnQuery&lt;N,V&gt; where N is the type of the column name and V is the type of the column value (in this case, again it&#8217;s String, String)</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">ColumnQuery<span style="color: #339933;">&lt;</span>String, String<span style="color: #339933;">&gt;</span> q <span style="color: #339933;">=</span> createColumnQuery<span style="color: #009900;">&#40;</span>keyspaceOperator, se, se<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Next we set the query attributes &#8211; key, column and column family, and execute it.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">q.<span style="color: #006633;">setKey</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;key1&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
q.<span style="color: #006633;">setName</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;column1&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
q.<span style="color: #006633;">setColumnFamily</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ColumnFamily1&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
Result<span style="color: #339933;">&lt;</span>HColumn<span style="color: #339933;">&lt;</span>String,String<span style="color: #339933;">&gt;&gt;</span> r <span style="color: #339933;">=</span> q.<span style="color: #006633;">execute</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The four lines above can also be written more concisely as a single statement, thanks to method chaining.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">Result<span style="color: #339933;">&lt;</span>HColumn<span style="color: #339933;">&lt;</span>String,String<span style="color: #339933;">&gt;&gt;</span> r <span style="color: #339933;">=</span> q.<span style="color: #006633;">setKey</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;key1&quot;</span><span style="color: #009900;">&#41;</span>.
        <span style="color: #006633;">setName</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;column1&quot;</span><span style="color: #009900;">&#41;</span>.
        <span style="color: #006633;">setColumnFamily</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;ColumnFamily1&quot;</span><span style="color: #009900;">&#41;</span>.
        <span style="color: #006633;">execute</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>What we see here is another feature of the API called <a href="http://en.wikipedia.org/wiki/Method_chaining">method chaining</a>. By convention all setters return a reference to <em>this</em> so that it&#8217;s easy to write <em>setX(x).setY(y).setZ(z)</em>.<br />
You can even call <em>execute()</em> on the same line, e.g. <em>Result&lt;T&gt; r = q.setX(x).setY(y).execute()</em>.</p>
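<p>The chaining convention is easy to reproduce in your own code. The sketch below is a hypothetical, in-memory <em>ColumnLookup</em> class, not a hector query: each setter stores its argument and returns <em>this</em>, so calls can be chained and terminated with <em>execute()</em>.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.util.Map;
import java.util.Objects;

// Hypothetical fluent query over an in-memory map of "cf/key/name" -&gt; value.
final class ColumnLookup {
  private final Map&lt;String, String&gt; store;
  private String columnFamily;
  private String key;
  private String name;

  ColumnLookup(Map&lt;String, String&gt; store) {
    this.store = store;
  }

  // Each setter returns this, which is what makes q.setX(x).setY(y) possible.
  ColumnLookup setColumnFamily(String cf) {
    this.columnFamily = cf;
    return this;
  }

  ColumnLookup setKey(String key) {
    this.key = key;
    return this;
  }

  ColumnLookup setName(String name) {
    this.name = name;
    return this;
  }

  // Terminal operation: checks that all attributes were set, then looks up the value (null if absent).
  String execute() {
    Objects.requireNonNull(columnFamily, "column family not set");
    Objects.requireNonNull(key, "key not set");
    Objects.requireNonNull(name, "column name not set");
    return store.get(columnFamily + "/" + key + "/" + name);
  }
}</pre></div></div>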
<p><strong>execute()</strong> returns a typed <strong>Result&lt;T&gt;</strong> object. Note again the type safety. In this case we have Result&lt;HColumn&lt;String, String&gt;&gt; since the query is of type ColumnQuery&lt;String, String&gt;. In general we have ColumnQuery&lt;N, V&gt;.execute() =&gt; Result&lt;HColumn&lt;N, V&gt;&gt;</p>
<p>So far we&#8217;ve looked at ColumnQuery&lt;N,V&gt;, but there are many other types of queries. In this example we query a single column, but the API defines queries for all the use cases allowed by the thrift API, and they all implement the Query&lt;T&gt; interface:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * The Query interface defines the common parts of all hector queries, such as {@link ColumnQuery}.
 *
 * The common usage pattern is to create a query, set the required query attributes and invoke
 * {@link Query#execute()}.
 *
 * Note that all query mutators, such as setName or setColumnFamily, always return the Query object
 * so it's easy to write chains such as &lt;code&gt;q.setKey(x).setName(y).setColumnFamily(z).execute();&lt;/code&gt;
 *
 * @author Ran Tavory
 *
 * @param &lt;T&gt; Result type. For example HColumn or HSuperColumn
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">interface</span> Query<span style="color: #339933;">&lt;</span>T<span style="color: #339933;">&gt;</span> <span style="color: #009900;">&#123;</span>
  <span style="color: #339933;">&lt;</span>Q <span style="color: #000000; font-weight: bold;">extends</span> Query<span style="color: #339933;">&lt;</span>T<span style="color: #339933;">&gt;&gt;</span> Q setColumnFamily<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> cf<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  Result<span style="color: #339933;">&lt;</span>T<span style="color: #339933;">&gt;</span> execute<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>There are <em>ColumnQuery&lt;N,V&gt;</em>, <em>SuperColumnQuery&lt;SN,N,V&gt;</em>, <em>SliceQuery&lt;N,V&gt;</em>, <em>SuperSliceQuery&lt;SN,N,V&gt;</em>, <em>SubSliceQuery&lt;SN,N,V&gt;</em>, <em>RangeSlicesQuery&lt;N,V&gt;</em> and more&#8230; All query objects are type safe, so each returns only the type it should and the compiler keeps you safe.</p>
<p>To read the value off a result object just call Result.get(). Here again we provide type safety, so in this case the result is of type HColumn&lt;String,String&gt;.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// read value from the result</span>
HColumn<span style="color: #339933;">&lt;</span>String, String<span style="color: #339933;">&gt;</span> c <span style="color: #339933;">=</span> r.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>A <a href="http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/model/Result.java">Result</a>, apart from holding the actual value, also has some useful metadata, such as <a href="http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/model/ExecutionResult.java#L29">getExecutionTimeMicro()</a> and <a href="http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/model/Result.java#L21">getQuery()</a>. We plan to add more to that.</p>
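<p>To picture how a result can carry metadata alongside its value, here is a sketch of a hypothetical <em>TimedResult&lt;T&gt;</em> that wraps an operation and records its wall-clock duration in microseconds. It borrows only the idea behind <em>getExecutionTimeMicro()</em>; it is not hector&#8217;s <em>Result</em> class.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical typed result: the value plus how long it took to produce it.
final class TimedResult&lt;T&gt; {
  private final T value;
  private final long executionTimeMicro;

  private TimedResult(T value, long executionTimeMicro) {
    this.value = value;
    this.executionTimeMicro = executionTimeMicro;
  }

  // Runs the operation and measures it with the monotonic System.nanoTime() clock.
  static &lt;T&gt; TimedResult&lt;T&gt; timed(Supplier&lt;T&gt; operation) {
    long start = System.nanoTime();
    T value = operation.get();
    long elapsedNanos = System.nanoTime() - start;
    return new TimedResult&lt;&gt;(value, TimeUnit.NANOSECONDS.toMicros(elapsedNanos));
  }

  T get() {
    return value;
  }

  long getExecutionTimeMicro() {
    return executionTimeMicro;
  }
}</pre></div></div>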
<p>Lastly we print out the string.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #003399;">String</span> value <span style="color: #339933;">=</span>  c.<span style="color: #006633;">getValue</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>value<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>In this case value is of type String. Had the column been defined as HColumn&lt;String, Long&gt; instead, we&#8217;d have:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #003399;">Long</span> value <span style="color: #339933;">=</span> c.<span style="color: #006633;">getValue</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The new API has other nice additions, such as an improved exception hierarchy (all exceptions extend HectorException, which is a RuntimeException, and thrift exceptions are translated into hector ones; see <a href="http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/ExceptionsTranslatorImpl.java">ExceptionTranslator</a>), no dependency on thrift, and more.</p>
<p>It&#8217;s important to note that the new API no longer relies on thrift, so users who want to use avro as their transport will be able to do so without changing their code (once cassandra fully supports avro and we add that support to hector).</p>
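<p>The translation idea is to catch the transport&#8217;s checked exceptions at a single boundary and rethrow them as unchecked ones, keeping the original as the cause. A minimal sketch follows; every class in it is a hypothetical stand-in (<em>TransportTimeoutException</em> plays the role of a thrift checked exception), not hector&#8217;s actual <em>ExceptionsTranslator</em>.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.util.concurrent.Callable;

// Root of a hypothetical unchecked hierarchy, in the spirit of HectorException.
class ClientException extends RuntimeException {
  ClientException(String message, Throwable cause) {
    super(message, cause);
  }
}

class ClientTimeoutException extends ClientException {
  ClientTimeoutException(Throwable cause) {
    super("operation timed out", cause);
  }
}

// Stand-in for a checked exception thrown by the wire protocol layer.
class TransportTimeoutException extends Exception {
  TransportTimeoutException(String message) {
    super(message);
  }
}

final class ExceptionTranslation {
  // Runs a transport call and converts checked exceptions into the unchecked hierarchy,
  // keeping the original exception as the cause so no information is lost.
  static &lt;T&gt; T call(Callable&lt;T&gt; transportCall) {
    try {
      return transportCall.call();
    } catch (TransportTimeoutException e) {
      throw new ClientTimeoutException(e);
    } catch (RuntimeException e) {
      throw e; // already unchecked, pass through unchanged
    } catch (Exception e) {
      throw new ClientException(e.getMessage(), e);
    }
  }
}</pre></div></div>

<p>Callers that don&#8217;t care can simply let a <em>ClientException</em> propagate, while callers that do can still catch <em>ClientTimeoutException</em> and inspect <em>getCause()</em>.</p>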
<p>Support for cassandra 0.7.0 is underway, and with the new API should be relatively easy to add.</p>
<p>An extensive list of tests of the entire query/mutate API is available at <a href="http://github.com/rantav/hector/blob/master/src/test/java/me/prettyprint/cassandra/model/ApiV2SystemTest.java">ApiV2SystemTest</a>.</p>
<p>To sum up, here&#8217;s a short list of the new concepts introduced by the v2 API and its main benefits:</p>
<ul>
<li>All previous functionality provided by hector remains. You still get connection pooling, JMX etc. The old API is still in place, untouched (except for some small exception hierarchy refinements), so if you have existing code already working with hector it won&#8217;t break. We do plan to phase out the older API so that we have only one concise API, but as of now it&#8217;s left untouched.</li>
<li>Clear and simple Mutator API calls. Mutator.insert(), Mutator.delete(), and for batch operations: Mutator.addInsertion().addInsertion().addDeletion().execute()</li>
<li>Extensive, yet very simple query support. The API supports all types of queries supported by cassandra as a simple and type-safe java API. You can query for columns, super-columns, sub-columns (subcolumns of a supercolumn), ranges, slices, multigets etc.</li>
<li>Simple and concise exception hierarchy based on HectorException which extends a RuntimeException, so you don&#8217;t need to get your code dirty with try-catch where you don&#8217;t necessarily want to. You can still of course handle all exception types, the information is not lost, but code is much much cleaner when you don&#8217;t care to.</li>
<li>No dependency on thrift (or on avro). The v2 API is completely independent of the wire protocol. When avro is finally implemented by cassandra all you have to do is tell hector whether you want to use thrift/avro and that&#8217;s all, no other code changes. Hector provides its own (type safe) objects such as HColumn&lt;N,V&gt; or HSuperColumn&lt;SN,N,V&gt;</li>
<li>Type safety and separation of concerns. You implement (or reuse) a typed Extractor&lt;T&gt; and need not care about those byte[]s anymore.</li>
</ul>
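<p>The batch style mentioned in the list above (<em>addInsertion().addInsertion().addDeletion().execute()</em>) boils down to queueing operations and sending them together. The sketch below is a hypothetical in-memory <em>BatchSketch</em>, not hector&#8217;s <em>Mutator</em>; its method parameters are made up, and a plain map stands in for cassandra, so it only shows the accumulate-then-execute pattern.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical batch mutator: queues operations and applies them in order on execute().
final class BatchSketch {
  private final Map&lt;String, String&gt; store; // stand-in for the database: "cf/key/name" -&gt; value
  private final List&lt;Consumer&lt;Map&lt;String, String&gt;&gt;&gt; pending = new ArrayList&lt;&gt;();

  BatchSketch(Map&lt;String, String&gt; store) {
    this.store = store;
  }

  private static String path(String key, String cf, String name) {
    return cf + "/" + key + "/" + name;
  }

  // Queues an insertion; nothing is written until execute() is called.
  BatchSketch addInsertion(String key, String cf, String name, String value) {
    pending.add(s -&gt; s.put(path(key, cf, name), value));
    return this;
  }

  // Queues a deletion of a single column.
  BatchSketch addDeletion(String key, String cf, String name) {
    pending.add(s -&gt; s.remove(path(key, cf, name)));
    return this;
  }

  // Applies all queued operations, clears the queue and returns how many were applied.
  int execute() {
    int applied = pending.size();
    for (Consumer&lt;Map&lt;String, String&gt;&gt; op : pending) {
      op.accept(store);
    }
    pending.clear();
    return applied;
  }
}</pre></div></div>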
<p>Lastly, we marked the new API as &#8220;beta&#8221; not because it&#8217;s not ready, but purely because we want to get your feedback. We&#8217;d like to leave it as beta for a few more weeks to get developers&#8217; feedback and, if everyone&#8217;s happy, release it as final, so do feel free to <a href="http://groups.google.com/group/hector-users">let us know</a> how you feel about it.<br />
The API is available on the <a href="http://github.com/rantav/hector/downloads">downloads page</a> and is marked as 0.6.0-15.</p>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/08/06/hector-api-v2/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Understanding Cassandra Code Base</title>
		<link>http://prettyprint.me/2010/05/02/understanding-cassandra-code-base/</link>
		<comments>http://prettyprint.me/2010/05/02/understanding-cassandra-code-base/#comments</comments>
		<pubDate>Sun, 02 May 2010 14:20:10 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=301</guid>
		<description><![CDATA[Lately I&#8217;ve been adding some random small features to cassandra so I took the time to have a closer look at the internal design of the system. While for some features I added, such as an embedded service, I could certainly have gotten away without a good understanding of the codebase and design, others, such as the [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
			<a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fprettyprint.me%2F2010%2F05%2F02%2Funderstanding-cassandra-code-base%2F"><br />
				<img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fprettyprint.me%2F2010%2F05%2F02%2Funderstanding-cassandra-code-base%2F&amp;style=normal" height="61" width="50" /><br />
			</a>
		</div>
<p><img class="alignleft size-medium wp-image-311" title="Reading Code" src="http://prettyprint.me/wp-content/uploads/2010/04/Software_Test_Web-300x224.jpg" alt="" width="210" height="157" />Lately I&#8217;ve been adding some random small features to <a href="http://cassandra.apache.org/">cassandra</a> so I took the time to have a closer look at the internal design of the system.<br />
While with some features added, such as an <a href="http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/">embedded service</a>, I could have certainly get away without good understanding of the codebase and design, others, such as the <a href="https://issues.apache.org/jira/browse/CASSANDRA-531">truncate</a> feature require good understanding of the various algorithms used, such as how writes are performed, how reads are performed, how values are deleted (hint: they are not&#8230;) etc.</p>
<p>The codebase, although isn&#8217;t very large, about 91136 lines, is quite dense and packed with algorithmic sauce, so simply reading through it just didn&#8217;t cut it for me. (I used the following kong-fu to count: <code>$ cassandra/trunk $ find * -name *.java -type f -exec cat {} \;|wc -l</code>)</p>
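<p>For those who prefer to stay in Java, here is a rough equivalent of that shell pipeline built on <em>java.nio.file</em>, offered as a sketch; <em>JavaLineCounter</em> is a made-up name. Like <em>wc -l</em> it counts newline bytes, but unlike <em>find *</em> it also descends into hidden top-level directories, so totals may differ slightly.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper mirroring: find &lt;dir&gt; -name '*.java' -type f -exec cat {} \; | wc -l
final class JavaLineCounter {
  // Like wc -l, counts newline bytes, so a final line without '\n' is not counted.
  static long countNewlines(Path file) {
    try {
      long count = 0;
      for (byte b : Files.readAllBytes(file)) {
        if (b == '\n') {
          count++;
        }
      }
      return count;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Walks the tree under root and sums the newline counts of all regular *.java files.
  static long countJavaLines(Path root) throws IOException {
    try (Stream&lt;Path&gt; paths = Files.walk(root)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -&gt; p.getFileName().toString().endsWith(".java"))
          .mapToLong(JavaLineCounter::countNewlines)
          .sum();
    }
  }
}</pre></div></div>

<p>Usage: <em>JavaLineCounter.countJavaLines(Paths.get(&#8220;cassandra/trunk&#8221;))</em>.</p>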
<p>I&#8217;m writing this post in hope it&#8217;d help others get up to speed. I&#8217;m not going to cover the basics, such as what is cassandra, how to deploy, how to checkout code, how to build, how to download thrift etc. I&#8217;m also not going to cover the real algorithmic complicated parts, such as how merkle trees are used by the ae-service, how bloom filters are used in different parts of cassandra (and what are they), how gossip is used etc. I don&#8217;t think I&#8217;m the right person to explain all this, plus there are already bits of those in the cassandra <a href="http://wiki.apache.org/cassandra/ArchitectureInternals">developer wiki</a>. What I am going to write about is what was the path that I took in order to learn cassandra and what I&#8217;ve learned along the way. I haven&#8217;t found all that stuff documented somewhere else (perhaps I&#8217;ll contribute it back to the wiki when I&#8217;m done) so I think I&#8217;d be very helpful to have it next time I dive into a new codebase.</p>
<p>Lastly, a disclaimer: the views expressed here are simply my personal understanding of how the system works; they are both incomplete and inaccurate, so be warned. Keep in mind that I&#8217;m only learning and still sort of new to cassandra. Please also keep in mind that cassandra is a moving target and changes so rapidly that any given snapshot of the code will become irrelevant sooner or later. At the time of writing, the current official version is 0.6.1, but I&#8217;m working on trunk towards 0.7.0.</p>
<p>Here&#8217;s a description of the steps I took and things I learned.</p>
<h3>Download, configure, run&#8230;</h3>
<p>First you need to download the code and run the unit tests. Whether you use eclipse, idea, netbeans, vi, emacs or whatnot, you&#8217;ll want to configure it. That was easy. There&#8217;s more <a href="http://wiki.apache.org/cassandra/HowToContribute">here</a>.</p>
<h3>Reading</h3>
<p>Next you want to read some of the background material, depending on which part exactly you want to work on. I wanted to understand the read path, the write path and how values are deleted, so I read each of the following documents about 5 times. Yes, 5 times. Each. They are packed with information and I found myself absorbing a few more details each time I read them. I would read the document, go back to the source code, make sure I understood how the algorithm maps to methods and classes, reread the document, reread the source code, read the unit tests (and run them, with a debugger) etc. Here are the docs.</p>
<p><a href="http://wiki.apache.org/cassandra/ArchitectureInternals">http://wiki.apache.org/cassandra/ArchitectureInternals</a></p>
<p><a href="http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf">SEDA paper</a></p>
<p><a href="http://wiki.apache.org/cassandra/HintedHandoff">http://wiki.apache.org/cassandra/HintedHandoff</a></p>
<p><a href="http://wiki.apache.org/cassandra/ArchitectureAntiEntropy">http://wiki.apache.org/cassandra/ArchitectureAntiEntropy</a></p>
<p><a href="http://wiki.apache.org/cassandra/ArchitectureSSTable">http://wiki.apache.org/cassandra/ArchitectureSSTable</a></p>
<p><a href="http://wiki.apache.org/cassandra/ArchitectureCommitLog">http://wiki.apache.org/cassandra/ArchitectureCommitLog</a></p>
<p><a href="http://wiki.apache.org/cassandra/DistributedDeletes">http://wiki.apache.org/cassandra/DistributedDeletes</a></p>
<p>I also read Google&#8217;s <a href="http://labs.google.com/papers/bigtable.html">BigTable paper</a> and Amazon&#8217;s fascinating <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html">Dynamo paper</a>, but that was a long time ago. They are good as background material, but not required to understand the actual bits of code.</p>
<p>Well, after having read all this I was starting to get a clue about what could be done and how, but I still didn&#8217;t feel I was at the level of really coding new features. After reading through the code a few times I realized I was kind of stuck and still didn&#8217;t understand things like &#8220;how do values really get deleted&#8221;, which class is responsible for which functionality, what stages there are and how data flows between stages, or &#8220;how can I mark an entire column family as deleted&#8221;, which is what I really wanted to do with the truncate operation.</p>
<h3>Stages</h3>
<p>Cassandra operates in a concurrency model described by the SEDA paper. This basically means that, unlike many other concurrent systems, an operation, say a write operation, does not start and end on the same thread. Instead, an operation starts on one thread, which then passes it to another thread (asynchronously), which then passes it to another thread etc, until it ends. As a matter of fact, the operation doesn&#8217;t exactly flow between threads; it actually flows between <strong>stages</strong>. It moves from one stage to another. Each stage is associated with a thread pool, and this thread pool executes the operation when it&#8217;s convenient for it. Some operations are CPU bound, some are disk or network bound, so &#8220;convenience&#8221; is determined by resource availability. The SEDA paper explains this process very well (good read, worth your time), but basically what you gain is a higher level of concurrency and better resource management, resources being CPU, disk, network etc.</p>
<p>So, to understand data flow in cassandra you first need to understand SEDA. Then you need to know which stages exist in cassandra and how exactly the data flows between them.</p>
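<p>To make the stage-to-stage handoff concrete, here is a minimal SEDA-style sketch using only <code>java.util.concurrent</code>. It is not cassandra&#8217;s code: <code>SedaSketch</code>, the simplified stage names READ-STAGE and RESPONSE-STAGE, and the read/respond steps are hypothetical. Each stage is a thread pool, and the operation hops from one pool to the next instead of running start to finish on a single thread.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical SEDA-style sketch: an operation flows between stages (thread pools). */
final class SedaSketch {
    private SedaSketch() {}

    /** Creates a stage: a fixed-size pool whose daemon threads carry the stage name. */
    static ExecutorService newStage(String name, int threads) {
        return Executors.newFixedThreadPool(threads, runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true);
            return t;
        });
    }

    /** Runs a "read" step on readStage, then asynchronously hands its result to responseStage. */
    static String readThenRespond(String key, ExecutorService readStage, ExecutorService responseStage) {
        return CompletableFuture
            // Step 1 runs on a thread owned by readStage.
            .supplyAsync(() -> Thread.currentThread().getName() + ":read(" + key + ")", readStage)
            // Step 2 is queued to responseStage once step 1 completes; readStage does not wait for it.
            .thenApplyAsync(r -> r + " -> " + Thread.currentThread().getName() + ":respond", responseStage)
            .join(); // only the caller blocks, to collect the final result
    }
}
```

<p>The returned trace, e.g. <code>READ-STAGE:read(k1) -&gt; RESPONSE-STAGE:respond</code>, shows that the two halves of one operation ran on threads belonging to two different stages.</p>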
<p>Fortunately, to get you started, a partial list of stages is present at the <a href="https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/concurrent/StageManager.java">StageManager</a> class:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span> READ_STAGE <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;ROW-READ-STAGE&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span> MUTATION_STAGE <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;ROW-MUTATION-STAGE&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span> STREAM_STAGE <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;STREAM-STAGE&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span> GOSSIP_STAGE <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;GS&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #003399;">String</span> RESPONSE_STAGE <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;RESPONSE-STAGE&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #003399;">String</span> AE_SERVICE_STAGE <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;AE-SERVICE-STAGE&quot;</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #003399;">String</span> LOADBALANCE_STAGE <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;LOAD-BALANCER-STAGE&quot;</span><span style="color: #339933;">;</span></pre></div></div>

<p>I won&#8217;t go into detail about what each and every stage is responsible for (b/c I don&#8217;t know&#8230;), but in short, there&#8217;s the ROW-READ-STAGE, which takes part in the read operation, the ROW-MUTATION-STAGE, which takes part in the write and delete operations, and the AE-SERVICE-STAGE, which is responsible for anti-entropy. This is not a comprehensive list of stages; depending on the code path you&#8217;re interested in, you may find more along the way. For example, browsing the file <a href="https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java">ColumnFamilyStore</a> you&#8217;ll find some more stages, such as FLUSH-SORTER-POOL, FLUSH-WRITER-POOL and MEMTABLE-POST-FLUSHER. In Cassandra, stages are backed by instances of ExecutorService, which is more or less a thread pool, and they all have all-caps names, such as MEMTABLE-POST-FLUSHER.</p>
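<p>The idea of named stages, each backed by its own ExecutorService, can be sketched as a small registry in the spirit of StageManager. This is a hypothetical illustration, not StageManager&#8217;s actual code: <code>StageRegistry</code> and its <code>register</code>, <code>get</code> and <code>shutdownAll</code> methods are made up for the example. Naming the pool threads after the stage is what makes stages easy to spot in thread dumps and debuggers.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical registry mapping all-caps stage names to their thread pools. */
final class StageRegistry {
    private final Map<String, ExecutorService> stages = new ConcurrentHashMap<>();

    /** Registers a fixed-size pool for a stage; its threads are named "NAME:1", "NAME:2", ... */
    void register(String name, int threads) {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, name + ":" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        });
        if (stages.putIfAbsent(name, pool) != null) {
            pool.shutdown(); // keep the first registration, discard the duplicate pool
            throw new IllegalArgumentException("stage already registered: " + name);
        }
    }

    /** Looks up the pool for a stage name, failing loudly on typos. */
    ExecutorService get(String name) {
        ExecutorService pool = stages.get(name);
        if (pool == null) {
            throw new IllegalArgumentException("unknown stage: " + name);
        }
        return pool;
    }

    void shutdownAll() {
        stages.values().forEach(ExecutorService::shutdown);
    }
}
```

<p>Usage would look like <code>registry.register("ROW-READ-STAGE", 8)</code> followed by <code>registry.get("ROW-READ-STAGE").submit(task)</code>.</p>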
<p>To visualize this, I created a diagram that mixes both classes and stages. This isn&#8217;t valid UML, but I think it&#8217;s a good way to look at how data flows through the system. This is not a comprehensive diagram of all classes and all stages, just the ones that were interesting to me.</p>
<p><img class="alignnone" title="Classes and stages" src="http://yuml.me/44bdc092" alt="" /><br />
<a href="http://yuml.me/diagram/plain;dir:LR;/class/edit/[note:Classes and STAGES], [ColumnFamilyStore| flushSorter_:FLUSH-SORTER-POOL; flushWriter_:FLUSH-WRITER-POOL; commitLogUpdater_:MEMTABLE-POST-FLUSHER], [ColumnFamilyStore]-&gt;[SSTableTracker], [ColumnFamilyStore]-&gt;[Memtable (memtable_)], [CommitLog|CommitLogExecutor], [DeletionService|FILEUTILS-DELETE-POOL], [StorageLoadBalancer| lb_:LB-OPERATIONS; lbOperations_:LB-TARGET], [StorageService| consistencyManager_:CONSISTENCY-MANAGER], [StageManager| READ_STAGE; MUTATION_STAGE; STREAM_STAGE; GOSSIP_STAGE; RESPONSE_STAGE; AE_SERVICE_STAGE; LOADBALANCE_STAGE; MIGRATION_STAGE]">yUML source</a></p>
<h3>Debugging</h3>
<p>Reading through the code using a debugger while running a unit test is an awesome way to get things into your head. I&#8217;m <a href="http://stackoverflow.com/questions/602138/is-a-debugger-the-mother-of-all-evil">not a huge fan of debuggers</a>, but one thing they are good at is helping you learn a new codebase by single-stepping through unit tests. So what I did was run the unit tests while single-stepping into the code. That was awesome. I also ran the unit tests for Hector, which use the thrift interface and spawn an embedded cassandra server, so they were right to the point, user friendly and eye opening.</p>
<h3>Class Diagrams</h3>
<p>The next thing I did was use a tool to extract class diagrams from the existing codebase. That was not a great use of my time.</p>
<p>Well, the tool I used wasn&#8217;t great, but that&#8217;s not the point. The point is that cassandra&#8217;s codebase is written in such a way that class diagrams help very little in understanding it. UML class diagrams are great for object oriented design. Their essence is the list of classes, class members and their relationships. For example, if a class A has a list of Bs, you can draw that in a UML class diagram as A being an aggregation of Bs, and just by looking at the diagram you learn a lot. An Airplane, say, has a list of Passengers.</p>
<p>Cassandra is a complex system with a solid algorithmic background and excellent performance, but, to be honest, IMO, from the sole perspective of good OO practice, it isn&#8217;t a good case study&#8230; Its classes contain many static methods and members, and in many cases you&#8217;d see one class calling a static method of another class, C style. Therefore I found that class diagrams, although somewhat helpful for getting a visual sense of which classes exist and roughly how they relate, are not that useful.</p>
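<p>To illustrate the contrast, here is the Airplane example in Java, next to a C-style static helper. These are purely illustrative classes, not cassandra code. The <code>passengers</code> field is exactly what a class diagram draws as an aggregation; a static method of another class that merely receives an Airplane leaves no field behind, so a diagram shows at most a weak dependency and tells you little about how the code really behaves.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** A Passenger exists on its own; an Airplane merely aggregates Passengers. */
final class Passenger {
    private final String name;

    Passenger(String name) { this.name = name; }

    String name() { return name; }
}

/** Aggregation: a class diagram would draw "Airplane o-- * Passenger" from this field. */
final class Airplane {
    private final List<Passenger> passengers = new ArrayList<>();

    void board(Passenger passenger) { passengers.add(passenger); }

    List<Passenger> passengers() { return Collections.unmodifiableList(passengers); }
}

/**
 * C-style alternative: a static helper operating on state it does not own.
 * A class diagram shows at most a dependency arrow for this, not an association.
 */
final class Manifests {
    private Manifests() {}

    static String describe(Airplane plane) {
        StringBuilder sb = new StringBuilder();
        for (Passenger p : plane.passengers()) {
            if (sb.length() > 0) sb.append(", ");
            sb.append(p.name());
        }
        return sb.toString();
    }
}
```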
<p>I ditched the class diagrams and continued to the next diagram &#8211; sequence diagrams.</p>
<h3>Sequence Diagrams</h3>
<p>Sequence diagrams are great at abstracting and visualizing interactions between entities. In my case an entity may be a class, a STAGE or a thrift client. Luckily, with sequence diagrams you don&#8217;t have to be too specific and formal about the kinds of entities used in them; you just represent them all as happy actors (at least, I allow myself to do that; I hope the gods of UML will forgive me).</p>
<p>The following diagrams were produced by running <a href="http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/">Hector</a>&#8217;s unit tests against an embedded cassandra server (single node). The diagrams aren&#8217;t generic; they describe only <strong>one possible code path</strong> out of many, but I preferred keeping them as simple as possible, even at the cost of small inaccuracies.</p>
<p>I used a simple online sequence diagram editor at <a href="http://www.websequencediagrams.com">http://www.websequencediagrams.com</a> to generate them.</p>
<p>Read Path</p>
<div class="wsd">
<pre>note left of CassandraServer: Read Path

CassandraServer -&gt; StorageProxy: readProtocol
StorageProxy -&gt; weakReadLocal: READ-STAGE.call

weakReadLocal -&gt; SliceByNamesReadCommand: getRow
SliceByNamesReadCommand -&gt; Table: getRow
Table -&gt; ColumnFamilyStore: getColumnFamily
ColumnFamilyStore -&gt; QueryFilter: collectCollatedColumns
QueryFilter -&gt; ColumnFamilyStore:
ColumnFamilyStore -&gt; ColumnFamilyStore: removeDeleted
ColumnFamilyStore -&gt; Table:
Table -&gt; SliceByNamesReadCommand:
SliceByNamesReadCommand -&gt; weakReadLocal:

weakReadLocal -&gt; StorageProxy:
StorageProxy -&gt; CassandraServer:</pre>
</div>
<p><script src="http://www.websequencediagrams.com/service.js" type="text/javascript"></script></p>
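<p>The read path also answers the earlier question of how values get deleted: a delete is written as a tombstone, and reads filter it out. The following is a simplified sketch of what <code>collectCollatedColumns</code> and <code>removeDeleted</code> achieve together, under stated assumptions: <code>ReadPathSketch</code> and <code>Column</code> are hypothetical, a null value stands in for a tombstone, and cassandra&#8217;s real tie-breaking rules for equal timestamps are not modeled.</p>

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified model of collating a row from memtable + SSTables. */
final class ReadPathSketch {
    private ReadPathSketch() {}

    /** One version of a column; a null value marks a tombstone (a deletion). */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Keeps the newest version of each column across all sources, then drops tombstones. */
    static Map<String, String> readRow(List<List<Column>> sources) {
        Map<String, Column> newest = new TreeMap<>(); // sorted by column name
        for (List<Column> source : sources) {
            for (Column c : source) {
                Column seen = newest.get(c.name);
                // Highest timestamp wins; on a tie this sketch keeps the first version seen.
                if (seen == null || c.timestamp > seen.timestamp) {
                    newest.put(c.name, c);
                }
            }
        }
        Map<String, String> live = new TreeMap<>();
        for (Column c : newest.values()) {
            if (c.value != null) { // the removeDeleted step: tombstones never reach the client
                live.put(c.name, c.value);
            }
        }
        return live;
    }
}
```

<p>Note that the older value of a deleted column may still sit in an SSTable on disk; it simply loses to the newer tombstone at read time.</p>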
<p>Write Path</p>
<div class="wsd">
<pre>note left of CassandraServer: Write Path
CassandraServer -&gt; StorageProxy: mutateBlocking

note over StorageProxy: async
StorageProxy --&gt; StorageProxy: MUTATION-STAGE call
StorageProxy -&gt; RowMutation: run

RowMutation -&gt; Table: apply

note over Table, CommitLog: async
Table --&gt; CommitLog: COMMIT-LOG-WRITER add
CommitLog -&gt; CommitLogSegment: write
CommitLogSegment -&gt; CommitLog: 

Table -&gt; ColumnFamilyStore: apply
ColumnFamilyStore -&gt; Memtable: put
Memtable -&gt; Memtable: resolve
Memtable -&gt; ColumnFamilyStore:
ColumnFamilyStore -&gt; Table:
Table -&gt; RowMutation:
RowMutation -&gt; StorageProxy:
StorageProxy --&gt; StorageProxy: signal
StorageProxy -&gt; CassandraServer:</pre>
</div>
<p><script src="http://www.websequencediagrams.com/service.js" type="text/javascript"></script></p>
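<p>The write path above can be approximated in a few lines. This is a simplified, single-node sketch under stated assumptions: <code>WritePathSketch</code>, <code>Cell</code> and <code>insertBlocking</code> are hypothetical names; the commit log is just an in-memory list written synchronously, whereas the diagram shows the COMMIT-LOG-WRITER add as asynchronous; and Memtable&#8217;s <code>resolve</code> is modeled as &#8220;highest timestamp wins&#8221;. It does show the shape of the flow: the caller hands the mutation to a MUTATION-STAGE thread, the mutation is logged and then applied to the memtable, and a signal releases the waiting caller.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical single-node model of: mutation stage -> commit log -> memtable. */
final class WritePathSketch {
    /** A column version: a value plus its client-supplied timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final List<String> commitLog = new ArrayList<>(); // stands in for CommitLogSegment
    private final Map<String, Cell> memtable = new ConcurrentHashMap<>();
    private final ExecutorService mutationStage = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "MUTATION-STAGE");
        t.setDaemon(true);
        return t;
    });

    /** Like mutateBlocking: hand the write to another stage, then wait for its signal. */
    boolean insertBlocking(String column, String value, long timestamp, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        mutationStage.execute(() -> {
            apply(column, new Cell(value, timestamp));
            done.countDown(); // the "signal" back to the waiting caller
        });
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Like Table.apply: append to the commit log first, then update the memtable. */
    private void apply(String column, Cell cell) {
        synchronized (commitLog) {
            commitLog.add(column + "=" + cell.value + "@" + cell.timestamp);
        }
        // "resolve": keep whichever version carries the higher timestamp.
        memtable.merge(column, cell, (old, incoming) -> incoming.timestamp > old.timestamp ? incoming : old);
    }

    String read(String column) {
        Cell c = memtable.get(column);
        return c == null ? null : c.value;
    }

    int commitLogSize() {
        synchronized (commitLog) {
            return commitLog.size();
        }
    }
}
```

<p>An out-of-order write with an older timestamp is still logged, but it loses to the newer value when the memtable resolves the two.</p>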
<h3>Table is a Keyspace</h3>
<p>One final note: as a user of cassandra I use the terms Keyspace, ColumnFamily, Column etc. However, the codebase is packed with the term Table. What are Tables?&#8230; As it turns out, a <a href="https://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/db/Table.java">Table</a> is actually a Keyspace&#8230; just keep this in mind, that&#8217;s all.</p>
<p>Learning the codebase was a large and satisfying task; I hope this write-up helps you get up and running as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/05/02/understanding-cassandra-code-base/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>JMX in Hector</title>
		<link>http://prettyprint.me/2010/04/03/jmx-in-hector/</link>
		<comments>http://prettyprint.me/2010/04/03/jmx-in-hector/#comments</comments>
		<pubDate>Sat, 03 Apr 2010 20:58:09 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=285</guid>
		<description><![CDATA[Hector is a Java client for Cassandra I&#8217;ve implemented and have written about before (here and here). What I haven&#8217;t written about is its extensive JMX support, which makes it really unique, among other properties such as failover and really simple load balancing. JMX support in hector isn&#8217;t really new, but it&#8217;s the first time [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://github.com/rantav/hector">Hector</a> is a Java client for Cassandra I&#8217;ve implemented and have written about before (<a href="http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/comment-page-1/">here</a> and <a href="http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/">here</a>).</p>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/04/JMX+in+Action.jpg-320×4001.png"><img class="alignleft size-full wp-image-288" title="JMX" src="http://prettyprint.me/wp-content/uploads/2010/04/JMX+in+Action.jpg-320×4001.png" alt="" width="282" height="195" /></a></p>
<p>What I haven&#8217;t written about is its extensive JMX support, which, along with other properties such as failover and really simple load balancing, makes it really unique. JMX support in hector isn&#8217;t really new, but this is the first time I&#8217;ve had the chance to write about it.</p>
<p>JMX is Java&#8217;s standard way of monitoring applications. The default thrift cassandra client provides no JMX support at all, and I figured you&#8217;d have to be crazy to run a cassandra client at such a high scale without being able to monitor it.</p>
<p>Here&#8217;s the list of JMX attributes provided by hector:</p>
<pre style="clear: both;"><strong>WriteFail</strong> - Number of failed write operations.
<strong>ReadFail</strong> - Number of failed read operations.
<strong>RecoverableTimedOutCount</strong> - Number of recoverable TimedOut
  exceptions. These exceptions may happen when certain nodes
  are under such heavy load that they can't provide the service.
<strong>RecoverableUnavailableCount</strong> - Number of recoverable
  Unavailable exceptions.
<strong>RecoverableTransportExceptionCount</strong> - Number of recoverable
  Transport exceptions.
<strong>RecoverableErrorCount</strong> - Total number of recoverable errors.
<strong>SkipHostSuccess</strong> - Number of times that a successful skip-host
  (failover) has occurred.
<strong>NumPoolExhaustedEventCount</strong> - Number of times threads have
  encountered the pool-exhausted state (and were blocked).
<strong>NumPools</strong> - Number of connection pools.
  This is also the number of unique hosts in the
  ring that this client has communicated with.
  The number may be one or more, depending on the load balance
  policy and failover attempts.
<strong>PoolNames</strong> - The list of known pools.
<strong>NumIdleConnections</strong> - Number of currently idle connections
  (in all pools).
<strong>NumActive</strong> - Number of currently active connections (all pools).
<strong>NumExhaustedPools</strong> - Number of currently exhausted
  connection pools.
<strong>RecoverableLoadBalancedConnectErrors</strong> - Number of recoverable
  load-balance connection errors.
<strong>ExhaustedPoolNames</strong> - The list of exhausted connection pools.
<strong>NumBlockedThreads</strong> - Number of currently blocked threads.
<strong>NumConnectionErrors</strong> - Number of connection errors
  (initial connection to the ring for retrieving metadata).
<strong>KnownHosts</strong> - The list of known hosts in the ring.
  This list will be used by the client in case failover is required.
<strong>updateKnownHosts</strong> - An operation that may be invoked
  by an admin to tell the client to update its list of known hosts.
  Usually this is done after the ring configuration has changed.</pre>
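<p>Attributes like these are typically exposed through a standard MBean: an interface named after the implementing class plus an <code>MBean</code> suffix, whose getters become read-only attributes. The following is a minimal sketch of that mechanism using only <code>javax.management</code>; <code>JmxSketch</code>, <code>ClientCounters</code> and the ObjectName used below are hypothetical names for illustration, not Hector&#8217;s actual classes. Both types are public members of a holder class because the JMX introspector expects a public MBean interface.</p>

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.ObjectName;

/** Hypothetical holder for a Hector-style counters MBean (illustrative names only). */
final class JmxSketch {
    private JmxSketch() {}

    /** Standard MBean interface: each getter becomes a read-only JMX attribute. */
    public interface ClientCountersMBean {
        long getWriteFail();
        long getReadFail();
        long getRecoverableErrorCount();
    }

    /** Thread-safe counters that client code bumps as operations fail. */
    public static final class ClientCounters implements ClientCountersMBean {
        private final AtomicLong writeFail = new AtomicLong();
        private final AtomicLong readFail = new AtomicLong();
        private final AtomicLong recoverableErrors = new AtomicLong();

        public void incWriteFail() { writeFail.incrementAndGet(); }
        public void incReadFail() { readFail.incrementAndGet(); }
        public void incRecoverableError() { recoverableErrors.incrementAndGet(); }

        @Override public long getWriteFail() { return writeFail.get(); }
        @Override public long getReadFail() { return readFail.get(); }
        @Override public long getRecoverableErrorCount() { return recoverableErrors.get(); }

        /** Registers this object with the platform MBean server so jconsole can browse it. */
        public ObjectName register(String objectName) {
            try {
                ObjectName name = new ObjectName(objectName);
                ManagementFactory.getPlatformMBeanServer().registerMBean(this, name);
                return name;
            } catch (JMException e) {
                throw new IllegalStateException("could not register " + objectName, e);
            }
        }
    }
}
```

<p>Once registered, the counters appear in jconsole under the chosen ObjectName, and any JMX client can read them by attribute name, e.g. <code>WriteFail</code>.</p>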
<p>Performance counters (I used the mechanics of <a href="http://perf4j.codehaus.org/">perf4j</a> to implement them):</p>
<pre style="clear: both;"><strong>READ.success_TPS</strong> - Successful read transactions per second
  (measured as the average over the last 10 seconds).
<strong>READ.success_Mean</strong> - Mean time of successful read requests
  over the last 10 seconds.
<strong>READ.success_Min</strong> - Time in millisec of the fastest successful
  read operation (over the last 10 seconds).
<strong>READ.success_Max</strong> - Time in millisec of the slowest successful
  read operation (over the last 10 seconds).
<strong>READ.success_StdDev</strong> - Standard deviation of the time of successful
  read operations (over the last 10 seconds).
<strong>WRITE.success_TPS</strong> - Successful write transactions per second
  (over the last 10 seconds).
<strong>WRITE.success_Mean</strong> - ...
<strong>WRITE.success_Min</strong>
<strong>WRITE.success_Max</strong>
<strong>WRITE.success_StdDev</strong>
<strong>READ.fail_TPS</strong>
<strong>READ.fail_Mean</strong>
<strong>READ.fail_Min</strong>
<strong>READ.fail_Max</strong>
<strong>READ.fail_StdDev</strong>
<strong>WRITE.fail_TPS</strong>
<strong>WRITE.fail_Mean</strong>
<strong>WRITE.fail_Min</strong>
<strong>WRITE.fail_Max</strong>
<strong>WRITE.fail_StdDev</strong></pre>
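<p>Counters like these boil down to sliding-window statistics over recent operation durations. The following is a from-scratch sketch of that idea, not perf4j&#8217;s API and not Hector&#8217;s implementation: <code>WindowStats</code> and <code>Summary</code> are hypothetical, timestamps are passed in explicitly to keep it testable, TPS is averaged over the whole window, and this sketch uses the population standard deviation.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window timer: summarizes operations seen in the last windowMillis. */
final class WindowStats {
    /** Snapshot of the statistics for one window. */
    static final class Summary {
        final int count;
        final double tps, mean, min, max, stdDev;

        Summary(int count, double tps, double mean, double min, double max, double stdDev) {
            this.count = count; this.tps = tps; this.mean = mean;
            this.min = min; this.max = max; this.stdDev = stdDev;
        }
    }

    private static final class Sample {
        final long atMillis;
        final double durationMillis;

        Sample(long atMillis, double durationMillis) {
            this.atMillis = atMillis;
            this.durationMillis = durationMillis;
        }
    }

    private final long windowMillis;
    private final Deque<Sample> samples = new ArrayDeque<>();

    WindowStats(long windowMillis) { this.windowMillis = windowMillis; }

    /** Records one operation that finished at nowMillis and took durationMillis. */
    synchronized void record(long nowMillis, double durationMillis) {
        samples.addLast(new Sample(nowMillis, durationMillis));
        evict(nowMillis);
    }

    /** Summarizes the samples recorded in (nowMillis - windowMillis, nowMillis]. */
    synchronized Summary summarize(long nowMillis) {
        evict(nowMillis);
        int n = samples.size();
        if (n == 0) return new Summary(0, 0, 0, 0, 0, 0);
        double sum = 0, min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
        for (Sample s : samples) {
            sum += s.durationMillis;
            min = Math.min(min, s.durationMillis);
            max = Math.max(max, s.durationMillis);
        }
        double mean = sum / n;
        double squares = 0;
        for (Sample s : samples) {
            double d = s.durationMillis - mean;
            squares += d * d;
        }
        return new Summary(n, n * 1000.0 / windowMillis, mean, min, max, Math.sqrt(squares / n));
    }

    /** Drops samples that fell out of the window; assumes samples arrive in time order. */
    private void evict(long nowMillis) {
        while (!samples.isEmpty() && samples.peekFirst().atMillis <= nowMillis - windowMillis) {
            samples.removeFirst();
        }
    }
}
```

<p>One instance per counter family (READ.success, WRITE.fail, and so on) would be enough to back the TPS, Mean, Min, Max and StdDev attributes listed above.</p>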
<p>Here&#8217;s how this looks in jconsole (ignore the zeros, it&#8217;s not real data&#8230;)<br />
<a href="http://prettyprint.me/wp-content/uploads/2010/04/Java-Monitoring-Management-Console.png"><img class="aligncenter size-full wp-image-290" title="Java Monitoring &amp; Management Console" src="http://prettyprint.me/wp-content/uploads/2010/04/Java-Monitoring-Management-Console.png" alt="" width="829" height="650" /></a><a href="http://prettyprint.me/wp-content/uploads/2010/04/Java-Monitoring-Management-Console-1.png"><img class="aligncenter size-full wp-image-291" title="Java Monitoring &amp; Management Console-1" src="http://prettyprint.me/wp-content/uploads/2010/04/Java-Monitoring-Management-Console-1.png" alt="" width="834" height="650" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/04/03/jmx-in-hector/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Load balancing and improved failover in Hector</title>
		<link>http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/</link>
		<comments>http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/#comments</comments>
		<pubDate>Wed, 03 Mar 2010 06:34:20 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=278</guid>
		<description><![CDATA[I&#8217;ve added a very simple load balancing feature, as well as improved failover behavior to Hector. Hector is a Java Cassandra client, to read more about it please see my previous post Hector – a Java Cassandra client. In version 0.5.0-6 I added poor-man&#8217;s load balancing as well as improved failover behavior. The interface CassandraClientPool used to [...]]]></description>
			<content:encoded><![CDATA[
<p><img class="size-thumbnail wp-image-279 alignleft" title="Balance the load" src="http://prettyprint.me/wp-content/uploads/2010/03/load_balance-150x150.jpg" alt="Balance the load, woman!" width="150" height="150" /></p>
<p>I&#8217;ve added a very simple load balancing feature, as well as improved failover behavior, to <a href="http://github.com/rantav/hector">Hector</a>. Hector is a Java Cassandra client; to read more about it, please see my previous post <a href="http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/comment-page-1/">Hector – a Java Cassandra client</a>.</p>
<p>In <a href="http://github.com/downloads/rantav/hector/hector-0.5.0-6.jar">version 0.5.0-6</a> I added poor-man&#8217;s load balancing as well as improved failover behavior.</p>
<p>The interface <a href="http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClientPool.java">CassandraClientPool</a> used to have this method for obtaining clients:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * Borrows a client from the pool defined by url:port
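 * @param url host name or address of the Cassandra server
 * @param port Thrift port of the Cassandra server
 * @return a client connected to url:port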
 */</span>
CassandraClient borrowClient<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> url, <span style="color: #000066; font-weight: bold;">int</span> port<span style="color: #009900;">&#41;</span>
    <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IllegalStateException</span>, PoolExhaustedException, <span style="color: #003399;">Exception</span><span style="color: #339933;">;</span></pre></div></div>

<p>Now, with the added load balancing (LB) and failover, it has:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * Borrow a load-balanced client, a random client from the array of given client addresses.
 *
 * This method is typically used to allow load balancing b/w the list of given client URLs. The
 * method will return a random client from the array of the given url:port pairs.
 * The method will try connecting to each host in the list and will only stop when there's one
 * successful connection, so in that sense it's also useful for failover.
 *
 * @param clientUrls An array of &quot;url:port&quot; cassandra client addresses.
 *
 * @return A randomly chosen client from the array of clientUrls.
 * @throws Exception
 */</span>
CassandraClient borrowClient<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> clientUrls<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span><span style="color: #339933;">;</span></pre></div></div>

<p>And usage looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">// Get a connection to any of the hosts cas1, cas2 or cas3</span>
CassandraClient client <span style="color: #339933;">=</span> pool.<span style="color: #006633;">borrowClient</span><span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> <span style="color: #009900;">&#123;</span><span style="color: #0000ff;">&quot;cas1:9160&quot;</span>, <span style="color: #0000ff;">&quot;cas2:9160&quot;</span>, <span style="color: #0000ff;">&quot;cas3:9160&quot;</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>So, when calling borrowClient(String[]), the method randomly chooses one of the clients in the array and connects to it. That&#8217;s what I call poor man&#8217;s load balancing: just plain dumb random, not real load balancing. By all means, true load balancing, which takes into account performance measurements such as response time and throughput, is infinitely better than the plain random selection I&#8217;m employing here, and in my opinion it should be left to your ops folks to deal with rather than to the program. However, if you only need a very simplistic approach of random selection, then this method may suit your needs.</p>
<p>A nice side effect of using this method is <strong>improved failover</strong>. In previous versions hector implemented failover, but in order to find out about the ring structure it had to connect to at least one host in the ring first and query it to learn about the rest. The result was that if a new connection happened to be made to an unavailable host, the new client could not learn about the other live hosts, so it failed right away. With this new method, which takes an array of hosts, the client keeps trying hosts in the list in random order until it finds one that&#8217;s up. In the example above the client may choose to connect to cas2 first; if cas2 is down it&#8217;ll try to connect to (say) cas3, and if cas3 is also down it&#8217;ll try to connect to cas1; only if all three hosts are down will it give up and return an error. Failing to connect to a host is considered an error, but a recoverable one, so it&#8217;s transparent to the client of hector but is reported to JMX and has its own special counter (RecoverableLoadBalancedConnectErrors).</p>
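<p>The random-order-with-failover behavior described above can be sketched independently of Hector&#8217;s classes. This is a minimal sketch under stated assumptions, not Hector&#8217;s implementation: <code>RandomFailover</code> and the <code>Connector</code> interface are hypothetical stand-ins for the pool and the thrift connection logic, and the counter merely mirrors the idea behind RecoverableLoadBalancedConnectErrors.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical poor man's load balancer with failover over "host:port" addresses. */
final class RandomFailover {
    /** Hypothetical connection factory: returns a client or throws if the host is unreachable. */
    interface Connector<C> {
        C connect(String hostPort) throws Exception;
    }

    private final Random random;
    // Counts failed connection attempts (cf. the RecoverableLoadBalancedConnectErrors counter).
    private final AtomicLong connectErrors = new AtomicLong();

    RandomFailover(Random random) {
        this.random = random;
    }

    /** Tries the addresses in random order and returns the first successful connection. */
    <C> C borrow(String[] clientUrls, Connector<C> connector) {
        List<String> order = new ArrayList<>(Arrays.asList(clientUrls));
        Collections.shuffle(order, random); // random order = poor man's load balancing
        Exception last = null;
        for (String hostPort : order) {
            try {
                return connector.connect(hostPort);
            } catch (Exception e) {
                connectErrors.incrementAndGet(); // recoverable: fail over to the next host
                last = e;
            }
        }
        throw new IllegalStateException("could not connect to any of " + order, last);
    }

    long connectErrors() {
        return connectErrors.get();
    }
}
```

<p>With hosts cas1, cas2 and cas3 where only cas3 is up, <code>borrow</code> returns the cas3 connection after at most two failed attempts; it only throws once every host in the list has been tried.</p>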
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/03/03/load-balancing-and-improved-failover-in-hector/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Hector &#8211; a Java Cassandra client</title>
		<link>http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/</link>
		<comments>http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/#comments</comments>
		<pubDate>Tue, 23 Feb 2010 14:04:09 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=263</guid>
		<description><![CDATA[UPDATE 3: Comments are closed now and for the sake of information reuse, please post all your hector questions to the mailing list hector-users@googlegroups.com and please subscribe to it as well. UPDATE: I added a downloads section, so you may simply download the jar and sources if you&#8217;re not into git or maven. UPDATE 2: [...]]]></description>
			<content:encoded><![CDATA[
<p>UPDATE 3: Comments are closed now and for the sake of information reuse, please post all your hector questions to the mailing list hector-users@googlegroups.com and please subscribe to it as well.</p>
<p>UPDATE: I added a <a href="http://github.com/rantav/hector/downloads">downloads</a> section, so you may simply download the jar and sources if you&#8217;re not into git or maven.</p>
<p>UPDATE 2: I added license clarification; the license is <a href="http://www.opensource.org/licenses/mit-license.php">MIT</a>, which is the most permissive license I know of and basically lets you do anything with the software: use it commercially or non-commercially, copy it, fork it (but I&#8217;ll be happy to accept patches and committers) and whatnot. I added a <a href="http://github.com/rantav/hector/blob/master/LICENSE">LICENSE file</a> and over time I&#8217;ll add that comment block to every file.</p>
<p>In <a href="http://en.wikipedia.org/wiki/Greek_mythology">Greek mythology</a>, <a href="http://en.wikipedia.org/wiki/Hector">Hector</a> was the great defender of <a href="http://en.wikipedia.org/wiki/Troy">Troy</a>, its greatest warrior and the brother of <a href="http://en.wikipedia.org/wiki/Cassandra">Cassandra</a>.</p>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/02/Ajax_and_Hector_exchange_gifts1.jpg"><img class="size-medium wp-image-268 alignnone" title="Ajax_and_Hector_exchange_gifts" src="http://prettyprint.me/wp-content/uploads/2010/02/Ajax_and_Hector_exchange_gifts1-300x286.jpg" alt="" width="240" height="229" /></a></p>
<p>Nowadays, <a href="http://incubator.apache.org/cassandra/">Cassandra</a> is a high scale database and <a href="http://github.com/rantav/hector">Hector</a> is the Java client I&#8217;ve written for it.</p>
<p>Over the last couple of days I got the the conclusion that the java client I&#8217;ve been using so far to speak to cassanrda wasn&#8217;t satisfactory. I used the one simply called <a href="http://code.google.com/p/cassandra-java-client/">cassandra-java-client</a>, which is a good start but had some shortcomings I could just not live with (no support for Cassandra v0.5, no JMX and no failover). So I&#8217;ve written my own.</p>
<p>For anyone not familiar with cassanra, it&#8217;s client API is just a simple <a href="http://incubator.apache.org/thrift/">thrift</a> client. This means that, unlike other datastore clients such as jdbc etc, the client provided has somewhat limiter functionality; It can sent messages to cassanra, write values and read values of course, but other client goodies required for large scale applications are not provided, features such as monitoring, connection pooling etc. The client I initially used provides connection pooling, which is a very nice start, but I decided it was missing too much so I&#8217;d write my own.</p>
<p>As a good open-source citizen, I initially contacted the authors of cassandra-java-client asking their permission to contribute and make the suggested improvements, but after weeks without reply I realized I&#8217;ll need to go solo. I started with the concepts captured by the folks who had built the java-client, but pretty soon the code has morphed to be something completely different.</p>
<p>Here&#8217;s how code that uses Hector looks like. This is an implementation of a simple distributed hashtable over cassandra. By the virtue of cassandra, this hashtable can grow pretty large:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">  <span style="color: #008000; font-style: italic; font-weight: bold;">/**
   * Insert a new value keyed by key
   * @param key Key for the value
   * @param value the String value to insert
   */</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> insert<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">final</span> <span style="color: #003399;">String</span> key, <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #003399;">String</span> value<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
    execute<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Command<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
      <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">Void</span> execute<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">final</span> Keyspace ks<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
        ks.<span style="color: #006633;">insert</span><span style="color: #009900;">&#40;</span>key, createColumnPath<span style="color: #009900;">&#40;</span>COLUMN_NAME<span style="color: #009900;">&#41;</span>, bytes<span style="color: #009900;">&#40;</span>value<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #008000; font-style: italic; font-weight: bold;">/**
   * Get a string value.
   * @return The string value; null if no value exists for the given key.
   */</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">String</span> get<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">final</span> <span style="color: #003399;">String</span> key<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000000; font-weight: bold;">return</span> execute<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Command<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
      <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">String</span> execute<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">final</span> Keyspace ks<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">try</span> <span style="color: #009900;">&#123;</span>
          <span style="color: #000000; font-weight: bold;">return</span> string<span style="color: #009900;">&#40;</span>ks.<span style="color: #006633;">getColumn</span><span style="color: #009900;">&#40;</span>key, createColumnPath<span style="color: #009900;">&#40;</span>COLUMN_NAME<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getValue</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span> <span style="color: #000000; font-weight: bold;">catch</span> <span style="color: #009900;">&#40;</span>NotFoundException e<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
          <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
      <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #008000; font-style: italic; font-weight: bold;">/**
   * Delete a key from cassandra
   */</span>
  <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> delete<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">final</span> <span style="color: #003399;">String</span> key<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
    execute<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> Command<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
      <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">Void</span> execute<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">final</span> Keyspace ks<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">Exception</span> <span style="color: #009900;">&#123;</span>
        ks.<span style="color: #006633;">remove</span><span style="color: #009900;">&#40;</span>key, createColumnPath<span style="color: #009900;">&#40;</span>COLUMN_NAME<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">return</span> <span style="color: #000066; font-weight: bold;">null</span><span style="color: #339933;">;</span>
      <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
  <span style="color: #009900;">&#125;</span></pre></div></div>

<p>Out of the box, Cassandra provides a raw <a href="http://incubator.apache.org/thrift/">Thrift</a> client, which is OK but lacks many features essential to real-world clients. I&#8217;ve built Hector to fill this gap.</p>
<p>Here are the high-level features of <a href="http://github.com/rantav/hector">Hector</a>, currently hosted on GitHub:</p>
<ul>
<li>A high-level, object-oriented interface to Cassandra. As noted before, Cassandra&#8217;s out-of-the-box client is a Thrift client, which isn&#8217;t always nice and clean to work with, so I wanted to provide a higher-level, cleaner API. This part was mainly inspired by the aforementioned cassandra-java-client. The API is defined in the <a href="http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/Keyspace.java">Keyspace</a> interface; see, for example, methods such as Keyspace.insert() and Keyspace.getColumn().</li>
<li>Failover support. Cassandra is a distributed data store and can handle one or several hosts going down very well. However, out of the box, Thrift provides no support for failing clients. What if the client is configured to connect to a Cassandra host that just happens to be down right now? In Hector, if a client is connected to one host in the ring and that host goes down, the client automatically and transparently searches for other available hosts to perform its operation before giving up and returning an error to its user. There are currently 3 ways to configure the failover policy: FAIL_FAST (no retry; just fail if there are errors, nothing smart), ON_FAIL_TRY_ONE_NEXT_AVAILABLE (try one more host before giving up) and ON_FAIL_TRY_ALL_AVAILABLE (try all available hosts before giving up). See <a href="http://github.com/rantav/hector/blob/master/src/main/java/me/prettyprint/cassandra/service/CassandraClient.java">CassandraClient.FailoverPolicy</a>.</li>
<li>Connection pooling. This is a real necessity for high-scale applications. The usual pattern for DAOs (Data Access Objects) is a large number of small reads and writes. Clients cannot afford to open a new connection for each and every request, not only because of the overhead of the TCP handshake (Thrift uses TCP), but also because closed sockets linger in <a href="http://www.developerweb.net/forum/showthread.php?t=2941">TIME_WAIT</a>, so a client may easily run out of available sockets if it operates fast enough. This part was also inspired by cassandra-java-client but was improved in my version. Hector provides connection pooling and a nice framework that manages all its gory details.</li>
<li>JMX support. It&#8217;s a widely known fact that applications have a life of their own: you built it to do X, but it does Y because you didn&#8217;t expect Z to happen. Running an application without the ability to monitor it is like walking blindfolded on a dark highway; sooner or later you&#8217;ll get hit by something. Hector exposes JMX for many important runtime metrics, such as the number of available connections, idle connections, error statistics and more.</li>
<li>Support for the Command design pattern, which lets clients concentrate on their business logic while Hector takes care of the required plumbing. This is demonstrated in the code above.</li>
</ul>
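<p>To make the three failover policies concrete, here is a minimal JDK-only sketch of a retry loop over a list of hosts. The enum constants mirror the policy names listed above, but the retry counts, the Operation interface and the FailoverRunner class are assumptions for illustration, not Hector&#8217;s actual CassandraClient.FailoverPolicy implementation.</p>

```java
import java.util.List;

// Policy names follow the post; the retry counts are this sketch's interpretation.
enum FailoverPolicy {
  FAIL_FAST(0),                                  // no retry at all
  ON_FAIL_TRY_ONE_NEXT_AVAILABLE(1),             // one extra host
  ON_FAIL_TRY_ALL_AVAILABLE(Integer.MAX_VALUE);  // every remaining host

  final int retries;

  FailoverPolicy(int retries) { this.retries = retries; }
}

// Hypothetical operation against a single host; throws if that host is down.
interface Operation<T> {
  T runOn(String host) throws Exception;
}

class FailoverRunner {
  // Tries hosts in order, starting with the first (the configured host),
  // and stops after the number of attempts the policy allows.
  static <T> T run(List<String> hosts, FailoverPolicy policy, Operation<T> op) throws Exception {
    if (hosts.isEmpty()) {
      throw new IllegalArgumentException("no hosts configured");
    }
    // Attempts = first try + allowed retries, capped by the number of hosts.
    long maxAttempts = Math.min((long) hosts.size(), 1L + policy.retries);
    Exception last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        return op.runOn(hosts.get(i));
      } catch (Exception e) {
        last = e; // remember the failure and move on to the next host
      }
    }
    throw last; // every allowed attempt failed; report the most recent error
  }
}
```

<p>For example, with hosts h1, h2 and h3 where only h3 is up, FAIL_FAST gives up after trying h1, ON_FAIL_TRY_ONE_NEXT_AVAILABLE gives up after h1 and h2, and ON_FAIL_TRY_ALL_AVAILABLE succeeds on h3.</p>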
<p>I&#8217;ve been using Hector internally at Outbrain, and so far so good. I&#8217;d be happy to get the community&#8217;s feedback on the API, implementation, features and so on, and I hope you find it useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/feed/</wfw:commentRss>
		<slash:comments>88</slash:comments>
		</item>
		<item>
		<title>Running Cassandra as an embedded service</title>
		<link>http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/</link>
		<comments>http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/#comments</comments>
		<pubDate>Sun, 14 Feb 2010 06:08:45 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=255</guid>
		<description><![CDATA[While developing an application at Outbrain using Cassandra, I was looking for a good way to test my app. The application consists of a Cassandra client package, some Data Access Objects (DAOs) and some bean objects that represent the data entities in Cassandra. I wanted to test them all. As unit test tradition goes, my [...]]]></description>
			<content:encoded><![CDATA[<div class="tweetmeme_button" style="float: right; margin-left: 10px;">
			<a href="http://api.tweetmeme.com/share?url=http%3A%2F%2Fprettyprint.me%2F2010%2F02%2F14%2Frunning-cassandra-as-an-embedded-service%2F"><br />
				<img src="http://api.tweetmeme.com/imagebutton.gif?url=http%3A%2F%2Fprettyprint.me%2F2010%2F02%2F14%2Frunning-cassandra-as-an-embedded-service%2F&amp;style=normal" height="61" width="50" /><br />
			</a>
		</div>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/02/cassandra3125.jpg"><img class="size-medium wp-image-257 alignleft" title="cassandra embedded" src="http://prettyprint.me/wp-content/uploads/2010/02/cassandra3125-220x300.jpg" alt="" width="220" height="300" /></a>While developing an application at outbrain, using <a href="http://incubator.apache.org/cassandra/">Cassandra</a> I was looking for a good way to test my app. The application consists of a Cassandra Client package, some Data Access Objects (DAOs) and some bean object that represent the data entities in cassandra. I wanted to test them all.</p>
<p>As unit test tradition goes, my requirement was zero-configuration, zero preparation, no external dependencies, full isolation, fully reproducible results and fast. Database testing has always been a challenge in this perspective, for example when testing SQL clients in java often <a href="http://hsqldb.org/">HSQLDB</a> is used to to mock the database. Cassandra, however, did not have something ready just yet so I had to build it.</p>
<p>One way to go was to setup a cassandra instance just for unit testing. There are many downsides to this approach, such as it&#8217;s not zero-configuration, tests need to cleanup before they execute, if two tests are run at the same time by two developers they can collide and change the results in unexpected way, it&#8217;s slow&#8230; out of the question, not good.</p>
<p>Enter the embedded cassandra server.</p>
<p>With the help of the community I&#8217;ve built an embedded cassandra service ideal for unit testing and perhaps other uses. I&#8217;ve also built a cleanup utility that helps wipe out all data before the service starts running so the combination of both provides isolation etc. Now each test process runs an in-process, embedded instance of cassandra.</p>
<p>Below is the source code, already committed to cassandra SCM on trunk. If you want to use it for the current stable release(0.5.0) only a small package rename is required (in trunk some classes moved a bit), and it&#8217;s presented at the end of the post.</p>
<p>The embedded service:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">org.apache.cassandra.service</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.File</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.FileOutputStream</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.InputStream</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.OutputStream</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.config.DatabaseDescriptor</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.io.util.FileUtils</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.CassandraDaemon</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.thrift.transport.TTransportException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.slf4j.Logger</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.slf4j.LoggerFactory</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * An embedded, in-process cassandra storage service that listens
 * on the thrift interface as configured in storage-conf.xml
 * This kind of service is useful when running unit tests of
 * services using cassandra for example.
 *
&nbsp;
 * This is the implementation of https://issues.apache.org/jira/browse/CASSANDRA-740
 *
&nbsp;
 * How to use:
 * In the client code create a new thread and spawn it with its {@link Thread#start()} method.
 * Example:
 *
 *      // Tell cassandra where the configuration files are.
        System.setProperty(&quot;storage-config&quot;, &quot;conf&quot;);
&nbsp;
        cassandra = new EmbeddedCassandraService();
        cassandra.init();
&nbsp;
        // spawn cassandra in a new thread
        Thread t = new Thread(cassandra);
        t.setDaemon(true);
        t.start();
&nbsp;
 *
 * @author Ran Tavory (rantav@gmail.com)
 *
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> EmbeddedCassandraService <span style="color: #000000; font-weight: bold;">implements</span> <span style="color: #003399;">Runnable</span>
<span style="color: #009900;">&#123;</span>
&nbsp;
    CassandraDaemon cassandraDaemon<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> init<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> TTransportException, <span style="color: #003399;">IOException</span>
    <span style="color: #009900;">&#123;</span>
        cassandraDaemon <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> CassandraDaemon<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        cassandraDaemon.<span style="color: #006633;">init</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">null</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> run<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
        cassandraDaemon.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The data cleaner:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">org.apache.cassandra.contrib.utils.service</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.File</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.HashSet</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.util.Set</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.config.DatabaseDescriptor</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.io.util.FileUtils</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * A cleanup utility that wipes the cassandra data directories.
 *
 * @author Ran Tavory (rantav@gmail.com)
 *
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> CassandraServiceDataCleaner <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Creates all data dirs if they don't exist and cleans them
     * @throws IOException
     */</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> prepare<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
        makeDirsIfNotExist<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        cleanupDataDirectories<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Deletes all data from cassandra data directories, including the commit log.
     * @throws IOException in case of permissions error etc.
     */</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> cleanupDataDirectories<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> s<span style="color: #339933;">:</span> getDataDirs<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            cleanDir<span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Creates the data directories if they don't exist.
     * @throws IOException if directories cannot be created (permissions etc).
     */</span>
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> makeDirsIfNotExist<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> s<span style="color: #339933;">:</span> getDataDirs<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            mkdir<span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Collects all data dirs and returns a set of String paths on the file system.
     *
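     * @return the data file directories plus the commit log directory
     */</span>
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #003399;">Set</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">String</span><span style="color: #339933;">&gt;</span> getDataDirs<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #003399;">Set</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">String</span><span style="color: #339933;">&gt;</span> dirs <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">HashSet</span><span style="color: #339933;">&lt;</span><span style="color: #003399;">String</span><span style="color: #339933;">&gt;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>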
        <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> s <span style="color: #339933;">:</span> DatabaseDescriptor.<span style="color: #006633;">getAllDataFileLocations</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            dirs.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>s<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        dirs.<span style="color: #006633;">add</span><span style="color: #009900;">&#40;</span>DatabaseDescriptor.<span style="color: #006633;">getLogFileLocation</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">return</span> dirs<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Creates a directory
     *
     * @param dir
     * @throws IOException
     */</span>
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000066; font-weight: bold;">void</span> mkdir<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> dir<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
        FileUtils.<span style="color: #006633;">createDirectory</span><span style="color: #009900;">&#40;</span>dir<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Removes all directory content from the file system
     *
     * @param dir
     * @throws IOException
     */</span>
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000066; font-weight: bold;">void</span> cleanDir<span style="color: #009900;">&#40;</span><span style="color: #003399;">String</span> dir<span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #003399;">File</span> dirFile <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">File</span><span style="color: #009900;">&#40;</span>dir<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>dirFile.<span style="color: #006633;">exists</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">&amp;</span>amp<span style="color: #339933;">;&amp;</span>amp<span style="color: #339933;">;</span> dirFile.<span style="color: #006633;">isDirectory</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            FileUtils.<span style="color: #006633;">delete</span><span style="color: #009900;">&#40;</span>dirFile.<span style="color: #006633;">listFiles</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>And an example test that uses both:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">org.apache.cassandra.contrib.utils.service</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">static</span> org.<span style="color: #006633;">junit</span>.<span style="color: #000000; font-weight: bold;">Assert</span>.<span style="color: #006633;">assertEquals</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">static</span> org.<span style="color: #006633;">junit</span>.<span style="color: #000000; font-weight: bold;">Assert</span>.<span style="color: #006633;">assertNotNull</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.UnsupportedEncodingException</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.service.EmbeddedCassandraService</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.Cassandra</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.ColumnOrSuperColumn</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.ColumnPath</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.ConsistencyLevel</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.InvalidRequestException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.NotFoundException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.TimedOutException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.cassandra.thrift.UnavailableException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.thrift.TException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.thrift.protocol.TBinaryProtocol</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.thrift.protocol.TProtocol</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.thrift.transport.TSocket</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.thrift.transport.TTransport</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.apache.thrift.transport.TTransportException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.junit.BeforeClass</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.junit.Test</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #008000; font-style: italic; font-weight: bold;">/**
 * Example of how to use an embedded cassandra service and a data cleaner.
 *
 * @author Ran Tavory (rantav@gmail.com)
 *
 */</span>
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> CassandraServiceTest <span style="color: #009900;">&#123;</span>
&nbsp;
    <span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> EmbeddedCassandraService cassandra<span style="color: #339933;">;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Set embedded cassandra up and spawn it in a new thread.
     *
     * @throws TTransportException
     * @throws IOException
     * @throws InterruptedException
     */</span>
    @BeforeClass
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">static</span> <span style="color: #000066; font-weight: bold;">void</span> setup<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> TTransportException, <span style="color: #003399;">IOException</span>,
            <span style="color: #003399;">InterruptedException</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #666666; font-style: italic;">// Tell cassandra where the configuration files are.</span>
        <span style="color: #666666; font-style: italic;">// Use the test configuration file.</span>
        <span style="color: #003399;">System</span>.<span style="color: #006633;">setProperty</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;storage-config&quot;</span>, <span style="color: #0000ff;">&quot;../../test/conf&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        CassandraServiceDataCleaner cleaner <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> CassandraServiceDataCleaner<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        cleaner.<span style="color: #006633;">prepare</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        cassandra <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> EmbeddedCassandraService<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        cassandra.<span style="color: #006633;">init</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #003399;">Thread</span> t <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">Thread</span><span style="color: #009900;">&#40;</span>cassandra<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        t.<span style="color: #006633;">setDaemon</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">true</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        t.<span style="color: #006633;">start</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>   
&nbsp;
    @Test
    <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> testInProcessCassandraServer<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
            <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">UnsupportedEncodingException</span>, InvalidRequestException,
            UnavailableException, TimedOutException, TException,
            NotFoundException <span style="color: #009900;">&#123;</span>
        Cassandra.<span style="color: #006633;">Client</span> client <span style="color: #339933;">=</span> getClient<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #003399;">String</span> key_user_id <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;1&quot;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #000066; font-weight: bold;">long</span> timestamp <span style="color: #339933;">=</span> <span style="color: #003399;">System</span>.<span style="color: #006633;">currentTimeMillis</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        ColumnPath cp <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> ColumnPath<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Standard1&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        cp.<span style="color: #006633;">setColumn</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;name&quot;</span>.<span style="color: #006633;">getBytes</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;utf-8&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// insert</span>
        client.<span style="color: #006633;">insert</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Keyspace1&quot;</span>, key_user_id, cp, <span style="color: #0000ff;">&quot;Ran&quot;</span>.<span style="color: #006633;">getBytes</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;UTF-8&quot;</span><span style="color: #009900;">&#41;</span>,
                timestamp, ConsistencyLevel.<span style="color: #006633;">ONE</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// read</span>
        ColumnOrSuperColumn got <span style="color: #339933;">=</span> client.<span style="color: #006633;">get</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Keyspace1&quot;</span>, key_user_id, cp,
                ConsistencyLevel.<span style="color: #006633;">ONE</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
&nbsp;
        <span style="color: #666666; font-style: italic;">// assert</span>
        assertNotNull<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Got a null ColumnOrSuperColumn&quot;</span>, got<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        assertEquals<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Ran&quot;</span>, <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#40;</span>got.<span style="color: #006633;">getColumn</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>.<span style="color: #006633;">getValue</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>, <span style="color: #0000ff;">&quot;utf-8&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
&nbsp;
    <span style="color: #008000; font-style: italic; font-weight: bold;">/**
     * Opens a Thrift connection to the embedded cassandra server on localhost:9170.
     *
     * @return a client connected to the embedded server
     * @throws TTransportException
     */</span>
    <span style="color: #000000; font-weight: bold;">private</span> Cassandra.<span style="color: #006633;">Client</span> getClient<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> <span style="color: #000000; font-weight: bold;">throws</span> TTransportException <span style="color: #009900;">&#123;</span>
        TTransport tr <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TSocket<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;localhost&quot;</span>, <span style="color: #cc66cc;">9170</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        TProtocol proto <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TBinaryProtocol<span style="color: #009900;">&#40;</span>tr<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        Cassandra.<span style="color: #006633;">Client</span> client <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> Cassandra.<span style="color: #006633;">Client</span><span style="color: #009900;">&#40;</span>proto<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        tr.<span style="color: #006633;">open</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #000000; font-weight: bold;">return</span> client<span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>To use this source code with v0.5.0, a few package renames are required:<br />
org.apache.cassandra.io.util.FileUtils =&gt; org.apache.cassandra.utils.FileUtils<br />
org.apache.thrift.transport.TTransportException =&gt; org.apache.transport.TTransportException<br />
org.apache.cassandra.thrift.CassandraDaemon =&gt; org.apache.cassandra.CassandraDaemon</p>
<p>One nifty detail: when running multiple tests serially, make sure to run each test in a separate JVM (fork mode), since cassandra doesn&#8217;t shut down all of its threads immediately. Running each test in a separate JVM ensures that the previous test&#8217;s instance is gone before the next one begins.</p>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Introduction to NOSQL and cassandra, part 2</title>
		<link>http://prettyprint.me/2010/01/20/introduction-to-nosql-and-cassandra-part-2/</link>
		<comments>http://prettyprint.me/2010/01/20/introduction-to-nosql-and-cassandra-part-2/#comments</comments>
		<pubDate>Wed, 20 Jan 2010 08:52:40 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=241</guid>
		<description><![CDATA[In part 1 of this talk I presented a few of the theoretical concepts behind nosql and cassandra. In this talk we take a deep dive into the Cassandra API and implementation. The video is again in Hebrew, but the slides are multilingual ;-) Started with a short recap of some RDBMS and SQL properties, such as ACID, why [...]]]></description>
			<content:encoded><![CDATA[
<p>In <a href="http://prettyprint.me/2010/01/09/introduction-to-nosql-and-cassandra-part-1/">part 1</a> of this talk I presented a few of the theoretical concepts behind nosql and cassandra.</p>
<p>In this talk we take a deep dive into the Cassandra API and implementation. The video is again in Hebrew, but the slides are multilingual ;-)<br />
<object width="425" height="344"><param name="movie" value="http://www.youtube.com/v/8P_wAe6Xpxw&#038;hl=en_US&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/8P_wAe6Xpxw&#038;hl=en_US&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="344"></embed></object><br />
<iframe src="http://docs.google.com/present/embed?id=ahbp3bktzpkc_145c5gmf2gz" frameborder="0" width="410" height="342"></iframe></p>
<ul>
<li>Started with a short recap of some RDBMS and SQL properties, such as <a href="http://databases.about.com/od/specificproducts/a/acid.htm">ACID</a>, and why SQL is very programmer-friendly but also limited in its support for large-scale systems.</li>
<li>Short recap of the <a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem">CAP theorem</a></li>
<li>Short recap of what N/R/W are: N is the number of replicas of each piece of data, R is the number of replicas that must respond to a read, and W is the number of replicas that must acknowledge a write</li>
<li>Cassandra data model: Cassandra is a column-oriented database whose data model is similar to <a href="http://labs.google.com/papers/bigtable.html">Google&#8217;s BigTable</a></li>
<li>Do you know SQL? Then you&#8217;d better start forgetting it; Cassandra is a different game.</li>
<li>Vocabulary:
<ul>
<li>Keyspace &#8211; a logical container for application data, for example a Billing keyspace, a Statistics keyspace, an appX keyspace, etc.</li>
<li>ColumnFamily &#8211; similar to an SQL table; it aggregates columns and rows</li>
<li>Keys (or rows) &#8211; each set of columns is identified by a key, and a key is unique within a ColumnFamily</li>
<li>Columns &#8211; the actual values. Columns are represented by triplets &#8211; (name, value, timestamp)</li>
<li>SuperColumns &#8211; Facebook&#8217;s addition to the BigTable model: SuperColumns are columns whose values are lists of columns (this is not recursive; there is only one level of super-columns)</li>
</ul>
</li>
<li>One way to think of cassandra is as a key-value store, but with extra functionality:
<ul>
<li>Each key has multiple values; in Cassandra jargon these are columns</li>
<li>When reading or writing data it&#8217;s possible to read/write a set of columns for one specific key (row) atomically. This set of columns may be specified either by a list of column names or by a slice predicate, assuming the columns are sorted in some way (the sort order is a configuration parameter)</li>
<li>In addition, multi-get and row-range-read operations are supported.</li>
<li>Row-range-read operations are supported only if the configured partitioner supports them (another configuration parameter)</li>
</ul>
</li>
<li><strong>Key concept</strong>: In SQL you add your data first and then retrieve it in an ad-hoc manner using select queries and where clauses; in Cassandra you can&#8217;t do that. Data can only be retrieved by its row key, so you have to think about how you&#8217;re going to read your data before you insert it. This is a conceptual difference between SQL and Cassandra.</li>
<li>I covered the Cassandra API methods:
<ul>
<li>get</li>
<li>get_slice</li>
<li>multiget</li>
<li>multiget_slice</li>
<li>get_count</li>
<li>get_range_slice</li>
<li>insert</li>
<li>batch_insert</li>
<li>delete</li>
<li>(these are the 0.4 API methods; in 0.5 they are a little different)</li>
</ul>
</li>
<li>Of N/R/W, N is set per keyspace; R is defined per read operation (get/multiget/etc.) and W per write operation (insert/batch_insert/delete)</li>
<li>Applications tune their R/W values to get different effects: for example, QUORUM for high consistency, DC_QUORUM for a balance of high consistency and performance, or W=0 for asynchronous writes with reduced consistency.</li>
<li>Cassandra defines different sort orders for its columns. The sort order is defined at the ColumnFamily level and is used to get a slice of columns, for example, to read all columns from a&#8230; through z&#8230;</li>
<li>There are several out-of-the-box sort types, such as ascii, utf, numeric and date; applications may also add their own sorters. As far as I recall, this is the only place where Cassandra allows external code to be hooked in.</li>
<li>Thrift is a protocol and a library for cross-process communication, and Cassandra uses it. You define a thrift interface and then compile it to the language of your choice &#8211; C++, Java, Python, PHP, etc. This makes it very easy for processes written in different languages to talk to each other.</li>
<li>Thrift is also very efficient at serializing and deserializing objects, and its encoding is space-efficient (much more so than Java serialization).</li>
<li>I did not have enough time to cover the <a href="http://en.wikipedia.org/wiki/Gossip_protocol">Gossip protocol</a> used by Cassandra internally to learn about the health of its hosts.</li>
<li>I also did not have enough time to cover the Repair-on-reads algorithm used by Cassandra to repair data inconsistencies lazily.</li>
<li>I did not have time to talk about <a href="http://en.wikipedia.org/wiki/Consistent_hashing">consistent hashing</a>, which cassandra implements internally to reduce the overhead when hosts join or leave the cluster.</li>
</ul>
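<p>To make the vocabulary above and the idea of slicing sorted columns concrete, here is a minimal in-memory sketch in Java. It is only an illustration under simplifying assumptions (no super-columns, no replication, string names and values, one sort order for the whole store) and not the Cassandra API; the class and method names (<code>ToyColumnStore</code>, <code>insert</code>, <code>getSlice</code>) are hypothetical. Each row is a <code>TreeMap</code> ordered by a comparator that plays the role of the ColumnFamily&#8217;s sort type, so a slice is just a range view over the row, and when two writes hit the same column the one with the newer timestamp wins.</p>

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical toy model of Cassandra's data model; NOT the real Cassandra API. */
public class ToyColumnStore {

    /** A column is a (name, value, timestamp) triplet. */
    public static final class Column {
        public final String name;
        public final String value;
        public final long timestamp;

        public Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // keyspace -> column family -> row key -> columns, sorted by column name
    private final Map<String, Map<String, Map<String, NavigableMap<String, Column>>>> data =
            new TreeMap<>();

    // Plays the role of a ColumnFamily's configured sort type (simplified: one order for all).
    private final Comparator<String> columnOrder;

    public ToyColumnStore(Comparator<String> columnOrder) {
        this.columnOrder = columnOrder;
    }

    public void insert(String keyspace, String columnFamily, String rowKey,
                       String columnName, String value, long timestamp) {
        NavigableMap<String, Column> row = data
                .computeIfAbsent(keyspace, k -> new TreeMap<>())
                .computeIfAbsent(columnFamily, k -> new TreeMap<>())
                .computeIfAbsent(rowKey, k -> new TreeMap<>(columnOrder));
        Column existing = row.get(columnName);
        // Conflicting writes are resolved by timestamp: the newest one wins.
        if (existing == null || existing.timestamp <= timestamp) {
            row.put(columnName, new Column(columnName, value, timestamp));
        }
    }

    /** Returns the columns of one row whose names fall in [start, finish] under the sort order. */
    public SortedMap<String, Column> getSlice(String keyspace, String columnFamily,
                                              String rowKey, String start, String finish) {
        NavigableMap<String, Column> row = data
                .getOrDefault(keyspace, Map.of())
                .getOrDefault(columnFamily, Map.of())
                .get(rowKey);
        if (row == null) {
            return new TreeMap<>(columnOrder); // unknown row: empty slice
        }
        // subMap throws IllegalArgumentException if start sorts after finish.
        return row.subMap(start, true, finish, true);
    }
}
```

<p>For example, with natural string ordering, a slice from "b" to "n" over a row holding the columns apple, mango and zebra returns only mango; swapping in <code>Comparator.reverseOrder()</code> changes which ranges are valid, just as changing a ColumnFamily&#8217;s sort type changes what a slice means.</p>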
<p>So, as you can see, this was an overloaded, 1h+ talk with a lot to grasp. Wish me luck implementing Cassandra into outbrain!</p>
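<p>The N/R/W tuning mentioned in the talk rests on a simple overlap rule known from Dynamo-style systems: when R + W &gt; N, every set of replicas a read consults intersects every set of replicas a write reached, so the read sees at least one replica holding the latest successful write. The Java sketch below only does that arithmetic; the class and method names are hypothetical and not part of any Cassandra or Thrift API, and QUORUM is taken to be a strict majority of N (N/2 + 1 with integer division).</p>

```java
/** Hypothetical helper illustrating the N/R/W overlap rule; not part of any Cassandra API. */
public class ReplicaQuorum {

    /** Number of replicas forming a quorum: a strict majority of n. */
    public static int quorum(int n) {
        return n / 2 + 1;
    }

    /**
     * A read is guaranteed to overlap the latest successful write when every read set
     * intersects every write set, i.e. when r + w > n. (w = 0 models fire-and-forget writes.)
     */
    public static boolean readsSeeLatestWrite(int n, int r, int w) {
        if (n < 1 || r < 0 || w < 0 || r > n || w > n) {
            throw new IllegalArgumentException("need n >= 1 and 0 <= r, w <= n");
        }
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;           // replication factor, set per keyspace
        int q = quorum(n);   // 2
        System.out.println("QUORUM/QUORUM: " + readsSeeLatestWrite(n, q, q)); // true
        System.out.println("ONE/ONE:       " + readsSeeLatestWrite(n, 1, 1)); // false
        System.out.println("ONE/ALL:       " + readsSeeLatestWrite(n, 1, n)); // true
        System.out.println("ALL/W=0:       " + readsSeeLatestWrite(n, n, 0)); // false
    }
}
```

<p>Reading and writing at QUORUM is the usual way to satisfy the rule while still tolerating a minority of unavailable replicas; reading and writing at ONE trades that guarantee for latency.</p>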
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/01/20/introduction-to-nosql-and-cassandra-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Maven Code Quality Dashboard and TeamCity</title>
		<link>http://prettyprint.me/2010/01/14/maven-code-quality-dashboard-and-teamcity/</link>
		<comments>http://prettyprint.me/2010/01/14/maven-code-quality-dashboard-and-teamcity/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 19:38:13 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=227</guid>
		<description><![CDATA[I&#8217;ve recently implemented a code-quality dashboard at outbrain for maven java projects and hooked it into our TeamCity continuous integration server. I was very pleased with the result, but the process had a few hiccups, so I thought I&#8217;d mention them here for future generations. A code quality dashboard includes the following components: Tests [...]]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;ve recently implemented a code-quality dashboard at <a href="http://outbrain.com">outbrain</a> for <a href="http://maven.apache.org/">maven</a> java projects and hooked it into our <a href="http://www.jetbrains.com/teamcity/index.html">TeamCity</a> continuous integration server. I was very pleased with the result, but the process had a few hiccups, so I thought I&#8217;d mention them here for future generations.</p>
<p>A code quality dashboard includes the following components:</p>
<ul>
<li>Test status &#8211; failed, passed and skipped counts, <strong>along with good-looking graphs</strong></li>
<li><strong>Code coverage</strong> report detailing all covered and uncovered lines and branches, including nice coverage graphs</li>
<li><strong>Copy-paste detection</strong> by <a href="http://pmd.sourceforge.net/cpd.html">CPD</a></li>
<li><a href="http://findbugs.sourceforge.net/">FindBugs</a> report</li>
<li><a href="http://clarkware.com/software/JDepend.html">jDepend</a> report</li>
</ul>
<p>The process had two phases: in phase one I added the dashboard report to maven&#8217;s site goal in my pom.xml, and in phase two I made the report available in TeamCity, which takes a bit of manual work but isn&#8217;t too bad.</p>
<p>To add those nice reports, edit your pom.xml to add:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;">  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;reporting<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugins<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>org.codehaus.mojo<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>cobertura-maven-plugin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>org.apache.maven.plugins<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>maven-pmd-plugin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;version<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>2.3<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/version<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
          <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;linkXref<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>true<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/linkXref<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
          <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;targetJdk<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>1.5<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/targetJdk<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>org.apache.maven.plugins<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>maven-surefire-report-plugin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;version<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>2.4.2<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/version<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>org.codehaus.mojo<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>jdepend-maven-plugin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>org.codehaus.mojo<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>findbugs-maven-plugin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;version<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>2.0.1<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/version<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
          <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;xmlOutput<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>true<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/xmlOutput<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
          <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;effort<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>Max<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/effort<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>org.codehaus.mojo<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>dashboard-maven-plugin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
    <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugins<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
  <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/reporting<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></div></div>

<p>Now, in theory, that would have been all. All you have to do is run <em><strong>mvn site</strong></em> and bang &#8211; you have the reports under target/site. That&#8217;s why maven is nice.</p>
<p>However, if you&#8217;re running a multi-module project, <em><strong>mvn site</strong></em> is buggy&#8230; all links to the submodules are broken. But don&#8217;t despair, here&#8217;s the solution: configure the site plugin to place its generated content where the site plugin itself expects to find it. Yeah, I know it sounds confusing; the thing is that the site plugin has a bug that breaks its links to the submodule projects, but here&#8217;s an easy fix that worked for me (as long as the submodules are only one directory deep under the parent pom.xml). In the parent pom.xml add:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;">      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>org.apache.maven.plugins<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/groupId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>maven-site-plugin<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/artifactId<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
          <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;outputDirectory<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>../target/site/${project.name}<span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/outputDirectory<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
        <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/configuration<span style="color: #000000; font-weight: bold;">&gt;</span></span></span>
      <span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;/plugin<span style="color: #000000; font-weight: bold;">&gt;</span></span></span></pre></div></div>

<p>Now the reports are fixed.</p>
<p>The next step is to add them to TeamCity.</p>
<p>Step one: tell TeamCity to collect the site artifacts (html, css, js&#8230;). Go to the build configuration and under Artifacts Paths add <strong>**/target/site/**/*</strong></p>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/01/Trunk-Code-Quality-Configuration-TeamCity.png"><img class="size-full wp-image-234 alignnone" title="Trunk Code Quality Configuration -- TeamCity" src="http://prettyprint.me/wp-content/uploads/2010/01/Trunk-Code-Quality-Configuration-TeamCity.png" alt="" width="590" height="161" /></a></p>
<p>Step two: Add a Code Quality tab to the build results. You do that by ssh-ing to the teamc host and editing</p>
<pre>vi /teamc/TeamCity/.BuildServer/config/main-config.xml</pre>
<p>to add</p>
<pre>&lt;report-tab title="Code Quality" basePath="target/site/" startPage="dashboard-report-details.html" /&gt;</pre>
<p>Here&#8217;s the result:</p>
<p><img class="alignnone size-full wp-image-233" title="Code Quality Tab" src="http://prettyprint.me/wp-content/uploads/2010/01/Trunk-__-Trunk-Code-Quality-8-13-Jan-10-20_43-Overview-TeamCity.png" alt="" width="632" height="83" /></p>
<p>That&#8217;s it! Now run your build with <em><strong>mvn site</strong></em> and get those gorgeous-looking reports right in your TeamCity build results page.<br />
<a href="http://prettyprint.me/wp-content/uploads/2010/01/Global-DashBoard-Report.png"><img class="size-full wp-image-228 alignnone" title="Global DashBoard Report" src="http://prettyprint.me/wp-content/uploads/2010/01/Global-DashBoard-Report.png" alt="" width="715" height="677" /></a></p>
<p><img class="size-full wp-image-230 alignnone" title="Outrain Root Project - DashBoard Report" src="http://prettyprint.me/wp-content/uploads/2010/01/Outrain-Root-Project-DashBoard-Report-2.png" alt="" width="590" height="359" /></p>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/01/Outrain-Root-Project-DashBoard-Report-1-1.png"><img class="size-full wp-image-229 alignnone" title="Outrain Root Project - DashBoard Report" src="http://prettyprint.me/wp-content/uploads/2010/01/Outrain-Root-Project-DashBoard-Report-1-1.png" alt="" width="441" height="274" /></a></p>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/01/Coverage-Report.png"><img class="alignnone size-full wp-image-236" title="Coverage Report" src="http://prettyprint.me/wp-content/uploads/2010/01/Coverage-Report.png" alt="" width="703" height="536" /></a></p>
<p><a href="http://prettyprint.me/wp-content/uploads/2010/01/Coverage-Report-1.png"><img class="alignnone size-full wp-image-235" title="Coverage Report" src="http://prettyprint.me/wp-content/uploads/2010/01/Coverage-Report-1.png" alt="" width="460" height="378" /></a><img class="alignnone size-full wp-image-237" title="FindBugs piechart" src="http://prettyprint.me/wp-content/uploads/2010/01/ImageFetcher-DashBoard-Report.png" alt="" width="825" height="426" /></p>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/01/14/maven-code-quality-dashboard-and-teamcity/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Moved hosting away from #godaddy #sucks</title>
		<link>http://prettyprint.me/2010/01/11/moved-hosting-away-from-godaddy-sucks/</link>
		<comments>http://prettyprint.me/2010/01/11/moved-hosting-away-from-godaddy-sucks/#comments</comments>
		<pubDate>Mon, 11 Jan 2010 18:34:12 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=223</guid>
		<description><![CDATA[This blog has moved away from GoDaddy hosting to bluehost. I don&#8217;t know how good bluehost is (so far it&#8217;s been OK) but I know for sure that GoDaddy is pretty darn awful. Awful as in So slow that it&#8217;s a nightmare to be editing posts online. I had to do them offline and then [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://prettyprint.me/wp-content/uploads/2010/01/go_daddy_sucks.png"><img class="alignleft size-full wp-image-224" title="go_daddy_sucks" src="http://prettyprint.me/wp-content/uploads/2010/01/go_daddy_sucks.png" alt="" width="270" height="103" /></a><br />
This blog has moved away from <a href="http://www.godaddy.com/">GoDaddy</a> hosting to <a href="https://www.bluehost.com/">bluehost</a>. I don&#8217;t know how good bluehost is (so far it&#8217;s been OK) but I know for sure that GoDaddy is pretty darn awful.</p>
<div style="clear: both;"></div>
<div style="clear: both;">Awful as in:</div>
<div style="clear: both;">
<ul>
<li>So slow that it&#8217;s a nightmare to be editing posts online. I had to do them offline and then copy-paste (and reformat), what a waste of time.</li>
<li>So fragile that my <a href="http://www.pingdom.com/">pingdom</a> monitor reports it unavailable for more than 5 minutes at least twice a day.</li>
<li>So unresponsive that I&#8217;m sometimes ashamed to share permalinks to my blog in fear it would be offline. Why did I even pay them?</li>
</ul>
<p>I didn&#8217;t wait for the one year I paid up front to end, just packed my stuff and moved over to bluehost. I don&#8217;t know how good bluehost is, but at least it&#8217;s fun again to be editing online and my pingdom monitor hasn&#8217;t told me anything bad yet, so knock on wood, looking good so far.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/01/11/moved-hosting-away-from-godaddy-sucks/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Introduction to NOSQL and cassandra, part 1</title>
		<link>http://prettyprint.me/2010/01/09/introduction-to-nosql-and-cassandra-part-1/</link>
		<comments>http://prettyprint.me/2010/01/09/introduction-to-nosql-and-cassandra-part-1/#comments</comments>
		<pubDate>Sat, 09 Jan 2010 14:16:50 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://prettyprint.me/?p=220</guid>
		<description><![CDATA[I recently gave a talk at outbrain, where I work, about an introduction to no-sql and Cassandra as we&#8217;re looking for alternatives for scaling out our database solution to match our incredible growth rate. NOSQL is a general name for many non relational databases and Cassandra is one of them. This was the first session [...]]]></description>
			<content:encoded><![CDATA[
<p>I recently gave a talk at <a href="http://outbrain.com">outbrain</a>, where I work, introducing no-sql and Cassandra, as we&#8217;re looking for alternatives for scaling out our database solution to match our incredible growth rate.</p>
<p><a href="http://en.wikipedia.org/wiki/NoSQL">NOSQL</a> is a general name for many non-relational databases, and <a href="http://incubator.apache.org/cassandra/">Cassandra</a> is one of them.</p>
<p>This was the first of two sessions, in which I introduced the theoretical background and explained a few of the important concepts of nosql. In the second session, due next week, I&#8217;ll talk more specifically about Cassandra.</p>
<p>The talk is on youtube (video below), but it&#8217;s in Hebrew, so I&#8217;ll share its outline in English here. Slides are enclosed as well.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/fWkEeyT3e2Y&amp;hl=en_US&amp;fs=1&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/fWkEeyT3e2Y&amp;hl=en_US&amp;fs=1&amp;rel=0" allowfullscreen="true"></embed></object><br />
<iframe src="http://docs.google.com/present/embed?id=ahbp3bktzpkc_145c5gmf2gz" frameborder="0" width="410" height="342"></iframe></p>
<ul>
<li>SQL and relational DBs in general offer a very good general-purpose solution for many applications, such as blogs, banking, my cat&#8217;s site, etc.</li>
<li>RDBMS provide <a href="http://databases.about.com/od/specificproducts/a/acid.htm">ACID</a>: Atomicity, Consistency, Isolation and Durability</li>
<li>RDBMS + SQL (the query language) + ACID provide a very nice and clean programming interface, suitable for banks, online merchants and many other applications. But not all applications actually require full ACID, and one has to realize that ACID and SQL features are not without cost when systems need to scale out. The cost is not only in $$; it&#8217;s also in application performance and features.</li>
<li>The new generation of internet-scale applications puts very high demands on DB systems when it comes to scale and speed of operation, but these applications don&#8217;t necessarily require all the good that&#8217;s in RDBMS, such as Full Consistency or Atomicity.</li>
<li>So, a new brand of DB systems has grown over the past 5 or so years &#8211; nosql, which either stands for No-SQL or Not-Only-SQL.</li>
<li>Leading actors in the nosql arena are Google with its BigTable, Amazon with Dynamo, Facebook with Cassandra and there&#8217;s more.</li>
<li>Before going no-sql, I presented intermediate solutions, namely RDBMS <a href="http://en.wikipedia.org/wiki/Shard_(database_architecture)">sharding</a>, which is very common, and <a href="http://bret.appspot.com/entry/how-friendfeed-uses-mysql">FriendFeed&#8217;s particularly interesting solution</a> of application-level indexing for using MySQL with a schema-less data model.</li>
<li>CAP Theorem: in large-scale systems you may only choose 2 out of the 3 desired attributes: Consistency, Availability and Partition-Tolerance. You cannot have all three at once, and application designers need to realize that.</li>
<li>A Consistent and Available system with no Partition-tolerance is an RDBMS that comes to a halt if one of its hosts is down. That&#8217;s a very commonly used solution and perfect for small systems. This blog, for example, which runs on WordPress, uses a single MySQL server which, if it happens to be down, takes the blog down with it. However, for internet-scale systems, where at almost any point in time there&#8217;s a good chance that one of the nodes is down or that there are network disruptions, the No-Partition-Tolerance approach just isn&#8217;t going to cut it, and they will have to choose a different approach to meet their SLAs.</li>
<li>Systems that are Available at all times and are capable of handling Partitions must sacrifice consistency. As it turns out, though, this isn&#8217;t as bad as it seems, as there are pretty good alternatives that offer lower levels of consistency. One such solution is <a href="http://en.wikipedia.org/wiki/Eventual_consistency">Eventual Consistency</a>, which works pretty nicely for &#8220;social applications&#8221; such as Google&#8217;s, Facebook&#8217;s and Outbrain&#8217;s.</li>
<li>I introduced the concept of NRW: N is the number of database replicas the data is copied to (one must replicate data in order to withstand partitions). W is the number of replicas a write operation blocks on before it returns to its caller as &#8220;successful&#8221;, and R is the number of replicas a read operation blocks on before returning to its caller.</li>
<li>N, R and W are crucial when dealing with Eventual Consistency, as their values usually determine the level of consistency you&#8217;re going to have. For example, when N=R=W you have full consistency (which isn&#8217;t tolerant to partitions, of course). More generally, whenever R+W&gt;N, the R replicas a read contacts always overlap the W replicas that acknowledged the latest write. When W=0 you have async writes, which is the lowest level of consistency (you never know when the write operation actually finishes).</li>
<li>I introduced the concept of <a href="http://en.wikipedia.org/wiki/Atomicity_(database_systems)">Quorum</a>, which means R=W=ceil((N+1)/2), i.e. a majority of the N replicas, so R+W&gt;N always holds.</li>
<li>Introduced a (very partial) list of currently available nosql solutions, such as Cassandra, BigTable, HBase, Dynamo, Voldemort, Riak, CouchDB, MongoDB and more.</li>
</ul>
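<p>To make the NRW arithmetic from the outline concrete, here is a minimal Java sketch. The class and method names are hypothetical and only illustrate the math; this is not Cassandra&#8217;s or any client library&#8217;s API. It computes the quorum size from the R=W=ceil((N+1)/2) formula and checks, by brute force over every subset of replicas for a small N, that a read is guaranteed to overlap a write exactly when R+W&gt;N (the standard overlap rule, assuming 0 &lt;= R, W &lt;= N).</p>

```java
/**
 * Hypothetical illustration of the N/R/W arithmetic (not a real Cassandra API).
 * Replicas are modeled as bits 0..n-1 of an int; assumes 0 <= r, w <= n.
 */
public final class NrwQuorum {

    private NrwQuorum() {}

    /** Quorum size R=W=ceil((N+1)/2), i.e. a strict majority of the N replicas. */
    public static int quorum(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("N must be at least 1");
        }
        return n / 2 + 1; // integer form of ceil((N+1)/2)
    }

    /** Overlap rule: every R-replica read meets every W-replica write iff R + W > N. */
    public static boolean readsOverlapWrites(int n, int r, int w) {
        return r + w > n;
    }

    /** Brute-force check of the same property by enumerating all replica subsets. */
    public static boolean everyReadSetOverlapsEveryWriteSet(int n, int r, int w) {
        for (int readSet = 0; readSet < (1 << n); readSet++) {
            if (Integer.bitCount(readSet) != r) continue;
            for (int writeSet = 0; writeSet < (1 << n); writeSet++) {
                if (Integer.bitCount(writeSet) != w) continue;
                if ((readSet & writeSet) == 0) {
                    return false; // this read could miss every replica that took the write
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("N=3, quorum=" + q);                                   // prints 2
        System.out.println("N=R=W=3 overlaps: " + readsOverlapWrites(3, 3, 3));    // true
        System.out.println("R=W=quorum overlaps: " + readsOverlapWrites(n, q, q)); // true
        System.out.println("R=W=1 overlaps: " + readsOverlapWrites(n, 1, 1));      // false
        System.out.println("W=0 (async) overlaps: " + readsOverlapWrites(n, 1, 0)); // false
    }
}
```

<p>The brute-force method is only there to show that the R+W&gt;N rule is a pigeonhole argument: if a read set and a write set together name more than N replicas, they must share at least one.</p>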
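<p>The RDBMS sharding mentioned in the outline can be illustrated with a minimal hash-based router. This is a generic Java sketch with hypothetical names, not outbrain&#8217;s or FriendFeed&#8217;s actual scheme: each key is hashed with CRC32 and mapped to one of a fixed number of database shards, so the same key always lands on the same shard.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical hash-based shard router: maps each key to one of shardCount databases. */
public final class ShardRouter {

    private final int shardCount;

    public ShardRouter(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardCount = shardCount;
    }

    /** Deterministic shard index in [0, shardCount) for the given key. */
    public int shardFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is an unsigned 32-bit checksum held in a long, so the remainder is never negative
        return (int) (crc.getValue() % shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        for (String key : new String[] {"user:1", "user:2", "user:3"}) {
            System.out.println(key + " -> shard " + router.shardFor(key));
        }
    }
}
```

<p>One design caveat of plain modulo routing: changing the number of shards remaps most keys to a different shard, which is part of why resharding a relational database is painful as a system grows.</p>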
<p>Overall this was a very interesting talk, with a lot of (fun and interesting) theory. The next part is going to be specifically about Cassandra: how all this theory fits into Cassandra and how one uses Cassandra&#8217;s API, so stay tuned.</p>
]]></content:encoded>
			<wfw:commentRss>http://prettyprint.me/2010/01/09/introduction-to-nosql-and-cassandra-part-1/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
