JMX in Hector
What I haven’t written about yet is Hector’s extensive JMX support, which, together with failover and very simple load balancing, sets it apart. JMX support in Hector isn’t new, but this is the first chance I’ve had to write about it.
JMX is Java’s standard way of monitoring applications. The default Thrift Cassandra client provides no JMX support at all, and I figured you’d have to be crazy to run a Cassandra client at such a high scale without being able to monitor it.
Here’s the list of JMX attributes provided by Hector:
WriteFail - Number of failed write operations.
ReadFail - Number of failed read operations.
RecoverableTimedOutCount - Number of recoverable TimedOut exceptions. These exceptions may happen when certain nodes are under such heavy load that they can't provide the service.
RecoverableUnavailableCount - Number of recoverable Unavailable exceptions.
RecoverableTransportExceptionCount - Number of recoverable Transport exceptions.
RecoverableErrorCount - Total number of recoverable errors.
SkipHostSuccess - Number of times that a successful skip-host (failover) has occurred.
NumPoolExhaustedEventCount - Number of times threads have encountered the pool-exhausted state (and were blocked).
NumPools - Number of connection pools. This is also the number of unique hosts in the ring that this client has communicated with. The number may be one or more, depending on the load balance policy and failover attempts.
PoolNames - The list of known pools.
NumIdleConnections - Number of currently idle connections (in all pools).
NumActive - Number of currently active connections (in all pools).
NumExhaustedPools - Number of currently exhausted connection pools.
RecoverableLoadBalancedConnectErrors - Number of recoverable load-balance connection errors.
ExhaustedPoolNames - The list of exhausted connection pools.
NumBlockedThreads - Number of currently blocked threads.
NumConnectionErrors - Number of connection errors (initial connection to the ring for retrieving metadata).
KnownHosts - The list of known hosts in the ring. This list will be used by the client in case failover is required.
updateKnownHosts - This is an operation that may be invoked by an admin to tell the client to update its list of known hosts. Usually this is done after the ring configuration has changed.
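To give a feel for how a monitoring tool might read counters like these, here is a minimal sketch using only the standard javax.management API. It does not touch Hector itself: the ClientStatsMBean interface, its fixed values, and the object name "example.hector:type=ClientStats" are all assumptions made up for illustration, standing in for whatever MBean and object name Hector actually registers.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class HectorJmxSketch {

    /** Hypothetical stand-in exposing three of the attributes listed above. */
    public interface ClientStatsMBean {
        long getWriteFail();
        long getReadFail();
        int getNumActive();
    }

    /** Returns fixed values so the sketch runs without a Cassandra ring. */
    public static class ClientStats implements ClientStatsMBean {
        @Override public long getWriteFail() { return 3; }
        @Override public long getReadFail() { return 1; }
        @Override public int getNumActive() { return 7; }
    }

    /** Reads one numeric attribute from the MBean registered under the given name. */
    public static long readCounter(MBeanServerConnection connection,
                                   ObjectName name,
                                   String attribute) throws Exception {
        Object value = connection.getAttribute(name, attribute);
        return ((Number) value).longValue();
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed object name; the name Hector really uses may differ.
        ObjectName name = new ObjectName("example.hector:type=ClientStats");
        server.registerMBean(
                new StandardMBean(new ClientStats(), ClientStatsMBean.class), name);

        for (String attribute : new String[] {"WriteFail", "ReadFail", "NumActive"}) {
            System.out.println(attribute + " = " + readCounter(server, name, attribute));
        }
    }
}
```

The same readCounter call should work against a remote MBeanServerConnection, which is the handle a JMX connector gives you, so a poller running outside the client JVM could reuse it once it knows the real object name.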
Performance counters (I used the mechanics of perf4j to implement these):
READ.success_TPS - Total read transactions per second (measured as the average over the last 10 seconds).
READ.success_Mean - The mean time of successful read requests over the last 10 seconds.
READ.success_Min - Time in milliseconds of the fastest successful read operation (over the last 10 seconds).
READ.success_Max - Time in milliseconds of the slowest successful read operation (over the last 10 seconds).
READ.success_StdDev - Standard deviation of the time of successful read operations (over the last 10 seconds).
WRITE.success_TPS - Total write transactions per second (over the last 10 seconds).
WRITE.success_Mean - ...
WRITE.success_Min
WRITE.success_Max
WRITE.success_StdDev
READ.fail_TPS
READ.fail_Mean
READ.fail_Min
READ.fail_Max
READ.fail_StdDev
WRITE.fail_TPS
WRITE.fail_Mean
WRITE.fail_Min
WRITE.fail_Max
WRITE.fail_StdDev
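For intuition about what the five counters in each group mean, here is a rough sketch that derives them from the durations recorded in one window. It is not perf4j's code and makes no claim about how perf4j computes its numbers: the class and method names are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption.

```java
public class WindowStatsSketch {

    /** Summary of one time window; field names mirror the counter suffixes. */
    public static final class Stats {
        public final double tps;
        public final double mean;
        public final long min;
        public final long max;
        public final double stdDev;

        Stats(double tps, double mean, long min, long max, double stdDev) {
            this.tps = tps;
            this.mean = mean;
            this.min = min;
            this.max = max;
            this.stdDev = stdDev;
        }
    }

    /**
     * Summarizes operation durations (in milliseconds) observed during a
     * window of the given length in seconds, e.g. 10 for the counters above.
     */
    public static Stats summarize(long[] durationsMillis, double windowSeconds) {
        int n = durationsMillis.length;
        if (n == 0) {
            return new Stats(0.0, 0.0, 0L, 0L, 0.0);
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        double sum = 0.0;
        for (long d : durationsMillis) {
            min = Math.min(min, d);
            max = Math.max(max, d);
            sum += d;
        }
        double mean = sum / n;

        double squaredDiffs = 0.0;
        for (long d : durationsMillis) {
            squaredDiffs += (d - mean) * (d - mean);
        }
        // Assumption: population standard deviation (divide by n, not n - 1).
        double stdDev = Math.sqrt(squaredDiffs / n);

        return new Stats(n / windowSeconds, mean, min, max, stdDev);
    }

    public static void main(String[] args) {
        Stats s = summarize(new long[] {10, 20, 30, 40}, 10.0);
        System.out.println("TPS=" + s.tps + " Mean=" + s.mean + " Min=" + s.min
                + " Max=" + s.max + " StdDev=" + s.stdDev);
    }
}
```

In this picture, READ.success, WRITE.success, READ.fail and WRITE.fail would each feed their own set of durations into such a summary once per window.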