Tuesday, May 27, 2014

Connecting to Cassandra from Java

In my post Hello Cassandra, I looked at downloading the Cassandra NoSQL database and using cqlsh to connect to a Cassandra database. In this post, I look at the basics of connecting to a Cassandra database from a Java client.

Although there are several frameworks available for accessing the Cassandra database from Java, I will use the DataStax Java Client JAR in this post. The DataStax Java Driver for Apache Cassandra is available on GitHub. The datastax/java-driver GitHub project page states that it is a "Java client driver for Apache Cassandra" that "works exclusively with the Cassandra Query Language version 3 (CQL3)" and is "licensed under the Apache License, Version 2.0."

The Java Driver 2.0 for Apache Cassandra page provides a high-level overview and architectural details about the driver. Its Writing Your First Client section provides code listings and explanations regarding connecting to Cassandra with the Java driver and executing CQL statements from Java code. The code listings in this post are adaptations of those examples applied to my example cases.

The Cassandra Java Driver has several dependencies. The Java Driver 2.0 for Apache Cassandra documentation includes a page called Setting up your Java development environment that outlines the Java Driver 2.0's dependencies: cassandra-driver-core-2.0.1.jar (datastax/java-driver 2.0), netty-3.9.0-Final.jar (netty direct), guava-16.0.1.jar (Guava 16 direct), metrics-core-3.0.2.jar (Metrics Core), and slf4j-api-1.7.5.jar (slf4j direct). I also found that I needed to place LZ4Factory.java and snappy-java on the classpath.

The next code listing is of a simple class called CassandraConnector.

CassandraConnector.java
package com.marxmart.persistence;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;
import com.datastax.driver.core.Session;

import static java.lang.System.out;

/**
 * Class used for connecting to Cassandra database.
 */
public class CassandraConnector
{
   /** Cassandra Cluster. */
   private Cluster cluster;

   /** Cassandra Session. */
   private Session session;

   /**
    * Connect to Cassandra Cluster specified by provided node IP
    * address and port number.
    *
    * @param node Cluster node IP address.
    * @param port Port of cluster host.
    */
   public void connect(final String node, final int port)
   {
      this.cluster = Cluster.builder().addContactPoint(node).withPort(port).build();
      final Metadata metadata = cluster.getMetadata();
      out.printf("Connected to cluster: %s\n", metadata.getClusterName());
      for (final Host host : metadata.getAllHosts())
      {
         out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
            host.getDatacenter(), host.getAddress(), host.getRack());
      }
      session = cluster.connect();
   }

   /**
    * Provide my Session.
    *
    * @return My session.
    */
   public Session getSession()
   {
      return this.session;
   }

   /** Close cluster. */
   public void close()
   {
      cluster.close();
   }
}

The above connecting class could be invoked as shown in the next code listing.

Code Using CassandraConnector
/**
 * Main function for demonstrating connecting to Cassandra with host and port.
 *
 * @param args Command-line arguments; first argument, if provided, is the
 *    host and second argument, if provided, is the port.
 */
public static void main(final String[] args)
{
   final CassandraConnector client = new CassandraConnector();
   final String ipAddress = args.length > 0 ? args[0] : "localhost";
   final int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
   out.println("Connecting to IP Address " + ipAddress + ":" + port + "...");
   client.connect(ipAddress, port);
   client.close();
}

The example code in that last code listing specified default node and port of localhost and port 9042. This port number is specified in the cassandra.yaml file located in the apache-cassandra/conf directory. The Cassandra 1.2 documentation has a page on The cassandra.yaml configuration file which describes the cassandra.yaml file as "the main configuration file for Cassandra." Incidentally, another important configuration file in that same directory is cassandra-env.sh, which defines numerous JVM options for the Java-based Cassandra database.

For the examples in this post, I will be using a MOVIES table created with the following Cassandra Query Language (CQL):

createMovie.cql
CREATE TABLE movies
(
   title varchar,
   year int,
   description varchar,
   mmpa_rating varchar,
   dustin_rating varchar,
   PRIMARY KEY (title, year)
);

The above file can be executed within cqlsh with the command source 'C:\cassandra\cql\examples\createMovie.cql' (assuming that the file is placed in the specified directory, of course) and this is demonstrated in the next screen snapshot.

One thing worth highlighting here is that the columns that were created as varchar datatypes are described as text datatypes by the cqlsh describe command. Although I created this table directly via cqlsh, I also could have created the table in Java as shown in the next code listing and associated screen snapshot that follows the code listing.

Creating Cassandra Table with Java Driver
final String createMovieCql =
     "CREATE TABLE movies_keyspace.movies (title varchar, year int, description varchar, "
   + "mmpa_rating varchar, dustin_rating varchar, PRIMARY KEY (title, year))";
client.getSession().execute(createMovieCql);

The above code accesses an instance variable client. The class with this instance variable that it might exist in is shown next.

Shell of MoviePersistence.java
package dustin.examples.cassandra;

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;

import java.util.Optional;

import static java.lang.System.out;

/**
 * Handles movie persistence access.
 */
public class MoviePersistence
{
   private final CassandraConnector client = new CassandraConnector();

   public MoviePersistence(final String newHost, final int newPort)
   {
      out.println("Connecting to IP Address " + newHost + ":" + newPort + "...");
      client.connect(newHost, newPort);
   }

   /**
    * Close my underlying Cassandra connection.
    */
   private void close()
   {
      client.close();
   }
}

With the MOVIES table created as shown above (either by cqlsh or with Java client code), the next steps are to manipulate data related to this table. The next code listing shows a method that could be used to write new rows to the MOVIES table.

/**
 * Persist provided movie information.
 *
 * @param title Title of movie to be persisted.
 * @param year Year of movie to be persisted.
 * @param description Description of movie to be persisted.
 * @param mmpaRating MMPA rating.
 * @param dustinRating Dustin's rating.
 */
public void persistMovie(
   final String title, final int year, final String description,
   final String mmpaRating, final String dustinRating)
{
   client.getSession().execute(
      "INSERT INTO movies_keyspace.movies (title, year, description, mmpa_rating, dustin_rating) VALUES (?, ?, ?, ?, ?)",
      title, year, description, mmpaRating, dustinRating);
}

With the data inserted into the MOVIES table, we need to be able to query it. The next code listing shows one potential implementation for querying a movie by title and year.

Querying with Cassandra Java Driver
/**
 * Returns movie matching provided title and year.
 *
 * @param title Title of desired movie.
 * @param year Year of desired movie.
 * @return Desired movie if match is found; Optional.empty() if no match is found.
 */
public Optional<Movie> queryMovieByTitleAndYear(final String title, final int year)
{
   final ResultSet movieResults = client.getSession().execute(
      "SELECT * from movies_keyspace.movies WHERE title = ? AND year = ?", title, year);
   final Row movieRow = movieResults.one();
   final Optional<Movie> movie =
        movieRow != null
      ? Optional.of(new Movie(
           movieRow.getString("title"),
           movieRow.getInt("year"),
           movieRow.getString("description"),
           movieRow.getString("mmpa_rating"),
           movieRow.getString("dustin_rating")))
      : Optional.empty();
   return movie;
}

If we need to delete data already stored in the Cassandra database, this is easily accomplished as shown in the next code listing.

Deleting with Cassandra Java Driver
/**
 * Deletes the movie with the provided title and release year.
 *
 * @param title Title of movie to be deleted.
 * @param year Year of release of movie to be deleted.
 */
public void deleteMovieWithTitleAndYear(final String title, final int year)
{
   final String deleteString = "DELETE FROM movies_keyspace.movies WHERE title = ? and year = ?";
   client.getSession().execute(deleteString, title, year);
}

As the examples in this blog post have shown, it's easy to access Cassandra from Java applications using the Java Driver. It is worth noting that Cassandra is written in Java. The advantage of this for Java developers is that many of Cassandra's configuration values are JVM options that Java developers are already familiar with. The cassandra-env.sh file in the Cassandra conf directory allows one to specify standard JVM options used by Cassandra (such as heap sizing parameters -Xms, -Xmx, and -Xmn),HotSpot-specific JVM options (such as -XX:-HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath, garbage collection tuning options, and garbage collection logging options), enabling assertions (-ea), and exposing Cassandra for remote JMX management.

Speaking of Cassandra and JMX, Cassandra can be monitored via JMX as discussed in the "Monitoring using JConsole" section of Monitoring a Cassandra cluster. The book excerpt The Basics of Monitoring Cassandra also discusses using JMX to monitor Cassandra. Because Java developers are more likely to be familiar with JMX clients such as JConsole and VisualVM, this is an intuitive approach to monitoring Cassandra for Java developers.

Another advantage of Cassandra's Java roots is that Java classes used by Cassandra can be extended and Cassandra can be customized via Java. For example, custom data types can be implemented by extending the AbstractType class.

Conclusion

The Cassandra Java Driver makes it easy to access Cassandra from Java applications. Cassandra also features significant Java-based configuration and monitoring and can even be customized with Java.

3 comments:

Chamila Wijayaratna said...

Hi, Thanks for sharing this valuable information. I'm trying to add password authentication to a cassandra database. I changed authenticator to PasswordAuthenticator in cassandra.yaml. How can I give this authentication detail when connecting to database via Java?

luisagcenteno said...

Just in case you still need it, here it is

this.cluster = Cluster.builder().addContactPoint(node).withPort(port).withCredentials("user", "password").build();

Usama Siddiqui said...

It's always hard with Java to connect to Cassandra. I believe PHP is more suitable.