gregw

Monday Feb 19, 2007

Clustering cometd with Terracotta.

I recently attended a Torino JUG meeting and saw Jonas Bonér speak about transparent clustering with Terracotta. It was an interesting talk and went a long way to alleviating the concerns I have with the star architecture of terracotta. Thus I wanted to try out a terracotta cluster, but I thought that simply clustering HTTP session in jetty was a bit obvious, easy and above all boring. So instead I used terracotta to cluster the jetty implementation of cometd.

Terracotta

Terracotta is a heap-level replication mechanism that uses byte-code manipulation to cluster fine grain data changes and synchronization between nodes via a central server (which can itself be replicated to avoid a single point of failure). The promise of terracotta is that you can use an XML file to identify the object roots that you want clustered, and terracotta will byte code instrument those classes so that they are replicated on synchronization memory barriers to the terracotta server. Then any other node in the cluster can access those objects and they are lazily downloaded to nodes from the server as they are needed. Normal synchronization mechanisms are used to prevent multiple nodes from simultaneously updating the same object. The intent is that there is no API and that normal well written java code will just work transparently in a cluster.

Terracotta has recently been released as open source

Cometd

Cometd is an Ajax message bus for distributing messages over Ajax push transports to clients. It is being developed by the Dojo foundation and already has many client and server implementations. The paradigm of cometd (or more accurately the bayeux protocol which is used by cometd) is publish/subscribe to named channels. It is ideal for collaborative and interactive web applications where clients are asynchronously notified of server events or messages from other clients (think chat, auctions, price changes, etc.)

Clustered Cometd

The cometd servlet does not use the HTTP session mechanism, so clustering cometd is unrelated to clustering sessions. Instead the features of terracotta are used to directly cluster the data and events used by the cometd servlet. In order to cluster cometd, there are two things that need to be distribututed: 1) The subscription metadata and 2) the message publishing events.

The Bayeux class is the root Object and is placed in the ServletContext by the CometdServlet. It contains the Channels, Clients and subscription relationships between them, thus it was the perfect class to distribute subscription meta data to all nodes. Clustering this class was pretty easy, but not exactly transparent as some rewriting was needed:

  • Terracotta cannot distribute some classes and others are inefficient to cluster. So the following were made transient: Timer; reference to the servlet context; a Random number generator; and a data cache instance. Transient objects are not clustered by terracotta, so they must be manually instantiated on each node. This can be done with terracotta scripts, but in this case I was able to use the servlet life cycle methods to call init on the Bayeux object in all nodes. So on a given node, the non-transient data is set by replication and the transient data is set locally
  • Because the Bayeux object can be created on another node in the cluster, the CometdServlet had to be modified to detect if it should create a new instance or locally initialize the an existing instance of the Bayeux object. This resulted in the slightly strange code:
    public class CometdServlet extends HttpServlet
    {
      private Bayeux _bayeux;
    
      public void init() throws ServletException
      {
        synchronized (CometdServlet.class)
        {
          if (_bayeux==null)
          {
            _bayeux=new ContinuationBayeux();
          }
          
          synchronized(_bayeux)
          {
            _bayeux.initialize(getServletContext());
          }
        }
      }
    }
    
    On first glance, you would think that the if (_bayeux==null) clause would always be true. But with Terracotta, that is not the case as the _bayeux field is identified as a distributed root an may be initialized by another node (or even a previous run of the current node) and will thus be non-null when the current JVM starts.
  • Because terracotta distributes data and not events, the event of delivering a messages to a Client must clustered with a different mechanism. The Client.deliver(Map message) method will add a message to a the distributed queue, but it will not notify the waiting Ajax push mechanism unless the Client is connected to the same node on which the message was published. So a resume() has been added which performs the notification and invocation of that method is distributed by terracotta.
  • Terracotta is quiet insistent that all access to distributed data is synchronized. So I had to add a few synchronize blocks to the initialization code.

With those changes, the bayeux object can be clustered by running the jetty server with the terracotta JVM and giving it a simple xml configuration file:

<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
  <application>
    <dso>
      <roots>
        <root>
          <field-name>
            org.mortbay.cometd.CometdServlet._bayeux
          </field-name>
        </root>
      </roots>      
      <distributed-methods>
        <method-expression>
          void org.mortbay.cometd.ContinuationClient.resume()
        </method-expression>
      </distributed-methods>
      <locks>
        <autolock>
          <method-expression>
            * org.mortbay.cometd.*.*(..)
          </method-expression>
          <lock-level>
            write
          </lock-level>
        </autolock>
      </locks>
      <instrumented-classes>
        <include>
          <class-expression>org.mortbay.cometd..*</class-expression>
          <honor-transient>true</honor-transient>
        </include>
        <include>
          <class-expression>
            org.mortbay.util.ajax..*
          </class-expression>
          <honor-transient>
            true
          </honor-transient>
        </include>
      </instrumented-classes>
    </dso>
  </application>
</tc:tc-config>

This configuration: identifies the bayeux object as a root to be distributed; specifies the classes to be instrumented for distribution; and identifies the resume() method as a distributed method. (I'm not sure what the lock clause is for, I just copied that and have not read the manual yet. It may not be required?).

This configuration is now running the cometd demos on a 2 node cluster (without a load balancer ): node1 and node2. Point a browser at each node and you can chat or drag magnets across the cluster.

Event Latency

Unfortunately, while the above approach has managed to cluster cometd with only minor code modifications, the latency between the nodes is not exactly crisp. The problem is that the distributed method mechanism was not really designed for prime time usage and thus is not optimized. Instead of the resume call being distributed with the distributed message data, the method call is broadcast to all nodes after the message data has been committed to the terracotta server. Each node then runs the method, which synchronizes with the terracotta server and downloads the updated data. So there are several more transactions with the central server than need be.

More importantly, because the distributed method mechanism sends the resume() to ALL nodes, the Client object will be lazily instantiated in ALL nodes and each node will have a real copy of all Clients and all messages. This defeats the scalability of the cluster.

I raised this issue with Jonas and the folks at terracotta and they are indeed thinking of better ways to distribute events. Today it would be possible to use synchronized queues to deliver messages to specific nodes, but that would require node-aware code to be written, which kind of invalidates the transparent clustering promise of terracotta. So other mechanisms are being investigated and this issue has been discussed in a jira . My suggestion is for an onChange callback method that would allow an object to be notified if it has been updated and this is being considered.

Conclusion

Terracotta was indeed simple to use and mostly transparent. The code changes that I needed to make could mostly be justified on the grounds of good coding practises and have not spoilt the code for stand-alone usage. For data centric applications, terracotta is indeed a non-intrusive way to cluster existing applications. However for event based code such as cometd, more work is required before events can be efficiently distributed. Only then will we be able to stress a test cluster and see how the star architecture performs.

Comments:

Good article. Would be interesting to try your stuff off of trunk. We redid dmi and it is a bunch faster.

Posted by steve on February 20, 2007 at 01:46 AM EST #

Steve,

I will try to find time to retry with terracotta trunk.  But even if the latency is better, the ALL node issue will be a killer for clusters larger than 2 or 3.

Posted by Greg Wilkins on February 20, 2007 at 03:26 AM EST #

Great article - good to see great use cases like this popping up!

Just thought I would note that if the bayeux object is actually declared as a root the null check is not strictly necessary - terracotta will already ignore the assignment.  The only reason you would need to do that is if there are side effects of the construction of the object.

It may not be obvious at first, but what happens is that Terracotta checks a root field and if it has already been created in the cluster once, then the root object is faulted in from the server and assigned instead of the newly created object.

Posted by Taylor (@terracotta) on February 20, 2007 at 10:06 AM EST #

If you hop on irc I can take you through experimenting with dmi that doesn't apply if the receiver isn't local. It should be pretty easy. We can even go furthor at some point and make it so that the server doesn't even send to nodes that don't locally have the reciever. I like to have these kinds of things felt out a bit in real use cases before we introduce them into the product to avoid becoming a big ball of features :-).

Posted by Steve on February 20, 2007 at 01:56 PM EST #

We haven't gotten to the on_change callback yet but we did just add the mode where DMI is only sent nodes that the object the dmi is called on is resident in.

Posted by Steve on March 03, 2007 at 06:53 PM EST #

thank you admin

Posted by chat on September 08, 2009 at 01:56 AM EST #

Post a Comment:
  • HTML Syntax: Allowed

Webtide

Calendar

Tags

Search

Links

Navigation