Taylor Crown has written a short paper, Combining the Servlet API and NIO, which has been briefly discussed on TheServerSide.
NIO Servlets have often been discussed as the holy grail of Java web application performance. The promise of efficient buffers and reduced thread loads is very attractive for providing scalable 100% Java web servers. Taylor writes about a mock-up NIO server that he implemented, which shows some of this promise.
Taylor's results were not obtained with a real Servlet container running realistic loads. But his results look promising, and his approach has inspired me to try to apply it to the Jetty Servlet container.
The fundamental problem with using NIO with servlets is how to combine the non-blocking features of NIO with the blocking streams used by servlets. I have tried several times before to introduce a SocketChannelListener to Jetty, which only used non-blocking NIO semantics to manage idle connections. Connections with active requests were converted to blocking mode, assigned a thread and handled by the servlet container normally. Unfortunately, the cost of manipulating select sets and changing socket modes was vastly greater than any savings. So while this listener did go into production in a few sites, there was no significant gain in scalability and an actual loss in max throughput.
Taylor has tried a different approach, where a producer/consumer model is used to link NIO to servlets via piped streams. A single thread is responsible for reading all incoming packets and placing them in the non-blocking pipes. A pool of worker threads takes jobs from a queue of connections with input and does the actual request handling. I have applied this approach to Jetty as follows:
- The PipedInputStream used by Taylor requires all data read to be copied into byte arrays. My natural loathing of data copies led me to write a ByteBufferInputStream, which allows the NIO direct buffers to be used as the InputStream buffers and then recycled for later use.
- Taylor's mock server uses direct NIO writes to copy data from a file to the response. While this is a great way to send static content, it is not realistic for a servlet container, which must treat all content as dynamic. Thus I wrote SocketChannelOutputStream to map a blocking OutputStream to a non-blocking SocketChannel. It works on the assumption that a write to an NIO channel will rarely return 0 bytes written. I have not tested this assumption well.
- There is no job queue in the Jetty implementation; instead, requests are delegated directly to the current Jetty thread pool. The effect of this change is to reduce the thread savings. A thread is required for every simultaneous request, which is better than a thread per connection, but not as trim as Taylor's minimal set of worker threads. In effect, a medium-sized thread pool is being used as a fixed-size job queue.
- Taylor's mock server only handled simple requests for static content, which may be handled with a simple 304 response. Thus no request contained content of any significant size, and not all responses did either. This is not a good test of the movement of real content that most web applications must do. The Jetty test setup runs against a more realistic mix of static and dynamic content, as well as a reasonable mix of POST requests with content.
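The two stream adapters described in the list above can be sketched roughly as follows. This is a minimal illustration of the idea in modern Java, not the code checked into Jetty: the class names `ByteBufferInputStreamSketch` and `ChannelOutputStreamSketch`, and the methods `offer`, `endOfInput` and `recycle`, are hypothetical. The output side assumes that a zero-byte write is rare and simply yields when it happens; a production version would more likely wait on a Selector for write readiness.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch: a blocking InputStream fed with NIO buffers by a producer thread.
class ByteBufferInputStreamSketch extends InputStream {
    private final Queue<ByteBuffer> filled = new ArrayDeque<>();   // buffers holding unread data
    private final Queue<ByteBuffer> recycled = new ArrayDeque<>(); // drained buffers, ready for reuse
    private boolean eof;

    // Producer side: hand over a buffer that has already been flipped for reading.
    synchronized void offer(ByteBuffer buffer) { filled.add(buffer); notifyAll(); }

    // Producer side: signal that no more data will arrive.
    synchronized void endOfInput() { eof = true; notifyAll(); }

    // Producer side: take back a drained buffer, or null if none is free yet.
    synchronized ByteBuffer recycle() { return recycled.poll(); }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) < 0 ? -1 : one[0] & 0xff;
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (true) {
            ByteBuffer head = filled.peek();
            if (head != null && head.hasRemaining()) {
                int n = Math.min(len, head.remaining());
                head.get(b, off, n);   // the only copy: NIO buffer to the caller's array
                return n;
            }
            if (head != null) {        // drained: clear it and queue it for reuse
                filled.remove();
                head.clear();
                recycled.add(head);
            } else if (eof) {
                return -1;
            } else {
                try {
                    wait();            // block the servlet thread until the producer offers data
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new InterruptedIOException();
                }
            }
        }
    }
}

// Hypothetical sketch: a blocking OutputStream over a channel that may be non-blocking.
class ChannelOutputStreamSketch extends OutputStream {
    private final WritableByteChannel channel;

    ChannelOutputStreamSketch(WritableByteChannel channel) { this.channel = channel; }

    @Override
    public void write(int b) throws IOException { write(new byte[] {(byte) b}, 0, 1); }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(b, off, len);
        while (buf.hasRemaining()) {
            if (channel.write(buf) == 0) {
                Thread.yield();        // assumed-rare case: nothing accepted, so retry shortly
            }
        }
    }
}
```

If zero-byte writes turn out to be common under load, the yield loop above becomes a busy spin, which is exactly why the assumption needs testing.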
This code has been written against Jetty 5.0 and is currently checked into Jetty CVS HEAD in the org.mortbay.http.nio package. So far I have not had time to really optimise or analyse the results, but early indications are that this is no silver bullet.
The initial effect of using the NIO listener is that the latency of the server under low load has doubled, and this latency gets worse with load. The maximum throughput of the server has been reduced by about 10%, but it is maintained to much higher levels of load. In fact, with my current test setup I was unable to produce enough load to significantly reduce the throughput. So, technically at least, this has delivered on the scalability promise?
The producer/consumer model trades some low- and mid-level performance for grace under extreme load. But you have to ask yourself, is this a reasonable trade? Do I want to offer crappy service to 10000 users, or reasonable service to 5000? To answer this, you have to consider the psychology of the users of the system.
Load generators do not have any psychology and are happy to wait out the increasing latency to the limits of the timeouts, often 30 seconds or more. But real users are not so well behaved and often have patience thresholds set well below the timeouts. Unfortunately a common user response to a slowly displaying web page is to hit the retry button, or worse still the shift retry! Having your server handle 1000 requests per second may not be such a great thing if 50% of those requests are retries from upset users.
I suspect that the producer/consumer model may be costing real quality of service in return for good technical numbers. Consider the logical extreme of the job queue within Taylor's mock implementation. If sustained load is offered in excess of the level that the workers can handle, then that queue will simply grow and grow. The workers will still be operating at near their optimal throughput, but the latency of all requests served will increase until timeouts start to expire. Throughput is maintained, but well beyond the point of offering a reasonable quality of service.
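A back-of-the-envelope calculation shows how quickly this happens. The sketch below is a simple fluid approximation, not a measurement: it assumes the queue starts empty, load arrives at a constant rate, and the workers clear requests at a constant rate. The class name, method names and all figures are hypothetical.

```java
// Hypothetical sketch: fluid model of an unbounded job queue under sustained overload.
class OverloadSketch {
    // Seconds a request arriving at time t waits in the queue, given an offered
    // load and a worker capacity in requests per second (queue empty at t = 0).
    static double queueingDelay(double offered, double capacity, double t) {
        double backlog = Math.max(0.0, (offered - capacity) * t); // requests already waiting
        return backlog / capacity;                                // time needed to clear them
    }

    // Seconds of sustained overload before the queueing delay reaches the timeout.
    static double secondsUntilTimeout(double offered, double capacity, double timeout) {
        if (offered <= capacity) return Double.POSITIVE_INFINITY; // queue never builds up
        return timeout * capacity / (offered - capacity);
    }
}
```

For example, with workers that can clear 1000 requests per second and an offered load of 1200 per second, the backlog grows by 200 requests every second. `secondsUntilTimeout(1200, 1000, 30)` gives 150, so after two and a half minutes every new request waits the full 30 second timeout, even though the workers are still delivering 1000 responses per second.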
Even with a limited job queue (as in the Jetty implementation), the simple producer/consumer model suffers from the inability to target resources to where they are best used. The single producer thread gives equal effort towards handling new requests as it does to receiving packets for requests that have already started processing. On a loaded server, it is better to use your resources to clear existing requests so that their resources may be freed for other requests. On a multi-CPU machine, allowing only a single CPU to perform IO reads will be a significant restriction, as other CPUs may be delayed from doing useful work on real requests while one CPU is reading more load onto the system.
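For reference, the behaviour of a limited job queue can be sketched with a bounded executor. This uses `java.util.concurrent.ThreadPoolExecutor` as a stand-in and is not Jetty's own thread pool; the class and method names are hypothetical. Once every thread is busy and the queue is full, further requests are refused immediately rather than being left to wait indefinitely.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed pool whose backlog can never exceed queueSize jobs.
class BoundedPoolSketch {
    static ThreadPoolExecutor boundedPool(int threads, int queueSize) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                new ThreadPoolExecutor.AbortPolicy()); // refuse work instead of queueing without limit
    }

    // Offers `jobs` requests that all stall until released; returns how many were refused.
    static int countRejected(int threads, int queueSize, int jobs) {
        ThreadPoolExecutor pool = boundedPool(threads, queueSize);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < jobs; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await(); // simulate a request that is slow to complete
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++;              // pool and queue are both full
            }
        }
        release.countDown();             // let the stalled requests finish
        pool.shutdown();
        return rejected;
    }
}
```

With 2 threads and a queue of 3, offering 10 stalled requests leaves 5 refused. Bounding the backlog caps the latency, but it does nothing about which work the single producer thread chooses to read next.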
Taylor's producer/consumer approach is significantly better than my preceding attempts, but it has not produced an easy win when applied to a real Servlet container. I am also concerned that the analysis has focused too much on throughput without due consideration for latency and QoS. This is not to say that this is a dead end, just that more thought and effort are required if producer/consumer NIO is to match the wonderful job that modern JVMs do with threading.
I plan to leave the SocketChannelListener in the development branch of Jetty for some time to allow further experimentation and analysis. However, I fear that the true benefits of NIO will not be available to Java web applications until we look at an API other than Servlets for our content generation.
Posted at 07:46AM Feb 10, 2004 by gregw in General