Next Generation Jetty and Servlets
Jetty 5.0.0 is out the door and the 2.4 servlet spec is implemented. So what's next for Jetty and what's next for the servlet API? It's been a long journey for Jetty from its birth in late 1995 to the 5.0.0 release. When the core of Jetty was written, there was no J2ME, J2EE or even J2SE, no servlet API and no non-blocking IO; multiple-CPU machines were rare and no JVM used them. The situation is much the same for the servlet API, which has grown from a simple protocol handler into a core component architecture for enterprise solutions.
While these changing environments and requirements have mostly been handled well, the results are not perfect: I have previously blogged about the problems with the servlet API, and Jetty is no longer best of breed when it comes to raw speed or features.
Thus I believe the mid-term future for Jetty and the servlet API should involve a bit more revolution than evolution. For this purpose the JettyExperimental (JE) branch has been created and is being used to test ideas to greatly improve both the raw HTTP performance and the application API. This blog introduces JE and some of my ideas for how Jetty and the servlet API could change.
Push Me Pull You
At the root of many problems with the servlet API is that it is a pull-push API, where the servlet
is given control and pulls headers, parameters and other content from the request object before pushing
the response code, headers and content at the response object. This style of API, while very convenient for
println-style dynamic content generation, has many undesirable consequences:
- The request headers must be buffered in complex/expensive hash structures so that the application can access them in arbitrary order. One could ask why application code should be handling HTTP headers anyway...
- The application code contains the IO loops to read and write content. These IO loops are written assuming the blocking IO API.
- The pull-push API is based on stream, reader and writer abstractions, which makes it impossible for servlet application code to use efficient IO mechanisms such as gather-writes or memory-mapped file buffers for static content.
- The response headers must be buffered in complex/expensive hash structures so that applications can set and reset them in arbitrary order. One could ask why application code should be writing HTTP headers anyway...
- The application code needs to be aware of HTTP codes and headers. The API itself provides no support for separating the concerns of content generation and content transport.
From a container implementer's point of view, it would be far more efficient for the servlet API to be push-pull, where the container pushes headers, parameters and content into the API as they are parsed from a request and then pulls headers and content from the application as they are needed to construct the response. This would remove the need for application IO, additional buffering and arbitrary ordering, as well as the need to cope with application developers who don't read the HTTP RFCs. Unfortunately a full push-pull API would also push an event-driven model onto the application, which is not an easy model to deal with, nor suitable for the simple println style of dynamic content generation used by most "hello world" inspired servlets.
The challenge of Jetty and servlet API reform is to allow the container to be written in the efficient push-pull style, but to retain the fundamentals of pull-push in the application development model we have come to know and live with. The way to do this is to change the semantic level of what is being pushed and pulled, so that the container is written to push-pull HTTP headers and bytes of data, but the application is written to pull-push content in a non-IO style.
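The container side of such a design can be sketched in Java. This is a minimal illustration, not Jetty code: the `HttpEvents` and `ContentSource` interfaces and the `HostEcho` handler are hypothetical names. The container pushes each parsed header into the handler, which keeps only what it needs, and then pulls the response one buffer at a time, so the container rather than the application decides when each write happens.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container-side interfaces for a push-pull HTTP engine.
interface HttpEvents {
    void header(String name, String value);  // pushed as each header is parsed
    void content(ByteBuffer chunk);          // pushed as body bytes arrive
    ContentSource complete();                // request finished; returns the response source
}

interface ContentSource {
    ByteBuffer next();                       // pulled by the container; null when finished
}

final class PushPullDemo {
    // A trivial handler: keeps only the Host header and echoes it back.
    static final class HostEcho implements HttpEvents {
        private String host = "unknown";

        public void header(String name, String value) {
            // Headers nobody asked for are simply not stored.
            if (name.equalsIgnoreCase("Host")) host = value;
        }

        public void content(ByteBuffer chunk) { /* no request body expected */ }

        public ContentSource complete() {
            ByteBuffer body = ByteBuffer.wrap(("Hello from " + host).getBytes(StandardCharsets.UTF_8));
            // Hand out the remaining bytes until the container has consumed them all.
            return () -> body.hasRemaining() ? body : null;
        }
    }

    // Simulated container loop: push parsed events in, then pull the response out.
    static String serve(List<String[]> headers, HttpEvents handler) {
        for (String[] h : headers) handler.header(h[0], h[1]);
        ContentSource source = handler.complete();
        StringBuilder out = new StringBuilder();
        for (ByteBuffer b = source.next(); b != null; b = source.next()) {
            out.append(StandardCharsets.UTF_8.decode(b));  // decode consumes the buffer
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String[]> headers = new ArrayList<>();
        headers.add(new String[] {"Host", "www.mortbay.com"});
        headers.add(new String[] {"User-Agent", "Mozilla/5.0"});
        System.out.println(serve(headers, new HostEcho()));  // Hello from www.mortbay.com
    }
}
```

In the scheme described above, an event handler like `HostEcho` would be supplied by the container or a content factory rather than written by application developers, who would keep a pull-push view at the level of content objects.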
Content IO
Except for the application/x-www-form-urlencoded
mime-type, the application must perform its own IO to read content from the request and to
write content to the response. Due to the nature of the servlet API and threading model,
this IO is written assuming blocking semantics.
Thus it is difficult to apply alternative IO methods, such as NIO.
Unfortunately the event driven nature of non-blocking IO is incompatible with the servlet threading model, so it is not possible to simply ask developers to start writing IO assuming non-blocking IO semantics or using NIO channels.
The NIO API cannot be used effectively without direct access to the low-level IO classes: the low-level API is required to efficiently write static content from a file MappedByteBuffer to a WritableByteChannel, or to combine content and HTTP headers into a single packet without copying by using a GatheringByteChannel. The true power of the NIO API cannot be abstracted into InputStreams and OutputStreams. Thus to use NIO, the servlet API must either expose these low levels (a bad idea, as NIO might not always be the latest and greatest) or take away content IO responsibilities from application developers.
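As a rough sketch of the two NIO techniques named above, the following Java uses only standard `java.nio` calls: `FileChannel.map` to obtain a `MappedByteBuffer` for a static file, and a single `GatheringByteChannel.write(ByteBuffer[])` to send the response head and body together. The class name, the hard-coded `Content-Type` and the use of a file channel as the destination (standing in for a socket channel) are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.GatheringByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class GatherWriteDemo {
    // Sends an HTTP response head plus a static file using gathering writes.
    // The file body is memory mapped rather than copied into a heap array.
    static long sendStatic(Path file, GatheringByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            MappedByteBuffer body = in.map(FileChannel.MapMode.READ_ONLY, 0, size);
            String head = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/plain\r\n"          // assumption for the demo
                    + "Content-Length: " + size + "\r\n\r\n";
            ByteBuffer header = ByteBuffer.wrap(head.getBytes(StandardCharsets.ISO_8859_1));
            ByteBuffer[] buffers = {header, body};
            long written = 0;
            // One write(ByteBuffer[]) call may be partial (e.g. on a socket), so loop.
            while (header.hasRemaining() || body.hasRemaining()) {
                written += out.write(buffers);
            }
            return written;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("static", ".txt");
        Path dst = Files.createTempFile("response", ".bin");
        Files.write(src, "hello, mapped world".getBytes(StandardCharsets.UTF_8));
        // FileChannel implements GatheringByteChannel; a server would use a SocketChannel.
        try (FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE)) {
            System.out.println("wrote " + sendStatic(src, out) + " bytes");
        }
        System.out.print(new String(Files.readAllBytes(dst), StandardCharsets.ISO_8859_1));
    }
}
```

Neither buffer passes through an `OutputStream`, which is exactly the kind of access a stream-based servlet API cannot offer.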
The answer is to take away from the application servlets the responsibility for performing
IO. This has already been done for application/x-www-form-urlencoded, so
why not let the container handle the IO for text/xml, text/html etc.?
If the responsibility for reading and writing bytes (or characters) was moved to the container,
then the application servlet code could deal with higher-level content Objects
such as org.w3c.dom.Document, java.io.File or java.util.HashMap. Such a container mechanism
would avoid the current need for many webapps to provide their own implementation of a
multipart request class or compression filter.
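The container already does this kind of work for form posts. A minimal sketch of that decoding, recast as a container-owned content factory, might look like the following; `FormContent` and its `read` method are hypothetical, and only `URLDecoder` is a real JDK API here. The application would receive the resulting `Map` and never touch the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical container-side content factory for application/x-www-form-urlencoded:
// the container owns the IO and parsing; the application only sees the Map.
final class FormContent {
    static Map<String, List<String>> read(InputStream in, String charset) throws IOException {
        String body = new String(in.readAllBytes(), charset);  // readAllBytes needs Java 9+
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) continue;                       // tolerate "a=1&&b=2" and empty bodies
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Repeated names (e.g. multi-select fields) accumulate in order.
            params.computeIfAbsent(URLDecoder.decode(name, charset), k -> new ArrayList<>())
                  .add(URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) throws IOException {
        byte[] posted = "q=jetty+nio&lang=en&lang=fr".getBytes(StandardCharsets.US_ASCII);
        Map<String, List<String>> form = read(new ByteArrayInputStream(posted), "UTF-8");
        System.out.println(form);  // {q=[jetty nio], lang=[en, fr]}
    }
}
```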
If we look at the client side of HTTP connections, the
java.net package provides the
ContentHandlerFactory mechanism so that
the details of IO and parsing content can be hidden behind a simple call to
getContent(). Adding a similar mechanism (and
a setContent() equivalent) to the servlet API would move the IO responsibility
to the container. The container could push-pull bytes from the content factories and
the application could pull-push high level objects from the same factories.
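A server-side analogue of `ContentHandlerFactory` could be as small as a registry keyed by mime type. The sketch below is hypothetical throughout (`ContentCodec`, `ContentRegistry`, `getContent` and `setContent` are illustrative names, not an existing or proposed API): `getContent` selects a codec from the request's content type, and `setContent` selects one from the class of the object the application hands back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical codec: converts between bytes and one Java type for one mime type.
interface ContentCodec {
    Class<?> type();
    Object read(InputStream in) throws IOException;
    void write(Object content, OutputStream out) throws IOException;
}

final class ContentRegistry {
    private final Map<String, ContentCodec> byMimeType = new HashMap<>();

    void register(String mimeType, ContentCodec codec) { byMimeType.put(mimeType, codec); }

    // Request side: the container picks a codec from the Content-Type and does the IO.
    Object getContent(String mimeType, InputStream in) throws IOException {
        ContentCodec codec = byMimeType.get(mimeType);
        if (codec == null) throw new IOException("no codec for " + mimeType);
        return codec.read(in);
    }

    // Response side: the container picks a codec by the object's class and
    // returns the mime type it chose (first match; HashMap order is unspecified).
    String setContent(Object content, OutputStream out) throws IOException {
        for (Map.Entry<String, ContentCodec> e : byMimeType.entrySet()) {
            if (e.getValue().type().isInstance(content)) {
                e.getValue().write(content, out);
                return e.getKey();
            }
        }
        throw new IOException("no codec for " + content.getClass());
    }

    // Example codec: text/plain <-> String, always UTF-8 in this sketch.
    static final ContentCodec TEXT = new ContentCodec() {
        public Class<?> type() { return String.class; }
        public Object read(InputStream in) throws IOException {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        public void write(Object content, OutputStream out) throws IOException {
            out.write(((String) content).getBytes(StandardCharsets.UTF_8));
        }
    };

    public static void main(String[] args) throws IOException {
        ContentRegistry registry = new ContentRegistry();
        registry.register("text/plain", TEXT);
        Object received = registry.getContent("text/plain",
                new ByteArrayInputStream("ping".getBytes(StandardCharsets.UTF_8)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String type = registry.setContent(received + "/pong", out);
        System.out.println(type + " " + out.toString("UTF-8"));  // text/plain ping/pong
    }
}
```

With one codec per Java type the lookup is unambiguous; a fuller design would also negotiate against the request's Accept header.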
Note that a content-based API does not preclude streaming of content or require that large content be held in memory. Content objects passed to and from the container could include references to content (e.g. File), content handlers (JAXP handler) or even Readers, Writers, InputStreams and OutputStreams.
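Passing a `File` reference, for instance, lets the container stream the content without ever holding it in memory. The sketch below assumes a hypothetical container routine `stream` and a blocking destination channel; `FileChannel.transferTo` is the real JDK call doing the copy, and many operating systems can service it straight from the filesystem cache.

```java
import java.io.File;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

final class FileContentDemo {
    // Hypothetical container routine: the application hands over a java.io.File,
    // and the container moves its bytes to the connection without loading them.
    static long stream(File content, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(content.toPath(), StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so keep going.
            // Assumes a blocking channel; a non-blocking one would need scheduling.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("content", ".txt");
        File dst = File.createTempFile("copy", ".txt");
        Files.write(src.toPath(), "streamed, not buffered".getBytes());
        try (FileChannel out = FileChannel.open(dst.toPath(), StandardOpenOption.WRITE)) {
            System.out.println(stream(src, out) + " bytes streamed");
        }
    }
}
```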
HTTP Headers
As well as the IO of content, the application is currently responsible for handling the
associated meta-data such as character and content encoding, modification dates and caching control.
This meta-data is mostly well specified in HTTP and MIME RFCs and could best be handled by
the container itself rather than by the application or the libraries bundled with it. For
example it would be far better for the container to handle gzip encoding of content directly
to/from its private buffers rather than for webapps to bundle their own CompressFilter.
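For gzip, container-side handling amounts to reading `Accept-Encoding`, compressing when gzip is acceptable and setting `Content-Encoding` itself. The sketch below is a simplified, hypothetical version: `ContentEncodingDemo` is an illustrative name, the container's private buffers are reduced to byte arrays, and q-value handling covers only an explicit `gzip;q=0` refusal (the `*` wildcard is ignored).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class ContentEncodingDemo {
    // True if the Accept-Encoding value lists gzip without q=0.
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) return false;
        for (String part : acceptEncoding.split(",")) {
            String[] fields = part.trim().split(";");
            if (!fields[0].trim().equalsIgnoreCase("gzip")) continue;
            for (int i = 1; i < fields.length; i++) {
                String p = fields[i].trim();
                if (p.startsWith("q=") && Double.parseDouble(p.substring(2)) == 0.0) return false;
            }
            return true;
        }
        return false;
    }

    // The container, not the webapp, decides whether to compress and adds the header.
    static byte[] encode(byte[] body, String acceptEncoding, StringBuilder headers) throws IOException {
        if (!acceptsGzip(acceptEncoding)) return body;   // identity: send as-is
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(body);
        }                                                // close() writes the gzip trailer
        headers.append("Content-Encoding: gzip\r\n");
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] page = "<html>hello hello hello hello</html>".getBytes(StandardCharsets.UTF_8);
        StringBuilder headers = new StringBuilder();
        // "gzip,deflate" is the value Firefox sends in the request shown in this post.
        byte[] sent = encode(page, "gzip,deflate", headers);
        System.out.print(headers);
        System.out.println(page.length + " -> " + sent.length + " bytes");
    }
}
```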
Without knowledge of which HTTP headers an application uses, or in what order they will be accessed, the container is forced to parse incoming requests into expensive hashtables of headers. The vast majority of applications do not deal with most of the headers in an HTTP request. For example, given the following request from Mozilla Firefox:
    GET /MB/search.png HTTP/1.1
    Host: www.mortbay.com
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040614 Firefox/0.8
    Accept: image/png,image/jpeg,image/gif;q=0.2,*/*;q=0.1
    Accept-Language: en-us,en;q=0.5
    Accept-Encoding: gzip,deflate
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
    keep-alive: 300
    Connection: keep-alive
    Referer: http://localhost:8080/MB/log/
    Cookie: JSESSIONID=1ttffb8ss1idk
    If-Modified-Since: Fri, 21 Nov 2003 16:59:29 GMT
    Cache-Control: max-age=0

an application is likely to make only indirect use of the Host and Cookie
headers, and perhaps direct use of the If-Modified-Since and Accept-Encoding
headers. Yet all these headers and values are available via the HttpServletRequest object to be pulled
by the application at any time during the request processing. Expensive hashmaps are created,
and values received as bytes either have to be stringified or have their buffers kept aside for later
lazy evaluation.
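One way a container could avoid building a hashtable of every header is to let the content layer declare which headers it needs and skip the rest while parsing. The following is a hypothetical sketch (`InterestedHeaders` is not a Jetty class) that works on an already-decoded `String` for clarity; a real parser would match header names directly in its byte buffers.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class InterestedHeaders {
    // Keeps only headers whose lower-cased names appear in `interest`; all others are skipped.
    static Map<String, String> parse(String requestHead, Set<String> interest) {
        Map<String, String> kept = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = requestHead.split("\r\n");
        for (int i = 1; i < lines.length; i++) {   // line 0 is the request line
            String line = lines[i];
            if (line.isEmpty()) break;             // blank line ends the header block
            int colon = line.indexOf(':');
            if (colon <= 0) continue;              // malformed line: ignored in this sketch
            String name = line.substring(0, colon).trim();
            if (interest.contains(name.toLowerCase(Locale.ROOT))) {
                kept.put(name, line.substring(colon + 1).trim());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String head = "GET /MB/search.png HTTP/1.1\r\n"
                + "Host: www.mortbay.com\r\n"
                + "Accept-Encoding: gzip,deflate\r\n"
                + "Referer: http://localhost:8080/MB/log/\r\n"
                + "Cookie: JSESSIONID=1ttffb8ss1idk\r\n"
                + "If-Modified-Since: Fri, 21 Nov 2003 16:59:29 GMT\r\n"
                + "\r\n";
        Set<String> interest = Set.of("host", "cookie", "if-modified-since", "accept-encoding");
        System.out.println(parse(head, interest).keySet());
        // [Accept-Encoding, Cookie, Host, If-Modified-Since]
    }
}
```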
If the application was written at a content level, then most (if not all) HTTP header
handling could be performed by the content factories. For example, if given an org.w3c.dom.Document
to write, the container could set the HTTP headers for a content type of text/xml with
an acceptable charset and encoding selected by server configuration and the request headers.
Once the headers are set, the byte content can be generated accordingly by the container,
but scheduled so that excess buffering is not required and non-blocking IO can be done.
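A sketch of the `org.w3c.dom.Document` case follows, using the standard JAXP `Transformer` to serialise the document. The `DocumentContent` class and its charset rule (UTF-8 if `Accept-Charset` is absent or allows it, otherwise ISO-8859-1, with q-values ignored) are assumptions. Also, unlike the scheduled, low-buffering design described above, this simplified version buffers the whole body to compute `Content-Length`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class DocumentContent {
    // Hypothetical charset choice from the request's Accept-Charset header.
    static String chooseCharset(String acceptCharset) {
        if (acceptCharset == null) return "UTF-8";
        String a = acceptCharset.toLowerCase(Locale.ROOT);
        return (a.contains("utf-8") || a.contains("*")) ? "UTF-8" : "ISO-8859-1";
    }

    // Container side: set Content-Type and serialise; the application never sees headers.
    static byte[] write(Document doc, String acceptCharset, Map<String, String> headers) throws Exception {
        String charset = chooseCharset(acceptCharset);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, charset);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        headers.put("Content-Type", "text/xml; charset=" + charset);
        headers.put("Content-Length", String.valueOf(out.size()));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Application code: build content only, no streams and no headers.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("greeting");
        root.setTextContent("hello");
        doc.appendChild(root);

        Map<String, String> headers = new TreeMap<>();
        byte[] body = write(doc, "ISO-8859-1,utf-8;q=0.7,*;q=0.7", headers);
        headers.forEach((k, v) -> System.out.println(k + ": " + v));
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
```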
Unfortunately, not all headers will be able to be handled directly from the content objects.
For example, If-Modified-Since headers could be handled for File content objects,
but not for an org.w3c.dom.Document. So a mechanism for the application to communicate additional
meta-data will need to be provided.
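A minimal sketch of such a mechanism, under the assumption that the application wraps its content in a hypothetical `MetaContent` object carrying an optional last-modified time. For a `File` the container can fill this in itself; for a DOM the application must supply it, or the container cannot answer a conditional request with 304 Not Modified. Date parsing uses the standard `java.time` RFC 1123 formatter.

```java
import java.io.File;
import java.io.IOException;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

// Hypothetical wrapper through which the application passes meta-data the
// container cannot derive from the content object itself.
final class MetaContent {
    final Object body;
    final Instant lastModified;  // null when unknown

    MetaContent(Object body, Instant lastModified) {
        this.body = body;
        this.lastModified = lastModified;
    }

    // A File carries its own modification time.
    static MetaContent of(File file) {
        return new MetaContent(file, Instant.ofEpochMilli(file.lastModified()));
    }

    // Container side: true means a 304 Not Modified response may be sent.
    static boolean notModified(MetaContent content, String ifModifiedSince) {
        if (ifModifiedSince == null || content.lastModified == null) return false;
        Instant since;
        try {
            since = ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
        } catch (DateTimeParseException e) {
            return false;  // an invalid date is treated as if the header were absent
        }
        // HTTP dates have one-second resolution, so compare at that granularity.
        return !content.lastModified.truncatedTo(ChronoUnit.SECONDS).isAfter(since);
    }

    public static void main(String[] args) throws IOException {
        String header = "Fri, 21 Nov 2003 16:59:29 GMT";
        File f = File.createTempFile("page", ".html");
        f.setLastModified(Instant.parse("2003-11-21T16:59:29Z").toEpochMilli());
        System.out.println(notModified(MetaContent.of(f), header));
        // A DOM with no known date: the container must send the full content.
        System.out.println(notModified(new MetaContent("<doc/>", null), header));
    }
}
```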
Summary and Status
JettyExperimental now implements most of HTTP/1.1 in a push-pull architecture that works with
either blocking IO (BIO) or NIO. When using NIO, gather writes are used to combine header and content into
a single write, and static content is served directly from mapped file buffers. An advanced
NIO scheduler avoids many of the NIO problems
inherent in a producer/consumer model.
Thus JE is ready as a platform for experimenting with the content API ideas introduced above. I plan initially to work toward a pure content-based application API, and thus to discover what cannot be easily and efficiently fitted into that model. Hopefully the result will be lots of great ideas for the next-generation servlet API and a great HTTP infrastructure for Jetty 6.
Posted at 04:51AM Sep 25, 2004 by gregw in General