Zimbra, One of the Largest Collaboration Environments
Now a division of Yahoo!
This one's purely in their words... "In summary, we chose Jetty not only because it supports Comet in a scalable manner, but also because the Continuation implementation of Comet is least disruptive to existing Servlet based technologies." -JJ Zhuang
Polar Rose
Hightide in Scandinavia!
Making a name for online image search
Polar Rose's unique technology gives greater meaning and context to digital photos by allowing them to be indexed online just like text documents. The company develops user-friendly, fun, useful, transparent applications that have evolved from computer vision research at the Universities of Lund and Malmo in Sweden. Polar Rose is an active member of the open source community.
Challenges: Polar Rose delivers a Firefox plug-in capable of scanning and passing photographic information to back-end servers. A vibrant community of users tags names on these photos and is the driving force behind identification.
In order to fuel both community growth and indexing capacities, Polar Rose required a scalable two-way dynamic communication solution. To meet its exact needs, it also needed expert advice on implementation and customization.
Solution: Polar Rose now uses the Hightide open source distribution of Jetty from Webtide on its servers. By using the Bayeux protocol for two-way communication, implemented in the Cometd project together with Java by Webtide, developers can now use an efficient and secure channel-based messaging system.
Issues such as browser connection limits and sharing are transparently handled by Cometd. Jetty scales to host the communications using its asynchronous capabilities to optimally manage system resources. Add Webtide's Developer Advice and Production Support and Polar Rose now has an economical, small footprint, high performance, open source infrastructure.
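To make "channel-based" concrete, here is a minimal in-memory sketch of publish and subscribe on named channels. It is not the Cometd or Bayeux API; the class name `ChannelHub` and the channel name `/photos/tags` are hypothetical, and a real deployment would carry these messages between browser and server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory hub; it only illustrates the channel idea.
class ChannelHub {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    // Register a listener for every message published on the named channel.
    void subscribe(String channel, Consumer<String> listener) {
        subscribers.computeIfAbsent(channel, c -> new CopyOnWriteArrayList<>()).add(listener);
    }

    // Deliver a message to the channel's listeners; returns how many received it.
    int publish(String channel, String message) {
        List<Consumer<String>> listeners = subscribers.getOrDefault(channel, List.of());
        listeners.forEach(listener -> listener.accept(message));
        return listeners.size();
    }

    public static void main(String[] args) {
        ChannelHub hub = new ChannelHub();
        List<String> received = new ArrayList<>();
        hub.subscribe("/photos/tags", received::add);
        hub.publish("/photos/tags", "photo-17 tagged");
        System.out.println(received); // prints [photo-17 tagged]
    }
}
```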
Why Polar Rose Chose Webtide: "Hightide currently provides the industry's best Cometd support, which Polar Rose uses to get real-time feedback in its Firefox extension. Hightide also integrates many standard Java technologies in a simple lean and mean package which has removed the need for the more traditional fat application servers that one would otherwise use. Because it integrates so well in the development process and tools, it has allowed us to be significantly more agile in implementing new functionality to our rather complex solution." -Stefan Arentz, System Architect, Polar Rose.
Photacular
Personalized Products from Photos
Photacular enables users to create fun, unique products from their photographs with safe, secure, 24/7 online access. Options include T-shirts, mugs, coasters, jewelry, holiday ornaments, canvas prints, and more. All are custom made.
Challenges: The company sought to launch an easy-to-use, rich web-based application for customizing photo-based items. The infrastructure had to be simple to deploy and highly scalable on commodity hardware in order to cost-effectively serve its customers with fast, reliable access to their products. Photacular needed access to the source code for possible customization and inspection. It also had to be backed by solid and easy-to-access support that would be available 24/7, just as they are.
Solution: Photacular found that combination in Jetty with Webtide Developer Advice. In a period of weeks, Photacular went from concept to deployment, rapidly clearing any questions or obstacles that were in the way. The service is extremely easy to use with dynamic graphics and interactive features. Administration is straightforward, as is deployment to hosting services. Its site performance is outstanding, and it is now open for business.
"Working with Webtide and their developer advice system meant that we basically had them as a team member on call. Photacular could focus on its photo and product expertise and we could tap Webtide anytime for their server and Ajax knowledge. I am glad I went with Jetty as my app server." -Darla Weiss, Founder, Photacular
Visit Photacular.com to see it in action.
Cinémathèque
This case study looks at Cinémathèque from PowerSource Software Pty Ltd. It is a digital interactive entertainment system that embeds Jetty as the backend server for the set top box browser.
PowerSource Software is a boutique software developer based in Sydney, Australia. The company has particular expertise in several interesting real-time areas: conventional wagering and gaming systems (totalisator systems for on and off-track betting, lotteries and wide-area keno systems), community gaming systems (trade promotions, competitions and opinion polls via mobile devices using SMS), and IPTV and video on demand (VOD).
PowerSource became active in IPTV and VOD because the company was looking for new ways to capitalise on its core expertise - high speed transaction processing. As luck would have it, the majority of the company's betting systems ran on real-time platforms supplied by Concurrent Computer Corporation, and Concurrent had started to utilise their hardware and real-time operating systems in the development of their MediaHawk video servers. At roughly the same time, the first deployable IP-based set top boxes also appeared. However, commercialisation of these IPTV-related technologies was being stymied by the absence of an affordable way of gluing these sophisticated components together into a customer-based money making enterprise. So PowerSource developed Cinémathèque.
Cinémathèque is a comprehensive solution for service providers offering interactive digital entertainment including IPTV and VOD, for wide area residential, residential multi-dwelling, and hospitality environments. It provides a tightly integrated suite of monitoring, control and support facilities that maximise the features and facilities available to subscribers while minimising the operational burden of the service provider.
In an IPTV-VOD system, a high speed two-way network connects the video servers and management system at the head-end to set top boxes in subscribers' premises. Perhaps the biggest difference between IPTV systems and traditional hybrid-fibre-coax (HFC) pay TV deployments is the speed of the network, and especially the speed of the back channel.
When a viewer selects an on-demand program the set top box sends in a play-out request which needs to be authorised before the video server will start streaming the content. At any time the viewer can stop, pause, rewind or fast-forward the video. It is important to note that the content is streamed across the network and played out in real-time. It is not stored or buffered in the set top box for later play-out - there are no disks in IP set top boxes. All subscriber interactions with the set top box - including play-out control - are transmitted across the network in real-time to the servers at the head-end.
Of course, people will only pay to watch if there is something worthwhile to watch and it's available to them at a convenient time. In this regard, digital video on demand differentiates itself from older hotel movie systems which provide a very narrow range of content, and from traditional subscription TV which offers only "near" video on demand services in which programs start at pre-designated times. The versatility of a digital video on demand system allows a hotel, or a residential IPTV service provider, to offer not only the latest Hollywood movies, but also classics, cult films, documentaries and the crème de la crème of TV. In other words, subscribers can watch what they want, when they want.
It happens that content owners, and the Hollywood studios in particular, go to great lengths to ensure that their valuable property is presented in an appropriate manner. For this reason, most content is delivered to the set top box as a 4 Mbit per second MPEG2 Transport Stream. This bit-rate provides the viewer with a near-DVD quality viewing experience on a standard television. While this generally guarantees that a movie will be seen in the best light, it also makes simultaneously delivering a large number of streams quite challenging. Clearly, there is a big difference between streaming numerous film clips at 64 or 128 Kbps over the net compared to pumping hundreds, if not thousands, of 4 Mbps streams simultaneously. This is especially true considering how easily human eyes and ears can detect jitter in the video and audio resulting from lost frames or uneven play-out. This is the realm of "big-iron" video servers like Concurrent's MediaHawks.
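The gap between the two workloads is easy to quantify. The sketch below computes the aggregate bandwidth for a given number of simultaneous streams; the figure of 1,000 streams is a hypothetical head-end load, while the bit-rates are the ones quoted above (taking 1 Mbit as 1,000 kbit).

```java
class StreamBandwidthSketch {
    // Aggregate bandwidth in Mbit/s for `streams` concurrent streams at `kbitPerStream` each.
    static double aggregateMbit(int streams, int kbitPerStream) {
        return streams * (double) kbitPerStream / 1000.0;
    }

    public static void main(String[] args) {
        // 1,000 simultaneous streams is an assumed load, not a figure from the case study.
        System.out.println(aggregateMbit(1000, 128));  // prints 128.0  (128 kbps clips)
        System.out.println(aggregateMbit(1000, 4000)); // prints 4000.0 (4 Mbps transport streams)
    }
}
```

Under these assumptions the video servers must sustain about 4 Gbit/s, roughly thirty times the load of the same number of 128 kbps clips.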
An IP set top box has three principal software components: an operating system, a highly customised web browser and a media player. Many of the better IP set top boxes run Linux - which is either booted out of non-volatile memory or over the network - together with a small footprint version of the Mozilla browser. The browser is heavily customised to cater for the aspect ratio, resolution and colour palette of a standard television. It is also adapted to make it easy to use in the "lean back" environment in which people watch television.
Experience shows that in the lean back environment of the TV room, less hand-eye coordination is required to successfully operate the remote control if "compass" keys are used for navigation instead of a track-ball or other mouse-like device that uses a floating cursor. The compass keys on the remote control let the subscriber navigate and select a hyperlink; the set top box then sends an HTTP request to Cinémathèque which returns the appropriate page in response.
Cinémathèque plays a vital role in a digital entertainment service network because it is responsible for handling all subscriber interactions. Each time a subscriber follows a hyperlink, and each time they request video play-out or select some other supplementary service, Cinémathèque must accept and validate the request, secure a transaction to disk, update the subscriber's account and other persistent data structures, and format and return a suitable response. This workload represents a unique mix of web content requests and complex customer transactions.
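The four steps named above (validate, secure the transaction, update the account, respond) can be sketched as follows. This is an illustration only, not PowerSource's code: the class name, the response strings and the prepaid-credit model are all assumptions, and the in-memory list stands in for a transaction file on disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the request steps described in the text.
class PlayOutRequestSketch {
    private final Map<String, Integer> creditCents = new HashMap<>();
    private final List<String> transactionLog = new ArrayList<>();

    PlayOutRequestSketch(Map<String, Integer> openingCredit) {
        creditCents.putAll(openingCredit);
    }

    String handle(String subscriberId, String titleId, int priceCents) {
        // 1. Accept and validate the request.
        Integer credit = creditCents.get(subscriberId);
        if (credit == null) return "DENIED unknown subscriber";
        if (credit < priceCents) return "DENIED insufficient credit";
        // 2. Secure the transaction before any state changes (a real system writes to disk).
        transactionLog.add(subscriberId + " rents " + titleId + " for " + priceCents);
        // 3. Update the subscriber's account.
        creditCents.put(subscriberId, credit - priceCents);
        // 4. Format and return a response that authorises play-out.
        return "AUTHORISED " + titleId;
    }

    int loggedTransactions() {
        return transactionLog.size();
    }

    public static void main(String[] args) {
        PlayOutRequestSketch server = new PlayOutRequestSketch(Map.of("sub-1", 500));
        System.out.println(server.handle("sub-1", "title-7", 400)); // prints AUTHORISED title-7
        System.out.println(server.handle("sub-1", "title-8", 400)); // prints DENIED insufficient credit
    }
}
```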
Since Cinémathèque essentially provides the virtual shop window for the digital entertainment service provider, it must respond quickly even under considerable load, and even when the content is being generated dynamically. The content itself also has to be thoughtfully designed to facilitate effortless navigation to the items of most interest to a subscriber. The experience has to be more like watching television than surfing the web.
Although it is widely recognised that the architecture and performance of the video servers is vital to satisfy service level expectations, the performance characteristics of the management system - the so-called middleware layer - are often overlooked. But all subscriber activity starts out as an HTTP request to Cinémathèque - only when a response from Cinémathèque includes authorisation to commence video play-out does a set top box actually communicate with a media server. This is why Cinémathèque's heritage is so important: it relies heavily on PowerSource's experience building high performance transaction processing systems.
Cinémathèque comprises three primary functional modules:
- Jetty servlet engine
- Javelin transaction processor
- Cinémathèque application core
The Jetty servlet engine provides Cinémathèque with the flexibility of a conventional web server but without the bloat, without the inevitable performance problems, and without the implicit security worries. Jetty is embedded within Cinémathèque and acts as a servlet container and dispatcher. In this role Jetty is reliable, secure and fast. Jetty invokes specialised Cinémathèque servlets in response to requests from set top boxes; these servlets interact with the Cinémathèque application core to provide the necessary services to subscribers.
Javelin is PowerSource's secure, non-stop transaction processing engine - it is written in Java and is the component on which all of Cinémathèque's other application features and facilities are based. Javelin secures all transactions to duplicated disk files - it handles all data mirroring itself rather than delegating this to the operating system, and it provides Cinémathèque with a robust and persistent data store. On a mid-range Linux server, Javelin is capable of recording in excess of 500 transactions per second while maintaining an average response time of less than 100 milliseconds. Javelin also performs an automatic restart and recovery to ensure that no data is lost as a consequence of a system outage.
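The idea of an application mirroring its own transaction files, rather than leaving that to the operating system, can be sketched with the standard `java.nio` file API. This is not Javelin's implementation; `MirroredLog` and the file names are hypothetical, and recovery after an outage is not shown.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: every record is appended to two files and forced to disk.
class MirroredLog implements AutoCloseable {
    private final FileChannel primary;
    private final FileChannel mirror;

    MirroredLog(Path primaryFile, Path mirrorFile) throws IOException {
        primary = open(primaryFile);
        mirror = open(mirrorFile);
    }

    private static FileChannel open(Path file) throws IOException {
        return FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Append one record to both copies; return only after each copy has been forced to disk.
    void append(String record) throws IOException {
        byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
        writeFully(primary, bytes);
        writeFully(mirror, bytes);
    }

    private static void writeFully(FileChannel channel, byte[] bytes) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        channel.force(true); // flush file content and metadata to the storage device
    }

    @Override
    public void close() throws IOException {
        primary.close();
        mirror.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mirror-demo");
        Path a = dir.resolve("txn-a.log");
        Path b = dir.resolve("txn-b.log");
        try (MirroredLog log = new MirroredLog(a, b)) {
            log.append("subscriber-42 rents title-7");
        }
        System.out.println(Files.readString(a).equals(Files.readString(b))); // prints true
    }
}
```

In a real deployment the two files would sit on separate physical disks so that one copy survives a device failure.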
The Cinémathèque application, too, is written entirely in Java to maximise portability, reliability and flexibility. Cinémathèque supports true video on demand, as well as near video on demand via its multicast scheduler. Since every subscriber interaction is handled by Cinémathèque the number of subscribers watching on-demand programs can be monitored in real-time. Similarly, Cinémathèque also tracks, in real-time, the number of subscribers tuned to each reticulated free-to-air, pay, or multicast TV channel. This permits a service provider to perform very accurate capacity planning as well as knowing what content sells and what doesn't.
Very little of the HTML content returned to the set top box by Cinémathèque is static. Instead, Cinémathèque creates portions of many pages dynamically according to the attributes of the viewer's subscription package, the titles and packages that they've previously purchased, titles that are currently book-marked, and the rating level of the content that the current user is permitted to see (to safeguard children from accessing inappropriate content).
All transactions, including billing transactions, are processed by Cinémathèque in real-time. Cinémathèque gathers operational, statistical and performance data continuously, and records this to its transaction files; this data is available for on-demand display on system administration workstations.
Cinémathèque is set top box independent and supports any number of different types of set top boxes simultaneously within a single deployment. Similarly, it does not rely on any set top box specific features and doesn't require any specialised application software or middleware to be present in the set top box. Adding support for other set top boxes is straightforward and entails adapting several JavaScript functions which are embedded in HTML pages returned to the set top box; this JavaScript accommodates the inevitable differences between the ways that set top box vendors invoke their media players. These are important features in Cinémathèque because they help service providers avoid set top box vendor lock-in.
A virtue of IP-based set top boxes is their uniformity - almost without exception they provide a consistent "application environment" by way of their standards compliant HTTP, HTML and JavaScript implementations. Indeed, all set top box functions, including invoking, controlling and monitoring the embedded media player are achieved with JavaScript. Although they have the capability to run a Java Virtual Machine, most IP boxes don't for two reasons: firstly, it substantially increases the memory footprint (something to be avoided in a cost sensitive consumer device), and secondly, most boxes don't have sufficient CPU resources to spare (a box with a 400 MHz clock CPU is considered fast).
Cinémathèque returns JavaScript objects to the set top box's browser in response to each HTTP request; the data embedded in these objects is then rendered using JavaScript. This mechanism allows the look-and-feel designer to expose as little or as much of the service or program-related "metadata" to subscribers as they like without the requirement to change any server-side software.
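As a rough illustration of the server side of this mechanism, the sketch below serialises a title's metadata into a JavaScript object literal that page script could then render. The field names are hypothetical and the escaping is deliberately minimal (quotes and backslashes only); it is not Cinémathèque's actual format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: metadata goes out as data, and the page's JavaScript decides what to show.
class MetadataObjectSketch {
    // Escape backslashes and double quotes so the value is a valid JavaScript string literal.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Serialise the fields, in insertion order, as a JavaScript object literal.
    static String toJsObject(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    public static void main(String[] args) {
        Map<String, String> title = new LinkedHashMap<>();
        title.put("title", "Example Movie");
        title.put("rating", "PG");
        System.out.println(toJsObject(title)); // prints {"title":"Example Movie","rating":"PG"}
    }
}
```

Because the server sends every field, a look-and-feel designer can show or hide any of them by editing only the rendering script.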
For optimum performance, Cinémathèque comes bundled with a Concurrent Computer Corporation iHawk application server. iHawks run Concurrent's RedHawk Linux operating system which is a POSIX-compliant, real-time version of the open source Linux operating system. RedHawk is based on a standard Red Hat distribution but substitutes the usual kernel with a real-time enhanced one; it provides enhancements that maximise Cinémathèque's performance.
Cinémathèque uses Java's extensive internationalisation support to make locale and language customisation straightforward. Each word and phrase that appears in the Cinémathèque administration client is maintained in a resource bundle - adding support for a new language simply requires adding the appropriate translations to the bundle. Cinémathèque currently supports English, Japanese, Korean, Simplified Chinese and Traditional Chinese and any combination of these languages can be used simultaneously within a single system on both set top boxes and the system's administration workstations.
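A small sketch of the resource bundle approach, using the JDK's `PropertyResourceBundle`. The keys and strings are hypothetical, and the "[ja]" value is a placeholder rather than a real translation. A production system would normally load bundles from per-locale files with `ResourceBundle.getBundle`, which handles the fallback itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical sketch: each label lives in a bundle, and a new language is just another bundle.
class LabelBundleSketch {
    // Build a bundle from properties-format text (inline here to keep the sketch self-contained).
    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Look the key up in the chosen language and fall back to English when it is missing.
    static String label(ResourceBundle chosen, ResourceBundle english, String key) {
        return chosen.containsKey(key) ? chosen.getString(key) : english.getString(key);
    }

    public static void main(String[] args) {
        ResourceBundle english = load("rentals=Active rentals\nhistory=Rental history\n");
        ResourceBundle japanese = load("rentals=[ja] Active rentals\n"); // placeholder text
        System.out.println(label(japanese, english, "rentals")); // prints [ja] Active rentals
        System.out.println(label(japanese, english, "history")); // prints Rental history
    }
}
```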
In residential mode, a subscriber can only access IPTV and other chargeable services after first logging in with their unique client id and password. Cinémathèque lets a subscriber assign a different password to each content rating classification to prevent children from accessing inappropriate material, thereby imposing parental control. In hospitality mode, access to services is controlled by Cinémathèque which receives guest check-in and check-out notifications from the hotel's property management system.
A subscriber can have an unlimited number of simultaneously active rentals and can switch between active rentals and initiate additional rentals at any time. Whenever video play-out is suspended, Cinémathèque automatically sets a bookmark for that rental - the subscriber can resume play-out either from the start of the program or from the bookmark. The subscriber can view their active rental list and review their complete rental history via their set top box at any time. The active rentals and rental history displays are filtered according to the rating level of the current login. Again, this prevents children from seeing references to inappropriate content.
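The rating-based filtering of rental lists amounts to comparing each title's rating with the level of the current login. A minimal sketch, assuming a hypothetical four-step rating scale and class names:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of filtering a rental list by the rating level of the current login.
class RatingFilterSketch {
    // Assumed rating scale, ordered from least to most restricted.
    enum Rating { G, PG, M, R }

    static final class Rental {
        final String title;
        final Rating rating;

        Rental(String title, Rating rating) {
            this.title = title;
            this.rating = rating;
        }
    }

    // Keep only the rentals whose rating does not exceed the level of the current login.
    static List<String> visibleTitles(List<Rental> rentals, Rating loginLevel) {
        return rentals.stream()
                .filter(r -> r.rating.compareTo(loginLevel) <= 0)
                .map(r -> r.title)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Rental> rentals = List.of(
                new Rental("Family Film", Rating.G),
                new Rental("Late Night Thriller", Rating.R));
        System.out.println(visibleTitles(rentals, Rating.PG)); // prints [Family Film]
    }
}
```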
Cinémathèque also has an integral customer loyalty program that works in conjunction with its customer profiling capabilities. The loyalty program provides for standard and VIP customers and reward points can be allocated based on spending behaviour. Accumulated reward points can be redeemed for specially created package deals and service upgrades.
Jetty was selected after PowerSource's engineers had evaluated several servlet engines.
So why did PowerSource choose Jetty?
Firstly, Jetty offered superior performance. Secondly, it was easy to embed within a larger application. In this regard, PowerSource was looking for a servlet engine that didn't "get in the way" of the rest of the larger application. Thirdly, it was particularly important that the servlet engine wasn't a resource hog. And fourthly, Cinémathèque systems are installed at customer sites and are expected to run unattended in a lights out environment - PowerSource was looking for a servlet engine that the engineers could "set and forget".
Jetty's reliability and performance counted highly in its favour because Cinémathèque essentially controls the delivery of premium subscription television services that customers are buying with their discretionary expenditure. In this situation, paying customers don't tolerate service unavailability because they've become accustomed to TV not being interrupted. If the responsiveness of the IPTV service is poor, or if it is unreliable, then customers will buy their entertainment elsewhere.
Finally, as the company's software engineers were making their minds up about Jetty, it became obvious that there was another significant aspect related to performance: namely the super-responsiveness of the team at Mortbay and the enthusiasm of the Jetty users active on the mailing lists.
PowerSource have several new products under development; our positive experience with Jetty, and our ability to rely on it, means that it will remain one of the key components in PowerSource's systems.
Screen shots and diagrams
- Dialogs available on the Cinémathèque system administration client: 1 and 2
- A conceptual IPTV and VOD system: diagram
- Major components inside Cinémathèque: application stack diagram
- Kreatel IP set top box and remote control: photograph
- Examples of HTML pages for set top boxes: index page, rental page, movie page, video on demand
- Photographs of televisions showing example HTML pages: tv1, tv2
Related links
Cinematheque: http://www.powersource.com.au/cine
RedHawk Linux: http://www.ccur.com/isd_solutions_redhawklinux.asp
MediaHawk video servers: http://www.ccur.com/vod_default.asp
Kreatel set top boxes: http://www.kreatel.se
Interview with Peter Rodgers of 1060 NetKernel™
This Jetty Case Study takes a look at an intriguing software infrastructure product called 1060 NetKernel™.
Announcing the recent release of version 2 of the product, 1060 Research describes NetKernel as "an advanced service oriented microkernel". Complex applications are produced by creating simple services and then aggregating or pipelining them together. NetKernel services interact with the external environment via pluggable transports, such as SMTP, SOAP and - importantly - HTTP.
NetKernel can be used standalone, for example in place of a J2EE application server, or alternatively embedded within a J2EE app server, or in fact embedded in any Java application.
We spoke via email with Peter Rodgers - founder and CEO of 1060 Research and one of the product's architects - to find out more about NetKernel, and how Jetty contributes to it:
- You describe NetKernel as a "REST" based microkernel. Firstly, can you explain what "REST" is?
REST is an acronym for REpresentational State Transfer. It was coined by Roy Fielding in his PhD dissertation which retrospectively presents a formalism of the Web architecture.
What it means in practice is that resources are addressed by URI and, instead of a resource being generated by hidden interactions, it is generated by the transfer of state to a service - often, though not exclusively, expressed in the URI.
Whilst this seems like a new pattern when applied in the context of the Web, it is really an old pattern that is basically one of the foundational principles of Unix. In Unix, software applications are modular services which may be configured by switches to process a resource. Higher order applications are created by orchestrating pipelines of lower-level software services - today this design pattern is frequently called "loose coupling".
The NetKernel microkernel allows software applications to be flexibly composed like a Unix system, but employs URI addressing and a URI address space abstraction to present a uniform application context.
- Now can you describe NetKernel for us?
Fundamentally NetKernel is a virtual operating system abstraction which you could describe as "Unix meets the Web". More practically, though, the microkernel provides the basis for a general purpose Application Server which in particular supports rich XML processes and services.
Software applications on NetKernel consist of fine-grained URI addressable services which may be composed into higher-order services or abstracted behind higher-level URI interfaces. Basically the Unix model of loose coupling in a URI address space.
NetKernel's URI address space is an *internal* abstraction - composite services may be exposed to the world by mapping their URI address onto a transport.
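The composition model described in this answer can be sketched in a few lines: services are registered at URIs, and a higher-order service is a pipeline of other URIs. This is not the NetKernel API; the class, the `svc:` URI scheme and the string-in, string-out service shape are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch of URI-addressed services composed like a Unix pipeline.
class UriAddressSpaceSketch {
    private final Map<String, UnaryOperator<String>> services = new HashMap<>();

    // Make a service addressable at the given URI.
    void register(String uri, UnaryOperator<String> service) {
        services.put(uri, service);
    }

    // Resolve the URI to a service and apply it to the resource.
    String request(String uri, String resource) {
        UnaryOperator<String> service = services.get(uri);
        if (service == null) {
            throw new IllegalArgumentException("No service at " + uri);
        }
        return service.apply(resource);
    }

    // Expose a higher-order service that pipes the resource through each stage in order.
    void registerPipeline(String uri, String... stageUris) {
        register(uri, resource -> {
            String result = resource;
            for (String stage : stageUris) {
                result = request(stage, result);
            }
            return result;
        });
    }

    public static void main(String[] args) {
        UriAddressSpaceSketch space = new UriAddressSpaceSketch();
        space.register("svc:trim", String::trim);
        space.register("svc:upper", String::toUpperCase);
        space.registerPipeline("svc:normalise", "svc:trim", "svc:upper");
        System.out.println(space.request("svc:normalise", "  jetty ")); // prints JETTY
    }
}
```

Callers of `svc:normalise` see a single URI and need not know it is built from two lower-level services.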
- You mention request shaping and request scheduling. Does NetKernel also support load balancing of requests or failover of services?
We perform request shaping between transports and the microkernel scheduler - this ensures that we get close to ideal throughput for any given load[1].
We can do this since the microkernel has a re-entrant asynchronous scheduler - in effect every transport-generated-request initiates the execution of an asynchronous application on NetKernel.
This is different to a free-running multi-threaded model typical in unmanaged application servers. Basically, once there are more than one or two threads per native CPU, adding more threads does not mean more processing; it just increases the native OS context switching overhead. A well managed system will linearly increase its throughput with load until all CPUs are fully occupied and then operate with constant throughput as load increases further - the NetKernel throttle ensures that the system operates in this regime.
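The effect of such a throttle can be approximated with a fixed-size thread pool from `java.util.concurrent`: however many requests arrive, only a bounded number run at once and the rest queue. This is a sketch of the general technique, not NetKernel's scheduler; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cap concurrent work at a fixed number of threads.
class ThrottleSketch {
    // Run taskCount CPU-bound tasks on at most `limit` threads; return the peak concurrency seen.
    static int peakConcurrency(int limit, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(limit);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                pending.add(pool.submit(() -> {
                    peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                    spin();
                    running.decrementAndGet();
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // tasks beyond the limit wait in the pool's queue until a thread is free
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    // Stand-in for real request processing.
    private static void spin() {
        long x = 0;
        for (int i = 0; i < 200_000; i++) {
            x += i;
        }
        if (x < 0) {
            throw new AssertionError("unreachable");
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(peakConcurrency(cpus, cpus * 10) <= cpus); // prints true
    }
}
```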
The NetKernel scheduler actually allows concurrent execution of applications on a single Java thread if necessary.
In terms of system-wide load-balancing - so far we've concentrated on the architectural fundamentals. NetKernel can be embedded within J2EE application servers. So, many Enterprise deployments steal the load-balancing infrastructure of their existing application servers!
We will very soon have native JMS support which will allow NetKernel to be used as highly-scaled general purpose message-oriented middleware.
- What about hot deployment of services? Graceful service upgrades?
As a microkernel architecture NetKernel is completely modular - it supports hot installation and updates of all services.
It also supports version enforcement on modules which means that you can concurrently execute multiple generations of the same application in isolation on the same system. Good for development and it allows legacy systems to keep running irrespective of future additions.
- Are requests asynchronous, synchronous or either?
NetKernel is an asynchronous architecture. However the microkernel will attempt to optimally reuse threads synchronously such that context switching is minimized.
However, asynchronous applications are generally not easy to build and maintain, so we provide the NetKernel Foundation API which is written so that most applications can be developed using synchronous patterns - even though under the hood they'll execute asynchronously.
At the application level, just like on Unix, applications can fork asynchronous sub-processes - they may also explicitly join a forked sub-process in order to retrieve a result or handle exceptions.
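The fork and join pattern has a close analogue in the JDK's `CompletableFuture`, shown here as a sketch. It is not the NetKernel Foundation API; the class and the word-counting task are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of forking an asynchronous sub-task and joining it for the result.
class ForkJoinSketch {
    static int wordCount(String text) {
        return text.isBlank() ? 0 : text.trim().split("\\s+").length;
    }

    static int totalWords(String first, String second) {
        // Fork: count the first document asynchronously.
        CompletableFuture<Integer> forked = CompletableFuture.supplyAsync(() -> wordCount(first));
        // Meanwhile, carry on with other work on the calling thread.
        int local = wordCount(second);
        // Join: wait for the forked result; a failure in the sub-task would surface here.
        return forked.join() + local;
    }

    public static void main(String[] args) {
        System.out.println(totalWords("a b c", "d e")); // prints 5
    }
}
```

The caller reads as straight-line, synchronous code even though part of the work runs on another thread.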
- Requests are received on transports. What type of transports does NetKernel support?
A transport on NetKernel is a little like a device-driver. It issues NetKernel URI requests based upon some application or application protocol specific event. It is pretty easy to add new transports.
Out of the box we have HTTP, SMTP, POP, IMAP, Telnet, In-tray (directory based) and SOAP 1.1/1.2.
The NetKernel administration applications and services run as Web-applications over HTTP on port 1060 - not very subtle subliminal marketing!
- Which brings us to Jetty - what role/s does Jetty play within NetKernel?
We use Jetty as the backbone of our HTTP transport module. HTTP is a very important application protocol since Web-applications are the dominant class of Enterprise application today.
It's interesting, we didn't design NetKernel as a Web-application server - we came from wanting a general resource processing model - but it turns out that it is very simple to expose NetKernel services as Web-applications over the Jetty HTTP transport.
This is quite different from a Servlet which provides a hard-boundary between Web and non-Web. With a Web-application on NetKernel the Web-boundary becomes a continuum - you can also do some cool things like aspect-oriented layering over the web-address space, but that's probably an advanced topic!
Jetty is an outstanding HTTP server.
- Why did you select Jetty?
We needed a clean, simple HTTP server. To bootstrap our system we started off writing our own - it worked, but you wouldn't have wanted to rely on it! So we looked around and discovered Jetty - it had all the features we were looking for...
- A pure HTTP server - we had no need for a Servlet engine.
- Small footprint.
- Great compliance with HTTP 1.0 and 1.1 standards.
- Highly scalable.
- Easy XML-based configurability.
- Widespread adoption.
- Extensible with a custom Request Handler chain.
- An open-source implementation that overlapped with our business model.
Jetty has proved to be an incredibly dependable infrastructure, to the point where we now just don't really think about it!
- Have you written any custom extensions to Jetty for NetKernel, and if so, what were your experiences?
The NetKernel HTTP transport implements a NetKernel ITransport interface which is managed by the microkernel. When the transport is started it fires up the Jetty container and uses an XML configuration document to declaratively configure Jetty - this includes things like request and thread limits and SSL.
Very early on we developed a custom Request Handler which is hooked into the Jetty request handler stack. This handler hooks all HTTP requests and wraps them as URI requests against the NetKernel URI address space. So, with this handler, Jetty is a thin-bridge from the HTTP application protocol to the NetKernel virtual address space.
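In outline, such a handler only has to translate an incoming HTTP path and query into a request on the internal address space and pass it on. The sketch below shows that shape without using any Jetty or NetKernel classes; the `AddressSpace` interface and the `internal:` URI scheme are assumptions made for illustration.

```java
// Hypothetical sketch of a thin bridge from HTTP requests to internal URI requests.
class HttpToUriHandlerSketch {
    // Stand-in for the internal URI address space.
    interface AddressSpace {
        String request(String uri);
    }

    // Wrap the HTTP path and query string as an internal URI (assumed "internal:" scheme).
    static String toInternalUri(String httpPath, String query) {
        String uri = "internal:" + httpPath;
        return (query == null || query.isEmpty()) ? uri : uri + "?" + query;
    }

    // Called once per HTTP request: translate it, then delegate to the address space.
    static String handle(String httpPath, String query, AddressSpace space) {
        return space.request(toInternalUri(httpPath, query));
    }

    public static void main(String[] args) {
        AddressSpace echo = uri -> "resolved " + uri;
        System.out.println(handle("/weather/today", "units=mm", echo));
        // prints resolved internal:/weather/today?units=mm
    }
}
```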
That's sort of where Jetty ends and NetKernel begins - though not quite. We've generalized the idea of the Servlet from being a static interface which tightly binds the HTTP protocol. Instead we offer a service called the HTTPBridge - this is a configurable HTTP Request Filter service which can be transparently layered over the URI address space.
The HTTPBridge pre-processes the low-level HTTPRequest. It can be dynamically configured to XML'ize URI parameters or POST data, pre-process file uploads, process HTTP headers, process Cookies etc etc. It is a general-purpose service which can slice and dice the low level HTTPRequest in many ways - including for example performing SOAP HTTP bindings.
The HTTPBridge re-issues the pre-processed request into the NetKernel URI space. Ultimately the Bridge receives a response for the request, for which it then performs any final HTTP specific processing - such as setting HTTP response codes, cookies, etc and of course serializing any generated resource into the HTTPResponse stream.
So the NetKernel HTTP transport is decomposed into clean, easily reconfigurable layers and of course the HTTPRequest/Response objects come from Jetty.
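One of the pre-processing steps mentioned above, turning URI parameters into XML, might look like the following. The element names are hypothetical and are not the HTTPBridge's actual output format; only the general idea of converting a query string into an XML document is taken from the text.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of converting URI query parameters into an XML fragment.
class ParamsToXmlSketch {
    // Escape the characters that are significant in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Turn "a=1&b=two" into <params><param name="a">1</param><param name="b">two</param></params>.
    static String toXml(String query) {
        StringBuilder xml = new StringBuilder("<params>");
        if (query != null && !query.isEmpty()) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                String rawName = eq < 0 ? pair : pair.substring(0, eq);
                String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
                // URLDecoder also converts '+' to a space, as in HTML form encoding.
                String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
                String value = URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
                xml.append("<param name=\"").append(escape(name)).append("\">")
                   .append(escape(value)).append("</param>");
            }
        }
        return xml.append("</params>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toXml("title=Big+Fish&lang=en"));
        // prints <params><param name="title">Big Fish</param><param name="lang">en</param></params>
    }
}
```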
- Are there any features you would like to see in Jetty and why? For example, you mention NIO in relation to the HTTP transport ...
Jetty is an excellent solution as it is. We have a philosophy of always trying to keep everything as simple, as lean and as minimal as possible. So we're always looking for more from less. As mentioned above we do this in the kernel by always operating with an efficiently small number of threads. We're also considering a NetKernel Micro-Edition for J2ME embedded applications.
At the moment, actually based on your empirical advice on the Jetty site, we have not used NIO. We'd be very interested to understand better if NIO would offer any advantage in terms of HTTP server footprint etc. Though obviously the performance trade-off would need to be understood.
- It looks like you guys have had fun with NetKernel - I'm referring in particular to a home monitoring system that you've put together (http://www.1060.org/blogxter/publish/4)...
Yes, though NetKernel is a serious software infrastructure, it is completely general purpose. This application was put together by Tony Butterfield as an example of something a little more entertaining - it also means he can talk about how much rain we get from anywhere in the world.
I reckon a good criterion for evaluating software is 'Is this cool?'. I put Jetty into that category straight away - our hope is that people will have the same reaction the first time they boot up NetKernel.
- What licensing options are there for NetKernel?
We have a dual license business model - basically, NetKernel is on the open-source commons; to use it we ask you to license your code under an OSI license. If you are unable to, or prefer the additional benefits of a commercial relationship, then we offer flexible commercial licensing.
- You've just released NetKernel 2.0, what's next on the horizon?
Immediately we have a 2.0.1 update due in the next few weeks. This ships some trivial patches but more importantly will provide a new JMS transport which didn't make the release cut for the 2.0 product.
Our short-term plans are to keep explaining what NetKernel is! We're finding that once people understand, they really like it - but when anything is fundamentally different it can take a while to get used to.
Next steps for NetKernel are a general Unix-like security infrastructure.
Jetty on the Mort Bay host.
This case study describes how Jetty has been used on our own sites, to show that we are "eating our own dogfood". While there is nothing revolutionary in this blog, it is sometimes good to see examples of the ordinary, and I believe it is a good example of how the simplicity and flexibility of Jetty have allowed simple things to be done simply.
The Jetty host is donated to the Jetty project by Mort Bay Consulting and InetU, and the machine is no longer of the highest spec: a 500MHz Celeron with 128MB running FreeBSD at 1061 BogoMIPS (about half the speed of my aging notebook). On this machine we run over 13 web contexts for 6 domains in a single Jetty server with a 1.4.1-b21 Sun JVM using the latest release of Jetty 5.
The Sites:
The websites run by the server are for diverse purposes and are implemented using diverse technologies:
| /jetty/* | - | The Jetty site is implemented as a servlet that wraps a look and feel around static content and is deployed as an unpacked web application. |
| /demo/* | - | The demo context is a custom context built with a collection of Jetty handlers using the Java API called from the Jetty configuration XML. |
| /servlets-examples/* | - | The jakarta servlet examples deployed and run as a packed WAR. |
| /jsp-examples/* | - | The jakarta JSP examples deployed precompiled as a packed WAR. |
| /javadoc/* | - | The jetty javadoc in a jar file deployed as a webapplication. Because there is no WEB-INF structure, the jar is served purely as static content. |
| /cgi-bin/* | - | A context configured to run the Jetty CGI servlet |
| www.mortbay.com/ | - | The Mort Bay site is a look and feel servlet wrapping static content deployed as a standard webapplication. |
| www.mortbay.com/images/holidays | - | A photo diary site, created by deploying a directory structure as a webapplication so that its static content is served. A HTAccess handler is used to secure access to some areas (nothing exciting I'm afraid). |
| www.mortbay.com/MB | - | This blog site, which uses the excellent Blojsom web application using velocity rendering and log4j |
| www.collettadicastelbianco.com/ | - | A site about the Italian borgo telematico in which I sometimes live and work. Written in dynamic JSP 2.0 with tag files and heavy use of Servlet 2.4 Filters as aspects. Responsible for me finally liking JSPs. |
| www.jsig.com/ | - | Static content web application. |
| www.safari-afrika.com/ | - | Static content web application. |
| www.ncc.com.au/ | - | Static content web application. |
Configuration:
The server is configured from a single Jetty XML file that adds every context explicitly rather than relying on automatic discovery. Doing this is good for security, and it also allows extra configuration, such as customized logging, to be added for each context.
While all domains have unique IP addresses, the site is actually configured to treat them as virtual hosts. This allows simpler configuration of a single set of listeners for all contexts. A default root context is also configured to redirect requests without a host header to an appropriate context.
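The virtual-host arrangement described above amounts to a lookup from the request's Host header to a context, with a fallback for requests that carry no host header. The following is a minimal sketch in plain Java that does not use Jetty's API; the class name `VirtualHostRouter`, its method names, and the choice to send unknown hosts to the same default are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: this class is not part of Jetty.
final class VirtualHostRouter {
    private final Map<String, String> contextByHost = new HashMap<>();
    private final String defaultContext;

    VirtualHostRouter(String defaultContext) {
        this.defaultContext = defaultContext;
    }

    // Registers a virtual host name and the context that serves it.
    void addHost(String host, String context) {
        contextByHost.put(normalize(host), context);
    }

    // Returns the context for a Host header value. A missing or empty header,
    // or (by assumption) an unknown host, falls back to the default context.
    String resolve(String hostHeader) {
        if (hostHeader == null || hostHeader.trim().isEmpty()) {
            return defaultContext;
        }
        return contextByHost.getOrDefault(normalize(hostHeader), defaultContext);
    }

    // Lower-cases the name and drops any ":port" suffix.
    // IPv6 literals are not handled in this sketch.
    private static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int colon = h.indexOf(':');
        return colon >= 0 ? h.substring(0, colon) : h;
    }
}
```

Because every domain resolves through one table, a single pair of listeners can serve all contexts, which is the simplification the paragraph above describes.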
Two listeners (http 8080 & https 8443) are configured using a shared thread pool of max 30 threads. Ipchains is used to redirect port 80 and 8443 to these listeners and the server is run as an unprivileged user.
Two authentication realms are defined, for jetty demo and jakarta servlet demo. Both use simple property files. The realm name is used to map the realm to the webapplications.
Logging:
Jetty 5 uses commons logging plus the Jetty 4 logger wrapped as a commons logger. This is configured in the jetty.xml to log to a file that is rolled over daily, and historic files are kept for 90 days. A specific logger instance is declared for the classes from the colletta web site. This logger is also mapped to the context name so that ServletContext.log calls are also directed to it by Jetty and all the log information generated by the app is in one file.
NCSA request logs are defined for all the main contexts and a catch-all request log is defined for those without. The webalizer utility is used to generate regular reports of our loads on each context.
Conclusion:
The whole thing is kept running using the Bernstein daemontools supervise program which calls "java -jar start.jar" with no special parameters.
So really nothing unusual to see here, just business as usual, so move on...
A Shared Jetty Server in a College Environment
by James Robinson
April 2004
When I was systems administrator for the University of North Carolina at Charlotte department of Computer Science, I saw the need to establish an environment for our students to experiment with servlets. The goal was to provide a centralized system for our students to deploy their servlets / web-applications on without having to shoulder the additional burden of administrating their own web / servlet containers. At the time, the department of computer science was a partner in a heavily centralized workstation-based computing system modeled after MIT's Project Athena, using AFS as an enterprise filesystem. Having such a filesystem solved the problem of getting the student's code + web-application data to the servlet container -- just save it in a readable portion of the student's AFS home directory, and the servlet server can pick it up and go.
I developed a workable solution with Jetty version 3. I found the internal architecture of Jetty to follow the 'as simple as things should be, but no simpler' rule, allowing me, a time-constrained sys admin with Java skills, to squeeze off the project in the time allowed so that courses could begin to utilize the service. Jetty's native extensibility allowed me to easily extend its functionality, allowing the students to remote-deploy and administrate their own servlet / JSP code on the machine via webspace gestures.
Implementation
The core of this was a new HttpHandler implementation which acted as the main controller. In Jetty 3, HttpHandlers are stacked within HandlerContext objects, which are mapped to URI patterns within the Jetty server itself. The HandlerContext most closely matching the URI named in an HTTP request is asked to handle the request, which it does by iterating over each of its contained HttpHandlers. The HandlerContext containing this HttpHandler implementation was mapped to URI "/~*", so that this handler would be considered to handle a request to "/~username/...". The handler's initial responsibilities were to:
- Dynamically build contexts for users on demand by first reference to their own webspace on the machine, such as the first hit to "/~username/*". This handler would look up the user's homedir in a UNIX passwd file containing only students (no system accounts), and then create a new HandlerContext to serve out static content and JSPs out of ~username/public_html in filespace, and dynamically mapped servlets from ~username/public_html/servlets in filespace. The ability to lazily deploy a user's personal context was paramount, since possibly only 20 out of many thousands of possible students would use the server any given day. The newly-created HandlerContext would be preferred by Jetty to serve out requests to "/~username/*" over this handler's context, since the match to "/~username/" was more specific.
- Reload any one of the already-deployed user contexts, so that Jetty would reload any class files that had been recompiled. This was done through merely stopping, removing, and destroying the user context in question (easy, since the HttpHandler implementation maintained a Map of username -> created context). After removal of the old context, we would lazily initialize a new context upon next request to a resource in that context via step 1. This action was done through a custom servlet in the same package which maintained a reference to the HttpHandler via the singleton design pattern. This servlet, when asked via a webspace gesture, would make a protected method call into the HttpHandler to perform this step to user foo's context.
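The two responsibilities above (build a user's context on first reference, and reload it by simply forgetting it) can be sketched in plain Java as follows. This is not the original handler and it does not use the Jetty 3 API: `UserContext` is a hypothetical stand-in for a HandlerContext, and `LazyUserContexts`, `usernameOf`, and the home-directory lookup function are names assumed for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for a Jetty 3 HandlerContext serving one user's webspace.
final class UserContext {
    final String username;
    final String docRoot;

    UserContext(String username, String docRoot) {
        this.username = username;
        this.docRoot = docRoot;
    }
}

// Sketch of the username -> context map kept by the controlling handler.
final class LazyUserContexts {
    private final Map<String, UserContext> contexts = new ConcurrentHashMap<>();
    // Maps a username to a home directory, or null for unknown users
    // (for example, a lookup in a students-only passwd file).
    private final Function<String, String> homeDirLookup;

    LazyUserContexts(Function<String, String> homeDirLookup) {
        this.homeDirLookup = homeDirLookup;
    }

    // Extracts "username" from "/~username/...", or returns null if the URI does not match.
    static String usernameOf(String uri) {
        if (uri == null || !uri.startsWith("/~")) {
            return null;
        }
        int slash = uri.indexOf('/', 2);
        String name = slash < 0 ? uri.substring(2) : uri.substring(2, slash);
        return name.isEmpty() ? null : name;
    }

    // Step 1: create the user's context on first reference, then reuse it.
    // Returns null when the URI names no user or the user is unknown.
    UserContext contextFor(String uri) {
        String user = usernameOf(uri);
        if (user == null) {
            return null;
        }
        return contexts.computeIfAbsent(user, u -> {
            String home = homeDirLookup.apply(u);
            // Returning null leaves the map unchanged for unknown users.
            return home == null ? null : new UserContext(u, home + "/public_html");
        });
    }

    // Step 2: "reload" by forgetting the context; the next request rebuilds it.
    boolean reload(String username) {
        return contexts.remove(username) != null;
    }

    int deployedCount() {
        return contexts.size();
    }
}
```

In the real handler, removing a context would also involve stopping and destroying it so that its class loader can be released; the sketch only shows the bookkeeping.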
As time went on, additional features were added:
- Web applications. Students could deploy web-applications, either in expanded format or jar'd up, into a subordinate webspace of their personal webspace of their own choosing (e.g. /~username/mywebapp/*). They could then choose to undeploy, redeploy, or view the logs generated by this web-application's servlet / JSP code (hooks to personal log sinks for each web-application). I chose to have the deployed web-applications be 'sticky', living through a server reset. This was accomplished by serializing the Map of loaded web-applications to disk whenever it changed, and replaying it as a log upon server startup. In hindsight, I should have deferred the full reload of a known web-application until a resource within the web-application was actually referenced, reducing the memory footprint of the server, as well as greatly reducing the server startup time (150 webapps can contain quite a lot of XML to parse).
- User authentication realms. Users could configure simple Jetty HashUserRealms via indicating where in their filespace to load in the data for the realm. Realms defined by students in this way were forced to be named relative to their own username, such as 'joe:realm'. The student's web-applications could then contain security constraints referencing user / role mappings of their own choosing.
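The 'sticky' deployment behaviour described above (write the map of deployed web-applications to disk on every change, read it back at startup) might look roughly like this. It is only a sketch under stated assumptions: the original serialization format is not given in the article, so a `java.util.Properties` file is used here, and the class name `StickyDeployments` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch: remembers deployed web-applications across restarts.
final class StickyDeployments {
    private final Path stateFile;
    // context path (e.g. "/~joe/shop/*") -> location of the web-application
    private final Map<String, String> webapps = new TreeMap<>();

    // Replays the saved state, if any, when the server starts.
    StickyDeployments(Path stateFile) {
        this.stateFile = stateFile;
        if (Files.exists(stateFile)) {
            Properties saved = new Properties();
            try (InputStream in = Files.newInputStream(stateFile)) {
                saved.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            for (String contextPath : saved.stringPropertyNames()) {
                webapps.put(contextPath, saved.getProperty(contextPath));
            }
        }
    }

    synchronized void deploy(String contextPath, String location) {
        webapps.put(contextPath, location);
        save();
    }

    synchronized void undeploy(String contextPath) {
        if (webapps.remove(contextPath) != null) {
            save();
        }
    }

    // Returns a copy so callers cannot modify the internal state.
    synchronized Map<String, String> deployed() {
        return new TreeMap<>(webapps);
    }

    // Rewrites the whole state file whenever the map changes.
    private void save() {
        Properties out = new Properties();
        out.putAll(webapps);
        try (OutputStream os = Files.newOutputStream(stateFile)) {
            out.store(os, "deployed web-applications");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the constructor only restores the bookkeeping; whether each web-application is fully loaded at that point or on first reference is a separate choice, which is the hindsight the author mentions.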
Security
Security is an issue for any resource shared by students. The servlet allowing users to remote-control their own resources was ultimately made available through SSL, locked down via an HTTP basic security realm backed by interprocess communication to C code that performed the AFS/Kerberos authentication checks given a username / password, allowing the server to trust gestures controlling a given user's resources on the server. A Java security policy was installed in the JVM running Jetty, limiting filespace access, as well as disallowing calls to System.exit() and other obvious baddies, as I quickly found out that the students' JDBC code's SQLException handler often was System.exit(). Unfortunately, the Java SecurityManager model cannot protect against many types of 'attacks' brought on by untrusted student code, such as CPU-hogging broken loops, thread-bombs, and the like. A babysitting parent process was quickly written to restart the servlet server if it ever went down, and to bounce the server if it had consumed more CPU than it should have (probable student-code busy-loop). Restarting the server daily acted as the ultimate garbage collection.
AFS supports ACLs on directories, and instead of requiring all servlet-related files to be flagged as world-readable, the servlet server process ran authenticated to AFS as a particular entity which students could grant access to. This reduced the students' ability to simply steal their classmates' code using filesystem-based methods, but they could conceivably just write a servlet to do the same thing. Possibly deeper insight into the Java security model could have corrected this.
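The rule followed by the babysitting parent process described above (start the server if it is down, bounce it if it has used too much CPU) can be sketched as below. The original babysitter is not shown in the article, so this is an assumption-laden illustration: the class name `Babysitter`, the `Action` values, and the idea of a fixed CPU budget are hypothetical, and the `check` method relies on `java.lang.ProcessHandle`, which only exists in Java 9 and later.

```java
import java.time.Duration;

// Hypothetical sketch of a watchdog's decision rule; not the original babysitter.
final class Babysitter {
    enum Action { NONE, START, BOUNCE }

    // Pure decision rule: start a dead server, bounce one that is over its CPU budget.
    static Action decide(boolean alive, Duration cpuUsed, Duration cpuBudget) {
        if (!alive) {
            return Action.START;
        }
        return cpuUsed.compareTo(cpuBudget) > 0 ? Action.BOUNCE : Action.NONE;
    }

    // Applies the rule to a running process (Java 9+).
    // If the platform does not report CPU time, usage is treated as zero.
    static Action check(ProcessHandle server, Duration cpuBudget) {
        Duration used = server.info().totalCpuDuration().orElse(Duration.ZERO);
        return decide(server.isAlive(), used, cpuBudget);
    }
}
```

A parent process would call `check` periodically and act on the result; measuring CPU used since the last check, rather than total CPU since startup as here, would be a closer fit for catching a sudden busy-loop.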
The RequestDispatcher API was another thorn in the side of security, allowing any single servlet to tunnel through a context / web-application barrier to any other URI valid on the machine, conceivably allowing a nefarious student to snarf up content served by another student's servlets, even if that student had wrapped the servlet in a security constraint.
Symbolic-link misdirection based thefts were not considered at all.
Ultimately, students were warned many times up and down that this was a shared server running their peers' untrusted code, and that they should only be using it for their coursework and explorations into the servlet world. Nothing requiring true tamper-proof security should be deployed on this box.
Lessons Learned
As the service became more and more popular, I wished that I had been able to move it to a bigger box, something other than a non-dedicated Sun Ultra 5 with 256M RAM. Having more memory available to the JVM would have greatly helped out when a 30-student section all tried to deploy SOAP-based applications, each using their own jars for Apache Axis, Xalan, etc.
Using an inverse-proxy front-end to the system would have allowed splitting the users across multiple JVM / Jetty instances, improving uptime as seen from an individual's perspective, since a server kick to clear out a runaway thread would cause downtime for, say, 50% or 33% of the user base, as opposed to 100%. It would also have allowed me to have the service facade running at port 80, as opposed to the truly evil JNI hack I had to do to have Jetty start up as root, bind to port 80, then setuid() away its rights N seconds after startup. Way ugly. After the setuid() call was made, a latch was set in the custom HttpHandler, allowing it to begin servicing user contexts. However, having more than one Jetty instance would have complicated the implementation of the controlling servlet, requiring it to perform RMI when asked to control a non-local user context. This pattern could have been used to scale down to one user per JVM, with the inverse-proxy being able to fork / exec the JVM for the user upon demand, especially with Jetty now having HTTP proxy capability. That would probably be overkill for a student service, but having a static inverse-proxy with a fixed mapping to 2 or 3 Jetty instances (possibly running on distinct machines) would have been a relatively attractive performance and reliability enhancer for the effort.
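The fixed mapping of users to 2 or 3 Jetty instances mentioned above could be as simple as hashing the username onto a list of backends. The sketch below shows only that mapping, not a proxy; `UserShardMap`, `backendFor`, and the backend addresses used in the example are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assigns each user to one of a fixed set of backend servers.
final class UserShardMap {
    private final List<String> backends;

    UserShardMap(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    // The same username always maps to the same backend, so restarting one
    // backend only affects the users assigned to it.
    String backendFor(String username) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(username.hashCode(), backends.size());
        return backends.get(index);
    }
}
```

For example, with backends `http://localhost:8081` and `http://localhost:8082`, a front-end would forward "/~alice/..." to whichever address `backendFor("alice")` returns. Changing the number of backends reshuffles the assignments, which is acceptable for a static mapping but would matter if instances were added while users had state deployed.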
Impressions from the users were mixed. When all of the code being run on the machine was benign, neither memory- nor CPU-hoggish, all was well and the students were generally ambivalent -- this service was something that they had to use to pass their coursework, servlet coding / debugging was slower and more cumbersome than 'regular' coding, etc. Having a hot-redeploy-capable container didn't seem whiz-bang to them because they had no other experience in servlet-land. When the machine was unhappy, such as if it was 3 AM on the night before the project was due and one student's code decided to allocate and leak way more memory than it had any right to, causing the others to begin to get OutOfMemory exceptions left and right, then they were (rightly) annoyed and let me hear about it the next day.
If I were to re-solve the problem today, I would:
- Use some sort of inverse-proxy to smear the load over more than one JVM for higher availability, allowing the Jetty instances to bind to an unprivileged port.
- Use the JDK 1.4's internal Kerberos client implementation to authenticate the campus users. Both of these steps would eliminate all C code from the system.
- Run on at least one bigger dedicated machine.
- Encourage the faculty to work with me to ensure that their central APIs can be loaded by the system classloader as opposed to their students' web-application loaders, so we don't end up with 30 copies of XSLT or SOAP implementations all at once.
- Lazy-load web-applications and auth realms upon first demand instead of at server startup.
- Age-out defined web-applications and auth realms if they have not been referenced in the past X days, so that they'll eventually be forgotten about completely when a student finishes the course.
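The last item in the list above, ageing out definitions that have not been referenced in the past X days, can be sketched as a map of last-reference times that is swept periodically. This is an illustration only; the class name `AgingRegistry` and its methods are hypothetical, and the caller supplies the current time so that the idle limit X stays a parameter.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: tracks when each web-application or realm was last referenced.
final class AgingRegistry {
    private final Map<String, Instant> lastReferenced = new ConcurrentHashMap<>();

    // Records a reference to the named item at the given time.
    void touch(String name, Instant now) {
        lastReferenced.put(name, now);
    }

    // Forgets every item idle for longer than maxIdle and returns the forgotten names.
    List<String> ageOut(Instant now, Duration maxIdle) {
        Instant cutoff = now.minus(maxIdle);
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Instant>> it = lastReferenced.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> entry = it.next();
            if (entry.getValue().isBefore(cutoff)) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }

    boolean isKnown(String name) {
        return lastReferenced.containsKey(name);
    }
}
```

A daily task could call `ageOut(Instant.now(), Duration.ofDays(x))` and undeploy whatever names come back.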
[Copyright James Robinson 2004]
