Background
I believe that there are significant deficiencies in the proposed websocket protocol and this paper looks at how they can be rectified.
Specification Style
The websocket protocol document has adopted an algorithmic specification style: rather than describing the structure of websocket data framing, it describes an algorithm that parses websocket data framing. This raises esoteric questions like: is an implementation that parses websocket data framing using a different algorithm a compliant implementation or not? A more practical problem with this style of specification is that the spec is impenetrable, as it is full of text like:
Let /b_v/ be integer corresponding to the low 7 bits of
/b/ (the value you would get by _and_ing /b/ with 0x7F).
Multiply /length/ by 128, add /b_v/ to that result, and
store the final result in /length/.
If the high-order bit of /b/ is set (i.e. if /b/ _and_ed
with 0x80 returns 0x80), then return to the step above
labeled _length_.
I challenge the reader to confirm that the client side framing and the server side framing are symmetric and implement the same data framing!
Rather than such verbose means, IETF specifications typically use the precise language of Augmented Backus-Naur Form (ABNF, RFC 5234) to formally describe protocols in a way that is not open to confusion or mis-implementation. To illustrate the clarity possible, I've translated section 4.2 into ABNF:
ws0-frame = sentinel-frame
/ length-frame
sentinel-frame = %x00-7F ; frame type
*( %x00-FE ) ; utf-8 data
%xFF ; the sentinel
length-frame = %x80-FF ; frame type
frame-length
octet-data ; binary data
frame-length = unlimited-integer
unlimited-integer = *( %x80-FF ) %x00-7F
; concatenate 7 low order bits from each octet
; to form a binary integer
octet-data = *( %x00-FF )
; the number of octets is exactly the length determined
; by the frame-length
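To show how little code the grammar above implies, here is a minimal Python sketch of a parser for it. The function names (`read_unlimited_integer`, `parse_ws0_frame`) are my own and not part of any specification; the sketch assumes the whole frame is already in memory and does not validate that sentinel-frame data is well-formed UTF-8.

```python
from typing import Tuple

def read_unlimited_integer(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one unlimited-integer starting at buf[pos].

    Returns (value, position of the first octet after the integer).
    """
    value = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated unlimited-integer")
        octet = buf[pos]
        pos += 1
        value = value * 128 + (octet & 0x7F)  # append the 7 low-order bits
        if octet & 0x80 == 0:                 # high bit clear: last octet
            return value, pos

def parse_ws0_frame(buf: bytes, pos: int = 0) -> Tuple[int, bytes, int]:
    """Parse one ws0-frame; returns (frame type, payload, next position)."""
    if pos >= len(buf):
        raise ValueError("no frame type octet")
    frame_type = buf[pos]
    pos += 1
    if frame_type < 0x80:                     # sentinel-frame
        end = buf.find(b"\xff", pos)
        if end < 0:
            raise ValueError("missing 0xFF sentinel")
        return frame_type, buf[pos:end], end + 1
    length, pos = read_unlimited_integer(buf, pos)  # length-frame
    if pos + length > len(buf):
        raise ValueError("truncated octet-data")
    return frame_type, buf[pos:pos + length], pos + length
```

Both frame styles go through the same entry point, which makes the asymmetry of the two framings easy to see: the first octet alone decides whether the parser scans for a sentinel or reads a length.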
Simplified Framing
ws1-frame = frame-length
frame-type
octet-data
frame-type = %x00 ; utf8 frame
/ %x01-FF ; undefined binary frame
frame-length = unlimited-integer
unlimited-integer = *( %x80-FF ) %x00-7F
; concatenate 7 low order bits from each octet
; to form a binary integer
octet-data = *( %x00-FF )
; the number of octets is exactly the length determined
; by the frame-length
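As a rough illustration of the simplified framing, the following Python sketch encodes and decodes a ws1-frame. The helper names are hypothetical, and I have assumed that frame-length counts only the octet-data and not the frame-type octet, which is how I read the grammar above.

```python
from typing import Tuple

def encode_unlimited_integer(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per octet, most significant first."""
    if value < 0:
        raise ValueError("length must be non-negative")
    octets = [value & 0x7F]                   # final octet, high bit clear
    value >>= 7
    while value:
        octets.append(0x80 | (value & 0x7F))  # earlier octets, high bit set
        value >>= 7
    return bytes(reversed(octets))

def encode_ws1_frame(frame_type: int, payload: bytes) -> bytes:
    """Build frame-length, frame-type, octet-data in that order."""
    return encode_unlimited_integer(len(payload)) + bytes([frame_type]) + payload

def decode_ws1_frame(buf: bytes, pos: int = 0) -> Tuple[int, bytes, int]:
    """Return (frame type, payload, position after the frame)."""
    length = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated frame-length")
        octet = buf[pos]
        pos += 1
        length = length * 128 + (octet & 0x7F)
        if octet & 0x80 == 0:
            break
    if pos + 1 + length > len(buf):
        raise ValueError("truncated frame")
    frame_type = buf[pos]
    payload = buf[pos + 1:pos + 1 + length]
    return frame_type, payload, pos + 1 + length
```

With a single framing style, text and binary frames share one code path and the receiver always knows how many octets to read before it looks at the payload.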
Learn from HTTP
Orderly Close
An HTTP/1.1 connection may be closed by either end or by an intermediary as part of normal operation, leaving a degree of uncertainty about the delivery status of messages in the pipeline. Messages can be resent on another connection only if they are known to be idempotent (e.g. GET / HEAD methods). Similarly, a Websocket may be closed by either end or an intermediary as part of normal operation, leaving a degree of uncertainty about the delivery status of messages that have been sent. But Websocket has no knowledge of any message type, so it is unable to know that any messages are idempotent, thus it is unable to retransmit any messages on a new connection. Worse still, websocket has no concept of an idle connection, and thus an implementation will either keep connections open forever (DoS risk) or risk closing an in-use connection. Note also that the burden of handling disconnection and message retries falls to the application with websocket.
So if a connection closes, a websocket application does not know which of the messages it sent have been received, short of acknowledging every message (which is a significant overhead and thus not practicable as a solution for all). However, if websocket can be improved with a mechanism to orderly close connections, then the delivery status of messages can be well known for normal operation and will only be uncertain if there is a real failure of a network or node. Orderly close requires a connection life-cycle to be defined and maintained by exchanging control messages between the end points:
ws2-frame = frame-length
frame-type
octet-data
frame-type = %x00 ; utf8 frame
/ %x01 ; control frame
/ %x02-FF ; undefined binary frame
frame-length = unlimited-integer
unlimited-integer = *( %x80-FF ) %x00-7F
; concatenate 7 low order bits from each octet
; to form a binary integer
octet-data = *( %x00-FF )
; the number of octets is exactly the length
; determined by the frame-length
This improvement creates a control frame type that will allow messages about the lifecycle of a connection to be exchanged. To gloss over the detail, the control messages will need semantics of closing and closed, so that an end point or intermediary can know if it is safe to send a message, and so that once a connection has been orderly closed, it is safe to assume that all messages sent previously have been delivered.
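One way the closing and closed semantics could work is sketched below in Python. Everything here is an assumption made for illustration: the payloads `b"closing"` and `b"closed"`, the `Endpoint` class and its state names are invented, and frames are passed around as tuples rather than written to a real socket.

```python
from enum import Enum

UTF8_FRAME = 0x00     # frame types from the ws2 grammar
CONTROL_FRAME = 0x01
CLOSING = b"closing"  # hypothetical control payloads
CLOSED = b"closed"

class State(Enum):
    OPEN = "open"
    CLOSING = "closing"
    CLOSED = "closed"

class Endpoint:
    def __init__(self) -> None:
        self.state = State.OPEN
        self.outbox = []     # frames this endpoint would write to the wire
        self.delivered = []  # data payloads handed to the application

    def send(self, text: str) -> None:
        if self.state is not State.OPEN:
            raise RuntimeError("connection is closing or closed")
        self.outbox.append((UTF8_FRAME, text.encode("utf-8")))

    def close(self) -> None:
        # Announce the close and stop sending; wait for the peer's "closed".
        if self.state is State.OPEN:
            self.state = State.CLOSING
            self.outbox.append((CONTROL_FRAME, CLOSING))

    def receive(self, frame_type: int, payload: bytes) -> None:
        if frame_type == CONTROL_FRAME and payload == CLOSING:
            # Frames arrive in order, so everything the peer sent before
            # "closing" has already been received; confirm that.
            self.state = State.CLOSED
            self.outbox.append((CONTROL_FRAME, CLOSED))
        elif frame_type == CONTROL_FRAME and payload == CLOSED:
            self.state = State.CLOSED
        else:
            self.delivered.append(payload)
```

In this sketch the initiator only reaches the closed state after the peer has confirmed, so a normal close carries an implicit acknowledgement of every earlier message without acknowledging each one individually.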
Message Fragmentation
A common protocol technique to deal with this issue is to implement message fragmentation, where a single message is transmitted in several frames and the frames of unrelated messages can be interleaved on a single connection. Either a message ID or a channel (== virtual connection) ID is needed to determine which fragments are part of the same message. The following improvement adds fragmentation and a channel ID to websocket:
ws3-message = 1*(ws3-frame)
ws3-frame = frame-length
message-length
channel-id
frame-type
octet-data
frame-type = %x00 ; control frame
/ %x01 ; utf8 frame
/ %x02-FF ; undefined binary frame
frame-length = %x00 ; last frame of message
/ unlimited-integer ; frame-length
message-length = %x00 ; unknown message length
/ unlimited-integer ; known message length
channel-id = unlimited-integer
unlimited-integer = *( %x80-FF ) %x00-7F
; concatenate 7 low order bits from each octet
; to form a binary integer
octet-data = *( %x00-FF )
; the number of octets is exactly the length determined
; by the frame-length
A message is terminated when all octets have been sent for a known message length, or when a zero length frame is sent for an unknown message length. Related messages are sent on the same channel id and are strictly ordered. The creation and orderly close of channel-ids can be coordinated by control frames sent on the channel. The implementations of the protocol end points will be responsible for fragmenting and interleaving messages. A simple endpoint may choose not to fragment messages sent, but should be capable of assembling fragmented messages received.
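The receiving side of this termination rule could be sketched in Python as follows. The `Reassembler` class is hypothetical, it takes frames that have already been parsed into (channel id, message length, payload), and it assumes that the message length seen on the first frame of a message applies to the whole message.

```python
from typing import Dict, List, Optional, Tuple

class Reassembler:
    """Collect fragments per channel and return complete messages."""

    def __init__(self) -> None:
        # channel id -> (declared message length, fragments so far)
        self._partial: Dict[int, Tuple[int, List[bytes]]] = {}

    def add_frame(self, channel_id: int, message_length: int,
                  payload: bytes) -> Optional[bytes]:
        """Return the whole message once it is complete, else None."""
        declared, parts = self._partial.get(channel_id, (message_length, []))
        parts.append(payload)
        received = sum(len(p) for p in parts)
        if declared == 0:
            # Unknown message length: a zero-length frame ends the message.
            done = len(payload) == 0
        else:
            if received > declared:
                raise ValueError("more octets than the declared message length")
            done = received == declared
        if done:
            self._partial.pop(channel_id, None)
            return b"".join(parts)
        self._partial[channel_id] = (declared, parts)
        return None
```

Because the state is keyed by channel id, frames from unrelated messages can arrive interleaved without the receiver confusing them.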
Multiplexing
It is frequently desirable to aggregate (aka multiplex) message streams from multiple clients and/or components into a single stream of messages, so that resources can be shared and/or load from a single source policed as a single entity. Luckily the machinery needed for multiplexing over a transport protocol is exactly the machinery needed for message fragmentation and channels. Thus the improvements already proposed can accommodate multiplexing.
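To make the link between fragmentation and multiplexing concrete, here is a small Python sketch of the sending side. The function names and the round-robin policy are my own assumptions; a real end point could use any scheduling policy. Messages are assumed to be non-empty and of known length, and frames are represented as (channel id, message length, payload) tuples rather than encoded octets.

```python
from itertools import zip_longest
from typing import Dict, List, Tuple

Frame = Tuple[int, int, bytes]  # (channel id, message length, payload)

def fragment(channel_id: int, message: bytes, max_frame: int) -> List[Frame]:
    """Split one message into frames of at most max_frame octets."""
    return [(channel_id, len(message), message[i:i + max_frame])
            for i in range(0, len(message), max_frame)]

def multiplex(messages: Dict[int, bytes], max_frame: int = 4) -> List[Frame]:
    """Interleave the frames of one message per channel, round-robin."""
    per_channel = [fragment(cid, msg, max_frame)
                   for cid, msg in messages.items()]
    frames: List[Frame] = []
    for round_ in zip_longest(*per_channel):
        # Channels that have run out of frames contribute None; skip them.
        frames.extend(f for f in round_ if f is not None)
    return frames
```

In this sketch a short message on one channel is sent in full while a long message on another channel is still in progress, which is the property that lets several sources share one connection fairly.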
Flexibility and Extensibility
- compressed content using compress, gzip, or some future compression algorithm
- UTF-16 may be more predictable and/or efficient if messages contain significant numbers of multi-byte characters
Luckily there already exists a standard extensible system for describing content encodings, transport encodings, character sets and/or content types. The IETF standards for MIME types are widely used by web protocols and have good mappings to existing software components that can encode, decode, compress, decompress, validate, sign and/or display an unlimited and growing family of media types.
MIME types and associated encodings are typically represented by one or more name value pairs of ISO-8859-1 strings (aka meta data). It would be possible to extend websocket by replacing the fixed octet mapping of content encoding with a per message set of mime-type name value pairs. However, to do so would be to repeat another mistake of HTTP, namely to have verbose, highly redundant meta-data transmitted with every message.
A more efficient and equally flexible solution is to associate meta-data fields such as mime type with a channel rather than with a message, so that the meta data need only be sent once and will apply to all subsequent messages in a channel, or until it is replaced by updated meta data:
ws5-message = 1*(ws5-frame)
ws5-frame = frame-length
message-length
channel-id
frame-type
octet-data
frame-type = %x00 ; control frame
/ %x01 ; meta-data name+value headers
/ %x02 ; data frame
/ %x03-FF ; undefined frame
frame-length = %x00 ; last frame of message
/ unlimited-integer ; frame-length
message-length = %x00 ; unknown message length
/ unlimited-integer ; known message length
channel-id = unlimited-integer
unlimited-integer = *( %x80-FF ) %x00-7F
; concatenate 7 low order bits from each octet
; to form a binary integer
octet-data = *( %x00-FF )
; the number of octets is exactly the length
; determined by the frame-length
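A receiver for this scheme only needs to remember the latest meta data per channel. The Python sketch below illustrates the idea; the grammar does not say how the name+value headers are laid out inside a meta-data frame, so the `Name: value` lines separated by CRLF are purely my assumption, as are the `ChannelMetadata` class and the choice to merge updates field by field.

```python
from typing import Dict, Optional, Tuple

META_FRAME = 0x01  # frame types from the ws5 grammar
DATA_FRAME = 0x02

class ChannelMetadata:
    """Remember meta data per channel and attach it to each data frame."""

    def __init__(self) -> None:
        self._meta: Dict[int, Dict[str, str]] = {}

    def on_frame(self, channel_id: int, frame_type: int,
                 payload: bytes) -> Optional[Tuple[Dict[str, str], bytes]]:
        if frame_type == META_FRAME:
            headers = self._meta.setdefault(channel_id, {})
            # Assumed layout: "Name: value" lines in ISO-8859-1.
            for line in payload.decode("iso-8859-1").split("\r\n"):
                if line:
                    name, _, value = line.partition(":")
                    headers[name.strip().lower()] = value.strip()
            return None
        if frame_type == DATA_FRAME:
            # Data frames carry no headers; use what the channel has stored.
            return dict(self._meta.get(channel_id, {})), payload
        return None  # control and undefined frames are ignored here
```

The meta data is sent once and then costs nothing per message, while a later meta-data frame on the same channel can still replace a field such as the content type.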
Other Websocket improvements
Semantic Specification
5. Send the following bytes to the remote side (the server):
47 45 54 20
Send the /resource name/ value, encoded as US-ASCII.
Send the following bytes:
20 48 54 54 50 2F 31 2E 31 0D 0A 55 70 67 72 61
64 65 3A 20 57 65 62 53 6F 63 6B 65 74 0D 0A 43
6F 6E 6E 65 63 74 69 6F 6E 3A 20 55 70 67 72 61
64 65 0D 0A
GET /resource name/ HTTP/1.1 CRLF
Upgrade: WebSocket CRLF
Connection: Upgrade CRLF
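That the quoted byte sequences and the textual form describe the same request can be checked with a few lines of Python. The function name and the resource name `/demo` are made up for this example; the hex strings are copied from the quoted specification text.

```python
def build_handshake_start(resource_name: str) -> bytes:
    """Assemble the opening request exactly as the byte-level steps describe."""
    prefix = bytes.fromhex("47 45 54 20")
    suffix = bytes.fromhex(
        "20 48 54 54 50 2F 31 2E 31 0D 0A 55 70 67 72 61 "
        "64 65 3A 20 57 65 62 53 6F 63 6B 65 74 0D 0A 43 "
        "6F 6E 6E 65 63 74 69 6F 6E 3A 20 55 70 67 72 61 "
        "64 65 0D 0A"
    )
    return prefix + resource_name.encode("ascii") + suffix

# The same request written the way HTTP is normally specified.
expected = (b"GET /demo HTTP/1.1\r\n"
            b"Upgrade: WebSocket\r\n"
            b"Connection: Upgrade\r\n")
assert build_handshake_start("/demo") == expected
```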
HTTP transport
BWTP IETF draft
Other than proposing incremental improvements to websocket, I have also proposed an entirely new protocol. The Bidirectional Web Transfer Protocol (BWTP) is an IETF draft protocol designed to be a transport for the websocket API as well as useful for other web clients. BWTP and the improved websocket protocol are more or less semantically equivalent and the main differences are mostly stylistic.
Either approach significantly improves upon the current websocket proposal and provides a transport protocol that would truly be a step forward.
Posted at 02:32PM Oct 20, 2009 by gregw in General | Comments[8]
This means you get an error in send() when there is no connection, yet, but none if the connection is already closed. WTF?
Posted by Aaron Digulla on November 24, 2009 at 11:48 PM EST #
I would otherwise agree with moving the UTF-8 part on top the binary frames, but the UTF-8 frames have a property which the binary frames lack: the sender doesn't need to know in advance how much data it's sending.
Posted by Timo on December 01, 2009 at 11:45 PM EST #