Message Framing in REST
April 1, 2012
Most REST designers take message framing for granted: it is something they get for free from HTTP and need not worry about because it just works. You are probably wondering what motivated me to write about such an obvious and seemingly unimportant topic. I want to show that REST exposes message framing details to clients. This can cause problems and may influence your REST design decisions.
The need for message framing
That you cannot send messages directly over TCP is the first difficulty in application protocol design. There is no “send message” function. You read from an input stream by calling a “receive” method, and you write to an output stream by calling a “send” method. However, you cannot assume that a single “send” will result in a single “receive”. Message boundaries are not preserved in TCP.
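To make the missing message boundaries concrete, here is a minimal Java sketch. The class `StreamReads` and its helpers are hypothetical names of mine, and `trickle` only simulates a socket that delivers data in small pieces. The point is that a single `read` may return fewer bytes than you asked for, so the receiver has to loop until it has everything it expects.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class StreamReads {

    /** Reads exactly length bytes, looping because one read() may return fewer. */
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] buffer = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(buffer, offset, length - offset);
            if (n < 0) {
                throw new EOFException("stream ended after " + offset + " of " + length + " bytes");
            }
            offset += n;
        }
        return buffer;
    }

    /** Simulation only: a stream that hands out at most chunkSize bytes per read. */
    static InputStream trickle(byte[] data, final int chunkSize) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunkSize));
            }
        };
    }
}
```

Note that `readExactly` only works if the receiver already knows how many bytes to expect, which is exactly the information a framing scheme has to supply.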
HTTP uses a combination of message framing techniques, delimiters and prefixing, to send messages over TCP (Figure 1).
Delimiters are predetermined markers placed inside and around messages to help the protocol separate messages and message parts from each other when reading from an input stream. The carriage return and line feed pair of characters (\r\n) divides the ASCII character stream of the HTTP message header into lines. Another delimiter, white space, divides the request and status lines into sections. A third delimiter, the colon, separates header names from header values. An empty line marks the end of the header and the beginning of the (optional) message body.
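The delimiter rules above can be sketched in a few lines of Java. `HeaderSectionParser` is a hypothetical helper, not part of any library, and it is deliberately simplified: it ignores folded header lines and keeps only the last value of a repeated header.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HeaderSectionParser {

    /** Splits the request or status line on white space into at most three sections. */
    static String[] startLineParts(String headerSection) {
        String startLine = headerSection.split("\r\n", 2)[0];
        return startLine.split(" ", 3);
    }

    /** Splits the remaining lines on the first colon into header name and value. */
    static Map<String, String> headerFields(String headerSection) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        String[] lines = headerSection.split("\r\n");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // the empty line marks the end of the header
            }
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header line: " + lines[i]);
            }
            fields.put(lines[i].substring(0, colon), lines[i].substring(colon + 1).trim());
        }
        return fields;
    }
}
```

For the input `"POST /messages HTTP/1.1\r\nHost: example.com\r\n\r\n"`, `startLineParts` yields `POST`, `/messages`, and `HTTP/1.1`, and `headerFields` yields a single entry mapping `Host` to `example.com`.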
Prefixing works by sending in the first part of messages information about the remaining, variable part. HTTP uses headers for prefixing, instructing the protocol implementation how to handle the message body. Length prefixing is the most important: the Content-Length header tells the protocol implementation how many bytes to read before it reaches the end of the message body.
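Length prefixing can be sketched the same way. The hypothetical `LengthPrefixedReader` below skips the start line, scans the header lines for Content-Length, and then reads exactly that many bytes as the body. It assumes a well-formed message that carries a Content-Length header and uses no other framing mechanism.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedReader {

    /** Reads one message from the stream and returns its body as raw bytes. */
    static byte[] readBody(InputStream in) throws IOException {
        readLine(in); // request or status line, not needed for framing the body
        int contentLength = 0;
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        // The prefix says how many bytes belong to this message; read no more, no less.
        byte[] body = new byte[contentLength];
        int offset = 0;
        while (offset < contentLength) {
            int n = in.read(body, offset, contentLength - offset);
            if (n < 0) {
                throw new EOFException("body truncated");
            }
            offset += n;
        }
        return body;
    }

    /** Reads ASCII bytes up to the next CRLF delimiter and returns them without it. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int previous = -1;
        int current;
        while ((current = in.read()) >= 0) {
            if (previous == '\r' && current == '\n') {
                byte[] bytes = line.toByteArray();
                return new String(bytes, 0, bytes.length - 1, StandardCharsets.US_ASCII);
            }
            line.write(current);
            previous = current;
        }
        throw new EOFException("stream ended before CRLF");
    }
}
```

Any bytes that follow the body are left in the stream, where they would form the start of the next message on the same connection.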
The message framing details are clearly visible when you look at REST messages (Figure 1) and are also partially exposed in code that generates them (Listing 1).
Not a text-based protocol
That HTTP is a text-based protocol is a widespread misconception. Only the header section is sent as ASCII characters; the message body is sent as a sequence of bytes. As a consequence, sending text in the message body is not quite as straightforward as you might expect. You need to ensure that both client and server use the same character set when converting text to bytes and back.
It is much safer to be explicit about the character set, by setting and reading it in the Accept, Accept-Charset, and Content-Type headers, than to rely on defaults. Client libraries and server frameworks are partially to blame for the text-based protocol misconception because they attempt to convert text using a default character set. The Apache client library uses ISO-8859-1 by default, as required by RFC 2616 section 3.7.1, but this can obviously cause problems if the server is sending JSON using UTF-8.
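Here is a small sketch of what goes wrong when the two sides disagree. `CharsetDemo` is a hypothetical helper that takes the character set from the Content-Type value when one is present and otherwise falls back to ISO-8859-1, mirroring the default described above. The parameter parsing is simplified and not a complete media type parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {

    /** Returns the charset parameter of a Content-Type value, or the fallback if absent. */
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String[] nameValue = part.trim().split("=", 2);
            if (nameValue.length == 2 && nameValue[0].trim().equalsIgnoreCase("charset")) {
                return Charset.forName(nameValue[1].trim().replace("\"", ""));
            }
        }
        return fallback;
    }

    /** Converts body bytes to text using the declared charset, or ISO-8859-1 if none is declared. */
    static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.ISO_8859_1));
    }
}
```

The word "caf\u00e9" encoded as UTF-8 occupies five bytes. Decoded with `"application/json; charset=utf-8"` it comes back intact; decoded with plain `"application/json"` the last character turns into two garbage characters, because each of its two bytes is interpreted as a separate ISO-8859-1 character.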
Listing 1: Sending a text message to a URL in Java using the Apache HTTP client library
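```java
// Imports needed by the method below (Apache HttpClient 4.x)
import java.io.ByteArrayInputStream;
import java.io.IOException;

import org.apache.http.HttpResponse;
import org.apache.http.StatusLine;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.BasicHttpEntity;
import org.apache.http.impl.client.DefaultHttpClient;

/**
 * Sends a text message to the given URL
 * @param message the text to send
 * @param url where to send it to
 * @return true if the message was sent, false otherwise
 **/
public static boolean sendTextMessage(String message, String url) {
    boolean success = false;
    HttpClient httpClient = new DefaultHttpClient();
    try {
        HttpPost request = new HttpPost(url);
        BasicHttpEntity entity = new BasicHttpEntity();
        // Convert the text to bytes explicitly instead of relying on a default charset
        byte[] content = message.getBytes("utf-8");
        entity.setContent(new ByteArrayInputStream(content));
        // Length prefix: the number of bytes in the body, not the number of characters
        entity.setContentLength(content.length);
        request.setEntity(entity);
        // Tell the receiver which charset to use when converting the bytes back to text
        request.setHeader("Content-Type", "text/plain; charset=utf-8");
        HttpResponse response = httpClient.execute(request);
        StatusLine statusLine = response.getStatusLine();
        int statusCode = statusLine.getStatusCode();
        success = (statusCode == 200);
    } catch (IOException e) {
        success = false;
    } finally {
        httpClient.getConnectionManager().shutdown();
    }
    return success;
}
```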
Restrictions on headers
The use of delimiters for message framing limits what data can be safely sent in HTTP headers. You will find these limitations in RFC 2616, sections 2.2 and 4.2, but here is a short summary:
- All data need to be represented as ASCII characters
- The delimiter characters used for message framing cannot appear in header names or values
- Header names are further restricted: RFC 2616 defines them as tokens, and in practice it is safest to stick to letters, digits, and the dash character
- There is also a limit on the length of each header, imposed by servers and intermediaries, typically 4 or 8 KB
- It is a convention to start all custom header names not defined in RFC 2616 with “X-”
You might occasionally encounter message framing errors because some client library implementations expose the headers without enforcing these rules. If an HTTP framework or intermediary detects a framing error, it discards the request and returns the “400 Bad Request” status code. What may be even worse, every so often a malformed message will get through, causing weird behavior or a “500 Internal Server Error” status code and some incomprehensible internal error message. To avoid such hard-to-trace errors, do not attempt to send in HTTP headers any data which:
- comes from user input
- is shown to the user
- is persisted
- can grow in size uncapped
- you have no full control over (it is generated by third-party libraries or services)
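If a value you do not fully control has to travel in a header anyway, a defensive check before sending can catch most framing problems early. The sketch below is a hypothetical `HeaderGuard`, deliberately stricter than RFC 2616: it accepts only letters, digits, and the dash in names, and only printable ASCII in values. The 4 KB cap is my assumption, taken from the typical limits mentioned above.

```java
final class HeaderGuard {

    /** Assumed cap; real servers and intermediaries apply their own limits. */
    static final int MAX_VALUE_LENGTH = 4096;

    /** Accepts only letters, digits, and the dash character. */
    static boolean isSafeName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letterOrDigit = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9');
            if (!letterOrDigit && c != '-') {
                return false;
            }
        }
        return true;
    }

    /** Accepts only printable ASCII, which rejects CR, LF, other controls, and non-ASCII text. */
    static boolean isSafeValue(String value) {
        if (value == null || value.length() > MAX_VALUE_LENGTH) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }
}
```

A value such as `"abc\r\nSet-Cookie: x"` is rejected because the embedded \r\n would otherwise start a new header line on the wire.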
Keeping protocol data separate from application data
Notice that I did not say don’t use headers at all. Many REST protocols choose not to use them, but this may not be the wisest protocol design decision. Headers and body serve distinct roles in protocol design and both are important.
The message header carries information needed by the protocol implementation itself. The headers tell the protocol what to do, but do not necessarily show what the application is doing. If you are sniffing the headers, you are not likely to capture any business information collected and stored by an application.
The message body is where the application data is sent, but it has no or very little influence on how the protocol itself works. Protocols typically don’t interpret the data sent in message bodies and treat it as opaque streams of bytes.
Sending protocol data in the message body creates strong coupling between the various parts of the application, making further evolution difficult. Once I asked someone to return the URI of a newly created resource in the Content-Location header of a POST response, a common HTTP practice. “There is no need”, he said, “the URI is already available as a link in the message body”. This was true, of course, but the generic protocol logic in which I needed this URI had until then been completely independent of any resource representations. Forcing it to parse the URI out of the representations meant that it would likely break the next time the representations changed.
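To illustrate the difference, here is a sketch of how generic protocol logic can pick up the new URI from the headers alone. `ProtocolHeaders` is a hypothetical helper, and I assume the response headers have already been collected into a map. Nothing in it depends on the media type or structure of the body.

```java
import java.util.Map;

final class ProtocolHeaders {

    /** Looks up a header by name, ignoring case as HTTP header names require. */
    static String headerValue(Map<String, String> headers, String name) {
        for (Map.Entry<String, String> entry : headers.entrySet()) {
            if (entry.getKey().equalsIgnoreCase(name)) {
                return entry.getValue();
            }
        }
        return null;
    }

    /** Returns the URI of the newly created resource, or null if the header is missing. */
    static String createdResourceUri(Map<String, String> headers) {
        return headerValue(headers, "Content-Location");
    }
}
```

The same method keeps working whether the representation is JSON, XML, or anything else, and whether or not its link structure changes.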
Conclusion
I hope I managed to convince you that message framing in REST is not a mere implementation detail you can safely ignore. Becoming familiar with how it works can help you avoid some common pitfalls and design more robust REST APIs. I discussed only basic HTTP message framing so far. In my next post I’ll talk about more advanced topics like chunking, compression, and multipart messages.
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Canada License.