Demystifying REST Constraints

Who do you think the actor is in the sentence “Brenda was happy to interview the actor”, a Hollywood celebrity or a member of the local community theater? The REST constraints described in Roy Fielding’s dissertation enjoy celebrity status. I cannot mention constraints without people immediately assuming that I either talk about them, ignore them, or argue against them. Few realize that I talk about additional application constraints or perhaps about strengthening one of Fielding’s constraints.  I wish I could use another word, but the well-known REST constraints and application-specific constraints aren’t much different and combine to define a concrete RESTful protocol.

I want to be free to talk about constraints because by definition a protocol is a set of rules, many of them constraints. The Wikipedia definition of a communication protocol, “…a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications” mentions rules, but examples from the Merriam-Webster dictionary illustrate even better that the word protocol is synonymous with a set of rules:

  • The soldier’s actions constitute a breach of military protocol [rules or regulations].
  • They did not follow the proper diplomatic protocols [rules or etiquette].
  • What is the proper protocol [rules or conventions] for declining a job offer?

The Geneva Protocol and the Kyoto Protocol set forth international regulations agreed upon by the participating countries. Similarly, the rules of a communication protocol are agreed upon by participants in the communication, not imposed by an external authority. Standards and specifications like RFC 2616 merely formalize the results of the agreement.

A constraint is a rule. What kind of rule is surprisingly hard to define exactly. Some would say a constraint is a rule about what not to do. This is close, but not close enough. Not close enough because any statement about what to do can be rephrased, however awkwardly, to say not to do the opposite. You can say “the protocol must be stateless” or “the protocol must not force the server to keep track of the client’s state”. Either way, it is the same constraint.

Constraints are like good parenting: instead of telling your kids what to do, you set safe, age-appropriate limits, but otherwise let your kids live their lives and make their own decisions. Protocol constraints are used the same way to ensure safe, reliable, performant communication without restricting what the communication should be about. Constraints are also used to remove inessential variations which would complicate protocol implementation without any clear benefits. For example, you can make your protocol simpler if you support JSON only. This is a valid constraint as long as XML support is not essential.

Constraints should never restrict an essential freedom. This is so important that we have an expression for such a mistake: to over-constrain. I find it quite unfortunate that the noun constraint derives from the verb to constrain, which has meanings like “restrict by force” or “to secure by or as if by bonds”. These meanings suggest firmness and strength. Some of us subconsciously think of constraints as exceptionally strong and important rules. If somebody ever reminded you in a stern voice of a REST constraint, you know what I mean. To discuss constraints rationally we have to let go of our emotions. We don’t have to enforce constraints more vigorously than other protocol rules.

Let me give you some examples of relatively strict constraints. To show you that my use of constraints is consistent with Fielding’s and that some of his REST constraints are implicit in mine, I’ll gradually relax mine until they become equivalent to two of his.

Imagine a simple HTTP interface for a key-value data store. Clients send GET, PUT, and DELETE requests using URIs as keys and message bodies as values. The following constraint expresses a one-to-one mapping between URIs and message bodies: “As a response to a GET request the server must either return the status code 404 Not Found, or 200 OK and a message body which is a copy of the message body received in the last successful PUT request to the same URI”. Just to show I can, I expressed this as a rule about what the server must do. The remaining constraints are indeed easier to express in terms of what the server must not do, such as:

  1. The server must not return the 404 status code if there was a previous successful PUT request to the same URI
  2. The server must not return other HTTP status codes, for example 302
  3. The server must not allow the byte sequence representing the value assigned to the URI to change by any other means except as a response to an explicit PUT request
  4. The server must not return a message body for any URI for which no value was yet set

… and a few more. These constraints give my protocol some nice, useful properties. I can fully test my server via the protocol, for example.
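
For illustration, a conforming exchange might look like this (the URI, host, and body are made up, and the 200 OK response to the PUT is just one reasonable choice):

PUT /keys/high-score HTTP/1.1
Host: store.example.com
Content-Type: application/json
Content-Length: 16

{"alice": 12400}

HTTP/1.1 200 OK

A later GET to the same URI must then return exactly the same bytes:

GET /keys/high-score HTTP/1.1
Host: store.example.com

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 16

{"alice": 12400}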

“This is not REST”, you may object, “where is the hypermedia?” Good point. The hypermedia constraint does not say you must use hypermedia everywhere.  That would be a bit too strict and would render most of the media types from the IANA registry unfit for REST. The constraint says to use hypermedia to indicate what the valid client state transitions are and to help clients discover URIs. In my protocol all client state transitions are valid and the client does not need to discover URIs because it controls the URI space. Hence there is no hypermedia. You may say that the hypermedia constraint is not applicable. I prefer to say that it is satisfied.

To extend the applicability of my protocol beyond key-value data stores I need to relax some of my constraints. To turn my data store into a simple web server I need to eliminate constraint 4 and allow URIs pre-loaded with static content. Unfortunately the loss of constraint 4 also means that my clients no longer know what URIs exist and what they mean. Now I do indeed need to satisfy the hypermedia constraint. There are still many different ways to do it, for example I can provide a site map. Clients can still rely on their knowledge that values associated with URIs never change and, for example, use caches that never expire.
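
For example, a server built under the remaining constraints could safely attach a far-future caching directive such as Cache-Control: public, max-age=31536000 to every response (the one-year value is purely illustrative).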

Now imagine that my URIs identify Wikipedia articles. I don’t expect articles to change much over time, but I want to allow minor edits or formatting changes by eliminating constraint 3. Without constraint 3, however, message bodies for two successive GET requests may no longer be identical and I lose my ability to programmatically verify that URIs still point to valid pages. Indeed, Wikipedia needs volunteers to make sure pages have not been vandalized.

I hope I’ve emphasized enough that if I relax my constraints my protocol’s applicability increases but I lose some useful protocol properties. Just how far can I go with relaxing my constraints? Let’s see how far the editors of Wikipedia go. They accept in the page identified by the URI http://en.wikipedia.org/wiki/Tutankhamun any information about the Egyptian pharaoh that meets the quality standards of an encyclopedia. They would definitely not accept information about someone’s pet just because it happens to be named Tutankhamun and they would be equally perplexed to find information about Joe DiMaggio at this URI. Sounds like common sense? I’ve told you that I’m demystifying the REST constraints.

Let me generalize a bit. We expect both URIs and message bodies to mean something. When they are used together in HTTP, we expect their meanings to be related and this relationship between their meanings to remain the same. This is the same as to say that we don’t want to receive (or send) two HTTP message bodies with entirely different meanings from the same URI. This relaxed constraint was implicit in the first version of my protocol, which asked for a strict one-to-one mapping between URI and message body. The relaxed version permits a many-to-many relationship between the two as long as the relationship between the meanings remains one-to-one.

How useful is this relaxed constraint? The full answer doesn’t fit into this post, but let’s see if we can build a SOAP-like RPC protocol without having HTTP message bodies with different meanings sent to or returned from the same URI. If our service has a single URI, the message bodies mean “invoke method A”, “invoke method B”, and so on. Even if each method has its own URI, the message body sent means “invoke the method” while the message body returned means “this is the result”. Can we say they mean the same thing? I’m not saying the constraint was explicitly formulated just to prevent us from doing RPC over HTTP, only that it has this result as well.

Roy Fielding states the same constraint when he says that message bodies should be representations of resources identified by URIs even though the explanation some people give goes like this (I kid you not): “Two of the REST constraints are the identification of resources and the manipulation of resources by representations. An identifier identifies a resource. A resource is anything that can be identified. A representation can be anything you can send over the network. Got it?” This is not what Fielding says although he says all these things. Each sentence is a quote from his dissertation, but taken out of context, combined into a meaningless blurb, and passed on as REST design wisdom. I won’t attempt to untangle this mess. The dissertation is available online, read at least section 5.2.

Fielding’s REST constraints are neither laws of nature nor religion. Following them does not guarantee success and violating them does not bring inevitable doom. They play the same role as “Stay on trails” signs in our national parks. If you stray from the beaten path there is a slight chance you will discover something wonderful nobody saw before, but you also risk falling into a ravine, starting an avalanche, or being killed by a grizzly bear.


Message Encoding in REST

The basic HTTP message framing described in my last post has some performance issues. You can think of advanced message framing techniques like chunking, compression, and multipart messages as performance optimizations. The HTTP specification calls them encodings because they alter (encode) the message body for transmission and reconstruct it at arrival. All HTTP 1.1 libraries and frameworks support chunking, most support compression, and many can handle multipart messages too.

Chunking

You need to know the length of the message body to use the Content-Length header. While this is not an issue with static content like HTML files, it creates performance problems with dynamically generated REST messages. Not only does the entire message body have to be buffered in memory, but the network sits unused during message generation and the CPU is idle during transmission. The performance improves significantly if the two steps are combined and message generation or processing is done in parallel with transmission. To support this, a new message length prefixing model, called chunked transfer encoding, was introduced in HTTP 1.1. It is described in RFC 2616, section 3.6.1.

Chunked transfer encoding is length prefixing applied not to the entire message body, but to smaller parts of it (chunks). A chunk starts with the length of data in hexadecimal format separated by a CRLF (carriage return and line feed) sequence from the actual chunk data. Another CRLF pair marks the end of the chunk. After a series of chunks a zero-length chunk signals the end of the message. The presence of the Transfer-Encoding header containing the value “chunked” and the absence of the Content-Length header tell the receiver to read the message body in chunks (Figure 1).

Figure 1: Compressed and chunked HTTP response
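
In text form, a chunked response might look like this sketch (body and chunk boundaries made up; compression is left out so the chunks stay readable; chunk sizes are hexadecimal and every line ends with CRLF):

HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked

19
{"ticker":"AAPL","price":
7
512.68}
0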

You don’t need to take any action to enable chunking. Basic message framing is used only for short messages. Longer messages are automatically chunked by HTTP 1.1 implementations. Just remember not to depend on the Content-Length header for message processing because it is not always present. Structure your code so that you can begin processing before receiving the full message body. Use streaming and event-based programming techniques, for example event-based JSON and XML parsers.

Compression

Compression is a data optimization technique used to reduce message sizes, thus network bandwidth usage. It also speeds up message delivery. Compression looks for repeating patterns in data streams and replaces their occurrences with shorter placeholders. How effective this is depends on the type and volume of data. XML and JSON compress quite well, by 60% or more for messages over two kilobytes in length. Experience shows that the generic HTTP compression algorithms are just as effective as specially developed JSON- and XML-aware algorithms.

HTTP compresses the message body before sending it and decompresses it on arrival. Headers are never compressed. If present, the Content-Length header shows the number of actual (compressed) bytes sent. Similarly, chunking is applied to the already compressed stream of bytes (Figure 1). Since compression is optional in HTTP 1.1, clients must list the compression algorithms they support in the Accept-Encoding header to receive compressed messages. Likewise, REST services should not send compressed content unless the client is prepared to receive it. Compressed messages are sent with a Content-Encoding header identifying the compression algorithm used.
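
A typical negotiation might look like this (URI and sizes are illustrative):

GET /quotes/AAPL HTTP/1.1
Host: example.com
Accept: application/json
Accept-Encoding: gzip, deflate

HTTP/1.1 200 OK
Content-Type: application/json
Content-Encoding: gzip
Content-Length: 512

(512 bytes of gzip-compressed JSON follow)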

Client support for HTTP compression is good since well-known public domain algorithms are used. On Android, you can receive gzip compressed messages with the Spring for Android RestTemplate Module. RestKit for Apple iOS is built on top of NSURLConnection, which provides transparent gzip/deflate support for response bodies. On BlackBerry you can use HttpConnection with GZipInputStream. When writing client-side JavaScript, the XMLHttpRequest implementation decompresses messages automatically.

Multipart messages

As the name suggests, multipart messages contain parts of different types within a single message body. This helps reduce protocol chattiness, eliminating the need to retrieve each part with a separate HTTP request. In REST this technique is called batching.

Figure 2: Multipart HTTP response

Multipart encoding was originally developed for email attachments (Multipurpose Internet Mail Extensions or MIME) and later extended to HTTP and the Web. It is described in six related RFC documents: RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and RFC 2049. Figure 2 shows a multipart message example. To receive multipart messages, clients must list the multipart formats they support in the Accept header. The Content-Type header identifies the type of multipart message sent (related, mixed, etc.) and also contains the byte pattern used to mark the boundary between message parts. The MIME type of each message part is transmitted separately as shown in Figure 2.
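
Here is a small sketch of what such a response could look like (boundary string and parts invented for the example):

HTTP/1.1 200 OK
Content-Type: multipart/mixed; boundary=part-boundary

--part-boundary
Content-Type: application/json

{"ticker":"AAPL","price":512.68}
--part-boundary
Content-Type: text/plain

Apple Inc., last traded at 16:00 EST
--part-boundary--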

Building and parsing multipart messages is difficult unless supported out-of-the box by libraries or frameworks. You cannot expect clients to write code for it from scratch so you must provide alternative ways of retrieving the data. Common alternatives are embedding links into messages to let clients retrieve parts separately or using JSON or XML collections for embedding multiple message parts of the same type.

REST message examples are always shown in documentation with basic message framing because it is easy to read. Nonetheless, many real-life REST protocols use a combination of chunking, compression, and multipart encoding for better performance. When you develop REST clients, remember that compression and multipart encoding are optional. REST services won’t use them unless the client sends them the proper Accept and Accept-Encoding headers. When you design REST services, consider compressing large messages to save bandwidth and using multipart messages to reduce protocol chattiness.


Message Framing in REST

Most REST designers take message framing for granted; something they get for free from HTTP and don’t need to worry about because it just works. You are probably wondering what motivated me to write about such an obvious and unimportant topic. I wanted to show that REST exposes message framing details to clients. This can cause some issues and may influence your REST design decisions.

The need for message framing

That you cannot send messages directly over TCP is the first difficulty in application protocol design. There is no “send message” function. You read from an input stream by calling a “receive” method, and you write to an output stream by calling a “send” method. However, you cannot assume that a single “send” will result in a single “receive”. Message boundaries are not preserved in TCP.

HTTP uses a combination of message framing techniques, delimiters and prefixing, to send messages over TCP (Figure 1).

Figure 1: HTTP message framing

Delimiters are predetermined markers placed inside and around messages to help the protocol separate messages and message parts from each other when reading from an input stream. The carriage return – line feed (\r\n) pair of characters divides the ASCII character stream of the HTTP message header into lines. Another delimiter, white space, divides the request and status lines into sections. A third delimiter, the colon, separates header names from header values. An empty line marks the end of the header and the beginning of the (optional) message body.

Prefixing works by sending, in the first part of the message, information about the remaining, variable part. HTTP uses headers for prefixing, instructing the protocol implementation how to handle the message body. Length prefixing is the most important: the Content-Length header tells the protocol implementation how many bytes to read before it reaches the end of the message body.

The message framing details are clearly visible when you look at REST messages (Figure 1) and are also partially exposed in code that generates them (Listing 1).
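
Both techniques are visible in even a minimal request (a made-up example; each header line ends with the \r\n delimiter pair, the empty line marks the end of the header, and Content-Length announces an 11-byte body):

POST /messages HTTP/1.1
Host: example.com
Content-Type: text/plain; charset=utf-8
Content-Length: 11

hello world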

Not a text-based protocol

That HTTP is a text-based protocol is a widespread misconception. Only the header section is sent as ASCII characters; the message body is sent as a sequence of bytes. This has the consequence that sending text in the message body is not quite as straightforward as you might expect. You need to ensure that both client and server use the same method when converting text to bytes and back.

It is much safer to be explicit about the character set, setting and reading it via the Accept, Accept-Charset, and Content-Type headers, than to rely on defaults. Client libraries and server frameworks are partially to blame for the text-based protocol misconception because they attempt to convert text using a default character set. The Apache client library uses ISO-8859-1 by default as required by RFC 2616 section 3.7.1, but this obviously can cause problems if the server is sending JSON using UTF-8.

Listing 1: Sending a text message to a URL in Java using the Apache HTTP client library

    /**
     * Sends a text message to the given URL
     * @param message the text to send
     * @param url where to send it to
     * @return true if the message was sent, false otherwise
     **/
    public static boolean sendTextMessage(String message, String url) {
        boolean success = false;
        HttpClient httpClient = new DefaultHttpClient();
        try {
            HttpPost request = new HttpPost(url);

            BasicHttpEntity entity = new BasicHttpEntity();
            byte[] content = message.getBytes("utf-8");
            entity.setContent(new ByteArrayInputStream(content));
            entity.setContentLength(content.length);
            request.setEntity(entity);
            request.setHeader("Content-Type", "text/plain; charset=utf-8");

            HttpResponse response = httpClient.execute(request);

            StatusLine statusLine = response.getStatusLine();
            int statusCode = statusLine.getStatusCode();
            success = (statusCode == 200);
        } catch (IOException e) {
            success = false;
        } finally {
           httpClient.getConnectionManager().shutdown();
        }
        return success;
    }
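
The same care is needed on the receiving side. Here is a minimal sketch using the same Apache HTTP client types as Listing 1 (HttpResponse, HttpEntity, Header, HeaderElement, and NameValuePair come from the org.apache.http packages, EntityUtils from org.apache.http.util); falling back to UTF-8 when no charset is declared is my assumption, not something the specification requires:

    /**
     * Reads a response body as text using the charset declared in the
     * Content-Type header instead of relying on library defaults.
     */
    public static String readTextBody(HttpResponse response) throws IOException {
        HttpEntity entity = response.getEntity();
        byte[] content = EntityUtils.toByteArray(entity);

        String charset = "utf-8"; // assumed fallback when no charset is declared
        Header contentType = entity.getContentType();
        if (contentType != null) {
            for (HeaderElement element : contentType.getElements()) {
                NameValuePair charsetParam = element.getParameterByName("charset");
                if (charsetParam != null) {
                    charset = charsetParam.getValue();
                }
            }
        }
        return new String(content, charset);
    }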

Restrictions on headers

The use of delimiters for message framing limits what data can be safely sent in HTTP headers. You will find these limitations in RFC 2616, section 2.2 and section 4.2, but here is a short summary:

  • All data need to be represented as ASCII characters
  • The delimiter characters used for message framing cannot appear in header names or values
  • Header names are further limited to lowercase and uppercase letters and the dash character
  • There is also a maximum limit on the length of each header, typically 4 or 8 KB
  • It is a convention to start all custom header names not defined in RFC2616 with “X-”

You might occasionally encounter message framing errors because some client library implementations expose the headers without enforcing these rules. If an HTTP framework or intermediary detects a framing error, it discards the request and returns the “400 Bad Request” status code. What may be even worse, though, is that every so often a malformed message will get through, causing weird behavior or a “500 Internal Server Error” status code and some incomprehensible internal error message. To avoid such hard-to-trace errors do not attempt to send in HTTP headers any data which:

  • comes from user input
  • is shown to the user
  • is persisted
  • can grow in size uncapped
  • you have no full control over (it is generated by third-party libraries or services)

Keeping protocol data separate from application data

Notice that I did not say don’t use headers at all. Many REST protocols chose not to use them, but this may not be the wisest protocol design decision. Headers and body serve distinct roles in protocol design and both are important.

The message header carries information needed by the protocol implementation itself. The headers tell the protocol what to do, but do not necessarily show what the application is doing. If you are sniffing the headers you are not likely to capture any business information collected and stored by an application.

The message body is where the application data is sent, but it has no or very little influence on how the protocol itself works. Protocols typically don’t interpret the data sent in message bodies and treat it as opaque streams of bytes.

Sending protocol data in the message body creates strong couplings between the various parts of the application, making further evolution difficult. Once I asked someone to return the URI of a newly created resource in the Content-Location header of a POST response, a common HTTP practice. “There is no need”, he said, “the URI is already available as a link in the message body”. This was true, of course, but the generic protocol logic in which I needed this URI was up till then completely independent of any resource representations. Forcing it to parse the URI out of the representations meant that it would likely break the next time the representations changed.
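
A hypothetical response of that kind (URI invented for illustration):

HTTP/1.1 201 Created
Content-Location: /orders/1234
Content-Type: application/json

{"id": 1234, "status": "created"}

The generic logic only has to read one header; it never needs to know what the body looks like.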

Conclusion

I hope I managed to convince you that message framing in REST is not a mere implementation detail you can safely ignore.  Becoming familiar with how it works can help you avoid some common pitfalls and design more robust REST APIs. I discussed only basic HTTP message framing so far. In my next post I’ll talk about more advanced topics like chunking, compression, and multipart messages.


REST Design Philosophies

In my previous post I said that REST is a protocol design technique. Chances are you do not agree or at least not completely. REST can be deceptively simple and frustratingly controversial at the same time. We often find ourselves in the middle of heated debates and are quickly labeled dogmatic RESTafarians or clueless, reckless hackers when expressing an opinion. If you don’t know what I’m talking about, read this well-researched article by Ivan Zuzak or better yet, join a REST discussion group.

Why is REST controversial? Building distributed applications is hard. REST simplifies it by leveraging existing web design patterns and protocols, but many important design tradeoffs remain. Designers resolve tradeoffs based on what they consider important. What is important is not simply a matter of personal opinion, but depends on the kind of application you are building. So instead of best practices everybody agrees on, REST has a number of distinct schools of thought or design philosophies; none inherently good or bad, none obviously right or wrong; each with its champions and detractors.

I’ll attempt to describe four of these design philosophies without claiming any authority in the matter. Others would likely name and describe them differently. I decided to draw quick sketches instead of painting full portraits, capturing the distinctive features and worrying less about getting every detail right.

Hypermedia APIs

This design philosophy admires and tries to replicate the properties of the World Wide Web as a distributed application development platform. Its proponents believe that the phenomenal success of the Web is due to a number of high-level design choices which together define what they call its architectural style. Roy Fielding was the first to describe this architectural style, naming it REpresentational State Transfer or REST. For this reason many consider it the only “true” definition of REST.

The single most distinctive feature of this design philosophy is the use of hypermedia or HTTP messages with embedded hyperlinks, hence the name. The links guide clients to discover available resources and operations dynamically. While specific URI patterns are used, clients are not expected to understand them and are actively discouraged from parsing or building URIs. You can get a better understanding of how Hypermedia APIs are designed and how they work by watching Jon Moore build one in this video.

Much of the Hypermedia API discussion focuses on the difficult task of writing complex, large-scale distributed applications and on evolving them over decades. Design attributes like scalability and loose coupling are considered much more important than simplicity and ease of use. This discourages some who claim that while they understand Hypermedia APIs in theory, they find them difficult to use in practice.

Data APIs

If Hypermedia APIs are an architectural style, Data APIs are first and foremost an application integration pattern. Its proponents consider standardization as the most important design goal, believing that exposing the application’s underlying data in a standard format permits easier and more innovative integrations than exposing the application’s complex, cumbersome, and often restrictive business logic. Data APIs are published by organizations wishing to share useful data without giving access to the application which collects it or are added to existing enterprise software to facilitate third-party integrations.

Data APIs use the HTTP methods POST/GET/PUT/DELETE strictly as the CRUD operations Create/Read/Update/Delete. If they expose any application logic at all, they expose it implicitly. For example, if you use a Data API to manipulate purchase orders you obviously expect that products will be shipped and people billed as a result, but this has no direct effect on how the API looks or works. The behavior is simply implied by our shared understanding of what a purchase order is.

The URI structure is standardized and client applications are encouraged to use URI templates for building queries similar to SQL select statements. The format in which the data is returned is also standardized, often based on the Atom format. For an example of a standardized Data API take a closer look at Microsoft’s Open Data Protocol.
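
A hypothetical OData-style query illustrates the idea (service root and entity set invented for the example):

GET /odata/Products?$filter=Price%20gt%2020&$orderby=Name&$top=10 HTTP/1.1
Host: example.com
Accept: application/atom+xml

which reads roughly as: return the first ten Products with a Price greater than 20, ordered by Name.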

Web APIs

The designers of Web APIs consider achieving large-scale developer adoption as the most important API design challenge. They try to drive adoption by aiming for the widest possible platform support and maintaining a very low barrier of entry.

It is only a slight exaggeration to say that if a feature is not guaranteed to work in Adobe Flash, on an underpowered mobile phone, on a remote Pacific island, through a web proxy, and from behind an ancient firewall from the 90’s, Web API designers won’t use it. If an interface cannot be accessed directly from HTML forms or called by writing less than five lines of JavaScript, Web API designers worry about adoption.

The result is that Web APIs manage to get by using a surprisingly small subset of features. They stick to GET and POST, largely eschew HTTP headers, strongly favor JSON over XML, and rarely use hyperlinks. Information is often passed in URLs, because “URLs are easy to type into the address bar of a browser”. I call them Web APIs because I see this pattern in services offered to developers at large over the Internet (often free of charge) and used primarily in Web Mashups or equivalent mobile apps. Many of the smaller, simpler APIs listed in the Programmable Web API Directory are Web APIs. Perhaps the best known and most frequently imitated is the Twitter API.

Pragmatic REST APIs

Pragmatists believe that design trade-offs should be resolved based on concrete, project-specific requirements and constraints. They view the indiscriminate application of the same design patterns to widely different problems as a sign of dogmatic thinking, not as disciplined design. Absent specific requirements and constraints, pragmatists prefer to use the simplest possible design. What “simplest” means is again very context dependent. It may mean a design which fits the technologies, frameworks, or tools used the best.

You might argue that Pragmatic REST is not a separate design philosophy since its practitioners freely borrow and mix design approaches from all the other schools. The “pragmatic” label best describes APIs which do not fit neatly into any of the other categories. There are quite a few of these. If you look at the collection of the Google APIs you will discover characteristics of Hypermedia APIs, Data APIs, and Web APIs, but not in their purest forms.

In Lieu of Conclusion

If memory serves me well, the Big REST Debate reached its peak sometime around 2008. The rhetoric has cooled since, largely because the protagonists agreed to disagree, do REST their own way, and mostly ignore each other. But the issue is far from being settled and I doubt it ever will be. When I decided to illustrate the principles and methods of protocol design using REST, I knew that I couldn’t pretend not to notice the elephant in the room. That’s why I wrote this post. In my next post I’ll return to talk about REST and protocol design.


REST and the Art of Protocol Design

RESTful protocol design is a topic more appropriate for a book than a blog post. Despite the catchy title my goal is very modest. I would like to show you that even limited protocol design knowledge can help you understand the essence of REST and the reasoning behind many of its best practices.

I’ll start with a core insight from protocol design, which is that programs need to share pre-existing knowledge to communicate. Without knowing how to separate incoming messages (framing), convert bytes to useful internal representations (decoding), generate correct sequences of messages to accomplish tasks (state machines), and detect and respond to error conditions, programs only see meaningless streams of bytes. Roy Fielding, who coined the term REST, said: “Of course the client has prior knowledge. Every protocol, every media type definition, every URI scheme, and every link relationship type constitutes prior knowledge that the client must know (or learn) in order to make use of that knowledge.”

What method to use to establish the shared pre-existing knowledge is one of the most fundamental protocol design decisions.

The RESTful method is to minimize the need for new (application-specific) knowledge by reusing the existing knowledge of ubiquitous, mature, and stable web protocols. New RESTful application protocols are designed by constraining and further specifying another application protocol, the HTTP protocol. Every REST message is also a valid HTTP message and can be processed with existing HTTP client libraries, browsers, firewalls, web proxies, and server-side web application frameworks. Simple RESTful applications need very little new code at the protocol layer because so much of this code is reused.

It is important to point out that RESTful protocol design fully leverages HTTP as an application protocol, which is fundamentally different from what other HTTP-based protocols do. SOAP is conceived as a protocol layer above HTTP and uses it as a transport protocol. SOAP needs formal WSDL for protocol description and complex code generation tools in part because it does not reuse any of the application protocol features of HTTP. Other protocols like WebDAV extend HTTP with new methods and headers. While such protocol extensions were anticipated in the HTTP specification, many practical difficulties arise when generic HTTP libraries, intermediaries, or frameworks fail to understand the newly added features. WebDAV does not work well over the Internet and failed to see wide-scale adoption. This is why RESTful protocol design avoids the use of HTTP protocol extensions.

A frequently cited example of a carefully designed RESTful protocol is the Atom Publishing Protocol described in RFC 5023. You can tell that it leverages and constrains HTTP as an application protocol after seeing the HTTP specification explicitly referenced 18 times, in sentences like “Any server response or server content modification not explicitly forbidden by this specification or HTTP [RFC2616] is therefore allowed”.

Reusing existing web protocols has many advantages. First among them is a very low barrier of entry and implicit support for a large variety of platforms. Visibility, or the ability of third parties to see and interact with the protocol and provide useful services, is another. Web protocols are extremely stable and resilient, to the point that in 2011 InfoQ could run the fake news of the release of HTTP/1.2 as an April Fools’ Day joke. They evolve and combine well thanks to a clever design which fully decouples the concerns of addressing, operations, and representations. Web applications are the best illustration of just how beneficial loose coupling is: you can run all of them in the same generic client, a web browser, without adding or modifying a single line of code.

There is, however, an important difficulty. You cannot use existing protocols as pre-existent knowledge unless you use them precisely as they were meant to be used. Protocol visibility means that server-side frameworks, network intermediaries, and client libraries read and interpret HTTP methods, status codes, and headers and automatically take specific actions, like closing the TCP connection, caching content, or clearing caches. They stop working correctly if you do something unexpected. It is good to know that when people insist on the correct way of doing RESTful protocol design, what they talk about is more than just a personal opinion.

What else do I mean by using web technologies the way they were meant to be used? Ideally, I would like to see all RESTful protocols work with a single generic client like a web browser. Before you jump to any conclusions, let me state that compatibility with browsers is not an actual REST requirement. However, the further away you move from this ideal, the more application-specific knowledge you’ll need to hard-wire into your programs. And, as I said, minimizing application-specific knowledge is at the very core of RESTful protocol design philosophy. This is again good to know, because many common REST best practices directly derive from it.

Practical advice immediately follows from the above discussion, namely that before taking the plunge into RESTful protocol design it is a good idea to study and understand the underlying web technologies. This is not an effortless undertaking. The HTTP 1.1 specification (RFC 2616) alone is 176 pages long, with enough subtleties to justify the publication of several popular books. URIs are not just strings, but also a set of rules, described in the 40-page-long RFC 2396. Another 43 pages in RFC 2046 specify the MIME media types used in HTTP requests and responses and we haven’t even talked yet about XML, for which an entire set of specifications is maintained by the W3C. Even the much simpler JSON representation has a formal specification, the 10-page RFC 4627.

I get two strong objections to the above advice.

A frequent objection is this: “Why should I spend time and effort learning these protocol implementation details when I’ve already got code which handles them?” As I’ve already explained in a previous blog post, when I work with existing libraries and frameworks, I work with leaky abstractions, and, as Joel Spolsky said, “…they save us time working, but they don’t save us time learning”. I often find that I need to know the implementation details to use these abstractions correctly.

Another objection is that my advice sounds like it is coming from someone who completely ignores such fundamental REST principles as stateless interactions, the use of uniform interfaces, or hypertext as the engine of application state (HATEOAS). This is intentional on my part. Not that the principles aren’t important, but they are also powerful abstractions and I find them difficult to understand without concrete examples. If I put building on existing protocols at the very core of my design philosophy, it makes perfect sense to me to start by learning how these protocols work. Since they are built on the same principles, learning them also makes understanding the principles a lot easier. This approach worked well for me personally, but of course, your mileage may vary.

In my next post I’ll continue this topic and talk about various REST design philosophies, slightly different ways designers choose to leverage web protocols depending on which design trade-offs they consider more important.


API Design meets Protocol Design

In my previous post I showed that software interfaces used in distributed applications have to satisfy two conditions:

  1. should be remotely accessible
  2. should be suitable for use in high latency environments which permit partial failures, have no shared state and no reliable synchronization mechanism

I also argued that technology alone – no matter which technology you choose – only helps you with the first requirement. The concepts, principles, and methods of protocol design can help you meet the second condition.

I’d understand if you were a bit surprised. Protocol design is indeed applicable because API and protocol are two alternative – but equally correct – ways of describing software interfaces. Wikipedia defines API as “… a particular set of rules and specifications that software programs can follow to communicate with each other” and protocol as “… a formal description of digital message formats and the rules for exchanging those messages in or between computing systems and in telecommunications.” We certainly tend to focus more on programming language constructs like methods with parameters when describing APIs and on processes exchanging messages when describing protocols, but ultimately these concepts describe, from a different point of view, how programs communicate with each other.

Consider the familiar real-time stock quote service as an example. We can describe it equally well by showing its API in a programming language (here, in Java):

Listing 1: Java client library interface

/**
* My sample real time stock quotes service
**/
public interface StockQuotes {

   /**
   * Returns a stock quote
   * @param ticker is the symbol of the stock
   * @return the current price of the stocks
   * @throws RemoteException if there is a communication error
   **/
   public float getQuote(String ticker) throws RemoteException;
}

or by showing the messages (requests) it accepts and replies it provides (in our case, a simple RESTful protocol):

Listing 2: REST request and reply

GET /quote/?ticker=AAPL HTTP/1.1
Host: StockQuotes.com
Accept: application/json

returns

HTTP/1.1 200 OK
Date: Fri, 17 Feb 2012 16:59:59 GMT
Content-Type: application/json
Content-Length: 33

{"ticker":"AAPL","price":512.68}

The API description feels more natural when working via a client library while the description of the HTTP messages is useful for accessing the service from exotic environments (say Adobe Flash or SAP’s ABAP) for which no client library is provided. The same functionality could be also described as a SOAP Web Service with the following WSDL:

Listing 3: WSDL description

<?xml version="1.0"?>
<definitions name="StockQuoteServiceDefinitions">
   <!-- Namespace declarations omitted for brevity -->
   <message name="GetQuoteRequest">
      <part name="ticker" type="xsd:string"/>
   </message>
   <message name="GetQuoteResponse">
      <part name="price" type="xsd:float"/>
   </message>
   <portType name="StockQuotePortType">
      <operation name="getQuote">
         <input message="tns:GetQuoteRequest"/>
         <output message="tns:GetQuoteResponse"/>
      </operation>
   </portType>
   <binding name="DefaultBinding" type="tns:StockQuotePortType">
      <!-- Details of document literal binding style omitted for brevity -->
   </binding>
   <service name="StockQuoteService">
      <documentation>My sample real time stock quote service</documentation>
      <port name="StockQuotePort" binding="tns:DefaultBinding">
         <soap:address location="http://www.StockQuotes.com/quote/"/>
      </port>
   </service>
</definitions>

Frameworks like Microsoft’s WCF or Java’s JAX-WS can automatically generate a client library (called a binding) from this description. The generated code looks just like a regular API to the caller, but implements its functionality by making XML (SOAP) requests over HTTP to a server. It would be pointless to debate whether this WSDL describes an API or a protocol. To support automatically generated client bindings it has to describe both.

You might wonder: if API and protocol are equally appropriate for describing software interfaces, why is it that expressions like “Web Services API”, “SOAP API”, “RESTful API”, “Web API”, “Hypermedia API”, or  “Mobile API” are commonplace while few people ever talk about protocols? I don’t claim to know the answer. I can only tell you why I also frequently use the word API when I actually mean protocol.

I’ve noticed that the term protocol is very closely associated with ubiquitous low-level transport protocols like TCP/IP and I’m often concerned that it is not immediately obvious that I’m talking about high-level application protocols like SMTP, HTTP, Atom or LDAP. Since the latter are used just like APIs to build client applications for accessing email, browsing the web, reading news, or looking up someone’s phone number in a corporate directory, I simply call them APIs. It is technically correct and unlikely to be misunderstood. It is certainly more convenient than spelling out “application protocol” every time.

I also sometimes worry that developers may think I want them to write a ton of code, parse complex messages, and handle complex interaction patterns. If they had a college course on protocols, they were probably taught about message framing, byte stuffing, and many other low level details. This is not what I typically mean. Protocols are rarely built from scratch. They are layered on top of existing protocols, extend existing protocols, or constrain (further specify) existing protocols. Plus we have all the various tools. I use the word API in casual conversations to convey a subtle “this is not as hard as you might think” message.

Just to be clear, I’m not saying that calling your remote interface an API is a mistake. It is not. My point is that thinking of it as an application protocol is helpful, especially when you are designing it. It is helpful because it forces you to take the differences between local and remote communication seriously.

When I think in terms of requests and replies instead of function calls, it is hard for me to ignore latency or the possibility of partial failures. When I think in terms of data sent as messages instead of objects, I have no expectations of shared state. When I think in terms of independent processes, I know that there are no reliable methods of synchronizing them. When I think in a language-neutral way, it is a lot less likely that I assume language features which may not be available to some clients or may not work remotely.

The protocol viewpoint also helps me maintain realistic – some may even say skeptical – expectations about tools and technologies I’m using. I rely on them to take care of low-level protocol details and automatically generate repetitive, boilerplate code. I accept, however, that pretty much everything else is my responsibility.

In my next post I’ll talk about how I apply protocol design concepts, principles, and methods to the concrete case of RESTful design and how this helps me understand – and choose from – the many pieces of contradictory advice out there.


Challenges of remote interface design

If you listened to some middleware vendors, you’d believe that distributed applications can be easily built by making local APIs remotely accessible. Indeed, as this 10 minute video tutorial illustrates, you can do the latter by simply adding a few annotations to your code. Vendors have products to sell and their focus on promoting ease of use is not entirely surprising.

The problem is that this approach does not work. It was convincingly demonstrated in classic papers like “A Critique of the Remote Procedure Call Paradigm” by Professors Andrew S. Tanenbaum and Robert van Renesse (1988) or more recently by Jim Waldo and his colleagues at Sun Microsystems Laboratories in “A Note on Distributed Computing” (1994).

A simple example will illustrate why. Consider a local FIFO task queue used in desktop applications to maintain the responsiveness of the user interface while processing computation-intensive tasks (Listing 1). The user interface thread creates and places tasks in the queue and a background thread processes them asynchronously (Listing 2).

Listing 1:

/**
 * A FIFO queue of tasks to be executed by a background thread
 * Note: This queue implementation is not thread safe
 */
public class TaskQueue {

    /**
     * @return True if the queue is empty, False otherwise
     */
    public boolean isEmpty() {...}

    /**
     * Places a task into the queue
     * @param task the task to execute
     */
    public void putTask(Task task) {...}

    /**
     * Retrieves the next task from the queue
     * @return a task to execute
     */
    public Task getTask() {...}
}

Listing 2:

/**
 * Simple client showing the use of the task queue
 */
public class Client {

    final TaskQueue queue = new TaskQueue();

    /**
     * Called when the user chooses to print from the GUI
     */
    public void onPrint() {
        Task printTask = new PrintTask();
        synchronized (queue) {
            queue.putTask(printTask);
            queue.notifyAll();
        }
    }

    /**
     * This method runs in its own thread
     * @throws InterruptedException signals the thread to exit
     */
    public void processTasks() throws InterruptedException {
        Task task;
        while (true) {
            synchronized (queue) {
                while (queue.isEmpty()) {
                    queue.wait();
                }
                task = queue.getTask();
            }
            task.run();
        }
    }
}

Let’s say that we want to move the processing of tasks to a different computer. As platform vendors would quickly point out, making the queue interface (Listing 1) remotely accessible is easily solved with a wide variety of technologies. The difficult problems are latency, possibility of partial failures, lack of shared memory, and synchronization.

Network latency degrades the performance of some API calls by orders of magnitude, rendering them practically unusable. The local queue works well because the overhead of a call to the local putTask() method is both small and predictable. If the queue is on a different computer, a remote call is neither quick nor predictable. Transient network events like packet loss or congestion may block a call for a significant time. Because it no longer prevents the user interface from freezing up, moving the queue to a server renders it useless.

Partial failures lead to non-deterministic behavior and potential data loss. Assume that we kept the queue on the client to eliminate the latency issue discussed above. What happens if the network fails while the server calls the remote getTask() method? If the failure happens before the call is processed by the queue, the server can safely retry it. However, if the failure occurs after the task was removed from the queue, but before it was delivered to the server, the task is lost, and a retry will get the next task from the queue. Since we can no longer use the queue to reliably submit tasks for processing, keeping the queue on the client does not work either.

Lack of shared memory means that object references, pointers, or global state cannot be handled transparently. When a task is transferred to a different computer, the question arises of what to do with the other in-memory objects it references. While sometimes it makes sense to move the referenced objects with the task, just as often we need to look up the corresponding objects on the server. No technology can automatically handle all situations correctly, meaning that unless we change our queue and task implementations, some tasks will fail to execute on a remote server.

The absence of reliable remote synchronization primitives makes even relatively simple local behaviors difficult to replicate in distributed environments. The two local threads use the built-in Java synchronization primitives to communicate with each other (Listing 2). These synchronization methods do not work remotely. Server code written as shown won’t work. We need to rewrite it, perhaps polling the queue at regular intervals. But now the calls to the queue are no longer synchronized, and unless we make the queue implementation fully re-entrant, fatal data corruption will likely occur.
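
Here is a rough sketch of such a rewrite. RemoteTaskQueue stands in for a hypothetical remote proxy to the queue of Listing 1, and the polling interval is arbitrary; note that isEmpty() and getTask() are now two separate remote calls, exactly the kind of non-atomic access the original synchronized block prevented.

/**
 * Sketch of a server-side worker that polls a remote task queue
 */
public class PollingWorker {

    private final RemoteTaskQueue queue;          // hypothetical remote proxy
    private final long pollIntervalMillis = 500;  // arbitrary value

    public PollingWorker(RemoteTaskQueue queue) {
        this.queue = queue;
    }

    /**
     * This method runs in its own thread
     * @throws InterruptedException signals the thread to exit
     */
    public void processTasks() throws InterruptedException {
        while (true) {
            Task task = null;
            try {
                // Two separate remote calls: the queue can change between them
                if (!queue.isEmpty()) {
                    task = queue.getTask();
                }
            } catch (RemoteException e) {
                // Partial failure: we cannot tell whether a task was lost in transit
            }
            if (task != null) {
                task.run();
            } else {
                Thread.sleep(pollIntervalMillis);
            }
        }
    }
}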

So what does all of this boil down to in the end? It shows that all technologies used to make local APIs remotely accessible leak. By leaking I mean that they change the API behavior. And two APIs with different behaviors are not the same API, even if they look the same. Joel Spolsky calls this the Law of Leaky Abstractions, explaining that “one reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.”

The “don’t save us time learning” part is the reason why I don’t like calling the design of remote software interfaces “API design”. While technically correct, this choice of words encourages a false and somewhat dangerous sense of familiarity and comfort. I’ve found a better concept, one which does not sweep under the rug the important issues of latency, partial failures, lack of shared memory and synchronization. I’ll talk about this concept in the next installment.

 

