Demystifying REST Constraints

Who do you think the actor is in the sentence “Brenda was happy to interview the actor”, a Hollywood celebrity or a member of the local community theater? The REST constraints described in Roy Fielding’s dissertation enjoy celebrity status. I cannot mention constraints without people immediately assuming that I either talk about them, ignore them, or argue against them. Few realize that I talk about additional application constraints or perhaps about strengthening one of Fielding’s constraints.  I wish I could use another word, but the well-known REST constraints and application-specific constraints aren’t much different and combine to define a concrete RESTful protocol.

I want to be free to talk about constraints because by definition a protocol is a set of rules, many of them constraints. The Wikipedia definition of a communication protocol, “…a system of digital message formats and rules for exchanging those messages in or between computing systems and in telecommunications” mentions rules, but examples from the Merriam-Webster dictionary illustrate even better that the word protocol is synonymous with a set of rules:

  • The soldier’s actions constitute a breach of military protocol [rules or regulations].
  • They did not follow the proper diplomatic protocols [rules or etiquette].
  • What is the proper protocol [rules or conventions] for declining a job offer?

The Geneva Protocol and the Kyoto Protocol set forth international regulations agreed upon by the participating countries. Similarly, the rules of a communication protocol are agreed upon by participants in the communication, not imposed by an external authority. Standards and specifications like RFC 2616 merely formalize the results of the agreement.

A constraint is a rule. What kind of rule is surprisingly hard to define exactly. Some would say a constraint is a rule about what not to do. This is close, but not close enough. Not close enough because any statement about what to do can be rephrased, however awkwardly, to say not to do the opposite. You can say “the protocol must be stateless” or “the protocol must not force the server to keep track of the client’s state”. Either way, it is the same constraint.

Constraints are like good parenting: instead of telling your kids what to do, you set safe, age-appropriate limits, but otherwise let your kids live their lives and make their own decisions. Protocol constraints are used the same way to ensure safe, reliable, performant communication without restricting what the communication should be about. Constraints are also used to remove unessential variations which would complicate protocol implementation without any clear benefits. For example, you can make your protocol simpler if you support JSON only. This is a valid constraint as long as XML support is not essential.

Constraints should never restrict an essential freedom. This is so important that we have an expression for such a mistake: to over-constrain. I find it quite unfortunate that the noun constraint derives from the verb to constrain, which has meanings like “restrict by force” or “to secure by or as if by bonds”. These meanings suggest firmness and strength. Some of us subconsciously think of constraints as exceptionally strong and important rules. If somebody ever reminded you in a stern voice of a REST constraint, you know what I mean. To discuss constraints rationally we have to let go of our emotions. We don’t have to enforce constraints more vigorously than other protocol rules.

Let me give you some examples of relatively strict constraints. To show you that my use of constraints is consistent with Fielding’s and that some of his REST constraints are implicit in mine, I’ll gradually relax mine until they become equivalent with two of his.

Imagine a simple HTTP interface for a key-value data store. Clients send GET, PUT, and DELETE requests using URIs as keys and message bodies as values. The following constraint expresses a one-to-one mapping between URIs and message bodies: “As a response to a GET request the server must either return the status code 404 Not Found, or 200 OK and a message body which is a copy of the message body received in the last successful PUT request to the same URI”. Just to show I can, I expressed this as a rule about what the server must do. The remaining constraints are indeed easier to express in terms of what the server must not do, such as:

  1. The server must not return the 404 status code if there was a previous successful PUT request to the same URI
  2. The server must not return other HTTP status codes, for example 302
  3. The server must not allow the byte sequence representing the value assigned to the URI to change by any other means except as a response to an explicit PUT request
  4. The server must not return a message body for any URI for which no value was yet set

… and a few more. These constraints give my protocol some nice, useful properties. I can fully test my server via the protocol, for example.
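The constraints above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the class and method names are hypothetical, HTTP itself is left out, and the behavior of DELETE (which the text does not spell out) is assumed to undo the PUT.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory model of the key-value protocol: URIs are keys, message bodies are values.
 * All names are illustrative assumptions, not part of any real server.
 */
class KeyValueStore {

    /** A status code plus an optional body (null when there is no body). */
    static final class Response {
        final int status;
        final byte[] body;

        Response(int status, byte[] body) {
            this.status = status;
            this.body = body;
        }
    }

    private final Map<String, byte[]> values = new HashMap<>();

    /** PUT: keep a private copy, so the stored bytes can only change through another PUT. */
    Response put(String uri, byte[] body) {
        values.put(uri, body.clone());
        return new Response(200, null);
    }

    /** GET: 200 with a copy of the last PUT body, or 404 without a body; no other status codes. */
    Response get(String uri) {
        byte[] stored = values.get(uri);
        if (stored == null) {
            return new Response(404, null);
        }
        return new Response(200, stored.clone());
    }

    /** DELETE (assumed semantics): forget the value, so a later GET returns 404 again. */
    Response delete(String uri) {
        boolean existed = values.remove(uri) != null;
        return new Response(existed ? 200 : 404, null);
    }
}
```

Because every observable behavior goes through put, get, and delete, a test can exercise the whole model through the same three operations, which is the "fully test my server via the protocol" property in miniature.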

“This is not REST”, you may object, “where is the hypermedia?” Good point. The hypermedia constraint does not say you must use hypermedia everywhere.  That would be a bit too strict and would render most of the media types from the IANA registry unfit for REST. The constraint says to use hypermedia to indicate what the valid client state transitions are and to help clients discover URIs. In my protocol all client state transitions are valid and the client does not need to discover URIs because it controls the URI space. Hence there is no hypermedia. You may say that the hypermedia constraint is not applicable. I prefer to say that it is satisfied.

To extend the applicability of my protocol beyond key-value data stores I need to relax some of my constraints. To turn my data store into a simple web server I need to eliminate constraint 4 and allow URIs pre-loaded with static content. Unfortunately the loss of constraint 4 also means that my clients no longer know what URIs exist and what they mean. Now I do indeed need to satisfy the hypermedia constraint. There are still many different ways to do it; for example, I can provide a site map. Clients can still rely on their knowledge that values associated with URIs never change and, for example, use caches that never expire.

Now imagine that my URIs identify Wikipedia articles. I don’t expect articles to change much over time, but I want to allow minor edits or formatting changes by eliminating constraint 3. Without constraint 3, however, message bodies for two successive GET requests may no longer be identical and I lose my ability to programmatically verify URIs still point to valid pages. Indeed, Wikipedia needs volunteers to make sure pages were not vandalized.

I hope I’ve emphasized enough that if I relax my constraints my protocol’s applicability increases but I lose some useful protocol properties. Just how far can I go with relaxing my constraints? Let’s see how far the editors of Wikipedia go. They accept in the page identified by the URI http://en.wikipedia.org/wiki/Tutankhamun any information about the Egyptian pharaoh that meets the quality standards of an encyclopedia. They would definitely not accept information about someone’s pet just because it happens to be named Tutankhamun and they would be equally perplexed to find information about Joe DiMaggio at this URI. Sounds like common sense? I’ve told you that I’m demystifying the REST constraints.

Let me generalize a bit. We expect both URIs and message bodies to mean something. When they are used together in HTTP, we expect their meanings to be related and this relationship between their meanings to remain the same. This is the same as saying that we don’t want to receive (or send) two HTTP message bodies with entirely different meanings from the same URI. This relaxed constraint was implicit in the first version of my protocol, which asked for a strict one-to-one mapping between URI and message body. The relaxed version permits a many-to-many relationship between the two as long as the relationship between their meanings remains one-to-one.

How useful is this relaxed constraint? The full answer doesn’t fit into this post, but let’s see if we can build a SOAP-like RPC protocol without having HTTP message bodies with different meanings sent to or returned from the same URI. If our service has a single URI, the message bodies mean “invoke method A”, “invoke method B”, and so on. Even if each method has its own URI, the message body sent means “invoke the method” while the message body returned means “this is the result”. Can we say they mean the same thing? I’m not saying the constraint was explicitly formulated just to prevent us from doing RPC over HTTP, only that it has this result as well.

Roy Fielding states the same constraint when he says that message bodies should be representations of resources identified by URIs, even though the explanation some people give goes like this (I kid you not): “Two of the REST constraints are the identification of resources and the manipulation of resources by representations. An identifier identifies a resource. A resource is anything that can be identified. A representation can be anything you can send over the network. Got it?” This is not what Fielding says, although he says all these things. Each sentence is a quote from his dissertation, but taken out of context, combined into a meaningless blurb, and passed on as REST design wisdom. I won’t attempt to untangle this mess. The dissertation is available online; read at least section 5.2.

Fielding’s REST constraints are neither laws of nature nor religion. Following them does not guarantee success and violating them does not bring inevitable doom. They play the same role as “Stay on trails” signs in our national parks. If you stray from the beaten path there is a slight chance you will discover something wonderful nobody saw before, but you also risk falling into a ravine, starting an avalanche, or being killed by a grizzly bear.

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Canada License.

Message Framing in REST

Most REST designers take message framing for granted; something they get for free from HTTP and don’t need to worry about because it just works. You are probably wondering what motivated me to write about such an obvious and unimportant topic. I wanted to show that REST exposes message framing details to clients. This can cause some issues and may influence your REST design decisions.

The need for message framing

That you cannot send messages directly over TCP is the first difficulty in application protocol design. There is no “send message” function. You read from an input stream by calling a “receive” method, and you write to an output stream by calling a “send” method. However, you cannot assume that a single “send” will result in a single “receive”. Message boundaries are not preserved in TCP.

HTTP uses a combination of message framing techniques, delimiters and prefixing, to send messages over TCP (Figure 1).

Figure 1: HTTP message framing


Delimiters are predetermined markers placed inside and around messages to help the protocol separate messages and message parts from each other when reading from an input stream. The carriage return – new line (\r\n) pair of characters divides the ASCII character stream of the HTTP message header into lines. Another delimiter, white space, divides the request and status lines into sections. A third delimiter, the colon, separates header names from header values. An empty line marks the end of the header and the beginning of the (optional) message body.

Prefixing works by sending in the first part of messages information about the remaining, variable part. HTTP uses headers for prefixing, instructing the protocol implementation how to handle the message body. Length prefixing is the most important: the Content-Length header tells the protocol implementation how many bytes to read before it reaches the end of the message body.
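To make the two techniques concrete, here is a minimal sketch of a reader that uses the CRLF, colon, and empty-line delimiters for the header and the Content-Length prefix for the body. The class and method names are hypothetical. It ignores chunked encoding, header folding, and size limits, so treat it as an illustration rather than a usable HTTP parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class HttpFraming {

    /** One framed message: start line, headers (lower-cased names), and raw body bytes. */
    static final class Message {
        final String startLine;
        final Map<String, String> headers;
        final byte[] body;

        Message(String startLine, Map<String, String> headers, byte[] body) {
            this.startLine = startLine;
            this.headers = headers;
            this.body = body;
        }
    }

    /** Delimiter framing: reads bytes up to the next CRLF and returns them as ASCII text. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int previous = -1;
        int current;
        while ((current = in.read()) != -1) {
            if (previous == '\r' && current == '\n') {
                byte[] bytes = line.toByteArray();
                // Drop the '\r' that was buffered before the '\n' arrived.
                return new String(bytes, 0, bytes.length - 1, StandardCharsets.US_ASCII);
            }
            line.write(current);
            previous = current;
        }
        throw new EOFException("stream ended before CRLF");
    }

    /** Reads exactly one message and leaves the stream positioned at the next one. */
    static Message readMessage(InputStream in) throws IOException {
        String startLine = readLine(in);
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while (!(line = readLine(in)).isEmpty()) {      // the empty line ends the header
            int colon = line.indexOf(':');              // the colon separates name from value
            if (colon < 0) {
                throw new IOException("malformed header line: " + line);
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(name, line.substring(colon + 1).trim());
        }
        // Length prefixing: the header says how many body bytes follow.
        int length = Integer.parseInt(headers.getOrDefault("content-length", "0"));
        byte[] body = new byte[length];
        new DataInputStream(in).readFully(body);
        return new Message(startLine, headers, body);
    }
}
```

Calling readMessage twice on a stream that contains two messages back to back returns them separately, even though the stream itself carries no message boundaries.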

The message framing details are clearly visible when you look at REST messages (Figure 1) and are also partially exposed in code that generates them (Listing 1).

Not a text-based protocol

That HTTP is a text-based protocol is a widespread misconception. Only the header section is sent as ASCII characters; the message body is sent as a sequence of bytes. This has the consequence that sending text in the message body is not quite as straightforward as you might expect. You need to ensure that both client and server use the same method when converting text to bytes and back.

It is much safer to be explicit about the character set, setting it and reading it in the Accept, Accept-Charset, and Content-Type headers, than to rely on defaults. Client libraries and server frameworks are partially to blame for the text-based protocol misconception because they attempt to convert text using a default character set. The Apache client library uses ISO-8859-1 by default as required by RFC 2616 section 3.7.1, but this obviously can cause problems if the server is sending JSON using UTF-8.
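The effect of a character set mismatch is easy to reproduce without any network code. The sketch below uses only the standard library; the class and method names are hypothetical. It encodes text with the sender's character set, decodes the bytes with the receiver's, and reads the charset parameter from a Content-Type value instead of assuming a default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {

    /** Encodes text the way the sender would and decodes the bytes the way the receiver would. */
    static String roundTrip(String text, Charset senderCharset, Charset receiverCharset) {
        byte[] wireBytes = text.getBytes(senderCharset);
        return new String(wireBytes, receiverCharset);
    }

    /** Extracts the charset parameter from a Content-Type value, or returns the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String[] pair = part.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("charset")) {
                // Charset.forName throws an unchecked exception for unknown names.
                return Charset.forName(pair[1].trim().replace("\"", ""));
            }
        }
        return fallback;
    }
}
```

With these helpers, `roundTrip("caf\u00e9", StandardCharsets.UTF_8, StandardCharsets.UTF_8)` returns the original text, while decoding the same UTF-8 bytes as ISO-8859-1 turns the single character é into the two characters Ã©. Reading the receiver's character set with `charsetOf("text/plain; charset=utf-8", StandardCharsets.ISO_8859_1)` avoids the mismatch.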

Listing 1: Sending a text message to a URL in Java using the Apache HTTP client library

    /**
     * Sends a text message to the given URL
     * @param message the text to send
     * @param url where to send it to
     * @return true if the message was sent, false otherwise
     **/
    public static boolean sendTextMessage(String message, String url) {
        boolean success = false;
        HttpClient httpClient = new DefaultHttpClient();
        try {
            HttpPost request = new HttpPost(url);

            BasicHttpEntity entity = new BasicHttpEntity();
            byte[] content = message.getBytes("utf-8");
            entity.setContent(new ByteArrayInputStream(content));
            entity.setContentLength(content.length);
            request.setEntity(entity);
            request.setHeader("Content-Type", "text/plain; charset=utf-8");

            HttpResponse response = httpClient.execute(request);

            StatusLine statusLine = response.getStatusLine();
            int statusCode = statusLine.getStatusCode();
            success = (statusCode == 200);
        } catch (IOException e) {
            success = false;
        } finally {
            httpClient.getConnectionManager().shutdown();
        }
        return success;
    }

Restrictions on headers

The use of delimiters for message framing limits what data can be safely sent in HTTP headers. You will find these limitations in RFC 2616, section 2.2 and section 4.2, but here is a short summary:

  • All data need to be represented as ASCII characters
  • The delimiter characters used for message framing cannot appear in header names or values
  • Header names are further limited to lowercase and uppercase letters and the dash character
  • There is also a maximum limit on the length of each header, typically 4 or 8 KB
  • It is a convention to start all custom header names not defined in RFC2616 with “X-”

You might occasionally encounter message framing errors because some client library implementations expose the headers without enforcing these rules. If an HTTP framework or intermediary detects a framing error, it discards the request and returns the “400 Bad Request” status code. What may be even worse, though, is that every so often a malformed message will get through, causing weird behavior or a “500 Internal Server Error” status code and some incomprehensible internal error message. To avoid such hard-to-trace errors, do not attempt to send in HTTP headers any data which:

  • comes from user input
  • is shown to the user
  • is persisted
  • can grow in size uncapped
  • you have no full control over (it is generated by third-party libraries or services)
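When a value does have to go into a header, a defensive check before sending can catch the framing problems described above. The sketch below follows the simplified summary in this section (letters and the dash in names, printable ASCII in values), which is stricter than the grammar in the RFC. The class name, method names, and the 4 KB cap are assumptions made for illustration.

```java
class HeaderCheck {

    /** Assumed cap; the text mentions typical limits of 4 or 8 KB per header. */
    static final int MAX_HEADER_LENGTH = 4096;

    /** Accepts only lowercase and uppercase ASCII letters and the dash. */
    static boolean isSafeHeaderName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (char c : name.toCharArray()) {
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!letter && c != '-') {
                return false;
            }
        }
        return true;
    }

    /** Accepts printable ASCII only, which excludes CR, LF, other controls, and non-ASCII. */
    static boolean isSafeHeaderValue(String value) {
        for (char c : value.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                return false;
            }
        }
        return true;
    }

    /** Checks name, value, and the length of the line "name: value" against the cap. */
    static boolean isSafeHeader(String name, String value) {
        return isSafeHeaderName(name)
                && isSafeHeaderValue(value)
                && name.length() + 2 + value.length() <= MAX_HEADER_LENGTH;
    }
}
```

A value such as `"line1\r\nSet-Cookie: x=1"` is rejected because the embedded CRLF would be read as the start of a new header line.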

Keeping protocol data separate from application data

Notice that I did not say don’t use headers at all. Many REST protocols choose not to use them, but this may not be the wisest protocol design decision. Headers and body serve distinct roles in protocol design and both are important.

The message header carries information needed by the protocol implementation itself. The headers tell the protocol what to do, but do not necessarily show what the application is doing. If you are sniffing the headers you are not likely to capture any business information collected and stored by an application.

The message body is where the application data is sent, but it has no or very little influence on how the protocol itself works. Protocols typically don’t interpret the data sent in message bodies and treat it as opaque streams of bytes.

Sending protocol data in the message body creates strong couplings between the various parts of the application, making further evolution difficult. Once I asked someone to return the URI of a newly created resource in the Content-Location header of a POST response, a common HTTP practice. “There is no need”, he said, “the URI is already available as a link in the message body”. This was true, of course, but the generic protocol logic in which I needed this URI was up till then completely independent of any resource representations. Forcing it to parse the URI out of the representations meant that it would likely break the next time the representations changed.
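The difference can be sketched in a few lines. The helper below is hypothetical and works on a plain map of response headers rather than on any particular client library. Its point is that the generic logic reads the URI without touching the message body.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

class CreatedResource {

    /**
     * Returns the URI of a newly created resource using protocol data only.
     * Header names are compared case-insensitively; the body is never inspected,
     * so a change in the representation format cannot break this code.
     */
    static Optional<URI> uriFromHeaders(Map<String, String> responseHeaders) {
        for (Map.Entry<String, String> header : responseHeaders.entrySet()) {
            if (header.getKey().equalsIgnoreCase("Content-Location")) {
                return Optional.of(URI.create(header.getValue().trim()));
            }
        }
        return Optional.empty();
    }
}
```

Whether the body is JSON today and XML tomorrow makes no difference to this method, which is the loose coupling the paragraph above argues for.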

Conclusion

I hope I managed to convince you that message framing in REST is not a mere implementation detail you can safely ignore.  Becoming familiar with how it works can help you avoid some common pitfalls and design more robust REST APIs. I discussed only basic HTTP message framing so far. In my next post I’ll talk about more advanced topics like chunking, compression, and multipart messages.


REST and the Art of Protocol Design

RESTful protocol design is a topic more appropriate for a book than a blog post. Despite the catchy title my goal is very modest. I would like to show you that even limited protocol design knowledge can help you understand the essence of REST and the reasoning behind many of its best practices.

I’ll start with a core insight from protocol design, which is that programs need to share pre-existing knowledge to communicate. Without knowing how to separate incoming messages (framing), convert bytes to useful internal representations (decoding), generate correct sequences of messages to accomplish tasks (state machines), and detect and respond to error conditions, programs only see meaningless streams of bytes. Roy Fielding, who coined the term REST, said: “Of course the client has prior knowledge. Every protocol, every media type definition, every URI scheme, and every link relationship type constitutes prior knowledge that the client must know (or learn) in order to make use of that knowledge.”

What method to use to establish the shared pre-existing knowledge is one of the most fundamental protocol design decisions.

The RESTful method is to minimize the need for new (application-specific) knowledge by reusing the existing knowledge of ubiquitous, mature, and stable web protocols. New RESTful application protocols are designed by constraining and further specifying another application protocol, the HTTP protocol. Every REST message is also a valid HTTP message and can be processed with existing HTTP client libraries, browsers, firewalls, web proxies, and server-side web application frameworks. Simple RESTful applications need very little new code at the protocol layer because so much of this code is reused.

It is important to point out that RESTful protocol design fully leverages HTTP as an application protocol, which is fundamentally different from what other HTTP-based protocols do. SOAP is conceived as a protocol layer above HTTP and uses it as a transport protocol. SOAP needs formal WSDL for protocol description and complex code generation tools in part because it does not reuse any of the application protocol features of HTTP. Other protocols like WebDAV extend HTTP with new methods and headers. While such protocol extensions were anticipated in the HTTP specification, many practical difficulties arise when generic HTTP libraries, intermediaries, or frameworks fail to understand the newly added features. WebDAV does not work well over the Internet and failed to see wide-scale adoption. This is why RESTful protocol design avoids the use of HTTP protocol extensions.

A frequently cited example of a carefully designed RESTful protocol is the Atom Publishing Protocol described in RFC 5023. You can tell that it leverages and constrains HTTP as an application protocol after seeing the HTTP specification explicitly referenced 18 times, in sentences like “Any server response or server content modification not explicitly forbidden by this specification or HTTP [RFC2616] is therefore allowed”.

Reusing existing web protocols has many advantages. First among them is a very low barrier to entry and implicit support for a large variety of platforms. Visibility, or the ability of third parties to see and interact with the protocol and provide useful services, is another. Web protocols are extremely stable and resilient, to the point that in 2011 InfoQ could run the fake news of the release of HTTP/1.2 as an April Fools’ Day joke. They evolve and combine well thanks to a clever design which fully decouples the concerns of addressing, operations, and representations. Web applications are the best illustration of just how beneficial loose coupling is: you can run all of them in the same generic client, a web browser, without adding or modifying a single line of code.

There is, however, an important difficulty. You cannot use existing protocols as pre-existent knowledge unless you use them precisely as they were meant to be used. Protocol visibility means that server-side frameworks, network intermediaries, and client libraries read and interpret HTTP methods, status codes, and headers and automatically take specific actions, like closing the TCP connection, caching content, or clearing caches. They stop working correctly if you do something unexpected. It is good to know that when people insist on the correct way of doing RESTful protocol design, what they talk about is more than just a personal opinion.

What else do I mean by using web technologies the way they were meant to be used? Ideally, I would like to see all RESTful protocols work with a single generic client like a web browser. Before you jump to any conclusions, let me state that compatibility with browsers is not an actual REST requirement. However, the further away you move from this ideal, the more application-specific knowledge you’ll need to hard-wire into your programs. And, as I said, minimizing application-specific knowledge is at the very core of RESTful protocol design philosophy. This is again good to know, because many common REST best practices directly derive from it.

A practical piece of advice immediately follows from the above discussion, namely that before taking the plunge into RESTful protocol design it is a good idea to study and understand the underlying web technologies. This is not an effortless undertaking. The HTTP 1.1 specification (RFC 2616) alone is 176 pages long, with enough subtleties to justify the publication of several popular books. URIs are not just strings, but also a set of rules, described in the 40-page RFC 2396. Another 43 pages in RFC 2046 specify the MIME media types used in HTTP requests and responses and we haven’t even talked yet about XML, for which an entire set of specifications is maintained by the W3C. Even the much simpler JSON representation has a formal specification, the 10-page RFC 4627.

I get two strong objections to the above advice.

A frequent objection is this: “Why should I spend time and effort learning these protocol implementation details when I’ve already got code which handles them?” As I’ve already explained in a previous blog post, when I work with existing libraries and frameworks, I work with leaky abstractions, and, as Joel Spolsky said, “…they save us time working, but they don’t save us time learning”. I often find that I need to know the implementation details to use these abstractions correctly.

Another objection is that my advice sounds as if it came from someone who completely ignores such fundamental REST principles as stateless interactions, the use of uniform interfaces, or hypertext as the engine of application state (HATEOAS). This is intentional on my part. Not that the principles aren’t important, but they are also powerful abstractions and I find them difficult to understand without concrete examples. If I put building on existing protocols at the very core of my design philosophy, it makes perfect sense to me to start by learning how these protocols work. Since they are built on the same principles, learning them also makes understanding the principles a lot easier. This approach worked well for me personally, but of course, your mileage may vary.

In my next post I’ll continue this topic and talk about various REST design philosophies, slightly different ways designers choose to leverage web protocols depending on which design trade-offs they consider more important.


API Design meets Protocol Design

In my previous post I showed that software interfaces used in distributed applications have to satisfy two conditions:

  1. They should be remotely accessible
  2. They should be suitable for use in high-latency environments which permit partial failures, have no shared state and no reliable synchronization mechanism

I also argued that technology alone – no matter which technology you choose – only helps you with the first requirement. The concepts, principles, and methods of protocol design can help you meet the second condition.

I’d understand if you were a bit surprised. Protocol design is indeed applicable because API and protocol are two alternative – but equally correct – ways of describing software interfaces. Wikipedia defines API as “… a particular set of rules and specifications that software programs can follow to communicate with each other” and protocol as “… a formal description of digital message formats and the rules for exchanging those messages in or between computing systems and in telecommunications.” We certainly tend to focus more on programming language constructs like methods with parameters when describing APIs and on processes exchanging messages when describing protocols, but ultimately these concepts describe, from a different point of view, how programs communicate with each other.

Consider the familiar real-time stock quote service as an example. We can describe it equally well by showing its API in a programming language (here, in Java):

Listing 1: Java client library interface

/**
* My sample real time stock quotes service
**/
public interface StockQuotes {

   /**
   * Returns a stock quote
   * @param ticker is the symbol of the stock
   * @return the current price of the stock
   * @throws RemoteException if there is a communication error
   **/
   public float getQuote(String ticker) throws RemoteException;
}

or by showing the messages (requests) it accepts and replies it provides (in our case, a simple RESTful protocol)

Listing 2: REST request and reply

GET /quote/?ticker=AAPL HTTP/1.1
Host: StockQuotes.com
Accept: application/json

returns

HTTP/1.1 200 OK
Date: Fri, 17 Feb 2012 16:59:59 GMT
Content-Type: application/json
Content-Length: 32

{"ticker":"AAPL","price":512.68}
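To connect the two descriptions, here is a rough sketch of how a client library could turn a getQuote(ticker) call into the request of Listing 2, using the java.net.http API available since Java 11. The class and method names are hypothetical, and the http scheme is an assumption. Sending the request and parsing the reply are left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class QuoteRequests {

    /** Builds (but does not send) the GET request for the given ticker symbol. */
    static HttpRequest quoteRequest(String ticker) {
        // Encode the ticker so that unusual symbols cannot break the query string.
        String query = URLEncoder.encode(ticker, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
                .uri(URI.create("http://StockQuotes.com/quote/?ticker=" + query))
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The Host header of Listing 2 is not set explicitly here because the HTTP client derives it from the URI when the request is sent.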

The API description feels more natural when working via a client library while the description of the HTTP messages is useful for accessing the service from exotic environments (say Adobe Flash or SAP’s ABAP) for which no client library is provided. The same functionality could be also described as a SOAP Web Service with the following WSDL:

Listing 3: WSDL description

<?xml version="1.0"?>
<definitions name="StockQuoteServiceDefinitions">
   <!-- Namespace declarations omitted for brevity -->
   <message name="GetQuoteRequest">
      <part name="ticker" type="xsd:string"/>
   </message>
   <message name="GetQuoteResponse">
      <part name="price" type="xsd:float"/>
   </message>
   <portType name="StockQuotePortType">
      <operation name="getQuote">
         <input message="tns:GetQuoteRequest"/>
         <output message="tns:GetQuoteResponse"/>
      </operation>
   </portType>
   <binding name="DefaultBinding" type="tns:StockQuotePortType">
      <!-- Details of document literal binding style omitted for brevity -->
   </binding>
   <service name="StockQuoteService">
      <documentation>My sample real time stock quote service</documentation>
      <port name="StockQuotePort" binding="tns:DefaultBinding">
         <soap:address location="http://www.StockQuotes.com/quote/"/>
      </port>
   </service>
</definitions>

Frameworks like Microsoft’s WCF or Java’s JAX-WS can automatically generate a client library (called a binding) from this description. The generated code looks just like a regular API to the caller, but implements its functionality by making XML (SOAP) requests over HTTP to a server. It would be pointless to debate whether this WSDL describes an API or a protocol. To support automatically generated client bindings it has to describe both.

You might wonder: if API and protocol are equally appropriate for describing software interfaces, why is it that expressions like “Web Services API”, “SOAP API”, “RESTful API”, “Web API”, “Hypermedia API”, or  “Mobile API” are commonplace while few people ever talk about protocols? I don’t claim to know the answer. I can only tell you why I also frequently use the word API when I actually mean protocol.

I’ve noticed that the term protocol is very closely associated with ubiquitous low-level transport protocols like TCP/IP and I’m often concerned that it is not immediately obvious that I’m talking about high-level application protocols like SMTP, HTTP, Atom or LDAP. Since the latter are used just like APIs to build client applications for accessing email, browsing the web, reading news, or looking up someone’s phone number in a corporate directory, I simply call them APIs. It is technically correct and unlikely to be misunderstood. It is certainly more convenient than spelling out “application protocol” every time.

I also sometimes worry that developers may think I want them to write a ton of code, parse complex messages, and handle complex interaction patterns. If they had a college course on protocols, they were probably taught about message framing, byte stuffing, and many other low-level details. This is not what I typically mean. Protocols are rarely built from scratch. They are layered on top of existing protocols, extend existing protocols, or constrain (further specify) existing protocols. Plus we have all the various tools. I use the word API in casual conversations to convey a subtle “this is not as hard as you might think” message.

Just to be clear, I’m not saying that calling your remote interface an API is a mistake. It is not. My point is that thinking of it as an application protocol is helpful, especially when you are designing it. It is helpful because it forces you to take the differences between local and remote communication seriously.

When I think in terms of requests and replies instead of function calls, it is hard for me to ignore latency or the possibility of partial failures. When I think in terms of data sent as messages instead of objects, I have no expectations of shared state. When I think in terms of independent processes, I know that there are no reliable methods of synchronizing them. When I think in a language-neutral way, it is a lot less likely that I assume language features which may not be available to some clients or may not work remotely.

The protocol viewpoint also helps me maintain realistic – some may even say skeptical – expectations about tools and technologies I’m using. I rely on them to take care of low-level protocol details and automatically generate repetitive, boilerplate code. I accept, however, that pretty much everything else is my responsibility.

In my next post I’ll talk about how I apply protocol design concepts, principles, and methods to the concrete case of RESTful design and how this helps me understand – and choose from – the many contradictory pieces of advice that are out there.

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Canada License.

Challenges of remote interface design

If you listened to some middleware vendors, you’d believe that distributed applications can be easily built by making local APIs remotely accessible. Indeed, as this 10 minute video tutorial illustrates, you can do the latter by simply adding a few annotations to your code. Vendors have products to sell and their focus on promoting ease of use is not entirely surprising.

The problem is that this approach does not work. This was convincingly demonstrated in classic papers like “A Critique of the Remote Procedure Call Paradigm” by Professors Andrew S. Tanenbaum and Robert van Renesse (1988) and, more recently, by Jim Waldo and his colleagues at Sun Microsystems Laboratories in “A Note on Distributed Computing” (1994).

A simple example will illustrate why. Consider a local FIFO task queue used in desktop applications to maintain the responsiveness of the user interface while processing computation-intensive tasks (Listing 1). The user interface thread creates and places tasks in the queue and a background thread processes them asynchronously (Listing 2).

Listing 1:

/**
 * A FIFO queue of tasks to be executed by a background thread
 * Note: This queue implementation is not thread safe
 */
public class TaskQueue {

    /**
     * @return true if the queue is empty, false otherwise
     */
    public boolean isEmpty() {...}

    /**
     * Places a task into the queue
     * @param task the task to execute
     */
    public void putTask(Task task) {...}

    /**
     * Retrieves the next task from the queue
     * @return a task to execute
     */
    public Task getTask() {...}
}

Listing 2:

/**
 * Simple client showing the use of the task queue
 */
public class Client {

    final TaskQueue queue = new TaskQueue();

    /**
     * Called when the user chooses to print from the GUI
     */
    public void onPrint() {
        Task printTask = new PrintTask();
        synchronized (queue) {
            queue.putTask(printTask);
            queue.notifyAll();
        }
    }

    /**
     * This method runs in its own thread
     * @throws InterruptedException signals the thread to exit
     */
    public void processTasks() throws InterruptedException {
        Task task;
        while (true) {
            synchronized (queue) {
                while (queue.isEmpty()) {
                    queue.wait();
                }
                task = queue.getTask();
            }
            task.run();
        }
    }
}

Let’s say that we want to move the processing of tasks to a different computer. As platform vendors would quickly point out, making the queue interface (Listing 1) remotely accessible is easily solved with a wide variety of technologies. The difficult problems are latency, possibility of partial failures, lack of shared memory, and synchronization.

Network latency degrades the performance of some API calls by orders of magnitude, rendering them practically unusable. The local queue works well because the overhead of a call to the local putTask() method is both small and predictable. If the queue is on a different computer, a remote call is neither quick nor predictable. Transient network events like packet loss or congestion may block a call for a significant time. Because it no longer prevents the user interface from freezing up, moving the queue to a server renders it useless.
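
To make the latency problem concrete, here is a minimal sketch of what the client typically ends up doing once the queue is remote: it hands the slow call to a local background thread so that the user interface thread returns immediately. The names RemoteQueue and NonBlockingSubmitter are hypothetical and exist only for this illustration. Note that the caller now receives a future instead of a plain return, so the API has already changed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stub for a queue on another computer; putTask may block for an unpredictable time. */
interface RemoteQueue {
    void putTask(String task) throws Exception;
}

/** Keeps the calling thread responsive by moving the slow remote call to a background thread. */
class NonBlockingSubmitter {
    private final RemoteQueue remote;

    // A single daemon thread sends tasks one at a time, preserving FIFO order.
    private final ExecutorService sender = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "remote-sender");
        t.setDaemon(true);
        return t;
    });

    NonBlockingSubmitter(RemoteQueue remote) {
        this.remote = remote;
    }

    /** Returns immediately; the future completes with true on success and false on failure. */
    CompletableFuture<Boolean> submit(String task) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                remote.putTask(task);
                return true;
            } catch (Exception e) {
                return false; // the caller decides whether to retry or to tell the user
            }
        }, sender);
    }
}
```

The sketch does not make the remote call any faster. It only moves the waiting somewhere the user does not notice it, and it forces the caller to decide what a failed submission means.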

Partial failures lead to non-deterministic behavior and potential data loss. Assume that we kept the queue on the client to eliminate the latency issue discussed above. What happens if the network fails while the server calls the remote getTask() method? If the failure happens before the call is processed by the queue, the server can safely retry it. However, if the failure occurs after the task was removed from the queue, but before it was delivered to the server, the task is lost, and a retry will get the next task from the queue. Since we can no longer use the queue to reliably submit tasks for processing, keeping the queue on the client does not work either.
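
One common way around the lost-task problem, sketched below under my own assumptions, is to split retrieval into two steps that are each safe to retry: read the head of the queue without removing it, then acknowledge it by ID once it has arrived. AckQueue, peekTask() and ackTask() are hypothetical names, not part of Listing 1, and the sketch assumes a single consumer.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical queue whose retrieval is split into two retry-safe steps. */
class AckQueue {

    /** A task together with the ID the queue assigned to it. */
    static final class Entry {
        final long id;
        final String task;

        Entry(long id, String task) {
            this.id = id;
            this.task = task;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private long nextId = 1;

    /** Appends a task and returns the ID assigned to it. */
    synchronized long putTask(String task) {
        entries.addLast(new Entry(nextId, task));
        return nextId++;
    }

    /** Returns the head without removing it, or null if the queue is empty. Safe to retry. */
    synchronized Entry peekTask() {
        return entries.peekFirst();
    }

    /** Removes the head only if it has the given ID. A repeated acknowledgement does nothing. */
    synchronized boolean ackTask(long id) {
        Entry head = entries.peekFirst();
        if (head != null && head.id == id) {
            entries.removeFirst();
            return true;
        }
        return false;
    }
}
```

If the reply to peekTask() is lost, the server simply asks again and gets the same task. If the acknowledgement is lost, the task may be delivered twice, so tasks must tolerate being executed more than once. Either way the interface is no longer the one in Listing 1.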

Lack of shared memory means that object references, pointers, or global state cannot be handled transparently. When a task is transferred to a different computer, the question arises what to do with the other in-memory objects it references. While sometimes it makes sense to move the referenced objects with the task, just as often we need to look up the corresponding objects on the server. No technology can automatically handle all situations correctly, meaning that unless we change our queue and task implementations, some tasks will fail to execute on a remote server.
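
The "look up the corresponding objects on the server" option can be sketched as follows. Instead of holding a reference to an in-memory document, the task names the document by an identifier and the server resolves that identifier against its own store. PrintRequest, DocumentStore and PrintServer are hypothetical names that I made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical task that names its document by ID instead of holding an object reference. */
class PrintRequest {
    final String documentId;

    PrintRequest(String documentId) {
        this.documentId = documentId;
    }
}

/** Stands in for whatever storage the server uses for documents. */
class DocumentStore {
    private final Map<String, String> documents = new HashMap<>();

    void save(String id, String content) {
        documents.put(id, content);
    }

    String load(String id) {
        String content = documents.get(id);
        if (content == null) {
            throw new IllegalArgumentException("Unknown document: " + id);
        }
        return content;
    }
}

/** Resolves the ID on the server side; nothing is shared with the client's memory. */
class PrintServer {
    private final DocumentStore store;

    PrintServer(DocumentStore store) {
        this.store = store;
    }

    String handle(PrintRequest request) {
        return "printing: " + store.load(request.documentId);
    }
}
```

The lookup can fail in a way a local reference never could, which is one more behavior the task implementation now has to account for.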

The absence of reliable remote synchronization primitives makes even relatively simple local behaviors difficult to replicate in distributed environments. The two local threads use the built-in Java synchronization primitives to communicate with each other (Listing 2). These synchronization methods do not work remotely. Server code written as shown won’t work. We need to rewrite it, perhaps polling the queue at regular intervals. But now the calls to the queue are no longer synchronized, and unless we make the queue implementation fully re-entrant, fatal data corruption will likely occur.
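
A polling rewrite of processTasks() might look like the sketch below. It replaces wait() and notifyAll() with a fixed-interval poll. PollingWorker is a hypothetical name, and the task source is modelled as a plain Supplier that returns null when there is nothing to do; in a real system that supplier would wrap a remote call and would have to do its own locking.

```java
import java.util.function.Supplier;

/** Polls a task source at a fixed interval instead of relying on wait()/notifyAll(). */
class PollingWorker {
    private final Supplier<Runnable> source; // returns the next task, or null if none is available
    private final long intervalMillis;

    PollingWorker(Supplier<Runnable> source, long intervalMillis) {
        this.source = source;
        this.intervalMillis = intervalMillis;
    }

    /** Polls at most maxPolls times and returns the number of tasks executed. */
    int run(int maxPolls) {
        int executed = 0;
        for (int i = 0; i < maxPolls; i++) {
            Runnable task = source.get();
            if (task != null) {
                task.run();
                executed++;
            } else {
                try {
                    Thread.sleep(intervalMillis); // nothing to do yet; try again later
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop
                    return executed;
                }
            }
        }
        return executed;
    }
}
```

Polling trades responsiveness for load: a short interval hammers the queue, a long one delays every task. The local version in Listing 2 never had to make that choice.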

So what does all of this boil down to in the end? It shows that all technologies used to make local APIs remotely accessible leak. By leaking I mean that they change the API behavior. And two APIs with different behaviors are not the same API, even if they look the same. Joel Spolsky calls this the Law of Leaky Abstractions, explaining that “one reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to. Code generation tools which pretend to abstract out something, like all abstractions, leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.”

The “don’t save us time learning” part is the reason why I don’t like calling the design of remote software interfaces “API design”. While technically correct, this choice of words encourages a false and somewhat dangerous sense of familiarity and comfort. I’ve found a better concept, one which does not sweep under the rug the important issues of latency, partial failures, lack of shared memory and synchronization. I’ll talk about this concept in the next installment.

 


1.1.1. Favor placing API and implementation into separate packages

Note: I’ve picked this Java API Design Checklist item as an illustrative example, not because it is of particular importance. I’ve added similar details to other checklist items as well and will add more in the future. To see such details, open the list (also available from the main menu at the top of the page) and click on [explain] where available.

Rationale:

Simplicity, Consistency, Safety and Evolution. Java only supports public and package scoped classes. You should obviously never mix public implementation classes with APIs (see checklist item). You can only place package scoped classes into API packages if you are certain they will never be needed in any other (implementation) package. Otherwise developers may inadvertently change their access to public, breaking the encapsulation of the API.

More importantly, Java module systems like OSGi use package boundaries for additional class loader isolation, dependency management and versioning. You won’t be able to take advantage of it if you combine API and implementation into one package.

Do this:

package com.company.product;

public class ApiClass {
   private ImplementationClass m;
}

package com.company.product.internal;

public class ImplementationClass {...}

Don’t do this:

package com.company.product;

public class ApiClass {

   private ImplementationClass m;
}

class ImplementationClass {...} //package scoped

Exceptions:

Very rarely, a small number of package scoped classes are useful when a separate implementation package adds no benefits, only complexity.

 


How to use the Java API design checklist

This before-and-after example illustrates how to use the Java API design checklist. We borrowed the original API from the free online eBook “OSGi in Practice” by Neil Bartlett. The author uses the API throughout the book to illustrate various OSGi features and considers it a “reasonable attempt” at defining an API.

Here is the description of the API straight from the book:

“Thanks to the internet, we are today bombarded by messages of many kinds. Email is an obvious example, but also there are the blogs that we subscribe to through RSS or ATOM feeds; SMS text messages; “microblogging” sites such as Twitter or Jaiku; IM systems such as AOL, MSN or IRC; and perhaps for professionals in certain fields, Reuters or Bloomberg newswire feeds and market updates. Sadly we still need to flip between several different applications to view and respond to these messages. Also there is no coherent way to apply rules and automated processing to all of our inbound messages. For example, I would like a way to apply my spam filters, which do a reasonable job for my email, to my SMS text messages, as spam is increasingly a problem in that medium.”

“To support multiple kinds of message sources, we need to build an abstraction over messages and mailboxes, so a good place to start is to think about what those abstractions should look like. In Java we would represent them as interfaces. Listing 1 contains a reasonable attempt to define a message in the most general way.”

“Objects implementing this interface are really just message headers. The body of the message could be of any type: text, image, video, etc. We need the header object to tell us what type the body data is, and how to access it.”

“Next we need a way to retrieve messages. The interface for a mailbox could look like Listing 2. We need a unique identifier to refer to each message, so we assume that an ID of type long can be generated or assigned by the mailbox implementation. We also assume that the mailbox maintains some temporal ordering of messages, and is capable of telling us about all the new messages available given the ID of the most recent message known about by the reader tool. In this way the tool can notify us only of new messages, rather than ones that we have already read.”

“Many back-end message sources, such as the IMAP protocol for retrieving email, support storing the read/unread state of a message on the server, allowing that state to be synchronized across multiple clients. So, our reader tool needs to notify the mailbox when a message has been read. In other protocols where the back-end message source does not support the read/unread status, this notification can be simply ignored.”

“Finally we need the code for the exceptions that might be thrown. These are shown in Listing 3 and Listing 4.”

Design review

The API design shown below is fairly typical for a first draft. All major functional requirements are met, but there are some remaining design issues. We will do a design review using the Java API design checklist to remember overlooked design requirements, spot mistakes, identify less-than-optimal design choices and opportunities for improvements. We marked the identified issues with “//see …” comments in the listings below. The hyperlinks point to the relevant checklist items in the list.

Listing 1: The original Message (see after)

1   package org.osgi.book.reader.api;  //see 1.2.7, 1.3.1

3   import java.io.InputStream;

5   public interface Message{ //see 2.1.8, 2.2.18, 2.4.1, 2.7.1

7   /**
8   * @return The unique (within this message’s mailbox) message ID. //see 3.9.3
9   */
10  public long getId();

12  /**
13  * @return A human-readable text summary of the message. In some
14  * messaging systems this would map to the "subject" field. //see 3.9.3
15  */
16  public String getSummary();

18  /**
19  * @return The Internet MIME type of the message content. //see 3.9.3
20  */
21  public String getMIMEType(); //see 2.2.3

23  /**
24  * Access the content of the message.
25  *
26  * @throws MessageReaderException
27  */
28  public InputStream getContent() throws MessageReaderException; //see 2.1.3, 3.2.3, 3.4.2

30  }

Listing 2: The original Mailbox (see after)

1   package org.osgi.book.reader.api;

3   public interface Mailbox{ //see 2.1.8, 2.4.1, 2.7.1

5   public static final String NAME_PROPERTY = "mailboxName";

7   /**
8   * Retrieve all messages available in the mailbox.
9   *
10  * @return An array of message IDs.
11  * @throws MailboxException
12  */
13  public long[] getAllMessages() throws MailboxException; //see 3.1.3, 3.2.3, 3.3.8, 3.4.2

15  /**
16  * Retrieve all messages received after the specified message.
17  *
18  * @param id The message ID.
19  * @return An array of message IDs.
20  * @throws MailboxException
21  */
22  public long[] getMessagesSince(long id) throws MailboxException; //see 3.1.3, 3.2.3, 3.3.1, 3.3.8, 3.4.2

24  /**
25  * Mark the specified messages as read/unread on the back-end
26  * message source, where supported, e.g. IMAP supports this
27  * feature.
28  *
29  * @param read Whether the specified messages have been read.
30  * @param ids An array of message IDs.
31  * @throws MailboxException
32  */
33  public void markRead(boolean read, long[] ids) throws MailboxException; //see 3.1.14, 3.3.25

35  /**
36  * Retrieve the specified messages.
37  *
38  * @param ids The IDs of the messages to be retrieved.
39  * @return An array of Messages.
40  * @throws MailboxException
41  */
42  public Message[] getMessages(long[] ids) throws MailboxException; //see 3.1.3, 3.2.3, 3.3.1, 3.3.8, 3.4.2

44  }

Listing 3: The original MessageReaderException (see after)

1   package org.osgi.book.reader.api;

3   public class MessageReaderException extends Exception{ //see 2.1.3, 2.6.2

5   private static final long serialVersionUID = 1L; //see 2.3.2

7   public MessageReaderException(String message) {
8      super(message);
9   }

11  public MessageReaderException(Throwable cause){
12     super(cause);
13  }

15  public MessageReaderException(String message,Throwable cause){
16     super(message,cause);
17  }

19  }

Listing 4: The original MailboxException (see after)

1   package org.osgi.book.reader.api;

3   public class MailboxException extends Exception{ //see 3.4.2, 2.7.1

5   private static final long serialVersionUID = 1L; //see 2.3.2

7   public MailboxException(String message) {
8      super(message);
9   }

11  public MailboxException(Throwable cause){
12     super(cause);
13  }

15  public MailboxException(String message,Throwable cause){
16     super(message,cause);
17  }

19  }

Redesign

Our design review highlighted several omissions, issues, and inconsistencies. Listings 5, 6, 7, 8 and 9 show the redesigned API. During the redesign we had many tradeoffs to consider. In addition to the checklist items identified during the design review we considered several others. We show these additional checklist items within square brackets inside comments and using the //also … comments in code. The hyperlinks point to the relevant checklist items in the list.

Listing 5: package overview (package-info.java) for the redesigned API

1   /**
2   * <p>
3   * Provides simple and generic read-only access to messages from a variety
4   * of different sources like email, RSS and Atompub feeds, instant messaging
5   * services, Facebook, SMS and Twitter.</p> [1.3.3]
6   * <p>
7   * All concrete classes implement either {@link org.osgi.book.reader.MessageHeader}
8   * or {@link org.osgi.book.reader.Mailbox}. Together, these two abstract classes define the
9   * generic interface used in the package. All concrete classes are protocol-specific
10  * implementations and extensions like {@link org.osgi.book.reader.EmailHeader} or
11  * {@link org.osgi.book.reader.ImapMailbox}</p> [1.3.5]
12  * <p>
13  * Classes from this package are not intended for direct instantiation.
14  * They have no public constructors. Instead, pre-configured instances of mailboxes
15  * must be retrieved by their name through a JNDI lookup from the naming context
16  * "com/env/mailboxes"</p>
17  * <p>
18  * Classes from this package are not intended for extension. All concrete
19  * classes are final and abstract classes have no public or protected constructors.</p>
20  * <p>
21  * The code sample below shows how to read and print out messages
22  * from all configured mailboxes: </p> [1.3.6]
23  *
24  * <pre>
25  *     import org.osgi.book.reader.*;
26  *     import javax.naming.*;
27  *     import java.rmi.RemoteException;
28  *
29  *     try {
30  *         Context initialContext = new InitialContext();
31  *         NamingEnumeration mailboxNames = initialContext.list("com/env/mailboxes");
32  *         while(mailboxNames.hasMore())
33  *         {
34  *             NameClassPair pair = (NameClassPair) mailboxNames.next();
35  *             Mailbox<?> mailbox = (Mailbox<?>) initialContext.lookup(pair.getName());
36  *             for(MessageHeader h : mailbox.readAllMessageHeaders()) {
37  *                  System.out.println(h);
38  *             }
39  *         }
40  *     } catch (NamingException e) {
41  *         e.printStackTrace();
42  *     } catch (RemoteException e) {
43  *         e.printStackTrace();
44  *     }
45  * </pre>
46  *
47  * @version 2.0 [1.3.10]
48  *
49  * <br/>
50  * <p>This sample API is an adaptation of the original published in
51  * <a href="http://njbartlett.name/osgibook.html">"OSGi in Practice"</a> by Neil Bartlett
52  *
53  * <br/>
54  * <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">
55  * <img alt="Creative Commons License" style="border-width:0"
56  * src="http://i.creativecommons.org/l/by-sa/3.0/88x31.png" /></a>
57  *
58  * <br/>
59  * Sample API by Neil Bartlett and Ferenc Mihaly is licensed under a
60  * <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">
61  * Creative Commons Attribution-ShareAlike 3.0 Unported License</a>. [1.3.12]
62  */
63
64  package org.osgi.book.reader;

Listing 6: Message(Header) after redesign (see before)

1   package org.osgi.book.reader;  

3   import java.io.InputStream;
3a  import java.io.Serializable;
3b  import java.rmi.RemoteException;

4a  /**
4b  * Contains descriptive (structured) information about a message and
4c  * defines methods for retrieving the (unstructured) message body. [2.7.3]
4d  * Read-only (immutable) instances of this abstract type are
4e  * read from a Mailbox implementation. [2.7.5]
4f  * Concrete derived types may offer additional information or functionality, like {@link EmailHeader}. [2.3.4]
4g  * The natural ordering is the order of reception.
4h  * For a code sample, see {@link Mailbox}. [2.7.6, 2.7.9]
4i  */
5   public abstract class MessageHeader implements Comparable, Serializable { //also 2.2.4, 2.2.8, 2.2.10, 2.3.11, 2.3.12, 2.3.17

6a  MessageHeader() {} //also 2.3.9, 2.3.21

6b  @Override
6c  public boolean equals(Object obj) {...}
6d  @Override
6e  public int hashCode() {...}
6f  @Override
6g  public String toString() {...}
6h  @Override
6i  public int compareTo(Object o) {...} //also 2.3.10, 2.3.11

7   /**
7a  * Returns the unique (within this message’s mailbox) message ID.
8   * @return The message ID.
9   */
10  public long getId() {...}

12  /**
12a * Returns a human-readable text summary of the message.
13  * @return A human-readable text summary of the message. In some
14  * messaging systems this would map to the "subject" field. Not null.
15  */
16  public String getSummary() {...}

18  /**
18a * Returns the Internet MIME type of the message content.
19  * @return The Internet MIME type of the message content. Not null.
20  */
21  public String getMimeType() {...}

23  /**
24  * Access the content of the message from the remote mailbox. [item 3.9.14]
25  *
25a * @return An input stream for reading the message content. Not null.
26  * @throws RemoteException In case of a communication error with a remote mailbox
26a * @throws MailboxException Unrecoverable internal mailbox error
27  */
28  public abstract InputStream streamContent() throws RemoteException, MailboxException;

30  }

Listing 7: Mailbox after redesign (see before)

1   package org.osgi.book.reader;
1a
1b  import java.rmi.RemoteException;
1c  import java.util.Collection;
1d  import java.util.SortedSet;

2a  /**
2b  * Represents a generic and abstract mailbox interface for accessing incoming messages. [2.7.3]
2c  * Several implementations based on various messaging protocols (POP3, IMAP, RSS, etc.) are available.
2d  * Configured instances of mailboxes are retrieved by performing a JNDI lookup. [2.7.5]
2e  * The code sample below prints out the summary of all messages from the default mailbox. [2.7.6]
2f  * <pre>
2g  *     Context context = new InitialContext();
2h  *     Mailbox<?> mb = (Mailbox<?>) context.lookup("com/env/mailbox/default");
2i  *     for(MessageHeader h : mb.readAllMessageHeaders()) {
2j  *         System.out.println(h); //uses toString()
2k  *     }
2l  * </pre>
2m  */
3   public abstract class Mailbox<T extends MessageHeader> implements Comparable {//also 2.1.10, 2.3.11

5   public static final String NAME_PROPERTY = "mailboxName";

6a  Mailbox() {...} //also 2.3.9, 2.3.21

6b  @Override
6c  public boolean equals(Object obj) {...}
6d  @Override
6e  public int hashCode() {...}
6f  @Override
6g  public String toString() {...}
6h  @Override
6i  public int compareTo(Object o) {...} //also 2.3.10, 2.3.11

7   /**
8   * Retrieve all messages available in the mailbox.
9   *
10  * @return The ordered set of all available message headers. Not null.
10a * @throws RemoteException In case of a communication error with a remote mailbox
11  * @throws MailboxException Unrecoverable internal mailbox error
12  */
13  public abstract SortedSet<T> readAllMessageHeaders()
13a                              throws RemoteException, MailboxException; //also 3.1.3, 3.1.9, 3.3.9

15  /**
16  * Retrieve all messages received after the specified message.
17  *
18  * @param last The last read message header; not null.
19  * @return The ordered set of message headers since the last read message header. Not null.
19a * @throws NullPointerException If parameter is null
19b * @throws RemoteException In case of a communication error with a remote mailbox
20  * @throws MailboxException Unrecoverable internal mailbox error
21  */
22  public abstract SortedSet<T> readMessageHeadersSince(T last)
22a     throws NullPointerException, RemoteException, MailboxException; //also 3.1.3, 3.1.9, 3.3.9, 3.4.6, 3.4.12

24  /**
25  * Mark the specified messages as read on the back-end
26  * message source, where supported, e.g. IMAP supports this
27  * feature.
28  *
29  * @param headers The list of message headers to be marked; not null.
29a * @throws NullPointerException If parameter is null
29b * @throws RemoteException In case of a communication error with a remote mailbox
31  * @throws MailboxException Unrecoverable internal mailbox error
32  */
33  public abstract void markRead(Collection<T> headers)
33a     throws NullPointerException, RemoteException, MailboxException; //also 3.1.3, 3.1.9, 3.1.10, 3.3.9, 3.4.6, 3.4.12
33b public abstract void markRead(T header)
33c     throws NullPointerException, RemoteException, MailboxException; //also 3.1.3

34a /**
34b * Mark the specified messages as unread on the back-end
34c * message source, where supported, e.g. IMAP supports this
34d * feature.
34e *
34f * @param headers The list of message headers to be marked; not null.
34g * @throws NullPointerException If parameter is null
34h * @throws RemoteException In case of a communication error with a remote mailbox
34i * @throws MailboxException Unrecoverable internal mailbox error
34j */
34k public abstract void markUnread(Collection<T> headers)
34l     throws NullPointerException, RemoteException, MailboxException; //also 3.1.3, 3.1.9, 3.3.9, 3.1.10, 3.4.6, 3.4.12
34m public abstract void markUnread(T header)
34n     throws NullPointerException, RemoteException, MailboxException; //also 3.1.3

44  }

Listing 8: MessageReaderException after redesign (see before)

(no longer needed)

1   package org.osgi.book.reader;

3   public class MessageReaderException extends Exception{

5   private static final long serialVersionUID = 1L;

7   public MessageReaderException(String message) {
8      super(message);
9   }

11  public MessageReaderException(Throwable cause){
12     super(cause);
13  }

15  public MessageReaderException(String message,Throwable cause){
16     super(message,cause);
17  }

19  }

Listing 9: MailboxException after redesign (see before)

1   package org.osgi.book.reader;

2a  /**
2b  * Signals an unrecoverable internal mailbox error.
2c  */
3   public class MailboxException extends RuntimeException{

5    

7   public MailboxException(String message) {
8      super(message);
9   }

11  public MailboxException(Throwable cause){
12    super(cause);
13  }

15  public MailboxException(String message,Throwable cause){
16     super(message,cause);
17  }

18a /* IMPLEMENTATION STUFF */
18b private static final long serialVersionUID = 1L;
18c
19  }

Discussion

Is the redesigned version better than the original? This is a tricky question, a bit like asking which car is better or which city is more pleasant to live in. Your answer will depend on what aspects you consider important. There is no doubt that we managed to improve many aspects of the API. The new version is noticeably more consistent, safer to use, and better documented. Listing 10 shows the code we wrote to aggregate messages from several mailboxes before and after the redesign. Regardless of your stance on the arrays versus collections debate (some developers dislike generic Java collections, finding them too verbose), you’ll probably agree that the second version is safer to use.

Listing 10: Code for aggregating messages before and after the redesign

1  /**
2  * Before
3  */
4  public static Message[]
5  getAllMessages(Mailbox[] mailboxes) throws MailboxException {
6  	Message[] result = new Message[0];
7  	for (Mailbox mailbox : mailboxes) {
8  		Message[] messages = mailbox.getMessages(mailbox.getAllMessages());
9  		result = Arrays.copyOf(result, result.length + messages.length);
10 		System.arraycopy(messages, 0, result, result.length - messages.length, messages.length);
11 	}
12 	return result;
13 }

1  /**
2  * After
3  */
4  public static SortedSet<MessageHeader>
5  readAllMessageHeaders(Set<Mailbox<?>> mailboxes) throws RemoteException {
6  	SortedSet<MessageHeader> result = new TreeSet<MessageHeader>();
7  	for(Mailbox<?> mailbox : mailboxes) {
8  		result.addAll(mailbox.readAllMessageHeaders());
9  	}
10 	return result;
11 }

This being said, some of our design choices are not the best. For example, we chose JNDI lookup over public constructors to illustrate the importance of proper documentation, not because we don’t recommend constructors. We used generic abstract classes to illustrate how to improve API safety with strong compile time type checks, intentionally disregarding that some developers may be uncomfortable with Java generics.

This brings us to an important point: the API design checklist is just one of the tools we use for API design and checklist items should not be applied mechanically, without thinking. This is especially true for checklist items introduced with the words Favor, Consider, and Avoid. Considering the perspective of the caller and focusing on improving the main use cases should help decide what design trade-offs to make.
