Making it safe

Being safe means avoiding the risk of pain, injury, or material loss. A safety feature is a design element added to prevent inadvertent misuse of dangerous equipment. For example, one pin of the North American electric plug is intentionally wider to prevent incorrect insertion into a socket. But it was Toyota who first generalized the principle of poka-yoke (“mistake avoidance”), making it an essential part of its world-renowned manufacturing process. When similar principles of preventing, avoiding, or correcting human errors are applied to API design, the number of software defects is reduced and programmer productivity improves. Rico Mariani calls this the “pit of success”:

The Pit of Success: in stark contrast to a summit, a peak, or a journey across a desert to find victory through many trials and surprises, we want our customers to simply fall into winning practices by using our platform and frameworks. To the extent that we make it easy to get into trouble we fail.

Preventing unsafe use

Engineers place all dangerous equipment and materials – high voltage, extreme temperature, or poisonous chemicals – safely behind locked doors or inside sealed casings. Programming languages offer controlled access to classes and methods, but time and again we forget to utilize it. We leave public implementation classes in the API package. We forget to declare methods users shouldn’t call as private. We rarely disallow class construction, and seldom declare classes we don’t want callers to extend as final. We declare public interfaces even when we cannot safely accept implementations other than our own. These oversights are the equivalent of leaving the boiler room unlocked. When inadvertent access to implementation details is possible, accidents are likely to happen.

Our next line of defense is type checking. In a nutshell, type checking attempts to catch programming mistakes at the language level, either at compile time in statically typed languages, or at run time in dynamically typed languages. If interested in the details of what type checking can or cannot do for you in various languages, you should read Chris Smith’s excellent “What to know before debating type systems”. For various theoretical and practical reasons, type checking cannot catch all usage errors. It would be ideal if every statically typed API call which compiles executed safely, but present-day compilers are just not sophisticated enough to make it a reality. However, this does not mean that we should not take advantage of type checks where we can. We may be stating the obvious, yet we often see APIs which are not as type safe as they could be. The ObjectOutputStream class from the Java I/O library declares the

final void writeObject(Object obj) throws IOException

method which throws an exception if the argument is not Serializable. The alternative method signature

public final void writeObject(Serializable obj) throws IOException

could turn this runtime verification into a compile time check.

Every time a method only works for a small subset of all possible parameter values we can make it safer by introducing a more restrictive (read: safer) parameter type. Especially string, integer, or map parameter types deserve close examination because we often use these versatile types unsafely in programming. We take advantage of the fact that practically every other type can be converted into a string or represented as a map, and integers can be many more things than just numbers. This may be reasonable or even necessary in implementation code where we often need to call low-level library functions and where we control both caller and callee. APIs are, yet again, special. API safety is very important and we need to consider design trade-offs accordingly.

When evaluating design trade-offs it helps to understand that we are advocating replacing method preconditions with type invariants. This moves all safety-related program logic into a single location, the new type implementation, and relies on automatic type checking to ensure API safety everywhere else. If it removes strong and complex preconditions from multiple methods it is more likely to be worth the effort and additional complexity. For example, we recommend passing URLs as URL objects instead of strings. Many programming languages offer a built-in URL type; precisely because the rules governing what strings are valid URLs are complicated. The obvious trade-off is that callers need to construct an URL object when the URL is available as a string.

Weighing type safety against complexity is a lot like comparing apples and oranges: we must rely on our intuition, use common sense, and get lots of user feedback.  It is worth remembering that API complexity is measured from the perspective of the caller. It is difficult to tell how much the introduction of a custom type increases complexity without writing code for the use cases. Some use cases may become more complex while others may stay the same or even become simpler. In the case of the URL object, handling string URLs is more complex, but returning to a previously visited URL is roughly the same if we keep URL objects in the history list. Using URL objects result in simpler use cases for clients that build URLs from fragments or validate URLs independently from accessing the resource they refer to.

As a third and final line of defense – since type checking alone cannot always guarantee safe execution – all remaining preconditions need to be verified at run time. Very, very rarely performance considerations may dictate that we forgo such runtime checks in low-level APIs, but such cases are the exceptions. In most cases, returning incorrect results, failing with obscure internal errors, or corrupting persisted data is unacceptable API behavior. Errors resulting from incorrect usage (violated preconditions) should be clearly differentiated from those caused by internal problems and should contain messages clearly describing the mistake made by the caller. That a call caused an internal SQL error is not considered a helpful error message.

We should be particularly careful when providing classes for extension because inheritance breaks encapsulation. What does this mean? Protected methods are not a problem. Their safety can be ensured the same way as for public methods. Much bigger issues arise when we allow derived classes to override methods. Overriding is risky because callers may observe inconsistent state from within the method they override (known as the “fragile base class problem”) or may make inconsistent updates (known as the “broken contract problem”). In other words, calling otherwise safe public or protected methods from within overridden methods may be unsafe. There is no language mechanism to prevent access to public and protected methods from within overridden methods, so we often need to add additional runtime checks as illustrated below:

public Job {

   private cancelling = false;

   public void cancel() {
      ...
      cancelling =  true;
      onCancel();
      cancelling = false;
      ...
    }

    //Override this to provide custom cleanup when cancelling
    protected void onCancel() {
    }

    public void execute() {
      if(cancelling) throw IllegalStateException(“Forbidden call to
         execute() from onCancel()”);
      ...
    }
}

It is generally safer to avoid designing for class extension if it is possible. Unfortunately, simple callbacks may also expose similar safety issues, though only public methods are accessible from callbacks. In the example above, the runtime check is still needed after we make onCancel() a callback, since execute() is a public method.

Preventing data corruption

A method can only be considered safe if it preserves the invariant and prevents the caller from making inconsistent changes to internal data. The importance of preserving invariants cannot be overstated. Not long ago, a customer who used the LDAP interface to update their ADS directory reported an issue with one of our products. Occasionally the application became sluggish and consumed a lot of CPU cycles for no apparent reason. After lengthy investigations, we discovered that the customer inadvertently corrupted the directory by making an ADS group a child of its own. We fixed the issue by adding specific runtime checks to our application, but wouldn’t it be safer if the LDAP API didn’t allow you to corrupt the directory in the first place? The Windows administration tools don’t allow this, but since the LDAP interface does, applications still need to watch out for infinite recursions in the group hierarchy.

The invariant must be preserved even when methods fail. In the absence of explicit transaction support, all API calls are assumed atomic. When a call fails, no noticeable side effects are expected.

Special care must be taken when storing references to client side objects internally, as well as when returning internal object references to the client. The client code can unexpectedly modify these objects at any time, creating an invisible and particularly unsafe dependency between the client code (which we ignore) and the internal API implementation (which the client ignores). On the other hand, it is safe to store and return references to immutable objects.

If the object is mutable, it is a great deal safer to make defensive copies before storing or returning it rather than relying on the caller to do it for us. The submit() method in the example below makes defensive copies of jobs before placing them into its asynchronous execution queue, which makes it hard to misuse:

JobManager    jobManager  = ...; //initializing
Job           job = jobManager.createJob(new QueryJob());      

//adding parameters to the job
job.addParameter("query.sql", "select * from users");
job.addParameter("query.dal.connection", "hr_db");      

jobManager.submit(job); //submitting a COPY of the job to the queue      

job.addParameter("query.sql", "select * from locations"); //it is safe!
jobManager.submit(job) //submitting a SECOND job!

For the same reason, we should also avoid methods with “out” or “in-out” parameters in APIs, since they directly modify objects declared in client code. Such parameters frequently force the caller to make defensive copies of the objects prior to the method call. The .Net Socket.Select() method usage pattern shown bellow made Michi Henning frustrated enough to complain about it in his “API Design Matters“:

ArrayList readList = ...;   // Creating sockets to monitor for reading
ArrayList writeList = ...;  // Creating sockets to monitor for writing
ArrayList errorList;        // Sockets to monitor for errors.

while(!done) {

    SocketList readReady  = readList.Clone();  //making defensive copy
    SocketList writeReady = writeList.Clone(); //making defensive copy
    SocketList errorList  = readList.Clone();  //making defensive copy

    Socket.Select(readReady, writeReady, errorList, 10000);
         // readReady, writeReady, errorList were modified!
    …
}

Finally, APIs should be safe to use in multi-threaded code. Sidestepping the issue with a “this API is not thread safe” comment is no longer acceptable. APIs should be either fully re-entrant (all public methods are safe to call from multiple threads), or each thread should be able to construct its own instances to call. Making all methods thread safe may not be the best option if the API maintains state because deadlocks and race conditions are often difficult to avoid. In addition, performance may be reduced waiting for access to shared data. A combination of re-entrant methods and individual object instances may be needed for larger APIs, as exemplified by the Java Messaging Service (JMS) API, where ConnectionFactory and Connection support concurrent access, while Session does not.

Conclusion

Safety has long been neglected in programming in favor of expressive power and performance. Programmers were considered professionals, expected to be competent enough to avoid traps, and smart enough to figure out the causes of obscure failures. Programming languages like C or C++ are inherently unsafe because they permit direct memory access. Any C API call – no matter how carefully designed – may fail if memory is corrupted. However, the popularity and wide scale adoption of Java and .Net clearly signals a change. It appears that developers are demanding safer programming environments. Let’s join this emerging trend by making our APIs safer to use!

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Canada License.

Advertisement

About Ferenc Mihaly
Senior Software Architect at OpenText Corporation

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: