Choosing memorable names
September 21, 2011 4 Comments
Choosing good names is half art, half science; part of it is learned from books, the rest comes from experience. After working with APIs for a while, we develop a taste and appreciation for good names. It is comparable to wine tasting: at the beginning, all wines seem to taste the same, but after a while we develop the capacity to detect subtle flavors and tell apart great vintages from so-and-so wines. But a sophisticated wine connoisseur doesn’t necessarily know how to make a good wine; for that he needs to learn the technique of wine making. It is this combination of art and science, intuitive thinking and logical reasoning, which makes naming difficult.
Avoiding naming mistakes
Bad habits are the cause of many common naming blunders. In the early days of computing(1), strict technical limitations forced programmers to write almost indecipherable code. When identifiers were limited to 8 characters and punch cards were only 80 characters wide, abbreviated names – like
fscanf – were unavoidable. It used to be standard practice to prefix C function names to prevent name conflicts at link time. Underscores(2) and other special characters in names made sense when computer terminals had no separate uppercase and lowercase characters. Hungarian notation is useful for differentiating integers representing genuine numbers (nXXX) from integers representing handles, the equivalent of pointers to complex data structures (hwndXXX – handle to a Window) in languages with fixed type systems and lacking true pointers, such as BASIC or FORTRAN. The name stuck, because developers found it just as incomprehensible as a foreign language (Charles Simonyi, its inventor, was born in Hungary). Today, unlimited identifier lengths, full namespace support, object-oriented programming, and powerful IDEs make these practices unnecessary. We should start our quest for better names by ditching these antiquated and hard-to-read naming conventions.
The next step is to use correct English spelling, grammar, and vocabulary. It is hard enough to memorize APIs, let’s not make users also remember spelling errors, made-up words or other creative use of language. Automated spell checking turned spelling errors into the least forgivable API design mistakes. US English is the de-facto dialect of programming: no matter where we live, we should spell “color” and not “colour” in code. While Printer or Parser are valid English words, appending “-er” to turn a verb into a noun doesn’t always work. We can delete, but there is no “Deleter” in the dictionary(3). The same care should be taken when turning verbs into adjectives: saying that something is “deletable” is incorrect. Finally, we should be aware of the correct word order in composite names:
AbstractNamingService sounds better, and it is easier to remember than
getNameCaseIgnore is a hopelessly mangled name.
Names are a precious and limited resource, not to be irresponsibly squandered. It is wasteful to use overly generic words, like Manager, Engine, Agent, Module, Info, Item, Container, or Descriptor in names, because they don’t contribute to the meaning:
QueryDescriptor all sound fantastic, but when we see a
QueryManager and a
QueryEngine together in an API, we have no clue which does what. While synonyms make prose more entertaining, they should be avoided in APIs: remove, delete, erase, destroy, purge or expunge mean essentially the same thing and API users won’t be able to tell the difference in behavior based on the name alone. Using completely meaningless words in names should be a criminal offense. You would think this never happens, but see if you recognize an old product name in
TeamsIdentifier or if you can remember
CrusadeJDBCSource by recalling the fantasy name of a long-forgotten R&D project? The words we choose should also accurately describe what the API does. This also sounds obvious, yet we have seen a
BLOB type which is not an actual Binary Large Object and a
Set type which is not a proper Set. Such mistakes can only happen if we don’t slow down to think about the names we are choosing.
Finding names first
Finding meaningful names for some API constructs is so difficult that it should be completely avoided. This is not a joke. We should stay away from naming types, methods and parameters after they are designed. It is almost guaranteed that if we throw unrelated fields together into a type, the best name we will find for this concoction is some sort of
Descriptor. Even a marathon brainstorming session fails to find a better name for
DocumentWrapperReferenceBuilderFactory, because it is undeniably a factory for producing builders, which can generate references to document wrappers (whatever those are). The method
VerifyAndCacheItem both verifies and caches an
Item (whatever that is) and an
IndexTree is a rather odd data structure indeed. On the other hand, when we know our core concepts before we start thinking about the structure of the API, we can rely on the nouns, verbs, and adjectives to guide us through the process. Similar to the “writing the client code first” guideline, “finding the names first” proposes to revise the traditional sequence of design steps in search of a better outcome.
It may be helpful to speak to non-programmers to find out which words they use to talk about the problem domain. Understandably, the idea of collecting words into a glossary without thinking about how and where they will be used in code sounds somewhat counter-intuitive to us, but people in numerous other professions do this exercise regularly. Let’s pretend that we need to add Web 2.0 style tagging to our API, but we don’t know where to start. We look up the corresponding Wikipedia entry and read the first paragraph:
“In online computer systems terminology, a tag is a non-hierarchical keyword or term assigned to a piece of information (such as an internet bookmark, digital image, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are generally chosen informally and personally by the item’s creator or by its viewer, depending on the system.”
We underline the relevant words, group them into categories, and highlight the relationships between them. Where we find synonyms, we highlight the best match and gray out the alternatives. Where there are well known, long-established names in the domain (Author, User, Content or String), we chose these synonyms over the others:
|assign, choose||tag, keyword, term, metadata, (string)|
|find, search||bookmark, image, file, piece of information, item, (content)|
|find, browse||bookmark, image, file, piece of information, item, (content)|
It is absurdly early to do this, but if we are asked to sketch a draft API at this point, it may have methods like:
void assignTag(Content content, String tag); Content searchByTag(String tag);
Far be it from us to claim that this is a good API or that these are the best possible names. We should continue the process of looking for better alternatives. The example just illustrates that it is possible to find good names without thinking of code, and once we have them, they point towards types and methods we need in the API.
When we have the choice between two or more words to name an object or action, the least generic term is the best. Most method names we come across start with “set”, “get”, “is”, “add”, and a handful of other common verbs. This is a real shame, because more expressive verbs exist in many cases. Instead of setting a tag, we can tag something with it. For example:
Shorter names are better, but nowadays the length of names is rarely a serious concern. It works best if we set aside the most powerful nouns as the names of the main types and use longer composite names for methods or secondary types. Then when we build a scheduler, we can call the main interface
Scheduler and not
JobSchedulingService. If name confusion is a concern, we can use namespaces for clarification, a wiser choice than starting every name with “Job”.
Longer composite names are more meaningful than short ones which depend on parameter names or types to further clarify their meaning. Parameter names and types may be visible in method signatures, but not when we call the method from code:
|Method signature||Method call|
Many experts recommend writing self-documenting APIs, but only a few insist that we support writing self-documenting client code. At first, it may look like the API user should be entirely responsible for this, until we realize that he can only name his own variables, while we (the API designers) are choosing the names needed to call the API.
Naming is a complex topic, details of which require far more space to cover and fall outside the scope of this document. With so many different factors influencing naming, it is not easy to give straightforward practical advice, other than to avoid stupid mistakes. The problem domain has the strongest influence, since it is easier to describe good names for a specific API than for APIs in general. This statement is seemingly contradicted by the existence of naming conventions. But aren’t the conventions considered helpful advice? In the most generic sense, they may be. Yet when it comes to choosing descriptive, memorable, and intuitive names, the so-called naming conventions are of limited use, primarily addressing consistency concerns. Developers who closely follow naming conventions are designing consistent APIs, which is important, but not sufficient. We separated striving for consistency, with its naming and design conventions (discussed in the previous installment), from the subtle art of choosing memorable names, precisely because so many developers still believe that there is nothing more to good names than following the conventions.
(1) This paragraph is not intended as a complete and accurate description of the history of computing. Platforms and languages evolved differently and had varying limitations. For more details, see comments. (Based on reader feedback. Thank you.)
(2) The underscore example is misplaced here because its use is typically a consistency issue. If your platform has an established, consistent naming convention which uses underscores, then by any means, follow it. (Based on reader feedback. Thank you.)
(3) This is not intended as linguistic advice. Languages are constantly evolving and many new words are added to dictionaries every year. “Deleter” appears in some, but not in others. Developer-friendly design acknowledges that a significant number of users are likely not native English speakers, with varying levels of language skills. If they cannot find some terms in dictionaries, this may prevent them from thoroughly understand the precise meaning of a name. (Based on reader feedback. Thank you.)
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Canada License.