Choosing memorable names

Choosing good names is half art, half science; part of it is learned from books, the rest comes from experience. After working with APIs for a while, we develop a taste and appreciation for good names. It is comparable to wine tasting: at the beginning, all wines seem to taste the same, but after a while we develop the capacity to detect subtle flavors and tell apart great vintages from so-and-so wines. But a sophisticated wine connoisseur doesn’t necessarily know how to make a good wine; for that he needs to learn the technique of wine making. It is this combination of art and science, intuitive thinking and logical reasoning, which makes naming difficult.

Avoiding naming mistakes

Bad habits are the cause of many common naming blunders. In the early days of computing(1), strict technical limitations forced programmers to write almost indecipherable code. When identifiers were limited to 8 characters and punch cards were only 80 characters wide, abbreviated names – like strcpy or fscanf – were unavoidable. It used to be standard practice to prefix C function names to prevent name conflicts at link time. Underscores(2) and other special characters in names made sense when computer terminals had no separate uppercase and lowercase characters. Hungarian notation is useful for differentiating integers representing genuine numbers (nXXX) from integers representing handles, the equivalent of pointers to complex data structures (hwndXXX – handle to a Window) in languages with fixed type systems and lacking true pointers, such as BASIC or FORTRAN. The name stuck, because developers found it just as incomprehensible as a foreign language (Charles Simonyi, its inventor, was born in Hungary). Today, unlimited identifier lengths, full namespace support, object-oriented programming, and powerful IDEs make these practices unnecessary. We should start our quest for better names by ditching these antiquated and hard-to-read naming conventions.

The next step is to use correct English spelling, grammar, and vocabulary. It is hard enough to memorize APIs, let’s not make users also remember spelling errors, made-up words or other creative use of language. Automated spell checking turned spelling errors into the least forgivable API design mistakes. US English is the de-facto dialect of programming: no matter where we live, we should spell “color” and not “colour” in code. While Printer or Parser are valid English words, appending “-er” to turn a verb into a noun doesn’t always work. We can delete, but there is no “Deleter” in the dictionary(3). The same care should be taken when turning verbs into adjectives: saying that something is “deletable” is incorrect. Finally, we should be aware of the correct word order in composite names: AbstractNamingService sounds better, and it is easier to remember than NamingServiceAbstract, while getNameCaseIgnore is a hopelessly mangled name.

Names are a precious and limited resource, not to be irresponsibly squandered. It is wasteful to use overly generic words, like Manager, Engine, Agent, Module, Info, Item, Container, or Descriptor in names, because they don’t contribute to the meaning: QueryManager, QueryEngine, QueryAgent, QueryModule, QueryInfo, QueryItem, QueryContainer and QueryDescriptor all sound fantastic, but when we see a QueryManager and a QueryEngine together in an API, we have no clue which does what. While synonyms make prose more entertaining, they should be avoided in APIs: remove, delete, erase, destroy, purge or expunge mean essentially the same thing and API users won’t be able to tell the difference in behavior based on the name alone. Using completely meaningless words in names should be a criminal offense. You would think this never happens, but see if you recognize an old product name in TeamsIdentifier or if you can remember CrusadeJDBCSource by recalling the fantasy name of a long-forgotten R&D project? The words we choose should also accurately describe what the API does. This also sounds obvious, yet we have seen a BLOB type which is not an actual Binary Large Object and a Set type which is not a proper Set. Such mistakes can only happen if we don’t slow down to think about the names we are choosing.

Finding names first

Finding meaningful names for some API constructs is so difficult that it should be completely avoided. This is not a joke. We should stay away from naming types, methods and parameters after they are designed. It is almost guaranteed that if we throw unrelated fields together into a type, the best name we will find for this concoction is some sort of Info or Descriptor. Even a marathon brainstorming session fails to find a better name for DocumentWrapperReferenceBuilderFactory, because it is undeniably a factory for producing builders, which can generate references to document wrappers (whatever those are). The method VerifyAndCacheItem both verifies and caches an Item (whatever that is) and an IndexTree is a rather odd data structure indeed. On the other hand, when we know our core concepts before we start thinking about the structure of the API, we can rely on the nouns, verbs, and adjectives to guide us through the process. Similar to the “writing the client code first” guideline, “finding the names first” proposes to revise the traditional sequence of design steps in search of a better outcome.

It may be helpful to speak to non-programmers to find out which words they use to talk about the problem domain. Understandably, the idea of collecting words into a glossary without thinking about how and where they will be used in code sounds somewhat counter-intuitive to us, but people in numerous other professions do this exercise regularly. Let’s pretend that we need to add Web 2.0 style tagging to our API, but we don’t know where to start. We look up the corresponding Wikipedia entry and read the first paragraph:

“In online computer systems terminology, a tag is a non-hierarchical keyword or term assigned to a piece of information (such as an internet bookmark, digital image, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are generally chosen informally and personally by the item’s creator or by its viewer, depending on the system.”

We underline the relevant words, group them into categories, and highlight the relationships between them. Where we find synonyms, we highlight the best match and gray out the alternatives. Where there are well known, long-established names in the domain (Author, User, Content or String), we chose these synonyms over the others:

Verbs Nouns
assign, choose tag, keyword, term, metadata, (string)
find, search bookmark, image, file, piece of information, item, (content)
find, browse bookmark, image, file, piece of information, item, (content)

It is absurdly early to do this, but if we are asked to sketch a draft API at this point, it may have methods like:

   void assignTag(Content content, String tag);
   Content[] searchByTag(String tag);

Far be it from us to claim that this is a good API or that these are the best possible names. We should continue the process of looking for better alternatives. The example just illustrates that it is possible to find good names without thinking of code, and once we have them, they point towards types and methods we need in the API.

When we have the choice between two or more words to name an object or action, the least generic term is the best. Most method names we come across start with “set”, “get”, “is”, “add”, and a handful of other common verbs. This is a real shame, because more expressive verbs exist in many cases. Instead of setting a tag, we can tag something with it. For example:

Typical Better
document.setTag(“Best Practices”); document.tag(“Best Practices”);
if(document.getTag().equals(“Best Practices”)) if(document.isTagged(“Best Practices”))

Shorter names are better, but nowadays the length of names is rarely a serious concern. It works best if we set aside the most powerful nouns as the names of the main types and use longer composite names for methods or secondary types. Then when we build a scheduler, we can call the main interface Scheduler and not JobSchedulingService. If name confusion is a concern, we can use namespaces for clarification, a wiser choice than starting every name with “Job”.

Longer composite names are more meaningful than short ones which depend on parameter names or types to further clarify their meaning. Parameter names and types may be visible in method signatures, but not when we call the method from code:

Method signature Method call
Document.remove(Tag tag); currentDocument.remove(bestPractices);
Document.removeTag(Tag tag);) currentDocument.removeTag(bestPractices);

Many experts recommend writing self-documenting APIs, but only a few insist that we support writing self-documenting client code. At first, it may look like the API user should be entirely responsible for this, until we realize that he can only name his own variables, while we (the API designers) are choosing the names needed to call the API.

Conclusion

Naming is a complex topic, details of which require far more space to cover and fall outside the scope of this document. With so many different factors influencing naming, it is not easy to give straightforward practical advice, other than to avoid stupid mistakes. The problem domain has the strongest influence, since it is easier to describe good names for a specific API than for APIs in general. This statement is seemingly contradicted by the existence of naming conventions. But aren’t the conventions considered helpful advice? In the most generic sense, they may be. Yet when it comes to choosing descriptive, memorable, and intuitive names, the so-called naming conventions are of limited use, primarily addressing consistency concerns. Developers who closely follow naming conventions are designing consistent APIs, which is important, but not sufficient. We separated striving for consistency, with its naming and design conventions (discussed in the previous installment), from the subtle art of choosing memorable names, precisely because so many developers still believe that there is nothing more to good names than following the conventions.

Notes:

(1) This paragraph is not intended as a complete and accurate description of the history of computing. Platforms and languages evolved differently and had varying limitations. For more details, see comments. (Based on reader feedback. Thank you.)

(2) The underscore example is misplaced here because its use is typically a consistency issue. If your platform has an established, consistent naming convention which uses underscores, then by any means, follow it. (Based on reader feedback. Thank you.)

(3) This is not intended as linguistic advice. Languages are constantly evolving and many new words are added to dictionaries every year. “Deleter” appears in some, but not in others. Developer-friendly design acknowledges that a significant number of users are likely not native English speakers, with varying levels of language skills. If they cannot find some terms in dictionaries, this may prevent them from thoroughly understand the precise meaning of a name. (Based on reader feedback. Thank you.)

 

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Canada License.

About these ads

About Ferenc Mihaly
Senior Software Architect at OpenText Corporation

4 Responses to Choosing memorable names

  1. You write: When identifiers were limited to 8 characters and punch cards were only 80 characters wide, abbreviated names – like strcpy or fscanf – were unavoidable. It used to be standard practice to prefix C function names to prevent name conflicts at link time. Underscores and other special characters in names made sense when computer terminals had no separate uppercase and lowercase characters.

    This is completely and utterly wrong, in so many ways… Identifiers in very early version C where not limited to 8 characters, but it was only the first 8 that where significant, this was changed early on in C (see for example C from the late or middle 1970′s–I don’t remember exactly). Lisp, and other languages had identifiers that could be far longer. C, Lisp, etc where also not designed for punchcards, so drawing that parallel is incorrect. Very few languages have or had “infinite length identifiers”. Early versions of Fortran, back in the 60′s, had this limit, but current versions of Fortran do not have such limits nor have they had them for many years now.

    Operating systems have been able to distinguish between upper-case, and lower-case letters. ASCII is from 1963, and before that you had operating systems with other encodings that differentiated be lower-case and upper-case. C differentiated between them since day one. Lisp, depended on the implementation (you could pick if you wanted to have case sensitivity or not).

    Wether using under-score, hyphen, or CamelCase is strictly a matter of taste. But they had nothing to do with “preventing name conflicts at link time”, or being “prevented from using upper/lower case characters”. I find “is_tagged” or “is-tagged” far easier to read than the equivalent CamelCase version.

    strcpy and fscanf are in fact _good_ names, functions that are used very often should have short names, it makes it easier to read source code. Functions that are not used often should have longer, more descriptive names.

  2. Andre Bogus says:

    Your last example is inconsistent. You have made tag a verb, so removeTag “removes” a verb from our object. Also I’d strongly advocate against “remove” in method names, unless the method really removes an object from the context of the system. What is usually removed is the relation between two objects, not the targeted object itself.

    A consistent signature would be currentDocument.untag(bestPractices).

    I find that the un- prefix is wildly underused in API naming, although it suits complementary actions perfectly for many actions like: stick – unstick, freeze – unfreeze (although “thaw” would be more fitting here), assign – unassign (I have used this in an API recently instead of delete, see my point above).

  3. nodenamehaw says:

    +1000 that names should not be written backwards: SearchButton, not ButtonSearch.

    However it would be a programmer with little English indeed who could not understand the (quite real) word “deletable” nor find it in a dictionary. That would be a programmer who is unaware of the word’s Googleability.

  4. Aaron Meurer says:

    I think it’s a shame that many languages choose to use CamelCase as the standard instead of underscores. In Regular English, We Don’t Write All Words With The First Letter In Upper Case, AndWeCertaintlyDon’tWriteThemWithoutSpaces. Rather, we write all characters in lower case (except the first in the sentence, which doesn’t have a parallel in variable names because they aren’t complete sentences), and we put a space between each word. Underscores more closely emulate this, and hence make things *much* more readable. CamelCase IMHO should be used only for those things that (roughly) correspond to proper nouns in English (I’m thinking in particular of the Python PEP 8 standard, where function and method names use underscores, and class names use CamelCase).

    Second, I completely disagree about your comment about adding -er or -or to the end of words. Such words are called agent nouns, and a quick Google search will reveal that these are considered grammatically correct, even if the word isn’t in the dictionary per se. In computing, we often use words that aren’t in the common English dictionary, as they are highly technical terms. Adding -er (or -or if it fits better) to the end of a verb is a good way create a noun describing something that performs the action of that verb. Without this, we’d have to use awkward constructs like DeletingObject (or replace “Object” with any one of your so-called generic words).

    I also disagree with your argument that this should be avoided because non-English speakers won’t understand them. If we take this argument further, we should use as simple of English as possible, sacrificing any complex words, even if they are much better descriptors of what we want. This is a grammatical part of the English language, and if we really want to avoid poor names, we should take advantage of all that the language has to offer us.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 31 other followers

%d bloggers like this: