Hatin’ on Nulls

Dante's Inferno; Lucifer, King of Hell

When I first read Christian Neumanns’ “Why We Should Love ‘null’”, I found myself agreeing with his position. Yes, null references have “…led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage…” per Sir C. A. R. Hoare. Yes, many people heartily dislike null references and will go to great lengths to work around the problem. Finally, yes, these workarounds may be more detrimental than the problem they are intended to solve. While I agreed with the position that null references are a necessary inconvenience (the ill effects are ultimately the result of failure to check for null, not the null condition itself), I didn’t initially see the issue as being particularly “architectural”.

Further on in the article, however, Christian covered why null references and the various workarounds become architecturally significant. The concept of null, nothing, is semantically important. A price of zero dollars is not intrinsically the same as a missing price. A date several millennia into the future does not universally convey “unknown” or “to be determined”. Using the null object pattern may eliminate errors due to unchecked references, but it’s far from “safe”. According to Wikipedia, “…a Null Object is very predictable and has no side effects: it does nothing”. That, however, is untrue. A Null Object masks a potential error condition and allows the user to continue on in ignorance. That, in my opinion, is very much doing something.
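To make the price example concrete, here is a minimal sketch in Scala (the language used for the code later in this thread). The `Product` type and both functions are hypothetical, invented for illustration: an explicit `Option[BigDecimal]` keeps "no price yet" distinct from "free", while a zero sentinel quietly merges the two.

```scala
// A minimal sketch: keeping "missing" distinct from "zero" (hypothetical types)
object PriceExample {
  // Absence is modelled explicitly: None means "no price has been set"
  final case class Product(name: String, price: Option[BigDecimal])

  def describePrice(p: Product): String = p.price match {
    case Some(amount) if amount == BigDecimal(0) => s"${p.name} is free"
    case Some(amount)                            => s"${p.name} costs $$${amount}"
    case None                                    => s"${p.name} has no price yet"
  }

  // The sentinel approach: zero stands in for "unknown", so the two cases collapse
  def sentinelPrice(p: Product): BigDecimal = p.price.getOrElse(BigDecimal(0))

  def main(args: Array[String]): Unit = {
    val freebie  = Product("Sticker", Some(BigDecimal(0)))
    val unpriced = Product("Prototype", None)
    println(describePrice(freebie))  // Sticker is free
    println(describePrice(unpriced)) // Prototype has no price yet
    // With the sentinel, the two products are indistinguishable
    println(sentinelPrice(freebie) == sentinelPrice(unpriced)) // true
  }
}
```

The sentinel version fails in the same way as the Null Object: nothing breaks, but the distinction the business cares about is gone.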

A person commenting on Christian’s post stated that “…a crash is the worst kind of experience a user can have”. That person argued that masking a null reference error may not be as bad for the user as a crash. There’s a kernel of truth there, but it’s a matter of risk. If an application continues on and the result is a misunderstanding of what’s been done or, worse, corrupted data, how bad is that? If the application in question is a game, there’s little real harm. What if the application in question is dealing with health information? I stand by the position that where there is an error, no news is bad news.

As more and more applications become platforms by being service-enabled, semantic issues gain importance. Versioning strategies can ensure structural compatibility, but semantic issues can still break clients. Coherence and consistency should be considered hallmarks of an API. As Erik Dietrich noted in “Notes on Writing Discoverable Framework Code”, a good API should “make screwing up impossible”. Ambiguity makes screwing up very possible.

19 thoughts on “Hatin’ on Nulls”

  1. Making anything unambiguous means finding a way for others to understand, which gets us to the knotty problem of how we communicate the method we have taken to create unambiguousness (probably not even close to being a real word).

  2. The fundamental problem with nulls is that they break the type system.

    String email = null;
    is a filthy lie (that’s certainly not a string), and the compiler refuses to complain about it. Getting the type system right, as the Optional/Maybe pattern does, is a first step toward a solution, but alone isn’t sufficient. Following correct application of types leads toward an FP approach in the long run, but that’s a different discussion.
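A minimal Scala sketch of Micah's point, assuming a hypothetical Java-style lookup that returns null: `Option.apply` converts a possibly-null reference into `Some(value)` or `None`, so the absence shows up in the type instead of hiding inside `String`.

```scala
object EmailExample {
  // Hypothetical Java-style API that may hand back null
  def lookupEmailJavaStyle(userId: Int): String =
    if (userId == 1) "ada@example.com" else null

  // Option.apply turns a possibly-null reference into Some(value) or None,
  // so "may be absent" becomes part of the type
  def lookupEmail(userId: Int): Option[String] = Option(lookupEmailJavaStyle(userId))

  // A function that genuinely requires a String cannot be handed an Option[String]
  def domainOf(email: String): String = email.substring(email.indexOf('@') + 1)

  def main(args: Array[String]): Unit = {
    // domainOf(lookupEmail(1)) would not compile: Option[String] is not a String
    println(lookupEmail(1).map(domainOf)) // Some(example.com)
    println(lookupEmail(2).map(domainOf)) // None
  }
}
```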

    • Micah,

      “String email = null;” isn’t a lie, you’re just misunderstanding what it’s saying. It’s not saying that “email” is a string, it’s saying that “email”, if populated, will point to a string. The variable isn’t the thing, it’s a location where one may find a thing of a particular type – your address isn’t your house, it’s the location where one may find your house (as long as it exists).

  3. Pingback: Design Communicates the Solution to a Problem | Form Follows Function

  4. I think the challenge with nulls is twofold. First, the problem is not whether a lack of value is useful or not; it’s the inability of a language to express when a null is not allowed. For example, if a method takes a string but null values are not allowed, you need to do a null check and then throw the appropriate exception. Despite IDE or macro support, this repeated null check can be cumbersome and make code harder to read.

    Instead, the language should provide better support. For example, C# could have a [NotNull] attribute for method parameters or a “NotNullable” type (the inverse of “string?”). Internally, it could have a similar implementation to code contracts, and I am sure the compiler could use it to optimize away redundant internal null checks.
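Scala doesn't offer the attribute Anthony describes either. A minimal sketch of the repeated guard he mentions, using Scala's standard `require` (which throws an `IllegalArgumentException` whose message starts with "requirement failed: "); `Customer` and `rename` are hypothetical.

```scala
object GuardExample {
  final case class Customer(name: String)

  // Every public entry point repeats the same guards; this is the boilerplate
  // a language-level "not nullable" marker would remove
  def rename(customer: Customer, newName: String): Customer = {
    require(customer != null, "customer must not be null")
    require(newName != null, "newName must not be null")
    customer.copy(name = newName)
  }

  def main(args: Array[String]): Unit = {
    println(rename(Customer("Ann"), "Anne")) // Customer(Anne)
    try rename(Customer("Ann"), null)
    catch {
      // prints "requirement failed: newName must not be null"
      case e: IllegalArgumentException => println(e.getMessage)
    }
  }
}
```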

    Second, there is confusion here between languages where variables are typed (e.g. C#, Java) and those where variables are not typed (e.g. JavaScript). In languages with typed variables, as you say, assigning null to a typed variable means the variable still has a type, just no value. Polymorphism and monitors aside, there are two states for a variable: value or null.

    In languages without typed variables, why not just assign a string to a variable that normally contains a number to represent a lack of value (or some other combination)? Nulls can be a well-understood shorthand, but potentially an obsolete one. Of course, putting a string in a field that normally contains a number has a host of other issues, but untyped languages already deal with this in their own ways.

    Some languages like Objective-C and Smalltalk state that calling a method on a null (nil) variable does nothing, and some programs use this to their advantage. Personally, I do not like this. I would rather create a version of the type that does nothing when its methods are called. Although it is more work, it is an accurate indication of the design intention rather than covering up a potential coding error, as you say.
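A minimal Scala sketch of that alternative, with a hypothetical `AuditLog` interface: the do-nothing implementation is a named type chosen on purpose, rather than a side effect of calling methods on nil. It also shows the risk the post describes: if `NoAuditLog` is passed in by mistake, the transfer proceeds and leaves no trace.

```scala
object NullObjectExample {
  // The interface callers depend on (hypothetical)
  trait AuditLog {
    def record(event: String): Unit
    def entries: List[String]
  }

  // A real implementation that keeps what it is given
  final class MemoryAuditLog extends AuditLog {
    private var buffer = List.empty[String]
    def record(event: String): Unit = { buffer = event :: buffer }
    def entries: List[String] = buffer.reverse
  }

  // The Null Object: an explicit, named type that deliberately does nothing
  object NoAuditLog extends AuditLog {
    def record(event: String): Unit = ()
    def entries: List[String] = Nil
  }

  def transfer(amount: Int, log: AuditLog): Int = {
    log.record(s"transfer $amount")
    amount
  }

  def main(args: Array[String]): Unit = {
    val real = new MemoryAuditLog
    transfer(100, real)
    transfer(100, NoAuditLog)
    println(real.entries)       // List(transfer 100)
    println(NoAuditLog.entries) // List(): the second transfer left no trace
  }
}
```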

    • I definitely agree that the [NotNull] attribute would be a nice addition, particularly if it resulted in automated null checks.

      As to the use of a string in untyped languages, you’ve already identified issues there.

      Lastly, the conventions used by Objective-C and Smalltalk feel like a disaster waiting to happen. Calling a method that does nothing hides a problem that I probably want to know about. Going “boom” is not good, but it’s far better than proceeding onward without any warning.

  5. Micah is correct. The true type of “email” isn’t String. The true type of email is Maybe[String], and this should be reflected in the type system so it can be handled properly and distinguished from variables that *always* contain a String. Languages without null (e.g. Haskell) handle this fine. By representing things truthfully in the type system, we can avoid accidentally failing to check for null by forcing developers to handle both cases: their code won’t compile unless they remember the “null check”.
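One caveat for Scala specifically: a `match` on an `Option` that omits the `None` case produces a compiler warning rather than an error by default. A construct that does insist on both cases is `fold`, which needs both a result for the missing case and a function for the present case before it yields a value. A minimal sketch, with a hypothetical `greeting` function:

```scala
object FoldExample {
  // fold takes the "missing" case first and the "present" case second,
  // so no String comes out unless both have been supplied
  def greeting(nickname: Option[String]): String =
    nickname.fold("Hello, stranger")(n => s"Hello, $n")

  def main(args: Array[String]): Unit = {
    println(greeting(Some("Micah"))) // Hello, Micah
    println(greeting(None))          // Hello, stranger
  }
}
```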

    • Nullability is an inherent property of being a reference type. The ability to designate a reference as not nullable could be useful in increasing expressiveness (see Anthony’s comment above), but it also carries a risk of abuse. The goal is correctness, not a lack of error messages.

      • “Nullability is an inherent property of being a reference type”

        That’s not true, in general (it is true only in certain languages).

        Counterexamples: languages such as Haskell and OCaml have reference types but do not have null. Naturally, there’s a pointer under the hood, but the compiler simply does not allow you to set it to null, unlike Java, C# etc.

        Designating something as not nullable can only be abused in languages that allow you to violate the designation and set the reference to null. Not all languages allow this.

        I’m not sure what your point is re: correctness versus error messages. If the compiler can prevent you from doing risky things, does that not contribute to correctness?

      • The problem with a lack of nulls is that sometimes you need a slot for something that doesn’t yet exist (like termination date for an employee). Without nulls, you cannot express this well.

        If there’s no error when you call a method on an object that should be null, then you lose your last line of defense for correctness: something should have happened had the object been non-null, but it didn’t, and you don’t know about it because no error was thrown. Past this point you’re in terra incognita, and the longer you’re there the worse it’s likely to be. This is what I was referring to re: abuse of non-nullability. If you have to assign something to a reference even though it should be null, then you have to remember that the something really represents the nothing you weren’t allowed to assign to it.

      • Completely agree with the need for “a slot for something that doesn’t exist”. However, nulls are just one way to achieve this – the Option/Maybe approach is another, and works just fine.

        “If there’s no error when you call a method on an object that should be null, then you lose your last line of defense for correctness”

        In the languages I am talking about, you cannot make such a mistake. It is impossible – your code won’t compile. So you have much stronger correctness properties than with null.

        “If you have to assign something to reference, even though it should be null, then you have a situation where you have to remember that the something really represents the nothing that you weren’t allowed to assign to it.”

        No, that’s not the case, because you assign a specific something, designed especially for representing the nothing! In Scala it’s called None, and in Haskell it’s called Nothing. This is no more or less difficult than remembering to use null to represent the absence of something.

        However, doing it this way can be checked by the compiler, transforming runtime errors into compile-time errors. So you don’t get the problem of intermittent runtime exceptions for cases that are only triggered occasionally. It also has a whole range of benefits at the API level.

        I have posted a few examples at https://gist.github.com/davidallsopp/ffcebb9ff3c031bf9b8b which might make more sense than me trying to explain in words. Or see one of the many tutorials on Maybe in Haskell or Option in Scala.

      • no.foreach(println) // Does nothing

        It’s the “Does nothing” that’s the problem. If I expect ‘no’ to contain something and for some reason at runtime it does not, doing nothing is exactly the wrong thing. I want it to blow up, loudly. This isn’t safety, it’s hiding an error condition that I need to know about.

      • I think you are mixing together two concepts.

        – The absence of a value might be an exceptional condition/error, or it might be normal
        – Calling a method on a missing value is always a programmer error.

        Whatever approach we use, we have to handle the first case in one of various ways, depending on the situation (e.g. we wouldn’t throw an exception in every case when we test for null and find one!)

        The second case should be prevented at compile-time, and thus be impossible at run-time.

        The Option/Maybe approach provides at least as many ways as nulls to handle the first case (including throwing an exception if that’s what you want). It’s also very good at forcing the developer, via the compiler, to make sure they explicitly choose a way to handle missing values, rather than getting a runtime exception because they forgot that the value might be missing.

        It is better than nulls at handling the second case (and provides a whole range of other good stuff that I’ve barely mentioned…)

        Addressing your specific concern:

        no.foreach(println)

        is no more unsafe (in terms of accidentally hiding errors) than:

        if (no != null) {
          println(no)
        }

        If that’s not what you want in a given situation (because you expect/require a value to be present), then don’t do that.

        The Option/Maybe approach also allows for:
        – providing a default for missing values
        – calling a function to obtain a fallback value
        – taking different actions according to whether a value is present or not
        – returning a None to the caller to indicate the value was missing
        – throwing a runtime exception if the value is missing

        and in most cases does this very concisely.
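The five idioms in that list can be sketched in Scala as follows, assuming Scala 2.13 or later (for `toIntOption`); the `configuredPort` lookup, the "port" key and the 8080 default are hypothetical. The last function is the loud failure asked for above: `getOrElse(throw ...)` raises immediately when the value is required but missing.

```scala
object OptionIdioms {
  // Hypothetical lookup: a port is present only if configured and numeric
  def configuredPort(settings: Map[String, String]): Option[Int] =
    settings.get("port").flatMap(_.toIntOption)

  // 1. Provide a default for missing values
  def portOrDefault(s: Map[String, String]): Int = configuredPort(s).getOrElse(8080)

  // 2. Call a function to obtain a fallback (orElse only evaluates it when needed)
  def portOrFallback(s: Map[String, String], fallback: () => Option[Int]): Option[Int] =
    configuredPort(s).orElse(fallback())

  // 3. Take different actions depending on whether a value is present
  def describe(s: Map[String, String]): String = configuredPort(s) match {
    case Some(p) => s"listening on $p"
    case None    => "no port configured"
  }

  // 4. Return None to the caller: map keeps the absence visible
  def portPlusOne(s: Map[String, String]): Option[Int] = configuredPort(s).map(_ + 1)

  // 5. Fail loudly when the value is required
  def requiredPort(s: Map[String, String]): Int =
    configuredPort(s).getOrElse(throw new IllegalStateException("port is required"))

  def main(args: Array[String]): Unit = {
    val withPort = Map("port" -> "9000")
    val empty    = Map.empty[String, String]
    println(portOrDefault(empty))                    // 8080
    println(portOrFallback(empty, () => Some(7000))) // Some(7000)
    println(describe(withPort))                      // listening on 9000
    println(portPlusOne(empty))                      // None
    println(requiredPort(withPort))                  // 9000
  }
}
```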

        I have added some further examples to the GitHub gist:


        // An explanation of using the Option type in Scala rather than null (in languages such as Java)
        // The equivalent in Haskell is the Maybe type
        // The Option type is for values that might or might not exist – i.e. exactly the use case
        // that null fulfills. It has two subtypes: Some[A] and None
        // ( where A is a type parameter – i.e. Some is a generic type. )
        val yes: Option[String] = Some("Hello world!")
        val no: Option[String] = None
        // This prevents NullPointerExceptions because we no longer deal with the actual reference;
        // it is safely wrapped in the Some, or completely absent in the case of None.
        // We cannot call methods of the enclosed type (String, in this example) on a None, because
        // it doesn't have those methods!
        no.length() // Does not compile
        // Nor can we directly call those methods on a Some[String]:
        yes.length() // Still does not compile
        // So how can we do anything useful? We pass functions to the Option to be executed safely,
        // depending on whether we have a Some or a None.
        no.foreach(println) // Does nothing
        yes.foreach(println) // prints "Hello world!"
        no.map(msg => msg + " Goodbye.") // returns None
        yes.map(msg => msg + " Goodbye.") // returns Some("Hello world! Goodbye.")
        // Note that these return another Option, so that any following operations
        // are also performed safely, and can be conveniently chained together:
        yes.map(msg => msg + "!!!").map(_.length) // Some(15)
        no.map(msg => msg + "!!!").map(_.length) // None
        // If we were going to return a value from a function, rather than print something out,
        // then our function can return an Option too, which clearly indicates the missing
        // value to the caller rather than ignoring it:
        def transform(opt: Option[String]) = opt.map(usefulFunction) // returns None if opt was None
        // Or we could provide different behaviours for the two cases:
        opt match {
          case Some(a) => doSomething(a)
          case None => println("No value")
        }
        // Note that if I only provided the case Some(a) in the "match" statement above, then
        // the compiler will complain that I haven't covered the None case. So I cannot forget the "null check"
        // Alternatively, we can provide a default, so we are guaranteed to get a non-null value:
        val n: String = no.getOrElse("Shhh") // "Shhh"
        val y: String = yes.getOrElse("Shhh") // "Hello world!"
        // Or we could resort to runtime exceptions (if you insist):
        // If we want an immediate runtime exception for missing values, we can do
        println(no.get) // throws java.util.NoSuchElementException
        // Or assert something:
        assert(no.isEmpty, "Oops!")
        assert(yes.isDefined, "Oops!")
        // Or we could be more explicit, and cover both cases:
        opt match {
          case Some(a) => doSomething(a)
          case None => throw new Exception("Missing value")
        }
        // but this is as verbose as a traditional null check, so should generally be avoided.
        // Also, we should not be using Options (or nulls) unless values really are optional.
        // If they are required, then they should be a non-nullable (ideally immutable) value
        // If they are required to be absent, then don't represent them at all!
        // If there is a lifecycle, so the requirement changes, then represent this explicitly:
        // See https://gist.github.com/davidallsopp/79fd49840197f3597b72
        // Options really start to shine for more complex (realistic) cases, where we have nested null checks.
        // Quick – did we remember all the null checks in the following code??
        // Java-style, but converted to Scala
        def listLocations(start: Int, end: Int): Unit = {
          for (id <- start to end) {
            val user = db.getUser(id)
            if (user != null) {
              val address = user.getAddress()
              if (address != null) {
                val location = address.getLocation()
                if (location != null) {
                  val lat = location.getLat()
                  val lon = location.getLon()
                  if (lat != null && lon != null) {
                    println("Location: " + lat + ", " + lon)
                  }
                }
              }
            }
          }
        }
        // Using Option, the 'null checks' are performed automatically under the hood of the for-comprehension:
        def listLocations(start: Int, end: Int): Unit = {
          for {
            id <- start to end
            user <- db.getUser(id)
            address <- user.getAddress
            location <- address.getLocation
            (lat, lon) <- location.getLatLon
          } println(s"Location: ${lat}, ${lon}")
        }
        // Note: if these operations return error messages that you want to capture rather than ignore,
        // then you can use the Either type instead of Option. If they throw exceptions that you want to
        // handle, use the Try type instead.
        // In Scala we can also use Option.get to extract the value, if it exists
        // (i.e. if the Option is a Some). The get method is widely regarded as a mistake, and Haskell's
        // counterpart, Data.Maybe.fromJust, is likewise discouraged. However, get immediately throws a
        // runtime exception if there is no value, so a null does not propagate to cause mysterious damage elsewhere.
        // Note also that although Scala does also support null (for compatibility with Java), languages
        // like Haskell simply don't have null – you just use the Maybe type.
        // The examples above are just a taste – there is much more that these types can do that makes them
        // far more convenient than null-checks, as well as safer – e.g. handling lists of Options.

        (Just for the avoidance of confusion, since I didn’t explicitly state this before: Option/Maybe is totally different from the Null Object pattern. A Null Object does nothing but is otherwise indistinguishable from a real object. A None Option behaves differently from a Some Option. It only does nothing when you specifically decide to do nothing, because you “expect” that the value might be missing and ignoring it is the correct thing to do!)
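The gist's closing note mentions `Either` and `Try` for cases where the reason a value is missing matters. A minimal sketch with a hypothetical age parser, assuming Scala 2.13 or later (for `toIntOption`): `Option` only reports absence, `Either` carries an explanation in its `Left`, and `Try` keeps the thrown exception instead of discarding it.

```scala
import scala.util.{Failure, Success, Try}

object EitherTryExample {
  // Option: only says whether a valid age was obtained
  def parseAgeOpt(raw: String): Option[Int] = raw.toIntOption.filter(_ >= 0)

  // Either: Left carries the reason, Right carries the value
  def parseAgeEither(raw: String): Either[String, Int] =
    raw.toIntOption match {
      case None             => Left(s"'$raw' is not a number")
      case Some(n) if n < 0 => Left(s"$n is negative")
      case Some(n)          => Right(n)
    }

  // Try: wraps code that throws, keeping the exception as a Failure
  def parseAgeTry(raw: String): Try[Int] = Try(raw.trim.toInt)

  def main(args: Array[String]): Unit = {
    println(parseAgeOpt("abc"))    // None
    println(parseAgeEither("abc")) // Left('abc' is not a number)
    println(parseAgeEither("-3"))  // Left(-3 is negative)
    parseAgeTry("abc") match {
      case Success(n) => println(s"age $n")
      case Failure(e) => println(s"failed: ${e.getClass.getSimpleName}")
    }
  }
}
```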

      • no.foreach(println)

        is no more unsafe (in terms of accidentally hiding errors) than:

        if (no != null) {
          println(no)
        }

        Sort of. In the second example I would have to do something affirmative (wrapping println(no) in an if block) to “eat” the exception. That’s a lot less likely to happen than the default case of typing no.foreach(println) without considering whether a missing value would be ignored or not. I’m assuming that something additional would need to happen for no.foreach(println) to throw an exception rather than just go merrily on its way.

      • It is perhaps a matter of perspective. I wouldn’t regard no.foreach(println) as a “default case”; it’s merely one way of expressing what I think should happen, given an optional value.

        “without considering whether a missing value would be ignored or not”.

        Well, choosing to use foreach, rather than one of the other constructs I have shown, requires just such consideration. With or without nulls, there’s no one way that fits all circumstances.

        In practice I have never seen the type of error that you are concerned about, when writing without nulls. But I have seen many bugs due to nulls!

        However, I think there’s a more fundamental point:

        If we “expect” a value to be present, then why would we use a nullable reference or Option?
        These are for cases where the value is truly optional (hence the name).

        If the value is required to be present, it should be a non-nullable value.
        If the value is required to be absent, there should be no representation of it at all, not even a null.

        Then we cannot construct invalid states, and cannot get these types of runtime errors.

        Perhaps you are thinking of cases where an object has a lifecycle, so at certain stages in the lifecycle, we require a value to be present, and in others, we require the value to be absent?

        It is common to model such cases with nullable references, but this is dangerous and harms correctness, since (coming back to the original point) we are lying to the compiler. We tell it that a value is optional, but it is not. We have to remember, everywhere we use such an object, what state it is meant to be in and which fields we can safely use.

        In such cases it is often better to explicitly model the lifecycle, so it is impossible to perform invalid operations. Here’s a demonstration of this in Scala, using your example of the termination date of an Employee: https://gist.github.com/davidallsopp/79fd49840197f3597b72
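The linked gist isn't reproduced here; as a hedged stand-in, this is one way such a lifecycle model could look in Scala, with hypothetical `CurrentEmployee` and `FormerEmployee` types. The termination date exists only on the type where it is meaningful, so code that needs it cannot be handed a current employee.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

object EmployeeLifecycle {
  // Each lifecycle stage is its own type (hypothetical model, not the linked gist),
  // so a field exists only when it is meaningful
  sealed trait Employee { def name: String }
  final case class CurrentEmployee(name: String, hired: LocalDate) extends Employee
  final case class FormerEmployee(name: String, hired: LocalDate, terminated: LocalDate)
      extends Employee

  // Termination is only defined on a current employee and produces a former one
  def terminate(e: CurrentEmployee, on: LocalDate): FormerEmployee =
    FormerEmployee(e.name, e.hired, on)

  // Code that needs the termination date asks for a FormerEmployee;
  // there is no null to check because a CurrentEmployee has no such field
  def yearsServed(e: FormerEmployee): Long = ChronoUnit.YEARS.between(e.hired, e.terminated)

  def describe(e: Employee): String = e match {
    case CurrentEmployee(n, _)       => s"$n is employed"
    case FormerEmployee(n, _, until) => s"$n left on $until"
  }

  def main(args: Array[String]): Unit = {
    val ann     = CurrentEmployee("Ann", LocalDate.of(2010, 3, 1))
    val annLeft = terminate(ann, LocalDate.of(2015, 6, 30))
    println(describe(ann))        // Ann is employed
    println(describe(annLeft))    // Ann left on 2015-06-30
    println(yearsServed(annLeft)) // 5
    // yearsServed(ann) would not compile: a CurrentEmployee has no termination date
  }
}
```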

      • “Perhaps you are thinking of cases where an object has a lifecycle, so at certain stages in the lifecycle, we require a value to be present, and in others, we require the value to be absent?”

        It’s not only lifecycle. The example you gave is an interesting one (though I disagree that something that is null should not be visible). What about the employee’s planned retirement date? It could exist or it could be undetermined for both current and ex-employees.

        I certainly salute your persistence, and to the extent that you can correctly express reality without hiding errors from yourself, that’s great. Ultimately, my point is that avoiding corruption of data is far, far more important than avoiding an error message or even a crash.

      • I think we are in agreement on the main point, i.e. the goal of avoiding invalid state; the best means to achieve this clearly depends enormously on the language!

        Two tiny points, and then I promise I will move on 😉

        In the Employee example, I’m not making a null invisible. There is no null. There is no reference! Any operations attempting to access such a field won’t compile, because it does not exist. This pattern changes the class of the object so that only the fields that are currently meaningful exist.

        The planned retirement date is probably a good use-case for Option, because it is genuinely optional. We can handle it, according to the situation, using one of the Option idioms that either returns a None, throws an exception, logs a warning, etc, etc if we expect a value to be present but it is not.

  6. Null is an unassigned state, waiting for a meaningful reference. For example, declaring a variable as “Object” and assigning any object reference just to avoid null makes no sense. Variables are declared for a purpose; an object meeting the purpose of that variable declaration has to be identified and assigned. Until there is a meaningful purpose to declare a variable, it should not be declared. Similarly, after identifying the purpose and until the correct object is found, the null state is the meaningful state.

    I believe any qualified programmer will only declare a variable when they find a meaningful purpose for it, will make the assignment as soon as a proper object is available, and will then start using the object’s services (data and behaviour).
