Nullable use cases / expected behavior?


Seth

Jan 5, 2015, 9:03:24 PM
to julia...@googlegroups.com
I'm trying to figure out how (and under what circumstances) one would use Nullable. That is, it seems that it might be valuable when you don't know whether the value/object exists (sort of like Python's None, I guess), but then something like "Nullable(3) == 3" returns false, and that sort of messes up how I'm thinking about it.

The code I'd imagine would be useful would be something like

function foo(x::Int, y=Nullable{Int}())  # that is, y defaults to python's "None" but is restricted to Int
    if !isnull(y)
        return x+y  # x + get(y) works, but why must we invoke another method to get the value?
    else
        return 2x
    end
end

I'm left wondering why it wasn't reasonable to allow y to return get(y) if not null, else raise a NullException, and the conclusion I'm coming to is that I don't understand the concept of Nullable yet. Any pointers?
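
For reference, a minimal runnable sketch of the explicit-unwrap version of `foo`. It assumes current Julia, where `Nullable` is no longer part of Base, so `Opt`, `isnull`, and `unwrap` below are hypothetical stand-ins for `Nullable`, `isnull`, and `get`, not the Base definitions.

```julia
# Hypothetical stand-in for the 2015-era Nullable{T}; not the Base type.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # null: `value` is left unset
    Opt{T}(v) where {T} = new{T}(true, v)   # wraps a value
end
Opt(v::T) where {T} = Opt{T}(v)

isnull(o::Opt) = !o.hasvalue
unwrap(o::Opt) = isnull(o) ? error("Opt is null") : o.value

# y defaults to null but can only ever hold an Int.
function foo(x::Int, y::Opt{Int}=Opt{Int}())
    if !isnull(y)
        return x + unwrap(y)   # the wrapped value is extracted explicitly
    else
        return 2x
    end
end
```

Here `foo(2)` takes the null default and returns `2x`, while `foo(2, Opt(3))` unwraps `y` before adding. Both branches return an `Int`.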

ele...@gmail.com

Jan 5, 2015, 10:16:00 PM
to julia...@googlegroups.com
My reasoning for Nullable{T} is that it is type stable.  Taking your example, None and Int would be different type objects, introducing a type instability and potential performance penalty.  But Nullable{T} is always type Nullable{T} and get(Nullable{T}) is always type T.  Allowing Nullable{T} to decay into T would re-introduce the type instability.

Cheers
Lex

Milan Bouchet-Valat

Jan 6, 2015, 4:38:00 AM
to julia...@googlegroups.com
On Monday, January 5, 2015 at 19:16 -0800, ele...@gmail.com wrote:
> My reasoning for Nullable{T} is that it is type stable. Taking your
> example, None and Int would be different type objects, introducing a
> type instability and potential performance penalty. But Nullable{T}
> is always type Nullable{T} and get(Nullable{T}) is always type T.
> Allowing Nullable{T} to decay into T would re-introduce the type
> instability.
Right. But that doesn't mean `Nullable(3) == 3` shouldn't return `true`.
This operation could be allowed, provided that `Nullable{Int}() == 3`
raised a `NullException` or returned `Nullable{Bool}()`.

Regarding the original question:
> On Tuesday, January 6, 2015 12:03:24 PM UTC+10, Seth wrote:
> I'm trying to figure out how (and under what circumstances)
> one would use Nullable. That is, it seems that it might be
> valuable when you don't know whether the value/object exists
> (sort of like Python's None, I guess), but then something like
> "Nullable(3) == 3" returns false, and that sort of messes up
> how I'm thinking about it.
>
>
> The code I'd imagine would be useful would be something like
>
>
> function foo(x::Int, y=Nullable{Int}()) # that is, y defaults to python's "None" but is restricted to Int
>     if !isnull(y)
>         return x+y # x + get(y) works, but why must we invoke another method to get the value?
>     else
>         return 2x
>     end
> end
>
>
> I'm left wondering why it wasn't reasonable to allow y to
> return get(y) if not null, else raise a NullException,
The question is how you define "return". In the strict sense, if you
write `return y`, then `y` must be returned, not `get(y)`, or the Julia
language would really be a mess.

That said, with return type declarations, if `foo()::Int` allowed
stating that `foo()` always returns an `Int`, then `y` could
automatically be converted to an `Int`, raising an exception if it's
`null`. But that merely allows you to type `return y` instead of
`return get(y)`, so not a big deal.

Finally, there's the question of whether `x + y` should be allowed to
mean `x + get(y)`. That's debatable, but I think a more useful behavior
would be to make it equivalent to
`isnull(y) ? Nullable{Int}() : Nullable(x + get(y))`. That would allow
handling the possibility of missingness only when you actually want to
get an `Int` from a `Nullable{Int}`.
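
A minimal sketch of that propagating ("lifted") addition, assuming current Julia, where `Nullable` is no longer part of Base. `Opt`, `isnull`, `unwrap`, and `lifted_add` are hypothetical names used only for illustration.

```julia
# Hypothetical stand-in for the 2015-era Nullable{T}; not the Base type.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # null: `value` is left unset
    Opt{T}(v) where {T} = new{T}(true, v)   # wraps a value
end
Opt(v::T) where {T} = Opt{T}(v)

isnull(o::Opt) = !o.hasvalue
unwrap(o::Opt) = isnull(o) ? error("Opt is null") : o.value

# Lifted addition: the result is always an Opt{Int}, and a null input
# yields a null output instead of an exception.
lifted_add(x::Int, y::Opt{Int}) = isnull(y) ? Opt{Int}() : Opt(x + unwrap(y))

a = lifted_add(1, Opt(2))       # wraps 3
b = lifted_add(1, Opt{Int}())   # null
```

The caller only has to deal with the null case at the point where a plain `Int` is finally needed, by calling `unwrap` (or checking `isnull`) on the result.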

This has been discussed more generally for any function call at
https://github.com/JuliaLang/julia/pull/9446

> and the conclusion I'm coming to is that I don't understand
> the concept of Nullable yet. Any pointers?
>

Regards

ele...@gmail.com

Jan 6, 2015, 7:38:25 AM
to julia...@googlegroups.com


On Tuesday, January 6, 2015 7:38:00 PM UTC+10, Milan Bouchet-Valat wrote:
> On Monday, January 5, 2015 at 19:16 -0800, ele...@gmail.com wrote:
> > My reasoning for Nullable{T} is that it is type stable.  Taking your
> > example, None and Int would be different type objects, introducing a
> > type instability and potential performance penalty.  But Nullable{T}
> > is always type Nullable{T} and get(Nullable{T}) is always type T.
> > Allowing Nullable{T} to decay into T would re-introduce the type
> > instability.
> Right. But that doesn't mean `Nullable(3) == 3` shouldn't return `true`.
> This operation could be allowed, provided that `Nullable{Int}() == 3`
> raised a `NullException` or returned `Nullable{Bool}()`.

Yeah, (==){T}(a::Nullable{T}, b::T) should be able to be defined as !isnull(a) && get(a) == b

Cheers
Lex

Milan Bouchet-Valat

Jan 6, 2015, 7:43:16 AM
to julia...@googlegroups.com
On Tuesday, January 6, 2015 at 04:38 -0800, ele...@gmail.com wrote:
> On Tuesday, January 6, 2015 7:38:00 PM UTC+10, Milan Bouchet-Valat wrote:
> > On Monday, January 5, 2015 at 19:16 -0800, ele...@gmail.com wrote:
> > > My reasoning for Nullable{T} is that it is type stable. Taking your
> > > example, None and Int would be different type objects, introducing a
> > > type instability and potential performance penalty. But Nullable{T}
> > > is always type Nullable{T} and get(Nullable{T}) is always type T.
> > > Allowing Nullable{T} to decay into T would re-introduce the type
> > > instability.
> > Right. But that doesn't mean `Nullable(3) == 3` shouldn't return `true`.
> > This operation could be allowed, provided that `Nullable{Int}() == 3`
> > raised a `NullException` or returned `Nullable{Bool}()`.
>
> Yeah, (==){T}(a::Nullable{T}, b::T) should be able to be defined as
> !isnull(a) && get(a) == b
I'd consider this definition (which is different from the ones I
suggested above) unsafe: if `a` is null, then you silently get
`false`. It would be better to provide additional safety by either
returning a `Nullable` or raising an exception.
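
A minimal sketch contrasting the two behaviors, assuming current Julia, where `Nullable` is no longer part of Base. `Opt`, `isnull`, `unwrap`, `eq_false`, and `eq_throw` are hypothetical names, not existing API.

```julia
# Hypothetical stand-in for the 2015-era Nullable{T}; not the Base type.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # null: `value` is left unset
    Opt{T}(v) where {T} = new{T}(true, v)   # wraps a value
end
Opt(v::T) where {T} = Opt{T}(v)

isnull(o::Opt) = !o.hasvalue
unwrap(o::Opt) = isnull(o) ? error("Opt is null") : o.value

# The definition quoted above: a null left-hand side silently compares unequal.
eq_false(a::Opt{T}, b::T) where {T} = !isnull(a) && unwrap(a) == b

# The exception-raising alternative: unwrapping a null raises an error.
eq_throw(a::Opt{T}, b::T) where {T} = unwrap(a) == b

eq_false(Opt(3), 3)        # true
eq_false(Opt{Int}(), 3)    # false, with no sign that the value was missing
eq_throw(Opt(3), 3)        # true
# eq_throw(Opt{Int}(), 3)  # would raise an error instead of returning false
```

Both variants always return a `Bool` when they return at all, so the disagreement is about what a null operand should mean, not about the return type.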

ele...@gmail.com

Jan 6, 2015, 8:27:08 AM
to julia...@googlegroups.com


On Tuesday, January 6, 2015 10:43:16 PM UTC+10, Milan Bouchet-Valat wrote:
> On Tuesday, January 6, 2015 at 04:38 -0800, ele...@gmail.com wrote:
> > On Tuesday, January 6, 2015 7:38:00 PM UTC+10, Milan Bouchet-Valat wrote:
> > > On Monday, January 5, 2015 at 19:16 -0800, ele...@gmail.com wrote:
> > > > My reasoning for Nullable{T} is that it is type stable.  Taking your
> > > > example, None and Int would be different type objects, introducing a
> > > > type instability and potential performance penalty.  But Nullable{T}
> > > > is always type Nullable{T} and get(Nullable{T}) is always type T.
> > > > Allowing Nullable{T} to decay into T would re-introduce the type
> > > > instability.
> > > Right. But that doesn't mean `Nullable(3) == 3` shouldn't return `true`.
> > > This operation could be allowed, provided that `Nullable{Int}() == 3`
> > > raised a `NullException` or returned `Nullable{Bool}()`.
> >
> > Yeah, (==){T}(a::Nullable{T}, b::T) should be able to be defined as
> > !isnull(a) && get(a) == b
> I'd consider this definition (which is different from the ones I
> suggested above) as unsafe: if `a` is `null`, then you silently get
> `false`. Better provide additional safety by either returning a
> `Nullable`, or raising an exception.


If the Nullable does not have a value then it doesn't equal any value of the type, so the correct answer is false.  If == returns either a Bool or some sort of Nullable, then again it's type unstable, and the result also can't be used directly in an if.  A user who cares about the null case can always call isnull() directly on the original object.

And throwing exceptions is expensive and prevents the test being used in high performance code.  In fact I would consider it rather nasty if something like an equality test could throw.  That means the == function can't be used in any code that is a callback from C if there is any possibility of one of its parameters being a Nullable.

That's not to say that a user can't define their *own* version with either of these characteristics if it suits their use case, but any general case should prefer the type stable, high performance usage.

Cheers
Lex

Seth

Jan 6, 2015, 8:32:07 AM
to julia...@googlegroups.com


On Tuesday, January 6, 2015 4:43:16 AM UTC-8, Milan Bouchet-Valat wrote:

> > Yeah, (==){T}(a::Nullable{T}, b::T) should be able to be defined as
> > !isnull(a) && get(a) == b
> I'd consider this definition (which is different from the ones I
> suggested above) as unsafe: if `a` is `null`, then you silently get
> `false`. Better provide additional safety by either returning a
> `Nullable`, or raising an exception.

But - if "null" is just another legitimate value, why wouldn't it make sense to define "null != a" for all (a != null)? Why must we treat it as some sort of special abstraction? We don't do this with, say, imaginary numbers. (Identity for null is a separate issue).

Tomas Lycken

Jan 6, 2015, 6:11:35 PM
to julia...@googlegroups.com
I think many of the questions raised in this thread can be answered by considering why Nullable{T} was introduced in the first place: to replace NAtype and NA from the DataArrays package. As such, Nullable{T} is supposed to be used more as Milan describes than as a drop-in replacement for Python's None. The idea is rather to have a wrapper type for data that, if it is "missing" (which is what NA signalled), "poisons" all calculations so that they return a missing value instead of the result.

Thus, equality with null should better be defined as

(==){T}(a::Nullable{T}, b::T) = !isnull(a) ? Nullable(get(a) == b) : Nullable{Bool}()

This definition will be type-stable (it will always return a Nullable{Bool}) and it will be able to signal all three possible results: get(a) == b, get(a) != b, and a being null.
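
A minimal sketch of this three-outcome equality, assuming current Julia, where `Nullable` is no longer part of Base. `Opt`, `isnull`, `unwrap`, `lifted_eq`, and `describe` are hypothetical names chosen for illustration.

```julia
# Hypothetical stand-in for the 2015-era Nullable{T}; not the Base type.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # null: `value` is left unset
    Opt{T}(v) where {T} = new{T}(true, v)   # wraps a value
end
Opt(v::T) where {T} = Opt{T}(v)

isnull(o::Opt) = !o.hasvalue
unwrap(o::Opt) = isnull(o) ? error("Opt is null") : o.value

# Always returns an Opt{Bool}: wrapped true, wrapped false, or null.
lifted_eq(a::Opt{T}, b::T) where {T} =
    isnull(a) ? Opt{Bool}() : Opt(unwrap(a) == b)

# The result cannot go straight into an `if`; the caller checks for null first.
describe(r::Opt{Bool}) =
    isnull(r) ? "unknown" : (unwrap(r) ? "equal" : "not equal")

describe(lifted_eq(Opt(3), 3))        # "equal"
describe(lifted_eq(Opt(4), 3))        # "not equal"
describe(lifted_eq(Opt{Int}(), 3))    # "unknown"
```

The return type depends only on the argument types, so the function is type stable, at the cost of an extra null check wherever a plain `Bool` is needed.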

Now, for a sum function, it becomes a little less trivial: how do we treat missing data? On one hand, we could argue that if all values are not known, then the sum is not known either, and we should return null. On the other hand, it might be more useful to return the sum of all non-null values. Either way, we should make sure to do something that is type-stable. Naïve implementations could look like

# if any nulls, return null:
function sum{T}(A::Array{Nullable{T},1})
    s = zero(T)  # accumulator, initialized once before the loop
    @inbounds for i = 1:length(A)
        if isnull(A[i])
            return Nullable{T}()
        else
            s += get(A[i])
        end
    end
    return Nullable(s)
end

# just ignore null values:
# (same signature as above, so define only one of the two at a time)
function sum{T}(A::Array{Nullable{T},1})
    s = zero(T)  # accumulator, initialized once before the loop
    @inbounds for i = 1:length(A)
        if !isnull(A[i])
            s += get(A[i])
        end
    end
    return Nullable(s)
end

If you take a look at the DataArrays package (https://github.com/JuliaStats/DataArrays.jl) you'll find lots of examples of functions like this for NA. You'll also notice that many of them are not type stable, which, as stated above, is the reason the Nullable{T} type was introduced.

// T

ele...@gmail.com

Jan 6, 2015, 7:37:01 PM
to julia...@googlegroups.com
[...]
> This definition will be type-stable (it will always return a Nullable{Bool}) and it will be able to signal all three possible results; get(a) == b, get(a) != b and get(a) == null.

It is then messy to use == in an if when the return value is not a Bool.

In fact you still have to write the isnull() test which is essentially the same code as your definition of == so nothing has been gained by defining ==.

Cheers
Lex

 

[...]

ele...@gmail.com

Jan 6, 2015, 7:40:40 PM
to julia...@googlegroups.com
Oops posted too soon :)


On Wednesday, January 7, 2015 10:37:01 AM UTC+10, ele...@gmail.com wrote:
[...]
> This definition will be type-stable (it will always return a Nullable{Bool}) and it will be able to signal all three possible results; get(a) == b, get(a) != b and get(a) == null.

Forgot to say that Base.== returns Bool, so this now makes == type unstable.

Ivar Nesje

Jan 7, 2015, 4:52:03 AM
to julia...@googlegroups.com
> Forgot to say Base.== returns bool, so this now makes == type unstable.

Type stable (in the context of Julia) means that the return type can be statically inferred from the argument types. This means that + is type stable, even though it returns a Float64 if the arguments are Float64 and Int if the arguments are Int.

+(a::Int, b::Int) would be type unstable if typemax(Int) + 1 would be a BigInt, because then you'd have to look at the value of a and b to figure out what the type of the returned value would be.

This is an easy mistake to make, but the distinction is important.
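
A minimal sketch of the distinction, written for current Julia. `widening_add` is a hypothetical function that behaves the way the BigInt scenario above describes; it is not how `+` works in Base.

```julia
# Type stable: the return type follows from the argument types alone.
typeof(1 + 2)        # Int, because both arguments are Int
typeof(1.0 + 2.0)    # Float64, because both arguments are Float64

# Hypothetical widening add: promotes to BigInt only when the Int sum
# would overflow, so the return type depends on the argument *values*.
function widening_add(a::Int, b::Int)
    overflows = (b > 0 && a > typemax(Int) - b) ||
                (b < 0 && a < typemin(Int) - b)
    return overflows ? big(a) + b : a + b
end

typeof(widening_add(1, 2))              # Int
typeof(widening_add(typemax(Int), 1))   # BigInt, from the same argument types
```

Both calls to `widening_add` pass two `Int` arguments yet produce results of different types, which is exactly what type unstable means here.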

Milan Bouchet-Valat

Jan 7, 2015, 6:36:28 AM
to julia...@googlegroups.com
We're hitting the debate about what John calls [1] ontological
missingness vs. epistemological missingness. You appear to assume that
if a `Nullable` is `null`, then it has no value (ontological
missingness). But another conception (which is more common in
statistics) is that the value exists somewhere, but we don't know it
(epistemological missingness).

In the former situation, you indeed know that `(null != 3) == true`. In
the latter, you have to propagate the uncertainty by saying `(null != 3)
== null`.

> And throwing exceptions is expensive and prevents the test being used
> in high performance code. In fact I would consider it rather nasty if
> something like an equality test could throw. That means the ==
> function can't be used in any code that is a callback from C if there
> is any possibility of one of its parameters being a nullable.
>
> Thats not to say that a user can't define their *own* version with
> either of these characteristics if it suits their use-case, but any
> general case should prefer the type stable high performance usage.
My two proposals higher in the thread are type-stable. And I don't think
raising an exception hurts performance (even `getindex` does it), and
while it can be a problem with C callbacks, that's already the case of
all functions that raise exceptions (including `get(::Nullable)` itself
-- thus any code working with `Nullable` may raise exceptions if not
written carefully).

That said, I agree that always returning a `Nullable{Bool}` may not be
very practical in most contexts, even though it's consistent. The
solutions of throwing an exception or returning `false` are certainly
more practical. But that decision should be taken with regard to the
broader scope of how applying functions to `Nullable` works (discussion
at https://github.com/JuliaLang/julia/pull/9446 )


Regards


1:
https://github.com/JuliaCon/presentations/tree/master/RepresentingData

ele...@gmail.com

Jan 7, 2015, 7:25:04 AM
to julia...@googlegroups.com
Wasn't aware of the statistics conception, I just based it on the terminology used in the documentation, that the value is "not present" or the isnull() tests if it is "missing a value".
 

> In the former situation, you indeed know that `(null != 3) == true`. In
> the latter, you have to propagate the uncertainty by saying `(null != 3)
> == null`.

A Nullable is immutable; its value isn't down the back of the couch (which is my understanding of epistemological missingness, usually applied to the TV remote :). It can never get a value once it's null.
 

> > And throwing exceptions is expensive and prevents the test being used
> > in high performance code.  In fact I would consider it rather nasty if
> > something like an equality test could throw.  That means the ==
> > function can't be used in any code that is a callback from C if there
> > is any possibility of one of its parameters being a nullable.
> >
> > Thats not to say that a user can't define their *own* version with
> > either of these characteristics if it suits their use-case, but any
> > general case should prefer the type stable high performance usage.
> My two proposals higher in the thread are type-stable. And I don't think
> raising an exception hurts performance (even `getindex` does it), and
> while it can be a problem with C callbacks, that's already the case of
> all functions that raise exceptions (including `get(::Nullable)` itself
> -- thus any code working with `Nullable` may raise exceptions if not
> written carefully).

Sorry, I wasn't clear enough: raising exceptions for errors is fine.  But (given my ontological viewpoint) I would consider this use to be flow control, not an error.
 

> That said, I agree that always returning a `Nullable{Bool}` may not be
> very practical in most contexts, even though it's consistent. The
> solutions of throwing an exception or returning `false` are certainly
> more practical. But that decision should be taken with regard to the
> broader scope of how applying functions to `Nullable` works (discussion
> at https://github.com/JuliaLang/julia/pull/9446 )


Well, isequal(Nullable, Nullable) is defined to return true if both are null, false if only one is null, and otherwise the result of isequal on the two values.

So returning false seems consistent with the one null, one not null case.
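
A minimal sketch of those three cases as described above, assuming current Julia, where `Nullable` is no longer part of Base. `Opt`, `isnull`, `unwrap`, and `isequal_opt` are hypothetical names; this follows the description in this message, not the Base source.

```julia
# Hypothetical stand-in for the 2015-era Nullable{T}; not the Base type.
struct Opt{T}
    hasvalue::Bool
    value::T
    Opt{T}() where {T} = new{T}(false)      # null: `value` is left unset
    Opt{T}(v) where {T} = new{T}(true, v)   # wraps a value
end
Opt(v::T) where {T} = Opt{T}(v)

isnull(o::Opt) = !o.hasvalue
unwrap(o::Opt) = isnull(o) ? error("Opt is null") : o.value

function isequal_opt(a::Opt, b::Opt)
    isnull(a) && isnull(b) && return true       # both null
    (isnull(a) || isnull(b)) && return false    # exactly one null
    return isequal(unwrap(a), unwrap(b))        # both hold values
end

isequal_opt(Opt{Int}(), Opt{Int}())   # true
isequal_opt(Opt{Int}(), Opt(3))       # false
isequal_opt(Opt(3), Opt(3))           # true
```

Under this reading, comparing a null wrapper with a plain value is the "exactly one null" case, which is why `false` is the consistent answer from the ontological viewpoint.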

Cheers
Lex

Milan Bouchet-Valat

Jan 7, 2015, 8:09:00 AM
to julia...@googlegroups.com
That's not a technical question (immutable/mutable), but a conceptual
one. If you have missing values in e.g. survey data, it usually means
that the individual has not replied to the question (away, refused to
reply, bug in the collect...). So you cannot say whether the value would
have been 3 or something else.


Regards

Tamas Papp

Jan 7, 2015, 8:19:58 AM
to julia...@googlegroups.com

On Wed, Jan 07 2015, Milan Bouchet-Valat <nali...@club.fr> wrote:

> On Wednesday, January 7, 2015 at 04:25 -0800, ele...@gmail.com wrote:
>> A Nullable is immutable, its value isn't down the back of the couch
>> (which is my understanding of epistemological missingness, usually
>> applied to the TV remote :), it can never get a value once its null.
> That's not a technical question (immutable/mutable), but a conceptual
> one. If you have missing values in e.g. survey data, it usually means
> that the individual has not replied to the question (away, refused to
> reply, bug in the collect...). So you cannot say whether the value would
> have been 3 or something else.

IMO it is very difficult to come up with a set of rules for operations
on missing data that satisfies all users (and uses), mostly because
"epistemological" and "ontological" missingness is sometimes mixed in
the same program/library, occasionally in subtle ways.

When a first best solution is not possible, my preference is for
simplicity, which in this case means having a simple mental model of how
missingness works. If I understand Nullable correctly, there is one
simple rule to grok: "missingness propagates" -- that's it. I find this
appealing, even if I have to work around some corner cases.

My understanding is that R is based on the same principle with respect
to NA, and it seems to work out (and, at the same time, is occasionally
confusing to newbies, but that may be inevitable).

best,

Tamas

Milan Bouchet-Valat

Jan 7, 2015, 8:26:44 AM
to julia...@googlegroups.com
I'd be fine with that, but then other people seem to have a different
idea of how Nullable should behave.


Regards

ele...@gmail.com

Jan 7, 2015, 9:26:22 AM
to julia...@googlegroups.com

> > A Nullable is immutable, its value isn't down the back of the couch
> > (which is my understanding of epistemological missingness, usually
> > applied to the TV remote :), it can never get a value once its null.
> That's not a technical question (immutable/mutable), but a conceptual
> one. If you have missing values in e.g. survey data, it usually means
> that the individual has not replied to the question (away, refused to
> reply, bug in the collect...). So you cannot say whether the value would
> have been 3 or something else.


In this application "no answer" is a value, not an absence of value.  It is an annoying value since it is a different type from 3 being a number, but it is an answer nonetheless.  The NA/None objects provided in other languages (and as I understand were previously provided in Julia) are intended to represent that value and to propagate.

Given that both viewpoints are different but valid, perhaps what has been learnt is that Julia needs both the Null object for representing application out-of-band values for statistics and Nullable{T} for the computer science concept of "no value".

Or just leave (==){T}(Nullable{T}, T) undefined as it is now, so the user can define it as they require.

Cheers
Lex
 

Regards
