When data representations go bad (and typed nulls)

104 views

Skip to first unread message

Jon Harrop

unread,

Nov 30, 2012, 1:50:30 AM11/30/12

to pragmatic-functional...@googlegroups.com

Very interesting Stack Overflow question here about weird behaviour from F#
when .NET constructs a default value of the F# list type:

http://stackoverflow.com/questions/13520720/unchecked-defaultof-on-a-list

Perhaps a more obvious problem is this:

Array.zeroCreate<int []> 3

Which creates an array of three default values of the int array type, i.e.
an array of 3 nulls even though null is not a valid value of the array type
in .NET. For example, attempting to read the length of your default arrays
with give you a null reference exception when length=zero would be
preferable.

I'd like to push an idea that seems to get overlooked a lot. I call it the
"typed null"...

On .NET, types are either value types or reference types. Reference types
can always be null. Null can be an alluring representation to use for things
like the empty array, None or the empty list because it does not incur heap
allocation, might not even incur the write barrier and puts less stress on
the GC (fewer references to follow). For example, creating an array of 100M
is 5x faster than creating an array of empty arrays here. F# will even use
null to represent values of the first argumentless case of a union type if
you ask it to. However, using null directly is problematic because the
absence of a reference means there is no run-time type information so
functionality like pretty printing breaks down:

> None, None;;
val it : 'a option * 'b option = (null, null)

Note how the "None" value of the option type is pretty printed "null" in
this case.

I tried to solve this problem in HLVM by complicating references. In HLVM,
each reference is a quad-word containing the metadata that would normally be
put in the header of a heap-allocated block for a normal garbage collector.
So your null reference still contains a pointer to the run-time type
information so you can interpret it (e.g. HLVM prints the above correctly as
(None, None)). This turned out to have advantages and disadvantages. In
particular, interop with C is trivial because you can refer directly to raw
memory without requiring a header. Surprisingly, performance was not
degraded much. But the main disadvantage was broken memory safety because
reading and writing references is no longer atomic so you can obtain
corrupted references in multithreaded programs. I wrote a portmortem here:

http://flyingfrogblog.blogspot.co.uk/2010/09/lessons-learned-from-hlvm.html

I would still like to investigate the idea of representing references as
pairs of pointers in order to swap them over using a ragged barrier when a
GC cycle is incurred as a quick and easy way to snapshot the global roots
without global synchronization but I digress.

When I came back to .NET I realised that you can actually already create
typed nulls in .NET! The solution is to wrap your reference in a value type
that has methods that know how to interpret the NULL correctly. This also
means the default values will work.

For example:

[<StructuredFormatDisplay("{AsString}")>]
[<Struct>]
type MyOption<'a> =
val hasValue : bool
val value : 'a
new(()) = {hasValue=false; value=Unchecked.defaultof<_>}
new(x) = {hasValue=true; value=x}

member opt.AsString = opt.ToString()

override opt.ToString() =
if opt.hasValue then sprintf "%A" (Some opt.value) else "None"

does this:

> (MyOption<int>(), MyOption<int>());;
val it : MyOption<int> * MyOption<int> = (None, None)

and this:

> Array.zeroCreate<MyOption<int>> 3;;
Real: 00:00:00.000, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0
val it : MyOption<int> [] = [|None; None; None|]

I cannot find it right now but there is a function in the F# standard
library that checks for nulls because they didn't use types nulls. I've seen
it turn up on profiles a lot because it is slow...

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com

Reply all

Reply to author

Forward

0 new messages