Union types

154 views
Skip to first unread message

Cédric Beust ♔

unread,
Oct 30, 2012, 10:25:15 PM10/30/12
to java...@googlegroups.com
Bouncing off another topic mentioned in the latest podcast: union types. Yes, C had them but the first time I was exposed to them was in Pascal, and the discussion on the podcast reminded me of an awesome hack that blew my mind a very long time ago.

Pascal was well known to be very strict and safe. Among other things, it didn't let you access the memory directly, which was a big deal in the 8 bit era where memory protection was a distant dream and PEEK and POKE were how you wrote games.

And then, one day, somebody found a way to address the memory using standard Pascal. Here is the trick (from memory, so it's probably not quite correct):

type
  b = record
        x : array[1..65536] of ^integer;
        y : integer;
      end;

This declares a union type that is made of either an array of 65k pointers to integers or a single integer. Then you initialize this record in its "x" side with the memory address you want to peek and you access it by using the "x" side of the record.

It took me months to understand what this code did, but what a revelation it was when it finally clicked...

By the way, these types are formally known as "sum types" (Either is one of them).

-- 
Cédric


Ricky Clarkson

unread,
Oct 30, 2012, 10:34:12 PM10/30/12
to java...@googlegroups.com
The difference being that your Pascal example has no notion of one
side being the real side, hence it being useful for subverting the
type system, and an Either has that defined and doesn't allow any
subversion, but lets you abstract over it being one thing or the
other. In C it can be used for 'evil' too but a common approach is to
have a tag saying what the type really is.
> --
> You received this message because you are subscribed to the Google Groups
> "Java Posse" group.
> To post to this group, send email to java...@googlegroups.com.
> To unsubscribe from this group, send email to
> javaposse+...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/javaposse?hl=en.

Cédric Beust ♔

unread,
Oct 30, 2012, 10:39:08 PM10/30/12
to java...@googlegroups.com
I hope you're not suggesting that I was contrasting the quality of a piece of code that's forty years old with one written in a modern programming language...

-- 
Cédric

Ricky Clarkson

unread,
Oct 30, 2012, 10:56:22 PM10/30/12
to java...@googlegroups.com
datatype 'a 'b either =
Left of 'a
| Right of 'b

ML, 1973. Pascal is only 3 years older, but much much sillier.

Cédric Beust ♔

unread,
Oct 30, 2012, 11:01:04 PM10/30/12
to java...@googlegroups.com
Again, I'm not sure what you are trying to achieve here. I just offered a funny piece of hackdom from thirty years ago but you seem to want to turn this into a language war?

-- 
Cédric

Ricky Clarkson

unread,
Oct 30, 2012, 11:08:02 PM10/30/12
to java...@googlegroups.com
Sigh. I was just making it clear in the first instance that while
what you said is related to Either, Either is safe. Relating it to a
C union gives some good intuitions but without clarification makes it
sound unsafe.

In the second instance I was hoping to find parameterised algebraic
data types in a language older than Pascal to jokingly refute the idea
that you need a modern language to get that.

I hope we haven't reached a point where you don't understand me if I'm
not directly arguing with you. :)

Casper Bang

unread,
Oct 31, 2012, 3:34:45 AM10/31/12
to java...@googlegroups.com, ced...@beust.com


On Wednesday, October 31, 2012 3:25:40 AM UTC+1, Cédric Beust ♔ wrote:
This declares a union type that is made of either an array of 65k pointers to integers or a single integer. Then you initialize this record in its "x" side with the memory address you want to peek and you access it by using the "x" side of the record.

It took me months to understand what this code did, but what a revelation it was when it finally clicked...
 
I don't get it... :) While I can see a trick in declaring a virtual memory-map overlaying the entire memory space, addressable by integer offset; how does y (and the remaining memory) become useful at all, if it consumes 64K integers? Is the trick that while you can initialize with y, x doesn't actually reserve any memory, yet remaining able to address by x?

Bruce Chapman

unread,
Oct 31, 2012, 4:22:08 AM10/31/12
to java...@googlegroups.com
I did something similar with FORTRAN, putting an array 1 element long in
common area, then finding the address of it with, IIRC, ADR() intrinsic,
then subtracting that from the address I wanted to access, then using
that result as the subscript to access the array (no array bounds
checking) and getting whatever memory location I wanted. All locations
were within my program, I never tried to see if I could access memory
owned by the OS. Circa 1984 on VAX. Used so that the interpreter for my
DSL could access variables in the main program. Not exactly a union
type, but the same effect.

Bruce

Casper Bang

unread,
Oct 31, 2012, 9:01:21 AM10/31/12
to java...@googlegroups.com, ced...@beust.com
Just wanted to point out, that although frowned upon, Java still allows free interpretation/modification of the (typically 4K) memory page under control by the JVM. I.e. if you declare a "uniony type" via an array:

        byte[] a = new byte[] {1, 2, 3, 4, 5, 6, 7, 8};

...and enter the realm of java.util.Unsafe:

        Field field = Unsafe.class.getDeclaredField("theUnsafe"); 
        field.setAccessible(true); 
        Unsafe $ = (Unsafe) field.get(null);

...one may interpret individual bytes without any bounds check:

        long offset = $.arrayBaseOffset(byte[].class);
        out.println( $.getByte(a, offset) ); // 1
        out.println( $.getByte(a, offset + 1) ); // 2
        out.println( $.getByte(a, offset + 2) ); // 3
        out.println( $.getByte(a, offset + 3) ); // 4
        out.println( $.getByte(a, offset + 4) ); // 5
        out.println( $.getByte(a, offset + 5) ); // 6
        out.println( $.getByte(a, offset + 6) ); // 7
        out.println( $.getByte(a, offset + 7) ); // 8

...or as 4 Bahama shorts:

        out.println( $.getShort(a, offset) ); // 513
        out.println( $.getShort(a, offset + 2) ); // 1027
        out.println( $.getShort(a, offset + 4) ); // 1541
        out.println( $.getShort(a, offset + 6) ); // 2055

...or 2 integers:

        out.println( $.getInt(a, offset) ); // 67305985
        out.println( $.getInt(a, offset + 4) ); // 134678021

...or 1 long:

        System.out.println( $.getLong(a, offset) ); // 578437695752307201

So although Sun made it very hard to escape the playpen, it is possible and indeed NIO's DirectByteBuffer etc. takes advantage of this. It's also possible to step beyond a types memory and access memory belonging to another type, although this feels safer to do on the stack than on the heap.



On Wednesday, October 31, 2012 3:25:40 AM UTC+1, Cédric Beust ♔ wrote:

Cédric Beust ♔

unread,
Oct 31, 2012, 11:57:34 AM10/31/12
to Casper Bang, java...@googlegroups.com
On Wed, Oct 31, 2012 at 12:34 AM, Casper Bang <caspe...@gmail.com> wrote:
I don't get it... :) While I can see a trick in declaring a virtual memory-map overlaying the entire memory space, addressable by integer offset; how does y (and the remaining memory) become useful at all, if it consumes 64K integers? Is the trick that while you can initialize with y, x doesn't actually reserve any memory, yet remaining able to address by x?

I am sorry but your maintenance contract for this piece of code expired in 1988. Feel free to contact our sales department to establish a new contract, although you need to be aware that our hourly rates for Pascal code maintenance roughly equals the GDP of a small banana republic.

-- 
Cédric

Casper Bang

unread,
Nov 1, 2012, 7:47:11 AM11/1/12
to java...@googlegroups.com, Casper Bang, ced...@beust.com


On Wednesday, October 31, 2012 4:58:01 PM UTC+1, Cédric Beust ♔ wrote:

I am sorry but your maintenance contract for this piece of code expired in 1988. Feel free to contact our sales department to establish a new contract, although you need to be aware that our hourly rates for Pascal code maintenance roughly equals the GDP of a small banana republic.

Darn, we have already spent our maintenance budget for this year on legacy Java code (well actually, the code was never gotten to, but the consultants figured out which IDE and frameworks were used so we expect to be able to compile soon, certainly no later than Christmas).
Reply all
Reply to author
Forward
0 new messages