Re: enumerated types in combo


Linas Vepstas

May 8, 2012, 12:42:09 AM
to opencog, Nil Geisweiller
A discussion about adding a new ability to combo & moses. I'm rather wishy-washy about whether it's a good idea or not... see below.

> On 4 May 2012 17:08, Linas Vepstas <linasv...@gmail.com> wrote:
>>
>> I've taken some weak steps to add "enumerated types" to combo, but
>> promptly realized there are some hard design choices.   I want to be able to read a
>> datafile that looks like:
>>
>> 5.3,3.7,1.5,0.2,Iris-setosa
>> 5.0,3.3,1.4,0.2,Iris-setosa
>> 7.0,3.2,4.7,1.4,Iris-versicolor
>> 6.4,3.2,4.5,1.5,Iris-versicolor
>>
>> and predict the last column (which may have 3 or more "enumerated" values).

(This is a machine-learning "classification" problem.)

>> Yes, of course I could convert these to ints, but that's not the point,
>> since enumerated values are not ints or contins: they cannot be ordered, etc.


On 7 May 2012 14:15, Nil Geisweiller <ngei...@googlemail.com> wrote:
This is certainly a worthwhile addition! It would make multi-class
classification, for instance, much easier, right?

But you need to add some operators too, right? Like some form of
switch-case, or an enum_if (something that takes a conditional and
returns an enum). Or some equality operator if you want to accept enums
as inputs.

Yes.

Here's the issues that I'm facing:
-- a simple-minded and easy approach would be to write a simple script to replace enums with true-false values. So, if a file had three enum values in it, the script would produce three output files, one where each enum value is marked "true" and the other two "false". Then moses would learn the three files. Simple, dirty, cheap.

-- the fancy way is to try to implement it all in combo/moses. This seems to require a lot of work: besides just adding the I/O support for such tables, I also need new primitives, and it is not at all clear what these should be or what their semantics should be. I can't really think of any good precedents I could copy. Enums are kind of like "multi-valued logic". In my years of functional programming, I can't think of ever having seen anything quite like this.
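The simple-minded pre-processing approach (the first option above) could look roughly like this. It is a minimal C++ sketch, not code from combo or moses: the function name one_vs_rest and the 1/0 labels are illustrative assumptions, and it assumes the class label is the last comma-separated field, as in the Iris rows quoted above. For each distinct label it produces a copy of the table whose last column is 1 for rows of that class and 0 otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split "5.3,3.7,1.5,0.2,Iris-setosa" into
// ("5.3,3.7,1.5,0.2", "Iris-setosa"), i.e. the features and the last field.
static std::pair<std::string, std::string> split_label(const std::string& row) {
    std::size_t comma = row.rfind(',');
    if (comma == std::string::npos) return {"", row};
    return {row.substr(0, comma), row.substr(comma + 1)};
}

// One-vs-rest: for each distinct label, build a boolean table whose last
// column is "1" if the row belongs to that class and "0" otherwise.
std::map<std::string, std::vector<std::string>>
one_vs_rest(const std::vector<std::string>& rows) {
    std::set<std::string> labels;
    for (const auto& r : rows) labels.insert(split_label(r).second);

    std::map<std::string, std::vector<std::string>> tables;
    for (const auto& label : labels) {
        auto& out = tables[label];
        for (const auto& r : rows) {
            auto [features, cls] = split_label(r);
            out.push_back(features + "," + (cls == label ? "1" : "0"));
        }
    }
    return tables;
}
```

Each resulting table would then be a separate boolean learning problem for moses, which is exactly the "three output files" idea.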

To be clear: given a formula with float-pt numbers in it, and float-pt functions, and booleans, and logical ops, how do I add enums to the mix? What primitive would allow me to write formulas which mix enums with bools and floats? A case-statement? Some kind of inverse-of-a-case-statement? ...?

New operators also mean new reduct rules, so this all increases code complexity in many different places. This is one reason why I'm so wishy-washy about it. The other reason is that, as a learning task, it's not obviously "easier to learn", faster or more compact than the simple-minded approach.

In the end, it's not clear that building in an enum primitive is any better than having a pre-processing step. Perhaps the correct design goal is to have a distinct pre-processing step?

--linas

Nil Geisweiller

May 8, 2012, 3:26:13 AM
to linasv...@gmail.com, opencog
> To be clear: given a formula with float-pt numbers in it, and float-pt
> functions, and booleans, and logical ops, how do I add enums to the mix? What
> primitive would allow me to write formulas which mix enums with bools and
> floats?  A case-statement? Some kind of inverse-of-a-case-statement?  ...?
>
> New operators also mean new reduct rules, so this all increases code
> complexity in many different places.  This is one reason why I'm so
> wishy-washy about it.

Indeed, specifically I see something like

1) equality between enums

enum_equal : enum * enum -> bool

2) enum conditional

enum_if : bool -> enum

would be enough (this is basically what you suggest, case-statement
and inverse-of-a-case-statement).

2 problems with the combo language as it is implemented:

1) the type system doesn't have generic types, so we need to use
enum_if instead of a generic if, and enum_equal instead of a generic
equal. Having generic types would simplify the operator set (I don't
think it would have any impact, positive or negative, on learning; it
would just make stuff simpler).

2) The reduct engine operates on operators instead of algebraic
properties; if we had the latter, then adding those operators would be
much less work.

>  The other reason is that, as a learning task, it's not
> obviously "easier to learn", faster or more compact than the
> simple-minded approach.
>
> In the end, it's not clear that building in an enum primitive is any better
> than having a pre-processing step.  Perhaps the correct design goal is to
> have a distinct pre-processing step?

If there is some overlap in computation between the classes, it seems
you would benefit from having an enum type (one learning instance)
rather than several learning instances, right?

Imagine your solution with enums is as follows:

f(x1, ..., xn) = if g(x1, ..., xn) then enum1
                 else if h(x1, ..., xn) then enum2
                 else enum3

I see 2 ways to translate that into several boolean problems:

1)

is_enum1(x1, ..., xn) = g(x1, ..., xn)

is_enum2(x1, ..., xn) = h(x1, ..., xn) and not(g(x1, ..., xn))

is_enum3(x1, ..., xn) = not(h(x1, ..., xn)) and not(g(x1, ..., xn))

You see that's more expensive: you've got to relearn g and h multiple times.

2) Only 2 functions to learn

bit1(x1, ..., xn)
bit2(x1, ..., xn)

then you say that

enum1 = not(bit1) and not(bit2) // 00

enum2 = bit1 and not(bit2) // 01

enum3 = otherwise // 10 or 11

Then what must bit1 and bit2 be?

Sorry, I don't have the time (or the brain power) to carry out that simple
boolean algebra; again, I don't think you'd end up with something as
simple as the enum type solution.

Though you might ask: although the enum type solution is the shortest
one, I've now got boolean + enum vocabulary in the set of evolving
trees; isn't that extra vocabulary gonna make learning harder, to the
point that it's actually equivalent to learning different instances?
Again, I think it's a matter of how much overlap there is in the
expression of the solutions without enums.

Nil

>
> --linas

Nil Geisweiller

May 8, 2012, 4:36:03 AM
to linasv...@gmail.com, opencog
(!b2 and !b1) or (!b2 and b1) = g or (h and !g)

!b2 and (!b1 or b1) = g or (h and !g)

!b2 = g or (h and !g)

!b2 = (g or h) and (g or !g)

!b2 = g or h

b2 = !g and !h

So whatever b1 is, b1 + b2 together still form something more complicated than
the solution involving enums...

Nil

Linas Vepstas

May 8, 2012, 10:48:56 PM
to Nil Geisweiller, opencog
On 8 May 2012 02:26, Nil Geisweiller <ngei...@googlemail.com> wrote:

Indeed, specifically I see something like

1) equality between enum

enum_equal : enum * enum -> bool

Yes, this is exactly one of the primitives.
 
2) enum conditional

enum_if : bool -> enum

Yes. Well, that signature isn't right: it's enum_if: bool, enum, enum -> enum, so that only one of the two enums is chosen.

Alternately, maybe implementing cond would be better, so (cond (predicate1 expression1) (predicate2 expression2) ... else expressionN)

That is, cond is a bunch of nested ifs, while if is just a special case of cond, allowing only one predicate.
 
A general issue is knowing the domain of an enum. I'd like to guess it while reading the table. The domain for one column may be different from that in another column. At any rate, the enums are not known at C++ compile time.
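One way to guess enum domains while reading the table is sketched below in C++. The rule it uses is an assumption for illustration, not taken from combo's table reader: a column counts as an enum column once any of its fields fails to parse as a number, and its domain is the set of distinct labels seen in that column, so different columns naturally get different domains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// True if the whole field parses as a floating-point number.
static bool is_number(const std::string& s) {
    if (s.empty()) return false;
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return *end == '\0';
}

// Maps column index -> set of distinct labels, for every column that
// contains at least one non-numeric field (assumed to be an enum column).
std::map<std::size_t, std::set<std::string>>
infer_enum_domains(const std::vector<std::string>& rows) {
    std::map<std::size_t, std::set<std::string>> seen;  // all values per column
    std::set<std::size_t> enum_cols;                    // columns with a non-number
    for (const auto& row : rows) {
        std::stringstream ss(row);
        std::string field;
        for (std::size_t col = 0; std::getline(ss, field, ','); ++col) {
            seen[col].insert(field);
            if (!is_number(field)) enum_cols.insert(col);
        }
    }
    std::map<std::size_t, std::set<std::string>> domains;
    for (auto col : enum_cols) domains[col] = seen[col];
    return domains;
}
```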

 
2 problems about the combo language as it is implemented

1) the type system doesn't have generic type, so we need to use
enum_if, instead of generic if, and enum_equal instead of generic
equal. Having a generic type would simplify the operator set (I don't
think it would have any impact, positive or negative, on learning, it
would just make stuff simpler).

Hmm. I don't understand. I was thinking of partially implementing a generic equal, using something like "if (tr.child_node()==id::enum_type) {..} else if (tr.child_node()==id::boolean_type) {..}" etc. Wouldn't this work?

Or are you implying that there is some more elegant way of doing type inference and thunking?


2) The reduct engine operates on operators instead of algebraic
properties, if we had the latter then adding those operators would be
much less work.

Yeah, maybe I'll think about this a bit. I don't want to boil the ocean. Ever read any model theory? I suspect it will have to be added to the curriculum -- I've got Wilfrid Hodges' "A Shorter Model Theory"; it's fairly readable.


Imagine your solution with enums is as follows:

OK, you've convinced me.

 --linas

We had a bad lightning storm a few days ago, and it knocked out my computer; I lost the last few days recovering... :-(

Nil Geisweiller

May 9, 2012, 1:14:15 AM
to linasv...@gmail.com, opencog
>> 2) enum conditional
>>
>> enum_if : bool -> enum
>
>
> Yes. Well, that signature isn't right: it's enum_if: bool, enum, enum ->
> enum, so that only one of the two enums is chosen.

Indeed.

>
> Alternately, maybe implementing cond would be better, so (cond (predicate1
> expression1) (predicate2 expression2) ... else expressionN)
>
> That is, cond is a bunch of nested ifs, while if is just a special case of
> cond, allowing only one predicate.

Indeed.

>
> A general issue is knowing the domain of an enum. I'd like to
> guess it while reading the table. The domain for one column may be
> different from that in another column. At any rate, the enums are not known
> at C++ compile time.

Indeed.

>
>
>>
>> 2 problems about the combo language as it is implemented
>>
>> 1) the type system doesn't have generic type, so we need to use
>> enum_if, instead of generic if, and enum_equal instead of generic
>> equal. Having a generic type would simplify the operator set (I don't
>> think it would have any impact, positive or negative, on learning, it
>> would just make stuff simpler).
>
>
> Hmm. I don't understand. I was thinking of partially implementing a generic
> equal, using something like "if (tr.child_node()==id::enum_type) {..} else
> if (tr.child_node()==id::boolean_type) {..}" etc. Wouldn't this work?
>
> Or are you implying that there is some more elegant way of doing type inference
> and thunking?

2 things

1) You *can* implement that as a generic equal; actually it would be
simpler: it would just be 'return eval(child1) == eval(child2)', since
boost.variant already offers operator== for combo tree nodes.

2) But then what should the type of that be, so the type checker can
deal with it? At first I thought you'd need to express that with a
type variable

a * a -> bool (or a -> a -> bool)

but I'm now realizing that's stupid. You do want to compare apples and
oranges. So you could just use an uber type. So the signature would be

union(bool, contin, enum) * union(bool, contin, enum) -> bool
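The uber-type signature maps naturally onto a variant type. This C++17 sketch uses std::variant instead of the boost.variant combo actually uses, to stay self-contained; enum_value is a hypothetical wrapper. std::variant's operator== returns false when the two operands hold different alternatives, so comparing a bool with an enum is well-typed and just yields false, which is the apples-and-oranges behavior the union(bool, contin, enum) signature asks for. In combo the operands would be the evaluated children, as in eval(child1) == eval(child2).

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical enum value: a label such as "Iris-setosa".
struct enum_value {
    std::string label;
    bool operator==(const enum_value& other) const { return label == other.label; }
};

using contin = double;

// The "uber type": union(bool, contin, enum).
using value = std::variant<bool, contin, enum_value>;

// Generic equal : union(bool, contin, enum) * union(bool, contin, enum) -> bool.
// Values holding different alternatives (apples vs. oranges) compare unequal.
bool generic_equal(const value& a, const value& b) { return a == b; }
```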

>
>
>> 2) The reduct engine operates on operators instead of algebraic
>> properties, if we had the latter then adding those operators would be
>> much less work.
>
>
> Yeah, maybe I'll think about this a bit. I don't want to boil the ocean.
>  Ever read any model theory?  I suspect it will have to be added to
> the curriculum -- I've got Wilfrid Hodges' "A Shorter Model Theory"; it's
> fairly readable.
>

Model theory is definitely the answer (or at least most of it). I
don't know about "A Shorter Model Theory".

>
>> Imagine your solution with enums is as follows:
>
>
> OK, you've convinced me.
>
>  --linas
>
> We had a bad lightning storm a few days ago, and it knocked out my computer;
> I lost the last few days recovering... :-(

Once, I got everything fried: my cheap UPS, motherboard and screens.
Now I have everything behind a surge protector, with the biggest UPS
I could find connected to it (UPSes are silent nowadays, even the big
ones).

Nil

Nil Geisweiller

May 9, 2012, 1:33:10 AM
to linasv...@gmail.com, opencog
Regarding cond, yeah, it's a good idea to have that rather than if-then-else.

It would also be nice to finally have the type checker support generic
types (so we only need one cond operator, instead of one for every
type).

Maybe Mandi could work on that? To have lists in combo, and to do
that well, we need generic types supported; otherwise we can't define
the signature of the fold operator.

Nil