Message from discussion enumerated types in combo

From: Linas Vepstas <linasveps...@gmail.com>
Date: Mon, 7 May 2012 23:42:09 -0500
Subject: Re: enumerated types in combo
To: opencog <opencog@googlegroups.com>
Cc: Nil Geisweiller <ngeis...@googlemail.com>

A discussion about adding a new ability to combo & moses.  I'm rather
wishy-washy about whether it's a good idea or not... see below.

> On 4 May 2012 17:08, Linas Vepstas <linasveps...@gmail.com> wrote:
>>
>> I've taken some weak steps to add "enumerated types" to combo, but
>> promptly realized there are some hard design choices.  I want to be
>> able to read a datafile that looks like:
>>
>> 5.3,3.7,1.5,0.2,Iris-setosa
>> 5.0,3.3,1.4,0.2,Iris-setosa
>> 7.0,3.2,4.7,1.4,Iris-versicolor
>> 6.4,3.2,4.5,1.5,Iris-versicolor
>>
>> and predict the last column (which may have 3 or more "enumerated"
>> values).

(this is a machine-learning "classification" problem.)

>> Yes, of course I could convert these to ints, but that's not the point,
>> since enumerated values are not ints, or contins, cannot be ordered, etc.


On 7 May 2012 14:15, Nil Geisweiller <ngeis...@googlemail.com> wrote:
>
> This is certainly a worthwhile addition! It would make multi-class
> classification for instance much easier, right?
>
> But you need to add some operators too, right? Like some form of
> switch case, or an enum_if (something that takes a conditional and
> returns an enum). Or some equality operator if you want to accept enums
> as inputs.


Yes.

Here are the issues that I'm facing:
-- a simple-minded, easy approach would be to write a simple script to
replace enums with true-false values. So, if a file had three enum values
in it, the script would produce three output files; in each one, a single
enum value is marked "true" and the other two "false".  Then moses would
learn each of the three files separately.  Simple, dirt cheap.

-- the fancy way is to try to implement it all in combo/moses. This seems to
require a lot of work: besides just adding the I/O support for such tables,
I also need new primitives, and it is not at all clear what these should
be, or what their semantics should be.  I can't really think of any good
precedents I could copy. Enums are kind-of-like "multi-valued logic".  In
my years of functional programming, I can't think of ever having seen
anything quite like this.

To be clear: given a formula with float-pt numbers in it, and float-pt
functions, and booleans, and logical ops, how do I add enums to the mix? What
primitive would allow me to write formulas which mix enums with bools and
floats?  A case-statement? Some kind of inverse-of-a-case-statement?  ...?

New operators also mean new reduct rules, so this all increases code
complexity in many different places.  This is one reason why I'm so
wishy-washy about it.  The other reason is that, as a learning task, it's
not obviously "easier to learn" or faster or more compact than the
simple-minded approach.

In the end, it's not clear that building in an enum primitive is any better
than having a pre-processing step.  Perhaps the correct design goal is to
have a distinct pre-processing step?
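
For reference, a minimal sketch of what such a pre-processing step could
look like, in Python. This is not part of moses; the function name
one_vs_rest_tables is made up for illustration, and it assumes the enum is
in the last column of a comma-separated file. It builds one table per enum
value, with that value marked "true" and all others "false".

```python
import csv
import io


def one_vs_rest_tables(rows, label_col=-1):
    """Map each enum value to a copy of the table with a true/false label."""
    labels = sorted({row[label_col] for row in rows})
    tables = {}
    for label in labels:
        table = []
        for row in rows:
            new_row = list(row)  # copy, so the input rows stay untouched
            new_row[label_col] = "true" if row[label_col] == label else "false"
            table.append(new_row)
        tables[label] = table
    return tables


SAMPLE = "\n".join([
    "5.3,3.7,1.5,0.2,Iris-setosa",
    "5.0,3.3,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
    "6.4,3.2,4.5,1.5,Iris-versicolor",
])

# Skip empty rows so a trailing blank line in the file does no harm.
rows = [row for row in csv.reader(io.StringIO(SAMPLE)) if row]
tables = one_vs_rest_tables(rows)

for label, table in tables.items():
    print(label)
    for row in table:
        print("  " + ",".join(row))
```

Each table could then be written to its own file with csv.writer and
handed to moses as an ordinary boolean-output problem.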

--linas
