[thebeast] r734 committed - more work on StarAI paper

codesite...@google.com

Mar 23, 2010, 1:38:20 AM
to thebeas...@googlegroups.com
Revision: 734
Author: sebastian.riedel
Date: Mon Mar 22 22:37:56 2010
Log: more work on StarAI paper
http://code.google.com/p/thebeast/source/detail?r=734

Modified:
/branches/thefuture-modules/thebeast-core/src/main/lyx/starai.lyx

/branches/thefuture-modules/thebeast-core/src/main/scala/org/riedelcastro/thebeast/env/doubles/LogLinear.scala

=======================================
--- /branches/thefuture-modules/thebeast-core/src/main/lyx/starai.lyx Sun Mar 21 22:42:09 2010
+++ /branches/thefuture-modules/thebeast-core/src/main/lyx/starai.lyx Mon Mar 22 22:37:56 2010
@@ -202,7 +202,7 @@
\end_layout

\begin_layout Section
-Domains, Worlds, Terms
+Domains, Variables, Worlds
\end_layout

\begin_layout Standard
@@ -219,39 +219,63 @@

\end_inset

- provides are several abstract datatypes that we will describe using traits
- in Scala.
- A
+ provides abstract datatypes/interfaces for building blocks, as well
+ as a set of initial constructs.
+ We will describe them using classes and traits of Scala, a functional object-oriented programming language.
+
+\end_layout
+
+\begin_layout Standard
+A
\family typewriter
-Domain
+Domain[T]
\family default
- contains the objects we want to talk about; A (possible)
+ contains (Scala) objects we want to talk about; implementations of Domain
+ usually need to provide an iterator over their objects, as well as a
\family typewriter
-World
+contains
\family default
-, which describes how objects relate to each other; and a
+ method to indicate Domain membership.
+ Note that a Domain can be infinite, and also be used when no enumeration
+ of its objects is possible.
+
+\end_layout
+
+\begin_layout Standard
+The three core types of domains are
\family typewriter
-Term
+Values, Tuples,
\family default
-, which are symbols that evaluate to objects in the domain given a world.
- Terms have an important sub-class: all terms that evaluate to real values.
- We will call such a term a
+and
\family typewriter
-Factor
+Functions
\family default
-, and will soon see how they relate to the well known concept of factor
- graphs.
- Default Building Blocks: FunApp, Variable, Constant.
+.
+ The first simply represents a user-defined set of objects of type T; the
+ last is the set of all functions from a domain to a target.

\end_layout

-\begin_layout Standard
-(possible) worlds, and terms.
-
+\begin_layout LyX-Code
+val Tokens = Values(0,1,2,3,4,5,...)
\end_layout

+\begin_layout LyX-Code
+val Bools = Values(false,true)
+\end_layout
+
+\begin_layout LyX-Code
+val Parses = (Tokens x Labels) -> Bools
+\end_layout
+
\begin_layout Standard
-A domain in
+Ultimately we want to reason about the objects of our domains, using generic
+ knowledge independent of their actual identity.
+ To do so we need placeholders that allow us to speak about objects in an
+ abstract fashion.
+ This is generally achieved by variables.
+ In
\begin_inset ERT
status open

@@ -264,54 +288,47 @@

\end_inset

- is a (finite) collection of values of a certain scala runtime type
+ a variable is represented by objects of the trait
\family typewriter
-T
-\family default
-, and is implemented through objects of class
-\family typewriter
-Domain[T]
+Var[T]
\family default
.
- dfacto provides several
-\family typewriter
-Domain
-\family default
- subclasses that can be used to creates different types of domains.
- The simplest class is
-\family typewriter
-Values[T] (arg[T]:*) extends Domain[T]
-\family default
- , which allows the user to explicitely define the collection of values
- in the domain:
+ Each variable has a name and a domain that specifies which values the variable
+ can possibly take on.
+ Variables can be simple (referring to simple objects) or complex (referring
+ to functions):
\end_layout

\begin_layout LyX-Code
-val DepLabels = Values('SUBJ,'OBJ,'DET)
-\end_layout
-
-\begin_layout LyX-Code
-val Persons = Values(
+val root = Var(
\begin_inset Quotes eld
\end_inset

-Anna
+root
\begin_inset Quotes erd
\end_inset

-,
-\begin_inset Quotes erd
+, Tokens)
+\end_layout
+
+\begin_layout LyX-Code
+val parse = Var(
+\begin_inset Quotes eld
\end_inset

-Peter
+parse
\begin_inset Quotes erd
\end_inset

-,...)
+, Parses)
\end_layout

\begin_layout Standard
-In
+One can understand each variable as a question about the world we seek to
+ model.
+ In this sense each assignment to our variables refers to a possible state
+ of the world.
+ In
\begin_inset ERT
status open

@@ -324,92 +341,51 @@

\end_inset

- Variables are instances of the class
-\end_layout
-
-\begin_layout LyX-Code
-
+ we hence refer to assignments using the
\family typewriter
-Var[T] (name:String, domain:Domain[T])
-\end_layout
-
-\begin_layout LyX-Code
-
+World
+\family default
+ trait.
+ The core method a
\family typewriter
- extends Term[T]
-\end_layout
-
-\begin_layout Standard
-that is, they are named placeholders constrained by a
+World
+\family default
+ provides is
\family typewriter
-domain
+resolveVar
\family default
- (and type parameter
+: it returns the object the variable is assigned to, or
\family typewriter
-T
+None
\family default
-).
- For now we ask the reader overlook the superclass Term[T] and bear with
- us until section
-\begin_inset CommandInset ref
-LatexCommand ref
-reference "sec:Terms"
-
-\end_inset
-
-.
- Note that often we can use scala identifiers to refer to variables, and
- hence the name can be eft out if needed.
-
+ if no such object exists (for partial worlds).
+ To construct worlds we can use
+\family typewriter
+MutableWorld
+\family default
+ objects:
\end_layout

-\begin_layout Subsection
-Variables
-\end_layout
-
-\begin_layout Standard
-Typical simple type variables are
-\end_layout
-
\begin_layout LyX-Code
-val height = Var(Doubles(0,230.0))
+val world = new MutableWorld
\end_layout

\begin_layout LyX-Code
-val person = Var(Persons)
+world(root) = 0
\end_layout

\begin_layout LyX-Code
-val pair = Var(Persons x Persons)
+world(parse) = Map((0,2)->true,(0,4)->true)
\end_layout

-\begin_layout Standard
-Variables are also used to represent the notion of predicate as used in
- Markov Logic.
- A predicate is simply a variable that has a
-\family typewriter
-FunctionDomain
-\family default
- as type for which the range are
-\family typewriter
-Bools
-\family default
-.
- A simple example would be
-\end_layout
-
\begin_layout LyX-Code
-val dependency = Var(Tokens x Tokens -> Bools)
+world.close(parse)
\end_layout

-\begin_layout LyX-Code
-val friends = Var(Persons x Persons -> Bools)
+\begin_layout Section
+Terms and Factors
\end_layout

-\begin_layout Subsection
-Terms
-\end_layout
-
\begin_layout Standard
Intuitively, a term is a symbolic expression that is, given a possible world,
evaluated to a value.
@@ -426,93 +402,83 @@

\end_inset

- a term is an instance of a term class, which is a subclass of the class
-
+ a term is an instance of a
\family typewriter
Term
\family default
-.
- A term can have further
-\emph on
-subterms
-\emph default
- and
-\emph on
-internal data
-\emph default
-, and specifies how the internal data and evaluation results for subterms
- are combined to a value for the term itself.
- Note that in contrast to a Model in Figaro, a term has no elements of randomness.
- Finally, a term is parametrized by the class of values it evaluates to---in
- scala this amounts to a class
+ trait that has to implement an
\family typewriter
-Term[T]
+evaluate(world)
\family default
- for value type
+ method that defines how the term maps worlds to values.
+ It also needs to provide a list of
\family typewriter
-T
+Var
\family default
-.
+ objects that it depends on.
+
\end_layout

\begin_layout Standard
-A quintessential term is
+The core built-in Term classes are
+\end_layout
+
+\begin_layout Enumerate
+
\family typewriter
-Variable[T] (name:String) extends Term[T]
+Var
\family default
-, which is evaluated to the value the variable is assigned to in the possible
- world.
- Another core term class is
+: evaluates to the value it is assigned to
+\end_layout
+
+\begin_layout Enumerate
+
\family typewriter
-Constant[T](value:T) extends Term[T]
+Constant()
\family default
-, which is always evaluated to
+: evaluates to the given constant
+\end_layout
+
+\begin_layout Enumerate
+
\family typewriter
-value
+FunApp(f,a)
\family default
-, regardless of the given possible world.
- Note that scala's implicit conversion feature allows us to write
+: evaluates to the result of applying f.evaluate to a.evaluate
+\end_layout
+
+\begin_layout Enumerate
+
\family typewriter
-value
+Quantification(o,v,f)
\family default
- instead of
-\family typewriter
-Constant(value)
-\family default
- in contexts where terms are expected.
-
+: ...
\end_layout

\begin_layout Standard
-\begin_inset ERT
-status open
-
-\begin_layout Plain Layout
-
-
-\backslash
-lang{}
+It should be clear that the above terms can be used to create a wide array
+ of composite terms.
+ For example...
+\end_layout
+
+\begin_layout Standard
+Terms are the core building blocks we use to construct probability distributions
+ over possible worlds.
+ In our framework, a probability distribution is nothing more than a term
+ that evaluates to real values between 0 and 1 that sum up to 1 for all
+ possible assignments of its contained variables.
+ Once your term fulfills this contract, we can apply basic brute-force inference
+ algorithms to calculate expectations and find most likely assignments.
\end_layout

-\end_inset
-
- supports functional composition through the term class
-\family typewriter
-FunApp[T,R] (f:Term[T=>R], arg:T) extends Term[R]
-\family default
-.
- This term is evaluated by evaluating the function term
-\family typewriter
-f
-\family default
-, the argument term
-\family typewriter
-arg
-\family default
-, and then applying the function value to the argument value.
- Note that this class allows us to incorporate arbitrary native scala functions
- into
+\begin_layout Standard
+Obviously any exhaustive inference scheme becomes intractable when the number
+ of variables in a term is large.
+ However, inference is generally more tractable if terms factor into a
product
+ of sub-terms.
+ In this case we can apply methods such as Belief Propagation that avoid
+ summing over all possible worlds.
+ In
\begin_inset ERT
status open

@@ -525,67 +491,16 @@

\end_inset

-: for a given function
+ such terms are instances of the
\family typewriter
-fun:T=>R
+Factorizable
\family default
- we can use the term
+ trait, and need to provide a
\family typewriter
-FunApp(Constant(fun),x)
+factorize
\family default
-to represent the application of this function to the value that
-\family typewriter
-x
-\family default
- evaluates to.
-
-\family typewriter
-FunApp
-\family default
- is hence very similar to
-\family typewriter
-Apply1
-\family default
- in Figaro.
-\begin_inset Foot
-status collapsed
-
-\begin_layout Plain Layout
-One diference is the fact that we allow the function to be a term as well.
-
-\end_layout
-
-\end_inset
-
-
-\end_layout
-
-\begin_layout Standard
-Finally, we will often make use of an
-\family typewriter
-IversonBracket (arg:Term[Boolean]) extends Term[Double]
-\family default
- class.
- This term evaluates the booelan
-\family typewriter
-arg
-\family default
- term, and evaluates to 1 if
-\family typewriter
-arg
-\family default
- evaluated to
-\family typewriter
-true
-\family default
-, and to 0 otherwise.
- This term is the cornerstone of Markov Logic---it provides the mapping
- from boolean expressions to real values that sits in each ground feature.
- Again, instead of fully writing out this term we accept
-\family typewriter
-$(arg)
-\family default
-$.
+ method that returns all terms the term factors into.
+
\end_layout

\begin_layout Section
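Editorial note: the revised text above describes a probability distribution as a real-valued term whose values sum to 1 over all assignments of its variables, and mentions brute-force inference over such terms. The following is a minimal sketch of that idea. Every name in it (`TermSketch`, the simplified `Var`, `World`, `Term`, `allWorlds`, `total`, `argmax`) is a hypothetical stand-in written for illustration and is not the thebeast API; in particular, domains are assumed to be finite sequences here.

```scala
// Simplified stand-ins for illustration only; these are NOT the thebeast classes.
object TermSketch {
  // A named variable with a finite, enumerable domain (assumption).
  final case class Var[T](name: String, domain: Seq[T])

  // A (possibly partial) world: an assignment of values to variables.
  final class World(assignment: Map[Var[_], Any]) {
    // Returns the assigned value, or None if the variable is unassigned.
    def resolveVar[T](v: Var[T]): Option[T] =
      assignment.get(v).map(_.asInstanceOf[T])
  }

  // A term maps worlds to values and lists the variables it depends on.
  trait Term[T] {
    def evaluate(world: World): T
    def variables: Seq[Var[_]]
  }

  // Enumerates every complete assignment to the given variables.
  def allWorlds(vars: Seq[Var[_]]): Seq[World] =
    vars
      .foldLeft(Seq(Map.empty[Var[_], Any])) { (partial, v) =>
        for (m <- partial; value <- v.domain) yield m + (v -> value)
      }
      .map(new World(_))

  // Brute force: sums a real-valued term over all possible worlds.
  def total(term: Term[Double]): Double =
    allWorlds(term.variables).map(term.evaluate).sum

  // Brute force: returns an assignment with the highest value.
  def argmax(term: Term[Double]): World =
    allWorlds(term.variables).maxBy(term.evaluate)

  // Example: two binary variables whose joint score is a product of two parts.
  val a: Var[Boolean] = Var("a", Seq(true, false))
  val b: Var[Boolean] = Var("b", Seq(true, false))

  val joint: Term[Double] = new Term[Double] {
    def evaluate(world: World): Double = {
      val pa = if (world.resolveVar(a).contains(true)) 0.75 else 0.25
      val pb = if (world.resolveVar(b).contains(true)) 0.125 else 0.875
      pa * pb
    }
    def variables: Seq[Var[_]] = Seq(a, b)
  }
}
```

Under these assumptions, `TermSketch.total(TermSketch.joint)` checks the "sums to 1" contract by enumeration, and `TermSketch.argmax` finds a most likely assignment the same way. Both enumerate every world, which is the cost that factorization is meant to avoid.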
=======================================
--- /branches/thefuture-modules/thebeast-core/src/main/scala/org/riedelcastro/thebeast/env/doubles/LogLinear.scala Thu Nov 12 23:36:25 2009
+++ /branches/thefuture-modules/thebeast-core/src/main/scala/org/riedelcastro/thebeast/env/doubles/LogLinear.scala Mon Mar 22 22:37:56 2010
@@ -5,12 +5,12 @@
import org.riedelcastro.thebeast.solve.ExhaustiveMarginalInference
import org.riedelcastro.thebeast.env._

+
/**
*/
-case class LogLinear(sufficient:VectorTerm, weights:VectorVar, bias:DoubleTerm)
- extends Exp(Sum(Seq(VectorDotApp(sufficient,weights),bias))) {
-
- def marginalizeLogLinear(incoming:Beliefs[Any,EnvVar[Any]], weightsValue:Vector) : Beliefs[Any,EnvVar[Any]] = {
+case class LogLinear(sufficient: VectorTerm, weights: VectorVar, bias: DoubleTerm)
+ extends Exp(Sum(Seq(VectorDotApp(sufficient, weights), bias))) {
+ def marginalizeLogLinear(incoming: Beliefs[Any, EnvVar[Any]], weightsValue: Vector): Beliefs[Any, EnvVar[Any]] = {
//default implementation
val env = new MutableEnv
//set weight variables in environment
@@ -18,9 +18,34 @@
//create the grounded term (that doesn't have weight variables)
val grounded = ground(env)
//exhaustive inference
- ExhaustiveMarginalInference.marginalizeQueries(grounded,incoming,Set(sufficient))
+ ExhaustiveMarginalInference.marginalizeQueries(grounded, incoming, Set(sufficient))
}

}

-class Weights extends HashMap[VectorVar,Vector]
+/**
+ * A Featurized term is a term that deterministically depends on
+ * the value of a feature-vector * weight dot product. Very close
+ * to general linear models, but does not require normalization. Note that
+ * the weight vector must be the result of grounding a vector variable, and
+ * that the term needs to provide this variable on request.
+ */
+trait Featurized extends DoubleTerm {
+ /**
+ * The feature vector for the given world/env.
+ */
+ def features(env: Env): Vector
+
+ /**
+ * The means/expectations of features given some beliefs for the
+ * free variable in the term, and assuming that these beliefs are independent.
+ */
+ def means(incoming: Beliefs[Any, EnvVar[Any]]): Vector
+
+ /**
+ * The original weight vector variable that was grounded to produce the weight vector.
+ */
+ def weights: VectorVar
+}
+
+class Weights extends HashMap[VectorVar, Vector]
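
Editorial note: the `Featurized` trait added in this revision exposes a feature vector for a given environment and the expected features under independent beliefs. The sketch below illustrates that contract with simplified, self-contained types. `FeaturizedSketch`, `Vec`, `Env`, `Beliefs`, `dot`, `score` and `IndicatorLogLinear` are all hypothetical names introduced here; they do not reproduce the signatures of `Vector`, `Env`, `Beliefs` or `VectorVar` in thebeast, and the weights are passed in directly rather than obtained by grounding a variable.

```scala
// Illustrative stand-ins only; NOT the thebeast types or signatures.
object FeaturizedSketch {
  type Vec = Map[String, Double]     // sparse vector keyed by feature name (assumption)
  type Env = Map[String, Boolean]    // assignment of boolean variables (assumption)
  type Beliefs = Map[String, Double] // P(variable = true), treated as independent

  // Sparse dot product; keys missing from b contribute zero.
  def dot(a: Vec, b: Vec): Double =
    a.iterator.map { case (k, x) => x * b.getOrElse(k, 0.0) }.sum

  // Simplified counterpart of the Featurized contract described in the diff.
  trait Featurized {
    def features(env: Env): Vec
    def means(incoming: Beliefs): Vec
    def score(env: Env, weights: Vec): Double
  }

  // One indicator feature per variable; the score is exp(features . weights + bias).
  final class IndicatorLogLinear(vars: Seq[String], bias: Double) extends Featurized {
    def features(env: Env): Vec =
      vars.map(v => v -> (if (env.getOrElse(v, false)) 1.0 else 0.0)).toMap

    // The expectation of an indicator feature is the belief that its variable is true.
    def means(incoming: Beliefs): Vec =
      vars.map(v => v -> incoming.getOrElse(v, 0.0)).toMap

    def score(env: Env, weights: Vec): Double =
      math.exp(dot(features(env), weights) + bias)
  }
}
```

In this sketch the score is left unnormalized, in line with the doc comment's remark that a featurized term does not require normalization.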
