Random Questions


brandon willard
Jul 9, 2012, 6:34:19 PM
to cognitiv...@googlegroups.com
Hello,

I've been using the Foundry in a project of mine involving object tracking, particularly because of the Bayesian interfaces and conjugate classes.  Over the time I've been using it, I've gathered some questions:
  • Why doesn't the DefaultDataDistribution offer, or have, its internal weights in log scale?
  • MTJ offers a symmetric positive definite matrix class.  How best could that be introduced into the framework?
    • I've re-written parts of the Kalman filter to use this class and the results are much better.
  • I've noticed that the KalmanFilter class uses inverses directly.  I was curious as to why. 
  • For index-based discrete distributions, is there a convenient class/utility to map objects to these indices (especially since some are vectors)?
I must say that I'm very excited to see a project like this, and I see a real need for classes and designs of this nature, so thanks for putting it out there.  Also, if you have made, or plan to make, this project open to outside contribution, I would be more than willing to help.


Thanks,

Brandon

Dixon, Kevin R
Jul 11, 2012, 11:03:44 AM
to cognitiv...@googlegroups.com, brandon...@gmail.com, Justin Basilico
Hi Brandon,

I'm happy to hear you're finding the Foundry useful!  Let me see if I can try to answer/evade your questions:
  • Why doesn't the DefaultDataDistribution offer, or have, its internal weights in log scale?
It should... that's something I've been meaning to add for a while now... I'll try to start it this week.


  • MTJ offers a symmetric positive definite matrix class.  How best could that be introduced into the framework?
    • I've re-written parts of the Kalman filter to use this class and the results are much better.
We constantly struggle to balance ease of use against computational efficiency... If more people want non-negative factorizations, etc., then we will have to put them in there.  Would you mind sending me your implementation?


  • I've noticed that the KalmanFilter class uses inverses directly.  I was curious as to why. 
Laziness/abstraction.  On a philosophical note, we tend to come down on the side that the Matrix.inverse() method should "do the right thing".  That said, the version of the Kalman filter that works in inverse space is probably the way to go (in the same way that the quasi-Newton algorithms estimate inverses directly).  Would you mind sending me your implementation, and I will make the commits on our side?
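For readers following along: the inverse-free alternative usually means replacing A.inverse().times(b) with a direct solve of A x = b, which is both cheaper and better conditioned. A toy sketch (a hand-rolled 2x2 Cramer's rule, independent of the MTJ/Foundry matrix API; at realistic sizes the SPD covariance matrices in a Kalman filter would instead go through a Cholesky factorization):

```java
public class SolveDemo {
    // Solves the 2x2 system A x = b by Cramer's rule -- no explicit inverse.
    // Forming A^{-1} and multiplying does strictly more work and amplifies
    // rounding error compared to solving the system directly.
    public static double[] solve2x2(double[][] a, double[] b) {
        double det = a[0][0] * a[1][1] - a[0][1] * a[1][0];
        return new double[] {
            (b[0] * a[1][1] - b[1] * a[0][1]) / det,
            (b[1] * a[0][0] - b[0] * a[1][0]) / det
        };
    }

    public static void main(String[] args) {
        // A symmetric positive definite matrix, as in a Kalman covariance
        double[][] a = {{4.0, 1.0}, {1.0, 3.0}};
        double[] x = solve2x2(a, new double[] {1.0, 2.0});
        // To validate, check the residual A x - b rather than forming A^{-1}
        System.out.println(x[0] + ", " + x[1]);
    }
}
```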


> Also, if you have, or plan to, make this project open to outside contribution, I would be more than willing to help.

That would be awesome... But the way our lawyers think, it would involve an insane amount of paperwork... I think they'd ask you to sign away your rights to all your code/kidneys/first born.  (I'm exaggerating, but not by much :-)

We have taken quite a few external contributions at this point, but I don't see any way of accepting contributions that is easier than email... Sorry about that.

If you wouldn't mind sending me your improvements, I would be happy to include them in our next release (if they pass our unit tests etc.).

Also, if you've got any more ideas/suggestions/rants, we'd love to hear them so that we can keep improving the Foundry.

Thanks again,
Kevin

--
Kevin R. Dixon
Sandia National Laboratories (05635)
MS0621, TA-I: 324/133
tel: (505) 284-5615
fax: (505) 284-3258


From: cognitiv...@googlegroups.com [cognitiv...@googlegroups.com] on behalf of brandon willard [brandon...@gmail.com]
Sent: Monday, July 09, 2012 4:34 PM
To: cognitiv...@googlegroups.com
Subject: [EXTERNAL] [Cognitive Foundry] Random Questions

Justin Basilico
Jul 11, 2012, 11:17:49 PM
to brandon...@gmail.com, Dixon, Kevin R, cognitiv...@googlegroups.com
Hi Brandon,

Always nice to hear someone is finding the Foundry useful. Your project looks interesting. Let us know how it goes.

DefaultDataDistribution doesn't keep its internal weights in log scale because it was originally developed for accumulating data via lots of calls to increment, like counting values to build simple histograms. Since adding numbers in log space is expensive (as opposed to multiplying them, which is cheap), it doesn't make sense to store the weights in log space by default for that use case. That said, it probably does make sense to have another implementation of the interfaces that does keep its data in log space (and has a method to set the log value directly), especially for use cases that are not doing lots of incrementing but instead keeping small probabilities. The LogMath or LogNumber classes may be helpful in providing such an implementation. We could call it LogDataDistribution or something. : )
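For readers unfamiliar with the trade-off Justin describes: incrementing a count in linear space is a single addition, while the same update in log space requires a log-sum-exp. A minimal sketch in plain Java (the logAdd helper here is hypothetical, not the Foundry's LogMath API):

```java
public class LogSpaceDemo {
    // Computes log(exp(logX) + exp(logY)) without leaving log space.
    // Factoring out the larger term keeps exp() from under/overflowing.
    public static double logAdd(double logX, double logY) {
        if (logX == Double.NEGATIVE_INFINITY) return logY;  // adding zero
        if (logY == Double.NEGATIVE_INFINITY) return logX;
        double max = Math.max(logX, logY);
        double min = Math.min(logX, logY);
        // log1p preserves precision when exp(min - max) is tiny
        return max + Math.log1p(Math.exp(min - max));
    }

    public static void main(String[] args) {
        // Probabilities this small would underflow to 0.0 in linear space,
        // but their log-space sum is exact: log(2) - 800
        System.out.println(logAdd(-800.0, -800.0));
    }
}
```

The extra exp/log1p per update is why a count-accumulating histogram is better off in linear space, while a distribution over tiny probabilities is better off in log space.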

I also think that adding a non-negative matrix factorization could be useful. As Kevin notes, I think we limited the number of classes in MTJ that we wrapped for simplicity, but NNMF is pretty widely used, so we probably want to put it in at some point.

I'm not exactly sure what you mean by "index-based discrete distributions"; can you explain? If this is what you mean: I have been considering adding an indexer class that would do bi-directional mapping between objects and indices, since that is something that seems to come up in several applications, like mapping data to dimensions of a vector. I have a partial implementation of this that I'll work on cleaning up and getting in there. This could also be a good bridge between InfiniteVectors and Vectors.
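The bi-directional indexer described above can be sketched in a few lines (a hypothetical Indexer class, not the partial Foundry implementation Justin mentions):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Maps arbitrary objects to dense integer indices and back again,
// e.g. for assigning data values to dimensions of a Vector.
public class Indexer<T> {
    private final Map<T, Integer> toIndex = new HashMap<>();
    private final List<T> toObject = new ArrayList<>();

    // Returns the existing index for value, assigning the next one if unseen.
    public int getOrAdd(T value) {
        Integer index = toIndex.get(value);
        if (index == null) {
            index = toObject.size();
            toIndex.put(value, index);
            toObject.add(value);
        }
        return index;
    }

    public T get(int index) { return toObject.get(index); }
    public int size() { return toObject.size(); }

    public static void main(String[] args) {
        Indexer<String> indexer = new Indexer<>();
        // Repeated values keep their original index
        System.out.println(indexer.getOrAdd("cat"));
        System.out.println(indexer.getOrAdd("dog"));
        System.out.println(indexer.getOrAdd("cat"));
    }
}
```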

I also agree that we should think more about how to open up to accept more external contributions (in a way that the lawyers are happy, of course), since clearly people are interested in participating. We set up the cognitivefoundry.org site as a place to try and help build a community, but suggestions for how to make that work better are welcome.

Also, it has been a while since the last release (November); it is way past time to get another release out there.

Keep the questions and suggestions coming.

Thanks, : )
Justin

Dixon, Kevin R
Jul 12, 2012, 12:53:40 AM
to Justin Basilico, brandon...@gmail.com, cognitiv...@googlegroups.com
> The LogMath or LogNumber classes may be helpful in providing such an implementation. We could call it LogDataDistribution or something. : ) 

Muahahaha... I'm almost done with the LogWeightedDataDistribution implementation.


Dixon, Kevin R
Jul 12, 2012, 2:16:28 PM
to cognitiv...@googlegroups.com, Justin Basilico, brandon...@gmail.com
I made a few changes to the hierarchy to allow it, and I just committed the initial revision of LogWeightedDataDistribution. I also retooled the ScalarMap hierarchy to make the classes a little more coherent... It will be available in the next release.



Justin Basilico
Jul 13, 2012, 11:45:50 PM
to cognitiv...@googlegroups.com, brandon...@gmail.com
Cool. What retooling did you have to do?

BTW, what is the "weighted" part? Does it not just store the values in log scale?

Thanks, : )
Justin

Dixon, Kevin R
Jul 14, 2012, 12:01:15 AM
to cognitiv...@googlegroups.com, brandon...@gmail.com
It's a "weighted" data distribution... that is, the keys (data) have real-valued weights.  Now they're just represented in log space, hence LogWeightedDataDistribution.

I re-tooled the abstract implementations of ScalarMap/NumericMap so that MutableDouble isn't the only type of Number that the maps can easily be implemented around.
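As a rough illustration of that kind of retooling (hypothetical class names, not the actual ScalarMap/NumericMap code): an abstract map can defer entry creation to subclasses, so any Number type, not just MutableDouble, can back the values:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a map whose entry type is a generic Number, so a
// log-weighted entry type could slot in where MutableDouble used to be.
abstract class AbstractNumberMap<K, V extends Number> {
    protected final Map<K, V> map = new HashMap<>();

    // Subclasses decide how to represent a raw double as an entry.
    protected abstract V createEntry(double value);

    public void set(K key, double value) { map.put(key, createEntry(value)); }

    public double get(K key) {
        V entry = map.get(key);
        return entry == null ? 0.0 : entry.doubleValue();
    }
}

// Concrete subclass backed by plain Double entries.
public class DoubleMapDemo extends AbstractNumberMap<String, Double> {
    @Override
    protected Double createEntry(double value) { return value; }

    public static void main(String[] args) {
        DoubleMapDemo demo = new DoubleMapDemo();
        demo.set("weight", 2.5);
        System.out.println(demo.get("weight"));
    }
}
```

A log-space variant would simply supply an entry type that stores the logarithm and overrides doubleValue() accordingly.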



brandon willard
Jul 19, 2012, 3:13:37 PM
to cognitiv...@googlegroups.com, brandon...@gmail.com, Justin Basilico, krd...@sandia.gov
Sorry guys, I didn't get a single notification about your responses.  You'd think there would be such a setting, but I can't find it.

Anyway, yeah, I can send some of my implementations (although some are complete hacks, and not really worthy of an API).  One project that I'm using the Foundry for is https://github.com/openplans/openplans-tracking-tools; the other, which uses only a little of it, is https://github.com/camsys/onebusaway-nyc.  I'll point out the specifics once I get things settled in the tracking project; it was put together ASAP, and part of cleaning it up revolves around finalizing the interface/design/usage of the Foundry.  In particular, I'd love to see the Bayesian conjugacy code working with our models in a nearly plug-and-play fashion, so that we can easily define terms with conjugates and let the code "integrate" the steps that it can.

Regarding the log-scale data distribution, I put some really scary code in there to track the integer count of how many times an equal term is added (the count was needed for resampling later on).  By the way, onebusaway-nyc uses a totally separate log-based data distribution that's backed by a multimap.

A couple of other things that might be of interest: in the tracking project there's an implementation of a univariate normal CDF that provides log values (borrowed from a Java library that used the exact R implementation).  That has proven useful.  I'll also be adding a truncated univariate normal distribution.

As far as contribution is concerned, something like a GitHub repo would be awesome, even if only to pull updates.


brandon willard
Jul 19, 2012, 3:18:01 PM
to cognitiv...@googlegroups.com, brandon...@gmail.com, Dixon, Kevin R
Yeah, something like your indexer class is what I was asking about. 
