error: LLDA with TMT-0.3.3


Fahim

Jun 1, 2011, 9:51:16 AM
to ScalaNLP
Hi,

I am new to topic modeling. I have gone through the examples provided
at the following link:
http://nlp.stanford.edu/software/tmt/tmt-0.3/

Now I am trying to implement 'Inference on the labeled LDA model',
which can be regarded as a follow-up to the 'example-6-llda-learn.scala'
provided on the website.

I have written 'example-6-llda-infer.scala' as follows:
--------------------------------------------------------------------------------------------------------------------------------------------------
import scalanlp.io._;
import scalanlp.stage._;
import scalanlp.stage.text._;
import scalanlp.text.tokenize._;
import scalanlp.pipes.Pipes.global._;

import edu.stanford.nlp.tmt.stage._;
import edu.stanford.nlp.tmt.model.lda._;
import edu.stanford.nlp.tmt.model.llda._;

// the path of the model to load
val modelPath = file("llda-cvb0-59ea15c7-31-61406081-7fb4b5bd");  // labeled LDA

println("Loading "+modelPath);
val model = LoadCVB0LabeledLDA(modelPath);

// A new dataset for inference. (Here we use the same dataset
// that we trained against, but this file could be something new.)
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);

val text = {
  source ~>                             // read from the source file
  Column(4) ~>                          // select column containing text
  TokenizeWith(model.tokenizer.get)     // tokenize with existing model's tokenizer
}

// define fields from the dataset we are going to slice against
val labels = {
  source ~>                                // read from the source file
  Column(2) ~>                             // take column two, the tags
  TokenizeWith(WhitespaceTokenizer()) ~>   // turn the label field into an array
  TermCounter() ~>                         // collect label counts
  TermMinimumDocumentCountFilter(10)       // filter out labels in < 10 docs
}

// Base name of output files to generate
val output = file(modelPath,
  source.meta[java.io.File].getName.replaceAll(".csv", ""));

// turn the text into a dataset ready to be used with LLDA
val dataset = LabeledLDADataset(text, labels, model.termIndex,
  model.topicIndex);

println("Writing document distributions to "+output+"-document-topic-
distributions.csv");
val perDocTopicDistributions =
InferCVB0LabeledLDADocumentTopicDistributions(model, dataset);
CSVFile(output+"-document-topic-
distributuions.csv").write(perDocTopicDistributions);
--------------------------------------------------------------------------------------------------------------------------------------------------

But the last line of the above code yields the following error
message:
example-6-llda-infer.scala:55: error: could not find implicit value for evidence parameter of type
  scalanlp.serialization.TableWritable[scalanlp.collection.LazyIterable[(String, scalala.collection.sparse.SparseArray[Double])]]
CSVFile(output+"-document-topic-distributuions.csv").write(perDocTopicDistributions);

I have tried to go through the documentation but could not resolve
this. Can you please suggest a solution?

Thanks in advance.
Fahim

Fahim

Jun 6, 2011, 1:42:54 PM
to ScalaNLP
Is the group inactive? Am I wasting my time posting here?

David Hall

Jun 6, 2011, 1:58:21 PM
to scal...@googlegroups.com, Daniel Ramage
On Mon, Jun 6, 2011 at 10:42 AM, Fahim <minhaz...@gmail.com> wrote:
> Is the group inactive? Am I wasting my time posting here?

No, it's (slightly) active, but TMT (and most of the pieces in
Scalanlp that talk to TMT) is Dan Ramage's baby and he can be very
slow to respond to emails. Especially since he's trying to graduate
and he's getting married in less than a month.

Without knowing the details, my guess would be to change the lazy
iterable to a strict iterable, possibly by calling IndexedSeq.empty ++
(iterable).
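
Roughly something like this, untested, reusing the CSVFile call from your script:

// untested sketch: force the lazy iterable into a strict IndexedSeq before writing
val strict = IndexedSeq.empty ++ perDocTopicDistributions;
CSVFile(output + "-document-topic-distributuions.csv").write(strict);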

I'll cc Dan on this message.

-- David

> --
> You received this message because you are subscribed to the Google Groups "ScalaNLP" group.
> To post to this group, send email to scal...@googlegroups.com.
> To unsubscribe from this group, send email to scalanlp+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/scalanlp?hl=en.
>
>

Fahim

Jun 6, 2011, 3:40:19 PM
to ScalaNLP
Thanks a lot for responding; it keeps my hope alive. I will try what
you suggested. I am new to both Scala and TMT, so I highly appreciate
your help.

On Jun 6, 11:58 am, David Hall <d...@cs.berkeley.edu> wrote:

Fahim

Jun 6, 2011, 4:40:06 PM
to ScalaNLP
Following your suggestion, I replaced the last line with the
following two lines:

val DTD = IndexedSeq.empty ++ (perDocTopicDistributions)
CSVFile(output+"-document-topic-distributuions.csv").write(DTD);

It now gives the following error (similar to the earlier one):

example-6-llda-infer.scala:56: error: could not find implicit value for evidence parameter of type
  scalanlp.serialization.TableWritable[IndexedSeq[(String, scalala.collection.sparse.SparseArray[Double])]]
CSVFile(output+"-llda-document-topic-distributuions.csv").write(DTD);

I think the issue here is with the SparseArray.
Could you suggest anything more?
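
For example, I wonder whether densifying each SparseArray before writing would be a reasonable direction (untested, and I am not sure a TableWritable instance exists for this shape either):

// untested idea: strip the SparseArray so each row is just the document id
// plus a plain sequence of doubles
val denseDTD = DTD.map { case (docId, dist) => (docId, dist.toArray.toIndexedSeq) };
CSVFile(output + "-document-topic-distributuions.csv").write(denseDTD);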


On Jun 6, 11:58 am, David Hall <d...@cs.berkeley.edu> wrote:

Fahim

Jun 9, 2011, 8:26:11 AM
to ScalaNLP
OK, since there has been no response from anyone, I have devised a
workaround. Here it is:
---------------------------------------------------------------------------
val DTD = InferCVB0LabeledLDADocumentTopicDistributions(model, dataset);
val it = DTD.iterator
while (it.hasNext) {
  val element = it.next
  val docId = element._1            // document id
  print("\n" + docId)
  val dense = element._2.toArray    // densify the per-topic distribution once
  var i = 0
  while (i < dense.length) {
    print(", " + dense(i))          // per-topic weight
    i += 1
  }
}
---------------------------------------------------------------------------
Not quite sure if this is correct. Any comments?

David Hall

Jun 11, 2011, 8:11:21 PM
to scal...@googlegroups.com
That looks reasonable. There are more-scala-y ways of doing the same
thing, but it gets the job done.
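
For what it's worth, here is an untested sketch of a more idiomatic version that also writes straight to a CSV file instead of standard output (assuming the same DTD and output values from your script):

import java.io.PrintWriter;

// untested sketch: one CSV row per document, id first, then the per-topic weights
val out = new PrintWriter(output + "-document-topic-distributions.csv");
try {
  for ((docId, dist) <- DTD.iterator) {
    out.println(docId + "," + dist.toArray.mkString(","));
  }
} finally {
  out.close();
}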

-- David

Fahim

Jun 12, 2011, 8:08:49 PM
to ScalaNLP
Thanks for your comments; they give me confidence.
Now, could you please comment on my other post at
http://groups.google.com/group/scalanlp/browse_thread/thread/14e16dece0d22b5c
I would highly appreciate it!

On Jun 11, 6:11 pm, David Hall <d...@cs.berkeley.edu> wrote:
> That looks reasonable. There are more-scala-y ways of doing the same
> thing, but it gets the job done.
>
> -- David
>

David Hall

Jun 12, 2011, 9:00:40 PM
to scal...@googlegroups.com
I just have no idea without knowing how TMT works. Dan is your best bet.

M. Fahim Zibran

Jun 12, 2011, 9:04:09 PM
to scal...@googlegroups.com
Oh, thank you anyway.
I hope Dan gets some time to reply to me.
--
M. Fahim Zibran
Ph.D. Student, Computer Science
University of Saskatchewan, Canada
Voice: (306) 251 1919
Web: http://www.usask.ca/~minhaz.zibran