TextTickle + Mallet

Scott White

Nov 11, 2010, 6:34:25 PM
to ScalaNLP
I came across the Java ResearchAssistant project and the scala-nlp
project and I was wondering if you guys have open sourced the
TextTickle platform described in the RA paper. It seems that would
also be a good working example to play with scala-nlp. I don't see any
documentation on how to set up something similar in scala-nlp. Is
there an example somewhere of how to set up the standard kind of
pipeline described there: Parse -> Create Corpus -> Split
Corpus -> Train -> Test -> etc.?

Another related question is how you guys would compare scala-nlp with
something like Mallet? I'm guessing Mallet is more feature complete
since it's been around a lot longer, but what might the strengths of
scala-nlp be over Mallet, if any?

Any feedback, especially examples, would be appreciated.

thanks,
Scott

Daniel Ramage

Nov 11, 2010, 10:21:48 PM
to scal...@googlegroups.com
Hi Scott,

Thanks for your note. I'm pleasantly surprised you dug through the RA
paper and noticed TextTickle. We never did release that; it was
mostly a toy example for our own understanding of how such a system
should work.

A lot of those ideas did make it into ScalaNLP's stage package, in
a possibly cleaner way, with some examples in the Stanford Topic
Modeling Toolbox (TMT): http://nlp.stanford.edu/software/tmt/ . The
pipelines for turning a CSV or TSV file with text fields into a
dataset are all part of scalanlp, while the topic modeling bits are
part of TMT. I haven't hooked up other models to it, but doing so
should be pretty straightforward.
The stages in the pipeline store a signature of their history, which
allows for nice automatic caching and such.
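
To make the signature idea concrete, here is a minimal sketch in plain
Scala. It is not the scalanlp stage API: Parcel, Stage, applyStage, and
the stage names are all hypothetical, and the in-memory map stands in
for whatever cache a real system would use. It only illustrates how a
value can carry a description of the stages that produced it, and how
that description can serve as a cache key.

```scala
import scala.collection.mutable

// Illustrative only: every name here is hypothetical, not the scalanlp.stage API.
object StageSketch {

  /** A lazily computed value paired with a signature of how it was produced. */
  final case class Parcel[A](signature: String, compute: () => A)

  /** A named transformation from A to B. */
  final case class Stage[A, B](name: String, f: A => B)

  // Results keyed by history signature (assumption: in-memory, not persisted).
  private val cache = mutable.Map.empty[String, Any]

  // Counts how many times a stage function actually runs.
  var evaluations = 0

  /** Apply a stage: extend the signature and reuse a cached result if present. */
  def applyStage[A, B](in: Parcel[A], stage: Stage[A, B]): Parcel[B] = {
    val sig = in.signature + " | " + stage.name
    Parcel(sig, () =>
      cache.get(sig) match {
        case Some(v) => v.asInstanceOf[B]
        case None =>
          evaluations += 1
          val out = stage.f(in.compute())
          cache(sig) = out
          out
      })
  }

  type Docs = List[List[String]]

  // Parse -> tokenize -> split, each as a named stage.
  val parse = Stage[String, List[String]]("ParseLines", _.split("\n").toList)
  val tokenize = Stage[List[String], Docs]("Tokenize",
    _.map(_.toLowerCase.split("\\s+").toList))
  val split = Stage[Docs, (Docs, Docs)]("Split(0.5)",
    docs => docs.splitAt(docs.size / 2))

  /** Chain the stages; nothing runs until compute() is called on the result. */
  def build(text: String): Parcel[(Docs, Docs)] = {
    val source = Parcel("Source(" + text.hashCode + ")", () => text)
    applyStage(applyStage(applyStage(source, parse), tokenize), split)
  }
}
```

Calling StageSketch.build(text).compute() twice on the same text runs
each stage function once. The second call finds the final signature in
the cache and returns the stored result without re-running anything
upstream.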

Mallet is a great piece of software which includes lots of models.
ScalaNLP is designed to be more of a common platform for developers
who want to work with text. We'd love to include more text algorithms
as things mature, possibly in a bazaar type model with external
contributions, but that hasn't been a priority as of yet.

dan
