INTRO: Part Of Speech tagger TiddlyWiki/TiddlySpace


PMario

Jul 5, 2012, 5:20:46 PM
to TiddlyWiki
Inspired by the "html javascript not working in TW" [1] conversation, I
created the postagger space [2]: http://postagger.tiddlyspace.com/.

I think it's already quite useful, but there is always room for
improvement :)
It can be used by including the "postagger" space in your own
space.

The functions themselves are implemented as <<tiddler ...>>
transclusions.

The helpTag [3] tiddler contains functions to tag sentences and add
some help information on mouse-over.

The StyleSheet tiddler contains some tag coloring and editor CSS.

posTag [4] does the same, but doesn't create the mouse-over
stuff.

Usage
=====
! Input
some text to tag.

! Output
<<tiddler "helpTag##tag" with: Input>>

where Input is a section within the same tiddler that contains the
text to be tagged. There are several other examples tagged
"example".

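To make the "helpTag##tag" / "Input" section idea concrete, here is a simplified JavaScript sketch of what a `Tiddler##Section` reference resolves to. It assumes a section is the text between a `!` heading line with that name and the next heading line; it is an illustration only, not TiddlyWiki's own implementation, and `getSection` is a hypothetical name.

```javascript
// Simplified sketch (not TiddlyWiki's own code): find the heading line
// "! Section" (1 to 6 "!" characters) and return the text up to the
// next heading line, or null if the section does not exist.
function getSection(text, sectionName) {
  // Escape regex metacharacters in the section name.
  const escaped = sectionName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const heading = new RegExp("^!{1,6}[ \\t]*" + escaped + "[ \\t]*$", "m");
  const match = heading.exec(text);
  if (!match) return null;
  // Everything after the heading line (minus its line break)...
  let rest = text.slice(match.index + match[0].length).replace(/^\n/, "");
  // ...up to, but not including, the next heading line.
  const next = /^!/m.exec(rest);
  if (next) rest = rest.slice(0, next.index);
  return rest.replace(/\n+$/, "");
}

const tiddlerText = [
  "! Input",
  "some text to tag.",
  "",
  "! Output",
  '<<tiddler "helpTag##tag" with: Input>>',
].join("\n");

console.log(getSection(tiddlerText, "Input")); // prints: some text to tag.
```
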
have fun!
-mario

[1] http://groups.google.com/group/tiddlywiki/browse_thread/thread/bdf28fdece2b4aa0
[2] http://postagger.tiddlyspace.com/
[3] http://postagger.tiddlyspace.com/#helpTag
[4] http://postagger.tiddlyspace.com/#posTag

----------------
If you want to improve this project, feedback and contributions are
very welcome.
A motivation donation is also welcome at:
http://pmario.tiddlyspace.com/#Motivation

tejjyid

Jul 6, 2012, 6:34:40 AM
to tiddl...@googlegroups.com
Thanks very much - I've started to play around with it. I'm curious as to why what you've done solves the problem, but maybe I don't need to know.

Can you tell me, does that big lexicon variable get loaded once, for the page, and then re-used?

Is PMario your id? I'll add it to andrewsimon.tiddlyspace.com, you can see what I do with it...

BTW, the output turns blue and bold when I run it?
I'll have to start doing some documentation work for this TW project, I think. It's an area where I think I can do something useful.

Andrew

PMario

Jul 6, 2012, 11:45:08 AM
to TiddlyWiki
On Jul 6, 12:34 pm, tejjyid <andrew.x.w...@gmail.com> wrote:
> Can you tell me, does that big lexicon variable get loaded once, for the
> page, and then re-used?
Yes. It's in the Lexicon.js_ tiddler, about 900 kB.

> Is PMario your id? I'll add it to andrewsimon.tiddlyspace.com, you can see
> what I do with it...
yes
http://pmario.tiddlyspace.com is my main page.

> BTW, the output turns blue and bold when I run it?
You defined it that way in your StyleSheet tiddler; the
.viewer span {} rule did it.
I deactivated it with /* */ since it affects every tiddler.
What should this definition do?

You shouldn't color your output programmatically. It makes every
change error-prone.

Just add a class, like I did with class="low", and style that class
within the StyleSheet tiddler.
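A minimal JavaScript sketch of the "add a class, style it in StyleSheet" advice: the renderer only emits class names and never colors. The `{word, tag}` token shape and the `pos-` class prefix are assumptions for this example, not the postagger space's actual markup.

```javascript
// Sketch: emit class names instead of inline colors, so all coloring
// lives in the StyleSheet tiddler. The {word, tag} token shape and the
// "pos-" class prefix are assumptions made for this example.
function escapeHtml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function renderTagged(tokens) {
  return tokens.map(function (t) {
    // "NNP" -> "pos-NNP". Characters such as "$" (as in "PRP$") would
    // need escaping in a CSS selector, so they are replaced with "_".
    const cls = "pos-" + t.tag.replace(/[^A-Za-z0-9_-]/g, "_");
    return '<span class="' + cls + '">' +
      escapeHtml(t.word) + "/" + escapeHtml(t.tag) + "</span>";
  }).join(" ");
}

console.log(renderTagged([
  { word: "Bonaparte", tag: "NNP" },
  { word: "has", tag: "VBZ" },
]));
// <span class="pos-NNP">Bonaparte/NNP</span> <span class="pos-VBZ">has/VBZ</span>
```

A matching rule in StyleSheet could then be something like `.viewer .pos-NNP { color: #369; }`, which, unlike a bare `.viewer span {}` rule, only touches the tagger's output.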

PMario

Jul 6, 2012, 11:55:09 AM
to TiddlyWiki
On Jul 6, 12:34 pm, tejjyid <andrew.x.w...@gmail.com> wrote:
> Thanks very much - I've started to play around with it. I'm curious as to
> why what you've done solves the problem, but maybe I don't need to know.

The lexer.js code uses loops like

for (i in objectVar) { .. }
https://developer.mozilla.org/en/JavaScript/Reference/Statements/for...in

This loop iterates over every enumerable property of objectVar,
not only the string values. It walks the whole prototype chain
(see the MDN info above).

In our case, the LexerNode() function shown below does the
"string.match(regex)". The problem is that, because of the loop
mentioned above, the string variable can be a function() reference.
This is what actually caused the "string.match() ..." error message.

function LexerNode(string, regex, regexs) {
    var a;
    this.string = string;
    this.children = [];
    if (string) {
        this.matches = string.match(regex);
        ...

I changed the loop in several places in the lexer.
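A minimal, self-contained JavaScript sketch of the failure mode described above. It assumes some script on the page has added an enumerable method to Array.prototype (the name `contains` is used purely for illustration). It shows for...in picking up the inherited method, so one loop value is a function, plus two common guards: an own-property check and an index-based loop. `safeMatch` is a hypothetical helper, not the lexer's actual code.

```javascript
// Assumption for illustration: another script added an enumerable helper
// to Array.prototype. Plain assignment makes the property enumerable.
Array.prototype.contains = function (item) {
  return this.indexOf(item) !== -1;
};

const words = ["Bonaparte", "has", "been"];

// Problem: for...in visits own indices AND inherited enumerable
// properties, so one iteration sees the "contains" function.
function typesForIn(list) {
  const types = [];
  for (const key in list) {
    types.push(typeof list[key]);
  }
  return types;
}

// Guard 1: skip keys that are not the array's own properties.
function typesOwnOnly(list) {
  const types = [];
  for (const key in list) {
    if (!Object.prototype.hasOwnProperty.call(list, key)) continue;
    types.push(typeof list[key]);
  }
  return types;
}

// Guard 2: for arrays, iterate by index instead of by key.
function typesIndexed(list) {
  const types = [];
  for (let i = 0; i < list.length; i++) {
    types.push(typeof list[i]);
  }
  return types;
}

// Hypothetical defensive check in the spirit of LexerNode's
// string.match(regex): only call match() on real strings.
function safeMatch(value, regex) {
  return typeof value === "string" ? value.match(regex) : null;
}

console.log(typesForIn(words));   // [ 'string', 'string', 'string', 'function' ]
console.log(typesOwnOnly(words)); // [ 'string', 'string', 'string' ]
console.log(typesIndexed(words)); // [ 'string', 'string', 'string' ]
console.log(safeMatch(Array.prototype.contains, /x/)); // null
```

Defining such helpers with `Object.defineProperty(Array.prototype, "contains", { value: fn, enumerable: false })` also keeps them out of for...in loops, but that only helps if you control the code that extends the prototype; guarding the loops works regardless.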

====
The tagger actually has some of those loops too, but I didn't
change them.

have fun!
mario

twgrp

Jul 7, 2012, 4:59:04 PM
to TiddlyWiki
Disclaimer: I hardly understand what this is about at all. Some kind
of grammatical analysis tool?

Anyway, here's a friendly suggestion: instead of writing out each, hm,
thing, after every word like this:
Bonaparte/NNP has/VBZ been/VBN
I have two alternative suggestions.

1) If the text should still be readable, then I'd suggest (somehow)
putting each... thing... on a separate row under the original
text, like this:
Bonaparte has been
NNP ____ VBZ VBN
I had to put in the line to create a space. In TW you could put in e.g.
&nbsp; or perhaps put it all in a table.

2) If it is not necessary to see the... things... directly, then a
hover popup to show them would make for a more readable text, still
with the analysis readily available.
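Suggestion 1 can be sketched in JavaScript as a small layout function: words on one line, tags on the line below, each column padded to the wider of word and tag. It assumes the space-separated "word/TAG" input format shown above; `twoRowLayout` is a hypothetical name, not part of the postagger space.

```javascript
// Sketch of suggestion 1: words on one line, tags on the line below,
// each column padded to the wider of word and tag. Input is assumed to
// be the space-separated "word/TAG" format shown above.
function twoRowLayout(tagged) {
  const top = [];
  const bottom = [];
  for (const tok of tagged.trim().split(/\s+/)) {
    // Split on the last "/" so a word containing "/" stays intact
    // (assuming tags themselves never contain "/").
    const slash = tok.lastIndexOf("/");
    const word = slash < 0 ? tok : tok.slice(0, slash);
    const tag = slash < 0 ? "" : tok.slice(slash + 1);
    const width = Math.max(word.length, tag.length);
    top.push(word.padEnd(width));
    bottom.push(tag.padEnd(width));
  }
  return top.join(" ").trimEnd() + "\n" + bottom.join(" ").trimEnd();
}

console.log(twoRowLayout("Bonaparte/NNP has/VBZ been/VBN"));
// Bonaparte has been
// NNP       VBZ VBN
```

Because HTML collapses runs of spaces, showing this inside a tiddler would still need a monospaced block, &nbsp; padding, or a two-row table, as suggested above.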

<:-)



PMario

Jul 8, 2012, 1:12:33 PM
to TiddlyWiki
On 7 Jul., 22:59, twgrp <matiasg...@gmail.com> wrote:
> Disclaimer: I hardly understand what this is about at all. Some kind
> of grammatical analysis tooI?
hihi, same for me.

> Anyway, here's a friendly suggestion: Instead of writing out each, hm,
> thing, after every word like this:
> Bonaparte/NNP has/VBZ been/VBN
> I have to alternate suggestions.
This format seemed to be "best practice". I made the tags a
different color and smaller, to make them less "aggressive".

Andrew made another tagger that just colored the words, but I can't
find it anymore in his space [1] :(

> 1) If it is desired that the text is still readible, then I'd suggest
> to (somehow) put each.. thing... on a separate row under the original
> text, like this:
> Bonaparte has been
> NNP ____ VBZ VBN

I made a test page, http://postagger.tiddlyspace.com/#TestSlices, which
shows another way to get some tagging per sentence. I used TW
slices to reference the texts. IMO this is interesting, since you
can reuse the results elsewhere in any tiddler, e.g. with
<<tiddler "anyTiddler::o1">>.
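A simplified JavaScript sketch of how a slice reference such as "anyTiddler::o1" can be looked up in a tiddler's text. It assumes only two slice forms, `name: value` lines and `|name|value|` table rows; TiddlyWiki's real slice parser accepts more variants, and `getSlices`/`getSlice` are hypothetical names.

```javascript
// Simplified sketch, not TiddlyWiki's slice parser. Recognises only
// "|name|value|" table rows and "name: value" lines.
function getSlices(text) {
  const slices = {};
  for (const line of text.split("\n")) {
    let m = /^\|([^|]+)\|([^|]*)\|\s*$/.exec(line);      // |o1|some text|
    if (!m) m = /^([A-Za-z0-9_-]+):\s*(.*)$/.exec(line); // o1: some text
    if (m) slices[m[1].trim()] = m[2].trim();
  }
  return slices;
}

// Returns the slice value, or null if the tiddler has no such slice.
function getSlice(text, name) {
  const slices = getSlices(text);
  return Object.prototype.hasOwnProperty.call(slices, name) ? slices[name] : null;
}

const sliceText = "|o1|Bonaparte has been|\n|o2|some other sentence|\nnote: tagged later";
console.log(getSlice(sliceText, "o1"));   // Bonaparte has been
console.log(getSlice(sliceText, "note")); // tagged later
console.log(getSlice(sliceText, "o3"));   // null
```
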

> I had to put in the line to create a space. In TW you could put in eg
> &nbsp; or perhaps put it all in a table.
>
> 2) If it is not necessary to see the very.. things.. directly, then a
> hover popup to show them would make for a more readible text, still
> with the analysis readily available.

I'm sure there are many possibilities. IMO @andrew needs to tell us
what's needed :) He seems to be experimenting atm....

have fun!
mario

[1] http://andrewsimon.tiddlyspace.com

PMario

Jul 8, 2012, 1:23:15 PM
to TiddlyWiki
On 8 Jul., 19:12, PMario <pmari...@gmail.com> wrote:
> I'm sure, there are many possibilities. ...
To create the mouse-over tooltips, I used a legend.json [1]
tiddler. This tiddler is like a configuration file. It could also be
used to assign colors to the words, or to store whether a tag should be
shown or hidden. ...
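A JavaScript sketch of the legend.json idea: parse a JSON configuration once, then use it to add a tooltip (`title` attribute) per tag and to suppress tags marked as hidden. The JSON shape here (`label`, `hidden` per tag) is hypothetical; the real legend.json in the postagger space may be structured differently. The labels are standard Penn Treebank tag descriptions.

```javascript
// Hypothetical legend shape for illustration; this string stands in for
// the text of a legend.json tiddler and may differ from the real one.
const legendJson = JSON.stringify({
  NNP: { label: "Proper noun, singular" },
  VBZ: { label: "Verb, 3rd person singular present" },
  DT: { label: "Determiner", hidden: true },
});

function escapeHtml(s) {
  return String(s).replace(/&/g, "&amp;").replace(/"/g, "&quot;")
    .replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Parses the legend once and returns a renderer for (word, tag) pairs.
function makeRenderer(json) {
  const legend = JSON.parse(json);
  return function render(word, tag) {
    const entry = Object.prototype.hasOwnProperty.call(legend, tag) ? legend[tag] : null;
    if (entry && entry.hidden) return escapeHtml(word); // tag hidden by config
    const title = entry ? entry.label : "Unknown tag: " + tag;
    return '<span title="' + escapeHtml(title) + '">' +
      escapeHtml(word) + "/" + escapeHtml(tag) + "</span>";
  };
}

const render = makeRenderer(legendJson);
console.log(render("Bonaparte", "NNP")); // <span title="Proper noun, singular">Bonaparte/NNP</span>
console.log(render("the", "DT"));        // the
```

Keeping labels, visibility and (if wanted) colors in one config tiddler means changing the display does not require touching the tagging code.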

@andrew Just for your info:
Do you know the "NaturalNode" project [2]? It can't do POST at the
moment, but it does many other things that may be interesting in your context.

==quote==
"Natural" is a general natural language facility for nodejs.
Tokenizing, stemming, classification, phonetics, tf-idf, WordNet,
string similarity, and some inflection are currently supported.
====

-m

[1] http://postagger.tiddlyspace.com/#legend.json
[2] https://github.com/NaturalNode/natural

tejjyid

Jul 13, 2012, 2:06:29 AM
to tiddl...@googlegroups.com
You can find my attempt at color coding in the "demo*" tiddlers.
Definitely the basic code is too dense for any typical presentation purpose; I think it's more intended to illustrate the POS tagger results (which are quite flawed in some ways).

Anyway, for teaching purposes I will pretty much always be contrasting 2 or 3 features, because more is confusing for students. Also, I'll be talking to my classes about ways they think they can use the tool. But for 2 or 3 features, I think color works fine. You can see this at andrewsimon.tiddlyspace.

The 2-line approach seems the best way for maximum detail/clarity. I might combine that with inline table editing as a way of manually fixing the errors, where 100% accuracy is important, but that has other implications (as in, a loss of dynamism). So "might" remains the operative word.

Andrew

tejjyid

Jul 13, 2012, 2:08:08 AM
to tiddl...@googlegroups.com
I'll check out the Natural node project, thanks...

tejjyid

Jul 16, 2012, 6:49:37 PM
to tiddl...@googlegroups.com
BTW, the lexicon.js_ plugin won't load in IE8, I assume because of its size? Is that a known feature of IE8? Can it be disabled?
 
Thanks

PMario

Jul 17, 2012, 3:13:41 PM
to TiddlyWiki
On Jul 13, 8:06 am, tejjyid <andrew.x.w...@gmail.com> wrote:
> You can find my attempt at color coding in the "demo*" tiddlers.
> Definitely the basic code is too dense for any typical presentation
> purpose; I think it's more intended to illustrate the POS tagger results
> (which are quite flawed in some ways).
I read a bit about POST algorithms on Wikipedia [1-2], and some
other material (I can't find the links) that listed the achievable
"accuracy". An algorithm + dictionary seems to be reasonable if it is
above 90%. Depending on the dictionary (size and "training"), it
goes up to 95%+. But for the good ones, you have to "insert coins" :)
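The accuracy figures above are per-token: the share of words whose predicted tag matches a hand-checked ("gold") tag. A small JavaScript sketch of that calculation follows; the tag sequences are made up for illustration, and `accuracy` is a hypothetical helper, not part of the postagger space.

```javascript
// Per-token accuracy: the share of tokens whose predicted tag matches
// a hand-checked ("gold") tag for the same token.
function accuracy(predicted, gold) {
  if (predicted.length !== gold.length) {
    throw new Error("predicted and gold taggings must have the same length");
  }
  if (gold.length === 0) return 0;
  let correct = 0;
  for (let i = 0; i < gold.length; i++) {
    if (predicted[i] === gold[i]) correct++;
  }
  return correct / gold.length;
}

// Made-up example: the tagger got the last tag wrong.
const predictedTags = ["NNP", "VBZ", "VBN", "NN"];
const goldTags = ["NNP", "VBZ", "VBN", "VB"];
console.log(accuracy(predictedTags, goldTags)); // 0.75
```

Even at 95% per-token accuracy, a 20-word sentence still contains one wrong tag on average (20 × 0.05 = 1), which is why manual correction matters where 100% accuracy is needed.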

WordNet [3] is mentioned in several places. The Natural project
I mentioned in my last post has an interface to WordNet. All this
stuff needs big files/dictionaries to work well..... So a browser-only
approach may have limited success :/

[1] http://en.wikipedia.org/wiki/Part-of-speech_tagging
[2] http://en.wikipedia.org/wiki/Word-sense_disambiguation
[3] http://wordnet.princeton.edu/

> Anyway, for teaching purposes I will pretty much always be contrasting 2 or
> 3 features, because more is confusing for students. Also, I'll be talking
> to my classes about ways they think they can use the tool. But for 2 or 3
> features, I think color works fine. You can see this at
> andrewsimon.tiddlyspace.
The explanations are extremely interesting. I found the "demo*"
stuff again :) IMO there should be a way to get better
formatting. ...

> The 2-line approach seems the best way for maximum detail/clarity. I might
> combine that with inline table editing as a way of manually fixing the
> errors, where 100% accuracy is important, but that has other implications
> (as in, a loss of dynamism). so might remins the operative word.
You are right. 100% accuracy will not be possible with software alone.
Which is a good thing too, because it means our brain can't be
replaced that easily :))

have fun!
mario



PMario

Jul 17, 2012, 3:26:44 PM
to TiddlyWiki
On Jul 17, 12:49 am, tejjyid <andrew.x.w...@gmail.com> wrote:
> BTW, the lexicon.js_ plugin won't load in IE8, I assume because of size? Is
> that a known feature of IE8? ..
Yeah, that's a feature :)

It should tell IT that people need good _and_ new tools to
deliver good work. Every electrician or builder knows: if the tools
you have to use are old and rusty, the outcome will be accordingly.

> .. Can it be disabled?
I'm not sure about your question. The lexicon.js_ can't be disabled.

-mario

PMario

Jul 17, 2012, 5:22:49 PM
to TiddlyWiki
I managed to fix the IE8 lexicon problem, but there is still a
syntax error in the script. With my VirtualBox IE8 WinXP installation
I can't debug it. It is just too slow. It simply sucks.

I just did a test with IE9 on Win7. It doesn't work there either. It
says there is a "memory problem". I don't know if it worked prior to
the IE8 fix :/ Did you test it with IE9?

Is there someone out there who knows how to fix this memory problem
in IE9 64-bit and 32-bit?

-m

tejjyid

Jul 18, 2012, 6:03:52 AM
to tiddl...@googlegroups.com
Hi PMario - thanks...sorry about my 'joke' - I meant: Can the "feature" be disabled? (My instinct told me this would be an MS gift to us all).

I'll keep working on my IT department; unfortunately there are 2 sorts of IT department and ours doesn't believe in empowering its users. But they do let me run my own PC on their network, so I'll just have to do that for the time being.

We've only just got IE8, so sadly, I have no IE9 data.

Andrew

PMario

Jul 18, 2012, 7:01:05 AM
to TiddlyWiki
On Jul 18, 12:03 pm, tejjyid <andrew.x.w...@gmail.com> wrote:
> I'll keep working on my IT department; unfortunately there are 2 sorts of
> IT department and ours doesn't believe in empowering its users. But they do
> let me run my own PC on their network, so I'll just have to do that for the
> time being.
>
> We've only just got IE8, so sadly, I have no IE9 data.
I know about the problem in corporate environments, and I can
understand the arguments, but that is another discussion. I just had
to beat a dead horse ;)

For me this is a "just for fun" project, because it is really
interesting stuff. I think TiddlyWiki and TiddlySpace should be used
more in teaching environments. They really should. ..

I spent several hours installing a test environment to get IE8
running, just to see if dealing with this kind of browser is as
complicated as one can read all over the web. I have to say, for me it
is. It is boring, it is frustrating, it isn't fun! ... and ... there
are many alternatives that work out of the box, even on WinXP. ..

So this is the chance for every IE enthusiast to prove me wrong. Maybe
everything is simple and I just don't know it (which is true for
sure ;) I'd really like to get patches to make it work with IE8 and
IE9.

Just include the http://postagger.tiddlyspace.com/ space in your
space. Apply your changes in your space and make them global, so we
can get them too.

have fun!
mario
