NLTK or Equivalent


Brandon Wirtz

Mar 5, 2012, 12:21:32 PM
to google-a...@googlegroups.com

I was looking to do some things using NLTK and found code.google.com/p/nltk-gae/, which looks promising. However, the code on the site has quite a few errors (most of which I worked through), still has a LOT left to be implemented, and relies on Memcache never being flushed in order to work.

Does anyone know of a better NLTK for GAE implementation? Or do we have an ETA on NLTK being supported on GAE?

 

-Brandon

alex

Mar 5, 2012, 12:41:40 PM
to google-a...@googlegroups.com
Dunno what you're working on, so it might totally not be your case, but have you considered the https://developers.google.com/prediction/ API? You can do some cool stuff related to NLP.

alex.

Brandon Wirtz

Mar 5, 2012, 12:58:55 PM
to google-a...@googlegroups.com

Kind of sort of, not really.

 

Prediction doesn’t give you back any of the NLP, so you can’t ask “What is this sentence about?” or “What was the user trying to search for?”
It is also SOOOO VERYYYYYY expensive. Loading the training data for 10k users to build something like a content recommendation system, and then loading all of the data to make recommendations from, was going to be in the neighborhood of $20k before I got to the point where I could even evaluate whether the system was going to work.

I have played with the prediction API for looking at traffic data to say “Hey you are always ‘UP’ in October” but that was hardly rocket science.

 

What I specifically need in this case is the ability to boil content down to the important bits.

 

 

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/WouZtSgGQdUJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

alex

Mar 5, 2012, 2:51:26 PM
to google-a...@googlegroups.com
Well, that only tells me you haven't played enough with your data or haven't chosen the best models to base your predictions on. No NLP software in the world will tell you "what is this sentence about" with high confidence without giving the system a concrete context. That's a task people have been trying to solve for many years. Once you have a context, though, you're not that far from what the Prediction API does.

It pretty often looks like you're taking some numbers from the sky and rounding them up. Take a close look at https://developers.google.com/prediction/docs/pricing: 10k predictions/month cost $0; $0.50 for each 1k above 10k; $0.50 per 250 MB of training for one dataset. Cloud storage costs nothing. I don't see $20k, not even close.

Well, that of course depends on what exactly you need. The Prediction API obviously isn't a silver bullet.

Nice thing about your posts though is you often get me laughing reading your so highly confident numbers :)
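
For readers who want to check the arithmetic, here is a minimal sketch that turns the rates quoted in this post into a cost estimate. The rates are taken from the post as written (March 2012) and are not verified against the pricing page. The function name and the round-up-to-a-full-block behavior are assumptions made for illustration.

```python
import math

# Rates as quoted in the post above (March 2012). Treat them as assumptions,
# not as current or authoritative pricing.
FREE_PREDICTIONS_PER_MONTH = 10000
PRICE_PER_1K_PREDICTIONS = 0.50   # USD, for each 1k above the free quota
PRICE_PER_TRAINING_BLOCK = 0.50   # USD, per 250 MB of training data
TRAINING_BLOCK_MB = 250


def estimate_monthly_cost(predictions, training_mb):
    """Rough cost in USD. Partial blocks are rounded up (an assumption)."""
    billable = max(0, predictions - FREE_PREDICTIONS_PER_MONTH)
    prediction_cost = math.ceil(billable / 1000.0) * PRICE_PER_1K_PREDICTIONS
    training_blocks = math.ceil(training_mb / float(TRAINING_BLOCK_MB))
    training_cost = training_blocks * PRICE_PER_TRAINING_BLOCK
    return prediction_cost + training_cost


if __name__ == "__main__":
    # 100k predictions in a month plus 1,000 MB of training data
    print(estimate_monthly_cost(100000, 1000))
```

Under these quoted rates, a $20k bill would correspond to roughly 40 million billable predictions or about 10 TB of training data. The thread does not say what the data set in question actually looked like.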


Brandon Wirtz

Mar 5, 2012, 4:21:45 PM
to google-a...@googlegroups.com

Wasn’t my number. That was the price quoted to me by the Predictions team, given the data set we were attempting to load.

 

And using NLTK in combination with a library I built I get very high accuracy on sentence context.  Not Meaning, just subject matter.

 

My high confidence comes from this being one of the last places I go to ask questions.  If I end up asking here it is usually because it is 3am and everyone else is sleeping. Or because I exhausted everything else and am hoping some crazy person here knows about some guy in Korea who did a port but the documentation is in a funny language.

 

I ended up writing my own Word and Sentence Tokenizer and am starting on a Stemmer. They are much faster than the NLTK versions, but not having tested them with 10 billion words, I always worry that I will have screwed something up.
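
To make the idea concrete, here is a minimal, dependency-free sketch of what a hand-rolled sentence tokenizer, word tokenizer, and stemmer might look like. It is not the code described in this post, which was never shared in the thread. All function names and rules below are hypothetical, and the stemmer is a naive suffix stripper rather than the Porter algorithm that NLTK ships.

```python
import re

# Split after ., ! or ? when followed by whitespace and then an uppercase
# letter, a digit, or an opening quote. Abbreviations like "Dr. Smith" will
# be split incorrectly; a real tokenizer needs an exception list.
_SENTENCE_END = re.compile(r'(?<=[.!?])\s+(?=[A-Z0-9"\'])')

# A word is a run of letters/digits, optionally with internal apostrophes.
_WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)*")

# Checked in order, so longer suffixes come first.
_SUFFIXES = ("ing", "ed", "es", "s")


def sentence_tokenize(text):
    """Split text into sentences on terminal punctuation."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def word_tokenize(sentence):
    """Return the word-like tokens in a sentence, dropping punctuation."""
    return _WORD.findall(sentence)


def naive_stem(word):
    """Lowercase the word and strip one common suffix, keeping >= 3 letters."""
    word = word.lower()
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


if __name__ == "__main__":
    for sentence in sentence_tokenize("It times out. Does it work? Yes!"):
        print([naive_stem(w) for w in word_tokenize(sentence)])
```

A sketch like this is fast because it does almost nothing, which is also why the worry about untested edge cases applies: comparing its output against NLTK's on a sample corpus would be one way to find where the two disagree.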

 

 


Anand Mistry

Mar 5, 2012, 5:46:43 PM
to google-a...@googlegroups.com
Have you looked at using the 'real' NLTK on Python 2.7? As far as I can tell, the only hard dependency is NumPy, which we have.

Brandon Wirtz

Mar 5, 2012, 6:17:56 PM
to google-a...@googlegroups.com

> Have you looked at using the 'real' NLTK on Python 2.7? As far as I can tell, the only hard dependency is NumPy, which we have.

It times out doing imports.

Nick made a few comments on the issues, and I decided if he couldn’t get it to work I wasn’t going to try :)

 

I may revisit as I start to hit the limits of what the stuff I built in a day can do. 
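
One general pattern for import-heavy libraries is to defer the import until the first request that actually needs it, so that module load time is not paid at application start. The sketch below shows that pattern only. Whether it would get NLTK under the App Engine deadline is not established anywhere in this thread, and `json` is used purely as a stand-in for the heavy package so the example runs anywhere. The function names are hypothetical.

```python
import importlib

# Stand-in for a slow-to-import package. In the case discussed above this
# would be "nltk"; "json" is used here only so the sketch is runnable.
HEAVY_MODULE = "json"


def get_heavy_module():
    """Import the heavy module on first use.

    importlib.import_module returns the already-loaded module from
    sys.modules on later calls, so the import cost is paid only once
    per process.
    """
    return importlib.import_module(HEAVY_MODULE)


def handle_request(payload):
    # The import happens here, inside the first request that needs it,
    # rather than at module load time.
    heavy = get_heavy_module()
    return heavy.dumps({"echo": payload})


if __name__ == "__main__":
    print(handle_request("hi"))
```

The trade-off is that the first request to touch the module still pays the full import cost, so this moves the delay rather than removing it.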

Anand K. Mistry

Mar 5, 2012, 6:26:53 PM
to google-a...@googlegroups.com
On 6 March 2012 10:17, Brandon Wirtz <dra...@digerat.com> wrote:

>> Have you looked at using the 'real' NLTK on Python 2.7? As far as I can tell, the only hard dependency is NumPy, which we have.
>
> It times out doing imports.

Out of curiosity, importing NLTK, or NumPy?
 

 




--
Anand K. Mistry
Software Engineer
Google Australia

Brandon Wirtz

Mar 5, 2012, 6:35:58 PM
to google-a...@googlegroups.com

rutherford

Oct 28, 2012, 1:45:50 PM
to google-a...@googlegroups.com
Not sure how many features you are looking for, but as of today I have tokenizing and tagging working on App Engine. My code is at https://github.com/rutherford/nltk-gae, and there's a sample app included along with setup instructions.

Mani Doraisamy

Oct 28, 2012, 5:16:10 PM
to google-a...@googlegroups.com
I use Stanford NLP (http://nlp.stanford.edu/) on the Java version of App Engine. If you are looking for tagging or categorization, try AlchemyAPI with urlfetch.