--
You received this message because you are subscribed to the Google Groups "dstk-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dstk-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
You received this message because you are subscribed to a topic in the Google Groups "dstk-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/dstk-users/ZamLf-ujoIU/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to dstk-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Sean C. Rife, M.A.
Doctoral Candidate
Department of Psychology
Kent State University
Kent, OH 44240
Pete:
Many thanks for your help with this. I’ve managed to put everything together (I went ahead and installed the geocoding components as well, as I might have a use for them at some point, and it seemed like a good challenge). Everything seemed to go well, but I’m running into errors when I try to execute some features. For example, fatal exception errors when I try to use Boilerpipe, and “ERROR: Cannot fork() a new process: Cannot allocate memory (errno=12)” in the Apache error log when I try to geocode. The only consistent factor I can see is Pool2/Implementation.cpp:1162 popping up repeatedly in the error log.
A bit of Googling indicates that this is related to passenger, but beyond that I’m stumped (I’ve always been a php guy, so Ruby is a bit foreign to me). Does anything jump out at you here? I can provide the entire log if that would help (it’s over 200k).
Any assistance would be much appreciated. Thanks!
Hi Sean,
it can be hard to debug why things are going wrong when everything's running as part of a webserver. I try to include short informal unit tests in every module so you can run them as standalone shell scripts, that can really help narrow down what's going wrong.
I noticed I didn't have one for street2coordinates, so I've just checked one in. If you pull the latest code from github you can run it like this:
cd ~/sources/dstk
ruby street2coordinates.rb
and it should give output like this:
2543 Graystone Place, Simi Valley, CA 93065
{
  "2543 Graystone Place, Simi Valley, CA 93065\n": {
    "country_code3": "USA",
    "street_number": "2543",
    "latitude": 34.280874,
    "country_name": "United States",
    "street_name": "Graystone Pl",
    "region": "CA",
    "confidence": 1.0,
    "longitude": -118.766282,
    "locality": "Simi Valley",
    "fips_county": "06111",
    "country_code": "US",
    "street_address": "2543 Graystone Pl"
  }
}
************
11 Meadow Lane, Over, Cambridge CB24 5NF
{
  "11 Meadow Lane, Over, Cambridge CB24 5NF\n": {
    "country_code3": "GBR",
    "street_number": "11",
    "latitude": "52.3168159",
    "country_name": "United Kingdom",
    "street_name": "Meadow Lane",
    "region": "South Cambridgeshire",
    "confidence": 9,
    "longitude": "0.0202307",
    "locality": "Willingham and Over",
    "fips_county": null,
    "country_code": "UK",
    "street_address": "11 Meadow Lane"
  }
}
************
400 Duboce Ave, San Francisco, CA 94117
{
  "400 Duboce Ave, San Francisco, CA 94117\n": {
    "country_code3": "USA",
    "street_number": "401",
    "latitude": 37.769456,
    "country_name": "United States",
    "street_name": "Duboce Ave",
    "region": "CA",
    "confidence": 0.878,
    "longitude": -122.429127,
    "locality": "San Francisco",
    "fips_county": "06075",
    "country_code": "US",
    "street_address": "401 Duboce Ave"
  }
}
************
Could you give that a try and let me know what you see?
cheers,
Pete
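For anyone unfamiliar with the pattern Pete describes, here is a minimal sketch of an informal self-test at the bottom of a Ruby module. The `normalize_address` function and the sample input are hypothetical stand-ins, not code from DSTK; the only point is the `if __FILE__ == $0` guard, which runs the test when the file is executed directly and stays quiet when the file is required by the webserver.

```ruby
require 'json'

# Hypothetical stand-in for a module function; this is NOT the real
# street2coordinates implementation, just something small to exercise.
def normalize_address(raw)
  raw.strip.gsub(/\s+/, ' ')
end

# Informal self-test: this branch runs only when the file is executed
# directly (`ruby this_file.rb`), not when another script requires it.
if __FILE__ == $0
  samples = ["  2543 Graystone Place,   Simi Valley, CA 93065 \n"]
  samples.each do |sample|
    puts JSON.pretty_generate(sample => normalize_address(sample))
    puts '************'
  end
end
```

Running the file from a shell then shows errors directly in the terminal instead of burying them in the Apache log.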
--
Jetpac
You should get the app for your iPad!
Pete:
Good idea... this returned the following:
#<SQLite3::MemoryException: out of memory>["/usr/local/lib/site_ruby/1.8/sqlite3/resultset.rb:108:in `step'", "/usr/local/lib/site_ruby/1.8/sqlite3/resultset.rb:108:in `next'", "/usr/local/lib/site_ruby/1.8/sqlite3/resultset.rb:137:in `each'", "../geocoder/lib/geocoder/us/database.rb:155:in `execute_statement'", "../geocoder/lib/geocoder/us/database.rb:53:in `synchronize'", "../geocoder/lib/geocoder/us/database.rb:53:in `synchronize'", "../geocoder/lib/geocoder/us/database.rb:152:in `execute_statement'", "../geocoder/lib/geocoder/us/database.rb:137:in `execute'", "../geocoder/lib/geocoder/us/database.rb:249:in `ranges_by_feature'", "../geocoder/lib/geocoder/us/database.rb:407:in `add_ranges!'", "../geocoder/lib/geocoder/us/database.rb:719:in `geocode_address'", "../geocoder/lib/geocoder/us/database.rb:786:in `geocode'", "street2coordinates.rb:471:in `geocode_us_address'", "street2coordinates.rb:70:in `street2coordinates'", "street2coordinates.rb:66:in `each'", "street2coordinates.rb:66:in `street2coordinates'", "street2coordinates.rb:883", "street2coordinates.rb:882:in `each_line'", "street2coordinates.rb:882"]
2543 Graystone Place, Simi Valley, CA 93065
{
"2543 Graystone Place, Simi Valley, CA 93065\n": null
}
************
clean_address='" 11 Meadow Lane Over Cambridge CB24 5NF \n "'
post_code_re='/( |^)([A-Z][A-Z]?[0-9R][0-9A-Z]?) ?([0-9][A-Z]{2})( |$)/i'
post_code_match='#<MatchData " CB24 5NF " 1:" " 2:"CB24" 3:"5NF" 4:" ">'
post_code_select='SELECT postcode,country_code,county_code,district_code,ward_code,ST_Y(location::geometry) as latitude, ST_X(location::geometry) AS longitude FROM "uk_postcodes" WHERE postcode='CB245NF' LIMIT 1;'
select_as_hashes() - no connection for reversegeo
...so it seems I'm running out of memory (the VM only has 512MB). The obvious solution is to increase the RAM, but I'm wondering why this is such a memory-intensive task (it also took over a minute to run). Interestingly, other features (such as extracting text from a large PDF) go off without a hitch.
Any thoughts?
- Sean
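Before rerunning the test, it may help to see how much memory the VM really has free. The sketch below parses `/proc/meminfo` on Linux. The 256 MB threshold and the `enough_memory?` helper are arbitrary placeholders for illustration, since nothing in this thread says how much memory the geocoder needs, and `MemFree + Cached` is only a rough estimate of what is available.

```ruby
# Parse the text of /proc/meminfo (Linux) into a hash of kilobyte counts.
def parse_meminfo(text)
  info = {}
  text.each_line do |line|
    match = line.match(/\A(\w+):\s+(\d+)\s*kB/)
    info[match[1]] = match[2].to_i if match
  end
  info
end

# Hypothetical threshold for illustration only; the thread does not state
# how much memory the geocoder actually requires.
MIN_FREE_KB = 256 * 1024

# Rough check: free memory plus page cache against the threshold.
def enough_memory?(info, min_kb = MIN_FREE_KB)
  (info['MemFree'].to_i + info['Cached'].to_i) >= min_kb
end

if File.readable?('/proc/meminfo')
  info = parse_meminfo(File.read('/proc/meminfo'))
  puts "MemTotal: #{info['MemTotal']} kB, MemFree: #{info['MemFree']} kB"
  warn 'Low on memory; geocoding may fail' unless enough_memory?(info)
end
```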
On 6/5/2013 3:20 PM, Pete Warden wrote:
Hi Sean,
it can be hard to debug why things are going wrong when everything's running as part of a webserver. I try to include short informal unit tests in every module so you can run them as standalone shell scripts, that can really help narrow down what's going wrong.
I noticed I didn't have one for street2coordinates, so I've just checked one in. If you pull the latest code from github you can run it like this:
cd ~/sources/dstk
ruby street2coordinates.rb
and it should give output like this:
2543 Graystone Place, Simi Valley, CA 93065
{
  "2543 Graystone Place, Simi Valley, CA 93065\n": {
    "country_code3": "USA",
    "street_number": "2543",
    "latitude": 34.280874,
    "country_name": "United States",
    "street_name": "Graystone Pl",
    "region": "CA",
    "confidence": 1.0,
    "longitude": -118.766282,
    "locality": "Simi Valley",
    "fips_county": "06111",
    "country_code": "US",
    "street_address": "2543 Graystone Pl"
  }
}
************
11 Meadow Lane, Over, Cambridge CB24 5NF
{
  "11 Meadow Lane, Over, Cambridge CB24 5NF\n": {
    "country_code3": "GBR",
    "street_number": "11",
    "latitude": "52.3168159",
    "country_name": "United Kingdom",
    "street_name": "Meadow Lane",
    "region": "South Cambridgeshire",
    "confidence": 9,
    "longitude": "0.0202307",
    "locality": "Willingham and Over",
    "fips_county": null,
    "country_code": "UK",
    "street_address": "11 Meadow Lane"
  }
}
************
400 Duboce Ave, San Francisco, CA 94117
{
  "400 Duboce Ave, San Francisco, CA 94117\n": {
    "country_code3": "USA",
    "street_number": "401",
    "latitude": 37.769456,
    "country_name": "United States",
    "street_name": "Duboce Ave",
    "region": "CA",
    "confidence": 0.878,
    "longitude": -122.429127,
    "locality": "San Francisco",
    "fips_county": "06075",
    "country_code": "US",
    "street_address": "401 Duboce Ave"
  }
}
************
Could you give that a try and let me know what you see?
cheers,
Pete
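A note for anyone consuming this output from a script: in the samples in this thread, latitude and longitude are numbers for the US addresses and strings for the UK one, and a failed lookup comes back as null. The following is a rough sketch of one way to handle both cases. The `extract_coordinates` helper is a made-up name, not part of DSTK, and the sample document is trimmed down from the output quoted in this thread.

```ruby
require 'json'

# Turn one street2coordinates-style JSON document into a list of
# [address, latitude, longitude] rows, skipping addresses that came back null.
def extract_coordinates(json_text)
  JSON.parse(json_text).map do |address, result|
    next if result.nil?
    # Coordinates may arrive as numbers or as strings, so coerce both to Float.
    [address.strip, Float(result['latitude']), Float(result['longitude'])]
  end.compact
end

# Trimmed-down sample based on the output quoted in this thread.
sample = <<~'JSON'
  {
    "2543 Graystone Place, Simi Valley, CA 93065\n": null,
    "11 Meadow Lane, Over, Cambridge CB24 5NF\n": {
      "latitude": "52.3168159",
      "longitude": "0.0202307"
    }
  }
JSON

extract_coordinates(sample).each do |address, lat, lon|
  puts format('%s -> %.6f, %.6f', address, lat, lon)
end
```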
On Tue, Jun 4, 2013 at 11:49 PM, Sean C. Rife <sean...@gmail.com> wrote:
Pete:
Many thanks for your help with this. I’ve managed to put everything together (I went ahead and installed the geocoding components as well, as I might have a use for them at some point, and it seemed like a good challenge). Everything seemed to go well, but I’m running into errors when I try to execute some features. For example, fatal exception errors when I try to use Boilerpipe, and “ERROR: Cannot fork() a new process: Cannot allocate memory (errno=12)” in the Apache error log when I try to geocode. The only consistent factor I can see is Pool2/Implementation.cpp:1162 popping up repeatedly in the error log.
A bit of Googling indicates that this is related to passenger, but beyond that I’m stumped (I’ve always been a php guy, so Ruby is a bit foreign to me). Does anything jump out at you here? I can provide the entire log if that would help (it’s over 200k).
Any assistance would be much appreciated. Thanks!
From: dstk-...@googlegroups.com [mailto:dstk-...@googlegroups.com] On Behalf Of Pete Warden
Sent: Thursday, May 23, 2013 4:40 PM
To: dstk-...@googlegroups.com
Subject: Re: 32 bit VM
In that case you may just want to take a look at the source code for the APIs you care about too. A lot of the text cleaning functions are fairly standalone, apart from the html2story which depends on boilerpipe, and the sentiment analysis is completely self-contained in text2sentiment.rb. You could try something as simple as this:
git clone git://github.com/petewarden/dstk.git
cd dstk
sudo gem install bundler
sudo bundle install
and then try running something like this to test the functionality:
ruby text2sentiment.rb
That would be a lot quicker than rebuilding the whole VM!
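If you end up trying several modules this way, a small wrapper can run each file and report which ones fail. This is only a sketch: the `run_self_test` helper is hypothetical, the two file names are the ones mentioned in this thread, and it assumes you run it from the dstk checkout. A script that rescues an exception and prints it may still exit successfully, so the exit status is a first filter rather than a verdict.

```ruby
require 'open3'
require 'rbconfig'

# Run one Ruby script as a child process.
# Returns [success?, combined stdout and stderr].
def run_self_test(path, ruby = RbConfig.ruby)
  output, status = Open3.capture2e(ruby, path)
  [status.success?, output]
end

# File names taken from the thread; pass others on the command line.
scripts = ARGV.empty? ? %w[text2sentiment.rb street2coordinates.rb] : ARGV

scripts.each do |script|
  unless File.exist?(script)
    puts "SKIP #{script} (not found in #{Dir.pwd})"
    next
  end
  ok, output = run_self_test(script)
  puts "#{ok ? 'OK  ' : 'FAIL'} #{script}"
  # Show only the tail of the output for failures to keep the report short.
  puts output.lines.last(5).join unless ok
end
```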
On Thu, May 23, 2013 at 1:10 PM, Sean C. Rife <sean...@gmail.com> wrote:
Ah, very good to hear. I'll give it a shot and look out for issues on line 13. I'm planning on using the text parsing/cleaning and sentiment analysis tools (I'm a social psychologist studying social media), so I may do as you suggest and skip the stats/geocoding.
I'll post back and let you know how it goes. Thanks!
On 5/23/2013 3:31 PM, Pete Warden wrote:
Hi Sean,
thanks for your kind words! It should definitely be possible, and in theory you should just be able to follow the steps in the setup documentation starting with a 32-bit Ubuntu 12.04 image:
In practice, you'll likely hit odd dependency issues (eg line 13 where I have to set up an explicit link to a 64bit library as a workaround) and memory problems when you're building data. It will make your life a lot easier if you can skip the statistics and geocoding data, as well as speeding up the process. Which APIs do you need running on the final VM?
cheers,
Pete
On Thu, May 23, 2013 at 12:12 PM, Sean C. Rife <sean...@gmail.com> wrote:
I've been trying to get the VM up and running (both using Vagrant and just in VirtualBox), but have had no luck. This appears to be because the VM is 64 bit. I'm broke, so the machine I'm trying to use as the host is 32 bit, with 2GB of RAM (I know, I know). So I'm wondering if there's any reason I can't just create my own VM using Ubuntu server 32, then install DSTK (perhaps make the appliance publicly available in case anyone else is similarly situated).
Is there any reason why this might be problematic? Obviously, some of the tools are going to require some serious memory, but I couldn't find any specific requirements.
Any thoughts would be much appreciated. And of course, many thanks to Pete for putting this together!
Best,
- Sean