Crash when installing 'boris' package from GitHub URL on OSX 10.6.8

David K. Storrs

Dec 9, 2015, 8:50:19 PM

to Racket Users

Hi folks,

I've done a bit of Scheme in the past (although not a lot) and am just getting started with Racket. Yesterday I tried installing a package -- 'Boris', a web spider -- and raco crashed with the output shown below.

Two questions:

1) Is there a web-spidering package that people recommend? I could use wget and then parse things from disk, but I'd like to have something that's easily composable into CLI scripts.

2) Given the crash shown below, what happened and how do I keep it from happening again?

Context: OSX 10.6.8; Racket 6.3; running in Terminal inside an Emacs 22.1.1 *shell* buffer

$ raco pkg install github://github.com/emdonahu/boris/master
Querying Git references for boris at github://github.com/emdonahu/boris/master
Downloading repository github://github.com/emdonahu/boris/master
raco setup: version: 6.3
raco setup: platform: x86_64-macosx [3m]
raco setup: installation name: 6.3
raco setup: variants: 3m
raco setup: main collects: /Applications/Racket_v6.3/collects
raco setup: collects paths:
raco setup: /Users/dstorrs/Library/Racket/6.3/collects
raco setup: /Applications/Racket_v6.3/collects
raco setup: main pkgs: /Applications/Racket_v6.3/share/pkgs
raco setup: pkgs paths:
raco setup: /Applications/Racket_v6.3/share/pkgs
raco setup: /Users/dstorrs/Library/Racket/6.3/pkgs
raco setup: links files:
raco setup: /Applications/Racket_v6.3/share/links.rktd
raco setup: /Users/dstorrs/Library/Racket/6.3/links.rktd
raco setup: main docs: /Applications/Racket_v6.3/doc
raco setup: --- updating info-domain tables ---
raco setup: updating: /Users/dstorrs/Library/Racket/6.3/share/info-cache.rktd
raco setup: --- pre-installing collections ---
raco setup: --- installing foreign libraries ---
raco setup: --- installing shared files ---
raco setup: --- compiling collections ---
raco setup: --- parallel build using 8 jobs ---
raco setup: 7 making: <pkgs>/boris/boris
raco setup: 6 making: <pkgs>/boris/echo-server
raco setup: 5 making: <pkgs>/boris/hypertext-browser
raco setup: 4 making: <pkgs>/boris/persistent
raco setup: 3 making: <pkgs>/boris/tests
raco setup: 3 making: <pkgs>/boris/tests/boris
raco setup: 2 making: <pkgs>/boris/utils
raco setup: 2 making: <pkgs>/boris/utils/emd
raco setup: 7 making: <pkgs>/boris/boris/interpreter
Assertion failed: (((((((intptr_t)((Scheme_Object *)(scopes))) & 0x1)?(Scheme_Type)scheme_int\
eger_type:((Scheme_Object *)((Scheme_Object *)(scopes)))->type) >= scheme_hash_tree_type) && \
(((((intptr_t)((Scheme_Object *)(scopes))) & 0x1)?(Scheme_Type)scheme_integer_type:((Scheme_O\
bject *)((Scheme_Object *)(scopes)))->type) <= scheme_hash_tree_indirection_type))), function\
add_conditional_as_reachable, file ../../../racket/gc2/../src/syntax.c, line 5337.
Abort trap

Neil Van Dyke

Dec 9, 2015, 9:33:02 PM

to David K. Storrs, Racket Users

David K. Storrs wrote on 12/09/2015 08:50 PM:
> 1) Is there a web-spidering package that people recommend? I could use wget and then parse things from disk, but I'd like to have something that's easily composable into CLI scripts.

I've done a lot of Web crawling and scraping successfully with Racket
and Scheme, over the last 14-15 years. I released an HTML parser
("http://www.neilvandyke.org/racket-html-parsing/"), which I still use
today. From that parse, you might then extract the info you need with
`sxml-match`
("http://planet.racket-lang.org/display.ss?package=sxml-match.plt&owner=jim")
and/or SXPath. For HTTP, the client modules in Racket are often
satisfactory, and other times I've used my own packages that implement
HTTP in pure Racket or that wrap `curl` or `wget` for special
requirements. For storing pages and links/metadata, there's the
filesystem, the core Racket RDBMS database support, and cloud stores
like AWS S3. The un-AJAX-ing and site-specific scraping behavior you
might have to do yourself, if you need it. (I have a backlog of related
tools to release someday.)
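
As a rough illustration of the fetch, parse, and extract steps described above, here is a minimal sketch. It assumes the `sxml` package from the new package catalog is installed (for `sxpath`) and that `html-parsing` is pulled from PLaneT; `page-links` is a hypothetical helper name, and error handling is omitted.

```racket
#lang racket/base
;; Sketch only: fetch a page, parse it to SXML, and list its link targets.
(require net/url                      ; HTTP client from the Racket distribution
         (planet neil/html-parsing)   ; provides html->xexp (fetched from PLaneT)
         (only-in sxml sxpath))       ; assumes `raco pkg install sxml` was run

;; page-links : string -> list of strings (hypothetical helper)
(define (page-links url-string)
  ;; Open the page body as an input port, following a few redirects.
  (define in (get-pure-port (string->url url-string) #:redirections 5))
  ;; Parse the (possibly malformed) HTML into an SXML tree.
  (define doc (html->xexp in))
  (close-input-port in)
  ;; Select the text of every href attribute on every <a> element.
  ((sxpath '(// a @ href *text*)) doc))
```

Under those assumptions, `(page-links "http://example.com/")` should return the page's `href` values as a list of strings, which can then be filtered or queued for further fetching.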

P.S. Fortunately, the `sxml-match` Racket package has been preserved on
the official Racket PLaneT package server :) since the author's Web
site, which hosted the package home page, has disappeared.

Neil V.

David K. Storrs

Dec 10, 2015, 1:59:38 AM

to Racket Users, david....@gmail.com

On Wednesday, December 9, 2015 at 6:33:02 PM UTC-8, Neil Van Dyke wrote:
> David K. Storrs wrote on 12/09/2015 08:50 PM:
> > 1) Is there a web-spidering package that people recommend? I could use wget and then parse things from disk, but I'd like to have something that's easily composable into CLI scripts.
>
>
> I've done a lot of Web crawling and scraping successfully with Racket
> and Scheme, over the last 14-15 years. I released an HTML parser
> ("http://www.neilvandyke.org/racket-html-parsing/"), which I still use
> today. From that parse, you might then extract the info you need with
> `sxml-match`
> ("http://planet.racket-lang.org/display.ss?package=sxml-match.plt&owner=jim")
> and/or SXPath.

Thank you; I've been rolling through the docs and playing around with these, and they seem really useful. One question though -- I stumbled across a mention of the sxml/html module while I was reading, but had no luck installing it. None of the following worked:

(require sxml/html)
$ raco pkg install sxml/html
$ raco pkg install 'sxml/html' # Maybe the shell was having trouble with '/'?

I don't know that I need it, but I'd like to know how to deal with modules like this in future.

> For HTTP, the client modules in Racket are often
> satisfactory, and other times I've used my own packages that implement
> HTTP in pure Racket or that wrap `curl` or `wget` for special
> requirements. For storing pages and links/metadata, there's the
> filesystem, the core Racket RDBMS database support, and cloud stores
> like AWS S3. The un-AJAX-ing and site-specific scraping behavior you
> might have to do yourself, if you need it. (I have a backlog of related
> tools to release someday.)

Great, thank you. Yeah, I'd really like to be able to automate posting to Patreon. (Every week I publish a chapter of my novel there.) Unfortunately, their whole site is pointlessly AJAX. I spent some time Firebugging their code to see what the relevant calls were and then decided to spend my time on something more useful.

From what you say it sounds like there's no "magically make stupid AJAX / DOM-manipulating sites easy to deal with" module for Racket? Something that processed the site and handed back the final HTML as the browser gets it post-JS would be lovely. It's a bit much to ask for, I realize.

Thanks again for all this -- it's a big help.

Dave

Matthew Flatt

Dec 10, 2015, 7:05:02 AM

to David K. Storrs, Racket Users

I've pushed a repair for the crash, which was due to a bug in the new
macro expander.

The "boris/hypertext-browser/http/head.rkt" module creates a pattern of
binding and shadowing that had never been exercised before --- at least
not in code saved in bytecode form. (It took me much longer to
construct a test case that navigates the same path than to fix the
bug.) I think that `scribble/srcdoc` is a key step in making "head.rkt"
trigger the expander bug, so if a workaround is needed, probably it
involves avoiding `scribble/srcdoc`.

Thanks for the report!


David K. Storrs

Dec 10, 2015, 12:12:24 PM

to Racket Users, david....@gmail.com

On Thursday, December 10, 2015 at 4:05:02 AM UTC-8, Matthew Flatt wrote:
> I've pushed a repair for the crash, which was due to a bug in the new
> macro expander.
>
> The "boris/hypertext-browser/http/head.rkt" module creates a pattern of
> binding and shadowing that had never been exercised before --- at least
> not in code saved in bytecode form. (It took me much longer to
> construct a test case that navigates the same path than to fix the
> bug.) I think that `scribble/srcdoc` is a key step in making "head.rkt"
> trigger the expander bug, so if a workaround is needed, probably it
> involves avoiding `scribble/srcdoc`.
>
> Thanks for the report!

Wow, that was fast! Thank you.

I didn't actually do anything related to Scribble; I just tried to install the package -- although presumably that automatically attempted to set up the Scribble docs.

John Clements

Dec 10, 2015, 1:04:36 PM

to David K. Storrs, Racket Users

> On Dec 9, 2015, at 10:59 PM, David K. Storrs <david....@gmail.com> wrote:
>
> On Wednesday, December 9, 2015 at 6:33:02 PM UTC-8, Neil Van Dyke wrote:
>> David K. Storrs wrote on 12/09/2015 08:50 PM:
>>> 1) Is there a web-spidering package that people recommend? I could use wget and then parse things from disk, but I'd like to have something that's easily composable into CLI scripts.
>>
>>
>> I've done a lot of Web crawling and scraping successfully with Racket
>> and Scheme, over the last 14-15 years. I released an HTML parser
>> ("http://www.neilvandyke.org/racket-html-parsing/"), which I still use
>> today. From that parse, you might then extract the info you need with
>> `sxml-match`
>> ("http://planet.racket-lang.org/display.ss?package=sxml-match.plt&owner=jim")
>> and/or SXPath.
>
> Thank you; I've been rolling through the docs and playing around with these, and they seem really useful. One question though -- I stumbled across a mention of the sxml/html module while I was reading, but had no luck installing it. None of the following worked:
>
> (require sxml/html)
> $ raco pkg install sxml/html
> $ raco pkg install 'sxml/html' # Maybe the shell was having trouble with '/'?
>
> I don't know that I need it, but I'd like to know how to deal with modules like this in future.

I don’t believe that package names should contain slashes.

However, it could easily be the case that a package (presumably from Neil Van Dyke) could contain code that would be installed into the html subdirectory of the sxml collection, and I’m guessing that’s what you’re referring to.

It’s perhaps also worth mentioning that Racket has an older package system (PLaneT) and a newer one (packages), and there’s a certain amount of confusion that may result from that transition. When you write `raco pkg install …`, you’re referring to the new system. It appears to me that, for instance, Neil’s html-parsing library is not currently available in package form. (ObResearch: quick search for 'neil@ne' through pkgs.)
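
To make the distinction concrete, here is a minimal sketch of how the two systems are typically used. It assumes the `sxml` package in the new catalog and Neil's `html-parsing` on PLaneT; the point is that a package is installed by name (no slash), while `require` takes a module path, which may contain slashes.

```racket
#lang racket/base
;; New package system: install once by *package name* from the shell,
;;   $ raco pkg install sxml
;; then require by *module path* inside the installed collection.
(require sxml)

;; PLaneT (older system): there is no separate install step.
;; The first time this require is evaluated, the package is downloaded.
(require (planet neil/html-parsing))
```

So a form like `(require some-collection/some-module)` only works once whichever package provides that collection has been installed, and the package name need not match the module path.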

John Clements

Neil Van Dyke

Dec 10, 2015, 2:00:14 PM

to Racket Users

'John Clements' via Racket Users wrote on 12/10/2015 01:04 PM:
> However, it could easily be the case that a package (presumably from Neil Van Dyke) could contain code that would be installed into the html subdirectory of the sxml collection, and I’m guessing that’s what you’re referring to.

Though that's technically possible, I'd strongly prefer that that not
happen in practice for third-party packages, and at least it's not
something that I'd likely ever do with my packages.

For decentralized third-party reusable component development, taxonomies
generally do not get coded in package names themselves. It's much less
flexible, and easier to mess up in a counterproductive way (even when
centralized, but especially when decentralized).

And it would be simpler (and also facilitate multiple-version support
someday), if packages just kept their files compartmentalized to their
own installation directory. So I'd discourage third-party package
developers using the power of the new package system to sprinkle their
package's files in arbitrary other locations, except when one has a
really-really good reason.

> It appears to me that, for instance, Neil’s html-parsing library is not currently available in package form. (ObResearch: quick search for 'neil@ne' through pkgs.)

Correct, my `html-parsing` package is still maintained in PLaneT, and I
have not yet moved it to the new package system. I do intend to move my
packages to the new package system, but I've put nothing in the new
package system catalog yet.

Neil V.

David K. Storrs

Dec 10, 2015, 3:08:42 PM

to Racket Users

I have to say, the fact that installing a package === using a package is really cool. Go Racket!
