Enlive as a hammer


Eric Gebhart

Oct 29, 2015, 5:57:29 AM
to Enlive

I'm using Enlive for a work project, but thought I would get familiar with it through a side project.

I decided to do some scraping with it, and perhaps this is more than I bargained for. The HTML snippet
I chose to scrape has almost no identifiers and no classes.

I already tried posting this once, in great detail, and I'm a little annoyed to be posting again, so I'll be brief.

I need to turn something that is not a tree into a tree. It is essentially the content of a span inside an h3, followed by
<div><table> <table/> <table/> <table/> </table></div>. Each sub-table has a single <th>, which is its name,
followed by data rows, which are the content.

What I want to create is a tree of these.

{:span-content {:th-content {:row-name  :value :description }}}

I have a select that gives me the span next to the parent table for each group of tables. I'm perfectly willing
to take that seq and do the rest in plain Clojure, but I'm wondering whether I would be missing out on what
Enlive can do for this problem.

Currently, this is what I'm getting out of my Enlive select:

<span> 
<table ........  <table> <table> ...
<table ........  <table> <table> ...
<table ........  <table> <table> ...

This is fine, but I don't see how to go any further with Enlive.
My first instinct is to just iterate through the seq and walk the trees
as I go. I am only interested in the content; I can't see any need for the attributes.
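The "iterate through the seq and walk the trees" approach can be sketched as follows. This is written in Python rather than Clojure, with plain dicts shaped like Enlive nodes ({"tag", "attrs", "content"}, mirroring Enlive's {:tag :attrs :content} maps) and plain strings as text nodes. Several things here are assumptions. The helper names are hypothetical. Each data row is assumed to hold exactly three cells (name, value, description). The inner tables and rows are assumed to be direct children, although a real HTML parser may insert <tbody>. Only text content is read, and attributes are ignored.

```python
# Sketch: walk (span, outer-table) pairs and build
# {span-text: {th-text: {row-name: {"value": ..., "description": ...}}}}.
# Nodes are dicts shaped like Enlive nodes; text nodes are plain strings.


def node(tag, *content):
    """Build an Enlive-style element node with no attributes (hypothetical helper)."""
    return {"tag": tag, "attrs": {}, "content": list(content)}


def text_of(n):
    """All text below a node, ignoring tags and attributes."""
    if isinstance(n, str):
        return n
    return "".join(text_of(c) for c in n.get("content", []))


def children(n, tag):
    """Direct child elements of n with the given tag."""
    return [c for c in n.get("content", [])
            if isinstance(c, dict) and c.get("tag") == tag]


def sub_table(table):
    """Inner <table> -> (th text, {row-name: {"value": ..., "description": ...}})."""
    name, rows = None, {}
    for tr in children(table, "tr"):
        th = children(tr, "th")
        if th:
            name = text_of(th[0]).strip()
            continue
        cells = [text_of(td).strip() for td in children(tr, "td")]
        if len(cells) == 3:  # assumed layout: row name, value, description
            row_name, value, description = cells
            rows[row_name] = {"value": value, "description": description}
    return name, rows


def groups_to_tree(groups):
    """groups: (span_node, outer_table_node) pairs, one per group of tables."""
    tree = {}
    for span, outer in groups:
        section = tree.setdefault(text_of(span).strip(), {})
        for inner in children(outer, "table"):
            name, rows = sub_table(inner)
            section[name] = rows
    return tree


if __name__ == "__main__":
    # Hypothetical sample data standing in for one selected group.
    inner = node("table",
                 node("tr", node("th", "Timeouts")),
                 node("tr", node("td", "connect"), node("td", "5"),
                      node("td", "seconds to wait")))
    print(groups_to_tree([(node("span", "Network"), node("table", inner))]))
    # {'Network': {'Timeouts': {'connect': {'value': '5', 'description': 'seconds to wait'}}}}
```

Because everything keys off text content, the sketch never needs ids or classes. That suits a page without them, but it breaks silently if two sub-tables share a <th> name.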

Is there a better way with Enlive? I feel like I'm using pliers as a hammer.

Thanks.

Linus Ericsson

Nov 6, 2015, 8:45:39 AM
to enliv...@googlegroups.com
Sorry for the late answer, but...

I think you want to use the underlying zipper mechanisms. They are well hidden in Enlive, but I made a short example [1] showing how to get going. I don't understand exactly how to use Enlive selectors with the zippers (the combination of z/down, z/right and the node transforms is essentially what Enlive does really well), but I think you can figure that out.
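To make the zipper idea concrete, here is a minimal read-only zipper in Python over Enlive-style node dicts. It is loosely modelled on clojure.zip's zipper, down, right and path. It is an illustration only, not Enlive's or clojure.zip's API. The Loc type and its field names are hypothetical, and real clojure.zip locs also support up, left and editing.

```python
# Minimal read-only zipper over Enlive-style node dicts, loosely modelled on
# clojure.zip (zipper / down / right / path). Illustration only.
from typing import Any, NamedTuple, Optional, Tuple


class Loc(NamedTuple):
    node: Any             # the focused node
    rights: Tuple = ()    # siblings to the right of the focus
    parents: Tuple = ()   # ancestors, root first (what clojure.zip/path returns)


def zipper(root) -> Loc:
    """Start a zipper focused on the root node."""
    return Loc(root)


def down(loc: Loc) -> Optional[Loc]:
    """Move to the leftmost child, or None if the focus has no children."""
    n = loc.node
    if not isinstance(n, dict) or not n.get("content"):
        return None
    first, *rest = n["content"]
    return Loc(first, tuple(rest), loc.parents + (n,))


def right(loc: Loc) -> Optional[Loc]:
    """Move to the next sibling, or None at the rightmost child."""
    if not loc.rights:
        return None
    nxt, *rest = loc.rights
    return Loc(nxt, tuple(rest), loc.parents)


def path(loc: Loc) -> list:
    """Ancestor nodes from the root down to the focus's parent."""
    return list(loc.parents)


if __name__ == "__main__":
    def el(tag, *content):
        return {"tag": tag, "attrs": {}, "content": list(content)}

    # Hypothetical sub-table: a header row, then one data row.
    table = el("table", el("tr", el("th", "Timeouts")),
               el("tr", el("td", "connect"), el("td", "5"), el("td", "seconds to wait")))
    loc = down(right(down(zipper(table))))  # table -> header row -> data row -> first <td>
    print(loc.node["content"], [p["tag"] for p in path(loc)])
    # ['connect'] ['table', 'tr']
```

The point is that every location carries its ancestors. From any <td> you can look back up the path to the sub-table it belongs to.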

The main thing is to get the full zipper, call z/path on it, and find a way to pick data out of the path sequence to suit your aggregation. An example of an intermediate representation could be

[[span-id th-id {:row-name rn :value val :description desc}]
...]

and then fold it into the nested map with reduce and assoc-in, as described in the gist [1].
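The gist [1] isn't reproduced in this thread, so what follows is only a Python sketch of the same idea, not its code. `walk` yields every node together with its tuple of ancestors, roughly what you would get by stepping through a clojure.zip zipper with z/next and calling z/path at each loc. Each data row is flattened into the [span-id th-id {...}] shape above. An `assoc_in` stand-in then folds the rows into the nested map with `functools.reduce`. The helper names, the three-cell row layout and the sample data are assumptions.

```python
# Sketch: path-based flattening into [span_id, th_id, {...}] rows, then a
# reduce + assoc-in fold into a nested dict.
from functools import reduce


def walk(n, ancestors=()):
    """Depth-first walk yielding (ancestors, node) for every node."""
    yield ancestors, n
    if isinstance(n, dict):
        for c in n.get("content", []):
            yield from walk(c, ancestors + (n,))


def text_of(n):
    """All text below a node, ignoring tags and attributes."""
    if isinstance(n, str):
        return n
    return "".join(text_of(c) for c in n.get("content", []))


def rows_from_group(span_text, outer_table):
    """Intermediate representation: [span_id, th_id, {row-name, value, description}]."""
    out = []
    for ancestors, n in walk(outer_table):
        if not (isinstance(n, dict) and n.get("tag") == "tr"):
            continue
        tds = [text_of(c).strip() for c in n["content"]
               if isinstance(c, dict) and c.get("tag") == "td"]
        if len(tds) != 3:  # assumed layout; header rows are skipped
            continue
        # The nearest <table> on the path is the sub-table; its <th> names it.
        table = next((a for a in reversed(ancestors) if a.get("tag") == "table"), None)
        th_id = next((text_of(x).strip() for _, x in walk(table)
                      if isinstance(x, dict) and x.get("tag") == "th"), None)
        out.append([span_text, th_id,
                    {"row-name": tds[0], "value": tds[1], "description": tds[2]}])
    return out


def assoc_in(m, keys, value):
    """Copy of nested dict m with value set at the key path (like Clojure's assoc-in)."""
    k, *rest = keys
    return {**m, k: assoc_in(m.get(k, {}), rest, value) if rest else value}


def build_tree(rows):
    """reduce over the rows, assoc-ing each one in under [span_id, th_id, row-name]."""
    def step(acc, row):
        span_id, th_id, fields = row
        return assoc_in(acc, [span_id, th_id, fields["row-name"]],
                        {"value": fields["value"], "description": fields["description"]})
    return reduce(step, rows, {})


if __name__ == "__main__":
    def el(tag, *content):
        return {"tag": tag, "attrs": {}, "content": list(content)}

    inner = el("table", el("tr", el("th", "Timeouts")),
               el("tr", el("td", "connect"), el("td", "5"), el("td", "seconds to wait")))
    rows = rows_from_group("Network", el("table", inner))
    print(build_tree(rows))
    # {'Network': {'Timeouts': {'connect': {'value': '5', 'description': 'seconds to wait'}}}}
```

Keeping the flat rows as a separate step makes them easy to inspect or filter before aggregating. Because `walk` descends through every level, it also tolerates a <tbody> that a parser may have inserted.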

/Linus

--
You received this message because you are subscribed to the Google Groups "Enlive" group.
Visit this group at http://groups.google.com/group/enlive-clj.
