Re: QueryPath Scrape to Array


TechnoSophos

Aug 28, 2012, 5:19:35 PM
to support-...@googlegroups.com
By "each match," do you mean each text field in the table, or each cell (or row) of table data?

Let's say you want an array containing the text of each td. You'd do something like this:

$cells = $qp->branch('td');
$addresses = array();
foreach ($cells as $cell) {
  $addresses[] = $cell->text();
}

There are a few other ways you could try to do this, but the above is probably the simplest.
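For anyone who wants to try the idea without QueryPath installed, here is a rough sketch using only PHP's built-in DOM extension (DOMDocument). The function name cellTexts() is made up for illustration, the first row is KenJ's sample markup with its line breaks removed, and the second row is the address markup from later in this thread. textContent should give roughly the same string per cell as $cell->text() in the loop above.

```php
<?php
// Sketch: collect the text of every <td> into an array with PHP's built-in
// DOM extension. cellTexts() is a hypothetical helper, not a QueryPath API.
function cellTexts(string $html): array
{
    $doc = new DOMDocument();
    // HTML fragments often trigger libxml warnings; collect them quietly.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        // textContent concatenates all descendant text, including <strong>.
        $texts[] = $td->textContent;
    }
    return $texts;
}

$html = '<table><tr><td class="address">Medford <br/>16 This Street<br/>'
      . 'Medford, MA 02155<br/><br/><strong>Other Stuff:</strong><br/></td></tr>'
      . '<tr><td class="address">123 This Street<br/>This Town<br/>GA<br/>00000</td></tr></table>';

$addresses = cellTexts($html);
print_r($addresses);
```

Because the <br/> tags contribute no text, each entry runs the address lines together (e.g. "16 This StreetMedford, MA 02155"). Splitting on the <br>s needs the text-node approach discussed later in the thread.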

Matt

-- 
TechnoSophos
Twitter: @technosophos

On Tuesday, August 28, 2012 at 4:12 PM, KenJ wrote:

I'm extracting text from an HTML table using QueryPath. The structure of the tag looks like this:

<td class="address">Medford <br/>
16 This Street<br/>
Medford, MA 02155<br/><br/><strong>Other Stuff:</strong><br/></td>

I'm able to get all of the text, but I'd like to put each match into an array so I can work with them.

How can I put each match into an array?


--
You received this message because you are subscribed to the Google Groups "support-querypath" group.
To view this discussion on the web visit https://groups.google.com/d/msg/support-querypath/-/Bv-f0eHIsx4J.
To post to this group, send email to support-...@googlegroups.com.
To unsubscribe from this group, send email to support-queryp...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/support-querypath?hl=en.

KenJ

Dec 8, 2013, 9:12:29 PM
to support-...@googlegroups.com
How could I get the data between the <br>s? I'm looking at textBefore() and textAfter(), but I'm not able to get them working on markup like this:

123 This Street<br>This Town<br>GA<br>00000

$address = $item->branch('div.address');
$street = $address1->find('br')->textBefore();

TechnoSophos

Dec 11, 2013, 11:50:45 AM
to support-...@googlegroups.com
There are a few ways to do it, but the easiest may be the strangest-looking one:


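$address = $item->branch('div.address');
// contents() includes text nodes; children() returns only element nodes.
$parts = $address->contents()->get();
$text = array();
foreach ($parts as $node) {
  // Keep only the text nodes; the <br> elements between them are skipped.
  if ($node->nodeType == XML_TEXT_NODE) {
    $text[] = $node->data;
  }
}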

After that, $text will have one text entry per piece of the address. You may need to trim off whitespace.
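A self-contained sketch of the same text-node idea, using only PHP's built-in DOM extension so it runs without QueryPath. The function name textPieces() and its class-name parameter are made up for illustration; the helper also trims each piece and drops whitespace-only fragments, as suggested above.

```php
<?php
// Sketch: split an element's text on its <br> tags by collecting its direct
// child text nodes. textPieces() is a hypothetical helper, not a QueryPath API.
function textPieces(string $html, string $className): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // First element whose class attribute is exactly $className.
    $nodes = $xpath->query("//*[@class='" . $className . "']");
    if ($nodes === false || $nodes->length === 0) {
        return [];
    }

    $pieces = [];
    foreach ($nodes->item(0)->childNodes as $child) {
        // Only direct text nodes are kept; the <br> elements act as separators.
        if ($child->nodeType === XML_TEXT_NODE) {
            $piece = trim($child->data);
            if ($piece !== '') { // drop whitespace-only fragments
                $pieces[] = $piece;
            }
        }
    }
    return $pieces;
}

$html = '<div class="address">123 This Street<br>This Town<br>GA<br>00000</div>';
print_r(textPieces($html, 'address'));
```

Since only direct child text nodes are collected, text nested inside other elements is skipped: with the 2012 td markup earlier in the thread, you would get "Medford", "16 This Street" and "Medford, MA 02155", but not "Other Stuff:", which sits inside <strong>.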


