On 11/19/10, Robert Gonzalez <robert.anth...@gmail.com> wrote:
> Can you not just use the built-in DOM object? Or even SimpleXML for this?
> I'm pretty sure both support XPath.
>
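Robert's suggestion can be sketched with PHP's bundled DOM extension (`DOMDocument` plus `DOMXPath`) and, for comparison, SimpleXML via `simplexml_import_dom()`. This is a minimal illustration, not code from the thread; the inline HTML is made up and stands in for a fetched page.

```php
<?php
// Made-up HTML standing in for a fetched page (assumption, not from the thread).
$html = '<html><body><div>Altes Schulhaus Ossingen</div>'
      . '<div>Guntibachstrasse 10</div></body></html>';

// libxml warns about sloppy real-world HTML; collect errors instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Option 1: DOMXPath on the DOMDocument.
$xpath = new DOMXPath($dom);
$first = $xpath->query('/html/body/div[1]')->item(0)->textContent;

// Option 2: SimpleXML, imported from the same DOM tree.
$sxml = simplexml_import_dom($dom);
$second = (string) $sxml->xpath('/html/body/div[2]')[0];

echo $first, PHP_EOL;  // Altes Schulhaus Ossingen
echo $second, PHP_EOL; // Guntibachstrasse 10
```

Both routes evaluate the same XPath expressions against the tree libxml builds, so the choice mostly comes down to which object API is more convenient.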
> On Sun, Nov 14, 2010 at 3:40 PM, Martin Kaspar <martin...@campus-24.com> wrote:
>
>> Hello dear PHP friends,
>>
>>
>> Does the PHP Simple HTML DOM Parser (see
>> http://simplehtmldom.sourceforge.net/) support XPath? I am not sure.
>>
>> I want to parse the data structure of the fetched pages. Some details:
>> there are several hundred result pages derived from this one:
>> http://www.educa.ch/dyn/79362.asp?action=search
>>
>> Note: I want to iterate over the result pages with a loop.
>>
>>
>> http://www.educa.ch/dyn/79376.asp?id=1568
>> http://www.educa.ch/dyn/79376.asp?id=2149
>>
>>
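>> I use this loop:
>> PHP Code:
>> // $match[1] is assumed to hold the number of result pages,
>> // e.g. captured by an earlier preg_match() on the first page
>> $pageCount = (int) $match[1];
>> for ($i = 1; $i <= $pageCount; $i++) {
>>     $url = "http://www.example.com/page?page={$i}";
>>     // access new sub-page, extract necessary data
>> }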
>>
>>
>> As an example, take this domain:
>> http://www.educa.ch/dyn/79362.asp?action=search
>>
>> As you can see, there are lots of target pages:
>> http://www.educa.ch/dyn/79376.asp?id=1568
>> http://www.educa.ch/dyn/79376.asp?id=2149
>> and many more.
>>
>> What do you think? How should I loop over the target URLs?
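One possible way to loop over the target URLs is to generate each detail-page URL from its `id` query parameter. This sketch assumes the ids are already known: the list below only reuses the two ids quoted above, and the helper name `detailUrls()` is hypothetical. In practice the ids would have to be collected from the search result pages first.

```php
<?php
// Build detail-page URLs of the form 79376.asp?id=N.
// detailUrls() is a hypothetical helper; the id list is illustrative only.
function detailUrls(array $ids): array
{
    $urls = [];
    foreach ($ids as $id) {
        // http_build_query() takes care of encoding the query string.
        $urls[] = 'http://www.educa.ch/dyn/79376.asp?' . http_build_query(['id' => (int) $id]);
    }
    return $urls;
}

$urls = detailUrls([1568, 2149]);
foreach ($urls as $url) {
    echo $url, PHP_EOL;
    // fetch, parse and store each page here
}
```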
>>
>> BTW: as you can see, some pages will be empty. Those empty pages
>> should be thrown away; I do not want to store "empty" records.
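Throwing away empty pages can be as simple as checking whether every extracted field is blank. A minimal sketch, assuming each page has already been parsed into an associative array; `isEmptyRecord()` and the field names are hypothetical.

```php
<?php
// Treat a scraped record as "empty" when every field is blank after trimming.
// isEmptyRecord() and the field names below are hypothetical examples.
function isEmptyRecord(array $record): bool
{
    foreach ($record as $value) {
        if (trim((string) $value) !== '') {
            return false;
        }
    }
    return true;
}

$records = [
    ['name' => 'Altes Schulhaus Ossingen', 'phone' => 'Tel:052 317 15 45'],
    ['name' => '   ', 'phone' => ''],
];

// Keep only records with at least one non-blank field, reindexed from 0.
$kept = array_values(array_filter($records, fn(array $r) => !isEmptyRecord($r)));
```

Filtering before the database step means empty pages never reach MySQL at all.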
>>
>> That is what I want to do, and now I need a good parser script.
>>
>> Note: this is a three-part job:
>>
>> 1. fetching the sub-pages
>> 2. parsing them, and if all goes well, a third part:
>> 3. storing the data in a MySQL database
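The three parts can be wired together as a small pipeline in which each stage is a callable, so fetching, parsing and storing can be developed and tested separately. This is one possible structure, not an established recipe; `runPipeline()` is a hypothetical helper, and the demo uses stub stages so it runs without network or database access. In a real run, stage 3 might wrap a PDO prepared `INSERT` into MySQL.

```php
<?php
// Three-stage pipeline: fetch -> parse -> store, skipping empty results.
// runPipeline() is hypothetical; each stage is passed in as a callable.
function runPipeline(array $urls, callable $fetch, callable $parse, callable $store): int
{
    $stored = 0;
    foreach ($urls as $url) {
        $html = $fetch($url);          // stage 1: fetch (e.g. with cURL)
        if ($html === null || $html === '') {
            continue;                  // fetch failed or page had no body
        }
        $record = $parse($html);       // stage 2: parse (e.g. with DOMDocument)
        if ($record === null) {
            continue;                  // parser found nothing worth keeping
        }
        $store($record);               // stage 3: store (e.g. PDO INSERT into MySQL)
        $stored++;
    }
    return $stored;
}

// Demo with stub stages and made-up data (no network or database needed).
$pages = ['a' => '<p>Ossingen</p>', 'b' => ''];
$saved = [];
$count = runPipeline(
    ['a', 'b', 'c'],
    fn($url) => $pages[$url] ?? null,
    fn($html) => ['text' => strip_tags($html)],
    function (array $record) use (&$saved) { $saved[] = $record; }
);
```

Here `'b'` is skipped as empty and `'c'` as a failed fetch, so only one record is stored.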
>>
>>
>>
>> b. The parser part:
>> The problem is that some of the pages mentioned above are empty, so I
>> need a way to skip them, since I do not want to fill my MySQL database
>> with useless entries.
>> BTW: the parsing should be a job that DOMDocument can handle. What do
>> you think?
>> I need to combine the first part with the second. Can you give me
>> some starting points and hints?
>> The fetching should be done with cURL, and the result handed to
>> DOMDocument for parsing. That part is no problem, but how do I do the
>> DOMDocument part?
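Combining cURL fetching with DOMDocument parsing might look like the sketch below. The function names `fetchHtml()` and `loadDom()` are hypothetical, and treating any non-200 status as a failure is an assumption. `libxml_use_internal_errors(true)` keeps libxml from printing warnings for the sloppy HTML typical of real pages.

```php
<?php
// Stage 1: fetch a page with cURL. Returns null on transport or HTTP errors.
// fetchHtml() is a hypothetical helper; "only 200 counts" is an assumption.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 20,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return ($body === false || $status !== 200) ? null : $body;
}

// Stage 2: hand the HTML string to DOMDocument (hypothetical helper).
function loadDom(string $html): ?DOMDocument
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on messy HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $dom;
}

// Real usage needs network access and the curl extension:
// $html = fetchHtml('http://www.educa.ch/dyn/79376.asp?id=1568');
// $dom  = $html === null ? null : loadDom($html);

$dom = loadDom('<html><body><div>Tel:052 317 15 45</div></body></html>');
echo $dom->getElementsByTagName('div')->item(0)->textContent, PHP_EOL;
```

The demo at the end parses an inline string, so it runs offline; only `fetchHtml()` touches the network.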
>>
>> I have installed Firebug in Firefox, and now I have the XPaths for
>> these pages:
>>
>>
>> http://www.educa.ch/dyn/79376.asp?id=1187
>> http://www.educa.ch/dyn/79376.asp?id=2939
>> http://www.educa.ch/dyn/79376.asp?id=1515
>> http://www.educa.ch/dyn/79376.asp?id=1469
>>
>>
>> Here are the details:
>>
>> Altes Schulhaus Ossingen :: /html/body/div[2]
>> Guntibachstrasse 10 :: /html/body/div[4]
>> 8475 Ossingen :: /html/body/div[6]
>> sekretariat...@bluewin.ch :: /html/body/div[9]/a
>> Tel:052 317 15 45 :: /html/body/div[11]
>> Fax:052 317 04 42 :: /html/body/div[12]
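The Firebug XPaths above can be fed straight to `DOMXPath::query()`. A sketch, with two caveats: Firebug computes paths from the browser's DOM, which may differ from the tree libxml builds, and positional paths such as `div[9]` break as soon as the page layout changes. The field names and the sample HTML (including the e-mail address) are made up to mimic the structure described; missing fields come back as empty strings, which makes empty pages easy to detect.

```php
<?php
// Hypothetical field names mapped to the Firebug XPaths quoted above.
const FIELD_PATHS = [
    'name'   => '/html/body/div[2]',
    'street' => '/html/body/div[4]',
    'city'   => '/html/body/div[6]',
    'email'  => '/html/body/div[9]/a',
    'phone'  => '/html/body/div[11]',
    'fax'    => '/html/body/div[12]',
];

// extractRecord() is a hypothetical helper: one XPath query per field.
function extractRecord(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $record = [];
    foreach (FIELD_PATHS as $field => $path) {
        $nodes = $xpath->query($path);
        // Missing nodes become '' so empty pages can be detected later.
        $record[$field] = ($nodes !== false && $nodes->length > 0)
            ? trim($nodes->item(0)->textContent)
            : '';
    }
    return $record;
}

// Made-up sample with twelve top-level divs, mirroring the paths above.
$sample = '<html><body>'
    . '<div>Header</div><div>Altes Schulhaus Ossingen</div><div></div>'
    . '<div>Guntibachstrasse 10</div><div></div><div>8475 Ossingen</div>'
    . '<div></div><div></div>'
    . '<div><a href="mailto:info@example.org">info@example.org</a></div>'
    . '<div></div><div>Tel:052 317 15 45</div><div>Fax:052 317 04 42</div>'
    . '</body></html>';

$record = extractRecord($sample);
print_r($record);
```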
>>
>> Question: does the Simple HTML DOM Parser support XPath?
>>
>> --
>> This group is managed and maintained by the development staff at 360 PSG.
>> An enterprise application development company utilizing open-source
>> technologies for todays small-to-medium size businesses.
>>
>> For information or project assistance please visit :
>>
http://www.360psg.com
>>
>> You received this message because you are subscribed to the Google Groups
>> "Professional PHP Developers" group.
>> To post to this group, send email to
Professi...@googlegroups.com
>> To unsubscribe from this group, send email to
>>
Professional-P...@googlegroups.com
>> For more options, visit this group at
>>
http://groups.google.com/group/Professional-PHP