
DOM related question and problem


elca

Nov 18, 2009, 1:04:10 PM
to pytho...@python.org

Hello,
These days I'm writing a Python script that works with the DOM.

The problem is that many websites now have a very complicated structure.

What is the best way to inspect a page's DOM structure and find the path to an element?

For example, what is the best way to quickly work out an expression like the one below?

In the past I have spent a lot of time extracting this kind of information.

And yes, I'm also new to Python and the DOM.

IE.Document.Frames(1).Document.forms('comment').value = 'hello'

If I use DOM Inspector, can I extract this information quickly? If so, would you
show me an example?

Here is a site I want to extract some DOM information from.

Today I spent all day trying to work out the DOM path, but failed.

http://www.segye.com/Articles/News/Politics/Article.asp?aid=20091118001261&ctg1=06&ctg2=00&subctg1=06&subctg2=00&cid=0101010600000

At the end of this page there is a comment input box.

I want to know which DOM expression I should use to reach it, something like

IE.Document.Frames(1).Document.forms('comment').value = 'hello'

Any help is much appreciated, thanks.


--
View this message in context: http://old.nabble.com/DOM-related-question-and-problem-tp26412730p26412730.html
Sent from the Python - python-list mailing list archive at Nabble.com.

Chris Rebert

Nov 18, 2009, 1:10:44 PM
to elca, pytho...@python.org
On Wed, Nov 18, 2009 at 10:04 AM, elca <hig...@gmail.com> wrote:
> [...]

This sounds suspiciously like a spambot. Why do you want to submit
comments in an automated fashion exactly?

Cheers,
Chris
--
http://blog.rebertia.com

elca

Nov 18, 2009, 7:50:08 PM
to pytho...@python.org

Hello,
This is not actually a spambot.
It is related to my blog scraper.
If anyone can help or offer advice, it would be much appreciated.
--
View this message in context: http://old.nabble.com/DOM-related-question-and-problem-tp26412730p26418556.html

Stefan Behnel

Nov 20, 2009, 2:55:40 AM
elca, 18.11.2009 19:04:

> these day im making python script related with DOM.
>
> problem is these day many website structure is very complicate .
> [...]

> what is best method to check can extract such like following info quickly?

This should help:

http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/
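
For the "which frame, which form, which field" part of the question, here is a minimal sketch of one way to list the candidates in a page. It uses the standard library's html.parser rather than lxml, purely so it runs without installing anything. The class name, the helper function, and every name in the sample HTML (reply, comment, content, /comment.asp) are made up for illustration and are not taken from the segye.com page.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Tags worth reporting when hunting for a comment box.
INTERESTING = {"form", "iframe", "frame", "input", "textarea", "select"}


class FormFinder(HTMLParser):
    """Record the tag path and identifying attributes of form-related tags."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.found = []   # (path, attributes) for each interesting element

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        if tag in INTERESTING:
            wanted = {k: v for k, v in attrs
                      if k in ("id", "name", "src", "action")}
            self.found.append((path, wanted))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_form_elements(html):
    """Return a list of (path, attributes) for frames, forms and fields."""
    parser = FormFinder()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical page fragment, not the real site.
SAMPLE = """
<html><body>
  <iframe name="reply" src="/comment.asp"></iframe>
  <form name="comment" action="/post">
    <p>Comment:</p>
    <textarea name="content"></textarea>
    <input type="submit" name="send">
  </form>
</body></html>
"""

if __name__ == "__main__":
    for path, attrs in find_form_elements(SAMPLE):
        print(path, attrs)
```

Each frame or iframe is a separate document, so a sketch like this only reports the frame's src; the framed page has to be fetched and scanned on its own. That is the same reason an expression such as IE.Document.Frames(1).Document.forms('comment') has to go through the frame before it can reach the form.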

Stefan

elca

Nov 21, 2009, 6:49:36 AM
to pytho...@python.org

Hello,
Yes, I already know this website,
but I could not get the lxml solution to work.

--
View this message in context: http://old.nabble.com/DOM-related-question-and-problem-tp26412730p26455800.html

bla bla

Dec 1, 2009, 7:36:42 PM
Nice post on extracting data, simple and to the point :). I use
Python for simple HTML data extraction, but for larger projects like
the web, files, or documents I tried extract data
(http://www.extractingdata.com), which worked great. They
build quick custom screen scrapers, data extraction, and data parsing
programs.

Diez B. Roggisch

Dec 1, 2009, 7:55:30 PM
bla bla wrote:

You don't happen to be affiliated with that commercial venture?

Which seems to be shady, to say the least. No real address, DNS
registered by a rather shady provider... better steer clear of this,
and use lxml.

Diez
