Get the xpath expression of a node element

13 views
Skip to first unread message

Luis Miguel Morillas

unread,
May 2, 2012, 6:51:45 AM5/2/12
to ak...@googlegroups.com
Do the amara nodes have a method to get the xpath route from the root
of the xml document?


Saludos,

-- luismiguel  (@lmorillas)

Uche Ogbuji

unread,
May 8, 2012, 5:45:04 PM5/8/12
to ak...@googlegroups.com
Sorry, somehow I missed this. There is:

amara.lib.xpath.util.abspath

I think I had planned to update it in some way, but I think it works for the most part.

--Uche


--
You received this message because you are subscribed to the Google Groups "akara" group.
To post to this group, send email to ak...@googlegroups.com.
To unsubscribe from this group, send email to akara+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/akara?hl=en.




--
Uche Ogbuji                       http://uche.ogbuji.net
Weblog: http://copia.ogbuji.net
Poetry ed @TNB: http://www.thenervousbreakdown.com/author/uogbuji/
Founding Partner, Zepheira        http://zepheira.com
Linked-in: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/
Friendfeed: http://friendfeed.com/uche
Twitter: http://twitter.com/uogbuji
http://www.google.com/profiles/uche.ogbuji

Luis Miguel Morillas

unread,
May 8, 2012, 6:35:57 PM5/8/12
to ak...@googlegroups.com
2012/5/8 Uche Ogbuji <uc...@ogbuji.net>:
> Sorry, somehow I missed this. There is:
>
> amara.lib.xpath.util.abspath
>
> I think I had planned to update it in some way, but I think it works for the
> most part.
>

Yes, it's working for me. I wrote a discover funcion to help users to
extract xpath expressions that finds some text in web pages, so our
system can learn to find info in similar pages.


def discover_path(text, node):
''' Discover best xpaths to extract text.

I.e. it can find 'Hello, World' in these cases:
<h1>Hello, World</h1>
<p>Hello, <span>World</span></p>
'''
paths = []
if text in U(node):
if hasattr(node, 'xml_children'):
for child in node.xml_children:
path = discover_path(text, child)
if path:
paths.extend(path)
if not paths: # no more specific path in childs
paths.append(util.abspath(node))
return paths


Saludos,

-- luismiguel  (@lmorillas)
Reply all
Reply to author
Forward
0 new messages