I need the following functions:
Search and get all the records that share the same tree structures.
Search records containing a specific node or sub tree structres.
Does anyone know about something suitable? Or other powerful XML search
engine in Python?
Thank you so much for help.
PS: I do not care about the learning curve, as long as it is not a
cliff to climb.
> I am now using XML to record my lab records in quite a complex way, in
> which about 80 tags are used to identify different types of data. I
> think it is a good idea to search for a popular and mature XML search
> engine before I started to program it myself:
> I need the following functions:
> Search and get all the records that share the same tree structures.
> Search records containing a specific node or sub tree structres.
> Does anyone know about something suitable? Or other powerful XML search
> engine in Python?
> Thank you so much for help.
XPath is your friend. Available at least in the 4suite-xml-libraries. I'm
not aware of anything that resembles a more database-like approach, but at
least you can issue queries against the xml.
However you might reconsider your choice of a persistence model. Queries is
what SQL is for, and you very well might be better off using e.g. sqlite
and a XML-im/export if that is what you need.
There are good database approaches to XML too, especially if you are
going to query a document collection as a whole rather than file by
file. You can try XQuery. I think 4Suite can do this (But I am too
sleepy to confirm :-) ). You also use eXist (Java but you can use
XMLRPC or SOAP to interface with it from Python). Not optimal like
parent said, but if it is XML that have to live with ...
4Suite does not support XQuery. It does support full XPath plus EXSLT
and enough other extensions to come close to the power of XQuery.
Amara  makes it really easy to get XQuery-like power from right
within Python, as I've blogged before (e.g. ).
I don't know whether full-text indexing of XML is something the OP
needs as well. If so, see .