Hi,
I have an elaborate example for you, which will hopefully show you a good way
to do the work yourself.
First of all, we want to fetch the "next" branch of the NeTEx XSD: the latest
and greatest features, so your code will parse everything and beyond.
1. git clone --branch next https://github.com/NeTEx-CEN/NeTEx
For generating Python code we want to use xsData, which has some pretty decent
documentation. Tested with version 21.11; 21.6 does not work.
<https://xsdata.readthedocs.io/en/latest/>
2. pip install xsdata[cli,lxml,soap]
To prevent circular dependencies we are going to create a "single-package".
To keep the ability to parse fields in the order they appear, rather than
into separate lists, we use "compound fields".
3. xsdata NeTEx/xsd/NeTEx_publication.xsd -ss single-package -cf -p netex; rm __init__.py
4. Let's see if it works:
```python
#!/usr/bin/env python3
import sys

from lxml import etree
from netex import ScheduledStopPoint
from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.parsers.handlers import LxmlEventHandler

NETEX_NS = "{http://www.netex.org.uk/netex}"

root = etree.parse(sys.argv[1])
parser = XmlParser(handler=LxmlEventHandler)

# Select every ScheduledStopPoint first, then parse only those elements.
for scheduled_stop_point in root.findall(f".//{NETEX_NS}ScheduledStopPoint"):
    ssp = parser.parse(scheduled_stop_point, ScheduledStopPoint)
    print(ssp.name.value)
```
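To make the compound-fields choice from step 3 a bit more concrete, here is a plain-dataclass illustration. The classes below (`Line`, `Route`, `SeparateLists`, `Compound` and its `children` field) are hypothetical, simplified stand-ins and not the code xsData generates; they only show why one ordered list keeps the document order that separate per-type lists lose.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical, heavily simplified element types for illustration only.
@dataclass
class Line:
    name: str


@dataclass
class Route:
    name: str


# Separate lists: one list per element type, so the interleaving
# of <Line> and <Route> in the document is lost.
@dataclass
class SeparateLists:
    line: List[Line] = field(default_factory=list)
    route: List[Route] = field(default_factory=list)


# Compound field: a single list that holds every child in document order.
@dataclass
class Compound:
    children: List[Union[Line, Route]] = field(default_factory=list)


# Children as they would appear in the XML document.
document_order = [Line("L1"), Route("R1"), Line("L2")]

separate = SeparateLists(
    line=[c for c in document_order if isinstance(c, Line)],
    route=[c for c in document_order if isinstance(c, Route)],
)
compound = Compound(children=list(document_order))

print([c.name for c in separate.line + separate.route])  # ['L1', 'L2', 'R1']
print([c.name for c in compound.children])  # ['L1', 'R1', 'L2']
```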
You may have noticed that I first select the ScheduledStopPoints with an
XPath-like expression and only then parse them into objects. You could also
parse a huge NeTEx file into memory all at once:
```python
#!/usr/bin/env python3
import sys

from netex import PublicationDelivery
from xsdata.formats.dataclass.context import XmlContext
from xsdata.formats.dataclass.parsers import XmlParser

parser = XmlParser(context=XmlContext())
pd = parser.parse(sys.argv[1], PublicationDelivery)

# Navigating the object tree requires knowing how the profile nests its frames.
service_frame = pd.data_objects.composite_frame[0].frames.service_frame[0]
for ssp in service_frame.scheduled_stop_points.scheduled_stop_point:
    print(ssp.name.value)
```
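The select-then-parse pattern from the first example can be tried without the generated `netex` module. This sketch is an assumption-laden stand-in: it uses the standard library's `xml.etree.ElementTree` instead of lxml/xsData, a tiny hand-written fragment that is not schema-valid NeTEx, and a hypothetical `StopPoint` dataclass in place of the generated `ScheduledStopPoint` class.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

NS = "http://www.netex.org.uk/netex"

# Tiny hand-written fragment (not schema-valid NeTEx) for illustration.
SAMPLE = f"""<PublicationDelivery xmlns="{NS}">
  <dataObjects>
    <CompositeFrame id="CF1">
      <frames>
        <ServiceFrame id="SF1">
          <scheduledStopPoints>
            <ScheduledStopPoint id="SSP1"><Name>Stop A</Name></ScheduledStopPoint>
            <ScheduledStopPoint id="SSP2"><Name>Stop B</Name></ScheduledStopPoint>
          </scheduledStopPoints>
        </ServiceFrame>
      </frames>
    </CompositeFrame>
  </dataObjects>
</PublicationDelivery>"""


@dataclass
class StopPoint:
    """Hypothetical stand-in for the generated ScheduledStopPoint class."""
    id: Optional[str]
    name: Optional[str]


def to_stop_point(el: ET.Element) -> StopPoint:
    # Convert one selected element; the rest of the document is never mapped.
    return StopPoint(id=el.get("id"), name=el.findtext(f"{{{NS}}}Name"))


root = ET.fromstring(SAMPLE)
# Step 1: select the elements of interest, wherever they are nested.
selected = root.iterfind(f".//{{{NS}}}ScheduledStopPoint")
# Step 2: turn only those elements into objects.
stops = [to_stop_point(el) for el in selected]
for stop in stops:
    print(stop.id, stop.name)
```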
As you can see from the above, this requires you to understand the structure
of the provided NeTEx profile, whereas the XPath-like expression matches
elements anywhere in the document, including everything beneath them. Now
what if you would like to check *just* which frames exist in the file? What
you actually want is: give me the CompositeFrame and only the first few levels
under it (in this case 3: the CompositeFrame itself, its children such as
frames, and their children, the individual frames).
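The difference between the two lookup styles, and a quick way to list frame names, can be sketched with the standard library. This assumes `xml.etree.ElementTree` rather than lxml and a hand-written fragment that is not schema-valid NeTEx; it only reports tag names and does not produce xsData objects, which is what a depth-limited copy is for.

```python
import xml.etree.ElementTree as ET

NS = {"n": "http://www.netex.org.uk/netex"}  # prefix map for ElementPath queries

# Hand-written fragment (not schema-valid NeTEx): one CompositeFrame with two
# frames, plus a standalone ServiceFrame directly under dataObjects.
SAMPLE = """<PublicationDelivery xmlns="http://www.netex.org.uk/netex">
  <dataObjects>
    <CompositeFrame id="CF1">
      <frames>
        <ServiceFrame id="SF1">
          <scheduledStopPoints>
            <ScheduledStopPoint id="SSP1"/>
            <ScheduledStopPoint id="SSP2"/>
          </scheduledStopPoints>
        </ServiceFrame>
        <FareFrame id="FF1"/>
      </frames>
    </CompositeFrame>
    <ServiceFrame id="SF2">
      <scheduledStopPoints>
        <ScheduledStopPoint id="SSP3"/>
      </scheduledStopPoints>
    </ServiceFrame>
  </dataObjects>
</PublicationDelivery>"""

root = ET.fromstring(SAMPLE)

# './/' matches at any depth, whatever the profile looks like.
anywhere = root.findall(".//n:ScheduledStopPoint", NS)

# An explicit path only matches this one nesting.
explicit = root.findall(
    "n:dataObjects/n:CompositeFrame/n:frames/n:ServiceFrame"
    "/n:scheduledStopPoints/n:ScheduledStopPoint",
    NS,
)

# Which frames does the CompositeFrame hold? Only the children of <frames> matter.
frames = root.find(".//n:CompositeFrame/n:frames", NS)
frame_names = [child.tag.split("}", 1)[1] for child in frames]

print(len(anywhere), len(explicit), frame_names)  # 3 2 ['ServiceFrame', 'FareFrame']
```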
I developed a breadth-first, depth-limited alternative to deepcopy after
seeing someone with the same issue on StackOverflow:
```python
#!/usr/bin/env python3
import sys
from collections import deque

from lxml import etree
from lxml.etree import Element


def copyme(el):
    """Shallow copy: tag, attributes, text and tail, but no children."""
    n = Element(el.tag, el.attrib)
    n.text = el.text
    n.tail = el.tail
    return n


def depthcopy(root, max_depth=1):
    """Breadth-first copy of root that keeps only max_depth levels."""
    queue = deque([(root, None, 0)])
    keep = None
    while queue:
        el, parent, depth = queue.popleft()
        if depth == max_depth:
            # Breadth first: everything still queued is at least this deep.
            break
        new_parent = copyme(el)
        if parent is None:
            keep = new_parent
        else:
            parent.append(new_parent)
        # Skip comments and processing instructions; their tag is not a string.
        queue.extend((x, new_parent, depth + 1) for x in el if isinstance(x.tag, str))
    return keep


if __name__ == "__main__":
    from netex import CompositeFrame
    from xsdata.formats.dataclass.parsers import XmlParser
    from xsdata.formats.dataclass.parsers.handlers import LxmlEventHandler

    parser = XmlParser(handler=LxmlEventHandler)
    root = etree.parse(sys.argv[1])
    needle = root.find(".//{http://www.netex.org.uk/netex}CompositeFrame")
    if needle is None:
        sys.exit("No CompositeFrame found")
    # Keep CompositeFrame, its children (e.g. frames) and their children (the frames).
    x = depthcopy(needle, 3)
    y = parser.parse(x, CompositeFrame)
    print(len(y.frames.fare_frame))
```
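If you prefer a depth-first formulation, the same depth limit can be written recursively. This is a sketch, not a replacement for the code above: it uses the standard library's `xml.etree.ElementTree` instead of lxml so it runs without the generated `netex` module, and `depthcopy_recursive` and `tree_depth` are hypothetical helper names. Because the copy recursion is bounded by `max_depth`, Python's recursion limit is not a concern for the small depths used here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def depthcopy_recursive(el: ET.Element, max_depth: int = 1) -> Optional[ET.Element]:
    """Depth-first variant: copy `el` and at most max_depth - 1 levels below it."""
    if max_depth < 1:
        return None
    new = ET.Element(el.tag, dict(el.attrib))
    new.text, new.tail = el.text, el.tail
    if max_depth > 1:
        for child in el:
            new.append(depthcopy_recursive(child, max_depth - 1))
    return new


def tree_depth(el: ET.Element) -> int:
    """Number of levels in the tree rooted at `el` (a leaf counts as 1)."""
    return 1 + max((tree_depth(child) for child in el), default=0)


NS = "{http://www.netex.org.uk/netex}"
# Hand-written fragment (not schema-valid NeTEx) for illustration.
SAMPLE = """<CompositeFrame xmlns="http://www.netex.org.uk/netex" id="CF1">
  <frames>
    <ServiceFrame id="SF1">
      <scheduledStopPoints>
        <ScheduledStopPoint id="SSP1"><Name>Stop A</Name></ScheduledStopPoint>
      </scheduledStopPoints>
    </ServiceFrame>
    <FareFrame id="FF1"/>
  </frames>
</CompositeFrame>"""

composite = ET.fromstring(SAMPLE)
trimmed = depthcopy_recursive(composite, 3)
print(tree_depth(composite), tree_depth(trimmed))  # 6 3
print([child.tag[len(NS):] for child in trimmed.find(f"{NS}frames")])  # ['ServiceFrame', 'FareFrame']
```

As with the breadth-first version, the frames themselves survive but everything below them is cut off, so parsing the result only builds the top of the object tree.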
I hope the above examples will allow you to successfully create a parser and
use it to your advantage. From my own experience with processing NeTEx: try
to stay within the NeTEx vocabulary. For example, when you are creating a
distance matrix, build its NeTEx equivalent. This gives you a derivedObject
that can be validated, with an audit trail.
If you have any questions or want to collaborate on software for parsing and
presenting NeTEx, there are several opportunities here.
--
Stefan