Can I obtain byte offsets from the streaming XML reader?
Adam Wojnakowski
Jul 4, 2022, 7:24:48 AM
to Woodstox User Mailing List
I am looking for a way to obtain the offsets of start and end element tags, so that I can later use them to efficiently read a portion of the XML data from the same file. I know that you can obtain character offsets, but I can't seem to find an easy way to get byte offsets into the underlying InputStream.
The getStartingByteOffset() and getEndingByteOffset() methods of LocationInfo both return -1, even when I construct the reader from an InputStream.
Is there any support for tracking byte offsets of the current tag in the streaming XML reader?
Thanks in advance,
Adam
Tatu Saloranta
Jul 5, 2022, 2:06:14 PM
to Adam Wojnakowski, Woodstox User Mailing List
No, unfortunately this is not possible with Woodstox: all input will
be read using a Reader, so conversion from bytes to chars occurs
before any decoding/parsing. So only character offsets are available.
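
To illustrate why a character offset cannot simply stand in for a byte offset, here is a minimal JDK-only sketch. It does not use Woodstox at all; the class name and the utf8ByteOffset helper are made up for this example, and it assumes the document is UTF-8 and small enough to hold in memory as a String.

```java
import java.nio.charset.StandardCharsets;

class CharVsByteOffset {

    // Hypothetical helper: maps a character offset in already-decoded text
    // to the byte offset of the same position in the UTF-8 encoded form.
    // Assumes charOffset does not fall between the halves of a surrogate pair.
    static int utf8ByteOffset(String decoded, int charOffset) {
        return decoded.substring(0, charOffset)
                      .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // \u00e9 (e with acute accent) is one char but two bytes in UTF-8.
        String xml = "<r><a>\u00e9</a><b/></r>";
        int charOffset = xml.indexOf("<b/>");
        int byteOffset = utf8ByteOffset(xml, charOffset);
        System.out.println("char offset = " + charOffset); // 11
        System.out.println("byte offset = " + byteOffset); // 12
    }
}
```

The two offsets diverge as soon as a non-ASCII character precedes the tag, so recovering byte offsets after the fact means re-encoding everything before the position of interest, which is likely impractical for large files.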
Aalto parser (https://github.com/FasterXML/aalto-xml/) would provide
byte offsets, if it worked for your use case -- the main limitation
being that it does not support DTDs.
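
For the second half of the original use case, reading a slice of the file once start and end byte offsets are known from whatever source, a rough JDK-only sketch might look like the following. The ByteRangeReader class and readRange method are hypothetical names for this example, the end offset is treated as exclusive, and nothing here depends on Woodstox or Aalto.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ByteRangeReader {

    // Hypothetical helper: returns the bytes in [start, end) of the file
    // without reading anything that precedes 'start'.
    static byte[] readRange(Path file, long start, long end) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(Math.toIntExact(end - start));
            long pos = start;
            // Positional reads may return fewer bytes than requested, so loop.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                pos += n;
            }
            return buf.array();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("range-demo", ".xml");
        try {
            Files.write(tmp, "<r><a>\u00e9</a><b/></r>".getBytes(StandardCharsets.UTF_8));
            // Bytes 12..15 hold "<b/>" because the accented char takes two bytes.
            byte[] slice = readRange(tmp, 12, 16);
            System.out.println(new String(slice, StandardCharsets.UTF_8));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The slice is only valid XML if the offsets fall on element boundaries, and it has to be decoded with the same encoding as the original document.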