Archives of XML messages with embedded CR/LF


Peter Hicks

Sep 18, 2025, 2:28:28 PM
to A gathering place for the Open Rail Data community
All,

The passenger train consist messages contain, in some circumstances, embedded carriage returns and line feeds.  In XML, this is not a problem - but it makes archiving them a little awkward, as a message can be split over two lines.

There are ways around this.  One of them involves archiving individual messages to separate files and naming them such that you preserve the order - then you can read them into a system in order, knowing that one file is one message.  Another way is to replace the CR/LFs with character references (for example &#13; and &#10;), but that means that the archive isn't truly representative of the original message.  A final way is to base64-encode the entire XML message, increasing the size of the file by about a third but keeping each message on a single line.
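As a rough sketch of the base64 option: each message becomes one line of text, and decoding gives back the original bytes exactly, embedded CR/LF included. Base64 emits 4 output bytes for every 3 input bytes, so the overhead is roughly a third. The function names and the sample message below are made up for illustration and are not taken from the real feed.

```python
import base64


def encode_line(message: bytes) -> bytes:
    """Return the message as a single base64 line terminated by LF."""
    # b64encode never inserts line breaks, so the result is one line.
    return base64.b64encode(message) + b"\n"


def decode_lines(archive: bytes) -> list[bytes]:
    """Recover the original messages, byte for byte, from the archive."""
    return [base64.b64decode(line) for line in archive.splitlines() if line]


# Hypothetical consist-style message with an embedded CR/LF.
sample = b"<Msg><Note>line one\r\nline two</Note></Msg>"
archive = encode_line(sample) + encode_line(b"<Msg/>")
messages = decode_lines(archive)
```

The trade-off is that the archive can no longer be searched with plain text tools without decoding it first.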

What's your preferred method of archiving?


Peter

Gaelan Steele

Sep 19, 2025, 6:41:20 AM
to openrail...@googlegroups.com
A minimal-changes approach would be to use a NULL (\0) byte instead of a newline to separate messages. I might also consider dumping them in an SQLite database.
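A minimal sketch of the NUL-separated idea, assuming each message is held as bytes and is terminated (rather than preceded) by a single 0x00 byte. The function names are illustrative only.

```python
import io

NUL = b"\x00"


def append_message(stream, message: bytes) -> None:
    """Write one message followed by a NUL terminator."""
    if NUL in message:
        # Should not happen for well-formed XML, but check anyway.
        raise ValueError("message contains a NUL byte")
    stream.write(message + NUL)


def read_messages(stream, chunk_size: int = 65536):
    """Yield each NUL-terminated message from a binary stream."""
    buffer = b""
    while chunk := stream.read(chunk_size):
        buffer += chunk
        # Everything before the last NUL is complete; keep the remainder.
        *complete, buffer = buffer.split(NUL)
        yield from complete
    if buffer:
        raise ValueError("archive ends with an unterminated message")


# Round trip through an in-memory file.
archive = io.BytesIO()
append_message(archive, b"<a>line one\r\nline two</a>")
append_message(archive, b"<b/>")
archive.seek(0)
recovered = list(read_messages(archive, chunk_size=8))
```

An archive in this form can also be split on the command line with tools that accept NUL as a record separator.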

Best wishes,
Gaelan

On Sep 18, 2025, at 8:28 PM, 'Peter Hicks' via A gathering place for the Open Rail Data community <openrail...@googlegroups.com> wrote:


--
You received this message because you are subscribed to the Google Groups "A gathering place for the Open Rail Data community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openraildata-t...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/openraildata-talk/LsI4bR-RHcPYE0jQiayUTWE3s5dZurvLKj645FWNUMKt7kIj7Pl3sfWC8ArIOK7t_nkK7xtCDaoT00zEiz47JJqpScje6VgRBg4VOWFT64U%3D%40poggs.co.uk.

Peter Hicks

Sep 19, 2025, 6:53:49 AM
to openrail...@googlegroups.com
On Friday, 19 September 2025 at 11:41, Gaelan Steele <g...@canishe.com> wrote:

A minimal-changes approach would be to use a NULL (\0) byte instead of a newline to separate messages. I might also consider dumping them in an SQLite database.

Interesting approaches - and for anyone who doesn't know, NUL isn't allowed in XML so that's fine.

I'd considered using the ASCII start-of-text and end-of-text control characters (STX, 0x02, and ETX, 0x03, which I'll loosely call SOT/EOT) to top and tail each message, as that's what they were put in ASCII to do.  In fact, the socket-based Push Port used SOT/EOT to split up messages on a TCP stream.
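A minimal sketch of topping and tailing each message, assuming "SOT/EOT" maps to the ASCII control characters STX (0x02) and ETX (0x03). Neither is a legal character in XML 1.0, so they cannot collide with message content. The exact framing bytes the socket-based Push Port used are not confirmed here, and the function names are illustrative.

```python
STX = b"\x02"  # ASCII start of text (assumed opening marker)
ETX = b"\x03"  # ASCII end of text (assumed closing marker)


def frame(message: bytes) -> bytes:
    """Wrap one message in STX ... ETX."""
    if STX in message or ETX in message:
        raise ValueError("message contains a framing byte")
    return STX + message + ETX


def unframe(data: bytes) -> list[bytes]:
    """Extract every complete STX ... ETX message from a byte string."""
    messages = []
    pos = 0
    while True:
        start = data.find(STX, pos)
        if start == -1:
            break
        end = data.find(ETX, start + 1)
        if end == -1:
            break  # final message is incomplete, so stop here
        messages.append(data[start + 1:end])
        pos = end + 1
    return messages


# Bytes between frames (here a newline) are ignored by the reader.
stream = frame(b"<a>line one\r\nline two</a>") + b"\n" + frame(b"<b/>")
recovered = unframe(stream)
```

Compared with a single NUL terminator, the paired markers make it possible to spot a truncated message, because an opening STX without a matching ETX is never returned.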

SQLite would make it difficult to consume the messages unless you had a suitable client, but that's not really an issue, is it?
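For what it's worth, a client can be quite small. The sketch below uses Python's standard sqlite3 module. The table layout (an auto-incrementing id to preserve arrival order, a received_ms timestamp and a BLOB body) is an assumption made for illustration, not an agreed schema.

```python
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the archive database and its single table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " received_ms INTEGER NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return conn


def store(conn: sqlite3.Connection, received_ms: int, body: bytes) -> None:
    """Append one message; the BLOB column keeps CR/LF bytes untouched."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (received_ms, body) VALUES (?, ?)",
            (received_ms, body),
        )


def replay(conn: sqlite3.Connection):
    """Yield (received_ms, body) pairs in the order they were stored."""
    query = "SELECT received_ms, body FROM messages ORDER BY id"
    for received_ms, body in conn.execute(query):
        yield received_ms, bytes(body)


conn = open_archive()
store(conn, 1, b"<a>line one\r\nline two</a>")
store(conn, 2, b"<b/>")
rows = list(replay(conn))
```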

My preference would be to go for the SOT/EOT or NUL options but I'll also have an ask around elsewhere.


Peter

Ben Woodward

Sep 19, 2025, 7:00:22 AM
to openraildata-talk
Why not just archive the XML as is?


Peter Hicks

Sep 19, 2025, 7:43:02 AM
to openrail...@googlegroups.com
On Friday, 19 September 2025 at 12:00, Ben Woodward <bwood...@gmail.com> wrote:

Why not just archive the XML as is?

Referring back to the original post:  "The passenger train consist messages contain, in some circumstances, embedded carriage returns and line feeds.  In XML, this is not a problem - but it makes archiving them a little awkward, as a message can be split over two lines".

One message per file preserves the message body, but it's hugely inefficient.


Peter

Tom Cairns

Sep 19, 2025, 7:53:28 AM
to openrail...@googlegroups.com

(As I'm sure Peter is aware, but others may not be.)

The way NR handle this in one system is to prefix each new message with a character that cannot appear in the message itself, followed by the time of arrival in ms and then that character again. It may not be a bad idea if there's a desire to keep each message completely intact.
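A rough sketch of that record layout: delimiter, arrival time in ms, delimiter, then the message. The delimiter NR actually use is not stated in this thread. The ASCII record separator (0x1E) below is purely a stand-in, since it is not a legal XML 1.0 character, and the function names are illustrative.

```python
DELIM = b"\x1e"  # hypothetical delimiter, not confirmed as NR's choice


def write_record(arrival_ms: int, message: bytes) -> bytes:
    """Return DELIM + arrival time in ms + DELIM + message."""
    if DELIM in message:
        raise ValueError("message contains the delimiter byte")
    return DELIM + str(arrival_ms).encode("ascii") + DELIM + message


def read_records(data: bytes) -> list[tuple[int, bytes]]:
    """Split an archive into (arrival_ms, message) pairs."""
    parts = data.split(DELIM)
    # parts[0] is whatever precedes the first delimiter (normally empty).
    fields = parts[1:]
    if len(fields) % 2:
        raise ValueError("archive ends part-way through a record header")
    return [
        (int(fields[i].decode("ascii")), fields[i + 1])
        for i in range(0, len(fields), 2)
    ]


data = (
    write_record(1758282808000, b"<a>line one\r\nline two</a>")
    + write_record(1758282808123, b"<b/>")
)
records = read_records(data)
```

Because the timestamp sits between two delimiters, a reader can tell a record header apart from message content without parsing any XML.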

 

Tom

 
