Any ideas where hex values may be coming in from?


Wayne Brissette

Jun 9, 2020, 6:18:15 AM
to nokogiri-talk
I'm seeing something that I can't quite put my finger on. I have some
files that are being written out in XML as I would expect when I view
them in oXygen. Then I have a few files that, when written out, show
this instead of carriage returns: &#xD;

I recognize that this is just a hex representation, but I also think it
may be causing me some processing issues. That being said, when I look
at one particular file in the IntelliJ debugger, I don't see this
until builder is done with it.

Any ideas as to what may cause this in some files, but not others? All
of these files are produced exactly the same way, so it's not like they
are coming in from different sources.

-Wayne

Mike Dalessio

Jun 9, 2020, 8:52:44 AM
to nokogiri-talk
Hey Wayne,

Without seeing your code and the source documents, I can only guess. But here's my guess.

`&#xD;` is the hex character reference for "carriage return" (hex 0D), which is part of the common line terminator in the Windows world. It can show up when "canonicalizing" documents with CRLF line terminators (i.e., Nokogiri::HTML::Document#canonicalize). Is that how you're serializing?

#! /usr/bin/env ruby

require "nokogiri"

html = <<-EOXML
<root>
  <element foo="asdf\r\n">this has a CR and a LF at the end of it\r\n</element>
</root>
EOXML

puts Nokogiri::HTML(html).canonicalize

outputs

<html><body><root>
  <element foo="asdf&#xD;&#xA;">this has a CR and a LF at the end of it&#xD;
</element>
</root>
</body></html>
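
If carriage returns in some of the inputs do turn out to be the cause, one way to check is to scan the source text for CR characters and normalize line endings before parsing. This is only a minimal sketch in plain Ruby, and it assumes that converting CRLF to LF is acceptable for your data; `contains_cr?` and `normalize_newlines` are hypothetical helper names, not part of Nokogiri:

```ruby
# Hypothetical helpers (not part of Nokogiri): detect carriage returns in
# input text and normalize line endings before handing it to a parser.

# True if the string contains any carriage return (hex 0D).
def contains_cr?(text)
  text.include?("\r")
end

# Convert CRLF pairs and lone CRs to a single LF.
def normalize_newlines(text)
  text.gsub(/\r\n?/, "\n")
end

sample = "<element foo=\"asdf\r\n\">text\r\n</element>\n"
puts contains_cr?(sample)                      # true
puts contains_cr?(normalize_newlines(sample))  # false
```

Running `contains_cr?` over each source file (read with `File.binread` so the bytes are left untouched) may show which files carry CRLF line endings and which do not, which could explain why only some of them end up with `&#xD;`.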


