cause this will make the parser choke on the invalid ones. At the same time, I cannot do
doc = Nokogiri::HTML::Document.parse(io, url, nil)
(which allows Nokogiri to consider the meta tags), as this causes some UTF-8 files to be parsed as... something else, producing mangled strings.
This means that to get a behaviour like "consider meta tags, or fall back to UTF-8" I need to either reimplement EncodingReader or reach into private/undocumented/internal Nokogiri code.
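To make the desired behaviour concrete, here is a minimal sketch of the "use the detected encoding if it holds up, otherwise fall back" logic using only stdlib string encoding checks. The helper name and signature are illustrative, not part of Nokogiri; in practice the detection step would be whatever EncodingReader found in the meta tags.

```ruby
# Illustrative helper (not a Nokogiri API): pick the detected encoding
# only if the raw bytes are actually valid in it; otherwise fall back.
def resolve_encoding(bytes, detected, fallback = 'UTF-8')
  if detected && bytes.dup.force_encoding(detected).valid_encoding?
    detected
  else
    fallback
  end
end

resolve_encoding("caf\xC3\xA9".b, 'UTF-8')    # detected encoding is valid, keep it
resolve_encoding("caf\xC3\xA9".b, 'US-ASCII') # bytes invalid in US-ASCII, fall back
resolve_encoding("caf\xC3\xA9".b, nil)        # nothing detected, fall back
```

This is roughly the policy I would like to be able to express when calling `Nokogiri::HTML::Document.parse`, without having to pre-read the stream myself.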
Am I overlooking something?
If not, I have a tiny patch that allows passing an options hash as the value of the encoding argument, which lets me specify a fallback encoding while still getting autodetection. This API is rather bad, but at least it lets old code keep working without being aware of it.
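For clarity, the call shape under the patch would look something like this (the `:fallback` key is my invention and only exists with the patch applied; this is a sketch, not an existing Nokogiri API):

```ruby
# Hypothetical, patch-only API: a hash in the encoding slot means
# "autodetect from meta tags, fall back to the given encoding".
doc = Nokogiri::HTML::Document.parse(io, url, { fallback: 'UTF-8' })

# Old code passing a plain string or nil is unaffected:
doc = Nokogiri::HTML::Document.parse(io, url, 'UTF-8')
```

Overloading the encoding argument this way is what makes the API feel bad, which is why the EncodingReader-based options below might be the cleaner route.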
Possibly a better thing would be to document EncodingReader and at the same time allow instances of it to be passed as options to HTML::Document methods?
(or make it an ivar and thus overridable, but that seems to involve modifying global state)
I also believe the same issue occurs with non-file readable objects, as there is a line: