On Jun 6, 2:17 pm, Mike Dalessio <mike.dales...@gmail.com> wrote:
> ...
> I, personally, would prefer to leave the default as it is, which would
> require people to use the `nonet` parse option when parsing untrusted
> documents. This is largely driven by my assumption (perhaps flawed) that
> the 99% use case is for people dealing with trusted documents. I understand
> "secure by default", though in this case I believe that there is a tradeoff
> with respect to usability that should be considered -- mainly that
> validation and entity replacement often require external connections.

For a piece of software used by developers of varying ability and
experience, many of whom may not read or follow any, let alone all, of
the official documentation[1], "secure by default" should trump other
considerations. The Rails team's reluctance to accept this principle
appears to have been one of the factors that let Egor Homakov's
mass-assignment hack of GitHub succeed; please don't lead Nokogiri down
the same unsafe path.

Counting the replies in this thread so far:
* Expressed concern about the default: DR, JR, WLD, MS, AP.
* Puzzling: YH[2].
The majority opinion seems very clear :)

> OK, here's my proposal:
> 1) make the default XML parsing options include NONET (the default HTML
> parsing option already sets NONET)

Ah, that sounds like a good start. I think Michal Suchanek's
suggestion is the best solution, but setting NONET by default is
certainly better than leaving network access enabled by default.
Regards,
Sam

[1] Plenty of tutorials show how to use Nokogiri for screen scraping
and other tasks. If the default remains as it is, a user who is unaware
of the XXE (XML External Entity) vulnerability and follows one of those
tutorials will likely not discover the danger until it is too late.

[2] Yoko Harada says, "Attack 2 cannot be prevented by just stopping
well." However, that very document states, "When an XML parser such as
a SAX Parser reads the XML in, if running for example on a *NIX system
will result in the loading of the contents of the /etc/passwd file
into the contents of the resulting parsed document. If the same is
returned to the person invoking the attack, well you can imagine their
glee at accessing this sensitive data." Note the last sentence, which
seems to contradict YH's conclusion. The document goes on to say,
"Clearly the way to restrict this from happening is either to scan
requests at the network level or follow a direction to strictly
enforce which entities can be resolved."