XXE (Xml eXternal Entity) Attack

Mike Dalessio

Jun 6, 2012, 9:17:09 AM

to nokogiri-talk

Hello all,

We had a request (at https://github.com/tenderlove/nokogiri/issues/693) that Nokogiri's default behavior during parsing should be to avoid making network connections (currently done by the lower-level XML libraries (libxml/xerces) for, e.g., entity replacement and validation).

Some background on why this behavior is insecure when an application is using Nokogiri to parse untrusted documents is here:

http://www.securiteam.com/securitynews/6D0100A5PU.html
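
For readers who have not seen the attack, here is a minimal sketch (not taken from the linked article) of what an untrusted document with external entity declarations can look like, plus a rough regex scan that lists them. The helper name `external_entity_declarations`, the sample payload, and the `attacker.example` host are all made up for illustration. A regex is not a real XML parser and should not be relied on as a defense.

```ruby
# Hypothetical payload: two external entities (one local file, one network
# URL) and one harmless internal entity.
PAYLOAD = <<~XML
  <?xml version="1.0"?>
  <!DOCTYPE feed [
    <!ENTITY local SYSTEM "file:///etc/passwd">
    <!ENTITY remote SYSTEM "http://attacker.example/probe">
    <!ENTITY harmless "just text">
  ]>
  <feed>&local;&remote;&harmless;</feed>
XML

# Rough pattern for <!ENTITY name SYSTEM "uri"> and
# <!ENTITY name PUBLIC "id" "uri">. Illustration only, not a parser.
EXTERNAL_ENTITY =
  /<!ENTITY\s+(?:%\s+)?(\S+)\s+(?:SYSTEM|PUBLIC\s+(?:"[^"]*"|'[^']*'))\s+(?:"([^"]*)"|'([^']*)')/

# Returns one hash per external entity declaration found in the text.
def external_entity_declarations(xml)
  xml.scan(EXTERNAL_ENTITY).map do |name, double_quoted, single_quoted|
    { name: name, system_id: double_quoted || single_quoted }
  end
end

found = external_entity_declarations(PAYLOAD)
found.each { |decl| puts "#{decl[:name]} -> #{decl[:system_id]}" }
```

If a parser resolves these entities, the first one reads a local file and the second one makes an outbound request, which is the behavior under discussion in this thread.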

I, personally, would prefer to leave the default as it is; which would require people to use the `nonet` parse option when parsing untrusted documents. This is largely driven by my assumption (perhaps flawed) that the 99% use case is for people dealing with trusted documents. I understand "secure by default", though in this case I believe that there is a tradeoff with respect to usability that should be considered -- mainly that validation and entity replacement often require external connections. My current thinking is that we can make this option easier to use (perhaps introducing `Document#parse_untrusted`) and certainly documenting this behavior more clearly for those who are unaware that it happens.

I'd very much like audience participation on this one, so please share your thoughts both on whether this behavior should change in a point release (1.5.x); and what you think a better (read "more semantic") `parse` API might look like (because we are planning API overhauls in 2.0).

Thank you for your time,

-mike

---

mike dalessio / @flavorjones

Dmitry Ratnikov

Jun 6, 2012, 10:57:34 AM

to nokogi...@googlegroups.com

Hey Mike,

Thanks for kicking off this discussion. :)

I guess I'm on the other side of the fence: because this functionality
is rather obscure, it's unlikely that developers will make an informed
decision about whether to turn it off until they either realize they
need it or have been hit with an XXE attack. The latter case
kind of sucks.

Also, turning it off by default is probably low impact:
1) If you don't care whether external documents are parsed (or
aren't even aware of the possibility), you probably want it turned
off.
2) If you want such functionality turned on, you're likely to
find that setting pretty quickly.

As far as trusted documents go, if nokogiri is used to parse javascript
XML requests, that's a vulnerability. Additionally, I could think of a
MITM vector for fetching feeds or whatever.

-- Dmitry

PS Just for record, I'm slightly biased here, since both felixgr and I
work at the same company. Just to make my allegiances clear. ;)

Jonathan Rochkind

Jun 6, 2012, 10:59:24 AM

to nokogi...@googlegroups.com, Mike Dalessio

My primary use case for nokogiri is parsing XML returned by API calls
from third party services -- that's pretty much "untrusted", at least to
some extent. I don't _expect_ my third-party api provider to be
attacking me, but it's also something I have no control over, and can
include bugs or have been hacked itself in some way.

I think I'm probably not alone; I don't fully understand the nature of
the issue under discussion, but I'd strongly suggest that nokogiri
should be secure by default when parsing untrusted XML, and not require
special measures to be taken to be secure -- I think parsing 'untrusted'
third party XML is probably a quite common use case.

Walter Lee Davis

Jun 6, 2012, 11:03:54 AM

to nokogi...@googlegroups.com

Mike, is the issue that when you parse HTML or another format that declares an external DTD, you are actually requesting that file from the W3C or wherever? I mostly use Noko to parse XML and HTML files, and most of them do have these declarations.

Walter

Michal Suchanek

Jun 6, 2012, 1:36:20 PM

to nokogi...@googlegroups.com

Hello,

I would suggest a third solution:

The default would be to generate a parse error when the document does
include an external entity reference.

You could specify nonet or donet to replace the external entities with
an empty string or fetch them, respectively.
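
The three behaviors proposed here can be sketched as a small policy switch. This is only an illustration of the idea: `resolve_external_entity`, `ExternalEntityError`, and the `:error` / `:nonet` / `:donet` symbols are hypothetical names, not part of Nokogiri or libxml2, and the fetcher is injected so the sketch never touches the network.

```ruby
# Raised under the proposed default policy (hypothetical class).
class ExternalEntityError < StandardError; end

# policy:
#   :error  - proposed default: refuse the document
#   :nonet  - substitute an empty string
#   :donet  - fetch the entity through the supplied callable
def resolve_external_entity(system_id, policy: :error, fetcher: nil)
  case policy
  when :error
    raise ExternalEntityError, "external entity not allowed: #{system_id}"
  when :nonet
    ""
  when :donet
    raise ArgumentError, "a fetcher is required for :donet" unless fetcher
    fetcher.call(system_id)
  else
    raise ArgumentError, "unknown policy: #{policy.inspect}"
  end
end

# Stand-in fetcher so the example stays offline.
fake_fetcher = ->(system_id) { "<contents of #{system_id}>" }

puts resolve_external_entity("http://example.com/e.dtd", policy: :nonet).inspect
puts resolve_external_entity("http://example.com/e.dtd", policy: :donet, fetcher: fake_fetcher)
```

Under this design a caller who never thought about external entities gets a loud failure instead of either a silent fetch or silently missing content.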

Thanks

Michal

Aaron Patterson

Jun 6, 2012, 6:09:01 PM

to nokogi...@googlegroups.com

On Wed, Jun 6, 2012 at 8:17 AM, Mike Dalessio <mike.d...@gmail.com> wrote:
> Hello all,
>
> We had a request (at https://github.com/tenderlove/nokogiri/issues/693) that
> Nokogiri's default behavior during parsing should be to avoid making network
> connections (currently done by the lower-level XML libraries (libxml/xerces)
> for, e.g., entity replacement and validation).
>
> Some background on why this behavior is insecure when an application is
> using Nokogiri to parse untrusted documents is here:
>
> http://www.securiteam.com/securitynews/6D0100A5PU.html
>
> I, personally, would prefer to leave the default as it is; which would
> require people to use the `nonet` parse option when parsing untrusted
> documents.

Do we know what the impact will be if we turn on `nonet` by default?
What does libxml2 do by default?

> This is largely driven by my assumption (perhaps flawed) that the
> 99% use case is for people dealing with trusted documents. I understand
> "secure by default", though in this case I believe that there is a tradeoff
> with respect to usability that should be considered -- mainly that
> validation and entity replacement often require external connections. My
> current thinking is that we can make this option easier to use (perhaps
> introducing `Document#parse_untrusted`) and certainly documenting this
> behavior more clearly for those who are unaware that it happens.
>
> I'd very much like audience participation on this one, so please share your
> thoughts both on whether this behavior should change in a point release
> (1.5.x); and what you think a better (read "more semantic") `parse` API
> might look like (because we are planning API overhauls in 2.0).

I prefer we prevent net connections by default, but it would be nice
to understand the impact first. It /may/ be more of a PITA for users
(we don't know this yet), but I prefer to be secure by default.

--
Aaron Patterson
http://tenderlovemaking.com/

Yoko Harada

Jun 7, 2012, 9:03:32 AM

to nokogi...@googlegroups.com

Hello,

This is a kind of scary problem. In my opinion, Mike's plan is the best.

The issue reporter, @felixgr, is talking about the libxml version, but the
same issue might exist in the pure Java version as well. So, I tried the
given snippet on the pure Java version. I got the result below on Ubuntu
(OS X doesn't have the strace command):

strace -e connect jruby -Ilib ../tmp/issue693.rb
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCHLD (Child exited) @ 0 (0) ---

I'm not familiar with the "strace" command, so I also looked at the
output of the netstat command at the same time. "netstat" didn't show any
evidence of an internet connection triggered by the entity in the XML
document.

Does this mean the pure Java version is safe in terms of entity parsing?
Could somebody comment on this?

Also, I did a bit of research, since this is an old issue from 2002. I
wanted to know why a suspected vulnerability has been left for 10
years in well-used, mature OSS products.

As for Apache Xerces, it parses entities but doesn't replace their values
during parsing. I guess this might be the reason the pure Java version
didn't connect to the internet while parsing the entity.

As far as I understand, there are two types of attack caused by XXE.

1. DoS by a very long entity value

2. XML Injection which exposes /etc/passwd file or other sensitive system files

Attack 1 has already been fixed in both libxml and pure Java versions.
Attack 2 cannot be prevented by just stopping internet connection.
Probably, XXE section of
http://sleeplessinslc.blogspot.jp/2010/09/xml-injection.html explains
well.
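
The point that blocking network access does not cover local files can be illustrated by classifying where an entity's system identifier points. This is a sketch under assumptions: `entity_source` and `blocked_by_nonet?` are hypothetical helpers, and the list of network schemes is chosen for illustration rather than taken from what libxml2 or Xerces actually check.

```ruby
require "uri"

# Schemes treated as "network" in this sketch (an assumption).
NETWORK_SCHEMES = %w[http https ftp].freeze

# Classifies a system identifier as :local_file, :network, or :other.
def entity_source(system_id)
  scheme = URI.parse(system_id).scheme&.downcase
  return :local_file if scheme.nil? || scheme == "file"
  return :network if NETWORK_SCHEMES.include?(scheme)

  :other
rescue URI::InvalidURIError
  :other
end

# A network-only restriction stops remote fetches but not local reads.
def blocked_by_nonet?(system_id)
  entity_source(system_id) == :network
end

%w[http://attacker.example/probe file:///etc/passwd /etc/passwd].each do |id|
  puts "#{id}: source=#{entity_source(id)} blocked_by_nonet=#{blocked_by_nonet?(id)}"
end
```

In this model, a `file:` identifier or a bare path is never a network request, so a separate rule about which entities may be resolved at all is needed to cover attack 2.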

As a result, the problem happens when untrusted XML documents are parsed.

Who knows whether an XML document is trusted or untrusted? I think the Nokogiri user does.

So, I second to Mike. Most Nokogiri users parse trusted XML documents (I guess).

My confession: the pure Java version doesn't use the nonet option at all
at the moment. I need to fix this.

-Yoko

Jonathan Rochkind

Jun 7, 2012, 10:47:44 AM

to nokogi...@googlegroups.com

On 6/7/2012 9:03 AM, Yoko Harada wrote:
> So, I second to Mike. Most Nokogiri users parse trusted XML documents (I guess).

Hmm, I'd guess the opposite, but I guess it's hard to say, we all assume
that 'most' users do what we do. I mainly parse untrusted documents.

I guess there's a question of what counts as 'trusted' vs 'untrusted'. I
mainly parse documents from third party API services, since they're
coming from third parties, I'd consider them 'untrusted'.

Dmitry Ratnikov

Jun 7, 2012, 5:37:26 PM

to nokogi...@googlegroups.com

On Thu, Jun 7, 2012 at 10:47 AM, Jonathan Rochkind <roch...@jhu.edu> wrote:
> On 6/7/2012 9:03 AM, Yoko Harada wrote:
>>
>> So, I second to Mike. Most Nokogiri users parse trusted XML documents (I
>> guess).
>
>
> Hmm, I'd guess the opposite, but I guess it's hard to say, we all assume
> that 'most' users do what we do. I mainly parse untrusted documents.

I'm seconding Jonathan here as well, since I don't think most users
will make an informed decision: I didn't know about the remote
entities until Felix mentioned it to me, and I've been doing XML for a
couple of years. Thus even if the option existed, I'd probably not
have read the documentation closely enough to disable it.

Mike, Yoko, do you happen to know whether anyone actually uses the
remote entity functionality? Perhaps if it's an obscure case, it's
okay to break backward compatibility?

-- D

>
> I guess there's a question of what counts as 'trusted' vs 'untrusted'. I
> mainly parse documents from third party API services, since they're coming
> from third parties, I'd consider them 'untrusted'.
>
>

Aaron Patterson

Jun 7, 2012, 5:52:18 PM

to nokogi...@googlegroups.com

I think we should treat this as a bug. It's worth breaking apps if it means they are more secure.

/2cents

--
Aaron Patterson
http://tenderlovemaking.com/

I'm on an iPhone so I apologize for top posting.

Mike Dalessio

Jun 8, 2012, 10:33:20 AM

to nokogi...@googlegroups.com

On Thu, Jun 7, 2012 at 5:52 PM, Aaron Patterson <aaron.p...@gmail.com> wrote:

> I think we should treat this as a bug. It's worth breaking apps if it means they are more secure.
>
> /2cents

OK, here's my proposal:

1) make the default XML parsing options include NONET (the default HTML parsing option already sets NONET)

2) make it easy to *unset* parse options. How does a method on ParseOptions like "#unset_foo" sound?
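
A rough sketch of how setting and unsetting a parse option could work on a bit-flag object follows. The class `SketchParseOptions` and the `unset_nonet` method are hypothetical and only follow the "#unset_foo" naming floated above; the numeric flag values mirror libxml2's `xmlParserOption` enum, and the default of RECOVER plus NONET is an assumption based on item 1. For comparison, Nokogiri's existing block form, `Nokogiri::XML(xml) { |config| config.nonet }`, sets the flag for a single parse.

```ruby
# Stand-in for a parse-options object; not Nokogiri's actual class.
class SketchParseOptions
  RECOVER = 1 << 0   # keep parsing after errors
  NOENT   = 1 << 1   # substitute entities
  DTDLOAD = 1 << 2   # load the external DTD subset
  NONET   = 1 << 11  # forbid network access

  # Assumed default after the proposed change.
  DEFAULT_XML = RECOVER | NONET

  attr_reader :options

  def initialize(options = DEFAULT_XML)
    @options = options
  end

  # Turns the flag on; returns self so calls can be chained.
  def nonet
    @options |= NONET
    self
  end

  # Turns the flag off (hypothetical "#unset_foo" style method).
  def unset_nonet
    @options &= ~NONET
    self
  end

  def nonet?
    (@options & NONET) == NONET
  end
end

opts = SketchParseOptions.new
puts opts.nonet?              # network access forbidden by default
puts opts.unset_nonet.nonet?  # caller explicitly opted back in
```

Clearing a bit with `&= ~FLAG` leaves the other options untouched, so a caller who really needs remote entities can opt in without rebuilding the whole option set.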

If nobody has objections, I'll do this today.

Then, for Nokogiri 2.0 I would propose some semantic config settings, like "#trusted_document" which would just do the right thing across multiple settings.

@yokolet - can you look into implementing the nonet option under JRuby?

Yoko Harada

Jun 9, 2012, 1:23:38 AM

to nokogi...@googlegroups.com

Mike

Yes, I've been looking at how to make it secure for a couple of days.
There are a few Xerces settings that prevent loading external DTDs.
Here are my notes:

1. The current Nokogiri tests don't detect internet connections

When I ran the tests on my laptop while physically disconnected from the
Internet, I got exactly the same result from pure Java as WITH the
Internet. So, I think we need new test cases to know whether the nonet
option is working or not.
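
One way such a test case could detect network activity without depending on the real Internet is to point the entity at a throwaway local listener and check whether anything connects. The sketch below is an assumption about how that might look, not an existing Nokogiri test: `connection_attempted?` is a hypothetical helper, and the parse call itself is left out, so the block bodies here only simulate a quiet and a noisy parser.

```ruby
require "socket"

# Opens a bare listener on a free local port, yields a URL that points at
# it, and reports whether anything tried to connect while the block ran.
def connection_attempted?(wait: 0.2)
  server = TCPServer.new("127.0.0.1", 0) # port 0 lets the OS pick a free port
  port = server.addr[1]
  yield "http://127.0.0.1:#{port}/probe.dtd"
  # A pending connection makes the listening socket readable.
  !IO.select([server], nil, nil, wait).nil?
ensure
  server&.close
end

# In a real test the block would parse a document whose DOCTYPE or entity
# refers to `url`. Here the two outcomes are simulated directly.
quiet = connection_attempted? { |_url| :no_network_activity }
noisy = connection_attempted? do |url|
  TCPSocket.new("127.0.0.1", url[/:(\d+)\//, 1].to_i).close
end

puts "quiet parser connected: #{quiet}"
puts "noisy parser connected: #{noisy}"
```

A test built this way gives different results depending on whether the parser reaches out, which the run on a disconnected laptop described above could not distinguish.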

2. Nokogiri::XML::Document should have a new parsing option for DTD validation

Unlike libxml, Xerces validates only while parsing is going on. To
make it behave like libxml, the pure Java version parses with validation
on, just in case validation is requested later. This would introduce
untrusted entity parsing.
So, I want an option for validation in the initialization method of XML::Document.

Other than those, I'll soon update EntityResolver. Currently,
EntityResolver might resolve something like a file:///etc/passwd entity
value. The fix is not so complicated.

-Yoko

sampablokuper

Jun 13, 2012, 7:30:49 AM

to nokogiri-talk, Sam Pablo Kuper

On Jun 6, 2:17 pm, Mike Dalessio <mike.dales...@gmail.com> wrote:
> ...

> I, personally, would prefer to leave the default as it is; which would
> require people to use the `nonet` parse option when parsing untrusted
> documents. This is largely driven by my assumption (perhaps flawed) that
> the 99% use case is for people dealing with trusted documents. I understand
> "secure by default", though in this case I believe that there is a tradeoff
> with respect to usability that should be considered -- mainly that
> validation and entity replacement often require external connections.

For a piece of software open to use by developers of varying ability &
experience, many of whom may not read or follow any, let alone all, of
the official documentation[1], "secure by default" should trump other
considerations. It appears that the Rails team's failure to accept
this was a factor allowing Homakov's GitHub hack to proceed; please
don't lead Nokogiri down the same unsafe path.

Counting your replies so far:

* Expressed concern about the default: DR, JR, WLD, MS, AP.
* Puzzling: YH[2].

The majority opinion in this thread seems to be very clear :)

> OK, here's my proposal:
> 1) make the default XML parsing options include NONET (the default HTML
> parsing option already sets NONET)

Ah, that sounds like a good start. I think Michal Suchanek's
suggestion is the best solution, but setting NONET by default is
certainly better than having DONET as the default.

Regards,

Sam

[1] Plenty of tutorials exist for using Nokogiri to do screen scraping
or other things. If the default were to remain as it is, and if a user
unaware of the XXE vulnerability were to follow an existing tutorial,
likely they wouldn't find out about the danger until too late.
[2] Yoko Harada says, "Attack 2 cannot be prevented by just stopping
internet connection. Probably, XXE section of
http://sleeplessinslc.blogspot.jp/2010/09/xml-injection.html explains
well." However, that very document states, "When an XML parser such as
a SAX Parser reads the XML in, if running for example on a *NIX system
will result in the loading of the contents of the /etc/passwd file
into the contents of the resulting parsed document. If the same is
returned to the person invoking the attack, well you can imagine their
glee at accessing this sensitive data." Note the last sentence, which
seems to contradict YH's conclusion. The document goes on to say,
"Clearly the way to restrict this from happening is either to scan
requests at the network level or follow a direction to strictly
enforce which entities can be resolved."

Mike Dalessio

unread,

Jun 13, 2012, 9:42:31 AM6/13/12

to nokogi...@googlegroups.com

On Wed, Jun 13, 2012 at 7:30 AM, sampablokuper <sampab...@googlemail.com> wrote:

> On Jun 6, 2:17 pm, Mike Dalessio <mike.dales...@gmail.com> wrote:
> > ...
>
> > I, personally, would prefer to leave the default as it is; which would
> > require people to use the `nonet` parse option when parsing untrusted
> > documents. This is largely driven by my assumption (perhaps flawed) that
> > the 99% use case is for people dealing with trusted documents. I understand
> > "secure by default", though in this case I believe that there is a tradeoff
> > with respect to usability that should be considered -- mainly that
> > validation and entity replacement often require external connections.
>
> For a piece of software open to use by developers of varying ability &
> experience, many of whom may not read or follow any, let alone all, of
> the official documentation[1], "secure by default" should trump other
> considerations. It appears that the Rails team's failure to accept
> this was a factor allowing Homakov's GitHub hack to proceed; please
> don't lead Nokogiri down the same unsafe path.

I appreciate your passion, but we already released Nokogiri 1.5.4 with this network vulnerability closed:

https://groups.google.com/group/nokogiri-talk/browse_thread/thread/9f2c8f7fe99dc85a

Thanks for using Nokogiri!
