Remove XML node before validating

agda.k...@gmail.com

Oct 27, 2008, 9:52:33 AM
Hello,

I need to remove the DTD reference from an XML document. The reason
is that we want to validate against a schema instead (which
we have locally). It takes up to a minute to fetch all the documents
referred to in the DTD, and as we have no use for them I want to
remove the reference.

I'm using XmlReaderSettings to pass in the XML document and the
schema, but when I loop through the reader it tries to fetch
the DTD before I can remove it, so I'm assuming there's a better way
to remove it before doing the validation. I've tried using XPath but I
don't know how to find the doctype node. Is XPath what I should
use?

I'd be very grateful if anyone could point me in the right direction.

Thanks,

AK

Marc Gravell

Oct 27, 2008, 10:07:44 AM

Martin Honnen

Oct 27, 2008, 10:26:51 AM
agda.k...@gmail.com wrote:

> I need to remove the DTD reference from an XML document. The reason
> is that we want to validate against a schema instead (which
> we have locally). It takes up to a minute to fetch all the documents
> referred to in the DTD, and as we have no use for them I want to
> remove the reference.
>
> I'm using XmlReaderSettings to pass in the XML document and the
> schema, but when I loop through the reader it tries to fetch
> the DTD before I can remove it, so I'm assuming there's a better way
> to remove it before doing the validation. I've tried using XPath but I
> don't know how to find the doctype node. Is XPath what I should
> use?

No, the XPath data model has no notion of a DTD, so XPath certainly does
not help here.
If you want the XmlReader (or XmlDocument) to ignore the referenced DTD
then you can try setting the XmlResolver property (of the
XmlReaderSettings you create your XmlReader with,
http://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.xmlresolver.aspx)
to null. That way the reader will not fetch any external resources. That
only works, however, if the XML document does not reference any entities
defined in the DTD.
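
As a minimal sketch of the null-resolver suggestion above (not code from this thread): the class name, the sample document, and the example.invalid DTD URL are all made up for illustration. It uses the DtdProcessing property, which replaced ProhibitDtd in .NET 4; on the framework versions current when this thread was written, ProhibitDtd = false plays the same role.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class NullResolverSketch
{
    // Reads the document without resolving any external resource and
    // returns the element names encountered, in document order.
    public static List<string> ReadElementNames(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // .NET 4 and later; on older frameworks use settings.ProhibitDtd = false.
        settings.DtdProcessing = DtdProcessing.Parse;
        // With no resolver, the external DTD named in the DOCTYPE is not fetched.
        settings.XmlResolver = null;

        List<string> names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        return names;
    }

    static void Main()
    {
        // Hypothetical document; the DTD URL is deliberately unreachable.
        string xml =
            "<!DOCTYPE score SYSTEM \"http://example.invalid/score.dtd\">" +
            "<score><part id=\"P1\"/></score>";

        foreach (string name in ReadElementNames(xml))
            Console.WriteLine(name);
    }
}
```

The sample document deliberately contains no entity references, in line with the caveat above about entities defined in the DTD.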
A bit more work, but a more complete solution, is to set the XmlResolver
to your own implementation of XmlResolver, for instance by subclassing
XmlUrlResolver, which then serves a locally cached copy of the DTDs.
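
A rough sketch of what such a resolver could look like, assuming the cached DTD text is held in memory. CachedDtdResolver, its Add method, the sample DTD, and the example.invalid URL are hypothetical names chosen for illustration; only XmlUrlResolver and its GetEntity override come from the framework. DtdProcessing again stands in for ProhibitDtd = false on older frameworks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical resolver: serves registered URIs from an in-memory cache
// and refuses everything else, so it never makes a network request.
class CachedDtdResolver : XmlUrlResolver
{
    private readonly Dictionary<string, string> cache =
        new Dictionary<string, string>();

    // Registers the local copy of the resource published at the given URI.
    public void Add(string uri, string content)
    {
        cache[new Uri(uri).AbsoluteUri] = content;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        string content;
        if (cache.TryGetValue(absoluteUri.AbsoluteUri, out content))
            return new MemoryStream(Encoding.UTF8.GetBytes(content));

        throw new XmlException("No local copy registered for " + absoluteUri);
    }
}

static class CachedDtdSketch
{
    // Parses a small document whose DTD is served from the cache and
    // returns the text content of its root element.
    public static string ReadRootText()
    {
        CachedDtdResolver resolver = new CachedDtdResolver();
        resolver.Add("http://example.invalid/score.dtd",
                     "<!ENTITY title \"Sonata\">");

        string xml =
            "<!DOCTYPE score SYSTEM \"http://example.invalid/score.dtd\">" +
            "<score>&title;</score>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse; // older frameworks: ProhibitDtd = false
        settings.XmlResolver = resolver;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent(); // skips the DOCTYPE node, stops at <score>
            return reader.ReadElementContentAsString();
        }
    }

    static void Main()
    {
        Console.WriteLine(ReadRootText());
    }
}
```

Because the entity declaration comes from the cached copy, the reference in the document can be expanded without any download. A real resolver would more likely read the cached files from disk or from embedded resources.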
--

Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/

AK

Oct 28, 2008, 9:54:02 AM
On Oct 27, 2:26 pm, Martin Honnen <mahotr...@yahoo.de> wrote:
> No, the XPath data model has no notion of a DTD, so XPath certainly does
> not help here.
> If you want the XmlReader (or XmlDocument) to ignore the referenced DTD
> then you can try setting the XmlResolver property (of the
> XmlReaderSettings you create your XmlReader with,
> http://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings....)
> to null. That way the reader will not fetch any external resources. That
> only works, however, if the XML document does not reference any entities
> defined in the DTD.
> A bit more work, but a more complete solution, is to set the XmlResolver
> to your own implementation of XmlResolver, for instance by subclassing
> XmlUrlResolver, which then serves a locally cached copy of the DTDs.

Thanks for your answer. It took so long for my post to show up that I
thought it had gone missing at first, and I only noticed it now!

This is the code I'm using at the moment:

    XmlDocument xdoc = new XmlDocument();
    bool docIsValid = false;

    try
    {
        xdoc.XmlResolver = null;
        xdoc.Load(scorePath);

        docIsValid = true;
    }
    catch (System.Exception ex)
    {
        errorList.Add(ex.Message);
    }

    if (docIsValid == true)
    {
        foreach (XmlNode node in xdoc.ChildNodes)
        {
            if (node.GetType().ToString().Contains("DocumentType"))
            {
                // Delete it
                xdoc.RemoveChild(node);
            }
        }

        MemoryStream ms = new MemoryStream();
        xdoc.Save(ms);
        ms.Position = 0;
        XmlReader xmlDoc = XmlReader.Create(ms);

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ProhibitDtd = false;
        settings.XmlResolver = new LocalXmlResolver();

        settings.ValidationEventHandler +=
            new System.Xml.Schema.ValidationEventHandler(settings_ValidationEventHandler);

        XmlSchema x = XmlSchema.Read(
            Utilities.getSchemaFromResources(pvgschema),
            settings_ValidationEventHandler);
        settings.Schemas.Add(x);

        settings.ValidationType = ValidationType.Schema;

        XmlReader reader = XmlReader.Create(xmlDoc, settings);

        while (reader.Read())
        {
        }
    }

Basically I want to validate against a locally saved schema (which is
set to an embedded resource), and never validate against the DTD. The
code above is not ideal as I'm parsing the XML file twice, once to
remove the DTD reference and once to validate against the schema. However,
it does save me from having to fetch all the documents referenced in the
DTD (which could take up to a minute).

Also, I've saved all the schemas referenced in 'pvgschema' locally
and added them as embedded resources, but it doesn't seem like the
XmlResolver works as I thought, as it still does an HTTP GET for those
schemas on the line settings.Schemas.Add(x);.

Is there a simpler way of doing this?

Many thanks,

AK

AK

Oct 29, 2008, 6:45:35 AM
On Oct 28, 1:54 pm, AK <agda.karlb...@gmail.com> wrote:
> Basically I want to validate against a locally saved schema (which is
> set to an embedded resource), and never validate against the DTD. The
> code above is not ideal as I'm parsing the XML file twice, once to
> remove the DTD reference and once to validate against the schema. However,
> it does save me from having to fetch all the documents referenced in the
> DTD (which could take up to a minute).
>
> Also, I've saved all the schemas referenced in 'pvgschema' locally
> and added them as embedded resources, but it doesn't seem like the
> XmlResolver works as I thought, as it still does an HTTP GET for those
> schemas on the line settings.Schemas.Add(x);.

For the second point, I had made a mistake in the resolver. It now
tries to get the embedded schema but fails, as the schema has an
"xs:redefine schemaLocation" in it and I get the error message
"schemaLocation must successfully resolve if <redefine> contains any
child other than <annotation>". Is it possible to solve this, or would
it be better to remove the redefine from the schema?

(Apologies if someone has already answered this. I've had trouble
seeing updates and only saw my own reply when I came in this
morning, even though I posted it yesterday afternoon.)

Many thanks,

AK
