malformed XML in reports?

jessica

Oct 10, 2007, 7:46:10 PM
to AdWords API Forum
Hi all,

I have come across a situation where the CustomReport I requested
contains XML that I am unable to parse with a Java SAXParser, and tidy
fails with errors as well. The reports I am having problems with are
quite large - I'm working on narrowing down exactly what types of
reports fail, but in the meantime I'm wondering if this is a known
issue? Has anyone else run into this problem? Partial Java stack
traces and tidy error output are below.

Thanks in advance to anyone who can shed light on the problem.

jessica

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x0) was found in the value of attribute "kwDestUrl" and element is "row".
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:969)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1033)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:851)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1693)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    ...

com.imi.util.exception.BadFileFormatException: Parsing error
    at com.imi.util.parser.GoogleXMLParser.parse(GoogleXMLParser.java:328)
    at com.imi.batches.help.AdwordsReportFilter.processTriggerAPI(AdwordsReportFilter.java:327)
    at com.imi.batches.help.AdwordsReportFilter.process(AdwordsReportFilter.java:133)
    at com.imi.batches.ProcessMsgThread.run(ProcessEmailBatch.java:203)
Caused by: org.xml.sax.SAXParseException: XML document structures must start and end within the same entity.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1438)
    ....


line 3 column 261785 - Warning: <row> attribute with missing trailing quote mark
line 3 column 456499 - Error: unexpected </ro> in <row>
line 3 column 476226 - Error: unexpected </row> in <rowyword>
line 3 column 487495 - Error: unexpected </row> in <rowesort->
line 3 column 689950 - Warning: <row> attribute "ca/row" lacks value
line 3 column 740127 - Warning: <row> attribute name "mmunity"campaign" (value="Boca Raton") is invalid

zdvso...@hotmail.com

Oct 11, 2007, 4:12:47 AM
to AdWords API Forum
Had a different error - no root element. My software is .NET.

System.Exception: Error processing report: Root element is missing. ---> System.Xml.XmlException: Root element is missing.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.ThrowWithoutLineInfo(String res)
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlReader.MoveToContent()

Happened Oct 11th 2007, 09:00h CET.

Reed

Oct 11, 2007, 8:12:54 AM
to AdWords API Forum
I've been seeing this off and on for a long time, always on long
documents. I ended up putting try/catch logic around the XML parsing
so that the Java error didn't blow my script out of the water.
Whenever I save the bad document to a file and look at it, it appears
to just end in the middle of a line of XML, which makes me think that
AdWords has some internal limit on how big an XML document it can
create. My "solution" was to schedule a CSV report in the UI and
have it emailed to me each night.
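
The try/catch approach described above can be sketched roughly as follows. This is a minimal illustration using the JDK's built-in SAX parser, not the poster's actual code: the class name ReportCheck and the method firstParseError are made up for the example, and the sample XML in main is invented. It runs the parser purely as a well-formedness check and reports where the first fatal error occurred, which for a truncated report should be close to the point where the document stops.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: the names are illustrative and not part of any AdWords client library.
class ReportCheck {

    /**
     * Returns null if the stream is well-formed XML, otherwise a short
     * description of the first fatal parse error or I/O failure.
     */
    static String firstParseError(InputStream in) {
        try {
            // DefaultHandler ignores all content and rethrows fatal errors,
            // so this is only a well-formedness check.
            SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (IOException e) {
            return "I/O error: " + e.getMessage();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample data: a complete document and a copy cut off mid-attribute.
        String complete = "<report><row kwDestUrl=\"http://example.com\"/></report>";
        String truncated = complete.substring(0, 30);
        System.out.println(firstParseError(
                new ByteArrayInputStream(complete.getBytes(StandardCharsets.UTF_8))));
        System.out.println(firstParseError(
                new ByteArrayInputStream(truncated.getBytes(StandardCharsets.UTF_8))));
    }
}
```

A caller could use the returned description to decide whether to log the failure and request the report again instead of letting the exception end the whole batch.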

hmcclungiii

Oct 11, 2007, 3:56:35 PM
to AdWords API Forum
Sounds like you guys are having a timeout problem. Can you get your
reports compressed?

jessica

Oct 11, 2007, 5:05:54 PM
to AdWords API Forum
Looking at the problem a bit more closely, it seems that my XML
documents are being truncated also. As far as I can tell, the
document is well-formed otherwise. My truncated files are of varying
lengths, so there doesn't appear to be a hard-coded limit within
AdWords that is causing the problem.

I ran across this in the API v10 release notes: "Very large reports
that include zero impression rows may fail with error 112. If your
report fails, try breaking it down into smaller ones." I am running
very large reports with zero impressions, although I have seen no
specific mention of error 112.

I guess I will try getting the zipped report and/or breaking my
reports into smaller parts, then let you all know how it goes.

jessica
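
One way to tell a truncated gzipped download from a complete one before parsing it is to decompress the whole stream and see whether it ends cleanly. The sketch below uses only java.util.zip from the JDK; the class name GzipCheck, its methods, and the sample rows are made up for illustration, and nothing here calls the AdWords API. It relies on GZIPInputStream throwing an IOException when the compressed data stops early or the trailing checksum does not match.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: the names are illustrative only.
class GzipCheck {

    /** Returns true if the gzip stream can be decompressed to its end without error. */
    static boolean isComplete(InputStream compressed) {
        byte[] buf = new byte[8192];
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            while (in.read(buf) != -1) {
                // Discard the data; only the absence of errors matters here.
            }
            return true;
        } catch (IOException e) {
            // Covers a stream that ends early as well as a checksum mismatch.
            return false;
        }
    }

    /** Compresses a byte array in memory; used here only to build sample data. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder rows = new StringBuilder("<report>");
        for (int i = 0; i < 1000; i++) {
            rows.append("<row id=\"").append(i).append("\"/>");
        }
        rows.append("</report>");
        byte[] full = gzip(rows.toString().getBytes(StandardCharsets.UTF_8));
        byte[] cut = Arrays.copyOf(full, full.length / 2); // simulate a cut-off download
        System.out.println("full file complete: " + isComplete(new ByteArrayInputStream(full)));
        System.out.println("cut file complete:  " + isComplete(new ByteArrayInputStream(cut)));
    }
}
```

For a real download the same check could be run on a FileInputStream over the saved file, so a bad transfer is detected before the XML parser ever sees it.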

zdvso...@hotmail.com

Oct 12, 2007, 4:43:11 AM
to AdWords API Forum
I now get a 'chopped off' report too. Here is the .NET exception:

System.Xml.XmlException: There is an unclosed literal string. Line 3, position 82125311.
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String arg)
   at System.Xml.XmlTextReaderImpl.ParseAttributeValueSlow(Int32 curPos, Char quoteChar, NodeData attr)
   at System.Xml.XmlTextReaderImpl.ParseAttributes()
   at System.Xml.XmlTextReaderImpl.ParseElement()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at System.Xml.XmlReader.SkipSubtree()
   at System.Xml.XmlReader.ReadToNextSibling(String name)

The report was scheduled with ScheduleReportJob() and retrieved with
getGzipReportDownloadUrl().

jessica

Oct 12, 2007, 12:28:51 PM
to AdWords API Forum
Well, I split the reports in half and got the gzipped version, and
things are looking much better. Before I was only getting about a 30%
success rate, but last night I was able to get 6 reports to all finish
without problem. I am crossing my fingers that the last test run
wasn't just a fluke.

zdvsoftware, are you running reports larger than mine? The
largest report I ran was around 200MB unzipped.

jessica

zdvso...@hotmail.com

Oct 12, 2007, 5:25:16 PM
to AdWords API Forum
I try to keep the reports under 256 MB, which in 2005 was about the
maximum I could get. For two years I didn't get this error, and I've
downloaded thousands of reports. That changed last Thursday: three
reports were corrupt, two others got a 404 when downloading from the
URL, and six succeeded.

BTW: I *think* I read somewhere that the URL returned by
getReportDownloadUrl()/getGzipReportDownloadUrl() is only valid for
five minutes, in which case poor performance of the server behind the
URL could be an explanation.
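
If the download URL really does expire after a few minutes (unconfirmed in this thread), one defensive pattern is to ask for a new URL before every attempt instead of reusing an old one. The sketch below is deliberately generic and makes several assumptions: FreshUrlRetry, Downloader, and downloadWithFreshUrl are hypothetical names, the URL supplier stands in for a call such as getGzipReportDownloadUrl(), and the example.invalid address and simulated failures in main are invented.

```java
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical helper: none of these names come from the AdWords API.
class FreshUrlRetry {

    /** Fetches the body behind a URL; a real implementation would use an HTTP client. */
    interface Downloader {
        byte[] fetch(String url) throws IOException;
    }

    /**
     * Tries up to maxAttempts times, requesting a new URL from urlSource
     * before each attempt so that an expired URL is never reused.
     */
    static byte[] downloadWithFreshUrl(Supplier<String> urlSource,
                                       Downloader downloader,
                                       int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String url = urlSource.get(); // assumed to wrap the API call that returns the URL
            try {
                return downloader.fetch(url);
            } catch (IOException e) {
                last = e; // remember the failure and try again with a new URL
            }
        }
        throw last;
    }

    public static void main(String[] args) throws IOException {
        int[] issued = {0};
        // Simulated environment: the first two URLs fail, the third succeeds.
        byte[] body = downloadWithFreshUrl(
                () -> "http://example.invalid/report?token=" + (++issued[0]),
                url -> {
                    if (!url.endsWith("token=3")) {
                        throw new IOException("404 for " + url);
                    }
                    return new byte[] {1, 2, 3};
                },
                5);
        System.out.println("downloaded " + body.length + " bytes after " + issued[0] + " URLs");
    }
}
```

Whether the API allows the URL to be requested repeatedly for the same report job is also an assumption here and would need to be checked against the API documentation.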

AdWords API Advisor

Oct 15, 2007, 5:09:36 PM
to AdWords API Forum
Hi everyone,

If you're getting truncated reports and you're not requesting them
gzipped, please try using compression. zdvsoftware is correct that the
reports machines can be picky about serving files > ~250MB.

If you continue to get inoperable gzip files or truncated XML, please
post the report job ID in this thread so we can look into the problem.

Thanks,
-Aaron Karp
AdWords API Team
