Bad XML output on 0 hits

Tomaž Erjavec

May 28, 2020, 3:13:45 AM
to NoSketch Engine
Hi,

SkE URLs have the very useful "format" parameter, and I've just written
a small XSLT script that takes a list of CQL query / corpus name pairs,
constructs the query URLs with &format=xml, retrieves the concordances
via the document() function, slightly polishes the results and returns
an XML document with all the hits. So far, so good.

However, if any query returns 0 results, the returned XML ends with a
"spam" comment which contains an HTML page with a stack trace, as if an
error had occurred, as below.

The problem is that the <body> lines inside the comment end with "-->",
so the XML comment actually ends at the first of them and the rest is
parsed as part of the document. The result is ill-formed XML, which
cannot be parsed and hence cannot be used.

It is not a big issue, but it would simplify matters for those using the
XML output if the complete spam comment at the end were removed, or at
least if all the "-->" were removed from it.

Best,

Tomaž

Example XML output with 0 hits:

<?xml version='1.0' encoding='UTF-8' ?>
<export>
<header>
  <corpus>imp</corpus>
  <subcorpus>-</subcorpus>
  <query>
    <subquery operation="Query" size="0">[lc=&quot;tvit&quot; |
lemma_lc=&quot;tvit&quot;]</subquery>
  </query>
</header>
<concordance>
<!--: spam
Content-Type: text/html

<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> -->
<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> --> -->
</font> </font> </font> </script> </object> </blockquote> </pre>
</table> </table> </table> </table> </table> </font> </font>
</font><body bgcolor="#f0f0f8">
<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="heading">
<tr bgcolor="#6622aa">
<td valign=bottom>&nbsp;<br>
<font color="#ffffff" face="helvetica,
arial">&nbsp;<br><big><big><strong>&lt;type
'exceptions.KeyError'&gt;</strong></big></big></font></td><td
align=right valign=bottom><font color="#ffffff" face="helvetica,
arial">Python 2.7.17: /usr/bin/python<br>Wed May 27 17:46:56
2020</font></td></tr></table>

<p>A problem occurred in a Python script.  Here is the sequence of
function calls leading up to the error, in the order they occurred.</p>

...

Miloš Jakubíček

May 28, 2020, 3:49:47 AM
to Tomaž Erjavec, NoSketch Engine
This is a feature of Python's cgitb module. If you disable it in run.cgi, it will go away.
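
For reference, cgitb is normally switched on with a single `cgitb.enable()` call near the top of a CGI script. The sketch below shows one way to make that call optional. The `ENABLE_CGITB` flag and the `install_cgitb` helper are hypothetical, and the actual contents of run.cgi may well differ. The import is guarded because cgitb was deprecated in Python 3.11 and removed in 3.13.

```python
ENABLE_CGITB = False  # hypothetical switch, not an actual Bonito setting


def install_cgitb(enabled=ENABLE_CGITB):
    """Install cgitb's HTML traceback handler if requested.

    Returns True if the handler was installed, False otherwise.
    """
    if not enabled:
        return False
    try:
        import cgitb  # removed from the standard library in Python 3.13
    except ImportError:
        return False
    cgitb.enable()  # replaces sys.excepthook with the HTML report writer
    return True


install_cgitb()
```

Note that disabling the handler only suppresses the HTML report. Whatever exception triggered it is still raised inside the script.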

Best
Milos

Tomaž Erjavec

May 28, 2020, 5:56:40 AM
to NoSketch Engine

Hi Miloš,

thanks for the answer. I tried it out, and the comment indeed disappears. However, the resulting XML is now truncated, i.e. it is still not well-formed, and now looks like this:

<?xml version='1.0' encoding='UTF-8' ?>
<export>
<header>

  <corpus>ssj500k22</corpus>
  <subcorpus>-</subcorpus>
  <query>
    <subquery operation="Query" size="0">[lc=&quot;kravaxxxx&quot; | lemma_lc=&quot;kravaxxxx&quot;]</subquery>
  </query>
</header>
<concordance>
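
As a client-side stopgap, both broken shapes seen in this thread (the trailing "spam" comment and the plain truncation) could be patched before parsing. The following is only a rough sketch under the assumption that the output always stops right after the opening <concordance> tag, as in the two examples. The function name `repair_zero_hit_export` is made up for illustration, and the test input is abbreviated.

```python
import xml.etree.ElementTree as ET

SPAM_MARKER = "<!--: spam"  # start of the cgitb comment seen in this thread


def repair_zero_hit_export(text):
    """Parse an <export> document, patching the two broken 0-hit shapes."""
    try:
        return ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        pass  # fall through to the repair attempt
    # Drop everything from the cgitb comment onwards, if it is present.
    head = text.split(SPAM_MARKER, 1)[0].rstrip()
    # Assumption: the output breaks off right after <concordance>.
    if not head.endswith("<concordance>"):
        raise ValueError("unexpected output shape, not repairing")
    return ET.fromstring((head + "</concordance></export>").encode("utf-8"))


truncated = (
    "<?xml version='1.0' encoding='UTF-8' ?>\n"
    "<export>\n<header>\n  <corpus>ssj500k22</corpus>\n</header>\n"
    "<concordance>\n"
)
root = repair_zero_hit_export(truncated)
print(root.find("header/corpus").text)  # ssj500k22
print(len(root.find("concordance")))    # 0
```

This treats the symptom only. Anything that does not match the expected shape is rejected rather than guessed at.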

Best,

Tomaž

Tomáš Svoboda

Jun 15, 2020, 4:31:03 AM
to NoSketch Engine
Hi everyone,
Tomaž and I have already solved the issue. The problem was caused by using an obsolete version of Bonito. The bug described above is already fixed in the current version.


Best regards

Tomas