lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1

Igor Yakushin

Sep 25, 2014, 11:24:50 AM

to python-...@googlegroups.com

Hi,

I am trying to run the simplest possible query and I am getting this error:
===========
File "./test.py", line 7, in <module>
    response = si.query(uid="ivy1").execute()
File "/usr/local/lib/python2.7/dist-packages/sunburnt/search.py", line 599, in execute
    result = self.interface.search(**self.options())
File "/usr/local/lib/python2.7/dist-packages/sunburnt/sunburnt.py", line 212, in search
    return self.schema.parse_response(self.conn.select(params))
File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 510, in parse_response
    return SolrResponse(self, msg)
File "/usr/local/lib/python2.7/dist-packages/sunburnt/schema.py", line 643, in __init__
    doc = lxml.etree.fromstring(xmlmsg)
File "lxml.etree.pyx", line 3032, in lxml.etree.fromstring (src/lxml/lxml.etree.c:68106)
File "parser.pxi", line 1785, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:102455)
File "parser.pxi", line 1673, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:101284)
File "parser.pxi", line 1074, in lxml.etree._BaseParser._parseDoc (src/lxml/lxml.etree.c:96466)
File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
File "parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91757)
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
===========
It looks like the returned XML cannot be parsed properly.
How can I fix it?
I am using the Cloudera 4.3 distribution of Solr. Upgrading is not an option at the moment.

Thank you,
Igor

Mike Lissner

Sep 25, 2014, 12:42:30 PM

to python-...@googlegroups.com, Igor Yakushin

Hi Igor,

Nobody seems to respond on this list anymore, and sunburnt is basically
unsupported. If you're only now deploying with sunburnt, you might want
to look at other solutions. If you like sunburnt (I do!), you might
actually have better luck on Stack Overflow or similar.

If you decide to stick with sunburnt, you might want to set a debugger
breakpoint on the line that reads doc = lxml.etree.fromstring(xmlmsg)
(schema.py, line 643 in your traceback). You could then inspect the
value that's failing and maybe figure out what's wrong with it.
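A minimal sketch of the same idea without an interactive debugger, assuming you can call the parser yourself on the string that sunburnt hands to lxml.etree.fromstring (the xmlmsg variable in schema.py). It uses Python's standard-library parser as a stand-in for lxml; describe_parse_failure is a hypothetical helper, not part of sunburnt, and the two payloads are illustrative only. Because it catches any exception, passing parse=lxml.etree.fromstring should behave the same way.

```python
import xml.etree.ElementTree as ET


def describe_parse_failure(xmlmsg, parse=ET.fromstring, preview=60):
    """Hypothetical debugging helper (not part of sunburnt).

    Returns None if `parse` accepts the payload; otherwise returns the
    parser's message plus a repr() of the payload's first `preview`
    bytes, so invisible characters such as CR/LF become visible.
    """
    try:
        parse(xmlmsg)
    except Exception as exc:  # any parser's error type, e.g. lxml's XMLSyntaxError
        return "%s; payload starts with %r" % (exc, xmlmsg[:preview])
    return None


# Illustrative payloads: a clean document, and the same one with stray leading text.
good = b'<?xml version="1.0" encoding="UTF-8"?>\n<response/>'
bad = b"195\r\n" + good
print(describe_parse_failure(good))  # None
print(describe_parse_failure(bad))
```

The repr() matters here: a plain print of the payload would hide carriage returns and other characters that the parser trips over at line 1, column 1.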

(I'm not affiliated with the project, just a happy user.)

Mike

Igor Yakushin

Sep 25, 2014, 12:47:34 PM

to Mike Lissner, python-...@googlegroups.com

Hi Mike,
All the Python packages I tried (solrpy, pysolr, sunburnt) fail to
query my Solr server, so I think the problem is either me doing
something stupid or the Cloudera distribution returning unexpected
output. In particular, querying with curl suggests that it returns an
extra first line that might confuse XML parsers:
<?xml version="1.0" encoding="UTF-8"?>
<response>
...
</response>
When you query your Solr server with curl, is the first line returned?
Is there a way to get rid of the first line?
Thank you,
Igor

Mike Lissner

Sep 25, 2014, 1:01:36 PM

to Igor Yakushin, python-...@googlegroups.com

That line is the XML declaration. It is standard and belongs at the
very top of the document, so it is not what's confusing the parser; the
problem is something else, for sure. If the low-level libraries pysolr
and solrpy fail too, something bigger is going on. I'd try to get one
of those working first, reading through its code to see where it goes
wrong.
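A quick check of that point, using Python's standard-library parser as a stand-in for lxml: the same document parses with its declaration at the very start, and fails once anything (stray text, or even a line break) comes before it. The payloads and the parses helper are illustrative assumptions, not the actual Solr response.

```python
import xml.etree.ElementTree as ET

# A small Solr-style document with the XML declaration as the first bytes.
with_declaration = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<response><result name="response" numFound="0" start="0"/></response>'
)

# The same bytes with stray text in front, and with only a line break in front.
with_prefix = b"195\r\n" + with_declaration
with_leading_newline = b"\r\n" + with_declaration


def parses(payload):
    """Return True if the standard-library parser accepts the payload."""
    try:
        ET.fromstring(payload)
    except ET.ParseError:
        return False
    return True


print(parses(with_declaration))      # True
print(parses(with_prefix))           # False
print(parses(with_leading_newline))  # False
```

The declaration itself is harmless; the XML 1.0 rule is only that, if present, it must come before everything else in the document.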

Mike

Igor Yakushin

Sep 25, 2014, 2:52:42 PM

to Mike Lissner, python-...@googlegroups.com

Using solrpy's raw_query (which just returns the response as a string,
without trying to parse it), I get the output below for this code:
========
response = s.raw_query(q='*:*', rows='2', wt='xml', fl=('uid'))
print response
========
What are the 195 at the start and the 0 at the end? Wouldn't they confuse an XML parser?

========
$ ./test.py
195

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader"><int name="status">0</int><int
name="QTime">21</int><lst name="params"><str name="fl">uid</str><str
name="q">*:*</str><str name="wt">xml</str><str
name="rows">2</str></lst></lst><result name="response"
numFound="645758" start="0"><doc><str
name="uid">yzs123</str></doc><doc><str
name="uid">root</str></doc></result>
</response>

0
========
I guess one way to proceed would be to use raw_query and strip those
numbers from the resulting string before feeding it to an XML parser.
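The 195 and 0 look like HTTP/1.1 chunked transfer-encoding framing that was never removed: in that encoding each chunk of the body is preceded by its length in hexadecimal (0x195 = 405 bytes) and the body ends with a zero-length chunk. Nobody in this thread confirms that, so treat it as a hypothesis. If it holds, decoding the framing is more robust than stripping numbers by hand, because a larger response can arrive as several chunks with size lines in the middle of the XML. A minimal sketch, assuming the raw response is available as bytes (a byte string under Python 2); dechunk is a hypothetical helper, not part of solrpy.

```python
import xml.etree.ElementTree as ET


def dechunk(raw):
    """Hypothetical helper: decode an HTTP/1.1 chunked body (bytes).

    Framing: a hexadecimal size line ending in CRLF, that many bytes of
    data, another CRLF; a size of 0 marks the last chunk. Chunk
    extensions after ';' are dropped and trailer headers are ignored,
    since this is only a sketch.
    """
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)  # end of the size line
        size = int(raw[pos:eol].split(b";", 1)[0].strip(), 16)
        if size == 0:  # terminating zero-length chunk
            break
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the data
    return bytes(body)


# Illustrative payload split into two chunks, so a size line lands mid-XML.
xml = b'<?xml version="1.0" encoding="UTF-8"?>\n<response><doc/></response>\n'
first, second = xml[:30], xml[30:]
raw = b"".join(
    b"%x\r\n%s\r\n" % (len(part), part) for part in (first, second)
) + b"0\r\n\r\n"

doc = ET.fromstring(dechunk(raw))
print(doc.tag)  # response
```

Removing only the digits would also leave the CRLF after them in front of the XML declaration, which still fails to parse. Normally the HTTP client strips this framing itself, so seeing it in the body is probably worth chasing as the root cause rather than only working around it.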
