result in quotes - can't query pyspider results


Nick Gilmour

Feb 22, 2018, 2:35:31 PM
to pyspider-users
Hi all,

I'm trying to query the results from pyspider in PostgreSQL and in MongoDB, but I'm having the following issue:
pyspider stores the result as a quoted string and escapes all quotes inside it. E.g. this is what I see in MongoDB:

/* 1 */
{
    "_id" : ObjectId("5a8eef873b2d1c76a4616217"),
    "taskid" : "6530092164d4d034f151d22ed10396b8",
    "result" : "{\"title\": \"Response - pyspider\", \"url\": \"http://docs.pyspider.org/en/latest/apis/Response/\"}",
    "updatetime" : 1519316871.55921,
    "url" : "http://docs.pyspider.org/en/latest/apis/Response/"
}

Why is pyspider doing that? 
I want to run a query like this:
db.getCollection('test').find({"result.title" : {$regex : "pyspider"}});
but I can't.

When I create a similar document in the collection without quotes, like this one:

/* 2 */
{
    "_id" : ObjectId("5a8f185a086088d1d2382821"),
    "taskid" : "4530092164d4d034f151d22ed10396b9",
    "result" : {
        "title" : "Response - pyspider",
    },
    "updatetime" : 9519316871.55921,
}

the above query works fine.
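
The difference between the two documents can be reproduced outside MongoDB. The following is a minimal Python sketch using only the standard library. It assumes the stored `result` value is a JSON-encoded string, as document 1 suggests; the helper name `decode_result` is hypothetical and not part of pyspider.

```python
import json

# Shaped like document 1 above: "result" is a JSON string, not a sub-document,
# so a dotted path such as "result.title" has nothing to descend into.
stored = {
    "taskid": "6530092164d4d034f151d22ed10396b8",
    "result": "{\"title\": \"Response - pyspider\", \"url\": \"http://docs.pyspider.org/en/latest/apis/Response/\"}",
    "url": "http://docs.pyspider.org/en/latest/apis/Response/",
}


def decode_result(doc):
    """Return a copy of doc whose "result" is a nested dict (hypothetical helper)."""
    decoded = dict(doc)
    if isinstance(decoded.get("result"), str):
        decoded["result"] = json.loads(decoded["result"])
    return decoded


nested = decode_result(stored)
print(type(stored["result"]).__name__)  # the stored value is a plain string
print(nested["result"]["title"])        # after decoding, the field is reachable
```

Once decoded, the document has the same shape as document 2, which is the shape the `result.title` query expects.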


Regards,
Nick

Roy Binux

Feb 22, 2018, 2:46:35 PM
to Nick Gilmour, pyspider-users
That is to avoid problems with reserved key names in the user's result.
You could create your own database structure and use a result worker to feed into it.
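
A rough sketch of that suggestion follows. It assumes pyspider hands each result to a worker method with the signature `on_result(self, task, result)`, with `result` already a dict. To stay self-contained, the class below does not inherit from pyspider's result worker, and `InMemoryCollection` stands in for a real MongoDB collection; both class names are hypothetical.

```python
import time


class InMemoryCollection:
    """Stand-in for a MongoDB collection so the sketch runs without a server."""

    def __init__(self):
        self.docs = {}

    def replace_one(self, query, doc, upsert=False):
        # Mimics only the upsert-by-taskid behaviour this sketch relies on.
        key = query["taskid"]
        if key in self.docs or upsert:
            self.docs[key] = doc


class QueryableResultWorker:
    """Hypothetical worker; in pyspider this would subclass its result worker."""

    def __init__(self, collection):
        self.collection = collection

    def on_result(self, task, result):
        if not result:
            return  # nothing to store for empty results
        doc = {
            "taskid": task["taskid"],
            "url": task["url"],
            "result": result,  # kept as a nested document, not a JSON string
            "updatetime": time.time(),
        }
        self.collection.replace_one({"taskid": task["taskid"]}, doc, upsert=True)


collection = InMemoryCollection()
worker = QueryableResultWorker(collection)
worker.on_result(
    {"taskid": "6530092164d4d034f151d22ed10396b8",
     "project": "test",
     "url": "http://docs.pyspider.org/en/latest/apis/Response/"},
    {"title": "Response - pyspider"},
)
print(collection.docs["6530092164d4d034f151d22ed10396b8"]["result"]["title"])
```

With a real collection in place of the stand-in, the stored `result` would be a sub-document, so a filter on `result.title` should be able to match it.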

--
You received this message because you are subscribed to the Google Groups "pyspider-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyspider-user...@googlegroups.com.
To post to this group, send email to pyspide...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pyspider-users/CAH-droxOknw-6RHLgc8cM4TjrNKTpaikYVu%3DETBt04tFk1GRtQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Nick Gilmour

Feb 22, 2018, 3:44:49 PM
to Roy Binux, pyspider-users
Hi Roy,

thanks for the quick response!

OK. I think I understand, I have found this: 
but I'm trying to apply the provided snippet and I'm getting either ValueError or Exception (wrong scheme format).

What should
<your resutldb connection url>
look like if I have this in my config:
"resultdb": "mongodb+resultdb:\/\/127.0.0.1:27017\/pyspider_resultdb",

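The config value above is written with JSON-escaped slashes (`\/`). As a small sketch, assuming the config file is JSON, the snippet below shows what that value decodes to and how it splits into parts. Whether the escaping had anything to do with the errors mentioned above is not stated in this thread.

```python
import json
from urllib.parse import urlsplit

# The config line from the message, wrapped in braces to form a JSON object.
config_text = r'{"resultdb": "mongodb+resultdb:\/\/127.0.0.1:27017\/pyspider_resultdb"}'

config = json.loads(config_text)  # JSON treats "\/" as a plain "/"
url = config["resultdb"]
print(url)

parts = urlsplit(url)
print(parts.scheme, parts.hostname, parts.port, parts.path)
```
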
Regards,
Nick


Roy Binux

Feb 22, 2018, 4:06:00 PM
to Nick Gilmour, pyspider-users
You need to create a new collection with the schema you need, not a connection to the existing resultdb.


Nick Gilmour

Feb 23, 2018, 10:18:19 AM
to Roy Binux, pyspider-users
I meant this one:
resultdb = connect_database("sqlite+resultdb:////home/user/data/result.db")
This worked. I could get the results from the resultdb.
After that I tried to save the results to MongoDB with the ResultWorker. This worked too.
Now I can run my queries against this DB without a problem!

Thanks!

Regards,
Nick

