REST Batch Size Limit?

Nigel Small

Oct 3, 2012, 4:54:51 AM
to Neo4J
Hi all

Is there any request count/total size/total time limit to batch requests via the REST interface? I'm currently trying to diagnose an issue for a py2neo user and it seems that while small batches are always fine, larger ones end up failing (seemingly after around 200 seconds from the tests I've run so far). With a series of simple tests which just create an ever-increasing number of nodes in a single batch, I get the following results:

/usr/bin/python2.7 /home/elgin/opt/pycharm-2.5.1/helpers/pycharm/utrunner.py /home/elgin/git/py2neo/test/py2neo/batch_test.py::TestBigBatches true
Testing started at 09:40 ...
creating batch of 100
submitting batch
checking batch
removing evidence
creating batch of 1000
submitting batch
checking batch
removing evidence
creating batch of 10000
submitting batch

Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/unittest/case.py", line 327, in run
    testMethod()
  File "/home/elgin/git/py2neo/test/py2neo/batch_test.py", line 91, in test_can_send_batch_of_10000
    self._send_big_batch(10000)
  File "/home/elgin/git/py2neo/test/py2neo/batch_test.py", line 76, in _send_big_batch
    nodes = batch.submit()
  File "/home/elgin/git/py2neo/src/py2neo/neo4j.py", line 105, in submit
    for response in self._submit()
  File "/home/elgin/git/py2neo/src/py2neo/neo4j.py", line 75, in _submit
    for i, request in enumerate(self.requests)
  File "/home/elgin/git/py2neo/src/py2neo/rest.py", line 374, in _send
    raise SocketError(err)
SocketError: error(104, 'Connection reset by peer')


Process finished with exit code 0

This has also previously failed with a request count of around 6000 so I'm not sure of the actual cut-off point. Any insight gratefully received!

Cheers
Nige
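
For readers who want to reproduce the test above without py2neo, here is a minimal sketch of how such a batch payload could be built. It assumes the job format described in the Neo4j REST batch documentation (a JSON array of objects with `method`, `to`, `body` and `id`, posted to the server's batch endpoint). The function names and the `number` property are made up for illustration, and this is not the actual py2neo test code. It is written for Python 3 rather than the Python 2.7 shown in the traceback.

```python
import json


def build_node_batch(count):
    """Return `count` node-creation jobs for one batch request.

    The job keys follow the documented REST batch format; the
    `number` property is a hypothetical placeholder.
    """
    return [
        {"method": "POST", "to": "/node", "body": {"number": i}, "id": i}
        for i in range(count)
    ]


def encode_batch(jobs):
    """Serialise the jobs to the JSON bytes that would form the request body."""
    return json.dumps(jobs).encode("utf-8")


# Same sizes as the test run above; only the payload size is reported here.
for size in (100, 1000, 10000):
    payload = encode_batch(build_node_batch(size))
    print("batch of %d -> %d bytes" % (size, len(payload)))
```

Printing the payload size alongside the job count makes it easier to tell whether a failure tracks the number of jobs or the size of the request body.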

Michael Hunger

Oct 3, 2012, 6:00:27 AM
to ne...@googlegroups.com
Localhost or heroku?
Perhaps just a conn timeout?
And do you immediately start to pull the output stream?

Do you use streaming or not?

Michael

Sent from mobile device

Nigel Small

Oct 3, 2012, 7:05:54 AM
to ne...@googlegroups.com
On 3 October 2012 11:00, Michael Hunger <michael...@neopersistence.com> wrote:
> Localhost or heroku?

Localhost.

> Perhaps just a conn timeout?

Possibly, but it happens routinely, not just as a one-off.

> And do you immediately start to pull the output stream?

The HTTP request is built and sent and the response is then received immediately afterwards. There is no concurrency or other complexity and this works successfully for smaller batches and other request types.

> Do you use streaming or not?

Yes, the X-Stream header is now sent for all py2neo requests by default.
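
To make the streaming question concrete, here is a rough sketch of a batch request with and without the `X-Stream` header, built with Python 3's standard `urllib.request` rather than py2neo's own HTTP layer. The URL is an assumption (a default local server with the batch endpoint under `/db/data/batch`), and `make_batch_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed address of a default local server; adjust as needed.
BATCH_URL = "http://localhost:7474/db/data/batch"


def make_batch_request(jobs, stream=True, url=BATCH_URL):
    """Build (but do not send) a POST request carrying a JSON batch body."""
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if stream:
        # Ask the server to stream the response back.
        headers["X-Stream"] = "true"
    data = json.dumps(jobs).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = make_batch_request([{"method": "POST", "to": "/node", "body": {}, "id": 0}])
print(request.get_method(), request.full_url)
print(sorted(request.header_items()))
```

Toggling `stream` while keeping the batch identical is one way to check whether the resets depend on streaming at all. Note that `urllib.request` normalises header names, so the header is stored as `X-stream`.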
 

Samir Ahmed

Nov 7, 2012, 12:43:34 AM
to ne...@googlegroups.com
I have encountered exactly this as well:

Localhost
+ Happens pretty routinely
+ Streaming is used
+ Works successfully in smaller batches

I would really like to increase the batch size, but I keep getting 'Connection reset by peer' as well.
@Nigel, did you manage to fix this? 

Thanks

Samir
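
Since the failures reported so far seem tied to elapsed time (around 200 seconds in Nigel's tests), it may help to record how long each submission runs before the reset. The following is a small sketch, assuming the client surfaces the failure as an `OSError` carrying `errno.ECONNRESET`; py2neo's own `SocketError` wrapper may need unwrapping first. `timed_submit` and the `submit` callable are hypothetical names.

```python
import errno
import time


def timed_submit(submit):
    """Call `submit()` and report the outcome together with the elapsed time."""
    started = time.monotonic()
    try:
        result = submit()
    except OSError as err:
        return {
            "ok": False,
            # True when the peer reset the connection (errno 104 on Linux).
            "reset_by_peer": err.errno == errno.ECONNRESET,
            "elapsed": time.monotonic() - started,
            "result": None,
        }
    return {
        "ok": True,
        "reset_by_peer": False,
        "elapsed": time.monotonic() - started,
        "result": result,
    }


def failing_submit():
    # Stand-in for a real batch submission that gets reset by the server.
    raise ConnectionResetError(errno.ECONNRESET, "Connection reset by peer")


outcome = timed_submit(failing_submit)
print(outcome["ok"], outcome["reset_by_peer"])
```

If the elapsed time at failure is roughly constant across batch sizes, a timeout is the more likely cause; if it varies with batch size, memory or payload limits are worth a closer look.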

Javier de la Rosa

Nov 7, 2012, 10:57:06 AM
to ne...@googlegroups.com
Using a local Neo4j 1.9M01 server and neo4j-rest-client for Python with no streaming, I successfully ran ~70k operations in a single batch request. It took ~70 min on my i5 laptop with 4GB of RAM (and a bunch of stuff open at the same time).



--
Javier de la Rosa
http://versae.es

Michael Hunger

Nov 7, 2012, 12:04:39 PM
to ne...@googlegroups.com
70 min sounds like a long time.
Large transactions take up a lot of memory, so we usually recommend 20k elements.

The system was probably busy GC'ing and swapping.

What did your request consist of?

I did a 30k request in a few seconds.

Michael

Sent from mobile device
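
One way to stay near the 20k figure mentioned above is to split a large job list into several smaller batches on the client side. This is only a sketch: `chunked` is a hypothetical helper, and the default of 20000 simply mirrors the recommendation in this thread. Each chunk would be sent as its own batch request, so the whole load is no longer a single transaction, and job `id` references only make sense within one chunk.

```python
def chunked(jobs, max_size=20000):
    """Yield consecutive slices of `jobs`, each at most `max_size` long."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(jobs), max_size):
        yield jobs[start:start + max_size]


# Placeholder jobs; in practice these would be batch job dictionaries.
jobs = list(range(45000))
sizes = [len(chunk) for chunk in chunked(jobs)]
print(sizes)
```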

Jacob Hansson

Nov 7, 2012, 12:22:02 PM
to ne...@googlegroups.com
Nigel: No, there are no hard limits in the Neo4j codebase, so this is not expected behavior. Since it says connection reset, I assume it is forcibly closed by the server (or the HTTP client lib, for that matter), perhaps due to timeouts or some such.

Can you check these things, in order of priority:

 * Check the server logs
 * Ensure you are using persistent http connections
 * Check if the server returns a timeout limit when it replies with the keep-alive header
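
For the last check in the list, a small helper can pull the parameters out of a `Keep-Alive` response header. This is a sketch assuming the common `timeout=N, max=M` form of that header; whether the Neo4j server sends it at all is not confirmed in this thread, and `parse_keep_alive` is a hypothetical name.

```python
def parse_keep_alive(value):
    """Parse a Keep-Alive header value such as 'timeout=5, max=100'.

    Returns a dict of the integer parameters found; anything that does
    not look like name=integer is ignored.
    """
    params = {}
    for part in value.split(","):
        name, sep, raw = part.strip().partition("=")
        raw = raw.strip()
        if sep and raw.isdigit():
            params[name.strip().lower()] = int(raw)
    return params


print(parse_keep_alive("timeout=5, max=100"))
```

With `http.client`, the raw value would come from `response.getheader("Keep-Alive")`, which returns `None` when the header is absent.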

Also, it might be worth ruling out the Python HTTP stack as the culprit: see if you can run a request against some HTTP server that is known not to time out, and check whether that works as expected.
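
A rough way to run that experiment is to point the client at a throwaway local server that deliberately answers slowly and never times out. The sketch below uses only the Python 3 standard library; `SlowHandler` and `post_to_slow_server` are hypothetical names, the server is not Neo4j, and the short default delay is just for demonstration (raise it past the suspected cut-off, e.g. 200 seconds or more, for a real test).

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class SlowHandler(BaseHTTPRequestHandler):
    delay_seconds = 0.2  # how long the server waits before replying

    def do_POST(self):
        # Read the whole request body, wait, then send a tiny JSON reply.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        time.sleep(self.delay_seconds)
        body = b"[]"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def post_to_slow_server(payload, delay_seconds=0.2):
    """POST `payload` to a local slow server; return (status, body, elapsed)."""
    handler = type("Handler", (SlowHandler,), {"delay_seconds": delay_seconds})
    server = HTTPServer(("127.0.0.1", 0), handler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, timeout=delay_seconds + 30
        )
        started = time.monotonic()
        conn.request("POST", "/", body=payload,
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        data = response.read()
        conn.close()
        return response.status, data, time.monotonic() - started
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(post_to_slow_server(b"[]"))
```

If a long delay against this server succeeds while the same client fails against Neo4j, the client-side HTTP stack is less likely to be the cause.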


/jake



--
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins

Javier de la Rosa

Nov 7, 2012, 1:20:00 PM
to ne...@googlegroups.com
On Wed, Nov 7, 2012 at 12:04 PM, Michael Hunger <michael...@neopersistence.com> wrote:
> What did your request consist of?

I was just creating random nodes and relationships among them. You are right, the system was swapping all the time, but I just wanted to point out that the server didn't close the connection even after more than an hour.