Breaking up large pushes to (hopefully?) avoid HTTP 502

Jason Heeris

Dec 18, 2017, 6:18:33 PM
to dulwich-discuss
I've written a script using Dulwich to fix up an SVN repo and push it to GitLab. The repo is converted via git-svn; I then use Dulwich to fix up the branch and tag refs and do the push. It's Python 3.5, with the latest Dulwich installed in a virtualenv via pip.
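For readers unfamiliar with the ref fix-up step: git-svn typically leaves SVN branches and tags as remote-tracking refs, and these have to be rewritten as ordinary heads and tags before pushing. The sketch below is hypothetical and is not Jason's script. It assumes git-svn's default `origin/` prefix (`refs/remotes/origin/<branch>`, `refs/remotes/origin/tags/<tag>`, with `trunk` becoming `master`) and works on a plain dict of ref name to SHA, such as the one Dulwich's `Repo.get_refs()` returns.

```python
"""Hypothetical sketch: map git-svn remote-tracking refs to plain heads/tags.

Assumes git-svn's default layout with the "origin/" prefix:
  refs/remotes/origin/trunk       -> refs/heads/master
  refs/remotes/origin/<branch>    -> refs/heads/<branch>
  refs/remotes/origin/tags/<tag>  -> refs/tags/<tag>
"""
from typing import Dict

REMOTE_PREFIX = b"refs/remotes/origin/"
TAG_PREFIX = REMOTE_PREFIX + b"tags/"


def svn_refs_to_git_refs(refs: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Return a new ref dict (ref name -> SHA) suitable for pushing."""
    out = {}
    for name, sha in refs.items():
        if not name.startswith(REMOTE_PREFIX):
            continue  # ignore HEAD, local branches, etc.
        if b"@" in name:
            continue  # skip git-svn "name@rev" refs left behind by renames/deletes
        if name.startswith(TAG_PREFIX):
            out[b"refs/tags/" + name[len(TAG_PREFIX):]] = sha
        else:
            branch = name[len(REMOTE_PREFIX):]
            if branch == b"trunk":
                branch = b"master"
            out[b"refs/heads/" + branch] = sha
    return out
```

The result only describes what the remote refs should become; pushing it is a separate step.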

My problem is that when I do the push to the HTTPS repo, I get an HTTP 502 response. Digging around a bit, it looks like this is due to the server timing out on a massive HTTPS POST (it's a big repo with big files). We're talking 300 s (5 min) here, and that's between two machines on the same local network.

If I use Git from the command line (my current workaround), I get no such error (via HTTPS or SSH). If I use Dulwich via SSH, I get no error, but SSH is a pain because it requires interactivity (known hosts prompt) and extra configuration (keys, config).

I don't really have access to the server; I have some say, but honestly I'd rather fix my script than have an arms race with the timeout.

Is there some way I can break up the transfer? I've tried writing a write_pack_objects() function for send_pack() that does or doesn't deltify the packs, but it didn't make a difference. Pushing master alone triggers the 502, so looping over the refs I'm sending won't help. I suspect, but haven't tested, that no individual commit would be over the limit, though. What else can I try?
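One common workaround for a push that is too large for a single request (not something proposed in this thread, and only useful if the hunch that no individual commit is over the limit holds) is to push the history in stages: update the remote branch to an old commit first, then a newer one, and so on, so that each request only carries the objects added since the previous stage. The helper below is hypothetical. It only picks the checkpoint commits from an oldest-first list of SHAs, assuming a linear (first-parent) history, and leaves the push itself (for example one `send_pack()` call per checkpoint) out.

```python
from typing import List, Sequence


def push_checkpoints(commits_oldest_first: Sequence[str], step: int) -> List[str]:
    """Pick every `step`-th commit, always ending with the branch tip.

    Hypothetical helper: pushing the branch to each returned SHA in turn
    means each push only sends objects added since the previous one.
    Assumes a linear (first-parent) history ordered oldest first.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if not commits_oldest_first:
        return []
    picks = list(commits_oldest_first[step - 1::step])
    tip = commits_oldest_first[-1]
    if not picks or picks[-1] != tip:
        picks.append(tip)  # the last stage must always reach the real tip
    return picks
```

A smaller `step` means more, smaller requests; the right value depends on how large each stretch of history is.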

Thanks,
Jason

Jelmer Vernooij

Dec 19, 2017, 4:48:05 AM
to Jason Heeris, dulwich-discuss
300 seconds is a lot. :/ Are you running the latest version?

I haven't used Dulwich on large repositories that much; my suspicion is that there are some easy gains to be made. Can you do a profile run (e.g. with lsprof) to see where it spends most of its time in your case?
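Python's `cProfile` module is built on lsprof (its C core is the `_lsprof` module), so on Python 3 a profile run needs only the standard library. A minimal sketch, assuming the push can be wrapped in a single call; `do_push()` here is a hypothetical stand-in for the script's real push code.

```python
import cProfile
import io
import pstats


def do_push():
    """Hypothetical stand-in for the real push; replace with your own code."""
    return sum(i * i for i in range(100000))


def profile_call(func, dump_path="push.prof", top=20):
    """Run func under cProfile, save the raw stats, and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(dump_path)  # raw data, can be reloaded later with pstats
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)  # biggest cumulative time first
    return out.getvalue()


if __name__ == "__main__":
    print(profile_call(do_push))
```

Sorting by cumulative time puts the top-level calls first, which makes it easier to see which branch of the call tree the time goes into.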

Another option is to improve Dulwich to start sending objects earlier, so the server doesn't time out.

Jason Heeris

Dec 19, 2017, 6:18:34 PM
to Jelmer Vernooij, dulwich-discuss
On 19 December 2017 at 20:48, Jelmer Vernooij <jver...@gmail.com> wrote:
> 300 seconds is a lot. :/ Are you running the latest version?

I'm running the latest *release*; I could try whatever's in master later too.

> I haven't used Dulwich on large repositories that much; my suspicion is that there is some easy gains to be made. Can you do a profile run (e.g. with lsprof) to see where it spends most of its time in your case?

I'll check it out and let you know.

- Jason

Jason Heeris

Dec 21, 2017, 1:10:37 AM
to Jelmer Vernooij, dulwich-discuss
> Can you do a profile run (e.g. with lsprof) to see where it spends most of its time in your case?

I profiled it, but I'm afraid there's not much to say. It genuinely looks like the transfer itself is taking too long. That's how I read the jump in cumulative time between the SSLSocket write method and the next function down, Dulwich's own write_pack_object. I've attached the output from pstats.

- Jason

gitlab-svn-stats.txt
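To check that kind of reading on a saved profile, pstats can reload the dump and list who calls a given function, which shows how much of a caller's cumulative time sits inside a socket write. A sketch, assuming the stats were saved with `dump_stats()`; the file names and the `"write"` filter are illustrative, and the filter is a regular expression that pstats matches against `file:line(function)` strings.

```python
import cProfile
import io
import pstats


def callers_report(dump_path, pattern="write", top=10):
    """Return a text report of the callers of functions matching `pattern`.

    pstats applies the restrictions in order: first the regex filter,
    then the `top` limit on the remaining entries.
    """
    out = io.StringIO()
    stats = pstats.Stats(dump_path, stream=out)
    stats.sort_stats("cumulative")
    stats.print_callers(pattern, top)
    return out.getvalue()


if __name__ == "__main__":
    # Tiny demo: profile some in-memory writes, save the stats, then report.
    prof = cProfile.Profile()
    prof.enable()
    buf = io.BytesIO()
    for _ in range(1000):
        buf.write(b"x" * 1024)
    prof.disable()
    prof.dump_stats("demo.prof")
    print(callers_report("demo.prof"))
```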

Jelmer Vernooij

Dec 21, 2017, 10:41:56 AM
to Jason Heeris, dulwich-discuss
Thanks, I'll check it out - it'll probably take me a day or two to get back to you.

Jason Heeris

Dec 21, 2017, 5:26:09 PM
to Jelmer Vernooij, dulwich-discuss
I won't be touching this again until mid-January, so no rush :) Enjoy your Christmas!

Cheers,
Jason

Jason Heeris

Dec 21, 2017, 8:45:18 PM
to Jelmer Vernooij, dulwich-discuss
Incidentally, the python-requests package might have a way of making this easier:


- Jason

Jelmer Vernooij

Dec 26, 2017, 4:57:31 AM
to Jason Heeris, dulwich-discuss
Newer versions of Dulwich should already be sending chunked requests.
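For context, an HTTP/1.1 chunked request body (RFC 7230, section 4.1) is a series of chunks, each prefixed with its size in hexadecimal and ended by a zero-size chunk, so the client can start sending before it knows the total length. The sketch below shows only that framing, for illustration; it is not Dulwich's code. In practice Python's `http.client` (3.6+) applies this encoding itself when `request()` is given an iterable body and no `Content-Length` header.

```python
from typing import Iterable, Iterator


def chunk_encode(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield an HTTP/1.1 chunked-transfer-encoded body for `pieces`.

    Each non-empty piece becomes one chunk: b"<size in hex>\r\n<data>\r\n".
    Empty pieces are skipped, because a zero-size chunk marks the end.
    """
    for piece in pieces:
        if piece:
            yield b"%X\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk, followed by an empty trailer section
```

Because each chunk can go out as soon as it is produced, the server sees data arriving early instead of waiting for one huge pre-built body.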

Jason Heeris

Dec 26, 2017, 5:03:20 AM
to Jelmer Vernooij, dulwich-discuss
On 26 Dec. 2017 8:57 pm, "Jelmer Vernooij" <jver...@gmail.com> wrote:
> Newer versions of Dulwich should already be sending chunked requests.

Is there a way to tune the chunk size (not necessarily exposed in the API)? I hunted through the code, even going as far as trying to write a client subclass that used 'requests', but I don't remember seeing any chunking.
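On the chunk-size question: when `http.client` (Python 3.6+) sends an iterable body with chunked encoding, each item the iterable yields goes out as one chunk, so one way to control chunk size without reaching into the library is to regroup the data before handing it over. This helper is hypothetical and not part of Dulwich; for file-like bodies, Python 3.7 instead added a `blocksize` parameter to `HTTPConnection`.

```python
from typing import Iterable, Iterator


def rechunk(pieces: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Regroup a stream of byte strings into pieces of exactly `size` bytes.

    The final piece may be shorter. Intended for generators passed as an
    http.client request body, where each yielded item becomes one chunk.
    """
    if size < 1:
        raise ValueError("size must be positive")
    buf = bytearray()
    for piece in pieces:
        buf += piece
        while len(buf) >= size:
            yield bytes(buf[:size])
            del buf[:size]  # drop the bytes just emitted
    if buf:
        yield bytes(buf)
```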

- Jason

Jelmer Vernooij

Dec 26, 2017, 5:59:38 AM
to Jason Heeris, dulwich-discuss
If I remember correctly, the chunk size is hidden somewhere deep in
the httplib implementation. :(

From looking at the file you posted, I think the main issue is that
we're recompressing all objects. I'll do some more digging.
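Git stores objects zlib-compressed, so when the data to be sent already exists in compressed form it can, in principle, be copied as-is rather than inflated and deflated again. The toy comparison below illustrates that difference on an in-memory stand-in; the store layout and function names are hypothetical, and this is not the change Dulwich actually made.

```python
import zlib
from typing import Dict, Iterator


def recompress_all(store: Dict[str, bytes]) -> Iterator[bytes]:
    """Naive path: inflate every stored object, then deflate it again."""
    for compressed in store.values():
        raw = zlib.decompress(compressed)
        yield zlib.compress(raw)  # CPU-heavy for large objects


def reuse_compressed(store: Dict[str, bytes]) -> Iterator[bytes]:
    """Reuse path: pass the already-compressed bytes through unchanged."""
    for compressed in store.values():
        yield compressed


if __name__ == "__main__":
    import time

    # Hypothetical store: object id -> zlib-compressed contents.
    store = {str(i): zlib.compress(bytes(range(256)) * 4000) for i in range(20)}
    for fn in (recompress_all, reuse_compressed):
        start = time.perf_counter()
        total = sum(len(chunk) for chunk in fn(store))
        print(fn.__name__, total, "bytes in", round(time.perf_counter() - start, 3), "s")
```

Both paths yield data that inflates to the same content; only the reuse path avoids the compression work.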

Merry Christmas :)

Jelmer Vernooij

Mar 25, 2018, 10:19:25 AM
to dulwich-discuss
For those not paying close attention to NEWS, this issue has been fixed in 0.19.0.