Reference to non-existent node


Davor Josipovic
Feb 12, 2018, 2:47:22 AM
to us...@subversion.apache.org
It seems to me there is a bug in libapache2-mod-svn/stable,stable,now 1.9.5-1+deb9u1 amd64 [installed]. 


I assume this is the correct place to report?

It concerns error 160014, "Reference to non-existent node". Some changes I cannot commit through https://, but I can commit them directly through the file:// protocol.

Any ideas?


Johan Corveleyn
Feb 18, 2018, 2:40:06 PM
to Davor Josipovic, us...@subversion.apache.org
Yes, this is the correct place to report such an issue. But to
maximize your chances of people helping you, can you please copy the
relevant contents / context to the mailing list here, instead of
forcing people to click a link?

--
Johan

Davor Josipovic
Feb 18, 2018, 3:58:21 PM
to Johan Corveleyn, us...@subversion.apache.org
Here you go (copy from the link):

----------------------------------------------------------------------------------------------------------------------------

Can someone help me debug this issue? Everything works (i.e. I can commit, checkout, etc.) but this one commit always fails as described below. I tried checking out in a new repository, and committing, but same issue. Note also that this is a very large repository (500GB), and a very large commit (36k files totaling 10GB).

This is the `apache2` error log:

    [Sat Feb 10 03:25:16.262765 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Could not DELETE /svn/repo/!svn/txn/463-e8.  [500, #0]
    [Sat Feb 10 03:25:16.265881 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] could not abort transaction.  [500, #2]
    [Sat Feb 10 03:25:16.265905 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Transaction '463-e8' cleanup failed  [500, #2]
    [Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0'  [500, #2]
    [Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory  [500, #2]
    [Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into "/svn/repo/repofolder".  [500, #0]
    [Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] An error occurred while committing the transaction.  [500, #160014]
    [Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in filesystem '/mnt/vc/svn/repo/db'  [500, #160014]

The client side gives:

    done
    Committing transaction...
    svn: E175012: Commit failed (details follow):
    svn: E175012: Connection timed out
    svn: E200042: Additional errors:
    svn: E175002: Unexpected server error 500 'Internal Server Error' on '/svn/repo/!svn/txn/463-e8'

First I thought this was a permission issue. So I did this on the server:

    su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
    Transaction '463-e8' removed.

No error whatsoever. Apache runs as `www-data`. The following permissions are set:

    chown www-data:www-data /mnt/vc -R
    chmod 0774 /mnt/vc -R

I have run `svnadmin verify` and found no problems. `dmesg` doesn't show anything new.

The apache svn config is pretty much default.

    KeepAlive On
    MaxKeepAliveRequests 0
    # Set to 10h.
    Timeout 36000

    SVNCompressionLevel 5
    SVNInMemoryCacheSize 16384
    SVNCacheTextDeltas On
    SVNCacheFullTexts On
    SVNAllowBulkUpdates Prefer

    <Location /svn>
        DAV svn
        SVNParentPath /mnt/vc/svn
        SVNListParentPath On

        # Allow large requests
        LimitXMLRequestBody 0
    </Location>

Extra info:

    Server: Debian 9.3
    apache2/stable,stable,now 2.4.25-3+deb9u3 amd64 [installed]
    libapache2-mod-svn/stable,stable,now 1.9.5-1+deb9u1 amd64 [installed]
    Server svn client: 1.9.5-1+deb9u1 
    Client: svn 1.9.7

**EDIT:** I copied the working copy to the repository server. Then I tried to commit, but got the same error. I then relocated the working copy to the direct repository location: `svn relocate file:///mnt/vc/svn/repo`. The commit worked this time. So it has something to do with `dav_svn`?
 
----------------------------------------------------------------------------------------------------------------------------


Johan Corveleyn
Feb 19, 2018, 4:49:04 PM
to Davor Josipovic, us...@subversion.apache.org
I must say I don't immediately have an idea why this fails with
mod_dav_svn, and not with file:///.

I did compare with the httpd.conf settings we have set on our server
(though I must say we never had such large commits). These are our
relevant settings:

[[[
MaxKeepAliveRequests 20000
KeepAliveTimeout 60
LimitXMLRequestBody 0
MaxMemFree 16384

SVNInMemoryCacheSize 131072
SVNCacheFullTexts off
SVNCacheTextDeltas on
]]]

Maybe you can experiment with some of these settings to see if it
makes a difference?

--
Johan

Philip Martin
Feb 20, 2018, 7:09:46 AM
to Johan Corveleyn, Davor Josipovic, us...@subversion.apache.org
The server error suggests that at least one of the files that make up
the transaction is missing from disk when the MERGE is attempted.
That's odd; it is not a commonly seen error.

Even more odd is that you show "svnadmin rmtxns" succeeding. This is
odd, very odd, because when a MERGE fails the server code in mod_dav_svn
reacts by removing the transaction from disk. In addition, when the
client gets the failed MERGE response it also reacts and sends a DELETE
to remove the transaction from disk; this DELETE usually fails with 404:

[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Could not fetch resource information. [404, #0]
[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Named transaction doesn't exist. [404, #175002]

Do you see the DELETE in the log after the failed MERGE? Was there an
error?

After a failed MERGE an attempt to run "svnadmin rmtxns" manually would
normally fail with:

E160007: No such transaction '463-e8'

but your attempt succeeded so at least some of the transaction was
present on disk.

So it looks like several strange things have happened:

- one of the files on disk that make up the transaction is missing
causing the MERGE to fail

- both automatic attempts to delete the transaction on disk have failed
causing the transaction to persist on disk

- whatever caused the transaction to persist on disk doesn't prevent
"svnadmin rmtxns" from working

The "_1bqk" implies that this transaction affects over 60,000 nodes;
that is large, but Subversion should handle it.

What sort of filesystem hosts the repository?

--
Philip

Davor Josipovic
Feb 20, 2018, 11:26:35 AM
to Philip Martin, Johan Corveleyn, us...@subversion.apache.org
> Do you see the DELETE in the log after the failed MERGE?  Was there an
> error?

The apache2 log I posted is the whole log for that day, as far as I recall. There was nothing else. You see the DELETE error before the MERGE error. I think that is due to a timing issue. In my previous tries (there were transactions 463-e1 up to 463-e9...) the MERGE would sometimes show before the DELETE in the error log. That is what led me to believe the MERGE error was primary and the DELETE error secondary (even though it seems the other way around in the apache2 log).

> What sort of filesystem hosts the repository?

The filesystem for the reported transaction 463-e8 is EXT4 (noatime, nodiratime, data=ordered). Prior to that, the repository was on an NTFS-formatted drive. The error was the same, but since I thought NTFS was the culprit, I moved to EXT4.

I have also tried with an older client (1.8, cf. the stackoverflow question edit history), but the error was the same server-side (and slightly different client-side).

Since then I have had some 20 commits. No issues...


Philip Martin
Feb 20, 2018, 1:14:40 PM
to Davor Josipovic, Philip Martin, Johan Corveleyn, us...@subversion.apache.org
Davor Josipovic <dav...@live.com> writes:

>> Do you see the DELETE in the log after the failed MERGE? Was there an
>> error?
>
> The apache2 log I posted is the whole log for that day as far as I
> recall. There was nothing else. You see the DELETE error before the
> MERGE error. I think that is due to a timing issue. In my previous
> tries (there were transaction 463-e1 up to 463-e9...) the MERGE would
> sometimes show before the DELETE in the error log. That is what let me
> believe that MERGE error was primary, then the DELETE error (even
> though it seems the other way around in the apache2 log).

Sorry, missed that. The DELETE error is interesting:

[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory [500, #2]

DELETE uses low-level IO rather than parsing the transaction files.
Subversion simply does readdir() to get a list of children and then
loops deleting them. If readdir() returns a name that does not exist
then either the disk/filesystem is corrupt or some other process has
deleted the file between the readdir() and the delete attempt.
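The pattern Philip describes can be illustrated with a minimal, self-contained sketch (plain Python on throwaway paths, not Subversion's actual C code): list a directory, have "some other process" remove a file in between, then attempt the delete and get ENOENT, matching the "No such file or directory" in the log above.

```python
import errno
import os
import tempfile

# Sketch of the DELETE cleanup pattern: readdir() the transaction
# directory, then loop deleting each returned name. The file name
# "node._ji.0" is taken from the log above; the race is simulated
# by unlinking the file between the listing and the delete loop.
txn_dir = tempfile.mkdtemp()
victim = os.path.join(txn_dir, "node._ji.0")
open(victim, "w").close()

names = os.listdir(txn_dir)   # the readdir() step: ["node._ji.0"]
os.unlink(victim)             # "some other process" deletes the file here

for name in names:
    try:
        os.unlink(os.path.join(txn_dir, name))
    except OSError as e:      # the delete attempt now fails with ENOENT
        assert e.errno == errno.ENOENT
        print("Can't remove file '%s': No such file or directory" % name)

os.rmdir(txn_dir)
```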

Corruption seems unlikely since you say you have moved filesystems. We
are left with some other process deleting files while apache is
attempting to access them.

Do you have some other process running that monitors the filesystem or
the repository? Does it actively delete files?

>> What sort of filesystem hosts the repository?
>
> The filesystem for the reported transaction 463-e8 is EXT4 (noatime,
> nodiratime, data=ordered). Prior to that, the repository was on a NTFS
> formatted drive. The error was the same, but since I thought NTFS was
> the culprit, I moved to EXT4.

Are you running Linux or Windows? Is the disk local or networked?

--
Philip

Davor Josipovic
Feb 20, 2018, 1:38:15 PM
to Philip Martin, Johan Corveleyn, us...@subversion.apache.org
> Are you running Linux or Windows?  Is the disk local or networked? 

Server is Debian 9.3. Disk is mounted through fstab with options noatime,nodiratime,data=ordered,nofail.

> Do you have some other process running that monitors the filesystem or
> the repository?  Does it actively delete files?

I am quite sure there is nothing of that kind. The server is a minimal Debian install; only Apache processes run as www-data. There is only a post-commit hook for Trac (which was disabled during at least one try; also, post-commit shouldn't be called until the commit finishes). Furthermore, only dav_svn in the apache2 config is given access to that mount.


Philip Martin
Feb 20, 2018, 5:11:25 PM
to Davor Josipovic, Philip Martin, Johan Corveleyn, us...@subversion.apache.org
Davor Josipovic <dav...@live.com> writes:

>> Are you running Linux or Windows? Is the disk local or networked?
>
> Server is Debian 9.3. Disk is mounted through fstab with options
> noatime,nodiratime,data=ordered,nofail.
>
>> Do you have some other process running that monitors the filesystem or
>> the repository? Does it actively delete files?
>
> I am quite sure there is nothing of that kind. Server is a minimal
> Debian install. Only Apache processes run as www-data. There is only a
> post-commit hook for Trac (which was disabled during at least one
> try. Also, post-commit shouldn't be called until commit finishes.)
> Furthermore, that mount is only given access to dav_svn in apache2
> config.

I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.

--
Philip

Davor Josipovic
Feb 21, 2018, 7:47:56 AM
to Philip Martin, Johan Corveleyn, us...@subversion.apache.org
> I am at a bit of a loss.  Something appears to be deleting files from
> underneath Subversion.  The file that goes missing while processing the
> DELETE request was almost certainly present when we started processing
> the DELETE, since readdir() returned its name, but a very short time
> later was not present.  The code that implements DELETE is relatively
> simple and in the log you gave it is not even running in the same
> process that processed the MERGE.  It could be a Subversion bug but I
> struggle to see what sort of bug would lead to this effect.

You make the PID sound very important, and with good reason too! I just went back through the logs to make sure that I didn't mess up the PIDs before posting. I didn't. In each case, the PID was different. This might explain the time difference. Now, as far as I understand apache2, this can only be caused by:

# After MaxConnectionsPerChild connections, the child process will die.
# If MaxConnectionsPerChild is 0, the process will never expire.
# Setting MaxConnectionsPerChild to a non-zero value limits the amount of
# memory a process can consume through (accidental) memory leakage.
<IfModule mpm_prefork_module>
    StartServers            1
    MinSpareServers         0
    MaxSpareServers         1
    MaxRequestWorkers      50
    MaxConnectionsPerChild 50
</IfModule>

These were also the settings I played with during my trials. MaxConnectionsPerChild is set to a limited number to make sure that no memory leak can grow unbounded.

I'll look further into this later. The default Debian apache2 install came with mpm_event enabled. According to my logs, I switched to mpm_prefork later (for performance reasons recommended for another module).

PS Sorry for the double-postings. Live is double-sending for some reason.


Philip Martin
Feb 21, 2018, 10:27:50 AM
to Davor Josipovic, Philip Martin, Johan Corveleyn, us...@subversion.apache.org
Davor Josipovic <dav...@live.com> writes:

> You make the PID sound very important, and with good reason too! I
> just went back through the logs to make sure that I didn't mess up the
> PID before posting. I didn't. In each case, the PID was
> different. This might explain the time difference.

The Subversion client can make multiple http connections, particularly
when a commit involves a large number of files. It is not unusual for
multiple PIDs to handle the requests for any one commit.

> I'll look further into this later. The default Debian apache2 install
> came with mpm_event enabled. According to my logs I switched to
> mpm_prefork later (due to performance reasons recommended for an other
> module).

Prefork typically gives the lowest performance for Subversion. Subversion
server performance will generally be better with a small number of
processes, each with a large number of threads, as this allows you to
increase the size of the per-process Subversion cache.
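As an illustration of that shape (the numbers below are assumptions for a small host, not settings from this thread), an mpm_event configuration concentrates the workers into a couple of threaded processes, so one large per-process cache is shared by most requests:

[[[
<IfModule mpm_event_module>
    StartServers          1
    ServerLimit           2
    ThreadsPerChild      64
    MaxRequestWorkers   128
</IfModule>
# one cache per process, shared by all of its threads (size in kB)
SVNInMemoryCacheSize 131072
]]]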

--
Philip

Davor Josipovic
May 21, 2018, 4:05:27 PM
to Philip Martin, Johan Corveleyn, us...@subversion.apache.org
I have finally found the culprit. By coincidence, I had the same configuration running on two servers, one backed by an SSD, the other by an HDD.

The same commit worked on the SSD-backed server, and always failed on the HDD-backed one with the described error.

Monitoring the process revealed that after the file transfer of the commit completes, the server starts finalizing. I am not sure exactly what happens server-side during that period (maybe someone can explain?), but for a 40k-file commit the SSD-backed server completed in about 30 seconds, while the HDD-backed one struggled for more than 30 minutes. During those 30 minutes the client got no response, so it timed out, which resulted in the described error on the server side.

Anyway, setting http-timeout = 3600 in the "servers" file fixes the issue permanently (tested twice to make sure). I think the default is 10 minutes.
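For reference, that option lives in the client's Subversion runtime configuration, the "servers" file (~/.subversion/servers on Unix); putting it under [global] as below applies it to all servers, though per-group sections work as well:

[[[
[global]
# client-side HTTP timeout, in seconds
http-timeout = 3600
]]]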

Also, I am quite surprised at the performance penalty of an HDD at this stage. Any way to tweak this for low-memory servers?


Nathan Hartman
May 21, 2018, 7:17:19 PM
to Davor Josipovic, Johan Corveleyn, Philip Martin, us...@subversion.apache.org
On Mon, May 21, 2018 at 4:05 PM Davor Josipovic <dav...@live.com> wrote:

> I have finally found the culprit. By coincidence, I had the same
> configuration running on two servers, one backed by an SSD, the other by
> an HDD.
>
> The same commit worked on the SSD-backed server, and always failed on the
> HDD-backed one with the described error.
>
> Monitoring the process revealed that after the file transfer of the
> commit completes, the server starts finalizing. (snip)

Now that is interesting. 40k doesn't seem to be such a large amount of data for modern computers. Very slow and fragmented hard drive? Or perhaps there's something else going on that is manifesting this way?

Bo Berglund
May 22, 2018, 1:14:17 AM
to us...@subversion.apache.org
I think he meant that he has committed 40,000 individual (small) files
in one go. That is a *lot* of files, and on a large-capacity HDD it
wastes a lot of space too, since every small file uses up a full
cluster of disk space. I believe it can be improved (if svn uses fsfs)
by packing the database.
Please see:
http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace.fsfspacking
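If the repository does use FSFS, packing (described at the link) is a single server-side command; the repository path below is assumed from earlier in the thread:

[[[
svnadmin pack /mnt/vc/svn/repo
]]]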


--
Bo Berglund
Developer in Sweden

Davor Josipovic
May 22, 2018, 2:31:04 PM
to Nathan Hartman, Johan Corveleyn, Philip Martin, us...@subversion.apache.org
> Now that is interesting. 40k doesn't seem to be such a large amount of data for modern computers. Very slow and fragmented hard drive? Or perhaps there's something else going on that is manifesting this way?

The HDD is indeed on the slow side, and together with low memory...

But I think this also shows how I/O intensive SVN is. On the client side, for each committed file, one copy is placed in the .svn folder and another copy in a temporary folder (which is deleted after file transfer in v1.9). So for each file committed, a double copy is made client-side. Is this temporary copy really necessary?

Server-side, I see similar disk bashing. For each committed file, at most 2 (?) copies are made in the transaction directory.

So, any way to reduce the I/O?


Nico Kadel-Garcia
May 22, 2018, 10:20:46 PM
to dav...@live.com, Nathan Hartman, Johan Corveleyn, Philip Martin, Subversion
On Tue, May 22, 2018 at 2:36 PM Davor Josipovic <dav...@live.com> wrote:

> > Now that is interesting. 40k doesn't seem to be such a large amount of
> > data for modern computers. Very slow and fragmented hard drive? Or
> > perhaps there's something else going on that is manifesting this way?
>
> The HDD is indeed on the slow side, and together with low memory...
>
> But I think this also shows how I/O intensive SVN is. On the client side,
> for each committed file, one copy is placed in the .svn folder and another
> copy in a temporary folder (which is deleted after file transfer in v1.9).
> So for each file committed, a double copy is made client-side. Is this
> temporary copy really necessary?

I think it shows how I/O intensive using 40,000 small files is. Especially
if they are in the same directory; many filesystems get increasingly
unhappy as they try to manage that many files in one directory.

Johan Corveleyn
May 25, 2018, 10:29:16 AM
to Davor Josipovic, Nathan Hartman, Philip Martin, us...@subversion.apache.org
A couple of improvements have been made in 1.10. It would be
interesting to test this same scenario with a 1.10 client (to verify
the client-side improvements) and if possible also with a 1.10 server,
with a dumped+loaded FSFS back-end (to get an idea of any server-side
improvements).

Client-side there were some changes to make the (http) client stream
svndiff deltas without creating temporary files (during 'commit' and
'import'):
https://svn.apache.org/r1803143
https://svn.apache.org/r1803144
https://svn.apache.org/r1804819

Server-side, LZ4 compression was added as the default compression
algorithm in the FSFS back-end, and on the wire (not sure if that
helps in your case ... it won't impact the number of files that are
created temporarily I suppose):
https://subversion.apache.org/docs/release-notes/1.10.html#lz4

Also, maybe a commit such as this one helps too (though it seems to
only talk about reads):
https://svn.apache.org/r1759399

As always, further improvements can probably be made, but someone will
need to dig into it.

If you'd be able to test with 1.10 both client and server-side, that
would give some more data points about where we are with the latest
state of things ...

Finally, one last tip: this (rather old) performance tuning page
suggests putting the "transactions" folder on a separate (faster)
filesystem to speed up commits:
https://www.orcaware.com/svn/wiki/Server_performance_tuning_for_Linux_and_Unix#Reducing_the_number_of_writes

I don't know how realistic that is in your setup (plus you need to
make sure that transactions filesystem is large enough for your
commits).
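For what it's worth, one way to do that relocation (a sketch with assumed paths; stop Apache first so no transaction is in flight) is to move db/transactions onto the faster filesystem and leave a symlink behind:

[[[
mv /mnt/vc/svn/repo/db/transactions /mnt/fast/repo-transactions
ln -s /mnt/fast/repo-transactions /mnt/vc/svn/repo/db/transactions
]]]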

--
Johan