I have a CVS repository, 54M in size, with 17751 files.
"cvs2svn --dumpfile" produces a dump 13G in size. svnadmin cannot load
this dump; it aborts with an out-of-memory condition on a FreeBSD
8.1-RELEASE box with 1G of RAM and 2.5G of swap.
I really need to convert this repository to SVN. What should I do? Any
advice is appreciated.
--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
sip:sud...@sibptus.tomsk.ru
> (or dive into the source and help us plug that memory leak --- compile
> with APR pool debugging enabled)
How do I do that? I have not found such an option in cvs2svn.
I wouldn't mind writing a script if I knew how to split the dump.
I haven't found any "svnadmin load" option to import part of a dump
either. man what?
> or try a newer version of svnadmin.
I am using subversion-1.6.15, it seems to be the latest ported to
FreeBSD.
>
> (or dive into the source and help us plug that memory leak --- compile
> with APR pool debugging enabled)
I will try to do that but unfortunately I need some immediate
workaround :(
So you can do it by controlling which path/portion of the CVS repository
you point cvs2svn at when creating the dump file.
Brian
The CVS repository in question (54M in size, with 17751 files) is
exactly one project. It's the history of a geographical DNS zone for
more than 10 years.
Google is your friend: svndumptool
You might need to append a .py
Also, if this is a _top post_, it's the phone that did it... I haven't figured out how to control where it puts the reply.
- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen
Thanks. I was referring specifically to -DAPR_POOL_DEBUG=19; there's
more information in HACKING:
http://subversion.apache.org/docs/community-guide/
Stephen Connolly wrote on Fri, Dec 31, 2010 at 16:22:56 +0000:
> Google is your friend: svndumptool
>
That's a better suggestion than I would have made.
In the same vein, you could use cvs2svn to dump one subdirectory at a
time.
For example, if in CVS you have something like:
/Folder1
/Folder2
/Folder3
... you run cvs2svn three times, once for each subdirectory, producing
folder1.dump, folder2.dump, and folder3.dump respectively.
Then, svnadmin load each individually:
- manually create the root folders: Folder1, Folder2, Folder3
- svnadmin load --parent-dir Folder1 /path/to/svn/repo < folder1.dump
- svnadmin load --parent-dir Folder2 /path/to/svn/repo < folder2.dump
- svnadmin load --parent-dir Folder3 /path/to/svn/repo < folder3.dump
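Putting those steps together, a rough sketch (the paths, module names and
commit message are just examples to adapt; other cvs2svn options are
omitted for brevity):
[[[
# dump each top-level CVS directory separately
cvs2svn --dumpfile=folder1.dump /path/to/cvsrepo/Folder1
cvs2svn --dumpfile=folder2.dump /path/to/cvsrepo/Folder2
cvs2svn --dumpfile=folder3.dump /path/to/cvsrepo/Folder3

# create the target repository and the parent directories
svnadmin create /path/to/svn/repo
svn mkdir -m "Create project roots" \
    file:///path/to/svn/repo/Folder1 \
    file:///path/to/svn/repo/Folder2 \
    file:///path/to/svn/repo/Folder3

# load each dump under its own parent directory
svnadmin load --parent-dir Folder1 /path/to/svn/repo < folder1.dump
svnadmin load --parent-dir Folder2 /path/to/svn/repo < folder2.dump
svnadmin load --parent-dir Folder3 /path/to/svn/repo < folder3.dump
]]]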
> Fair enough, the same pattern is still applicable. For example, in our
> CVS repo what separated one "project" from another was basically a
> root-level folder.
>
> In the same vein, you could use cvs2svn to dump one subdirectory at a
> time.
>
> For example, if in CVS you have something like:
> /Folder1
> /Folder2
> /Folder3
>
> ... you run cvs2svn three times, once for each subdirectory, producing
> folder1.dump, folder2.dump, and folder3.dump respectively.
>
> Then, svnadmin load each individually:
It would be fine if the project in question did not contain almost all
the files in one directory. You may call the layout silly, but CVS does
not seem to mind. OTOH, I would have distributed the files over
several subdirectories, but CVS does not handle moving files well.
I wonder whether cvs2svn is to blame for producing a dump that svnadmin
cannot load. Or am I always running the risk that "svnadmin dump" may one
day produce a dump that a subsequent "svnadmin load" will be unable to
swallow?
I mean, suppose that by hook or by crook, using third-party utilities like
svndumptool, I eventually manage to convert this project from CVS to SVN.
Is there a chance that a subsequent dump will again be unloadable?
>
> It would be fine if the project in question did not contain almost all
> the files in one directory. You may call the layout silly, but CVS does
> not seem to mind. OTOH, I would have distributed the files over
> several subdirectories, but CVS does not handle moving files well.
>
> I wonder whether cvs2svn is to blame for producing a dump that svnadmin
> cannot load. Or am I always running the risk that "svnadmin dump" may one
> day produce a dump that a subsequent "svnadmin load" will be unable to
> swallow?
>
> I mean, suppose that by hook or by crook, using third-party utilities like
> svndumptool, I eventually manage to convert this project from CVS to SVN.
> Is there a chance that a subsequent dump will again be unloadable?
I don't think you are hitting some absolute limit in the software here,
just running out of RAM on your particular machine. Can you do the
conversion on a machine with more RAM?
--
Les Mikesell
lesmi...@gmail.com
I believe there are known issues with memory usage in svnadmin. See the
issue tracker.
I don't know cvs2svn, but it could have a --sharded-output option, so eg
it would produce a dumpfile per 1000 revisions, rather than one huge
dumpfile.
> --
> Les Mikesell
> lesmi...@gmail.com
I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
so much swap specially for the occasion). svnadmin crashed after
reaching a SIZE of about 2.5 GB.
Is 1 GB RAM and 25 GB swap not enough?
I don't know if this gdb output will be useful:
(gdb) where
#0 0x2841e117 in kill () from /lib/libc.so.7
#1 0x2841e076 in raise () from /lib/libc.so.7
#2 0x2841cc4a in abort () from /lib/libc.so.7
#3 0x28116ec5 in abort_on_pool_failure (retcode=Could not find the frame base for "abort_on_pool_failure".)
at subversion/libsvn_subr/pool.c:49
#4 0x283095fb in apr_palloc (pool=0xba46d018, in_size=204800)
at memory/unix/apr_pools.c:663
#5 0x280ebad3 in svn_txdelta_target_push (
handler=0x280eb140 <window_handler>, handler_baton=0x5b26c058,
source=0xba470738, pool=0xba46d018)
at subversion/libsvn_delta/text_delta.c:528
#6 0x280d4d62 in svn_fs_fs__set_contents (stream=0xbfbfe7e4, fs=0x28512020,
noderev=0x27dea528, pool=0x2852a018)
at subversion/libsvn_fs_fs/fs_fs.c:5066
#7 0x280c9d52 in svn_fs_fs__dag_get_edit_stream (contents=0x2852a138,
file=0x5b2e61a0, pool=0x2852a018) at subversion/libsvn_fs_fs/dag.c:997
#8 0x280de42e in fs_apply_text (contents_p=0xbfbfe904, root=0x5aef8058,
path=0x2852a080 "ns2/trunk/tomsk.ru/SOA", result_checksum=0x2852a110,
pool=0x2852a018) at subversion/libsvn_fs_fs/tree.c:2615
#9 0x280bf44e in svn_fs_apply_text (contents_p=0xbfbfe904, root=0x5aef8058,
path=0x2852a080 "ns2/trunk/tomsk.ru/SOA",
result_checksum=0x2852a0e8 "b03cbddfbc11be113cbf675862eb971e",
pool=0x2852a018) at subversion/libsvn_fs/fs-loader.c:1096
---Type <return> to continue, or q <return> to quit---
[dd]
>
> I believe there are known issues with memory usage in svnadmin. See the
> issue tracker.
Namely?
>
> I don't know cvs2svn, but it could have a --sharded-output option, so eg
> it would produce a dumpfile per 1000 revisions, rather than one huge
> dumpfile.
cvs2svn-2.3.0_2 does not seem to have such an option:
"cvs2svn: error: no such option: --sharded-output"
>>
>> I don't think you are hitting some absolute limit in the software here,
>> just running out of RAM on your particular machine. Can you do the
>> conversion on a machine with more RAM?
>
> I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
> so much swap specially for the occasion). svnadmin crashed after
> reaching a SIZE of about 2.5 GB.
>
> Is 1 GB RAM and 25 GB swap not enough?
If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or
4 gigs. Or maybe some quota setting before that.
--
Les Mikesell
lesmi...@gmail.com
As Stephen Connolly suggested a week ago, I think you should take a
look at svndumptool: http://svn.borg.ch/svndumptool/
I've never used it myself, but in the README.txt file, there is
mention of a subcommand "split":
[[[
Split
-----
Splits a dump file into multiple smaller dump files.
svndumptool.py split inputfile [startrev endrev filename]...
options:
--version show program's version number and exit
-h, --help show this help message and exit
Known bugs:
* None
]]]
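If I read that right, splitting the cvs2svn dump and then loading the
pieces in order might look something like this (untested, and the
revision boundaries below are made up):
[[[
# split one big dump into three smaller ones
svndumptool.py split cvs2svn.dump \
    0 7999 part1.dump \
    8000 15999 part2.dump \
    16000 23519 part3.dump

# load the pieces, in order, into the same repository
svnadmin load /path/to/svn/repo < part1.dump
svnadmin load /path/to/svn/repo < part2.dump
svnadmin load /path/to/svn/repo < part3.dump
]]]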
HTH
--
Johan
The more I think about it, the more likely it seems.
Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
cycle will eventually fail as the repository grows beyond a certain
size?
BTW here are the limits for the svn user:
$ whoami
svn
$ limits
Resource limits (current):
cputime infinity secs
filesize infinity kB
datasize 524288 kB
stacksize 65536 kB
coredumpsize infinity kB
memoryuse infinity kB
memorylocked infinity kB
maxprocesses 5547
openfiles 11095
sbsize infinity bytes
vmemoryuse infinity kB
pseudo-terminals infinity
swapuse infinity kB
$ uname -srm
FreeBSD 8.1-RELEASE-p2 i386
I am already trying it, but it is turning out to be not as easy as it
seems. I will share what comes of it.
Search for 'svnadmin' and you should find it.
> >
> > I don't know cvs2svn, but it could have a --sharded-output option, so eg
> > it would produce a dumpfile per 1000 revisions, rather than one huge
> > dumpfile.
>
> cvs2svn-2.3.0_2 does not seem to have such an option:
> "cvs2svn: error: no such option: --sharded-output"
>
I said "could have", which means I don't know whether or not such an
option already exists and, if so, what it's called.
>> >I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
>> >so much swap specially for the occasion). svnadmin crashed after
>> >reaching a SIZE of about 2.5 GB.
>> >
>> >Is 1 GB RAM and 25 GB swap not enough?
>>
>> If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or
>> 4 gigs. Or maybe some quota setting before that.
>
> The more I think about it, the more likely it seems.
>
> Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
> cycle will eventually fail as the repository grows beyond a certain
> size?
A 'real' svnadmin dump would let you specify revision ranges so you
could do it incrementally but cvs2svn doesn't have an equivalent
option other than splitting out directories. Perhaps someone could do
the load on a larger 64-bit machine and dump it back in smaller ranges
if you can't find a better way to split it.
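Something along these lines might work for the dump-back step (untested
sketch; the repository path and the 1000-revision chunk size are just
placeholders):
[[[
#!/bin/sh
# Dump a repository back in chunks of 1000 revisions each.
REPO=/path/to/svn/repo
HEAD=`svnlook youngest "$REPO"`
start=0
while [ "$start" -le "$HEAD" ]; do
    end=`expr $start + 999`
    if [ "$end" -gt "$HEAD" ]; then
        end=$HEAD
    fi
    svnadmin dump --incremental -r "$start:$end" "$REPO" > "dump-$start-$end"
    start=`expr $end + 1`
done
]]]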
--
Les Mikesell
lesmi...@gmail.com
[dd]
> > Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
> > cycle will eventually fail as the repository grows beyond a certain
> > size?
>
> A 'real' svnadmin dump would let you specify revision ranges so you
> could do it incrementally but cvs2svn doesn't have an equivalent
> option other than splitting out directories. Perhaps someone could do
> the load on a larger 64-bit machine and dump it back in smaller ranges
> if you can't find a better way to split it.
I have also noticed that the --deltas option dramatically decreases
the dump size (it becomes megabytes instead of gigabytes).
Unfortunately cvs2svn cannot do deltas. I will try loading it on a
64-bit machine and dumping it back with the --deltas option.
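Roughly like this, I suppose (paths and filenames are just placeholders):
[[[
# on the 64-bit machine
svnadmin create /bigbox/newrepo
svnadmin load /bigbox/newrepo < cvs2svn-output.dump
svnadmin dump --deltas /bigbox/newrepo > newrepo-deltas.dump
]]]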
[dd]
> 2) Don't use '--dumpfile' on cvs2svn, let cvs2svn load it into a subversion
> repo directly.
It did not make any difference. Frankly speaking, I would be
surprised if it did.
Starting Subversion r10773 / 23520
Starting Subversion r10774 / 23520
Starting Subversion r10775 / 23520
ERROR: svnadmin failed with the following output while loading the dumpfile:
$
cvs2svn doesn't have such an option, but it wouldn't be very difficult
to implement. Let me know if you would like some pointers to help you
get started implementing it.
Michael
--
Michael Haggerty
mha...@alum.mit.edu
http://softwareswirl.blogspot.com/
Have you tried the following:
- Copy your CVS repository (say /myrepository to /myrepositoryconv)
- In the copy move the ,v files into several subdirectories (using the
operating system, not using CVS commands.)
- Convert the directories one at a time and load them into svn.
- Once loaded into svn you can move everything back into one folder (using
svn commands) if desired.
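For example, something along these lines (untested; the module path and
the bucketing-by-first-character scheme are just examples):
[[[
#!/bin/sh
# Spread the ,v files of the copied module over several subdirectories.
cd /myrepositoryconv/mymodule || exit 1
for f in *,v; do
    d=`echo "$f" | cut -c1`
    mkdir -p "bucket_$d"
    mv "$f" "bucket_$d/"
done
]]]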
Manually moving ,v files around in a CVS repository is generally not
advised, primarily because it will annoy users with checked-out working
copies (and the move is unversioned). But those working copies won't be
of any use anyway once the server has been migrated to Subversion, so I
don't think it should cause problems. Still, keep the original repository
around just in case...
Kind Regards,
JAN KEIRSE
ICT-DEPARTMENT
Software quality & Systems: Software Engineer
Hello... I said this a week ago
Split
-----
Splits a dump file into multiple smaller dump files.
svndumptool.py split inputfile [startrev endrev filename]...
options:
--version show program's version number and exit
-h, --help show this help message and exit
Known bugs:
* None
No need to go implementing anything
-Stephen
Thanks
Ramesh
I have finally completed a test cvs2svn conversion on an amd64 system.
The peak memory requirement of svnadmin during the conversion was
9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
"svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
2161M RES of memory at its peak. Such memory requirements make this
repo completely unusable on i386 systems.
The original CVS repo is 59M on disk with 17859 files (including those
in the Attic) and a total of 23911 revisions (in SVN terms). All files are
strictly text.
Something seems to be very suboptimal either about SVN itself or about
the cvs2svn utility. I am especially surprised by the 8.5G size of the
resulting SVN repository (though the result of "svnadmin dump --deltas"
is 44M).
> - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> - In the copy move the ,v files into several subdirectories (using the
> operating system, not using CVS commands.)
> - Convert the directories one at a time and load them into svn.
> - Once loaded into svn you can move everything back into one folder
> (using svn commands) if desired.
Even if I do this, after moving everything back I will not be able to
do "svnadmin dump" on an i386 system, unless perhaps I write some
script that iterates and keeps track of dumped revision numbers.
http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.dump.html
This allows you to dump the repo in sections (by specifying a revision
range).
You do not mention what version of svn you are using, but newer versions
allow the repository to be packed; would this help your storage issues?
http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace.fsfspacking
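Packing itself is a one-liner, something like (the repository path is a
placeholder):
[[[
svnadmin pack /path/to/svn/repo
]]]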
~ mark c
Do you have a lot of files in the same directory? (are all those 17859
files in one single directory?)
I don't know the details, but I know that svn rev files (and probably
also some memory structures, explaining the huge memory usage) become
very big for commits in a directory that has many files.
It has something to do with the way SVN tracks directories (all
directory entries are always listed in full in those rev files).
At least, that's what I remember vaguely from some past discussions.
Maybe there is even an issue in the issue tracker for this (or
previous discussions on users- or dev-mailinglist), but I don't have
time to search now ...
If this is the case, a possible workaround could be that you
restructure the project in CVS, or in a copy of your CVS repository
(creating some subdirs, and moving ,v files into them). Of course, I
understand this may be an unworkable solution (depends on the amount
of flexibility you have in moving things around).
Cheers,
--
Johan
With a bit of luck, this will boil down to looking for some place where
allocations should be done in a scratch_pool or iterpool instead of some
long-lived result_pool (which may be called 'pool'). One can compile
with APR pool debugging enabled to get information about what's allocated
from which pool.
Paul Burba's work on the recent fixed-in-1.6.15
DoS-via-memory-consumption CVE can serve as an example.
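Roughly, that build might look like the following; this is just an
untested sketch with a made-up source directory, and APR itself may also
need to be rebuilt with matching pool-debug settings:
[[[
cd subversion-1.6.x
./configure CFLAGS='-DAPR_POOL_DEBUG=19'
make
]]]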
Daniel
(workarounds are plenty --- svnsync, incremental dump, whatnot --- they
are discussed elsethread)
But that doesn't explain why the resulting repository is so large
(compared to the original CVS repository). Sure, there might be memory
usage problems in dump/load (it uses more memory than the resulting
repository uses diskspace), but I think there is more going on.
That's why I'm guessing on rev files being large (and the
corresponding memory structures) because of the amount of dir entries
in each revision. I'm not that intimately familiar with how this is
all represented, and how the rev files are structured and all that, so
I'm just guessing ... I seem to remember something like this from
another discussion in the past.
Cheers,
--
Johan
[dd]
> But that doesn't explain why the resulting repository is so large
> (compared to the original CVS repository). Sure, there might be memory
> usage problems in dump/load (it uses more memory than the resulting
> repository uses diskspace), but I think there is more going on.
>
> That's why I'm guessing on rev files being large (and the
> corresponding memory structures) because of the amount of dir entries
> in each revision. I'm not that intimately familiar with how this is
> all represented, and how the rev files are structured and all that, so
> I'm just guessing ... I seem to remember something like this from
> another discussion in the past.
I have created a small testcase script:
#!/bin/sh
# Run inside a checked-out Subversion working copy; each iteration adds
# one small file and commits it as a separate revision.
for i in `jot 15000`
do
cat > Testfile_${i}.txt << __END__
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
__END__
svn add Testfile_${i}.txt
svn commit -m "Iteration $i"
done
After the 15000th commit, the size of the repository on disk is 5.5G,
while the working copy is only 120M. Besides, after several thousand
commits to this directory, SVN slows down considerably. This must be
some design flaw (or peculiarity, if you like) of SVN.
Probably related to the way directories are represented in the repository.
See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
about how this currently works.
I'd expect even local operations, like the comparison against the pristine
versions to decide what to commit, to become slow when you put many
thousands of files in one directory, because most filesystems aren't good
at that either (although they may fake it with caching). It's one of
those "if it hurts, don't do it" things.
--
Les Mikesell
lesmi...@gmail.com
I did not know it would hurt until I tried to migrate this particular
repository from CVS to SVN.
FreeBSD by itself handles large directories very well due to its
dirhash feature.
BTW, I am using the FSFS backend, if that makes any difference.