I have a CVS repository, 54M in size, with 17751 files.
"cvs2svn --dumpfile" produces a dump 13G in size. svnadmin cannot load
this dump; it aborts with an out-of-memory condition on a FreeBSD
8.1-RELEASE box with 1G of RAM and 2.5G of swap.
I really need to convert this repository to SVN. What should I do? Any
advice is appreciated.
--
Victor Sudakov, VAS4-RIPE, VAS47-RIPN
sip:sud...@sibptus.tomsk.ru
> (or dive into the source and help us plug that memory leak --- compile
> with APR pool debugging enabled)
How do I do that? I have not found such an option in cvs2svn.
I wouldn't mind writing a script if I knew how to split the dump.
I haven't found any "svnadmin load" option to import part of a dump
either. man what?
> or try a newer version of svnadmin.
I am using subversion-1.6.15, it seems to be the latest ported to
FreeBSD.
>
> (or dive into the source and help us plug that memory leak --- compile
> with APR pool debugging enabled)
I will try to do that but unfortunately I need some immediate
workaround :(
So you can do it by controlling which path/portion of the CVS repository
you point cvs2svn at when creating the dump file.
Brian
The CVS repository in question (54M in size, with 17751 files) is
exactly one project. It's the history of a geographical DNS zone for
more than 10 years.
Google is your friend: svndumptool
You might need to append a .py
Also, if this is a _top post_, it's the phone that did it... I haven't figured out how to control where it puts the reply.
- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen
Thanks. I was referring specifically to -DAPR_POOL_DEBUG=19; there's
more information in HACKING:
http://subversion.apache.org/docs/community-guide/
Stephen Connolly wrote on Fri, Dec 31, 2010 at 16:22:56 +0000:
> Google is your friend: svndumptool
>
That's a better suggestion than I would have made.
In the same vein, you could use cvs2svn to dump one subdirectory at a
time.
For example, if in CVS you have something like:
/Folder1
/Folder2
/Folder3
... you run cvs2svn three times, once for each subdirectory, producing
folder1.dump, folder2.dump, and folder3.dump respectively.
Then, svnadmin load each individually:
- manually create the root folders: Folder1, Folder2, Folder3
- svnadmin load --parent-dir Folder1 /path/to/svn/repo < folder1.dump
- svnadmin load --parent-dir Folder2 /path/to/svn/repo < folder2.dump
- svnadmin load --parent-dir Folder3 /path/to/svn/repo < folder3.dump
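Putting those steps together, a rough sketch (the paths, module names and
commit message are just examples to adapt; other cvs2svn options are
omitted for brevity):
[[[
# dump each top-level CVS directory separately
cvs2svn --dumpfile=folder1.dump /path/to/cvsrepo/Folder1
cvs2svn --dumpfile=folder2.dump /path/to/cvsrepo/Folder2
cvs2svn --dumpfile=folder3.dump /path/to/cvsrepo/Folder3

# create the target repository and the parent directories
svnadmin create /path/to/svn/repo
svn mkdir -m "Create project roots" \
    file:///path/to/svn/repo/Folder1 \
    file:///path/to/svn/repo/Folder2 \
    file:///path/to/svn/repo/Folder3

# load each dump under its own parent directory
svnadmin load --parent-dir Folder1 /path/to/svn/repo < folder1.dump
svnadmin load --parent-dir Folder2 /path/to/svn/repo < folder2.dump
svnadmin load --parent-dir Folder3 /path/to/svn/repo < folder3.dump
]]]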
> Fair enough, the same pattern is still applicable. For example, in our
> CVS repo what separated one "project" from another was basically a
> root-level folder.
>
> In the same vein, you could use cvs2svn to dump one subdirectory at a
> time.
>
> For example, if in CVS you have something like:
> /Folder1
> /Folder2
> /Folder3
>
> ... you run cvs2svn three times, once for each subdirectory, producing
> folder1.dump, folder2.dump, and folder3.dump respectively.
>
> Then, svnadmin load each individually:
It would be fine if the project in question did not contain almost all
the files in one directory. You may call the layout silly, but CVS does
not seem to mind. OTOH, I would have distributed the files over
several subdirectories, but CVS does not handle moving files well.
I wonder whether cvs2svn is to blame for producing a dump that svnadmin
cannot load. Or am I always running the risk that "svnadmin dump" may one
day produce a dump that a subsequent "svnadmin load" will be unable to
swallow?
I mean, suppose that by hook or by crook, using third-party utilities like
svndumptool, I eventually manage to convert this project from CVS to SVN.
Is there a chance that a subsequent dump will again be unloadable?
>
> It would be fine if the project in question did not contain almost all
> the files in one directory. You may call the layout silly, but CVS does
> not seem to mind. OTOH, I would have distributed the files over
> several subdirectories, but CVS does not handle moving files well.
>
> I wonder whether cvs2svn is to blame for producing a dump that svnadmin
> cannot load. Or am I always running the risk that "svnadmin dump" may one
> day produce a dump that a subsequent "svnadmin load" will be unable to
> swallow?
>
> I mean, suppose that by hook or by crook, using third-party utilities like
> svndumptool, I eventually manage to convert this project from CVS to SVN.
> Is there a chance that a subsequent dump will again be unloadable?
I don't think you are hitting some absolute limit in the software here,
just running out of RAM on your particular machine. Can you do the
conversion on a machine with more RAM?
--
Les Mikesell
lesmi...@gmail.com
I believe there are known issues with memory usage in svnadmin. See the
issue tracker.
I don't know cvs2svn, but it could have a --sharded-output option, so eg
it would produce a dumpfile per 1000 revisions, rather than one huge
dumpfile.
> --
> Les Mikesell
> lesmi...@gmail.com
I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
so much swap specially for the occasion). svnadmin crashed after
reaching a SIZE of about 2.5 GB.
Is 1 GB RAM and 25 GB swap not enough?
I don't know if this gdb output will be useful:
(gdb) where
#0 0x2841e117 in kill () from /lib/libc.so.7
#1 0x2841e076 in raise () from /lib/libc.so.7
#2 0x2841cc4a in abort () from /lib/libc.so.7
#3 0x28116ec5 in abort_on_pool_failure (retcode=Could not find the frame base for "abort_on_pool_failure".)
at subversion/libsvn_subr/pool.c:49
#4 0x283095fb in apr_palloc (pool=0xba46d018, in_size=204800)
at memory/unix/apr_pools.c:663
#5 0x280ebad3 in svn_txdelta_target_push (
handler=0x280eb140 <window_handler>, handler_baton=0x5b26c058,
source=0xba470738, pool=0xba46d018)
at subversion/libsvn_delta/text_delta.c:528
#6 0x280d4d62 in svn_fs_fs__set_contents (stream=0xbfbfe7e4, fs=0x28512020,
noderev=0x27dea528, pool=0x2852a018)
at subversion/libsvn_fs_fs/fs_fs.c:5066
#7 0x280c9d52 in svn_fs_fs__dag_get_edit_stream (contents=0x2852a138,
file=0x5b2e61a0, pool=0x2852a018) at subversion/libsvn_fs_fs/dag.c:997
#8 0x280de42e in fs_apply_text (contents_p=0xbfbfe904, root=0x5aef8058,
path=0x2852a080 "ns2/trunk/tomsk.ru/SOA", result_checksum=0x2852a110,
pool=0x2852a018) at subversion/libsvn_fs_fs/tree.c:2615
#9 0x280bf44e in svn_fs_apply_text (contents_p=0xbfbfe904, root=0x5aef8058,
path=0x2852a080 "ns2/trunk/tomsk.ru/SOA",
result_checksum=0x2852a0e8 "b03cbddfbc11be113cbf675862eb971e",
pool=0x2852a018) at subversion/libsvn_fs/fs-loader.c:1096
---Type <return> to continue, or q <return> to quit---
[dd]
>
> I believe there are known issues with memory usage in svnadmin. See the
> issue tracker.
Namely?
>
> I don't know cvs2svn, but it could have a --sharded-output option, so eg
> it would produce a dumpfile per 1000 revisions, rather than one huge
> dumpfile.
cvs2svn-2.3.0_2 does not seem to have such an option:
"cvs2svn: error: no such option: --sharded-output"
>>
>> I don't think you are hitting some absolute limit in the software here,
>> just running out of RAM on your particular machine. Can you do the
>> conversion on a machine with more RAM?
>
> I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
> so much swap specially for the occasion). svnadmin crashed after
> reaching a SIZE of about 2.5 GB.
>
> Is 1 GB RAM and 25 GB swap not enough?
If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or
4 gigs. Or maybe some quota setting before that.
--
Les Mikesell
lesmi...@gmail.com
As Stephen Connolly suggested a week ago, I think you should take a
look at svndumptool: http://svn.borg.ch/svndumptool/
I've never used it myself, but in the README.txt file, there is
mention of a subcommand "split":
[[[
Split
-----
Splits a dump file into multiple smaller dump files.
svndumptool.py split inputfile [startrev endrev filename]...
options:
--version show program's version number and exit
-h, --help show this help message and exit
Known bugs:
* None
]]]
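If I read that right, splitting the cvs2svn dump and then loading the
pieces in order might look something like this (untested, and the
revision boundaries below are made up):
[[[
# split one big dump into three smaller ones
svndumptool.py split cvs2svn.dump \
    0 7999 part1.dump \
    8000 15999 part2.dump \
    16000 23519 part3.dump

# load the pieces, in order, into the same repository
svnadmin load /path/to/svn/repo < part1.dump
svnadmin load /path/to/svn/repo < part2.dump
svnadmin load /path/to/svn/repo < part3.dump
]]]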
HTH
--
Johan
The more I think about it, the more likely it seems.
Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
cycle will eventually fail as the repository grows beyond a certain
size?
BTW here are the limits for the svn user:
$ whoami
svn
$ limits
Resource limits (current):
cputime infinity secs
filesize infinity kB
datasize 524288 kB
stacksize 65536 kB
coredumpsize infinity kB
memoryuse infinity kB
memorylocked infinity kB
maxprocesses 5547
openfiles 11095
sbsize infinity bytes
vmemoryuse infinity kB
pseudo-terminals infinity
swapuse infinity kB
$ uname -srm
FreeBSD 8.1-RELEASE-p2 i386
I am already trying it, but it is turning out to be not as easy as it
seems. I will share what comes of it.
Search for 'svnadmin' and you should find it.
> >
> > I don't know cvs2svn, but it could have a --sharded-output option, so eg
> > it would produce a dumpfile per 1000 revisions, rather than one huge
> > dumpfile.
>
> cvs2svn-2.3.0_2 does not seem to have such an option:
> "cvs2svn: error: no such option: --sharded-output"
>
I said "could have", which means I don't know whether or not such an
option already exists and, if so, what it's called.
>> >I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
>> >so much swap specially for the occasion). svnadmin crashed after
>> >reaching a SIZE of about 2.5 GB.
>> >
>> >Is 1 GB RAM and 25 GB swap not enough?
>>
>> If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or
>> 4 gigs. Or maybe some quota setting before that.
>
> The more I think about it, the more likely it seems.
>
> Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
> cycle will eventually fail as the repository grows beyond a certain
> size?
A 'real' svnadmin dump would let you specify revision ranges so you
could do it incrementally but cvs2svn doesn't have an equivalent
option other than splitting out directories. Perhaps someone could do
the load on a larger 64-bit machine and dump it back in smaller ranges
if you can't find a better way to split it.
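Something along these lines might work for the dump-back step (untested
sketch; the repository path and the 1000-revision chunk size are just
placeholders):
[[[
#!/bin/sh
# Dump a repository back in chunks of 1000 revisions each.
REPO=/path/to/svn/repo
HEAD=`svnlook youngest "$REPO"`
start=0
while [ "$start" -le "$HEAD" ]; do
    end=`expr $start + 999`
    if [ "$end" -gt "$HEAD" ]; then
        end=$HEAD
    fi
    svnadmin dump --incremental -r "$start:$end" "$REPO" > "dump-$start-$end"
    start=`expr $end + 1`
done
]]]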
--
Les Mikesell
lesmi...@gmail.com
[dd]
> > Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
> > cycle will eventually fail as the repository grows beyond a certain
> > size?
>
> A 'real' svnadmin dump would let you specify revision ranges so you
> could do it incrementally but cvs2svn doesn't have an equivalent
> option other than splitting out directories. Perhaps someone could do
> the load on a larger 64-bit machine and dump it back in smaller ranges
> if you can't find a better way to split it.
I have also noticed that the --deltas option dramatically decreases
the dump size (it becomes megabytes instead of gigabytes).
Unfortunately cvs2svn cannot do deltas. I will try loading it on a
64-bit machine and dumping it back with the --deltas option.
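Roughly like this, I suppose (paths and filenames are just placeholders):
[[[
# on the 64-bit machine
svnadmin create /bigbox/newrepo
svnadmin load /bigbox/newrepo < cvs2svn-output.dump
svnadmin dump --deltas /bigbox/newrepo > newrepo-deltas.dump
]]]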
[dd]
> 2) Don't use '--dumpfile' on cvs2svn, let cvs2svn load it into a subversion
> repo directly.
It did not make any difference. Frankly speaking, I would be
surprised if it did.
Starting Subversion r10773 / 23520
Starting Subversion r10774 / 23520
Starting Subversion r10775 / 23520
ERROR: svnadmin failed with the following output while loading the dumpfile:
$
cvs2svn doesn't have such an option, but it wouldn't be very difficult
to implement. Let me know if you would like some pointers to help you
get started implementing it.
Michael
--
Michael Haggerty
mha...@alum.mit.edu
http://softwareswirl.blogspot.com/
Have you tried the following:
- Copy your CVS repository (say /myrepository to /myrepositoryconv)
- In the copy move the ,v files into several subdirectories (using the
operating system, not using CVS commands.)
- Convert the directories one at a time and load them into svn.
- Once loaded into svn you can move everything back into one folder (using
svn commands) if desired.
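For example, something along these lines (untested; the module path and
the bucketing-by-first-character scheme are just examples):
[[[
#!/bin/sh
# Spread the ,v files of the copied module over several subdirectories.
cd /myrepositoryconv/mymodule || exit 1
for f in *,v; do
    d=`echo "$f" | cut -c1`
    mkdir -p "bucket_$d"
    mv "$f" "bucket_$d/"
done
]]]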
Manually moving ,v files around in a CVS repository is generally not
advised, primarily because it will annoy users with checked-out working
copies (and the move is unversioned). But those working copies won't be
of any use anyway once the server has been migrated to Subversion, so I
don't think it should cause problems. Still, keep the original repository
around just in case...
Kind Regards,
JAN KEIRSE
ICT-DEPARTMENT
Software quality & Systems: Software Engineer
Hello... I said this a week ago
Split
-----
Splits a dump file into multiple smaller dump files.
svndumptool.py split inputfile [startrev endrev filename]...
options:
--version show program's version number and exit
-h, --help show this help message and exit
Known bugs:
* None
No need to go implementing anything
-Stephen
Thanks
Ramesh
I have finally completed a test cvs2svn conversion on an amd64 system.
The peak memory requirement of svnadmin during the conversion was
9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
"svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
2161M RES of memory at its peak. Such memory requirements make this
repo completely unusable on i386 systems.
The original CVS repo is 59M on disk with 17859 files (including those
in the Attic) and a total of 23911 revisions (in SVN terms). All files are
strictly text.
Something seems to be very suboptimal either about SVN itself or about
the cvs2svn utility. I am especially surprised by the 8.5G size of the
resulting SVN repository (though the result of "svnadmin dump --deltas"
is 44M).
> - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> - In the copy move the ,v files into several subdirectories (using the
> operating system, not using CVS commands.)
> - Convert the directories one at a time and load them into svn.
> - Once loaded into svn you can move everything back into one folder
> (using svn commands) if desired.
Even if I do this, after moving everything back I will not be able to
do "svnadmin dump" on an i386 system, unless perhaps I write some
script that iterates and keeps track of dumped revision numbers.
http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.dump.html
This allows you to dump the repo in sections (by specifying a revision
range).
You do not mention what version of svn you are using, but newer versions
allow the repository to be packed; would this help your storage issues?
http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace.fsfspacking
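Packing itself is a one-liner, something like (the repository path is a
placeholder):
[[[
svnadmin pack /path/to/svn/repo
]]]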
~ mark c
Do you have a lot of files in the same directory? (are all those 17859
files in one single directory?)
I don't know the details, but I know that svn rev files (and probably
also some memory structures, explaining the huge memory usage) become
very big for commits in a directory that has many files.
It has something to do with the way SVN tracks directories (all
directory entries are always listed in full in those rev files).
At least, that's what I remember vaguely from some past discussions.
Maybe there is even an issue in the issue tracker for this (or
previous discussions on users- or dev-mailinglist), but I don't have
time to search now ...
If this is the case, a possible workaround could be that you
restructure the project in CVS, or in a copy of your CVS repository
(creating some subdirs, and moving ,v files into them). Of course, I
understand this may be an unworkable solution (depends on the amount
of flexibility you have in moving things around).
Cheers,
--
Johan
With a bit of luck, this will boil down to looking for some place where
allocations should be done in a scratch_pool or iterpool instead of some
long-lived result_pool (which may be called 'pool'). One can compile
with APR pool debugging enabled to get information about what's allocated
from which pool.
Paul Burba's work on the recent fixed-in-1.6.15
DoS-via-memory-consumption CVE can serve as an example.
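Roughly, that build might look like the following; this is just an
untested sketch with a made-up source directory, and APR itself may also
need to be rebuilt with matching pool-debug settings:
[[[
cd subversion-1.6.x
./configure CFLAGS='-DAPR_POOL_DEBUG=19'
make
]]]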
Daniel
(workarounds are plenty --- svnsync, incremental dump, whatnot --- they
are discussed elsethread)
But that doesn't explain why the resulting repository is so large
(compared to the original CVS repository). Sure, there might be memory
usage problems in dump/load (it uses more memory than the resulting
repository uses diskspace), but I think there is more going on.
That's why I'm guessing on rev files being large (and the
corresponding memory structures) because of the amount of dir entries
in each revision. I'm not that intimately familiar with how this is
all represented, and how the rev files are structured and all that, so
I'm just guessing ... I seem to remember something like this from
another discussion in the past.
Cheers,
--
Johan
[dd]
> But that doesn't explain why the resulting repository is so large
> (compared to the original CVS repository). Sure, there might be memory
> usage problems in dump/load (it uses more memory than the resulting
> repository uses diskspace), but I think there is more going on.
>
> That's why I'm guessing on rev files being large (and the
> corresponding memory structures) because of the amount of dir entries
> in each revision. I'm not that intimately familiar with how this is
> all represented, and how the rev files are structured and all that, so
> I'm just guessing ... I seem to remember something like this from
> another discussion in the past.
I have created a small testcase script:
#!/bin/sh
# Run inside a checked-out Subversion working copy; each iteration adds
# one small file and commits it as a separate revision.
for i in `jot 15000`
do
cat > Testfile_${i}.txt << __END__
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
__END__
svn add Testfile_${i}.txt
svn commit -m "Iteration $i"
done
After the 15000th commit, the size of the repository on disk is 5.5G,
while the working copy is only 120M. Besides, after several thousand
commits to this directory, SVN slows down considerably. This must be
some design flaw (or peculiarity, if you like) of SVN.
Probably related to the way directories are represented in the repository.
See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
about how this currently works.
I'd expect even local operations, like the comparison against the pristine
versions to decide what to commit, to become slow when you put many
thousands of files in one directory, because most filesystems aren't good
at that either (although they may fake it with caching). It's one of
those "if it hurts, don't do it" things.
--
Les Mikesell
lesmi...@gmail.com
I did not know it would hurt until I tried to migrate this particular
repository from CVS to SVN.
FreeBSD by itself handles large directories very well due to its
dirhash feature.
BTW, I am using the FSFS backend, if that makes any difference.