--- Comment #1 from joluinfa...@gmail.com 2011-05-29 01:43:29 UTC --- More information: I compiled rsync 3.0.8 statically on CentOS 3.9 (32-bit), because that is the platform used to build utilities for ESXi 4.1.
--- Comment #3 from joluinfa...@gmail.com 2011-05-29 04:46:15 UTC --- The file size is 8 GB (not 36 GB). Running without --inplace and --sparse, I watched the temporary file (.cata-920-flat.vmdk.WKiDRx) on the destination disk grow to 8 GB. Then the rsync error appeared on screen and the temporary file was deleted.
What |Removed |Added
---------------------------------------------------------------------------
Status|NEW |ASSIGNED
--- Comment #5 from Wayne Davison <way...@samba.org> 2011-06-18 17:41:13 UTC --- Any time there is a read error, rsync tweaks the checksum so that it doesn't match, thus alerting the receiver that the file is bogus. You need to figure out what is causing this error from your OS:
That error is reported at the end of all the reading, and indicates that one of the read() calls for the sender's file returned that errno. If you strace the sending side, you should be able to see what the parameters are to the read() that trigger that OS error.
--- Comment #6 from joluinfa...@gmail.com 2011-07-07 15:38:30 UTC --- Sorry for the delay, and thanks for your reply. The problems are:
* I need to use rsync (statically compiled) to sync files between a free ESXi 4.1 (VMware) host and a Linux filesystem.
* So I compiled rsync from source on a RHEL platform (suitable for building binaries for this VMware version) and transferred it to the ESXi host.
* On free ESXi 4.1 (the sending side), we don't have any debugging tools.
* I don't have experience with sparse files on other platforms that I could use to test this.
Can you help me modify the source to add debug output around the syscall? I tried, but because the file is very large, I get far too much debug output.
(In reply to comment #5)
> Any time there is a read error, rsync tweaks the checksum so that it doesn't
> match, thus alerting the receiver that the file is bogus. You need to figure
> out what is causing this error from your OS:
> That error is reported at the end of all the reading, and indicates that one of
> the read() calls for the sender's file returned that errno. If you strace the
> sending side, you should be able to see what the parameters are to the read()
> that trigger that OS error.
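One way to get debug output without flooding the screen on a multi-gigabyte file is to log only the read() calls that fail. The sketch below is a hypothetical wrapper, `debug_read()`, that is not part of rsync; where to hook it in (presumably the read() inside map_ptr() in fileio.c) is an assumption. It prints the file offset, length and errno of a failing call, which is the information Wayne suggests getting from strace.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical debug wrapper (not rsync code): behaves like read(), but
 * prints the call's parameters to stderr only when it fails, so a large
 * transfer produces output only for the problematic calls. */
static ssize_t debug_read(int fd, void *buf, size_t len)
{
    /* Record the current file offset so the log shows where the failing
     * read started; on pipes lseek() fails and -1 is logged instead. */
    off_t pos = lseek(fd, 0, SEEK_CUR);
    ssize_t n = read(fd, buf, len);
    if (n < 0) {
        int saved = errno; /* fprintf() may clobber errno */
        fprintf(stderr, "read(fd=%d, offset=%lld, len=%zu) failed: %s (%d)\n",
                fd, (long long)pos, len, strerror(saved), saved);
        errno = saved;     /* callers still see the original error */
    }
    return n;
}
```

Because successful reads are silent, a run over an 8 GB file should print only a line or two, each showing the offset and size of a read that the OS rejected.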
--- Comment #8 from Rick <r...@ict-concept.nl> 2012-03-01 10:08:28 UTC ---
Hi Jorge,
You're not alone :)
I'm using rsync to back up some VMware servers. Small files work without any
problems, but the large (-flat.vmdk) file only works the first time. (I do
create a snapshot before syncing.)
This is the log:
--------------------------
Sync the -flat.vmdk file (Vcenter Server-flat.vmdk)
rsync: read errors mapping
"/vmfs/volumes/4cf36d72-d19e4c8a-cec8-d485645b13ad/Vcenter Server/Vcenter
Server-flat.vmdk": Invalid argument (22)
ERROR: Vcenter Server-flat.vmdk failed verification -- update discarded.
rsync: read errors mapping
"/vmfs/volumes/4cf36d72-d19e4c8a-cec8-d485645b13ad/Vcenter Server/Vcenter
Server-flat.vmdk": Invalid argument (22)
rsync error: some files/attrs were not transferred (see previous errors) (code
23) at main.c(1518) [generator=3.0.9]
Thu Mar 1 10:43:00 CET 2012 FAILED: rsync on Vcenter Server failed (vmdk-flat
file) (0)
--------------------------
Some details:
Both systems are running VMware ESX 4.1.0 build-260247 (Red Hat 5.1).
The filesystem is ext3.
I tried both rsync 3.0.7 and 3.0.9.
The file sizes are 80G and 200G (each ESX server makes copies of the other ESX
server, and both ESX servers give the same error).
Removing the --sparse parameter doesn't help.
The rsync command used is:
/usr/local/bin/rsync -t -e "ssh -i /operator/vmbackup/vm_image_key -l root" \
    --rsync-path=/operator/rsync --sparse --stats \
    "192.168.1.10:/vmfs/volumes/4cf36d72-d19e4c8a-cec8-d485645b13ad/Server02/Server02-flat.vmdk" \
    "/vmfs/volumes/VM-Datastore-VMHOST02//Server02/"
The backup script runs at multiple customer sites (some are also ESX 4.1 machines),
but this is the only one giving an error.
--- Comment #11 from Remco Hosman <re...@hosman.xs4all.nl> 2012-03-10 20:37:32 UTC ---
I am having the same issue with ESXi 5.0 and rsync 3.0.9 (compiled on a
Debian 4.0 system). I take a snapshot of the VM; from that point on, the original
disk file should no longer change.
The problem does not occur:
*) when the VM is not running
*) when the image file does not exist on the target
The problem does occur:
*) when the VM is running and the file already exists on the target.
I attached a tgz with straces of the following command in all 4 cases
(VM running/not running, target file existing/non-existing):
strace -v -tt -f -o out.log ./rsync --copy-links --recursive --compress \
    --progress --inplace --bwlimit=100 \
    /vmfs/volumes/4e9007cc-5eb44d46-eace-0019b9e8ff4d/backups/ \
    user@host:/targetdir/
--- Comment #12 from grarp...@gmail.com 2012-03-10 22:47:50 UTC ---
As an aside, be aware that you may encounter rather poor
performance with sparse files. If so, and until this bug
is addressed, you may need to look at other transfer options.
https://bugzilla.samba.org/show_bug.cgi?id=7854
You may also want to back up a snapshot (which risks at most one in-flight
application-level write) instead of missing every later write that slips by
while those hundreds of GB are copying. Better still, shut the VM down first
or use any built-in snapshot tool the VM has.
Also, check all syslog output (*.*) and dmesg for errors, and run memtest86.
That said, the message reports error 22 while mapping the file, so it may not
be a disk media error.
--- Comment #13 from Remco Hosman <re...@hosman.xs4all.nl> 2012-03-11 08:08:22 UTC ---
The reason I use rsync is the following:
I want to back up a remote server over the internet, so bandwidth usage is an
issue. The backup target is a btrfs system (again, not the cause of the
problem; it happens with ext3 on the target too). I make a reflink copy of
yesterday's file and run rsync --inplace onto it. This uses very little bandwidth
and disk space. How long the rsync takes is not an issue.
The snapshot is made and does not change until I delete it after rsync
is done.
I did some more testing and found that the problem is not related to the file
being sparse or big: with a non-sparse 16MB file, the problem persists.
I'm not an expert in interpreting strace output, but I don't see much
difference. Does strace display things differently depending on the return
value of a call?
--- Comment #14 from Remco Hosman <re...@hosman.xs4all.nl> 2012-03-11 10:38:02 UTC ---
Could the issue be the non-block-sized reads?
The first read() is always 262144 bytes.
When the target does not exist yet, all the other read()s are 262144 bytes too;
however, when the target file does exist, the subsequent read()s are 1 byte less:
262143 bytes.
Why this only matters when the VM is running I do not know. It must be an internal
VMFS issue.
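The arithmetic behind this observation can be sketched as follows, assuming (as comment #15 later reports) that VMFS requires offsets and lengths to be multiples of 512 bytes. The helper names `read_is_aligned()` and `report_reads()` are hypothetical; they just walk a sequence of read sizes and flag each read whose start offset or length breaks that rule.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* True when both the starting offset and the length of a read are
 * multiples of `align` (assumed to be 512 bytes for VMFS). */
static bool read_is_aligned(long long offset, size_t len, size_t align)
{
    return offset % (long long)align == 0 && len % align == 0;
}

/* Walks consecutive reads of the given sizes, printing where each one
 * starts and whether it is aligned.  Returns the misaligned count. */
static int report_reads(const size_t *sizes, int count, size_t align)
{
    long long offset = 0;
    int misaligned = 0;
    for (int i = 0; i < count; i++) {
        bool ok = read_is_aligned(offset, sizes[i], align);
        printf("read %d: offset=%lld len=%zu %s\n", i + 1, offset, sizes[i],
               ok ? "aligned" : "MISALIGNED");
        if (!ok)
            misaligned++;
        offset += (long long)sizes[i];
    }
    return misaligned;
}
```

With sizes {262144, 262143, 262143}, the first read (offset 0, 512 * 512 bytes) is aligned, the second has an odd length, and every later read also starts at an unaligned offset (524287 for the third), so a single one-byte-short read misaligns all the reads that follow it.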
--- Comment #15 from Remco Hosman <re...@hosman.xs4all.nl> 2012-03-11 10:52:55 UTC ---
Confirmed. The following test program demonstrates the problem:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
The first read() returns 262144, as it should. The second returns -1 on VMFS when
the VM is running.
The required block alignment is a multiple of 512 bytes.
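The body of the test program is not reproduced above, so the following is only a hypothetical reconstruction of the test as described: open a file, read 262144 bytes, then read 262143 bytes, and record both return values. The function name `probe_reads()` and its parameters are assumptions for illustration, not Remco's code.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define FIRST_LEN  262144 /* 512 * 512: a block-aligned read */
#define SECOND_LEN 262143 /* one byte short, like rsync's later reads */

static char probe_buf[FIRST_LEN];

/* Hypothetical reconstruction: opens `path`, issues a FIRST_LEN read and
 * then a SECOND_LEN read, and stores both return values.  *second_errno
 * is errno from the second read if it failed, otherwise 0.
 * Returns -1 if the file cannot be opened, else 0. */
static int probe_reads(const char *path, ssize_t *first, ssize_t *second,
                       int *second_errno)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    *first = read(fd, probe_buf, FIRST_LEN);
    *second = read(fd, probe_buf, SECOND_LEN);
    *second_errno = (*second < 0) ? errno : 0; /* capture before close() */
    close(fd);
    return 0;
}
```

On an ordinary file, both reads should succeed. Pointed at the -flat.vmdk of a running VM on VMFS, the behavior reported in this thread would show up as `*second == -1` with `*second_errno == 22` (EINVAL).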
--- Comment #16 from Remco Hosman <re...@hosman.xs4all.nl> 2012-03-11 20:11:46 UTC ---
Someone pointed out that I forgot to check the error returned by the failed reads, so
here it is:
after a failed read, errno is 22, meaning EINVAL (the same numeric error
rsync reports).
I also found that lseek()ing to an offset that is not on a 512-byte boundary makes
read()s of any size fail until you seek back to a 512-byte boundary.
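Given these findings, one general workaround is to widen every request to 512-byte boundaries, read the widened window, and copy out only the bytes that were asked for. The sketch below is a hypothetical helper, `aligned_pread()`, not rsync's actual fix; it assumes that only the file offset and length need to be aligned (as comments #15 and #16 describe) and not the memory buffer, and that the offset is non-negative.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define VMFS_ALIGN 512 /* alignment reported in comments #15 and #16 */

/* Hypothetical helper: reads `len` bytes at `offset` into `dst` while only
 * issuing a pread() whose offset and length are multiples of VMFS_ALIGN.
 * The request is widened to aligned boundaries, read into a scratch
 * buffer, and the requested slice is copied out.  A single pread() is
 * issued; a short result is treated as end of file.  Returns the number
 * of bytes copied (fewer than `len` only near EOF) or -1 on error. */
static ssize_t aligned_pread(int fd, void *dst, off_t offset, size_t len)
{
    if (len == 0)
        return 0;

    off_t start = offset - offset % VMFS_ALIGN;          /* round down */
    off_t end = offset + (off_t)len;
    end += (VMFS_ALIGN - end % VMFS_ALIGN) % VMFS_ALIGN; /* round up */
    size_t span = (size_t)(end - start);

    char *scratch = malloc(span);
    if (scratch == NULL)
        return -1;

    ssize_t n = pread(fd, scratch, span, start);
    if (n < 0) {
        int saved = errno; /* keep the caller's errno intact across free() */
        free(scratch);
        errno = saved;
        return -1;
    }

    size_t skip = (size_t)(offset - start);   /* bytes before the request */
    size_t avail = (size_t)n > skip ? (size_t)n - skip : 0;
    size_t copy = avail < len ? avail : len;
    memcpy(dst, scratch + skip, copy);
    free(scratch);
    return (ssize_t)copy;
}
```

For example, a 100-byte request at offset 700 becomes one 512-byte pread() at offset 512, and bytes 188..287 of that window are copied out. Using pread() instead of lseek() plus read() also avoids leaving the file position on an unaligned boundary, which comment #16 shows breaks later reads.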
--- Comment #17 from molniev <moln...@hotmail.com> 2012-10-19 12:49:07 UTC --- Hi. I'm using rsync 3.0.9 on ESXi 5.0 and ran into this bug (thanks, Remco Hosman). I made a patch that solves the problem: it aligns file reads on 4096-byte boundaries and reads in blocks of 4096 * 4096 * 2 bytes (32 MiB). I think this will be faster, at the cost of memory.
Replace the map_ptr() function in fileio.c (lines 180-264) with the following code:
What |Removed |Added
---------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #22 from Wayne Davison <way...@samba.org> 2013-01-18 23:38:52 UTC ---
I have checked in a change that makes the map_ptr() function align its reads on
1024-byte boundaries. Hopefully that will fix this issue without making things
inefficient.
I looked at the patch from molniev, and it has a bug in the read loop: if a read
returns fewer bytes than requested, the next read overwrites the earlier bytes
instead of appending to them. My patch is instead just a simpler set of rounding
changes to the stock code.
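The two ingredients mentioned here can be sketched independently of rsync's real code, which is not reproduced in this thread. In the sketch, `align_down()` shows rounding an offset down to a 1024-byte boundary, and `read_fully()` shows a read loop that handles short reads correctly. Both names are hypothetical. The important line is `buf + total`: each read appends after the bytes already received, which is the behavior Wayne says molniev's loop lacked.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_ALIGN 1024 /* boundary used by the change described above */

/* Rounds a (non-negative) file offset down to a READ_ALIGN boundary. */
static off_t align_down(off_t pos)
{
    return pos - pos % READ_ALIGN;
}

/* Reads up to `len` bytes into `buf`, looping over short reads.  Each
 * call writes at buf + total, so a short read is appended to rather than
 * overwriting earlier data.  Returns the byte count (short only at EOF)
 * or -1 on error; EINTR is retried. */
static ssize_t read_fully(int fd, char *buf, size_t len)
{
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break; /* end of file */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

For example, `align_down(5000)` yields 4096. A caller that starts at an aligned offset and requests a multiple of 1024 bytes keeps every read aligned as long as the kernel returns full blocks.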