Is there a document describing how to report crashes? I looked for one but did not find it. Other than the output from the mount log, is there anything else we need to provide?
I have a volume that I had populated with rsync without problems. I switched to unison and it has been crashing. At first the problem was file descriptors; after increasing the limit I now get a new error. I am going to try switching back to rsync and see if that finishes. Each unison run seems to take many hours before it crashes, so I will report more once I have been able to do more testing.
The ulimit is set to 25,000. I have another mount point with max cache entries set to 8000 that is not having any problems. The main difference between the two is that on the one that is crashing the average file size is 126MB.
2012-11-10 06:34:48.301 [13731] Dummy-16: [fuse] Exception after kill:
Traceback (most recent call last):
  File "handlers.pxi", line 296, in llfuse.fuse_read (src/llfuse.c:6832)
  File "handlers.pxi", line 297, in llfuse.fuse_read (src/llfuse.c:6776)
  File "/usr/lib/s3ql/s3ql/fs.py", line 932, in read
    tmp = self._read(fh, offset, length)
  File "/usr/lib/s3ql/s3ql/fs.py", line 964, in _read
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 538, in get
    el = backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 61, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 224, in perform_read
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 533, in do_read
    shutil.copyfileobj(fh, el, BUFSIZE)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 703, in _read
    buf = self.fh.read(size)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 858, in _read
    :self.remaining + self.off_size])[0]
error: unpack requires a string argument of length 4
2012-11-10 06:34:48.359 [13731] MainThread: [mount] Encountered exception, trying to clean up...
2012-11-10 06:34:48.359 [13731] MainThread: [mount] Unmounting file system...
2012-11-10 06:34:48.871 [13731] MainThread: [mount] Exception during cleanup:
Traceback (most recent call last):
  File "/usr/lib/s3ql/s3ql/mount.py", line 187, in main
    op()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 275, in destroy
    os.rmdir(self.path)
OSError: [Errno 39] Directory not empty: '/S3ql-cache/s3c:=2F=2Fs.greenqloud.com=2Fnatserv-default-cache'
2012-11-10 06:34:48.872 [13731] MainThread: [root] Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/mount.s3ql", line 9, in <module>
    load_entry_point('s3ql==1.12', 'console_scripts', 'mount.s3ql')()
  File "/usr/lib/s3ql/s3ql/mount.py", line 139, in main
    llfuse.main(options.single)
  File "fuse_api.pxi", line 213, in llfuse.main (src/llfuse.c:18034)
  File "handlers.pxi", line 296, in llfuse.fuse_read (src/llfuse.c:6832)
  File "handlers.pxi", line 297, in llfuse.fuse_read (src/llfuse.c:6776)
  File "/usr/lib/s3ql/s3ql/fs.py", line 932, in read
    tmp = self._read(fh, offset, length)
  File "/usr/lib/s3ql/s3ql/fs.py", line 964, in _read
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 538, in get
    el = backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 61, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 224, in perform_read
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 533, in do_read
    shutil.copyfileobj(fh, el, BUFSIZE)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 703, in _read
    buf = self.fh.read(size)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 858, in _read
    :self.remaining + self.off_size])[0]
error: unpack requires a string argument of length 4
On Saturday, November 10, 2012 7:01:32 AM UTC-5, Francisco Reyes wrote:
> Have a volume I had populated with rsync without problems. Switched to
> unison and has been crashing.
I tried with rsync and now it is also crashing. After each crash I have to run fusermount and fsck before I can try again.
Traceback (most recent call last):
  File "handlers.pxi", line 296, in llfuse.fuse_read (src/llfuse.c:6832)
  File "handlers.pxi", line 297, in llfuse.fuse_read (src/llfuse.c:6776)
  File "/usr/lib/s3ql/s3ql/fs.py", line 932, in read
    tmp = self._read(fh, offset, length)
  File "/usr/lib/s3ql/s3ql/fs.py", line 964, in _read
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 538, in get
    el = backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 61, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 224, in perform_read
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 533, in do_read
    shutil.copyfileobj(fh, el, BUFSIZE)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 703, in _read
    buf = self.fh.read(size)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 858, in _read
    :self.remaining + self.off_size])[0]
error: unpack requires a string argument of length 4
2012-11-10 16:26:45.411 [14495] MainThread: [mount] Encountered exception, trying to clean up...
2012-11-10 16:26:45.416 [14495] MainThread: [mount] Unmounting file system...
2012-11-10 16:26:46.281 [14495] MainThread: [mount] Exception during cleanup:
Traceback (most recent call last):
  File "/usr/lib/s3ql/s3ql/mount.py", line 187, in main
    op()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 275, in destroy
    os.rmdir(self.path)
OSError: [Errno 39] Directory not empty: '/S3ql-cache/s3c:=2F=2Fs.greenqloud.com=2Fnatserv-default-cache'
2012-11-10 16:26:46.289 [14495] MainThread: [root] Uncaught top-level exception:
Traceback (most recent call last):
  File "/usr/bin/mount.s3ql", line 9, in <module>
    load_entry_point('s3ql==1.12', 'console_scripts', 'mount.s3ql')()
  File "/usr/lib/s3ql/s3ql/mount.py", line 139, in main
    llfuse.main(options.single)
  File "fuse_api.pxi", line 213, in llfuse.main (src/llfuse.c:18034)
  File "handlers.pxi", line 296, in llfuse.fuse_read (src/llfuse.c:6832)
  File "handlers.pxi", line 297, in llfuse.fuse_read (src/llfuse.c:6776)
  File "/usr/lib/s3ql/s3ql/fs.py", line 932, in read
    tmp = self._read(fh, offset, length)
  File "/usr/lib/s3ql/s3ql/fs.py", line 964, in _read
    with self.cache.get(id_, blockno) as fh:
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 538, in get
    el = backend.perform_read(do_read, 's3ql_data_%d' % obj_id)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 61, in wrapped
    return fn(self, *a, **kw)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 224, in perform_read
    return fn(fh)
  File "/usr/lib/s3ql/s3ql/block_cache.py", line 533, in do_read
    shutil.copyfileobj(fh, el, BUFSIZE)
  File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
    buf = fsrc.read(length)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 703, in _read
    buf = self.fh.read(size)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 619, in read
    buf = self._read(BUFSIZE)
  File "/usr/lib/s3ql/s3ql/backends/common.py", line 858, in _read
    :self.remaining + self.off_size])[0]
error: unpack requires a string argument of length 4
2012-11-10 16:31:26.633 [10455] Metadata-Upload-Thread: [mount] File system unchanged, not uploading metadata.
>Run setfattr -n fuse_stacktrace [mountpoint]
>Make a copy of the mount.log logfile in ~/.s3ql and attach it to your
>email or bug report
Before the crash? i.e. Mount, run setfattr, try rsync/unison again to cause the crash? Going to try and follow the instructions on providing debug info..
> Before the crash? i.e.
> Mount, run setfattr, try rsync/unison again to cause the crash?
> Going to try and follow the instructions on providing debug info..
Unfortunately I don't think I'll have time to look into this before Mid
December, but I'll do my best.
You could help a lot if you could find a way for me to reliably
reproduce the problem (either starting from an empty volume, or by
giving me access to a test volume that has the problem). That'd make
debugging a lot easier for me.
You do not need to run setfattr, this is just for debugging a
hanging/frozen process.
According to the log, the error happens when data is read from the
backend, but if I understood you correctly, you are using rsync to
*write* data. Is that correct? Or did you use the -c flag, so that rsync
might also be reading?
Best,
-Nikolaus
-- »Time flies like an arrow, fruit flies like a Banana.«
On Saturday, November 10, 2012 5:18:42 PM UTC-5, Nikolaus Rath wrote:
>>Thanks for reporting this. Could you please open an issue on
>>http://code.google.com/p/s3ql/issues/list as well?
Will do.
>>Unfortunately I don't think I'll have time to look into this before Mid
>>December, but I'll do my best.
Let me know what I can do to help. One thing I could do that is nearly trivial is to re-run a few times and see if it always crashes on the same file.
>>You could help a lot if you could find a way for me to reliably
>>reproduce the problem
Will try, but we are talking about 100GB+. The entire set had already been uploaded without problems. This all started when I switched to unison instead of rsync. Whatever the problem is, it seems to have stuck after that.
>> (either starting from an empty volume,
>> or by giving me access to a test volume that has the problem).
Will see what I can do.
>> That'd make debugging a lot easier for me.
Ok.. in the meantime let me know if there is anything else.
>>You do not need to run setfattr, this is just for debugging a
>>hanging/frozen process.
Ok
>>According to the log, the error happens when data is read from the
>>backend, but if I understood you correctly, you are using rsync to
>>*write* data. Is that correct? Or did you use the -c flag, so that rsync
>>might also be reading?
I was using the -c flag, so yes, it was reading. Just had an idea.. I will try to compute the MD5 of all the files and see if that triggers the problem and, if so, on which file.
I think I will try:
* Attempt to compute md5 of all files
* If md5 does not crash, try running rsync several times and see if it crashes on the same file
Both of those, although time/bandwidth/compute intensive, use so little of my time that they are worth trying, especially if they show that it always crashes on the same file. That would make testing go a lot faster. Right now it is hours between crashes.
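The first step of that plan can be scripted. What follows is only a minimal sketch in Python 3, not something from the S3QL distribution: `md5_of_file` and `checksum_tree` are hypothetical helper names, and the mount point is whatever path you pass in. Each path is printed before it is read, so that if mount.s3ql dies in the middle of a read, the last "reading" line names the file being read at that moment. Reads that fail after that point are most likely just fallout from the dead mount.

```python
import hashlib
import os


def md5_of_file(path, bufsize=1024 * 1024):
    """Return the hex MD5 digest of one file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(bufsize), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Hash each file found by os.walk under root.

    Returns (digests, failures): two dicts keyed by path, holding the
    hex digest or the text of the OSError raised while reading.
    """
    digests = {}
    failures = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic order, so repeated runs are comparable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Print before reading: the last line printed names the
            # file that was being read if the file system goes away.
            print("reading", path, flush=True)
            try:
                digests[path] = md5_of_file(path)
            except OSError as exc:
                failures[path] = str(exc)
    return digests, failures
```

Calling `checksum_tree` on the mount point and saving the printed output from several runs would show whether the first failing path is the same each time.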
>>>Unfortunately I don't think I'll have time to look into this before Mid
>>>December, but I'll do my best.
> Let me know what I can do to help.
> One thing I could do that is near trivial is to re-run a few times and
> see if it always crashes on the same file.
>>>You could help a lot if you could find a way for me to reliably
>>>reproduce the problem
> Will try, but we are talking about 100GB+
> The entire set had already gotten uploaded without problems. This all
> started when I switched to unison instead of rsync. Whatever the problem
> seems to have stuck after that.
>>>According to the log, the error happens when data is read from the
>>>backend, but if I understood you correctly, you are using rsync to
>>>*write* data. Is that correct? Or did you use the -c flag, so that rsync
>>>might also be reading?
> Was using the -c flag so yes, it was reading.
> Just had an idea.. will try and compute the MD5 of all the files and see
> if that triggers the problem and if so, in which file.
> I think I will try:
> * Attempt to compute md5 of all files
> * If md5 does not crash try running rsync several times and see if it
> crashes on the same file
> Both of those although time/bandwidth/compute intensive use so little of
> my time that are worth trying. Especially if somehow they show it always
> crashes on the same file. Would make testing go a lot faster. Right now
> it is hours between crashes.
From the error message, it seems that a storage object has not been
written correctly. Therefore, I expect that it will always crash at the
same time, and it should not matter how you read that file (md5sum,
unison, rsync). That said, this is a hypothesis so it should still be
tested.
If that is correct, then the more important question is really what went
wrong when the object was written. This can most likely not be
determined after the fact, thus my interest in reproducing the problem
with a fresh volume.
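To illustrate the kind of failure at the bottom of the traceback: `struct.unpack` raises as soon as it is handed fewer bytes than the format needs, which is what a truncated or short-read object would produce. The snippet below is only a standalone Python 3 sketch. The `<I` format, the `FIELD_SIZE` constant and the `read_field` helper are assumptions for illustration (the log only says that 4 bytes were expected), not the actual S3QL code.

```python
import struct

FIELD_SIZE = 4  # assumed: the traceback only tells us 4 bytes were expected


def read_field(buf, offset):
    """Decode a 4-byte little-endian unsigned int at offset (format assumed)."""
    return struct.unpack("<I", buf[offset:offset + FIELD_SIZE])[0]


complete = b"payload" + struct.pack("<I", 1234)
truncated = complete[:-2]  # simulate an object that lost its last two bytes

print(read_field(complete, 7))  # the full field decodes normally

try:
    read_field(truncated, 7)  # the slice is only 2 bytes long
except struct.error as exc:
    print("short read:", exc)
```

Slicing past the end of a buffer does not raise by itself, so the short data only shows up as an error inside `unpack`.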
Best,
-Nikolaus
On Sunday, November 11, 2012 11:33:09 AM UTC-5, Nikolaus Rath wrote:
>From the error message, it seems that a storage object has not been
>written correctly. Therefore, I expect that it will always crash at the
>same time
Ok. That should help. I am waiting to hear back from the cloud provider I am using to find out whether I pay for bandwidth between the VM and their cloud storage (both are at the same provider; I got the account a few weeks ago and have not seen that info). If they charge for traffic between VM and storage, then I may change provider and may not be able to continue troubleshooting, since otherwise every time I run an MD5 or "rsync -c" I would be paying for bandwidth.
>If that is correct, then the more important question is really what went
>wrong when the object was written. This can most likely not be
>determined after the fact
I guess in that case the best would be to trap the error, mark the file as bad, and delete it.
>, thus my interest in reproducing the problem with a fresh volume.
I have a machine coming which will store 100GB+ for a client. Will let you know if I bump into that error.
> On Sunday, November 11, 2012 11:33:09 AM UTC-5, Nikolaus Rath wrote:
>>From the error message, it seems that a storage object has not been
>>written correctly. Therefore, I expect that it will always crash at the
>>same time
> Ok. That should help.
> Waiting to hear back from the cloud provider I am using to find out if I
> pay for bandwidth between the VM and their cloud storage (both at same
> provider. Got account few weeks ago and have not seen that info). If
> they charge for traffic between VM and storage then I may change
> provider and may not be able to continue troubleshooting otherwise every
> time I run an MD5 or "rsync -c" would be paying for bandwidth.
>>If that is correct, then the more important question is really what went
>>wrong when the object was written. This can most likely not be
>>determined after the fact
> I guess in that case the best would be to trap the error and mark the file
> as bad and delete it.
Yeah, that'll correct the problem. I can give you a patch for that if
you want one.
However, I'm much more interested in finding out how the error got
produced, so that I can ensure it won't happen again. Your file
system/computer didn't by any chance crash while you were writing data
or something like that?
Best,
-Nikolaus
On Sunday, November 11, 2012 3:41:11 PM UTC-5, Nikolaus Rath wrote:
>produced, so that I can ensure it won't happen again. Your file
>system/computer didn't by any chance crash while you were writing data
>or something like that?
The complete history of the error is:
Did rsync (without -c). Worked. Did it a few times over a few days without problems. 100GB+
Switched to unison. Crashed.
Unison kept crashing.
Figured that something may have gone wrong during those crashes, so I switched to "rsync -c". It has been crashing ever since.
On Saturday, November 10, 2012 7:01:32 AM UTC-5, Francisco Reyes wrote:
>>Have a volume I had populated with rsync without problems. Switched to
>>unison and has been crashing.
Have an update.
Ran md5. It worked against the entire filesystem.
Ran rsync -c, now worked..
Ran rsync -c again.. still worked..
Tried unison again, crashed with the same error I have been getting:
File "/usr/lib/s3ql/s3ql/backends/common.py", line 858, in _read
:self.remaining + self.off_size])[0]
Did an md5 on the file unison reported, but it ran ok.
> Not sure what impact this has, but I discovered a significant amount of
> packet drops on the last hops between source and destination. Between 15%
> and 70%.
I am starting to wonder whether this was occurring the times I tried unison, or whether unison is more susceptible to failing and somehow corrupting a file in S3QL when connectivity is bad.
> On Saturday, November 10, 2012 7:01:32 AM UTC-5, Francisco Reyes wrote:
>>>Have a volume I had populated with rsync without problems. Switched to
> unison and has been crashing.
> Have an update.
> Ran md5. It worked against the entire filesystem.
> Ran rsync -c, now worked..
> Ran rsync -c again.. still worked..
> Tried unison again, crashed with the same error I have been getting:
> File "/usr/lib/s3ql/s3ql/backends/common.py", line 858, in _read
> :self.remaining + self.off_size])[0]
> Did an md5 on the file unison reported, but it ran ok.
Can you run unison only on that file? Or maybe that file + the previous
3 and next 3?
Best,
-Nikolaus
On Wednesday, November 14, 2012 8:47:00 AM UTC-5, Nikolaus Rath wrote:
>Can you run unison only on that file? Or maybe that file + the previous
>3 and next 3?
I will set up unison to do the folder after the current rsync finishes. I use this setup for offsite backup, so once a day or every two days I upload whatever has changed. I don't believe the file that crashed has even been changed in a while, so I don't expect the rsync that is running now to change it.
Also, I heard back from the cloud provider, GreenQloud, and they are telling me that the dropped packets reported by mtr are normal.. even though I don't see that behavior with any other provider I use..
Will report back once the current backup/rsync finishes and I try unison again.
I finished an "rsync -c" on a set of data earlier today. Ran it again, just as a test.
No crashes.
Tried unison and it crashed. Did fsck.
Changed unison to run on just the folder that had crashed before. It crashed again.
Did fsck again.
Did md5sum of all files in the folder where unison was crashing. No problems.
Ran unison again. This time, instead of crashing, it detected a single file as different. There is no way that file could have changed. Nothing updated it and I had run two rsync -c today.
Is it possible that fsck changed it? Shouldn't fsck have deleted the file or moved it to lost+found?
Unison is not set to do checksums right now.. so it is possible that only attributes have changed.. but even then, would fsck have changed any attributes?
> I finished an "rsync -c" on a set of data earlier today.
> Ran it again, just as a test.
> No crashes.
> Tried unison and it crashed.
> Did fsck.
> Changed unison to run on just the folder that had crashed before. It
> crashed again.
> Did fsck again.
> Did md5sum of all files in the folder unison was crashing. No problems.
> Ran unison again. This time instead of crashing it detected a single
> file as different. There is no way that file could have changed. Nothing
> updated it and had run two rsync -c today.
> Is it possible the fsck changed it? Shouldn't the fsck have
> deleted/moved to lost+found.. the file?
> Unison is not set to do checksum right now.. so it is possible only
> attributes may have changed.. but even that would fsck have changed any
> attributes?
Hmm. I just can't get a clear picture of what is happening. Here are my
thoughts:
* Any file whose contents are touched by fsck will be moved to lost+found
* fsck may change file attributes if they are corrupted, but it will
always report this
* The S3QL crash happens in a function whose only parameter is the name
of the storage object that needs to be retrieved. I can not imagine how
this can possibly fail with unison but work with rsync and md5sum.
* The S3QL crash can only happen if some process is reading file system
data. So either your crash is not actually produced by unison but some
other process, or unison is doing more than comparing checksums.
* If you do not write any data, fsck should not find any errors. If you
want, I can give you a special fsck that just marks the file system as
intact (to save time when testing). Let me know if you want it.
* We need to get more logging information. Running mount.s3ql with
--debug, and unison under strace will do that, but the information is
only useful if it isn't too much. How big is the folder that seems to be
consistently crashing?
* Did you flush the S3QL cache between the different runs of unison,
rsync and md5sum? If not, this may explain the different results.
* How many unison/rsync/md5sum runs have you done in total? So far, the
most likely explanation is still that S3QL fails to detect some data
corruption when retrieving data from the network. But in that case the
differences with rsync et al would need to be by chance...
I'm sorry that this is so hard to debug. I very much appreciate the time
you're spending on this.
Best,
-Nikolaus
On Thursday, November 15, 2012 2:36:15 PM UTC-5, Nikolaus Rath wrote:
Before I reply.. Today I was doing an "rsync -c" and it crashed. It is the first time it has crashed with rsync.
>>Hmm. I just can't get a clear picture of what is happening. Here are my
>>thoughts:
I am tempted to create a new bucket and move everything over.
>* The S3QL crash happens in a function whose only parameter is the name
>of the storage object that needs to be retrieved. I can not imagine how
>this can possibly fail with unison but work with rsync and md5sum.
Not sure what has happened, or changed, but this time it crashed with rsync -c, for the first time.
>* The S3QL crash can only happen if some process is reading file system
>data. So either your crash is not actually produced by unison but some
>other process, or unison is doing more than comparing checksums.
So far, I have not had a crash with regular rsync. The reason unison is doing a checksum check is that it is the first pass. I have yet to have a single clean pass with unison.
>* If you do not write any data, fsck should not find any errors. If you
>want, I can give you a special fsck that just marks the file system as
>intact (to save time when testing). Let me know if you want it.
My biggest concern is to help you figure out what is going on. This is just backup and I have all the data on at least a second machine. This is my offsite, disaster recovery backup.
>>* We need to get more logging information. Running mount.s3ql with
>>--debug, and unison under strace will do that, but the information is
>>only useful if it isn't too much. How big is the folder that seems to be
>>consistently crashing?
244 files, 9.3G
>* Did you flush the S3QL cache between the different runs of unison,
>rsync and md5sum? If not, this may explain the different results.
No. How would I do that? s3qlctrl flushcache ?
>* How many unison/rsync/md5sum runs have you done in total?
The reason there are so many more rsyncs is that I have been doing backups any time I have some important data I want to back up, in addition to probably 4 to 6 that were just testing.
>most likely explanation is still that S3QL fails to detect some data
>corruption when retrieving data from the network. But in that case the
>differences with rsync et al would need to be by chance...
Agree that it is something along those lines.
>I'm sorry that this is so hard to debug. I very much appreciate the time
>you're spending on this.
Not at all. I am grateful for all your work with this program and I am glad to help in any way I can.
What really has me puzzled is that it has never crashed with the md5s.
> On Thursday, November 15, 2012 2:36:15 PM UTC-5, Nikolaus Rath wrote:
> Before I reply..
> Today was doing a "rsync -c" and it crashed.
> It is the first time it crashes with rsync.
>>>Hmm. I just can't get a clear picture of what is happening. Here are my
>>>thoughts:
> I am tempted to create a new bucket and move everything over.
> >* The S3QL crash happens in a function whose only parameter is the name
>>of the storage object that needs to be retrieved. I can not imagine how
>>this can possibly fail with unison but work with rsync and md5sum.
> Not sure what has happened, or changed, but this time it crashed with
> rsync -c, for the first time.
Ah. That's a good thing, actually. It confirms my theory :-).
>>* The S3QL crash can only happen if some process is reading file system
>>data. So either your crash is not actually produced by unison but some
>>other process, or unison is doing more than comparing checksums.
> So far, have not had a crash with regular rsync. The reason unison is
> doing a checksum check is because it is the first pass. Have yet to have
> a single clean pass with unison.
Ah, ok.
>>>* We need to get more logging information. Running mount.s3ql with
>>>--debug, and unison under strace will do that, but the information is
>>>only useful if it isn't too much. How big is the folder that seems to be
>>>consistently crashing?
> 244 files, 9.3G
Ok, that would produce way too much information in the logs. In any case,
I am pretty sure that there is no specific damaged file here but a
random network problem.
>>* Did you flush the S3QL cache between the different runs of unison,
>>rsync and md5sum? If not, this may explain the different results.
> No. How would I do that?
> s3qlctrl flushcache ?
Exactly.
>>* How many unison/rsync/md5sum runs have you done in total?
Ok, if we combine this with the cache not being flushed, then I'm
relatively confident that it was just by chance that unison appeared to
fail but rsync appeared to work.
As soon as I have time for some coding, I'm going to implement some
extra checks (based on my guess of it being a network problem) and send
you a patch. That will hopefully confirm my theory and either allow me
to fix the problem or allow you to complain to your storage provider
:-). (but I guess even then I should change S3QL to generate a more
useful error message instead of just crashing).
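For illustration only, and explicitly not the patch mentioned above: one shape such an extra check could take is a read helper that insists on getting the expected number of bytes, reports a descriptive error otherwise, and lets the caller re-fetch the object a few times. Everything in this Python 3 sketch is hypothetical (`TruncatedObjectError`, `read_exact`, `read_with_retry`, and the `open_object` callable that returns a fresh file-like object per attempt); none of these names exist in S3QL.

```python
class TruncatedObjectError(Exception):
    """Hypothetical error type: fewer bytes arrived than were expected."""


def read_exact(fh, size, what):
    """Read exactly size bytes from fh or raise TruncatedObjectError."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = fh.read(remaining)
        if not chunk:  # end of data before we got everything
            got = size - remaining
            raise TruncatedObjectError(
                "%s: expected %d bytes, got %d" % (what, size, got))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_with_retry(open_object, size, what, attempts=3):
    """Re-fetch the object up to `attempts` times before giving up."""
    last_error = None
    for _ in range(attempts):
        with open_object() as fh:
            try:
                return read_exact(fh, size, what)
            except TruncatedObjectError as exc:
                last_error = exc  # remember it and try a fresh fetch
    raise last_error
```

If the underlying cause is a flaky connection, a retry would usually succeed, while a permanently damaged object would fail every attempt and end with an error that names the object.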
Best,
-Nikolaus
On Saturday, November 17, 2012 7:09:18 PM UTC-5, Nikolaus Rath wrote:
>>> s3qlctrl flushcache ?
>Exactly.
Md5 just crashed.
>As soon as I have time for some coding, I'm going to implement some
>extra checks (based on my guess of it being a network problem) and send
>you a patch.
>>As soon as I have time for some coding, I'm going to implement some
>>extra checks (based on my guess of it being a network problem) and send
>>you a patch.
> Thanks.
> How come fsck is not catching there is a problem?
Well, because it's probably a network problem. The file system is
perfectly alright :-).
Best,
-Nikolaus
My s3ql evaluation system crashes with the same error when doing an rsync -c, so maybe I can help with debugging. I am a professional software developer, so I know one or two strategies for hunting a bug. But I have only a very basic understanding of Python, as I mainly work in C.
My installation is different, as I do rsync between two disks connected to the same machine. But the s3ql filesystem is running on top of a network-mounted sshfs filesystem, so it is a network-based rsync.
It seems to crash at different files, but I will dig into that a little bit before making a definite statement.
However, I think it might help to know that this bug shows up on a machine which does only internal network traffic. So the crash isn't caused by heavy network errors. A more stable network connection would be hard to get.
Unless sshfs itself introduces network errors. Maybe I should check whether the crash is triggered without sshfs.
> my s3ql evaluation system crashes with the same error, when doing a
> rsync -c. So maybe I can help debugging. I am a professional software
> developer, so I know one or two strategies to hunt a bug. But I have
> only a very basic understanding of python, doing mainly C.
> My installation is different, as I do rsync between two disks connected
> to the same machine. But the s3ql filesystem is running on top of a
> network mounted sshfs filesystem, so it is a network based rsync.
> It seems to crash at different files, but I will dig into that a little
> bit before making a definite statement.
If the same file can sometimes be read correctly, and sometimes not,
then I know what the problem is. Just have to find a free minute to
write a patch.
Thanks for your help!
-Nikolaus
>> my s3ql evaluation system crashes with the same error, when doing a
>> rsync -c. So maybe I can help debugging. I am a professional software
>> developer, so I know one or two strategies to hunt a bug. But I have
>> only a very basic understanding of python, doing mainly C.
>> My installation is different, as I do rsync between two disks connected
>> to the same machine. But the s3ql filesystem is running on top of a
>> network mounted sshfs filesystem, so it is a network based rsync.
>> It seems to crash at different files, but I will dig into that a little
>> bit before making a definite statement.
> If the same file can sometimes be read correctly, and sometimes not,
> then I know what the problem is. Just have to find a free minute to
> write a patch.