[lbackup-discussion] General rsync errors

Scott Haneda
Wed May 5 15:45:45 NZST 2010


On May 3, 2010, at 3:40 AM, henri wrote:

>> Depending on how this test works out, we may be able to figure out what is causing it, and then get it fixed in rsync.
> 
> I am uncertain of where the problem is at the moment as I have not personally seen these errors. Have you tried using rsync with the source / destination set to different physical devices? 

Yes, I have quite a few servers, and quite a few things going on.  If I had to make a huge generalization, my basic setup is that each machine has its own internal RAID mirror: basically two drives, one protecting the other.

This is not backup though; it is protection against drives that misbehave.  I have scripts that watch RAID and SMART status and keep me up to date on what is going on with the mirror.

The mirror is cloned by rsync locally, from itself, to itself, twice a day, sometimes with lbackup, others without.  I am still deciding if I will be rolling out lbackup on all machines, or continuing to use my hobbled little backup script that sort of does the same thing.  At midnight and noon the entire data set at the root of the mirror is backed up, sort of like:
    rsync / /backups/local/noon
    rsync / /backups/local/midnight

From there, my archival schedule depends on the machine and on how much of its data changes.  For example, an HTTP server where lots of clients are uploading new data all the time gets a much more frequent schedule, whereas a machine that sits there, serves a little DNS, and does some SNMP reporting need not be backed up all that often; once a day is certainly enough.

Since the local to local mirror backup is only as good as those two drives, all that data is sent off to my backup server, via rsync over ssh.  That machine itself has a large mirror, and follows most of the same routines.  The backup machine accepts backups in a rotational method either by my script moving directories around, or by lbackup on the machines I am testing that on.
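
A minimal sketch of what "a script moving directories around" could look like on the backup server. This is not the actual script, nor how lbackup does it; the function name `rotate_snapshots` and the `current` / `snapshot.N` layout are assumptions made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: rotate numbered snapshot directories under "$1",
# keeping at most "$2" of them.
# Assumed layout: $base/current, $base/snapshot.1, $base/snapshot.2, ...
rotate_snapshots() {
    base=$1
    keep=$2

    # Refuse to run without a base directory or a sane keep count.
    [ -n "$base" ] && [ -d "$base" ] || return 1
    [ "$keep" -ge 1 ] 2>/dev/null || return 1

    # Drop the oldest snapshot if it exists.
    if [ -d "$base/snapshot.$keep" ]; then
        rm -rf "$base/snapshot.$keep"
    fi

    # Shift snapshot.N to snapshot.N+1, oldest first.
    i=$((keep - 1))
    while [ "$i" -ge 1 ]; do
        if [ -d "$base/snapshot.$i" ]; then
            mv "$base/snapshot.$i" "$base/snapshot.$((i + 1))"
        fi
        i=$((i - 1))
    done

    # The most recent copy becomes snapshot.1, and an empty "current"
    # is left for the next rsync run to fill.
    if [ -d "$base/current" ]; then
        mv "$base/current" "$base/snapshot.1"
    fi
    mkdir -p "$base/current"
}
```

Calling `rotate_snapshots /backups/remote/somehost 7` (a hypothetical path) before each incoming transfer would keep the last seven copies. A real setup would probably pair this with rsync's `--link-dest` option so that unchanged files are hard linked rather than copied again.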

With all this, I can say there is a significant amount of variation in source and destination drives.  I have seen, and continue to see, errors from local to local, local to remote, and remote to local, all using different combinations of drives and transports.  Also, some cases use rsync and others use lbackup.

The only thing unique here is these are all Mac OS X machines, and they all use two patches:
    patch-crtimes.diff
    patch-fileflags.diff

I have used those patches for almost a year now.  Why is it that those patches are not accepted into the main branch?  Maybe there is something wrong with the patches?  Where do I find out more about the patches: who made them, why they are not incorporated upstream, and why Apple has not at least incorporated them into their distros?

> If it is a bug in rsync, it seems odd that the reported error is not easily reproducible. As stated above I am not yet sure where the problem is located. As previously mentioned, trying a known good device for reading / writing is a good way to move forward if you are experiencing IO errors.

Yes, the repeatability aspect of this is most frustrating.  If I can repeat it, we are down to a file, which could have permissions, ACLs, data, resource forks, data forks, and some other metadata special to Mac OS X.  In a repeatable case, it would be trivial to start picking away at the file until I find what causes the errors.  I find it somewhat hard to believe that, across perhaps 20 total drives and a handful of machines, all of them have something wrong with the hardware.
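
One way to get closer to "down to a file" is to collect the paths rsync complains about across runs and then retest each one on its own. The sketch below is only an illustration: it assumes the errors appear as log lines beginning with `rsync:` with the path in double quotes, which is how rsync commonly prints per-file errors. The exact messages seen on these machines are not shown in this thread, and the function name `suspect_paths` is made up.

```sh
#!/bin/sh
# Hypothetical helper: read an rsync log on stdin and print each unique
# quoted path found on lines that start with "rsync:".
# Assumed message shape: rsync: <what failed> "<path>" <reason>
# Paths that themselves contain a double quote are not handled.
suspect_paths() {
    sed -n 's/^rsync: [^"]*"\([^"]*\)".*$/\1/p' | sort -u
}
```

Something like `suspect_paths < backup.log` (log file name hypothetical) gives a short list of files, and each of those can then be copied alone with rsync to see whether the error follows the file or only shows up during full runs.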

I have more or less ruled out lbackup as far as I am concerned.  The only things I can think of are that it is patch related, as that is the only difference between the Mac OS X version and the version the rest of the world uses, or that there is some kind of strange race condition that happens when multiple rsyncs run.  My schedules are all staggered, though that does not mean the OS is not changing files in the middle of a backup.  This could explain why a second run "fixes" things.  Perhaps there are some options to rsync that do the file compare operations at the exact time the file comes up in the queue.  I bet that would come at a great cost to performance, though.
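
To put numbers on the "second run fixes things" observation, a wrapper along these lines could record how often a second pass is needed. It relies on two documented rsync exit values: 23 (partial transfer due to error) and 24 (partial transfer due to vanished source files). Whether the errors in this thread actually produce those codes is an assumption, and the name `run_with_second_pass` is made up.

```sh
#!/bin/sh
# Hypothetical wrapper: run the given command once; if it exits with one of
# rsync's "partial transfer" codes (23 or 24), run it a second time.
# Both exit codes are logged to stderr, and the final one is returned.
run_with_second_pass() {
    first=0
    "$@" || first=$?

    case "$first" in
        23|24)
            echo "first pass exited $first, running second pass" >&2
            second=0
            "$@" || second=$?
            echo "second pass exited $second" >&2
            return "$second"
            ;;
        *)
            return "$first"
            ;;
    esac
}
```

Usage would be `run_with_second_pass rsync ...` with whatever arguments the backup already uses. This does not explain the cause; it only shows whether the failures cluster around files that changed or vanished mid-run (code 24) or around something else (code 23).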

I do wonder how Carbon Copy Cloner (CCC) manages it.  It has a rather large following and user base, and it is what I would consider to be pretty rock solid software.  I am sure tens of thousands of people use CCC every day, and they manage to do so without filling their forums with reports of errors.  I am using the same patches that CCC uses.

I am keeping a much closer eye on it now, and also have removed a few launchd based schedules and run the backups manually now, so I can watch them as they happen, until I do get to a repeatable case.

If I find out more, I will post here.  As it stands, until I get it repeatable, it is as good as not happening :)

Thanks for looking into this and for the suggestions.
-- 
Scott * If you contact me off list replace talklists@ with scott@ * 


