Re: [Deja-dup-hackers] [Duplicity-talk] (Feature Request) inotify integration?



Hello all,

Jenn Hulley-Miller wrote:
>> Is it possible to provide duplicity with a list of changed files
>> [...]built via linux's inotify subsystem.

> I have looked at this before and it would be a good solution for
> detecting changed files.  [...]
> Please enter a request on the Launchpad site and we'll look at
> integrating this and retiring the old scanning method.
> ...Ken

I have been giving this a lot of thought over the last few days, and I am cross-posting it to the Deja Dup list as it is relevant to them as well. I apologise that it has become longer than intended!

Conceptually, inotify (http://en.wikipedia.org/wiki/Inotify) is a dream addition to a backup program. As Jenn says, scanning for changes is often the vast majority of the work for an incremental backup.

TimeVault talks about the move to event-driven backup here:
https://wiki.ubuntu.com/TimeVault/Restructure
"A snapshot delay is set for each directory to be watched (say, 1 min.). When a file changes in that directory, a snapshot is taken after the configured delay time. If the file changes before the snapshot is taken, then the timer is reset repeatedly until you're done fiddling with the file, or some other specified time runs out (say, 60 min.) and a snapshot is forced. This mini-snapshot is called a delta. It only involves the file in question and so should be relatively fast (my benchmarks put it at around 50-150ms for a 1MB file, less for smaller files), and can be done in the background."
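To make the timer behaviour in that quote concrete, here is a minimal sketch in Python. It is my own illustration of the described logic, not TimeVault's code: the class name, method names and the idea of passing in the current time explicitly are all assumptions, and no real inotify calls are made.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DebounceTimer:
    """Hypothetical helper: decides when a changed file is due for a snapshot."""
    delay: float = 60.0        # quiet period before a snapshot (the "1 min." above)
    max_wait: float = 3600.0   # force a snapshot after this long (the "60 min." above)
    _first: Dict[str, float] = field(default_factory=dict)  # first unsaved change
    _last: Dict[str, float] = field(default_factory=dict)   # most recent change

    def file_changed(self, path: str, now: float) -> None:
        # Every new change resets the quiet-period timer, but the time of
        # the first pending change is kept so a snapshot can be forced.
        self._first.setdefault(path, now)
        self._last[path] = now

    def due(self, now: float) -> List[str]:
        # A file is due if it has been quiet for `delay` seconds, or if it
        # has had pending changes for `max_wait` seconds.
        ready = [
            p for p in self._last
            if now - self._last[p] >= self.delay
            or now - self._first[p] >= self.max_wait
        ]
        for p in ready:
            del self._first[p]
            del self._last[p]
        return sorted(ready)
```

A daemon would call file_changed() from its event handler and poll due() periodically; taking the time as an argument keeps the logic testable without waiting for real minutes to pass.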

This really could change the way that backups are done, as the notion of "scheduling" backups would become redundant if we could rely on events showing that files had changed.

However, if duplicity moved to inotify and inotify failed to notify the backup daemon that a file had changed, that file would never get backed up (unless it was later changed again and noticed). The key ways in which I can see it failing are: (a) if files to be backed up are on removable drives and are changed on a different computer; (b) if files to be backed up are on a partition modified by another OS install on the same computer (for example, one of our computers dual-boots Windows, and Deja Dup monitors one folder on the Windows partition); or (c) if we hit some bug in inotify (I have no idea how reliable inotify is in the case of, say, power cuts or files changed over Samba).

The best generic option that I have been able to come up with would be to use inotify for changes to partitions used by the system (/, /home etc.) and to still do a full scan for removable drives and other partitions. This still would not solve the use-case of someone who mounts and edits their /home partition files from another (say, GNU/Linux) install on the same machine.
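As a rough sketch of that split, the function below picks a detection method per path from the mount point it lives on. This is only an illustration in Python: the function names, the list of mount points and the "trusted" set are all made up for the example, and a real implementation would have to read the mount table rather than take it as an argument.

```python
import posixpath
from typing import Iterable, Set


def mount_for(path: str, mounts: Iterable[str]) -> str:
    """Return the longest mount point that contains `path`."""
    path = posixpath.normpath(path)
    best = "/"
    for m in mounts:
        m = posixpath.normpath(m)
        # Match whole path components only, so "/home" does not
        # claim "/homework".
        if path == m or path.startswith(m.rstrip("/") + "/"):
            if len(m) > len(best):
                best = m
    return best


def detection_method(path: str, mounts: Iterable[str], trusted: Set[str]) -> str:
    """'inotify' for mounts assumed to change only under this OS, else 'scan'."""
    return "inotify" if mount_for(path, mounts) in trusted else "scan"


# Hypothetical layout: system partitions are trusted, everything else is scanned.
MOUNTS = ["/", "/home", "/media/usb", "/mnt/windows"]
TRUSTED = {"/", "/home"}
```

With this layout, detection_method("/home/aaron/notes.txt", MOUNTS, TRUSTED) gives "inotify" and anything under /media/usb or /mnt/windows gives "scan". It does nothing for the /home-edited-from-another-install case above.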

I would guess that inotify would be a better way to determine changes in 99% of cases for 99% of users, but a backup program really needs to cater for those 1% cases. If there are times that we cannot trust inotify, then duplicity really needs to be doing a full scan -- at least occasionally to pick up anything missed. Potentially duplicity could run scans (as it does now) and compare the results to the inotify results for the first x backups, to try to profile the system and user. If inotify is often wrong, then full scans could be done each time; if inotify always agrees with the scan, then duplicity could rely on it, with a full scan every y backups/months.
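The profiling idea could look something like the sketch below. It is Python and purely illustrative: the function names and the default values standing in for x and y are my own assumptions, and it simplifies "often wrong" to "has ever missed a file", which is the most cautious reading.

```python
from typing import Set


def missed_by_inotify(scanned: Set[str], notified: Set[str]) -> Set[str]:
    """Files the full scan found changed that inotify never reported."""
    return scanned - notified


def needs_full_scan(run_index: int, total_missed: int,
                    trial_runs: int = 5, rescan_every: int = 10) -> bool:
    """Decide whether backup number `run_index` (counting from 0) should scan.

    trial_runs   -- the "x" above: always scan while still profiling
    rescan_every -- the "y" above: periodic safety scan once inotify is trusted
    """
    if run_index < trial_runs:
        return True               # still comparing scans against inotify
    if total_missed > 0:
        return True               # inotify has been wrong: keep scanning
    return run_index % rescan_every == 0
```

After each run that did scan, the caller would add len(missed_by_inotify(...)) to its running total, so a single miss on a later safety scan switches the system back to scanning every time.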

A different approach, to obtain some of the benefit of inotify with no risk, would be to follow TimeVault's lead and back up files noticed by inotify straight away (after some delay, to prevent 100 versions of a file while you are editing it). This would be better than the current approach, as files would always be backed up even if they changed between scheduled backups. However, the current approach to scheduled backups (full scans) could be maintained, and this would mop up anything missed by inotify -- though for 99% of users this scan would likely yield no changed files.

I think it would be great if we could come up with an option that would be best in nearly all cases, rather than adding additional configuration options. This is something that a lot of users would not be able to make an informed decision about.

I thought it was worth discussing these issues on the list, so that the best approach can be embodied in a useful feature request.

Thanks again to all who have made such an excellent program.

Regards,

Aaron



