selkie.manifest: Manifest of sizes, hashes
Executable
Usage:
$ manifest foo # writes foo.sizes
$ manifest -s foo # writes foo.chksum
$ manifest -h foo # writes foo.hashes
$ manifest -H foo # writes foo.md5
$ manifest -c foo | diff - foo.sizes
$ manifest -cs foo | diff - foo.chksum
$ manifest -ch foo | diff - foo.hashes
$ manifest -z foo -o foo.sizes
$ manifest -s foo -o foo.chksum
$ manifest -h foo -o foo.hashes
$ manifest -Z foo.hashes | diff - foo.sizes
$ manifest -l foo > foo.dirs
$ manifest foo.dirs
$ manifest -d foo old.sizes # writes to stdout
$ manifest -d foo old.sizes -o foo.diffs
$ manifest -e delta.tgz -d foo old.sizes
$ manifest -e delta.tgz foo.diffs
$ manifest -i delta.tgz
Create foo.sizes
The script manifest creates a listing of the files in a directory hierarchy, recording their sizes, or their sizes together with checksums or MD5 hash values.
The basic usage is:
$ manifest foo
This expects foo to exist and to be a directory. It writes a file called foo.sizes, in which it records the pathnames and sizes of all files in the directory hierarchy under foo.
One can then use diff to compare that listing to the listing created for another copy of foo. For example:
$ diff foo.sizes /Backups/foo.sizes
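To make the idea of a size listing concrete, here is a minimal Python sketch of how such a listing could be produced. It is not selkie's code: the one-line-per-file format (relative path, a tab, the size in bytes) and the helper names size_listing and write_sizes are assumptions for illustration.

```python
import os

# Hypothetical helpers; the real foo.sizes layout may differ.

def size_listing(root):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries.append((os.path.relpath(full, root), os.path.getsize(full)))
    entries.sort()  # stable order, so listings of identical trees are identical
    return entries

def write_sizes(root, ofile):
    """Write one 'path<TAB>size' line per file to ofile."""
    with open(ofile, "w") as f:
        for rel, size in size_listing(root):
            f.write(f"{rel}\t{size}\n")
```

Sorting the entries is what makes a plain diff between two listings meaningful: two copies of foo with the same file names and sizes produce identical listings.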
Create and check foo.md5
To compute an MD5 hash for each file:
$ manifest -H foo
Computing the hash values is much slower than just getting the sizes. For that reason, -H checkpoints foo.md5 every 2 seconds: it writes first to a temporary file, and replaces foo.md5 only if the temporary file is written successfully.
It prints out “Compute hash …” for each file when it computes the MD5 value, and it prints out “Writing …” each time it checkpoints the .md5 file.
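The checkpointing scheme described above (write to a temporary file, then replace foo.md5 only on success) can be sketched in Python with tempfile.mkstemp and os.replace. The function names checkpoint and hash_all, the tab-separated line format, and the pluggable hasher argument are assumptions, not manifest's actual interface.

```python
import os
import tempfile
import time

def checkpoint(lines, ofile):
    """Write lines to a temp file next to ofile, then move it over ofile.

    If writing fails, the temp file is removed and ofile is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(ofile))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.writelines(lines)
        os.replace(tmp, ofile)  # same directory, so an atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise

def hash_all(paths, hasher, ofile, interval=2.0):
    """Hash each path, rewriting ofile once at least `interval` seconds have passed.

    hasher is any function from a path to a digest string (hypothetical parameter).
    """
    lines = []
    last = time.monotonic()
    for path in paths:
        lines.append(f"{path}\t{hasher(path)}\n")
        if time.monotonic() - last >= interval:
            checkpoint(lines, ofile)
            last = time.monotonic()
    checkpoint(lines, ofile)  # final, complete write
    return lines
```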
After computing hashes, one should save foo.md5 to a safe place. Later, one can recompute foo.md5 and diff it against the saved copy to determine whether any files have become corrupted.
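The per-file hash and the later corruption check can be sketched with hashlib. md5_of_file and changed_files are hypothetical helpers, and the {path: digest} dictionaries stand in for the contents of foo.md5 and its saved copy; the tool itself compares the two listings with diff.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def changed_files(saved, current):
    """Given {path: digest} maps, list saved paths that are now missing or differ."""
    return sorted(p for p, d in saved.items() if current.get(p) != d)
```

Reading in chunks keeps memory use flat even for very large files.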
Create foo.hashes
(Deprecated.) The -h flag is an older version of -H that does not do checkpointing:
$ manifest -h foo
The listing is written to foo.hashes.
Creating foo.chksum
The -s flag causes checksums to be produced:
$ manifest -s foo
The listing is written to foo.chksum. Computing checksums is still much slower than producing a size listing, but about four times faster than computing hashes. Checksums are not cryptographically secure, but they are just as good at detecting changes that were not specifically intended to deceive.
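The text does not say which checksum -s uses. As an illustration only, this sketch uses CRC-32 from zlib, a common non-cryptographic checksum; treat both the choice of algorithm and the helper name crc32_of_file as assumptions.

```python
import zlib

def crc32_of_file(path, chunk_size=1 << 20):
    """Return a file's CRC-32 as 8 hex digits (illustrative stand-in for -s)."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC over successive chunks
    return f"{crc:08x}"
```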
Writing to stdout
One may use the -c flag to write the sizes to standard output instead of a file:
$ manifest -c foo | diff - foo.sizes
The -c flag may be combined with -h or -s. When hashes are written to a file, progress messages are printed to standard error, but they are suppressed when -c is provided:
$ manifest -ch foo | diff - foo.hashes
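The '-' convention and the quieting of progress messages can be sketched as follows. open_output and progress are hypothetical helper names; only the behavior (standard output for '-', progress on standard error only when writing to a file) comes from the text.

```python
import sys
from contextlib import contextmanager

@contextmanager
def open_output(ofile):
    """Yield a writable text stream: stdout for '-', otherwise the named file."""
    if ofile == "-":
        yield sys.stdout
    else:
        with open(ofile, "w") as f:
            yield f

def progress(message, ofile):
    """Print a progress message to stderr unless the listing goes to stdout."""
    if ofile != "-":
        print(message, file=sys.stderr)
```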
Extracting sizes from foo.hashes or foo.chksum
One may use the -Z flag to extract a size listing from a hashes or checksum listing. For example:
$ manifest -Z foo.hashes | diff - ~/Backups/foo.sizes
$ manifest -Z foo.chksum | diff - ~/Backups/foo.sizes
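Extracting sizes amounts to dropping the hash or checksum column. The sketch below assumes each line holds a path, a size, and a digest separated by tabs; the actual file format is not documented here, and sizes_from_hash_lines is a hypothetical name.

```python
def sizes_from_hash_lines(lines):
    """Turn assumed 'path<TAB>size<TAB>digest' lines into 'path<TAB>size' lines."""
    sizes = []
    for line in lines:
        path, size, _digest = line.rstrip("\n").split("\t")
        sizes.append(f"{path}\t{size}\n")
    return sizes
```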
Selecting directories
Finally, one may limit the listing to a subset of the directories. To get a raw listing of the directories:
$ manifest -l foo > foo.dirs
Then edit foo.dirs by hand to leave only the directories that should be included, and do:
$ manifest foo.dirs
As a safety precaution, the filename must end in .dirs; otherwise an error is signalled. The output is written to foo.dirs.sizes, so that it can be readily distinguished from foo.sizes, which includes all subdirectories.
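The .dirs safety check and the output naming rule can be expressed as a small sketch. The helper names dirs_output_name and read_dirs, and the one-directory-per-line format of foo.dirs, are assumptions.

```python
def dirs_output_name(ifile):
    """Refuse names not ending in .dirs; otherwise return ifile + '.sizes'."""
    if not ifile.endswith(".dirs"):
        raise ValueError(f"expected a filename ending in .dirs, got {ifile!r}")
    return ifile + ".sizes"

def read_dirs(ifile):
    """Return the directory names in ifile, one per non-blank line (assumed format)."""
    with open(ifile) as f:
        return [line.strip() for line in f if line.strip()]
```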
Synchronizing directories
Suppose foo is a working version of a shared project. Before making any edits, create a manifest old.sizes. After making changes to foo, the following command lists all files that have been changed (added, modified, or deleted):
$ manifest -d foo old.sizes
It prints a list of the actions that would need to be applied to the old version of foo to bring it in sync with its current state. That is, it summarizes the edits made since the manifest old.sizes was created.
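A rough sketch of how such a list of changes could be derived from two size listings, each loaded as a {path: size} dictionary. The action names and the helper size_diff are illustrative, not manifest's actual output format. Because this sketch compares sizes only, it cannot detect an edit that leaves a file's length unchanged.

```python
def size_diff(old, new):
    """Compare {path: size} maps; return (action, path) pairs that turn old into new."""
    actions = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            actions.append(("delete", path))
        elif path not in old:
            actions.append(("add", path))
        elif old[path] != new[path]:
            actions.append(("modify", path))  # size changed
    return actions
```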
One can capture just the edits and apply them to other copies of foo.
To create a tarfile containing the edits:
$ manifest -e delta.tgz -d foo old.sizes
To update an old version of foo, e.g., on another machine, copy delta.tgz to that machine and do:
$ manifest -i delta.tgz
The tarfile delta.tgz includes a listing of the diffs with relative pathnames; that is why one does not specify foo on the command line.
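The delta idea (changed files plus a listing of actions with relative paths) can be sketched with Python's tarfile module. pack_delta, apply_delta, the member name DIFFS, and the files/ prefix are hypothetical; they are not the layout that manifest -e actually writes. The sketch assumes POSIX-style relative paths and uses actions shaped like the (action, path) pairs from a size comparison.

```python
import io
import os
import tarfile

LISTING = "DIFFS"  # hypothetical member name for the action listing

def pack_delta(actions, root, tarpath):
    """Write (action, relative_path) pairs plus the changed files under root to a .tgz."""
    with tarfile.open(tarpath, "w:gz") as tar:
        data = "".join(f"{act}\t{path}\n" for act, path in actions).encode()
        info = tarfile.TarInfo(LISTING)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
        for act, path in actions:
            if act != "delete":  # deleted files have no content to ship
                tar.add(os.path.join(root, path), arcname="files/" + path)

def apply_delta(tarpath, dest):
    """Apply a delta made by pack_delta to the directory dest."""
    root = os.path.abspath(dest)
    with tarfile.open(tarpath, "r:gz") as tar:
        listing = tar.extractfile(LISTING).read().decode()
        for line in listing.splitlines():
            act, path = line.split("\t")
            target = os.path.abspath(os.path.join(root, path))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"unsafe path in delta: {path}")
            if act == "delete":
                if os.path.exists(target):
                    os.remove(target)
            else:
                data = tar.extractfile("files/" + path).read()
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with open(target, "wb") as f:
                    f.write(data)
```

Because the listing stores paths relative to the project root, applying the delta needs only a destination directory, which mirrors why foo is not named when importing.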
Module
Toplevel functions
The command line versions translate to function calls as follows. First, flags translate into keyword-value settings. Some flags take one or two arguments; their arguments are taken as the flags are encountered. In the end, if any arguments remain, the null flag is imputed.
Flag    Keyword settings
z       fnc=create, ifile=*arg*, otype='z'
s       fnc=create, ifile=*arg*, otype='s'
h       fnc=create, ifile=*arg*, otype='h'
c       ofile='-'
o       ofile=*arg*
Z       fnc=extract_sizes, ifile=*arg*
l       fnc=list_directories, ifile=*arg*
d       diff=(arg1, arg2)
e       fnc=export, ofile=*arg*
i       fnc=import_into, ifile=*arg*
null    arg=*arg*
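The flag translation in the table above can be sketched as a small parser. This illustrates the described rules and is not selkie's parser: combined flags such as -cs are processed letter by letter, each flag consumes its arguments from the tokens that follow, and the name parse_args is hypothetical. The -H flag does not appear in the table, so this sketch rejects it.

```python
# Flag table from the text: letter -> (number of arguments, settings builder).
FLAGS = {
    "z": (1, lambda a: {"fnc": "create", "ifile": a[0], "otype": "z"}),
    "s": (1, lambda a: {"fnc": "create", "ifile": a[0], "otype": "s"}),
    "h": (1, lambda a: {"fnc": "create", "ifile": a[0], "otype": "h"}),
    "c": (0, lambda a: {"ofile": "-"}),
    "o": (1, lambda a: {"ofile": a[0]}),
    "Z": (1, lambda a: {"fnc": "extract_sizes", "ifile": a[0]}),
    "l": (1, lambda a: {"fnc": "list_directories", "ifile": a[0]}),
    "d": (2, lambda a: {"diff": (a[0], a[1])}),
    "e": (1, lambda a: {"fnc": "export", "ofile": a[0]}),
    "i": (1, lambda a: {"fnc": "import_into", "ifile": a[0]}),
}

def parse_args(argv):
    """Translate command-line tokens into keyword settings (hypothetical sketch)."""
    settings = {}
    leftover = []
    queue = list(argv)
    while queue:
        token = queue.pop(0)
        if token.startswith("-") and token != "-":
            for letter in token[1:]:  # -cs is treated as -c followed by -s
                if letter not in FLAGS:
                    raise ValueError(f"unknown flag -{letter}")
                nargs, build = FLAGS[letter]
                if len(queue) < nargs:
                    raise ValueError(f"-{letter} needs {nargs} argument(s)")
                settings.update(build(queue[:nargs]))
                del queue[:nargs]
        else:
            leftover.append(token)
    if len(leftover) > 1:  # assumption: at most one argument may remain
        raise ValueError(f"unexpected arguments: {leftover}")
    if leftover:
        settings["arg"] = leftover[0]  # the imputed null flag
    return settings
```

For example, the tokens of "manifest -e delta.tgz foo.diffs" yield fnc='export', ofile='delta.tgz', and arg='foo.diffs'.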
The following information is then supplied. If the keyword 'diff' is provided, fnc defaults to 'diff'; otherwise, fnc defaults to 'create'. The keyword 'arg' is replaced by:
  'ifile' if fnc is 'create',
  'diff' if fnc is 'export',
  'dest' if fnc is 'import_into'.
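Supplying the defaults can be sketched the same way. supply_defaults is a hypothetical name, and the mapping uses the fnc name import_into, as in the flag table; rejecting a leftover argument for any other function is an assumption of this sketch.

```python
# Where a leftover positional argument goes, by function (from the text).
ARG_TARGET = {"create": "ifile", "export": "diff", "import_into": "dest"}

def supply_defaults(settings):
    """Return a copy of settings with fnc defaulted and 'arg' renamed."""
    result = dict(settings)
    if "fnc" not in result:
        result["fnc"] = "diff" if "diff" in result else "create"
    if "arg" in result:
        target = ARG_TARGET.get(result["fnc"])
        if target is None:  # assumption: no other function accepts a bare argument
            raise ValueError(f"fnc {result['fnc']!r} takes no extra argument")
        result[target] = result.pop("arg")
    return result
```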
The calls at the top of the page translate as follows. The supplied information is marked with brackets:
manifest([fnc='create',] arg[ifile]='foo')
manifest(fnc='create', ifile='foo', otype='s')
manifest(fnc='create', ifile='foo', otype='h')
manifest([fnc='create',] arg[ifile]='foo', ofile='-')
manifest(fnc='create', ifile='foo', otype='s', ofile='-')
manifest(fnc='create', ifile='foo', otype='h', ofile='-')
manifest(fnc='create', ifile='foo', otype='z', ofile='foo.sizes')
manifest(fnc='create', ifile='foo', otype='s', ofile='foo.chksum')
manifest(fnc='create', ifile='foo', otype='h', ofile='foo.hashes')
manifest(fnc='extract_sizes', ifile='foo.hashes')
manifest(fnc='list_directories', ifile='foo')
manifest([fnc='create',] arg[ifile]='foo.dirs')
manifest([fnc='diff',] diff=('foo', 'remote.sizes'))
manifest([fnc='diff',] diff=('foo', 'remote.sizes'), ofile='foo.diffs')
manifest(fnc='export', diff=('foo', 'remote.sizes'), ofile='delta.tgz')
manifest(fnc='export', arg[diff]='foo.diffs', ofile='delta.tgz')
manifest(fnc='import_into', ifile='delta.tgz', arg[dest]='foo')
The functions that are dispatched to are as follows:
create(ifile, otype, ofile, update, trace, force)
ifile may end with '.dirs', or it may name a directory. otype is one of 'z', 's', or 'h'. ofile, if provided, may be a filename, '-', a list, a dict, or a file; it defaults to ifile plus a suffix that depends on otype. If update is True, then otype, ofile, and force must be None: otype is determined from the suffix of ifile, ofile is set to ifile, and force is set to True. trace defaults to False for type 'z' and True for types 's' and 'h'. force defaults to False; when it is False, create refuses to overwrite an existing ofile.
extract_sizes(ifile)
ofile defaults to stdout.
list_directories(ifile)
ofile defaults to stdout.
difference(diff, ofile)
ofile defaults to stdout.
export_delta(diff, ofile)
diff may be either a filename containing diffs or a pair consisting of a directory and a remote sizes listing. The diffs specify what must be done (additions, replacements, deletions) to make the remote directory match the given directory.
import_delta(dest, ifile)
dest must be a directory, and ifile must be a tarfile produced by export().