Magic-Folder discovers new files

Contacts: meejah

Date: 2021-06-03

When the software starts up, magic-folders should find every file that is new or has changed since the last time it ran, in every Magic Folder it is tracking.

Rationale

This is a core function of magic-folders.

User stories

Start magic-folders a Second Time

As a user, I want local filesystem changes to be discovered and uploaded so that the Snapshots in Tahoe represent my local filesystem state.

Discover the State of Synchronization

As a user, I want to know whether magic-folders currently has more work queued and how long ago it last checked for more work.

Keeping the story template here in case we want to add more

$STORY_NAME

As a $PERSON, I want $FEATURE so that $BENEFIT.

Have as many as you like. Group user stories together into meaningfully deliverable units.

Constraints and requirements

Must

Nice to have

Must not

Out of scope

Things that are definitely not included that people ask about.

Success

How will we know when we are done?

The state of our Personal DMD in Tahoe should match the state of our local filesystem at some point after we re-start the magic-folders software. “Match the state” here means:

There is some vagueness above about knowing when the synchronization is completed. It should be possible to use the status API to discover if magic-folders has work queued. The above checks should be valid after:

How will we measure how well we have done?

It might be a good idea to stop at this point & get feedback, to make sure you’re solving the right problem.

Alternatives considered

Detailed design

When triggered, the scanning system walks the entire tree of files below the Magic Folder.

As we use twisted.python.filepath.FilePath for many other operations within magic-folder, we will use FilePath.walk() to traverse the files. This produces a generator; we should take care to yield control back to the reactor occasionally so that we don’t starve network reads/writes when traversing a large number of files.
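As a rough illustration of yielding control during a walk, the sketch below handles one path per step and then yields. It uses only the standard library so it can run anywhere; the names scan_steps, walk_paths and on_file are hypothetical and are not part of magic-folder. scan_steps accepts any iterable of paths, and walk_paths is a stand-in for FilePath.walk().

```python
import os
from typing import Callable, Iterable, Iterator


def walk_paths(root: str) -> Iterator[str]:
    """Stand-in for FilePath.walk(): lazily yield every file path below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def scan_steps(paths: Iterable[str],
               on_file: Callable[[str], None]) -> Iterator[None]:
    """Handle one path per step, then yield so the caller can do other work.

    on_file is a hypothetical callback (e.g. "compare against the last
    known state of this file"); it is an assumption of this sketch.
    """
    for path in paths:
        if os.path.isfile(path):
            on_file(path)
        # Hand control back to whoever is driving the generator.
        yield
```

Under Twisted, a generator like this could be handed to twisted.internet.task.cooperate(), which runs an iterator in small slices between other reactor events and offers whenDone() to learn when it has finished. Whether magic-folder should use cooperate() or some other scheduling approach is left open here.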

For each file found:

Data integrity

Do not do overlapping scans. There should be either zero or one scans happening at any given time for any given Magic Folder.
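One possible way to enforce the "zero or one scans" rule is a small guard keyed by folder name. This is a sketch only: ScanGuard, try_begin and finish are hypothetical names, and it assumes all callers run on a single reactor thread, so no locking is needed.

```python
class ScanGuard:
    """Track which Magic Folders currently have a scan in progress."""

    def __init__(self) -> None:
        self._active = set()

    def try_begin(self, folder_name: str) -> bool:
        """Return True and mark the folder as scanning, or False if a
        scan is already running for that folder."""
        if folder_name in self._active:
            return False
        self._active.add(folder_name)
        return True

    def finish(self, folder_name: str) -> None:
        """Mark the scan as finished; call this on failure paths too."""
        self._active.discard(folder_name)
```

A caller would skip (or postpone) a scan when try_begin() returns False, and call finish() from a try/finally block or an equivalent cleanup hook so a failed scan does not block later ones.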

Consider the uploader: it may be turning LocalSnapshots into RemoteSnapshots at the moment.

Consider the downloader: it may be discovering new RemoteSnapshots and downloading them.

Consider the user: they may be changing or adding files at any moment (e.g. during the scan).

Security

We must not “discover” a file outside the magic-folder directory.
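A containment check along the following lines is one way to express this requirement. It is a standard-library sketch, not magic-folder's actual code: is_inside is a hypothetical helper, and it assumes that resolving symlinks before comparing is the desired policy.

```python
import os


def is_inside(folder: str, candidate: str) -> bool:
    """Return True only if candidate resolves to a path at or below folder.

    Both paths are resolved with realpath first, so ".." segments and
    symlinks pointing outside the folder are rejected.
    """
    folder_real = os.path.realpath(folder)
    candidate_real = os.path.realpath(candidate)
    try:
        common = os.path.commonpath([folder_real, candidate_real])
    except ValueError:
        # Raised e.g. on Windows when the paths are on different drives.
        return False
    return common == folder_real
```

Comparing with os.path.commonpath rather than a plain string prefix avoids treating a sibling such as "folder-other" as if it were inside "folder".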

Backwards compatibility

None.

Performance and scalability

When scanning a large directory, we must take care not to pause the reactor for “too long”.

Further reading

Links to related things. Other designs, tickets, epics, mailing list threads, etc.