
Scalar: Move local .git objects to scalar cache for efficiency and behavior breaks #716

Closed
derrickstolee opened this issue Jan 10, 2025 · 1 comment · Fixed by #720

@derrickstolee

When using a Scalar clone with the GVFS protocol, the objects from the server are populated in the Scalar cache directory. This directory is listed as an alternate object store that the local Git enlistment reads from.
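For readers unfamiliar with alternates: Git records them in `objects/info/alternates`, one object-directory path per line. Below is a minimal Python sketch of reading that file. `read_alternates` is a hypothetical helper, not Git's code, and it ignores details Git's own parser handles, such as quoted paths and alternates nested inside alternates.

```python
import os
from pathlib import Path
from typing import List


def read_alternates(object_dir: str) -> List[str]:
    """Return the alternate object directories listed in <object_dir>/info/alternates.

    Simplified sketch: one path per line; blank lines and '#' comment lines are
    skipped; relative paths are resolved against object_dir. Quoted paths are
    not handled.
    """
    alt_file = Path(object_dir) / "info" / "alternates"
    if not alt_file.is_file():
        return []  # no alternates configured
    alternates = []
    for line in alt_file.read_text().splitlines():
        if not line or line.startswith("#"):
            continue
        path = Path(line)
        if not path.is_absolute():
            # Relative entries are taken relative to the object directory itself.
            path = Path(object_dir) / path
        alternates.append(os.path.normpath(path))
    return alternates
```

In a Scalar enlistment, calling `read_alternates(".git/objects")` would be expected to return the shared object cache path.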

To keep the alternate maintained, background maintenance points the object directory at the shared object cache (the alternate) so that it can modify those files. This has some side effects:

  1. The local .git/objects directory is never maintained. It keeps growing and stays unpacked for the lifetime of the enlistment. It should contain only the objects created by the user, but even those accumulate over time.
  2. The git pack-refs process run by weekly maintenance can fail because it cannot see the local objects. It runs with GIT_OBJECT_DIRECTORY set to the shared object cache, so it does not see objects created by the user unless they have been pushed and then fetched into the shared object cache.
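To make the second side effect concrete, here is a toy Python model of object lookup (not Git's code): a command searches its object directory first, then that directory's alternates. `lookup`, the directory names, and the object IDs are all hypothetical; the model only shows that swapping the object directory for the shared cache drops the local store from the search path.

```python
from typing import Dict, List, Optional, Set


def lookup(oid: str, object_dir: str,
           contents: Dict[str, Set[str]],
           alternates: Dict[str, List[str]]) -> Optional[str]:
    """Return the directory holding oid, searching object_dir then its alternates.

    Toy model: directories map to sets of object IDs, and an alternate's own
    alternates are not followed.
    """
    for directory in [object_dir] + alternates.get(object_dir, []):
        if oid in contents.get(directory, set()):
            return directory
    return None


# Hypothetical layout of a Scalar enlistment with a shared cache.
LOCAL = ".git/objects"
SHARED = "/shared-cache/objects"
contents = {
    LOCAL: {"user-commit"},     # created locally, never pushed
    SHARED: {"server-commit"},  # downloaded from the server
}
alternates = {LOCAL: [SHARED]}  # the local store borrows from the cache

# A normal Git command, using LOCAL as its object directory, sees both objects.
assert lookup("user-commit", LOCAL, contents, alternates) == LOCAL
assert lookup("server-commit", LOCAL, contents, alternates) == SHARED

# With GIT_OBJECT_DIRECTORY pointed at the cache, the local object is invisible.
assert lookup("user-commit", SHARED, contents, alternates) is None
```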

One way to fix both problems could be to copy the objects from .git/objects into the shared object cache. This could be a new maintenance task, probably run on the daily schedule. It could also run before the loose-objects step, since we don't expect any packfiles in .git/objects.
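A minimal Python sketch of the loose-object half of that idea, assuming the standard `objects/<2 hex digits>/<rest>` layout: each object is copied under a temporary name into the shared cache, renamed into place, and only then removed from the local store. `migrate_loose_objects` is a hypothetical helper, not the eventual `microsoft/git` implementation. It skips objects already present in the cache (loose objects are content-addressed, so a same-named file should hold the same object) and it does not verify object contents.

```python
import os
import re
import shutil
from pathlib import Path

FANOUT = re.compile(r"^[0-9a-f]{2}$")                  # objects/ab/...
REST = re.compile(r"^(?:[0-9a-f]{38}|[0-9a-f]{62})$")  # SHA-1 or SHA-256 remainder


def migrate_loose_objects(src_objects: str, dst_objects: str) -> int:
    """Move loose objects from src_objects into dst_objects; return how many left src."""
    handled = 0
    for fanout in sorted(Path(src_objects).iterdir()):
        # Skip pack/, info/ and anything else that is not a fan-out directory.
        if not fanout.is_dir() or not FANOUT.match(fanout.name):
            continue
        for obj in sorted(fanout.iterdir()):
            if not obj.is_file() or not REST.match(obj.name):
                continue
            target = Path(dst_objects) / fanout.name / obj.name
            if not target.exists():
                target.parent.mkdir(parents=True, exist_ok=True)
                tmp = target.parent / f"tmp_{obj.name}"
                shutil.copy2(obj, tmp)   # write the full copy under a temporary name
                os.replace(tmp, target)  # then publish it with a single rename
            # The object is now readable from the shared cache, so drop the local copy.
            obj.unlink()
            handled += 1
        try:
            fanout.rmdir()  # remove the fan-out directory once it is empty
        except OSError:
            pass  # still holds files this sketch did not recognize
    return handled
```

Copying before unlinking means the object is always available in at least one of the two stores, which is why the order of the last three steps matters.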

Note: there was a bug that led to users downloading server data into their .git/objects directory instead of to the shared object cache. Having something copy loose objects and packs from their .git/objects directory could help get those enlistments under control.

@dscho
Member

dscho commented Jan 10, 2025

[...] copy the objects from .git/objects into the shared object cache.

I agree that this is likely the best idea.

I had a quick look at the existing code that performs a similar operation: tmp_objdir_migrate() (which uses finalize_object_file_flags() internally).

We would likely need to either adjust this code or copy and modify it, because moving the files one by one does not work. That remains true even if we ensure that, for packfiles, the .idx file is moved last (so that a separate git process cannot read the .idx file while the .pack file is still being copied, find the .pack file too short, and conclude that it is corrupt). The problem is that during the migration the .idx file is still in its original location, and if we then move or remove the corresponding .keep, .pack, or .rev file, any git process that has already read the original .idx file and wants to read a referenced object will likewise conclude that the repository is corrupt.
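The ordering constraints in this comment can be written down as a small Python sketch. `pack_migration_plan` is hypothetical: given a local `.idx` path, it returns a copy order that publishes the `.idx` last in the shared cache, and a delete order that removes the local `.idx` first. As the comment points out, this narrows but does not close the race: a process that already read the local `.idx` can still fail once the local `.pack` is gone.

```python
from pathlib import Path
from typing import List, Tuple

# Companion files named in the discussion; real packs may also carry other
# companions (for example .bitmap), which this sketch ignores.
COMPANIONS = (".pack", ".rev", ".keep")


def pack_migration_plan(idx_path: str) -> Tuple[List[Path], List[Path]]:
    """Return (copy_order, delete_order) for moving one pack between object stores."""
    idx = Path(idx_path)
    if idx.suffix != ".idx":
        raise ValueError(f"not a pack index: {idx}")
    others = [idx.with_suffix(ext) for ext in COMPANIONS
              if idx.with_suffix(ext).exists()]
    # Copy companions first and the .idx last, so the destination never
    # exposes an .idx whose .pack is still incomplete.
    copy_order = others + [idx]
    # Delete the .idx first, so new readers of the source stop finding this
    # pack before its companions disappear.
    delete_order = [idx] + others
    return copy_order, delete_order
```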

@mjcheetham self-assigned this Jan 14, 2025
mjcheetham added a commit that referenced this issue Jan 31, 2025
Introduce a new maintenance task, `cache-local-objects`, that operates
on Scalar or VFS for Git repositories with a per-volume, shared object
cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose
objects from the repository object directory to the shared cache.

Older versions of `microsoft/git` incorrectly placed packfiles in the
repository object directory instead of the shared cache; this task will
help clean up existing clones impacted by that issue.

Fixes #716