From 2d55248185575047fafe155e79b9b4e36a4e6ec6 Mon Sep 17 00:00:00 2001
From: Elijah Newren
Date: Tue, 26 Nov 2024 10:39:18 -0800
Subject: [PATCH] FAQ: wip

---
 Documentation/FAQ.md | 184 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 Documentation/FAQ.md

diff --git a/Documentation/FAQ.md b/Documentation/FAQ.md
new file mode 100644
index 00000000..e54a9682
--- /dev/null
+++ b/Documentation/FAQ.md
@@ -0,0 +1,184 @@
+# Frequently Answered Questions
+
+## Table of Contents
+
+  * [Why did `git-filter-repo` rewrite commit hashes?](#why-did-git-filter-repo-rewrite-commit-hashes)
+  * [Why did `git-filter-repo` rewrite more commit hashes than I expected?](#why-did-git-filter-repo-rewrite-more-commit-hashes-than-i-expected)
+  * [Why did `git-filter-repo` rewrite other branches too?](#why-did-git-filter-repo-rewrite-other-branches-too)
+  * [Help! Can I recover or undo the filtering?](#help-can-i-recover-or-undo-the-filtering)
+  * [Can you change `git-filter-repo` to allow future folks to recover from `--force`'d rewrites?](#can-you-change-git-filter-repo-to-allow-future-folks-to-recover-from---forced-rewrites)
+  * [Can I use `git-filter-repo` to fix a repository with corruption?](#can-i-use-git-filter-repo-to-fix-a-repository-with-corruption)
+  * [What kinds of problems does `git-filter-repo` not try to solve?](#what-kinds-of-problems-does-git-filter-repo-not-try-to-solve)
+
+
+## Why did `git-filter-repo` rewrite commit hashes?
+
+This is fundamental to how Git operates. In more detail...
+
+Each commit in Git is identified by a hash of its contents. Those
+contents include the commit message, the author (name, email, and time
+authored), the committer (name, email, and time committed), the
+toplevel tree hash, and the parent(s) of the commit. This means that if
+any of the commit fields change, including the tree hash or the hash of
+the parent(s) of the commit, then the hash for the commit will change.
+
+(The same is true for files ("blobs") and trees stored in Git as well;
+each is identified by a hash of its contents, so if anything in them
+changes, the commit hash will change.)
+
+If you attempt to write commit (or tree or blob) objects with an
+incorrect hash, Git will reject them as corrupt.
+
+## Why did `git-filter-repo` rewrite more commit hashes than I expected?
+
+There are two aspects to this, or two possible underlying questions users
+might be asking here:
+  * Why did commits newer than the ones I expected have their hash change?
+  * Why did commits older than the ones I expected have their hash change?
+
+For the first question, see [why filter-repo rewrites commit
+hashes](#why-did-git-filter-repo-rewrite-commit-hashes), and note that
+if you modify some old commit (for example, to remove a file), then that
+commit's hash must change. Further, since that commit will have a new
+hash, any other commit with that commit as a parent will need to have a
+new hash. That will need to chain all the way to the most recent
+commits in history. This is fundamental to Git and there is nothing you
+can do to change this.
+
+For the second question, if you are sure the filter you specified
+would not apply to the older commits, then the issue is probably that
+git-fast-export and git-fast-import (both of which git-filter-repo
+uses) canonicalize history in various ways. This means that even if
+you have no filter, these tools sometimes change commit hashes. This
+can happen in any of these cases:
+
+  * If you have signed commits, the signatures will be stripped
+  * If you have commits with extended headers, the extended headers will
+    be stripped (signed commits are actually a special case of this)
+  * If you have commits in an encoding other than UTF-8, they will by
+    default be re-encoded into UTF-8
+  * If you have a commit without an author, one will be added that
+    matches the committer
+  * If you have trees that are not canonical (e.g.
+    incorrect sorting
+    order), they will be canonicalized
+
+If this affects you and you really only want to rewrite newer commits in
+history, you can use the `--refs` argument to git-filter-repo to specify
+a range of history that you want rewritten.
+
+(For those attempting to be clever and use `--refs` for the first
+question: Note that if you attempt to only rewrite a few old commits,
+then all you'll succeed in is adding new commits that won't be part of
+any branch and will be subject to garbage collection. The branches will
+still hold on to the unrewritten versions of the commits. Thus, you
+have to rewrite all the way to the branch tip for the rewrite to be
+meaningful. Said another way, the `--refs` trick is only useful for
+restricting the rewrite to newer commits, never for restricting the
+rewrite to older commits.)
+
+## Why did `git-filter-repo` rewrite other branches too?
+
+git-filter-repo's name is git-filter-*repo*.
+
+It can restrict its rewriting to a subset of history, such as a single
+branch, using the `--refs` option. However, using that comes with the
+risk that one branch now has a different version of some commits than
+other branches do; usually, when you rewrite history, you want all
+branches that depended on what you are rewriting to be updated.
+
+## Help! Can I recover or undo the filtering?
+
+Sure, _if_ you followed the instructions. The instructions told you to
+make a fresh clone before running git-filter-repo. If you did that, you
+can just throw away your clone with the flubbed rewrite, and make a new
+clone.
+
+If you didn't make a fresh clone, and you didn't run with `--force`, you
+would have seen the following warning:
+```
+Aborting: Refusing to destructively overwrite repo history since
+this does not look like a fresh clone.
+[...]
+Please operate on a fresh clone instead. If you want to proceed
+anyway, use --force.
+```
+If you then added `--force`, well, you were warned.
+
+If you didn't make a fresh clone, and you ran with `--force`, and you
+didn't think to read the description of the `--force` option:
+```
+    Ignore fresh clone checks and rewrite history (an irreversible
+    operation, especially since it by default ends with an
+    immediate pruning of reflogs and old objects).
+```
+and you didn't read even the beginning of the manual
+```
+git-filter-repo destructively rewrites history
+```
+and you think it's okay to run a command with `--force` in it on something
+you don't have a backup of, then now is the time to reassess your life
+choices. `--force` should be a pretty clear warning sign.
+
+See also the next question.
+
+## Can you change `git-filter-repo` to allow future folks to recover from `--force`'d rewrites?
+
+This will never be supported.
+
+* Providing an alternate method to restore would require storing both
+  the original history and the new history, meaning that those who are
+  trying to shrink their repository size instead see it grow and have to
+  figure out extra steps to expunge the old history to see the actual
+  size savings. Experience with other tools showed that this was
+  frustrating and difficult to figure out for many users. It would also
+  mean that users who are trying to purge sensitive data from their
+  repository still find the sensitive data after the rewrite because it
+  hasn't actually been purged. In order to actually purge it, they have
+  to take extra steps, which again has made things difficult for users
+  in the past with other tools.
+
+* Providing an alternate method to restore would also mean trying to
+  figure out what should be backed up and how. The obvious choices used
+  by previous tools only actually provided partial backups (reflogs
+  would be ignored, for example, as would uncommitted changes whether
+  staged or not).
+  The only reasonable full backup mechanism is making a
+  separate clone, which is both expensive and something the user can and
+  should understand how to do on their own.
+
+* Providing an alternate method to restore would also mean providing
+  documentation on how to restore. Past methods by other tools in the
+  history rewriting space suggested that it was rather difficult for
+  users to figure out. Difficult enough, in fact, that users simply
+  didn't ever use them. They instead made a separate clone before
+  rewriting history and if they didn't like the rewrite, then they just
+  blew it away and made a new clone to work with. Since that was
+  observed to be the easy restoration method, I simply enforced it with
+  this tool, requiring users who look like they might not be operating
+  on a fresh clone to use the `--force` flag.
+
+But more than all that, if there were an alternate method to restore,
+why would you have needed to specify the `--force` flag? Doesn't its
+existence (and the wording of its documentation) make it pretty clear on
+its own that there isn't going to be a way to restore?
+
+## Can I use `git-filter-repo` to fix a repository with corruption?
+
+If `git fsck` reports warnings or errors, git-filter-repo may not be
+able to parse the affected objects...
+
+## What kinds of problems does `git-filter-repo` not try to solve?
+
+  * Filtering history but magically keeping the same commit IDs
+  * Bidirectional development between filtered and unfiltered repository (josh)
+  * Filtering based on the difference (a.k.a. patch or change) between commits (rebase)
+  * Conversion between different version control systems (reposurgeon)
+  * Having two people filter their clone of the repository (with the same
+    filtering command) and getting the same new commit IDs
+
+