From dbe026572a1bcfc5831ea062a073ba244539055d Mon Sep 17 00:00:00 2001
From: Elijah Newren
Date: Sat, 23 Nov 2024 13:38:49 -0800
Subject: [PATCH] wip

---
 Documentation/curated-examples-from-issues.md | 231 ++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100644 Documentation/curated-examples-from-issues.md

diff --git a/Documentation/curated-examples-from-issues.md b/Documentation/curated-examples-from-issues.md
new file mode 100644
index 00000000..ee8d17fb
--- /dev/null
+++ b/Documentation/curated-examples-from-issues.md
@@ -0,0 +1,231 @@
+# Curated examples from issues
+
+Lots of people have filed issues against git-filter-repo, and many of them
+boil down to questions of "How do I?" or "Why doesn't this work?"
+
+I thought I'd collect a bunch of these as example repository filterings
+that others may be interested in.
+
+## Table of Contents
+
+  * [Adding files to root commits](#adding-files-to-root-commits)
+  * [Purge a large list of files](#purge-a-large-list-of-files)
+
+## Adding files to root commits
+
+
+
+Here's an example that will take `/path/to/existing/README.md` and
+store it as `README.md` in the repository, and take
+`/home/myusers/mymodule.gitignore` and store it as `src/.gitignore` in
+the repository:
+
+```
+git filter-repo --commit-callback "if not commit.parents: commit.file_changes += [
+    FileChange(b'M', b'README.md', b'$(git hash-object -w '/path/to/existing/README.md')', b'100644'),
+    FileChange(b'M', b'src/.gitignore', b'$(git hash-object -w '/home/myusers/mymodule.gitignore')', b'100644')]"
+```
+
+Alternatively, you could also use the [insert-beginning contrib script](../contrib/filter-repo-demos/insert-beginning).
+
+## Purge a large list of files
+
+
+
+Put the names of all the files in some file (one per line), e.g.
+../DELETED_FILENAMES.txt, and then run
+
+```
+git filter-repo --invert-paths --paths-from-file ../DELETED_FILENAMES.txt
+```
+
+## Extracting a library to a separate repo
+
+
+
+```
+git filter-repo \
+    --path src/some-folder/some-feature \
+    --path-rename src/some-folder/some-feature/:src/
+```
+
+## Replace words in all commit messages
+
+
+
+```
+git filter-repo --message-callback 'return message.replace(b"stuff", b"task")'
+```
+
+## Only keep files from two branches
+
+
+
+Let's say you know that the files currently present on two branches
+are the only files that matter. Files that used to exist in either of
+these branches, or files that only exist on some other branch, should
+all be deleted from all versions of history. This can be accomplished
+by getting a list of files from each branch, combining them, sorting
+the list and picking out just the unique entries, then passing the
+result to `--paths-from-file`:
+
+```
+git ls-tree -r --name-only ${BRANCH1} >../my-files
+git ls-tree -r --name-only ${BRANCH2} >>../my-files
+sort ../my-files | uniq >../my-relevant-files
+git filter-repo --paths-from-file ../my-relevant-files
+```
+
+## Renormalize end-of-line characters and add a .gitattributes
+
+
+
+```
+contrib/filter-repo-demos/lint-history dos2unix
+[edit .gitattributes]
+contrib/filter-repo-demos/insert-beginning .gitattributes
+```
+
+## Remove spaces at the end of lines
+
+
+
+Remove trailing spaces and tabs from lines of non-binary files, and
+also strip trailing carriage returns:
+
+```
+git filter-repo --replace-text <(echo 'regex:[\r\t ]+(\n|$)==>\n')
+```
+
+## Having both exclude and include rules for filenames
+
+
+
+If you want to have rules to both include and exclude filenames, you
+can simply invoke `git filter-repo` multiple times. Alternatively,
+you can dispense with `--path` arguments and instead use the more
+generic `--filename-callback`.
+For example, to include all files under `src/` except for `src/README.md`:
+
+```
+git filter-repo --filename-callback '
+  if filename == b"src/README.md":
+    return None
+  if filename.startswith(b"src/"):
+    return filename
+  return None'
+```
+
+## Removing paths with a certain extension
+
+
+
+```
+git filter-repo --invert-paths --path-glob '*.xsa'
+```
+
+or
+
+```
+git filter-repo --filename-callback '
+  if filename.endswith(b".xsa"):
+    return None
+  return filename'
+```
+
+## Removing a directory
+
+
+
+```
+git filter-repo --path node_modules/electron/dist/ --invert-paths
+```
+
+## Convert from NFD filenames to NFC
+
+
+
+Given that macOS does UTF-8 normalization of filenames, and has
+historically switched which kind of normalization it does, users may
+have committed files with alternative normalizations. To convert
+filenames from NFD form to NFC, you could run
+
+```
+git filter-repo --filename-callback '
+  try:
+    return subprocess.check_output("iconv -f utf-8-mac -t utf-8".split(),
+                                   input=filename)
+  except Exception:
+    return filename
+'
+```
+
+or
+
+```
+git filter-repo --filename-callback '
+  import unicodedata
+  try:
+    return unicodedata.normalize("NFC", filename.decode("utf-8")).encode("utf-8")
+  except Exception:
+    return filename
+'
+```
+
+## Set the committer of the last few commits to myself
+
+
+
+```
+git filter-repo --refs main~5..main --commit-callback '
+  commit.committer_name = b"My Wonderful Self"
+  commit.committer_email = b"my@self.org"
+'
+```
+
+## Handling special characters, e.g. accents in names
+
+
+
+Since characters like ë and á are multi-byte characters and Python
+won't allow you to place those directly in a bytes literal
+(e.g. b"Raphaël González"), you just need to write a normal string and
+then encode it to a bytestring to handle these.
+For example, changing the author name and email where the author
+email is currently `example@test.com`:
+
+```
+git filter-repo --refs main~5..main --commit-callback '
+  if commit.author_email == b"example@test.com":
+    commit.author_name = "Raphaël González".encode()
+    commit.author_email = b"rgonzalez@test.com"
+'
+```
+
+
+   handling repository corruption (old original objects are corrupt)
+
+
+   removing all files with a backslash in them (final example is best)
+
+
+   replace a binary blob in history
+
+
+   callback for lint-history
+
+
+   using replace refs to delete old history
+
+
+   replacing pngs with compressed alternative
+   (#537 also used a change.blob_id thingy)
+
+
+
+   need for a multi-step filtering to avoid path collisions or ordering issues
+
+
+   Two things:
+     textwrap.dedent
+     easier example of using git-filter-repo as a library