Filter out duplicate fortunes #104

Open
wants to merge 3 commits into
base: master

Contributor

@inguin inguin commented Oct 31, 2024

Remove two completely redundant files from off/unrotated and a large number of exact duplicates (identified by script).

The cookie jars zippy and off/unrotated/zippy are virtually identical,
except for odd spelling fixes.
All quotes from off/unrotated/cookie are also included in cookie -- with
the one exception of "Shit Happens." which is included in
off/unrotated/vulgarity.
Remove any duplicate fortunes, identified by a script, preferring
occurrences in specific files over "cookie" or "miscellaneous".

For fortunes occurring in both the offensive and regular files, keep the
regular ones.
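
The preference rules in the last commit message could be expressed as a sort key over the locations of each duplicated fortune. This is only a sketch: the names `GENERIC_FILES` and `location_to_keep`, and the test for an `off/` directory component, are assumptions made for illustration and are not taken from the script used for this PR.

```python
from pathlib import PurePosixPath

# Catch-all jars named in the description above; treated as lowest priority.
GENERIC_FILES = {"cookie", "miscellaneous"}


def location_to_keep(locations):
    """Pick the copy to keep: regular before offensive, specific before generic."""
    def rank(location):
        path = PurePosixPath(location)
        # Assumption: offensive jars live somewhere under an "off" directory.
        is_offensive = "off" in path.parts[:-1]
        is_generic = path.name in GENERIC_FILES
        # False sorts before True, so regular and specific files win.
        return (is_offensive, is_generic, location)
    return min(locations, key=rank)
```

Every other location of the same fortune would then be the one to delete.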
@shlomif
Owner

shlomif commented Nov 6, 2024

@inguin : hi! Thanks for the pull-request! It looks fine at first glance, but I
need to write a script to validate it.

@inguin
Contributor Author

inguin commented Nov 6, 2024

Makes sense. This is the hacky Python script I used: https://gist.github.com/inguin/f4e8e5f5fb80c447527f596295edaf5a

It just reads all files into a dictionary (text -> locations), then prints those with multiple locations. Might be worth implementing as a CI test, but my Perl skills are a bit rusty.
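
For readers who cannot open the gist, the approach described above can be sketched roughly as follows. This is an illustration rather than the linked script: the function names `read_fortunes` and `find_duplicates` are hypothetical, and it assumes fortunes are separated by lines consisting solely of `%`.

```python
from collections import defaultdict
from pathlib import Path


def read_fortunes(path):
    """Yield each fortune in a file whose entries are separated by '%' lines."""
    entry = []
    content = Path(path).read_text(encoding="utf-8", errors="replace")
    for line in content.splitlines():
        if line.strip() == "%":
            yield "\n".join(entry)
            entry = []
        else:
            entry.append(line)
    if entry:  # trailing fortune without a closing '%'
        yield "\n".join(entry)


def find_duplicates(paths):
    """Map fortune text -> list of files containing it, keeping only repeats."""
    locations_by_text = defaultdict(list)
    for path in paths:
        for text in read_fortunes(path):
            locations_by_text[text].append(str(path))
    return {t: locs for t, locs in locations_by_text.items() if len(locs) > 1}
```

Comparing the exact text means that fortunes differing only in spelling or capitalization are not reported as duplicates.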

@inguin
Contributor Author

inguin commented Nov 6, 2024

Just realized that my script can be easily tweaked to check that I didn't accidentally remove too much. Just print out all unique fortunes into a merged and sorted file with:

for text in sorted(locations_by_text.keys()):
    print(text, end="%\n")

If I do that before and after my changes I see that the following are gone:

  • Empty fortunes (files beginning with %, or consecutive % lines)
  • Three redundant fortunes with typos (in favor of the fixed ones)
  • "Shit Happens." (in favor of "Shit happens." with different capitalization)
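
The same before/after check can also be done on sets directly instead of diffing two sorted files. A minimal sketch, assuming `%`-separated files; `unique_fortunes` and `lost_fortunes` are hypothetical names and not part of the gist:

```python
from pathlib import Path


def unique_fortunes(paths):
    """Collect the set of distinct fortune texts across '%'-separated files."""
    texts = set()
    for path in paths:
        content = Path(path).read_text(encoding="utf-8", errors="replace")
        entry = []
        for line in content.splitlines():
            if line.strip() == "%":
                texts.add("\n".join(entry))
                entry = []
            else:
                entry.append(line)
        if entry:  # trailing fortune without a closing '%'
            texts.add("\n".join(entry))
    return texts


def lost_fortunes(before_paths, after_paths):
    """Return fortunes present before the change but absent afterwards."""
    return sorted(unique_fortunes(before_paths) - unique_fortunes(after_paths))
```

An empty string in the result corresponds to an empty fortune, such as the one produced by two consecutive `%` lines.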

@shlomif
Owner

shlomif commented Nov 6, 2024

@inguin : thanks again. I'll try to take a look.

shlomif added a commit that referenced this pull request Nov 12, 2024
shlomif added a commit that referenced this pull request Nov 12, 2024
@shlomif
Owner

shlomif commented Nov 12, 2024

@inguin : hi! I used a tweaked version of that Python 3 script to create my own filtering changeset. Can you please license your code under MITL or similar ( https://github.com/shlomif/Freenode-programming-channel-FAQ/blob/master/FAQ_with_ToC__generated.md#i-want-to-release-my-code---which-open-source-licence-should-i-use )?

commit: 810a89e

thanks!

@inguin
Contributor Author

inguin commented Nov 14, 2024

> can you please license your code under MITL or similar

Sure, please feel free to use the script under whichever license is best for the project.

Is that sufficient or would you like me to make a commit adding a copyright header?


2 participants