Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type hints and migrate to argparse. Simplify async setup. #60

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Repsay
Copy link

@Repsay Repsay commented Aug 27, 2024

I have been working with my own improved version of demeuk. This version uses argparse for arguments and made some small improvements in the checking of the lines by using continue instead of stop. Also made use of Manager and Process to simplify the async setup. If any things need to change before push to master please let me know.

@GingerGeneste
Copy link
Contributor

Hi, thanks for your PR! The input is very much appreciated.

For our review process, it speeds things up if you separate your PR into 3 different PRs:

  • docopt -> argparse (wich is awesome btw!)
  • typehinting
  • async setup

One note on typehinting: I am kind of on the fence for supporting typehinting in this case. But having type-suggestions in your arguments is something I can agree with.
One note on async setup: I have not looked into detail of the async setup but we do like to keep the stdin functionality.

Looking forward to review your PRs !

@Repsay
Copy link
Author

Repsay commented Aug 27, 2024

Hi, thanks for your PR! The input is very much appreciated.

For our review process, it speeds things up if you separate your PR into 3 different PRs:

  • docopt -> argparse (wich is awesome btw!)
  • typehinting
  • async setup

One note on typehinting: I am kind of on the fence for supporting typehinting in this case. But having type-suggestions in your arguments is something I can agree with. One note on async setup: I have not looked into detail of the async setup but we do like to keep the stdin functionality.

Looking forward to review your PRs !

Hey, I explored the possibility of splitting the request and changes into multiple requests. However, the only thing that can be separated from the current request without overhauling the entire codebase is the type hinting, as many of the changes are interconnected and dependent on each other.

# Punct will be added using rules.
if len(token) > 1:
if tag != 'PUNCT' or tag != '.' or tag != '':
if tag != "PUNCT" and tag != "." and tag != "":
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug

bin/demeuk.py Outdated
HASH_HEX_REGEX = '^[a-fA-F0-9]+$'
MAC_REGEX = '^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$'
UUID_REGEX = '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
EMAIL_REGEX = r"[a-zA-Z0-9][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]{0,63}@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z]{2,6})+"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local part validation is stricter, only allowing valid characters in accordance with standard email formats.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixes #20

@@ -394,7 +460,44 @@ def remove_strip_punctuation(line, punctuation):
return False, line


def add_split(line, punctuation=(' ', '-', r'\.')):
def add_split(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more possible splits by default

@@ -820,15 +922,15 @@ def check_regex(line, regex):
true if all regexes match
false if line does not match regex
"""
for regex in regex:
for regex in regexes:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug or at least not clear usage

@@ -932,25 +1037,45 @@ def clean_encode(line, input_encoding):
# Single byte encodings. Also it is beter to not include iso encoding by default.
# https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings
# Input_encoding is by default [utf8]
fallback_encodings = [
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added fallback encoding

@Repsay
Copy link
Author

Repsay commented Aug 27, 2024

I have added some comments on the changes that are different then the migration of argparse and async.

@GingerGeneste
Copy link
Contributor

I have added some comments on the changes that are different then the migration of argparse and async.

Thanks, I am trying to find a way to review your PR as functional and quickly as possible so the comments help.

@GingerGeneste
Copy link
Contributor

Hi, thanks for your PR! The input is very much appreciated.
For our review process, it speeds things up if you separate your PR into 3 different PRs:

  • docopt -> argparse (wich is awesome btw!)
  • typehinting
  • async setup

One note on typehinting: I am kind of on the fence for supporting typehinting in this case. But having type-suggestions in your arguments is something I can agree with. One note on async setup: I have not looked into detail of the async setup but we do like to keep the stdin functionality.
Looking forward to review your PRs !

Hey, I explored the possibility of splitting the request and changes into multiple requests. However, the only thing that can be separated from the current request without overhauling the entire codebase is the type hinting, as many of the changes are interconnected and dependent on each other.

I will give it a try but behold for a lot of comments. I am considering splitting your PR myself. If we start with argparse and follow up from there i think it is doable.

@GingerGeneste GingerGeneste self-requested a review August 27, 2024 11:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants