-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
type hints and migrate to argparse. Simplify async setup. #60
base: master
Are you sure you want to change the base?
type hints and migrate to argparse. Simplify async setup. #60
Conversation
Hi, thanks for your PR! The input is very much appreciated. For our review process, it speeds things up if you separate your PR into 3 different PRs:
One note on typehinting: I am kind of on the fence for supporting typehinting in this case. But having type-suggestions in your arguments is something I can agree with. Looking forward to review your PRs ! |
Hey, I explored the possibility of splitting the request and changes into multiple requests. However, the only thing that can be separated from the current request without overhauling the entire codebase is the type hinting, as many of the changes are interconnected and dependent on each other. |
# Punct will be added using rules. | ||
if len(token) > 1: | ||
if tag != 'PUNCT' or tag != '.' or tag != '': | ||
if tag != "PUNCT" and tag != "." and tag != "": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug
bin/demeuk.py
Outdated
HASH_HEX_REGEX = '^[a-fA-F0-9]+$' | ||
MAC_REGEX = '^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$' | ||
UUID_REGEX = '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$' | ||
EMAIL_REGEX = r"[a-zA-Z0-9][a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]{0,63}@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z]{2,6})+" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The local part validation is stricter, only allowing valid characters in accordance with standard email formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixes #20
@@ -394,7 +460,44 @@ def remove_strip_punctuation(line, punctuation): | |||
return False, line | |||
|
|||
|
|||
def add_split(line, punctuation=(' ', '-', r'\.')): | |||
def add_split( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more possible splits by default
@@ -820,15 +922,15 @@ def check_regex(line, regex): | |||
true if all regexes match | |||
false if line does not match regex | |||
""" | |||
for regex in regex: | |||
for regex in regexes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug or at least not clear usage
@@ -932,25 +1037,45 @@ def clean_encode(line, input_encoding): | |||
# Single byte encodings. Also it is beter to not include iso encoding by default. | |||
# https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings | |||
# Input_encoding is by default [utf8] | |||
fallback_encodings = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added fallback encoding
I have added some comments on the changes that are different then the migration of argparse and async. |
Thanks, I am trying to find a way to review your PR as functional and quickly as possible so the comments help. |
I will give it a try but behold for a lot of comments. I am considering splitting your PR myself. If we start with argparse and follow up from there i think it is doable. |
…with special characters
…both directions without duplicates to lines to avoid redoing the same work in the future.
I have been working with my own improved version of demeuk. This version uses argparse for arguments and made some small improvements in the checking of the lines by using continue instead of stop. Also made use of Manager and Process to simplify the async setup. If any things need to change before push to master please let me know.