-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Another approach to parse tag string with spaces to avoid long running time on regex matching. #7625
base: develop
Are you sure you want to change the base?
Conversation
a7e99d7
to
e45df9e
Compare
…unning time on regex matching.
e45df9e
to
6366d33
Compare
samcli/cli/types.py
Outdated
parsed, parsed_tag = CfnTags._standard_key_value_parser(value) | ||
if not parsed: | ||
return False, None | ||
tags_dict = {**tags_dict, **parsed_tag} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (if parsed_tag
is of type dict
)
tags_dict = {**tags_dict, **parsed_tag} | |
tags_dict.update(parsed_tag) |
samcli/cli/types.py
Outdated
for value in tag_value.split(): | ||
parsed, parsed_tag = CfnTags._standard_key_value_parser(value) | ||
if not parsed: | ||
return False, None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we're also returning a boolean here? Can we just return None
, or an empty dict
instead of returning a tuple
?
samcli/cli/types.py
Outdated
# first, remove all the spaces and start over again. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there missing code here, or maybe duplicate comment? This comment and the one under say "first"
samcli/cli/types.py
Outdated
|
||
# Next, looking for a quote strings that contain spaces and proceed to replace them | ||
quoted_strings_with_spaces = re.findall(self._quoted_pattern, modified_val) | ||
quoted_strings_without_spaces = [self._replace_spaces(s) for s in quoted_strings_with_spaces] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no single character variable names
samcli/cli/types.py
Outdated
for group in groups: | ||
key, v = group | ||
self._add_value(result, _unquote_wrapped_quotes(key), _unquote_wrapped_quotes(v)) | ||
# Instead of parsing a full {tag}={tag} pattern, we will try to look for quoted string with spaces |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we refactor this new logic out to a separate method to unit test? There's a lot of moving parts here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky because the block below
groups = re.findall(self._pattern, val)
take longer time to return
@@ -194,6 +219,7 @@ def __init__(self, multiple_values_per_key=False): | |||
TAG_REGEX = '[A-Za-z0-9\\"_:\\.\\/\\+-\\@=]' | |||
|
|||
_pattern = r"{tag}={tag}".format(tag=_generate_match_regex(match_pattern=TAG_REGEX, delim=" ")) | |||
_quoted_pattern = _generate_quoted_match_regex(match_pattern=TAG_REGEX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse or refactor the old _generate_match_regex
? They share similar return values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have different regex, hence creating a new function.
samcli/cli/types.py
Outdated
# Finally, restart the parsing with key=value separated by (multiple) spaces. | ||
parsed, tags = self._multiple_space_separated_key_value_parser(modified_val) | ||
if parsed: | ||
for k in tags: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
for k in tags: | |
for key, value in tags.items(): |
space_positions = [i for i, char in enumerate(text) if char == " "] | ||
modified = text.replace(" ", replacement) | ||
|
||
return modified, space_positions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on using dataclasses for return objects, instead of returning tuples and performing array indexes to get the modified strings/replacement index array?
I imagine that changing things to dataclasses will make it easier for future readers to understand what is happening quickly.
What does this mean? Does this change how tags works for customers? |
Which issue(s) does this change fix?
#6657
Why is this change necessary?
Previous fix (fix: improve tag parsing performance for list input #7049) only handle the case when
tags
are provided throughsamconfig.toml
, which resulted into a list toCfnTags
, but it doesn't fix the case when customer specify--tags
parameter. In the later case, the provided value will be in a string, and we will face the issue withre.findall
again, mainly due to the complex regex provided to parse{tag}={tag}
pattern intags
.The solution is instead of parsing a full complex regex with
{tag}={tag}
, we will look for any quote strings with space in the value (e.g. "Test App"), replace the space with some replacement, and restart the parsing process.How does it address the issue?
Run
sam deploy
with--tags tag1="value 1" tag2="value 2"
What side effects does this change have?
Customer might not be able to set tags properly during
sam deploy
.Mandatory Checklist
PRs will only be reviewed after checklist is complete
make pr
passesmake update-reproducible-reqs
if dependencies were changedBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.