fix(normalization): fix for short strings
eroux committed Feb 21, 2024
1 parent 84e7d3e commit 85a3155
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions botok/utils/unicode_normalization.py
@@ -182,6 +182,8 @@ def is_suffix(char):


 def normalize_invalid_start_string(s):
+    if len(s) < 2:
+        return s
     # we put the vowel in second place if the string starts with a vowel
     if is_vowel(s[0]) and not is_vowel(s[1]) and not is_suffix(s[1]):
         return s[1] + s[0] + (s[2:] if len(s) > 2 else "")
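
A minimal sketch of why the length guard matters. The `is_vowel` and `is_suffix` helpers below are simplified, hypothetical stand-ins (the real botok definitions are not shown in this diff), and the trailing `return s` is an assumption about the part of the function that is collapsed here. Without the guard, `s[0]` raises `IndexError` on an empty string, and `s[1]` raises it on a one-character string that starts with a vowel.

```python
# Hypothetical stand-ins: the real is_vowel / is_suffix in botok are not shown in this diff.
TIBETAN_VOWEL_SIGNS = {"\u0f72", "\u0f74", "\u0f7a", "\u0f7c"}  # vowel signs i, u, e, o


def is_vowel(char):
    return char in TIBETAN_VOWEL_SIGNS


def is_suffix(char):
    # Assumption: always False, to keep the sketch small.
    return False


def normalize_unguarded(s):
    # Behaviour before this commit: indexes s[0] and s[1] unconditionally.
    if is_vowel(s[0]) and not is_vowel(s[1]) and not is_suffix(s[1]):
        return s[1] + s[0] + s[2:]
    return s  # assumption: the rest of the real function is collapsed in the diff


def normalize_guarded(s):
    # Behaviour after this commit: strings shorter than 2 characters pass through.
    if len(s) < 2:
        return s
    if is_vowel(s[0]) and not is_vowel(s[1]) and not is_suffix(s[1]):
        return s[1] + s[0] + s[2:]
    return s  # assumption, as above


if __name__ == "__main__":
    lone_vowel = "\u0f72"  # a single vowel sign, no base letter
    try:
        normalize_unguarded(lone_vowel)
    except IndexError as exc:
        print("unguarded:", type(exc).__name__)
    print("guarded returns input unchanged:", normalize_guarded(lone_vowel) == lone_vowel)
```

The sketch writes `s[2:]` without the `len(s) > 2` conditional, since slicing past the end of a Python string already yields `""`; the conditional in the committed code is harmless but not required.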
