MNT: decode and then slice the prefix off #21

tacaswell · 2024-10-11T13:53:12Z

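The length of a string as a string (i.e. its number of Unicode code points) only matches its length in bytes when encoded as UTF-8 for ASCII characters (U+0000 to U+007F). Every other code point takes 2 to 4 bytes, so slicing the encoded bytes with len(prefix) can cut a multi-byte character in half.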
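
The snowman below is one case of this. As a rough illustration, this sketch compares len(str) with the UTF-8 byte length for one character from each range. The sample characters are arbitrary choices for illustration and do not come from this PR.

```python
# Compare code-point length (len(str)) with UTF-8 byte length for one
# character per range. Only the ASCII character has equal lengths.
samples = {
    "ascii": "a",             # U+0061
    "latin-1": "\u00e9",      # é, U+00E9
    "bmp": "\N{SNOWMAN}",     # ☃, U+2603
    "astral": "\U0001F600",   # 😀, U+1F600
}

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, ch in samples.items():
    print(f"{name:8} U+{ord(ch):04X} len(str)={len(ch)} len(bytes)={utf8_lengths[name]}")
```

Each sample is a single code point, but the byte lengths are 1, 2, 3 and 4. Even U+00E9 from Latin-1 needs two bytes in UTF-8.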

In [3]: '\N{Snowman}'
Out[3]: '☃'

In [5]: '\N{Snowman}'.encode()
Out[5]: b'\xe2\x98\x83'

In [6]: prefix = '\N{Snowman}'

In [7]: a = prefix + '_bob'

In [8]: a
Out[8]: '☃_bob'

In [9]: a.encode()[len(prefix):].decode()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[9], line 1
----> 1 a.encode()[len(prefix):].decode()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x98 in position 0: invalid start byte

In [10]: len(prefix)
Out[10]: 1

In [11]: len(prefix.encode())
Out[11]: 3
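
Here is a minimal sketch of the approach named in the title (decode, then slice the prefix off). It assumes the input arrives as UTF-8 bytes. strip_prefix and strip_prefix_bytes are hypothetical helpers for illustration and are not the code changed in this PR. The startswith check is an added assumption; the session above slices without one.

```python
def strip_prefix(raw: bytes, prefix: str, encoding: str = "utf-8") -> str:
    """Decode first, then drop `prefix` measured in code points (hypothetical helper)."""
    text = raw.decode(encoding)
    if not text.startswith(prefix):
        raise ValueError(f"{text!r} does not start with {prefix!r}")
    # len(prefix) counts code points, which is the right unit for a str slice.
    return text[len(prefix):]


def strip_prefix_bytes(raw: bytes, prefix: str, encoding: str = "utf-8") -> str:
    """Alternative: stay in bytes but measure the prefix in encoded bytes (hypothetical helper)."""
    encoded_prefix = prefix.encode(encoding)
    if not raw.startswith(encoded_prefix):
        raise ValueError(f"{raw!r} does not start with {encoded_prefix!r}")
    # len(encoded_prefix) counts bytes, so the cut lands on a character boundary.
    return raw[len(encoded_prefix):].decode(encoding)


prefix = "\N{SNOWMAN}"
a = prefix + "_bob"
print(strip_prefix(a.encode(), prefix))        # _bob
print(strip_prefix_bytes(a.encode(), prefix))  # _bob
```

Both helpers avoid the UnicodeDecodeError above because the slice length and the sequence being sliced use the same unit. On Python 3.9+, str.removeprefix is another option once the text is decoded. Note that it returns the string unchanged, rather than raising, when the prefix is absent.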

tacaswell added 2 commits October 11, 2024 09:45

MNT: decode and then slice the prefix off

e3e03c5

STY: placate the linter about quotes

b34f801
