In general, transform punctuation with >1 character to single-character #12

kayaulai · 2022-12-18T14:34:40Z

Perhaps we could map them onto a fixed list of ASCII characters?

kayaulai · 2022-12-19T04:39:08Z

My suggestion for this:

BEFORE getting the boundary lists, add all the non-boundary markers to the Utterance column. This means every line MUST end with a boundary marker.
After that, we can assign an arbitrary Unicode character to each boundary type, since there is no risk of the final character being anything other than a boundary marker.
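
A rough sketch of that idea in Python, not the project's actual code. The marker list, the function names, and the choice of the Unicode Private Use Area (starting at U+E000) are all assumptions made for illustration; any characters that cannot occur in the transcripts would work equally well.

```python
# Hypothetical marker inventory; the real list would come from the project's data.
BOUNDARY_MARKERS = ["<COMMA>", "<PERIOD>", "<QUESTION>"]


def assign_codes(markers, start=0xE000):
    """Map each multi-character marker to one Private Use Area character.

    Starting at U+E000 is an assumption, chosen so the codes are unlikely
    to collide with characters in the utterances themselves.
    """
    return {marker: chr(start + i) for i, marker in enumerate(markers)}


def ends_with_boundary(utterance, markers):
    """Return True if the last whitespace-separated token is a boundary marker."""
    tokens = utterance.split()
    return bool(tokens) and tokens[-1] in markers


codes = assign_codes(BOUNDARY_MARKERS)

# Check the precondition before transforming: every line must end with a marker.
lines = ["我説了三句 <PERIOD>", "我説了三句"]
missing = [line for line in lines if not ends_with_boundary(line, BOUNDARY_MARKERS)]
```

Here `missing` collects the lines that violate the "every line MUST end with a boundary marker" rule, so they can be fixed before any characters are assigned.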

kayaulai · 2023-01-06T20:52:24Z

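Example (the utterance means "I said three sentences"):

我説了三句 <PERIOD>

Mapping (逗 and 句 are the first characters of the Chinese words for "comma" and "period"):

<COMMA> > 逗
<PERIOD> > 句

After the transformation:

我説了三句句

Transform back (in the record) to:

我説了三句 <PERIOD>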
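
The round trip in this example could look something like the following Python sketch. The mapping is the one given above; the function names `encode` and `decode` are hypothetical, and the sketch assumes the marker is separated from the text by a single space. Decoding looks only at the final character, which is why an utterance that itself ends in 句 does not cause confusion.

```python
# Mapping taken from the example above; the function names are hypothetical.
TO_SINGLE = {"<COMMA>": "逗", "<PERIOD>": "句"}
TO_MULTI = {single: multi for multi, single in TO_SINGLE.items()}


def encode(utterance):
    """Replace the trailing ' <MARKER>' with its single-character code."""
    text, _, marker = utterance.rpartition(" ")
    if marker not in TO_SINGLE:
        raise ValueError(f"line does not end with a boundary marker: {utterance!r}")
    return text + TO_SINGLE[marker]


def decode(encoded):
    """Restore the multi-character marker, using the final character only."""
    text, last = encoded[:-1], encoded[-1]
    if last not in TO_MULTI:
        raise ValueError(f"final character is not a boundary code: {encoded!r}")
    return f"{text} {TO_MULTI[last]}"


encoded = encode("我説了三句 <PERIOD>")
restored = decode(encoded)
```

Because `encode` rejects any line that does not end with a known marker, `decode` can safely treat the last character as the boundary code.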

kayaulai assigned JayyyyLee Dec 18, 2022

kayaulai mentioned this issue Jan 6, 2023

In the reports, transform the boundaries back to the multiple-character version (if any) #20

Open