properly handle tags in annotate transcription #16

ggdhines-zz · 2015-09-15T11:24:41Z

I need to make sure that tags in annotate (and, in the future, Folger) are aggregated correctly. A big part of the problem is that MAFFT, which I use to align multiple transcriptions, only accepts basic ASCII characters, and even some of those aren't allowed (spaces, for example). So representing tags as special characters is a problem.

The solution is to convert all normal characters to lower case and reserve upper case characters for special meanings. For example, "A" would represent a space, "B" could represent "", etc. (I have already implemented this.) In theory this could misalign the text, since case can matter, but in practice I highly doubt it ever will. The benefit is that aggregating the tags becomes as simple as aggregating the text. The challenge is that for the final text we need to go back to the individual transcriptions and decide case on a character-by-character basis. (I still need to do this.)

Also, for Folger there are too many tags: we would run out of capital letters. So I think I will use "B" (for example) to represent all tags and then, at the end, go back and vote on what each tag should actually be, just like deciding on case.
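A rough sketch of the scheme described above, not the actual implementation. The function names, the tag strings such as `[unclear]`, and the `-` gap character are assumptions made for illustration. The MAFFT alignment step itself is not shown: `aggregate` expects rows that are already aligned to equal length.

```python
from collections import Counter

SPACE_TOKEN = "A"  # "A" stands for a space, as in the issue text
TAG_TOKEN = "B"    # single placeholder for every tag (the Folger variant)


def encode(text, tags):
    """Return (encoded string, list of original tokens, one per encoded char).

    `tags` is a hypothetical list of literal tag strings to look for.
    """
    tags = sorted(tags, key=len, reverse=True)  # try longer tags first
    encoded, originals = [], []
    i = 0
    while i < len(text):
        tag = next((t for t in tags if text.startswith(t, i)), None)
        if tag is not None:
            encoded.append(TAG_TOKEN)
            originals.append(tag)
            i += len(tag)
        elif text[i] == " ":
            encoded.append(SPACE_TOKEN)
            originals.append(" ")
            i += 1
        else:
            # Lower-case everything else so capitals stay reserved.
            encoded.append(text[i].lower())
            originals.append(text[i])
            i += 1
    return "".join(encoded), originals


def aggregate(aligned_rows, originals_per_row, gap="-"):
    """Vote per column on the encoded character, then on case / tag identity."""
    cursors = [0] * len(aligned_rows)  # position in each row's originals
    out = []
    for col in range(len(aligned_rows[0])):
        column = [row[col] for row in aligned_rows]
        winner = Counter(column).most_common(1)[0][0]
        candidates = Counter()
        for r, ch in enumerate(column):
            if ch == gap:
                continue  # a gap consumes no original token
            if ch == winner:
                candidates[originals_per_row[r][cursors[r]]] += 1
            cursors[r] += 1
        if winner != gap:
            out.append(candidates.most_common(1)[0][0])
    return "".join(out)


tags = ["[unclear]", "[deletion]"]  # hypothetical tag strings
texts = ["The [unclear]Cat", "the [unclear]cat", "The [deletion]cat"]
pairs = [encode(t, tags) for t in texts]
rows = [p[0] for p in pairs]       # already equal length, so no alignment needed here
originals = [p[1] for p in pairs]
final_text = aggregate(rows, originals)
```

Voting on the original token only among the rows that agree with the winning encoded character keeps the case and tag decision tied to the aligned column. Ties fall to whichever candidate was seen first, which a real implementation would probably want to handle more deliberately.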

ggdhines-zz added the enhancement label Sep 15, 2015

ggdhines-zz self-assigned this Sep 15, 2015

ggdhines-zz mentioned this issue Sep 15, 2015

dealing with foreign text zooniverse/AnnoTate#81

Closed
