Looking for suggestions for XML translation #100
-
I think the most promising approach is the one described in section 3.1 of the paper. E.g.
I can't make sense of what the authors mean by using a
-
I’ve been doing some preliminary testing with tag injection. It’s slow but seems to work well! One thing the paper doesn't mention, but which appears to be necessary, is incorporating translation confidence (Hypothesis.score) to avoid matching text that the seq2seq model “fixes”.
This is because "g who was scared of swimming" can be translated by the seq2seq model as "que tenía miedo de nadar".
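A minimal sketch of how the score check could fit into tag injection, assuming the fragment inside the tag is translated on its own and then located in the full translation. The `translate` callable, the `inject_tag` name, and the `min_score` threshold are hypothetical; with Argos Translate the score would come from Hypothesis.score, assumed here to be higher-is-better.

```python
from typing import Callable, Optional, Tuple

# Hypothetical translator signature: text -> (translation, score).
# Assumption: a higher score means the model is more confident.
Translator = Callable[[str], Tuple[str, float]]


def inject_tag(
    before: str,
    inner: str,
    after: str,
    tag: str,
    translate: Translator,
    min_score: float = -1.0,  # assumed threshold, would need tuning per model
) -> Optional[str]:
    """Translate before + inner + after and wrap the translated inner text in <tag>.

    Returns None when the tag cannot be placed with enough confidence.
    """
    full, _ = translate(before + inner + after)
    inner_translated, score = translate(inner)

    # A low score suggests the model "fixed" the fragment, so its
    # translation may not correspond to the span in the full sentence.
    if score < min_score:
        return None

    start = full.find(inner_translated)
    if start == -1:
        # The fragment translation does not appear verbatim in the output.
        return None

    end = start + len(inner_translated)
    return f"{full[:start]}<{tag}>{inner_translated}</{tag}>{full[end:]}"
```

Returning None rather than guessing a position keeps low-confidence pairs out of the generated data, at the cost of discarding some usable sentences.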
-
I was able to generate ~3000 lines of tag-injected data by running on a CPU for a month: https://github.com/argosopentech/tag-injection
-
Repost from OpenNMT Forum.
I'm looking into adding XML support to Argos Translate (#23).
The difficult part is that tags in the source sentence need to be placed correctly in the target sentence, e.g.:
This clearly needs to be done by the seq2seq model because words within a tag need to be translated in the context of the surrounding words.
I've tried writing some code to normalize tags in the input dataset into a standard format:
Then at inference I could use the standardized tags to place tags in the output. The issue with this is that most of the data I'm currently using for Argos Translate contains only a handful of tags in this format, which is likely not sufficient.
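A minimal sketch of the normalize-then-restore idea, assuming well-formed, properly nested input. The placeholder format (`<t0>`, `<t1>`, ...) and the names `normalize_tags` and `restore_tags` are assumptions for illustration, not the format used in the code mentioned above.

```python
import re
from typing import Dict, List, Tuple

# Matches opening, closing, and self-closing tags, with optional attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z][\w:.-]*)([^<>]*)>")


def normalize_tags(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace every tag with a numbered placeholder and remember the original."""
    originals: Dict[str, str] = {}
    stack: List[int] = []  # ids of currently open tags
    counter = 0

    def repl(match):
        nonlocal counter
        closing, attrs = match.group(1), match.group(3)
        if closing:
            # Assumes proper nesting: this closes the most recently opened tag.
            idx = stack.pop()
            placeholder = f"</t{idx}>"
        else:
            idx = counter
            counter += 1
            if attrs.rstrip().endswith("/"):
                placeholder = f"<t{idx}/>"  # self-closing tag
            else:
                stack.append(idx)
                placeholder = f"<t{idx}>"
        originals[placeholder] = match.group(0)
        return placeholder

    return TAG_RE.sub(repl, text), originals


def restore_tags(text: str, originals: Dict[str, str]) -> str:
    """Swap the placeholders in translated text back to the original tags."""
    for placeholder, original in originals.items():
        text = text.replace(placeholder, original)
    return text
```

The model would be trained and run on the normalized text, and `restore_tags` would be applied to its output. This only works if the model reproduces the placeholders unchanged, which is why enough training data in the placeholder format matters.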
My current plan is to try to find/generate more data in this format but any suggestions for better strategies are greatly appreciated!
Reference:
http://www.statmt.org/wmt20/pdf/2020.wmt-1.138.pdf