Skip to content

Research project dealing with Optimal Transport for cross-lingual Word Alignement problem.

Notifications You must be signed in to change notification settings

codebypol/NLP-OT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NLP-OT

This repository contains the work done during a research project.

The main.ipynb notebook contains the final report and the code that we used to get our results and the changes that we tested.

Full report available here (in French only).

Oral presentation can be found here, also only available in French.

Abstract

There are few precise studies on the use of Optimal Transport in word alignment problems between sentences of different languages. Our study follows on from the work of Zi-Yi Dou and Graham Neubig, who introduced this method into word alignment problems without truly exploring the details of OT. We have studied the implications of the Sinkhorn algorithm and the construction of the cost matrix in the approximation of the Optimal Transport Plan. By observing the different successive steps of the Dou and Neubig method, we propose an optimisation of the Sinkhron algorithm through a study on the entropy regularisation parameter. Then, we propose a modification of the construction of the cost matrix, by defining two new distances. We conclude our study by proposing a new method of alignment extraction, more flexible and adapted to the different distributions.

Content

  • Introduction
    • Embedding spaces
    • Multilingual embedding spaces
    • Earth mover's distance
  • Word Alignment
  • Optimal transport
  • Our research
    • Entropic regularization
    • A deeper analysis of the cost matrix
  • Conclusion

About

Research project dealing with Optimal Transport for cross-lingual Word Alignement problem.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages