Neural Nets for Machine Reading Comprehension (BiDAF)

Machine Comprehension (MC), also called Machine Reading Comprehension (MRC) or Question Answering (QA), covers models that read a document and answer questions about it. The task is fairly easy for a human, but it remains difficult for AI models.

Interactive demo by the authors of the paper [2].

Model

Layers of the model:

- Embedding layers (3 levels of granularity):
  - Character embedding layer
  - Word embedding layer
  - Contextual embedding layer
- Attention and modeling layers: fuse information from the context and the query
- Output layer: predicts the start and end indexes of the answer span
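The output layer produces one probability per context token for the answer start and one for the answer end. Picking each index independently can give an end that comes before the start, so the BiDAF paper [2] selects the pair with start <= end that maximizes the product of the two probabilities, which can be done in a single pass. The sketch below illustrates that decoding step in plain Python. The function name `best_span` and the toy probabilities are assumptions made for illustration and are not taken from this repository's code.

```python
def best_span(p_start, p_end):
    """Return (start, end, score) maximizing p_start[i] * p_end[j] with i <= j."""
    if not p_start or len(p_start) != len(p_end):
        raise ValueError("p_start and p_end must be non-empty and equally long")

    best = (0, 0, p_start[0] * p_end[0])
    best_start = 0  # index of the largest start probability seen so far
    for end in range(len(p_end)):
        # The best start for a span ending at `end` is the most probable
        # start among positions 0..end, so one running maximum is enough.
        if p_start[end] > p_start[best_start]:
            best_start = end
        score = p_start[best_start] * p_end[end]
        if score > best[2]:
            best = (best_start, end, score)
    return best


# Toy distributions over a 4-token context (hypothetical values).
p_start = [0.1, 0.6, 0.2, 0.1]
p_end = [0.5, 0.1, 0.3, 0.1]

start, end, score = best_span(p_start, p_end)
print(start, end)  # prints "1 2"
```

In this example the independent argmax would return start 1 and end 0, which is not a valid span. The constrained search returns tokens 1 to 2 instead.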

See the original implementation of BiDAF.

Dataset

The dataset used is TriviaQA [1].
Create a new directory: mkdir dataset
Create a directory for the TriviaQA dataset: mkdir dataset/triviaqa
Download the data from the TriviaQA website or with: wget https://nlp.cs.washington.edu/triviaqa/data/triviaqa-rc.tar.gz
Then extract it with: tar -xf triviaqa-rc.tar.gz -C dataset/triviaqa

SQuAD dataset (1.1 version)

Create a new directory: mkdir dataset/squad
Download the data with: wget https://www.wolframcloud.com/objects/6b06e230-f56a-4244-8f23-382e74440a15
Or (preferred, but check the dataset path in the code):
Train set: wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json -O dataset/squad/train-v1.1.json
Dev set: wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -O dataset/squad/dev-v1.1.json
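The SQuAD v1.1 JSON files nest the data as articles, then paragraphs, then questions with their answers, and each answer stores its text together with a character offset into the paragraph. The sketch below shows one way to flatten that structure into one record per answer. The helper names `iter_squad_examples` and `load_squad` are hypothetical and are not the functions used in preprocessing.py. The small in-memory sample only mimics the file layout.

```python
import json


def iter_squad_examples(squad):
    """Yield one flat dict per answer from a parsed SQuAD v1.1 JSON object."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]  # character offset in context
                    yield {
                        "id": qa["id"],
                        "question": qa["question"],
                        "context": context,
                        "answer_text": answer["text"],
                        "answer_start": start,
                        "answer_end": start + len(answer["text"]),
                    }


def load_squad(path):
    """Read a SQuAD JSON file, e.g. dataset/squad/train-v1.1.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Tiny made-up sample with the same nesting as the real files.
context = "BiDAF was introduced in 2017."
sample = {
    "version": "1.1",
    "data": [{
        "title": "BiDAF",
        "paragraphs": [{
            "context": context,
            "qas": [{
                "id": "q1",
                "question": "When was BiDAF introduced?",
                "answers": [{"text": "2017", "answer_start": context.index("2017")}],
            }],
        }],
    }],
}

examples = list(iter_squad_examples(sample))
for ex in examples:
    # The offsets should slice the answer text back out of the context.
    print(ex["id"], ex["context"][ex["answer_start"]:ex["answer_end"]])
```

With the real files, `iter_squad_examples(load_squad("dataset/squad/train-v1.1.json"))` would walk the training set, assuming the download paths shown above. The character offsets still have to be mapped to token indexes before they can serve as start and end targets for the output layer.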

Dependencies

tensorflow-gpu 2.0.1
gensim 3.8.0
numpy 1.18.1

Dependencies can be installed with: pip install -r requirements.txt

Create a new directory: mkdir glove
Download the pretrained GloVe vectors [3]: wget https://nlp.stanford.edu/data/glove.6B.zip
Extract them: unzip glove.6B.zip -d ./glove
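Each extracted GloVe file is plain text with one token per line, followed by its space-separated vector components. As a rough sketch of how such a file can be read without extra libraries, the parser below turns those lines into a dictionary. The function name `parse_glove` and the optional `vocab` filter are assumptions for illustration. The repository may load the vectors differently, for example through gensim.

```python
def parse_glove(lines, vocab=None):
    """Parse GloVe text lines ("word v1 v2 ... vd") into {word: [floats]}.

    If `vocab` is given, only words contained in it are kept.
    """
    vectors = {}
    dim = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue
        vector = [float(v) for v in values]
        if dim is None:
            dim = len(vector)
        elif len(vector) != dim:
            raise ValueError(f"expected {dim} values for {word!r}, got {len(vector)}")
        vectors[word] = vector
    return vectors


# Made-up 3-dimensional lines in the same format as the real files.
sample_lines = [
    "the 0.1 -0.2 0.3\n",
    "cat 0.4 0.5 -0.6\n",
]
vectors = parse_glove(sample_lines)
print(sorted(vectors))  # prints "['cat', 'the']"
```

For the real data, an open file handle can be passed directly, for instance `parse_glove(open("glove/glove.6B.100d.txt", encoding="utf-8"))`, where the 100-dimensional file is only one of the sizes shipped in the archive. Passing the training vocabulary as `vocab` keeps memory use down.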

References

[1] Mandar Joshi, Eunsol Choi, Daniel S. Weld, Luke Zettlemoyer (2017). TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Association for Computational Linguistics (ACL). Vancouver, Canada.

[2] Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi (2017). Bidirectional Attention Flow for Machine Comprehension. CoRR.

[3] Jeffrey Pennington, Richard Socher, Christopher D. Manning (2014). GloVe: Global Vectors for Word Representation. Empirical Methods in Natural Language Processing (EMNLP).

[4] Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber (2015). Highway Networks. CoRR.

[5] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang (2016). SQuAD: 100,000+ Questions for Machine Comprehension of Text. Empirical Methods in Natural Language Processing (EMNLP).

