This repo contains the code for the experiments presented in the following paper:
Nora Hollenstein, Federico Pirovano, Lena Jäger, Ce Zhang & Lisa Beinborn. "Multilingual language models predict human reading behavior". NAACL 2021.
The repository is divided into three folders:
- processing: contains the Python package of the project;
- params: contains example parameter configurations for training and testing;
- scripts: contains the runnable Python scripts of the project.
The Python version used is 3.7.7.
Up to three folders have to be specified:
- --data_gaze_dir: will contain the folders for each dataset. The script expects this folder to contain one dataset folder for each task specified in the arguments; the name of the folder and the name of the task need to match;
- --results_gaze_dir: will be created to contain the results of the gaze prediction task of the project;
- --params_gaze_dir: will contain the JSON configuration files for the gaze prediction task.
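For illustration, the sketch below shows how the task names passed on the command line might be matched against dataset folders under --data_gaze_dir; the task names and the checking logic here are assumptions made for this example, not the repository's actual code.

    # Illustration only: each task name must correspond to a folder under --data_gaze_dir
    # (the path-checking logic is an assumption, not the repository's actual code).
    import os

    data_gaze_dir = "data/gaze/"
    tasks = ["dundee", "geco", "geco-nl"]

    for task in tasks:
        dataset_dir = os.path.join(data_gaze_dir, task)  # e.g. data/gaze/dundee
        if not os.path.isdir(dataset_dir):
            print(f"Missing dataset folder for task '{task}': {dataset_dir}")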
The runnable scripts are: combine_datasets.py, normalize_data.py, train.py, and test.py.
combine_datasets.py combines the datasets passed as options into one dataset under the name all.
The other scripts are meant to be run in order: first normalize_data.py, then train.py, and finally test.py.
The normalize_data.py script converts and saves the data from its dataset-specific format into a shared format that can be read by the GazeDataset class.
Example:
python scripts/gaze/normalize_data.py --data_gaze_dir data/gaze/ --tasks geco-nl
python scripts/gaze/combine_datasets.py --data_gaze_dir data/gaze/ --tasks dundee geco zuco-all --percentage 0.8
python scripts/gaze/train.py --data_gaze_dir data/gaze/ --results_gaze_dir results/gaze/ --tasks geco-nl --mlflow_dir mlruns/ --params_gaze_dir params/gaze/dutch/
python scripts/gaze/test.py --data_gaze_dir data/gaze/ --results_gaze_dir results/gaze/ --tasks geco-nl
The eye-tracking features are predicted in the following order: n_fix, first_fix_dur, first_pass_dur, total_fix_dur, mean_fix_dur, fix_prob, n_refix, reread_prob.
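As a rough illustration of what the normalized data contains, the snippet below reads a hypothetical normalized file and checks for these feature columns; the file name, file format, and column layout are assumptions for this example, not the exact output of normalize_data.py.

    # Hypothetical example: inspect a normalized dataset (file name and format are assumed).
    import pandas as pd

    FEATURES = ["n_fix", "first_fix_dur", "first_pass_dur", "total_fix_dur",
                "mean_fix_dur", "fix_prob", "n_refix", "reread_prob"]

    df = pd.read_csv("data/gaze/geco-nl/normalized.csv")  # assumed path
    missing = [f for f in FEATURES if f not in df.columns]
    print("Missing feature columns:", missing or "none")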
The GazeDataNormalizer class supports the following datasets:
- Dundee corpus: place all the files in a dataset-specific folder;
- GECO corpus: download only the MonolingualReadingData.xlsx file and place it in a dataset-specific folder;
- ZuCo 1.0 corpus: download only the MATLAB files for tasks 1 and 2 and place them in two separate dataset-specific folders;
- ZuCo 2.0 corpus: download only the MATLAB files for task 1 and place them in a dataset-specific folder. After normalizing the ZuCo data separately per task, you can use the combine_datasets.py script to merge them into a single dataset, if needed;
- GECO-NL corpus (Dutch part of GECO): download only the L1ReadingData.xlsx file and place it in a dataset-specific folder;
- Potsdam Textbook Corpus: these files need to be preprocessed so that the words and the corresponding eye-tracking features are merged into one file per text and reader (see the sketch after this list);
- Russian Sentence Corpus: download the file data_103.csv and place it in a dataset folder.
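For the Potsdam Textbook Corpus preprocessing mentioned above, a minimal sketch of merging words with their eye-tracking features into one file per text and reader is shown below; the input file names, column names, and join key are assumptions, since the corpus distribution format is not described in this README.

    # Hypothetical Potsdam preprocessing sketch (all file and column names assumed):
    # merge the words and the eye-tracking features into one file per text and reader.
    import pandas as pd

    words = pd.read_csv("words_text1_reader1.csv")        # assumed columns: word_id, word
    features = pd.read_csv("features_text1_reader1.csv")  # assumed: word_id + feature columns
    merged = words.merge(features, on="word_id", how="left")
    merged.to_csv("text1_reader1.csv", index=False)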
The random state can be set in settings.py.
The model parameters are set in params/gaze/config.json.
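To quickly inspect a parameter configuration, the file can be loaded with Python's json module; the keys inside the file depend on the repository's config schema and are not assumed here.

    # Inspect a parameter configuration file (path taken from the line above).
    import json

    with open("params/gaze/config.json") as f:
        config = json.load(f)
    print(json.dumps(config, indent=2))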