Convert the WinoBias dataset from .txt format to .conll format

Based on the Berkeley Coref system (please check their website for more information).

  1. Extract the sentences from winobias.txt (in our case, winobias.txt stands for a split file such as anti_stereotyped_type1.txt.dev):
mkdir wino_sentences
python toSentences.py data/anti_stereotyped_type1.txt.dev wino_sentences/ 
  2. Run the Berkeley Coref preprocess script (refer to the "preprocessing" section here).

  3. Add all the side information obtained from Berkeley Coref to our data:

mkdir wino_berkeley
python addCoref.py data/winobias.txt data/wino_preprocess/ wino_berkeley/
mkdir wino_conll
python toWino.py wino_berkeley/ wino_conll/
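
The commands above can also be collected in a small Python helper, which may be handy when converting several split files. This is only a sketch under assumptions: `build_steps` and `run_steps` are hypothetical names that are not part of this repository, the scripts are assumed to be run from the repository root, and the same split file is assumed to be passed to both toSentences.py and addCoref.py. The Berkeley Coref preprocessing step is not wrapped here and still has to be run by hand between the first and second scripted steps.

```python
import subprocess
from pathlib import Path


def build_steps(split_file="data/anti_stereotyped_type1.txt.dev",
                preprocess_dir="data/wino_preprocess/"):
    """Return (output_dir, argv) pairs for the scripted steps, in order.

    Assumption: `split_file` plays the role of winobias.txt in every command.
    """
    return [
        # Step 1: extract sentences.
        ("wino_sentences/",
         ["python", "toSentences.py", split_file, "wino_sentences/"]),
        # (Berkeley Coref preprocessing happens here, outside this helper;
        #  its output is expected in `preprocess_dir`.)
        # Step 3a: merge the Berkeley Coref side information into the data.
        ("wino_berkeley/",
         ["python", "addCoref.py", split_file, preprocess_dir, "wino_berkeley/"]),
        # Step 3b: write the final .conll files.
        ("wino_conll/",
         ["python", "toWino.py", "wino_berkeley/", "wino_conll/"]),
    ]


def run_steps(steps, dry_run=True):
    """Create each output directory and run its command.

    With dry_run=True nothing is executed or created; the equivalent shell
    lines are returned so they can be inspected first.
    """
    lines = []
    for out_dir, argv in steps:
        lines.append("mkdir " + out_dir)
        lines.append(" ".join(argv))
        if not dry_run:
            # Unlike a plain `mkdir`, this does not fail if the directory exists.
            Path(out_dir).mkdir(parents=True, exist_ok=True)
            subprocess.run(argv, check=True)  # stop at the first failing script
    return lines
```

A possible way to use it: call `run_steps(build_steps()[:1], dry_run=False)`, run the Berkeley Coref preprocessing, then call `run_steps(build_steps()[1:], dry_run=False)`.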