Code for "Mitigating the Negative Impact of Over-association for Conversational Query Production" (Information Processing & Management).
- Data preparation: Download the datasets (Wizard-of-Internet or DuSinc) and place the data files under the "saved_data" directory. Run the scripts under "databased/" to process the data.
- Training: The run scripts are under "databased", e.g. "". Trained models are also saved in the "saved_data" directory. Model predictions are generated automatically once training is done and are saved as "generated_predictions.txt" by default.
- Evaluation: Use "" under "databased" to evaluate. Check that the file path is set correctly.
- Data preparation: Use the models trained above to generate candidates for model-wholeseq. For DuSinc, we recommend training k models separately because the dataset is small: split the dataset into k folds, and obtain the pseudo data for the i-th fold using a model trained on the other folds. This yields candidates that better match the model's distribution.
- Training: Run "modelbased/". Pass "en" or "zh" as the second argument. The hyperparameters are in "" and "", respectively.
- Predicting: Run "modelbased/" to generate predictions.
- Evaluation: Same as "Base / Data-based". Note the file path.
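
The k-fold candidate generation recommended for DuSinc can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: `k_fold_splits`, `build_pseudo_data`, `train_fn`, and `generate_fn` are hypothetical names, and the real training and generation calls would be whatever the scripts in this repository provide.

```python
from typing import Any, Callable, List, Sequence, Tuple


def k_fold_splits(examples: Sequence[Any], k: int) -> List[Tuple[List[Any], List[Any]]]:
    """Return k (train, held_out) pairs.

    Fold i holds out every k-th example starting at index i;
    the training part is the concatenation of all other folds.
    """
    if k < 2 or k > len(examples):
        raise ValueError("k must be between 2 and len(examples)")
    folds = [list(examples[i::k]) for i in range(k)]
    splits = []
    for i in range(k):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        splits.append((train, folds[i]))
    return splits


def build_pseudo_data(
    examples: Sequence[Any],
    k: int,
    train_fn: Callable[[List[Any]], Any],     # hypothetical: trains a model on a list of examples
    generate_fn: Callable[[Any, Any], Any],   # hypothetical: generates candidates for one example
) -> List[Tuple[Any, Any]]:
    """Generate candidates for each example with a model that never saw it."""
    pseudo = []
    for train, held_out in k_fold_splits(examples, k):
        model = train_fn(train)  # one model per fold, trained on the other k-1 folds
        for ex in held_out:
            pseudo.append((ex, generate_fn(model, ex)))
    return pseudo
```

Because each model generates candidates only for examples it was not trained on, the candidates resemble what the model produces on unseen data rather than memorized training targets.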