This program was used to evaluate our proposed argument mining pipeline.
- Docker and Docker-Compose
- Alternatively: Python 3.7 and Poetry 1.0
- Duplicate the file `config-example.yml` to `config.yml` and adapt the settings to your liking.
- Create the folders `data/input` and `data/output` (see the example commands after this list).
- If using Docker, please do not edit the web server settings.
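For example, these steps can be carried out from the repository root:

```shell
cp config-example.yml config.yml
mkdir -p data/input data/output
```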
On the first run, Docker will download all required data, which may take a while. Subsequent runs use the cached data and the app is available immediately.
Using Docker, start the program with:

```shell
docker-compose run app python -m recap_am.{entrypoint}
```
Using Poetry, start the program with:

```shell
poetry run python -m recap_am.{entrypoint}
```
The following entry points are available:

- `server`: Starts a Flask server providing a website for interactive mining. The address is printed in the terminal.
- `cli`: Starts the pipeline without interaction.
- `evaluate`: Performs a grid evaluation over the parameters major claim method, relationship type threshold, and graph construction method.
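For example, to launch the interactive web server with Docker:

```shell
docker-compose run app python -m recap_am.server
```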
By default, the program will look for input data in `data/input`. If you just want to convert plain text to an argument graph, a `.txt` file is enough. If you want to compare a benchmark graph to the generated one, please provide a `.json` file conforming to the OVA format.
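As a quick sanity check, benchmark graphs can be inspected before running the pipeline. The following is only a sketch: the exact schema (field names, node types) is defined by the OVA format and is not reproduced here.

```python
import json
from pathlib import Path

# Look for benchmark graphs next to the plain-text inputs.
for path in Path("data/input").glob("*.json"):
    with path.open(encoding="utf-8") as f:
        graph = json.load(f)
    # OVA graphs are JSON objects with node and edge lists;
    # consult the OVA documentation for the exact field names.
    print(f"{path.name}: {len(graph.get('nodes', []))} nodes, "
          f"{len(graph.get('edges', []))} edges")
```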
| Category   | Features |
| ---------- | -------- |
| Structural | Punctuation, sentence length and position. |
| Indicators | Claim-premise and first-person indicators. |
| Syntactic  | Depth of constituency parse trees, presence of modal verbs, number of grammatical productions in the parse tree. |
| Embeddings | GloVe sentence embeddings (arithmetic mean of its word vectors). |
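To illustrate the embedding features: a GloVe sentence embedding of this kind is the arithmetic mean of the word vectors of a sentence. The following is a minimal sketch, assuming `vectors` maps tokens to pre-loaded GloVe vectors; the pipeline's actual implementation may differ.

```python
from typing import Dict, List

import numpy as np


def sentence_embedding(
    tokens: List[str], vectors: Dict[str, np.ndarray], dim: int = 300
) -> np.ndarray:
    """Arithmetic mean of the GloVe vectors of all in-vocabulary tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No token has a vector: fall back to the zero embedding.
        return np.zeros(dim)
    return np.mean(known, axis=0)
```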
To start training, run the program with:

```shell
poetry run python -m recap_am.adu.training.train_adu
```

or

```shell
poetry run python -m recap_am.adu.training.train_clpr
```

for the ADU or Claim/Premise classifier, respectively.
Start the Jupyter notebook `recap_am/preprocessing/pipeline.ipynb` within the container:

- Run the cells to import the required libraries.
- Load your CSV data with the columns `child`, `parent`, and `stance` into a DataFrame `df`.
- Generate a dataset using GloVe embeddings for either "english" or "german" by calling `data = prep_dataset(df, model="glove", language="english")` (see the sketch after this list).
- Use `data` to train any classifier.
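Put together, the notebook steps look roughly like this. It is a sketch only: `prep_dataset` is defined by the notebook's cells, and the CSV path is a placeholder.

```python
import pandas as pd

# Placeholder path; the CSV must provide the columns child, parent and stance.
df = pd.read_csv("data/input/pairs.csv", usecols=["child", "parent", "stance"])

# prep_dataset is defined in recap_am/preprocessing/pipeline.ipynb;
# run the notebook's earlier cells before calling it.
data = prep_dataset(df, model="glove", language="english")

# data can now be used to train any classifier.
```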