This repository contains the official implementation of XDetox: Text Detoxification with Token-Level Toxicity Explanations, accepted to EMNLP 2024.
Methods for mitigating toxic content through masking and infilling often overlook the decision-making process, leading to either insufficient or excessive modifications of toxic tokens. To address this challenge, we propose XDetox, a novel method that integrates token-level toxicity explanations with the masking and infilling detoxification process. We apply these explanations in two ways to improve detoxification performance: first, to identify toxic tokens and thereby improve the quality of masking; second, to re-rank the regenerated candidates and select the least toxic sentence. Our experimental results show state-of-the-art performance across four datasets compared to existing detoxification methods. Furthermore, human evaluations indicate that our method outperforms baselines in both fluency and toxicity reduction. These results demonstrate the effectiveness of our method in text detoxification.
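The two strategies above can be illustrated with a minimal Python sketch. This is not the code in this repository: the function names `mask_toxic_tokens` and `rerank_least_toxic`, the `<mask>` placeholder, the 0.5 threshold, and the toy scores are all hypothetical and chosen only for illustration. In the actual method, the per-token scores would come from a token-level explanation model and the candidates from an infilling model.

```python
from typing import Callable, List, Sequence

MASK = "<mask>"  # hypothetical mask placeholder, not necessarily the one used by XDetox


def mask_toxic_tokens(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, for illustration only
) -> List[str]:
    """Replace every token whose toxicity score exceeds the threshold with MASK."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [MASK if score > threshold else token for token, score in zip(tokens, scores)]


def rerank_least_toxic(
    candidates: Sequence[str],
    toxicity_fn: Callable[[str], float],
) -> str:
    """Return the candidate with the lowest sentence-level toxicity score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    return min(candidates, key=toxicity_fn)


# Toy inputs: per-token scores stand in for token-level explanations.
tokens = ["you", "are", "a", "stupid", "person"]
scores = [0.01, 0.02, 0.01, 0.93, 0.05]
masked = mask_toxic_tokens(tokens, scores)

# Toy sentence-level scores stand in for a toxicity classifier.
toy_sentence_scores = {
    "you are a silly person": 0.40,
    "you are a kind person": 0.02,
    "you are a dumb person": 0.85,
}
best = rerank_least_toxic(list(toy_sentence_scores), toy_sentence_scores.__getitem__)

print(masked)  # ['you', 'are', 'a', '<mask>', 'person']
print(best)    # you are a kind person
```

A fixed threshold is the simplest way to turn per-token scores into a mask. A lower threshold masks more tokens (risking excessive modification), while a higher one masks fewer (risking leftover toxicity).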
We conducted our experiments on an NVIDIA A100 GPU with 40 GB of VRAM. The method also runs on GPUs with less VRAM, but you may need to reduce the batch size to fit the available memory.
To clone this repository along with its submodules, use the following command:
git clone --recurse-submodules https://github.com/LeeBumSeok/XDetox.git
Ensure you have Python 3.8+ installed, then install the required dependencies:
pip install -r requirements.txt
After cloning the repository and installing the dependencies, run the XDetox method with the provided script:
python lab.py --all --output_folder single --evaluate --ranking
This research is inspired by ideas from previous work on text detoxification and explainability, particularly MARCO (Hallinan et al., 2023) and DecompX (Modarressi et al., 2023).
TODO