XDetox: Text Detoxification with Token-Level Toxicity Explanations (EMNLP 2024 Main, short)

XDetox

This repository contains the official implementation of XDetox: Text Detoxification with Token-Level Toxicity Explanations, accepted to EMNLP 2024.

Paper

Abstract

Methods for mitigating toxic content through masking and infilling often overlook the decision-making process, leading to either insufficient or excessive modifications of toxic tokens. To address this challenge, we propose XDetox, a novel method that integrates token-level toxicity explanations with the masking and infilling detoxification process. We utilized this approach with two strategies to enhance the performance of detoxification. First, identifying toxic tokens to improve the quality of masking. Second, selecting the regenerated sentence by re-ranking the least toxic sentence among candidates. Our experimental results show state-of-the-art performance across four datasets compared to existing detoxification methods. Furthermore, human evaluations indicate that our method outperforms baselines in both fluency and toxicity reduction. These results demonstrate the effectiveness of our method in text detoxification.
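The two strategies named in the abstract (masking tokens flagged as toxic, then re-ranking candidates to keep the least toxic one) can be illustrated with a toy sketch. This is not the code in this repository: the lexicon `TOY_TOXICITY`, the helper names `token_scores`, `mask_toxic`, `sentence_toxicity`, and `rerank`, and the `0.5` threshold are all hypothetical stand-ins. A fixed word list replaces the learned token-level explanations, and the infilling step is not modeled at all, so the candidates are written by hand.

```python
from typing import Dict, List

# Hypothetical toy lexicon standing in for learned token-level toxicity
# explanations. The real method derives these scores from a model.
TOY_TOXICITY: Dict[str, float] = {"stupid": 0.9, "idiot": 0.95, "hate": 0.7}


def token_scores(tokens: List[str]) -> List[float]:
    """Return one toxicity score per token (toy lookup, 0.0 if unknown)."""
    return [TOY_TOXICITY.get(t.lower().strip(".,!?"), 0.0) for t in tokens]


def mask_toxic(sentence: str, threshold: float = 0.5, mask: str = "<mask>") -> str:
    """Replace every whitespace token whose score reaches the threshold."""
    tokens = sentence.split()
    scores = token_scores(tokens)
    return " ".join(mask if s >= threshold else t for t, s in zip(tokens, scores))


def sentence_toxicity(sentence: str) -> float:
    """Score a sentence as the sum of its token scores (an assumption)."""
    return sum(token_scores(sentence.split()))


def rerank(candidates: List[str]) -> str:
    """Return the candidate with the lowest sentence-level toxicity score."""
    return min(candidates, key=sentence_toxicity)


if __name__ == "__main__":
    masked = mask_toxic("You are a stupid person.")
    print(masked)

    # Hand-written candidates; a real pipeline would generate these by
    # infilling the masked positions with a language model.
    candidates = [
        "You are a stupid person.",
        "You are a mistaken person.",
        "You are an idiot person.",
    ]
    print(rerank(candidates))
```

Summing token scores is only one possible way to aggregate; taking the maximum, or scoring whole sentences with a separate classifier, would be equally reasonable choices for a sketch like this.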

Run Code

Recommended Hardware

We ran our experiments on an NVIDIA A100 GPU with 40 GB of VRAM. The method also runs on GPUs with less VRAM, but you may need to reduce the batch size to fit the available memory.

Installation

To clone this repository along with its submodules, use the following command:

git clone --recurse-submodules https://github.com/LeeBumSeok/XDetox.git

Requirements

XDetox requires Python 3.8 or later. Install the required dependencies with:

pip install -r requirements.txt
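If you are unsure which interpreter your environment uses, a short check like the one below can confirm the Python 3.8+ requirement before you install anything. It is a convenience sketch, not part of this repository; the names `MIN_PYTHON` and `python_is_supported` are made up for illustration.

```python
import sys

# Minimum interpreter version stated in the Requirements section.
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        sys.exit(f"Python 3.8 or later is required, found {found}")
    print("Python version OK")
```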

Quick Start

After cloning the repository and installing the dependencies, run XDetox with the provided script:

python lab.py --all --output_folder single --evaluate --ranking

Related Work

This research is inspired by ideas from previous work on text detoxification and explainability, particularly MARCO (Hallinan et al., 2023) and DecompX (Modarressi et al., 2023).

Citation

TODO
