Legal Text Classification for Competition Impact Assessment Studies: Semantic Similarity and Textual Entailment with Sentence Transformers
Competition impact assessment (CIA) studies are conducted by policymakers to evaluate legislation and identify potential issues that may distort markets in an anticompetitive manner. CIA is a crucial part of policymaking, as it provides a systematic approach to developing effective sector regulations. However, the comprehensive review of legal documents for these studies takes a significant amount of time and manpower. To streamline this process, this study proposes a pre-screening tool built on sentence transformers trained on semantic similarity and textual entailment tasks. The models classify whether pieces of legislation contain potentially restrictive provisions that need to be flagged to CIA proponents. Through an adaptive pre-training approach, unlabeled legal text is used to introduce domain knowledge into pre-trained BERT models, which are then fine-tuned for the downstream classification tasks. The best model correctly classifies 85% of competition-restrictive provisions in legal text.
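As a rough illustration of the two-stage pipeline described above (adaptive pre-training on unlabeled legal text, followed by fine-tuning for an entailment-style classification task), the sketch below uses the Hugging Face transformers and sentence-transformers libraries. It is a minimal sketch, not the study's actual configuration: the base checkpoint, corpus file `legal_corpus.txt`, output path `legal-bert-adapted`, hypothesis wording, and hyperparameters are all placeholder assumptions.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

# Stage 1: adaptive pre-training via masked language modeling on unlabeled legal text.
base = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
mlm_model = AutoModelForMaskedLM.from_pretrained(base)

corpus = load_dataset("text", data_files={"train": "legal_corpus.txt"})  # placeholder corpus
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="legal-bert-adapted", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
).train()
mlm_model.save_pretrained("legal-bert-adapted")
tokenizer.save_pretrained("legal-bert-adapted")

# Stage 2: fine-tune the domain-adapted encoder as a sentence transformer with an
# entailment-style softmax objective over (provision, hypothesis) pairs.
st_model = SentenceTransformer("legal-bert-adapted")  # wraps the adapted BERT with mean pooling

train_examples = [  # toy labeled pairs; real labels would come from the CIA annotation process
    InputExample(
        texts=["No new permits shall be issued to foreign operators.",
               "This provision restricts market entry."],
        label=1,
    ),
    InputExample(
        texts=["The agency shall publish its annual report online.",
               "This provision restricts market entry."],
        label=0,
    ),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.SoftmaxLoss(
    model=st_model,
    sentence_embedding_dimension=st_model.get_sentence_embedding_dimension(),
    num_labels=2,
)
st_model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```

At inference time, a pre-screening tool of this kind could score each provision against a fixed restriction hypothesis and flag those classified as entailing a restriction; the semantic-similarity variant would instead compare embeddings of provisions against reference restrictive provisions.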