Danish Legal Language Models

Available Language Models for Danish

| Model Name | Layers / Units / Heads | Vocab. | Parameters | Legal |
| --- | --- | --- | --- | --- |
| Maltehb/danish-bert-botxo | 12 / 768 / 12 | 32K | 110M | No |
| xlm-roberta-base | 12 / 768 / 12 | 256K | 278M | No |
| coastalcph/danish-legal-lm-base | 12 / 768 / 12 | 32K | 110M | Yes |
| coastalcph/danish-legal-bert-base | 12 / 768 / 12 | 32K | 110M | Yes |
| coastalcph/danish-legal-longformer-base | 12 / 768 / 12 | 32K | 134M | Yes |
| coastalcph/danish-legal-xlm-base | 12 / 768 / 12 | 32K | 110M | Yes |
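The parameter counts above follow almost entirely from the layer count, hidden size, and vocabulary size. The sketch below is a rough estimate for a standard BERT-style encoder, not code from this repository. The function name and the defaults (512 positions, 2 token types, a feed-forward size of 4x the hidden size, a pooler layer) are assumptions chosen to match the usual BERT-base layout. The number of heads does not appear because it only splits the hidden size and adds no parameters.

```python
def estimate_bert_params(layers, hidden, vocab, max_positions=512,
                         type_vocab=2, ffn_mult=4, pooler=True):
    """Rough parameter count (weights + biases) for a BERT-style encoder.

    All defaults are assumptions about a standard BERT-base layout.
    """
    layer_norm = 2 * hidden  # scale and shift vectors
    # Token, position, and token-type embeddings plus one LayerNorm.
    embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value, and output projections.
    attention = 4 * (hidden * hidden + hidden)
    ffn_hidden = ffn_mult * hidden
    # Two dense layers: hidden -> ffn_hidden -> hidden.
    feed_forward = (hidden * ffn_hidden + ffn_hidden) + (ffn_hidden * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm
    head = hidden * hidden + hidden if pooler else 0
    return embeddings + layers * per_layer + head


n = estimate_bert_params(layers=12, hidden=768, vocab=32_000)
print(f"{n / 1e6:.1f}M parameters")
```

For the 12 / 768 / 12 configuration with a 32K vocabulary this comes out at roughly 110.6M, in line with the 110M reported in the table. Most of the difference between models with the same encoder shape comes from the embedding matrix, which grows linearly with the vocabulary size.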

Danish Legal Pile

The legal models are pre-trained on a combination of the Danish part of the MultiEURLEX dataset (Chalkidis et al., 2021), comprising 65k EU laws, and two subsets (retsinformationdk, retspraksis) of the Danish Gigaword Corpus (Derczynski et al., 2021), comprising legal proceedings. The table below reports results on the evaluation set.

| Model Name | Accuracy | Loss |
| --- | --- | --- |
| Maltehb/danish-bert-botxo | 22.3 | 7.038 |
| coastalcph/danish-legal-lm-base | 84.8 | 0.651 |
| coastalcph/danish-legal-bert-base | 80.1 | 0.878 |
| coastalcph/danish-legal-bert-base | 82.5 | 0.768 |
| coastalcph/danish-legal-xlm-base | 83.1 | 0.727 |
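The loss figures above (the values between 0.651 and 7.038) are easier to compare as perplexities. The sketch below assumes they are mean masked-token cross-entropies measured in nats, which the README does not state, so treat the conversion as illustrative. The function name `perplexity` is introduced here for the example.

```python
import math


def perplexity(mean_cross_entropy_nats):
    """Convert a mean cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Loss values copied from the table above; the unit (nats) is an assumption.
losses = {
    "Maltehb/danish-bert-botxo": 7.038,
    "coastalcph/danish-legal-lm-base": 0.651,
}

for name, loss in losses.items():
    print(f"{name}: loss={loss:.3f}, perplexity={perplexity(loss):.1f}")
```

Under that assumption, the generic Danish BERT has a perplexity above 1,000 on the legal evaluation set, while the legal model stays below 2.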

Benchmarking

| Model Name | EURLEX Val. | EURLEX Test |
| --- | --- | --- |
| Maltehb/danish-bert-botxo | 73.7 / 42.8 | 67.6 / 38.2 |
| coastalcph/danish-legal-lm-base | 75.1 / 46.5 | 69.1 / 41.9 |
| coastalcph/danish-legal-bert-base | 75.0 / 50.4 | 68.9 / 44.3 |
| coastalcph/danish-legal-xlm-base | TBA | TBA |
| coastalcph/danish-legal-longformer-base | 75.7 / 52.9 | 69.6 / 47.0 |
| coastalcph/danish-legal-longformer-base + SD Penalty (Pezeshki et al., 2020) | 76.1 / 52.9 | 69.9 / 47.0 |

The two best models (coastalcph/danish-legal-longformer-base, coastalcph/danish-legal-longformer-base-sd) are available on the Hugging Face Hub, with instructions on how to use them as text classifiers or feature extractors.
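The SD Penalty refers to spectral decoupling (Pezeshki et al., 2020), which adds an L2 penalty on the model's output logits to the training loss. The sketch below illustrates the idea for a multi-label setup with a binary cross-entropy loss. It is not the training code of this repository: the function names, the averaging over labels, and the value of `lam` are assumptions made for the example.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed stably from logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def sd_penalised_loss(logits, targets, lam=0.1):
    """Cross-entropy plus (lam / 2) * mean squared logit.

    `lam` is a hypothetical penalty strength, not a value from the README.
    """
    penalty = 0.5 * lam * sum(z * z for z in logits) / len(logits)
    return bce_with_logits(logits, targets) + penalty


logits = [2.0, -2.0]   # toy scores for two labels
targets = [1.0, 0.0]   # gold labels
print(bce_with_logits(logits, targets))
print(sd_penalised_loss(logits, targets, lam=0.1))
```

The penalty grows with the magnitude of the logits, so it discourages the classifier from becoming overconfident on the labels it learns first.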

Code Base

Train new RoBERTa LM

sh train_mlm_gpu.sh

Modify pre-trained XLM-R

export PYTHONPATH=.
python src/mod_teacher_model.py --teacher_model_path coastalcph/danish-legal-lm-base --student_model_path coastalcph/danish-legal-lm-base

Longformerize pre-trained RoBERTa LM

export PYTHONPATH=.
python src/longformerize_model.py --roberta_model_path coastalcph/danish-legal-lm-base --max_length 2048 --attention_window 128
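Converting a RoBERTa model to a Longformer with `--max_length 2048` requires more position embeddings than the original model has. A common recipe is to keep the reserved positions and repeat the learned ones until the new length is covered. The sketch below shows that tiling step on plain Python lists. It assumes this recipe and is not taken from `src/longformerize_model.py`, whose contents are not shown here. The function name and the `reserved=2` default (RoBERTa reserves two position ids) are assumptions made for the example.

```python
def extend_position_embeddings(old, new_max_len, reserved=2):
    """Tile learned position embeddings to cover `new_max_len` positions.

    `old` is a list of rows, one embedding vector per position. The first
    `reserved` rows are kept once; the remaining rows are repeated in order
    until `new_max_len` usable positions exist.
    """
    head, body = old[:reserved], old[reserved:]
    if not body:
        raise ValueError("no usable position embeddings to copy")
    extended = [row[:] for row in head]
    target = reserved + new_max_len
    while len(extended) < target:
        remaining = target - len(extended)
        extended.extend(row[:] for row in body[:remaining])
    return extended


# Toy example: 2 reserved rows + 4 usable positions, extended to 10 positions.
old = [[float(i)] for i in range(6)]
new = extend_position_embeddings(old, new_max_len=10)
print(len(new))
print([row[0] for row in new])
```

With real weights the same copy would go from 512 usable positions to 2048, so the converted model starts from position embeddings it has already learned. It still needs further pre-training to learn to use the longer context.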
