Machine learning experiments for the article "BBRC: Brazilian Banking Regulation Corpora", which describes the NLP corpora (dataset) of the same name. The article was published at the 7th Financial Technology and Natural Language Processing workshop (FinNLP), held within LREC-COLING 2024.
We present BBRC, a collection of 25 corpora of banking regulatory risk from different departments of Banco do Brasil (BB). The individual corpora cover investments, insurance, human resources, security, technology, treasury, loans, accounting, fraud, credit cards, payment methods, agribusiness, risks, and other areas. Experts annotated each regulatory document with a binary label indicating whether it contains regulatory risk that may require changes to the products, processes, services, and channels of a bank department. The corpora are in Portuguese and contain documents from 26 Brazilian regulatory authorities in the financial sector. In total, there are 61,650 annotated documents, most of them between half a page and three pages long. The corpora belong to a Natural Language Processing (NLP) application that has been in production since 2020. The total size of the corpora is 1.6 GB.
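Because every document carries a binary risk annotation, a common first check before running experiments is the label balance of a corpus. The following is a minimal sketch of that check, not code from this repository. The field names `text` and `label`, the helper `label_balance`, and the toy documents are all assumptions made for illustration (the real documents are in Portuguese, and the actual column names should be checked on the Hugging Face dataset page).

```python
from collections import Counter

def label_balance(records, label_field="label"):
    """Return the share of documents per binary label.

    Assumed (hypothetical) convention: 1 = contains regulatory risk,
    0 = no regulatory risk.
    """
    counts = Counter(int(r[label_field]) for r in records)
    total = sum(counts.values())
    if total == 0:
        # Empty corpus: report zero share for both labels.
        return {0: 0.0, 1: 0.0}
    return {label: counts.get(label, 0) / total for label in (0, 1)}

# Invented toy documents, NOT taken from BBRC.
toy_corpus = [
    {"text": "Resolution changing credit card fee rules.", "label": 1},
    {"text": "Notice about a bank holiday schedule.", "label": 0},
    {"text": "Circular changing rural credit limits.", "label": 1},
    {"text": "Announcement of a system maintenance window.", "label": 0},
]

balance = label_balance(toy_corpus)
print(balance)
```

The same function can be applied to any iterable of dictionary-like records, for example the rows of one department's corpus once it has been loaded, provided `label_field` is set to the real label column.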
The article (paper): https://aclanthology.org/2024.finnlp-1.15.pdf
Hugging Face link to the data: https://huggingface.co/datasets/bancodobrasil/bbrc_brazilian_banking_regulation_corpora
Presentation video: https://drive.google.com/file/d/1Lk_xVno8odMJJK2yskEe9azQov4Y7vcv/view?usp=sharing
Presentation: https://drive.google.com/file/d/1vxKThA_CqDIX6XalFk68yb8WeTwDtek7/view?usp=sharing
FinNLP: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-kdf-2024/home
LREC-COLING 2024: https://lrec-coling-2024.org/
LinkedIn post: https://www.linkedin.com/feed/update/urn:li:activity:7199492778874015745/