
Brazilian Banking Regulation Corpora (BBRC), an NLP dataset described in the article with the same name. It was published in FinNLP 2024 (LREC-COLING 2024).



bancodobrasil/bbrc


BBRC: Brazilian Banking Regulation Corpora

Machine learning experiments for the article BBRC: Brazilian Banking Regulation Corpora, which describes an NLP corpus collection (dataset) of the same name. The article was published at the 7th Financial Technology and Natural Language Processing workshop (FinNLP 2024), held within LREC-COLING 2024.

We present BBRC, a collection of 25 corpora on banking regulatory risk from different departments of Banco do Brasil (BB). The individual corpora cover topics such as investments, insurance, human resources, security, technology, treasury, loans, accounting, fraud, credit cards, payment methods, agribusiness, and risks. Experts annotated each regulatory document with a binary label indicating whether it contains regulatory risk that may require changes to the products, processes, services, or channels of a bank department. The corpora are in Portuguese and contain documents from 26 Brazilian regulatory authorities in the financial sector. In total, there are 61,650 annotated documents, most of them between half a page and three pages long. The corpora belong to a Natural Language Processing (NLP) application that has been in production since 2020, and their total size is 1.6 GB.
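The README does not describe the file schema, so the following is only a minimal sketch. It assumes each document can be represented as a record with a corpus (department) name, the document text, and a 0/1 regulatory-risk label; the `Document` class, its field names, and `label_summary` are hypothetical illustrations, not part of the released data or code. It shows how per-corpus label counts could be summarized with the standard library alone:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


# Hypothetical record layout: these field names are illustrative
# assumptions, not the published schema of the BBRC files.
@dataclass(frozen=True)
class Document:
    corpus: str  # name of the department corpus the document belongs to
    text: str    # text of the regulatory document (Portuguese)
    label: int   # 1 = contains regulatory risk for that department, 0 = not


def label_summary(docs: Iterable[Document]) -> Dict[str, dict]:
    """Count positive/negative labels per corpus and the share of positives."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for doc in docs:
        # The annotation is binary, so reject anything other than 0 or 1.
        if doc.label not in (0, 1):
            raise ValueError(f"non-binary label {doc.label!r} in corpus {doc.corpus!r}")
        counts[doc.corpus][doc.label] += 1

    summary = {}
    for corpus, c in sorted(counts.items()):  # corpora in alphabetical order
        total = c[0] + c[1]
        summary[corpus] = {
            "total": total,
            "positive": c[1],
            "negative": c[0],
            "positive_rate": c[1] / total,
        }
    return summary


# Usage with toy records (not real BBRC data):
docs = [
    Document("fraud", "toy text A", 1),
    Document("fraud", "toy text B", 0),
    Document("loans", "toy text C", 0),
]
print(label_summary(docs)["fraud"]["positive_rate"])  # → 0.5
```

Checking the positive rate of each corpus before training is one way to see how balanced each department's labels are; the actual proportions have to be read from the released data.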

The article (paper): https://aclanthology.org/2024.finnlp-1.15.pdf

Hugging Face link to the data: https://huggingface.co/datasets/bancodobrasil/bbrc_brazilian_banking_regulation_corpora

Presentation video: https://drive.google.com/file/d/1Lk_xVno8odMJJK2yskEe9azQov4Y7vcv/view?usp=sharing

Presentation: https://drive.google.com/file/d/1vxKThA_CqDIX6XalFk68yb8WeTwDtek7/view?usp=sharing

FinNLP: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp-kdf-2024/home

LREC-COLING 2024: https://lrec-coling-2024.org/

LinkedIn post: https://www.linkedin.com/feed/update/urn:li:activity:7199492778874015745/
