Add support to gazettes published by AMA #32

Closed
danielfireman opened this issue Mar 31, 2023 · 0 comments · Fixed by #64

Comments

@danielfireman

To cut costs, many Brazilian counties publish their gazettes jointly, for example through the Alagoas County Association (AMA). Those gazettes (example here) differ from what QD has handled so far: each one is a single file containing executive information about many counties. Handling them is no easy task, so OKBR and IFAL joined forces to tackle the problem for the state of Alagoas, which has 102 cities. The result is a project financed by IFAL and supported by OKBR, whose progress you can follow here.

The spider is ready and fetches gazettes created with the Sistema Gerenciador de Publicações Legais (SIGPub). The code that splits a SIGPub gazette's text into each county's content is ready to be used for Alagoas. Nothing strictly limits it to that state, but all of the 40+ automated tests written so far use gazettes from Alagoas.

So, this issue is about changing the data processing pipeline to use this code to split the SIGPub gazette and store each fragment as a separate entry. After discussing with @giuliocc, we outlined a few decisions:

  • must not change querido-diario, since that might make onboarding harder
  • the priority is changing the flow in try_process_gazette_file
  • the first attempt could be as simple as: if the gazette is in this list, do that
  • the gazette id of each sub-gazette will combine the main gazette's id with a county id
  • all county entries will share the same PDF file URL
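
The decisions above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: `GazetteEntry`, its field names, `ASSOCIATION_TERRITORY_IDS`, `compound_id`, `expand_gazette`, the `_` separator, and the `split_text` callable are all hypothetical, and the real splitter and the real signature of try_process_gazette_file are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical record; the field names are assumptions, not the project's schema.
@dataclass(frozen=True)
class GazetteEntry:
    gazette_id: str
    territory_id: str
    file_url: str
    text: str


# Hypothetical allow-list for the "if gazette is in this list" first attempt.
ASSOCIATION_TERRITORY_IDS = {"AMA"}  # placeholder value


def compound_id(main_id: str, county_id: str) -> str:
    # Assumed format: main gazette id and county id joined by an underscore.
    return f"{main_id}_{county_id}"


def expand_gazette(
    gazette: GazetteEntry,
    split_text: Callable[[str], Dict[str, str]],
) -> List[GazetteEntry]:
    """Return one entry per county for association gazettes.

    `split_text` stands in for the SIGPub splitting code and is assumed to
    map a county id to that county's portion of the text.
    """
    if gazette.territory_id not in ASSOCIATION_TERRITORY_IDS:
        # Regular gazettes keep the existing flow: a single entry.
        return [gazette]
    return [
        GazetteEntry(
            gazette_id=compound_id(gazette.gazette_id, county_id),
            territory_id=county_id,
            file_url=gazette.file_url,  # every county entry shares the PDF URL
            text=county_text,
        )
        for county_id, county_text in split_text(gazette.text).items()
    ]


def fake_split(text: str) -> Dict[str, str]:
    # Toy splitter used only to exercise the sketch.
    return {"county-a": "text for county a", "county-b": "text for county b"}


main = GazetteEntry("123", "AMA", "https://example.org/gazette.pdf", "full text")
entries = expand_gazette(main, fake_split)
```

Keeping the branch inside one small function would leave the rest of the flow untouched for gazettes that are not in the list, which matches the "as simple as" first attempt described above.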

@giuliocc already set up a test environment and gave me access to it. Thanks a lot!

cc/ @alex-custodio @Luisa-Coelho

@trevineju trevineju linked a pull request Dec 13, 2023 that will close this issue