Spider: Illinois Finance Authority #914

pjsier · 2019-10-24T02:32:27Z

URL: https://www.il-fa.com/
Documents URL: https://www.il-fa.com/public-access/board-documents/
Spider Name: il_finance_authority
Agency Name: Illinois Finance Authority

See the contribution guide for information on how to get started

aneesh404 · 2019-10-24T02:43:27Z

I'd like to work on this issue

pjsier · 2019-10-24T02:48:40Z

@aneesh404 sounds good!

aneesh404 · 2019-10-28T11:24:25Z

Hi! I'm sorry I'm not getting time to work on this issue. Please feel free to assign it to someone else.

janeskim · 2019-10-30T00:22:52Z

Hi! It looks like this issue isn't claimed. Is it ok if I work on this issue?

pjsier · 2019-10-30T00:51:27Z

@janeskim all yours!

mesterhammerfic · 2020-03-06T23:37:20Z

If this one is open I'm going to work on it

pjsier · 2020-03-09T17:02:10Z

@mesterhammerfic sounds great! Assigning you now

ledaliang · 2020-06-16T15:22:58Z

Hi, I was wondering if I could work on this issue if it hasn't been active recently. Thanks!

pjsier · 2020-06-16T15:47:53Z

@ledaliang thanks for your interest! We try to limit contributors to one issue at a time, but once your other PR is merged you can feel free to work on this one

PatrickKlingler · 2020-06-26T19:01:30Z

Hi there, I've only just asked for a Slack invite, but could I start working on this now?

pjsier · 2020-06-26T19:39:52Z

@PatrickKlingler sure! Marking it claimed now

PatrickKlingler · 2020-06-26T22:57:34Z

Hey Patrick, would it be possible to add another PDF parser?

The PyPDF2 parser does not seem to work for the PDFs on IFA's website, i.e. it returns an empty string. I copied this code to parse the PDF: https://github.com/City-Bureau/city-scrapers/blob/main/city_scrapers/spiders/il_pollution_control.py#L103

Apparently PyPDF2 is limited to certain kinds of PDF encodings: https://stackoverflow.com/questions/30272269/python-text-extraction-does-not-work-on-some-pdfs

I ended up using pdfplumber and that works but it would introduce another dependency.
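For illustration, the "returns an empty string" problem can be handled by trying parsers in order and keeping the first one that yields text. This is only a rough sketch: `first_nonempty_text`, `empty_parser`, and `working_parser` are hypothetical names, and the two parsers are stand-ins rather than real PyPDF2 or pdfplumber calls.

```python
def first_nonempty_text(pdf_bytes, extractors):
    """Try each extractor in order and return the first non-blank result."""
    for extract in extractors:
        try:
            text = extract(pdf_bytes)
        except Exception:
            # A parser that cannot handle this PDF should not stop the others.
            continue
        if text and text.strip():
            return text
    # Every extractor failed or returned only whitespace.
    return ""


# Hypothetical stand-ins: one parser that silently returns nothing,
# and one that manages to read the document.
def empty_parser(pdf_bytes):
    return ""


def working_parser(pdf_bytes):
    return "Board Meeting Agenda"


text = first_nonempty_text(b"%PDF-1.4", [empty_parser, working_parser])
```

In a spider, the stand-ins would be replaced by thin wrappers around whichever PDF libraries the project actually depends on.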

pjsier · 2020-06-27T11:50:19Z

@PatrickKlingler gotcha, we've run into issues with PyPDF2 so I think it's fine to add something additional here, but on other projects we've been working with pdfminer.six directly. If it works for you I'm fine with adding pdfminer.six as a dependency here since we'll try to eventually remove PyPDF2. We have an example of using it here https://github.com/City-Bureau/city-scrapers-cle/blob/46cf904f87f7c78fe2733eafc4ac97a68ce47d02/city_scrapers/spiders/cuya_developmental_disabilities.py#L36-L44
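As a rough sketch of what reading a PDF with pdfminer.six could look like (this is not the code from the linked spider, which may use a lower-level API), the snippet below uses the library's high-level `extract_text` helper. `clean_pdf_text` and `parse_pdf_response` are hypothetical helper names, and passing in raw bytes such as a Scrapy `response.body` is an assumption.

```python
import re
from io import BytesIO


def clean_pdf_text(raw_text):
    """Collapse the runs of whitespace that PDF extraction tends to leave behind."""
    return re.sub(r"\s+", " ", raw_text).strip()


def parse_pdf_response(pdf_bytes):
    """Return cleaned text from raw PDF bytes (assumed to come from response.body)."""
    # Imported lazily so clean_pdf_text stays usable without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return clean_pdf_text(extract_text(BytesIO(pdf_bytes)))
```

Normalizing whitespace first tends to make later regex matching on dates and times less brittle, since extracted PDF text often contains stray line breaks.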

pjsier · 2020-07-14T13:13:04Z

@PatrickKlingler wanted to follow up on this, we just replaced PyPDF2 with pdfminer.six throughout all of our repos so hopefully that makes this easier!

PatrickKlingler · 2020-07-14T16:39:13Z

Good to hear! Haven't been able to get to this in a while, but I'll have some time this weekend!


solisedwin · 2020-09-29T06:27:34Z

Hey, seems like this issue has been open for a while. I would like to tackle it as my first contribution. It also seems like a good opportunity since I have built projects using Scrapy before. If that's fine by you.

pjsier · 2020-09-29T12:31:21Z

@solisedwin yep, this has been inactive more than 30 days so it's all yours if you're interested! I can assign you now

solisedwin · 2020-11-05T01:44:03Z

Hey, I'm still working on this web crawler. I've just been rewriting and fine-tuning it for better code readability. Should have it done soon. Thanks

pjsier added good first issue help wanted location: chicago Hacktoberfest labels Oct 24, 2019

pjsier added claimed and removed help wanted labels Oct 24, 2019

pjsier added help wanted and removed claimed labels Oct 28, 2019

pjsier added claimed and removed help wanted labels Oct 30, 2019

pjsier added help wanted and removed claimed labels Feb 6, 2020

pjsier added claimed and removed help wanted labels Mar 9, 2020

pjsier added help wanted and removed claimed labels Jun 16, 2020

pjsier added claimed and removed help wanted labels Jun 26, 2020

pjsier removed the Hacktoberfest label Sep 3, 2020

pjsier assigned solisedwin Sep 29, 2020

solisedwin mentioned this issue Dec 20, 2020

914 spider il finance authority #995

Open

5 tasks
