Spider: Illinois Finance Authority #914
Comments
I'd like to work on this issue.
@aneesh404 sounds good!
Hi! I'm sorry, I haven't had time to work on this issue. Please feel free to assign it to someone else.
Hi! It looks like this issue isn't claimed. Is it OK if I work on it?
@janeskim all yours!
If this one is open, I'm going to work on it.
@mesterhammerfic sounds great! Assigning you now.
Hi, I was wondering if I could work on this issue, since it hasn't been active recently. Thanks!
@ledaliang thanks for your interest! We try to limit contributors to one issue at a time, but once your other PR is merged, feel free to work on this one.
Hi there! I've only just asked for a Slack invite, but could I start working on this now?
@PatrickKlingler sure! Marking it claimed now.
Hey Patrick, would it be possible to add another PDF parser? The PyPDF2 parser doesn't seem to work for the PDFs on IFA's website; it returns an empty string. I copied this code to parse the PDF: https://github.com/City-Bureau/city-scrapers/blob/main/city_scrapers/spiders/il_pollution_control.py#L103 Apparently PyPDF2 is limited to certain kinds of PDF encodings: https://stackoverflow.com/questions/30272269/python-text-extraction-does-not-work-on-some-pdfs I ended up using
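One way to cope with a parser that returns an empty string for some PDFs, as described above, is to try several extractors in order and keep the first non-blank result. The sketch below shows that fallback pattern using only the standard library. The helper name `extract_first_text` and the demo parsers are hypothetical; a real spider would pass in actual PDF parsing functions.

```python
import io
from typing import BinaryIO, Callable, Iterable

# An extractor takes a binary file-like object and returns the text it found.
Extractor = Callable[[BinaryIO], str]


def extract_first_text(pdf_bytes: bytes, extractors: Iterable[Extractor]) -> str:
    """Return the first non-blank text produced by the given extractors.

    Hypothetical helper: each extractor gets a fresh in-memory copy of the
    PDF, and any extractor that raises or returns only whitespace is skipped.
    """
    for extract in extractors:
        try:
            text = extract(io.BytesIO(pdf_bytes))
        except Exception:
            # A parser that can't handle this PDF's encoding should not
            # stop the remaining parsers from being tried.
            continue
        if text and text.strip():
            return text
    return ""


if __name__ == "__main__":
    def empty_parser(f: BinaryIO) -> str:
        return ""  # mimics a parser that silently returns an empty string

    def broken_parser(f: BinaryIO) -> str:
        raise ValueError("unsupported encoding")

    def working_parser(f: BinaryIO) -> str:
        return f.read().decode("latin-1")

    # prints "Board Meeting"
    print(extract_first_text(b"Board Meeting", [empty_parser, broken_parser, working_parser]))
```

With pdfminer.six installed, `from pdfminer.high_level import extract_text` provides a function that accepts a file-like object, so it could be passed as one of the extractors.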
@PatrickKlingler gotcha, we've run into issues with PyPDF2, so I think it's fine to add something additional here, but on other projects we've been working with
@PatrickKlingler wanted to follow up on this: we just replaced PyPDF2 with
Good to hear! Haven't been able to get to this in a while, but I'll have some time this weekend!
On Tue, Jul 14, 2020, 9:13 AM Patrick Sier wrote:
> @PatrickKlingler wanted to follow up on this, we just replaced PyPDF2 with pdfminer.six throughout all of our repos so hopefully that makes this easier!
Hey, it seems like this issue has been open for a while. I would like to tackle it as my first contribution, if that's fine by you. It also seems like a good opportunity, since I have built projects using Scrapy before.
@solisedwin yep, this has been inactive for more than 30 days, so it's all yours if you're interested! I can assign you now.
Hey, I'm still working on this web crawler. I've just been rewriting and fine-tuning it for better code readability. Should have it done soon. Thanks!
URL: https://www.il-fa.com/
Documents URL: https://www.il-fa.com/public-access/board-documents/
Spider Name: il_finance_authority
Agency Name: Illinois Finance Authority
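As a possible starting point for the Documents URL, the sketch below collects PDF links from a page with the standard library's `html.parser` and resolves them against the page URL. It assumes, without having checked the live site, that board documents are plain `<a href="...pdf">` links; the class `PdfLinkCollector`, the function `collect_pdf_links`, and any sample HTML are hypothetical. A Scrapy spider would more typically use `response.css(...)` selectors and `response.urljoin(...)` for the same job.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin

DOCUMENTS_URL = "https://www.il-fa.com/public-access/board-documents/"


class PdfLinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for anchors pointing at PDFs."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            # Only track anchors whose path (ignoring any query string) ends in .pdf.
            if href.lower().split("?")[0].endswith(".pdf"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Accumulate text only while inside a tracked PDF anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so titles are easier to parse later.
            title = " ".join("".join(self._text).split())
            self.links.append((title, urljoin(self.base_url, self._href)))
            self._href = None


def collect_pdf_links(html: str, base_url: str = DOCUMENTS_URL) -> list[tuple[str, str]]:
    """Return (title, absolute URL) pairs for every PDF link in `html`."""
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Keeping link discovery separate from PDF parsing means the link step can be tested against saved HTML without downloading any documents.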
See the contribution guide for information on how to get started.