repo refactoring
andreamust committed Dec 7, 2021
1 parent ae3d618 commit 8364bee
Showing 7 changed files with 9 additions and 9 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -29,7 +29,7 @@ The full documentation for installing Tesseract and its dependencies can be foun
### Internet Culturale scraper
For downloading resources from "Internet Culturale" you need to run the ```internet_culturale_scraper.py``` as:
```
-python3 internet_culturale_scraper.py [-h] [--resource_url] [--output_path]
+python3 src/internet_culturale_scraper.py [-h] [--resource_url] [--output_path]
```

The parameter to pass are described as follows:
@@ -41,7 +41,7 @@ The parameter to pass are described as follows:

You can also browse the script's documentation by typing:
```
-python3 internet_culturale_scraper.py --help
+python3 src/internet_culturale_scraper.py --help
```

The script will download all files related to the given resource to the specified folder.
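
The usage line `[-h] [--resource_url] [--output_path]` suggests a standard `argparse` interface. Below is a minimal sketch of such a parser, assuming two string options. The help strings, the `existing_dir` check, and the `build_parser` name are hypothetical and are not taken from the script itself.

```python
import argparse
import os


def existing_dir(path: str) -> str:
    """Hypothetical validator: reject output paths that do not exist yet."""
    if not os.path.isdir(path):
        raise argparse.ArgumentTypeError(f"not an existing directory: {path}")
    return path


def build_parser() -> argparse.ArgumentParser:
    # Option names follow the README usage line; everything else is assumed.
    parser = argparse.ArgumentParser(
        prog="internet_culturale_scraper.py",
        description="Download all files of one resource (illustrative sketch).",
    )
    parser.add_argument("--resource_url", type=str,
                        help="URL of the resource to download")
    parser.add_argument("--output_path", type=existing_dir,
                        help="existing folder in which to save the files")
    return parser
```

With a parser like this, `build_parser().parse_args()` would expose the values as `args.resource_url` and `args.output_path`, and `--help` would be generated automatically.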
@@ -58,7 +58,7 @@ To attempt to download the non-downloaded files again, simply restart the script

For downloading resources from "Internet Culturale" you need to run the ```internet_culturale_scraper.py``` as:
```
-python3 hemeroteca_digital_scraper.py [-h] [--resource_url] [--output_path]
+python3 src/hemeroteca_digital_scraper.py [-h] [--resource_url] [--output_path]
```

The parameter to pass are described as follows:
@@ -67,13 +67,13 @@ The parameter to pass are described as follows:
--output_path (string): the existing path in with to save the downloaded resource
```
-The resource url must be the url of a specific resource search result of the "Query" section, only searching for resource's "Title", and clicking on "Search among free-access titles", as illustrated in the image:
-![](../../../../Desktop/Screenshot 2021-12-07 at 15.45.55.png)
-Remember to select **only** one resource at the time.

You can also browse the script's documentation by typing:
```
-python3 hemeroteca_digital_scraper.py --help
+python3 src/hemeroteca_digital_scraper.py --help
```

+The resource url must be the url of a specific resource search result of the "Query" section, only searching for resource's "Title", and clicking on "Search among free-access titles", as illustrated in the image:
+![](etc/img/hemeroteca_digital.png)
+Remember to select **only** one resource at the time.

Binary file added etc/img/hemeroteca_digital.png
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions ocr_pdf.py → src/ocr_pdf.py
@@ -126,10 +126,10 @@ def ocrise_multiple(final_path, language_mode, single_lang, multiple_langs, outp
multiple_langs, single_lang)
if len(filename.split('-')[:-1]) > 1:
if extension is None and f"{'-'.join(filename.split('-')[:-1])}.txt" not in [f for f in
-os.listdir('./')]:
+os.listdir('../')]:
save_to_txt(f"{'-'.join(filename.split('-')[:-1])}.txt", image_ocr)
elif extension is None and f"{'-'.join(filename.split('-')[:-1])}.txt" in [f for f in
-os.listdir('./')]:
+os.listdir('../')]:
with open(f"{'-'.join(filename.split('-')[:-1])}.txt", "a") as existing_file:
existing_file.write(f"\n\n\n{image_ocr}")
else:
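
Both changed lines list a directory relative to the current working directory (`'../'` instead of `'./'`), while the visible `open(...)` call still receives a bare file name. Depending on where the script is launched, the existence check and the write may therefore point at different folders. Below is a minimal sketch of one way to keep them aligned, assuming an explicit `output_dir` is passed in. `txt_name_for` and `append_ocr_text` are hypothetical helpers, and the plain write stands in for `save_to_txt`, whose behaviour is not shown in this diff.

```python
from pathlib import Path


def txt_name_for(filename: str) -> str:
    """Drop the trailing '-<suffix>' part and add '.txt', as the diff's expression does."""
    return f"{'-'.join(filename.split('-')[:-1])}.txt"


def append_ocr_text(output_dir: Path, filename: str, image_ocr: str) -> Path:
    """Write or append OCR text, checking and writing in the same directory."""
    target = Path(output_dir) / txt_name_for(filename)
    if target.exists():
        # Same separator as the original append branch.
        with target.open("a", encoding="utf-8") as existing_file:
            existing_file.write(f"\n\n\n{image_ocr}")
    else:
        # Assumption: stands in for save_to_txt(name, image_ocr).
        target.write_text(image_ocr, encoding="utf-8")
    return target
```

Because the check and the write share one `target` path, moving the script into `src/` or launching it from another folder would not change which file is inspected.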