How to get the scraper to not store original HTML files? #233

nishalsach · 2022-07-25T19:30:35Z


Hi, the documentation says that with the default config, the tool also stores the original HTML files of scraped websites when running in CLI mode. Is there a way to switch this off and store only the extracted JSON? I couldn't find a setting for this in the config file. Any help would be appreciated!

nishalsach (Author) · 2024-03-05T22:13:51Z

In case anybody else is curious: I worked around this by setting up a cron job, from a separate command-line session, that deletes all .html files in the news-please subdirectory for the chosen date every 2 minutes or so.
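The cleanup step of this workaround could look roughly like the small Python script below, which cron invokes periodically. This is a minimal sketch, not part of news-please: the script name `delete_html.py` and the target directory are hypothetical, and it assumes the stored raw pages are the only `.html` files under that directory, so the extracted JSON files are left untouched.

```python
import sys
from pathlib import Path


def delete_html_files(root: Path) -> int:
    """Recursively delete every *.html file below `root`; return how many were removed.

    Hypothetical helper for the cron workaround; it does not touch .json files.
    """
    removed = 0
    for path in root.rglob("*.html"):
        if path.is_file():
            try:
                path.unlink()
                removed += 1
            except FileNotFoundError:
                # The file vanished between listing and deletion (e.g. an overlapping run).
                pass
    return removed


if __name__ == "__main__":
    # Hypothetical usage: python3 delete_html.py /path/to/news-please/data/<date>
    if len(sys.argv) != 2:
        print("usage: delete_html.py DIRECTORY")
    else:
        target = Path(sys.argv[1])
        count = delete_html_files(target)
        print(f"removed {count} .html file(s) under {target}")
```

To run it every 2 minutes, a crontab entry such as `*/2 * * * * python3 /path/to/delete_html.py /path/to/data/dir` would work (both paths are placeholders). Cron's finest granularity is one minute, so "every 2 minutes" is about as tight as this approach gets, and HTML files will briefly exist on disk between runs.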
