Merge pull request #592 from psivash/wip
feat: Added "class" attribute when searching for nodes to check
AndyTheFactory authored Dec 28, 2023
2 parents 036e7da + 475ab6d commit 6ccf8af
Showing 3 changed files with 23 additions and 25 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -45,6 +45,8 @@ First release after the fork. This release is based on the 0.1.7 release of the
 - **parsing**: improved publication date extraction([`4d137eb`](https://github.com/AndyTheFactory/newspaper4k/commit/4d137eb0b6d5b3df971a01f4aa8c1961af9da118)) (by Andrei)
 - some linter errors, whitespaces and spelling([`79553f6`](https://github.com/AndyTheFactory/newspaper4k/commit/79553f6302cea1a6e36103fb4dc1c675ca704cd3)) (by Andrei)
 
+################################### These are the original newspaper3k release notes ###################################
+########################################################################################################################
 ## [0.1.7](https://github.com/codelucas/newspaper/tree/0.1.7) (2016-01-30)
 [Full Changelog](https://github.com/codelucas/newspaper/compare/0.1.6...0.1.7)
 
44 changes: 20 additions & 24 deletions README.md
@@ -1,4 +1,4 @@
-# Newspaper4k: Article scraping & curation, a continuation of the beloved newspaper3k by codelucas
+# Newspaper4k: Article Scraping & Curation, a continuation of the beloved newspaper3k by codelucas
 [![PyPI version](https://badge.fury.io/py/newspaper4k.svg)](https://badge.fury.io/py/newspaper4k)
 ![Build status](https://github.com/AndyTheFactory/newspaper4k/actions/workflows/pipeline.yml/badge.svg)
 [![Coverage status](https://coveralls.io/repos/github/AndyTheFactory/newspaper4k/badge.svg?branch=master)](https://coveralls.io/github/AndyTheFactory/newspaper4k)
@@ -152,15 +152,22 @@ Also, in any case, please provide the following information:
 
 # Requirements and dependencies
 
+Following system packages are required:
+
+- PIL: `libjpeg-dev` `zlib1g-dev` `libpng12-dev`
+- lxml: `libxml2-dev` `libxslt-dev`
+- Python Development version: `python-dev`
+
+
 **If you are on Debian / Ubuntu**, install using the following:
 
-- Install `pip3` command needed to install `newspaper3k` package:
+- Install `python3` and `python3-dev`:
 
-$ sudo apt-get install python3-pip
+$ sudo apt-get install python3 python3-dev
 
-- Python development version, needed for Python.h:
+- Install `pip3` command needed to install `newspaper4k` package:
 
-$ sudo apt-get install python-dev
+$ sudo apt-get install python3-pip
 
 - lxml requirements:
 
@@ -173,13 +180,17 @@ Also, in any case, please provide the following information:
 NOTE: If you find problem installing `libpng12-dev`, try installing
 `libpng-dev`.
 
-- Download NLP related corpora:
+- Install the distribution via pip:
 
+$ pip3 install newspaper4k
+
+- Download NLP (nltk) related corpora:
+
 $ curl https://raw.githubusercontent.com/AndyTheFactory/newspaper4k/master/download_corpora.py | python3
 
-- Install the distribution via pip:
+- Download NLP (nltk) related corpora:
 
-$ pip3 install newspaper3k
+$ curl https://raw.githubusercontent.com/AndyTheFactory/newspaper4k/master/download_corpora.py | python3
 
 **If you are on OSX**, install using the following, you may use both
 homebrew or macports:
@@ -188,25 +199,10 @@ homebrew or macports:
 
 $ brew install libtiff libjpeg webp little-cms2
 
-$ pip3 install newspaper3k
+$ pip3 install newspaper4k
 
 $ curl https://raw.githubusercontent.com/AndyTheFactory/newspaper4k/master/download_corpora.py | python3
 
-**Otherwise**, install with the following:
-
-NOTE: You will still most likely need to install the following libraries
-via your package manager
-
-- PIL: `libjpeg-dev` `zlib1g-dev` `libpng12-dev`
-- lxml: `libxml2-dev` `libxslt-dev`
-- Python Development version: `python-dev`
-
-```{=html}
-<!-- -->
-```
-$ pip3 install newspaper3k
-
-$ curl https://raw.githubusercontent.com/codelucas/newspaper/master/download_corpora.py | python3
 
 # LICENSE
 
2 changes: 1 addition & 1 deletion tests/create_test_data.py
@@ -48,7 +48,7 @@ def main(args):
 
 
 if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Create test data for newspaper3k")
+    parser = argparse.ArgumentParser(description="Create test data for newspaper4k")
     parser.add_argument("--url", type=str, help="URL to download", required=True)
     parser.add_argument(
         "--language",
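This hunk only renames the parser description, but it also exposes the command-line surface of `tests/create_test_data.py`. A minimal, self-contained sketch of that parser is shown here. `build_parser` is a hypothetical wrapper (the real script builds the parser inline), and because the diff is cut off right after the `"--language"` name, the `type` and `help` values used for that option are assumptions rather than the script's actual definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the options visible in the tests/create_test_data.py hunk.

    Hypothetical helper: the real script creates its parser inline under
    `if __name__ == "__main__":`.
    """
    parser = argparse.ArgumentParser(description="Create test data for newspaper4k")
    # Copied from the diff: --url is mandatory.
    parser.add_argument("--url", type=str, help="URL to download", required=True)
    # Assumption: the diff truncates this option after its name, so type=str
    # and this help text are illustrative only.
    parser.add_argument("--language", type=str, help="article language (assumed)")
    return parser


if __name__ == "__main__":
    # Example values only; "en" is an assumed language code.
    args = build_parser().parse_args(
        ["--url", "https://example.com/some-article", "--language", "en"]
    )
    print(args.url, args.language)
```

Omitting `--url` makes argparse print a usage error and exit with status 2, which matches the `required=True` flag shown in the diff.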