Web Crawler

Description

Simple web crawling package.

Starts crawling from the given domain_url. Visits every HTML link within the specified domain over HTTP/HTTPS, collects each page's title and links, and repeats the process for the collected links.

Returns a dictionary of dictionaries, as follows:

{
    'http://0.0.0.0:8000': {
        'title': 'Index',
        'links': {'http://0.0.0.0:8000/example.html', 'http://0.0.0.0:8000/site.html'}
    },
    ...
}
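The crawl described above can be sketched with the standard library alone. This is a minimal illustration of the idea, not the package's actual implementation: `crawl`, `fetch_html`, and `PageParser` are hypothetical names, and details such as the breadth-first order, fragment stripping, and the 10-second timeout are assumptions.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects the <title> text and every <a href> value of one page."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self.hrefs = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url):
    """Download a page and decode it as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode('utf-8', errors='replace')


def crawl(start_url, fetch=fetch_html):
    """Return {url: {'title': str, 'links': set}} for pages in one domain."""
    domain = urlparse(start_url).netloc
    result = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in result:
            continue  # already visited
        parser = PageParser()
        parser.feed(fetch(url))
        links = set()
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            absolute = urldefrag(urljoin(url, href)).url
            parsed = urlparse(absolute)
            # Keep only HTTP/HTTPS links that stay inside the start domain.
            if parsed.scheme in ('http', 'https') and parsed.netloc == domain:
                links.add(absolute)
        result[url] = {'title': parser.title.strip(), 'links': links}
        queue.extend(link for link in links if link not in result)
    return result
```

Passing `fetch` as a parameter keeps the traversal logic separate from the network access, so the sketch can be exercised against an in-memory dictionary of pages.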

Installation

pip install git+https://github.com/myslak71/web_crawler.git

Usage

In scripts:

from web_crawler import site_map
site_map(url)
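Once a result is available, it can be processed like any other dictionary. The sketch below assumes the return value has the shape shown in the Description section; `format_site_map` is a hypothetical helper, and the `sample` data stands in for a real `site_map(url)` call so the example runs without network access.

```python
def format_site_map(pages):
    """Render a site map as a list of text lines, sorted by URL."""
    lines = []
    for url in sorted(pages):
        page = pages[url]
        lines.append(f"{url} ({page['title']}): {len(page['links'])} link(s)")
        for link in sorted(page['links']):
            lines.append(f"    {link}")
    return lines


# Stand-in for: pages = site_map('http://0.0.0.0:8000')
sample = {
    'http://0.0.0.0:8000': {
        'title': 'Index',
        'links': {
            'http://0.0.0.0:8000/example.html',
            'http://0.0.0.0:8000/site.html',
        },
    },
}

print('\n'.join(format_site_map(sample)))
```

Because `links` is a set, it is sorted before printing to keep the output stable between runs.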

CLI:

$ web-crawler --url URL

OPTION                    DESCRIPTION
-u, --url     REQUIRED    Domain URL to start crawling from
-h, --help    OPTIONAL    Help
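For reference, a command-line interface with these two options could be declared with `argparse` as follows. This is only a sketch of an equivalent parser, not the package's own CLI code; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the table above."""
    parser = argparse.ArgumentParser(prog='web-crawler')
    # argparse adds -h/--help automatically, so only --url is declared.
    parser.add_argument(
        '-u', '--url',
        required=True,
        help='Domain URL to start crawling from',
    )
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(['--url', 'http://0.0.0.0:8000'])
print(args.url)
```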
