A simple SEO tool to help you find the sitemap 🚀

hsuanchi/Find-Sitemaps

PRs Welcome | License: MIT | PyPI: Find-Sitemap | Code style: black

Find-Sitemap

Find-Sitemap is a tool for locating sitemaps on any website. It checks a large set of candidate URLs built from common subdomains, paths, and file types, so it can find a sitemap even when it sits deep within the website's directory structure. It can also detect multiple sitemaps, so you can view and analyze all the pages listed in a site's sitemaps.

>>> from Find_Sitemap import FindSitemap
>>> main = FindSitemap('google.com')
>>> main.crawl()
...
...
check 13801/13804: https://google.com/sitemap.xml
check 13802/13804: https://google.com/feed.xml
check 13803/13804: https://google.com/sitemap_index.xml
check 13804/13804: https://google.com/sitemapindex.xml
--------------------
Find sitemap urls len: 1
Find sitemap urls list: ['https://www.google.com/sitemap.xml']
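
Once a sitemap URL has been found, its contents can be inspected with the Python standard library. The sketch below is not part of Find-Sitemap: it assumes the file follows the sitemaps.org XML format, and both the `extract_locs` helper and the inline sample document are hypothetical stand-ins for a real download.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_text):
    """Return (kind, locs) for a sitemap document.

    kind is the root tag without its namespace, e.g. 'urlset' for a
    regular sitemap or 'sitemapindex' for an index of other sitemaps.
    locs is the list of URLs found in the <loc> elements.
    """
    root = ET.fromstring(xml_text)
    kind = root.tag.replace(SITEMAP_NS, "")
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


# Hypothetical sample; a real sitemap would be fetched from the found URL.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

kind, locs = extract_locs(SAMPLE)
print(kind, locs)
```

If `kind` is `sitemapindex`, each entry in `locs` points to another sitemap rather than to a page, so the same helper can be applied to each of them in turn.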

🚀  Try now in Colab

Getting Started

Install Find-Sitemap from PyPI:

$ pip install Find-Sitemap
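
To confirm that the installation succeeded, the installed version can be looked up by distribution name. This is a minimal sketch using only the standard library (Python 3.8+); the `installed_version` helper is a hypothetical name, and the distribution name is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("Find-Sitemap"))
```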

Prerequisites

Usage

  1. Inspect the default subdomains, slugs_L1, slugs_L2, and filetypes parameters.

    >>> from Find_Sitemap import FindSitemap
    >>> main = FindSitemap('google.com')
    >>> main.subdomains
    {'www.'}
    
    >>> main.slugs_L1
    {'/default', '/sitemap', '/feeds', '/api', '/contents' ...}
    
    >>> main.slugs_L2
    {'/sitemap', '/stock', '/sitemap1', '/sitemap0', ...}
    
    >>> main.filetypes
    {'txt', 'xml', 'xml.gz', 'jsp', 'html', ...}
    
  2. Add values to the subdomains, slugs_L1, slugs_L2, and filetypes parameters.

    >>> from Find_Sitemap import FindSitemap
    >>> main = FindSitemap('google.com')
    >>> main.subdomains.add("shop.")
    >>> main.slugs_L1.add("/node")
    >>> main.slugs_L2.add("/site")
    >>> main.filetypes.add("xml")
    
  3. Remove values from the subdomains, slugs_L1, slugs_L2, and filetypes parameters.

    >>> from Find_Sitemap import FindSitemap
    >>> main = FindSitemap('google.com')
    >>> main.subdomains.remove("shop.")
    >>> main.slugs_L1.remove("/node")
    >>> main.slugs_L2.remove("/site")
    >>> main.filetypes.remove("xml")
    
  4. Run the crawler.

    >>> from Find_Sitemap import FindSitemap
    >>> main = FindSitemap('google.com')
    >>> main.crawl()
    ...
    ...
    check 13801/13804: https://google.com/sitemap.xml
    check 13802/13804: https://google.com/feed.xml
    check 13803/13804: https://google.com/sitemap_index.xml
    check 13804/13804: https://google.com/sitemapindex.xml
    --------------------
    Find sitemap urls len: 1
    Find sitemap urls list: ['https://www.google.com/sitemap.xml']
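
The number of checks in the output above suggests that the crawler tries many combinations of these parameters. How Find-Sitemap actually builds its URLs is not documented here, so the sketch below only illustrates one assumed scheme (subdomain + domain + one slug + file type). The `candidate_urls` helper and the small sets are hypothetical, and the parameters are assumed to be ordinary Python sets, as their printed form suggests.

```python
from itertools import product


def candidate_urls(domain, subdomains, slugs, filetypes):
    """Build candidate URLs as https:// + subdomain + domain + slug + '.' + filetype.

    This is an assumed scheme for illustration, not the library's own logic.
    """
    # The empty string stands for the bare domain (no subdomain).
    prefixes = sorted(set(subdomains) | {""})
    return [
        f"https://{sub}{domain}{slug}.{ext}"
        for sub, slug, ext in product(prefixes, sorted(slugs), sorted(filetypes))
    ]


# Small made-up sets; the real defaults are much larger.
subdomains = {"www."}
slugs = {"/sitemap", "/feed"}
filetypes = {"xml", "txt"}

before = candidate_urls("example.com", subdomains, slugs, filetypes)

subdomains.add("shop.")    # same set method used in step 2
filetypes.discard("txt")   # discard() removes the value if it is present
filetypes.discard("jsp")   # no error when the value is absent

after = candidate_urls("example.com", subdomains, slugs, filetypes)
print(len(before), len(after))
```

On a plain Python set, `remove()` raises `KeyError` when the value is missing, while `discard()` does nothing, which makes `discard()` the safer choice when it is unclear whether a value was ever added.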
    

Contributing

About
