Easily fetch, slice, dice, and output HTML (or XML) content from anywhere.
A Top Shelf Craft creation
Michael Rog, Proprietor
1. From your project directory, use Composer to require the plugin package:
   composer require topshelfcraft/scraper
2. In the Control Panel, go to Settings → Plugins and click the “Install” button for Scraper.
3. There is no Step 3.
Scraper is also available for installation via the Craft CMS Plugin Store.
The Scraper plugin exposes a full-featured crawler object to your Twig template, allowing you to fetch, parse, and filter DOM elements from a remote source document.
When invoking the plugin, you can choose whether to use SimpleHtmlDom or Symfony components to instantiate your crawler:
{% set crawler = craft.scraper.using('symfony').get('https://zombo.com') %}
{% set crawler = craft.scraper.using('simplehtmldom').get('https://zombo.com') %}
I generally recommend using the Symfony components; they are more powerful and more resilient to malformed source markup. (The SimpleHtmlDom crawler is included to provide backwards compatibility with Craft 2 projects.)
When you opt for Symfony components, the get method instantiates a full BrowserKit client, giving you access to all the BrowserKit and DomCrawler methods.
You can iterate over the DOM elements from your source document like this:
{% for node in crawler.filter('h2 > a') %}
{{ node.text() }}
{% endfor %}
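Looping is only one way to use the crawler. The following is a minimal sketch of a few other DomCrawler calls (count, first, text, attr), assuming the object returned by get exposes them as described above. The selector and the markup of the remote page are assumptions made for illustration.

```twig
{# Assumption: the remote page contains at least one <h2><a href="..."> element #}
{% set crawler = craft.scraper.using('symfony').get('https://zombo.com') %}
{% set links = crawler.filter('h2 > a') %}

{# Guard against an empty result set before reading from the first match #}
{% if links.count() > 0 %}
  {% set firstLink = links.first() %}
  {{ firstLink.text() }} ({{ firstLink.attr('href') }})
{% else %}
  No matching links found.
{% endif %}
```

Checking count() first matters because DomCrawler's text() throws an exception when called on an empty node list.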
When you opt for the SimpleHtmlDom crawler, the get method instantiates a SimpleHtmlDom client, giving you access to all the SimpleHtmlDom methods.
You can iterate over the DOM elements from your source document like this:
{% for node in crawler.find('h1') %}
{{ node.innertext() }}
{% endfor %}
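To read a single element rather than loop, a sketch along these lines may work. It assumes the object returned by get passes find's optional index argument through to SimpleHtmlDom, and that the resulting node offers the library's text and getAttribute methods. The selector is hypothetical.

```twig
{# Assumption: find(selector, index) returns one node, or null when nothing matches #}
{% set crawler = craft.scraper.using('simplehtmldom').get('https://zombo.com') %}
{% set firstLink = crawler.find('a', 0) %}

{% if firstLink %}
  {{ firstLink.text() }} ({{ firstLink.getAttribute('href') }})
{% else %}
  No links found.
{% endif %}
```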
Ask a question on StackExchange, and ping me with a URL via email or Discord.
Craft 4.2.1+
Please open a GitHub Issue, submit a PR to the 4.x.dev branch, or just email me.
- Plugin development: Michael Rog / @michaelrog
- Includes the "Simple HTML DOM" library, created by S. C. Chen
- Includes the Symfony DomCrawler via Goutte, created by Fabien Potencier / @fabpot
- Icon: "Upright vacuum cleaner" by Creaticca Creative Agency, via The Noun Project