A fast javascript dom parser for web scraping.
cargo add jsdom
use std::collections::HashSet;
use jsdom::extract::extract_links;
const SCRIPT: &str = r###"
var ele = document.createElement('a');
ele.href = 'https://a11ywatch.com';
"###;
#[test]
fn parse_links() {
// build tree with elements created from the nodes todo
let links: HashSet<String> = extract_links(SCRIPT);
assert!(links.contains("https://a11ywatch.com"))
}
This package will rollout features that are most important for web scraping first.
hashbrown
: Enable the hashbrown crate.tokio
: Enable tokio streaming utils.
Intro stage can handle elements created in statements and expressions.