Commit

Check for url-encoded fragments
jyn514 authored and Joshua Nelson committed Jan 9, 2021
1 parent aba1fac commit 0f24c88
Showing 7 changed files with 25 additions and 1 deletion.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -6,8 +6,10 @@

* When a website gives 405 Method Not Supported for HEAD requests, fall back to GET. In particular,
this no longer marks all links to play.rust-lang.org as broken. [PR#136]
* URL-encoded fragments, like `#%E2%80%A0`, are now decoded. [PR#141]

[PR#136]: https://github.com/deadlinks/cargo-deadlinks/pull/136
[PR#141]: https://github.com/deadlinks/cargo-deadlinks/pull/141

#### Changed

1 change: 1 addition & 0 deletions Cargo.lock

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -37,6 +37,8 @@ ureq = { version = "1.5.4", features = ["tls"], default-features = false }
serde = "1.0"
serde_derive = "1.0"
url = "2"
# Try to keep this in sync with `url`'s version
percent-encoding = "2"
walkdir = "2.1"

[dev-dependencies]
13 changes: 13 additions & 0 deletions src/check.rs
@@ -156,6 +156,19 @@ fn is_fragment_available(
        return Ok(());
    }

    // Try again with percent-decoding.
    // NOTE: This isn't done unconditionally because it's possible the fragment it's linking to was also percent-encoded.
    match percent_encoding::percent_decode(fragment.as_bytes()).decode_utf8() {
        Ok(cow) => {
            if fragments.contains(&*cow) {
                return Ok(());
            }
        }
        // If this was invalid UTF8 after percent-decoding, it can't be in the file (since we have a `String`, not opaque bytes).
        // Assume it wasn't meant to be url-encoded.
        Err(err) => warn!("{} url-decoded to invalid UTF8: {}", fragment, err),
    }

    // Rust documentation uses `#n-m` fragments and JavaScript to highlight
    // a range of lines in HTML of source code, an element with `id`
    // attribute of (literal) "#n-m" will not exist, but elements with
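The lookup order in the `src/check.rs` hunk (literal match first, percent-decoded match second, invalid UTF-8 treated as "not found") can be sketched in isolation. This is a minimal illustration, not the crate's code: `percent_decode_bytes` is a hypothetical std-only stand-in for `percent_encoding::percent_decode`, and `fragment_exists` is a hypothetical helper that returns a `bool` where the real function returns a `Result` and logs a warning.

```rust
use std::collections::HashSet;

/// Hypothetical std-only stand-in for `percent_encoding::percent_decode`:
/// each well-formed `%XX` escape becomes the byte it names; everything else,
/// including malformed escapes, is copied through unchanged.
fn percent_decode_bytes(input: &str) -> Vec<u8> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            let hi = (bytes[i + 1] as char).to_digit(16);
            let lo = (bytes[i + 2] as char).to_digit(16);
            if let (Some(hi), Some(lo)) = (hi, lo) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

/// Hypothetical helper mirroring the order used in the hunk above:
/// try the fragment as written, then try its percent-decoded form.
fn fragment_exists(fragments: &HashSet<String>, fragment: &str) -> bool {
    // An `id` that itself contains a literal `%XX` sequence matches here,
    // which is why decoding is only a fallback.
    if fragments.contains(fragment) {
        return true;
    }
    match String::from_utf8(percent_decode_bytes(fragment)) {
        Ok(decoded) => fragments.contains(&decoded),
        // Invalid UTF-8 cannot match any `String` id, so report "not found".
        Err(_) => false,
    }
}
```

With a set holding the id `†`, `fragment_exists(&set, "%E2%80%A0")` succeeds through the decoded path, while `fragment_exists(&set, "%FF")` fails because the byte `0xFF` is not valid UTF-8. These are the two cases the new test fixtures exercise.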
3 changes: 2 additions & 1 deletion tests/broken_links.rs
@@ -22,7 +22,8 @@ fn reports_broken_links() {
             .and(contains("Broken intra-doc link to [<code>links</code>]!"))
             .and(contains(
                 "Fragment #fragments at index.html does not exist!",
-            )),
+            ))
+            .and(contains("Fragment #%FF at index.html does not exist!")),
     );
 }

1 change: 1 addition & 0 deletions tests/broken_links/src/lib.rs
@@ -2,3 +2,4 @@
//! with [intra-doc](links) that will be emitted as HTML
//! and intra-doc [`links`][x] that won't.
//! It also has [links to](#fragments).
//! [Non-unicode link](#%FF)
4 changes: 4 additions & 0 deletions tests/simple_project/src/lib.rs
@@ -1,3 +1,7 @@
//! [Non-ascii link](#†)
//!
//! <div id="†">Some text</div>

/// Foo function
///
/// Has something to do with [bar](./fn.bar.html).
