The blazing fast way to Yank URIs from text.
go get -u github.com/eskriett/yuri
import "github.com/eskriett/yuri"
func main() {
yuri.YankURIs([]byte("yuri lives at https://github.com/eskriett/yuri"))
// []string{"https://github.com/eskriett/yuri"}
}
go get -u github.com/eskriett/yuri/cmd/yuri
$ echo "I want to extract: http://example.com/" | yuri
http://example.com/
yuri tries to extract URIs of numerous schemes from text as fast as possible. Compared to most similar tools which use regular expressions, yuri uses a DFA built using ragel for performance.
The schemes yuri is currently able to extract are:
ftp
http
https
hxxp
hxxps
mailto
While the tool works well in many cases, it may sometimes return URIs which are not fully compliant with their corresponding RFC (yuri is loosely based on the ABNF provided by RFC3987). If full RFC complicance is a requirement for a given scheme, a post-pass URI validation phase is advisable.
Please note that I developed yuri to solve a specific problem and am aware there may be problems. If you notice yuri fails to extract a URI or want to submit improvements (e.g. support for an additional scheme), please submit a pull request which includes tests for your changes.