starshipyard/yuri

yuri

The blazing fast way to Yank URIs from text.

Usage

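$ go get -u github.com/eskriett/yuri

package main

import (
    "fmt"
    "github.com/eskriett/yuri"
)

func main() {
    // YankURIs scans the input bytes and returns the URIs it finds.
    uris := yuri.YankURIs([]byte("yuri lives at https://github.com/eskriett/yuri"))
    fmt.Printf("%#v\n", uris)
    // []string{"https://github.com/eskriett/yuri"}
}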

cmd/yuri

$ go get -u github.com/eskriett/yuri/cmd/yuri
$ echo "I want to extract: http://example.com/" | yuri
http://example.com/

Implementation details

yuri tries to extract URIs of several schemes from text as fast as possible. Unlike most similar tools, which rely on regular expressions, yuri uses a deterministic finite automaton (DFA) generated with Ragel for performance.
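
To illustrate the idea of scanning with a state machine instead of a regular expression, here is a minimal hand-written sketch. It is not yuri's Ragel-generated code and does not follow its grammar: the function name scanHTTP and the rule "a URI ends at the next whitespace byte" are assumptions made for this example, and only http and https are handled.

```go
package main

import "fmt"

// scanHTTP is a hypothetical toy scanner, not part of yuri. It walks the
// input once, byte by byte, tracking how much of "http[s]://" has matched.
// States 0-3: matching "http"; 4: saw "http"; 5: saw "https";
// 6: saw ":"; 7: saw ":/"; 8: inside the URI body.
func scanHTTP(data []byte) []string {
	var uris []string
	state, start, bodyStart := 0, 0, 0
	// flush emits the current match if at least one byte follows "://".
	flush := func(end int) {
		if state == 8 && end > bodyStart {
			uris = append(uris, string(data[start:end]))
		}
		state = 0
	}
	for i, c := range data {
		switch {
		case state == 8:
			// Assumption: the URI body ends at the first whitespace byte.
			if c == ' ' || c == '\t' || c == '\n' || c == '\r' {
				flush(i)
			}
		case state < 4 && c == "http"[state]:
			if state == 0 {
				start = i
			}
			state++
		case state == 4 && c == 's':
			state = 5
		case (state == 4 || state == 5) && c == ':':
			state = 6
		case state == 6 && c == '/':
			state = 7
		case state == 7 && c == '/':
			state, bodyStart = 8, i+1
		case c == 'h':
			// A mismatch on 'h' may be the start of a new match.
			state, start = 1, i
		default:
			state = 0
		}
	}
	flush(len(data))
	return uris
}

func main() {
	fmt.Printf("%q\n", scanHTTP([]byte("see http://example.com/ and https://a.b/c?d=1 now")))
	// ["http://example.com/" "https://a.b/c?d=1"]
}
```

Each input byte is inspected exactly once and there is no backtracking, which is the property that makes a DFA attractive for this kind of extraction. A generated machine encodes far more of the URI grammar than this sketch does.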

The schemes yuri is currently able to extract are:

ftp
http
https
hxxp
hxxps
mailto
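
hxxp and hxxps are the "defanged" spellings of http and https that are commonly used in security reports so that links cannot be clicked by accident. If extracted URIs need to be fetched or parsed afterwards, they may first have to be converted back. The helper below is a sketch of that step; refang is a hypothetical name and is not part of yuri.

```go
package main

import (
	"fmt"
	"strings"
)

// refang is a hypothetical helper, not part of yuri. It rewrites a
// defanged hxxp/hxxps scheme to http/https and returns any other input
// unchanged. The match is case-sensitive to keep the sketch short.
func refang(uri string) string {
	switch {
	case strings.HasPrefix(uri, "hxxps:"):
		return "https:" + strings.TrimPrefix(uri, "hxxps:")
	case strings.HasPrefix(uri, "hxxp:"):
		return "http:" + strings.TrimPrefix(uri, "hxxp:")
	}
	return uri
}

func main() {
	fmt.Println(refang("hxxp://example.com/"))
	// http://example.com/
	fmt.Println(refang("ftp://example.com/file"))
	// ftp://example.com/file
}
```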

While the tool works well in many cases, it may sometimes return URIs that are not fully compliant with their corresponding RFC (yuri is loosely based on the ABNF provided by RFC 3987). If full RFC compliance is a requirement for a given scheme, a post-extraction URI validation pass is advisable.
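
One possible shape for such a validation pass is sketched below using Go's standard net/url package. This is an assumption about what "validation" might mean for a given application, not something yuri provides: filterValid is a hypothetical name, the allowed-scheme set and the host/address checks are illustrative, and net/url is itself lenient and follows RFC 3986 rather than RFC 3987, so stricter checks may be needed.

```go
package main

import (
	"fmt"
	"net/url"
)

// filterValid is a hypothetical post-pass, not part of yuri. It keeps the
// candidates that net/url can parse, that use an allowed scheme, and that
// have a non-empty host (or a non-empty address for mailto).
func filterValid(candidates []string) []string {
	// Defanged schemes (hxxp, hxxps) are intentionally not in this set.
	allowed := map[string]bool{"ftp": true, "http": true, "https": true, "mailto": true}
	var valid []string
	for _, c := range candidates {
		u, err := url.Parse(c)
		if err != nil || !allowed[u.Scheme] {
			continue
		}
		if u.Scheme == "mailto" {
			// net/url stores the part after "mailto:" in Opaque.
			if u.Opaque == "" {
				continue
			}
		} else if u.Host == "" {
			continue
		}
		valid = append(valid, c)
	}
	return valid
}

func main() {
	candidates := []string{
		"https://example.com/a",
		"http://",                 // no host
		"http://example.com/%zz",  // invalid percent-escape
		"mailto:user@example.com",
	}
	fmt.Printf("%q\n", filterValid(candidates))
	// ["https://example.com/a" "mailto:user@example.com"]
}
```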

Contributing

Please note that I developed yuri to solve a specific problem, and I am aware that it may have shortcomings. If you notice that yuri fails to extract a URI, or you want to contribute an improvement (e.g. support for an additional scheme), please submit a pull request that includes tests for your changes.
