Library for Java/Kotlin to extract touch (web clip) icon information from the website

ohmae/touch-icon-extractor

touch-icon-extractor

This library extracts WebClip icon information from a website. Because it is written in pure Kotlin, it works on both the JVM and Android.

Sample App

The sample app is in touch-icon-extractor-sample and is also published on the Play Store.

Library structure

  • net.mm2d:touchicon: The core component, which provides all features. It uses HttpURLConnection for HTTP access and its own parser for HTML.
  • net.mm2d:touchicon-http-okhttp: Adapter to use OkHttp for HTTP access.

net.mm2d:touchicon-html-jsoup has reached end of life (EOL). Its last version is 0.8.3.

How to use

jCenter will close in May. As of version 0.9.1, this library has moved from jCenter to mavenCentral.
Please note that the group ID has changed.

Download from mavenCentral.
The latest version is listed on Maven Central.

dependencies {
    implementation("net.mm2d.touchicon:touchicon:$touchIconVersion")
    implementation("net.mm2d.touchicon:touchicon-http-okhttp:$touchIconVersion") // Optional: only if you use OkHttp for HTTP access
}

Versions below 0.9.1 were distributed via jCenter. However, jCenter is closing and the old versions have not been migrated to mavenCentral. If you need an older version, please use the GitHub Pages repository.

repositories {
    maven { url = URI("https://ohmae.github.com/maven") }
}
dependencies {
    implementation("net.mm2d:touchicon:$touchIconVersion")
    implementation("net.mm2d:touchicon-http-okhttp:$touchIconVersion") // Optional: only if you use OkHttp for HTTP access
}

API document

Documentation comments are written in KDoc.

Sample code

val extractor = TouchIconExtractor()                    // initialize
extractor.userAgent = "user agent string"               // option: set User-Agent
extractor.headers = mapOf("Cookie" to "hoge=fuga")      // option: set additional HTTP header
extractor.downloadLimit = 10_000                        // option: set download limit (default 64kB).
                                                        // <= 0 means no limit
//...
GlobalScope.launch(Dispatchers.Main) {
    val job = async(Dispatchers.IO) {
        extractor.fromPage(siteUrl, true)               // Do not call from the Main thread
    }
    //...
}

With RxJava:

//...
Single.fromCallable { extractor.fromPage(url, true) }   // Do not call from the Main thread
        .subscribeOn(Schedulers.io())
        .observeOn(AndroidSchedulers.mainThread())
        .subscribe({
            //...
        }, {})

By default, this library uses HttpURLConnection for HTTP access. If you want to use OkHttp instead, use the touchicon-http-okhttp module.

val extractor = TouchIconExtractor(
    httpClient = OkHttpAdapterFactory.create(OkHttpClient())
)

HTTP Session

You may want this library's HTTP requests to share a session with other communication. For example, to share a session with a WebView in an Android application, the WebView and this library's HTTP client must use the same cookies.

For the default HTTP client, which uses HttpURLConnection, implement CookieHandler.

object WebViewCookieHandler : CookieHandler {
    private val cookieManager = CookieManager.getInstance()

    override fun saveCookie(url: String, value: String) {
        cookieManager.setCookie(url, value)
    }

    override fun loadCookie(url: String): String? = cookieManager.getCookie(url)
}

TouchIconExtractor(
    httpClient = SimpleHttpClientAdapterFactory.create(WebViewCookieHandler)
)

For OkHttp, set a CookieJar on the OkHttpClient.

object WebViewCookieJar : CookieJar {
    private val cookieManager = CookieManager.getInstance()

    override fun saveFromResponse(url: HttpUrl, cookies: List<Cookie>) {
        val urlString = url.toString()
        cookies.forEach {
            cookieManager.setCookie(urlString, it.toString())
        }
    }

    override fun loadForRequest(url: HttpUrl): List<Cookie> =
        cookieManager.getCookie(url.toString()).let { cookie ->
            if (cookie.isNullOrEmpty()) {
                emptyList()
            } else {
                cookie.split(";")
                    .filter { it.isNotBlank() }
                    .mapNotNull { Cookie.parse(url, it) }
            }
        }
}

TouchIconExtractor(
    httpClient = OkHttpAdapterFactory.create(
        OkHttpClient.Builder()
            .cookieJar(WebViewCookieJar)
            .build()
    )
)

Operating principle

There are two ways to specify a WebClip icon, and this library supports both.

Icon associated with the web page

The icon is declared with link tags like the following in the HTML head.

<link rel="icon" href="/favicon.ico" type="image/x-icon">
<link rel="shortcut icon" href="/favicon.ico">
<link rel="apple-touch-icon" href="/apple-touch-icon.png" sizes="57x57">
<link rel="apple-touch-icon-precomposed" href="/apple-touch-icon-precomposed.png" sizes="80x80">

To get this information, call:

extractor.fromPage(url)

This library attempts to download the HTML file from the specified URL. Since only the head is required, the download stops once it exceeds the download limit.
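The size-limited download can be pictured with a minimal sketch like the one below. It is not the library's implementation: `readWithLimit` is a hypothetical helper, and treating a limit of zero or less as "no limit" simply mirrors the `downloadLimit` comment in the sample code.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical helper: reads at most `limit` bytes from the stream.
// A limit of 0 or less is treated as "no limit" (an assumption taken
// from the downloadLimit comment in the sample code).
fun readWithLimit(input: InputStream, limit: Int): ByteArray {
    val output = ByteArrayOutputStream()
    val buffer = ByteArray(1024)
    while (limit <= 0 || output.size() < limit) {
        // Never request more bytes than the remaining allowance.
        val max = if (limit <= 0) buffer.size else minOf(buffer.size, limit - output.size())
        val read = input.read(buffer, 0, max)
        if (read < 0) break // end of stream
        output.write(buffer, 0, read)
    }
    return output.toByteArray()
}
```

Stopping early keeps the transfer small for pages whose body is large, at the cost of missing link tags that appear after the cut-off.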

The library then analyzes the downloaded HTML and extracts only the link tags whose rel attribute is "icon", "shortcut icon", "apple-touch-icon", or "apple-touch-icon-precomposed". Each of these tags is parsed into a PageIcon instance, and the instances are returned as the result.
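To illustrate the filtering step, here is a rough sketch that picks out icon link tags with regular expressions. It is only an approximation under simplifying assumptions (double-quoted attributes, no comments or scripts containing link-like text); the library uses its own HTML parser, and `LinkIcon` and `extractIconLinks` are hypothetical names, not part of its API.

```kotlin
// Hypothetical stand-in for the information a PageIcon would carry.
data class LinkIcon(val rel: String, val href: String, val sizes: String?)

// The rel values listed in the paragraph above.
val ICON_RELS = setOf(
    "icon", "shortcut icon", "apple-touch-icon", "apple-touch-icon-precomposed"
)
val LINK_TAG = Regex("<link\\b[^>]*>", RegexOption.IGNORE_CASE)
// Simplification: only matches attributes written as name="value".
val ATTRIBUTE = Regex("([a-zA-Z-]+)\\s*=\\s*\"([^\"]*)\"")

fun extractIconLinks(html: String): List<LinkIcon> =
    LINK_TAG.findAll(html).mapNotNull { tag ->
        // Collect the tag's attributes into a map keyed by lower-case name.
        val attrs = ATTRIBUTE.findAll(tag.value)
            .associate { it.groupValues[1].lowercase() to it.groupValues[2] }
        val rel = attrs["rel"]?.lowercase() ?: return@mapNotNull null
        val href = attrs["href"] ?: return@mapNotNull null
        // Keep only the tags whose rel marks them as an icon.
        if (rel in ICON_RELS) LinkIcon(rel, href, attrs["sizes"]) else null
    }.toList()
```

A stylesheet link such as `<link rel="stylesheet" href="/a.css">` is skipped because its rel value is not in the set.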

Web App Manifest

Although it is not strictly a WebClip icon, this library can also get icons declared in a Web App Manifest.

The manifest is a JSON file like the following.

{
  "short_name": "name",
  "name": "Web App Icon",
  "icons": [
    {
      "src": "icon-1x.png",
      "type": "image/png",
      "sizes": "48x48"
    },
    {
      "src": "icon-2x.png",
      "type": "image/png",
      "sizes": "96x96"
    },
    {
      "src": "icon-4x.png",
      "type": "image/png",
      "sizes": "192x192"
    }
  ],
  "start_url": "index.html"
}

The manifest is referenced from the HTML as follows.

<link rel="manifest" href="/manifest.json">

This information is represented as a WebAppIcon.

To get this information, call:

extractor.fromPage(url, true)

The WebAppIcon instances are returned together with the PageIcon instances.
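Both the manifest href and the icon src values in the examples above are relative references, so they need to be resolved before they can be fetched. The sketch below shows standard URL resolution with `java.net.URI`; the function names and the example.com URLs are hypothetical, and how the library performs this step internally is not stated here.

```kotlin
import java.net.URI

// Resolves the manifest href found in the page against the page URL.
fun resolveManifestUrl(pageUrl: String, manifestHref: String): String =
    URI(pageUrl).resolve(manifestHref).toString()

// Resolves each icon "src" against the manifest URL, since icon paths
// in a manifest are relative to the manifest itself.
fun resolveIconUrls(manifestUrl: String, iconSrcs: List<String>): List<String> =
    iconSrcs.map { URI(manifestUrl).resolve(it).toString() }
```

For example, an href of "/manifest.json" on the page "https://example.com/app/index.html" resolves to "https://example.com/manifest.json", and "icon-1x.png" inside that manifest resolves to "https://example.com/icon-1x.png".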

Icon associated with the Domain

The icon is provided simply by putting a file with a fixed name, such as "favicon.ico", in the root of the domain. Whether an icon exists cannot be known until you try an HTTP request.

This is inefficient, but there are websites that are still deployed in this way. You should try it only if you cannot get an icon by the method in the previous section. Please be aware that this method can be a nuisance to the website administrator.

To get this information, call:

extractor.fromDomain(url)

It checks whether each file exists and returns the information for the first one found.

The icons are checked in the following order:

  1. apple-touch-icon-precomposed.png
  2. apple-touch-icon.png
  3. favicon.ico

Once a file is found, the subsequent files are not checked.

If you do not need the precomposed icon, call:

extractor.fromDomain(url, false)

The icons are then checked in the following order:

  1. apple-touch-icon.png
  2. favicon.ico

Sometimes the size information is included in the file name, such as "apple-touch-icon-120x120.png". To check such names as well, pass a list of sizes:

extractor.fromDomain(url, true, listOf("120x120", "72x72"))

With this call, the icons are checked in the following order:

  1. apple-touch-icon-120x120-precomposed.png
  2. apple-touch-icon-120x120.png
  3. apple-touch-icon-72x72-precomposed.png
  4. apple-touch-icon-72x72.png
  5. apple-touch-icon-precomposed.png
  6. apple-touch-icon.png
  7. favicon.ico
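The orderings listed above follow a simple pattern, which can be sketched as follows. This is an illustration of the documented order only, not the library's code; `candidateNames`, `firstExisting`, and the `exists` predicate are hypothetical.

```kotlin
// Builds the file names to probe, in the order described above:
// sized names first (in the given order), then the unsized names,
// then favicon.ico.
fun candidateNames(withPrecomposed: Boolean, sizes: List<String>): List<String> {
    val suffixes = sizes.map { "-$it" } + ""
    val names = suffixes.flatMap { suffix ->
        listOfNotNull(
            if (withPrecomposed) "apple-touch-icon${suffix}-precomposed.png" else null,
            "apple-touch-icon${suffix}.png",
        )
    }
    return names + "favicon.ico"
}

// Returns the first name for which `exists` is true; later names are
// never probed. `exists` stands in for an HTTP existence check.
fun firstExisting(names: List<String>, exists: (String) -> Boolean): String? =
    names.firstOrNull(exists)
```

Because `firstOrNull` stops at the first match, a site that serves the first candidate costs one request, while a site that serves only favicon.ico costs one request per candidate.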

There is also a method that gathers all of the information (TouchIconExtractor#listFromDomain()). It is intended for debugging and verification; using it in production is strongly discouraged.

Comparison of icons

Often you can get more than one icon. Which icon is the most appropriate depends on the application, but this library provides several Comparators.

val icons = extractor.fromDomain(url, true, listOf("120x120", "72x72"))
val bestIcon1 = icons.maxWith(IconComparator.SIZE)     // Compare by size. (the largest icon is the best)
val bestIcon2 = icons.maxWith(IconComparator.REL_SIZE) // Compare by rel, if same, compare by size
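As a rough idea of what comparing by size involves, the sketch below parses a "WIDTHxHEIGHT" string and picks the icon with the largest area. It does not reproduce IconComparator; `IconInfo`, `parseSize`, `BY_AREA`, and `largest` are hypothetical, and ranking by area is an assumption made for illustration.

```kotlin
// Hypothetical stand-in for an icon with a declared size such as "96x96".
data class IconInfo(val url: String, val sizes: String)

// Parses "WIDTHxHEIGHT" into a pair, or returns null when it is malformed.
fun parseSize(sizes: String): Pair<Int, Int>? {
    val parts = sizes.trim().lowercase().split("x")
    if (parts.size != 2) return null
    val width = parts[0].toIntOrNull() ?: return null
    val height = parts[1].toIntOrNull() ?: return null
    return width to height
}

// Icons whose size cannot be parsed are ranked lowest (area 0).
fun area(sizes: String): Int = parseSize(sizes)?.let { (w, h) -> w * h } ?: 0

val BY_AREA: Comparator<IconInfo> = compareBy<IconInfo> { area(it.sizes) }

// Returns the icon with the largest area, or null for an empty list.
fun largest(icons: List<IconInfo>): IconInfo? = icons.maxWithOrNull(BY_AREA)
```

An application that displays icons at a fixed size might prefer the smallest icon that is still large enough, which would need a different comparator.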

Dependent OSS

Author

大前 良介 (OHMAE Ryosuke) http://www.mm2d.net/

License

MIT License
