How does See-Link work?
See-Link first extracts the URLs from the input text and uses the first one for further action. It then visits that URL and scrapes the data the way a normal user would, by opening the page in a browser. It uses puppeteer for this and, by default, opens the webpage in a headless Chrome/Chromium browser. To bypass certain restrictions that websites put up to detect bots, See-Link sends Facebook's user agent in its request headers.
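The first step, picking the first URL out of the input text, can be sketched as follows. This is a minimal illustration and not See-Link's actual code: the helper name `extractFirstUrl` and the regular expression are assumptions, and a real implementation may recognize more URL forms.

```javascript
// Hypothetical helper, not part of See-Link's API.
// Returns the first http(s) URL found in the text, or null if there is none.
function extractFirstUrl(text) {
  // Match http:// or https:// followed by a run of characters that are not
  // whitespace, angle brackets, or quotes.
  const matches = text.match(/https?:\/\/[^\s<>"']+/g);
  if (!matches) return null;
  // Strip trailing punctuation that usually belongs to the sentence, not the URL.
  return matches[0].replace(/[.,;:!?)]+$/, '');
}

const url = extractFirstUrl('Read https://example.com/post, then https://example.org.');
console.log(url); // https://example.com/post
```

Only the first match is kept, so any later URLs in the text are ignored.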
See-Link gives first priority to Open Graph markup, followed by Twitter Card markup and then other meta tags. Although this priority is almost the same throughout the package, the approach differs somewhat for some meta info.
The following fields form the meta info returned by See-Link. For each one, the meta markups/tags are listed in the priority order that See-Link follows when scraping data from them.
For the title, See-Link checks the following, in order:
- og:title
- twitter:title
- Document title tag
- First h1 tag
- First h2 tag
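The priority order for the title can be sketched as a simple "first non-empty source wins" lookup. This is only an illustration under assumptions: `pickByPriority` and the shape of the `scraped` object are hypothetical and not taken from See-Link's source.

```javascript
// Priority order for the title, as listed above.
const TITLE_PRIORITY = ['og:title', 'twitter:title', 'title', 'h1', 'h2'];

// Hypothetical helper: `scraped` maps a source name to the text found for it
// (missing or empty if the page does not provide that source).
function pickByPriority(scraped, priority) {
  for (const key of priority) {
    const value = (scraped[key] || '').trim();
    if (value) return value; // the first non-empty source wins
  }
  return null; // nothing usable was found
}

const scraped = {
  'twitter:title': 'Card title',
  title: 'Document title',
  h1: 'Heading',
};
console.log(pickByPriority(scraped, TITLE_PRIORITY)); // Card title
```

The same lookup works for any field whose sources form a strict priority list; only the priority array changes.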
For the description, See-Link checks the following, in order:
- og:description
- twitter:description
- Description meta tag
- First p tag
For the image, See-Link looks through the og:image and twitter:image markups and, if it finds nothing, looks for a link tag with the attribute rel="image_src". However, many websites don't have any of these markups. In such cases See-Link parses images from the page's body.
The problem with this approach is how to determine which image to use. A user asked a similar question on Quora:
How does Facebook determine which images to show as thumbnails when posting a link?
A Facebook employee answered the question (in 2010):
On the client-side, the candidate images are filtered by javascript that removes all images less than 50 pixels in height or width and all images with a ratio of the longest dimension to the shortest dimension greater than 3:1. The filtered images are then sorted by area and users are given a selection of multiple images that exist.
See-Link uses the same strategy to filter out the possible results and returns the first image in the list after filtering.
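The filtering strategy described in the quoted answer can be sketched as below. This is a hedged illustration, not See-Link's implementation: the helper name `pickThumbnail` and the `{ src, width, height }` input shape are assumptions. The sort direction is also an assumption, since the answer only says the images are "sorted by area"; largest first is used here.

```javascript
// Hypothetical helper: `images` is an array of { src, width, height } objects,
// e.g. collected from the <img> elements in the page body.
function pickThumbnail(images) {
  const candidates = images
    // Drop images less than 50 pixels in height or width.
    .filter((img) => img.width >= 50 && img.height >= 50)
    // Drop images whose longest-to-shortest side ratio is greater than 3:1.
    .filter((img) => Math.max(img.width, img.height) / Math.min(img.width, img.height) <= 3)
    // Assumption: largest area first.
    .sort((a, b) => b.width * b.height - a.width * a.height);
  // Return the first image left after filtering, or null if none qualifies.
  return candidates.length > 0 ? candidates[0] : null;
}

const images = [
  { src: 'icon.png', width: 32, height: 32 },     // too small
  { src: 'banner.png', width: 900, height: 100 }, // ratio 9:1
  { src: 'photo.jpg', width: 400, height: 300 },
  { src: 'hero.jpg', width: 800, height: 600 },
];
console.log(pickThumbnail(images).src); // hero.jpg
```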
For the URL, See-Link checks the following, in order:
- Link tag with rel='canonical' attribute
- og:url
If nothing is found, it uses the page's URL and returns the domain name.
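The fallback to the domain name can be sketched with the standard `URL` class. The helper `resolveUrl` and its arguments are hypothetical, and the sketch assumes that "domain name" means the hostname of the page's URL.

```javascript
// Hypothetical helper: `candidates` holds the values found for the canonical
// link tag and og:url, in that order; entries may be undefined or empty.
function resolveUrl(candidates, pageUrl) {
  const found = candidates.find((value) => typeof value === 'string' && value.trim() !== '');
  if (found) return found.trim();
  // Fallback (assumption): the hostname of the page's own URL.
  return new URL(pageUrl).hostname;
}

console.log(resolveUrl([undefined, ''], 'https://blog.example.com/posts/42?ref=x'));
// blog.example.com
```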
The theme color is read from the meta tag with the attribute name="theme-color". Designers can leverage the theme color to create an awesome preview. Since many sites don't provide it, See-Link defaults to returning the dominant color of the page when this metadata is not found.
See-Link uses color-thief to extract the dominant color from the page. By default, getDominantThemeColor is set to true.
For the video, See-Link checks the following, in order:
- og:video
- twitter:player
- Link tag with rel='video_src' attribute
- The page URL, if it points to video content
For the site icon, See-Link looks through link tags with the following rel attributes, in order:
- rel='icon'
- rel='shortcut icon'
- rel='apple-touch-icon'
The type is the type of the web page. See-Link reads it from the og:type meta markup.