Skip to content
4poc edited this page Dec 20, 2014 · 12 revisions

url.discover

This public task is created for submitted urls, the urls are delegated to other tasks based on the url alone, for instance a url of the youtube.com domain is delegated to the url.youtube task. If no suitable task was found the url is delegated to the url.generic task.

For each delegated url a item.url_delegated event is created.

Parameters:

Name Type Description
item_id int The id of an existing item.
urls list List of URLs.

url.generic

Fetch magic numbers to determine the file type and proceed.

In order to determine the type of a generic url the first few bytes are loaded and observed to include either one of the supported magic numbers (aka the file signature) or only valid latin1/utf-8 codepoints.

In case a valid file signature was found the corresponding task is scheduled. The signatures and tasks are defined in the task.file module (in the FileTasks class).

In case only valid ascii/utf-8 was found it is assumed that the url is a regular website in which case the web.download(?) task is scheduled.

Parameters:

Name Type Description
item_id int The id of an existing item.
url string a URL
Clone this wiki locally