diff --git a/index.html b/index.html index 4432929..02fef1d 100644 --- a/index.html +++ b/index.html @@ -69,6 +69,11 @@
+ + + Skip to content + +
@@ -222,6 +227,19 @@ + + + + @@ -233,6 +251,52 @@ + + + + @@ -1168,6 +1232,43 @@ + + + + +
@@ -1185,9 +1286,60 @@ -

Home

- - +

Tests

+

ckanext-files

+

Files as first-class citizens of CKAN. Upload, manage, remove files directly +and attach them to datasets, resources, etc.

+

Read the documentation for a full user guide.

+

Quickstart

+
    +
  1. +

    Install the extension +

    pip install ckanext-files
    +

    +
  2. +
  3. +

    Add files to the ckan.plugins setting in your CKAN + config file.

    +
  4. +
  5. +

    Run DB migrations +

    ckan db upgrade -p files
    +

    +
  6. +
  7. +

    Configure storage

    +
    ckanext.files.storage.default.type = files:fs
    +ckanext.files.storage.default.path = /tmp/example
    +ckanext.files.storage.default.create_path = true
    +
    +
  8. +
  9. +

    Upload your first file

    +
    ckanapi action files_file_create upload@~/Downloads/file.txt
    +
    +
  10. +
+

Development

+

Install dev extras and Node.js dependencies:

+
pip install -e '.[dev]'
+npm ci
+
+

Run unit tests: +

pytest
+

+

Run frontend tests: +

# start test server in separate terminal
+make test-server
+
+# run tests
+npx cypress run
+

+

Run typecheck: +

npx pyright
+

+

License

+

AGPL

diff --git a/search/search_index.json b/search/search_index.json index a384bf1..2108b26 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-\\.\\_]+","pipeline":["stopWordFilter"]},"docs":[{"location":"api/","title":"API","text":""},{"location":"api/#files_file_createcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_create(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Create a new file.

This action passes the uploaded file to the storage without strict validation. The file is converted into a standard upload object and everything else is controlled by the storage. The same file may be accepted by one storage and rejected by another, depending on configuration.

This action is too powerful to use directly. The recommended approach is to register a separate action for handling a specific type of upload and call the current action internally.

When uploading a real file (or using werkzeug.datastructures.FileStorage), the name parameter can be omitted. In this case, the name of the uploaded file is used.

ckanapi action files_file_create upload@path/to/file.txt\n

When uploading the raw content of a file as a string or bytes object, name is mandatory.

ckanapi action files_file_create upload@<(echo -n \"hello world\") name=file.txt\n

Requires storage with CREATE capability.

Params:

Returns:

dictionary with file details.

"},{"location":"api/#files_file_deletecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_delete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Remove file from storage.

Unlike packages, a file has no state field. Removal usually means that the file details are removed from the DB and the file itself is removed from the storage.

Some storages may implement revisions of the file and keep archived versions or backups. Check the storage documentation if you need to know whether the file may not be completely removed by this operation.

Requires storage with REMOVE capability.

ckanapi action files_file_delete id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n

Params:

Returns:

dictionary with details of the removed file.

"},{"location":"api/#files_file_pincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_pin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Pin file to the current owner.

A pinned file cannot be transferred to a different owner. Use it to guarantee that a file referred to by an entity is not accidentally transferred to a different owner.

Params:

Returns:

dictionary with details of updated file

"},{"location":"api/#files_file_renamecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_rename(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Rename the file.

This action changes the human-readable name of the file, which is stored in the DB. The real location of the file in the storage is not modified.

ckanapi action files_file_rename \\\n    id=226056e2-6f83-47c5-8bd2-102e2b82ab9a \\\n    name=new-name.txt\n

Params:

Returns:

dictionary with file details

"},{"location":"api/#files_file_scancontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_scan(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

List files of the owner

This action internally calls files_file_search, but with fixed values of the owner filters. If the owner is not specified, files are filtered by the current user. If the owner is specified, the user must pass an authorization check to see the files.

Params:

All other parameters are passed as-is to files_file_search.

Returns:

"},{"location":"api/#files_file_searchcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_search(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Search files.

This action is not stabilized yet and will change in the future.

Provides the ability to search files using exact filters by name, content_type, size, owner, etc. Results are paginated and returned in package_search manner, as a dict with count and results items.

All columns of the File model can be used as filters. Before the search, the type of the column and the type of the filter value are compared. If they are the same, the original values are used in the search. If the types differ, both the column value and the filter value are cast to strings.

This request produces size = 10 SQL expression:

ckanapi action files_file_search size:10\n

This request produces size::text = '10' SQL expression:

ckanapi action files_file_search size=10\n

Even though the results are usually the same, using the correct types leads to a more efficient search.

Apart from File columns, the following Owner properties can be used for searching: owner_id, owner_type, pinned.

storage_data and plugin_data are dictionaries. The filter's value for these fields is used as a mask. For example, storage_data={"a": {"b": 1}} matches any File whose storage_data contains an item a whose value contains b=1. This works only with data represented by nested dictionaries, without other structures like lists or sets.
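As a sketch of the described matching rule (the matches_mask helper is hypothetical and only illustrates the semantics, not the extension's actual code):

```python
def matches_mask(data, mask):
    """Recursively check that every key in mask is present in data
    with a matching value; extra keys in data are ignored."""
    if isinstance(mask, dict):
        if not isinstance(data, dict):
            return False
        return all(matches_mask(data.get(key), value) for key, value in mask.items())
    return data == mask


record = {"a": {"b": 1, "c": 2}, "x": 0}
print(matches_mask(record, {"a": {"b": 1}}))  # True: "a" contains b=1
print(matches_mask(record, {"a": {"b": 2}}))  # False: b differs
```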

Experimental feature: File columns can be passed as a pair of operator and value. This feature will be replaced by strictly defined query language at some point:

ckanapi action files_file_search size:'[\"<\", 100]' content_type:'[\"like\", \"text/%\"]'\n
The following operators are accepted: =, <, >, !=, like
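The semantics of these pairs can be sketched in plain Python (the real action translates them into SQL expressions; the check function and OPS table here are illustrative only):

```python
import operator
import re

OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
    # SQL LIKE: "%" matches any sequence of characters
    "like": lambda field, pattern: re.fullmatch(
        ".*".join(map(re.escape, pattern.split("%"))), field
    ) is not None,
}

def check(field_value, op, value):
    """Evaluate an [operator, value] filter pair against a field value."""
    return OPS[op](field_value, value)

print(check(99, "<", 100))                    # True
print(check("text/plain", "like", "text/%"))  # True
```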

Params:

Returns:

"},{"location":"api/#files_file_search_by_usercontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_search_by_user(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Internal action. Do not use it.

"},{"location":"api/#files_file_showcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_show(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Show file details.

This action only displays information from the DB record. There is no way to get the content of the file using this action (or any other API action).

ckanapi action files_file_show id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n

Params:

Returns:

dictionary with file details

"},{"location":"api/#files_file_unpincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_unpin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Unpin file from the current owner.

A pinned file cannot be transferred to a different owner; unpin it first if the file must be transferred.

Params:

Returns:

dictionary with details of updated file

"},{"location":"api/#files_multipart_completecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_complete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Finalize multipart upload and transform it into completed file.

Depending on the storage, this action may require additional parameters. But usually it just takes the ID and verifies that the content type, size and hash provided when the upload was initialized match the actual values.

If the data is valid and the file is completed inside the storage, a new File entry with the file details is created in the DB, and the file can be used just like any normal file.

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of the created file
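A minimal sketch of the kind of verification a storage might perform on completion, assuming the MD5 hashes used elsewhere in this documentation (verify_completed_upload is a hypothetical name, not part of the extension's API):

```python
import hashlib

def verify_completed_upload(content: bytes, expected_size: int, expected_hash: str) -> None:
    """Check that the declared size and hash match the actual content,
    raising ValueError on any mismatch."""
    if len(content) != expected_size:
        raise ValueError(f"size mismatch: expected {expected_size}, got {len(content)}")
    actual = hashlib.md5(content).hexdigest()
    if actual != expected_hash:
        raise ValueError(f"hash mismatch: expected {expected_hash}, got {actual}")

content = b"hello world"
# Passes silently: 11 bytes, and the md5 of "hello world"
verify_completed_upload(content, 11, "5eb63bbbe01eeed093cb22bb8f5acdc3")
```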

"},{"location":"api/#files_multipart_refreshcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_refresh(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Refresh details of incomplete upload.

Can be used if the upload process was interrupted and the client does not know how many bytes were already uploaded.

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of the updated upload

"},{"location":"api/#files_multipart_startcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_start(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Initialize a multipart (resumable, continuous, signed, etc.) upload.

Apart from the standard parameters, different storages can require additional data, so always check the documentation of the storage before initiating a multipart upload.

When the upload is initialized, the storage usually returns the details required for further upload. It may be a presigned URL for direct upload, or just an ID of the upload which must be used with files_multipart_update.

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of initiated upload. Depends on used storage

"},{"location":"api/#files_multipart_updatecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_update(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Update incomplete upload.

Depending on the storage, this action may require additional parameters. Most likely, an upload with the next fragment of the file.

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of the updated upload

"},{"location":"api/#files_resource_uploadcontext-context-data_dict-dictstr-any","title":"files_resource_upload(context: 'Context', data_dict: 'dict[str, Any]')","text":"

Create a new file inside resource storage.

This action internally calls files_file_create with ignore_auth=True and always uses the resources storage.

The new file is not attached to the resource. You need to call files_transfer_ownership manually when the resource is created.

Params:

Returns:

dictionary with file details.

"},{"location":"api/#files_transfer_ownershipcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_transfer_ownership(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Transfer file ownership.

Depending on the storage, this action may require additional parameters.

Params:

Returns:

dictionary with details of updated file

"},{"location":"changelog/","title":"Changelog","text":"

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

"},{"location":"changelog/#unreleased","title":"Unreleased","text":"

Compare with latest

"},{"location":"changelog/#features","title":"Features","text":""},{"location":"changelog/#code-refactoring","title":"Code Refactoring","text":""},{"location":"changelog/#v031-2024-05-22","title":"v0.3.1 - 2024-05-22","text":"

Compare with v0.3.0

"},{"location":"changelog/#features_1","title":"Features","text":""},{"location":"changelog/#v030-2024-05-16","title":"v0.3.0 - 2024-05-16","text":"

Compare with v0.2.6

"},{"location":"changelog/#features_2","title":"Features","text":""},{"location":"changelog/#bug-fixes","title":"Bug Fixes","text":""},{"location":"changelog/#code-refactoring_1","title":"Code Refactoring","text":""},{"location":"changelog/#v006-2024-04-24","title":"v0.0.6 - 2024-04-24","text":"

Compare with v0.0.5

"},{"location":"changelog/#bug-fixes_1","title":"Bug Fixes","text":""},{"location":"changelog/#v026-2024-04-24","title":"v0.2.6 - 2024-04-24","text":"

Compare with v0.2.4

"},{"location":"changelog/#bug-fixes_2","title":"Bug Fixes","text":""},{"location":"changelog/#v024-2024-04-15","title":"v0.2.4 - 2024-04-15","text":"

Compare with v0.2.3

"},{"location":"changelog/#features_3","title":"Features","text":""},{"location":"changelog/#v023-2024-04-07","title":"v0.2.3 - 2024-04-07","text":"

Compare with v0.2.2

"},{"location":"changelog/#features_4","title":"Features","text":""},{"location":"changelog/#bug-fixes_3","title":"Bug Fixes","text":""},{"location":"changelog/#v022-2024-03-18","title":"v0.2.2 - 2024-03-18","text":"

Compare with v0.2.1

"},{"location":"changelog/#v021-2024-03-18","title":"v0.2.1 - 2024-03-18","text":"

Compare with v0.2.0

"},{"location":"changelog/#features_5","title":"Features","text":""},{"location":"changelog/#v020-2024-03-12","title":"v0.2.0 - 2024-03-12","text":"

Compare with v0.0.5

"},{"location":"changelog/#features_6","title":"Features","text":""},{"location":"changelog/#code-refactoring_2","title":"Code Refactoring","text":""},{"location":"changelog/#v005-2024-02-26","title":"v0.0.5 - 2024-02-26","text":"

Compare with v0.0.4

"},{"location":"changelog/#bug-fixes_4","title":"Bug Fixes","text":""},{"location":"changelog/#v004-2023-10-25","title":"v0.0.4 - 2023-10-25","text":"

Compare with v0.0.2

"},{"location":"changelog/#v002-2022-02-09","title":"v0.0.2 - 2022-02-09","text":"

Compare with v0.0.1

"},{"location":"changelog/#v001-2021-09-21","title":"v0.0.1 - 2021-09-21","text":"

Compare with first commit

"},{"location":"cli/","title":"CLI","text":"

ckanext-files registers the files entrypoint under the ckan command. The commands below must be executed as ckan -c $CKAN_INI files <COMMAND>.

adapters [-v]

List all available storage adapters. With the -v/--verbose flag, docstrings from the adapter classes are printed as well.

storages [-v]

List all configured storages. With the -v/--verbose flag, all supported capabilities are shown.

stream FILE_ID [-o OUTPUT] [--start START] [--end END]

Stream the content of the file to STDOUT. For non-textual files use output redirection: stream ID > file.ext. Alternatively, the output destination can be specified via the -o/--output option. If it points to a directory, a file with the same name as the streamed item is created inside that directory. Otherwise, OUTPUT is used as the filename.
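The described -o/--output behaviour can be sketched as follows (resolve_output is a hypothetical helper, not the CLI's actual code):

```python
import tempfile
from pathlib import Path

def resolve_output(output: str, item_name: str) -> Path:
    """If output is an existing directory, place a file named after the
    streamed item inside it; otherwise treat output as the filename."""
    path = Path(output)
    if path.is_dir():
        return path / item_name
    return path

with tempfile.TemporaryDirectory() as tmp:
    print(resolve_output(tmp, "file.txt"))            # <tmp-dir>/file.txt
    print(resolve_output(f"{tmp}/out.bin", "file.txt"))  # <tmp-dir>/out.bin
```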

--start and --end can be used to receive a fragment of the file. Only positive values are guaranteed to work with any storage that supports STREAM. Some storages support negative values for these options and count them from the end of the file, e.g. --start -10 reads the last 10 bytes of the file. --end -1 reads up to the last byte, but the last byte itself is not included in the output.

scan [-s default] [-u] [-t [-a OWNER_ID]]

List all files that exist in the storage. Works only if the storage supports SCAN. By default, shows the content of the default storage. The -s/--storage-name option changes the target storage.

The -u/--untracked-only flag shows only untracked files that have no corresponding record in the DB. It can be used to identify leftovers after removing data from the portal.

The -t/--track flag registers any untracked file by creating a DB record for it. It can be used only when ANALYZE is supported. Files are created without an owner. Use the -a/--adopt-by option with a user ID to give ownership over the new files to the specified user. This can be used when configuring a new storage connected to an existing location with files.

"},{"location":"implementation-example/","title":"Example implementation of custom storage adapter","text":"

A storage consists of the storage object, which dispatches operation requests, and 3 services that do the actual job: Reader, Uploader and Manager. To define a custom storage, you need to extend the main storage class, describe the storage logic and register the storage via IFiles.files_get_storage_adapters.

Let's implement a DB storage. It will store files in an SQL table using SQLAlchemy. There is just one requirement for the table: it must have a column for storing the unique identifier of the file and another column for storing the content of the file as bytes.

For the sake of simplicity, our storage will work only with existing tables. Create the table manually before we begin.

First of all, we create an adapter that does nothing and register it in our plugin.

from __future__ import annotations\n\nfrom typing import Any\nimport sqlalchemy as sa\n\nimport ckan.plugins as p\nfrom ckan.model.types import make_uuid\nfrom ckanext.files import shared\n\n\nclass ExamplePlugin(p.SingletonPlugin):\n    p.implements(shared.IFiles)\n    def files_get_storage_adapters(self) -> dict[str, Any]:\n        return {\"example:db\": DbStorage}\n\n\nclass DbStorage(shared.Storage):\n    ...\n

After installing and enabling your custom plugin, you can configure a storage with this adapter by adding a single new line to the config file:

ckanext.files.storage.db.type = example:db\n

But if you check storage via ckan files storages -v, you'll see that it can't do anything.

ckan files storages -v\n\n... db: example:db\n...        Supports: Capability.NONE\n...        Does not support: Capability.REMOVE|STREAM|CREATE|...\n

Before we start uploading files, let's make sure that the storage has a proper configuration. As files will be stored in a DB table, we need the name of the table and a DB connection string. Let's assume the table already exists, but we don't know which columns to use for files. So we need the name of the column for the content and the name of the column for the file's unique identifier. ckanext-files uses the term location instead of identifier, so we'll do the same in our implementation.

There are 4 required options in total:

* db_url: the DB connection string
* table: the name of the table
* location_column: the name of the column for the file's unique identifier
* content_column: the name of the column for the file's content

It's not mandatory, but it is highly recommended to declare config options for the adapter. It can be done via the Storage.declare_config_options class method, which accepts a declaration object and a key namespace for the storage options.

class DbStorage(shared.Storage):\n\n    @classmethod\n    def declare_config_options(cls, declaration, key) -> None:\n        declaration.declare(key.db_url).required()\n        declaration.declare(key.table).required()\n        declaration.declare(key.location_column).required()\n        declaration.declare(key.content_column).required()\n

And we probably want to initialize the DB connection when the storage is initialized. For this we'll extend the constructor, which must be defined as a method accepting keyword-only arguments:

class DbStorage(shared.Storage):\n    ...\n\n    def __init__(self, **settings: Any) -> None:\n        db_url = self.ensure_option(settings, \"db_url\")\n\n        self.engine = sa.create_engine(db_url)\n        self.location_column = sa.column(\n            self.ensure_option(settings, \"location_column\")\n        )\n        self.content_column = sa.column(self.ensure_option(settings, \"content_column\"))\n        self.table = sa.table(\n            self.ensure_option(settings, \"table\"),\n            self.location_column,\n            self.content_column,\n        )\n        super().__init__(**settings)\n

You may notice that we use Storage.ensure_option quite often. This method returns the value of the specified option from the settings or raises an exception.

The table definition and columns are saved as storage attributes to simplify building SQL queries later on.

Now we are going to define classes for all 3 storage services and tell the storage how to initialize them.

There are 3 services: Reader, Uploader and Manager. Each of them is initialized via the corresponding storage method: make_reader, make_uploader and make_manager. And each of them accepts a single argument during creation: the storage itself.

class DbStorage(shared.Storage):\n    def make_reader(self):\n        return DbReader(self)\n\n    def make_uploader(self):\n        return DbUploader(self)\n\n    def make_manager(self):\n        return DbManager(self)\n\n\nclass DbReader(shared.Reader):\n    ...\n\n\nclass DbUploader(shared.Uploader):\n    ...\n\n\nclass DbManager(shared.Manager):\n    ...\n

Our first target is the Uploader service. It's responsible for file creation. For the minimal implementation it needs an upload method and a capabilities attribute which tells the storage what exactly the Uploader can do.

class DbUploader(shared.Uploader):\n    capabilities = shared.Capability.CREATE\n\n    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n        ...\n

upload receives the location (name) of the uploaded file; the upload object with the file's content; and an extras dictionary that contains any additional arguments passed to the uploader. We are going to ignore the location and generate a unique UUID for every uploaded file instead of using the user-defined filename.

The goal is to write the file into the DB and return a shared.FileData that contains the location of the file in the DB (the value of location_column), the size of the file in bytes, the MIME type of the file and the hash of the file content.

For the location we'll just use the ckan.model.types.make_uuid function. The size and MIME type are already available as upload.size and upload.content_type.

The only problem is the hash of the content. You can compute it any way you like, but there is a simple option if you have no preferences. upload has a hashing_reader method, which returns an iterable over the file content. When you read the file through it, the content hash is computed automatically, and you can get it using the get_hash method of the reader.

Just make sure to read the whole file before checking the hash, because the hash is computed from the consumed content. I.e., if you just create the hashing reader but do not read a single byte from it, you'll receive the hash of an empty string. If you read just 1 byte, you'll receive the hash of this single byte, etc.

The easiest option is to call the reader.read() method to consume the whole file and then call reader.get_hash() to receive the hash.
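The behaviour can be sketched with a minimal hashing reader (the class below is illustrative, not the extension's implementation; MD5 is assumed here because it matches the hashes shown later in this guide):

```python
import hashlib
import io

class HashingReader:
    """Minimal sketch of a hashing reader: the hash reflects only the
    bytes consumed so far, so read everything before calling get_hash()."""

    def __init__(self, stream, algorithm: str = "md5"):
        self.stream = stream
        self.hash = hashlib.new(algorithm)

    def read(self, size: int = -1) -> bytes:
        chunk = self.stream.read(size)
        self.hash.update(chunk)
        return chunk

    def get_hash(self) -> str:
        return self.hash.hexdigest()

reader = HashingReader(io.BytesIO(b"hello world"))
print(reader.get_hash())  # nothing consumed yet: hash of the empty string
reader.read()             # consume the whole file
print(reader.get_hash())  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```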

Here's the final implementation of DbUploader:

class DbUploader(shared.Uploader):\n    capabilities = shared.Capability.CREATE\n\n    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n        uuid = make_uuid()\n        reader = upload.hashing_reader()\n\n        values = {\n            self.storage.location_column: uuid,\n            self.storage.content_column: reader.read(),\n        }\n        stmt = sa.insert(self.storage.table, values)\n\n        result = self.storage.engine.execute(stmt)\n\n        return shared.FileData(\n            uuid,\n            upload.size,\n            upload.content_type,\n            reader.get_hash()\n        )\n

Now you can upload a file into your new db storage:

ckanapi action files_file_create storage=db name=hello.txt upload@<(echo -n 'hello world')\n\n...{\n...  \"atime\": null,\n...  \"content_type\": \"text/plain\",\n...  \"ctime\": \"2024-06-17T13:48:52.121755+00:00\",\n...  \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n...  \"id\": \"bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\",\n...  \"location\": \"5a4472b3-cf38-4c58-81a6-4d4acb7b170e\",\n...  \"mtime\": null,\n...  \"name\": \"hello.txt\",\n...  \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n...  \"owner_type\": \"user\",\n...  \"pinned\": false,\n...  \"size\": 11,\n...  \"storage\": \"db\",\n...  \"storage_data\": {}\n...}\n

The file is created, but you cannot read it just yet. Try running the ckan files stream CLI command with the file ID:

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... Operation stream is not supported by db storage\n... Aborted!\n

As expected, you have to write extra code.

Streaming, reading and generating links is the responsibility of the Reader service. We only need the stream method for a minimal implementation. This method receives a shared.FileData object (the same object as the one returned from Uploader.upload) and extras containing all additional arguments passed by the caller. The result is any iterable producing bytes.

We'll use the location property of shared.FileData as the value of location_column inside the table.

And don't forget to add STREAM capability to Reader.capabilities.

class DbReader(shared.Reader):\n    capabilities = shared.Capability.STREAM\n\n    def stream(self, data: shared.FileData, extras: dict[str, Any]) -> Iterable[bytes]:\n        stmt = (\n            sa.select(self.storage.content_column)\n            .select_from(self.storage.table)\n            .where(self.storage.location_column == data.location)\n        )\n        row = self.storage.engine.execute(stmt).fetchone()\n\n        return row\n

The result may be confusing: we return a Row object from the stream method. But our goal is to return any iterable that produces bytes. Row is iterable (tuple-like), and it contains only one item: the value of the column with the file content, i.e. bytes. So it satisfies the requirements.
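You can convince yourself with plain Python: a one-element tuple of bytes, which is how a single-column Row behaves, works anywhere an iterable of bytes is expected:

```python
from typing import Iterable

def consume(stream: Iterable[bytes]) -> bytes:
    # The caller only iterates and concatenates the chunks, so any
    # iterable of bytes is acceptable, including a one-element tuple.
    return b"".join(stream)

row = (b"hello world",)  # stand-in for the single-column Row
print(consume(row))                # b'hello world'
print(consume([b"hel", b"lo"]))    # chunked iterables work the same way
```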

Now you can check content via CLI once again.

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... hello world\n

Finally, we need to add file removal for the minimal implementation. It is also nice to have the SCAN capability, as it shows all files currently available in the storage, so we add it as a bonus. These operations are handled by the Manager, so we need the remove and scan methods. The arguments are already familiar to you; as for the results, scan yields file locations and remove reports whether the removal succeeded:

class DbManager(shared.Manager):\n    storage: DbStorage\n    capabilities = shared.Capability.SCAN | shared.Capability.REMOVE\n\n    def scan(self, extras: dict[str, Any]) -> Iterable[str]:\n        stmt = sa.select(self.storage.location_column).select_from(self.storage.table)\n        for row in self.storage.engine.execute(stmt):\n            yield row[0]\n\n    def remove(\n        self,\n        data: shared.FileData | shared.MultipartData,\n        extras: dict[str, Any],\n    ) -> bool:\n        stmt = sa.delete(self.storage.table).where(\n            self.storage.location_column == data.location,\n        )\n        self.storage.engine.execute(stmt)\n        return True\n

Now you can list all the files in the storage:

ckan files scan -s db\n

And remove a file using ckanapi and the file ID:

ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n

That's all you need for the basic storage. But check the definition of the base storage and services to find details about the other methods. Also check the implementation of other storages for additional ideas.

"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#requirements","title":"Requirements","text":"

Compatibility with core CKAN versions:

| CKAN version | Compatible? |
|--------------|-------------|
| 2.9          | no          |
| 2.10         | yes         |
| 2.11         | yes         |
| master       | yes         |

Note

It's recommended to install the extension via pip. If you are using GitHub version of the extension, stick to the vX.Y.Z tags to avoid breaking changes. Check the changelog before upgrading the extension.

"},{"location":"installation/#installation_1","title":"Installation","text":"

Install the extension

pip install ckanext-files # (1)!\n
  1. If you want to use additional adapters, like Apache-libcloud or OpenDAL, specify corresponding package extras
    pip install ckanext-files[opendal,libcloud]\n

Add files to the ckan.plugins setting in your CKAN config file.

Run DB migrations

ckan db upgrade -p files\n
"},{"location":"interfaces/","title":"Interfaces","text":""},{"location":"interfaces/#interfaces","title":"Interfaces","text":"

ckanext-files registers the ckanext.files.shared.IFiles interface. As the extension is actively developed, this interface may change in the future. Always use inherit=True when implementing IFiles.

class IFiles(Interface):\n    \"\"\"Extension point for ckanext-files.\"\"\"\n\n    def files_get_storage_adapters(self) -> dict[str, Any]:\n        \"\"\"Return mapping of storage type to adapter class.\n\n        Example:\n        >>> def files_get_storage_adapters(self):\n        >>>     return {\n        >>>         \"my_ext:dropbox\": DropboxStorage,\n        >>>     }\n\n        \"\"\"\n\n        return {}\n\n    def files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n        \"\"\"Return mapping with lookup functions for owner types.\n\n        Name of the getter is the name used as `Owner.owner_type`. The getter\n        itself is a function that accepts owner ID and returns optional owner\n        entity.\n\n        Example:\n        >>> def files_register_owner_getters(self):\n        >>>     return {\"resource\": model.Resource.get}\n        \"\"\"\n        return {}\n\n    def files_file_allows(\n        self,\n        context: types.Context,\n        file: File | Multipart,\n        operation: types.FileOperation,\n    ) -> bool | None:\n        \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n        Return True/False if user allowed/not allowed. Return `None` to rely on\n        other plugins.\n\n        Default implementation relies on cascade_access config option. 
If owner\n        of file is included into cascade access, user can perform operation on\n        file if he can perform the same operation with file's owner.\n\n        If current owner is not affected by cascade access, user can perform\n        operation on file only if user owns the file.\n\n        Example:\n        >>> def files_file_allows(\n        >>>         self, context,\n        >>>         file: shared.File | shared.Multipart,\n        >>>         operation: shared.types.FileOperation\n        >>> ) -> bool | None:\n        >>>     if file.owner_info and file.owner_info.owner_type == \"resource\":\n        >>>         return is_authorized_boolean(\n        >>>             f\"resource_{operation}\",\n        >>>             context,\n        >>>             {\"id\": file.owner_info.id}\n        >>>         )\n        >>>\n        >>>     return None\n\n        \"\"\"\n        return None\n\n    def files_owner_allows(\n        self,\n        context: types.Context,\n        owner_type: str,\n        owner_id: str,\n        operation: types.OwnerOperation,\n    ) -> bool | None:\n        \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n        Return True/False if user allowed/not allowed. Return `None` to rely on\n        other plugins.\n\n        Example:\n        >>> def files_owner_allows(\n        >>>         self, context,\n        >>>         owner_type: str, owner_id: str,\n        >>>         operation: shared.types.OwnerOperation\n        >>> ) -> bool | None:\n        >>>     if owner_type == \"resource\" and operation == \"file_transfer\":\n        >>>         return is_authorized_boolean(\n        >>>             f\"resource_update\",\n        >>>             context,\n        >>>             {\"id\": owner_id}\n        >>>         )\n        >>>\n        >>>     return None\n\n        \"\"\"\n        return None\n
"},{"location":"primer/","title":"Welcome to MkDocs","text":"

```py title=\"IFiles\" class IFiles(Interface): \"\"\"Extension point for ckanext-files.\"\"\"

def files_get_storage_adapters(self) -> dict[str, Any]:\n    \"\"\"Return mapping of storage type to adapter class.\n\n    Example:\n    >>> def files_get_storage_adapters(self):\n    >>>     return {\n    >>>         \"my_ext:dropbox\": DropboxStorage,\n    >>>     }\n\n    \"\"\"\n\n    return {}\n\ndef files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n    \"\"\"Return mapping with lookup functions for owner types.\n\n    Name of the getter is the name used as `Owner.owner_type`. The getter\n    itself is a function that accepts owner ID and returns optional owner\n    entity.\n\n    Example:\n    >>> def files_register_owner_getters(self):\n    >>>     return {\"resource\": model.Resource.get}\n    \"\"\"\n    return {}\n\ndef files_file_allows(\n    self,\n    context: types.Context,\n    file: File | Multipart,\n    operation: types.FileOperation,\n) -> bool | None:\n    \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n    Return True/False if user allowed/not allowed. Return `None` to rely on\n    other plugins.\n\n    Default implementation relies on cascade_access config option. 
If owner\n    of file is included into cascade access, user can perform operation on\n    file if he can perform the same operation with file's owner.\n\n    If current owner is not affected by cascade access, user can perform\n    operation on file only if user owns the file.\n\n    Example:\n    >>> def files_file_allows(\n    >>>         self, context,\n    >>>         file: shared.File | shared.Multipart,\n    >>>         operation: shared.types.FileOperation\n    >>> ) -> bool | None:\n    >>>     if file.owner_info and file.owner_info.owner_type == \"resource\":\n    >>>         return is_authorized_boolean(\n    >>>             f\"resource_{operation}\",\n    >>>             context,\n    >>>             {\"id\": file.owner_info.id}\n    >>>         )\n    >>>\n    >>>     return None\n\n    \"\"\"\n    return None\n\ndef files_owner_allows(\n    self,\n    context: types.Context,\n    owner_type: str,\n    owner_id: str,\n    operation: types.OwnerOperation,\n) -> bool | None:\n    \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n    Return True/False if user allowed/not allowed. Return `None` to rely on\n    other plugins.\n\n    Example:\n    >>> def files_owner_allows(\n    >>>         self, context,\n    >>>         owner_type: str, owner_id: str,\n    >>>         operation: shared.types.OwnerOperation\n    >>> ) -> bool | None:\n    >>>     if owner_type == \"resource\" and operation == \"file_transfer\":\n    >>>         return is_authorized_boolean(\n    >>>             f\"resource_update\",\n    >>>             context,\n    >>>             {\"id\": owner_id}\n    >>>         )\n    >>>\n    >>>     return None\n\n    \"\"\"\n    return None\n\n\n\n  ```\n\n  === \"Hello\"\n\n  world\n\n  === \"bye\"\n\n  world\n
"},{"location":"shared/","title":"Shared","text":"

All public utilities are collected inside the ckanext.files.shared module. Avoid using anything that is not listed there, and do not import anything from modules other than shared.

"},{"location":"shared/#get_storagename-str-none-none-storage","title":"get_storage(name: 'str | None' = None) -> 'Storage'","text":"

Return existing storage instance.

Storages are initialized when the plugin is loaded. As a result, this function always returns the same storage object for the given name.

If no name is specified, the default storage is returned.

Example:

default_storage = get_storage()\nstorage = get_storage(\"storage name\")\n

"},{"location":"shared/#make_storagename-str-settings-dictstr-any-storage","title":"make_storage(name: 'str', settings: 'dict[str, Any]') -> 'Storage'","text":"

Initialize storage instance with specified settings.

The storage adapter is defined by the type key of the settings. All other settings depend on the specific adapter.

Example:

storage = make_storage(\"memo\", {\"type\": \"files:redis\"})\n

"},{"location":"shared/#make_uploadvalue-typesuploadable-upload-upload","title":"make_upload(value: 'types.Uploadable | Upload') -> 'Upload'","text":"

Convert value into an Upload object.

Use this function for simple and reliable initialization of an Upload object. Avoid creating Upload manually, unless you are 100% sure you can provide the correct MIMEtype, size and stream.

Example:

storage.upload(\"file.txt\", make_upload(b\"hello world\"))\n

"},{"location":"shared/#with_task_queuefunc-any-name-str-none-none","title":"with_task_queue(func: 'Any', name: 'str | None' = None)","text":"

Decorator for functions that schedule tasks.

The decorated function automatically initializes a separate task queue that is processed when the function finishes. All tasks receive the function's result as execution data (the first argument to Task.run).

Without this decorator, you have to manually create a task queue context before queuing tasks.

Example:

@with_task_queue\ndef my_action(context, data_dict):\n    ...\n

"},{"location":"shared/#add_tasktask-task","title":"add_task(task: 'Task')","text":"

Add task to the current task queue.

This function can be called only inside a task queue context. Such a context is initialized automatically inside functions decorated with with_task_queue:

@with_task_queue\ndef task_producer():\n    add_task(...)\n\ntask_producer()\n

The task queue context can also be initialized manually using TaskQueue and the with statement:

queue = TaskQueue()\nwith queue:\n    add_task(...)\n\nqueue.process(execution_data)\n
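As a mental model of the machinery described above, here is a simplified pure-Python sketch. It is not the actual ckanext-files implementation (the real with_task_queue, for example, also accepts an optional name argument), only an illustration of the semantics:

```python
class Task:
    """A deferred unit of work; run() receives the producer's result."""
    def __init__(self, func):
        self.func = func

    def run(self, execution_data):
        return self.func(execution_data)


class TaskQueue:
    """Collect tasks inside a `with` block, then process them explicitly."""
    current = None  # simplistic single-threaded "current queue" pointer

    def __init__(self):
        self.tasks = []

    def __enter__(self):
        TaskQueue.current = self
        return self

    def __exit__(self, *exc):
        TaskQueue.current = None

    def process(self, execution_data):
        for task in self.tasks:
            task.run(execution_data)


def add_task(task):
    """Add a task to the queue of the enclosing context."""
    if TaskQueue.current is None:
        raise RuntimeError("add_task() called outside of a task queue context")
    TaskQueue.current.tasks.append(task)


def with_task_queue(func):
    """Run func, then process tasks queued during the call with its result."""
    def wrapper(*args, **kwargs):
        queue = TaskQueue()
        with queue:
            result = func(*args, **kwargs)
        queue.process(result)  # every task receives the function's result
        return result
    return wrapper
```
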

"},{"location":"upload-strategies/","title":"File upload strategies","text":"

There is no \"right\" way to add file to entity via ckanext-files. Everything depends on your use-case and here you can find a few different ways to combine file and arbitrary entity.

"},{"location":"upload-strategies/#attach-existing-file-and-then-transfer-ownership-via-api","title":"Attach existing file and then transfer ownership via API","text":"

The simplest option is just saving the file ID inside a field of the entity. It's recommended to transfer file ownership to the entity and pin the file.

ckanapi action package_patch id=PACKAGE_ID attachment_id=FILE_ID\n\nckanapi action files_transfer_ownership id=FILE_ID \\\n    owner_type=package owner_id=PACKAGE_ID pin=true\n

Pros: * simple and transparent

Cons: * it's easy to forget about ownership transfer and leave the entity with an inaccessible file * after the entity gets a reference to the file and before ownership is transferred, the data may be considered invalid.

"},{"location":"upload-strategies/#automatically-transfer-ownership-using-validator","title":"Automatically transfer ownership using validator","text":"

Add files_transfer_ownership(owner_type) to the validation schema of the entity. When the entity is validated, an ownership transfer task is queued and the file is automatically transferred to the entity after the update.

Pros: * minimal amount of changes if the metadata schema is already modified * relationships between owner and file are up-to-date after any modification

Cons: * works only with files uploaded in advance and cannot handle the native implementation of the resource form

"},{"location":"upload-strategies/#upload-file-and-assign-owner-via-queued-task","title":"Upload file and assign owner via queued task","text":"

Add a field that accepts the uploaded file. The action itself does not process the upload. Instead, create a validator for the upload field that will schedule a task for file upload and ownership transfer.

In this way, if the action fails, no upload happens and you don't need to do anything with the file, as it never left the server's temporary directory. If the action finishes without an error, the task is executed and the file is uploaded/attached to the action result.

Pros: * can be used together with the native group/user/resource form after a small modification of CKAN core * handles upload inside another action as an atomic operation

Cons: * you have to validate the file before the upload happens, to prevent a situation where the action finishes successfully but the upload then fails because of the file's content type or size * tasks themselves are experimental and it's not recommended to put a lot of logic into them * there are just too many things that can go wrong

"},{"location":"upload-strategies/#add-a-new-action-that-combines-uploads-modifications-and-ownership-transfer","title":"Add a new action that combines uploads, modifications and ownership transfer","text":"

If you want to add an attachment to a dataset, create a separate action that accepts the dataset ID and uploaded file. Internally it will upload the file by calling files_file_create, then update the dataset via package_patch and finally transfer ownership via files_transfer_ownership.

Pros: * no magic: everything is described in the new action * can be extracted into a shared extension and used across multiple portals

Cons: * if you need to upload multiple files and update multiple fields, the action quickly becomes too complicated * integration with existing workflows, like dataset/resource creation, is hard: you have to override existing views or create brand new ones.
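A sketch of such a composite action. The action names (files_file_create, package_patch, files_transfer_ownership) and the attachment_id field come from this guide; the helper itself, its signature, and the injected get_action parameter are hypothetical, chosen to keep the sequencing explicit and testable:

```python
def attach_file_to_package(get_action, context, package_id, upload):
    """Hypothetical composite helper: upload a file, reference it from the
    dataset, then transfer ownership and pin the file to the package."""
    # 1. Upload the file
    file = get_action("files_file_create")(context, {"upload": upload})
    # 2. Save the file ID inside a field of the dataset
    get_action("package_patch")(
        context, {"id": package_id, "attachment_id": file["id"]}
    )
    # 3. Transfer ownership and pin the file, as recommended above
    get_action("files_transfer_ownership")(
        context,
        {"id": file["id"], "owner_type": "package",
         "owner_id": package_id, "pin": True},
    )
    return file
```

In a real extension get_action would be ckan.plugins.toolkit.get_action; passing it in as a parameter keeps the sequencing verifiable without a running CKAN.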

"},{"location":"validators/","title":"Validators","text":"Validator Effect files_into_upload Transform value of field(usually file uploaded via <input type=\"file\">) into upload object using ckanext.files.shared.make_upload files_parse_filesize Convert human-readable filesize(1B, 10MiB, 20GB) into an integer files_ensure_name(name_field) If name_field is empty, copy into it filename from current field. Current field must be processed with files_into_upload first files_file_id_exists Verify that file ID exists files_accept_file_with_type(*type) Verify that file ID refers to file with one of specified types. As a type can be used full MIMEtype(image/png), or just its main(image) or secondary(png) part files_accept_file_with_storage(*storage_name) Verify that file ID refers to file stored inside one of specified storages files_transfer_ownership(owner_type, name_of_owner_id_field) Transfer ownership for file ID to specified entity when current API action is successfully finished"},{"location":"configuration/","title":"Configuration","text":"

There are two types of config options for ckanext-files: global options and storage-specific options.

Depending on the type of the storage, the available options differ quite a lot. For example, the files:fs storage type requires a path option that controls the filesystem path where uploads are stored. The files:redis storage type accepts a prefix option that defines the key prefix of files stored in Redis. All storage-specific options always have the form ckanext.files.storage.<STORAGE>.<OPTION>:

ckanext.files.storage.memory.prefix = xxx:\n# or\nckanext.files.storage.my_drive.path = /tmp/hello\n
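As an illustration of this naming convention, per-storage settings can be collected mechanically from a flat config mapping. This is a simplified sketch, not the actual CKAN config parser:

```python
PREFIX = "ckanext.files.storage."


def collect_storage_settings(config):
    """Group flat config keys into {storage_name: {option: value}}."""
    storages = {}
    for key, value in config.items():
        if not key.startswith(PREFIX):
            continue
        # split "<STORAGE>.<OPTION>" into its two parts
        name, _, option = key[len(PREFIX):].partition(".")
        if option:  # ignore malformed keys without an option part
            storages.setdefault(name, {})[option] = value
    return storages
```
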
"},{"location":"configuration/fs/","title":"Filesystem storage configuration","text":"

Private filesystem storage

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n

Public filesystem storage

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:public_fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n## URL of the storage folder. `public_root + location` must produce a public URL\nckanext.files.storage.NAME.public_root =\n
"},{"location":"configuration/global/","title":"Global configuration","text":"
# Default storage used for upload when no explicit storage specified\n# (optional, default: default)\nckanext.files.default_storage = default\n\n# MIMEtypes that can be served without content-disposition:attachment header.\n# (optional, default: application/pdf image video)\nckanext.files.inline_content_types = application/pdf image video\n\n# Storage used for user image uploads. When empty, user image uploads are not\n# allowed.\n# (optional, default: user_images)\nckanext.files.user_images_storage = user_images\n\n# Storage used for group image uploads. When empty, group image uploads are\n# not allowed.\n# (optional, default: group_images)\nckanext.files.group_images_storage = group_images\n\n# Storage used for resource uploads. When empty, resource uploads are not\n# allowed.\n# (optional, default: resources)\nckanext.files.resources_storage = resources\n\n# Enable HTML templates and JS modules required for unsafe default\n# implementation of resource uploads via files. IMPORTANT: this option exists\n# to simplify migration and experiments with the extension. These templates\n# may change a lot or even get removed in the public release of the\n# extension.\n# (optional, default: false)\nckanext.files.enable_resource_migration_template_patch = false\n\n# Any authenticated user can upload files.\n# (optional, default: false)\nckanext.files.authenticated_uploads.allow = false\n\n# Names of storages that can be used by non-sysadmin users when authenticated\n# uploads are enabled\n# (optional, default: default)\nckanext.files.authenticated_uploads.storages = default\n\n# List of owner types that grant access on owned file to anyone who has\n# access to the owner of file. 
For example, if this option has value\n# `resource package`, anyone who passes `resource_show` auth, can see all\n# files owned by resource; anyone who passes `package_show`, can see all\n# files owned by package; anyone who passes\n# `package_update`/`resource_update` can modify files owned by\n# package/resource; anyone who passes `package_delete`/`resource_delete` can\n# delete files owned by package/resource. IMPORTANT: Do not add `user` to this\n# list. Files may be temporarily owned by user during resource creation.\n# Using cascade access rules with `user` exposes such temporary files to\n# anyone who can read user's profile.\n# (optional, default: package resource group organization)\nckanext.files.owner.cascade_access = package resource group organization\n\n# Use `<OWNER_TYPE>_update` auth function to check access for ownership\n# transfer. When this flag is disabled `<OWNER_TYPE>_file_transfer` auth\n# function is used.\n# (optional, default: true)\nckanext.files.owner.transfer_as_update = true\n\n# Use `<OWNER_TYPE>_update` auth function to check access when listing all\n# files of the owner. When this flag is disabled `<OWNER_TYPE>_file_scan`\n# auth function is used.\n# (optional, default: true)\nckanext.files.owner.scan_as_update = true\n
"},{"location":"configuration/libcloud/","title":"Apache libcloud storage configuration","text":"

To use this storage, install the extension with the libcloud extras.

pip install 'ckanext-files[libcloud]'\n

The actual storage backend is controlled by the provider option of the storage. The list of all providers is available here

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:libcloud\n## apache-libcloud storage provider. List of providers available at https://libcloud.readthedocs.io/en/stable/storage/supported_providers.html#provider-matrix . Use upper-cased value from Provider Constant column\nckanext.files.storage.NAME.provider =\n## API key or username\nckanext.files.storage.NAME.key =\n## Secret password\nckanext.files.storage.NAME.secret =\n## JSON object with additional parameters passed directly to storage constructor.\nckanext.files.storage.NAME.params =\n## Name of the container(bucket)\nckanext.files.storage.NAME.container =\n
"},{"location":"configuration/opendal/","title":"OpenDAL storage configuration","text":"

To use this storage, install the extension with the opendal extras.

pip install 'ckanext-files[opendal]'\n

The actual storage backend is controlled by the scheme option of the storage. The list of all schemes is available here

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:opendal\n## OpenDAL service type. Check available services at  https://docs.rs/opendal/latest/opendal/services/index.html\nckanext.files.storage.NAME.scheme =\n## JSON object with parameters passed directly to OpenDAL operator.\nckanext.files.storage.NAME.params =\n
"},{"location":"configuration/redis/","title":"Redis storage configuration","text":"
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.NAME.prefix = ckanext:files:default:file_content:\n
"},{"location":"configuration/storage/","title":"Storage configuration","text":"

All available options for a storage type can be checked via the config declarations CLI. First, add the storage type to the config file:

ckanext.files.storage.xxx.type = files:redis\n

Now run the command that shows all available config options of the plugin.

ckan config declaration files -d\n

Because the Redis storage adapter is enabled, you'll see all the options registered by the Redis adapter alongside the global options:

## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.xxx.prefix = ckanext:files:default:file_content:\n

Sometimes you will see a validation error if the storage has required config options. Let's try using the files:fs storage instead of the Redis one:

ckanext.files.storage.xxx.type = files:fs\n

Now any attempt to run ckan config declaration files -d will show an error, because the required path option is missing:

Invalid configuration values provided:\nckanext.files.storage.xxx.path: Missing value\nAborted!\n

Add the required option to satisfy the application:

ckanext.files.storage.xxx.type = files:fs\nckanext.files.storage.xxx.path = /tmp\n

And run CLI command once again. This time you'll see the list of allowed options:

## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.xxx.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.xxx.create_path = false\n

There are a number of options that are supported by every storage. You can set them and expect that every storage, regardless of its type, will use them in the same way:

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = ADAPTER\n## The maximum size of a single upload.\n## Supports size suffixes: 42B, 2M, 24KiB, 1GB. `0` means no restrictions.\nckanext.files.storage.NAME.max_size = 0\n## Space-separated list of MIME types or just type or subtype part.\n## Example: text/csv pdf application video jpeg\nckanext.files.storage.NAME.supported_types =\n## Descriptive name of the storage used for debugging. When empty, name from\n## the config option is used, i.e: `ckanext.files.storage.DEFAULT_NAME...`\nckanext.files.storage.NAME.name = NAME\n
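The size suffixes accepted by max_size can be illustrated with a small parser. This is a sketch of the documented behavior, not the extension's actual files_parse_filesize implementation; the multipliers chosen here (decimal for KB/MB/GB, binary for KiB/MiB/GiB) are an assumption:

```python
import re

# Multipliers for size suffixes. Assumption for illustration: binary
# suffixes (KiB, MiB, GiB) use powers of 1024, decimal ones (KB, MB, GB)
# use powers of 1000.
UNITS = {
    "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9,
    "kib": 2**10, "mib": 2**20, "gib": 2**30,
}


def parse_filesize(value: str) -> int:
    """Convert a human-readable size like '10MiB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*", value)
    if not match:
        raise ValueError(f"cannot parse size: {value!r}")
    number, unit = match.groups()
    unit = unit.lower() or "b"
    if unit in ("k", "m", "g"):  # allow bare suffixes such as "2M"
        unit += "b"
    if unit not in UNITS:
        raise ValueError(f"unknown size suffix in {value!r}")
    return int(float(number) * UNITS[unit])
```
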
"},{"location":"migration/","title":"Migration from native CKAN storage system","text":"

Important: ckanext-files itself is an independent file-management system. You don't have to migrate existing files from groups, users and resources to it. You can just start using ckanext-files for new fields defined in the metadata schema or for uploading arbitrary files, and continue using native CKAN uploads for group/user images and resource files. The migration workflows described here merely exist as a PoC of using ckanext-files for everything in CKAN. Don't migrate your production instances yet, because concepts and rules may change in the future and the migration process will change as well. Try migration only as an experiment that gives you an idea of what else you want to see in ckanext-files, and share this idea with us.

Note: every migration workflow described below requires an installed ckanext-files. Complete the installation section before going further.

CKAN has the following types of files:

At the moment, there is no migration strategy for the last two types. Replacing the site logo manually is a trivial task, so there will be no dedicated command for it. As for extensions, every one of them is unique, so feel free to create an issue in the current repository: we'll consider creating a migration script for your scenario or, at least, explain how you can perform the migration by yourself.

The migration process for group/organization/user images and resource uploads is described below. Keep in mind that this process only describes migration from the native CKAN storage system, which keeps files inside the local filesystem. If you are using storage extensions, like ckanext-s3filestore or ckanext-cloudstorage, create an issue in the current repository requesting a migration command. As there are a lot of different forks of such extensions, creating a reliable migration script may be challenging, so we need some details about your environment to help with the migration.

The migration workflows below require certain changes to metadata schemas, UI widgets for file uploads and styles of your portal (depending on the customization).

"},{"location":"migration/group/","title":"Migration for group/organization images","text":"

Note: internally, groups and organizations are the same entity, so this workflow describes both of them.

First of all, you need a configured storage that supports public links. As all group/organization images are stored inside the local filesystem, you can use the files:public_fs storage adapter.

This extension expects that the name of the group images storage will be group_images. This name will be used in all other commands of this migration workflow. If you want to use a different name for the group images storage, override the ckanext.files.group_images_storage config option, which has the default value group_images, and don't forget to adapt the commands accordingly.

This configuration example sets a 10MiB restriction on upload size via the ckanext.files.storage.group_images.max_size option. Feel free to change it or remove it completely to allow any upload size. This restriction applies to future uploads only; any existing file that exceeds the limit is kept.

Uploads are restricted to the image/* MIMEtype via the ckanext.files.storage.group_images.supported_types option. You can make this option more or less restrictive. This restriction applies to future uploads only; any existing file with a wrong MIMEtype is kept.

ckanext.files.storage.group_images.path controls the location of the upload folder in the filesystem. It should match the value of the ckan.storage_path option plus storage/uploads/group. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.

The ckanext.files.storage.group_images.public_root option specifies the base URL from which every group image can be accessed. In most cases it's the CKAN URL plus uploads/group. If you are serving the CKAN application from ckan.site_url, leave this option unchanged. If you are using ckan.root_path, like /data/, insert this root path into the value of the option. The example below uses the %(ckan.site_url)s wildcard, which will be automatically replaced with the value of the ckan.site_url config option. You can specify the site URL explicitly if you don't like this wildcard syntax.

ckanext.files.storage.group_images.type = files:public_fs\nckanext.files.storage.group_images.max_size = 10MiB\nckanext.files.storage.group_images.supported_types = image\nckanext.files.storage.group_images.path = /var/storage/ckan/storage/uploads/group\nckanext.files.storage.group_images.public_root = %(ckan.site_url)s/uploads/group\n
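If your portal uses ckan.root_path, the root path has to appear in public_root as well. For example, with a hypothetical root path of /data/:

```ini
ckanext.files.storage.group_images.public_root = %(ckan.site_url)s/data/uploads/group
```
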

Now let's run a command that shows us the list of files available under the newly configured storage:

ckan files scan -s group_images\n

None of these files are tracked by the files extension yet, i.e. they don't have a corresponding record in the DB with base details, like size, MIMEtype, filehash, etc. Let's create these details via the command below. It's safe to run this command multiple times: it will gather and store information about files not registered in the system and ignore any previously registered file.

ckan files scan -s group_images -t\n

Finally, let's run the command that shows only untracked files. Ideally, you'll see nothing upon executing it, because you just registered every file in the system.

ckan files scan -s group_images -u\n

Note that all the files are still available inside the storage directory. If the previous command shows nothing, it only means that CKAN already knows the details about each file in the storage directory. If you want to see the list of the files again, omit the -u flag (which stands for \"untracked\") and you'll see all the files in the command output again:

ckan files scan -s group_images\n

Now, when all images are tracked by the system, we can give ownership over these files to the groups/organizations that are using them. Run the command below to connect files with their owners. It will search for groups/organizations first and report how many connections were identified. There will be a suggestion to show the identified relationships and the list of files that have no owner (if there are such files). The presence of files without an owner usually means that you removed a group/organization from the database, but did not remove its image.

Finally, you'll be asked if you want to transfer ownership over the files. This operation does not change existing data, and if you disable ckanext-files after the ownership transfer, you won't see any difference. The whole ownership transfer is managed inside custom DB tables generated by ckanext-files, so it's a safe operation.

ckan files migrate groups group_images\n

Here's an example of output that you can see when running the command:

Found 3 files. Searching file owners...\n[####################################] 100% Located owners for 2 files out of 3.\n\nShow group IDs and corresponding file? [y/N]: y\nd7186937-3080-429f-a434-22b74b9a8d39: file-1.png\n87e2a1aa-7905-4a28-a087-90433f8e169e: file-2.png\n\nShow files that do not belong to any group? [y/N]: y\nfile-3.png\n\nTransfer file ownership to group identified in previous steps? [y/N]: y\nTransfering file-2.png  [####################################]  100%\n

Now comes the most complex part. You need to change the metadata schema and UI in order to:

The original CKAN workflow for uploading files was:

This approach is different from the strategy recommended by ckanext-files, but in order to make the migration as simple as possible, we'll stay close to the original workflow.

Note: the suggested approach resembles the existing process of file uploads in CKAN. But ckanext-files was designed as a system that gives you a choice. Check file upload strategies to learn more about alternative implementations of upload and their pros/cons.

First, we need to replace the Upload/Link widget on the group/organization form. If you are using native group templates, create group/snippets/group_form.html and organization/snippets/organization_form.html. Inside both files, extend the original template and override the basic_fields block. You only need to replace the last field

{{ form.image_upload(\n    data, errors, is_upload_enabled=h.uploads_enabled(),\n    is_url=is_url, is_upload=is_upload) }}\n

with

{{ form.image_upload(\n    data, errors, is_upload_enabled=h.files_group_images_storage_is_configured(),\n    is_url=is_url, is_upload=is_upload,\n    field_upload=\"files_image_upload\") }}\n

There are two differences from the original. First, we use h.files_group_images_storage_is_configured() instead of h.uploads_enabled(). As we are using different storages for different upload types, upload widgets can now be enabled independently. Second, we pass the field_upload=\"files_image_upload\" argument into the macro. It will send the uploaded file to CKAN inside files_image_upload instead of the original image_upload field. This must be done because CKAN unconditionally strips the image_upload field from the submission payload, making processing of the file too unreliable. We changed the name of the upload field and CKAN keeps this new field, so we can process it as we wish.

Note: if you are using ckanext-scheming, you only need to replace the form_snippet of the image_url field, instead of rewriting the whole template.

Now let's define validation rules for this new upload field. We need to create plugins that modify the validation schema for group and organization. Due to CKAN implementation details, you need a separate plugin for each.

Note: if you are using ckanext-scheming, you can add the files_image_upload validators to the schemas of organization and group. Check the list of validators that must be applied to this new field below.

Here's an example of plugins that modify the validation schemas of group and organization. As you can see, they are mostly the same:

import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultGroupForm, DefaultOrganizationForm\nfrom ckan.logic.schema import default_create_group_schema, default_update_group_schema\n\n\ndef _modify_schema(schema, type):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"group_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"group_images\",\n            type,\n            \"id\",\n            \"public_url\",\n            type + \"_patch\",\n            \"image_url\",\n        ),\n    ]\n    return schema\n\n\nclass FilesGroupPlugin(p.SingletonPlugin, DefaultGroupForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = False\n\n    def group_types(self):\n        return [\"group\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"group\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"group\")\n\n\nclass FilesOrganizationPlugin(p.SingletonPlugin, DefaultOrganizationForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = True\n\n    def group_types(self):\n        return [\"organization\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"organization\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"organization\")\n

There are 4 validators that must be applied to the new upload field:

That's all. Now every image upload for a group/organization is handled by ckanext-files. To verify it, do the following. First, check the list of files currently stored in the group_images storage via the command that we used at the beginning of the migration:

ckan files scan -s group_images\n

You'll see a list of existing files. Their names follow the format <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg.

Now upload an image into an existing group, or create a new group with any image. When you check the list of files again, you'll see one new record. But this time the record resembles a UUID: da046887-e76c-4a68-97cf-7477665710ff.
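When auditing a storage directory during migration, it can be handy to tell legacy names apart from the new ones programmatically. A sketch based purely on the two formats shown above:

```python
import re

# Legacy: <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg
LEGACY_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{6}\.\d{6}")
# New: bare UUID, e.g. da046887-e76c-4a68-97cf-7477665710ff
UUID_NAME = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
)


def classify_stored_name(name):
    """Return 'file', 'legacy', or 'unknown' for a name found in storage."""
    if UUID_NAME.fullmatch(name):
        return "file"
    if LEGACY_NAME.match(name):
        return "legacy"
    return "unknown"
```
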

"},{"location":"migration/resource/","title":"Resource","text":""},{"location":"migration/resource/#migration-for-resource-uploads","title":"Migration for resource uploads","text":"

Configure a named storage for resources. Use the files:ckan_resource_fs storage adapter.

This extension expects that the resources storage is named resources. This name is used in all other commands of this migration workflow. If you want a different name, override the ckanext.files.resources_storage config option (default value: resources) and don't forget to adapt the commands below accordingly.

ckanext.files.storage.resources.path must match the value of the ckan.storage_path option, followed by the resources directory. The example below assumes that ckan.storage_path is set to /var/storage/ckan.

The example below sets a 10MiB limit on resource size. Modify it if ckan.max_resource_size sets a different limit.

Unlike group and user images, this storage needs neither an upload type restriction nor public_root.

ckanext.files.storage.resources.type = files:ckan_resource_fs\nckanext.files.storage.resources.max_size = 10MiB\nckanext.files.storage.resources.path = /var/storage/ckan/resources\n

Check the list of untracked files available inside newly configured storage:

ckan files scan -s resources -u\n

Track all these files:

ckan files scan -s resources -t\n

Re-check that now you see no untracked files:

ckan files scan -s resources -u\n

Transfer file ownership to the corresponding resources. In addition to the ownership transfer itself, this command will ask you whether you want to modify the resource's url_type and url fields. This is required to move file management to the files extension completely and to make migration to a different storage type possible.

If you accept resource modifications, for every file-owning resource url_type will be changed to file and url will be changed to the file ID. All modified packages are then reindexed.

Changing url_type means that some pages will change. For example, instead of a Download button CKAN will show a Go to resource button on the resource page, because the Download label is specific to url_type=upload. Some views may stop working as well. But this is a safer option for migration than leaving url_type unchanged: ckanext-files manages files in its own way, some old assumptions about files no longer hold, and using a different url_type is the fastest way to tell everyone that something changed.

Broken views can be easily fixed. Every view is implemented as a separate plugin, so you can always inherit from that plugin and override the methods that relied on the old behavior. And many views work with the file URL directly, so they won't even notice the difference.

ckan files migrate local-resources resources\n

The next goal is a correct metadata schema. If you are using ckanext-scheming, you need to modify the validators of the url and format fields.

If you are working with native schemas, you have to modify the dataset schema by implementing IDatasetForm. Here's an example:

import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultDatasetForm\nfrom ckan.logic import schema\n\nclass FilesDatasetPlugin(p.SingletonPlugin, DefaultDatasetForm):\n    p.implements(p.IDatasetForm, inherit=True)\n\n    def is_fallback(self):\n        return True\n\n    def package_types(self):\n        return [\"dataset\"]\n\n    def _modify_schema(self, schema):\n        schema[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_file_id_exists\"),\n            tk.get_validator(\"files_transfer_ownership\")(\"resource\",\"id\"),\n        ])\n        schema[\"resources\"][\"format\"].insert(0, tk.get_validator(\"files_content_type_from_file\")(\"url\"))\n\n    def create_package_schema(self):\n        sch = schema.default_create_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def update_package_schema(self):\n        sch = schema.default_update_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def show_package_schema(self):\n        sch = schema.default_show_package_schema()\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_id_into_resource_download_url\"),\n        ])\n        return sch\n

Both create and update schemas are updated in the same way. We add a new validator to the format field to correctly identify the file format, and there is a number of new validators for url:

On top of this, we also have two validators applied to show_package_schema (use output_validators in ckanext-scheming):

The next part is the trickiest. You need to create a number of templates and JS modules, but because ckanext-files is actively developed, your custom files would most likely become outdated pretty soon.

Instead, we recommend enabling the patch for the resource form that ships with ckanext-files. It's a bit hacky, but because the extension itself is still in alpha stage, it should be acceptable. Check file upload strategies for examples of implementations that you can add to your portal instead of the default patch.

To enable the template patch, add the following line to the config file:

ckanext.files.enable_resource_migration_template_patch = true\n

This option adds an Add file button to the resource form.

Upon clicking, the button is replaced by a widget that supports uploading new files or selecting previously uploaded files that are not used by any resource yet.

"},{"location":"migration/user/","title":"Migration for user avatars","text":"

This workflow is similar to the group/organization migration. It contains the same sequence of actions, but with explanations removed, because you already know the details from the group migration. Only the steps that differ contain a detailed explanation of the process.

Configure a local filesystem storage with support for public links (files:public_fs) for user images.

This extension expects that the user images storage is named user_images. This name is used in all other commands of this migration workflow. If you want a different name, override the ckanext.files.user_images_storage config option (default value: user_images) and don't forget to adapt the commands accordingly.

ckanext.files.storage.user_images.path mirrors the corresponding option of the group/organization images storage. But user images are kept inside the user folder by default, so the value of this option should match the value of ckan.storage_path plus storage/uploads/user. The example below assumes that ckan.storage_path is set to /var/storage/ckan.

ckanext.files.storage.user_images.public_root also mirrors the corresponding group/organization option, but user images are served from the CKAN URL plus uploads/user.

ckanext.files.storage.user_images.type = files:public_fs\nckanext.files.storage.user_images.max_size = 10MiB\nckanext.files.storage.user_images.supported_types = image\nckanext.files.storage.user_images.path = /var/storage/ckan/storage/uploads/user\nckanext.files.storage.user_images.public_root = %(ckan.site_url)s/uploads/user\n

Check the list of untracked files available inside newly configured storage:

ckan files scan -s user_images -u\n

Track all these files:

ckan files scan -s user_images -t\n

Re-check that now you see no untracked files:

ckan files scan -s user_images -u\n

Transfer image ownership to corresponding users:

ckan files migrate users user_images\n

Update the user templates. The required field is defined in user/new_user_form.html and user/edit_user_form.html. It's a bit different from the field used by group/organization, but again you need to add the field_upload=\"files_image_upload\" parameter to the image_upload macro and replace h.uploads_enabled() with h.files_user_images_storage_is_configured().

Users have no dedicated interface for validation schema modification, and here comes the biggest difference from the group migration: you need to chain the user_create and user_update actions and modify the schema from the context:

import ckan.logic.schema\nimport ckan.plugins.toolkit as tk\n\n\ndef _patch_schema(schema):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"user_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"user_images\",\n            \"user\",\n            \"id\",\n            \"public_url\",\n            \"user_patch\",\n            \"image_url\",\n        ),\n    ]\n\n\n@tk.chained_action\ndef user_update(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_update_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n\n\n@tk.chained_action\ndef user_create(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n

The validators are all the same, but now we use user instead of group/organization in the parameters.

That's all. Just as with groups, you can update an avatar and verify that all new filenames resemble UUIDs.

"},{"location":"usage/capabilities/","title":"Capabilities","text":"

To understand in advance whether a specific storage can perform certain actions, ckanext-files uses ckanext.files.shared.Capability. It's an enumeration of operations that a storage can support:

These capabilities are declared when the storage is created and are automatically checked by actions that work with the storage. You can also check manually whether a storage supports a certain capability, and you can test for multiple capabilities at once by combining them with the bitwise OR operator.

from ckanext.files.shared import Capability, get_storage\n\nstorage = get_storage()\n\ncan_read = storage.supports(Capability.STREAM)\n\nread_and_write = Capability.CREATE | Capability.STREAM\ncan_read_and_write = storage.supports(read_and_write)\n

The ckan files storages -v CLI command lists all configured storages together with their capabilities.

"},{"location":"usage/configure/","title":"Configure the storage","text":"

Before uploading files, you have to configure a storage: the place where all uploaded files are stored. A storage relies on an adapter that describes where and how data is stored: filesystem, cloud, DB, etc. Depending on the adapter, the storage may have a few additional adapter-specific options. For example, a filesystem adapter likely requires a path to the folder where uploads are stored, a DB adapter may need DB connection parameters, and a cloud adapter most likely won't work without an API key. These extra options are specific to the adapter, so check its documentation to find out what is available.

Let's start with the Redis adapter, because it has minimal configuration requirements.

Add the following line to the CKAN config file:

ckanext.files.storage.default.type = files:redis\n

The name of the adapter is files:redis. It follows the recommended naming convention for adapters: <EXTENSION>:<TYPE>. You can tell from the name above that we are using an adapter defined in the files extension with the redis type. This naming convention is not enforced, though, and its only purpose is avoiding name conflicts. Technically, an adapter name can contain any character, including spaces, newlines and emoji.
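Since the convention is just a string pattern, splitting an adapter name into its parts is trivial. A small illustrative sketch (not part of the extension's API):

```python
def split_adapter_name(name):
    # "<EXTENSION>:<TYPE>" -> ("<EXTENSION>", "<TYPE>");
    # a name without ":" yields an empty type
    extension, _, adapter_type = name.partition(":")
    return extension, adapter_type

parts = split_adapter_name("files:redis")
# parts == ("files", "redis")
```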

If you make a typo in the adapter's name, any CKAN CLI command will produce an error message with the list of available adapters:

Invalid configuration values provided:\nckanext.files.storage.default.type: Value must be one of ['files:fs', 'files:public_fs', 'files:redis']\nAborted!\n

The storage is configured, so we can actually upload a file. Let's use ckanapi for this task. Files are created via the files_file_create API action, and this time we have to pass 2 parameters to it:

The final command is here:

echo -n 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n

And that's what you see as result:

{\n  \"atime\": null,\n  \"content_type\": \"text/plain\",\n  \"ctime\": \"2024-06-02T15:02:14.819117+00:00\",\n  \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n  \"id\": \"e21162ab-abfb-476c-b8c5-5fe7cb89eca0\",\n  \"location\": \"24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\",\n  \"mtime\": null,\n  \"name\": \"hello.txt\",\n  \"size\": 11,\n  \"storage\": \"default\",\n  \"storage_data\": {}\n}\n

The content of the file can be checked via the CKAN CLI. Use the id from the last API call's output in the command ckan files stream ID:

ckan files stream e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n

Alternatively, we can use the Redis CLI to get the content of the file. Note that you cannot get the content via the CKAN API, because the API is JSON-based and file streaming doesn't suit its principles.

By default, the Redis adapter puts the content under the key <PREFIX><LOCATION>. Pay attention to LOCATION: it's the value available as location in the API response (i.e. 24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46 in our case). It's different from the id (the ID used by the DB to uniquely identify the file record) and the name (the human-readable name of the file). In our scenario the location looks like a UUID because of internal details of the Redis adapter implementation, but other adapters may use a more path-like value, e.g. something similar to path/to/folder/hello.txt.

PREFIX can be configured, but we skipped this step and got the default value: ckanext:files:default:file_content:. So the final Redis key of our file is ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46.
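The key construction described above can be sketched in a couple of lines. This assumes the default prefix; the adapter's internals may differ:

```python
# Default prefix used by the files:redis adapter for the "default" storage
DEFAULT_PREFIX = "ckanext:files:default:file_content:"

def redis_content_key(location, prefix=DEFAULT_PREFIX):
    # Final Redis key is simply <PREFIX><LOCATION>
    return prefix + location

key = redis_content_key("24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46")
# key == "ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46"
```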

redis-cli\n\n127.0.0.1:6379> GET ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\n\"hello world\"\n

Before we move further, let's remove the file using its id:

ckanapi action files_file_delete id=e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
"},{"location":"usage/js/","title":"JavaScript utilities","text":"

Note: ckanext-files does not provide stable CKAN JS modules at the moment. Try creating your own widgets and share your examples or requirements with us. We'll consider creating and including widgets in ckanext-files if they are generic enough for the majority of users.

ckanext-files registers a few utilities inside the CKAN JS namespace to help with building UI components.

The first group of utilities is registered inside the CKAN Sandbox. Inside CKAN JS modules it's accessible as this.sandbox. If you are writing code outside of JS modules, the Sandbox can be initialized by calling ckan.sandbox():

const sandbox = ckan.sandbox()\n

When the files plugin is loaded, the sandbox contains a files attribute with two members:

The simplest way to upload a file is the upload helper.

await sandbox.files.upload(\n    new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n)\n

This function uploads the file to the default storage via the files_file_create action. Extra parameters for the API call can be passed via the second argument of the upload helper: use an object with a requestParams key, whose value is added to the standard API request parameters. For example, if you want to use the storage named memory and a field with the value custom:

await sandbox.files.upload(\n    new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n    {requestParams: {storage: \"memory\", field: \"custom\"}}\n)\n

If you need more control over the upload, you can create an uploader and interact with it directly, instead of using the upload helper.

An uploader is an object that uploads a file to the server. It extends the base uploader, which defines the standard interface for such objects. The uploader performs all the API calls internally and returns the uploaded file details. Out of the box you can use the Standard and Multipart uploaders. Standard uses the files_file_create API action and specializes in normal uploads. Multipart relies on the files_multipart_* actions and can be used to pause and resume an upload.

To create an uploader instance, pass its name as a string to makeUploader. Then call the uploader's upload method to perform the actual upload. This method requires two arguments:

const uploader = sandbox.files.makeUploader(\"Standard\")\nawait uploader.upload(new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}), {})\n

One of the reasons to use a manually created uploader is progress tracking. The uploader supports event subscriptions via uploader.addEventListener(event, callback) and here's the list of possible upload events:

If you want to use the upload helper with a customized uploader, there are two ways to do it.

The second group of ckanext-files utilities is available as the ckan.CKANEXT_FILES object. This object mainly serves as an extension and configuration point for sandbox.files.

ckan.CKANEXT_FILES.adapters is a collection of all classes that can be used to initialize an uploader. It contains the Standard, Multipart and Base classes. Standard and Multipart can be used as is, while Base must be extended by your custom uploader class. Add your custom uploader classes to adapters to make them available application-wide:

class MyUploader extends Base { ... }\n\nckan.CKANEXT_FILES.adapters[\"My\"] = MyUploader;\n\nawait sandbox.files.upload(new File(...), {adapter: \"My\"})\n

ckan.CKANEXT_FILES.defaultSettings contains the default settings available as this.settings inside any uploader. You can change the name of the storage used by all uploaders through this object. Note that changes apply only to uploaders initialized after the modification.

ckan.CKANEXT_FILES.defaultSettings.storage = \"memory\"\n
"},{"location":"usage/multi-storage/","title":"Multi-storage","text":"

It's possible to configure multiple storages at once and specify which one you want to use for an individual file upload. Up until now we used the following storage options:

All of them have the common prefix ckanext.files.storage.default., and this prefix is the key to using multiple storages simultaneously.

Every option of the storage follows the pattern ckanext.files.storage.<STORAGE_NAME>.<OPTION>. As all the options above contain default in the <STORAGE_NAME> position, they belong to the default storage.

If you want to configure a storage named custom, change the configuration accordingly:

ckanext.files.storage.custom.type = files:fs\nckanext.files.storage.custom.path = /tmp/example\nckanext.files.storage.custom.create_path = true\n

And, if you want to use Redis-based storage named memory and filesystem-based storage named default, use the following configuration:

ckanext.files.storage.memory.type = files:redis\n\nckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
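Options following this pattern group naturally into per-storage settings. A minimal illustrative sketch of that grouping (not the actual ckanext-files parsing code):

```python
PREFIX = "ckanext.files.storage."

def collect_storages(config):
    # Group flat "ckanext.files.storage.<NAME>.<OPTION>" keys
    # into {<NAME>: {<OPTION>: value}}
    storages = {}
    for key, value in config.items():
        if key.startswith(PREFIX):
            name, option = key[len(PREFIX):].split(".", 1)
            storages.setdefault(name, {})[option] = value
    return storages

config = {
    "ckanext.files.storage.memory.type": "files:redis",
    "ckanext.files.storage.default.type": "files:fs",
    "ckanext.files.storage.default.path": "/tmp/example",
    "ckanext.files.storage.default.create_path": "true",
}
storages = collect_storages(config)
# storages["memory"] == {"type": "files:redis"}
```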

The default storage is special: ckanext-files uses it by default, as the name suggests. If you remove the configuration for the default storage and try to create a file, you'll see the following error:

echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n\n... ckan.logic.ValidationError: None - {'storage': ['Storage default is not configured']}\n

Storage default is not configured. That's why we need the default configuration. But if you want to upload a file into a different storage, or don't want to add a default storage at all, you can always explicitly specify the name of the storage you are going to use.

When using API actions, add the storage parameter to the call:

echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt storage=memory\n

When writing Python code, pass the storage name to the get_storage function:

storage = get_storage(\"memory\")\n

When writing JS code, pass the object {requestParams: {storage: \"memory\"}} to the upload function:

const sandbox = ckan.sandbox()\nconst file = new File([\"content\"], \"file.txt\")\nconst options = {requestParams: {storage: \"memory\"}};\n\nawait sandbox.files.upload(file, options)\n
"},{"location":"usage/multipart/","title":"Multipart, resumable and signed uploads","text":"

This feature has many names, but it basically divides a single upload into multiple stages. It can be used in the following situations:

All these situations are handled by 4 API actions, which are available if the storage has the MULTIPART capability:

The implementation of multipart upload depends on the adapter, so make sure you check its documentation before using any multipart actions. Still, some common steps of the multipart upload workflow are usually the same across adapters:

Incomplete files support most of the normal file actions, but you need to pass completed=False to the action when working with them. For example, if you want to remove an incomplete upload, use its ID and completed=False:

ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24 completed=False\n

Incomplete files do not support streaming and downloading via the public interface of the extension. But a storage adapter can expose such features via custom methods if it's technically possible.

An example of a basic multipart upload is shown below. The files:fs adapter can be used for running this example, as it implements MULTIPART.

First, create a text file and check its size:

echo 'hello world!' > /tmp/file.txt\nwc -c /tmp/file.txt\n\n... 13 /tmp/file.txt\n

The size is 13 bytes and the content type is text/plain. These values must be used for the upload initialization.

ckanapi action files_multipart_start name=file.txt size=13 content_type=text/plain\n\n... {\n...   \"content_type\": \"text/plain\",\n...   \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n...   \"hash\": \"\",\n...   \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n...   \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n...   \"name\": \"file.txt\",\n...   \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n...   \"owner_type\": \"user\",\n...   \"pinned\": false,\n...   \"size\": 13,\n...   \"storage\": \"default\",\n...   \"storage_data\": {\n...     \"uploaded\": 0\n...   }\n... }\n

Here storage_data contains {\"uploaded\": 0}. It may be different for other adapters, especially if they implement non-consecutive uploads, but generally it's the recommended way to keep the upload progress.

Now we'll upload the first 5 bytes of the file.

ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n    upload@<(dd if=/tmp/file.txt bs=1 count=5)\n\n... {\n...   \"content_type\": \"text/plain\",\n...   \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n...   \"hash\": \"\",\n...   \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n...   \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n...   \"name\": \"file.txt\",\n...   \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n...   \"owner_type\": \"user\",\n...   \"pinned\": false,\n...   \"size\": 13,\n...   \"storage\": \"default\",\n...   \"storage_data\": {\n...     \"uploaded\": 5\n...   }\n... }\n

If you try finalizing the upload right now, you'll get an error:

ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... ckan.logic.ValidationError: None - {'upload': ['Actual value of upload size(5) does not match expected value(13)']}\n

Let's upload the remaining bytes and complete the upload:

ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n    upload@<(dd if=/tmp/file.txt bs=1 skip=5)\n\nckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... {\n...   \"atime\": null,\n...   \"content_type\": \"text/plain\",\n...   \"ctime\": \"2024-06-22T14:57:18.483716+00:00\",\n...   \"hash\": \"c897d1410af8f2c74fba11b1db511e9e\",\n...   \"id\": \"a740692f-e3d5-492f-82eb-f04e47c13848\",\n...   \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n...   \"mtime\": null,\n...   \"name\": \"file.txt\",\n...   \"owner_id\": null,\n...   \"owner_type\": null,\n...   \"pinned\": false,\n...   \"size\": 13,\n...   \"storage\": \"default\",\n...   \"storage_data\": {}\n... }\n

Now the file can be used normally. You can transfer its ownership to someone, stream or modify it. Pay attention to the ID: the completed file has its own unique ID, which is different from the ID of the incomplete upload.
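The bookkeeping in this walkthrough can be modeled with a small self-contained sketch. This is an illustration of consecutive multipart behavior, not the actual files:fs implementation:

```python
class MultipartSketch:
    """Track uploaded bytes and refuse to finalize a partial upload."""

    def __init__(self, expected_size):
        self.expected_size = expected_size
        self.parts = []

    @property
    def uploaded(self):
        # mirrors storage_data["uploaded"] in the API responses above
        return sum(len(part) for part in self.parts)

    def update(self, chunk):
        self.parts.append(chunk)

    def complete(self):
        if self.uploaded != self.expected_size:
            raise ValueError(
                f"Actual value of upload size({self.uploaded}) "
                f"does not match expected value({self.expected_size})"
            )
        return b"".join(self.parts)


upload = MultipartSketch(13)
upload.update(b"hello")       # uploaded == 5, as in the walkthrough
upload.update(b" world!\n")   # the remaining 8 bytes
content = upload.complete()   # b"hello world!\n"
```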

"},{"location":"usage/ownership/","title":"File ownership","text":"

Every file can have an owner, and there can be only one owner per file. It's possible to create a file without an owner, but usually the application benefits from keeping every file with its owner. The owner is described with two fields: ID and type.

When a file is created, the current user from the API action's context is assigned as its owner by default. From then on, the owner can perform other operations with the file, such as renaming, displaying or removing it.

Apart from chaining the auth functions, a plugin can modify the access rules for a file by implementing the IFiles.files_file_allows and IFiles.files_owner_allows methods.

def files_file_allows(\n    self,\n    context: Context,\n    file: File | Multipart,\n    operation: types.FileOperation,\n) -> bool | None:\n    ...\n\ndef files_owner_allows(\n    self,\n    context: Context,\n    owner_type: str, owner_id: str,\n    operation: types.OwnerOperation,\n) -> bool | None:\n    ...\n

These methods receive the current action context, the details of the tested object, and the name of the operation (show, update, delete, file_transfer). files_file_allows checks permissions for the accessed file; it's usually called when a user interacts with the file directly. files_owner_allows works with an owner described by type and ID; it's usually called when a user transfers file ownership, performs a bulk file operation on the owner's files, or just tries to get the list of files that belong to the owner.

If the method returns true/false, the operation is allowed/denied. If the method returns None, the default logic is used to check access.
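This tri-state convention can be sketched as follows. The snippet illustrates how a single plugin answer is interpreted; the actual ckanext-files dispatch logic may differ:

```python
def interpret(plugin_answer, default_check):
    # An explicit boolean from the plugin wins; None means "no opinion"
    # and falls through to the default access check.
    if plugin_answer is not None:
        return plugin_answer
    return default_check()

allowed_explicit = interpret(True, lambda: False)   # plugin allows: True
denied_explicit = interpret(False, lambda: True)    # plugin denies: False
fallback = interpret(None, lambda: False)           # default check: False
```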

As already mentioned, by default the user who owns the file can access it. But what about different owners? What if the file is owned by another entity, like a resource or dataset?

Out of the box, nobody can access such files. But there are three config options that modify this restriction.

ckanext.files.owner.cascade_access = ENTITY_TYPE ANOTHER_TYPE gives access to a file owned by an entity if the user already has access to the entity itself. Use words like package, resource, group instead of ENTITY_TYPE.

For example: a file is owned by a resource. If cascade access is enabled, whoever passes resource_show for the resource can also see the file owned by it. Whoever passes resource_update for the resource can also modify the file owned by it, etc.
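The resource example above corresponds to a config line like the following (a sketch; adjust the space-separated entity types to your portal):

```ini
ckanext.files.owner.cascade_access = package resource
```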

Important: be careful and do not add user to ckanext.files.owner.cascade_access. Users' own files are considered private, and most likely you don't want anyone else to be able to see or modify them.

The second option is ckanext.files.owner.transfer_as_update. When transfer-as-update is enabled, any user who has the <OWNER_TYPE>_update permission can transfer their own files to this OWNER_TYPE. Instead of using this option, you can define <OWNER_TYPE>_file_transfer.

And the third option is ckanext.files.owner.scan_as_update. Just as with ownership transfer, it allows a user to list all files of the owner if the user can <OWNER_TYPE>_update it. Instead of using this option, you can define <OWNER_TYPE>_file_scan.

"},{"location":"usage/permissions/","title":"Permissions","text":"

File creation is not allowed by default. Only sysadmins can use the files_file_create and files_multipart_start actions. This is done deliberately: uncontrolled uploads can turn your portal into a user's personal cloud storage.

There are three ways to grant upload permission to normal users.

The BAD option is simple. Enable the ckanext.files.authenticated_uploads.allow config option and every registered user will be allowed to upload files, but only into the default storage. If you want to change the list of storages available to common users, specify the storage names in the ckanext.files.authenticated_uploads.storages option.
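A config sketch for this option might look like the following. The exact value format of the storages option (assumed here to be a space-separated list of storage names) should be verified against the config reference:

```ini
ckanext.files.authenticated_uploads.allow = true
ckanext.files.authenticated_uploads.storages = default memory
```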

The GOOD option is relatively simple. Define a chained auth function with the name files_file_create. It's called whenever a user initiates an upload, so you can decide whether the user is allowed to upload files with the specified parameters.

The BEST option is to leave this restriction unchanged. Do not allow any user to call files_file_create. Instead, create a new action for your goal. ckanext-files isn't a solution - it's a tool that helps you in building the solution.

If you need to add a documents field to datasets that contains uploaded PDF files, create a separate dataset_document_attach action. Specify access rules and validation for it, or even hardcode the storage that will be used for uploads. Then, from this new action, call files_file_create with ignore_auth: True.

In this way you control every side of uploading documents into a dataset and don't accidentally break other functionality, because every other feature will define its own action.

"},{"location":"usage/task-queue/","title":"Task queue","text":"

One of the challenges introduced by independently managed files relates to file ownership. As long as you call files_transfer_ownership manually, things are transparent. But as soon as you add a custom file field to a dataset, you probably want to automatically transfer ownership of the file referred to by this custom field.

Imagine that you have a PDF file owned by you, and you specify the ID of this file in the attachment_id field of a dataset. You want to show a download link for this file on the dataset page. But while the file is owned by you, nobody else can download it. So you decide to transfer file ownership to the dataset, so that anyone who sees the dataset can see the file as well.

You cannot update the dataset and transfer ownership afterwards, because there would be a time window between these two actions when the data is not valid. Or even worse, after updating the dataset you could lose the internet connection and never finish the transfer.

Nor can you transfer ownership first and then update the dataset: attachment_id may have additional validators and you don't know in advance whether the dataset update will succeed after the transfer.

This problem can be solved by queuing additional tasks inside the action. For example, the validator that checks whether a certain file ID can be used as attachment_id can queue an ownership transfer. If the dataset update completes without errors, the queued task is executed automatically and the dataset becomes the owner of the file.

A task is queued via the ckanext.files.shared.add_task function, which accepts objects inheriting from ckanext.files.shared.Task. The Task class requires implementing the abstract method run(result: Any, idx: int, prev: Any), which is called when the task is executed. This method receives the result of the action that caused the task execution, the task's position in the queue, and the result of the previous task.
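The queue mechanics can be illustrated with a self-contained sketch. This is a simplified model of the behavior described above, not the actual ckanext.files.shared implementation:

```python
class Task:
    def run(self, result, idx, prev):
        raise NotImplementedError


class TaskQueue:
    def __init__(self):
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)

    def process(self, action_result):
        # Executed after the action succeeds: every task receives the
        # action's result, its queue position, and the previous task's
        # return value.
        prev = None
        for idx, task in enumerate(self.tasks):
            prev = task.run(action_result, idx, prev)
        return prev


class RecordingTask(Task):
    """Toy task that records its arguments and returns its position."""

    def __init__(self, log):
        self.log = log

    def run(self, result, idx, prev):
        self.log.append((result, idx, prev))
        return idx


log = []
queue = TaskQueue()
queue.add_task(RecordingTask(log))
queue.add_task(RecordingTask(log))
queue.process("dataset-dict")
# log == [("dataset-dict", 0, None), ("dataset-dict", 1, 0)]
```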

For example, one of the attachment_id validators can queue the following MyTask via add_task(MyTask(file_id)) to transfer ownership of file_id to the updated dataset:

import ckan.plugins.toolkit as tk\n\nfrom ckanext.files.shared import Task\n\nclass MyTask(Task):\n    def __init__(self, file_id):\n        self.file_id = file_id\n\n    def run(self, dataset, idx, prev):\n        return tk.get_action(\"files_transfer_ownership\")(\n            {\"ignore_auth\": True},\n            {\n                \"id\": self.file_id,\n                \"owner_type\": \"package\",\n                \"owner_id\": dataset[\"id\"],\n                \"pin\": True,\n            },\n        )\n

As the first argument, Task.run receives the result of the action that was called. Right now only the following actions support tasks:

If you want to enable task support for your custom action, decorate it with the ckanext.files.shared.with_task_queue decorator:

from ckanext.files.shared import with_task_queue\n\n@with_task_queue\ndef my_action(context, data_dict):\n    # you can call `add_task` inside this action's stack frame.\n    ...\n

A good example of a validator using tasks is the files_transfer_ownership validator factory. It can be added to a metadata schema as files_transfer_ownership(owner_type, name_of_id_field). For example, if you are adding this validator to a resource, call it as files_transfer_ownership(\"resource\", \"id\"). The second argument is the name of the ID field; as in most cases it's id, you can omit it:

"},{"location":"usage/tracked-files/","title":"Tracked and untracked files","text":"

There is a difference between creating files via action:

tk.get_action(\"files_file_create\")(\n    {\"ignore_auth\": True},\n    {\"upload\": \"hello\", \"name\": \"hello.txt\"}\n)\n

and via direct call to Storage.upload:

from ckanext.files.shared import get_storage, make_upload\n\nstorage = get_storage()\nstorage.upload(\"hello.txt\", make_upload(b\"hello\"), {})\n

The former snippet creates a tracked file: the file is uploaded to the storage and its details are saved to the database.

The latter snippet creates an untracked file: the file is uploaded to the storage, but its details are not saved anywhere.

Untracked files can be used to achieve specific goals. For example, imagine a storage adapter that writes files to a specified ZIP archive. You can create an interface that initializes such a storage for an existing ZIP resource and uploads files into it. You don't need a separate DB record for every uploaded file, because all of them go into the resource, which is already stored in the DB.

But such use-cases are pretty specific, so prefer the API if you are not sure what you need. The main reason to use tracked files is their discoverability: you can use the files_file_search API action to list all tracked files and optionally filter them by storage, location, content_type, etc:

ckanapi action files_file_search\n\n... {\n...   \"count\": 123,\n...   \"results\": [\n...     {\n...       \"atime\": null,\n...       \"content_type\": \"text/plain\",\n...       \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n...       \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n...       \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n...       \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n...       \"mtime\": null,\n...       \"name\": \"hello.txt\",\n...       \"size\": 11,\n...       \"storage\": \"default\",\n...       \"storage_data\": {}\n...     },\n...     ...\n...   ]\n... }\n\nckanapi action files_file_search size:5 rows=1\n\n... {\n...   \"count\": 2,\n...   \"results\": [\n...     {\n...       \"atime\": null,\n...       \"content_type\": \"text/plain\",\n...       \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n...       \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n...       \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n...       \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n...       \"mtime\": null,\n...       \"name\": \"hello.txt\",\n...       \"size\": 5,\n...       \"storage\": \"default\",\n...       \"storage_data\": {}\n...     }\n...   ]\n... }\n\nckanapi action files_file_search content_type=application/pdf\n\n... {\n...   \"count\": 0,\n...   \"results\": []\n... }\n

As for untracked files, their discoverability depends on the storage adapter. Some of them, files:fs for example, can scan the storage and locate all uploaded files, both tracked and untracked. If you have a files:fs storage configured as default, use the following command to scan its content:

ckan files scan\n

If you want to scan a different storage, specify its name via the -s/--storage-name option. Remember that some storage adapters do not support scanning.

ckan files scan -s memory\n

If you want to see untracked files only, add the -u/--untracked-only flag.

ckan files scan -u\n

If you want to track untracked files by creating a DB record for every such file, add the -t/--track flag. After that you'll be able to discover previously untracked files via the files_file_search API action. This option is most useful during migration, when you are configuring a new storage that points to an existing location with files.

ckan files scan -t\n
"},{"location":"usage/transfer/","title":"Ownership transfer","text":"

File ownership can be transferred. As a file can have only one owner, you no longer own a file once you transfer its ownership.

To transfer ownership, use the files_transfer_ownership action and specify the id of the file, plus the owner_id and owner_type of the new owner.

You can't just transfer ownership to anyone. You must either pass the IFiles.files_owner_allows check for the file_transfer operation, or pass a cascade access check for the future owner of the file when cascade access and transfer-as-update are enabled.

For example, if you have the following options in config file:

ckanext.files.owner.cascade_access = organization\nckanext.files.owner.transfer_as_update = true\n
you must pass the organization_update auth function if you want to transfer file ownership to an organization.

In addition, a file can be pinned. This is how we mark important files. Imagine a resource and its uploaded file. The link to this file is used by the resource and we don't want the file to be accidentally transferred to someone else. We pin the file, and now nobody can transfer it without explicit confirmation of their intention.

There are two ways to move pinned file:

"},{"location":"usage/use-in-browser/","title":"Usage in browser","text":"

You can upload files using JavaScript CKAN modules. ckanext-files extends CKAN's Sandbox object (available as this.sandbox inside a JS CKAN module), so we can use a shortcut and upload a file directly from the DevTools. Open any CKAN page, switch to the JS console and create the sandbox instance. It contains a files object, which in turn contains an upload method. This method accepts a File object for upload (the same object you can get from an input[type=file]).

sandbox = ckan.sandbox()\nawait sandbox.files.upload(\nnew File([\"content\"], \"file.txt\")\n)\n\n... {\n...     \"id\": \"18cdaa65-5eed-4078-89a8-469b137627ce\",\n...     \"name\": \"file.txt\",\n...     \"location\": \"b53907c3-8434-4dee-9a9e-6c4d3055d200\",\n...     \"content_type\": \"text/plain\",\n...     \"size\": 7,\n...     \"hash\": \"9a0364b9e99bb480dd25e1f0284c8555\",\n...     \"storage\": \"default\",\n...     \"ctime\": \"2024-06-02T16:12:27.902055+00:00\",\n...     \"mtime\": null,\n...     \"atime\": null,\n...     \"storage_data\": {}\n... }\n

If you are still using the FS storage configured in the previous section, switch to the /tmp/example folder and check its content:

ls /tmp/example\n... b53907c3-8434-4dee-9a9e-6c4d3055d200\n\ncat b53907c3-8434-4dee-9a9e-6c4d3055d200\n... content\n

And, as usual, let's remove the file using the ID from the upload promise:

sandbox.client.call(\"POST\", \"files_file_delete\", {\nid: \"18cdaa65-5eed-4078-89a8-469b137627ce\"\n})\n
"},{"location":"usage/use-in-code/","title":"Usage in code","text":"

If you are writing code and want to interact with the storage directly, without the API layer, you can do it via a number of public functions of the extension available in ckanext.files.shared.

Let's configure the filesystem storage first. The filesystem adapter has a mandatory option path that controls the filesystem location where files are stored. If path does not exist, the storage raises an exception by default. But it can also create the missing path if you enable the create_path option. Here's our final version of the settings:

ckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n

Now we are going to open the CKAN shell via the ckan shell CLI command and create an instance of the storage:

from ckanext.files.shared import get_storage\nstorage = get_storage()\n

Because you have all the configuration in place, the rest is fairly straightforward. We will upload the file, read its content and remove it, all from the CKAN shell.

To create the file, the storage.upload method must be called with 2 parameters:

You can use any string as the first parameter. As for the \"special stream-like object\", ckanext-files has the ckanext.files.shared.make_upload function, which accepts a number of different types (bytes, werkzeug.datastructures.FileStorage, BytesIO, file descriptor) and converts them into the expected format.

from ckanext.files.shared import make_upload\n\nupload = make_upload(b\"hello world\")\nresult = storage.upload('file.txt', upload)\n\nprint(result)\n\n... FileData(\n...     location='60b385e7-8137-496c-bb1d-6ae4d7963ab3',\n...     size=11,\n...     content_type='text/plain',\n...     hash='5eb63bbbe01eeed093cb22bb8f5acdc3',\n...     storage_data={}\n... )\n

result is an instance of ckanext.files.shared.FileData dataclass. It contains all the information required by storage to manage the file.

The result object has a location attribute that contains the name of the file relative to the path option specified in the storage configuration. If you visit the /tmp/example directory, which was set as the path for the storage, you'll see a file with a name matching the location from result. And its content matches the content of our upload, which is quite an expected outcome.

cat /tmp/example/60b385e7-8137-496c-bb1d-6ae4d7963ab3\n\n... hello world\n

But let's go back to the shell and try reading the file from Python code. We'll pass result to the storage's stream method, which produces an iterable of bytes based on our result:

buffer = storage.stream(result)\ncontent = b\"\".join(buffer)\n\n... b'hello world'\n

In most cases, the storage only needs the location of the file object to read it. So, if you don't have the result generated during the upload, you can still read the file as long as you have its location. But remember that some storage adapters may require additional information, and the following example must be adapted depending on the adapter:

from ckanext.files.shared import FileData\n\nlocation = \"60b385e7-8137-496c-bb1d-6ae4d7963ab3\"\ndata = FileData(location)\n\nbuffer = storage.stream(data)\ncontent = b\"\".join(buffer)\nprint(content)\n\n... b'hello world'\n

And finally we can remove the file:

storage.remove(result)\n
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-\\.\\_]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#ckanext-files","title":"ckanext-files","text":"

Files as first-class citizens of CKAN. Upload, manage, remove files directly and attach them to datasets, resources, etc.

Read the documentation for a full user guide.

"},{"location":"#quickstart","title":"Quickstart","text":"
  1. Install the extension

    pip install ckanext-files\n

  2. Add files to the ckan.plugins setting in your CKAN config file.

  3. Run DB migrations

    ckan db upgrade -p files\n

  4. Configure storage

    ckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
  5. Upload your first file

    ckanapi action files_file_create upload@~/Downloads/file.txt\n
"},{"location":"#development","title":"Development","text":"

Install dev extras and Node.js dependencies:

pip install -e '.[dev]'\nnpm ci\n

Run unit tests:

pytest\n

Run frontend tests:

# start test server in separate terminal\nmake test-server\n\n# run tests\nnpx cypress run\n

Run typecheck:

npx pyright\n

"},{"location":"#license","title":"License","text":"

AGPL

"},{"location":"api/","title":"API","text":""},{"location":"api/#files_file_createcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_create(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Create a new file.

This action passes the uploaded file to the storage without strict validation. The file is converted into a standard upload object and everything else is controlled by the storage. The same file may be accepted by one storage and rejected by another, depending on configuration.

This action is too powerful to use directly. The recommended approach is to register a separate action for handling a specific type of upload and call the current action internally.

When uploading a real file (or using werkzeug.datastructures.FileStorage), the name parameter can be omitted. In this case, the name of the uploaded file is used.

ckanapi action files_file_create upload@path/to/file.txt\n

When uploading raw file content using a string or bytes object, name is mandatory.

ckanapi action files_file_create upload@<(echo -n \"hello world\") name=file.txt\n

Requires storage with CREATE capability.

Params:

Returns:

dictionary with file details.

"},{"location":"api/#files_file_deletecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_delete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Remove file from storage.

Unlike packages, a file has no state field. Removal usually means that the file details are removed from the DB and the file itself is removed from the storage.

Some storages can implement file revisions and keep archived versions or backups. Check the storage documentation if you need to know whether there is a chance that the file is not completely removed by this operation.

Requires storage with REMOVE capability.

ckanapi action files_file_delete id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n

Params:

Returns:

dictionary with details of the removed file.

"},{"location":"api/#files_file_pincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_pin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Pin file to the current owner.

A pinned file cannot be transferred to a different owner. Use it to guarantee that a file referred to by an entity is not accidentally transferred to a different owner.

Params:

Returns:

dictionary with details of updated file

"},{"location":"api/#files_file_renamecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_rename(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Rename the file.

This action changes the human-readable name of the file, which is stored in the DB. The real location of the file in the storage is not modified.

ckanapi action files_file_rename \\\n    id=226056e2-6f83-47c5-8bd2-102e2b82ab9a \\\n    name=new-name.txt\n

Params:

Returns:

dictionary with file details

"},{"location":"api/#files_file_scancontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_scan(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

List files of the owner

This action internally calls files_file_search, but with static values of the owner filters. If the owner is not specified, files are filtered by the current user. If the owner is specified, the user must pass an authorization check to see the files.

Params:

All other parameters are passed as-is to files_file_search.

Returns:

"},{"location":"api/#files_file_searchcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_search(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Search files.

This action is not stabilized yet and will change in the future.

Provides the ability to search files using exact filters by name, content_type, size, owner, etc. Results are paginated and returned in the package_search manner, as a dict with count and results items.

All columns of the File model can be used as filters. Before the search, the type of the column and the type of the filter value are compared. If they are the same, the original values are used in the search. If the types differ, the column value and the filter value are cast to strings.

This request produces size = 10 SQL expression:

ckanapi action files_file_search size:10\n

This request produces size::text = '10' SQL expression:

ckanapi action files_file_search size=10\n

Even though the results are usually the same, using correct types leads to a more efficient search.
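The casting rule described above can be illustrated with a small sketch. build_filter is a hypothetical helper written for this explanation, not the extension's actual code:

```python
def build_filter(column_name: str, column_type: type, value) -> str:
    """Mimic the search rule: matching types compare natively,
    mismatched types fall back to a text comparison."""
    if isinstance(value, column_type):
        return f"{column_name} = {value!r}"
    return f"{column_name}::text = '{value}'"

# size is an integer column of the File model
print(build_filter("size", int, 10))    # size = 10
print(build_filter("size", int, "10"))  # size::text = '10'
```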

Apart from File columns, the following Owner properties can be used for searching: owner_id, owner_type, pinned.

storage_data and plugin_data are dictionaries. The filter's value for these fields is used as a mask. For example, storage_data={\"a\": {\"b\": 1}} matches any File whose storage_data contains item a with a value that contains b=1. This works only with data represented by nested dictionaries, without other structures like lists or sets.
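The mask semantics can be sketched as a recursive check. This is a simplified stand-in for the actual SQL-level implementation; matches_mask is a hypothetical name used only here:

```python
def matches_mask(data: dict, mask: dict) -> bool:
    """True if every key of the mask exists in data with a matching value.
    Nested dictionaries are compared recursively; extra keys in data are ignored."""
    for key, expected in mask.items():
        if key not in data:
            return False
        actual = data[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches_mask(actual, expected):
                return False
        elif actual != expected:
            return False
    return True

# storage_data={"a": {"b": 1}} matches a file whose storage_data
# contains item "a" with a value that contains b=1
print(matches_mask({"a": {"b": 1, "c": 2}}, {"a": {"b": 1}}))  # True
print(matches_mask({"a": {"c": 2}}, {"a": {"b": 1}}))          # False
```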

Experimental feature: File columns can be passed as a pair of operator and value. This feature will be replaced by a strictly defined query language at some point:

ckanapi action files_file_search size:'[\"<\", 100]' content_type:'[\"like\", \"text/%\"]'\n
The following operators are accepted: =, <, >, !=, like

Params:

Returns:

"},{"location":"api/#files_file_search_by_usercontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_search_by_user(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Internal action. Do not use it.

"},{"location":"api/#files_file_showcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_show(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Show file details.

This action only displays information from the DB record. There is no way to get the content of the file using this action (or any other API action).

ckanapi action files_file_show id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n

Params:

Returns:

dictionary with file details

"},{"location":"api/#files_file_unpincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_unpin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Unpin file from the current owner.

A pinned file cannot be transferred to a different owner; unpinning removes this protection, so the file can be transferred again.

Params:

Returns:

dictionary with details of updated file

"},{"location":"api/#files_multipart_completecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_complete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Finalize multipart upload and transform it into completed file.

Depending on the storage, this action may require additional parameters. But usually it just takes the ID and verifies that the content type, size and hash provided when the upload was initialized match the actual values.

If the data is valid and the file is completed inside the storage, a new File entry with the file details is created in the DB, and the file can be used just like any normal file.

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of the created file

"},{"location":"api/#files_multipart_refreshcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_refresh(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Refresh details of incomplete upload.

Can be used if the upload process was interrupted and the client does not know how many bytes were already uploaded.

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of the updated upload

"},{"location":"api/#files_multipart_startcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_start(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Initialize a multipart (resumable, continuous, signed, etc.) upload.

Apart from the standard parameters, different storages can require additional data, so always check the documentation of the storage before initiating a multipart upload.

When the upload is initialized, the storage usually returns the details required for further upload. It may be a presigned URL for direct upload, or just an ID of the upload which must be used with files_multipart_update.
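The start/update/complete flow can be simulated with an in-memory storage. This is a toy model of the protocol, not the extension's code; real storages return adapter-specific details and may hand back presigned URLs instead of an upload ID:

```python
import hashlib
import uuid

class ToyMultipartStorage:
    """In-memory model of the multipart protocol: start registers the
    expected size/hash, update appends fragments, complete verifies that
    the declared values match the actual content."""

    def __init__(self):
        self.uploads = {}
        self.files = {}

    def start(self, name: str, size: int, md5: str) -> str:
        upload_id = str(uuid.uuid4())
        self.uploads[upload_id] = {"name": name, "size": size, "md5": md5, "parts": []}
        return upload_id

    def update(self, upload_id: str, fragment: bytes) -> int:
        upload = self.uploads[upload_id]
        upload["parts"].append(fragment)
        # bytes uploaded so far, similar to what refresh would report
        return sum(len(p) for p in upload["parts"])

    def complete(self, upload_id: str) -> str:
        upload = self.uploads.pop(upload_id)
        content = b"".join(upload["parts"])
        if len(content) != upload["size"]:
            raise ValueError("size mismatch")
        if hashlib.md5(content).hexdigest() != upload["md5"]:
            raise ValueError("hash mismatch")
        self.files[upload["name"]] = content
        return upload["name"]

storage = ToyMultipartStorage()
content = b"hello world"
upload_id = storage.start("hello.txt", len(content), hashlib.md5(content).hexdigest())
storage.update(upload_id, content[:5])   # first fragment
storage.update(upload_id, content[5:])   # second fragment
print(storage.complete(upload_id))       # hello.txt
```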

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of initiated upload. Depends on used storage

"},{"location":"api/#files_multipart_updatecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_update(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Update incomplete upload.

Depending on the storage, this action may require additional parameters. Most likely, an upload with a fragment of the uploaded file.

Requires storage with MULTIPART capability.

Params:

Returns:

dictionary with details of the updated upload

"},{"location":"api/#files_resource_uploadcontext-context-data_dict-dictstr-any","title":"files_resource_upload(context: 'Context', data_dict: 'dict[str, Any]')","text":"

Create a new file inside resource storage.

This action internally calls files_file_create with ignore_auth=True and always uses the resources storage.

The new file is not attached to the resource. You need to call files_transfer_ownership manually when the resource is created.

Params:

Returns:

dictionary with file details.

"},{"location":"api/#files_transfer_ownershipcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_transfer_ownership(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'","text":"

Transfer file ownership.

To transfer ownership, you must either pass the IFiles.files_owner_allows check for the file_transfer operation, or pass a cascade access check for the future owner; see the ownership transfer section for details.

Params:

Returns:

dictionary with details of updated file

"},{"location":"changelog/","title":"Changelog","text":"

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

"},{"location":"changelog/#unreleased","title":"Unreleased","text":"

Compare with latest

"},{"location":"changelog/#features","title":"Features","text":""},{"location":"changelog/#code-refactoring","title":"Code Refactoring","text":""},{"location":"changelog/#v031-2024-05-22","title":"v0.3.1 - 2024-05-22","text":"

Compare with v0.3.0

"},{"location":"changelog/#features_1","title":"Features","text":""},{"location":"changelog/#v030-2024-05-16","title":"v0.3.0 - 2024-05-16","text":"

Compare with v0.2.6

"},{"location":"changelog/#features_2","title":"Features","text":""},{"location":"changelog/#bug-fixes","title":"Bug Fixes","text":""},{"location":"changelog/#code-refactoring_1","title":"Code Refactoring","text":""},{"location":"changelog/#v006-2024-04-24","title":"v0.0.6 - 2024-04-24","text":"

Compare with v0.0.5

"},{"location":"changelog/#bug-fixes_1","title":"Bug Fixes","text":""},{"location":"changelog/#v026-2024-04-24","title":"v0.2.6 - 2024-04-24","text":"

Compare with v0.2.4

"},{"location":"changelog/#bug-fixes_2","title":"Bug Fixes","text":""},{"location":"changelog/#v024-2024-04-15","title":"v0.2.4 - 2024-04-15","text":"

Compare with v0.2.3

"},{"location":"changelog/#features_3","title":"Features","text":""},{"location":"changelog/#v023-2024-04-07","title":"v0.2.3 - 2024-04-07","text":"

Compare with v0.2.2

"},{"location":"changelog/#features_4","title":"Features","text":""},{"location":"changelog/#bug-fixes_3","title":"Bug Fixes","text":""},{"location":"changelog/#v022-2024-03-18","title":"v0.2.2 - 2024-03-18","text":"

Compare with v0.2.1

"},{"location":"changelog/#v021-2024-03-18","title":"v0.2.1 - 2024-03-18","text":"

Compare with v0.2.0

"},{"location":"changelog/#features_5","title":"Features","text":""},{"location":"changelog/#v020-2024-03-12","title":"v0.2.0 - 2024-03-12","text":"

Compare with v0.0.5

"},{"location":"changelog/#features_6","title":"Features","text":""},{"location":"changelog/#code-refactoring_2","title":"Code Refactoring","text":""},{"location":"changelog/#v005-2024-02-26","title":"v0.0.5 - 2024-02-26","text":"

Compare with v0.0.4

"},{"location":"changelog/#bug-fixes_4","title":"Bug Fixes","text":""},{"location":"changelog/#v004-2023-10-25","title":"v0.0.4 - 2023-10-25","text":"

Compare with v0.0.2

"},{"location":"changelog/#v002-2022-02-09","title":"v0.0.2 - 2022-02-09","text":"

Compare with v0.0.1

"},{"location":"changelog/#v001-2021-09-21","title":"v0.0.1 - 2021-09-21","text":"

Compare with first commit

"},{"location":"cli/","title":"CLI","text":"

ckanext-files registers a files entrypoint under the ckan command. The commands below must be executed as ckan -c $CKAN_INI files <COMMAND>.

adapters [-v]

List all available storage adapters. With the -v/--verbose flag, docstrings from the adapter classes are printed as well.

storages [-v]

List all configured storages. With the -v/--verbose flag, all supported capabilities are shown.

stream FILE_ID [-o OUTPUT] [--start START] [--end END]

Stream the content of the file to STDOUT. For non-textual files use output redirection: stream ID > file.ext. Alternatively, the output destination can be specified via the -o/--output option. If it contains a path to a directory, a file with the same name as the streamed item is created inside this directory. Otherwise, OUTPUT is used as the filename.

--start and --end can be used to receive a fragment of the file. Only positive values are guaranteed to work with any storage that supports STREAM. Some storages support negative values for these options and count them from the end of the file. E.g., --start -10 reads the last 10 bytes of the file. --end -1 reads up to the last byte, but the last byte itself is not included in the output.
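The offset handling can be sketched as follows. resolve_range is a hypothetical helper mirroring the CLI semantics described above, not the actual implementation:

```python
from typing import Optional

def resolve_range(start: int, end: Optional[int], size: int) -> tuple[int, int]:
    """Convert --start/--end values into absolute byte offsets.
    Negative values count from the end of the file; end is exclusive."""
    if start < 0:
        start += size
    if end is None:
        end = size
    elif end < 0:
        end += size
    return start, end

data = b"0123456789"
# --start -10 reads the last 10 bytes
s, e = resolve_range(-10, None, len(data))
print(data[s:e])  # b'0123456789'
# --end -1 stops before the last byte
s, e = resolve_range(0, -1, len(data))
print(data[s:e])  # b'012345678'
```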

scan [-s default] [-u] [-t [-a OWNER_ID]]

List all files that exist in the storage. Works only if the storage supports SCAN. By default, shows the content of the default storage. The -s/--storage-name option changes the target storage.

The -u/--untracked-only flag shows only untracked files that have no corresponding record in the DB. It can be used to identify leftovers after removing data from the portal.

The -t/--track flag registers every untracked file by creating a DB record for it. It can be used only when ANALYZE is supported. Files are created without an owner. Use the -a/--adopt-by option with a user ID to give ownership over the new files to the specified user. This is useful when configuring a new storage connected to an existing location with files.

"},{"location":"implementation-example/","title":"Example implementation of custom storage adapter","text":"

A storage consists of the storage object, which dispatches operation requests, and 3 services that do the actual job: Reader, Uploader and Manager. To define a custom storage, you need to extend the main storage class, describe the storage logic and register the storage via IFiles.files_get_storage_adapters.

Let's implement a DB storage. It will store files in a SQL table using SQLAlchemy. There is just one requirement for the table: it must have a column for storing the unique identifier of the file and another column for storing the content of the file as bytes.

For the sake of simplicity, our storage will work only with existing tables. Create the table manually before we begin.

First of all, we create an adapter that does nothing and register it in our plugin.

from __future__ import annotations\n\nfrom typing import Any\nimport sqlalchemy as sa\n\nimport ckan.plugins as p\nfrom ckan.model.types import make_uuid\nfrom ckanext.files import shared\n\n\nclass ExamplePlugin(p.SingletonPlugin):\n    p.implements(shared.IFiles)\n    def files_get_storage_adapters(self) -> dict[str, Any]:\n        return {\"example:db\": DbStorage}\n\n\nclass DbStorage(shared.Storage):\n    ...\n

After installing and enabling your custom plugin, you can configure a storage with this adapter by adding a single new line to the config file:

ckanext.files.storage.db.type = example:db\n

But if you check the storage via ckan files storages -v, you'll see that it can't do anything.

ckan files storages -v\n\n... db: example:db\n...        Supports: Capability.NONE\n...        Does not support: Capability.REMOVE|STREAM|CREATE|...\n

Before we start uploading files, let's make sure that the storage has a proper configuration. As files will be stored in a DB table, we need the name of the table and a DB connection string. Let's assume that the table already exists, but we don't know which columns to use for files. So we need the name of the column for the content and the name of the column for the file's unique identifier. ckanext-files uses the term location instead of identifier, so we'll do the same in our implementation.

There are 4 required options in total: * db_url: DB connection string * table: name of the table * location_column: name of column for file's unique identifier * content_column: name of column for file's content

It's not mandatory, but it is highly recommended that you declare config options for the adapter. It can be done via the Storage.declare_config_options class method, which accepts a declaration object and the key namespace for storage options.

class DbStorage(shared.Storage):\n\n    @classmethod\n    def declare_config_options(cls, declaration, key) -> None:\n        declaration.declare(key.db_url).required()\n        declaration.declare(key.table).required()\n        declaration.declare(key.location_column).required()\n        declaration.declare(key.content_column).required()\n

And we probably want to initialize the DB connection when the storage is initialized. For this we'll extend the constructor, which must be defined as a method accepting keyword-only arguments:

class DbStorage(shared.Storage):\n    ...\n\n    def __init__(self, **settings: Any) -> None:\n        db_url = self.ensure_option(settings, \"db_url\")\n\n        self.engine = sa.create_engine(db_url)\n        self.location_column = sa.column(\n            self.ensure_option(settings, \"location_column\")\n        )\n        self.content_column = sa.column(self.ensure_option(settings, \"content_column\"))\n        self.table = sa.table(\n            self.ensure_option(settings, \"table\"),\n            self.location_column,\n            self.content_column,\n        )\n        super().__init__(**settings)\n

You may notice that we use Storage.ensure_option quite often. This method returns the value of the specified option from the settings or raises an exception.

The table definition and columns are saved as storage attributes to simplify building SQL queries later.

Now we are going to define classes for all 3 storage services and tell the storage how to initialize these services.

There are 3 services: Reader, Uploader and Manager. Each of them is initialized via the corresponding storage method: make_reader, make_uploader and make_manager. And each of them accepts a single argument during creation: the storage itself.

class DbStorage(shared.Storage):\n    def make_reader(self):\n        return DbReader(self)\n\n    def make_uploader(self):\n        return DbUploader(self)\n\n    def make_manager(self):\n        return DbManager(self)\n\n\nclass DbReader(shared.Reader):\n    ...\n\n\nclass DbUploader(shared.Uploader):\n    ...\n\n\nclass DbManager(shared.Manager):\n    ...\n

Our first target is the Uploader service. It's responsible for file creation. For the minimal implementation it needs an upload method and a capabilities attribute, which tells the storage what exactly the Uploader can do.

class DbUploader(shared.Uploader):\n    capabilities = shared.Capability.CREATE\n\n    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n        ...\n

upload receives the location (name) of the uploaded file; the upload object with the file's content; and an extras dictionary that contains any additional arguments that can be passed to the uploader. We are going to ignore location and generate a unique UUID for every uploaded file instead of using the user-defined filename.

The goal is to write the file into the DB and return a shared.FileData that contains the location of the file in the DB (the value of location_column), the size of the file in bytes, the MIMEtype of the file and the hash of the file content.

For the location we'll just use the ckan.model.types.make_uuid function. Size and MIMEtype are already available as upload.size and upload.content_type.

The only problem is the hash of the content. You can compute it in any way you like, but there is a simple option if you have no preferences. upload has a hashing_reader method, which returns an iterable over the file content. When you read the file through it, the content hash is computed automatically and you can get it using the get_hash method of the reader.

Just make sure to read the whole file before checking the hash, because the hash is computed from the consumed content. I.e., if you just create the hashing reader but do not read a single byte from it, you'll receive the hash of an empty string. If you read just 1 byte, you'll receive the hash of that single byte, etc.

The easiest option is to call the reader.read() method to consume the whole file and then call reader.get_hash() to receive the hash.
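This behaviour can be modelled with stdlib hashlib. The class below is a simplified stand-in for upload.hashing_reader, assuming an md5 digest, and is not the extension's actual class:

```python
import hashlib
from io import BytesIO

class HashingReader:
    """Wrap a file-like object and update an md5 digest as content is consumed."""

    def __init__(self, stream):
        self.stream = stream
        self.digest = hashlib.md5()

    def read(self, size: int = -1) -> bytes:
        chunk = self.stream.read(size)
        self.digest.update(chunk)
        return chunk

    def get_hash(self) -> str:
        return self.digest.hexdigest()

reader = HashingReader(BytesIO(b"hello world"))
# before reading anything, you get the hash of an empty string
print(reader.get_hash())  # d41d8cd98f00b204e9800998ecf8427e
reader.read()             # consume the whole file
print(reader.get_hash())  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```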

Here's the final implementation of DbUploader:

class DbUploader(shared.Uploader):\n    capabilities = shared.Capability.CREATE\n\n    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n        uuid = make_uuid()\n        reader = upload.hashing_reader()\n\n        values = {\n            self.storage.location_column: uuid,\n            self.storage.content_column: reader.read(),\n        }\n        stmt = sa.insert(self.storage.table, values)\n\n        result = self.storage.engine.execute(stmt)\n\n        return shared.FileData(\n            uuid,\n            upload.size,\n            upload.content_type,\n            reader.get_hash()\n        )\n

Now you can upload a file into your new db storage:

ckanapi action files_file_create storage=db name=hello.txt upload@<(echo -n 'hello world')\n\n...{\n...  \"atime\": null,\n...  \"content_type\": \"text/plain\",\n...  \"ctime\": \"2024-06-17T13:48:52.121755+00:00\",\n...  \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n...  \"id\": \"bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\",\n...  \"location\": \"5a4472b3-cf38-4c58-81a6-4d4acb7b170e\",\n...  \"mtime\": null,\n...  \"name\": \"hello.txt\",\n...  \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n...  \"owner_type\": \"user\",\n...  \"pinned\": false,\n...  \"size\": 11,\n...  \"storage\": \"db\",\n...  \"storage_data\": {}\n...}\n

The file is created, but you cannot read it just yet. Try running the ckan files stream CLI command with the file ID:

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... Operation stream is not supported by db storage\n... Aborted!\n

As expected, you have to write extra code.

Streaming, reading and generating links is the responsibility of the Reader service. We only need the stream method for the minimal implementation. This method receives a shared.FileData object (the same object as the one returned from Uploader.upload) and extras containing all additional arguments passed by the caller. The result is any iterable producing bytes.

We'll use the location property of shared.FileData as the value for location_column inside the table.

And don't forget to add STREAM capability to Reader.capabilities.

class DbReader(shared.Reader):\n    capabilities = shared.Capability.STREAM\n\n    def stream(self, data: shared.FileData, extras: dict[str, Any]) -> Iterable[bytes]:\n        stmt = (\n            sa.select(self.storage.content_column)\n            .select_from(self.storage.table)\n            .where(self.storage.location_column == data.location)\n        )\n        row = self.storage.engine.execute(stmt).fetchone()\n\n        return row\n

The result may be confusing: we return a Row object from the stream method. But our goal is to return any iterable that produces bytes. Row is iterable (tuple-like), and it contains only one item: the value of the column with the file content, i.e., bytes. So it satisfies the requirements.
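To see why this works, note that a single-column row behaves like a one-element tuple of bytes (a plain tuple is used below as a stand-in for SQLAlchemy's Row):

```python
# fetchone() on a one-column select returns a Row that iterates like this tuple:
row = (b"hello world",)

# Any Iterable[bytes] consumer, e.g. one that writes chunks to a response,
# receives the whole file as a single chunk:
chunks = list(row)
content = b"".join(chunks)
```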

Now you can check content via CLI once again.

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... hello world\n

Finally, we need to add file removal for the minimal implementation. It's also nice to have the SCAN capability, as it shows all files currently available in the storage, so we add it as a bonus. These operations are handled by the Manager. We need the remove and scan methods. The arguments are already familiar to you; as for the results, scan yields file locations and remove reports whether the file was deleted:

class DbManager(shared.Manager):\n    storage: DbStorage\n    capabilities = shared.Capability.SCAN | shared.Capability.REMOVE\n\n    def scan(self, extras: dict[str, Any]) -> Iterable[str]:\n        stmt = sa.select(self.storage.location_column).select_from(self.storage.table)\n        for row in self.storage.engine.execute(stmt):\n            yield row[0]\n\n    def remove(\n        self,\n        data: shared.FileData | shared.MultipartData,\n        extras: dict[str, Any],\n    ) -> bool:\n        stmt = sa.delete(self.storage.table).where(\n            self.storage.location_column == data.location,\n        )\n        self.storage.engine.execute(stmt)\n        return True\n

Now you can list all the files in the storage:

ckan files scan -s db\n

And remove a file using ckanapi and the file ID

ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n

That's all you need for the basic storage. But check the definition of the base storage and services to find details about other methods. And also check the implementation of other storages for additional ideas.

"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#requirements","title":"Requirements","text":"

Compatibility with core CKAN versions:

CKAN version Compatible? 2.9 no 2.10 yes 2.11 yes master yes

Note

It's recommended to install the extension via pip. If you are using the GitHub version of the extension, stick to the vX.Y.Z tags to avoid breaking changes. Check the changelog before upgrading the extension.

"},{"location":"installation/#installation_1","title":"Installation","text":"

Install the extension

pip install ckanext-files # (1)!\n
  1. If you want to use additional adapters, like Apache-libcloud or OpenDAL, specify the corresponding package extras
    pip install ckanext-files[opendal,libcloud]\n

Add files to the ckan.plugins setting in your CKAN config file.

Run DB migrations

ckan db upgrade -p files\n
"},{"location":"interfaces/","title":"Interfaces","text":""},{"location":"interfaces/#interfaces","title":"Interfaces","text":"

ckanext-files registers the ckanext.files.shared.IFiles interface. As the extension is actively developed, this interface may change in the future. Always use inherit=True when implementing IFiles.

class IFiles(Interface):\n    \"\"\"Extension point for ckanext-files.\"\"\"\n\n    def files_get_storage_adapters(self) -> dict[str, Any]:\n        \"\"\"Return mapping of storage type to adapter class.\n\n        Example:\n        >>> def files_get_storage_adapters(self):\n        >>>     return {\n        >>>         \"my_ext:dropbox\": DropboxStorage,\n        >>>     }\n\n        \"\"\"\n\n        return {}\n\n    def files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n        \"\"\"Return mapping with lookup functions for owner types.\n\n        Name of the getter is the name used as `Owner.owner_type`. The getter\n        itself is a function that accepts owner ID and returns optional owner\n        entity.\n\n        Example:\n        >>> def files_register_owner_getters(self):\n        >>>     return {\"resource\": model.Resource.get}\n        \"\"\"\n        return {}\n\n    def files_file_allows(\n        self,\n        context: types.Context,\n        file: File | Multipart,\n        operation: types.FileOperation,\n    ) -> bool | None:\n        \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n        Return True/False if user allowed/not allowed. Return `None` to rely on\n        other plugins.\n\n        Default implementation relies on cascade_access config option. 
If owner\n        of file is included into cascade access, user can perform operation on\n        file if he can perform the same operation with file's owner.\n\n        If current owner is not affected by cascade access, user can perform\n        operation on file only if user owns the file.\n\n        Example:\n        >>> def files_file_allows(\n        >>>         self, context,\n        >>>         file: shared.File | shared.Multipart,\n        >>>         operation: shared.types.FileOperation\n        >>> ) -> bool | None:\n        >>>     if file.owner_info and file.owner_info.owner_type == \"resource\":\n        >>>         return is_authorized_boolean(\n        >>>             f\"resource_{operation}\",\n        >>>             context,\n        >>>             {\"id\": file.owner_info.id}\n        >>>         )\n        >>>\n        >>>     return None\n\n        \"\"\"\n        return None\n\n    def files_owner_allows(\n        self,\n        context: types.Context,\n        owner_type: str,\n        owner_id: str,\n        operation: types.OwnerOperation,\n    ) -> bool | None:\n        \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n        Return True/False if user allowed/not allowed. Return `None` to rely on\n        other plugins.\n\n        Example:\n        >>> def files_owner_allows(\n        >>>         self, context,\n        >>>         owner_type: str, owner_id: str,\n        >>>         operation: shared.types.OwnerOperation\n        >>> ) -> bool | None:\n        >>>     if owner_type == \"resource\" and operation == \"file_transfer\":\n        >>>         return is_authorized_boolean(\n        >>>             f\"resource_update\",\n        >>>             context,\n        >>>             {\"id\": owner_id}\n        >>>         )\n        >>>\n        >>>     return None\n\n        \"\"\"\n        return None\n
"},{"location":"shared/","title":"Shared","text":"

All public utilities are collected inside the ckanext.files.shared module. Avoid using anything that is not listed there. Do not import anything from modules other than shared.

"},{"location":"shared/#get_storagename-str-none-none-storage","title":"get_storage(name: 'str | None' = None) -> 'Storage'","text":"

Return existing storage instance.

Storages are initialized when the plugin is loaded. As a result, this function always returns the same storage object for the given name.

If no name is specified, the default storage is returned.

Example:

default_storage = get_storage()\nstorage = get_storage(\"storage name\")\n

"},{"location":"shared/#make_storagename-str-settings-dictstr-any-storage","title":"make_storage(name: 'str', settings: 'dict[str, Any]') -> 'Storage'","text":"

Initialize storage instance with specified settings.

The storage adapter is defined by the type key of the settings. All other settings depend on the specific adapter.

Example:

storage = make_storage(\"memo\", {\"type\": \"files:redis\"})\n

"},{"location":"shared/#make_uploadvalue-typesuploadable-upload-upload","title":"make_upload(value: 'types.Uploadable | Upload') -> 'Upload'","text":"

Convert a value into an Upload object

Use this function for simple and reliable initialization of an Upload object. Avoid creating Upload manually, unless you are 100% sure you can provide the correct MIMEtype, size and stream.

Example:

storage.upload(\"file.txt\", make_upload(b\"hello world\"))\n

"},{"location":"shared/#with_task_queuefunc-any-name-str-none-none","title":"with_task_queue(func: 'Any', name: 'str | None' = None)","text":"

Decorator for functions that schedule tasks.

The decorated function automatically initializes a separate task queue that is processed when the function finishes. All tasks receive the function's result as execution data (first argument to Task.run).

Without this decorator, you have to manually create a task queue context before queuing tasks.

Example:

@with_task_queue\ndef my_action(context, data_dict):\n    ...\n

"},{"location":"shared/#add_tasktask-task","title":"add_task(task: 'Task')","text":"

Add task to the current task queue.

This function can be called only inside a task queue context. Such a context is initialized automatically inside functions decorated with with_task_queue:

@with_task_queue\ndef task_producer():\n    add_task(...)\n\ntask_producer()\n

The task queue context can also be initialized manually using TaskQueue and the with statement:

queue = TaskQueue()\nwith queue:\n    add_task(...)\n\nqueue.process(execution_data)\n
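The pattern behind TaskQueue and add_task can be sketched with contextvars. This is a simplified model of the behaviour described above, not the actual ckanext-files implementation:

```python
import contextvars

# holds the queue that is "current" inside a `with queue:` block
_current_queue: contextvars.ContextVar = contextvars.ContextVar(
    "task_queue", default=None
)

class TaskQueue:
    def __init__(self):
        self.tasks = []

    def __enter__(self):
        self._token = _current_queue.set(self)
        return self

    def __exit__(self, *exc_info):
        _current_queue.reset(self._token)

    def process(self, execution_data):
        # every queued task receives the shared execution data
        return [task(execution_data) for task in self.tasks]

def add_task(task):
    queue = _current_queue.get()
    if queue is None:
        raise RuntimeError("add_task() called outside of a task queue context")
    queue.tasks.append(task)

queue = TaskQueue()
with queue:
    add_task(lambda data: data * 2)

results = queue.process(21)  # -> [42]
```

The context variable is what makes add_task fail loudly when called outside of a queue context, which mirrors the restriction described above.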

"},{"location":"upload-strategies/","title":"File upload strategies","text":"

There is no \"right\" way to add a file to an entity via ckanext-files. Everything depends on your use-case, and here you can find a few different ways to combine a file and an arbitrary entity.

"},{"location":"upload-strategies/#attach-existing-file-and-then-transfer-ownership-via-api","title":"Attach existing file and then transfer ownership via API","text":"

The simplest option is just saving the file ID inside a field of the entity. It's recommended to transfer file ownership to the entity and pin the file.

ckanapi action package_patch id=PACKAGE_ID attachment_id=FILE_ID\n\nckanapi action files_transfer_ownership id=FILE_ID \\\n    owner_type=package owner_id=PACKAGE_ID pin=true\n

Pros: * simple and transparent

Cons: * it's easy to forget about the ownership transfer and leave the entity with an inaccessible file * after the entity gets a reference to the file and before ownership is transferred, the data may be considered invalid.

"},{"location":"upload-strategies/#automatically-transfer-ownership-using-validator","title":"Automatically transfer ownership using validator","text":"

Add files_transfer_ownership(owner_type) to the validation schema of the entity. When the entity is validated, an ownership transfer task is queued and the file is automatically transferred to the entity after the update.

Pros: * minimal amount of changes if the metadata schema is already modified * relationships between owner and file are up-to-date after any modification

Cons: * works only with files uploaded in advance and cannot handle the native implementation of the resource form

"},{"location":"upload-strategies/#upload-file-and-assign-owner-via-queued-task","title":"Upload file and assign owner via queued task","text":"

Add a field that accepts the uploaded file. The action itself does not process the upload. Instead, create a validator for the upload field that will schedule a task for file upload and ownership transfer.

In this way, if the action fails, no upload happens and you don't need to do anything with the file, as it never left the server's temporary directory. If the action finishes without an error, the task is executed and the file is uploaded and attached to the action result.

Pros: * can be used together with the native group/user/resource form after a small modification of CKAN core * handles upload inside another action as an atomic operation

Cons: * you have to validate the file before the upload happens, to prevent the situation when the action finishes successfully but the upload then fails because of the file's content type or size * tasks themselves are experimental and it's not recommended to put a lot of logic into them * there are just too many things that can go wrong

"},{"location":"upload-strategies/#add-a-new-action-that-combines-uploads-modifications-and-ownership-transfer","title":"Add a new action that combines uploads, modifications and ownership transfer","text":"

If you want to add an attachment to a dataset, create a separate action that accepts the dataset ID and the uploaded file. Internally it will upload the file by calling files_file_create, then update the dataset via package_patch and finally transfer ownership via files_transfer_ownership.

Pros: * no magic: everything is described in the new action * can be extracted into a shared extension and used across multiple portals

Cons: * if you need to upload multiple files and update multiple fields, the action quickly becomes too complicated * integration with existing workflows, like dataset/resource creation, is hard: you have to override existing views or create brand new ones.

"},{"location":"validators/","title":"Validators","text":"Validator Effect files_into_upload Transform the value of a field (usually a file uploaded via <input type=\"file\">) into an upload object using ckanext.files.shared.make_upload files_parse_filesize Convert a human-readable filesize (1B, 10MiB, 20GB) into an integer files_ensure_name(name_field) If name_field is empty, copy the filename from the current field into it. The current field must be processed with files_into_upload first files_file_id_exists Verify that the file ID exists files_accept_file_with_type(*type) Verify that the file ID refers to a file with one of the specified types. The type can be a full MIMEtype (image/png), or just its main (image) or secondary (png) part files_accept_file_with_storage(*storage_name) Verify that the file ID refers to a file stored inside one of the specified storages files_transfer_ownership(owner_type, name_of_owner_id_field) Transfer ownership of the file ID to the specified entity when the current API action finishes successfully"},{"location":"configuration/","title":"Configuration","text":"
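As a rough illustration of what a files_parse_filesize-style validator does, here is a hypothetical sketch; the real implementation in ckanext-files may support a different set of suffixes:

```python
import re

# decimal and binary size suffixes (assumed set, for illustration only)
UNITS = {
    "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9,
    "kib": 2**10, "mib": 2**20, "gib": 2**30,
}

def parse_filesize(value: str) -> int:
    """Convert strings like '1B', '10MiB', '20GB' into a number of bytes."""
    match = re.match(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]*)\s*$", value)
    if not match:
        raise ValueError(f"Wrong filesize string: {value}")
    size, unit = match.groups()
    multiplier = UNITS.get(unit.lower() or "b")
    if multiplier is None:
        raise ValueError(f"Wrong filesize unit: {unit}")
    return int(float(size) * multiplier)

parse_filesize("1B")     # -> 1
parse_filesize("10MiB")  # -> 10485760
parse_filesize("20GB")   # -> 20000000000
```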

There are two types of config options for ckanext-files: global options and storage configuration.

Depending on the type of the storage, the available options differ quite a lot. For example, the files:fs storage type requires the path option that controls the filesystem path where uploads are stored, while the files:redis storage type accepts a prefix option that defines the Redis key prefix of files stored in Redis. All storage-specific options always have the form ckanext.files.storage.<STORAGE>.<OPTION>:

ckanext.files.storage.memory.prefix = xxx:\n# or\nckanext.files.storage.my_drive.path = /tmp/hello\n
"},{"location":"configuration/fs/","title":"Filesystem storage configuration","text":"

Private filesystem storage

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n

Public filesystem storage

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:public_fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n## URL of the storage folder. `public_root + location` must produce a public URL\nckanext.files.storage.NAME.public_root =\n
"},{"location":"configuration/global/","title":"Global configuration","text":"
# Default storage used for upload when no explicit storage specified\n# (optional, default: default)\nckanext.files.default_storage = default\n\n# MIMEtypes that can be served without content-disposition:attachment header.\n# (optional, default: application/pdf image video)\nckanext.files.inline_content_types = application/pdf image video\n\n# Storage used for user image uploads. When empty, user image uploads are not\n# allowed.\n# (optional, default: user_images)\nckanext.files.user_images_storage = user_images\n\n# Storage used for group image uploads. When empty, group image uploads are\n# not allowed.\n# (optional, default: group_images)\nckanext.files.group_images_storage = group_images\n\n# Storage used for resource uploads. When empty, resource uploads are not\n# allowed.\n# (optional, default: resources)\nckanext.files.resources_storage = resources\n\n# Enable HTML templates and JS modules required for unsafe default\n# implementation of resource uploads via files. IMPORTANT: this option exists\n# to simplify migration and experiments with the extension. These templates\n# may change a lot or even get removed in the public release of the\n# extension.\n# (optional, default: false)\nckanext.files.enable_resource_migration_template_patch = false\n\n# Any authenticated user can upload files.\n# (optional, default: false)\nckanext.files.authenticated_uploads.allow = false\n\n# Names of storages that can by used by non-sysadmin users when authenticated\n# uploads enabled\n# (optional, default: default)\nckanext.files.authenticated_uploads.storages = default\n\n# List of owner types that grant access on owned file to anyone who has\n# access to the owner of file. 
For example, if this option has value\n# `resource package`, anyone who passes `resource_show` auth, can see all\n# files owned by resource; anyone who passes `package_show`, can see all\n# files owned by package; anyone who passes\n# `package_update`/`resource_update` can modify files owned by\n# package/resource; anyone who passes `package_delete`/`resource_delete` can\n# delete files owned by package/resoure. IMPORTANT: Do not add `user` to this\n# list. Files may be temporarily owned by user during resource creation.\n# Using cascade access rules with `user` exposes such temporal files to\n# anyone who can read user's profile.\n# (optional, default: package resource group organization)\nckanext.files.owner.cascade_access = package resource group organization\n\n# Use `<OWNER_TYPE>_update` auth function to check access for ownership\n# transfer. When this flag is disabled `<OWNER_TYPE>_file_transfer` auth\n# function is used.\n# (optional, default: true)\nckanext.files.owner.transfer_as_update = true\n\n# Use `<OWNER_TYPE>_update` auth function to check access when listing all\n# files of the owner. When this flag is disabled `<OWNER_TYPE>_file_scan`\n# auth function is used.\n# (optional, default: true)\nckanext.files.owner.scan_as_update = true\n
"},{"location":"configuration/libcloud/","title":"Apache libcloud storage configuration","text":"

To use this storage, install the extension with the libcloud extras.

pip install 'ckanext-files[libcloud]'\n

The actual storage backend is controlled by the provider option of the storage. The list of all providers is available here

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:libcloud\n## apache-libcloud storage provider. List of providers available at https://libcloud.readthedocs.io/en/stable/storage/supported_providers.html#provider-matrix . Use upper-cased value from Provider Constant column\nckanext.files.storage.NAME.provider =\n## API key or username\nckanext.files.storage.NAME.key =\n## Secret password\nckanext.files.storage.NAME.secret =\n## JSON object with additional parameters passed directly to storage constructor.\nckanext.files.storage.NAME.params =\n## Name of the container(bucket)\nckanext.files.storage.NAME.container =\n
"},{"location":"configuration/opendal/","title":"OpenDAL storage configuration","text":"

To use this storage, install the extension with the opendal extras.

pip install 'ckanext-files[opendal]'\n

The actual storage backend is controlled by the scheme option of the storage. The list of all schemes is available here

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:opendal\n## OpenDAL service type. Check available services at  https://docs.rs/opendal/latest/opendal/services/index.html\nckanext.files.storage.NAME.scheme =\n## JSON object with parameters passed directly to OpenDAL operator.\nckanext.files.storage.NAME.params =\n
"},{"location":"configuration/redis/","title":"Redis storage configuration","text":"
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.NAME.prefix = ckanext:files:default:file_content:\n
"},{"location":"configuration/storage/","title":"Storage configuration","text":"

All available options for a storage type can be checked via the config declarations CLI. First, add the storage type to the config file:

ckanext.files.storage.xxx.type = files:redis\n

Now run the command that shows all available config options of the plugin.

ckan config declaration files -d\n

Because the Redis storage adapter is enabled, you'll see all the options registered by the Redis adapter alongside the global options:

## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.xxx.prefix = ckanext:files:default:file_content:\n

Sometimes you will see a validation error if the storage has required config options. Let's try using the files:fs storage instead of Redis:

ckanext.files.storage.xxx.type = files:fs\n

Now any attempt to run ckan config declaration files -d will show an error, because the required path option is missing:

Invalid configuration values provided:\nckanext.files.storage.xxx.path: Missing value\nAborted!\n

Add the required option to satisfy the application

ckanext.files.storage.xxx.type = files:fs\nckanext.files.storage.xxx.path = /tmp\n

And run the CLI command once again. This time you'll see the list of allowed options:

## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.xxx.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.xxx.create_path = false\n

There are a number of options supported by every storage. You can set them and expect that every storage, regardless of type, will use them in the same way:

## Storage adapter used by the storage\nckanext.files.storage.NAME.type = ADAPTER\n## The maximum size of a single upload.\n## Supports size suffixes: 42B, 2M, 24KiB, 1GB. `0` means no restrictions.\nckanext.files.storage.NAME.max_size = 0\n## Space-separated list of MIME types or just type or subtype part.\n## Example: text/csv pdf application video jpeg\nckanext.files.storage.NAME.supported_types =\n## Descriptive name of the storage used for debugging. When empty, name from\n## the config option is used, i.e: `ckanext.files.storage.DEFAULT_NAME...`\nckanext.files.storage.NAME.name = NAME\n
"},{"location":"migration/","title":"Migration from native CKAN storage system","text":"

Important: ckanext-files itself is an independent file-management system. You don't have to migrate existing files from groups, users and resources to it. You can just start using ckanext-files for new fields defined in the metadata schema or for uploading arbitrary files, and continue using native CKAN uploads for group/user images and resource files. The migration workflows described here merely exist as a PoC of using ckanext-files for everything in CKAN. Don't migrate your production instances yet, because concepts and rules may change in the future and the migration process will change as well. Try migration only as an experiment that gives you an idea of what else you want to see in ckanext-files, and share this idea with us.

Note: every migration workflow described below requires ckanext-files to be installed. Complete the installation section before going further.

CKAN has the following types of files:

At the moment, there is no migration strategy for the last two types. Replacing the site logo manually is a trivial task, so there will be no dedicated command for it. As for extensions, every one of them is unique, so feel free to create an issue in the current repository: we'll consider creating a migration script for your scenario or, at least, explain how you can perform the migration yourself.

The migration process for group/organization/user images and resource uploads is described below. Keep in mind that this process only describes migration from the native CKAN storage system, which keeps files inside the local filesystem. If you are using storage extensions, like ckanext-s3filestore or ckanext-cloudstorage, create an issue in the current repository with a request for a migration command. As there are a lot of different forks of such extensions, creating a reliable migration script may be challenging, so we need some details about your environment to help with migration.

The migration workflows below require certain changes to metadata schemas, UI widgets for file uploads and styles of your portal (depending on the customization).

"},{"location":"migration/group/","title":"Migration for group/organization images","text":"

Note: internally, groups and organizations are the same entity, so this workflow describes both of them.

First of all, you need a configured storage that supports public links. As all group/organization images are stored inside the local filesystem, you can use the files:public_fs storage adapter.

This extension expects that the name of the group images storage will be group_images. This name is used in all other commands of this migration workflow. If you want to use a different name for the group images storage, override the ckanext.files.group_images_storage config option, which has the default value group_images, and don't forget to adapt the commands accordingly.

This configuration example sets a 10MiB restriction on upload size via the ckanext.files.storage.group_images.max_size option. Feel free to change it or remove it completely to allow any upload size. This restriction applies to future uploads only. Any existing file that exceeds the limit is kept.

Uploads are restricted to the image/* MIME type via the ckanext.files.storage.group_images.supported_types option. You can make this option more or less restrictive. This restriction applies to future uploads only. Any existing file with a wrong MIME type is kept.

ckanext.files.storage.group_images.path controls the location of the upload folder in the filesystem. It should match the value of the ckan.storage_path option plus storage/uploads/group. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.

The ckanext.files.storage.group_images.public_root option specifies the base URL from which every group image can be accessed. In most cases it's the CKAN URL plus uploads/group. If you are serving the CKAN application from ckan.site_url, leave this option unchanged. If you are using ckan.root_path, like /data/, insert this root path into the value of the option. The example below uses the %(ckan.site_url)s wildcard, which will be automatically replaced with the value of the ckan.site_url config option. You can specify the site URL explicitly if you don't like this wildcard syntax.

ckanext.files.storage.group_images.type = files:public_fs\nckanext.files.storage.group_images.max_size = 10MiB\nckanext.files.storage.group_images.supported_types = image\nckanext.files.storage.group_images.path = /var/storage/ckan/storage/uploads/group\nckanext.files.storage.group_images.public_root = %(ckan.site_url)s/uploads/group\n
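The wildcard above resolves with plain %-style interpolation over config options, which can be illustrated in a couple of lines (the site URL below is a hypothetical example value):

```python
# %(ckan.site_url)s is substituted with the value of the config option,
# exactly like Python's %-formatting with a mapping.
options = {"ckan.site_url": "https://demo.ckan.org"}  # hypothetical value
public_root = "%(ckan.site_url)s/uploads/group" % options
print(public_root)  # https://demo.ckan.org/uploads/group
```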

Now let's run a command that shows us the list of files available under the newly configured storage:

ckan files scan -s group_images\n

None of these files are tracked by the files extension yet, i.e. they don't have corresponding records in the DB with base details, like size, MIME type, file hash, etc. Let's create these details via the command below. It's safe to run this command multiple times: it will gather and store information about files not yet registered in the system and ignore any previously registered file.

ckan files scan -s group_images -t\n
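For reference, the details gathered per file amount to roughly the following (a simplified sketch in plain Python; the actual command delegates to the storage adapter and records more fields, such as the MIME type):

```python
import hashlib

def file_details(name: str, content: bytes) -> dict:
    """Simplified model of the base details recorded when tracking a file."""
    return {
        "name": name,
        "size": len(content),
        "hash": hashlib.md5(content).hexdigest(),
    }

details = file_details("hello.txt", b"hello world")
print(details["size"], details["hash"])
# 11 5eb63bbbe01eeed093cb22bb8f5acdc3
```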

Finally, let's run the command that shows only untracked files. Ideally, you'll see nothing upon executing it, because you just registered every file in the system.

ckan files scan -s group_images -u\n

Note, all the files are still available inside the storage directory. If the previous command shows nothing, it only means that CKAN already knows the details of each file from the storage directory. If you want to see the list of files again, omit the -u flag (which stands for "untracked") and you'll once more see all the files in the command output:

ckan files scan -s group_images\n

Now, when all images are tracked by the system, we can give ownership over these files to the groups/organizations that are using them. Run the command below to connect files with their owners. It will search for groups/organizations first and report how many connections were identified. There will be a suggestion to show the identified relationships and the list of files that have no owner (if there are such files). The presence of files without an owner usually means that you removed a group/organization from the database, but did not remove its image.

Finally, you'll be asked if you want to transfer ownership over the files. This operation does not change existing data, and if you disable ckanext-files after the ownership transfer, you won't see any difference. The whole ownership transfer is managed inside custom DB tables generated by ckanext-files, so it's a safe operation.

ckan files migrate groups group_images\n

Here's an example of the output that you may see when running the command:

Found 3 files. Searching file owners...\n[####################################] 100% Located owners for 2 files out of 3.\n\nShow group IDs and corresponding file? [y/N]: y\nd7186937-3080-429f-a434-22b74b9a8d39: file-1.png\n87e2a1aa-7905-4a28-a087-90433f8e169e: file-2.png\n\nShow files that do not belong to any group? [y/N]: y\nfile-3.png\n\nTransfer file ownership to group identified in previous steps? [y/N]: y\nTransfering file-2.png  [####################################]  100%\n

Now comes the most complex part. You need to change metadata schema and UI in order to:

Original CKAN workflow for uploading files was:

This approach differs from the strategy recommended by ckanext-files. But in order to make the migration as simple as possible, we'll stay close to the original workflow.

Note: the suggested approach resembles the existing process of file uploads in CKAN. But ckanext-files was designed as a system that gives you a choice. Check file upload strategies to learn more about alternative implementations of upload and their pros/cons.

First, we need to replace the Upload/Link widget on the group/organization form. If you are using native group templates, create group/snippets/group_form.html and organization/snippets/organization_form.html. Inside both files, extend the original template and override the block basic_fields. You only need to replace the last field

{{ form.image_upload(\n    data, errors, is_upload_enabled=h.uploads_enabled(),\n    is_url=is_url, is_upload=is_upload) }}\n

with

{{ form.image_upload(\n    data, errors, is_upload_enabled=h.files_group_images_storage_is_configured(),\n    is_url=is_url, is_upload=is_upload,\n    field_upload=\"files_image_upload\") }}\n

There are two differences from the original. First, we use h.files_group_images_storage_is_configured() instead of h.uploads_enabled(). As we are using different storages for different upload types, upload widgets can now be enabled independently. Second, we pass the field_upload="files_image_upload" argument into the macro. It will send the uploaded file to CKAN inside files_image_upload instead of the original image_upload field. This must be done because CKAN unconditionally strips the image_upload field from the submission payload, making processing of the file too unreliable. We changed the name of the upload field and CKAN keeps this new field, so we can process it as we wish.

Note: if you are using ckanext-scheming, you only need to replace the form_snippet of the image_url field, instead of rewriting the whole template.

Now, let's define validation rules for this new upload field. We need to create plugins that modify the validation schema for group and organization. Due to CKAN implementation details, you need a separate plugin for each of them.

Note: if you are using ckanext-scheming, you can add the files_image_upload validators to the schemas of organization and group. Check the list of validators that must be applied to this new field below.

Here's an example of plugins that modify validation schemas of group and organization. As you can see, they are mostly the same:

import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultGroupForm, DefaultOrganizationForm\nfrom ckan.logic.schema import default_create_group_schema, default_update_group_schema\n\n\ndef _modify_schema(schema, type):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"group_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"group_images\",\n            type,\n            \"id\",\n            \"public_url\",\n            type + \"_patch\",\n            \"image_url\",\n        ),\n    ]\n    return schema\n\n\nclass FilesGroupPlugin(p.SingletonPlugin, DefaultGroupForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = False\n\n    def group_types(self):\n        return [\"group\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"group\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"group\")\n\n\nclass FilesOrganizationPlugin(p.SingletonPlugin, DefaultOrganizationForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = True\n\n    def group_types(self):\n        return [\"organization\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"organization\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"organization\")\n

There are 4 validators that must be applied to the new upload field:

That's all. Now every image upload for a group/organization is handled by ckanext-files. To verify it, do the following. First, check the list of files currently stored in the group_images storage via the command that we used at the beginning of the migration:

ckan files scan -s group_images\n

You'll see a list of existing files. Their names follow the format <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg.

Now upload an image into an existing group, or create a new group with any image. When you check the list of files again, you'll see one new record. But this time the record resembles a UUID: da046887-e76c-4a68-97cf-7477665710ff.
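A quick way to tell the two naming schemes apart is to check whether a name parses as a UUID (a hypothetical helper for verification, not part of the extension):

```python
import uuid

def looks_like_uuid(name: str) -> bool:
    """Files created through ckanext-files use UUID locations;
    legacy CKAN uploads use timestamp-prefixed original names."""
    try:
        uuid.UUID(name)
    except ValueError:
        return False
    return True

print(looks_like_uuid("da046887-e76c-4a68-97cf-7477665710ff"))  # True
print(looks_like_uuid("2024-06-14-133840.539670photo.jpg"))     # False
```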

"},{"location":"migration/resource/","title":"Resource","text":""},{"location":"migration/resource/#migration-for-resource-uploads","title":"Migration for resource uploads","text":"

Configure a named storage for resources. Use the files:ckan_resource_fs storage adapter.

This extension expects that the name of the resources storage will be resources. This name will be used in all other commands of this migration workflow. If you want to use a different name for the resources storage, override the ckanext.files.resources_storage config option, which has the default value resources, and don't forget to adapt the commands accordingly.

ckanext.files.storage.resources.path must match the value of the ckan.storage_path option, followed by the resources directory. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.

The example below sets a 10MiB limit on resource size. Modify it if you are using a different limit set by ckan.max_resource_size.

Unlike group and user images, this storage does not need an upload type restriction or public_root.

ckanext.files.storage.resources.type = files:ckan_resource_fs\nckanext.files.storage.resources.max_size = 10MiB\nckanext.files.storage.resources.path = /var/storage/ckan/resources\n

Check the list of untracked files inside the newly configured storage:

ckan files scan -s resources -u\n

Track all these files:

ckan files scan -s resources -t\n

Re-check that now you see no untracked files:

ckan files scan -s resources -u\n

Transfer file ownership to the corresponding resources. In addition to the simple ownership transfer, this command will ask you whether you want to modify the resource's url_type and url fields. This is required to move file management to the files extension completely and enable the possibility of migrating to a different storage type.

If you accept the resource modifications, for every file owner the url_type will be changed to file and the url will be changed to the file ID. Then all modified packages will be reindexed.

Changing url_type means that some pages will change. For example, instead of the Download button, CKAN will show you the Go to resource button on the resource page, because the Download label is specific to url_type=upload. And some views may stop working as well. But this is a safer option for migration than leaving url_type unchanged: ckanext-files manages files in its own way and some assumptions about files no longer hold, so using a different url_type is the fastest way to tell everyone that something changed.

Broken views can be easily fixed. Every view is implemented as a separate plugin. You can always inherit from this plugin and override the methods that relied on the different behavior. And a lot of views work with the file URL directly, so they won't even see the difference.

ckan files migrate local-resources resources\n

And the next goal is a correct metadata schema. If you are using ckanext-scheming, you need to modify the validators of the url and format fields.

If you are working with native schemas, you have to modify the dataset schema by implementing IDatasetForm. Here's an example:

import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultDatasetForm\nfrom ckan.logic import schema\n\nclass FilesDatasetPlugin(p.SingletonPlugin, DefaultDatasetForm):\n    p.implements(p.IDatasetForm, inherit=True)\n\n    def is_fallback(self):\n        return True\n\n    def package_types(self):\n        return [\"dataset\"]\n\n    def _modify_schema(self, schema):\n        schema[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_file_id_exists\"),\n            tk.get_validator(\"files_transfer_ownership\")(\"resource\",\"id\"),\n        ])\n        schema[\"resources\"][\"format\"].insert(0, tk.get_validator(\"files_content_type_from_file\")(\"url\"))\n\n    def create_package_schema(self):\n        sch = schema.default_create_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def update_package_schema(self):\n        sch = schema.default_update_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def show_package_schema(self):\n        sch = schema.default_show_package_schema()\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_id_into_resource_download_url\"),\n        ])\n        return sch\n

Both the create and update schemas are updated in the same way. We add a new validator to the format field to correctly identify the file format. And there is a number of new validators for url:

On top of this, we also have two validators applied to show_package_schema (use output_validators in ckanext-scheming):

And the next part is the trickiest. You need to create a number of templates and JS modules. But because ckanext-files is actively developed, your custom files will most likely become outdated pretty soon.

Instead, we recommend enabling the patch for the resource form that ships with ckanext-files. It's a bit hacky, but because the extension itself is still in alpha stage, it should be acceptable. Check file upload strategies for examples of implementations that you can add to your portal instead of the default patch.

To enable the patch for templates, add the following line to the config file:

ckanext.files.enable_resource_migration_template_patch = true\n

This option adds an Add file button to the resource form.

Upon clicking, this button is replaced by a widget that supports uploading new files or selecting previously uploaded files that are not used by any resource yet.

"},{"location":"migration/user/","title":"Migration for user avatars","text":"

This workflow is similar to the group/organization migration. It contains the same sequence of actions, but the explanations are removed, because you already know the details from the group migration. Only the steps that differ contain a detailed explanation of the process.

Configure a local filesystem storage with support for public links (files:public_fs) for user images.

This extension expects that the name of the user images storage will be user_images. This name will be used in all other commands of this migration workflow. If you want to use a different name for the user images storage, override the ckanext.files.user_images_storage config option, which has the default value user_images, and don't forget to adapt the commands accordingly.

ckanext.files.storage.user_images.path resembles this option for the group/organization images storage. But user images are kept inside the user folder by default. As a result, the value of this option should match the value of the ckan.storage_path option plus storage/uploads/user. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.

ckanext.files.storage.user_images.public_root resembles this option for the group/organization images storage. But user images are available at the CKAN URL plus uploads/user.

ckanext.files.storage.user_images.type = files:public_fs\nckanext.files.storage.user_images.max_size = 10MiB\nckanext.files.storage.user_images.supported_types = image\nckanext.files.storage.user_images.path = /var/storage/ckan/storage/uploads/user\nckanext.files.storage.user_images.public_root = %(ckan.site_url)s/uploads/user\n

Check the list of untracked files inside the newly configured storage:

ckan files scan -s user_images -u\n

Track all these files:

ckan files scan -s user_images -t\n

Re-check that now you see no untracked files:

ckan files scan -s user_images -u\n

Transfer image ownership to corresponding users:

ckan files migrate users user_images\n

Update the user templates. The required field is defined in user/new_user_form.html and user/edit_user_form.html. It's a bit different from the field used by group/organization, but you again need to add the field_upload="files_image_upload" parameter to the image_upload macro and replace h.uploads_enabled() with h.files_user_images_storage_is_configured().

User has no dedicated interface for validation schema modification, and here comes the biggest difference from the group migration. You need to chain the user_create and user_update actions and modify the schema from the context:

import ckan.logic.schema\nimport ckan.plugins.toolkit as tk\n\n\ndef _patch_schema(schema):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"user_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"user_images\",\n            \"user\",\n            \"id\",\n            \"public_url\",\n            \"user_patch\",\n            \"image_url\",\n        ),\n    ]\n\n\n@tk.chained_action\ndef user_update(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_update_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n\n\n@tk.chained_action\ndef user_create(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n

The validators are all the same, but now we are using user instead of group/organization in the parameters.

That's all. Just as with groups, you can update an avatar and verify that all new filenames resemble UUIDs.

"},{"location":"usage/capabilities/","title":"Capabilities","text":"

To understand in advance whether a specific storage can perform certain actions, ckanext-files uses ckanext.files.shared.Capability. It's an enumeration of operations that can be supported by a storage:

These capabilities are defined when the storage is created and are automatically checked by actions that work with the storage. If you want to check whether a storage supports a certain capability, it can be done manually. To check the presence of multiple capabilities at once, combine them via the bitwise-or operator.

from ckanext.files.shared import Capability, get_storage\n\nstorage = get_storage()\n\ncan_read = storage.supports(Capability.STREAM)\n\nread_and_write = Capability.CREATE | Capability.STREAM\ncan_read_and_write = storage.supports(read_and_write)\n

ckan files storages -v CLI command lists all configured storages with their capabilities.

"},{"location":"usage/configure/","title":"Configure the storage","text":"

Before uploading files, you have to configure a storage: a place where all uploaded files are stored. A storage relies on an adapter that describes where and how data is stored: filesystem, cloud, DB, etc. And, depending on the adapter, a storage may have a couple of additional specific options. For example, a filesystem adapter likely requires a path to the folder where uploads are stored. A DB adapter may need DB connection parameters. A cloud adapter most likely will not work without an API key. These additional options are specific to the adapter and you have to check its documentation to find out what the possible options are.

Let's start with the Redis adapter, because it has minimal configuration requirements.

Add the following line to the CKAN config file:

ckanext.files.storage.default.type = files:redis\n

The name of the adapter is files:redis. It follows the recommended naming convention for adapters: <EXTENSION>:<TYPE>. You can tell from the name above that we are using an adapter defined in the files extension with the redis type. But this naming convention is not enforced and its only purpose is avoiding name conflicts. Technically, an adapter name can use any character, including spaces, newlines and emoji.

If you make a typo in the adapter's name, any CKAN CLI command will produce an error message with the list of available adapters:

Invalid configuration values provided:\nckanext.files.storage.default.type: Value must be one of ['files:fs', 'files:public_fs', 'files:redis']\nAborted!\n

The storage is configured, so we can actually upload a file. Let's use ckanapi for this task. Files are created via the files_file_create API action, and this time we have to pass 2 parameters into it:

The final command looks like this:

echo -n 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n

And here's what you see as the result:

{\n  \"atime\": null,\n  \"content_type\": \"text/plain\",\n  \"ctime\": \"2024-06-02T15:02:14.819117+00:00\",\n  \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n  \"id\": \"e21162ab-abfb-476c-b8c5-5fe7cb89eca0\",\n  \"location\": \"24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\",\n  \"mtime\": null,\n  \"name\": \"hello.txt\",\n  \"size\": 11,\n  \"storage\": \"default\",\n  \"storage_data\": {}\n}\n

The content of the file can be checked via the CKAN CLI. Use the id from the last API call's output in the command ckan files stream ID:

ckan files stream e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n

Alternatively, we can use the Redis CLI to get the content of the file. Note, you cannot get the content via the CKAN API, because it's JSON-based and streaming files doesn't suit its principles.

By default, the Redis adapter puts the content under the key <PREFIX><LOCATION>. Pay attention to LOCATION. It's the value available as location in the API response (i.e. 24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46 in our case). It's different from the id (the ID used by the DB to uniquely identify the file record) and name (the human-readable name of the file). In our scenario, the location looks like a UUID because of the internal details of the Redis adapter implementation. But different adapters may use a more path-like value, i.e. something similar to path/to/folder/hello.txt.

PREFIX can be configured, but we skipped this step and got the default value: ckanext:files:default:file_content:. So the final Redis key of our file is ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46
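The key composition is plain string concatenation, which can be checked in a couple of lines:

```python
# The final Redis key is the configured prefix followed by the file's
# location from the files_file_create response.
prefix = "ckanext:files:default:file_content:"  # documented default
location = "24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46"
key = prefix + location
print(key)
# ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46
```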

redis-cli\n\n127.0.0.1:6379> GET ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\n\"hello world\"\n

And before we move further, let's remove the file, using its id:

ckanapi action files_file_delete id=e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
"},{"location":"usage/js/","title":"JavaScript utilities","text":"

Note: ckanext-files does not provide stable CKAN JS modules at the moment. Try creating your own widgets and share your examples or requirements with us. We'll consider creating and including widgets in ckanext-files if they are generic enough for the majority of users.

ckanext-files registers a few utilities inside the CKAN JS namespace to help with building UI components.

The first group of utilities is registered inside the CKAN Sandbox. Inside CKAN JS modules it's accessible as this.sandbox. If you are writing code outside of JS modules, the Sandbox can be initialized via a call to ckan.sandbox():

const sandbox = ckan.sandbox()\n

When the files plugin is loaded, the sandbox contains a files attribute with two members:

The simplest way to upload a file is using the upload helper.

await sandbox.files.upload(\n    new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n)\n

This function uploads the file to the default storage via the files_file_create action. Extra parameters for the API call can be passed using the second argument of the upload helper. Use an object with a requestParams key. The value of this key will be added to the standard API request parameters. For example, if you want to use the storage with the name memory and a field with the value custom:

await sandbox.files.upload(\n    new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n    {requestParams: {storage: \"memory\", field: \"custom\"}}\n)\n

If you need more control over the upload, you can create an uploader and interact with it directly, instead of using the upload helper.

An uploader is an object that uploads a file to the server. It extends the base uploader, which defines the standard interface for this object. The uploader performs all the API calls internally and returns the uploaded file details. Out of the box you can use the Standard and Multipart uploaders. Standard uses the files_file_create API action and specializes in normal uploads. Multipart relies on the files_multipart_* actions and can be used to pause and continue an upload.

To create an uploader instance, pass its name as a string to makeUploader. Then you can call the upload method of the uploader to perform the actual upload. This method requires two arguments:

const uploader = sandbox.files.makeUploader(\"Standard\")\nawait uploader.upload(new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}), {})\n

One of the reasons to use a manually created uploader is progress tracking. The uploader supports event subscriptions via uploader.addEventListener(event, callback), and here's the list of possible upload events:

If you want to use upload helper with customized uploader, there are two ways to do it.

The second group of ckanext-files utilities is available as the ckan.CKANEXT_FILES object. This object mainly serves as an extension and configuration point for sandbox.files.

ckan.CKANEXT_FILES.adapters is a collection of all classes that can be used to initialize an uploader. It contains the Standard, Multipart and Base classes. Standard and Multipart can be used as is, while Base must be extended by your custom uploader class. Add your custom uploader classes to adapters to make them available application-wide:

class MyUploader extends Base { ... }\n\nckan.CKANEXT_FILES.adapters[\"My\"] = MyUploader;\n\nawait sandbox.files.upload(new File(...), {adapter: \"My\"})\n

ckan.CKANEXT_FILES.defaultSettings contains the object with default settings available as this.settings inside any uploader. You can change the name of the storage used by all uploaders using this object. Note, changes will apply only to uploaders initialized after the modification.

ckan.CKANEXT_FILES.defaultSettings.storage = \"memory\"\n
"},{"location":"usage/multi-storage/","title":"Multi-storage","text":"

It's possible to configure multiple storages at once and specify which one you want to use for an individual file upload. Up until now we used the following storage options:

All of them have the common prefix ckanext.files.storage.default., and this prefix is the key to using multiple storages simultaneously.

Every option of the storage follows the pattern: ckanext.files.storage.<STORAGE_NAME>.<OPTION>. As all the options above contain default in the position of <STORAGE_NAME>, they are related to the default storage.
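The grouping of flat options by storage name can be sketched like this (a simplified model of the pattern, not the extension's actual config parser):

```python
def group_storage_options(config: dict) -> dict:
    """Group flat ckanext.files.storage.<NAME>.<OPTION> keys by storage name."""
    prefix = "ckanext.files.storage."
    storages: dict = {}
    for key, value in config.items():
        if not key.startswith(prefix):
            continue
        # split "<NAME>.<OPTION>" at the first dot
        name, _, option = key[len(prefix):].partition(".")
        storages.setdefault(name, {})[option] = value
    return storages

config = {
    "ckanext.files.storage.default.type": "files:fs",
    "ckanext.files.storage.default.path": "/tmp/example",
    "ckanext.files.storage.memory.type": "files:redis",
}
print(group_storage_options(config))
```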

If you want to configure a storage with the name custom, change the configuration of the storage:

ckanext.files.storage.custom.type = files:fs\nckanext.files.storage.custom.path = /tmp/example\nckanext.files.storage.custom.create_path = true\n

And, if you want to use Redis-based storage named memory and filesystem-based storage named default, use the following configuration:

ckanext.files.storage.memory.type = files:redis\n\nckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n

The default storage is special. ckanext-files uses it by default, as the name suggests. If you remove the configuration for the default storage and try to create a file, you'll see the following error:

echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n\n... ckan.logic.ValidationError: None - {'storage': ['Storage default is not configured']}\n

Storage default is not configured. That's why we need the default configuration. But if you want to upload a file into a different storage, or you don't want to add the default storage at all, you can always explicitly specify the name of the storage you are going to use.

When using API actions, add the storage parameter to the call:

echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt storage=memory\n

When writing Python code, pass the storage name to the get_storage function:

storage = get_storage(\"memory\")\n

When writing JS code, pass the object {requestParams: {storage: "memory"}} to the upload function:

const sandbox = ckan.sandbox()\nconst file = new File([\"content\"], \"file.txt\")\nconst options = {requestParams: {storage: \"memory\"}};\n\nawait sandbox.files.upload(file, options)\n
"},{"location":"usage/multipart/","title":"Multipart, resumable and signed uploads","text":"

This feature has many names, but it basically divides a single upload into multiple stages. It can be used in the following situations:

All these situations are handled by 4 API actions, which are available if the storage has the MULTIPART capability:

The implementation of multipart upload depends on the used adapter, so make sure you check its documentation before using any multipart actions. There are some common steps in the multipart upload workflow that are usually the same among all adapters:

Incomplete files support most of the normal file actions, but you need to pass completed=False to the action when working with incomplete files. E.g., if you want to remove an incomplete upload, use its ID and completed=False:

ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24 completed=False\n

Incomplete files do not support streaming and downloading via the public interface of the extension. But a storage adapter can expose such features via custom methods if it's technically possible.

An example of a basic multipart upload is shown below. The files:fs adapter can be used for running this example, as it implements MULTIPART.

First, create a text file and check its size:

echo 'hello world!' > /tmp/file.txt\nwc -c /tmp/file.txt\n\n... 13 /tmp/file.txt\n

The size is 13 bytes and the content type is text/plain. These values must be used for upload initialization.

ckanapi action files_multipart_start name=file.txt size=13 content_type=text/plain\n\n... {\n...   \"content_type\": \"text/plain\",\n...   \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n...   \"hash\": \"\",\n...   \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n...   \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n...   \"name\": \"file.txt\",\n...   \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n...   \"owner_type\": \"user\",\n...   \"pinned\": false,\n...   \"size\": 13,\n...   \"storage\": \"default\",\n...   \"storage_data\": {\n...     \"uploaded\": 0\n...   }\n... }\n

Here storage_data contains {"uploaded": 0}. It may be different for other adapters, especially if they implement non-consecutive uploads, but generally it's the recommended way to keep track of upload progress.
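The consecutive-upload bookkeeping can be modeled in a few lines of plain Python (a sketch of the client's chunking and the counter, not the adapter's actual code):

```python
def iter_chunks(content: bytes, chunk_size: int):
    """Yield consecutive chunks, as a client calling
    files_multipart_update repeatedly would send them."""
    for offset in range(0, len(content), chunk_size):
        yield content[offset : offset + chunk_size]

content = b"hello world!\n"  # 13 bytes, as in the shell example
uploaded = 0
for chunk in iter_chunks(content, 5):
    uploaded += len(chunk)  # mirrors storage_data["uploaded"]
print(uploaded)  # 13
```

Completion only succeeds once the counter reaches the size declared in files_multipart_start, which is exactly the check that fails in the premature-completion example below.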

Now we'll upload the first 5 bytes of the file.

ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n    upload@<(dd if=/tmp/file.txt bs=1 count=5)\n\n... {\n...   \"content_type\": \"text/plain\",\n...   \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n...   \"hash\": \"\",\n...   \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n...   \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n...   \"name\": \"file.txt\",\n...   \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n...   \"owner_type\": \"user\",\n...   \"pinned\": false,\n...   \"size\": 13,\n...   \"storage\": \"default\",\n...   \"storage_data\": {\n...     \"uploaded\": 5\n...   }\n... }\n

If you try finalizing the upload right now, you'll get an error.

ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... ckan.logic.ValidationError: None - {'upload': ['Actual value of upload size(5) does not match expected value(13)']}\n

Let's upload the rest of the bytes and complete the upload.

ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n    upload@<(dd if=/tmp/file.txt bs=1 skip=5)\n\nckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... {\n...   \"atime\": null,\n...   \"content_type\": \"text/plain\",\n...   \"ctime\": \"2024-06-22T14:57:18.483716+00:00\",\n...   \"hash\": \"c897d1410af8f2c74fba11b1db511e9e\",\n...   \"id\": \"a740692f-e3d5-492f-82eb-f04e47c13848\",\n...   \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n...   \"mtime\": null,\n...   \"name\": \"file.txt\",\n...   \"owner_id\": null,\n...   \"owner_type\": null,\n...   \"pinned\": false,\n...   \"size\": 13,\n...   \"storage\": \"default\",\n...   \"storage_data\": {}\n... }\n

Now the file can be used normally. You can transfer the file's ownership to someone, stream it, or modify it. Pay attention to the ID: the completed file has its own unique ID, which is different from the ID of the incomplete upload.
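The bookkeeping behind this flow can be condensed into a small standalone sketch. The MultipartSession class below is a hypothetical illustration of what a consecutive-upload adapter tracks, not part of ckanext-files:

```python
import hashlib


class MultipartSession:
    """Toy model of a consecutive multipart upload (illustration only)."""

    def __init__(self, name: str, size: int):
        self.name = name
        self.size = size       # expected size, declared at files_multipart_start
        self.content = b""

    @property
    def uploaded(self) -> int:
        # mirrors the storage_data {"uploaded": N} counter from the example
        return len(self.content)

    def update(self, chunk: bytes) -> None:
        if self.uploaded + len(chunk) > self.size:
            raise ValueError("upload exceeds the declared size")
        self.content += chunk

    def complete(self) -> dict:
        # completion is refused until the declared size is reached
        if self.uploaded != self.size:
            raise ValueError(
                f"Actual value of upload size({self.uploaded}) "
                f"does not match expected value({self.size})"
            )
        return {
            "name": self.name,
            "size": self.size,
            "hash": hashlib.md5(self.content).hexdigest(),
        }


session = MultipartSession("file.txt", 13)
session.update(b"hello")           # first 5 bytes: uploaded == 5
try:
    session.complete()             # fails, just like files_multipart_complete
except ValueError as e:
    print(e)
session.update(b" world!\n")       # remaining 8 bytes
info = session.complete()
print(info["size"])                # 13
```

The real adapters keep this state in storage_data and compute the hash on completion; the sketch only shows why the premature complete call from the example is rejected.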

"},{"location":"usage/ownership/","title":"File ownership","text":"

Every file can have an owner, and there can be only one owner of the file. It's possible to create a file without an owner, but usually the application benefits from keeping every file with its owner. The owner is described by two fields: ID and type.

When a file is created, the current user from the API action's context is assigned as its owner by default. From then on, the owner can perform other operations with the file, such as renaming, displaying, or removing it.

Apart from chaining auth functions, a plugin can implement the IFiles.files_file_allows and IFiles.files_owner_allows methods to modify access rules for the file.

def files_file_allows(\n    self,\n    context: Context,\n    file: File | Multipart,\n    operation: types.FileOperation,\n) -> bool | None:\n    ...\n\ndef files_owner_allows(\n    self,\n    context: Context,\n    owner_type: str, owner_id: str,\n    operation: types.OwnerOperation,\n) -> bool | None:\n    ...\n

These methods receive the current action context, the details of the tested object, and the name of the operation (show, update, delete, file_transfer). files_file_allows checks permissions for the accessed file; it's usually called when the user interacts with a file directly. files_owner_allows works with an owner described by type and ID; it's usually called when the user transfers file ownership, performs a bulk file operation on the owner's files, or just tries to get the list of files that belong to the owner.

If a method returns true/false, the operation is allowed/denied. If a method returns None, the default logic is used to check access.
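This resolution order can be sketched in plain Python. The hooks and the default check below are hypothetical stand-ins for illustration; they are not the actual ckanext-files internals:

```python
from typing import Callable, Iterable, Optional

# Each hook mimics IFiles.files_file_allows: True/False decides, None abstains.
Hook = Callable[[str, str], Optional[bool]]


def file_allows(
    plugins: Iterable[Hook],
    user: str,
    operation: str,
    default: Callable[[str, str], bool],
) -> bool:
    for hook in plugins:
        verdict = hook(user, operation)
        if verdict is not None:       # first non-None answer wins
            return verdict
    return default(user, operation)   # nobody decided: fall back to default


# hypothetical plugin: anyone may "show"; everything else is left to the default
def readonly_plugin(user: str, operation: str) -> Optional[bool]:
    if operation == "show":
        return True
    return None


def default_check(user: str, operation: str) -> bool:
    # stand-in for the built-in rule: only the owner may act on the file
    return user == "owner"


print(file_allows([readonly_plugin], "guest", "show", default_check))    # True
print(file_allows([readonly_plugin], "guest", "delete", default_check))  # False
print(file_allows([readonly_plugin], "owner", "delete", default_check))  # True
```

The key design point survives the simplification: a plugin can short-circuit access in either direction, and returning None keeps the built-in behavior intact.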

As already mentioned, by default the user who owns the file can access it. But what about different owners? What if the file is owned by another entity, like a resource or dataset?

Out of the box, nobody can access such files. But there are three config options that relax this restriction.

ckanext.files.owner.cascade_access = ENTITY_TYPE ANOTHER_TYPE gives access to a file owned by an entity if the user already has access to the entity itself. Use words like package, resource, group instead of ENTITY_TYPE.

For example: a file is owned by a resource. If cascade access is enabled, whoever has access to resource_show of the resource can also see the file owned by this resource. If the user passes resource_update for the resource, they can also modify the file owned by this resource, etc.

Important: be careful and do not add user to ckanext.files.owner.cascade_access. Users' own files are considered private, and most likely you don't want anyone else to be able to see or modify these files.

The second option is ckanext.files.owner.transfer_as_update. When transfer-as-update is enabled, any user who has the <OWNER_TYPE>_update permission can transfer their own files to this OWNER_TYPE. Instead of using this option, you can define <OWNER_TYPE>_file_transfer.

And the third option is ckanext.files.owner.scan_as_update. Just as with ownership transfer, it gives a user permission to list all files of the owner if the user can <OWNER_TYPE>_update it. Instead of using this option, you can define <OWNER_TYPE>_file_scan.

"},{"location":"usage/permissions/","title":"Permissions","text":"

File creation is not allowed by default. Only a sysadmin can use the files_file_create and files_multipart_start actions. This is done deliberately: uncontrolled uploads can turn your portal into a user's personal cloud storage.

There are three ways to grant upload permission to normal users.

The BAD option is simple. Enable the ckanext.files.authenticated_uploads.allow config option, and every registered user will be allowed to upload files, but only into the default storage. If you want to change the list of storages available to common users, specify storage names in the ckanext.files.authenticated_uploads.storages option.

The GOOD option is relatively simple. Define a chained auth function with the name files_file_create. It's called whenever a user initiates an upload, so you can decide whether the user is allowed to upload files with the specified parameters.
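A chained auth function for files_file_create might look like the sketch below. In a real plugin you would decorate it with ckan.plugins.toolkit.chained_auth_function; here next_auth is passed explicitly so the logic can run without CKAN installed, and the size limit is an assumption you would adapt to your portal:

```python
MAX_SIZE = 10 * 1024 * 1024  # hypothetical per-upload limit: 10 MiB


def files_file_create(next_auth, context, data_dict):
    # deny anonymous users outright
    if not context.get("user"):
        return {"success": False, "msg": "Login required"}

    # deny uploads above our limit
    if data_dict.get("size", 0) > MAX_SIZE:
        return {"success": False, "msg": "File is too large"}

    # delegate everything else to the default (sysadmin-only) check
    return next_auth(context, data_dict)


# stub standing in for the default auth function
def default_auth(context, data_dict):
    return {"success": context.get("sysadmin", False)}


print(files_file_create(default_auth, {"user": "bob"}, {"size": 42}))
```

The chained function can only add restrictions or short-circuit a denial before the default check runs; calling next_auth keeps the rest of the default behavior.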

The BEST option is to leave this restriction unchanged. Do not allow any user to call files_file_create. Instead, create a new action for your goal. ckanext-files isn't a solution - it's a tool that helps you in building the solution.

If you need to add a documents field to datasets that contains uploaded PDF files, create a separate action dataset_document_attach. Specify access rules and validation for it, or even hardcode the storage that will be used for uploads. Then, from this new action, call files_file_create with ignore_auth: True.

This way you control every aspect of uploading documents into datasets and do not accidentally break other functionality, because every other feature will define its own action.
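Such a dedicated action could be sketched as follows. The dataset_document_attach name, its validation rule, and the "documents" storage are all hypothetical, and get_action is stubbed so the control flow can run outside CKAN:

```python
def get_action(name):
    # stand-in for ckan.plugins.toolkit.get_action
    def files_file_create(context, data_dict):
        assert context.get("ignore_auth"), "called with auth disabled"
        return {"id": "file-id", "name": data_dict["name"]}

    return {"files_file_create": files_file_create}[name]


def dataset_document_attach(context, data_dict):
    # validate here, under rules specific to this feature
    name = data_dict["name"]
    if not name.endswith(".pdf"):
        raise ValueError("Only PDF documents are allowed")

    # auth was already checked by this action, so bypass files_file_create auth
    # and hardcode the storage used for document uploads
    return get_action("files_file_create")(
        {"ignore_auth": True},
        {"upload": data_dict["upload"], "name": name, "storage": "documents"},
    )


result = dataset_document_attach({}, {"upload": b"%PDF-1.4", "name": "report.pdf"})
print(result["id"])
```

The point of the pattern is that files_file_create stays locked down globally, while each feature exposes its own narrowly-scoped, fully-validated entry point.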

"},{"location":"usage/task-queue/","title":"Task queue","text":"

One of the challenges introduced by independently managed files is related to file ownership. As long as you call files_transfer_ownership manually, things are transparent. But as soon as you add a custom file field to a dataset, you probably want to automatically transfer ownership of the file referred to by this custom field.

Imagine that you have a PDF file owned by you, and you specify the ID of this file in the attachment_id field of a dataset. You want to show a download link for this file on the dataset page. But if the file is owned by you, nobody else will be able to download it. So you decide to transfer file ownership to the dataset, so that anyone who sees the dataset can see the file as well.

You cannot update the dataset and transfer ownership afterwards, because there would be a time window between these two actions when the data is not valid. Or even worse, after updating the dataset you could lose your internet connection and be unable to finish the transfer.

Neither can you transfer ownership first and then update the dataset: attachment_id may have additional validators, and you don't know in advance whether you'll be able to successfully update the dataset after the transfer.

This problem can be solved by queuing additional tasks inside the action. For example, the validator that checks whether a certain file ID can be used as attachment_id can queue an ownership transfer. If the dataset update completes without errors, the queued task is executed automatically and the dataset becomes the owner of the file.

A task is queued via the ckanext.files.shared.add_task function, which accepts objects inherited from ckanext.files.shared.Task. The Task class requires implementing the abstract method run(result: Any, idx: int, prev: Any), which is called when the task is executed. This method receives the result of the action that caused the task execution, the task's position in the queue, and the result of the previous task.

For example, one of the attachment_id validators can queue the following MyTask via add_task(MyTask(file_id)) to transfer ownership of file_id to the updated dataset:

import ckan.plugins.toolkit as tk\n\nfrom ckanext.files.shared import Task\n\nclass MyTask(Task):\n    def __init__(self, file_id):\n        self.file_id = file_id\n\n    def run(self, dataset, idx, prev):\n        return tk.get_action(\"files_transfer_ownership\")(\n            {\"ignore_auth\": True},\n            {\n                \"id\": self.file_id,\n                \"owner_type\": \"package\",\n                \"owner_id\": dataset[\"id\"],\n                \"pin\": True,\n            },\n        )\n

As the first argument, Task.run receives the result of the action that was called. Right now only the following actions support tasks:

If you want to enable task support for your custom action, decorate it with the ckanext.files.shared.with_task_queue decorator:

from ckanext.files.shared import with_task_queue\n\n@with_task_queue\ndef my_action(context, data_dict):\n    # you can call `add_task` inside this action's stack frame.\n    ...\n

A good example of a validator using tasks is the files_transfer_ownership validator factory. It can be added to a metadata schema as files_transfer_ownership(owner_type, name_of_id_field). For example, if you are adding this validator to a resource, call it as files_transfer_ownership(\"resource\", \"id\"). The second argument is the name of the ID field; as in most cases it's id, you can omit the second argument.
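Stripped of CKAN specifics, the pattern behind such a validator can be sketched as a hypothetical, self-contained model: the validator only queues a task, and the queue is flushed after the action succeeds:

```python
# Minimal stand-in for the add_task/Task machinery, for illustration only.

task_queue: list = []


def add_task(task) -> None:
    task_queue.append(task)


def transfer_ownership_validator(file_id: str) -> str:
    """Hypothetical validator: accept the value, queue the transfer."""
    add_task(lambda result: result.setdefault("owner_of", []).append(file_id))
    return file_id


def run_action(data: dict) -> dict:
    result = dict(data)                # pretend the update succeeded
    for task in task_queue:
        task(result)                   # executed only after a successful action
    task_queue.clear()
    return result


transfer_ownership_validator("file-1")
dataset = run_action({"id": "dataset-1"})
print(dataset["owner_of"])             # ['file-1']
```

Because the queued work runs only after the action completes, the validator never has to choose between transferring too early or too late, which is exactly the dilemma described above.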

"},{"location":"usage/tracked-files/","title":"Tracked and untracked files","text":"

There is a difference between creating files via an action:

tk.get_action(\"files_file_create\")(\n    {\"ignore_auth\": True},\n    {\"upload\": \"hello\", \"name\": \"hello.txt\"}\n)\n

and via a direct call to Storage.upload:

from ckanext.files.shared import get_storage, make_upload\n\nstorage = get_storage()\nstorage.upload(\"hello.txt\", make_upload(b\"hello\"), {})\n

The former snippet creates a tracked file: the file is uploaded to the storage and its details are saved to the database.

The latter snippet creates an untracked file: the file is uploaded to the storage, but its details are not saved anywhere.

Untracked files can be used to achieve specific goals. For example, imagine a storage adapter that writes files to a specified ZIP archive. You can create an interface that initializes such a storage for an existing ZIP resource and uploads files into it. You don't need a separate DB record for every uploaded file, because all of them go into the resource, which is already stored in the DB.

But such use cases are pretty specific, so prefer to use the API if you are not sure what you need. The main reason to use tracked files is their discoverability: you can use the files_file_search API action to list all the tracked files and optionally filter them by storage, location, content_type, etc.:

ckanapi action files_file_search\n\n... {\n...   \"count\": 123,\n...   \"results\": [\n...     {\n...       \"atime\": null,\n...       \"content_type\": \"text/plain\",\n...       \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n...       \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n...       \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n...       \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n...       \"mtime\": null,\n...       \"name\": \"hello.txt\",\n...       \"size\": 11,\n...       \"storage\": \"default\",\n...       \"storage_data\": {}\n...     },\n...     ...\n...   ]\n... }\n\nckanapi action files_file_search size:5 rows=1\n\n... {\n...   \"count\": 2,\n...   \"results\": [\n...     {\n...       \"atime\": null,\n...       \"content_type\": \"text/plain\",\n...       \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n...       \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n...       \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n...       \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n...       \"mtime\": null,\n...       \"name\": \"hello.txt\",\n...       \"size\": 5,\n...       \"storage\": \"default\",\n...       \"storage_data\": {}\n...     }\n...   ]\n... }\n\nckanapi action files_file_search content_type=application/pdf\n\n... {\n...   \"count\": 0,\n...   \"results\": []\n... }\n

As for untracked files, their discoverability depends on the storage adapters. Some of them, files:fs for example, can scan the storage and locate all uploaded files, both tracked and untracked. If you have a files:fs storage configured as default, use the following command to scan its content:

ckan files scan\n

If you want to scan a different storage, specify its name via the -s/--storage-name option. Remember that some storage adapters do not support scanning.

ckan files scan -s memory\n

If you want to see untracked files only, add the -u/--untracked-only flag.

ckan files scan -u\n

If you want to track untracked files by creating a DB record for every such file, add the -t/--track flag. After that you'll be able to discover previously untracked files via the files_file_search API action. This option is most useful during migration, when you are configuring a new storage that points to an existing location with files.

ckan files scan -t\n
"},{"location":"usage/transfer/","title":"Ownership transfer","text":"

File ownership can be transferred. As there can be only one owner of the file, as soon as you transfer ownership of a file, you yourself no longer own it.

To transfer ownership, use the files_transfer_ownership action and specify the id of the file and the owner_id and owner_type of the new owner.

You can't just transfer ownership to anyone. You either must pass the IFiles.files_owner_allows check for the file_transfer operation, or pass a cascade access check for the future owner of the file when cascade access and transfer-as-update are enabled.

For example, if you have the following options in config file:

ckanext.files.owner.cascade_access = organization\nckanext.files.owner.transfer_as_update = true\n
you must pass the organization_update auth function if you want to transfer file ownership to an organization.

In addition, a file can be pinned. This is how we mark important files. Imagine a resource and its uploaded file. The link to this file is used by the resource, and we don't want the file to be accidentally transferred to someone else. We pin the file, and now nobody can transfer it without explicit confirmation of their intention.
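The rule can be modeled with a short hypothetical sketch; the force flag below stands in for whatever explicit confirmation mechanism you use and is not the actual ckanext-files API:

```python
class PinnedError(Exception):
    """Raised when a pinned file is transferred without confirmation."""


def transfer_ownership(file: dict, new_owner: str, force: bool = False) -> dict:
    # pinned files require an explicit confirmation before they can move
    if file.get("pinned") and not force:
        raise PinnedError("File is pinned; confirm the transfer explicitly")
    return {**file, "owner_id": new_owner}


f = {"id": "x", "owner_id": "user-1", "pinned": True}
try:
    transfer_ownership(f, "package-1")
except PinnedError as e:
    print(e)

moved = transfer_ownership(f, "package-1", force=True)
print(moved["owner_id"])   # package-1
```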

There are two ways to move pinned file:

"},{"location":"usage/use-in-browser/","title":"Usage in browser","text":"

You can upload files using JavaScript CKAN modules. ckanext-files extends CKAN's Sandbox object (available as this.sandbox inside a JS CKAN module), so we can use a shortcut and upload a file directly from DevTools. Open any CKAN page, switch to the JS console and create the sandbox instance. Inside it we have a files object, which in turn contains an upload method. This method accepts a File object for upload (the same object you can get from an input[type=file]).

sandbox = ckan.sandbox()\nawait sandbox.files.upload(\nnew File([\"content\"], \"file.txt\")\n)\n\n... {\n...     \"id\": \"18cdaa65-5eed-4078-89a8-469b137627ce\",\n...     \"name\": \"file.txt\",\n...     \"location\": \"b53907c3-8434-4dee-9a9e-6c4d3055d200\",\n...     \"content_type\": \"text/plain\",\n...     \"size\": 7,\n...     \"hash\": \"9a0364b9e99bb480dd25e1f0284c8555\",\n...     \"storage\": \"default\",\n...     \"ctime\": \"2024-06-02T16:12:27.902055+00:00\",\n...     \"mtime\": null,\n...     \"atime\": null,\n...     \"storage_data\": {}\n... }\n

If you are still using the FS storage configured in the previous section, switch to the /tmp/example folder and check its content:

ls /tmp/example\n... b53907c3-8434-4dee-9a9e-6c4d3055d200\n\ncat b53907c3-8434-4dee-9a9e-6c4d3055d200\n... content\n

And, as usual, let's remove the file using the ID from the upload promise:

sandbox.client.call(\"POST\", \"files_file_delete\", {\nid: \"18cdaa65-5eed-4078-89a8-469b137627ce\"\n})\n
"},{"location":"usage/use-in-code/","title":"Usage in code","text":"

If you are writing code and want to interact with the storage directly, without the API layer, you can do it via a number of public functions of the extension available in ckanext.files.shared.

Let's configure the filesystem storage first. The filesystem adapter has a mandatory path option that controls the filesystem location where files are stored. If the path does not exist, the storage will raise an exception by default, but it can also create the missing path if you enable the create_path option. Here's our final version of the settings:

ckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n

Now we are going to connect to the CKAN shell via the ckan shell CLI command and create an instance of the storage:

from ckanext.files.shared import get_storage\nstorage = get_storage()\n

Because you have all the configuration in place, the rest is fairly straightforward. We will upload a file, read its content and remove it, all from the CKAN shell.

To create the file, the storage.upload method must be called with 2 parameters:

You can use any string as the first parameter. As for the \"special stream-like object\", ckanext-files has the ckanext.files.shared.make_upload function, which accepts a number of different types (bytes, werkzeug.datastructures.FileStorage, BytesIO, file descriptor) and converts them into the expected format.

from ckanext.files.shared import make_upload\n\nupload = make_upload(b\"hello world\")\nresult = storage.upload('file.txt', upload)\n\nprint(result)\n\n... FileData(\n...     location='60b385e7-8137-496c-bb1d-6ae4d7963ab3',\n...     size=11,\n...     content_type='text/plain',\n...     hash='5eb63bbbe01eeed093cb22bb8f5acdc3',\n...     storage_data={}\n... )\n

result is an instance of the ckanext.files.shared.FileData dataclass. It contains all the information required by the storage to manage the file.

The result object has a location attribute that contains the name of the file relative to the path option specified in the storage configuration. If you visit the /tmp/example directory, which was set as the path for the storage, you'll see a file with a name matching the location from result. And its content matches the content of our upload, which is quite an expected outcome.

cat /tmp/example/60b385e7-8137-496c-bb1d-6ae4d7963ab3\n\n... hello world\n

But let's go back to the shell and try reading the file from Python code. We'll pass result to the storage's stream method, which produces an iterable of bytes based on our result:

buffer = storage.stream(result)\ncontent = b\"\".join(buffer)\n\n... b'hello world'\n

In most cases, the storage only needs the location of the file object to read it. So, if you don't have the result generated during the upload, you can still read the file as long as you have its location. But remember that some storage adapters may require additional information, so the following example must be adapted depending on the adapter:

from ckanext.files.shared import FileData\n\nlocation = \"60b385e7-8137-496c-bb1d-6ae4d7963ab3\"\ndata = FileData(location)\n\nbuffer = storage.stream(data)\ncontent = b\"\".join(buffer)\nprint(content)\n\n... b'hello world'\n

And finally we can remove the file:

storage.remove(result)\n
"}]} \ No newline at end of file