ckanext-files

Files as first-class citizens of CKAN. Upload, manage, and remove files directly and attach them to datasets, resources, etc.

Read the documentation for a full user guide.

Install the extension:

pip install ckanext-files

Add files to the ckan.plugins setting in your CKAN config file.

Run DB migrations:

ckan db upgrade -p files

Configure the storage:

ckanext.files.storage.default.type = files:fs
ckanext.files.storage.default.path = /tmp/example
ckanext.files.storage.default.create_path = true

Upload your first file:

ckanapi action files_file_create upload@~/Downloads/file.txt

Install dev extras and nodeJS dependencies:

pip install -e '.[dev]'
npm ci

Run unittests:

pytest

Run frontend tests:

# start test server in separate terminal
make test-server

# run tests
npx cypress run

Run typecheck:

npx pyright
files_file_create(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Create a new file.
This action passes uploaded file to the storage without strict validation. File is converted into standard upload object and everything else is controlled by storage. The same file may be uploaded to one storage and rejected by other, depending on configuration.
This action is way too powerful to use it directly. The recommended approach is to register a different action for handling specific type of uploads and call current action internally.
When uploading a real file(or using werkqeug.datastructures.FileStorage
), name parameter can be omited. In this case, the name of uploaded file is used.
ckanapi action files_file_create upload@path/to/file.txt\n
When uploading a raw content of the file using string or bytes object, name is mandatory.
ckanapi action files_file_create upload@<(echo -n \"hello world\") name=file.txt\n
Requires storage with CREATE
capability.
Params:

name: human-readable name of the file. Default: guess using upload field
storage: name of the storage that will handle the upload. Default: default
upload: content of the file as string, bytes, file descriptor or uploaded file

Returns:

dictionary with file details.
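For programmatic use inside an extension, the same action can be called through the plugins toolkit. A minimal sketch; the acting user, file name and content are illustrative:

import ckan.plugins.toolkit as tk

# Raw bytes are uploaded, so `name` is mandatory: there is no
# filename to guess it from.
result = tk.get_action("files_file_create")(
    {"user": "default"},  # illustrative context with the acting user
    {
        "name": "file.txt",
        "storage": "default",
        "upload": b"hello world",
    },
)
print(result["id"], result["size"], result["content_type"])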
"},{"location":"api/#files_file_deletecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_delete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Remove file from storage.
Unlike packages, file has no state
field. Removal usually means that file details removed from DB and file itself removed from the storage.
Some storage can implement revisions of the file and keep archived versions or backups. Check storage documentation if you need to know whether there are chances that file is not completely removed with this operation.
Requires storage with REMOVE
capability.
ckanapi action files_file_delete id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n
Params:

id: ID of the file
completed: use False to remove incomplete uploads. Default: True

Returns:

dictionary with details of the removed file.
"},{"location":"api/#files_file_pincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_pin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Pin file to the current owner.
Pinned file cannot be transfered to a different owner. Use it to guarantee that file referred by entity is not accidentally transferred to a different owner.
Params:

id: ID of the file
completed: use False to pin incomplete uploads. Default: True

Returns:

dictionary with details of updated file
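A minimal sketch of calling the action from an extension; the file ID is reused from the examples above:

import ckan.plugins.toolkit as tk

# Pin the file so that files_transfer_ownership fails for it
# unless called with force=True.
tk.get_action("files_file_pin")(
    {"user": "default"},  # illustrative context
    {"id": "226056e2-6f83-47c5-8bd2-102e2b82ab9a"},
)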
"},{"location":"api/#files_file_renamecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_rename(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Rename the file.
This action changes human-readable name of the file, which is stored in DB. Real location of the file in the storage is not modified.
ckanapi action files_file_show \\\n id=226056e2-6f83-47c5-8bd2-102e2b82ab9a \\\n name=new-name.txt\n
Params:

id: ID of the file
name: new name of the file
completed: use False to rename incomplete uploads. Default: True

Returns:

dictionary with file details
"},{"location":"api/#files_file_scancontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_scan(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"List files of the owner
This action internally calls files_file_search, but with static values of owner filters. If owner is not specified, files filtered by current user. If owner is specified, user must pass authorization check to see files.
Params:

owner_id: ID of the owner
owner_type: type of the owner

All other parameters are passed as-is to files_file_search.
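A hedged sketch of calling the action from Python; the owner ID is illustrative and rows is one of the parameters forwarded to files_file_search:

import ckan.plugins.toolkit as tk

# List up to 20 files owned by a specific user.
result = tk.get_action("files_file_scan")(
    {"user": "default"},  # illustrative context
    {
        "owner_id": "59ea0f6c-5c2f-438d-9d2e-e045be9a2beb",
        "owner_type": "user",
        "rows": 20,
    },
)
print(result["count"], [f["name"] for f in result["results"]])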
Returns:

count: total number of files matching filters
results: array of dictionaries with file details.

files_file_search(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Search files.
This action is not stabilized yet and will change in future.
Provides an ability to search files using exact filter by name, content_type, size, owner, etc. Results are paginated and returned in package_search manner, as dict with count
and results
items.
All columns of File model can be used as filters. Before the search, type of column and type of filter value are compared. If they are the same, original values are used in search. If type different, column value and filter value are casted to string.
This request produces the size = 10 SQL expression:

ckanapi action files_file_search size:10

This request produces the size::text = '10' SQL expression:

ckanapi action files_file_search size=10

Even though the results are usually the same, using correct types leads to a more efficient search.
Apart from File columns, the following Owner properties can be used for searching: owner_id, owner_type, pinned.

storage_data and plugin_data are dictionaries. The filter's value for these fields is used as a mask. For example, storage_data={"a": {"b": 1}} matches any File whose storage_data contains an item a with a value that contains b=1. This works only with data represented by nested dictionaries, without other structures like lists or sets.
Experimental feature: File columns can be passed as a pair of operator and value. This feature will be replaced by a strictly defined query language at some point:

ckanapi action files_file_search size:'["<", 100]' content_type:'["like", "text/%"]'

The following operators are accepted: =, <, >, !=, like.
Params:

start: index of the first row in result / number of rows to skip. Default: 0
rows: number of rows to return. Default: 10
sort: name of the File column used for sorting. Default: name
reverse: sort results in descending order. Default: False
storage_data: mask for the storage_data column. Default: {}
plugin_data: mask for the plugin_data column. Default: {}
owner_id: show only the specific owner ID if present. Default: None
owner_type: show only the specific owner type if present. Default: None
pinned: show only pinned/unpinned items if present. Default: None
completed: use False to search incomplete uploads. Default: True

Returns:

count: total number of files matching filters
results: array of dictionaries with file details.

files_file_search_by_user(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'

Internal action. Do not use it.
"},{"location":"api/#files_file_showcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_show(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Show file details.
This action only displays information from DB record. There is no way to get the content of the file using this action(or any other API action).
ckanapi action files_file_show id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n
Params:

id: ID of the file
completed: use False to show incomplete uploads. Default: True

Returns:

dictionary with file details
"},{"location":"api/#files_file_unpincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_unpin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Pin file to the current owner.
Pinned file cannot be transfered to a different owner. Use it to guarantee that file referred by entity is not accidentally transferred to a different owner.
Params:

id: ID of the file
completed: use False to unpin incomplete uploads. Default: True

Returns:

dictionary with details of updated file
"},{"location":"api/#files_multipart_completecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_complete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Finalize multipart upload and transform it into completed file.
Depending on storage this action may require additional parameters. But usually it just takes ID and verify that content type, size and hash provided when upload was initialized, much the actual value.
If data is valid and file is completed inside the storage, new File entry with file details created in DB and file can be used just as any normal file.
Requires storage with MULTIPART
capability.
Params:

id: ID of the incomplete upload

Returns:

dictionary with details of the created file
"},{"location":"api/#files_multipart_refreshcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_refresh(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Refresh details of incomplete upload.
Can be used if upload process was interrupted and client does not how many bytes were already uploaded.
Requires storage with MULTIPART
capability.
Params:

id: ID of the incomplete upload

Returns:

dictionary with details of the updated upload
"},{"location":"api/#files_multipart_startcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_start(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Initialize multipart(resumable,continuous,signed,etc) upload.
Apart from standard parameters, different storages can require additional data, so always check documentation of the storage before initiating multipart upload.
When upload initialized, storage usually returns details required for further upload. It may be a presigned URL for direct upload, or just an ID of upload which must be used with files_multipart_update
.
Requires storage with MULTIPART
capability.
Params:

storage: name of the storage that will handle the upload. Default: default
name: name of the uploaded file
content_type: MIMEtype of the uploaded file. Used for validation
size: expected size of upload. Used for validation
hash: expected content hash. If present, used for validation

Returns:

dictionary with details of initiated upload. Depends on used storage
"},{"location":"api/#files_multipart_updatecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_update(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Update incomplete upload.
Depending on storage this action may require additional parameters. Most likely, upload
with the fragment of uploaded file.
Requires storage with MULTIPART
capability.
Params:

id: ID of the incomplete upload

Returns:

dictionary with details of the updated upload
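The three multipart actions are normally chained together. Below is a hedged sketch of the whole flow, assuming a storage that accepts content via files_multipart_update; the exact fields returned by each step depend on the storage, so only id is relied upon:

import ckan.plugins.toolkit as tk

context = {"user": "default"}  # illustrative context
content = b"x" * 1024

# 1. Initialize the upload; size/content_type are used for validation.
info = tk.get_action("files_multipart_start")(
    context,
    {
        "storage": "default",
        "name": "big.bin",
        "content_type": "application/octet-stream",
        "size": len(content),
    },
)

# 2. Send the content; some storages expect several calls, each
# with the next fragment of the file.
tk.get_action("files_multipart_update")(
    context,
    {"id": info["id"], "upload": content},
)

# 3. Finalize: the incomplete upload becomes a regular file.
file_info = tk.get_action("files_multipart_complete")(context, {"id": info["id"]})
print(file_info["id"])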
"},{"location":"api/#files_resource_uploadcontext-context-data_dict-dictstr-any","title":"files_resource_upload(context: 'Context', data_dict: 'dict[str, Any]')
","text":"Create a new file inside resource storage.
This action internally calls files_file_create
with ignore_auth=True
and always uses resources storage.
New file is not attached to resource. You need to call files_transfer_ownership
manually, when resource created.
Params:

name: human-readable name of the file. Default: guess using upload field
upload: content of the file as string, bytes, file descriptor or uploaded file

Returns:

dictionary with file details.
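A hedged sketch of using the action from Python; the context and file content are illustrative:

import ckan.plugins.toolkit as tk

# Upload into the resources storage; the file is not attached to
# any resource yet and must be transferred once the resource exists.
info = tk.get_action("files_resource_upload")(
    {"user": "default"},  # illustrative context
    {"name": "data.csv", "upload": b"a,b\n1,2\n"},
)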
"},{"location":"api/#files_transfer_ownershipcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_transfer_ownership(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Transfer file ownership.
Depending on storage this action may require additional parameters. Most likely, upload
with the fragment of uploaded file.
Params:

id: ID of the file upload
completed: use False to transfer incomplete uploads. Default: True
owner_id: ID of the new owner
owner_type: type of the new owner
force: move file even if it's pinned. Default: False
pin: pin file after transfer to stop future transfers. Default: False

Returns:

dictionary with details of updated file
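A hedged sketch that attaches an uploaded file to a dataset and pins it; FILE_ID and PACKAGE_ID are placeholders:

import ckan.plugins.toolkit as tk

# Make the dataset the owner of the file and pin it, so further
# transfers require force=True.
tk.get_action("files_transfer_ownership")(
    {"user": "default"},  # illustrative context
    {
        "id": "FILE_ID",           # placeholder
        "owner_type": "package",
        "owner_id": "PACKAGE_ID",  # placeholder
        "pin": True,
    },
)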
"},{"location":"changelog/","title":"Changelog","text":"All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
"},{"location":"changelog/#unreleased","title":"Unreleased","text":"Compare with latest
"},{"location":"changelog/#features","title":"Features","text":"Compare with v0.3.0
"},{"location":"changelog/#features_1","title":"Features","text":"Compare with v0.2.6
"},{"location":"changelog/#features_2","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#bug-fixes_1","title":"Bug Fixes","text":"Compare with v0.2.4
"},{"location":"changelog/#bug-fixes_2","title":"Bug Fixes","text":"Compare with v0.2.3
"},{"location":"changelog/#features_3","title":"Features","text":"Compare with v0.2.2
"},{"location":"changelog/#features_4","title":"Features","text":"Compare with v0.2.1
"},{"location":"changelog/#v021-2024-03-18","title":"v0.2.1 - 2024-03-18","text":"Compare with v0.2.0
"},{"location":"changelog/#features_5","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#features_6","title":"Features","text":"Compare with v0.0.4
"},{"location":"changelog/#bug-fixes_4","title":"Bug Fixes","text":"Compare with v0.0.2
"},{"location":"changelog/#v002-2022-02-09","title":"v0.0.2 - 2022-02-09","text":"Compare with v0.0.1
"},{"location":"changelog/#v001-2021-09-21","title":"v0.0.1 - 2021-09-21","text":"Compare with first commit
"},{"location":"cli/","title":"CLI","text":"ckanext-files register files
entrypoint under ckan
command. Commands below must be executed as ckan -c $CKAN_INI files <COMMAND>
.
adapters [-v]

List all available storage adapters. With the -v/--verbose flag, docstrings from the adapter classes are printed as well.

storages [-v]

List all configured storages. With the -v/--verbose flag, all supported capabilities are shown.
stream FILE_ID [-o OUTPUT] [--start START] [--end END]

Stream content of the file to STDOUT. For non-textual files use output redirection: stream ID > file.ext. Alternatively, the output destination can be specified via the -o/--output option. If it contains a path to a directory, a file with the same name as the streamed item is created inside this directory. Otherwise, OUTPUT is used as the filename.

--start and --end can be used to receive a fragment of the file. Only positive values are guaranteed to work with any storage that supports STREAM. Some storages support negative values for these options and count them from the end of the file: --start -10 reads the last 10 bytes of the file, and --end -1 reads up to the last byte, but the last byte itself is not included in the output.
scan [-s default] [-u] [-t [-a OWNER_ID]]

List all files that exist in the storage. Works only if the storage supports SCAN. By default shows the content of the default storage; the -s/--storage-name option changes the target storage.

The -u/--untracked-only flag shows only untracked files that have no corresponding record in the DB. Can be used to identify leftovers after removing data from the portal.

The -t/--track flag registers any untracked file by creating a DB record for it. Can be used only when ANALYZE is supported. Files are created without an owner. Use the -a/--adopt-by option with a user ID to give ownership over the new files to the specified user. This can be used when configuring a new storage connected to an existing location with files.
A storage consists of the storage object that dispatches operation requests and 3 services that do the actual job: Reader, Uploader and Manager. To define a custom storage, you need to extend the main storage class, describe the storage logic and register the storage via IFiles.files_get_storage_adapters.

Let's implement a DB storage. It will store files in an SQL table using SQLAlchemy. There is just one requirement for the table: it must have a column for storing the unique identifier of the file and another column for storing the content of the file as bytes.
For the sake of simplicity, our storage will work only with existing tables. Create the table manually before we begin.
First of all, we create an adapter that does nothing and register it in our plugin.
from __future__ import annotations

from typing import Any

import sqlalchemy as sa

import ckan.plugins as p
from ckan.model.types import make_uuid
from ckanext.files import shared


class ExamplePlugin(p.SingletonPlugin):
    p.implements(shared.IFiles)

    def files_get_storage_adapters(self) -> dict[str, Any]:
        return {"example:db": DbStorage}


class DbStorage(shared.Storage):
    ...
After installing and enabling your custom plugin, you can configure a storage with this adapter by adding a single new line to the config file:

ckanext.files.storage.db.type = example:db
But if you check the storage via ckan files storages -v, you'll see that it can't do anything:

ckan files storages -v

... db: example:db
...     Supports: Capability.NONE
...     Does not support: Capability.REMOVE|STREAM|CREATE|...
Before we start uploading files, let's make sure that the storage has a proper configuration. As files will be stored in a DB table, we need the name of the table and a DB connection string. Let's assume that the table already exists, but we don't know which columns to use for files. So we need the name of the column for the content and for the file's unique identifier. ckanext-files uses the term location instead of identifier, so we'll do the same in our implementation.

There are 4 required options in total:

* db_url: DB connection string
* table: name of the table
* location_column: name of the column for the file's unique identifier
* content_column: name of the column for the file's content
It's not mandatory, but highly recommended, to declare config options for the adapter. It can be done via the Storage.declare_config_options class method, which accepts a declaration object and a key namespace for the storage options.

class DbStorage(shared.Storage):

    @classmethod
    def declare_config_options(cls, declaration, key) -> None:
        declaration.declare(key.db_url).required()
        declaration.declare(key.table).required()
        declaration.declare(key.location_column).required()
        declaration.declare(key.content_column).required()
And we probably want to initialize the DB connection when the storage is initialized. For this we'll extend the constructor, which must be defined as a method accepting keyword-only arguments:

class DbStorage(shared.Storage):
    ...

    def __init__(self, **settings: Any) -> None:
        db_url = self.ensure_option(settings, "db_url")

        self.engine = sa.create_engine(db_url)
        self.location_column = sa.column(
            self.ensure_option(settings, "location_column")
        )
        self.content_column = sa.column(self.ensure_option(settings, "content_column"))
        self.table = sa.table(
            self.ensure_option(settings, "table"),
            self.location_column,
            self.content_column,
        )
        super().__init__(**settings)
You may notice that we are using Storage.ensure_option quite often. This method returns the value of the specified option from the settings or raises an exception.

The table definition and columns are saved as storage attributes to simplify building SQL queries in the future.

Now we are going to define classes for all 3 storage services and tell the storage how to initialize these services.
There are 3 services: Reader, Uploader and Manager. Each of them is initialized via the corresponding storage method: make_reader, make_uploader and make_manager. And each of them accepts a single argument during creation: the storage itself.

class DbStorage(shared.Storage):
    def make_reader(self):
        return DbReader(self)

    def make_uploader(self):
        return DbUploader(self)

    def make_manager(self):
        return DbManager(self)


class DbReader(shared.Reader):
    ...


class DbUploader(shared.Uploader):
    ...


class DbManager(shared.Manager):
    ...
Our first target is the Uploader service. It's responsible for file creation. For the minimal implementation it needs an upload method and a capabilities attribute which tells the storage what exactly the Uploader can do.

class DbUploader(shared.Uploader):
    capabilities = shared.Capability.CREATE

    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:
        ...
upload receives the location (name) of the uploaded file; the upload object with the file's content; and an extras dictionary that contains any additional arguments that can be passed to the uploader. We are going to ignore location and generate a unique UUID for every uploaded file instead of using the user-defined filename.

The goal is to write the file into the DB and return shared.FileData that contains the location of the file in the DB (the value of location_column), the size of the file in bytes, the MIMEtype of the file and the hash of the file content.
For the location we'll just use the ckan.model.types.make_uuid function. Size and MIMEtype are already available as upload.size and upload.content_type.

The only problem is the hash of the content. You can compute it in any way you like, but there is a simple option if you have no preferences. upload has a hashing_reader method, which returns an iterable over the file content. When you read the file through it, the content hash is automatically computed and you can get it using the get_hash method of the reader.
Just make sure to read the whole file before checking the hash, because the hash is computed from the consumed content. I.e., if you just create the hashing reader but do not read a single byte from it, you'll receive the hash of an empty string. If you read just 1 byte, you'll receive the hash of this single byte, etc.

The easiest option is to call the reader.read() method to consume the whole file and then call reader.get_hash() to receive the hash.
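A small sketch of this behavior, built with make_upload from ckanext.files.shared:

from ckanext.files import shared

upload = shared.make_upload(b"hello world")
reader = upload.hashing_reader()

print(reader.get_hash())  # hash of the consumed content so far: empty string
reader.read()             # consume the whole file
print(reader.get_hash())  # now the hash of b"hello world"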
Here's the final implementation of DbUploader:
class DbUploader(shared.Uploader):
    capabilities = shared.Capability.CREATE

    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:
        uuid = make_uuid()
        reader = upload.hashing_reader()

        values = {
            self.storage.location_column: uuid,
            self.storage.content_column: reader.read(),
        }
        stmt = sa.insert(self.storage.table, values)

        result = self.storage.engine.execute(stmt)

        return shared.FileData(
            uuid,
            upload.size,
            upload.content_type,
            reader.get_hash(),
        )
Now you can upload a file into your new db storage:

ckanapi action files_file_create storage=db name=hello.txt upload@<(echo -n 'hello world')

...{
...  "atime": null,
...  "content_type": "text/plain",
...  "ctime": "2024-06-17T13:48:52.121755+00:00",
...  "hash": "5eb63bbbe01eeed093cb22bb8f5acdc3",
...  "id": "bdfc0268-d36d-4f1b-8a03-2f2aaa21de24",
...  "location": "5a4472b3-cf38-4c58-81a6-4d4acb7b170e",
...  "mtime": null,
...  "name": "hello.txt",
...  "owner_id": "59ea0f6c-5c2f-438d-9d2e-e045be9a2beb",
...  "owner_type": "user",
...  "pinned": false,
...  "size": 11,
...  "storage": "db",
...  "storage_data": {}
...}
The file is created, but you cannot read it just yet. Try running the ckan files stream CLI command with the file ID:

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24

... Operation stream is not supported by db storage
... Aborted!
As expected, you have to write extra code.
Streaming, reading and generating links is the responsibility of the Reader service. We only need the stream method for the minimal implementation. This method receives a shared.FileData object (the same object as the one returned from Uploader.upload) and extras containing all additional arguments passed by the caller. The result is any iterable producing bytes.

We'll use the location property of shared.FileData as the value for location_column inside the table.

And don't forget to add the STREAM capability to Reader.capabilities.
from typing import Any, Iterable


class DbReader(shared.Reader):
    capabilities = shared.Capability.STREAM

    def stream(self, data: shared.FileData, extras: dict[str, Any]) -> Iterable[bytes]:
        stmt = (
            sa.select(self.storage.content_column)
            .select_from(self.storage.table)
            .where(self.storage.location_column == data.location)
        )
        row = self.storage.engine.execute(stmt).fetchone()

        return row
The result may be confusing: we are returning a Row object from the stream method. But our goal is to return any iterable that produces bytes. Row is iterable (tuple-like), and it contains only one item: the value of the column with the file content, i.e. bytes. So it satisfies the requirements.

Now you can check the content via the CLI once again:

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24

... hello world
Finally, we need to add file removal for the minimal implementation. It is also nice to have the SCAN capability, as it shows all files currently available in the storage, so we add it as a bonus. These operations are handled by the Manager. We need remove and scan methods. The arguments are already familiar to you. As for the results:
* remove: return True if the file was successfully removed. Should return False if the file does not exist, but it's allowed to return True as long as you are not checking the result.
* scan: return an iterable with all file locations.

class DbManager(shared.Manager):
    storage: DbStorage
    capabilities = shared.Capability.SCAN | shared.Capability.REMOVE

    def scan(self, extras: dict[str, Any]) -> Iterable[str]:
        stmt = sa.select(self.storage.location_column).select_from(self.storage.table)
        for row in self.storage.engine.execute(stmt):
            yield row[0]

    def remove(
        self,
        data: shared.FileData | shared.MultipartData,
        extras: dict[str, Any],
    ) -> bool:
        stmt = sa.delete(self.storage.table).where(
            self.storage.location_column == data.location,
        )
        self.storage.engine.execute(stmt)
        return True
Now you can list all the files in the storage:

ckan files scan -s db

And remove the file using ckanapi and the file ID:

ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24
That's all you need for the basic storage. But check the definition of the base storage and services to find details about other methods. And also check the implementation of other storages for additional ideas.
"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#requirements","title":"Requirements","text":"Compatibility with core CKAN versions:
CKAN version Compatible? 2.9 no 2.10 yes 2.11 yes master yesNote
It's recommended to install the extension via pip. If you are using GitHub version of the extension, stick to the vX.Y.Z tags to avoid breaking changes. Check the changelog before upgrading the extension.
"},{"location":"installation/#installation_1","title":"Installation","text":"Install the extension
pip install ckanext-files # (1)!\n
pip install ckanext-files[opendal,libcloud]\n
Add files
to the ckan.plugins
setting in your CKAN config file.
Run DB migrations
ckan db upgrade -p files\n
"},{"location":"interfaces/","title":"Interfaces","text":""},{"location":"interfaces/#interfaces","title":"Interfaces","text":"ckanext-files registers ckanext.files.shared.IFiles
interface. As extension is actively developed, this interface may change in future. Always use inherit=True
when implementing IFiles
.
class IFiles(Interface):
    """Extension point for ckanext-files."""

    def files_get_storage_adapters(self) -> dict[str, Any]:
        """Return mapping of storage type to adapter class.

        Example:
            >>> def files_get_storage_adapters(self):
            >>>     return {
            >>>         "my_ext:dropbox": DropboxStorage,
            >>>     }

        """

        return {}

    def files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:
        """Return mapping with lookup functions for owner types.

        Name of the getter is the name used as `Owner.owner_type`. The getter
        itself is a function that accepts owner ID and returns optional owner
        entity.

        Example:
            >>> def files_register_owner_getters(self):
            >>>     return {"resource": model.Resource.get}
        """
        return {}

    def files_file_allows(
        self,
        context: types.Context,
        file: File | Multipart,
        operation: types.FileOperation,
    ) -> bool | None:
        """Decide if user is allowed to perform specified operation on the file.

        Return True/False if user allowed/not allowed. Return `None` to rely on
        other plugins.

        Default implementation relies on cascade_access config option. If owner
        of file is included into cascade access, user can perform operation on
        file if he can perform the same operation with file's owner.

        If current owner is not affected by cascade access, user can perform
        operation on file only if user owns the file.

        Example:
            >>> def files_file_allows(
            >>>     self, context,
            >>>     file: shared.File | shared.Multipart,
            >>>     operation: shared.types.FileOperation
            >>> ) -> bool | None:
            >>>     if file.owner_info and file.owner_info.owner_type == "resource":
            >>>         return is_authorized_boolean(
            >>>             f"resource_{operation}",
            >>>             context,
            >>>             {"id": file.owner_info.id}
            >>>         )
            >>>
            >>>     return None

        """
        return None

    def files_owner_allows(
        self,
        context: types.Context,
        owner_type: str,
        owner_id: str,
        operation: types.OwnerOperation,
    ) -> bool | None:
        """Decide if user is allowed to perform specified operation on the owner.

        Return True/False if user allowed/not allowed. Return `None` to rely on
        other plugins.

        Example:
            >>> def files_owner_allows(
            >>>     self, context,
            >>>     owner_type: str, owner_id: str,
            >>>     operation: shared.types.OwnerOperation
            >>> ) -> bool | None:
            >>>     if owner_type == "resource" and operation == "file_transfer":
            >>>         return is_authorized_boolean(
            >>>             f"resource_update",
            >>>             context,
            >>>             {"id": owner_id}
            >>>         )
            >>>
            >>>     return None

        """
        return None
"},{"location":"primer/","title":"Welcome to MkDocs","text":"For full documentation visit mkdocs.org{ data-preview }
Attribute Lists{ data-preview }
Some title
Some content
Some title
Some content
Open styled details Nested details!And more content again.
theme:\nfeatures:\n- content.code.annotate # (1)!\n
code
, formatted text, images, ... basically anything that can be written in Markdown.#include <stdio.h>\n\nint main(void) {\nprintf(\"Hello world!\\n\");\nreturn 0;\n}\n
C++ #include <iostream>\n\nint main(void) {\nstd::cout << \"Hello world!\" << std::endl;\nreturn 0;\n}\n
graph LR\nA[Start] --> B{Error?};\nB -->|Yes| C[Hmm...];\nC --> D[Debug];\nD --> B;\nB ---->|No| E[Yay!];
sequenceDiagram\nautonumber\nAlice->>John: Hello John, how are you?\nloop Healthcheck\nJohn->>John: Fight against hypochondria\nend\nNote right of John: Rational thoughts!\nJohn-->>Alice: Great!\nJohn->>Bob: How about you?\nBob-->>John: Jolly good!
```py title=\"IFiles\" class IFiles(Interface): \"\"\"Extension point for ckanext-files.\"\"\"
def files_get_storage_adapters(self) -> dict[str, Any]:\n \"\"\"Return mapping of storage type to adapter class.\n\n Example:\n >>> def files_get_storage_adapters(self):\n >>> return {\n >>> \"my_ext:dropbox\": DropboxStorage,\n >>> }\n\n \"\"\"\n\n return {}\n\ndef files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n \"\"\"Return mapping with lookup functions for owner types.\n\n Name of the getter is the name used as `Owner.owner_type`. The getter\n itself is a function that accepts owner ID and returns optional owner\n entity.\n\n Example:\n >>> def files_register_owner_getters(self):\n >>> return {\"resource\": model.Resource.get}\n \"\"\"\n return {}\n\ndef files_file_allows(\n self,\n context: types.Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Default implementation relies on cascade_access config option. If owner\n of file is included into cascade access, user can perform operation on\n file if he can perform the same operation with file's owner.\n\n If current owner is not affected by cascade access, user can perform\n operation on file only if user owns the file.\n\n Example:\n >>> def files_file_allows(\n >>> self, context,\n >>> file: shared.File | shared.Multipart,\n >>> operation: shared.types.FileOperation\n >>> ) -> bool | None:\n >>> if file.owner_info and file.owner_info.owner_type == \"resource\":\n >>> return is_authorized_boolean(\n >>> f\"resource_{operation}\",\n >>> context,\n >>> {\"id\": file.owner_info.id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\ndef files_owner_allows(\n self,\n context: types.Context,\n owner_type: str,\n owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Example:\n >>> def files_owner_allows(\n >>> self, context,\n >>> owner_type: str, owner_id: str,\n >>> operation: shared.types.OwnerOperation\n >>> ) -> bool | None:\n >>> if owner_type == \"resource\" and operation == \"file_transfer\":\n >>> return is_authorized_boolean(\n >>> f\"resource_update\",\n >>> context,\n >>> {\"id\": owner_id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\n\n\n ```\n\n === \"Hello\"\n\n world\n\n === \"bye\"\n\n world\n
"},{"location":"shared/","title":"Shared","text":"All public utilites are collected inside ckanext.files.shared
module. Avoid using anything that is not listed there. Do not import anything from modules other than shared
.
get_storage(name: 'str | None' = None) -> 'Storage'

Return existing storage instance.

Storages are initialized when the plugin is loaded. As a result, this function always returns the same storage object for the given name.

If no name is specified, the default storage is returned.

Example:

default_storage = get_storage()
storage = get_storage("storage name")
"},{"location":"shared/#make_storagename-str-settings-dictstr-any-storage","title":"make_storage(name: 'str', settings: 'dict[str, Any]') -> 'Storage'
","text":"Initialize storage instance with specified settings.
Storage adapter is defined by type
key of the settings. All other settings depend on the specific adapter.
Example:
storage = make_storage(\"memo\", {\"type\": \"files:redis\"})\n
"},{"location":"shared/#make_uploadvalue-typesuploadable-upload-upload","title":"make_upload(value: 'types.Uploadable | Upload') -> 'Upload'
","text":"Convert value into Upload object
Use this function for simple and reliable initialization of Upload object. Avoid creating Upload manually, unless you are 100% sure you can provide correct MIMEtype, size and stream.
Example:
storage.upload(\"file.txt\", make_upload(b\"hello world\"))\n
"},{"location":"shared/#with_task_queuefunc-any-name-str-none-none","title":"with_task_queue(func: 'Any', name: 'str | None' = None)
","text":"Decorator for functions that schedule tasks.
Decorated function automatically initializes separate task queue that is processed when function is finished. All tasks receive function's result as execution data(first argument to Task.run).
Without this decorator, you have to manually create task queue context before queuing tasks.
Example:
@with_task_queue\ndef my_action(context, data_dict):\n ...\n
"},{"location":"shared/#add_tasktask-task","title":"add_task(task: 'Task')
","text":"Add task to the current task queue.
This function can be called only inside task queue context. Such context initialized automatically inside functions decorated with with_task_queue
:
@with_task_queue\ndef taks_producer():\n add_task(...)\n\ntask_producer()\n
If task queue context can be initialized manually using TaskQueue and with
statement:
queue = TaskQueue()\nwith queue:\n add_task(...)\n\nqueue.process(execution_data)\n
"},{"location":"upload-strategies/","title":"File upload strategies","text":"There is no \"right\" way to add file to entity via ckanext-files. Everything depends on your use-case and here you can find a few different ways to combine file and arbitrary entity.
"},{"location":"upload-strategies/#attach-existing-file-and-then-transfer-ownership-via-api","title":"Attach existing file and then transfer ownership via API","text":"The simplest option is just saving file ID inside a field of the entity. It's recommended to transfer file ownership to the entity and pin the file.
ckanapi action package_patch id=PACKAGE_ID attachment_id=FILE_ID\n\nckanapi action files_transfer_ownership id=FILE_ID \\\n owner_type=package owner_id=PACKAGE_ID pin=true\n
Pros: * simple and transparent
Cons: * it's easy to forget about ownership transfer and leave the entity with the inaccessible file * after entity got reference to file and before ownership is transfered data may be considered invalid.
"},{"location":"upload-strategies/#automatically-transfer-ownership-using-validator","title":"Automatically transfer ownership using validator","text":"Add files_transfer_ownership(owner_type)
to the validation schema of entity. When it validated, ownership transfer task is queued and file automatically transfered to the entity after the update.
Pros:

* minimal amount of changes if the metadata schema is already modified
* relationships between owner and file are up-to-date after any modification

Cons:

* works only with files uploaded in advance and cannot handle the native implementation of the resource form
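A minimal sketch of this strategy for a dataset schema; the attachment_id field name is illustrative, and the validator signature follows the validators reference (owner type plus the name of the field that holds the owner ID):

import ckan.plugins.toolkit as tk


def _modify_package_schema(schema):
    # "attachment_id" is an illustrative field that stores the file ID.
    schema["attachment_id"] = [
        tk.get_validator("ignore_empty"),
        tk.get_validator("files_file_id_exists"),
        # transfer the file to the package once the action succeeds
        tk.get_validator("files_transfer_ownership")("package", "id"),
    ]
    return schema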
"},{"location":"upload-strategies/#upload-file-and-assign-owner-via-queued-task","title":"Upload file and assign owner via queued task","text":"Add a field that accepts uploaded file. The action itself does not process the upload. Instead create a validator for the upload field, that will schedule a task for file upload and ownership transfer.
In this way, if action is failed, no upload happens and you don't need to do anything with the file, as it never left server's temporal directory. If action finished without an error, the task is executed and file uploaded/attached to action result.
Pros: * can be used together with native group/user/resource form after small modification of CKAN core. * handles upload inside other action as an atomic operation
Cons: * you have to validate file before upload happens to prevent situation when action finished successfully but then upload failed because of file's content type or size. * tasks themselves are experimental and it's not recommended to put a lot of logic into them * there are just too many things that can go wrong
"},{"location":"upload-strategies/#add-a-new-action-that-combines-uploads-modifications-and-ownership-transfer","title":"Add a new action that combines uploads, modifications and ownership transfer","text":"If you want to add attachmen to dataset, create a separate action that accepts dataset ID and uploaded file. Internally it will upload the file by calling files_file_create
, then update dataset via packaage_patch
and finally transfer ownership via files_transfer_ownership
.
Pros:

* no magic: everything is described in the new action
* can be extracted into a shared extension and used across multiple portals

Cons:

* if you need to upload multiple files and update multiple fields, the action quickly becomes too complicated
* integration with existing workflows, like dataset/resource creation, is hard; you have to override existing views or create brand new ones
"},{"location":"validators/","title":"Validators","text":"Validator Effect files_into_upload Transform value of field(usually file uploaded via<input type=\"file\">
) into upload object using ckanext.files.shared.make_upload
files_parse_filesize Convert human-readable filesize(1B, 10MiB, 20GB) into an integer files_ensure_name(name_field) If name_field
is empty, copy into it filename from current field. Current field must be processed with files_into_upload
first files_file_id_exists Verify that file ID exists files_accept_file_with_type(*type) Verify that file ID refers to file with one of specified types. As a type can be used full MIMEtype(image/png
), or just its main(image
) or secondary(png
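A hedged sketch of combining these validators for a direct-upload field; the field names are illustrative:

import ckan.plugins.toolkit as tk


def _modify_schema(schema):
    # "attachment" receives the uploaded file, "attachment_name" its name.
    schema["attachment"] = [
        tk.get_validator("ignore_empty"),
        tk.get_validator("files_into_upload"),
        # copy the filename into "attachment_name" when it's empty;
        # must run after files_into_upload
        tk.get_validator("files_ensure_name")("attachment_name"),
    ]
    return schema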
Configuration

There are two types of config options for ckanext-files: global options and storage-specific options.
Depending on the type of the storage, the available options are quite different. For example, the files:fs storage type requires a path option that controls the filesystem path where uploads are stored. The files:redis storage type accepts a prefix option that defines the Redis key prefix of files stored in Redis. All storage-specific options always have the form ckanext.files.storage.<STORAGE>.<OPTION>:

ckanext.files.storage.memory.prefix = xxx:
# or
ckanext.files.storage.my_drive.path = /tmp/hello
"},{"location":"configuration/fs/","title":"Filesystem storage configuration","text":"Private filesystem storage
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n
Public filesystem storage
## Storage adapter used by the storage
ckanext.files.storage.NAME.type = files:public_fs
## Path to the folder where uploaded data will be stored.
ckanext.files.storage.NAME.path =
## Create storage folder if it does not exist.
ckanext.files.storage.NAME.create_path = false
## Use this flag if files can be stored inside subfolders
## of the main storage path.
ckanext.files.storage.NAME.recursive = false
## URL of the storage folder. `public_root + location` must produce a public URL
ckanext.files.storage.NAME.public_root =
"},{"location":"configuration/global/","title":"Global configuration","text":"# Default storage used for upload when no explicit storage specified\n# (optional, default: default)\nckanext.files.default_storage = default\n\n# MIMEtypes that can be served without content-disposition:attachment header.\n# (optional, default: application/pdf image video)\nckanext.files.inline_content_types = application/pdf image video\n\n# Storage used for user image uploads. When empty, user image uploads are not\n# allowed.\n# (optional, default: user_images)\nckanext.files.user_images_storage = user_images\n\n# Storage used for group image uploads. When empty, group image uploads are\n# not allowed.\n# (optional, default: group_images)\nckanext.files.group_images_storage = group_images\n\n# Storage used for resource uploads. When empty, resource uploads are not\n# allowed.\n# (optional, default: resources)\nckanext.files.resources_storage = resources\n\n# Enable HTML templates and JS modules required for unsafe default\n# implementation of resource uploads via files. IMPORTANT: this option exists\n# to simplify migration and experiments with the extension. These templates\n# may change a lot or even get removed in the public release of the\n# extension.\n# (optional, default: false)\nckanext.files.enable_resource_migration_template_patch = false\n\n# Any authenticated user can upload files.\n# (optional, default: false)\nckanext.files.authenticated_uploads.allow = false\n\n# Names of storages that can by used by non-sysadmin users when authenticated\n# uploads enabled\n# (optional, default: default)\nckanext.files.authenticated_uploads.storages = default\n\n# List of owner types that grant access on owned file to anyone who has\n# access to the owner of file. For example, if this option has value\n# `resource package`, anyone who passes `resource_show` auth, can see all\n# files owned by resource; anyone who passes `package_show`, can see all\n# files owned by package; anyone who passes\n# `package_update`/`resource_update` can modify files owned by\n# package/resource; anyone who passes `package_delete`/`resource_delete` can\n# delete files owned by package/resoure. IMPORTANT: Do not add `user` to this\n# list. Files may be temporarily owned by user during resource creation.\n# Using cascade access rules with `user` exposes such temporal files to\n# anyone who can read user's profile.\n# (optional, default: package resource group organization)\nckanext.files.owner.cascade_access = package resource group organization\n\n# Use `<OWNER_TYPE>_update` auth function to check access for ownership\n# transfer. When this flag is disabled `<OWNER_TYPE>_file_transfer` auth\n# function is used.\n# (optional, default: true)\nckanext.files.owner.transfer_as_update = true\n\n# Use `<OWNER_TYPE>_update` auth function to check access when listing all\n# files of the owner. When this flag is disabled `<OWNER_TYPE>_file_scan`\n# auth function is used.\n# (optional, default: true)\nckanext.files.owner.scan_as_update = true\n
"},{"location":"configuration/libcloud/","title":"Apache libcloud storage configuration","text":"To use this storage install extension with libcloud
extras.
pip install 'ckanext-files[libcloud]'\n
The actual storage backend is controlled by provider
option of the storage. List of all providers is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:libcloud\n## apache-libcloud storage provider. List of providers available at https://libcloud.readthedocs.io/en/stable/storage/supported_providers.html#provider-matrix . Use upper-cased value from Provider Constant column\nckanext.files.storage.NAME.provider =\n## API key or username\nckanext.files.storage.NAME.key =\n## Secret password\nckanext.files.storage.NAME.secret =\n## JSON object with additional parameters passed directly to storage constructor.\nckanext.files.storage.NAME.params =\n## Name of the container(bucket)\nckanext.files.storage.NAME.container =\n
"},{"location":"configuration/opendal/","title":"OpenDAL storage configuration","text":"To use this storage install extension with opendal
extras.
pip install 'ckanext-files[opendal]'\n
The actual storage backend is controlled by scheme
option of the storage. List of all schemes is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:opendal\n## OpenDAL service type. Check available services at https://docs.rs/opendal/latest/opendal/services/index.html\nckanext.files.storage.NAME.scheme =\n## JSON object with parameters passed directly to OpenDAL operator.\nckanext.files.storage.NAME.params =\n
"},{"location":"configuration/redis/","title":"Redis storage configuration","text":"## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.NAME.prefix = ckanext:files:default:file_content:\n
"},{"location":"configuration/storage/","title":"Storage configuration","text":"All available options for the storage type can be checked via config declarations CLI. First, add the storage type to the config file:
ckanext.files.storage.xxx.type = files:redis\n
Now run the command that shows all available config option of the plugin.
ckan config declaration files -d\n
Because the Redis storage adapter is enabled, you'll see all the options registered by the Redis adapter alongside the global options:

## ckanext-files ###############################################################
## ...
## Storage adapter used by the storage
ckanext.files.storage.xxx.type = files:redis
## Static prefix of the Redis key generated for every upload.
ckanext.files.storage.xxx.prefix = ckanext:files:default:file_content:
Sometimes you will see a validation error if the storage has required config options. Let's try using the files:fs storage instead of Redis:

ckanext.files.storage.xxx.type = files:fs
Now any attempt to run ckan config declaration files -d will show an error, because the required path option is missing:

Invalid configuration values provided:
ckanext.files.storage.xxx.path: Missing value
Aborted!
Add the required option to satisfy the application:

ckanext.files.storage.xxx.type = files:fs
ckanext.files.storage.xxx.path = /tmp

And run the CLI command once again. This time you'll see the list of allowed options:

## ckanext-files ###############################################################
## ...
## Storage adapter used by the storage
ckanext.files.storage.xxx.type = files:fs
## Path to the folder where uploaded data will be stored.
ckanext.files.storage.xxx.path =
## Create storage folder if it does not exist.
ckanext.files.storage.xxx.create_path = false
There are a number of options that are supported by every storage. You can set them and expect that every storage, regardless of type, will use these options in the same way:

## Storage adapter used by the storage
ckanext.files.storage.NAME.type = ADAPTER
## The maximum size of a single upload.
## Supports size suffixes: 42B, 2M, 24KiB, 1GB. `0` means no restrictions.
ckanext.files.storage.NAME.max_size = 0
## Space-separated list of MIME types or just type or subtype part.
## Example: text/csv pdf application video jpeg
ckanext.files.storage.NAME.supported_types =
## Descriptive name of the storage used for debugging. When empty, name from
## the config option is used, i.e: `ckanext.files.storage.DEFAULT_NAME...`
ckanext.files.storage.NAME.name = NAME
"},{"location":"migration/","title":"Migration from native CKAN storage system","text":"Important: ckanext-files itself is an independent file-management system. You don't have to migrate existing files from groups, users and resources to it. You can just start using ckanext-files for new fields defined in metadata schema or for uploading arbitrary files. And continue using native CKAN uploads for group/user images and resource files. Migration workflows described here merely exist as a PoC of using ckanext-files for everything in CKAN. Don't migrate your production instances yet, because concepts and rules may change in future and migration process will change as well. Try migration only as an experiment, that gives you an idea of what else you want to see in ckanext-file, and share this idea with us.
Note: every migration workflow described below requires installed ckanext-files. Complete installation section before going further.
CKAN has the following types of files:

* group/organization images
* user avatars
* resource files
* site logo
* files uploaded by extensions

At the moment, there is no migration strategy for the last two types. Replacing the site logo manually is a trivial task, so there will be no dedicated command for it. As for extensions, every one of them is unique, so feel free to create an issue in the current repository: we'll consider creating a migration script for your scenario or, at least, explain how you can perform the migration by yourself.
The migration process for group/organization/user images and resource uploads is described below. Keep in mind that this process only covers migration from the native CKAN storage system that keeps files inside the local filesystem. If you are using storage extensions, like ckanext-s3filestore or ckanext-cloudstorage, create an issue in the current repository with a request for a migration command. As there are a lot of different forks of such extensions, creating a reliable migration script may be challenging, so we need some details about your environment to help with the migration.

The migration workflows below require certain changes to metadata schemas, UI widgets for file uploads and styles of your portal (depending on the customization).
"},{"location":"migration/group/","title":"Migration for group/organization images","text":"Note: internally, groups and organizations are the same entity, so this workflow describes both of them.
First of all, you need a configured storage that supports public links. As all group/organization images are stored inside local filesystem, you can use files:public_fs
storage adapter.
This extension expects that the name of the group images storage will be group_images. This name will be used in all other commands of this migration workflow. If you want to use a different name for the group images storage, override the ckanext.files.group_images_storage config option, which has the default value group_images, and don't forget to adapt the commands accordingly.
This configuration example sets a 10MiB restriction on the upload size via the ckanext.files.storage.group_images.max_size option. Feel free to change it or remove it completely to allow any upload size. This restriction applies to future uploads only. Any existing file that exceeds the limit is kept.

Uploads are restricted to the image/* MIMEtype via the ckanext.files.storage.group_images.supported_types option. You can make this option more or less restrictive. This restriction applies to future uploads only. Any existing file with a wrong MIMEtype is kept.
ckanext.files.storage.group_images.path controls the location of the upload folder in the filesystem. It should match the value of the ckan.storage_path option plus storage/uploads/group. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.
The ckanext.files.storage.group_images.public_root option specifies the base URL from which every group image can be accessed. In most cases it's the CKAN URL plus uploads/group. If you are serving the CKAN application from ckan.site_url, leave this option unchanged. If you are using ckan.root_path, like /data/, insert this root path into the value of the option. The example below uses the %(ckan.site_url)s wildcard, which will be automatically replaced with the value of the ckan.site_url config option. You can specify the site URL explicitly if you don't like this wildcard syntax.
ckanext.files.storage.group_images.type = files:public_fs
ckanext.files.storage.group_images.max_size = 10MiB
ckanext.files.storage.group_images.supported_types = image
ckanext.files.storage.group_images.path = /var/storage/ckan/storage/uploads/group
ckanext.files.storage.group_images.public_root = %(ckan.site_url)s/uploads/group
Now let's run a command that shows us the list of files available under the newly configured storage:

ckan files scan -s group_images

All these files are not tracked by the files extension yet, i.e. they don't have a corresponding record in the DB with base details like size, MIMEtype, filehash, etc. Let's create these details via the command below. It's safe to run this command multiple times: it will gather and store information about files not registered in the system and ignore any previously registered file.

ckan files scan -s group_images -t
Finally, let's run the command that shows only untracked files. Ideally, you'll see nothing upon executing it, because you have just registered every file in the system.

ckan files scan -s group_images -u

Note, all the files are still available inside the storage directory. If the previous command shows nothing, it only means that CKAN already knows the details of each file from the storage directory. If you want to see the list of the files again, omit the -u flag (which stands for "untracked") and you'll see all the files in the command output again:

ckan files scan -s group_images
Now, when all images are tracked by the system, we can give ownership over these files to the groups/organizations that are using them. Run the command below to connect files with their owners. It will search for groups/organizations first and report how many connections were identified. There will be a suggestion to show the identified relationships and the list of files that have no owner (if there are such files). The presence of files without an owner usually means that you removed a group/organization from the database, but did not remove its image.

Finally, you'll be asked if you want to transfer ownership over the files. This operation does not change existing data, and if you disable ckanext-files after the ownership transfer, you won't see any difference. The whole ownership transfer is managed inside custom DB tables generated by ckanext-files, so it's a safe operation.

ckan files migrate groups group_images
Here's an example of output that you can see when running the command:

Found 3 files. Searching file owners...
[####################################] 100% Located owners for 2 files out of 3.

Show group IDs and corresponding file? [y/N]: y
d7186937-3080-429f-a434-22b74b9a8d39: file-1.png
87e2a1aa-7905-4a28-a087-90433f8e169e: file-2.png

Show files that do not belong to any group? [y/N]: y
file-3.png

Transfer file ownership to group identified in previous steps? [y/N]: y
Transfering file-2.png [####################################] 100%
Now comes the most complex part: you need to change the metadata schema and the UI.
The original CKAN workflow for uploading files is different from the strategy recommended by ckanext-files. But in order to make the migration as simple as possible, we'll stay close to the original workflow.
Note: the suggested approach resembles the existing process of file uploads in CKAN. But ckanext-files was designed as a system that gives you a choice. Check file upload strategies to learn more about alternative implementations of uploads and their pros/cons.
First, we need to replace the Upload/Link widget on the group/organization form. If you are using native group templates, create group/snippets/group_form.html and organization/snippets/organization_form.html. Inside both files, extend the original template and override the basic_fields block. You only need to replace the last field
{{ form.image_upload(\n data, errors, is_upload_enabled=h.uploads_enabled(),\n is_url=is_url, is_upload=is_upload) }}\n
with
{{ form.image_upload(\n data, errors, is_upload_enabled=h.files_group_images_storage_is_configured(),\n is_url=is_url, is_upload=is_upload,\n field_upload=\"files_image_upload\") }}\n
There are two differences from the original. First, we use h.files_group_images_storage_is_configured() instead of h.uploads_enabled(). As we are using different storages for different upload types, upload widgets can now be enabled independently. And second, we pass the field_upload="files_image_upload" argument into the macro. It will send the uploaded file to CKAN inside the files_image_upload field instead of the original image_upload field. This must be done because CKAN unconditionally strips the image_upload field from the submission payload, making processing of the file too unreliable. We changed the name of the upload field, and CKAN keeps this new field, so we can process it as we wish.
Note: if you are using ckanext-scheming, you only need to replace the form_snippet of the image_url field, instead of rewriting the whole template.
Now, let's define validation rules for this new upload field. We need to create plugins that modify the validation schemas of group and organization. Due to CKAN implementation details, you need a separate plugin for each of them.
Note: if you are using ckanext-scheming, you can add the files_image_upload validators to the schemas of organization and group. Check the list of validators that must be applied to this new field below.
Here's an example of plugins that modify the validation schemas of group and organization. As you can see, they are mostly the same:
import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultGroupForm, DefaultOrganizationForm\nfrom ckan.logic.schema import default_create_group_schema, default_update_group_schema\n\n\ndef _modify_schema(schema, group_type):\n    # attach ckanext-files validators to the new upload field\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"group_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"group_images\",\n            group_type,\n            \"id\",\n            \"public_url\",\n            group_type + \"_patch\",\n            \"image_url\",\n        ),\n    ]\n    # the schema must be returned, as the plugin methods below rely on it\n    return schema\n\n\nclass FilesGroupPlugin(p.SingletonPlugin, DefaultGroupForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = False\n\n    def group_types(self):\n        return [\"group\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"group\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"group\")\n\n\nclass FilesOrganizationPlugin(p.SingletonPlugin, DefaultOrganizationForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = True\n\n    def group_types(self):\n        return [\"organization\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"organization\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"organization\")\n
There are 4 validators that must be applied to the new upload field:
ignore_empty: skips validation when the image URL is set manually and no upload is selected.
files_into_upload: converts the value of the upload field into the normalized format expected by ckanext-files.
files_validate_with_storage(STORAGE_NAME): this validator requires an argument: the name of the storage we are using for image uploads. The validator uses the storage settings to verify the size and MIME type of the upload.
files_upload_as(STORAGE_NAME, GROUP_TYPE, NAME_OF_ID_FIELD, \"public_url\", NAME_OF_PATCH_ACTION, NAME_OF_URL_FIELD): this validator is the most challenging. It accepts 6 arguments: the name of the storage (group_images in our case); group or organization, depending on the processed entity; the name of the ID field, id in your case; public_url - use this exact value, it tells which property of the file to use as the link to the file; group_patch or organization_patch, depending on the processed entity; image_url - the name of the field that contains the URL of the image. ckanext-files will put the public link of the uploaded file into this field when the form is processed.
That's all. Now every image upload for groups/organizations is handled by ckanext-files. To verify it, do the following. First, check the list of files currently stored in the group_images storage via the command that we used at the beginning of the migration:
ckan files scan -s group_images\n
You'll see a list of existing files. Their names follow the format <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg.
Now upload an image into an existing group, or create a new group with any image. When you check the list of files again, you'll see one new record, but this time the record resembles a UUID: da046887-e76c-4a68-97cf-7477665710ff.
Configure a named storage for resources, using the files:ckan_resource_fs storage adapter.
This extension expects that the name of the resources storage is resources. This name is used in all other commands of this migration workflow. If you want to use a different name for the resources storage, override the ckanext.files.resources_storage config option, which has the default value resources, and don't forget to adapt the commands accordingly.
ckanext.files.storage.resources.path must match the value of the ckan.storage_path option, followed by the resources directory. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.
The example below sets a 10MiB limit on resource size. Modify it if you are using a different limit set by ckan.max_resource_size.
Unlike group and user images, this storage needs neither an upload type restriction nor public_root.
ckanext.files.storage.resources.type = files:ckan_resource_fs\nckanext.files.storage.resources.max_size = 10MiB\nckanext.files.storage.resources.path = /var/storage/ckan/resources\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s resources -u\n
Track all these files:
ckan files scan -s resources -t\n
Re-check that you now see no untracked files:
ckan files scan -s resources -u\n
Transfer file ownership to the corresponding resources. In addition to the simple ownership transfer, this command asks you whether you want to modify the resource's url_type and url fields. This is required to move file management completely to the files extension and to enable migration to a different storage type.
If you accept the resource modifications, then for every file owner url_type will be changed to file and url will be changed to the file ID. All modified packages will then be reindexed.
Changing url_type means that some pages will change. For example, instead of the Download button, CKAN will show a Go to resource button on the resource page, because the Download label is specific to url_type=upload. And some views may stop working as well. But this is a safer option for migration than leaving url_type unchanged: ckanext-files manages files in its own way, some old assumptions about files no longer hold, and using a different url_type is the fastest way to tell everyone that something has changed.
Broken views can be fixed easily. Every view is implemented as a separate plugin, so you can always inherit from this plugin and override the methods that relied on the old behavior. And a lot of views work with the file URL directly, so they won't even notice the difference.
ckan files migrate local-resources resources\n
The next goal is a correct metadata schema. If you are using ckanext-scheming, you need to modify the validators of the url and format fields.
If you are working with native schemas, you have to modify the dataset schema by implementing IDatasetForm. Here's an example:
import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultDatasetForm\nfrom ckan.logic import schema\n\n\nclass FilesDatasetPlugin(p.SingletonPlugin, DefaultDatasetForm):\n    p.implements(p.IDatasetForm, inherit=True)\n\n    def is_fallback(self):\n        return True\n\n    def package_types(self):\n        return [\"dataset\"]\n\n    def _modify_schema(self, sch):\n        # validators for resources that keep their file in ckanext-files\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_file_id_exists\"),\n            tk.get_validator(\"files_transfer_ownership\")(\"resource\", \"id\"),\n        ])\n        # derive the format from the uploaded file\n        sch[\"resources\"][\"format\"].insert(0, tk.get_validator(\"files_content_type_from_file\")(\"url\"))\n\n    def create_package_schema(self):\n        sch = schema.default_create_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def update_package_schema(self):\n        sch = schema.default_update_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def show_package_schema(self):\n        sch = schema.default_show_package_schema()\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_id_into_resource_download_url\"),\n        ])\n        return sch\n
Both the create and update schemas are updated in the same way. We add a new validator to the format field, to correctly identify the file format. And there are a number of new validators for url:
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_file_id_exists: verify the existence of the file ID.
files_transfer_ownership(\"resource\", \"id\"): move file ownership to the resource after successful validation.
On top of this, we also have two validators applied to show_package_schema (use output_validators in ckanext-scheming):
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_id_into_resource_download_url: replace the file ID with the download URL in the API output.
The next part is the trickiest. You would need to create a number of templates and JS modules, but because ckanext-files is actively developed, your custom files would most likely become outdated pretty soon.
Instead, we recommend enabling the patch for the resource form that is shipped with ckanext-files. It's a bit hacky, but because the extension itself is still in alpha stage, it should be acceptable. Check file upload strategies for examples of implementations that you can add to your portal instead of the default patch.
To enable the patch for templates, add the following line to the config file:
ckanext.files.enable_resource_migration_template_patch = true\n
This option adds an Add file button to the resource form.
Upon clicking, this button is replaced by a widget that supports uploading new files or selecting previously uploaded files that are not used by any resource yet.
"},{"location":"migration/user/","title":"Migration for user avatars","text":"This workflow is similar to group/organization migration. It contains the sequence of actions, but explanations are removed, because you already know details from the group migration. Only steps that are different will contain detailed explanation of the process.
Configure a local filesystem storage with support for public links (files:public_fs) for user images.
This extension expects that the name of the user images storage is user_images. This name is used in all other commands of this migration workflow. If you want to use a different name for the user images storage, override the ckanext.files.user_images_storage config option, which has the default value user_images, and don't forget to adapt the commands accordingly.
ckanext.files.storage.user_images.path resembles the same option for the group/organization images storage. But user images are kept inside the user folder by default. As a result, the value of this option should match the value of the ckan.storage_path option plus storage/uploads/user. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.
ckanext.files.storage.user_images.public_root resembles the same option for the group/organization images storage. But user images are available at the CKAN URL plus uploads/user.
ckanext.files.storage.user_images.type = files:public_fs\nckanext.files.storage.user_images.max_size = 10MiB\nckanext.files.storage.user_images.supported_types = image\nckanext.files.storage.user_images.path = /var/storage/ckan/storage/uploads/user\nckanext.files.storage.user_images.public_root = %(ckan.site_url)s/uploads/user\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s user_images -u\n
Track all these files:
ckan files scan -s user_images -t\n
Re-check that you now see no untracked files:
ckan files scan -s user_images -u\n
Transfer image ownership to the corresponding users:
ckan files migrate users user_images\n
Update the user template. The required field is defined in user/new_user_form.html and user/edit_user_form.html. It's a bit different from the field used by group/organization, but you again need to add the field_upload=\"files_image_upload\" parameter to the image_upload macro and replace h.uploads_enabled() with h.files_user_images_storage_is_configured().
Users have no dedicated interface for validation schema modification, and here comes the biggest difference from the group migration: you need to chain the user_create and user_update actions and modify the schema from context:
import ckan.logic.schema\nimport ckan.plugins.toolkit as tk\n\n\ndef _patch_schema(schema):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"user_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"user_images\",\n            \"user\",\n            \"id\",\n            \"public_url\",\n            \"user_patch\",\n            \"image_url\",\n        ),\n    ]\n\n\n@tk.chained_action\ndef user_update(next_action, context, data_dict):\n    # modify the validation schema through the action context\n    schema = context.setdefault('schema', ckan.logic.schema.default_update_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n\n\n@tk.chained_action\ndef user_create(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n
The validators are all the same, but now we are using user instead of group/organization in the parameters.
That's all. Just as with groups, you can update an avatar and verify that all new filenames resemble UUIDs.
"},{"location":"usage/capabilities/","title":"Capabilities","text":"To understand in advance whether specific storage can perform certain actions, ckanext-files uses ckanext.files.shared.Capability
. It's an enumeration of operations that can be supported by storage:
These capabilities are defined when the storage is created and are automatically checked by actions that work with the storage. If you want to check whether a storage supports a certain capability, it can be done manually. And if you want to check the presence of multiple capabilities at once, you can combine them via the bitwise-or operator:
from ckanext.files.shared import Capability, get_storage\n\nstorage = get_storage()\n\ncan_read = storage.supports(Capability.STREAM)\n\nread_and_write = Capability.CREATE | Capability.STREAM\ncan_read_and_write = storage.supports(read_and_write)\n
The ckan files storages -v CLI command lists all configured storages with their capabilities.
Before uploading files, you have to configure a storage: the place where all uploaded files are stored. A storage relies on an adapter that describes where and how the data is stored: filesystem, cloud, DB, etc. And, depending on the adapter, a storage may have a number of additional specific options. For example, the filesystem adapter likely requires a path to the folder where uploads are stored, a DB adapter may need DB connection parameters, and a cloud adapter most likely will not work without an API key. These additional options are specific to the adapter, and you have to check its documentation to find out which options are available.
Let's start with the Redis adapter, because it has minimal configuration requirements.
Add the following line to the CKAN config file:
ckanext.files.storage.default.type = files:redis\n
The name of the adapter is files:redis. It follows the recommended naming convention for adapters: <EXTENSION>:<TYPE>. You can tell from the name above that we are using an adapter defined in the files extension with the redis type. But this naming convention is not enforced, and its only purpose is avoiding name conflicts. Technically, an adapter name can use any character, including spaces, newlines and emoji.
If you make a typo in the adapter's name, any CKAN CLI command will produce an error message with the list of available adapters:
Invalid configuration values provided:\nckanext.files.storage.default.type: Value must be one of ['files:fs', 'files:public_fs', 'files:redis']\nAborted!\n
The storage is configured, so we can actually upload a file. Let's use ckanapi for this task. Files are created via the files_file_create API action, and this time we have to pass 2 parameters into it:
name: the name of the uploaded file
upload: the content of the file
The final command is:
echo -n 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n
And that's what you see as the result:
{\n \"atime\": null,\n \"content_type\": \"text/plain\",\n \"ctime\": \"2024-06-02T15:02:14.819117+00:00\",\n \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n \"id\": \"e21162ab-abfb-476c-b8c5-5fe7cb89eca0\",\n \"location\": \"24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\",\n \"mtime\": null,\n \"name\": \"hello.txt\",\n \"size\": 11,\n \"storage\": \"default\",\n \"storage_data\": {}\n}\n
The content of the file can be checked via the CKAN CLI. Use the id from the last API call's output in the ckan files stream ID command:
ckan files stream e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
Alternatively, we can use the Redis CLI to get the content of the file. Note that you cannot get the content via the CKAN API, because it's JSON-based and streaming files doesn't suit its principles.
By default, the Redis adapter puts the content under the key <PREFIX><LOCATION>. Pay attention to LOCATION: it's the value available as location in the API response (i.e. 24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46 in our case). It's different from the id (the ID used by the DB to uniquely identify the file record) and the name (the human-readable name of the file). In our scenario, location looks like a UUID because of the internal details of the Redis adapter implementation, but different adapters may use a more path-like value, e.g. something similar to path/to/folder/hello.txt.
PREFIX can be configured, but we skipped this step and got the default value: ckanext:files:default:file_content:. So the final Redis key of our file is ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46:
redis-cli\n\n127.0.0.1:6379> GET ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\n\"hello world\"\n
And before we move further, let's remove the file, using its id:
ckanapi action files_file_delete id=e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
"},{"location":"usage/js/","title":"JavaScript utilities","text":"Note: ckanext-files does not provide stable CKAN JS modules at the moment. Try creating your own widgets and share with us your examples or requirements. We'll consider creating and including widgets into ckanext-files if they are generic enough for majority of the users.
ckanext-files registers a few utilities inside the CKAN JS namespace to help with building UI components.
The first group of utilities is registered inside the CKAN Sandbox. Inside CKAN JS modules it's accessible as this.sandbox. If you are writing code outside of JS modules, the Sandbox can be initialized via a call to ckan.sandbox():
const sandbox = ckan.sandbox()\n
When the files plugin is loaded, the sandbox contains a files attribute with two members:
upload: a high-level helper for uploading files.
makeUploader: a factory for uploader objects that gives more control over the upload process.
The simplest way to upload a file is using the upload helper:
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n)\n
This function uploads the file to the default storage via the files_file_create action. Extra parameters for the API call can be passed using the second argument of the upload helper: use an object with a requestParams key, and the value of this key will be added to the standard API request parameters. For example, if you want to use the storage named memory and a field with the value custom:
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n {requestParams: {storage: \"memory\", field: \"custom\"}}\n)\n
If you need more control over the upload, you can create an uploader and interact with it directly, instead of using the upload helper.
An uploader is an object that uploads a file to the server. It extends the base uploader, which defines the standard interface for this object. The uploader performs all the API calls internally and returns the uploaded file details. Out of the box you can use the Standard and Multipart uploaders. Standard uses the files_file_create API action and specializes in normal uploads. Multipart relies on the files_multipart_* actions and can be used to pause and continue an upload.
To create an uploader instance, pass its name as a string to makeUploader. Then you can call the upload method of the uploader to perform the actual upload. This method requires two arguments: the File object that should be uploaded, and an object with extra request parameters, the same requestParams as in the example above. If you want to use the default parameters, pass an empty object; if you want to use the memory storage, pass {storage: \"memory\"}, etc.
const uploader = sandbox.files.makeUploader(\"Standard\")\nawait uploader.upload(new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}), {})\n
One of the reasons to use a manually created uploader is progress tracking. The uploader supports event subscriptions via uploader.addEventListener(event, callback), and here's the list of possible upload events:
start: file upload started. The event has a detail property with an object that contains the uploaded file as file.
multipartid: multipart upload initialized. The event has a detail property with an object that contains the uploaded file as file and the ID of the multipart upload as id.
progress: another chunk of the file was transferred to the server. The event has a detail property with an object that contains the uploaded file as file, the number of uploaded bytes as loaded, and the total number of bytes that must be transferred as total.
finish: file upload finished successfully. The event has a detail property with an object that contains the uploaded file as file and the file details from the API response as result.
fail: file upload failed. The event has a detail property with an object that contains the uploaded file as file and an object with CKAN validation errors as reasons.
error: an error unrelated to validation happened during the upload, like a call to a non-existing action. The event has a detail property with an object that contains the uploaded file as file and the error as message.
If you want to use the upload helper with a customized uploader, there are two ways to do it:
Specify an adapter property with the uploader name inside the second argument of the upload helper: await sandbox.files.upload(new File(...), {adapter: \"Multipart\"})\n
Or pass an uploader property with an uploader instance inside the second argument of the upload helper: const uploader = sandbox.files.makeUploader(\"Multipart\")\nawait sandbox.files.upload(new File(...), {uploader})\n
The second group of ckanext-files utilities is available as the ckan.CKANEXT_FILES object. This object mainly serves as an extension and configuration point for sandbox.files.
ckan.CKANEXT_FILES.adapters is a collection of all classes that can be used to initialize an uploader. It contains the Standard, Multipart and Base classes. Standard and Multipart can be used as-is, while Base must be extended by your custom uploader class. Add your custom uploader classes to adapters to make them available application-wide:
class MyUploader extends Base { ... }\n\nckan.CKANEXT_FILES.adapters[\"My\"] = MyUploader;\n\nawait sandbox.files.upload(new File(...), {adapter: \"My\"})\n
ckan.CKANEXT_FILES.defaultSettings contains the object with default settings, available as this.settings inside any uploader. You can change the name of the storage used by all uploaders using this object. Note that the changes apply only to uploaders initialized after the modification:
ckan.CKANEXT_FILES.defaultSettings.storage = \"memory\"\n
"},{"location":"usage/multi-storage/","title":"Multi-storage","text":"It's possible to configure multiple storages at once and specify which one you want to use for the individual file upload. Up until now we used the following storage options:
ckanext.files.storage.default.type
ckanext.files.storage.default.path
ckanext.files.storage.default.create_path
All of them have the common prefix ckanext.files.storage.default., and this prefix is the key to using multiple storages simultaneously.
Every option of the storage follows the pattern ckanext.files.storage.<STORAGE_NAME>.<OPTION>. As all the options above contain default in the <STORAGE_NAME> position, they belong to the default storage.
If you want to configure a storage with the name custom, change the storage configuration accordingly:
ckanext.files.storage.custom.type = files:fs\nckanext.files.storage.custom.path = /tmp/example\nckanext.files.storage.custom.create_path = true\n
And if you want to use a Redis-based storage named memory and a filesystem-based storage named default, use the following configuration:
ckanext.files.storage.memory.type = files:redis\n\nckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
The default storage is special: ckanext-files uses it by default, as the name suggests. If you remove the configuration for the default storage and try to create a file, you'll see the following error:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n\n... ckan.logic.ValidationError: None - {'storage': ['Storage default is not configured']}\n
Storage default is not configured. That's why we need the default configuration. But if you want to upload a file into a different storage, or you don't want to add the default storage at all, you can always explicitly specify the name of the storage you are going to use.
When using API actions, add the storage parameter to the call:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt storage=memory\n
When writing Python code, pass the storage name to the get_storage function:
storage = get_storage(\"memory\")\n
When writing JS code, pass the object {requestParams: {storage: \"memory\"}} to the upload function:
const sandbox = ckan.sandbox()\nconst file = new File([\"content\"], \"file.txt\")\nconst options = {requestParams: {storage: \"memory\"}};\n\nawait sandbox.files.upload(file, options)\n
"},{"location":"usage/multipart/","title":"Multipart, resumable and signed uploads","text":"This feature has many names, but it basically divides a single upload into multiple stages. It can be used in following situations:
All these situations are handled by 4 API actions, which are available is storage has MULTIPART
capability:
files_multipart_start: initialize a multipart upload and set the expected final size and MIME type. A real multipart upload usually just returns the upload ID from this action. A resumable upload creates an empty file in the storage to accumulate content inside it. A signed upload produces a URL for the direct upload.
files_multipart_update: upload a fragment of the file or modify the upload in some other way. Most often this action accepts the ID of the upload and an upload field with a fragment of the uploaded file.
files_multipart_refresh: this action synchronizes and returns the current upload progress. It can be used if the upload was paused and the client does not know how many bytes were uploaded and from which byte the next upload fragment starts.
files_multipart_complete: finalize the upload and convert it into a normal file, available to other parts of the application. A multipart upload usually combines all uploaded parts into a single file here. A resumable upload verifies that the result has the expected MIME type and size. A signed upload just registers the completed file in the system.
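Here is a rough sketch of the whole flow from Python code (a sketch only: it assumes the default storage, an adapter that accepts consecutive fragments, such as files:fs, and a context that is allowed to call these actions):
import ckan.plugins.toolkit as tk\n\n# initialize the upload; size and content type are validated on completion\ninfo = tk.get_action(\"files_multipart_start\")(\n    {\"ignore_auth\": True},\n    {\"name\": \"file.txt\", \"size\": 13, \"content_type\": \"text/plain\"},\n)\n\n# send the content in two non-overlapping fragments\ntk.get_action(\"files_multipart_update\")(\n    {\"ignore_auth\": True}, {\"id\": info[\"id\"], \"upload\": b\"hello \"}\n)\ntk.get_action(\"files_multipart_update\")(\n    {\"ignore_auth\": True}, {\"id\": info[\"id\"], \"upload\": b\"world!\\n\"}\n)\n\n# convert the incomplete upload into a normal file\nresult = tk.get_action(\"files_multipart_complete\")(\n    {\"ignore_auth\": True}, {\"id\": info[\"id\"]}\n)\n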
files_multipart_start
requires content_type
and size
parameters. These values will be used to validate completed upload.files_multipart_start
allows hash
parameter. This value will be used to validate completed upload. Unlike content_type
and size
, hash
is usually optional, because it may be difficult for client to compute it.files_multipart_update
accepts upload ID as id
and fragment of the file as upload
. Sequence of calls to files_multipart_update
with non-overlapping fragments can be used to upload the file. Even if adapter implements signed uploads and client is supposed to send file to the signed URL instead of using files_multipart_update
.files_multipart_complete
compares content_type
, size
and hash
(if present) specified during initialization of upload with actual values. If they are different, upload is not converted into normal file. Depending on implementation, storage may just ignore incorrect initial expectations an assign a real values to the file as long as they are allowed by storage configuration. But it's recommended to reject such uploads, so it safer to assume, that incorrect expectations are not accepted.Incomplete files support most of normal file actions, but you need to pass completed=False
to action when working with incomplete files. I.e, if you want to remove incomplete upload, use its ID and completed=False
:
ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24 completed=False\n
Incomplete files do not support streaming and downloading via the public interface of the extension. But the storage adapter can expose such features via custom methods if it's technically possible.
An example of a basic multipart upload is shown below. The files:fs adapter can be used for running this example, as it implements MULTIPART.
First, create a text file and check its size:
echo 'hello world!' > /tmp/file.txt\nwc -c /tmp/file.txt\n\n... 13 /tmp/file.txt\n
The size is 13 bytes and the content type is text/plain. These values must be used for the upload initialization:
ckanapi action files_multipart_start name=file.txt size=13 content_type=text/plain\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 0\n... }\n... }\n
Here storage_data contains {\"uploaded\": 0}. It may be different for other adapters, especially if they implement non-consecutive uploads, but generally this is the recommended way to keep the upload progress.
Now we'll upload the first 5 bytes of the file:
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 count=5)\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 5\n... }\n... }\n
If you try finalizing the upload right now, you'll get an error:
ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... ckan.logic.ValidationError: None - {'upload': ['Actual value of upload size(5) does not match expected value(13)']}\n
Let's upload the rest of the bytes and complete the upload:
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 skip=5)\n\nckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:57:18.483716+00:00\",\n... \"hash\": \"c897d1410af8f2c74fba11b1db511e9e\",\n... \"id\": \"a740692f-e3d5-492f-82eb-f04e47c13848\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"mtime\": null,\n... \"name\": \"file.txt\",\n... \"owner_id\": null,\n... \"owner_type\": null,\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n
Now the file can be used normally. You can transfer file ownership to someone, stream it, or modify it. Pay attention to the ID: the completed file has its own unique ID, which is different from the ID of the incomplete upload.
"},{"location":"usage/ownership/","title":"File ownership","text":"Every file can have an owner and there can be only one owner of the file. It's possible to create file without an owner, but usually application will only benefit from keeping every file with its owner. Owner is described with two fields: ID and type.
When file is created, by default the current user from API action's context is assigned as an owner of the file. From now on, the owner can perform other operations, such as renaming/displaying/removing with the file.
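For illustration, here's a minimal sketch of this behavior from Python code (the username is a placeholder; owner_type and owner_id appear in the action output):
import ckan.plugins.toolkit as tk\n\n# the user from the context becomes the owner of the new file\nresult = tk.get_action(\"files_file_create\")(\n    {\"user\": \"some-user\", \"ignore_auth\": True},  # placeholder username\n    {\"upload\": b\"hello\", \"name\": \"hello.txt\"},\n)\nprint(result[\"owner_type\"], result[\"owner_id\"])  # -> user <ID of some-user>\n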
Apart from chaining auth functions, a plugin can implement the IFiles.files_file_allows and IFiles.files_owner_allows methods to modify the access rules for files.
def files_file_allows(\n self,\n context: Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n ...\n\ndef files_owner_allows(\n self,\n context: Context,\n owner_type: str, owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n ...\n
These methods receive the current action context, the details of the tested object, and the name of the operation (show, update, delete, file_transfer). files_file_allows checks permissions for the accessed file. It's usually called when the user interacts with the file directly. files_owner_allows works with an owner described by type and ID. It's usually called when the user transfers file ownership, performs a bulk file operation on the owner's files, or just tries to get the list of files that belong to the owner.
If the method returns true/false, the operation is allowed/denied. If the method returns None, the default logic is used to check access.
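As a minimal sketch (the import path of IFiles is an assumption here; check the extension source for the exact location), a plugin that forbids file removal for everyone, while keeping the default logic for all other operations, could look like this:
import ckan.plugins as p\nfrom ckanext.files.interfaces import IFiles  # assumed import path\n\n\nclass StrictFilesPlugin(p.SingletonPlugin):\n    p.implements(IFiles, inherit=True)\n\n    def files_file_allows(self, context, file, operation):\n        if operation == \"delete\":\n            return False  # nobody can remove files through the API\n        return None  # fall back to the default access logic\n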
As already mentioned, by default the user who owns the file can access it. But what about different owners? What if the file is owned by another entity, like a resource or a dataset?
Out of the box, nobody can access such files. But there are three config options that modify this restriction.
ckanext.files.owner.cascade_access = ENTITY_TYPE ANOTHER_TYPE gives access to a file owned by an entity if the user already has access to the entity itself. Use words like package, resource, group instead of ENTITY_TYPE.
For example, say a file is owned by a resource. If cascade access is enabled, whoever has access to resource_show of the resource can also see the file owned by this resource. If the user passes resource_update for the resource, they can also modify the file owned by it, etc.
Important: be careful and do not add user to ckanext.files.owner.cascade_access. A user's own files are considered private, and most likely you don't really want anyone else to be able to see or modify them.
The second option is ckanext.files.owner.transfer_as_update. When transfer-as-update is enabled, any user who has the <OWNER_TYPE>_update permission can transfer their own files to this OWNER_TYPE. Instead of using this option, you can define an <OWNER_TYPE>_file_transfer auth function.
And the third option is ckanext.files.owner.scan_as_update. Just as with the ownership transfer, it gives the user permission to list all files of the owner if the user can <OWNER_TYPE>_update it. Instead of using this option, you can define an <OWNER_TYPE>_file_scan auth function.
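For example, a minimal sketch of such an auth function for packages (assuming the <OWNER_TYPE>_file_transfer naming convention resolves to package_file_transfer), which simply delegates to the standard update check:
import ckan.authz as authz\nimport ckan.plugins as p\n\n\ndef package_file_transfer(context, data_dict):\n    # whoever can update the package can receive files for it\n    return authz.is_authorized(\"package_update\", context, data_dict)\n\n\nclass FilesAuthPlugin(p.SingletonPlugin):\n    p.implements(p.IAuthFunctions)\n\n    def get_auth_functions(self):\n        return {\"package_file_transfer\": package_file_transfer}\n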
File creation is not allowed by default: only sysadmins can use the files_file_create and files_multipart_start actions. This is done deliberately, because uncontrolled uploads can turn your portal into a user's personal cloud storage.
There are three ways to grant upload permission to normal users.
The BAD option is simple. Enable the ckanext.files.authenticated_uploads.allow config option, and every registered user will be allowed to upload files, but only into the default storage. If you want to change the list of storages available to a common user, specify the storage names in the ckanext.files.authenticated_uploads.storages option.
The GOOD option is relatively simple. Define a chained auth function with the name files_file_create. It's called whenever a user initiates an upload, so you can decide whether the user is allowed to upload files with the specified parameters.
The BEST option is to leave this restriction unchanged. Do not allow any user to call files_file_create. Instead, create a new action for your goal. ckanext-files isn't a solution - it's a tool that helps you build the solution.
If you need to add a documents field to datasets that contains uploaded PDF files, create a separate dataset_document_attach action. Specify access rules and validation for it, or even hardcode the storage that will be used for uploads. And then, from this new action, call files_file_create with ignore_auth: True.
In this way you control every side of uploading documents into datasets and do not accidentally break other functionality, because every other feature will define its own action.
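Here's a minimal sketch of such an action (the action name, the field names and the documents storage are illustrative assumptions, not part of ckanext-files):
import ckan.plugins.toolkit as tk\n\n\ndef dataset_document_attach(context, data_dict):\n    # reuse existing auth: only users who can edit the dataset may attach files\n    tk.check_access(\"package_update\", context, {\"id\": data_dict[\"dataset_id\"]})\n\n    return tk.get_action(\"files_file_create\")(\n        {\"ignore_auth\": True},  # deliberately bypass the sysadmin-only default\n        {\n            \"name\": data_dict[\"name\"],\n            \"upload\": data_dict[\"upload\"],\n            \"storage\": \"documents\",  # hardcoded storage, assumed to be configured\n        },\n    )\n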
"},{"location":"usage/task-queue/","title":"Task queue","text":"One of the challenges introduced by independently managed files is related to file ownership. As long as you can call files_transfer_ownership
manually, things are transparent. But as soon as you add custom file field to dataset, you probably want to automatically transfer ownership of the file refered by this custom field.
Imagine that you have a PDF file owned by you, and you specify the ID of this file in the attachment_id field of a dataset. You want to show a download link for this file on the dataset page. But if the file is owned by you, nobody else will be able to download it. So you decide to transfer file ownership to the dataset, so that anyone who sees the dataset can see the file as well.
You cannot update the dataset and transfer ownership after it, because there will be a time window between these two actions when the data is not valid. Or even worse: after updating the dataset you lose your internet connection and can't finish the transfer.
Nor can you transfer ownership first and then update the dataset: attachment_id may have additional validators, and you don't know in advance whether you'll be able to successfully update the dataset after the transfer.
This problem can be solved by queuing additional tasks inside the action. For example, the validator that checks whether a certain file ID can be used as attachment_id can queue an ownership transfer. If the dataset update completes without errors, the queued task is executed automatically and the dataset becomes the owner of the file.
A task is queued via the ckanext.files.shared.add_task function, which accepts objects inherited from ckanext.files.shared.Task. The Task class requires implementing the abstract method run(result: Any, idx: int, prev: Any), which is called when the task is executed. This method receives the result of the action that caused the task execution, the task's position in the queue, and the result of the previous task.
For example, one of the attachment_id validators can queue the following MyTask via add_task(MyTask(file_id)) to transfer ownership of file_id to the updated dataset:
import ckan.plugins.toolkit as tk\nfrom ckanext.files.shared import Task\n\n\nclass MyTask(Task):\n    def __init__(self, file_id):\n        self.file_id = file_id\n\n    def run(self, dataset, idx, prev):\n        # transfer the file to the dataset that was just created/updated\n        return tk.get_action(\"files_transfer_ownership\")(\n            {\"ignore_auth\": True},\n            {\n                \"id\": self.file_id,\n                \"owner_type\": \"package\",\n                \"owner_id\": dataset[\"id\"],\n                \"pin\": True,\n            },\n        )\n
As the first argument, Task.run receives the result of the action that was called. Right now only the following actions support tasks:
package_create
package_update
resource_create
resource_update
group_create
group_update
organization_create
organization_update
user_create
user_update
If you want to enable task support for your custom action, decorate it with the ckanext.files.shared.with_task_queue decorator:
from ckanext.files.shared import with_task_queue\n\n\n@with_task_queue\ndef my_action(context, data_dict):\n    # `add_task` can be called inside this action's stack frame\n    ...\n
A good example of a validator using tasks is the files_transfer_ownership validator factory. It can be added to the metadata schema as files_transfer_ownership(owner_type, name_of_id_field). For example, if you are adding this validator to a resource, call it as files_transfer_ownership(\"resource\", \"id\"). The second argument is the name of the ID field. As in most cases it's id, you can omit the second argument:
files_transfer_ownership(\"organization\")
files_transfer_ownership(\"package\")
files_transfer_ownership(\"user\")
There is a difference between creating files via an action:
tk.get_action(\"files_file_create\")(\n {\"ignore_auth\": True},\n {\"upload\": \"hello\", \"name\": \"hello.txt\"}\n)\n
and via a direct call to Storage.upload:
from ckanext.files.shared import get_storage, make_upload\n\nstorage = get_storage()\nstorage.upload(\"hello.txt\", make_upload(b\"hello\"), {})\n
The former snippet creates a tracked file: the file is uploaded to the storage and its details are saved to the database.
The latter snippet creates an untracked file: the file is uploaded to the storage, but its details are not saved anywhere.
Untracked files can be used to achieve specific goals. For example, imagine a storage adapter that writes files to a specified ZIP archive. You can create an interface that initializes such a storage for an existing ZIP resource and uploads files into it. You don't need a separate DB record for every uploaded file, because all of them go into the resource, which is already stored in the DB.
But such use-cases are pretty specific, so prefer to use the API if you are not sure what you need. The main reason to use tracked files is their discoverability: you can use the files_file_search API action to list all the tracked files and optionally filter them by storage, location, content_type, etc.:
ckanapi action files_file_search\n\n... {\n... \"count\": 123,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 11,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... },\n... ...\n... ]\n... }\n\nckanapi action files_file_search size:5 rows=1\n\n... {\n... \"count\": 2,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 5,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n... ]\n... }\n\nckanapi action files_file_search content_type=application/pdf\n\n... {\n... \"count\": 0,\n... \"results\": []\n... }\n
As for untracked files, their discoverability depends on the storage adapter. Some of them, files:fs for example, can scan the storage and locate all uploaded files, both tracked and untracked. If you have a files:fs storage configured as default, use the following command to scan its content:
ckan files scan\n
If you want to scan a different storage, specify its name via the -s/--storage-name option. Remember that some storage adapters do not support scanning.
ckan files scan -s memory\n
If you want to see untracked files only, add the -u/--untracked-only flag.
ckan files scan -u\n
If you want to track any untracked files by creating a DB record for every such file, add the -t/--track flag. After that you'll be able to discover previously untracked files via the files_file_search API action. This option is most useful during migration, when you are configuring a new storage that points to an existing location with files.
ckan files scan -t\n
"},{"location":"usage/transfer/","title":"Ownership transfer","text":"File ownership can be transfered. As there can be only one owner of the file, as soon as you transfer ownership over file, you yourself do not own this file.
To transfer ownership, use files_transfer_ownership
action and specify id
of the file, owner_id
and owner_type
of the new owner.
You can't just transfer ownership to anyone. You must either pass the IFiles.files_owner_allows check for the file_transfer operation, or pass a cascade access check for the future owner of the file when cascade access and transfer-as-update are enabled.
For example, if you have the following options in the config file:
ckanext.files.owner.cascade_access = organization\nckanext.files.owner.transfer_as_update = true\n
you must pass the organization_update auth function if you want to transfer file ownership to an organization. In addition, a file can be pinned. In this way we mark important files. Imagine a resource and its uploaded file: the link to this file is used by the resource, and we don't want the file to be accidentally transferred to someone else. We pin the file, and now nobody can transfer it without explicit confirmation of their intention.
There are two ways to move a pinned file:
call files_file_unpin first and then transfer ownership via a separate API call
pass the force parameter to files_transfer_ownership
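For example, here's a sketch of a transfer from Python code (the IDs are illustrative); force is only required when the file is pinned:
import ckan.plugins.toolkit as tk\n\ntk.get_action(\"files_transfer_ownership\")(\n    {},  # context of the user performing the transfer\n    {\n        \"id\": \"226056e2-6f83-47c5-8bd2-102e2b82ab9a\",  # file ID (illustrative)\n        \"owner_type\": \"organization\",\n        \"owner_id\": \"<organization-id>\",\n        \"force\": True,  # move the file even if it is pinned\n    },\n)\n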
You can upload files using JavaScript CKAN modules. ckanext-files extends CKAN's Sandbox object (available as this.sandbox inside a JS CKAN module), so we can use a shortcut and upload a file directly from the DevTools. Open any CKAN page, switch to the JS console and create the sandbox instance. Inside it we have the files object, which in turn contains the upload method. This method accepts a File object for the upload (the same object you can get from an input[type=file]).
sandbox = ckan.sandbox()\nawait sandbox.files.upload(\nnew File([\"content\"], \"file.txt\")\n)\n\n... {\n... \"id\": \"18cdaa65-5eed-4078-89a8-469b137627ce\",\n... \"name\": \"file.txt\",\n... \"location\": \"b53907c3-8434-4dee-9a9e-6c4d3055d200\",\n... \"content_type\": \"text/plain\",\n... \"size\": 7,\n... \"hash\": \"9a0364b9e99bb480dd25e1f0284c8555\",\n... \"storage\": \"default\",\n... \"ctime\": \"2024-06-02T16:12:27.902055+00:00\",\n... \"mtime\": null,\n... \"atime\": null,\n... \"storage_data\": {}\n... }\n
If you are still using the FS storage configured in the previous section, switch to the /tmp/example folder and check its content:
ls /tmp/example\n... b53907c3-8434-4dee-9a9e-6c4d3055d200\n\ncat b53907c3-8434-4dee-9a9e-6c4d3055d200\n... content\n
And, as usual, let's remove the file using the ID from the upload promise:
sandbox.client.call(\"POST\", \"files_file_delete\", {\nid: \"18cdaa65-5eed-4078-89a8-469b137627ce\"\n})\n
"},{"location":"usage/use-in-code/","title":"Usage in code","text":"If you are writing the code and you want to interact with the storage directly, without the API layer, you can do it via a number of public functions of the extension available in ckanext.files.shared
.
Let's configure the filesystem storage first. The filesystem adapter has a mandatory path option that controls the filesystem location where files are stored. If the path does not exist, the storage will raise an exception by default. But it can also create the missing path if you enable the create_path option. Here's our final version of the settings:
ckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
Now we are going to connect to the CKAN shell via the ckan shell CLI command and create an instance of the storage:
from ckanext.files.shared import get_storage\nstorage = get_storage()\n
Because you have all the configuration in place, the rest is fairly straightforward. We will upload a file, read its content and remove it, all from the CKAN shell.
To create the file, the storage.upload method must be called with 2 parameters: the name of the file and a special stream-like object with its content. You can use any string as the first parameter. As for the \"special stream-like object\", ckanext-files has the ckanext.files.shared.make_upload function, which accepts a number of different types (bytes, werkzeug.datastructures.FileStorage, BytesIO, a file descriptor) and converts them into the expected format:
from ckanext.files.shared import make_upload\n\nupload = make_upload(b\"hello world\")\nresult = storage.upload('file.txt', upload)\n\nprint(result)\n\n... FileData(\n... location='60b385e7-8137-496c-bb1d-6ae4d7963ab3',\n... size=11,\n... content_type='text/plain',\n... hash='5eb63bbbe01eeed093cb22bb8f5acdc3',\n... storage_data={}\n... )\n
result is an instance of the ckanext.files.shared.FileData dataclass. It contains all the information required by the storage to manage the file.
The result object has a location attribute that contains the name of the file relative to the path option specified in the storage configuration. If you visit the /tmp/example directory, which was set as the path for the storage, you'll see a file there with a name matching location from the result. And its content matches the content of our upload, which is quite an expected outcome:
cat /tmp/example/60b385e7-8137-496c-bb1d-6ae4d7963ab3\n\n... hello world\n
But let's go back to the shell and try reading the file from Python code. We'll pass result to the storage's stream method, which produces an iterable of bytes based on our result:
buffer = storage.stream(result)\ncontent = b\"\".join(buffer)\n\n... b'hello world'\n
In most cases, the storage only needs the location of the file object to read it. So, if you don't have the result generated during the upload, you can still read the file as long as you have its location. But remember that some storage adapters may require additional information, so the following example may need to be adapted depending on the adapter:
from ckanext.files.shared import FileData\n\nlocation = \"60b385e7-8137-496c-bb1d-6ae4d7963ab3\"\ndata = FileData(location)\n\nbuffer = storage.stream(data)\ncontent = b\"\".join(buffer)\nprint(content)\n\n... b'hello world'\n
And finally we can remove the file:
storage.remove(result)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-\\.\\_]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#ckanext-files","title":"ckanext-files","text":"Files as first-class citizens of CKAN. Upload, manage, remove files directly and attach them to datasets, resources, etc.
","text":"Rename the file.
This action changes human-readable name of the file, which is stored in DB. Real location of the file in the storage is not modified.
ckanapi action files_file_show \\\n id=226056e2-6f83-47c5-8bd2-102e2b82ab9a \\\n name=new-name.txt\n
Params:
id
: ID of the filename
: new name of the filecompleted
: use False
to rename incomplete uploads. Default: True
Returns:
dictionary with file details
"},{"location":"api/#files_file_scancontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_scan(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"List files of the owner
This action internally calls files_file_search, but with static values of owner filters. If owner is not specified, files filtered by current user. If owner is specified, user must pass authorization check to see files.
Params:
owner_id
: ID of the ownerowner_type
: type of the ownerThe all other parameters are passed as-is to files_file_search
.
Returns:
count
: total number of files matching filtersresults
: array of dictionaries with file details.files_file_search(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Search files.
This action is not stabilized yet and will change in future.
Provides an ability to search files using exact filter by name, content_type, size, owner, etc. Results are paginated and returned in package_search manner, as dict with count
and results
items.
All columns of the File model can be used as filters. Before the search, the type of the column and the type of the filter value are compared. If they are the same, the original values are used in the search. If the types differ, both the column value and the filter value are cast to strings.
This request produces size = 10
SQL expression:
ckanapi action files_file_search size:10\n
This request produces size::text = '10'
SQL expression:
ckanapi action files_file_search size=10\n
Even though the results are usually the same, using correct types leads to a more efficient search.
Apart from File columns, the following Owner properties can be used for searching: owner_id
, owner_type
, pinned
.
storage_data
and plugin_data
are dictionaries. The filter's value for these fields is used as a mask. For example, storage_data={\"a\": {\"b\": 1}}
matches any File with storage_data
containing item a
with value that contains b=1
. This works only with data represented by nested dictionaries, without other structures, like lists or sets.
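The matching logic implied above can be sketched like this (my own illustration of the mask semantics, not the extension's actual code):
def matches_mask(data: dict, mask: dict) -> bool:\n    # every key of the mask must exist in data; nested dicts are\n    # compared recursively, all other values must be equal\n    for key, expected in mask.items():\n        if key not in data:\n            return False\n        if isinstance(expected, dict):\n            if not isinstance(data[key], dict) or not matches_mask(data[key], expected):\n                return False\n        elif data[key] != expected:\n            return False\n    return True\n\nassert matches_mask({\"a\": {\"b\": 1, \"c\": 2}}, {\"a\": {\"b\": 1}})\n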
Experimental feature: File columns can be passed as a pair of operator and value. This feature will be replaced by a strictly defined query language at some point:
ckanapi action files_file_search size:'[\"<\", 100]' content_type:'[\"like\", \"text/%\"]'\n
The following operators are accepted: =
, <
, >
, !=
, like
Params:
start: index of the first row in the result / number of rows to skip. Default: 0
rows: number of rows to return. Default: 10
sort: name of the File column used for sorting. Default: name
reverse: sort results in descending order. Default: False
storage_data: mask for the storage_data column. Default: {}
plugin_data: mask for the plugin_data column. Default: {}
owner_id: show only a specific owner ID if present. Default: None
owner_type: show only a specific owner type if present. Default: None
pinned: show only pinned/unpinned items if present. Default: None
completed: use False to search incomplete uploads. Default: True
Returns:
count: total number of files matching filters
results: array of dictionaries with file details.
files_file_search_by_user(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Internal action. Do not use it.
"},{"location":"api/#files_file_showcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_show(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Show file details.
This action only displays information from the DB record. There is no way to get the content of the file using this action (or any other API action).
ckanapi action files_file_show id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n
Params:
id: ID of the file
completed: use False to show incomplete uploads. Default: True
Returns:
dictionary with file details
"},{"location":"api/#files_file_unpincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_unpin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Pin file to the current owner.
Pinned file cannot be transfered to a different owner. Use it to guarantee that file referred by entity is not accidentally transferred to a different owner.
Params:
id: ID of the file
completed: use False to unpin incomplete uploads. Default: True
Returns:
dictionary with details of updated file
"},{"location":"api/#files_multipart_completecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_complete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Finalize multipart upload and transform it into completed file.
Depending on the storage, this action may require additional parameters. But usually it just takes the ID and verifies that the content type, size and hash provided when the upload was initialized match the actual values.
If the data is valid and the file is completed inside the storage, a new File entry with the file details is created in the DB, and the file can be used just like any normal file.
Requires storage with MULTIPART
capability.
Params:
id: ID of the incomplete upload
Returns:
dictionary with details of the created file
"},{"location":"api/#files_multipart_refreshcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_refresh(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Refresh details of incomplete upload.
Can be used if the upload process was interrupted and the client does not know how many bytes were already uploaded.
Requires storage with MULTIPART
capability.
Params:
id: ID of the incomplete upload
Returns:
dictionary with details of the updated upload
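For example, after a dropped connection you can check how much data the storage already holds (the ID is illustrative):
ckanapi action files_multipart_refresh id=4f528b3f-56f3-46ea-9a21-5b9b27b3bbbb\n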
"},{"location":"api/#files_multipart_startcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_start(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Initialize multipart(resumable,continuous,signed,etc) upload.
Apart from the standard parameters, different storages can require additional data, so always check the documentation of the storage before initiating a multipart upload.
When the upload is initialized, the storage usually returns details required for the following steps. It may be a presigned URL for direct upload, or just an ID of the upload which must be used with files_multipart_update
.
Requires storage with MULTIPART
capability.
Params:
storage: name of the storage that will handle the upload. Default: default
name: name of the uploaded file
content_type: MIMEtype of the uploaded file. Used for validation
size: expected size of the upload. Used for validation
hash: expected content hash. If present, used for validation
Returns:
dictionary with details of initiated upload. Depends on used storage
"},{"location":"api/#files_multipart_updatecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_update(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Update incomplete upload.
Depending on the storage, this action may require additional parameters. Most likely, upload
with a fragment of the uploaded file.
Requires storage with MULTIPART
capability.
Params:
id: ID of the incomplete upload
Returns:
dictionary with details of the updated upload
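Putting the three actions together, a rough sketch of a resumable upload from Python might look like this (the chunk size and any extra parameters depend on the configured storage; this is not a drop-in implementation):
import os\nimport ckan.plugins.toolkit as tk\n\ndef multipart_upload(path: str, context: dict, chunk_size: int = 5 * 1024 * 1024) -> dict:\n    data = tk.get_action(\"files_multipart_start\")(context, {\n        \"storage\": \"default\",\n        \"name\": os.path.basename(path),\n        \"content_type\": \"application/octet-stream\",\n        \"size\": os.path.getsize(path),\n    })\n    with open(path, \"rb\") as src:\n        while chunk := src.read(chunk_size):\n            # some storages expect extra fields here, e.g. the upload position\n            tk.get_action(\"files_multipart_update\")(context, {\"id\": data[\"id\"], \"upload\": chunk})\n    # the storage verifies size/hash and turns the upload into a normal file\n    return tk.get_action(\"files_multipart_complete\")(context, {\"id\": data[\"id\"]})\n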
"},{"location":"api/#files_resource_uploadcontext-context-data_dict-dictstr-any","title":"files_resource_upload(context: 'Context', data_dict: 'dict[str, Any]')
","text":"Create a new file inside resource storage.
This action internally calls files_file_create
with ignore_auth=True
and always uses resources storage.
The new file is not attached to the resource. You need to call files_transfer_ownership
manually once the resource is created.
Params:
name: human-readable name of the file. Default: guessed from the upload field
upload: content of the file as a string, bytes, file descriptor or uploaded file
Returns:
dictionary with file details.
"},{"location":"api/#files_transfer_ownershipcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_transfer_ownership(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Transfer file ownership.
Params:
id: ID of the file upload
completed: use False to transfer incomplete uploads. Default: True
owner_id: ID of the new owner
owner_type: type of the new owner
force: move the file even if it's pinned. Default: False
pin: pin the file after transfer to stop future transfers. Default: False
Returns:
dictionary with details of updated file
"},{"location":"changelog/","title":"Changelog","text":"All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
"},{"location":"changelog/#unreleased","title":"Unreleased","text":"Compare with latest
"},{"location":"changelog/#features","title":"Features","text":"Compare with v0.3.0
"},{"location":"changelog/#features_1","title":"Features","text":"Compare with v0.2.6
"},{"location":"changelog/#features_2","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#bug-fixes_1","title":"Bug Fixes","text":"Compare with v0.2.4
"},{"location":"changelog/#bug-fixes_2","title":"Bug Fixes","text":"Compare with v0.2.3
"},{"location":"changelog/#features_3","title":"Features","text":"Compare with v0.2.2
"},{"location":"changelog/#features_4","title":"Features","text":"Compare with v0.2.1
"},{"location":"changelog/#v021-2024-03-18","title":"v0.2.1 - 2024-03-18","text":"Compare with v0.2.0
"},{"location":"changelog/#features_5","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#features_6","title":"Features","text":"Compare with v0.0.4
"},{"location":"changelog/#bug-fixes_4","title":"Bug Fixes","text":"Compare with v0.0.2
"},{"location":"changelog/#v002-2022-02-09","title":"v0.0.2 - 2022-02-09","text":"Compare with v0.0.1
"},{"location":"changelog/#v001-2021-09-21","title":"v0.0.1 - 2021-09-21","text":"Compare with first commit
"},{"location":"cli/","title":"CLI","text":"ckanext-files register files
entrypoint under ckan
command. Commands below must be executed as ckan -c $CKAN_INI files <COMMAND>
.
adapters [-v]
List all available storage adapters. With the -v/--verbose
flag, docstrings from the adapter classes are printed as well.
storages [-v]
List all configured storages. With the -v/--verbose
flag, all supported capabilities are shown.
stream FILE_ID [-o OUTPUT] [--start START] [--end END]
Stream content of the file to STDOUT. For non-textual files use output redirection stream ID > file.ext
. Alternatively, the output destination can be specified via the -o/--output
option. If it points to an existing directory, a file with the same name as the streamed item is created inside this directory. Otherwise, OUTPUT
is used as the filename.
--start
and --end
can be used to receive a fragment of the file. Only positive values are guaranteed to work with any storage that supports STREAM. Some storages support negative values for these options and count them from the end of the file. E.g., --start -10
reads the last 10 bytes of the file. --end -1
reads up to the last byte, but the last byte itself is not included in the output.
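For example, to save the first kilobyte of a file into a directory (the ID is illustrative):
ckan files stream 226056e2-6f83-47c5-8bd2-102e2b82ab9a --start 0 --end 1024 -o /tmp/\n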
scan [-s default] [-u] [-t [-a OWNER_ID]]
List all files that exist in the storage. Works only if the storage supports SCAN
. By default shows the content of the default
storage. The -s/--storage-name
option changes the target storage.
-u/--untracked-only
flag shows only untracked files that have no corresponding record in the DB. It can be used to identify leftovers after removing data from the portal.
-t/--track
flag registers any untracked file by creating a DB record for it. It can be used only when ANALYZE
is supported. Files are created without an owner. Use the -a/--adopt-by
option with a user ID to give ownership over the new files to the specified user. This is useful when configuring a new storage connected to an existing location with files.
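For example, a one-shot command that registers every untracked file of the default storage and gives ownership to a specific user (the user ID is illustrative):
ckan files scan -s default -t -a 59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\n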
Storage consists of the storage object that dispatches operation requests and 3 services that do the actual job: Reader, Uploader and Manager. To define a custom storage, you need to extend the main storage class, describe the storage logic and register the storage via IFiles.files_get_storage_adapters
.
Let's implement a DB storage. It will store files in an SQL table using SQLAlchemy. There is just one requirement for the table: it must have a column for storing the unique identifier of the file and another column for storing the content of the file as bytes.
For the sake of simplicity, our storage will work only with existing tables. Create the table manually before we begin.
First of all, we create an adapter that does nothing and register it in our plugin.
from __future__ import annotations\n\nfrom typing import Any\nimport sqlalchemy as sa\n\nimport ckan.plugins as p\nfrom ckan.model.types import make_uuid\nfrom ckanext.files import shared\n\n\nclass ExamplePlugin(p.SingletonPlugin):\n p.implements(shared.IFiles)\n def files_get_storage_adapters(self) -> dict[str, Any]:\n return {\"example:db\": DbStorage}\n\n\nclass DbStorage(shared.Storage):\n ...\n
After installing and enabling your custom plugin, you can configure a storage with this adapter by adding a single new line to the config file:
ckanext.files.storage.db.type = example:db\n
But if you check storage via ckan files storages -v
, you'll see that it can't do anything.
ckan files storages -v\n\n... db: example:db\n... Supports: Capability.NONE\n... Does not support: Capability.REMOVE|STREAM|CREATE|...\n
Before we start uploading files, let's make sure that the storage has proper configuration. As files will be stored in a DB table, we need the name of the table and a DB connection string. Let's assume that the table already exists, but we don't know which columns to use for files. So we need the name of the column for content and the name of the column for the file's unique identifier. ckanext-files uses the term location
instead of identifier, so we'll do the same in our implementation.
There are 4 required options in total:
db_url: DB connection string
table: name of the table
location_column: name of the column for the file's unique identifier
content_column: name of the column for the file's content
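Assuming the adapter name example:db registered above and a pre-created table (all names below are illustrative), the configuration might look like this:
ckanext.files.storage.db.type = example:db\nckanext.files.storage.db.db_url = postgresql://user:pass@localhost/ckan\nckanext.files.storage.db.table = files_data\nckanext.files.storage.db.location_column = location\nckanext.files.storage.db.content_column = content\n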
It's not mandatory, but it is highly recommended to declare config options for the adapter. It can be done via the Storage.declare_config_options
class method, which accepts declaration
object and key
namespace for storage options.
class DbStorage(shared.Storage):\n\n @classmethod\n def declare_config_options(cls, declaration, key) -> None:\n declaration.declare(key.db_url).required()\n declaration.declare(key.table).required()\n declaration.declare(key.location_column).required()\n declaration.declare(key.content_column).required()\n
And we probably want to initialize the DB connection when the storage is initialized. For this we'll extend the constructor, which must be defined as a method accepting keyword-only arguments:
class DbStorage(shared.Storage):\n ...\n\n def __init__(self, **settings: Any) -> None:\n db_url = self.ensure_option(settings, \"db_url\")\n\n self.engine = sa.create_engine(db_url)\n self.location_column = sa.column(\n self.ensure_option(settings, \"location_column\")\n )\n self.content_column = sa.column(self.ensure_option(settings, \"content_column\"))\n self.table = sa.table(\n self.ensure_option(settings, \"table\"),\n self.location_column,\n self.content_column,\n )\n super().__init__(**settings)\n
You may notice that we are using Storage.ensure_option
quite often. This method returns the value of the specified option from the settings or raises an exception.
The table definition and columns are saved as storage attributes to simplify building SQL queries in the future.
Now we are going to define classes for all 3 storage services and tell storage, how to initialize these services.
There are 3 services: Reader, Uploader and Manager. Each of them is initialized via the corresponding storage method: make_reader
, make_uploader
and make_manager
. And each of them accepts a single argument during creation: the storage itself.
class DbStorage(shared.Storage):\n def make_reader(self):\n return DbReader(self)\n\n def make_uploader(self):\n return DbUploader(self)\n\n def make_manager(self):\n return DbManager(self)\n\n\nclass DbReader(shared.Reader):\n ...\n\n\nclass DbUploader(shared.Uploader):\n ...\n\n\nclass DbManager(shared.Manager):\n ...\n
Our first target is the Uploader service. It's responsible for file creation. For the minimal implementation it needs an upload
method and capabilities
attribute which tells the storage what exactly the Uploader can do.
class DbUploader(shared.Uploader):\n capabilities = shared.Capability.CREATE\n\n def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n ...\n
upload
receives the location
(name) of the uploaded file; upload
object with file's content; and extras
dictionary that contains any additional arguments that can be passed to the uploader. We are going to ignore location
and generate a unique UUID for every uploaded file instead of using the user-defined filename.
The goal is to write the file into the DB and return shared.FileData
that contains the location of the file in the DB (value of location_column
), the size of the file in bytes, the MIMEtype of the file and the hash of the file content.
For location we'll just use ckan.model.types.make_uuid
function. Size and MIMEtype are already available as upload.size
and upload.content_type
.
The only problem is the hash of the content. You can compute it in any way you like, but there is a simple option if you have no preferences. upload
has hashing_reader
method, which returns an iterable over the file content. When you read the file through it, the content hash is automatically computed and you can get it using the get_hash
method of the reader.
Just make sure to read the whole file before checking the hash, because the hash is computed from the consumed content. I.e., if you just create the hashing reader but do not read a single byte from it, you'll receive the hash of an empty string. If you read just 1 byte, you'll receive the hash of this single byte, etc.
The easiest option for you is to call reader.read()
method to consume the whole file and then call reader.get_hash()
to receive the hash.
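In isolation the pattern looks like this (a sketch; upload is the shared.Upload object received by the service):
reader = upload.hashing_reader()\ncontent = reader.read()  # consume the whole file first\ncontent_hash = reader.get_hash()  # hash of everything read so far\n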
Here's the final implementation of DbUploader:
class DbUploader(shared.Uploader):\n capabilities = shared.Capability.CREATE\n\n def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n uuid = make_uuid()\n reader = upload.hashing_reader()\n\n values = {\n self.storage.location_column: uuid,\n self.storage.content_column: reader.read(),\n }\n stmt = sa.insert(self.storage.table, values)\n\n result = self.storage.engine.execute(stmt)\n\n return shared.FileData(\n uuid,\n upload.size,\n upload.content_type,\n reader.get_hash()\n )\n
Now you can upload a file into your new db
storage:
ckanapi action files_file_create storage=db name=hello.txt upload@<(echo -n 'hello world')\n\n...{\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-17T13:48:52.121755+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\",\n... \"location\": \"5a4472b3-cf38-4c58-81a6-4d4acb7b170e\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 11,\n... \"storage\": \"db\",\n... \"storage_data\": {}\n...}\n
The file is created, but you cannot read it just yet. Try running the ckan files stream
CLI command with the file ID:
ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... Operation stream is not supported by db storage\n... Aborted!\n
As expected, you have to write extra code.
Streaming, reading and generating links is the responsibility of the Reader service. We only need the stream
method for the minimal implementation. This method receives a shared.FileData
object (the same object as the one returned from Uploader.upload
) and extras
containing all additional arguments passed by the caller. The result is any iterable producing bytes.
We'll use location
property of shared.FileData
as a value for location_column
inside the table.
And don't forget to add STREAM
capability to Reader.capabilities
.
from collections.abc import Iterable\n\nclass DbReader(shared.Reader):\n capabilities = shared.Capability.STREAM\n\n def stream(self, data: shared.FileData, extras: dict[str, Any]) -> Iterable[bytes]:\n stmt = (\n sa.select(self.storage.content_column)\n .select_from(self.storage.table)\n .where(self.storage.location_column == data.location)\n )\n row = self.storage.engine.execute(stmt).fetchone()\n\n return row\n
The result may be confusing: we are returning a Row object from the stream method. But our goal is to return any iterable that produces bytes. Row is iterable (tuple-like). And it contains only one item: the value of the column with the file content, i.e. bytes. So it satisfies the requirements.
Now you can check content via CLI once again.
ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... hello world\n
Finally, we need to add file removal for the minimal implementation. It is also nice to have the SCAN
capability, as it shows all files currently available in the storage, so we add it as a bonus. These operations are handled by the Manager. We need remove
and scan
methods. Arguments are already familiar to you. As for the results:
remove: returns True if the file was successfully removed. It should return False if the file does not exist, but it's allowed to return True as long as you are not checking the result.
scan: returns an iterable with all file locations
class DbManager(shared.Manager):\n storage: DbStorage\n capabilities = shared.Capability.SCAN | shared.Capability.REMOVE\n\n def scan(self, extras: dict[str, Any]) -> Iterable[str]:\n stmt = sa.select(self.storage.location_column).select_from(self.storage.table)\n for row in self.storage.engine.execute(stmt):\n yield row[0]\n\n def remove(\n self,\n data: shared.FileData | shared.MultipartData,\n extras: dict[str, Any],\n ) -> bool:\n stmt = sa.delete(self.storage.table).where(\n self.storage.location_column == data.location,\n )\n self.storage.engine.execute(stmt)\n return True\n
Now you can list all the files in the storage:
ckan files scan -s db\n
And remove a file using ckanapi and the file ID:
ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n
That's all you need for the basic storage. But check the definition of the base storage and services to find details about other methods. And also check the implementations of other storages for additional ideas.
"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#requirements","title":"Requirements","text":"Compatibility with core CKAN versions:
| CKAN version | Compatible? |
| --- | --- |
| 2.9 | no |
| 2.10 | yes |
| 2.11 | yes |
| master | yes |
Note
It's recommended to install the extension via pip. If you are using the GitHub version of the extension, stick to the vX.Y.Z tags to avoid breaking changes. Check the changelog before upgrading the extension.
"},{"location":"installation/#installation_1","title":"Installation","text":"Install the extension
pip install ckanext-files # (1)!\n
pip install ckanext-files[opendal,libcloud]\n
Add files
to the ckan.plugins
setting in your CKAN config file.
Run DB migrations
ckan db upgrade -p files\n
"},{"location":"interfaces/","title":"Interfaces","text":""},{"location":"interfaces/#interfaces","title":"Interfaces","text":"ckanext-files registers ckanext.files.shared.IFiles
interface. As the extension is actively developed, this interface may change in the future. Always use inherit=True
when implementing IFiles
.
class IFiles(Interface):\n \"\"\"Extension point for ckanext-files.\"\"\"\n\n def files_get_storage_adapters(self) -> dict[str, Any]:\n \"\"\"Return mapping of storage type to adapter class.\n\n Example:\n >>> def files_get_storage_adapters(self):\n >>> return {\n >>> \"my_ext:dropbox\": DropboxStorage,\n >>> }\n\n \"\"\"\n\n return {}\n\n def files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n \"\"\"Return mapping with lookup functions for owner types.\n\n Name of the getter is the name used as `Owner.owner_type`. The getter\n itself is a function that accepts owner ID and returns optional owner\n entity.\n\n Example:\n >>> def files_register_owner_getters(self):\n >>> return {\"resource\": model.Resource.get}\n \"\"\"\n return {}\n\n def files_file_allows(\n self,\n context: types.Context,\n file: File | Multipart,\n operation: types.FileOperation,\n ) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Default implementation relies on cascade_access config option. If owner\n of file is included into cascade access, user can perform operation on\n file if he can perform the same operation with file's owner.\n\n If current owner is not affected by cascade access, user can perform\n operation on file only if user owns the file.\n\n Example:\n >>> def files_file_allows(\n >>> self, context,\n >>> file: shared.File | shared.Multipart,\n >>> operation: shared.types.FileOperation\n >>> ) -> bool | None:\n >>> if file.owner_info and file.owner_info.owner_type == \"resource\":\n >>> return is_authorized_boolean(\n >>> f\"resource_{operation}\",\n >>> context,\n >>> {\"id\": file.owner_info.id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\n def files_owner_allows(\n self,\n context: types.Context,\n owner_type: str,\n owner_id: str,\n operation: types.OwnerOperation,\n ) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Example:\n >>> def files_owner_allows(\n >>> self, context,\n >>> owner_type: str, owner_id: str,\n >>> operation: shared.types.OwnerOperation\n >>> ) -> bool | None:\n >>> if owner_type == \"resource\" and operation == \"file_transfer\":\n >>> return is_authorized_boolean(\n >>> f\"resource_update\",\n >>> context,\n >>> {\"id\": owner_id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n
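A minimal plugin skeleton that implements the interface might look like this (a sketch; DropboxStorage is the hypothetical adapter class from the docstring above):
import ckan.plugins as p\nfrom ckanext.files import shared\n\n\nclass MyFilesPlugin(p.SingletonPlugin):\n    # inherit=True protects the plugin from future additions to the interface\n    p.implements(shared.IFiles, inherit=True)\n\n    def files_get_storage_adapters(self):\n        return {\"my_ext:dropbox\": DropboxStorage}  # hypothetical adapter class\n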
"},{"location":"primer/","title":"Welcome to MkDocs","text":"For full documentation visit mkdocs.org{ data-preview }
Attribute Lists{ data-preview }
Some title
Some content
Some title
Some content
Open styled details Nested details!And more content again.
theme:\nfeatures:\n- content.code.annotate # (1)!\n
code
, formatted text, images, ... basically anything that can be written in Markdown.#include <stdio.h>\n\nint main(void) {\nprintf(\"Hello world!\\n\");\nreturn 0;\n}\n
C++ #include <iostream>\n\nint main(void) {\nstd::cout << \"Hello world!\" << std::endl;\nreturn 0;\n}\n
graph LR\nA[Start] --> B{Error?};\nB -->|Yes| C[Hmm...];\nC --> D[Debug];\nD --> B;\nB ---->|No| E[Yay!];
sequenceDiagram\nautonumber\nAlice->>John: Hello John, how are you?\nloop Healthcheck\nJohn->>John: Fight against hypochondria\nend\nNote right of John: Rational thoughts!\nJohn-->>Alice: Great!\nJohn->>Bob: How about you?\nBob-->>John: Jolly good!
```py title=\"IFiles\" class IFiles(Interface): \"\"\"Extension point for ckanext-files.\"\"\"
def files_get_storage_adapters(self) -> dict[str, Any]:\n \"\"\"Return mapping of storage type to adapter class.\n\n Example:\n >>> def files_get_storage_adapters(self):\n >>> return {\n >>> \"my_ext:dropbox\": DropboxStorage,\n >>> }\n\n \"\"\"\n\n return {}\n\ndef files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n \"\"\"Return mapping with lookup functions for owner types.\n\n Name of the getter is the name used as `Owner.owner_type`. The getter\n itself is a function that accepts owner ID and returns optional owner\n entity.\n\n Example:\n >>> def files_register_owner_getters(self):\n >>> return {\"resource\": model.Resource.get}\n \"\"\"\n return {}\n\ndef files_file_allows(\n self,\n context: types.Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Default implementation relies on cascade_access config option. If owner\n of file is included into cascade access, user can perform operation on\n file if he can perform the same operation with file's owner.\n\n If current owner is not affected by cascade access, user can perform\n operation on file only if user owns the file.\n\n Example:\n >>> def files_file_allows(\n >>> self, context,\n >>> file: shared.File | shared.Multipart,\n >>> operation: shared.types.FileOperation\n >>> ) -> bool | None:\n >>> if file.owner_info and file.owner_info.owner_type == \"resource\":\n >>> return is_authorized_boolean(\n >>> f\"resource_{operation}\",\n >>> context,\n >>> {\"id\": file.owner_info.id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\ndef files_owner_allows(\n self,\n context: types.Context,\n owner_type: str,\n owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Example:\n >>> def files_owner_allows(\n >>> self, context,\n >>> owner_type: str, owner_id: str,\n >>> operation: shared.types.OwnerOperation\n >>> ) -> bool | None:\n >>> if owner_type == \"resource\" and operation == \"file_transfer\":\n >>> return is_authorized_boolean(\n >>> f\"resource_update\",\n >>> context,\n >>> {\"id\": owner_id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\n\n\n ```\n\n === \"Hello\"\n\n world\n\n === \"bye\"\n\n world\n
"},{"location":"shared/","title":"Shared","text":"All public utilites are collected inside ckanext.files.shared
module. Avoid using anything that is not listed there. Do not import anything from modules other than shared
.
get_storage(name: 'str | None' = None) -> 'Storage'
","text":"Return existing storage instance.
Storages are initialized when the plugin is loaded. As a result, this function always returns the same storage object for a given name.
If no name is specified, the default storage is returned.
Example:
default_storage = get_storage()\nstorage = get_storage(\"storage name\")\n
"},{"location":"shared/#make_storagename-str-settings-dictstr-any-storage","title":"make_storage(name: 'str', settings: 'dict[str, Any]') -> 'Storage'
","text":"Initialize storage instance with specified settings.
The storage adapter is defined by the type
key of the settings. All other settings depend on the specific adapter.
Example:
storage = make_storage(\"memo\", {\"type\": \"files:redis\"})\n
"},{"location":"shared/#make_uploadvalue-typesuploadable-upload-upload","title":"make_upload(value: 'types.Uploadable | Upload') -> 'Upload'
","text":"Convert value into Upload object
Use this function for simple and reliable initialization of Upload object. Avoid creating Upload manually, unless you are 100% sure you can provide correct MIMEtype, size and stream.
Example:
storage.upload(\"file.txt\", make_upload(b\"hello world\"))\n
"},{"location":"shared/#with_task_queuefunc-any-name-str-none-none","title":"with_task_queue(func: 'Any', name: 'str | None' = None)
","text":"Decorator for functions that schedule tasks.
The decorated function automatically initializes a separate task queue that is processed when the function finishes. All tasks receive the function's result as execution data (first argument to Task.run).
Without this decorator, you have to manually create task queue context before queuing tasks.
Example:
@with_task_queue\ndef my_action(context, data_dict):\n ...\n
"},{"location":"shared/#add_tasktask-task","title":"add_task(task: 'Task')
","text":"Add task to the current task queue.
This function can be called only inside a task queue context. Such context is initialized automatically inside functions decorated with with_task_queue
:
@with_task_queue\ndef task_producer():\n add_task(...)\n\ntask_producer()\n
The task queue context can also be initialized manually using TaskQueue and the with
statement:
queue = TaskQueue()\nwith queue:\n add_task(...)\n\nqueue.process(execution_data)\n
"},{"location":"upload-strategies/","title":"File upload strategies","text":"There is no \"right\" way to add file to entity via ckanext-files. Everything depends on your use-case and here you can find a few different ways to combine file and arbitrary entity.
"},{"location":"upload-strategies/#attach-existing-file-and-then-transfer-ownership-via-api","title":"Attach existing file and then transfer ownership via API","text":"The simplest option is just saving file ID inside a field of the entity. It's recommended to transfer file ownership to the entity and pin the file.
ckanapi action package_patch id=PACKAGE_ID attachment_id=FILE_ID\n\nckanapi action files_transfer_ownership id=FILE_ID \\\n owner_type=package owner_id=PACKAGE_ID pin=true\n
Pros: * simple and transparent
Cons: * it's easy to forget about the ownership transfer and leave the entity with an inaccessible file * after the entity gets a reference to the file and before ownership is transferred, the data may be considered invalid.
"},{"location":"upload-strategies/#automatically-transfer-ownership-using-validator","title":"Automatically transfer ownership using validator","text":"Add files_transfer_ownership(owner_type)
to the validation schema of the entity. When it is validated, an ownership transfer task is queued and the file is automatically transferred to the entity after the update.
Pros: * minimal amount of changes if metadata schema already modified * relationships between owner and file are up-to-date after any modification
Cons: * works only with files uploaded in advance and cannot handle native implementation of resource form
"},{"location":"upload-strategies/#upload-file-and-assign-owner-via-queued-task","title":"Upload file and assign owner via queued task","text":"Add a field that accepts uploaded file. The action itself does not process the upload. Instead create a validator for the upload field, that will schedule a task for file upload and ownership transfer.
In this way, if the action fails, no upload happens and you don't need to do anything with the file, as it never left the server's temporary directory. If the action finishes without an error, the task is executed and the file is uploaded/attached to the action result.
Pros: * can be used together with native group/user/resource form after small modification of CKAN core. * handles upload inside other action as an atomic operation
Cons: * you have to validate the file before the upload happens, to prevent the situation when the action finishes successfully but the upload then fails because of the file's content type or size * tasks themselves are experimental and it's not recommended to put a lot of logic into them * there are just too many things that can go wrong
"},{"location":"upload-strategies/#add-a-new-action-that-combines-uploads-modifications-and-ownership-transfer","title":"Add a new action that combines uploads, modifications and ownership transfer","text":"If you want to add attachmen to dataset, create a separate action that accepts dataset ID and uploaded file. Internally it will upload the file by calling files_file_create
, then update the dataset via package_patch
and finally transfer ownership via files_transfer_ownership
.
Pros: * no magic. Everything is described in the new action * can be extracted into shared extension and used across multiple portals
Cons: * if you need to upload multiple files and update multiple fields, the action quickly becomes too complicated * integration with existing workflows, like dataset/resource creation, is hard. You have to override existing views or create brand new ones.
"},{"location":"validators/","title":"Validators","text":"Validator Effect files_into_upload Transform value of field(usually file uploaded via<input type=\"file\">
) into upload object using ckanext.files.shared.make_upload
files_parse_filesize Convert human-readable filesize(1B, 10MiB, 20GB) into an integer files_ensure_name(name_field) If name_field
is empty, copy into it filename from current field. Current field must be processed with files_into_upload
first files_file_id_exists Verify that file ID exists files_accept_file_with_type(*type) Verify that file ID refers to file with one of specified types. As a type can be used full MIMEtype(image/png
), or just its main(image
) or secondary(png
) part files_accept_file_with_storage(*storage_name) Verify that file ID refers to file stored inside one of specified storages files_transfer_ownership(owner_type, name_of_owner_id_field) Transfer ownership for file ID to specified entity when current API action is successfully finished"},{"location":"configuration/","title":"Configuration","text":"There are two types of config options for ckanext-files:
Depending on the type of the storage, available options are quite different. For example, files:fs
storage type requires path
option that controls filesystem path where uploads are stored. files:redis
storage type accepts prefix
option that defines Redis' key prefix of files stored in Redis. All storage specific options always have form ckanext.files.storage.<STORAGE>.<OPTION>
:
ckanext.files.storage.memory.prefix = xxx:\n# or\nckanext.files.storage.my_drive.path = /tmp/hello\n
"},{"location":"configuration/fs/","title":"Filesystem storage configuration","text":"Private filesystem storage
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n
Public filesystem storage
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:public_fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n## URL of the storage folder. `public_root + location` must produce a public URL\nckanext.files.storage.NAME.public_root =\n
"},{"location":"configuration/global/","title":"Global configuration","text":"# Default storage used for upload when no explicit storage specified\n# (optional, default: default)\nckanext.files.default_storage = default\n\n# MIMEtypes that can be served without content-disposition:attachment header.\n# (optional, default: application/pdf image video)\nckanext.files.inline_content_types = application/pdf image video\n\n# Storage used for user image uploads. When empty, user image uploads are not\n# allowed.\n# (optional, default: user_images)\nckanext.files.user_images_storage = user_images\n\n# Storage used for group image uploads. When empty, group image uploads are\n# not allowed.\n# (optional, default: group_images)\nckanext.files.group_images_storage = group_images\n\n# Storage used for resource uploads. When empty, resource uploads are not\n# allowed.\n# (optional, default: resources)\nckanext.files.resources_storage = resources\n\n# Enable HTML templates and JS modules required for unsafe default\n# implementation of resource uploads via files. IMPORTANT: this option exists\n# to simplify migration and experiments with the extension. These templates\n# may change a lot or even get removed in the public release of the\n# extension.\n# (optional, default: false)\nckanext.files.enable_resource_migration_template_patch = false\n\n# Any authenticated user can upload files.\n# (optional, default: false)\nckanext.files.authenticated_uploads.allow = false\n\n# Names of storages that can by used by non-sysadmin users when authenticated\n# uploads enabled\n# (optional, default: default)\nckanext.files.authenticated_uploads.storages = default\n\n# List of owner types that grant access on owned file to anyone who has\n# access to the owner of file. For example, if this option has value\n# `resource package`, anyone who passes `resource_show` auth, can see all\n# files owned by resource; anyone who passes `package_show`, can see all\n# files owned by package; anyone who passes\n# `package_update`/`resource_update` can modify files owned by\n# package/resource; anyone who passes `package_delete`/`resource_delete` can\n# delete files owned by package/resoure. IMPORTANT: Do not add `user` to this\n# list. Files may be temporarily owned by user during resource creation.\n# Using cascade access rules with `user` exposes such temporal files to\n# anyone who can read user's profile.\n# (optional, default: package resource group organization)\nckanext.files.owner.cascade_access = package resource group organization\n\n# Use `<OWNER_TYPE>_update` auth function to check access for ownership\n# transfer. When this flag is disabled `<OWNER_TYPE>_file_transfer` auth\n# function is used.\n# (optional, default: true)\nckanext.files.owner.transfer_as_update = true\n\n# Use `<OWNER_TYPE>_update` auth function to check access when listing all\n# files of the owner. When this flag is disabled `<OWNER_TYPE>_file_scan`\n# auth function is used.\n# (optional, default: true)\nckanext.files.owner.scan_as_update = true\n
"},{"location":"configuration/libcloud/","title":"Apache libcloud storage configuration","text":"To use this storage install extension with libcloud
extras.
pip install 'ckanext-files[libcloud]'\n
The actual storage backend is controlled by provider
option of the storage. List of all providers is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:libcloud\n## apache-libcloud storage provider. List of providers available at https://libcloud.readthedocs.io/en/stable/storage/supported_providers.html#provider-matrix . Use upper-cased value from Provider Constant column\nckanext.files.storage.NAME.provider =\n## API key or username\nckanext.files.storage.NAME.key =\n## Secret password\nckanext.files.storage.NAME.secret =\n## JSON object with additional parameters passed directly to storage constructor.\nckanext.files.storage.NAME.params =\n## Name of the container(bucket)\nckanext.files.storage.NAME.container =\n
"},{"location":"configuration/opendal/","title":"OpenDAL storage configuration","text":"To use this storage install extension with opendal
extras.
pip install 'ckanext-files[opendal]'\n
The actual storage backend is controlled by scheme
option of the storage. List of all schemes is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:opendal\n## OpenDAL service type. Check available services at https://docs.rs/opendal/latest/opendal/services/index.html\nckanext.files.storage.NAME.scheme =\n## JSON object with parameters passed directly to OpenDAL operator.\nckanext.files.storage.NAME.params =\n
"},{"location":"configuration/redis/","title":"Redis storage configuration","text":"## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.NAME.prefix = ckanext:files:default:file_content:\n
"},{"location":"configuration/storage/","title":"Storage configuration","text":"All available options for the storage type can be checked via config declarations CLI. First, add the storage type to the config file:
ckanext.files.storage.xxx.type = files:redis\n
Now run the command that shows all available config option of the plugin.
ckan config declaration files -d\n
Because the Redis storage adapter is enabled, you'll see all the options registered by the Redis adapter alongside the global options:
## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.xxx.prefix = ckanext:files:default:file_content:\n
Sometimes you will see a validation error if storage has required config options. Let's try using files:fs
storage instead of the redis:
ckanext.files.storage.xxx.type = files:fs\n
Now any attempt to run ckan config declaration files -d
will show an error, because required path
option is missing:
Invalid configuration values provided:\nckanext.files.storage.xxx.path: Missing value\nAborted!\n
Add the required option to satisfy the application
ckanext.files.storage.xxx.type = files:fs\nckanext.files.storage.xxx.path = /tmp\n
And run CLI command once again. This time you'll see the list of allowed options:
## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.xxx.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.xxx.create_path = false\n
There is a number of options that are supported by every storage. You can set them and expect that every storage, regardless of type, will use these options in the same way:
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = ADAPTER\n## The maximum size of a single upload.\n## Supports size suffixes: 42B, 2M, 24KiB, 1GB. `0` means no restrictions.\nckanext.files.storage.NAME.max_size = 0\n## Space-separated list of MIME types or just type or subtype part.\n## Example: text/csv pdf application video jpeg\nckanext.files.storage.NAME.supported_types =\n## Descriptive name of the storage used for debugging. When empty, name from\n## the config option is used, i.e: `ckanext.files.storage.DEFAULT_NAME...`\nckanext.files.storage.NAME.name = NAME\n
"},{"location":"migration/","title":"Migration from native CKAN storage system","text":"Important: ckanext-files itself is an independent file-management system. You don't have to migrate existing files from groups, users and resources to it. You can just start using ckanext-files for new fields defined in metadata schema or for uploading arbitrary files. And continue using native CKAN uploads for group/user images and resource files. Migration workflows described here merely exist as a PoC of using ckanext-files for everything in CKAN. Don't migrate your production instances yet, because concepts and rules may change in future and migration process will change as well. Try migration only as an experiment, that gives you an idea of what else you want to see in ckanext-file, and share this idea with us.
Note: every migration workflow described below requires installed ckanext-files. Complete installation section before going further.
CKAN has following types of files:
At the moment, there is no migration strategy for the last two types. Replacing site logo manually is a trivial task, so there will be no dedicated command for it. As for extensions, every of them is unique, so feel free to create an issue in the current repository: we'll consider creation of migration script for your scenario or, at least, explain how you can perform migration by yourself.
Migration process for group/organization/user images and resource uploads described below. Keep in mind, that this process only describes migration from native CKAN storage system, that keeps files inside local filesystem. If you are using storage extensions, like ckanext-s3filestore or ckanext-cloudstorage, create an issue in the current repository with a request of migration command. As there are a lot of different forks of such extension, creating reliable migration script may be challenging, so we need some details about your environment to help with migration.
Migration workflows bellow require certain changes to metadata schemas, UI widgets for file uploads and styles of your portal(depending on the customization).
"},{"location":"migration/group/","title":"Migration for group/organization images","text":"Note: internally, groups and organizations are the same entity, so this workflow describes both of them.
First of all, you need a configured storage that supports public links. As all group/organization images are stored inside local filesystem, you can use files:public_fs
storage adapter.
This extension expects that the name of group images storage will be group_images
. This name will be used in all other commands of this migration workflow. If you want to use different name for group images storage, override ckanext.files.group_images_storage
config option which has default value group_images
and don't forget to adapt commands if you use a different name for the storage.
This configuration example sets 10MiB restriction on upload size via ckanext.files.storage.group_images.max_size
option. Feel free to change it or remove completely to allow any upload size. This restriction is applied to future uploads only. Any existing file that exceeds limit is kept.
Uploads restricted to image/*
MIMEtype via ckanext.files.storage.group_images.supported_types
option. You can make this option more or less restrictive. This restriction is applied to future uploads only. Any existing file with wrong MIMEtype is kept.
ckanext.files.storage.group_images.path
controls location of the upload folder in filesystem. It should match value of ckan.storage_path
option plus storage/uploads/group
. In example below we assume that value of ckan.storage_path
is /var/storage/ckan
.
ckanext.files.storage.group_images.public_root
option specifies base URL from which every group image can be accessed. In most cases it's CKAN URL plus uploads/group
. If you are serving CKAN application from the ckan.site_url
, leave this option unchanged. If you are using ckan.root_path
, like /data/
, insert this root path into the value of the option. Example below uses %(ckan.site_url)s
wildcard, which will be automatically replaced with the value of ckan.site_url
config option. You can specify site URL explicitely if you don't like this wildcard syntax.
ckanext.files.storage.group_images.type = files:public_fs\nckanext.files.storage.group_images.max_size = 10MiB\nckanext.files.storage.group_images.supported_types = image\nckanext.files.storage.group_images.path = /var/storage/ckan/storage/uploads/group\nckanext.files.storage.group_images.public_root = %(ckan.site_url)s/uploads/group\n
Now let's run a command that show us the list of files available under newly configured storage:
ckan files scan -s group_images\n
All these files are not tracked by files extension yet, i.e they don't have corresponding record in DB with base details, like size, MIMEtype, filehash, etc. Let's create these details via the command below. It's safe to run this command multiple times: it will gather and store information about files not registered in system and ignore any previously registered file.
ckan files scan -s group_images -t\n
Finally, let's run the command, that shows only untracked files. Ideally, you'll see nothing upon executing it, because you just registered every file in the system.
ckan files scan -s group_images -u\n
Note, all the file are still available inside storage directory. If previous command shows nothing, it only means that CKAN already knows details about each file from the storage directory. If you want to see the list of the files again, omit -u
flag(which stands for \"untracked\") and you'll see again all the files in the command output:
ckan files scan -s group_images\n
Now, when all images are tracked by the system, we can give the ownership over these files to groups/organizations that are using them. Run the command below to connect files with their owners. It will search for groups/organizations first and report, how many connections were identified. There will be suggestion to show identified relationship and the list of files that have no owner(if there are such files). Presence of files without owner usually means that you removed group/organization from database, but did not remove its image.
Finally, you'll be asked if you want to transfer ownership over files. This operation does not change existing data and if you disable ckanext-files after ownership transfer, you won't see any difference. The whole ownership transfer is managed inside custom DB tables generated by ckanext-files, so it's safe operation.
ckan files migrate groups group_images\n
Here's an example of output that you can see when running the command:
Found 3 files. Searching file owners...\n[####################################] 100% Located owners for 2 files out of 3.\n\nShow group IDs and corresponding file? [y/N]: y\nd7186937-3080-429f-a434-22b74b9a8d39: file-1.png\n87e2a1aa-7905-4a28-a087-90433f8e169e: file-2.png\n\nShow files that do not belong to any group? [y/N]: y\nfile-3.png\n\nTransfer file ownership to group identified in previous steps? [y/N]: y\nTransfering file-2.png [####################################] 100%\n
Now comes the most complex part. You need to change metadata schema and UI in order to:
Original CKAN workflow for uploading files was:
This approach is different from strategy recommended by ckanext-files. But in order to make the migration as simple as possible, we'll stay close to original workflow.
Note: suggestet approach resembles existing process of file uploads in CKAN. But ckanext-files was designed as a system, that gives you a choice. Check file upload strategies to learn more about alternative implementations of upload and their pros/cons.
First, we need to replace Upload/Link widget on group/organization form. If you are using native group templates, create group/snippets/group_form.html
and organization/snippets/organization_form.html
. Inside both files, extend original template and override block basic_fields
. You only need to replace last field
{{ form.image_upload(\n data, errors, is_upload_enabled=h.uploads_enabled(),\n is_url=is_url, is_upload=is_upload) }}\n
with
{{ form.image_upload(\n data, errors, is_upload_enabled=h.files_group_images_storage_is_configured(),\n is_url=is_url, is_upload=is_upload,\n field_upload=\"files_image_upload\") }}\n
There are two differences with the original. First, we use h.files_group_images_storage_is_configured()
instead of h.uploads_enabled()
. As we are using different storage for different upload types, now upload widgets can be enabled independently. And second, we pass field_upload=\"files_image_upload\"
argument into macro. It will send uploaded file to CKAN inside files_image_upload
instead of original image_upload
field. This must be done because CKAN unconditionally strips image_upload
field from submission payload, making processing of the file too unreliable. We changed the name of upload field and CKAN keeps this new field, so that we can process it as we wish.
Note: if you are using ckanext-scheming, you only need to replace form_snippet
of the image_url
field, instead of rewriting the whole template.
Now, let's define validation rules for this new upload field. We need to create plugins that modify validation schema for group and organization. Due to CKAN implementation details, you need separate plugin for group and organization.
Note: if you are using ckanext-scheming, you can add files_image_upload
validators to schemas of organization and group. Check the list of validators that must be applied to this new field below.
Here's an example of plugins that modify validation schemas of group and organization. As you can see, they are mostly the same:
from ckan.lib.plugins import DefaultGroupForm, DefaultOrganizationForm\nfrom ckan.logic.schema import default_create_group_schema, default_update_group_schema\n\n\ndef _modify_schema(schema, type):\n schema[\"files_image_upload\"] = [\n tk.get_validator(\"ignore_empty\"),\n tk.get_validator(\"files_into_upload\"),\n tk.get_validator(\"files_validate_with_storage\")(\"group_images\"),\n tk.get_validator(\"files_upload_as\")(\n \"group_images\",\n type,\n \"id\",\n \"public_url\",\n type + \"_patch\",\n \"image_url\",\n ),\n ]\n\n\nclass FilesGroupPlugin(p.SingletonPlugin, DefaultGroupForm):\n p.implements(p.IGroupForm, inherit=True)\n is_organization = False\n\n def group_types(self):\n return [\"group\"]\n\n def create_group_schema(self):\n return _modify_schema(default_create_group_schema(), \"group\")\n\n def update_group_schema(self):\n return _modify_schema(default_update_group_schema(), \"group\")\n\n\nclass FilesOrganizationPlugin(p.SingletonPlugin, DefaultOrganizationForm):\n p.implements(p.IGroupForm, inherit=True)\n is_organization = True\n\n def group_types(self):\n return [\"organization\"]\n\n def create_group_schema(self):\n return _modify_schema(default_create_group_schema(), \"organization\")\n\n def update_group_schema(self):\n return _modify_schema(default_update_group_schema(), \"organization\")\n
There are 4 validators that must be applied to the new upload field:
ignore_empty
: to skip validation, when image URL set manually and no upload selected.files_into_upload
: to convert value of upload field into normalized format, which is expected by ckanext-filesfiles_validate_with_storage(STORAGE_NAME)
: this validator requires an argument: the name of the storage we are using for image uploads. The validator will use storage settings to verify size and MIMEtype of the appload.files_upload_as(STORAGE_NAME, GROUP_TYPE, NAME_OF_ID_FIELD, \"public_url\", NAME_OF_PATCH_ACTION, NAME_OF_URL_FIELF)
: this validator is the most challenging. It accepts 6 arguments:group
or organization
depending on processed entityid
in your case.public_url
- use this exact value. It tells which property of file you want to use as link to the file.group_patch
or organization_patch
depending on processed entityimage_url
- name of the field that contains URL of the image. ckanext-files will put the public link of uploaded file into this field when form is processed.That's all. Now every image upload for group/organization is handled by ckanext-files. To verify it, do the following. First, check list of files currently stored in group_images
storage via command that we used in the beginning of the migration:
ckan files scan -s group_images\n
You'll see a list of existing files. Their names follow the format <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg.
Now upload an image into an existing group, or create a new group with any image. When you check the list of files again, you'll see one new record, but this time the record's name resembles a UUID: da046887-e76c-4a68-97cf-7477665710ff.
Configure a named storage for resources, using the files:ckan_resource_fs storage adapter.
This extension expects the resources storage to be named resources; this name is used in all other commands of this migration workflow. If you want a different name for the resources storage, override the ckanext.files.resources_storage config option (default: resources) and don't forget to adapt the commands accordingly.
ckanext.files.storage.resources.path must match the value of the ckan.storage_path option, followed by the resources directory. In the example below we assume that ckan.storage_path is set to /var/storage/ckan.
The example below also sets a 10MiB limit on resource size. Adjust it if you are using a different limit set by ckan.max_resource_size.
Unlike group and user images, this storage needs neither an upload type restriction nor public_root.
ckanext.files.storage.resources.type = files:ckan_resource_fs\nckanext.files.storage.resources.max_size = 10MiB\nckanext.files.storage.resources.path = /var/storage/ckan/resources\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s resources -u\n
Track all these files:
ckan files scan -s resources -t\n
Re-check that now you see no untracked files:
ckan files scan -s resources -u\n
Transfer file ownership to the corresponding resources. In addition to the ownership transfer itself, this command will ask whether you want to modify the resource's url_type and url fields. This is required to move file management completely to the files extension and to make migration to a different storage type possible.
If you accept the resource modifications, url_type will be changed to file and url will be changed to the file ID for every file-owning resource. All modified packages will then be reindexed.
Changing url_type means that some pages will change. For example, instead of the Download button, CKAN will show a Go to resource button on the resource page, because the Download label is specific to url_type=upload. Some views may stop working as well. But this is a safer option for migration than leaving url_type unchanged: ckanext-files manages files in its own way, some assumptions about files no longer hold, and using a different url_type is the fastest way to signal that something has changed.
Broken views can be fixed easily. Every view is implemented as a separate plugin, so you can always inherit from that plugin and override the methods that relied on the old behavior. And many views work with the file URL directly, so they won't even notice the difference.
ckan files migrate local-resources resources\n
And the next goal is a correct metadata schema. If you are using ckanext-scheming, you need to modify the validators of the url and format fields.
If you are working with native schemas, you have to modify the dataset schema by implementing IDatasetForm. Here's an example:
import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultDatasetForm\nfrom ckan.logic import schema\n\n\nclass FilesDatasetPlugin(p.SingletonPlugin, DefaultDatasetForm):\n    p.implements(p.IDatasetForm, inherit=True)\n\n    def is_fallback(self):\n        return True\n\n    def package_types(self):\n        return [\"dataset\"]\n\n    def _modify_schema(self, schema):\n        schema[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_file_id_exists\"),\n            tk.get_validator(\"files_transfer_ownership\")(\"resource\", \"id\"),\n        ])\n        schema[\"resources\"][\"format\"].insert(0, tk.get_validator(\"files_content_type_from_file\")(\"url\"))\n\n    def create_package_schema(self):\n        sch = schema.default_create_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def update_package_schema(self):\n        sch = schema.default_update_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def show_package_schema(self):\n        sch = schema.default_show_package_schema()\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_id_into_resource_download_url\"),\n        ])\n        return sch\n
Both create and update schemas are updated in the same way. We add a new validator to the format field to correctly identify the file format, and there are a number of new validators for url:
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_file_id_exists: verify the existence of the file ID.
files_transfer_ownership(\"resource\",\"id\"): move file ownership to the resource after successful validation.
On top of this, we also have two validators applied to show_package_schema (use output_validators in ckanext-scheming):
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_id_into_resource_download_url: replace the file ID with the download URL in API output.
And the next part is the trickiest. You need to create a number of templates and JS modules. But because ckanext-files is actively developed, your custom files would most likely become outdated pretty soon.
Instead, we recommend enabling the patch for the resource form that ships with ckanext-files. It's a bit hacky, but because the extension itself is still in alpha stage, it should be acceptable. Check file upload strategies for examples of implementations that you can add to your portal instead of the default patch.
To enable the template patch, add the following line to the config file:
ckanext.files.enable_resource_migration_template_patch = true\n
This option adds an Add file button to the resource form.
Upon clicking, this button is replaced by a widget that supports uploading new files or selecting previously uploaded files that are not used by any resource yet.
"},{"location":"migration/user/","title":"Migration for user avatars","text":"This workflow is similar to group/organization migration. It contains the sequence of actions, but explanations are removed, because you already know details from the group migration. Only steps that are different will contain detailed explanation of the process.
Configure a local filesystem storage with support for public links (files:public_fs) for user images.
This extension expects the user images storage to be named user_images; this name is used in all other commands of this migration workflow. If you want a different name for the user images storage, override the ckanext.files.user_images_storage config option (default: user_images) and don't forget to adapt the commands accordingly.
ckanext.files.storage.user_images.path resembles this option for the group/organization images storage. But user images are kept inside the user folder by default. As a result, the value of this option should match the value of the ckan.storage_path option plus storage/uploads/user. In the example below we assume that ckan.storage_path is set to /var/storage/ckan.
ckanext.files.storage.user_images.public_root resembles this option for the group/organization images storage. But user images are available at the CKAN URL plus uploads/user.
ckanext.files.storage.user_images.type = files:public_fs\nckanext.files.storage.user_images.max_size = 10MiB\nckanext.files.storage.user_images.supported_types = image\nckanext.files.storage.user_images.path = /var/storage/ckan/storage/uploads/user\nckanext.files.storage.user_images.public_root = %(ckan.site_url)s/uploads/user\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s user_images -u\n
Track all these files:
ckan files scan -s user_images -t\n
Re-check that now you see no untracked files:
ckan files scan -s user_images -u\n
Transfer image ownership to corresponding users:
ckan files migrate users user_images\n
Update the user template. The required field is defined in user/new_user_form.html and user/edit_user_form.html. It's a bit different from the field used by group/organization, but you again need to add the field_upload=\"files_image_upload\" parameter to the image_upload macro and replace h.uploads_enabled() with h.files_user_images_storage_is_configured().
Users have no dedicated interface for validation schema modification, and here comes the biggest difference from the group migration. You need to chain the user_create and user_update actions and modify the schema from context:
import ckan.logic.schema\nimport ckan.plugins.toolkit as tk\n\n\ndef _patch_schema(schema):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"user_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"user_images\",\n            \"user\",\n            \"id\",\n            \"public_url\",\n            \"user_patch\",\n            \"image_url\",\n        ),\n    ]\n\n\n@tk.chained_action\ndef user_update(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_update_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n\n\n@tk.chained_action\ndef user_create(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n
The validators are all the same, but now we use user instead of group/organization in the parameters.
That's all. Just as with groups, you can update an avatar and verify that all new filenames resemble UUIDs.
"},{"location":"usage/capabilities/","title":"Capabilities","text":"To understand in advance whether specific storage can perform certain actions, ckanext-files uses ckanext.files.shared.Capability
. It's an enumeration of operations that can be supported by storage:
These capabilities are defined when storage is created and are automatically checked by actions that work with storage. If you want to check if storage supports certain capability, it can be done manually. If you want to check presence of multiple capabilities at once, you can combine them via bitwise-or operator.
from ckanext.files.shared import Capability, get_storage\n\nstorage = get_storage()\n\ncan_read = storage.supports(Capability.STREAM)\n\nread_and_write = Capability.CREATE | Capability.STREAM\ncan_read_and_write = storage.supports(read_and_write)\n
The ckan files storages -v CLI command lists all configured storages with their capabilities.
Before uploading files, you have to configure a storage: the place where all uploaded files are stored. A storage relies on an adapter that describes where and how the data is stored: filesystem, cloud, DB, etc. And, depending on the adapter, a storage may have a couple of additional adapter-specific options. For example, a filesystem adapter likely requires a path to the folder where uploads are stored, a DB adapter may need DB connection parameters, and a cloud adapter most likely will not work without an API key. These additional options are specific to the adapter, and you have to check its documentation to find out which options are available.
Let's start with the Redis adapter, because it has minimal configuration requirements.
Add the following line to the CKAN config file:
ckanext.files.storage.default.type = files:redis\n
The name of the adapter is files:redis. It follows the recommended naming convention for adapters: <EXTENSION>:<TYPE>. You can tell from the name above that we are using an adapter defined in the files extension with the redis type. But this naming convention is not enforced, and its only purpose is to avoid name conflicts. Technically, an adapter name can use any character, including spaces, newlines and emoji.
If you make a typo in the adapter's name, any CKAN CLI command will produce an error message with the list of available adapters:
Invalid configuration values provided:\nckanext.files.storage.default.type: Value must be one of ['files:fs', 'files:public_fs', 'files:redis']\nAborted!\n
The storage is configured, so we can actually upload a file. Let's use ckanapi for this task. Files are created via the files_file_create API action, and this time we have to pass 2 parameters into it:
name: the name of the uploaded file
upload: the content of the file
The final command is here:
echo -n 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n
And that's what you see as result:
{\n \"atime\": null,\n \"content_type\": \"text/plain\",\n \"ctime\": \"2024-06-02T15:02:14.819117+00:00\",\n \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n \"id\": \"e21162ab-abfb-476c-b8c5-5fe7cb89eca0\",\n \"location\": \"24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\",\n \"mtime\": null,\n \"name\": \"hello.txt\",\n \"size\": 11,\n \"storage\": \"default\",\n \"storage_data\": {}\n}\n
The content of the file can be checked via the CKAN CLI. Use the id from the last API call's output in the ckan files stream ID command:
ckan files stream e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
Alternatively, we can use the Redis CLI to get the content of the file. Note: you cannot get the content via the CKAN API, because it's JSON-based and streaming files doesn't suit its principles.
By default, the Redis adapter puts the content under the key <PREFIX><LOCATION>. Pay attention to LOCATION: it's the value available as location in the API response (i.e., 24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46 in our case). It's different from the id (the ID used by the DB to uniquely identify the file record) and the name (the human-readable name of the file). In our scenario, location looks like a UUID because of internal details of the Redis adapter implementation, but different adapters may use a more path-like value, e.g. something similar to path/to/folder/hello.txt.
PREFIX can be configured, but we skipped this step and got the default value: ckanext:files:default:file_content:. So the final Redis key of our file is ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46
redis-cli\n\n127.0.0.1:6379> GET ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\n\"hello world\"\n
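The same check can be done from Python. A minimal sketch, assuming the redis package is installed and Redis runs with the default connection settings:
import redis\n\n# Assumption: default Redis connection; adjust host/port/db for your setup.\nr = redis.Redis()\n\nkey = \"ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\"\nprint(r.get(key))  # b\"hello world\"\n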
And before we move further, let's remove the file using its id:
ckanapi action files_file_delete id=e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
"},{"location":"usage/js/","title":"JavaScript utilities","text":"Note: ckanext-files does not provide stable CKAN JS modules at the moment. Try creating your own widgets and share with us your examples or requirements. We'll consider creating and including widgets into ckanext-files if they are generic enough for majority of the users.
ckanext-files registers a few utilities inside the CKAN JS namespace to help with building UI components.
The first group of utilities is registered inside the CKAN Sandbox. Inside CKAN JS modules it's accessible as this.sandbox. If you are writing code outside of JS modules, the Sandbox can be initialized via a call to ckan.sandbox():
const sandbox = ckan.sandbox()\n
When the files plugin is loaded, the sandbox contains a files attribute with two members:
upload: a high-level helper for uploading files.
makeUploader: a factory for uploader objects that gives more control over the upload process.
The simplest way to upload a file is to use the upload helper.
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n)\n
This function uploads the file to the default storage via the files_file_create action. Extra parameters for the API call can be passed using the second argument of the upload helper: use an object with a requestParams key. The value of this key will be added to the standard API request parameters. For example, if you want to use the storage named memory and a field with the value custom:
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n {requestParams: {storage: \"memory\", field: \"custom\"}}\n)\n
If you need more control over the upload, you can create an uploader and interact with it directly instead of using the upload helper.
An uploader is an object that uploads a file to the server. It extends the base uploader, which defines the standard interface for this object. The uploader performs all the API calls internally and returns the uploaded file details. Out of the box you can use the Standard and Multipart uploaders. Standard uses the files_file_create API action and specializes in normal uploads. Multipart relies on the files_multipart_* actions and can be used to pause and continue an upload.
To create an uploader instance, pass its name as a string to makeUploader. Then you can call the upload method of the uploader to perform the actual upload. This method requires two arguments:
the file you want to upload
an object with extra request parameters, just like requestParams from the example above. If you want to use default parameters, pass an empty object. If you want to use the memory storage, pass {storage: \"memory\"}, etc.
const uploader = sandbox.files.makeUploader(\"Standard\")\nawait uploader.upload(new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}), {})\n
One of the reasons to use a manually created uploader is progress tracking. The uploader supports event subscriptions via uploader.addEventListener(event, callback), and here's the list of possible upload events:
start: file upload started. The event has a detail property with an object that contains the uploaded file as file.
multipartid: multipart upload initialized. The event has a detail property with an object that contains the uploaded file as file and the ID of the multipart upload as id.
progress: another chunk of the file was transferred to the server. The event has a detail property with an object that contains the uploaded file as file, the number of loaded bytes as loaded and the total number of bytes that must be transferred as total.
finish: file upload successfully finished. The event has a detail property with an object that contains the uploaded file as file and the file details from the API response as result.
fail: file upload failed. The event has a detail property with an object that contains the uploaded file as file and an object with CKAN validation errors as reasons.
error: an error unrelated to validation happened during upload, like a call to a non-existing action. The event has a detail property with an object that contains the uploaded file as file and the error as message.
If you want to use the upload helper with a customized uploader, there are two ways to do it:
Specify the adapter property with the uploader name inside the second argument of the upload helper: await sandbox.files.upload(new File(...), {adapter: \"Multipart\"})\n
Specify the uploader property with an uploader instance inside the second argument of the upload helper: const uploader = sandbox.files.makeUploader(\"Multipart\")\nawait sandbox.files.upload(new File(...), {uploader})\n
The second group of ckanext-files utilities is available as the ckan.CKANEXT_FILES object. This object mainly serves as an extension and configuration point for sandbox.files.
ckan.CKANEXT_FILES.adapters is a collection of all classes that can be used to initialize an uploader. It contains the Standard, Multipart and Base classes. Standard and Multipart can be used as is, while Base must be extended by your custom uploader class. Add your custom uploader classes to adapters to make them available application-wide:
class MyUploader extends Base { ... }\n\nckan.CKANEXT_FILES.adapters[\"My\"] = MyUploader;\n\nawait sandbox.files.upload(new File(...), {adapter: \"My\"})\n
ckan.CKANEXT_FILES.defaultSettings contains the object with default settings available as this.settings inside any uploader. You can change the name of the storage used by all uploaders using this object. Note: changes will apply only to uploaders initialized after the modification.
ckan.CKANEXT_FILES.defaultSettings.storage = \"memory\"\n
"},{"location":"usage/multi-storage/","title":"Multi-storage","text":"It's possible to configure multiple storages at once and specify which one you want to use for the individual file upload. Up until now we used the following storage options:
ckanext.files.storage.default.type
ckanext.files.storage.default.path
ckanext.files.storage.default.create_path
All of them have the common prefix ckanext.files.storage.default., and this prefix is the key to using multiple storages simultaneously.
Every option of the storage follows the pattern ckanext.files.storage.<STORAGE_NAME>.<OPTION>. As all the options above contain default in the position of <STORAGE_NAME>, they are related to the default storage.
If you want to configure a storage with the name custom, change the configuration of the storage:
ckanext.files.storage.custom.type = files:fs\nckanext.files.storage.custom.path = /tmp/example\nckanext.files.storage.custom.create_path = true\n
And, if you want to use Redis-based storage named memory
and filesystem-based storage named default
, use the following configuration:
ckanext.files.storage.memory.type = files:redis\n\nckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
The default storage is special: ckanext-files uses it by default, as the name suggests. If you remove the configuration for the default storage and try to create a file, you'll see the following error:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n\n... ckan.logic.ValidationError: None - {'storage': ['Storage default is not configured']}\n
Storage default is not configured. That's why we need the default configuration. But if you want to upload a file into a different storage, or you don't want to add the default storage at all, you can always explicitly specify the name of the storage you are going to use.
When using API actions, add the storage parameter to the call:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt storage=memory\n
When writing Python code, pass the storage name to the get_storage function:
storage = get_storage(\"memory\")\n
When writing JS code, pass the object {requestParams: {storage: \"memory\"}} to the upload function:
const sandbox = ckan.sandbox()\nconst file = new File([\"content\"], \"file.txt\")\nconst options = {requestParams: {storage: \"memory\"}};\n\nawait sandbox.files.upload(file, options)\n
"},{"location":"usage/multipart/","title":"Multipart, resumable and signed uploads","text":"This feature has many names, but it basically divides a single upload into multiple stages. It can be used in following situations:
All these situations are handled by 4 API actions, which are available if the storage has the MULTIPART capability:
files_multipart_start: initializes a multipart upload and sets the expected final size and MIME type. A real multipart upload usually just returns the upload ID from this action. A resumable upload creates an empty file in the storage to accumulate content inside it. A signed upload produces a URL for direct upload.
files_multipart_update: uploads a fragment of the file or modifies the upload in some other way. Most often this action accepts the ID of the upload and an upload field with a fragment of the uploaded file.
files_multipart_refresh: synchronizes and returns the current upload progress. It can be used if the upload was paused and the client does not know how many bytes were uploaded and from which byte the next upload fragment starts.
files_multipart_complete: finalizes the upload and converts it into a normal file, available to other parts of the application. A multipart upload usually combines all uploaded parts into a single file here. A resumable upload verifies that the result has the expected MIME type and size. A signed upload just registers the completed file in the system.
Implementation of multipart upload depends on the used adapter, so make sure you check its documentation before using any multipart actions. There are some common steps in the multipart upload workflow that are usually the same among all adapters:
files_multipart_start requires content_type and size parameters. These values will be used to validate the completed upload.
files_multipart_start allows a hash parameter. This value will be used to validate the completed upload. Unlike content_type and size, hash is usually optional, because it may be difficult for the client to compute it.
files_multipart_update accepts the upload ID as id and a fragment of the file as upload. A sequence of calls to files_multipart_update with non-overlapping fragments can be used to upload the file, even if the adapter implements signed uploads and the client is supposed to send the file to the signed URL instead of using files_multipart_update.
files_multipart_complete compares the content_type, size and hash (if present) specified during initialization of the upload with the actual values. If they are different, the upload is not converted into a normal file. Depending on the implementation, the storage may just ignore incorrect initial expectations and assign the real values to the file, as long as they are allowed by the storage configuration. But it's recommended to reject such uploads, so it's safer to assume that incorrect expectations are not accepted.
Incomplete files support most normal file actions, but you need to pass completed=False to the action when working with them. I.e., if you want to remove an incomplete upload, use its ID and completed=False:
ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24 completed=False\n
Incomplete files do not support streaming and downloading via the public interface of the extension. But a storage adapter can expose such features via custom methods if it's technically possible.
An example of a basic multipart upload is shown below. The files:fs adapter can be used for running this example, as it implements MULTIPART.
First, create a text file and check its size:
echo 'hello world!' > /tmp/file.txt\nwc -c /tmp/file.txt\n\n... 13 /tmp/file.txt\n
The size is 13 bytes and the content type is text/plain. These values must be used for the upload initialization.
ckanapi action files_multipart_start name=file.txt size=13 content_type=text/plain\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 0\n... }\n... }\n
Here storage_data contains {\"uploaded\": 0}. It may be different for other adapters, especially if they implement non-consecutive uploads, but generally it's the recommended way to keep track of upload progress.
Now we'll upload the first 5 bytes of the file.
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 count=5)\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 5\n... }\n... }\n
If you try finalizing the upload right now, you'll get an error.
ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... ckan.logic.ValidationError: None - {'upload': ['Actual value of upload size(5) does not match expected value(13)']}\n
Let's upload the rest of the bytes and complete the upload.
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 skip=5)\n\nckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:57:18.483716+00:00\",\n... \"hash\": \"c897d1410af8f2c74fba11b1db511e9e\",\n... \"id\": \"a740692f-e3d5-492f-82eb-f04e47c13848\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"mtime\": null,\n... \"name\": \"file.txt\",\n... \"owner_id\": null,\n... \"owner_type\": null,\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n
Now the file can be used normally. You can transfer its ownership, stream or modify it. Pay attention to the ID: the completed file has its own unique ID, which is different from the ID of the incomplete upload.
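The same walkthrough can be scripted from Python. A minimal sketch using the multipart actions described above; passing raw bytes as upload and using an ignore_auth context are simplifications, so adapt it to your setup:
import ckan.plugins.toolkit as tk\n\ncontent = b\"hello world!\"\n\n# initialize the upload with the expected size and MIME type\ninfo = tk.get_action(\"files_multipart_start\")(\n    {\"ignore_auth\": True},\n    {\"name\": \"file.txt\", \"size\": len(content), \"content_type\": \"text/plain\"},\n)\n\n# send the file in two non-overlapping fragments\nfor fragment in (content[:5], content[5:]):\n    info = tk.get_action(\"files_multipart_update\")(\n        {\"ignore_auth\": True},\n        {\"id\": info[\"id\"], \"upload\": fragment},\n    )\n\n# finalize: the completed file gets its own ID\nresult = tk.get_action(\"files_multipart_complete\")(\n    {\"ignore_auth\": True},\n    {\"id\": info[\"id\"]},\n)\n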
"},{"location":"usage/ownership/","title":"File ownership","text":"Every file can have an owner and there can be only one owner of the file. It's possible to create file without an owner, but usually application will only benefit from keeping every file with its owner. Owner is described with two fields: ID and type.
When file is created, by default the current user from API action's context is assigned as an owner of the file. From now on, the owner can perform other operations, such as renaming/displaying/removing with the file.
Apart from chaining auth function, to modify access rules for the file, plugin can implement IFiles.files_file_allows
and IFiles.files_owner_allows
methods.
def files_file_allows(\n self,\n context: Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n ...\n\ndef files_owner_allows(\n self,\n context: Context,\n owner_type: str, owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n ...\n
These methods receive the current action context, the details of the tested object, and the name of the operation (show, update, delete, file_transfer). files_file_allows checks permissions for the accessed file. It's usually called when a user interacts with the file directly. files_owner_allows works with an owner described by type and ID. It's usually called when a user transfers file ownership, performs a bulk file operation on the owner's files, or just tries to get the list of files that belong to the owner.
If the method returns True/False, the operation is allowed/denied. If the method returns None, the default logic is used to check access.
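For illustration, here's a minimal hypothetical sketch of a plugin that uses files_file_allows to forbid one operation and stays neutral otherwise. The import path of IFiles is an assumption; check the extension source for the canonical location:
import ckan.plugins as p\n\n# Assumption: adjust this import to wherever IFiles lives in your version.\nfrom ckanext.files.interfaces import IFiles\n\n\nclass MyFilesPlugin(p.SingletonPlugin):\n    p.implements(IFiles, inherit=True)\n\n    def files_file_allows(self, context, file, operation):\n        # Example policy: never allow direct deletion through this check.\n        if operation == \"delete\":\n            return False\n        # None means \"no opinion\": fall back to the default access logic.\n        return None\n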
As already mentioned, by default the user who owns the file can access it. But what about different owners? What if the file is owned by another entity, like a resource or dataset?
Out of the box, nobody can access such files. But there are three config options that modify this restriction.
ckanext.files.owner.cascade_access = ENTITY_TYPE ANOTHER_TYPE gives access to a file owned by an entity if the user already has access to the entity itself. Use words like package, resource, group instead of ENTITY_TYPE.
For example: a file is owned by a resource. If cascade access is enabled, whoever has access to resource_show of the resource can also see the file owned by this resource. If the user passes resource_update for the resource, they can also modify the file owned by this resource, etc.
Important: be careful and do not add user to ckanext.files.owner.cascade_access. Users' own files are considered private, and most likely you don't really want anyone else to be able to see or modify these files.
The second option is ckanext.files.owner.transfer_as_update. When transfer-as-update is enabled, any user who has the <OWNER_TYPE>_update permission can transfer their own files to this OWNER_TYPE. Instead of using this option, you can define an <OWNER_TYPE>_file_transfer auth function.
And the third option is ckanext.files.owner.scan_as_update. Just as with ownership transfer, it gives the user permission to list all files of the owner if the user can <OWNER_TYPE>_update it. Instead of using this option, you can define an <OWNER_TYPE>_file_scan auth function.
File creation is not allowed by default. Only a sysadmin can use the files_file_create and files_multipart_start actions. This is done deliberately: uncontrolled uploads can turn your portal into a user's personal cloud storage.
There are three ways to grant upload permission to normal users.
The BAD option is simple. Enable the ckanext.files.authenticated_uploads.allow config option, and every registered user will be allowed to upload files, but only into the default storage. If you want to change the list of storages available to a common user, specify the storage names in the ckanext.files.authenticated_uploads.storages option.
The GOOD option is relatively simple. Define a chained auth function with the name files_file_create. It's called whenever a user initiates an upload, so you can decide whether the user is allowed to upload files with the specified parameters.
The BEST option is to leave this restriction unchanged. Do not allow any user to call files_file_create. Instead, create a new action for your goal. ckanext-files isn't a solution - it's a tool that helps you build the solution.
If you need to add a documents field holding uploaded PDF files to a dataset, create a separate action dataset_document_attach. Specify access rules and validation for it, or even hardcode the storage that will be used for uploads. And then, from this new action, call files_file_create with ignore_auth: True.
In this way you control every side of uploading documents into the dataset and do not accidentally break other functionality, because every other feature will define its own action.
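A minimal sketch of the BEST option described above. The action name comes from the example in the text; the fields, access check and storage name are hypothetical placeholders:
import ckan.plugins.toolkit as tk\n\n\ndef dataset_document_attach(context, data_dict):\n    # your own access rules for this specific feature\n    tk.check_access(\"package_update\", context, {\"id\": data_dict[\"dataset_id\"]})\n\n    # delegate the upload to ckanext-files, with a hardcoded storage\n    return tk.get_action(\"files_file_create\")(\n        dict(context, ignore_auth=True),\n        {\n            \"name\": data_dict[\"name\"],\n            \"upload\": data_dict[\"upload\"],\n            \"storage\": \"dataset_documents\",  # hypothetical storage name\n        },\n    )\n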
"},{"location":"usage/task-queue/","title":"Task queue","text":"One of the challenges introduced by independently managed files is related to file ownership. As long as you can call files_transfer_ownership
manually, things are transparent. But as soon as you add custom file field to dataset, you probably want to automatically transfer ownership of the file refered by this custom field.
Imagine that you have a PDF file owned by you, and you specify the ID of this file in the attachment_id field of a dataset. You want to show a download link for this file on the dataset page. But if the file is owned by you, nobody else will be able to download it. So you decide to transfer file ownership to the dataset, so that anyone who sees the dataset can see the file as well.
You cannot update the dataset and transfer ownership afterwards, because there will be a time window between these two actions when the data is not valid. Or, even worse, after updating the dataset you may lose your internet connection and won't be able to finish the transfer.
Nor can you transfer ownership first and then update the dataset: attachment_id may have additional validators, and you don't know in advance whether you'll be able to successfully update the dataset after the transfer.
This problem can be solved by queuing additional tasks inside the action. For example, the validator that checks whether a certain file ID can be used as attachment_id can queue an ownership transfer. If the dataset update completes without errors, the queued task is executed automatically and the dataset becomes the owner of the file.
A task is queued via the ckanext.files.shared.add_task function, which accepts objects inherited from ckanext.files.shared.Task. The Task class requires implementing the abstract method run(result: Any, idx: int, prev: Any), which is called when the task is executed. This method receives the result of the action that caused the task execution, the task's position in the queue, and the result of the previous task.
For example, one of the attachment_id validators can queue the following MyTask via add_task(MyTask(file_id)) to transfer ownership of file_id to the updated dataset:
import ckan.plugins.toolkit as tk\n\nfrom ckanext.files.shared import Task\n\n\nclass MyTask(Task):\n    def __init__(self, file_id):\n        self.file_id = file_id\n\n    def run(self, dataset, idx, prev):\n        return tk.get_action(\"files_transfer_ownership\")(\n            {\"ignore_auth\": True},\n            {\n                \"id\": self.file_id,\n                \"owner_type\": \"package\",\n                \"owner_id\": dataset[\"id\"],\n                \"pin\": True,\n            },\n        )\n
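And a hypothetical validator that queues this task: it leaves the value untouched and only schedules the transfer, which runs after the wrapping action finishes without errors. The standard CKAN full-signature validator form is assumed:
from ckanext.files.shared import add_task\n\n\ndef attachment_id_validator(key, data, errors, context):\n    file_id = data[key]\n    if file_id:\n        # executed only if the wrapping action finishes without errors\n        add_task(MyTask(file_id))\n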
As the first argument, Task.run receives the result of the action that was called. Right now only the following actions support tasks:
package_create
package_update
resource_create
resource_update
group_create
group_update
organization_create
organization_update
user_create
user_update
If you want to enable tasks support for your custom action, decorate it with ckanext.files.shared.with_task_queue
decorator:
from ckanext.files.shared import with_task_queue\n\n@with_task_queue\ndef my_action(context, data_dict)\n # you can call `add_task` inside this action's stack frame.\n ...\n
A good example of a validator using tasks is the files_transfer_ownership validator factory. It can be added to the metadata schema as files_transfer_ownership(owner_type, name_of_id_field). For example, if you are adding this validator to a resource, call it as files_transfer_ownership(\"resource\", \"id\"). The second argument is the name of the ID field. As in most cases it's id, you can omit the second argument:
files_transfer_ownership(\"organization\")
files_transfer_ownership(\"package\")
files_transfer_ownership(\"user\")
There is a difference between creating files via an action:
tk.get_action(\"files_file_create\")(\n {\"ignore_auth\": True},\n {\"upload\": \"hello\", \"name\": \"hello.txt\"}\n)\n
and via direct call to Storage.upload
:
from ckanext.files.shared import get_storage, make_upload\n\nstorage = get_storage()\nstorage.upload(\"hello.txt\", make_upload(b\"hello\"), {})\n
The former snippet creates a tracked file: the file is uploaded to the storage and its details are saved to the database.
The latter snippet creates an untracked file: the file is uploaded to the storage, but its details are not saved anywhere.
Untracked files can be used to achieve specific goals. For example, imagine a storage adapter that writes files to the specified ZIP archive. You can create an interface, that initializes such storage for an existing ZIP resource and uploads files into it. You don't need a separate record in DB for every uploaded file, because all of them go into the resource, that is already stored in DB.
But such use cases are pretty specific, so prefer to use the API if you are not sure what you need. The main reason to use tracked files is their discoverability: you can use the files_file_search API action to list all the tracked files and optionally filter them by storage, location, content_type, etc.:
ckanapi action files_file_search\n\n... {\n... \"count\": 123,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 11,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... },\n... ...\n... ]\n... }\n\nckanapi action files_file_search size:5 rows=1\n\n... {\n... \"count\": 2,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 5,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n... ]\n... }\n\nckanapi action files_file_search content_type=application/pdf\n\n... {\n... \"count\": 0,\n... \"results\": []\n... }\n
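The same search from Python, a minimal sketch; the filter values are placeholders:
import ckan.plugins.toolkit as tk\n\nfiles = tk.get_action(\"files_file_search\")(\n    {\"ignore_auth\": True},\n    {\"storage\": \"default\", \"content_type\": \"text/plain\"},\n)\nprint(files[\"count\"])\n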
As for untracked files, their discoverability depends on the storage adapter. Some adapters, files:fs for example, can scan the storage and locate all uploaded files, both tracked and untracked. If you have a files:fs storage configured as default, use the following command to scan its content:
ckan files scan\n
If you want to scan a different storage, specify its name via the -s/--storage-name option. Remember that some storage adapters do not support scanning.
ckan files scan -s memory\n
If you want to see untracked files only, add the -u/--untracked-only flag.
ckan files scan -u\n
If you want to track untracked files by creating a DB record for every such file, add the -t/--track flag. After that you'll be able to discover previously untracked files via the files_file_search API action. This option is most useful during migration, when you are configuring a new storage that points to an existing location with files.
ckan files scan -t\n
"},{"location":"usage/transfer/","title":"Ownership transfer","text":"File ownership can be transfered. As there can be only one owner of the file, as soon as you transfer ownership over file, you yourself do not own this file.
To transfer ownership, use the files_transfer_ownership action and specify the id of the file, and the owner_id and owner_type of the new owner.
You can't just transfer ownership to anyone. You must either pass the IFiles.files_owner_allows check for the file_transfer operation, or pass a cascade access check for the future owner of the file when cascade access and transfer-as-update are enabled.
For example, if you have the following options in config file:
ckanext.files.owner.cascade_access = organization\nckanext.files.owner.transfer_as_update = true\n
you must pass the organization_update auth function if you want to transfer file ownership to an organization. In addition, a file can be pinned. In this way we mark important files. Imagine a resource and its uploaded file. The link to this file is used by the resource, and we don't want the file to be accidentally transferred to someone else. We pin the file, and now nobody can transfer it without explicitly confirming the intention.
There are two ways to move a pinned file:
call files_file_unpin first and then transfer the ownership via a separate API call
pass the force parameter to files_transfer_ownership
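For example, forcing the transfer of a pinned file from Python might look like this minimal sketch (the IDs are placeholders):
import ckan.plugins.toolkit as tk\n\ntk.get_action(\"files_transfer_ownership\")(\n    {\"ignore_auth\": True},\n    {\n        \"id\": \"226056e2-6f83-47c5-8bd2-102e2b82ab9a\",  # placeholder file ID\n        \"owner_type\": \"organization\",\n        \"owner_id\": \"placeholder-organization-id\",\n        \"force\": True,  # required because the file is pinned\n    },\n)\n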
You can upload files using JavaScript CKAN modules. ckanext-files extends CKAN's Sandbox object (available as this.sandbox inside a JS CKAN module), so we can use a shortcut and upload a file directly from the DevTools. Open any CKAN page, switch to the JS console and create the sandbox instance. Inside it we have a files object, which in turn contains an upload method. This method accepts a File object for upload (the same object you can get from an input[type=file]).
sandbox = ckan.sandbox()\nawait sandbox.files.upload(\nnew File([\"content\"], \"file.txt\")\n)\n\n... {\n... \"id\": \"18cdaa65-5eed-4078-89a8-469b137627ce\",\n... \"name\": \"file.txt\",\n... \"location\": \"b53907c3-8434-4dee-9a9e-6c4d3055d200\",\n... \"content_type\": \"text/plain\",\n... \"size\": 7,\n... \"hash\": \"9a0364b9e99bb480dd25e1f0284c8555\",\n... \"storage\": \"default\",\n... \"ctime\": \"2024-06-02T16:12:27.902055+00:00\",\n... \"mtime\": null,\n... \"atime\": null,\n... \"storage_data\": {}\n... }\n
If you are still using the FS storage configured in the previous section, switch to the /tmp/example folder and check its content:
ls /tmp/example\n... b53907c3-8434-4dee-9a9e-6c4d3055d200\n\ncat b53907c3-8434-4dee-9a9e-6c4d3055d200\n... content\n
And, as usual, let's remove the file using the ID from the upload promise:
sandbox.client.call(\"POST\", \"files_file_delete\", {\nid: \"18cdaa65-5eed-4078-89a8-469b137627ce\"\n})\n
"},{"location":"usage/use-in-code/","title":"Usage in code","text":"If you are writing the code and you want to interact with the storage directly, without the API layer, you can do it via a number of public functions of the extension available in ckanext.files.shared
.
Let's configure the filesystem storage first. The filesystem adapter has a mandatory option path that controls the filesystem location where files are stored. If the path does not exist, the storage will raise an exception by default. But it can also create the missing path if you enable the create_path option. Here's our final version of the settings:
ckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
Now we are going to connect to CKAN shell via ckan shell
CLI command and create an instance of the storage:
from ckanext.files.shared import get_storage\nstorage = get_storage()\n
Because you have all the configuration in place, the rest is fairly straightforward. We will upload a file, read its content and remove it, all from the CKAN shell.
To create a file, the storage.upload method must be called with 2 parameters:
the name of the file
a special stream-like object with the content of the file
You can use any string as the first parameter. As for the \"special stream-like object\", ckanext-files has the ckanext.files.shared.make_upload function, which accepts a number of different types (bytes, werkzeug.datastructures.FileStorage, BytesIO, a file descriptor) and converts them into the expected format.
from ckanext.files.shared import make_upload\n\nupload = make_upload(b\"hello world\")\nresult = storage.upload('file.txt', upload)\n\nprint(result)\n\n... FileData(\n... location='60b385e7-8137-496c-bb1d-6ae4d7963ab3',\n... size=11,\n... content_type='text/plain',\n... hash='5eb63bbbe01eeed093cb22bb8f5acdc3',\n... storage_data={}\n... )\n
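As an aside, make_upload accepts more than raw bytes; a couple of sketches for the other documented input types (the filename is arbitrary):
from io import BytesIO\n\nfrom werkzeug.datastructures import FileStorage\n\nfrom ckanext.files.shared import make_upload\n\n# from an in-memory buffer\nupload = make_upload(BytesIO(b\"hello world\"))\n\n# from a werkzeug FileStorage, e.g. a file submitted through a form\nwith open(\"/tmp/myfile.txt\", \"rb\") as src:\n    upload = make_upload(FileStorage(src, filename=\"myfile.txt\"))\n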
result is an instance of the ckanext.files.shared.FileData dataclass. It contains all the information required by the storage to manage the file.
The result object has a location attribute that contains the name of the file relative to the path option specified in the storage configuration. If you visit the /tmp/example directory, which was set as the path for the storage, you'll see a file with a name matching the location from the result. And its content matches the content of our upload, which is quite an expected outcome.
cat /tmp/example/60b385e7-8137-496c-bb1d-6ae4d7963ab3\n\n... hello world\n
But let's go back to the shell and try reading the file from Python code. We'll pass result to the storage's stream method, which produces an iterable of bytes based on our result:
buffer = storage.stream(result)\ncontent = b\"\".join(buffer)\n\n... b'hello world'\n
In most cases, the storage only needs the location of the file object to read it. So, if you don't have the result generated during the upload, you can still read the file as long as you have its location. But remember that some storage adapters may require additional information, and the following example must be adapted depending on the adapter:
from ckanext.files.shared import FileData\n\nlocation = \"60b385e7-8137-496c-bb1d-6ae4d7963ab3\"\ndata = FileData(location)\n\nbuffer = storage.stream(data)\ncontent = b\"\".join(buffer)\nprint(content)\n\n... b'hello world'\n
And finally we can remove the file:
storage.remove(result)\n
"}]}
\ No newline at end of file