ckanext-files

Files as first-class citizens of CKAN. Upload, manage, and remove files directly and attach them to datasets, resources, etc.

Read the documentation for a full user guide.

Install the extension:

pip install ckanext-files

Add files to the ckan.plugins setting in your CKAN config file.

Run DB migrations:

ckan db upgrade -p files

Configure the storage:

ckanext.files.storage.default.type = files:fs
ckanext.files.storage.default.path = /tmp/example
ckanext.files.storage.default.create_path = true

Upload your first file:

ckanapi action files_file_create upload@~/Downloads/file.txt

Install dev extras and nodeJS dependencies:

pip install -e '.[dev]'
npm ci

Run unittests:

pytest

Run frontend tests:

# start test server in separate terminal
make test-server

# run tests
npx cypress run

Run typecheck:

npx pyright
files_file_create(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Create a new file.
This action passes uploaded file to the storage without strict validation. File is converted into standard upload object and everything else is controlled by storage. The same file may be uploaded to one storage and rejected by other, depending on configuration.
This action is way too powerful to use it directly. The recommended approach is to register a different action for handling specific type of uploads and call current action internally.
When uploading a real file(or using werkqeug.datastructures.FileStorage
), name parameter can be omited. In this case, the name of uploaded file is used.
ckanapi action files_file_create upload@path/to/file.txt\n
When uploading a raw content of the file using string or bytes object, name is mandatory.
ckanapi action files_file_create upload@<(echo -n \"hello world\") name=file.txt\n
Requires storage with CREATE
capability.
Params:

name: human-readable name of the file. Default: guess using upload field
storage: name of the storage that will handle the upload. Default: default
upload: content of the file as string, bytes, file descriptor or uploaded file

Returns:

dictionary with file details.
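For programmatic use inside an extension, the same action can be called through the plugins toolkit. A minimal sketch; the acting user, file name and content are illustrative:

import ckan.plugins.toolkit as tk

# Raw bytes are uploaded, so `name` is mandatory: there is no
# filename to guess it from.
result = tk.get_action("files_file_create")(
    {"user": "default"},  # illustrative context with the acting user
    {
        "name": "file.txt",
        "storage": "default",
        "upload": b"hello world",
    },
)
print(result["id"], result["size"], result["content_type"])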
"},{"location":"api/#files_file_deletecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_delete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Remove file from storage.
Unlike packages, file has no state
field. Removal usually means that file details removed from DB and file itself removed from the storage.
Some storage can implement revisions of the file and keep archived versions or backups. Check storage documentation if you need to know whether there are chances that file is not completely removed with this operation.
Requires storage with REMOVE
capability.
ckanapi action files_file_delete id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n
Params:

id: ID of the file
completed: use False to remove incomplete uploads. Default: True

Returns:

dictionary with details of the removed file.
"},{"location":"api/#files_file_pincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_pin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Pin file to the current owner.
Pinned file cannot be transfered to a different owner. Use it to guarantee that file referred by entity is not accidentally transferred to a different owner.
Params:

id: ID of the file
completed: use False to pin incomplete uploads. Default: True

Returns:

dictionary with details of updated file
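A minimal sketch of calling the action from an extension; the file ID is reused from the examples above:

import ckan.plugins.toolkit as tk

# Pin the file so that files_transfer_ownership fails for it
# unless called with force=True.
tk.get_action("files_file_pin")(
    {"user": "default"},  # illustrative context
    {"id": "226056e2-6f83-47c5-8bd2-102e2b82ab9a"},
)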
"},{"location":"api/#files_file_renamecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_rename(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Rename the file.
This action changes human-readable name of the file, which is stored in DB. Real location of the file in the storage is not modified.
ckanapi action files_file_show \\\n id=226056e2-6f83-47c5-8bd2-102e2b82ab9a \\\n name=new-name.txt\n
Params:

id: ID of the file
name: new name of the file
completed: use False to rename incomplete uploads. Default: True

Returns:

dictionary with file details
"},{"location":"api/#files_file_scancontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_scan(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"List files of the owner
This action internally calls files_file_search, but with static values of owner filters. If owner is not specified, files filtered by current user. If owner is specified, user must pass authorization check to see files.
Params:

owner_id: ID of the owner
owner_type: type of the owner

All other parameters are passed as-is to files_file_search.
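A hedged sketch of calling the action from Python; the owner ID is illustrative and rows is one of the parameters forwarded to files_file_search:

import ckan.plugins.toolkit as tk

# List up to 20 files owned by a specific user.
result = tk.get_action("files_file_scan")(
    {"user": "default"},  # illustrative context
    {
        "owner_id": "59ea0f6c-5c2f-438d-9d2e-e045be9a2beb",
        "owner_type": "user",
        "rows": 20,
    },
)
print(result["count"], [f["name"] for f in result["results"]])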
Returns:

count: total number of files matching filters
results: array of dictionaries with file details.

files_file_search(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Search files.
This action is not stabilized yet and will change in future.
Provides an ability to search files using exact filter by name, content_type, size, owner, etc. Results are paginated and returned in package_search manner, as dict with count
and results
items.
All columns of File model can be used as filters. Before the search, type of column and type of filter value are compared. If they are the same, original values are used in search. If type different, column value and filter value are casted to string.
This request produces the size = 10 SQL expression:

ckanapi action files_file_search size:10

This request produces the size::text = '10' SQL expression:

ckanapi action files_file_search size=10

Even though the results are usually the same, using correct types leads to a more efficient search.
Apart from File columns, the following Owner properties can be used for searching: owner_id, owner_type, pinned.

storage_data and plugin_data are dictionaries. The filter's value for these fields is used as a mask. For example, storage_data={"a": {"b": 1}} matches any File whose storage_data contains an item a with a value that contains b=1. This works only with data represented by nested dictionaries, without other structures like lists or sets.
Experimental feature: File columns can be passed as a pair of operator and value. This feature will be replaced by a strictly defined query language at some point:

ckanapi action files_file_search size:'["<", 100]' content_type:'["like", "text/%"]'

The following operators are accepted: =, <, >, !=, like.
Params:

start: index of the first row in result / number of rows to skip. Default: 0
rows: number of rows to return. Default: 10
sort: name of the File column used for sorting. Default: name
reverse: sort results in descending order. Default: False
storage_data: mask for the storage_data column. Default: {}
plugin_data: mask for the plugin_data column. Default: {}
owner_id: show only the specific owner ID if present. Default: None
owner_type: show only the specific owner type if present. Default: None
pinned: show only pinned/unpinned items if present. Default: None
completed: use False to search incomplete uploads. Default: True

Returns:

count: total number of files matching filters
results: array of dictionaries with file details.

files_file_search_by_user(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'

Internal action. Do not use it.
"},{"location":"api/#files_file_showcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_show(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Show file details.
This action only displays information from DB record. There is no way to get the content of the file using this action(or any other API action).
ckanapi action files_file_show id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n
Params:

id: ID of the file
completed: use False to show incomplete uploads. Default: True

Returns:

dictionary with file details
"},{"location":"api/#files_file_unpincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_unpin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Pin file to the current owner.
Pinned file cannot be transfered to a different owner. Use it to guarantee that file referred by entity is not accidentally transferred to a different owner.
Params:

id: ID of the file
completed: use False to unpin incomplete uploads. Default: True

Returns:

dictionary with details of updated file
"},{"location":"api/#files_multipart_completecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_complete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Finalize multipart upload and transform it into completed file.
Depending on storage this action may require additional parameters. But usually it just takes ID and verify that content type, size and hash provided when upload was initialized, much the actual value.
If data is valid and file is completed inside the storage, new File entry with file details created in DB and file can be used just as any normal file.
Requires storage with MULTIPART
capability.
Params:

id: ID of the incomplete upload

Returns:

dictionary with details of the created file
"},{"location":"api/#files_multipart_refreshcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_refresh(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Refresh details of incomplete upload.
Can be used if upload process was interrupted and client does not how many bytes were already uploaded.
Requires storage with MULTIPART
capability.
Params:

id: ID of the incomplete upload

Returns:

dictionary with details of the updated upload
"},{"location":"api/#files_multipart_startcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_start(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Initialize multipart(resumable,continuous,signed,etc) upload.
Apart from standard parameters, different storages can require additional data, so always check documentation of the storage before initiating multipart upload.
When upload initialized, storage usually returns details required for further upload. It may be a presigned URL for direct upload, or just an ID of upload which must be used with files_multipart_update
.
Requires storage with MULTIPART
capability.
Params:

storage: name of the storage that will handle the upload. Default: default
name: name of the uploaded file
content_type: MIMEtype of the uploaded file. Used for validation
size: expected size of upload. Used for validation
hash: expected content hash. If present, used for validation

Returns:

dictionary with details of initiated upload. Depends on used storage
"},{"location":"api/#files_multipart_updatecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_update(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Update incomplete upload.
Depending on storage this action may require additional parameters. Most likely, upload
with the fragment of uploaded file.
Requires storage with MULTIPART
capability.
Params:

id: ID of the incomplete upload

Returns:

dictionary with details of the updated upload
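The three multipart actions are normally chained together. Below is a hedged sketch of the whole flow, assuming a storage that accepts content via files_multipart_update; the exact fields returned by each step depend on the storage, so only id is relied upon:

import ckan.plugins.toolkit as tk

context = {"user": "default"}  # illustrative context
content = b"x" * 1024

# 1. Initialize the upload; size/content_type are used for validation.
info = tk.get_action("files_multipart_start")(
    context,
    {
        "storage": "default",
        "name": "big.bin",
        "content_type": "application/octet-stream",
        "size": len(content),
    },
)

# 2. Send the content; some storages expect several calls, each
# with the next fragment of the file.
tk.get_action("files_multipart_update")(
    context,
    {"id": info["id"], "upload": content},
)

# 3. Finalize: the incomplete upload becomes a regular file.
file_info = tk.get_action("files_multipart_complete")(context, {"id": info["id"]})
print(file_info["id"])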
"},{"location":"api/#files_resource_uploadcontext-context-data_dict-dictstr-any","title":"files_resource_upload(context: 'Context', data_dict: 'dict[str, Any]')
","text":"Create a new file inside resource storage.
This action internally calls files_file_create
with ignore_auth=True
and always uses resources storage.
New file is not attached to resource. You need to call files_transfer_ownership
manually, when resource created.
Params:

name: human-readable name of the file. Default: guess using upload field
upload: content of the file as string, bytes, file descriptor or uploaded file

Returns:

dictionary with file details.
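A hedged sketch of using the action from Python; the context and file content are illustrative:

import ckan.plugins.toolkit as tk

# Upload into the resources storage; the file is not attached to
# any resource yet and must be transferred once the resource exists.
info = tk.get_action("files_resource_upload")(
    {"user": "default"},  # illustrative context
    {"name": "data.csv", "upload": b"a,b\n1,2\n"},
)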
"},{"location":"api/#files_transfer_ownershipcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_transfer_ownership(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Transfer file ownership.
Depending on storage this action may require additional parameters. Most likely, upload
with the fragment of uploaded file.
Params:

id: ID of the file upload
completed: use False to transfer incomplete uploads. Default: True
owner_id: ID of the new owner
owner_type: type of the new owner
force: move file even if it's pinned. Default: False
pin: pin file after transfer to stop future transfers. Default: False

Returns:

dictionary with details of updated file
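A hedged sketch that attaches an uploaded file to a dataset and pins it; FILE_ID and PACKAGE_ID are placeholders:

import ckan.plugins.toolkit as tk

# Make the dataset the owner of the file and pin it, so further
# transfers require force=True.
tk.get_action("files_transfer_ownership")(
    {"user": "default"},  # illustrative context
    {
        "id": "FILE_ID",           # placeholder
        "owner_type": "package",
        "owner_id": "PACKAGE_ID",  # placeholder
        "pin": True,
    },
)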
"},{"location":"changelog/","title":"Changelog","text":"All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
"},{"location":"changelog/#unreleased","title":"Unreleased","text":"Compare with latest
"},{"location":"changelog/#features","title":"Features","text":"Compare with v0.3.0
"},{"location":"changelog/#features_1","title":"Features","text":"Compare with v0.2.6
"},{"location":"changelog/#features_2","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#bug-fixes_1","title":"Bug Fixes","text":"Compare with v0.2.4
"},{"location":"changelog/#bug-fixes_2","title":"Bug Fixes","text":"Compare with v0.2.3
"},{"location":"changelog/#features_3","title":"Features","text":"Compare with v0.2.2
"},{"location":"changelog/#features_4","title":"Features","text":"Compare with v0.2.1
"},{"location":"changelog/#v021-2024-03-18","title":"v0.2.1 - 2024-03-18","text":"Compare with v0.2.0
"},{"location":"changelog/#features_5","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#features_6","title":"Features","text":"Compare with v0.0.4
"},{"location":"changelog/#bug-fixes_4","title":"Bug Fixes","text":"Compare with v0.0.2
"},{"location":"changelog/#v002-2022-02-09","title":"v0.0.2 - 2022-02-09","text":"Compare with v0.0.1
"},{"location":"changelog/#v001-2021-09-21","title":"v0.0.1 - 2021-09-21","text":"Compare with first commit
"},{"location":"cli/","title":"CLI","text":"ckanext-files register files
entrypoint under ckan
command. Commands below must be executed as ckan -c $CKAN_INI files <COMMAND>
.
adapters [-v]

List all available storage adapters. With the -v/--verbose flag, docstrings from the adapter classes are printed as well.

storages [-v]

List all configured storages. With the -v/--verbose flag, all supported capabilities are shown.
stream FILE_ID [-o OUTPUT] [--start START] [--end END]

Stream content of the file to STDOUT. For non-textual files use output redirection: stream ID > file.ext. Alternatively, the output destination can be specified via the -o/--output option. If it contains a path to a directory, a file with the same name as the streamed item is created inside this directory. Otherwise, OUTPUT is used as the filename.

--start and --end can be used to receive a fragment of the file. Only positive values are guaranteed to work with any storage that supports STREAM. Some storages support negative values for these options and count them from the end of the file: --start -10 reads the last 10 bytes of the file, and --end -1 reads up to the last byte, but the last byte itself is not included in the output.
scan [-s default] [-u] [-t [-a OWNER_ID]]

List all files that exist in the storage. Works only if the storage supports SCAN. By default shows the content of the default storage; the -s/--storage-name option changes the target storage.

The -u/--untracked-only flag shows only untracked files that have no corresponding record in the DB. Can be used to identify leftovers after removing data from the portal.

The -t/--track flag registers any untracked file by creating a DB record for it. Can be used only when ANALYZE is supported. Files are created without an owner. Use the -a/--adopt-by option with a user ID to give ownership over the new files to the specified user. This can be used when configuring a new storage connected to an existing location with files.
A storage consists of the storage object that dispatches operation requests and 3 services that do the actual job: Reader, Uploader and Manager. To define a custom storage, you need to extend the main storage class, describe the storage logic and register the storage via IFiles.files_get_storage_adapters.

Let's implement a DB storage. It will store files in an SQL table using SQLAlchemy. There is just one requirement for the table: it must have a column for storing the unique identifier of the file and another column for storing the content of the file as bytes.
For the sake of simplicity, our storage will work only with existing tables. Create the table manually before we begin.
First of all, we create an adapter that does nothing and register it in our plugin.
from __future__ import annotations

from typing import Any

import sqlalchemy as sa

import ckan.plugins as p
from ckan.model.types import make_uuid
from ckanext.files import shared


class ExamplePlugin(p.SingletonPlugin):
    p.implements(shared.IFiles)

    def files_get_storage_adapters(self) -> dict[str, Any]:
        return {"example:db": DbStorage}


class DbStorage(shared.Storage):
    ...
After installing and enabling your custom plugin, you can configure a storage with this adapter by adding a single new line to the config file:

ckanext.files.storage.db.type = example:db
But if you check the storage via ckan files storages -v, you'll see that it can't do anything:

ckan files storages -v

... db: example:db
...     Supports: Capability.NONE
...     Does not support: Capability.REMOVE|STREAM|CREATE|...
Before we start uploading files, let's make sure that the storage has a proper configuration. As files will be stored in a DB table, we need the name of the table and a DB connection string. Let's assume that the table already exists, but we don't know which columns to use for files. So we need the name of the column for the content and for the file's unique identifier. ckanext-files uses the term location instead of identifier, so we'll do the same in our implementation.

There are 4 required options in total:

* db_url: DB connection string
* table: name of the table
* location_column: name of the column for the file's unique identifier
* content_column: name of the column for the file's content
It's not mandatory, but highly recommended, to declare config options for the adapter. It can be done via the Storage.declare_config_options class method, which accepts a declaration object and a key namespace for the storage options.

class DbStorage(shared.Storage):

    @classmethod
    def declare_config_options(cls, declaration, key) -> None:
        declaration.declare(key.db_url).required()
        declaration.declare(key.table).required()
        declaration.declare(key.location_column).required()
        declaration.declare(key.content_column).required()
And we probably want to initialize the DB connection when the storage is initialized. For this we'll extend the constructor, which must be defined as a method accepting keyword-only arguments:

class DbStorage(shared.Storage):
    ...

    def __init__(self, **settings: Any) -> None:
        db_url = self.ensure_option(settings, "db_url")

        self.engine = sa.create_engine(db_url)
        self.location_column = sa.column(
            self.ensure_option(settings, "location_column")
        )
        self.content_column = sa.column(self.ensure_option(settings, "content_column"))
        self.table = sa.table(
            self.ensure_option(settings, "table"),
            self.location_column,
            self.content_column,
        )
        super().__init__(**settings)
You may notice that we are using Storage.ensure_option quite often. This method returns the value of the specified option from the settings or raises an exception.

The table definition and columns are saved as storage attributes to simplify building SQL queries in the future.

Now we are going to define classes for all 3 storage services and tell the storage how to initialize these services.
There are 3 services: Reader, Uploader and Manager. Each of them is initialized via the corresponding storage method: make_reader, make_uploader and make_manager. And each of them accepts a single argument during creation: the storage itself.

class DbStorage(shared.Storage):
    def make_reader(self):
        return DbReader(self)

    def make_uploader(self):
        return DbUploader(self)

    def make_manager(self):
        return DbManager(self)


class DbReader(shared.Reader):
    ...


class DbUploader(shared.Uploader):
    ...


class DbManager(shared.Manager):
    ...
Our first target is the Uploader service. It's responsible for file creation. For the minimal implementation it needs an upload method and a capabilities attribute which tells the storage what exactly the Uploader can do.

class DbUploader(shared.Uploader):
    capabilities = shared.Capability.CREATE

    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:
        ...
upload receives the location (name) of the uploaded file; the upload object with the file's content; and an extras dictionary that contains any additional arguments that can be passed to the uploader. We are going to ignore location and generate a unique UUID for every uploaded file instead of using the user-defined filename.

The goal is to write the file into the DB and return shared.FileData that contains the location of the file in the DB (the value of location_column), the size of the file in bytes, the MIMEtype of the file and the hash of the file content.
For the location we'll just use the ckan.model.types.make_uuid function. Size and MIMEtype are already available as upload.size and upload.content_type.

The only problem is the hash of the content. You can compute it in any way you like, but there is a simple option if you have no preferences. upload has a hashing_reader method, which returns an iterable over the file content. When you read the file through it, the content hash is automatically computed and you can get it using the get_hash method of the reader.
Just make sure to read the whole file before checking the hash, because the hash is computed from the consumed content. I.e., if you just create the hashing reader but do not read a single byte from it, you'll receive the hash of an empty string. If you read just 1 byte, you'll receive the hash of this single byte, etc.

The easiest option is to call the reader.read() method to consume the whole file and then call reader.get_hash() to receive the hash.
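A small sketch of this behavior, built with make_upload from ckanext.files.shared:

from ckanext.files import shared

upload = shared.make_upload(b"hello world")
reader = upload.hashing_reader()

print(reader.get_hash())  # hash of the consumed content so far: empty string
reader.read()             # consume the whole file
print(reader.get_hash())  # now the hash of b"hello world"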
Here's the final implementation of DbUploader:
class DbUploader(shared.Uploader):
    capabilities = shared.Capability.CREATE

    def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:
        uuid = make_uuid()
        reader = upload.hashing_reader()

        values = {
            self.storage.location_column: uuid,
            self.storage.content_column: reader.read(),
        }
        stmt = sa.insert(self.storage.table, values)

        result = self.storage.engine.execute(stmt)

        return shared.FileData(
            uuid,
            upload.size,
            upload.content_type,
            reader.get_hash(),
        )
Now you can upload a file into your new db storage:

ckanapi action files_file_create storage=db name=hello.txt upload@<(echo -n 'hello world')

...{
...  "atime": null,
...  "content_type": "text/plain",
...  "ctime": "2024-06-17T13:48:52.121755+00:00",
...  "hash": "5eb63bbbe01eeed093cb22bb8f5acdc3",
...  "id": "bdfc0268-d36d-4f1b-8a03-2f2aaa21de24",
...  "location": "5a4472b3-cf38-4c58-81a6-4d4acb7b170e",
...  "mtime": null,
...  "name": "hello.txt",
...  "owner_id": "59ea0f6c-5c2f-438d-9d2e-e045be9a2beb",
...  "owner_type": "user",
...  "pinned": false,
...  "size": 11,
...  "storage": "db",
...  "storage_data": {}
...}
The file is created, but you cannot read it just yet. Try running the ckan files stream CLI command with the file ID:

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24

... Operation stream is not supported by db storage
... Aborted!
As expected, you have to write extra code.
Streaming, reading and generating links is the responsibility of the Reader service. We only need the stream method for the minimal implementation. This method receives a shared.FileData object (the same object as the one returned from Uploader.upload) and extras containing all additional arguments passed by the caller. The result is any iterable producing bytes.

We'll use the location property of shared.FileData as the value for location_column inside the table.

And don't forget to add the STREAM capability to Reader.capabilities.
from typing import Any, Iterable


class DbReader(shared.Reader):
    capabilities = shared.Capability.STREAM

    def stream(self, data: shared.FileData, extras: dict[str, Any]) -> Iterable[bytes]:
        stmt = (
            sa.select(self.storage.content_column)
            .select_from(self.storage.table)
            .where(self.storage.location_column == data.location)
        )
        row = self.storage.engine.execute(stmt).fetchone()

        return row
The result may be confusing: we are returning a Row object from the stream method. But our goal is to return any iterable that produces bytes. Row is iterable (tuple-like), and it contains only one item: the value of the column with the file content, i.e. bytes. So it satisfies the requirements.

Now you can check the content via the CLI once again:

ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24

... hello world
Finally, we need to add file removal for the minimal implementation. It is also nice to have the SCAN capability, as it shows all files currently available in the storage, so we add it as a bonus. These operations are handled by the Manager. We need remove and scan methods. The arguments are already familiar to you. As for the results:
* remove: return True if the file was successfully removed. Should return False if the file does not exist, but it's allowed to return True as long as you are not checking the result.
* scan: return an iterable with all file locations.

class DbManager(shared.Manager):
    storage: DbStorage
    capabilities = shared.Capability.SCAN | shared.Capability.REMOVE

    def scan(self, extras: dict[str, Any]) -> Iterable[str]:
        stmt = sa.select(self.storage.location_column).select_from(self.storage.table)
        for row in self.storage.engine.execute(stmt):
            yield row[0]

    def remove(
        self,
        data: shared.FileData | shared.MultipartData,
        extras: dict[str, Any],
    ) -> bool:
        stmt = sa.delete(self.storage.table).where(
            self.storage.location_column == data.location,
        )
        self.storage.engine.execute(stmt)
        return True
Now you can list all the files in the storage:

ckan files scan -s db

And remove the file using ckanapi and the file ID:

ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24
That's all you need for the basic storage. But check the definition of the base storage and services to find details about other methods. And also check the implementation of other storages for additional ideas.
"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#requirements","title":"Requirements","text":"Compatibility with core CKAN versions:
CKAN version Compatible? 2.9 no 2.10 yes 2.11 yes master yesNote
It's recommended to install the extension via pip. If you are using GitHub version of the extension, stick to the vX.Y.Z tags to avoid breaking changes. Check the changelog before upgrading the extension.
"},{"location":"installation/#installation_1","title":"Installation","text":"Install the extension
pip install ckanext-files # (1)!\n
pip install ckanext-files[opendal,libcloud]\n
Add files
to the ckan.plugins
setting in your CKAN config file.
Run DB migrations
ckan db upgrade -p files\n
"},{"location":"interfaces/","title":"Interfaces","text":""},{"location":"interfaces/#interfaces","title":"Interfaces","text":"ckanext-files registers ckanext.files.shared.IFiles
interface. As extension is actively developed, this interface may change in future. Always use inherit=True
when implementing IFiles
.
class IFiles(Interface):
    """Extension point for ckanext-files."""

    def files_get_storage_adapters(self) -> dict[str, Any]:
        """Return mapping of storage type to adapter class.

        Example:
            >>> def files_get_storage_adapters(self):
            >>>     return {
            >>>         "my_ext:dropbox": DropboxStorage,
            >>>     }

        """

        return {}

    def files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:
        """Return mapping with lookup functions for owner types.

        Name of the getter is the name used as `Owner.owner_type`. The getter
        itself is a function that accepts owner ID and returns optional owner
        entity.

        Example:
            >>> def files_register_owner_getters(self):
            >>>     return {"resource": model.Resource.get}
        """
        return {}

    def files_file_allows(
        self,
        context: types.Context,
        file: File | Multipart,
        operation: types.FileOperation,
    ) -> bool | None:
        """Decide if user is allowed to perform specified operation on the file.

        Return True/False if user allowed/not allowed. Return `None` to rely on
        other plugins.

        Default implementation relies on cascade_access config option. If owner
        of file is included into cascade access, user can perform operation on
        file if he can perform the same operation with file's owner.

        If current owner is not affected by cascade access, user can perform
        operation on file only if user owns the file.

        Example:
            >>> def files_file_allows(
            >>>     self, context,
            >>>     file: shared.File | shared.Multipart,
            >>>     operation: shared.types.FileOperation
            >>> ) -> bool | None:
            >>>     if file.owner_info and file.owner_info.owner_type == "resource":
            >>>         return is_authorized_boolean(
            >>>             f"resource_{operation}",
            >>>             context,
            >>>             {"id": file.owner_info.id}
            >>>         )
            >>>
            >>>     return None

        """
        return None

    def files_owner_allows(
        self,
        context: types.Context,
        owner_type: str,
        owner_id: str,
        operation: types.OwnerOperation,
    ) -> bool | None:
        """Decide if user is allowed to perform specified operation on the owner.

        Return True/False if user allowed/not allowed. Return `None` to rely on
        other plugins.

        Example:
            >>> def files_owner_allows(
            >>>     self, context,
            >>>     owner_type: str, owner_id: str,
            >>>     operation: shared.types.OwnerOperation
            >>> ) -> bool | None:
            >>>     if owner_type == "resource" and operation == "file_transfer":
            >>>         return is_authorized_boolean(
            >>>             f"resource_update",
            >>>             context,
            >>>             {"id": owner_id}
            >>>         )
            >>>
            >>>     return None

        """
        return None
"},{"location":"primer/","title":"Welcome to MkDocs","text":"For full documentation visit mkdocs.org{ data-preview }
Attribute Lists{ data-preview }
Some title
Some content
Some title
Some content
Open styled details Nested details!And more content again.
theme:\nfeatures:\n- content.code.annotate # (1)!\n
code
, formatted text, images, ... basically anything that can be written in Markdown.#include <stdio.h>\n\nint main(void) {\nprintf(\"Hello world!\\n\");\nreturn 0;\n}\n
C++ #include <iostream>\n\nint main(void) {\nstd::cout << \"Hello world!\" << std::endl;\nreturn 0;\n}\n
graph LR\nA[Start] --> B{Error?};\nB -->|Yes| C[Hmm...];\nC --> D[Debug];\nD --> B;\nB ---->|No| E[Yay!];
sequenceDiagram\nautonumber\nAlice->>John: Hello John, how are you?\nloop Healthcheck\nJohn->>John: Fight against hypochondria\nend\nNote right of John: Rational thoughts!\nJohn-->>Alice: Great!\nJohn->>Bob: How about you?\nBob-->>John: Jolly good!
```py title=\"IFiles\" class IFiles(Interface): \"\"\"Extension point for ckanext-files.\"\"\"
def files_get_storage_adapters(self) -> dict[str, Any]:\n \"\"\"Return mapping of storage type to adapter class.\n\n Example:\n >>> def files_get_storage_adapters(self):\n >>> return {\n >>> \"my_ext:dropbox\": DropboxStorage,\n >>> }\n\n \"\"\"\n\n return {}\n\ndef files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n \"\"\"Return mapping with lookup functions for owner types.\n\n Name of the getter is the name used as `Owner.owner_type`. The getter\n itself is a function that accepts owner ID and returns optional owner\n entity.\n\n Example:\n >>> def files_register_owner_getters(self):\n >>> return {\"resource\": model.Resource.get}\n \"\"\"\n return {}\n\ndef files_file_allows(\n self,\n context: types.Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Default implementation relies on cascade_access config option. If owner\n of file is included into cascade access, user can perform operation on\n file if he can perform the same operation with file's owner.\n\n If current owner is not affected by cascade access, user can perform\n operation on file only if user owns the file.\n\n Example:\n >>> def files_file_allows(\n >>> self, context,\n >>> file: shared.File | shared.Multipart,\n >>> operation: shared.types.FileOperation\n >>> ) -> bool | None:\n >>> if file.owner_info and file.owner_info.owner_type == \"resource\":\n >>> return is_authorized_boolean(\n >>> f\"resource_{operation}\",\n >>> context,\n >>> {\"id\": file.owner_info.id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\ndef files_owner_allows(\n self,\n context: types.Context,\n owner_type: str,\n owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Example:\n >>> def files_owner_allows(\n >>> self, context,\n >>> owner_type: str, owner_id: str,\n >>> operation: shared.types.OwnerOperation\n >>> ) -> bool | None:\n >>> if owner_type == \"resource\" and operation == \"file_transfer\":\n >>> return is_authorized_boolean(\n >>> f\"resource_update\",\n >>> context,\n >>> {\"id\": owner_id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\n\n\n ```\n\n === \"Hello\"\n\n world\n\n === \"bye\"\n\n world\n
"},{"location":"shared/","title":"Shared","text":"All public utilites are collected inside ckanext.files.shared
module. Avoid using anything that is not listed there. Do not import anything from modules other than shared
.
get_storage(name: 'str | None' = None) -> 'Storage'

Return existing storage instance.

Storages are initialized when the plugin is loaded. As a result, this function always returns the same storage object for the given name.

If no name is specified, the default storage is returned.

Example:

default_storage = get_storage()
storage = get_storage("storage name")
"},{"location":"shared/#make_storagename-str-settings-dictstr-any-storage","title":"make_storage(name: 'str', settings: 'dict[str, Any]') -> 'Storage'
","text":"Initialize storage instance with specified settings.
Storage adapter is defined by type
key of the settings. All other settings depend on the specific adapter.
Example:
storage = make_storage(\"memo\", {\"type\": \"files:redis\"})\n
"},{"location":"shared/#make_uploadvalue-typesuploadable-upload-upload","title":"make_upload(value: 'types.Uploadable | Upload') -> 'Upload'
","text":"Convert value into Upload object
Use this function for simple and reliable initialization of Upload object. Avoid creating Upload manually, unless you are 100% sure you can provide correct MIMEtype, size and stream.
Example:
storage.upload(\"file.txt\", make_upload(b\"hello world\"))\n
"},{"location":"shared/#with_task_queuefunc-any-name-str-none-none","title":"with_task_queue(func: 'Any', name: 'str | None' = None)
","text":"Decorator for functions that schedule tasks.
Decorated function automatically initializes separate task queue that is processed when function is finished. All tasks receive function's result as execution data(first argument to Task.run).
Without this decorator, you have to manually create task queue context before queuing tasks.
Example:
@with_task_queue\ndef my_action(context, data_dict):\n ...\n
"},{"location":"shared/#add_tasktask-task","title":"add_task(task: 'Task')
","text":"Add task to the current task queue.
This function can be called only inside task queue context. Such context initialized automatically inside functions decorated with with_task_queue
:
@with_task_queue\ndef taks_producer():\n add_task(...)\n\ntask_producer()\n
If task queue context can be initialized manually using TaskQueue and with
statement:
queue = TaskQueue()\nwith queue:\n add_task(...)\n\nqueue.process(execution_data)\n
"},{"location":"upload-strategies/","title":"File upload strategies","text":"There is no \"right\" way to add file to entity via ckanext-files. Everything depends on your use-case and here you can find a few different ways to combine file and arbitrary entity.
"},{"location":"upload-strategies/#attach-existing-file-and-then-transfer-ownership-via-api","title":"Attach existing file and then transfer ownership via API","text":"The simplest option is just saving file ID inside a field of the entity. It's recommended to transfer file ownership to the entity and pin the file.
ckanapi action package_patch id=PACKAGE_ID attachment_id=FILE_ID\n\nckanapi action files_transfer_ownership id=FILE_ID \\\n owner_type=package owner_id=PACKAGE_ID pin=true\n
Pros: * simple and transparent
Cons: * it's easy to forget about ownership transfer and leave the entity with the inaccessible file * after entity got reference to file and before ownership is transfered data may be considered invalid.
"},{"location":"upload-strategies/#automatically-transfer-ownership-using-validator","title":"Automatically transfer ownership using validator","text":"Add files_transfer_ownership(owner_type)
to the validation schema of entity. When it validated, ownership transfer task is queued and file automatically transfered to the entity after the update.
Pros:

* minimal amount of changes if the metadata schema is already modified
* relationships between owner and file are up-to-date after any modification

Cons:

* works only with files uploaded in advance and cannot handle the native implementation of the resource form
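A minimal sketch of this strategy for a dataset schema; the attachment_id field name is illustrative, and the validator signature follows the validators reference (owner type plus the name of the field that holds the owner ID):

import ckan.plugins.toolkit as tk


def _modify_package_schema(schema):
    # "attachment_id" is an illustrative field that stores the file ID.
    schema["attachment_id"] = [
        tk.get_validator("ignore_empty"),
        tk.get_validator("files_file_id_exists"),
        # transfer the file to the package once the action succeeds
        tk.get_validator("files_transfer_ownership")("package", "id"),
    ]
    return schema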
"},{"location":"upload-strategies/#upload-file-and-assign-owner-via-queued-task","title":"Upload file and assign owner via queued task","text":"Add a field that accepts uploaded file. The action itself does not process the upload. Instead create a validator for the upload field, that will schedule a task for file upload and ownership transfer.
In this way, if action is failed, no upload happens and you don't need to do anything with the file, as it never left server's temporal directory. If action finished without an error, the task is executed and file uploaded/attached to action result.
Pros: * can be used together with native group/user/resource form after small modification of CKAN core. * handles upload inside other action as an atomic operation
Cons: * you have to validate file before upload happens to prevent situation when action finished successfully but then upload failed because of file's content type or size. * tasks themselves are experimental and it's not recommended to put a lot of logic into them * there are just too many things that can go wrong
"},{"location":"upload-strategies/#add-a-new-action-that-combines-uploads-modifications-and-ownership-transfer","title":"Add a new action that combines uploads, modifications and ownership transfer","text":"If you want to add attachmen to dataset, create a separate action that accepts dataset ID and uploaded file. Internally it will upload the file by calling files_file_create
, then update dataset via packaage_patch
and finally transfer ownership via files_transfer_ownership
.
Pros:

* no magic: everything is described in the new action
* can be extracted into a shared extension and used across multiple portals

Cons:

* if you need to upload multiple files and update multiple fields, the action quickly becomes too complicated
* integration with existing workflows, like dataset/resource creation, is hard; you have to override existing views or create brand new ones
"},{"location":"validators/","title":"Validators","text":"Validator Effect files_into_upload Transform value of field(usually file uploaded via<input type=\"file\">
) into upload object using ckanext.files.shared.make_upload
files_parse_filesize Convert human-readable filesize(1B, 10MiB, 20GB) into an integer files_ensure_name(name_field) If name_field
is empty, copy into it filename from current field. Current field must be processed with files_into_upload
first files_file_id_exists Verify that file ID exists files_accept_file_with_type(*type) Verify that file ID refers to file with one of specified types. As a type can be used full MIMEtype(image/png
), or just its main(image
) or secondary(png
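A hedged sketch of combining these validators for a direct-upload field; the field names are illustrative:

import ckan.plugins.toolkit as tk


def _modify_schema(schema):
    # "attachment" receives the uploaded file, "attachment_name" its name.
    schema["attachment"] = [
        tk.get_validator("ignore_empty"),
        tk.get_validator("files_into_upload"),
        # copy the filename into "attachment_name" when it's empty;
        # must run after files_into_upload
        tk.get_validator("files_ensure_name")("attachment_name"),
    ]
    return schema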
Configuration

There are two types of config options for ckanext-files: global options and storage-specific options.
Depending on the type of the storage, the available options are quite different. For example, the files:fs storage type requires a path option that controls the filesystem path where uploads are stored. The files:redis storage type accepts a prefix option that defines the Redis key prefix of files stored in Redis. All storage-specific options always have the form ckanext.files.storage.<STORAGE>.<OPTION>:

ckanext.files.storage.memory.prefix = xxx:
# or
ckanext.files.storage.my_drive.path = /tmp/hello
"},{"location":"configuration/fs/","title":"Filesystem storage configuration","text":"Private filesystem storage
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n
Public filesystem storage
## Storage adapter used by the storage
ckanext.files.storage.NAME.type = files:public_fs
## Path to the folder where uploaded data will be stored.
ckanext.files.storage.NAME.path =
## Create storage folder if it does not exist.
ckanext.files.storage.NAME.create_path = false
## Use this flag if files can be stored inside subfolders
## of the main storage path.
ckanext.files.storage.NAME.recursive = false
## URL of the storage folder. `public_root + location` must produce a public URL
ckanext.files.storage.NAME.public_root =
"},{"location":"configuration/global/","title":"Global configuration","text":"# Default storage used for upload when no explicit storage specified\n# (optional, default: default)\nckanext.files.default_storage = default\n\n# MIMEtypes that can be served without content-disposition:attachment header.\n# (optional, default: application/pdf image video)\nckanext.files.inline_content_types = application/pdf image video\n\n# Storage used for user image uploads. When empty, user image uploads are not\n# allowed.\n# (optional, default: user_images)\nckanext.files.user_images_storage = user_images\n\n# Storage used for group image uploads. When empty, group image uploads are\n# not allowed.\n# (optional, default: group_images)\nckanext.files.group_images_storage = group_images\n\n# Storage used for resource uploads. When empty, resource uploads are not\n# allowed.\n# (optional, default: resources)\nckanext.files.resources_storage = resources\n\n# Enable HTML templates and JS modules required for unsafe default\n# implementation of resource uploads via files. IMPORTANT: this option exists\n# to simplify migration and experiments with the extension. These templates\n# may change a lot or even get removed in the public release of the\n# extension.\n# (optional, default: false)\nckanext.files.enable_resource_migration_template_patch = false\n\n# Any authenticated user can upload files.\n# (optional, default: false)\nckanext.files.authenticated_uploads.allow = false\n\n# Names of storages that can by used by non-sysadmin users when authenticated\n# uploads enabled\n# (optional, default: default)\nckanext.files.authenticated_uploads.storages = default\n\n# List of owner types that grant access on owned file to anyone who has\n# access to the owner of file. For example, if this option has value\n# `resource package`, anyone who passes `resource_show` auth, can see all\n# files owned by resource; anyone who passes `package_show`, can see all\n# files owned by package; anyone who passes\n# `package_update`/`resource_update` can modify files owned by\n# package/resource; anyone who passes `package_delete`/`resource_delete` can\n# delete files owned by package/resoure. IMPORTANT: Do not add `user` to this\n# list. Files may be temporarily owned by user during resource creation.\n# Using cascade access rules with `user` exposes such temporal files to\n# anyone who can read user's profile.\n# (optional, default: package resource group organization)\nckanext.files.owner.cascade_access = package resource group organization\n\n# Use `<OWNER_TYPE>_update` auth function to check access for ownership\n# transfer. When this flag is disabled `<OWNER_TYPE>_file_transfer` auth\n# function is used.\n# (optional, default: true)\nckanext.files.owner.transfer_as_update = true\n\n# Use `<OWNER_TYPE>_update` auth function to check access when listing all\n# files of the owner. When this flag is disabled `<OWNER_TYPE>_file_scan`\n# auth function is used.\n# (optional, default: true)\nckanext.files.owner.scan_as_update = true\n
"},{"location":"configuration/libcloud/","title":"Apache libcloud storage configuration","text":"To use this storage install extension with libcloud
extras.
pip install 'ckanext-files[libcloud]'\n
The actual storage backend is controlled by provider
option of the storage. List of all providers is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:libcloud\n## apache-libcloud storage provider. List of providers available at https://libcloud.readthedocs.io/en/stable/storage/supported_providers.html#provider-matrix . Use upper-cased value from Provider Constant column\nckanext.files.storage.NAME.provider =\n## API key or username\nckanext.files.storage.NAME.key =\n## Secret password\nckanext.files.storage.NAME.secret =\n## JSON object with additional parameters passed directly to storage constructor.\nckanext.files.storage.NAME.params =\n## Name of the container(bucket)\nckanext.files.storage.NAME.container =\n
"},{"location":"configuration/opendal/","title":"OpenDAL storage configuration","text":"To use this storage install extension with opendal
extras.
pip install 'ckanext-files[opendal]'\n
The actual storage backend is controlled by scheme
option of the storage. List of all schemes is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:opendal\n## OpenDAL service type. Check available services at https://docs.rs/opendal/latest/opendal/services/index.html\nckanext.files.storage.NAME.scheme =\n## JSON object with parameters passed directly to OpenDAL operator.\nckanext.files.storage.NAME.params =\n
"},{"location":"configuration/redis/","title":"Redis storage configuration","text":"## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.NAME.prefix = ckanext:files:default:file_content:\n
"},{"location":"configuration/storage/","title":"Storage configuration","text":"All available options for the storage type can be checked via config declarations CLI. First, add the storage type to the config file:
ckanext.files.storage.xxx.type = files:redis\n
Now run the command that shows all available config option of the plugin.
ckan config declaration files -d\n
Because the Redis storage adapter is enabled, you'll see all the options registered by the Redis adapter alongside the global options:

## ckanext-files ###############################################################
## ...
## Storage adapter used by the storage
ckanext.files.storage.xxx.type = files:redis
## Static prefix of the Redis key generated for every upload.
ckanext.files.storage.xxx.prefix = ckanext:files:default:file_content:
Sometimes you will see a validation error if the storage has required config options. Let's try using the files:fs storage instead of Redis:

ckanext.files.storage.xxx.type = files:fs
Now any attempt to run ckan config declaration files -d will show an error, because the required path option is missing:

Invalid configuration values provided:
ckanext.files.storage.xxx.path: Missing value
Aborted!
Add the required option to satisfy the application:

ckanext.files.storage.xxx.type = files:fs
ckanext.files.storage.xxx.path = /tmp

And run the CLI command once again. This time you'll see the list of allowed options:

## ckanext-files ###############################################################
## ...
## Storage adapter used by the storage
ckanext.files.storage.xxx.type = files:fs
## Path to the folder where uploaded data will be stored.
ckanext.files.storage.xxx.path =
## Create storage folder if it does not exist.
ckanext.files.storage.xxx.create_path = false
There are a number of options that are supported by every storage. You can set them and expect that every storage, regardless of type, will use these options in the same way:

## Storage adapter used by the storage
ckanext.files.storage.NAME.type = ADAPTER
## The maximum size of a single upload.
## Supports size suffixes: 42B, 2M, 24KiB, 1GB. `0` means no restrictions.
ckanext.files.storage.NAME.max_size = 0
## Space-separated list of MIME types or just type or subtype part.
## Example: text/csv pdf application video jpeg
ckanext.files.storage.NAME.supported_types =
## Descriptive name of the storage used for debugging. When empty, name from
## the config option is used, i.e: `ckanext.files.storage.DEFAULT_NAME...`
ckanext.files.storage.NAME.name = NAME
"},{"location":"migration/","title":"Migration from native CKAN storage system","text":"Important: ckanext-files itself is an independent file-management system. You don't have to migrate existing files from groups, users and resources to it. You can just start using ckanext-files for new fields defined in metadata schema or for uploading arbitrary files. And continue using native CKAN uploads for group/user images and resource files. Migration workflows described here merely exist as a PoC of using ckanext-files for everything in CKAN. Don't migrate your production instances yet, because concepts and rules may change in future and migration process will change as well. Try migration only as an experiment, that gives you an idea of what else you want to see in ckanext-file, and share this idea with us.
Note: every migration workflow described below requires installed ckanext-files. Complete installation section before going further.
CKAN has the following types of files:

* group/organization images
* user avatars
* resource files
* site logo
* files uploaded by extensions

At the moment, there is no migration strategy for the last two types. Replacing the site logo manually is a trivial task, so there will be no dedicated command for it. As for extensions, every one of them is unique, so feel free to create an issue in the current repository: we'll consider creating a migration script for your scenario or, at least, explain how you can perform the migration by yourself.
The migration process for group/organization/user images and resource uploads is described below. Keep in mind that this process only covers migration from the native CKAN storage system that keeps files inside the local filesystem. If you are using storage extensions, like ckanext-s3filestore or ckanext-cloudstorage, create an issue in the current repository with a request for a migration command. As there are a lot of different forks of such extensions, creating a reliable migration script may be challenging, so we need some details about your environment to help with the migration.

The migration workflows below require certain changes to metadata schemas, UI widgets for file uploads and styles of your portal (depending on the customization).
"},{"location":"migration/group/","title":"Migration for group/organization images","text":"Note: internally, groups and organizations are the same entity, so this workflow describes both of them.
First of all, you need a configured storage that supports public links. As all group/organization images are stored inside local filesystem, you can use files:public_fs
storage adapter.
This extension expects that the name of the group images storage will be group_images. This name will be used in all other commands of this migration workflow. If you want to use a different name for the group images storage, override the ckanext.files.group_images_storage config option, which has the default value group_images, and don't forget to adapt the commands accordingly.
This configuration example sets a 10MiB restriction on the upload size via the ckanext.files.storage.group_images.max_size option. Feel free to change it or remove it completely to allow any upload size. This restriction applies to future uploads only. Any existing file that exceeds the limit is kept.

Uploads are restricted to the image/* MIMEtype via the ckanext.files.storage.group_images.supported_types option. You can make this option more or less restrictive. This restriction applies to future uploads only. Any existing file with a wrong MIMEtype is kept.
ckanext.files.storage.group_images.path controls the location of the upload folder in the filesystem. It should match the value of the ckan.storage_path option plus storage/uploads/group. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.
The ckanext.files.storage.group_images.public_root option specifies the base URL from which every group image can be accessed. In most cases it's the CKAN URL plus uploads/group. If you are serving the CKAN application from ckan.site_url, leave this option unchanged. If you are using ckan.root_path, like /data/, insert this root path into the value of the option. The example below uses the %(ckan.site_url)s wildcard, which will be automatically replaced with the value of the ckan.site_url config option. You can specify the site URL explicitly if you don't like this wildcard syntax.
ckanext.files.storage.group_images.type = files:public_fs
ckanext.files.storage.group_images.max_size = 10MiB
ckanext.files.storage.group_images.supported_types = image
ckanext.files.storage.group_images.path = /var/storage/ckan/storage/uploads/group
ckanext.files.storage.group_images.public_root = %(ckan.site_url)s/uploads/group
Now let's run a command that shows us the list of files available under the newly configured storage:

ckan files scan -s group_images

All these files are not tracked by the files extension yet, i.e. they don't have a corresponding record in the DB with base details like size, MIMEtype, filehash, etc. Let's create these details via the command below. It's safe to run this command multiple times: it will gather and store information about files not registered in the system and ignore any previously registered file.

ckan files scan -s group_images -t
Finally, let's run the command that shows only untracked files. Ideally, you'll see nothing upon executing it, because you have just registered every file in the system.

ckan files scan -s group_images -u

Note, all the files are still available inside the storage directory. If the previous command shows nothing, it only means that CKAN already knows the details of each file from the storage directory. If you want to see the list of the files again, omit the -u flag (which stands for "untracked") and you'll see all the files in the command output again:

ckan files scan -s group_images
Now, when all images are tracked by the system, we can give ownership over these files to the groups/organizations that are using them. Run the command below to connect files with their owners. It will search for groups/organizations first and report how many connections were identified. There will be a suggestion to show the identified relationships and the list of files that have no owner (if there are such files). The presence of files without an owner usually means that you removed a group/organization from the database, but did not remove its image.

Finally, you'll be asked if you want to transfer ownership over the files. This operation does not change existing data, and if you disable ckanext-files after the ownership transfer, you won't see any difference. The whole ownership transfer is managed inside custom DB tables generated by ckanext-files, so it's a safe operation.

ckan files migrate groups group_images
Here's an example of output that you can see when running the command:

Found 3 files. Searching file owners...
[####################################] 100% Located owners for 2 files out of 3.

Show group IDs and corresponding file? [y/N]: y
d7186937-3080-429f-a434-22b74b9a8d39: file-1.png
87e2a1aa-7905-4a28-a087-90433f8e169e: file-2.png

Show files that do not belong to any group? [y/N]: y
file-3.png

Transfer file ownership to group identified in previous steps? [y/N]: y
Transfering file-2.png [####################################] 100%
Now comes the most complex part: you need to change the metadata schema and the UI.
The original CKAN workflow for uploading files is different from the strategy recommended by ckanext-files. But in order to make the migration as simple as possible, we'll stay close to the original workflow.
Note: the suggested approach resembles the existing process of file uploads in CKAN. But ckanext-files was designed as a system that gives you a choice. Check file upload strategies to learn more about alternative implementations of uploads and their pros/cons.
First, we need to replace the Upload/Link widget on the group/organization form. If you are using native group templates, create group/snippets/group_form.html and organization/snippets/organization_form.html. Inside both files, extend the original template and override the basic_fields block. You only need to replace the last field
{{ form.image_upload(\n data, errors, is_upload_enabled=h.uploads_enabled(),\n is_url=is_url, is_upload=is_upload) }}\n
with
{{ form.image_upload(\n data, errors, is_upload_enabled=h.files_group_images_storage_is_configured(),\n is_url=is_url, is_upload=is_upload,\n field_upload=\"files_image_upload\") }}\n
There are two differences from the original. First, we use h.files_group_images_storage_is_configured() instead of h.uploads_enabled(). As we are using different storages for different upload types, upload widgets can now be enabled independently. And second, we pass the field_upload="files_image_upload" argument into the macro. It will send the uploaded file to CKAN inside the files_image_upload field instead of the original image_upload field. This must be done because CKAN unconditionally strips the image_upload field from the submission payload, making processing of the file too unreliable. We changed the name of the upload field, and CKAN keeps this new field, so we can process it as we wish.
Note: if you are using ckanext-scheming, you only need to replace the form_snippet of the image_url field, instead of rewriting the whole template.
Now, let's define validation rules for this new upload field. We need to create plugins that modify the validation schemas of group and organization. Due to CKAN implementation details, you need a separate plugin for each of them.
Note: if you are using ckanext-scheming, you can add the files_image_upload validators to the schemas of organization and group. Check the list of validators that must be applied to this new field below.
Here's an example of plugins that modify the validation schemas of group and organization. As you can see, they are mostly the same:
import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultGroupForm, DefaultOrganizationForm\nfrom ckan.logic.schema import default_create_group_schema, default_update_group_schema\n\n\ndef _modify_schema(schema, group_type):\n    # attach ckanext-files validators to the new upload field\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"group_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"group_images\",\n            group_type,\n            \"id\",\n            \"public_url\",\n            group_type + \"_patch\",\n            \"image_url\",\n        ),\n    ]\n    # the schema must be returned, as the plugin methods below rely on it\n    return schema\n\n\nclass FilesGroupPlugin(p.SingletonPlugin, DefaultGroupForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = False\n\n    def group_types(self):\n        return [\"group\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"group\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"group\")\n\n\nclass FilesOrganizationPlugin(p.SingletonPlugin, DefaultOrganizationForm):\n    p.implements(p.IGroupForm, inherit=True)\n    is_organization = True\n\n    def group_types(self):\n        return [\"organization\"]\n\n    def create_group_schema(self):\n        return _modify_schema(default_create_group_schema(), \"organization\")\n\n    def update_group_schema(self):\n        return _modify_schema(default_update_group_schema(), \"organization\")\n
There are 4 validators that must be applied to the new upload field:
ignore_empty: skips validation when the image URL is set manually and no upload is selected.
files_into_upload: converts the value of the upload field into the normalized format expected by ckanext-files.
files_validate_with_storage(STORAGE_NAME): this validator requires an argument: the name of the storage we are using for image uploads. The validator uses the storage settings to verify the size and MIME type of the upload.
files_upload_as(STORAGE_NAME, GROUP_TYPE, NAME_OF_ID_FIELD, \"public_url\", NAME_OF_PATCH_ACTION, NAME_OF_URL_FIELD): this validator is the most challenging. It accepts 6 arguments: the name of the storage (group_images in our case); group or organization, depending on the processed entity; the name of the ID field, id in your case; public_url - use this exact value, it tells which property of the file to use as the link to the file; group_patch or organization_patch, depending on the processed entity; image_url - the name of the field that contains the URL of the image. ckanext-files will put the public link of the uploaded file into this field when the form is processed.
That's all. Now every image upload for groups/organizations is handled by ckanext-files. To verify it, do the following. First, check the list of files currently stored in the group_images storage via the command that we used at the beginning of the migration:
ckan files scan -s group_images\n
You'll see a list of existing files. Their names follow the format <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg.
Now upload an image into an existing group, or create a new group with any image. When you check the list of files again, you'll see one new record, but this time the record resembles a UUID: da046887-e76c-4a68-97cf-7477665710ff.
Configure a named storage for resources, using the files:ckan_resource_fs storage adapter.
This extension expects that the name of the resources storage is resources. This name is used in all other commands of this migration workflow. If you want to use a different name for the resources storage, override the ckanext.files.resources_storage config option, which has the default value resources, and don't forget to adapt the commands accordingly.
ckanext.files.storage.resources.path must match the value of the ckan.storage_path option, followed by the resources directory. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.
The example below sets a 10MiB limit on resource size. Modify it if you are using a different limit set by ckan.max_resource_size.
Unlike group and user images, this storage needs neither an upload type restriction nor public_root.
ckanext.files.storage.resources.type = files:ckan_resource_fs\nckanext.files.storage.resources.max_size = 10MiB\nckanext.files.storage.resources.path = /var/storage/ckan/resources\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s resources -u\n
Track all these files:
ckan files scan -s resources -t\n
Re-check that you now see no untracked files:
ckan files scan -s resources -u\n
Transfer file ownership to the corresponding resources. In addition to the simple ownership transfer, this command asks you whether you want to modify the resource's url_type and url fields. This is required to move file management completely to the files extension and to enable migration to a different storage type.
If you accept the resource modifications, then for every file owner url_type will be changed to file and url will be changed to the file ID. All modified packages will then be reindexed.
Changing url_type means that some pages will change. For example, instead of the Download button, CKAN will show a Go to resource button on the resource page, because the Download label is specific to url_type=upload. And some views may stop working as well. But this is a safer option for migration than leaving url_type unchanged: ckanext-files manages files in its own way, some old assumptions about files no longer hold, and using a different url_type is the fastest way to tell everyone that something has changed.
Broken views can be fixed easily. Every view is implemented as a separate plugin, so you can always inherit from this plugin and override the methods that relied on the old behavior. And a lot of views work with the file URL directly, so they won't even notice the difference.
ckan files migrate local-resources resources\n
The next goal is a correct metadata schema. If you are using ckanext-scheming, you need to modify the validators of the url and format fields.
If you are working with native schemas, you have to modify the dataset schema by implementing IDatasetForm. Here's an example:
import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultDatasetForm\nfrom ckan.logic import schema\n\n\nclass FilesDatasetPlugin(p.SingletonPlugin, DefaultDatasetForm):\n    p.implements(p.IDatasetForm, inherit=True)\n\n    def is_fallback(self):\n        return True\n\n    def package_types(self):\n        return [\"dataset\"]\n\n    def _modify_schema(self, sch):\n        # validators for resources that keep their file in ckanext-files\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_file_id_exists\"),\n            tk.get_validator(\"files_transfer_ownership\")(\"resource\", \"id\"),\n        ])\n        # derive the format from the uploaded file\n        sch[\"resources\"][\"format\"].insert(0, tk.get_validator(\"files_content_type_from_file\")(\"url\"))\n\n    def create_package_schema(self):\n        sch = schema.default_create_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def update_package_schema(self):\n        sch = schema.default_update_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def show_package_schema(self):\n        sch = schema.default_show_package_schema()\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_id_into_resource_download_url\"),\n        ])\n        return sch\n
Both the create and update schemas are updated in the same way. We add a new validator to the format field, to correctly identify the file format. And there are a number of new validators for url:
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_file_id_exists: verify the existence of the file ID.
files_transfer_ownership(\"resource\", \"id\"): move file ownership to the resource after successful validation.
On top of this, we also have two validators applied to show_package_schema (use output_validators in ckanext-scheming):
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_id_into_resource_download_url: replace the file ID with the download URL in the API output.
The next part is the trickiest. You would need to create a number of templates and JS modules, but because ckanext-files is actively developed, your custom files would most likely become outdated pretty soon.
Instead, we recommend enabling the patch for the resource form that is shipped with ckanext-files. It's a bit hacky, but because the extension itself is still in alpha stage, it should be acceptable. Check file upload strategies for examples of implementations that you can add to your portal instead of the default patch.
To enable the patch for templates, add the following line to the config file:
ckanext.files.enable_resource_migration_template_patch = true\n
This option adds an Add file button to the resource form.
Upon clicking, this button is replaced by a widget that supports uploading new files or selecting previously uploaded files that are not used by any resource yet.
"},{"location":"migration/user/","title":"Migration for user avatars","text":"This workflow is similar to group/organization migration. It contains the sequence of actions, but explanations are removed, because you already know details from the group migration. Only steps that are different will contain detailed explanation of the process.
Configure a local filesystem storage with support for public links (files:public_fs) for user images.
This extension expects that the name of the user images storage is user_images. This name is used in all other commands of this migration workflow. If you want to use a different name for the user images storage, override the ckanext.files.user_images_storage config option, which has the default value user_images, and don't forget to adapt the commands accordingly.
ckanext.files.storage.user_images.path resembles the same option for the group/organization images storage. But user images are kept inside the user folder by default. As a result, the value of this option should match the value of the ckan.storage_path option plus storage/uploads/user. In the example below we assume that the value of ckan.storage_path is /var/storage/ckan.
ckanext.files.storage.user_images.public_root resembles the same option for the group/organization images storage. But user images are available at the CKAN URL plus uploads/user.
ckanext.files.storage.user_images.type = files:public_fs\nckanext.files.storage.user_images.max_size = 10MiB\nckanext.files.storage.user_images.supported_types = image\nckanext.files.storage.user_images.path = /var/storage/ckan/storage/uploads/user\nckanext.files.storage.user_images.public_root = %(ckan.site_url)s/uploads/user\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s user_images -u\n
Track all these files:
ckan files scan -s user_images -t\n
Re-check that you now see no untracked files:
ckan files scan -s user_images -u\n
Transfer image ownership to the corresponding users:
ckan files migrate users user_images\n
Update the user template. The required field is defined in user/new_user_form.html and user/edit_user_form.html. It's a bit different from the field used by group/organization, but you again need to add the field_upload=\"files_image_upload\" parameter to the image_upload macro and replace h.uploads_enabled() with h.files_user_images_storage_is_configured().
Users have no dedicated interface for validation schema modification, and here comes the biggest difference from the group migration: you need to chain the user_create and user_update actions and modify the schema from context:
import ckan.logic.schema\nimport ckan.plugins.toolkit as tk\n\n\ndef _patch_schema(schema):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"user_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"user_images\",\n            \"user\",\n            \"id\",\n            \"public_url\",\n            \"user_patch\",\n            \"image_url\",\n        ),\n    ]\n\n\n@tk.chained_action\ndef user_update(next_action, context, data_dict):\n    # modify the validation schema through the action context\n    schema = context.setdefault('schema', ckan.logic.schema.default_update_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n\n\n@tk.chained_action\ndef user_create(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n
The validators are all the same, but now we are using user instead of group/organization in the parameters.
That's all. Just as with groups, you can update an avatar and verify that all new filenames resemble UUIDs.
"},{"location":"usage/capabilities/","title":"Capabilities","text":"To understand in advance whether specific storage can perform certain actions, ckanext-files uses ckanext.files.shared.Capability
. It's an enumeration of operations that can be supported by storage:
These capabilities are defined when the storage is created and are automatically checked by actions that work with the storage. If you want to check whether a storage supports a certain capability, it can be done manually. And if you want to check the presence of multiple capabilities at once, you can combine them via the bitwise-or operator:
from ckanext.files.shared import Capability, get_storage\n\nstorage = get_storage()\n\ncan_read = storage.supports(Capability.STREAM)\n\nread_and_write = Capability.CREATE | Capability.STREAM\ncan_read_and_write = storage.supports(read_and_write)\n
The ckan files storages -v CLI command lists all configured storages with their capabilities.
Before uploading files, you have to configure a storage: the place where all uploaded files are stored. A storage relies on an adapter that describes where and how the data is stored: filesystem, cloud, DB, etc. And, depending on the adapter, a storage may have a number of additional specific options. For example, the filesystem adapter likely requires a path to the folder where uploads are stored, a DB adapter may need DB connection parameters, and a cloud adapter most likely will not work without an API key. These additional options are specific to the adapter, and you have to check its documentation to find out which options are available.
Let's start with the Redis adapter, because it has minimal configuration requirements.
Add the following line to the CKAN config file:
ckanext.files.storage.default.type = files:redis\n
The name of the adapter is files:redis. It follows the recommended naming convention for adapters: <EXTENSION>:<TYPE>. You can tell from the name above that we are using an adapter defined in the files extension with the redis type. But this naming convention is not enforced, and its only purpose is avoiding name conflicts. Technically, an adapter name can use any character, including spaces, newlines and emoji.
If you make a typo in the adapter's name, any CKAN CLI command will produce an error message with the list of available adapters:
Invalid configuration values provided:\nckanext.files.storage.default.type: Value must be one of ['files:fs', 'files:public_fs', 'files:redis']\nAborted!\n
The storage is configured, so we can actually upload a file. Let's use ckanapi for this task. Files are created via the files_file_create API action, and this time we have to pass 2 parameters into it:
name: the name of the uploaded file
upload: the content of the file
The final command is:
echo -n 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n
And that's what you see as the result:
{\n \"atime\": null,\n \"content_type\": \"text/plain\",\n \"ctime\": \"2024-06-02T15:02:14.819117+00:00\",\n \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n \"id\": \"e21162ab-abfb-476c-b8c5-5fe7cb89eca0\",\n \"location\": \"24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\",\n \"mtime\": null,\n \"name\": \"hello.txt\",\n \"size\": 11,\n \"storage\": \"default\",\n \"storage_data\": {}\n}\n
The content of the file can be checked via the CKAN CLI. Use the id from the last API call's output in the ckan files stream ID command:
ckan files stream e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
Alternatively, we can use the Redis CLI to get the content of the file. Note that you cannot get the content via the CKAN API, because it's JSON-based and streaming files doesn't suit its principles.
By default, the Redis adapter puts the content under the key <PREFIX><LOCATION>. Pay attention to LOCATION: it's the value available as location in the API response (i.e. 24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46 in our case). It's different from the id (the ID used by the DB to uniquely identify the file record) and the name (the human-readable name of the file). In our scenario, location looks like a UUID because of the internal details of the Redis adapter implementation, but different adapters may use a more path-like value, e.g. something similar to path/to/folder/hello.txt.
PREFIX can be configured, but we skipped this step and got the default value: ckanext:files:default:file_content:. So the final Redis key of our file is ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46:
redis-cli\n\n127.0.0.1:6379> GET ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\n\"hello world\"\n
And before we move further, let's remove the file, using its id:
ckanapi action files_file_delete id=e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
"},{"location":"usage/js/","title":"JavaScript utilities","text":"Note: ckanext-files does not provide stable CKAN JS modules at the moment. Try creating your own widgets and share with us your examples or requirements. We'll consider creating and including widgets into ckanext-files if they are generic enough for majority of the users.
ckanext-files registers a few utilities inside the CKAN JS namespace to help with building UI components.
The first group of utilities is registered inside the CKAN Sandbox. Inside CKAN JS modules it's accessible as this.sandbox. If you are writing code outside of JS modules, the Sandbox can be initialized via a call to ckan.sandbox():
const sandbox = ckan.sandbox()\n
When the files plugin is loaded, the sandbox contains a files attribute with two members:
upload: a high-level helper for uploading files.
makeUploader: a factory for uploader objects that gives more control over the upload process.
The simplest way to upload a file is using the upload helper:
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n)\n
This function uploads the file to the default storage via the files_file_create action. Extra parameters for the API call can be passed using the second argument of the upload helper: use an object with a requestParams key, and the value of this key will be added to the standard API request parameters. For example, if you want to use the storage named memory and a field with the value custom:
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n {requestParams: {storage: \"memory\", field: \"custom\"}}\n)\n
If you need more control over the upload, you can create an uploader and interact with it directly, instead of using the upload helper.
An uploader is an object that uploads a file to the server. It extends the base uploader, which defines the standard interface for this object. The uploader performs all the API calls internally and returns the uploaded file details. Out of the box you can use the Standard and Multipart uploaders. Standard uses the files_file_create API action and specializes in normal uploads. Multipart relies on the files_multipart_* actions and can be used to pause and continue an upload.
To create an uploader instance, pass its name as a string to makeUploader. Then you can call the upload method of the uploader to perform the actual upload. This method requires two arguments: the File object that should be uploaded, and an object with extra request parameters, the same requestParams as in the example above. If you want to use the default parameters, pass an empty object; if you want to use the memory storage, pass {storage: \"memory\"}, etc.
const uploader = sandbox.files.makeUploader(\"Standard\")\nawait uploader.upload(new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}), {})\n
One of the reasons to use a manually created uploader is progress tracking. The uploader supports event subscriptions via uploader.addEventListener(event, callback), and here's the list of possible upload events:
start: file upload started. The event has a detail property with an object that contains the uploaded file as file.
multipartid: multipart upload initialized. The event has a detail property with an object that contains the uploaded file as file and the ID of the multipart upload as id.
progress: another chunk of the file was transferred to the server. The event has a detail property with an object that contains the uploaded file as file, the number of uploaded bytes as loaded, and the total number of bytes that must be transferred as total.
finish: file upload finished successfully. The event has a detail property with an object that contains the uploaded file as file and the file details from the API response as result.
fail: file upload failed. The event has a detail property with an object that contains the uploaded file as file and an object with CKAN validation errors as reasons.
error: an error unrelated to validation happened during the upload, like a call to a non-existing action. The event has a detail property with an object that contains the uploaded file as file and the error as message.
If you want to use the upload helper with a customized uploader, there are two ways to do it:
Specify an adapter property with the uploader name inside the second argument of the upload helper: await sandbox.files.upload(new File(...), {adapter: \"Multipart\"})\n
Or pass an uploader property with an uploader instance inside the second argument of the upload helper: const uploader = sandbox.files.makeUploader(\"Multipart\")\nawait sandbox.files.upload(new File(...), {uploader})\n
The second group of ckanext-files utilities is available as the ckan.CKANEXT_FILES object. This object mainly serves as an extension and configuration point for sandbox.files.
ckan.CKANEXT_FILES.adapters is a collection of all classes that can be used to initialize an uploader. It contains the Standard, Multipart and Base classes. Standard and Multipart can be used as-is, while Base must be extended by your custom uploader class. Add your custom uploader classes to adapters to make them available application-wide:
class MyUploader extends Base { ... }\n\nckan.CKANEXT_FILES.adapters[\"My\"] = MyUploader;\n\nawait sandbox.files.upload(new File(...), {adapter: \"My\"})\n
ckan.CKANEXT_FILES.defaultSettings contains the object with default settings, available as this.settings inside any uploader. You can change the name of the storage used by all uploaders using this object. Note that the changes apply only to uploaders initialized after the modification:
ckan.CKANEXT_FILES.defaultSettings.storage = \"memory\"\n
"},{"location":"usage/multi-storage/","title":"Multi-storage","text":"It's possible to configure multiple storages at once and specify which one you want to use for the individual file upload. Up until now we used the following storage options:
ckanext.files.storage.default.type
ckanext.files.storage.default.path
ckanext.files.storage.default.create_path
All of them have the common prefix ckanext.files.storage.default., and this prefix is the key to using multiple storages simultaneously.
Every option of the storage follows the pattern ckanext.files.storage.<STORAGE_NAME>.<OPTION>. As all the options above contain default in the <STORAGE_NAME> position, they belong to the default storage.
If you want to configure a storage with the name custom, change the storage configuration accordingly:
ckanext.files.storage.custom.type = files:fs\nckanext.files.storage.custom.path = /tmp/example\nckanext.files.storage.custom.create_path = true\n
And if you want to use a Redis-based storage named memory and a filesystem-based storage named default, use the following configuration:
ckanext.files.storage.memory.type = files:redis\n\nckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
The default storage is special: ckanext-files uses it by default, as the name suggests. If you remove the configuration for the default storage and try to create a file, you'll see the following error:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n\n... ckan.logic.ValidationError: None - {'storage': ['Storage default is not configured']}\n
Storage default is not configured. That's why we need the default configuration. But if you want to upload a file into a different storage, or you don't want to add the default storage at all, you can always explicitly specify the name of the storage you are going to use.
When using API actions, add the storage parameter to the call:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt storage=memory\n
When writing Python code, pass the storage name to the get_storage function:
storage = get_storage(\"memory\")\n
When writing JS code, pass the object {requestParams: {storage: \"memory\"}} to the upload function:
const sandbox = ckan.sandbox()\nconst file = new File([\"content\"], \"file.txt\")\nconst options = {requestParams: {storage: \"memory\"}};\n\nawait sandbox.files.upload(file, options)\n
"},{"location":"usage/multipart/","title":"Multipart, resumable and signed uploads","text":"This feature has many names, but it basically divides a single upload into multiple stages. It can be used in following situations:
All these situations are handled by 4 API actions, which are available is storage has MULTIPART
capability:
files_multipart_start: initialize a multipart upload and set the expected final size and MIME type. A real multipart upload usually just returns the upload ID from this action. A resumable upload creates an empty file in the storage to accumulate content inside it. A signed upload produces a URL for the direct upload.
files_multipart_update: upload a fragment of the file or modify the upload in some other way. Most often this action accepts the ID of the upload and an upload field with a fragment of the uploaded file.
files_multipart_refresh: this action synchronizes and returns the current upload progress. It can be used if the upload was paused and the client does not know how many bytes were uploaded and from which byte the next upload fragment starts.
files_multipart_complete: finalize the upload and convert it into a normal file, available to other parts of the application. A multipart upload usually combines all uploaded parts into a single file here. A resumable upload verifies that the result has the expected MIME type and size. A signed upload just registers the completed file in the system.
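Here is a rough sketch of the whole flow from Python code (a sketch only: it assumes the default storage, an adapter that accepts consecutive fragments, such as files:fs, and a context that is allowed to call these actions):
import ckan.plugins.toolkit as tk\n\n# initialize the upload; size and content type are validated on completion\ninfo = tk.get_action(\"files_multipart_start\")(\n    {\"ignore_auth\": True},\n    {\"name\": \"file.txt\", \"size\": 13, \"content_type\": \"text/plain\"},\n)\n\n# send the content in two non-overlapping fragments\ntk.get_action(\"files_multipart_update\")(\n    {\"ignore_auth\": True}, {\"id\": info[\"id\"], \"upload\": b\"hello \"}\n)\ntk.get_action(\"files_multipart_update\")(\n    {\"ignore_auth\": True}, {\"id\": info[\"id\"], \"upload\": b\"world!\\n\"}\n)\n\n# convert the incomplete upload into a normal file\nresult = tk.get_action(\"files_multipart_complete\")(\n    {\"ignore_auth\": True}, {\"id\": info[\"id\"]}\n)\n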
files_multipart_start
requires content_type
and size
parameters. These values will be used to validate completed upload.files_multipart_start
allows hash
parameter. This value will be used to validate completed upload. Unlike content_type
and size
, hash
is usually optional, because it may be difficult for client to compute it.files_multipart_update
accepts upload ID as id
and fragment of the file as upload
. Sequence of calls to files_multipart_update
with non-overlapping fragments can be used to upload the file. Even if adapter implements signed uploads and client is supposed to send file to the signed URL instead of using files_multipart_update
.files_multipart_complete
compares content_type
, size
and hash
(if present) specified during initialization of upload with actual values. If they are different, upload is not converted into normal file. Depending on implementation, storage may just ignore incorrect initial expectations an assign a real values to the file as long as they are allowed by storage configuration. But it's recommended to reject such uploads, so it safer to assume, that incorrect expectations are not accepted.Incomplete files support most of normal file actions, but you need to pass completed=False
to action when working with incomplete files. I.e, if you want to remove incomplete upload, use its ID and completed=False
:
ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24 completed=False\n
Incomplete files do not support streaming and downloading via the public interface of the extension. But the storage adapter can expose such features via custom methods if it's technically possible.
An example of a basic multipart upload is shown below. The files:fs adapter can be used for running this example, as it implements MULTIPART.
First, create a text file and check its size:
echo 'hello world!' > /tmp/file.txt\nwc -c /tmp/file.txt\n\n... 13 /tmp/file.txt\n
The size is 13 bytes and the content type is text/plain. These values must be used for the upload initialization:
ckanapi action files_multipart_start name=file.txt size=13 content_type=text/plain\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 0\n... }\n... }\n
Here storage_data contains {\"uploaded\": 0}. It may be different for other adapters, especially if they implement non-consecutive uploads, but generally this is the recommended way to keep the upload progress.
Now we'll upload the first 5 bytes of the file:
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 count=5)\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 5\n... }\n... }\n
If you try finalizing the upload right now, you'll get an error:
ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... ckan.logic.ValidationError: None - {'upload': ['Actual value of upload size(5) does not match expected value(13)']}\n
Let's upload the rest of the bytes and complete the upload:
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 skip=5)\n\nckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:57:18.483716+00:00\",\n... \"hash\": \"c897d1410af8f2c74fba11b1db511e9e\",\n... \"id\": \"a740692f-e3d5-492f-82eb-f04e47c13848\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"mtime\": null,\n... \"name\": \"file.txt\",\n... \"owner_id\": null,\n... \"owner_type\": null,\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n
Now the file can be used normally. You can transfer file ownership to someone, stream it, or modify it. Pay attention to the ID: the completed file has its own unique ID, which is different from the ID of the incomplete upload.
"},{"location":"usage/ownership/","title":"File ownership","text":"Every file can have an owner and there can be only one owner of the file. It's possible to create file without an owner, but usually application will only benefit from keeping every file with its owner. Owner is described with two fields: ID and type.
When file is created, by default the current user from API action's context is assigned as an owner of the file. From now on, the owner can perform other operations, such as renaming/displaying/removing with the file.
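For illustration, here's a minimal sketch of this behavior from Python code (the username is a placeholder; owner_type and owner_id appear in the action output):
import ckan.plugins.toolkit as tk\n\n# the user from the context becomes the owner of the new file\nresult = tk.get_action(\"files_file_create\")(\n    {\"user\": \"some-user\", \"ignore_auth\": True},  # placeholder username\n    {\"upload\": b\"hello\", \"name\": \"hello.txt\"},\n)\nprint(result[\"owner_type\"], result[\"owner_id\"])  # -> user <ID of some-user>\n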
Apart from chaining auth functions, a plugin can implement the IFiles.files_file_allows and IFiles.files_owner_allows methods to modify the access rules for files.
def files_file_allows(\n self,\n context: Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n ...\n\ndef files_owner_allows(\n self,\n context: Context,\n owner_type: str, owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n ...\n
These methods receive the current action context, the details of the tested object, and the name of the operation (show, update, delete, file_transfer). files_file_allows checks permissions for the accessed file. It's usually called when the user interacts with the file directly. files_owner_allows works with an owner described by type and ID. It's usually called when the user transfers file ownership, performs a bulk file operation on the owner's files, or just tries to get the list of files that belong to the owner.
If the method returns true/false, the operation is allowed/denied. If the method returns None, the default logic is used to check access.
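As a minimal sketch (the import path of IFiles is an assumption here; check the extension source for the exact location), a plugin that forbids file removal for everyone, while keeping the default logic for all other operations, could look like this:
import ckan.plugins as p\nfrom ckanext.files.interfaces import IFiles  # assumed import path\n\n\nclass StrictFilesPlugin(p.SingletonPlugin):\n    p.implements(IFiles, inherit=True)\n\n    def files_file_allows(self, context, file, operation):\n        if operation == \"delete\":\n            return False  # nobody can remove files through the API\n        return None  # fall back to the default access logic\n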
As already mentioned, by default the user who owns the file can access it. But what about different owners? What if the file is owned by another entity, like a resource or a dataset?
Out of the box, nobody can access such files. But there are three config options that modify this restriction.
ckanext.files.owner.cascade_access = ENTITY_TYPE ANOTHER_TYPE gives access to a file owned by an entity if the user already has access to the entity itself. Use words like package, resource, group instead of ENTITY_TYPE.
For example, say a file is owned by a resource. If cascade access is enabled, whoever has access to resource_show of the resource can also see the file owned by this resource. If the user passes resource_update for the resource, they can also modify the file owned by it, etc.
Important: be careful and do not add user to ckanext.files.owner.cascade_access. A user's own files are considered private, and most likely you don't really want anyone else to be able to see or modify them.
The second option is ckanext.files.owner.transfer_as_update. When transfer-as-update is enabled, any user who has the <OWNER_TYPE>_update permission can transfer their own files to this OWNER_TYPE. Instead of using this option, you can define an <OWNER_TYPE>_file_transfer auth function.
And the third option is ckanext.files.owner.scan_as_update. Just as with the ownership transfer, it gives the user permission to list all files of the owner if the user can <OWNER_TYPE>_update it. Instead of using this option, you can define an <OWNER_TYPE>_file_scan auth function.
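For example, a minimal sketch of such an auth function for packages (assuming the <OWNER_TYPE>_file_transfer naming convention resolves to package_file_transfer), which simply delegates to the standard update check:
import ckan.authz as authz\nimport ckan.plugins as p\n\n\ndef package_file_transfer(context, data_dict):\n    # whoever can update the package can receive files for it\n    return authz.is_authorized(\"package_update\", context, data_dict)\n\n\nclass FilesAuthPlugin(p.SingletonPlugin):\n    p.implements(p.IAuthFunctions)\n\n    def get_auth_functions(self):\n        return {\"package_file_transfer\": package_file_transfer}\n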
File creation is not allowed by default: only sysadmins can use the files_file_create and files_multipart_start actions. This is done deliberately, because uncontrolled uploads can turn your portal into a user's personal cloud storage.
There are three ways to grant upload permission to normal users.
The BAD option is simple. Enable the ckanext.files.authenticated_uploads.allow config option, and every registered user will be allowed to upload files, but only into the default storage. If you want to change the list of storages available to a common user, specify the storage names in the ckanext.files.authenticated_uploads.storages option.
The GOOD option is relatively simple. Define a chained auth function with the name files_file_create. It's called whenever a user initiates an upload, so you can decide whether the user is allowed to upload files with the specified parameters.
The BEST option is to leave this restriction unchanged. Do not allow any user to call files_file_create. Instead, create a new action for your goal. ckanext-files isn't a solution - it's a tool that helps you build the solution.
If you need to add a documents field to datasets that contains uploaded PDF files, create a separate dataset_document_attach action. Specify access rules and validation for it, or even hardcode the storage that will be used for uploads. And then, from this new action, call files_file_create with ignore_auth: True.
In this way you control every side of uploading documents into datasets and do not accidentally break other functionality, because every other feature will define its own action.
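Here's a minimal sketch of such an action (the action name, the field names and the documents storage are illustrative assumptions, not part of ckanext-files):
import ckan.plugins.toolkit as tk\n\n\ndef dataset_document_attach(context, data_dict):\n    # reuse existing auth: only users who can edit the dataset may attach files\n    tk.check_access(\"package_update\", context, {\"id\": data_dict[\"dataset_id\"]})\n\n    return tk.get_action(\"files_file_create\")(\n        {\"ignore_auth\": True},  # deliberately bypass the sysadmin-only default\n        {\n            \"name\": data_dict[\"name\"],\n            \"upload\": data_dict[\"upload\"],\n            \"storage\": \"documents\",  # hardcoded storage, assumed to be configured\n        },\n    )\n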
"},{"location":"usage/task-queue/","title":"Task queue","text":"One of the challenges introduced by independently managed files is related to file ownership. As long as you can call files_transfer_ownership
manually, things are transparent. But as soon as you add custom file field to dataset, you probably want to automatically transfer ownership of the file refered by this custom field.
Imagine that you have a PDF file owned by you, and you specify the ID of this file in the attachment_id field of a dataset. You want to show a download link for this file on the dataset page. But if the file is owned by you, nobody else will be able to download it. So you decide to transfer file ownership to the dataset, so that anyone who sees the dataset can see the file as well.
You cannot update the dataset and transfer ownership after it, because there will be a time window between these two actions when the data is not valid. Or even worse: after updating the dataset you lose your internet connection and can't finish the transfer.
Nor can you transfer ownership first and then update the dataset: attachment_id may have additional validators, and you don't know in advance whether you'll be able to successfully update the dataset after the transfer.
This problem can be solved by queuing additional tasks inside the action. For example, the validator that checks whether a certain file ID can be used as attachment_id can queue an ownership transfer. If the dataset update completes without errors, the queued task is executed automatically and the dataset becomes the owner of the file.
A task is queued via the ckanext.files.shared.add_task function, which accepts objects inherited from ckanext.files.shared.Task. The Task class requires implementing the abstract method run(result: Any, idx: int, prev: Any), which is called when the task is executed. This method receives the result of the action that caused the task execution, the task's position in the queue, and the result of the previous task.
For example, one of the attachment_id validators can queue the following MyTask via add_task(MyTask(file_id)) to transfer ownership of file_id to the updated dataset:
import ckan.plugins.toolkit as tk\nfrom ckanext.files.shared import Task\n\n\nclass MyTask(Task):\n    def __init__(self, file_id):\n        self.file_id = file_id\n\n    def run(self, dataset, idx, prev):\n        # transfer the file to the dataset that was just created/updated\n        return tk.get_action(\"files_transfer_ownership\")(\n            {\"ignore_auth\": True},\n            {\n                \"id\": self.file_id,\n                \"owner_type\": \"package\",\n                \"owner_id\": dataset[\"id\"],\n                \"pin\": True,\n            },\n        )\n
As the first argument, Task.run receives the result of the action that was called. Right now only the following actions support tasks:
package_create
package_update
resource_create
resource_update
group_create
group_update
organization_create
organization_update
user_create
user_update
If you want to enable task support for your custom action, decorate it with the ckanext.files.shared.with_task_queue decorator:
from ckanext.files.shared import with_task_queue\n\n\n@with_task_queue\ndef my_action(context, data_dict):\n    # `add_task` can be called inside this action's stack frame\n    ...\n
A good example of a validator using tasks is the files_transfer_ownership validator factory. It can be added to the metadata schema as files_transfer_ownership(owner_type, name_of_id_field). For example, if you are adding this validator to a resource, call it as files_transfer_ownership(\"resource\", \"id\"). The second argument is the name of the ID field. As in most cases it's id, you can omit the second argument:
files_transfer_ownership(\"organization\")
files_transfer_ownership(\"package\")
files_transfer_ownership(\"user\")
There is a difference between creating files via an action:
tk.get_action(\"files_file_create\")(\n {\"ignore_auth\": True},\n {\"upload\": \"hello\", \"name\": \"hello.txt\"}\n)\n
and via a direct call to Storage.upload:
from ckanext.files.shared import get_storage, make_upload\n\nstorage = get_storage()\nstorage.upload(\"hello.txt\", make_upload(b\"hello\"), {})\n
The former snippet creates a tracked file: the file is uploaded to the storage and its details are saved to the database.
The latter snippet creates an untracked file: the file is uploaded to the storage, but its details are not saved anywhere.
Untracked files can be used to achieve specific goals. For example, imagine a storage adapter that writes files to a specified ZIP archive. You can create an interface that initializes such a storage for an existing ZIP resource and uploads files into it. You don't need a separate DB record for every uploaded file, because all of them go into the resource, which is already stored in the DB.
But such use-cases are pretty specific, so prefer to use the API if you are not sure what you need. The main reason to use tracked files is their discoverability: you can use the files_file_search API action to list all the tracked files and optionally filter them by storage, location, content_type, etc.:
ckanapi action files_file_search\n\n... {\n... \"count\": 123,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 11,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... },\n... ...\n... ]\n... }\n\nckanapi action files_file_search size:5 rows=1\n\n... {\n... \"count\": 2,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 5,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n... ]\n... }\n\nckanapi action files_file_search content_type=application/pdf\n\n... {\n... \"count\": 0,\n... \"results\": []\n... }\n
As for untracked files, their discoverability depends on the storage adapter. Some of them, files:fs for example, can scan the storage and locate all uploaded files, both tracked and untracked. If you have a files:fs storage configured as default, use the following command to scan its content:
ckan files scan\n
If you want to scan a different storage, specify its name via the -s/--storage-name option. Remember that some storage adapters do not support scanning.
ckan files scan -s memory\n
If you want to see untracked files only, add the -u/--untracked-only flag.
ckan files scan -u\n
If you want to track any untracked files by creating a DB record for every such file, add the -t/--track flag. After that you'll be able to discover previously untracked files via the files_file_search API action. This option is most useful during migration, when you are configuring a new storage that points to an existing location with files.
ckan files scan -t\n
"},{"location":"usage/transfer/","title":"Ownership transfer","text":"File ownership can be transfered. As there can be only one owner of the file, as soon as you transfer ownership over file, you yourself do not own this file.
To transfer ownership, use files_transfer_ownership
action and specify id
of the file, owner_id
and owner_type
of the new owner.
You can't just transfer ownership to anyone. You must either pass the IFiles.files_owner_allows check for the file_transfer operation, or pass a cascade access check for the future owner of the file when cascade access and transfer-as-update are enabled.
For example, if you have the following options in the config file:
ckanext.files.owner.cascade_access = organization\nckanext.files.owner.transfer_as_update = true\n
you must pass the organization_update auth function if you want to transfer file ownership to an organization. In addition, a file can be pinned. In this way we mark important files. Imagine a resource and its uploaded file: the link to this file is used by the resource, and we don't want the file to be accidentally transferred to someone else. We pin the file, and now nobody can transfer it without explicit confirmation of their intention.
There are two ways to move a pinned file:
call files_file_unpin first and then transfer ownership via a separate API call
pass the force parameter to files_transfer_ownership
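For example, here's a sketch of a transfer from Python code (the IDs are illustrative); force is only required when the file is pinned:
import ckan.plugins.toolkit as tk\n\ntk.get_action(\"files_transfer_ownership\")(\n    {},  # context of the user performing the transfer\n    {\n        \"id\": \"226056e2-6f83-47c5-8bd2-102e2b82ab9a\",  # file ID (illustrative)\n        \"owner_type\": \"organization\",\n        \"owner_id\": \"<organization-id>\",\n        \"force\": True,  # move the file even if it is pinned\n    },\n)\n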
You can upload files using JavaScript CKAN modules. ckanext-files extends CKAN's Sandbox object (available as this.sandbox inside a JS CKAN module), so we can use a shortcut and upload a file directly from the DevTools. Open any CKAN page, switch to the JS console and create the sandbox instance. Inside it we have the files object, which in turn contains the upload method. This method accepts a File object for the upload (the same object you can get from an input[type=file]).
sandbox = ckan.sandbox()\nawait sandbox.files.upload(\nnew File([\"content\"], \"file.txt\")\n)\n\n... {\n... \"id\": \"18cdaa65-5eed-4078-89a8-469b137627ce\",\n... \"name\": \"file.txt\",\n... \"location\": \"b53907c3-8434-4dee-9a9e-6c4d3055d200\",\n... \"content_type\": \"text/plain\",\n... \"size\": 7,\n... \"hash\": \"9a0364b9e99bb480dd25e1f0284c8555\",\n... \"storage\": \"default\",\n... \"ctime\": \"2024-06-02T16:12:27.902055+00:00\",\n... \"mtime\": null,\n... \"atime\": null,\n... \"storage_data\": {}\n... }\n
If you are still using the FS storage configured in the previous section, switch to the /tmp/example folder and check its content:
ls /tmp/example\n... b53907c3-8434-4dee-9a9e-6c4d3055d200\n\ncat b53907c3-8434-4dee-9a9e-6c4d3055d200\n... content\n
And, as usual, let's remove the file using the ID from the upload promise:
sandbox.client.call(\"POST\", \"files_file_delete\", {\nid: \"18cdaa65-5eed-4078-89a8-469b137627ce\"\n})\n
"},{"location":"usage/use-in-code/","title":"Usage in code","text":"If you are writing the code and you want to interact with the storage directly, without the API layer, you can do it via a number of public functions of the extension available in ckanext.files.shared
.
Let's configure the filesystem storage first. The filesystem adapter has a mandatory path option that controls the filesystem location where files are stored. If the path does not exist, the storage will raise an exception by default. But it can also create the missing path if you enable the create_path option. Here's our final version of the settings:
ckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
Now we are going to connect to the CKAN shell via the ckan shell CLI command and create an instance of the storage:
from ckanext.files.shared import get_storage\nstorage = get_storage()\n
Because you have all the configuration in place, the rest is fairly straightforward. We will upload a file, read its content and remove it, all from the CKAN shell.
To create the file, the storage.upload method must be called with 2 parameters: the name of the file and a special stream-like object with its content. You can use any string as the first parameter. As for the \"special stream-like object\", ckanext-files has the ckanext.files.shared.make_upload function, which accepts a number of different types (bytes, werkzeug.datastructures.FileStorage, BytesIO, a file descriptor) and converts them into the expected format:
from ckanext.files.shared import make_upload\n\nupload = make_upload(b\"hello world\")\nresult = storage.upload('file.txt', upload)\n\nprint(result)\n\n... FileData(\n... location='60b385e7-8137-496c-bb1d-6ae4d7963ab3',\n... size=11,\n... content_type='text/plain',\n... hash='5eb63bbbe01eeed093cb22bb8f5acdc3',\n... storage_data={}\n... )\n
result is an instance of the ckanext.files.shared.FileData dataclass. It contains all the information required by the storage to manage the file.
The result object has a location attribute that contains the name of the file relative to the path option specified in the storage configuration. If you visit the /tmp/example directory, which was set as the path for the storage, you'll see a file there with a name matching location from the result. And its content matches the content of our upload, which is quite an expected outcome:
cat /tmp/example/60b385e7-8137-496c-bb1d-6ae4d7963ab3\n\n... hello world\n
But let's go back to the shell and try reading the file from Python code. We'll pass result to the storage's stream method, which produces an iterable of bytes based on our result:
buffer = storage.stream(result)\ncontent = b\"\".join(buffer)\n\n... b'hello world'\n
In most cases, the storage only needs the location of the file object to read it. So, if you don't have the result generated during the upload, you can still read the file as long as you have its location. But remember that some storage adapters may require additional information, so the following example may need to be adapted depending on the adapter:
from ckanext.files.shared import FileData\n\nlocation = \"60b385e7-8137-496c-bb1d-6ae4d7963ab3\"\ndata = FileData(location)\n\nbuffer = storage.stream(data)\ncontent = b\"\".join(buffer)\nprint(content)\n\n... b'hello world'\n
And finally we can remove the file:
storage.remove(result)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-\\.\\_]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":""},{"location":"#ckanext-files","title":"ckanext-files","text":"Files as first-class citizens of CKAN. Upload, manage, remove files directly and attach them to datasets, resources, etc.
","text":"Rename the file.
This action changes human-readable name of the file, which is stored in DB. Real location of the file in the storage is not modified.
ckanapi action files_file_show \\\n id=226056e2-6f83-47c5-8bd2-102e2b82ab9a \\\n name=new-name.txt\n
Params:
id
: ID of the filename
: new name of the filecompleted
: use False
to rename incomplete uploads. Default: True
Returns:
dictionary with file details
"},{"location":"api/#files_file_scancontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_scan(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"List files of the owner
This action internally calls files_file_search, but with static values of owner filters. If owner is not specified, files filtered by current user. If owner is specified, user must pass authorization check to see files.
Params:
owner_id
: ID of the ownerowner_type
: type of the ownerThe all other parameters are passed as-is to files_file_search
.
Returns:
count
: total number of files matching filtersresults
: array of dictionaries with file details.files_file_search(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Search files.
This action is not stabilized yet and will change in future.
Provides an ability to search files using exact filter by name, content_type, size, owner, etc. Results are paginated and returned in package_search manner, as dict with count
and results
items.
All columns of the File model can be used as filters. Before the search, the type of the column and the type of the filter value are compared. If they are the same, the original values are used in the search. If the types differ, both the column value and the filter value are cast to strings.
This request produces size = 10
SQL expression:
ckanapi action files_file_search size:10\n
This request produces size::text = '10'
SQL expression:
ckanapi action files_file_search size=10\n
Even though the results are usually the same, using correct types leads to a more efficient search.
Apart from File columns, the following Owner properties can be used for searching: owner_id
, owner_type
, pinned
.
storage_data
and plugin_data
are dictionaries. The filter's value for these fields is used as a mask. For example, storage_data={\"a\": {\"b\": 1}}
matches any File with storage_data
containing item a
with value that contains b=1
. This works only with data represented by nested dictionaries, without other structures, like lists or sets.
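The matching logic implied above can be sketched like this (my own illustration of the mask semantics, not the extension's actual code):
def matches_mask(data: dict, mask: dict) -> bool:\n    # every key of the mask must exist in data; nested dicts are\n    # compared recursively, all other values must be equal\n    for key, expected in mask.items():\n        if key not in data:\n            return False\n        if isinstance(expected, dict):\n            if not isinstance(data[key], dict) or not matches_mask(data[key], expected):\n                return False\n        elif data[key] != expected:\n            return False\n    return True\n\nassert matches_mask({\"a\": {\"b\": 1, \"c\": 2}}, {\"a\": {\"b\": 1}})\n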
Experimental feature: File columns can be passed as a pair of operator and value. This feature will be replaced by a strictly defined query language at some point:
ckanapi action files_file_search size:'[\"<\", 100]' content_type:'[\"like\", \"text/%\"]'\n
The following operators are accepted: =
, <
, >
, !=
, like
Params:
start: index of the first row in the result / number of rows to skip. Default: 0
rows: number of rows to return. Default: 10
sort: name of the File column used for sorting. Default: name
reverse: sort results in descending order. Default: False
storage_data: mask for the storage_data column. Default: {}
plugin_data: mask for the plugin_data column. Default: {}
owner_id: show only a specific owner ID if present. Default: None
owner_type: show only a specific owner type if present. Default: None
pinned: show only pinned/unpinned items if present. Default: None
completed: use False to search incomplete uploads. Default: True
Returns:
count: total number of files matching filters
results: array of dictionaries with file details.
files_file_search_by_user(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Internal action. Do not use it.
"},{"location":"api/#files_file_showcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_show(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Show file details.
This action only displays information from the DB record. There is no way to get the content of the file using this action (or any other API action).
ckanapi action files_file_show id=226056e2-6f83-47c5-8bd2-102e2b82ab9a\n
Params:
id: ID of the file
completed: use False to show incomplete uploads. Default: True
Returns:
dictionary with file details
"},{"location":"api/#files_file_unpincontext-context-data_dict-dictstr-any-dictstr-any","title":"files_file_unpin(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Pin file to the current owner.
Pinned file cannot be transfered to a different owner. Use it to guarantee that file referred by entity is not accidentally transferred to a different owner.
Params:
id: ID of the file
completed: use False to unpin incomplete uploads. Default: True
Returns:
dictionary with details of updated file
"},{"location":"api/#files_multipart_completecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_complete(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Finalize multipart upload and transform it into completed file.
Depending on the storage, this action may require additional parameters. But usually it just takes the ID and verifies that the content type, size and hash provided when the upload was initialized match the actual values.
If the data is valid and the file is completed inside the storage, a new File entry with the file details is created in the DB, and the file can be used just like any normal file.
Requires storage with MULTIPART
capability.
Params:
id: ID of the incomplete upload
Returns:
dictionary with details of the created file
"},{"location":"api/#files_multipart_refreshcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_refresh(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Refresh details of incomplete upload.
Can be used if the upload process was interrupted and the client does not know how many bytes were already uploaded.
Requires storage with MULTIPART
capability.
Params:
id: ID of the incomplete upload
Returns:
dictionary with details of the updated upload
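For example, after a dropped connection you can check how much data the storage already holds (the ID is illustrative):
ckanapi action files_multipart_refresh id=4f528b3f-56f3-46ea-9a21-5b9b27b3bbbb\n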
"},{"location":"api/#files_multipart_startcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_start(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Initialize multipart(resumable,continuous,signed,etc) upload.
Apart from the standard parameters, different storages can require additional data, so always check the documentation of the storage before initiating a multipart upload.
When the upload is initialized, the storage usually returns details required for the following steps. It may be a presigned URL for direct upload, or just an ID of the upload which must be used with files_multipart_update
.
Requires storage with MULTIPART
capability.
Params:
storage: name of the storage that will handle the upload. Default: default
name: name of the uploaded file
content_type: MIMEtype of the uploaded file. Used for validation
size: expected size of the upload. Used for validation
hash: expected content hash. If present, used for validation
Returns:
dictionary with details of initiated upload. Depends on used storage
"},{"location":"api/#files_multipart_updatecontext-context-data_dict-dictstr-any-dictstr-any","title":"files_multipart_update(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Update incomplete upload.
Depending on the storage, this action may require additional parameters. Most likely, upload
with a fragment of the uploaded file.
Requires storage with MULTIPART
capability.
Params:
id: ID of the incomplete upload
Returns:
dictionary with details of the updated upload
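Putting the three actions together, a rough sketch of a resumable upload from Python might look like this (the chunk size and any extra parameters depend on the configured storage; this is not a drop-in implementation):
import os\nimport ckan.plugins.toolkit as tk\n\ndef multipart_upload(path: str, context: dict, chunk_size: int = 5 * 1024 * 1024) -> dict:\n    data = tk.get_action(\"files_multipart_start\")(context, {\n        \"storage\": \"default\",\n        \"name\": os.path.basename(path),\n        \"content_type\": \"application/octet-stream\",\n        \"size\": os.path.getsize(path),\n    })\n    with open(path, \"rb\") as src:\n        while chunk := src.read(chunk_size):\n            # some storages expect extra fields here, e.g. the upload position\n            tk.get_action(\"files_multipart_update\")(context, {\"id\": data[\"id\"], \"upload\": chunk})\n    # the storage verifies size/hash and turns the upload into a normal file\n    return tk.get_action(\"files_multipart_complete\")(context, {\"id\": data[\"id\"]})\n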
"},{"location":"api/#files_resource_uploadcontext-context-data_dict-dictstr-any","title":"files_resource_upload(context: 'Context', data_dict: 'dict[str, Any]')
","text":"Create a new file inside resource storage.
This action internally calls files_file_create
with ignore_auth=True
and always uses resources storage.
The new file is not attached to the resource. You need to call files_transfer_ownership
manually once the resource is created.
Params:
name: human-readable name of the file. Default: guessed from the upload field
upload: content of the file as a string, bytes, file descriptor or uploaded file
Returns:
dictionary with file details.
"},{"location":"api/#files_transfer_ownershipcontext-context-data_dict-dictstr-any-dictstr-any","title":"files_transfer_ownership(context: 'Context', data_dict: 'dict[str, Any]') -> 'dict[str, Any]'
","text":"Transfer file ownership.
Params:
id: ID of the file upload
completed: use False to transfer incomplete uploads. Default: True
owner_id: ID of the new owner
owner_type: type of the new owner
force: move the file even if it's pinned. Default: False
pin: pin the file after transfer to stop future transfers. Default: False
Returns:
dictionary with details of updated file
"},{"location":"changelog/","title":"Changelog","text":"All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
"},{"location":"changelog/#unreleased","title":"Unreleased","text":"Compare with latest
"},{"location":"changelog/#features","title":"Features","text":"Compare with v0.3.0
"},{"location":"changelog/#features_1","title":"Features","text":"Compare with v0.2.6
"},{"location":"changelog/#features_2","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#bug-fixes_1","title":"Bug Fixes","text":"Compare with v0.2.4
"},{"location":"changelog/#bug-fixes_2","title":"Bug Fixes","text":"Compare with v0.2.3
"},{"location":"changelog/#features_3","title":"Features","text":"Compare with v0.2.2
"},{"location":"changelog/#features_4","title":"Features","text":"Compare with v0.2.1
"},{"location":"changelog/#v021-2024-03-18","title":"v0.2.1 - 2024-03-18","text":"Compare with v0.2.0
"},{"location":"changelog/#features_5","title":"Features","text":"Compare with v0.0.5
"},{"location":"changelog/#features_6","title":"Features","text":"Compare with v0.0.4
"},{"location":"changelog/#bug-fixes_4","title":"Bug Fixes","text":"Compare with v0.0.2
"},{"location":"changelog/#v002-2022-02-09","title":"v0.0.2 - 2022-02-09","text":"Compare with v0.0.1
"},{"location":"changelog/#v001-2021-09-21","title":"v0.0.1 - 2021-09-21","text":"Compare with first commit
"},{"location":"cli/","title":"CLI","text":"ckanext-files register files
entrypoint under ckan
command. Commands below must be executed as ckan -c $CKAN_INI files <COMMAND>
.
adapters [-v]
List all available storage adapters. With the -v/--verbose
flag, docstrings from the adapter classes are printed as well.
storages [-v]
List all configured storages. With the -v/--verbose
flag, all supported capabilities are shown.
stream FILE_ID [-o OUTPUT] [--start START] [--end END]
Stream content of the file to STDOUT. For non-textual files use output redirection stream ID > file.ext
. Alternatively, the output destination can be specified via the -o/--output
option. If it points to an existing directory, a file with the same name as the streamed item is created inside this directory. Otherwise, OUTPUT
is used as the filename.
--start
and --end
can be used to receive a fragment of the file. Only positive values are guaranteed to work with any storage that supports STREAM. Some storages support negative values for these options and count them from the end of the file. E.g., --start -10
reads the last 10 bytes of the file. --end -1
reads up to the last byte, but the last byte itself is not included in the output.
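For example, to save the first kilobyte of a file into a directory (the ID is illustrative):
ckan files stream 226056e2-6f83-47c5-8bd2-102e2b82ab9a --start 0 --end 1024 -o /tmp/\n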
scan [-s default] [-u] [-t [-a OWNER_ID]]
List all files that exist in the storage. Works only if the storage supports SCAN
. By default shows the content of the default
storage. The -s/--storage-name
option changes the target storage.
-u/--untracked-only
flag shows only untracked files that have no corresponding record in the DB. It can be used to identify leftovers after removing data from the portal.
-t/--track
flag registers any untracked file by creating a DB record for it. It can be used only when ANALYZE
is supported. Files are created without an owner. Use the -a/--adopt-by
option with a user ID to give ownership over the new files to the specified user. This is useful when configuring a new storage connected to an existing location with files.
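For example, a one-shot command that registers every untracked file of the default storage and gives ownership to a specific user (the user ID is illustrative):
ckan files scan -s default -t -a 59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\n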
Storage consists of the storage object that dispatches operation requests and 3 services that do the actual job: Reader, Uploader and Manager. To define a custom storage, you need to extend the main storage class, describe the storage logic and register the storage via IFiles.files_get_storage_adapters
.
Let's implement a DB storage. It will store files in an SQL table using SQLAlchemy. There is just one requirement for the table: it must have a column for storing the unique identifier of the file and another column for storing the content of the file as bytes.
For the sake of simplicity, our storage will work only with existing tables. Create the table manually before we begin.
First of all, we create an adapter that does nothing and register it in our plugin.
from __future__ import annotations\n\nfrom typing import Any\nimport sqlalchemy as sa\n\nimport ckan.plugins as p\nfrom ckan.model.types import make_uuid\nfrom ckanext.files import shared\n\n\nclass ExamplePlugin(p.SingletonPlugin):\n p.implements(shared.IFiles)\n def files_get_storage_adapters(self) -> dict[str, Any]:\n return {\"example:db\": DbStorage}\n\n\nclass DbStorage(shared.Storage):\n ...\n
After installing and enabling your custom plugin, you can configure a storage with this adapter by adding a single new line to the config file:
ckanext.files.storage.db.type = example:db\n
But if you check storage via ckan files storages -v
, you'll see that it can't do anything.
ckan files storages -v\n\n... db: example:db\n... Supports: Capability.NONE\n... Does not support: Capability.REMOVE|STREAM|CREATE|...\n
Before we start uploading files, let's make sure that the storage has proper configuration. As files will be stored in a DB table, we need the name of the table and a DB connection string. Let's assume that the table already exists, but we don't know which columns to use for files. So we need the name of the column for content and the name of the column for the file's unique identifier. ckanext-files uses the term location
instead of identifier, so we'll do the same in our implementation.
There are 4 required options in total:
db_url: DB connection string
table: name of the table
location_column: name of the column for the file's unique identifier
content_column: name of the column for the file's content
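Assuming the adapter name example:db registered above and a pre-created table (all names below are illustrative), the configuration might look like this:
ckanext.files.storage.db.type = example:db\nckanext.files.storage.db.db_url = postgresql://user:pass@localhost/ckan\nckanext.files.storage.db.table = files_data\nckanext.files.storage.db.location_column = location\nckanext.files.storage.db.content_column = content\n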
It's not mandatory, but it is highly recommended to declare config options for the adapter. It can be done via the Storage.declare_config_options
class method, which accepts declaration
object and key
namespace for storage options.
class DbStorage(shared.Storage):\n\n @classmethod\n def declare_config_options(cls, declaration, key) -> None:\n declaration.declare(key.db_url).required()\n declaration.declare(key.table).required()\n declaration.declare(key.location_column).required()\n declaration.declare(key.content_column).required()\n
And we probably want to initialize the DB connection when the storage is initialized. For this we'll extend the constructor, which must be defined as a method accepting keyword-only arguments:
class DbStorage(shared.Storage):\n ...\n\n def __init__(self, **settings: Any) -> None:\n db_url = self.ensure_option(settings, \"db_url\")\n\n self.engine = sa.create_engine(db_url)\n self.location_column = sa.column(\n self.ensure_option(settings, \"location_column\")\n )\n self.content_column = sa.column(self.ensure_option(settings, \"content_column\"))\n self.table = sa.table(\n self.ensure_option(settings, \"table\"),\n self.location_column,\n self.content_column,\n )\n super().__init__(**settings)\n
You may notice that we are using Storage.ensure_option
quite often. This method returns the value of the specified option from the settings or raises an exception.
The table definition and columns are saved as storage attributes to simplify building SQL queries in the future.
Now we are going to define classes for all 3 storage services and tell storage, how to initialize these services.
There are 3 services: Reader, Uploader and Manager. Each of them is initialized via the corresponding storage method: make_reader
, make_uploader
and make_manager
. And each of them accepts a single argument during creation: the storage itself.
class DbStorage(shared.Storage):\n def make_reader(self):\n return DbReader(self)\n\n def make_uploader(self):\n return DbUploader(self)\n\n def make_manager(self):\n return DbManager(self)\n\n\nclass DbReader(shared.Reader):\n ...\n\n\nclass DbUploader(shared.Uploader):\n ...\n\n\nclass DbManager(shared.Manager):\n ...\n
Our first target is the Uploader service. It's responsible for file creation. For the minimal implementation it needs an upload
method and capabilities
attribute which tells the storage what exactly the Uploader can do.
class DbUploader(shared.Uploader):\n capabilities = shared.Capability.CREATE\n\n def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n ...\n
upload
receives the location
(name) of the uploaded file; upload
object with file's content; and extras
dictionary that contains any additional arguments that can be passed to the uploader. We are going to ignore location
and generate a unique UUID for every uploaded file instead of using the user-defined filename.
The goal is to write the file into the DB and return shared.FileData
that contains the location of the file in the DB (value of location_column
), the size of the file in bytes, the MIMEtype of the file and the hash of the file content.
For location we'll just use ckan.model.types.make_uuid
function. Size and MIMEtype are already available as upload.size
and upload.content_type
.
The only problem is the hash of the content. You can compute it in any way you like, but there is a simple option if you have no preferences. upload
has hashing_reader
method, which returns an iterable over the file content. When you read the file through it, the content hash is automatically computed and you can get it using the get_hash
method of the reader.
Just make sure to read the whole file before checking the hash, because the hash is computed from the consumed content. I.e., if you just create the hashing reader but do not read a single byte from it, you'll receive the hash of an empty string. If you read just 1 byte, you'll receive the hash of this single byte, etc.
The easiest option for you is to call reader.read()
method to consume the whole file and then call reader.get_hash()
to receive the hash.
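In isolation the pattern looks like this (a sketch; upload is the shared.Upload object received by the service):
reader = upload.hashing_reader()\ncontent = reader.read()  # consume the whole file first\ncontent_hash = reader.get_hash()  # hash of everything read so far\n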
Here's the final implementation of DbUploader:
class DbUploader(shared.Uploader):\n capabilities = shared.Capability.CREATE\n\n def upload(self, location: str, upload: shared.Upload, extras: dict[str, Any]) -> shared.FileData:\n uuid = make_uuid()\n reader = upload.hashing_reader()\n\n values = {\n self.storage.location_column: uuid,\n self.storage.content_column: reader.read(),\n }\n stmt = sa.insert(self.storage.table, values)\n\n result = self.storage.engine.execute(stmt)\n\n return shared.FileData(\n uuid,\n upload.size,\n upload.content_type,\n reader.get_hash()\n )\n
Now you can upload a file into your new db
storage:
ckanapi action files_file_create storage=db name=hello.txt upload@<(echo -n 'hello world')\n\n...{\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-17T13:48:52.121755+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\",\n... \"location\": \"5a4472b3-cf38-4c58-81a6-4d4acb7b170e\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 11,\n... \"storage\": \"db\",\n... \"storage_data\": {}\n...}\n
The file is created, but you cannot read it just yet. Try running the ckan files stream
CLI command with the file ID:
ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... Operation stream is not supported by db storage\n... Aborted!\n
As expected, you have to write extra code.
Streaming, reading and generating links is the responsibility of the Reader service. We only need the stream
method for the minimal implementation. This method receives a shared.FileData
object (the same object as the one returned from Uploader.upload
) and extras
containing all additional arguments passed by the caller. The result is any iterable producing bytes.
We'll use location
property of shared.FileData
as a value for location_column
inside the table.
And don't forget to add STREAM
capability to Reader.capabilities
.
from collections.abc import Iterable\n\nclass DbReader(shared.Reader):\n capabilities = shared.Capability.STREAM\n\n def stream(self, data: shared.FileData, extras: dict[str, Any]) -> Iterable[bytes]:\n stmt = (\n sa.select(self.storage.content_column)\n .select_from(self.storage.table)\n .where(self.storage.location_column == data.location)\n )\n row = self.storage.engine.execute(stmt).fetchone()\n\n return row\n
The result may be confusing: we are returning a Row object from the stream method. But our goal is to return any iterable that produces bytes. Row is iterable (tuple-like). And it contains only one item: the value of the column with the file content, i.e. bytes. So it satisfies the requirements.
Now you can check content via CLI once again.
ckan files stream bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n\n... hello world\n
Finally, we need to add file removal for the minimal implementation. It is also nice to have the SCAN
capability, as it shows all files currently available in the storage, so we add it as a bonus. These operations are handled by the Manager. We need remove
and scan
methods. Arguments are already familiar to you. As for the results:
remove: returns True if the file was successfully removed. It should return False if the file does not exist, but it's allowed to return True as long as you are not checking the result.
scan: returns an iterable with all file locations
class DbManager(shared.Manager):\n storage: DbStorage\n capabilities = shared.Capability.SCAN | shared.Capability.REMOVE\n\n def scan(self, extras: dict[str, Any]) -> Iterable[str]:\n stmt = sa.select(self.storage.location_column).select_from(self.storage.table)\n for row in self.storage.engine.execute(stmt):\n yield row[0]\n\n def remove(\n self,\n data: shared.FileData | shared.MultipartData,\n extras: dict[str, Any],\n ) -> bool:\n stmt = sa.delete(self.storage.table).where(\n self.storage.location_column == data.location,\n )\n self.storage.engine.execute(stmt)\n return True\n
Now you can list all the files in the storage:
ckan files scan -s db\n
And remove a file using ckanapi and the file ID:
ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24\n
That's all you need for the basic storage. But check the definition of the base storage and services to find details about other methods. And also check the implementations of other storages for additional ideas.
"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#requirements","title":"Requirements","text":"Compatibility with core CKAN versions:
| CKAN version | Compatible? |
| --- | --- |
| 2.9 | no |
| 2.10 | yes |
| 2.11 | yes |
| master | yes |
Note
It's recommended to install the extension via pip. If you are using the GitHub version of the extension, stick to the vX.Y.Z tags to avoid breaking changes. Check the changelog before upgrading the extension.
"},{"location":"installation/#installation_1","title":"Installation","text":"Install the extension
pip install ckanext-files # (1)!\n
pip install ckanext-files[opendal,libcloud]\n
Add files
to the ckan.plugins
setting in your CKAN config file.
Run DB migrations
ckan db upgrade -p files\n
"},{"location":"interfaces/","title":"Interfaces","text":""},{"location":"interfaces/#interfaces","title":"Interfaces","text":"ckanext-files registers ckanext.files.shared.IFiles
interface. As the extension is actively developed, this interface may change in the future. Always use inherit=True
when implementing IFiles
.
class IFiles(Interface):\n \"\"\"Extension point for ckanext-files.\"\"\"\n\n def files_get_storage_adapters(self) -> dict[str, Any]:\n \"\"\"Return mapping of storage type to adapter class.\n\n Example:\n >>> def files_get_storage_adapters(self):\n >>> return {\n >>> \"my_ext:dropbox\": DropboxStorage,\n >>> }\n\n \"\"\"\n\n return {}\n\n def files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n \"\"\"Return mapping with lookup functions for owner types.\n\n Name of the getter is the name used as `Owner.owner_type`. The getter\n itself is a function that accepts owner ID and returns optional owner\n entity.\n\n Example:\n >>> def files_register_owner_getters(self):\n >>> return {\"resource\": model.Resource.get}\n \"\"\"\n return {}\n\n def files_file_allows(\n self,\n context: types.Context,\n file: File | Multipart,\n operation: types.FileOperation,\n ) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Default implementation relies on cascade_access config option. If owner\n of file is included into cascade access, user can perform operation on\n file if he can perform the same operation with file's owner.\n\n If current owner is not affected by cascade access, user can perform\n operation on file only if user owns the file.\n\n Example:\n >>> def files_file_allows(\n >>> self, context,\n >>> file: shared.File | shared.Multipart,\n >>> operation: shared.types.FileOperation\n >>> ) -> bool | None:\n >>> if file.owner_info and file.owner_info.owner_type == \"resource\":\n >>> return is_authorized_boolean(\n >>> f\"resource_{operation}\",\n >>> context,\n >>> {\"id\": file.owner_info.id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\n def files_owner_allows(\n self,\n context: types.Context,\n owner_type: str,\n owner_id: str,\n operation: types.OwnerOperation,\n ) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Example:\n >>> def files_owner_allows(\n >>> self, context,\n >>> owner_type: str, owner_id: str,\n >>> operation: shared.types.OwnerOperation\n >>> ) -> bool | None:\n >>> if owner_type == \"resource\" and operation == \"file_transfer\":\n >>> return is_authorized_boolean(\n >>> f\"resource_update\",\n >>> context,\n >>> {\"id\": owner_id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n
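A minimal plugin skeleton that implements the interface might look like this (a sketch; DropboxStorage is the hypothetical adapter class from the docstring above):
import ckan.plugins as p\nfrom ckanext.files import shared\n\n\nclass MyFilesPlugin(p.SingletonPlugin):\n    # inherit=True protects the plugin from future additions to the interface\n    p.implements(shared.IFiles, inherit=True)\n\n    def files_get_storage_adapters(self):\n        return {\"my_ext:dropbox\": DropboxStorage}  # hypothetical adapter class\n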
"},{"location":"primer/","title":"Welcome to MkDocs","text":"For full documentation visit mkdocs.org{ data-preview }
Attribute Lists{ data-preview }
Some title
Some content
Some title
Some content
Open styled details Nested details!And more content again.
theme:\nfeatures:\n- content.code.annotate # (1)!\n
code
, formatted text, images, ... basically anything that can be written in Markdown.#include <stdio.h>\n\nint main(void) {\nprintf(\"Hello world!\\n\");\nreturn 0;\n}\n
C++ #include <iostream>\n\nint main(void) {\nstd::cout << \"Hello world!\" << std::endl;\nreturn 0;\n}\n
graph LR\nA[Start] --> B{Error?};\nB -->|Yes| C[Hmm...];\nC --> D[Debug];\nD --> B;\nB ---->|No| E[Yay!];
sequenceDiagram\nautonumber\nAlice->>John: Hello John, how are you?\nloop Healthcheck\nJohn->>John: Fight against hypochondria\nend\nNote right of John: Rational thoughts!\nJohn-->>Alice: Great!\nJohn->>Bob: How about you?\nBob-->>John: Jolly good!
```py title=\"IFiles\" class IFiles(Interface): \"\"\"Extension point for ckanext-files.\"\"\"
def files_get_storage_adapters(self) -> dict[str, Any]:\n \"\"\"Return mapping of storage type to adapter class.\n\n Example:\n >>> def files_get_storage_adapters(self):\n >>> return {\n >>> \"my_ext:dropbox\": DropboxStorage,\n >>> }\n\n \"\"\"\n\n return {}\n\ndef files_register_owner_getters(self) -> dict[str, Callable[[str], Any]]:\n \"\"\"Return mapping with lookup functions for owner types.\n\n Name of the getter is the name used as `Owner.owner_type`. The getter\n itself is a function that accepts owner ID and returns optional owner\n entity.\n\n Example:\n >>> def files_register_owner_getters(self):\n >>> return {\"resource\": model.Resource.get}\n \"\"\"\n return {}\n\ndef files_file_allows(\n self,\n context: types.Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the file.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Default implementation relies on cascade_access config option. If owner\n of file is included into cascade access, user can perform operation on\n file if he can perform the same operation with file's owner.\n\n If current owner is not affected by cascade access, user can perform\n operation on file only if user owns the file.\n\n Example:\n >>> def files_file_allows(\n >>> self, context,\n >>> file: shared.File | shared.Multipart,\n >>> operation: shared.types.FileOperation\n >>> ) -> bool | None:\n >>> if file.owner_info and file.owner_info.owner_type == \"resource\":\n >>> return is_authorized_boolean(\n >>> f\"resource_{operation}\",\n >>> context,\n >>> {\"id\": file.owner_info.id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\ndef files_owner_allows(\n self,\n context: types.Context,\n owner_type: str,\n owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n \"\"\"Decide if user is allowed to perform specified operation on the owner.\n\n Return True/False if user allowed/not allowed. Return `None` to rely on\n other plugins.\n\n Example:\n >>> def files_owner_allows(\n >>> self, context,\n >>> owner_type: str, owner_id: str,\n >>> operation: shared.types.OwnerOperation\n >>> ) -> bool | None:\n >>> if owner_type == \"resource\" and operation == \"file_transfer\":\n >>> return is_authorized_boolean(\n >>> f\"resource_update\",\n >>> context,\n >>> {\"id\": owner_id}\n >>> )\n >>>\n >>> return None\n\n \"\"\"\n return None\n\n\n\n ```\n\n === \"Hello\"\n\n world\n\n === \"bye\"\n\n world\n
"},{"location":"shared/","title":"Shared","text":"All public utilites are collected inside ckanext.files.shared
module. Avoid using anything that is not listed there. Do not import anything from modules other than shared
.
get_storage(name: 'str | None' = None) -> 'Storage'
","text":"Return existing storage instance.
Storages are initialized when the plugin is loaded. As a result, this function always returns the same storage object for a given name.
If no name is specified, the default storage is returned.
Example:
default_storage = get_storage()\nstorage = get_storage(\"storage name\")\n
"},{"location":"shared/#make_storagename-str-settings-dictstr-any-storage","title":"make_storage(name: 'str', settings: 'dict[str, Any]') -> 'Storage'
","text":"Initialize storage instance with specified settings.
The storage adapter is defined by the type
key of the settings. All other settings depend on the specific adapter.
Example:
storage = make_storage(\"memo\", {\"type\": \"files:redis\"})\n
"},{"location":"shared/#make_uploadvalue-typesuploadable-upload-upload","title":"make_upload(value: 'types.Uploadable | Upload') -> 'Upload'
","text":"Convert value into Upload object
Use this function for simple and reliable initialization of Upload object. Avoid creating Upload manually, unless you are 100% sure you can provide correct MIMEtype, size and stream.
Example:
storage.upload(\"file.txt\", make_upload(b\"hello world\"))\n
"},{"location":"shared/#with_task_queuefunc-any-name-str-none-none","title":"with_task_queue(func: 'Any', name: 'str | None' = None)
","text":"Decorator for functions that schedule tasks.
The decorated function automatically initializes a separate task queue that is processed when the function finishes. All tasks receive the function's result as execution data (first argument to Task.run).
Without this decorator, you have to manually create task queue context before queuing tasks.
Example:
@with_task_queue\ndef my_action(context, data_dict):\n ...\n
"},{"location":"shared/#add_tasktask-task","title":"add_task(task: 'Task')
","text":"Add task to the current task queue.
This function can be called only inside a task queue context. Such context is initialized automatically inside functions decorated with with_task_queue
:
@with_task_queue\ndef task_producer():\n add_task(...)\n\ntask_producer()\n
The task queue context can also be initialized manually using TaskQueue and the with
statement:
queue = TaskQueue()\nwith queue:\n add_task(...)\n\nqueue.process(execution_data)\n
"},{"location":"upload-strategies/","title":"File upload strategies","text":"There is no \"right\" way to add file to entity via ckanext-files. Everything depends on your use-case and here you can find a few different ways to combine file and arbitrary entity.
"},{"location":"upload-strategies/#attach-existing-file-and-then-transfer-ownership-via-api","title":"Attach existing file and then transfer ownership via API","text":"The simplest option is just saving file ID inside a field of the entity. It's recommended to transfer file ownership to the entity and pin the file.
ckanapi action package_patch id=PACKAGE_ID attachment_id=FILE_ID\n\nckanapi action files_transfer_ownership id=FILE_ID \\\n owner_type=package owner_id=PACKAGE_ID pin=true\n
Pros: * simple and transparent
Cons: * it's easy to forget about the ownership transfer and leave the entity with an inaccessible file * after the entity gets a reference to the file and before ownership is transferred, the data may be considered invalid.
"},{"location":"upload-strategies/#automatically-transfer-ownership-using-validator","title":"Automatically transfer ownership using validator","text":"Add files_transfer_ownership(owner_type)
to the validation schema of the entity. When it is validated, an ownership transfer task is queued and the file is automatically transferred to the entity after the update.
Pros: * minimal amount of changes if metadata schema already modified * relationships between owner and file are up-to-date after any modification
Cons: * works only with files uploaded in advance and cannot handle native implementation of resource form
"},{"location":"upload-strategies/#upload-file-and-assign-owner-via-queued-task","title":"Upload file and assign owner via queued task","text":"Add a field that accepts uploaded file. The action itself does not process the upload. Instead create a validator for the upload field, that will schedule a task for file upload and ownership transfer.
In this way, if the action fails, no upload happens and you don't need to do anything with the file, as it never left the server's temporary directory. If the action finishes without an error, the task is executed and the file is uploaded/attached to the action result.
Pros: * can be used together with native group/user/resource form after small modification of CKAN core. * handles upload inside other action as an atomic operation
Cons: * you have to validate the file before the upload happens, to prevent the situation when the action finishes successfully but the upload then fails because of the file's content type or size * tasks themselves are experimental and it's not recommended to put a lot of logic into them * there are just too many things that can go wrong
"},{"location":"upload-strategies/#add-a-new-action-that-combines-uploads-modifications-and-ownership-transfer","title":"Add a new action that combines uploads, modifications and ownership transfer","text":"If you want to add attachmen to dataset, create a separate action that accepts dataset ID and uploaded file. Internally it will upload the file by calling files_file_create
, then update the dataset via package_patch
and finally transfer ownership via files_transfer_ownership
.
Pros: * no magic. Everything is described in the new action * can be extracted into shared extension and used across multiple portals
Cons: * if you need to upload multiple files and update multiple fields, the action quickly becomes too complicated * integration with existing workflows, like dataset/resource creation, is hard. You have to override existing views or create brand new ones.
"},{"location":"validators/","title":"Validators","text":"Validator Effect files_into_upload Transform value of field(usually file uploaded via<input type=\"file\">
) into upload object using ckanext.files.shared.make_upload
files_parse_filesize Convert human-readable filesize(1B, 10MiB, 20GB) into an integer files_ensure_name(name_field) If name_field
is empty, copy into it filename from current field. Current field must be processed with files_into_upload
first files_file_id_exists Verify that file ID exists files_accept_file_with_type(*type) Verify that file ID refers to file with one of specified types. As a type can be used full MIMEtype(image/png
), or just its main(image
) or secondary(png
) part files_accept_file_with_storage(*storage_name) Verify that file ID refers to file stored inside one of specified storages files_transfer_ownership(owner_type, name_of_owner_id_field) Transfer ownership for file ID to specified entity when current API action is successfully finished"},{"location":"configuration/","title":"Configuration","text":"There are two types of config options for ckanext-files:
Depending on the type of the storage, available options are quite different. For example, files:fs
storage type requires path
option that controls filesystem path where uploads are stored. files:redis
storage type accepts prefix
option that defines Redis' key prefix of files stored in Redis. All storage specific options always have form ckanext.files.storage.<STORAGE>.<OPTION>
:
ckanext.files.storage.memory.prefix = xxx:\n# or\nckanext.files.storage.my_drive.path = /tmp/hello\n
"},{"location":"configuration/fs/","title":"Filesystem storage configuration","text":"Private filesystem storage
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n
Public filesystem storage
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:public_fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.NAME.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.NAME.create_path = false\n## Use this flag if files can be stored inside subfolders\n## of the main storage path.\nckanext.files.storage.NAME.recursive = false\n## URL of the storage folder. `public_root + location` must produce a public URL\nckanext.files.storage.NAME.public_root =\n
"},{"location":"configuration/global/","title":"Global configuration","text":"# Default storage used for upload when no explicit storage specified\n# (optional, default: default)\nckanext.files.default_storage = default\n\n# MIMEtypes that can be served without content-disposition:attachment header.\n# (optional, default: application/pdf image video)\nckanext.files.inline_content_types = application/pdf image video\n\n# Storage used for user image uploads. When empty, user image uploads are not\n# allowed.\n# (optional, default: user_images)\nckanext.files.user_images_storage = user_images\n\n# Storage used for group image uploads. When empty, group image uploads are\n# not allowed.\n# (optional, default: group_images)\nckanext.files.group_images_storage = group_images\n\n# Storage used for resource uploads. When empty, resource uploads are not\n# allowed.\n# (optional, default: resources)\nckanext.files.resources_storage = resources\n\n# Enable HTML templates and JS modules required for unsafe default\n# implementation of resource uploads via files. IMPORTANT: this option exists\n# to simplify migration and experiments with the extension. These templates\n# may change a lot or even get removed in the public release of the\n# extension.\n# (optional, default: false)\nckanext.files.enable_resource_migration_template_patch = false\n\n# Any authenticated user can upload files.\n# (optional, default: false)\nckanext.files.authenticated_uploads.allow = false\n\n# Names of storages that can by used by non-sysadmin users when authenticated\n# uploads enabled\n# (optional, default: default)\nckanext.files.authenticated_uploads.storages = default\n\n# List of owner types that grant access on owned file to anyone who has\n# access to the owner of file. For example, if this option has value\n# `resource package`, anyone who passes `resource_show` auth, can see all\n# files owned by resource; anyone who passes `package_show`, can see all\n# files owned by package; anyone who passes\n# `package_update`/`resource_update` can modify files owned by\n# package/resource; anyone who passes `package_delete`/`resource_delete` can\n# delete files owned by package/resoure. IMPORTANT: Do not add `user` to this\n# list. Files may be temporarily owned by user during resource creation.\n# Using cascade access rules with `user` exposes such temporal files to\n# anyone who can read user's profile.\n# (optional, default: package resource group organization)\nckanext.files.owner.cascade_access = package resource group organization\n\n# Use `<OWNER_TYPE>_update` auth function to check access for ownership\n# transfer. When this flag is disabled `<OWNER_TYPE>_file_transfer` auth\n# function is used.\n# (optional, default: true)\nckanext.files.owner.transfer_as_update = true\n\n# Use `<OWNER_TYPE>_update` auth function to check access when listing all\n# files of the owner. When this flag is disabled `<OWNER_TYPE>_file_scan`\n# auth function is used.\n# (optional, default: true)\nckanext.files.owner.scan_as_update = true\n
"},{"location":"configuration/libcloud/","title":"Apache libcloud storage configuration","text":"To use this storage install extension with libcloud
extras.
pip install 'ckanext-files[libcloud]'\n
The actual storage backend is controlled by provider
option of the storage. List of all providers is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:libcloud\n## apache-libcloud storage provider. List of providers available at https://libcloud.readthedocs.io/en/stable/storage/supported_providers.html#provider-matrix . Use upper-cased value from Provider Constant column\nckanext.files.storage.NAME.provider =\n## API key or username\nckanext.files.storage.NAME.key =\n## Secret password\nckanext.files.storage.NAME.secret =\n## JSON object with additional parameters passed directly to storage constructor.\nckanext.files.storage.NAME.params =\n## Name of the container(bucket)\nckanext.files.storage.NAME.container =\n
"},{"location":"configuration/opendal/","title":"OpenDAL storage configuration","text":"To use this storage install extension with opendal
extras.
pip install 'ckanext-files[opendal]'\n
The actual storage backend is controlled by scheme
option of the storage. List of all schemes is available here
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:opendal\n## OpenDAL service type. Check available services at https://docs.rs/opendal/latest/opendal/services/index.html\nckanext.files.storage.NAME.scheme =\n## JSON object with parameters passed directly to OpenDAL operator.\nckanext.files.storage.NAME.params =\n
"},{"location":"configuration/redis/","title":"Redis storage configuration","text":"## Storage adapter used by the storage\nckanext.files.storage.NAME.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.NAME.prefix = ckanext:files:default:file_content:\n
"},{"location":"configuration/storage/","title":"Storage configuration","text":"All available options for the storage type can be checked via config declarations CLI. First, add the storage type to the config file:
ckanext.files.storage.xxx.type = files:redis\n
Now run the command that shows all available config option of the plugin.
ckan config declaration files -d\n
Because the Redis storage adapter is enabled, you'll see all the options registered by the Redis adapter alongside the global options:
## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:redis\n## Static prefix of the Redis key generated for every upload.\nckanext.files.storage.xxx.prefix = ckanext:files:default:file_content:\n
Sometimes you will see a validation error if storage has required config options. Let's try using files:fs
storage instead of the redis:
ckanext.files.storage.xxx.type = files:fs\n
Now any attempt to run ckan config declaration files -d
will show an error, because required path
option is missing:
Invalid configuration values provided:\nckanext.files.storage.xxx.path: Missing value\nAborted!\n
Add the required option to satisfy the application
ckanext.files.storage.xxx.type = files:fs\nckanext.files.storage.xxx.path = /tmp\n
And run CLI command once again. This time you'll see the list of allowed options:
## ckanext-files ###############################################################\n## ...\n## Storage adapter used by the storage\nckanext.files.storage.xxx.type = files:fs\n## Path to the folder where uploaded data will be stored.\nckanext.files.storage.xxx.path =\n## Create storage folder if it does not exist.\nckanext.files.storage.xxx.create_path = false\n
There is a number of options that are supported by every storage. You can set them and expect that every storage, regardless of type, will use these options in the same way:
## Storage adapter used by the storage\nckanext.files.storage.NAME.type = ADAPTER\n## The maximum size of a single upload.\n## Supports size suffixes: 42B, 2M, 24KiB, 1GB. `0` means no restrictions.\nckanext.files.storage.NAME.max_size = 0\n## Space-separated list of MIME types or just type or subtype part.\n## Example: text/csv pdf application video jpeg\nckanext.files.storage.NAME.supported_types =\n## Descriptive name of the storage used for debugging. When empty, name from\n## the config option is used, i.e: `ckanext.files.storage.DEFAULT_NAME...`\nckanext.files.storage.NAME.name = NAME\n
"},{"location":"migration/","title":"Migration from native CKAN storage system","text":"Important: ckanext-files itself is an independent file-management system. You don't have to migrate existing files from groups, users and resources to it. You can just start using ckanext-files for new fields defined in metadata schema or for uploading arbitrary files. And continue using native CKAN uploads for group/user images and resource files. Migration workflows described here merely exist as a PoC of using ckanext-files for everything in CKAN. Don't migrate your production instances yet, because concepts and rules may change in future and migration process will change as well. Try migration only as an experiment, that gives you an idea of what else you want to see in ckanext-file, and share this idea with us.
Note: every migration workflow described below requires installed ckanext-files. Complete installation section before going further.
CKAN has following types of files:
At the moment, there is no migration strategy for the last two types. Replacing site logo manually is a trivial task, so there will be no dedicated command for it. As for extensions, every of them is unique, so feel free to create an issue in the current repository: we'll consider creation of migration script for your scenario or, at least, explain how you can perform migration by yourself.
Migration process for group/organization/user images and resource uploads described below. Keep in mind, that this process only describes migration from native CKAN storage system, that keeps files inside local filesystem. If you are using storage extensions, like ckanext-s3filestore or ckanext-cloudstorage, create an issue in the current repository with a request of migration command. As there are a lot of different forks of such extension, creating reliable migration script may be challenging, so we need some details about your environment to help with migration.
Migration workflows bellow require certain changes to metadata schemas, UI widgets for file uploads and styles of your portal(depending on the customization).
"},{"location":"migration/group/","title":"Migration for group/organization images","text":"Note: internally, groups and organizations are the same entity, so this workflow describes both of them.
First of all, you need a configured storage that supports public links. As all group/organization images are stored inside local filesystem, you can use files:public_fs
storage adapter.
This extension expects that the name of group images storage will be group_images
. This name will be used in all other commands of this migration workflow. If you want to use different name for group images storage, override ckanext.files.group_images_storage
config option which has default value group_images
and don't forget to adapt commands if you use a different name for the storage.
This configuration example sets 10MiB restriction on upload size via ckanext.files.storage.group_images.max_size
option. Feel free to change it or remove completely to allow any upload size. This restriction is applied to future uploads only. Any existing file that exceeds limit is kept.
Uploads restricted to image/*
MIMEtype via ckanext.files.storage.group_images.supported_types
option. You can make this option more or less restrictive. This restriction is applied to future uploads only. Any existing file with wrong MIMEtype is kept.
ckanext.files.storage.group_images.path
controls location of the upload folder in filesystem. It should match value of ckan.storage_path
option plus storage/uploads/group
. In example below we assume that value of ckan.storage_path
is /var/storage/ckan
.
ckanext.files.storage.group_images.public_root
option specifies base URL from which every group image can be accessed. In most cases it's CKAN URL plus uploads/group
. If you are serving CKAN application from the ckan.site_url
, leave this option unchanged. If you are using ckan.root_path
, like /data/
, insert this root path into the value of the option. Example below uses %(ckan.site_url)s
wildcard, which will be automatically replaced with the value of ckan.site_url
config option. You can specify site URL explicitely if you don't like this wildcard syntax.
ckanext.files.storage.group_images.type = files:public_fs\nckanext.files.storage.group_images.max_size = 10MiB\nckanext.files.storage.group_images.supported_types = image\nckanext.files.storage.group_images.path = /var/storage/ckan/storage/uploads/group\nckanext.files.storage.group_images.public_root = %(ckan.site_url)s/uploads/group\n
Now let's run a command that show us the list of files available under newly configured storage:
ckan files scan -s group_images\n
All these files are not tracked by files extension yet, i.e they don't have corresponding record in DB with base details, like size, MIMEtype, filehash, etc. Let's create these details via the command below. It's safe to run this command multiple times: it will gather and store information about files not registered in system and ignore any previously registered file.
ckan files scan -s group_images -t\n
Finally, let's run the command, that shows only untracked files. Ideally, you'll see nothing upon executing it, because you just registered every file in the system.
ckan files scan -s group_images -u\n
Note, all the file are still available inside storage directory. If previous command shows nothing, it only means that CKAN already knows details about each file from the storage directory. If you want to see the list of the files again, omit -u
flag(which stands for \"untracked\") and you'll see again all the files in the command output:
ckan files scan -s group_images\n
Now, when all images are tracked by the system, we can give the ownership over these files to groups/organizations that are using them. Run the command below to connect files with their owners. It will search for groups/organizations first and report, how many connections were identified. There will be suggestion to show identified relationship and the list of files that have no owner(if there are such files). Presence of files without owner usually means that you removed group/organization from database, but did not remove its image.
Finally, you'll be asked if you want to transfer ownership over files. This operation does not change existing data and if you disable ckanext-files after ownership transfer, you won't see any difference. The whole ownership transfer is managed inside custom DB tables generated by ckanext-files, so it's safe operation.
ckan files migrate groups group_images\n
Here's an example of output that you can see when running the command:
Found 3 files. Searching file owners...\n[####################################] 100% Located owners for 2 files out of 3.\n\nShow group IDs and corresponding file? [y/N]: y\nd7186937-3080-429f-a434-22b74b9a8d39: file-1.png\n87e2a1aa-7905-4a28-a087-90433f8e169e: file-2.png\n\nShow files that do not belong to any group? [y/N]: y\nfile-3.png\n\nTransfer file ownership to group identified in previous steps? [y/N]: y\nTransfering file-2.png [####################################] 100%\n
Now comes the most complex part. You need to change metadata schema and UI in order to:
Original CKAN workflow for uploading files was:
This approach is different from strategy recommended by ckanext-files. But in order to make the migration as simple as possible, we'll stay close to original workflow.
Note: suggestet approach resembles existing process of file uploads in CKAN. But ckanext-files was designed as a system, that gives you a choice. Check file upload strategies to learn more about alternative implementations of upload and their pros/cons.
First, we need to replace Upload/Link widget on group/organization form. If you are using native group templates, create group/snippets/group_form.html
and organization/snippets/organization_form.html
. Inside both files, extend original template and override block basic_fields
. You only need to replace last field
{{ form.image_upload(\n data, errors, is_upload_enabled=h.uploads_enabled(),\n is_url=is_url, is_upload=is_upload) }}\n
with
{{ form.image_upload(\n data, errors, is_upload_enabled=h.files_group_images_storage_is_configured(),\n is_url=is_url, is_upload=is_upload,\n field_upload=\"files_image_upload\") }}\n
There are two differences with the original. First, we use h.files_group_images_storage_is_configured()
instead of h.uploads_enabled()
. As we are using different storage for different upload types, now upload widgets can be enabled independently. And second, we pass field_upload=\"files_image_upload\"
argument into macro. It will send uploaded file to CKAN inside files_image_upload
instead of original image_upload
field. This must be done because CKAN unconditionally strips image_upload
field from submission payload, making processing of the file too unreliable. We changed the name of upload field and CKAN keeps this new field, so that we can process it as we wish.
Note: if you are using ckanext-scheming, you only need to replace form_snippet
of the image_url
field, instead of rewriting the whole template.
Now, let's define validation rules for this new upload field. We need to create plugins that modify validation schema for group and organization. Due to CKAN implementation details, you need separate plugin for group and organization.
Note: if you are using ckanext-scheming, you can add files_image_upload
validators to schemas of organization and group. Check the list of validators that must be applied to this new field below.
Here's an example of plugins that modify validation schemas of group and organization. As you can see, they are mostly the same:
from ckan.lib.plugins import DefaultGroupForm, DefaultOrganizationForm\nfrom ckan.logic.schema import default_create_group_schema, default_update_group_schema\n\n\ndef _modify_schema(schema, type):\n schema[\"files_image_upload\"] = [\n tk.get_validator(\"ignore_empty\"),\n tk.get_validator(\"files_into_upload\"),\n tk.get_validator(\"files_validate_with_storage\")(\"group_images\"),\n tk.get_validator(\"files_upload_as\")(\n \"group_images\",\n type,\n \"id\",\n \"public_url\",\n type + \"_patch\",\n \"image_url\",\n ),\n ]\n\n\nclass FilesGroupPlugin(p.SingletonPlugin, DefaultGroupForm):\n p.implements(p.IGroupForm, inherit=True)\n is_organization = False\n\n def group_types(self):\n return [\"group\"]\n\n def create_group_schema(self):\n return _modify_schema(default_create_group_schema(), \"group\")\n\n def update_group_schema(self):\n return _modify_schema(default_update_group_schema(), \"group\")\n\n\nclass FilesOrganizationPlugin(p.SingletonPlugin, DefaultOrganizationForm):\n p.implements(p.IGroupForm, inherit=True)\n is_organization = True\n\n def group_types(self):\n return [\"organization\"]\n\n def create_group_schema(self):\n return _modify_schema(default_create_group_schema(), \"organization\")\n\n def update_group_schema(self):\n return _modify_schema(default_update_group_schema(), \"organization\")\n
There are 4 validators that must be applied to the new upload field:
ignore_empty
: to skip validation, when image URL set manually and no upload selected.files_into_upload
: to convert value of upload field into normalized format, which is expected by ckanext-filesfiles_validate_with_storage(STORAGE_NAME)
: this validator requires an argument: the name of the storage we are using for image uploads. The validator will use storage settings to verify size and MIMEtype of the appload.files_upload_as(STORAGE_NAME, GROUP_TYPE, NAME_OF_ID_FIELD, \"public_url\", NAME_OF_PATCH_ACTION, NAME_OF_URL_FIELF)
: this validator is the most challenging. It accepts 6 arguments:group
or organization
depending on processed entityid
in your case.public_url
- use this exact value. It tells which property of file you want to use as link to the file.group_patch
or organization_patch
depending on processed entityimage_url
- name of the field that contains URL of the image. ckanext-files will put the public link of uploaded file into this field when form is processed.That's all. Now every image upload for group/organization is handled by ckanext-files. To verify it, do the following. First, check list of files currently stored in group_images
storage via command that we used in the beginning of the migration:
ckan files scan -s group_images\n
You'll see a list of existing files. Their names follow the format <ISO_8601_DATETIME><FILENAME>, e.g. 2024-06-14-133840.539670photo.jpg.
Now upload an image into an existing group, or create a new group with any image. When you check the list of files again, you'll see one new record, but this time the record's name resembles a UUID: da046887-e76c-4a68-97cf-7477665710ff.
Configure a named storage for resources, using the files:ckan_resource_fs storage adapter.
This extension expects the resources storage to be named resources; this name is used in all other commands of this migration workflow. If you want a different name for the resources storage, override the ckanext.files.resources_storage config option (default: resources) and don't forget to adapt the commands accordingly.
ckanext.files.storage.resources.path must match the value of the ckan.storage_path option, followed by the resources directory. In the example below we assume that ckan.storage_path is set to /var/storage/ckan.
The example below also sets a 10MiB limit on resource size. Adjust it if you are using a different limit set by ckan.max_resource_size.
Unlike group and user images, this storage needs neither an upload type restriction nor public_root.
ckanext.files.storage.resources.type = files:ckan_resource_fs\nckanext.files.storage.resources.max_size = 10MiB\nckanext.files.storage.resources.path = /var/storage/ckan/resources\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s resources -u\n
Track all these files:
ckan files scan -s resources -t\n
Re-check that now you see no untracked files:
ckan files scan -s resources -u\n
Transfer file ownership to the corresponding resources. In addition to the ownership transfer itself, this command will ask whether you want to modify the resource's url_type and url fields. This is required to move file management completely to the files extension and to make migration to a different storage type possible.
If you accept the resource modifications, url_type will be changed to file and url will be changed to the file ID for every file-owning resource. All modified packages will then be reindexed.
Changing url_type means that some pages will change. For example, instead of the Download button, CKAN will show a Go to resource button on the resource page, because the Download label is specific to url_type=upload. Some views may stop working as well. But this is a safer option for migration than leaving url_type unchanged: ckanext-files manages files in its own way, some assumptions about files no longer hold, and using a different url_type is the fastest way to signal that something has changed.
Broken views can be fixed easily. Every view is implemented as a separate plugin, so you can always inherit from that plugin and override the methods that relied on the old behavior. And many views work with the file URL directly, so they won't even notice the difference.
ckan files migrate local-resources resources\n
And the next goal is a correct metadata schema. If you are using ckanext-scheming, you need to modify the validators of the url and format fields.
If you are working with native schemas, you have to modify the dataset schema by implementing IDatasetForm. Here's an example:
import ckan.plugins as p\nimport ckan.plugins.toolkit as tk\nfrom ckan.lib.plugins import DefaultDatasetForm\nfrom ckan.logic import schema\n\n\nclass FilesDatasetPlugin(p.SingletonPlugin, DefaultDatasetForm):\n    p.implements(p.IDatasetForm, inherit=True)\n\n    def is_fallback(self):\n        return True\n\n    def package_types(self):\n        return [\"dataset\"]\n\n    def _modify_schema(self, schema):\n        schema[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_file_id_exists\"),\n            tk.get_validator(\"files_transfer_ownership\")(\"resource\", \"id\"),\n        ])\n        schema[\"resources\"][\"format\"].insert(0, tk.get_validator(\"files_content_type_from_file\")(\"url\"))\n\n    def create_package_schema(self):\n        sch = schema.default_create_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def update_package_schema(self):\n        sch = schema.default_update_package_schema()\n        self._modify_schema(sch)\n        return sch\n\n    def show_package_schema(self):\n        sch = schema.default_show_package_schema()\n        sch[\"resources\"][\"url\"].extend([\n            tk.get_validator(\"files_verify_url_type_and_value\"),\n            tk.get_validator(\"files_id_into_resource_download_url\"),\n        ])\n        return sch\n
Both create and update schemas are updated in the same way. We add a new validator to the format field to correctly identify the file format, and there are a number of new validators for url:
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_file_id_exists: verify the existence of the file ID.
files_transfer_ownership(\"resource\",\"id\"): move file ownership to the resource after successful validation.
On top of this, we also have two validators applied to show_package_schema (use output_validators in ckanext-scheming):
files_verify_url_type_and_value: skip validation if we are not working with a resource that contains a file.
files_id_into_resource_download_url: replace the file ID with the download URL in API output.
And the next part is the trickiest. You need to create a number of templates and JS modules. But because ckanext-files is actively developed, your custom files would most likely become outdated pretty soon.
Instead, we recommend enabling the patch for the resource form that ships with ckanext-files. It's a bit hacky, but because the extension itself is still in alpha stage, it should be acceptable. Check file upload strategies for examples of implementations that you can add to your portal instead of the default patch.
To enable the template patch, add the following line to the config file:
ckanext.files.enable_resource_migration_template_patch = true\n
This option adds an Add file button to the resource form.
Upon clicking, this button is replaced by a widget that supports uploading new files or selecting previously uploaded files that are not used by any resource yet.
"},{"location":"migration/user/","title":"Migration for user avatars","text":"This workflow is similar to group/organization migration. It contains the sequence of actions, but explanations are removed, because you already know details from the group migration. Only steps that are different will contain detailed explanation of the process.
Configure a local filesystem storage with support for public links (files:public_fs) for user images.
This extension expects the user images storage to be named user_images; this name is used in all other commands of this migration workflow. If you want a different name for the user images storage, override the ckanext.files.user_images_storage config option (default: user_images) and don't forget to adapt the commands accordingly.
ckanext.files.storage.user_images.path resembles this option for the group/organization images storage. But user images are kept inside the user folder by default. As a result, the value of this option should match the value of the ckan.storage_path option plus storage/uploads/user. In the example below we assume that ckan.storage_path is set to /var/storage/ckan.
ckanext.files.storage.user_images.public_root resembles this option for the group/organization images storage. But user images are available at the CKAN URL plus uploads/user.
ckanext.files.storage.user_images.type = files:public_fs\nckanext.files.storage.user_images.max_size = 10MiB\nckanext.files.storage.user_images.supported_types = image\nckanext.files.storage.user_images.path = /var/storage/ckan/storage/uploads/user\nckanext.files.storage.user_images.public_root = %(ckan.site_url)s/uploads/user\n
Check the list of untracked files available inside the newly configured storage:
ckan files scan -s user_images -u\n
Track all these files:
ckan files scan -s user_images -t\n
Re-check that now you see no untracked files:
ckan files scan -s user_images -u\n
Transfer image ownership to corresponding users:
ckan files migrate users user_images\n
Update the user template. The required field is defined in user/new_user_form.html and user/edit_user_form.html. It's a bit different from the field used by group/organization, but you again need to add the field_upload=\"files_image_upload\" parameter to the image_upload macro and replace h.uploads_enabled() with h.files_user_images_storage_is_configured().
Users have no dedicated interface for validation schema modification, and here comes the biggest difference from the group migration. You need to chain the user_create and user_update actions and modify the schema from context:
import ckan.logic.schema\nimport ckan.plugins.toolkit as tk\n\n\ndef _patch_schema(schema):\n    schema[\"files_image_upload\"] = [\n        tk.get_validator(\"ignore_empty\"),\n        tk.get_validator(\"files_into_upload\"),\n        tk.get_validator(\"files_validate_with_storage\")(\"user_images\"),\n        tk.get_validator(\"files_upload_as\")(\n            \"user_images\",\n            \"user\",\n            \"id\",\n            \"public_url\",\n            \"user_patch\",\n            \"image_url\",\n        ),\n    ]\n\n\n@tk.chained_action\ndef user_update(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_update_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n\n\n@tk.chained_action\ndef user_create(next_action, context, data_dict):\n    schema = context.setdefault('schema', ckan.logic.schema.default_user_schema())\n    _patch_schema(schema)\n    return next_action(context, data_dict)\n
The validators are all the same, but now we use user instead of group/organization in the parameters.
That's all. Just as with groups, you can update an avatar and verify that all new filenames resemble UUIDs.
"},{"location":"usage/capabilities/","title":"Capabilities","text":"To understand in advance whether specific storage can perform certain actions, ckanext-files uses ckanext.files.shared.Capability
. It's an enumeration of operations that can be supported by storage:
These capabilities are defined when storage is created and are automatically checked by actions that work with storage. If you want to check if storage supports certain capability, it can be done manually. If you want to check presence of multiple capabilities at once, you can combine them via bitwise-or operator.
from ckanext.files.shared import Capability, get_storage\n\nstorage = get_storage()\n\ncan_read = storage.supports(Capability.STREAM)\n\nread_and_write = Capability.CREATE | Capability.STREAM\ncan_read_and_write = storage.supports(read_and_write)\n
The ckan files storages -v CLI command lists all configured storages with their capabilities.
Before uploading files, you have to configure a storage: the place where all uploaded files are stored. A storage relies on an adapter that describes where and how the data is stored: filesystem, cloud, DB, etc. And, depending on the adapter, a storage may have a couple of additional adapter-specific options. For example, a filesystem adapter likely requires a path to the folder where uploads are stored, a DB adapter may need DB connection parameters, and a cloud adapter most likely will not work without an API key. These additional options are specific to the adapter, and you have to check its documentation to find out which options are available.
Let's start with the Redis adapter, because it has minimal configuration requirements.
Add the following line to the CKAN config file:
ckanext.files.storage.default.type = files:redis\n
The name of the adapter is files:redis. It follows the recommended naming convention for adapters: <EXTENSION>:<TYPE>. You can tell from the name above that we are using an adapter defined in the files extension with the redis type. But this naming convention is not enforced, and its only purpose is to avoid name conflicts. Technically, an adapter name can use any character, including spaces, newlines and emoji.
If you make a typo in the adapter's name, any CKAN CLI command will produce an error message with the list of available adapters:
Invalid configuration values provided:\nckanext.files.storage.default.type: Value must be one of ['files:fs', 'files:public_fs', 'files:redis']\nAborted!\n
The storage is configured, so we can actually upload a file. Let's use ckanapi for this task. Files are created via the files_file_create API action, and this time we have to pass 2 parameters into it:
name: the name of the uploaded file
upload: the content of the file
The final command is here:
echo -n 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n
And that's what you see as result:
{\n \"atime\": null,\n \"content_type\": \"text/plain\",\n \"ctime\": \"2024-06-02T15:02:14.819117+00:00\",\n \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n \"id\": \"e21162ab-abfb-476c-b8c5-5fe7cb89eca0\",\n \"location\": \"24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\",\n \"mtime\": null,\n \"name\": \"hello.txt\",\n \"size\": 11,\n \"storage\": \"default\",\n \"storage_data\": {}\n}\n
The content of the file can be checked via the CKAN CLI. Use the id from the last API call's output in the ckan files stream ID command:
ckan files stream e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
Alternatively, we can use the Redis CLI to get the content of the file. Note: you cannot get the content via the CKAN API, because it's JSON-based and streaming files doesn't suit its principles.
By default, the Redis adapter puts the content under the key <PREFIX><LOCATION>. Pay attention to LOCATION: it's the value available as location in the API response (i.e., 24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46 in our case). It's different from the id (the ID used by the DB to uniquely identify the file record) and the name (the human-readable name of the file). In our scenario, location looks like a UUID because of internal details of the Redis adapter implementation, but different adapters may use a more path-like value, e.g. something similar to path/to/folder/hello.txt.
PREFIX can be configured, but we skipped this step and got the default value: ckanext:files:default:file_content:. So the final Redis key of our file is ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46
redis-cli\n\n127.0.0.1:6379> GET ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\n\"hello world\"\n
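The same check can be done from Python. A minimal sketch, assuming the redis package is installed and Redis runs with the default connection settings:
import redis\n\n# Assumption: default Redis connection; adjust host/port/db for your setup.\nr = redis.Redis()\n\nkey = \"ckanext:files:default:file_content:24d27fb9-a5f0-42f6-aaa3-7dcb599a0d46\"\nprint(r.get(key))  # b\"hello world\"\n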
And before we move further, let's remove the file using its id:
ckanapi action files_file_delete id=e21162ab-abfb-476c-b8c5-5fe7cb89eca0\n
"},{"location":"usage/js/","title":"JavaScript utilities","text":"Note: ckanext-files does not provide stable CKAN JS modules at the moment. Try creating your own widgets and share with us your examples or requirements. We'll consider creating and including widgets into ckanext-files if they are generic enough for majority of the users.
ckanext-files registers a few utilities inside the CKAN JS namespace to help with building UI components.
The first group of utilities is registered inside the CKAN Sandbox. Inside CKAN JS modules it's accessible as this.sandbox. If you are writing code outside of JS modules, the Sandbox can be initialized via a call to ckan.sandbox():
const sandbox = ckan.sandbox()\n
When the files plugin is loaded, the sandbox contains a files attribute with two members:
upload: a high-level helper for uploading files.
makeUploader: a factory for uploader objects that gives more control over the upload process.
The simplest way to upload a file is to use the upload helper.
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n)\n
This function uploads the file to the default storage via the files_file_create action. Extra parameters for the API call can be passed using the second argument of the upload helper: use an object with a requestParams key. The value of this key will be added to the standard API request parameters. For example, if you want to use the storage named memory and a field with the value custom:
await sandbox.files.upload(\n new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}),\n {requestParams: {storage: \"memory\", field: \"custom\"}}\n)\n
If you need more control over the upload, you can create an uploader and interact with it directly instead of using the upload helper.
An uploader is an object that uploads a file to the server. It extends the base uploader, which defines the standard interface for this object. The uploader performs all the API calls internally and returns the uploaded file details. Out of the box you can use the Standard and Multipart uploaders. Standard uses the files_file_create API action and specializes in normal uploads. Multipart relies on the files_multipart_* actions and can be used to pause and continue an upload.
To create an uploader instance, pass its name as a string to makeUploader. Then you can call the upload method of the uploader to perform the actual upload. This method requires two arguments:
the file you want to upload
an object with extra request parameters, just like requestParams from the example above. If you want to use default parameters, pass an empty object. If you want to use the memory storage, pass {storage: \"memory\"}, etc.
const uploader = sandbox.files.makeUploader(\"Standard\")\nawait uploader.upload(new File([\"file content\"], \"name.txt\", {type: \"text/plain\"}), {})\n
One of the reasons to use a manually created uploader is progress tracking. The uploader supports event subscriptions via uploader.addEventListener(event, callback), and here's the list of possible upload events:
start: file upload started. The event has a detail property with an object that contains the uploaded file as file.
multipartid: multipart upload initialized. The event has a detail property with an object that contains the uploaded file as file and the ID of the multipart upload as id.
progress: another chunk of the file was transferred to the server. The event has a detail property with an object that contains the uploaded file as file, the number of loaded bytes as loaded and the total number of bytes that must be transferred as total.
finish: file upload successfully finished. The event has a detail property with an object that contains the uploaded file as file and the file details from the API response as result.
fail: file upload failed. The event has a detail property with an object that contains the uploaded file as file and an object with CKAN validation errors as reasons.
error: an error unrelated to validation happened during upload, like a call to a non-existing action. The event has a detail property with an object that contains the uploaded file as file and the error as message.
If you want to use the upload helper with a customized uploader, there are two ways to do it:
Specify the adapter property with the uploader name inside the second argument of the upload helper: await sandbox.files.upload(new File(...), {adapter: \"Multipart\"})\n
Specify the uploader property with an uploader instance inside the second argument of the upload helper: const uploader = sandbox.files.makeUploader(\"Multipart\")\nawait sandbox.files.upload(new File(...), {uploader})\n
The second group of ckanext-files utilities is available as the ckan.CKANEXT_FILES object. This object mainly serves as an extension and configuration point for sandbox.files.
ckan.CKANEXT_FILES.adapters is a collection of all classes that can be used to initialize an uploader. It contains the Standard, Multipart and Base classes. Standard and Multipart can be used as is, while Base must be extended by your custom uploader class. Add your custom uploader classes to adapters to make them available application-wide:
class MyUploader extends Base { ... }\n\nckan.CKANEXT_FILES.adapters[\"My\"] = MyUploader;\n\nawait sandbox.files.upload(new File(...), {adapter: \"My\"})\n
ckan.CKANEXT_FILES.defaultSettings contains the object with default settings available as this.settings inside any uploader. You can change the name of the storage used by all uploaders using this object. Note: changes will apply only to uploaders initialized after the modification.
ckan.CKANEXT_FILES.defaultSettings.storage = \"memory\"\n
"},{"location":"usage/multi-storage/","title":"Multi-storage","text":"It's possible to configure multiple storages at once and specify which one you want to use for the individual file upload. Up until now we used the following storage options:
ckanext.files.storage.default.type
ckanext.files.storage.default.path
ckanext.files.storage.default.create_path
All of them have the common prefix ckanext.files.storage.default., and this prefix is the key to using multiple storages simultaneously.
Every option of the storage follows the pattern ckanext.files.storage.<STORAGE_NAME>.<OPTION>. As all the options above contain default in the position of <STORAGE_NAME>, they are related to the default storage.
If you want to configure a storage with the name custom, change the configuration of the storage:
ckanext.files.storage.custom.type = files:fs\nckanext.files.storage.custom.path = /tmp/example\nckanext.files.storage.custom.create_path = true\n
And, if you want to use Redis-based storage named memory
and filesystem-based storage named default
, use the following configuration:
ckanext.files.storage.memory.type = files:redis\n\nckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
The default storage is special: ckanext-files uses it by default, as the name suggests. If you remove the configuration for the default storage and try to create a file, you'll see the following error:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt\n\n... ckan.logic.ValidationError: None - {'storage': ['Storage default is not configured']}\n
Storage default is not configured. That's why we need the default configuration. But if you want to upload a file into a different storage, or you don't want to add the default storage at all, you can always explicitly specify the name of the storage you are going to use.
When using API actions, add the storage parameter to the call:
echo 'hello world' > /tmp/myfile.txt\nckanapi action files_file_create name=hello.txt upload@/tmp/myfile.txt storage=memory\n
When writing Python code, pass the storage name to the get_storage function:
storage = get_storage(\"memory\")\n
When writing JS code, pass the object {requestParams: {storage: \"memory\"}} to the upload function:
const sandbox = ckan.sandbox()\nconst file = new File([\"content\"], \"file.txt\")\nconst options = {requestParams: {storage: \"memory\"}};\n\nawait sandbox.files.upload(file, options)\n
"},{"location":"usage/multipart/","title":"Multipart, resumable and signed uploads","text":"This feature has many names, but it basically divides a single upload into multiple stages. It can be used in following situations:
All these situations are handled by 4 API actions, which are available if the storage has the MULTIPART capability:
files_multipart_start: initializes a multipart upload and sets the expected final size and MIME type. A real multipart upload usually just returns the upload ID from this action. A resumable upload creates an empty file in the storage to accumulate content inside it. A signed upload produces a URL for direct upload.
files_multipart_update: uploads a fragment of the file or modifies the upload in some other way. Most often this action accepts the ID of the upload and an upload field with a fragment of the uploaded file.
files_multipart_refresh: synchronizes and returns the current upload progress. It can be used if the upload was paused and the client does not know how many bytes were uploaded and from which byte the next upload fragment starts.
files_multipart_complete: finalizes the upload and converts it into a normal file, available to other parts of the application. A multipart upload usually combines all uploaded parts into a single file here. A resumable upload verifies that the result has the expected MIME type and size. A signed upload just registers the completed file in the system.
Implementation of multipart upload depends on the used adapter, so make sure you check its documentation before using any multipart actions. There are some common steps in the multipart upload workflow that are usually the same among all adapters:
files_multipart_start requires content_type and size parameters. These values will be used to validate the completed upload.
files_multipart_start allows a hash parameter. This value will be used to validate the completed upload. Unlike content_type and size, hash is usually optional, because it may be difficult for the client to compute it.
files_multipart_update accepts the upload ID as id and a fragment of the file as upload. A sequence of calls to files_multipart_update with non-overlapping fragments can be used to upload the file, even if the adapter implements signed uploads and the client is supposed to send the file to the signed URL instead of using files_multipart_update.
files_multipart_complete compares the content_type, size and hash (if present) specified during initialization of the upload with the actual values. If they are different, the upload is not converted into a normal file. Depending on the implementation, the storage may just ignore incorrect initial expectations and assign the real values to the file, as long as they are allowed by the storage configuration. But it's recommended to reject such uploads, so it's safer to assume that incorrect expectations are not accepted.
Incomplete files support most normal file actions, but you need to pass completed=False to the action when working with them. I.e., if you want to remove an incomplete upload, use its ID and completed=False:
ckanapi action files_file_delete id=bdfc0268-d36d-4f1b-8a03-2f2aaa21de24 completed=False\n
Incomplete files do not support streaming and downloading via the public interface of the extension. But a storage adapter can expose such features via custom methods if it's technically possible.
An example of a basic multipart upload is shown below. The files:fs adapter can be used for running this example, as it implements MULTIPART.
First, create a text file and check its size:
echo 'hello world!' > /tmp/file.txt\nwc -c /tmp/file.txt\n\n... 13 /tmp/file.txt\n
The size is 13 bytes and the content type is text/plain. These values must be used for the upload initialization.
ckanapi action files_multipart_start name=file.txt size=13 content_type=text/plain\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 0\n... }\n... }\n
Here storage_data contains {\"uploaded\": 0}. It may be different for other adapters, especially if they implement non-consecutive uploads, but generally it's the recommended way to keep track of upload progress.
Now we'll upload the first 5 bytes of the file.
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 count=5)\n\n... {\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:47:01.313016+00:00\",\n... \"hash\": \"\",\n... \"id\": \"90ebd047-96a0-4f32-a810-ffc962cbc380\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"name\": \"file.txt\",\n... \"owner_id\": \"59ea0f6c-5c2f-438d-9d2e-e045be9a2beb\",\n... \"owner_type\": \"user\",\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {\n... \"uploaded\": 5\n... }\n... }\n
If you try finalizing the upload right now, you'll get an error.
ckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... ckan.logic.ValidationError: None - {'upload': ['Actual value of upload size(5) does not match expected value(13)']}\n
Let's upload the rest of the bytes and complete the upload.
ckanapi action files_multipart_update id=90ebd047-96a0-4f32-a810-ffc962cbc380 \\\n upload@<(dd if=/tmp/file.txt bs=1 skip=5)\n\nckanapi action files_multipart_complete id=90ebd047-96a0-4f32-a810-ffc962cbc380\n\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-22T14:57:18.483716+00:00\",\n... \"hash\": \"c897d1410af8f2c74fba11b1db511e9e\",\n... \"id\": \"a740692f-e3d5-492f-82eb-f04e47c13848\",\n... \"location\": \"77e629f2-8938-4442-b825-8e344660e119\",\n... \"mtime\": null,\n... \"name\": \"file.txt\",\n... \"owner_id\": null,\n... \"owner_type\": null,\n... \"pinned\": false,\n... \"size\": 13,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n
Now the file can be used normally. You can transfer its ownership, stream or modify it. Pay attention to the ID: the completed file has its own unique ID, which is different from the ID of the incomplete upload.
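The same walkthrough can be scripted from Python. A minimal sketch using the multipart actions described above; passing raw bytes as upload and using an ignore_auth context are simplifications, so adapt it to your setup:
import ckan.plugins.toolkit as tk\n\ncontent = b\"hello world!\"\n\n# initialize the upload with the expected size and MIME type\ninfo = tk.get_action(\"files_multipart_start\")(\n    {\"ignore_auth\": True},\n    {\"name\": \"file.txt\", \"size\": len(content), \"content_type\": \"text/plain\"},\n)\n\n# send the file in two non-overlapping fragments\nfor fragment in (content[:5], content[5:]):\n    info = tk.get_action(\"files_multipart_update\")(\n        {\"ignore_auth\": True},\n        {\"id\": info[\"id\"], \"upload\": fragment},\n    )\n\n# finalize: the completed file gets its own ID\nresult = tk.get_action(\"files_multipart_complete\")(\n    {\"ignore_auth\": True},\n    {\"id\": info[\"id\"]},\n)\n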
"},{"location":"usage/ownership/","title":"File ownership","text":"Every file can have an owner and there can be only one owner of the file. It's possible to create file without an owner, but usually application will only benefit from keeping every file with its owner. Owner is described with two fields: ID and type.
When file is created, by default the current user from API action's context is assigned as an owner of the file. From now on, the owner can perform other operations, such as renaming/displaying/removing with the file.
Apart from chaining auth function, to modify access rules for the file, plugin can implement IFiles.files_file_allows
and IFiles.files_owner_allows
methods.
def files_file_allows(\n self,\n context: Context,\n file: File | Multipart,\n operation: types.FileOperation,\n) -> bool | None:\n ...\n\ndef files_owner_allows(\n self,\n context: Context,\n owner_type: str, owner_id: str,\n operation: types.OwnerOperation,\n) -> bool | None:\n ...\n
These methods receive the current action context, the details of the tested object, and the name of the operation (show, update, delete, file_transfer). files_file_allows checks permissions for the accessed file. It's usually called when a user interacts with the file directly. files_owner_allows works with an owner described by type and ID. It's usually called when a user transfers file ownership, performs a bulk file operation on the owner's files, or just tries to get the list of files that belong to the owner.
If the method returns True/False, the operation is allowed/denied. If the method returns None, the default logic is used to check access.
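For illustration, here's a minimal hypothetical sketch of a plugin that uses files_file_allows to forbid one operation and stays neutral otherwise. The import path of IFiles is an assumption; check the extension source for the canonical location:
import ckan.plugins as p\n\n# Assumption: adjust this import to wherever IFiles lives in your version.\nfrom ckanext.files.interfaces import IFiles\n\n\nclass MyFilesPlugin(p.SingletonPlugin):\n    p.implements(IFiles, inherit=True)\n\n    def files_file_allows(self, context, file, operation):\n        # Example policy: never allow direct deletion through this check.\n        if operation == \"delete\":\n            return False\n        # None means \"no opinion\": fall back to the default access logic.\n        return None\n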
As already mentioned, by default the user who owns the file can access it. But what about different owners? What if the file is owned by another entity, like a resource or dataset?
Out of the box, nobody can access such files. But there are three config options that modify this restriction.
ckanext.files.owner.cascade_access = ENTITY_TYPE ANOTHER_TYPE gives access to a file owned by an entity if the user already has access to the entity itself. Use words like package, resource, group instead of ENTITY_TYPE.
For example: a file is owned by a resource. If cascade access is enabled, whoever has access to resource_show of the resource can also see the file owned by this resource. If the user passes resource_update for the resource, they can also modify the file owned by this resource, etc.
Important: be careful and do not add user to ckanext.files.owner.cascade_access. Users' own files are considered private, and most likely you don't really want anyone else to be able to see or modify these files.
The second option is ckanext.files.owner.transfer_as_update. When transfer-as-update is enabled, any user who has the <OWNER_TYPE>_update permission can transfer their own files to this OWNER_TYPE. Instead of using this option, you can define an <OWNER_TYPE>_file_transfer auth function.
And the third option is ckanext.files.owner.scan_as_update. Just as with ownership transfer, it gives the user permission to list all files of the owner if the user can <OWNER_TYPE>_update it. Instead of using this option, you can define an <OWNER_TYPE>_file_scan auth function.
File creation is not allowed by default. Only a sysadmin can use the files_file_create and files_multipart_start actions. This is done deliberately: uncontrolled uploads can turn your portal into a user's personal cloud storage.
There are three ways to grant upload permission to normal users.
The BAD option is simple. Enable the ckanext.files.authenticated_uploads.allow config option, and every registered user will be allowed to upload files, but only into the default storage. If you want to change the list of storages available to a common user, specify the storage names in the ckanext.files.authenticated_uploads.storages option.
The GOOD option is relatively simple. Define a chained auth function with the name files_file_create. It's called whenever a user initiates an upload, so you can decide whether the user is allowed to upload files with the specified parameters.
The BEST option is to leave this restriction unchanged. Do not allow any user to call files_file_create. Instead, create a new action for your goal. ckanext-files isn't a solution - it's a tool that helps you build the solution.
If you need to add a documents field holding uploaded PDF files to a dataset, create a separate action dataset_document_attach. Specify access rules and validation for it, or even hardcode the storage that will be used for uploads. And then, from this new action, call files_file_create with ignore_auth: True.
In this way you control every side of uploading documents into the dataset and do not accidentally break other functionality, because every other feature will define its own action.
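A minimal sketch of the BEST option described above. The action name comes from the example in the text; the fields, access check and storage name are hypothetical placeholders:
import ckan.plugins.toolkit as tk\n\n\ndef dataset_document_attach(context, data_dict):\n    # your own access rules for this specific feature\n    tk.check_access(\"package_update\", context, {\"id\": data_dict[\"dataset_id\"]})\n\n    # delegate the upload to ckanext-files, with a hardcoded storage\n    return tk.get_action(\"files_file_create\")(\n        dict(context, ignore_auth=True),\n        {\n            \"name\": data_dict[\"name\"],\n            \"upload\": data_dict[\"upload\"],\n            \"storage\": \"dataset_documents\",  # hypothetical storage name\n        },\n    )\n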
"},{"location":"usage/task-queue/","title":"Task queue","text":"One of the challenges introduced by independently managed files is related to file ownership. As long as you can call files_transfer_ownership
manually, things are transparent. But as soon as you add custom file field to dataset, you probably want to automatically transfer ownership of the file refered by this custom field.
Imagine that you have a PDF file owned by you, and you specify the ID of this file in the attachment_id field of a dataset. You want to show a download link for this file on the dataset page. But if the file is owned by you, nobody else will be able to download it. So you decide to transfer file ownership to the dataset, so that anyone who sees the dataset can see the file as well.
You cannot update the dataset and transfer ownership afterwards, because there will be a time window between these two actions when the data is not valid. Or, even worse, after updating the dataset you may lose your internet connection and won't be able to finish the transfer.
Nor can you transfer ownership first and then update the dataset: attachment_id may have additional validators, and you don't know in advance whether you'll be able to successfully update the dataset after the transfer.
This problem can be solved by queuing additional tasks inside the action. For example, the validator that checks whether a certain file ID can be used as attachment_id can queue an ownership transfer. If the dataset update completes without errors, the queued task is executed automatically and the dataset becomes the owner of the file.
A task is queued via the ckanext.files.shared.add_task function, which accepts objects inherited from ckanext.files.shared.Task. The Task class requires implementing the abstract method run(result: Any, idx: int, prev: Any), which is called when the task is executed. This method receives the result of the action that caused the task execution, the task's position in the queue, and the result of the previous task.
For example, one of the attachment_id validators can queue the following MyTask via add_task(MyTask(file_id)) to transfer ownership of file_id to the updated dataset:
import ckan.plugins.toolkit as tk\n\nfrom ckanext.files.shared import Task\n\n\nclass MyTask(Task):\n    def __init__(self, file_id):\n        self.file_id = file_id\n\n    def run(self, dataset, idx, prev):\n        return tk.get_action(\"files_transfer_ownership\")(\n            {\"ignore_auth\": True},\n            {\n                \"id\": self.file_id,\n                \"owner_type\": \"package\",\n                \"owner_id\": dataset[\"id\"],\n                \"pin\": True,\n            },\n        )\n
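And a hypothetical validator that queues this task: it leaves the value untouched and only schedules the transfer, which runs after the wrapping action finishes without errors. The standard CKAN full-signature validator form is assumed:
from ckanext.files.shared import add_task\n\n\ndef attachment_id_validator(key, data, errors, context):\n    file_id = data[key]\n    if file_id:\n        # executed only if the wrapping action finishes without errors\n        add_task(MyTask(file_id))\n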
As the first argument, Task.run receives the result of the action that was called. Right now only the following actions support tasks:
package_create
package_update
resource_create
resource_update
group_create
group_update
organization_create
organization_update
user_create
user_update
If you want to enable tasks support for your custom action, decorate it with ckanext.files.shared.with_task_queue
decorator:
from ckanext.files.shared import with_task_queue\n\n@with_task_queue\ndef my_action(context, data_dict)\n # you can call `add_task` inside this action's stack frame.\n ...\n
A good example of a validator using tasks is the files_transfer_ownership validator factory. It can be added to the metadata schema as files_transfer_ownership(owner_type, name_of_id_field). For example, if you are adding this validator to a resource, call it as files_transfer_ownership(\"resource\", \"id\"). The second argument is the name of the ID field. As in most cases it's id, you can omit the second argument:
files_transfer_ownership(\"organization\")
files_transfer_ownership(\"package\")
files_transfer_ownership(\"user\")
There is a difference between creating files via an action:
tk.get_action(\"files_file_create\")(\n {\"ignore_auth\": True},\n {\"upload\": \"hello\", \"name\": \"hello.txt\"}\n)\n
and via direct call to Storage.upload
:
from ckanext.files.shared import get_storage, make_upload\n\nstorage = get_storage()\nstorage.upload(\"hello.txt\", make_upload(b\"hello\"), {})\n
The former snippet creates a tracked file: the file is uploaded to the storage and its details are saved to the database.
The latter snippet creates an untracked file: the file is uploaded to the storage, but its details are not saved anywhere.
Untracked files can be used to achieve specific goals. For example, imagine a storage adapter that writes files to the specified ZIP archive. You can create an interface, that initializes such storage for an existing ZIP resource and uploads files into it. You don't need a separate record in DB for every uploaded file, because all of them go into the resource, that is already stored in DB.
But such use cases are pretty specific, so prefer to use the API if you are not sure what you need. The main reason to use tracked files is their discoverability: you can use the files_file_search API action to list all the tracked files and optionally filter them by storage, location, content_type, etc.:
ckanapi action files_file_search\n\n... {\n... \"count\": 123,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 11,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... },\n... ...\n... ]\n... }\n\nckanapi action files_file_search size:5 rows=1\n\n... {\n... \"count\": 2,\n... \"results\": [\n... {\n... \"atime\": null,\n... \"content_type\": \"text/plain\",\n... \"ctime\": \"2024-06-02T14:53:12.345358+00:00\",\n... \"hash\": \"5eb63bbbe01eeed093cb22bb8f5acdc3\",\n... \"id\": \"67a0dc8f-be91-48cd-bc8a-9934e12a48d0\",\n... \"location\": \"25c01077-c2cf-484b-a417-f231bb6b448b\",\n... \"mtime\": null,\n... \"name\": \"hello.txt\",\n... \"size\": 5,\n... \"storage\": \"default\",\n... \"storage_data\": {}\n... }\n... ]\n... }\n\nckanapi action files_file_search content_type=application/pdf\n\n... {\n... \"count\": 0,\n... \"results\": []\n... }\n
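The same search from Python, a minimal sketch; the filter values are placeholders:
import ckan.plugins.toolkit as tk\n\nfiles = tk.get_action(\"files_file_search\")(\n    {\"ignore_auth\": True},\n    {\"storage\": \"default\", \"content_type\": \"text/plain\"},\n)\nprint(files[\"count\"])\n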
As for untracked files, their discoverability depends on the storage adapter. Some adapters, files:fs for example, can scan the storage and locate all uploaded files, both tracked and untracked. If you have a files:fs storage configured as default, use the following command to scan its content:
ckan files scan\n
If you want to scan a different storage, specify its name via the -s/--storage-name option. Remember that some storage adapters do not support scanning.
ckan files scan -s memory\n
If you want to see untracked files only, add the -u/--untracked-only flag.
ckan files scan -u\n
If you want to track untracked files by creating a DB record for every such file, add the -t/--track flag. After that you'll be able to discover previously untracked files via the files_file_search API action. This option is most useful during migration, when you are configuring a new storage that points to an existing location with files.
ckan files scan -t\n
"},{"location":"usage/transfer/","title":"Ownership transfer","text":"File ownership can be transfered. As there can be only one owner of the file, as soon as you transfer ownership over file, you yourself do not own this file.
To transfer ownership, use the files_transfer_ownership action and specify the id of the file, and the owner_id and owner_type of the new owner.
You can't just transfer ownership to anyone. You must either pass the IFiles.files_owner_allows check for the file_transfer operation, or pass a cascade access check for the future owner of the file when cascade access and transfer-as-update are enabled.
For example, if you have the following options in config file:
ckanext.files.owner.cascade_access = organization\nckanext.files.owner.transfer_as_update = true\n
you must pass the organization_update auth function if you want to transfer file ownership to an organization. In addition, a file can be pinned. In this way we mark important files. Imagine a resource and its uploaded file. The link to this file is used by the resource, and we don't want the file to be accidentally transferred to someone else. We pin the file, and now nobody can transfer it without explicitly confirming the intention.
There are two ways to move a pinned file:
call files_file_unpin first and then transfer the ownership via a separate API call
pass the force parameter to files_transfer_ownership
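For example, forcing the transfer of a pinned file from Python might look like this minimal sketch (the IDs are placeholders):
import ckan.plugins.toolkit as tk\n\ntk.get_action(\"files_transfer_ownership\")(\n    {\"ignore_auth\": True},\n    {\n        \"id\": \"226056e2-6f83-47c5-8bd2-102e2b82ab9a\",  # placeholder file ID\n        \"owner_type\": \"organization\",\n        \"owner_id\": \"placeholder-organization-id\",\n        \"force\": True,  # required because the file is pinned\n    },\n)\n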
You can upload files using JavaScript CKAN modules. ckanext-files extends CKAN's Sandbox object (available as this.sandbox inside a JS CKAN module), so we can use a shortcut and upload a file directly from the DevTools. Open any CKAN page, switch to the JS console and create the sandbox instance. Inside it we have a files object, which in turn contains an upload method. This method accepts a File object for upload (the same object you can get from an input[type=file]).
sandbox = ckan.sandbox()\nawait sandbox.files.upload(\nnew File([\"content\"], \"file.txt\")\n)\n\n... {\n... \"id\": \"18cdaa65-5eed-4078-89a8-469b137627ce\",\n... \"name\": \"file.txt\",\n... \"location\": \"b53907c3-8434-4dee-9a9e-6c4d3055d200\",\n... \"content_type\": \"text/plain\",\n... \"size\": 7,\n... \"hash\": \"9a0364b9e99bb480dd25e1f0284c8555\",\n... \"storage\": \"default\",\n... \"ctime\": \"2024-06-02T16:12:27.902055+00:00\",\n... \"mtime\": null,\n... \"atime\": null,\n... \"storage_data\": {}\n... }\n
If you are still using the FS storage configured in the previous section, switch to the /tmp/example folder and check its content:
ls /tmp/example\n... b53907c3-8434-4dee-9a9e-6c4d3055d200\n\ncat b53907c3-8434-4dee-9a9e-6c4d3055d200\n... content\n
And, as usual, let's remove the file using the ID from the upload promise:
sandbox.client.call(\"POST\", \"files_file_delete\", {\nid: \"18cdaa65-5eed-4078-89a8-469b137627ce\"\n})\n
"},{"location":"usage/use-in-code/","title":"Usage in code","text":"If you are writing the code and you want to interact with the storage directly, without the API layer, you can do it via a number of public functions of the extension available in ckanext.files.shared
.
Let's configure the filesystem storage first. The filesystem adapter has a mandatory option path that controls the filesystem location where files are stored. If the path does not exist, the storage will raise an exception by default. But it can also create the missing path if you enable the create_path option. Here's our final version of the settings:
ckanext.files.storage.default.type = files:fs\nckanext.files.storage.default.path = /tmp/example\nckanext.files.storage.default.create_path = true\n
Now we are going to connect to CKAN shell via ckan shell
CLI command and create an instance of the storage:
from ckanext.files.shared import get_storage\nstorage = get_storage()\n
Because you have all the configuration in place, the rest is fairly straightforward. We will upload a file, read its content and remove it, all from the CKAN shell.
To create a file, the storage.upload method must be called with 2 parameters:
the name of the file
a special stream-like object with the content of the file
You can use any string as the first parameter. As for the \"special stream-like object\", ckanext-files has the ckanext.files.shared.make_upload function, which accepts a number of different types (bytes, werkzeug.datastructures.FileStorage, BytesIO, a file descriptor) and converts them into the expected format.
from ckanext.files.shared import make_upload\n\nupload = make_upload(b\"hello world\")\nresult = storage.upload('file.txt', upload)\n\nprint(result)\n\n... FileData(\n... location='60b385e7-8137-496c-bb1d-6ae4d7963ab3',\n... size=11,\n... content_type='text/plain',\n... hash='5eb63bbbe01eeed093cb22bb8f5acdc3',\n... storage_data={}\n... )\n
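As an aside, make_upload accepts more than raw bytes; a couple of sketches for the other documented input types (the filename is arbitrary):
from io import BytesIO\n\nfrom werkzeug.datastructures import FileStorage\n\nfrom ckanext.files.shared import make_upload\n\n# from an in-memory buffer\nupload = make_upload(BytesIO(b\"hello world\"))\n\n# from a werkzeug FileStorage, e.g. a file submitted through a form\nwith open(\"/tmp/myfile.txt\", \"rb\") as src:\n    upload = make_upload(FileStorage(src, filename=\"myfile.txt\"))\n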
result is an instance of the ckanext.files.shared.FileData dataclass. It contains all the information required by the storage to manage the file.
The result object has a location attribute that contains the name of the file relative to the path option specified in the storage configuration. If you visit the /tmp/example directory, which was set as the path for the storage, you'll see a file with a name matching the location from the result. And its content matches the content of our upload, which is quite an expected outcome.
cat /tmp/example/60b385e7-8137-496c-bb1d-6ae4d7963ab3\n\n... hello world\n
But let's go back to the shell and try reading the file from Python code. We'll pass result to the storage's stream method, which produces an iterable of bytes based on our result:
buffer = storage.stream(result)\ncontent = b\"\".join(buffer)\n\n... b'hello world'\n
In most cases, the storage only needs the location of the file object to read it. So, if you don't have the result generated during the upload, you can still read the file as long as you have its location. But remember that some storage adapters may require additional information, and the following example must be adapted depending on the adapter:
from ckanext.files.shared import FileData\n\nlocation = \"60b385e7-8137-496c-bb1d-6ae4d7963ab3\"\ndata = FileData(location)\n\nbuffer = storage.stream(data)\ncontent = b\"\".join(buffer)\nprint(content)\n\n... b'hello world'\n
And finally we can remove the file:
storage.remove(result)\n
"}]}
\ No newline at end of file