From fdd5a459aeeddd03349ff4fc82a8b446c620a866 Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Tue, 19 Mar 2024 11:38:00 +0000 Subject: [PATCH 1/7] update ordering of get_entities output --- README.md | 22 ++++++++-------------- 1 file changed, 8 insertions(+), 14 deletions(-) diff --git a/README.md b/README.md index 295cdc9..07fff05 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,6 @@ ro-crate-py is a Python library to create and consume [Research Object Crates](https://w3id.org/ro/crate). It currently supports the [RO-Crate 1.1](https://w3id.org/ro/crate/1.1) specification. - ## Installation ro-crate-py requires Python 3.7 or later. The easiest way to install is via [pip](https://docs.python.org/3/installing/): @@ -19,7 +18,6 @@ cd ro-crate-py pip install . ``` - ## Usage ### Creating an RO-Crate @@ -93,7 +91,6 @@ logs = crate.add_dataset("exp/logs") Note that the above adds all files and directories contained in `"exp/logs"` recursively to the crate, but only the top-level `"exp/logs"` dataset itself is listed in the metadata file (there is no requirement to represent every file and folder in the JSON-LD). To also add files and directory recursively to the metadata, use `add_tree` (but note that it only works on local directory trees). - #### Appending elements to property values What ro-crate-py entities actually store is their JSON representation: @@ -151,7 +148,6 @@ If you add `fetch_remote=True` to the `add_file` call, however, the library (whe Another option that influences the behavior when dealing with remote entities is `validate_url`, also `False` by default: if it's set to `True`, when the crate is serialized, the library will try to open the URL to add / update metadata bits such as the content's length and format (but it won't try to download the file unless `fetch_remote` is also set). 
- #### Adding entities with an arbitrary type An entity can be of any type listed in the [RO-Crate context](https://www.researchobject.org/ro-crate/1.1/context.jsonld). However, only a few of them have a counterpart (e.g., `File`) in the library's class hierarchy (either because they are very common or because they are associated with specific functionality that can be conveniently embedded in the class implementation). In other cases, you can explicitly pass the type via the `properties` argument: @@ -239,8 +235,8 @@ for e in crate.get_entities(): ``` ``` -ro-crate-metadata.json CreativeWork ./ Dataset +ro-crate-metadata.json CreativeWork paper.pdf File results.csv File images/figure.svg File @@ -248,7 +244,7 @@ https://orcid.org/0000-0000-0000-0000 Person https://orcid.org/0000-0000-0000-0001 Person ``` -The first two entities shown in the output are the [metadata file descriptor](https://www.researchobject.org/ro-crate/1.1/metadata.html) and the [root data entity](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html), respectively. These are special entities managed by the `ROCrate` object, and are always present. The other entities are the ones we added in the [section on RO-Crate creation](#creating-an-ro-crate). You can access data entities with `crate.data_entities` and contextual entities with `crate.contextual_entities`. For instance: +The first two entities shown in the output are the [root data entity](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html) and the [metadata file descriptor](https://www.researchobject.org/ro-crate/1.1/metadata.html), respectively. These are special entities managed by the `ROCrate` object, and are always present. The other entities are the ones we added in the [section on RO-Crate creation](#creating-an-ro-crate). You can access data entities with `crate.data_entities` and contextual entities with `crate.contextual_entities`. 
For instance: ```python for e in crate.data_entities: @@ -273,7 +269,6 @@ You can fetch an entity by its `@id` as follows: article = crate.dereference("paper.pdf") ``` - ## Command Line Interface `ro-crate-py` includes a hierarchical command line interface: the `rocrate` tool. `rocrate` is the top-level command, while specific functionalities are provided via sub-commands. Currently, the tool allows to initialize a directory tree as an RO-Crate (`rocrate init`) and to modify the metadata of an existing RO-Crate (`rocrate add`). @@ -396,15 +391,14 @@ Options: --help Show this message and exit. ``` - ## License - * Copyright 2019-2024 The University of Manchester, UK - * Copyright 2020-2024 Vlaams Instituut voor Biotechnologie (VIB), BE - * Copyright 2020-2024 Barcelona Supercomputing Center (BSC), ES - * Copyright 2020-2024 Center for Advanced Studies, Research and Development in Sardinia (CRS4), IT - * Copyright 2022-2024 École Polytechnique Fédérale de Lausanne, CH - * Copyright 2024 Data Centre, SciLifeLab, SE +* Copyright 2019-2024 The University of Manchester, UK +* Copyright 2020-2024 Vlaams Instituut voor Biotechnologie (VIB), BE +* Copyright 2020-2024 Barcelona Supercomputing Center (BSC), ES +* Copyright 2020-2024 Center for Advanced Studies, Research and Development in Sardinia (CRS4), IT +* Copyright 2022-2024 École Polytechnique Fédérale de Lausanne, CH +* Copyright 2024 Data Centre, SciLifeLab, SE Licensed under the Apache License, version 2.0 , From e58d467d1d3429ad42bc4fa516c033b8093d11a2 Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Tue, 19 Mar 2024 11:43:42 +0000 Subject: [PATCH 2/7] add name property for Donald Duck --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 07fff05..be35a4e 100644 --- a/README.md +++ b/README.md @@ -115,7 +115,9 @@ paper.properties() When `paper["author"]` is accessed, a new list containing the `alice` and `bob` entities is generated on the fly. 
For this reason, calling `append` on `paper["author"]` won't actually modify the `paper` entity in any way. To add an author, use the `append_to` method instead: ```python -donald = crate.add(Person(crate, "https://en.wikipedia.org/wiki/Donald_Duck")) +donald = crate.add(Person(crate, "https://en.wikipedia.org/wiki/Donald_Duck", properties={ + "name": "Donald Duck" + })) paper.append_to("author", donald) ``` From 6743520936e7b487e8cc08406982933c8e10b42f Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Tue, 19 Mar 2024 11:48:39 +0000 Subject: [PATCH 3/7] add file creation at start of tutorial --- README.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index be35a4e..2fcbc79 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,16 @@ pip install . In its simplest form, an RO-Crate is a directory tree with an `ro-crate-metadata.json` file at the top level that contains metadata about the other files and directories, represented by [data entities](https://www.researchobject.org/ro-crate/1.1/data-entities.html). These metadata consist both of properties of the data entities themselves and of other, non-digital entities called [contextual entities](https://www.researchobject.org/ro-crate/1.1/contextual-entities.html) (representing, e.g., a person or an organization). -Suppose Alice and Bob worked on a research task together, which resulted in a manuscript written by both; additionally, Alice prepared a spreadsheet containing the experimental data, which Bob used to generate a diagram. Let's make an RO-Crate to package all this: +Suppose Alice and Bob worked on a research task together, which resulted in a manuscript written by both; additionally, Alice prepared a spreadsheet containing the experimental data, which Bob used to generate a diagram. 
For the purpose of this tutorial, you can just create dummy files for the documents: + +```bash +mkdir exp +touch exp/paper.pdf +touch exp/results.csv +touch exp/diagram.svg +``` + +Let's make an RO-Crate to package all this: ```python from rocrate.rocrate import ROCrate From a87ec99c2b9b99696d88d2ca9c465b1bb87672aa Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Tue, 26 Mar 2024 13:13:47 +0000 Subject: [PATCH 4/7] bring updates across from GTN --- README.md | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 2fcbc79..be61970 100644 --- a/README.md +++ b/README.md @@ -22,9 +22,9 @@ pip install . ### Creating an RO-Crate -In its simplest form, an RO-Crate is a directory tree with an `ro-crate-metadata.json` file at the top level that contains metadata about the other files and directories, represented by [data entities](https://www.researchobject.org/ro-crate/1.1/data-entities.html). These metadata consist both of properties of the data entities themselves and of other, non-digital entities called [contextual entities](https://www.researchobject.org/ro-crate/1.1/contextual-entities.html) (representing, e.g., a person or an organization). +In its simplest form, an RO-Crate is a directory tree with an `ro-crate-metadata.json` file at the top level. This file contains metadata about the other files and directories, represented by [data entities](https://www.researchobject.org/ro-crate/1.1/data-entities.html). These metadata consist both of properties of the data entities themselves and of other, non-digital entities called [contextual entities](https://www.researchobject.org/ro-crate/1.1/contextual-entities.html). A contextual entity can represent, for instance, a person, an organization or an event. 
-Suppose Alice and Bob worked on a research task together, which resulted in a manuscript written by both; additionally, Alice prepared a spreadsheet containing the experimental data, which Bob used to generate a diagram. For the purpose of this tutorial, you can just create dummy files for the documents: +Suppose Alice and Bob worked on a research task together, which resulted in a manuscript written by both; additionally, Alice prepared a spreadsheet containing the experimental data, which Bob used to generate a diagram. For the purpose of this tutorial, you can just create placeholder files for the documents: ```bash mkdir exp @@ -70,7 +70,7 @@ bob = crate.add(Person(crate, bob_id, properties={ })) ``` -Next, we express authorship of the various files: +At this point, we have a representation of the various entities. Now we need to express the relationships between them. This is done by adding properties that reference other entities: ```python paper["author"] = [alice, bob] @@ -78,28 +78,38 @@ table["author"] = alice diagram["author"] = bob ``` +You can also add whole directories together with their contents. In an RO-Crate, a directory is represented by the `Dataset` entity. Create a directory with some placeholder files: + +```bash +mkdir exp/logs +touch exp/logs/log1.txt +touch exp/logs/log2.txt +``` + +Now add it to the crate: + +```python +logs = crate.add_dataset("exp/logs") +``` + Finally, we serialize the crate to disk: ```python crate.write("exp_crate") ``` -Now the `exp_crate` directory should contain copies of the three files and an `ro-crate-metadata.json` file with a JSON-LD serialization of the entities and relationships we created, according to the RO-Crate profile. Note that we have chosen a different destination path for the diagram, while the other two files have been placed at the top level with their names unchanged (the default). 
+Now the `exp_crate` directory should contain copies of all the files we added and an `ro-crate-metadata.json` file with a [JSON-LD](https://json-ld.org) representation of the entities and relationships we created. Note that we have chosen a different destination path for the diagram, while the other two files have been placed at the top level with their names unchanged (the default). -Some applications and services support RO-Crates stored as archives. To save the crate in zip format, use `write_zip`: +Exploring the `exp_crate` directory, we see that all files and directories contained in `exp/logs` have been added recursively to the crate. However, in the `ro-crate-metadata.json` file, only the top level Dataset with `@id` `"exp/logs"` is listed. This is because we used `crate.add_dataset("exp/logs")` rather than adding every file individually. There is no requirement to represent every file and folder within the crate in the `ro-crate-metadata.json` file - in fact, if there were many files in the crate it would be impractical to do so. -```python -crate.write_zip("exp_crate.zip") -``` +If you do want to add files and directories recursively to the metadata, use `crate.add_tree` instead of `crate.add_dataset` (but note that it only works on local directory trees). -You can also add whole directories. A directory in RO-Crate is represented by the `Dataset` entity: +Some applications and services support RO-Crates stored as archives. To save the crate in zip format, use `write_zip`: ```python -logs = crate.add_dataset("exp/logs") +crate.write_zip("exp_crate.zip") ``` -Note that the above adds all files and directories contained in `"exp/logs"` recursively to the crate, but only the top-level `"exp/logs"` dataset itself is listed in the metadata file (there is no requirement to represent every file and folder in the JSON-LD). To also add files and directory recursively to the metadata, use `add_tree` (but note that it only works on local directory trees). 
- #### Appending elements to property values What ro-crate-py entities actually store is their JSON representation: @@ -255,7 +265,9 @@ https://orcid.org/0000-0000-0000-0000 Person https://orcid.org/0000-0000-0000-0001 Person ``` -The first two entities shown in the output are the [root data entity](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html) and the [metadata file descriptor](https://www.researchobject.org/ro-crate/1.1/metadata.html), respectively. These are special entities managed by the `ROCrate` object, and are always present. The other entities are the ones we added in the [section on RO-Crate creation](#creating-an-ro-crate). You can access data entities with `crate.data_entities` and contextual entities with `crate.contextual_entities`. For instance: +The first two entities shown in the output are the [root data entity](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html) and the [metadata file descriptor](https://www.researchobject.org/ro-crate/1.1/metadata.html), respectively. The former represents the whole crate, while the latter represents the metadata file. These are special entities managed by the `ROCrate` object, and are always present. The other entities are the ones we added in the [section on RO-Crate creation](#creating-an-ro-crate). + +As shown above, `get_entities` allows to iterate over all entities in the crate. You can also access only data entities with `crate.data_entities` and only contextual entities with `crate.contextual_entities`. 
For instance: ```python for e in crate.data_entities: From d21b215ab228411bd7f202bc8e6945a6a91fee45 Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Tue, 26 Mar 2024 13:30:40 +0000 Subject: [PATCH 5/7] move add_jsonld section to an advanced section --- README.md | 98 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 50 insertions(+), 48 deletions(-) diff --git a/README.md b/README.md index be61970..3196bd4 100644 --- a/README.md +++ b/README.md @@ -191,7 +191,56 @@ Note that entities can have multiple types, e.g.: "@type" = ["File", "SoftwareSourceCode"] ``` -#### Modifying the crate from JSON-LD dictionaries +### Consuming an RO-Crate + +An existing RO-Crate package can be loaded from a directory or zip file: + +```python +crate = ROCrate('exp_crate') # or ROCrate('exp_crate.zip') +for e in crate.get_entities(): + print(e.id, e.type) +``` + +``` +./ Dataset +ro-crate-metadata.json CreativeWork +paper.pdf File +results.csv File +images/figure.svg File +https://orcid.org/0000-0000-0000-0000 Person +https://orcid.org/0000-0000-0000-0001 Person +``` + +The first two entities shown in the output are the [root data entity](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html) and the [metadata file descriptor](https://www.researchobject.org/ro-crate/1.1/metadata.html), respectively. The former represents the whole crate, while the latter represents the metadata file. These are special entities managed by the `ROCrate` object, and are always present. The other entities are the ones we added in the [section on RO-Crate creation](#creating-an-ro-crate). + +As shown above, `get_entities` allows to iterate over all entities in the crate. You can also access only data entities with `crate.data_entities` and only contextual entities with `crate.contextual_entities`. 
For instance: + +```python +for e in crate.data_entities: + author = e.get("author") + if not author: + continue + elif isinstance(author, list): + print(e.id, [p["name"] for p in author]) + else: + print(e.id, repr(author["name"])) +``` + +``` +paper.pdf ['Alice Doe', 'Bob Doe'] +results.csv 'Alice Doe' +images/figure.svg 'Bob Doe' +``` + +You can fetch an entity by its `@id` as follows: + +```python +article = crate.dereference("paper.pdf") +``` + +## Advanced features + +### Modifying the crate from JSON-LD dictionaries The `add_jsonld` method allows to add a contextual entity directly from a JSON-LD dictionary containing at least the `@id` and `@type` keys: @@ -245,53 +294,6 @@ for d in json_data.get("@graph", []): crate.add_or_update_jsonld(d) ``` -### Consuming an RO-Crate - -An existing RO-Crate package can be loaded from a directory or zip file: - -```python -crate = ROCrate('exp_crate') # or ROCrate('exp_crate.zip') -for e in crate.get_entities(): - print(e.id, e.type) -``` - -``` -./ Dataset -ro-crate-metadata.json CreativeWork -paper.pdf File -results.csv File -images/figure.svg File -https://orcid.org/0000-0000-0000-0000 Person -https://orcid.org/0000-0000-0000-0001 Person -``` - -The first two entities shown in the output are the [root data entity](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html) and the [metadata file descriptor](https://www.researchobject.org/ro-crate/1.1/metadata.html), respectively. The former represents the whole crate, while the latter represents the metadata file. These are special entities managed by the `ROCrate` object, and are always present. The other entities are the ones we added in the [section on RO-Crate creation](#creating-an-ro-crate). - -As shown above, `get_entities` allows to iterate over all entities in the crate. You can also access only data entities with `crate.data_entities` and only contextual entities with `crate.contextual_entities`. 
For instance: - -```python -for e in crate.data_entities: - author = e.get("author") - if not author: - continue - elif isinstance(author, list): - print(e.id, [p["name"] for p in author]) - else: - print(e.id, repr(author["name"])) -``` - -``` -paper.pdf ['Alice Doe', 'Bob Doe'] -results.csv 'Alice Doe' -images/figure.svg 'Bob Doe' -``` - -You can fetch an entity by its `@id` as follows: - -```python -article = crate.dereference("paper.pdf") -``` - ## Command Line Interface `ro-crate-py` includes a hierarchical command line interface: the `rocrate` tool. `rocrate` is the top-level command, while specific functionalities are provided via sub-commands. Currently, the tool allows to initialize a directory tree as an RO-Crate (`rocrate init`) and to modify the metadata of an existing RO-Crate (`rocrate add`). From f5981e831c927a27324d8d3c1186ff2d911d96ae Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Tue, 26 Mar 2024 14:07:29 +0000 Subject: [PATCH 6/7] provide extra context for mainEntity --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 3196bd4..3eecb97 100644 --- a/README.md +++ b/README.md @@ -384,7 +384,7 @@ To register the workflow as a `ComputationalWorkflow`: rocrate add workflow -l galaxy sort-and-change-case.ga ``` -Now the workflow has a type of `["File", "SoftwareSourceCode", "ComputationalWorkflow"]` and points to a `ComputerLanguage` entity that represents the Galaxy workflow language. Also, the workflow is listed as the crate's `mainEntity` (see the [Workflow RO-Crate profile](https://w3id.org/workflowhub/workflow-ro-crate/1.0)). +Now the workflow has a type of `["File", "SoftwareSourceCode", "ComputationalWorkflow"]` and points to a `ComputerLanguage` entity that represents the Galaxy workflow language. 
Also, the workflow is listed as the crate's `mainEntity` (this is required by the [Workflow RO-Crate profile](https://w3id.org/workflowhub/workflow-ro-crate/1.0), which extends RO-Crate with requirements specific to workflow metadata). To add [workflow testing metadata](https://crs4.github.io/life_monitor/workflow_testing_ro_crate) to the crate: From f5981e831c927a27324d8d3c1186ff2d911d96ae Mon Sep 17 00:00:00 2001 From: simleo Date: Tue, 26 Mar 2024 15:51:32 +0100 Subject: [PATCH 7/7] minor updates to the readme --- README.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index 3eecb97..895fc69 100644 --- a/README.md +++ b/README.md @@ -24,7 +24,7 @@ pip install . In its simplest form, an RO-Crate is a directory tree with an `ro-crate-metadata.json` file at the top level. This file contains metadata about the other files and directories, represented by [data entities](https://www.researchobject.org/ro-crate/1.1/data-entities.html). These metadata consist both of properties of the data entities themselves and of other, non-digital entities called [contextual entities](https://www.researchobject.org/ro-crate/1.1/contextual-entities.html). A contextual entity can represent, for instance, a person, an organization or an event. -Suppose Alice and Bob worked on a research task together, which resulted in a manuscript written by both; additionally, Alice prepared a spreadsheet containing the experimental data, which Bob used to generate a diagram.
We will create placeholder files for these documents: ```bash mkdir exp @@ -100,9 +100,7 @@ crate.write("exp_crate") Now the `exp_crate` directory should contain copies of all the files we added and an `ro-crate-metadata.json` file with a [JSON-LD](https://json-ld.org) representation of the entities and relationships we created. Note that we have chosen a different destination path for the diagram, while the other two files have been placed at the top level with their names unchanged (the default). -Exploring the `exp_crate` directory, we see that all files and directories contained in `exp/logs` have been added recursively to the crate. However, in the `ro-crate-metadata.json` file, only the top level Dataset with `@id` `"exp/logs"` is listed. This is because we used `crate.add_dataset("exp/logs")` rather than adding every file individually. There is no requirement to represent every file and folder within the crate in the `ro-crate-metadata.json` file - in fact, if there were many files in the crate it would be impractical to do so. - -If you do want to add files and directories recursively to the metadata, use `crate.add_tree` instead of `crate.add_dataset` (but note that it only works on local directory trees). +Exploring the `exp_crate` directory, we see that all files and directories contained in `exp/logs` have been added recursively to the crate. However, in the `ro-crate-metadata.json` file, only the top-level Dataset with `@id` `"exp/logs"` is listed. This is because we used `crate.add_dataset("exp/logs")` rather than adding every file individually. There is no requirement to represent every file and folder within the crate in the `ro-crate-metadata.json` file. If you do want to add files and directories recursively to the metadata, use `crate.add_tree` instead of `crate.add_dataset` (but note that it only works on local directory trees). Some applications and services support RO-Crates stored as archives.
To save the crate in zip format, use `write_zip`: @@ -136,7 +134,7 @@ When `paper["author"]` is accessed, a new list containing the `alice` and `bob` ```python donald = crate.add(Person(crate, "https://en.wikipedia.org/wiki/Donald_Duck", properties={ "name": "Donald Duck" - })) +})) paper.append_to("author", donald) ```
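Patches 4 and 7 explain that `crate.add_dataset("exp/logs")` copies the whole directory into the crate. They also say that only the dataset itself is listed in `ro-crate-metadata.json`. A short stdlib-only script can check this on a written crate. It reads the metadata file as plain JSON and does not use ro-crate-py. It then reports which copied files have no entity of their own.

This is a sketch under two assumptions:
* the metadata file uses the flat `@graph` array of RO-Crate 1.1;
* data entity `@id`s are paths relative to the crate root, as in the `paper.pdf` and `images/figure.svg` ids shown in the README output.

`list_entities` and `unlisted_files` are hypothetical helper names, not part of the library.

```python
import json
from pathlib import Path

METADATA_FILE = "ro-crate-metadata.json"


def list_entities(crate_dir):
    """Return (@id, @type) pairs for every entity in the crate's metadata file.

    Hypothetical helper: it reads the JSON-LD "@graph" array directly,
    without going through ro-crate-py.
    """
    with (Path(crate_dir) / METADATA_FILE).open(encoding="utf-8") as f:
        data = json.load(f)
    return [(e["@id"], e["@type"]) for e in data.get("@graph", [])]


def unlisted_files(crate_dir):
    """Return crate-relative paths of files that have no entity of their own.

    Assumes data entity @ids are POSIX paths relative to the crate root.
    """
    root = Path(crate_dir)
    listed = {entity_id for entity_id, _ in list_entities(root)}
    # Every regular file under the crate root, except the metadata file itself
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name != METADATA_FILE
    }
    return sorted(on_disk - listed)
```

Run on the `exp_crate` directory from the tutorial, `unlisted_files("exp_crate")` would be expected to report the files copied from `exp/logs`. That matches the README's statement that only the dataset is described. Reading the JSON directly keeps the check independent of the library that wrote the crate.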