From: "djacob65@gmail.com" Daniel Jacob, Francois Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil (2024) Maggot: An ecosystem for sharing metadata within the web of FAIR Data, BioRxiv, https://doi.org/10.1101/2024.05.24.595703 Daniel Jacob, Francois Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil (2024) Maggot: An ecosystem for sharing metadata within the web of FAIR Data, BioRxiv, https://doi.org/10.1101/2024.05.24.595703 An ecosystem for sharing metadata Sharing descriptive Metadata is the first essential step towards Open Scientific Data. With this in mind, Maggot was specifically designed to annotate datasets by creating a metadata file to attach to the storage space. Indeed, it allows users to easily add descriptive metadata to datasets produced within a collective of people (research unit, platform, multi-partner project, etc.). This approach fits perfectly into a data management plan as it addresses the issues of data organization and documentation, data storage and frictionless metadata sharing within this same collective and beyond. The main functionalities of Maggot were established according to a well-defined need (See Background). See a short Presentation and Poster for a quick overview. Note: The step numbers indicated in the figure correspond to the different points developed below 1 - First you must define all the metadata that will be used to describe your datasets. All metadata can be defined using a single file (in TSV format, therefore using a spreadsheet). This is a unavoidable step because both input and search interfaces are completely generated from these definition files, defining in this way each of the fields along with their input type and also the associated Controlled Vocabulary (ontology, thesaurus, dictionary, list of fixed terms). The metadata proposed by default was mainly established according to the DDI (Data Documentation Initiative) metadata schema. This schema also largely corresponds to that adopted by the Dataverse software. See the Terminology Definition section. 2 - Entering metadata will be greatly facilitated by the use of dictionaries. The dictionaries offered by default are: people, funders, data producers, as well as a vocabulary dictionary allowing you to mix ontologies and thesauri from several sources. Each of these dictionaries allows users, by entering a name by autocompletion, to associate information which will then be added when exporting the metadata either to a remote repository, or for harvesting the metadata. Thus this information, once entered into a dictionary, will not need to be re-entered again. 3 - The web interface for entering metadata is entirely built on the basis of definition files. The metadata are distributed according to the different sections chosen, each constituting a tab (see screenshot). Mandatory fields are marked with a red star and must be documented in order to be able to generate the metadata file. The entry of metadata governed by a controlled vocabulary is done by autocompletion from term lists (dictionary, thesaurus or ontology). We can also define external resources (URL links) relating to documents, publications or other related data. Maggot thus becomes a hub for your datasets connecting different resources, local and external. Once the mandatory fields (at least) and other recommended fields (at best) have been entered, the metadata file can be generated in JSON format. 
4 - The file generated in JSON format must be placed in the storage space reserved for this purpose. This metadata file can be seen as a README adapted for machines, yet still readable by humans. With an internal structure, it offers a coherence and consistency of information that a simple README file, with its completely free and therefore unstructured text format, does not allow. Furthermore, the central idea is to use the storage space as a local data repository, so that the metadata goes to the data and not the other way around.

5 - A search of the datasets can thus be carried out on the basis of the metadata. All the JSON metadata files are scanned and parsed at a fixed time interval (30 min), then loaded into a database. This allows you to perform searches based on the predefined metadata. The search form, in a compact shape, is almost the same as the entry form (see a screenshot). Depending on the search criteria, a list of datasets is provided, with a link for each of them pointing to the detailed sheet.

6 - The detailed metadata sheet provides all the metadata divided by section. Unfilled metadata does not appear by default. When a URL can be associated with a piece of information (ORCID, ontology, web site, etc.), you can click on it to go to the corresponding link. Likewise, it is possible to follow the associated link on each of the resources. From this sheet, you can also export the metadata according to different schemata (Dataverse, Zenodo, JSON-LD). See screenshot 1 & screenshot 2.

7 - Finally, once you have decided to publish your metadata with your data, you can choose the repository that suits you (currently repositories based on Dataverse and Zenodo are supported).

Descriptive metadata can thus be generated from the start of a project or study, without waiting for all the data to be acquired or processed, nor for the moment when one wishes to publish the data, thus respecting the research data lifecycle as closely as possible. Read more.

The implementation of the tool requires involving all data stakeholders upstream (definition of the metadata schema, vocabularies, targeted data repositories, etc.); everyone has their role: data manager/data steward on one side, but also scientists and data producers on the other. Read more.

A progressive rise towards an increasingly controlled and standardized vocabulary is not only possible but even encouraged. First we can start with a simple vocabulary dictionary used locally and grouping together domain vocabularies. Then we can consider the creation of a thesaurus, with or without mapping to ontologies. The promotion of ontologies must also be done gradually, by selecting those which are truly relevant for the collective. A tool like Maggot makes it easy to implement them (see Vocabulary). Read more.

Concerning the second idea: given the diversity of the fields, the chosen approach is to be both as flexible and as pragmatic as possible, by allowing users to choose their own vocabulary (controlled or not) corresponding to the reality of their field and their activities. However, a good practice is to use, as much as possible, only controlled vocabulary, that is to say relevant and sufficient vocabulary used as a reference in the field concerned, allowing users to describe a project and its context without having to add additional terms. To this end, the tool must allow users a progressive approach towards the adoption of standardized controlled vocabularies (thesauri or even ontologies).
With the approach proposed by Maggot, initially there is no question of opening the data, but of managing metadata associated with the data on a storage space with a precise perimeter represented by the collective (unit, team, project, platform, ...). The main characteristic of the tool is, above all, to "capture" the metadata as easily as possible according to a well-chosen metadata schema. However, the opening of data via their metadata must be a clearly stated objective within the framework of projects financed by public institutions (e.g. Europe). Therefore, if you have taken care to correctly define your metadata schema so that a metadata crosswalk (using a mapping file) is possible with a data repository recognized by the international community, then you can easily "push" the metadata together with the data without having to re-enter anything.

Daniel Jacob (INRAE UMR BFP) | CATI PROSODIe
François Ehrenmann (INRAE UMR BioGECO) | CATI GEDEOP
Philippe Chaumeil (INRAE UMR BioGECO)
Edouard Guitton (INRAE Dept. SA, Emerg'IN)
Stéphane Bernillon (INRAE UR MycSA)
Joseph TRAN (INRAE UMR EGFV) | CATI BARIC

To guarantee the authenticity and integrity of a metadata file, it can be recorded permanently and immutably on the bloxberg blockchain. A blockchain is a technology that makes it possible to keep track of a set of transactions (writings in the chain) in a decentralized, secure and transparent manner. A blockchain can therefore be compared to a large (public or private) unfalsifiable register. Blockchain is today used in many fields because it provides solutions to many problems. For example, in the field of Higher Education and Research, registering dataset metadata in the blockchain makes it possible to certify, in an inalienable, irrefutable and completely transparent manner, the ownership and authenticity of the data as well as, for example, the license of use and the date of production of the data. Research stakeholders are then more open to the dissemination of their data (files, results, protocols, publications, etc.) since they know that, in particular, the ownership, content and conditions of use of the data cannot be altered. The Maggot tool could thus serve as a gateway to certify data together with the associated metadata. The complete process is schematized in the following figure:

bloxberg is the most important blockchain project in science. It was founded in 2019 by MPDL, looking for a way to store research results and make them available to other researchers. In this sense, bloxberg is a decentralized register in which results can be stored in a tamper-proof way with a time stamp and an identifier. bloxberg is based on the Ethereum blockchain. However, it makes use of a different consensus mechanism: instead of "Proof of Stake", used by Ethereum since 2022, bloxberg validates blocks through "Proof of Authority". Each node is operated by one member. All members of the association are research institutions and are known in the network. Currently, bloxberg has 49 nodes. It is an international project with participating institutions from all over the world. You will need an Ethereum address and an API key (to be requested via bloxberg-services (at) mpdl.mpg.de). See an example of pushing a metadata file to the bloxberg blockchain using Maggot.

A single file (web/conf/config_terms.txt) contains all the terminology.
The input and search interfaces are completely generated from this definition file, thus defining each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms). This is why a configuration and conversion step into JSON format is essential in order to be able to configure all the other modules (for example, creation of the MongoDB database schema when starting the application, before filling it).

This function is used to generate the terminology definition file in JSON format (config_terms.json) and the corresponding JSON-Schema file (maggot-schema.json) from a tabulated file (1). You can either create a terminology definition file in TSV format from scratch (see below for more details), or extract the file from the current configuration (see JSON to TSV). Once the terminology definition file has been obtained (2), you can load it and press 'Submit'. Three files are generated (3 & 5):

This function generates the markdown documentation file (doc.md) from the template file (config_doc.txt), which is itself generated from the metadata definition file (config_terms.txt, cf. TSV to JSON). Once the template file for the documentation (config_doc.txt) has been edited and documented (6) (see below for more details), you can load it and press the Submit button. The documentation file in markdown format (doc.md) is thus generated (7) and must be placed in the web/docs directory (8). Users will have access to this documentation file via the web interface, in the documentation section, under the "Metadata" heading.

As for any dictionary, there must be 3 files (see below). Please note that the names of these files must always contain the name of the dictionary, i.e. the same as the directory. The format of the file containing the dictionary data (people.txt) is defined by another file (people_format.txt). Below, an example is given when modifying a record. When you click on the Institute field, which is connected to the ROR web API, the drop-down list of research organizations that may correspond in the register appears, if there are any. Note: It is possible to edit dictionaries, by adding an entry for example, and immediately find this new entry when entering metadata in the Maggot tool. Indeed, each dictionary is reloaded into memory as soon as the corresponding input box is clicked. See an illustration.

Funders: The dictionary of funders allows you to define the funding agency, the project ID and its corresponding URL.

Producers: The dictionary of data producers allows you to define their Institute, project ID and corresponding URL. Optionally, you can add the URL of the logo.

Vocabulary: Use this dictionary to mix thesauri and ontologies in order to better target the entire controlled vocabulary of your field of application. Only the vocabulary is mandatory; the URL linked to an ontology or a thesaurus is optional. See the Vocabulary section to learn the extent of the possibilities concerning vocabulary in Maggot.

The necessary infrastructure involves 1) a machine running a Linux OS and 2) a dedicated storage space.

1 - The machine will most often be "virtual" because it is simpler to deploy, either locally (with VM providers such as VirtualBox, VMware Workstation or MS Hyper-V) or remotely (e.g. VMware ESXi, Openstack: example of deployment). Moreover, the OS of your machine must allow the deployment of Docker containers.
See "What is Docker" for more details. The minimum characteristics of the VM are: 2 CPUs, 2 GB RAM, 8 GB of disk.

2 - The dedicated storage space can be either in the local space of the VM, or in a remote place on the network.

Requirements: The installation must be carried out on a (virtual) machine with a recent Linux OS that supports Docker (see Infrastructure). Go to the destination directory of your choice, then clone the repository (see the commands below).

MAGGOT uses 3 Docker images for 3 distinct services. See the Configuration settings.

Warning: You have to pay attention to put the same MongoDB settings in all the above configuration files. It is best not to change anything. It would have been preferable to have a single configuration file, but this has not yet been done given the different languages involved (bash, javascript, python, PHP). To be done!

Note: If you want to run multiple instances, you will need to change, in the run file, i) the container names, ii) the data path, iii) the MongoDB volume name and iv) the MongoDB port.

The following two JSON files are defined by default but can be easily configured from the web interface. See the Terminology Configuration section.

The run shell script allows you to perform multiple actions by specifying an option. You must first build the 3 docker container images, if this has not already been done, with the build command shown below.
Links: https://github.com/inrae/pgd-mmdt/issues
"},{"location":"about/","title":"About","text":""},{"location":"about/#background","title":"Background","text":""},{"location":"about/#motives","title":"Motives","text":"
"},{"location":"about/#state-of-need","title":"State of need","text":"
"},{"location":"about/#proposed-approach","title":"Proposed approach","text":"
"},{"location":"about/#links","title":"Links","text":"
"},{"location":"about/#contacts","title":"Contacts","text":"
"},{"location":"about/#designers-developers","title":"Designers / Developers","text":"
"},{"location":"about/#contributors","title":"Contributors","text":"
"},{"location":"configuration/","title":"Configuration","text":""},{"location":"configuration/#terminology-configuration","title":"Terminology configuration","text":"
"},{"location":"configuration/#tsv-to-json","title":"TSV to JSON","text":"
"},{"location":"configuration/#tsv-to-doc","title":"TSV to DOC","text":"
"},{"location":"configuration/#json-to-tsv","title":"JSON to TSV","text":"
"},{"location":"dictionaries/","title":"Dictionaries","text":""},{"location":"dictionaries/#presentation","title":"Presentation","text":"
"},{"location":"dictionaries/#the-people-dictionary","title":"The people dictionary","text":"sh ./run passwd <user>\n
"},{"location":"dictionaries/#other-dictionaries","title":"Other dictionaries","text":"var people = [];\n// Each item in the 'people' list consists of the first two columns (0,1) separated by a space\nget_dictionary_values('people', merge=[0,' ',1])
"},{"location":"gant/","title":"Gant","text":""},{"location":"gant/#gantt-diagrams-of-the-developments","title":"Gantt diagrams of the developments","text":"gantt dateFormat YYYY-MM-DD axisFormat %Y-%m title Diagrammes de Gantt pr\u00e9visionnel des d\u00e9veloppements section MongoDB 1: des1, 2023-11-01,60d 2: des2, 2023-12-01,90d 3: des3, 2023-12-01,90d section Couche API 4: des4, 2024-01-01,120d 5: des5, 2024-05-01,60d section Interface Web 6a: des6, 2024-06-01,60d 6b: des7, 2024-07-01,60d 6c: des8, 2024-09-01,60d"},{"location":"infrastructure/","title":"Infrastructure","text":""},{"location":"infrastructure/#infrastructure-local-remote-or-mixed","title":"Infrastructure : Local, Remote or Mixed","text":"
"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#install-on-your-linux-computer-or-linux-unix-server","title":"Install on your linux computer or linux / unix server","text":"cd
to your clone path:
"},{"location":"installation/#installation-of-docker-containers","title":"Installation of Docker containers","text":"git clone https://github.com/inrae/pgd-mmdt.git pgd-mmdt\ncd pgd-mmdt\n
"},{"location":"installation/#configuration","title":"Configuration","text":"
"},{"location":"installation/#commands","title":"Commands","text":"cd pgd-mmdt\nsh ./run <option>\n
"},{"location":"installation/#starting-the-application","title":"Starting the application","text":"
```
sh ./run build
```
The application can then be started sequentially:
```
sh ./run start
sh ./run initdb
sh ./run scan
```
You can also launch these 3 steps with a single command:
```
sh ./run fullstart
```
Once the application is launched, you can check that the containers are running with the following command:
```
docker ps -a
```
which should produce a result similar to the following:
```
CONTAINER ID   IMAGE           COMMAND                   CREATED          STATUS          PORTS                                   NAMES
5914504f456d   pgd-mmdt-web    "docker-php-entrypoi."    12 seconds ago   Up 10 seconds   0.0.0.0:8087->80/tcp, :::8087->80/tcp   mmdt-web
226b13ed9467   pgd-mmdt-scan   "cron -f"                 12 seconds ago   Up 11 seconds                                           mmdt-scan
81fecbb56d23   pgd-mmdt-db     "docker-entrypoint.s."    13 seconds ago   Up 12 seconds   27017/tcp                               mmdt-db
```
On the first line, which corresponds to the web interface, we see that port 80 of the container is mapped to port 8087 of the VM. Let's say the IP address of your VM is 192.168.56.2; then in your browser you will need to use the URL http://192.168.56.2:8087/. You can of course change the port number in the 'run' file.
It may be preferable to use a lightweight HTTP server like nginx so that the Maggot URL will be http://192.168.56.2/maggot/. Below is an example configuration:
```
## /etc/nginx/nginx.conf
http {

    ...
    upstream maggot { server 127.0.0.1:8087; }
    ...

}

## /etc/nginx/conf.d/my-site.conf

server {
    listen 80 default;
    server_name $host;

    ...

    location /maggot/ {
        proxy_set_header Host $host;
        proxy_set_header X-App-Name 'maggot';
        proxy_set_header X-Real-Ip $remote_addr;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://maggot/;
    }

    ...

}
```
To stop the application:

```
sh ./run stop
```
When updating the application, it is imperative to preserve a whole set of configuration files as well as the content of certain directories (dictionaries, javascripts dedicated to vocabularies, etc.). An update script is available (./etc/update-maggot.sh), preferably placed under '/usr/local/bin'. To preserve your configuration, it is recommended to create local configuration files.
A first file 'local.conf' will contain all the parameters to be preserved, initially contained in the 'run' file. A small example could be as follows:
```
#!/bin/bash

# Local HTTP Port for web application
WEB_PORT=8088

# Path to the data
DATADIR=/media/Workdir/Share/DATA/
```
A second file './web/inc/config/local.inc' will contain all the parameters to be preserved, initially contained in the './web/inc/config/config.inc' file. A small example could be as follows:
```
<?php

# Main title
$TITLE ='Metadata management - My Labs';
$MAINTITLE =$TITLE;

# File Browser
$FILEBROWSER=1;
$URL_FILEBROWSER='/fb/';

# Enable some functionalities
$export_oai = 1;

?>
```
Note: See how to proceed with the configuration steps.
"},{"location":"installation/#file-browser","title":"File Browser","text":"You can provide access to your data via a file browser. This application must be installed separately but can be connected to Maggot by specifying the corresponding URL in the configuration file. Users and their rights are managed in the filebrowser application. Likewise, we can also create links to the data without a password. These links can be usefully specified as external resources in the metadata managed by Maggot.
See how to install it on GitHub.
"},{"location":"private-access/","title":"Private access","text":""},{"location":"private-access/#private-access-key-management","title":"Private access key management","text":""},{"location":"private-access/#motivation","title":"Motivation","text":"Although the Maggot tool is designed to foster the sharing of metadata within a collective, it may be necessary to temporarily privatize access to the metadata of an ongoing project with confidentiality constraints. So even within our own collective, access to metadata must be restricted to authorized users only.
"},{"location":"private-access/#implementation","title":"Implementation","text":"The choice of not wanting to manage users in the Maggot tool was made in order to make the metadata completely open by default within a collective. Furthermore, access rights to the storage space are managed independently of the Maggot tool by the administrator of this space. It is therefore through the storage space that we must give or not access to the metadata via the web interface.
The chosen mechanism for privatizing access is described below. It has the dual advantage of being simple to implement and simple to use.
First we have to generate a file containing the encrypted key for private access. This file must be generated from the web interface and then downloaded, as shown in the figure below. This file must then be manually deposited in the data directory corresponding to the dataset whose access we wish to privatize. The presence of this file within a directory is enough to block access to the metadata and data by default. Note that the same file containing the encrypted private key can be put in several data directories (for example, within the same project). The deposit must be done by hand because the Maggot tool must only have access to the storage space in read mode. This also guarantees that the user has write rights to this space, without having to manage user accounts on the Maggot side.
By default, 'untwist1' metadata are not accessible to anyone
When we want to access the metadata of this dataset, we simply have to enter the private key in the current session. This unlocks access to the metadata via the web interface, but only in the current session of our web browser. This means that we will have to enter the private key for each session (by default, a session lasts a maximum of 1 hour).
Now the 'untwist1' metadata are accessible only to us
When we want to give access to the metadata to the entire collective, we simply need to delete the private access file (named by default 'META_auth.txt') from the concerned data directory.
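As a sketch (the paths are examples; adapt them to your own storage layout), privatizing and later re-opening a dataset therefore only involves copying or removing this file by hand on the storage space:

```bash
# privatize the 'untwist1' dataset (file generated and downloaded from the web interface)
cp ~/Downloads/META_auth.txt /opt/data/untwist1/

# give access back to the whole collective
rm /opt/data/untwist1/META_auth.txt
```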
Here is the list of all the files whose parameters may need to be adjusted according to the needs of a given instance.
"},{"location":"settings/#dockerscanpartscriptsconfigpy","title":"dockerscanpart/scripts/config.py","text":"This file defines the connection parameters to the Mongo database. Knowing that this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
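For illustration only (the actual layout of the file may differ), such a file simply assigns the parameters listed in the table below:

```python
# dockerscanpart/scripts/config.py -- illustrative sketch, not the actual file
dbserver = "mmdt-db"      # name of the MongoDB server (container name)
database = "pgd-db"       # name of the MongoDB database
dbport   = 27017          # port of the MongoDB server
username = "userw-pgd"    # user of the pgd-db database with read/write access
password = "wwwww"        # password corresponding to this user
```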
| Parameter | Description | Default value |
|---|---|---|
| dbserver | Name of the MongoDB server | mmdt-db |
| database | Name of the MongoDB database | pgd-db |
| dbport | Port of the MongoDB server | 27017 |
| username | Username of the Mongo database pgd-db with Read/Write access | userw-pgd |
| password | Password corresponding to the username of the Mongo DB pgd-db | wwwww |

inc/config/mongodb.inc

This file defines the connection parameters to the Mongo database. Since this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
| Parameter | Description | Default value |
|---|---|---|
| docker_mode | Indicates whether the installation uses docker containers. In this case, the Mongo DB IP address will be different from 127.0.0.1. | 1 |
| uritarget | The Mongo DB IP address | mmdt-db (docker_mode=1) or 127.0.0.1 (docker_mode=0) |
| database | Name of the MongoDB database | pgd-db |
| collection | Name of the MongoDB collection | metadata |
| port | Port of the MongoDB server | 27017 |
| username | Username of the Mongo database pgd-db with Read access only | userr-pgd |
| password | Password corresponding to the username of the Mongo DB pgd-db | rrrrr |

inc/config/config.inc

This file defines parameters related to i) the web interface and ii) the functionalities allowed for users. Only the parameters that may usefully be changed for the needs of an instance are described here.
| Parameter | Description | Default value |
|---|---|---|
| EXTERN | Indicates if the use of the tool is only for external use, i.e. without using a storage space. | 0 |
| PRIVATE_ACCESS | Gives the possibility of managing private access to metadata | 0 |
| ZOOMWP | Zoom level of the web interface. By reducing the size slightly, you get a better layout. | 90% |
| RESMEDIA | Gives the possibility of putting a MIME type on each resource in the metadata. | 1 |
| TITLE | Title to display in the main banner | Metadata management |
| FILEBROWSER | Indicates whether the file browser is used. This assumes it is installed. | 0 |
| URL_FILEBROWSER | File browser URL. It can be absolute or relative. | /fb/ |
| APPNAME | Name given in the URL to access the web interface. | maggot |
| dataverse_urls | Array of Dataverse repository URLs where you can upload metadata and data | - |
| zenodo_urls | Array of Zenodo repository URLs where you can upload metadata and data | - |
| SERVER_URL | Default Dataverse repository URL | https://entrepot.recherche.data.gouv.fr |
| ZENODO_SERVER_URL | Default Zenodo repository URL | https://zenodo.org |
| export_dataverse | Indicates whether the Dataverse feature is enabled | 1 |
| export_zenodo | Indicates whether the Zenodo feature is enabled | 1 |
| export_jsonld | Indicates whether the JSON-LD feature is enabled | 1 |
| export_oai | Indicates whether the OAI-PMH feature is enabled | 0 |
| export_bloxberg | Indicates whether the Bloxberg Blockchain feature is enabled (Experimental) | 0 |
| cvdir | Relative path of the Controlled Vocabulary lists (cvlist) | cvlist/ |
| maggot_fulltitle | Maggot name of the field corresponding to the title in Dataverse/Zenodo | fulltitle |
| auth_senddata_file | Name of the file that must be present in the data directory to authorize the transfer of the data file | META_datafile_ok.txt |
| private_auth_file | Name of the private access file | META_auth.txt |
| sendMail | Configuration of messaging for sending metadata to data managers (see below) | NULL |

The messaging configuration is done using the following array in the inc/config/config.inc file (or, more judiciously, in inc/config/local.inc in order to be preserved during an update). To understand how it works, see Send Emails using PHPmailer.
```
$sendMail['smtpHost'] = 'smtp.example.org';     // Set the SMTP server to send through
$sendMail['smtpSecure'] = 'tls';                // Enable TLS encryption
$sendMail['smtpPort'] = 587;                    // Set the TCP port to connect to
$sendMail['CheckEmail'] = 'maggot@exemple.org'; // Email address authorized to send emails
$sendMail['CheckPass'] = 'password';            // The corresponding password
$sendMail['CheckName'] = 'Maggot';              // Alias name
$sendMail['UserEmail'] = 'admin@exemple.org';   // Email of data managers, separated by a comma
```
"},{"location":"settings/#run","title":"run","text":"This file contains the essential parameters to be set before any use.
| Parameter | Description | Default value |
|---|---|---|
| WEB_PORT | Local HTTP Port for the web application | 8087 |
| DATADIR | Path to the data | /opt/data/ |
| DB_IMAGE | Docker image name of the MongoDB | pgd-mmdt-db |
| SCAN_IMAGE | Docker image name of the Scan process | pgd-mmdt-scan |
| WEB_IMAGE | Docker image name of the Web interface | pgd-mmdt-web |
| DB_CONTAINER | Docker container name of the MongoDB | mmdt-db |
| SCAN_CONTAINER | Docker container name of the Scan process | mmdt-scan |
| WEB_CONTAINER | Docker container name of the Web interface | mmdt-web |
| MONGO_VOL | Volume name for MongoDB | mmdt-mongodb |
| MONGO_PORT | HTTP Port for MongoDB | 27017 |
| USER | Admin user in the htpasswd file | admin |

Difference Between Data Manager and Data Steward (chat generated by chatGPT)

Data Manager

Responsibilities:
Data Operations and Management:
Data Quality and Integrity:
Data Policies and Procedures:
Technical Oversight:
Performance Monitoring:
Skills:
Responsibilities:
Data Governance:
Data Quality Assurance:
Data Access and Security:
Liaison and Coordination:
Training and Support:
Skills:
Focus:
Scope of Work:
Technical vs. Governance:
Interaction with Stakeholders:
Data Generation:
Data Quality Assurance:
Data Documentation:
Adherence to Standards:
Data Security and Privacy:
Collaboration:
Foundation of Data Assets:
Enhancing Data Quality:
Supporting Data Governance:
Facilitating Data Integration:
Enabling Timely Insights:
Data Managers:
Data Stewards:
Data Consumers:
Time Constraints:
Lack of Expertise:
Inconsistent Practices:
Standardized Templates and Guidelines:
Training and Education:
Automated Documentation Tools:
Data Steward Support:
Integration into Workflow:
Incentives and Accountability:
Feedback Mechanism:
Documentation Checkpoints:
Collaboration Platforms:
Clear Roles and Responsibilities:
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat2/","title":"chatGPT","text":""},{"location":"chats/chat2/#how-can-we-properly-respect-the-data-life-cycle-in-scientific-research-in-order-to-carry-out-good-data-management","title":"How can we properly respect the data life cycle in scientific research in order to carry out good data management?","text":"Properly respecting the data life cycle in scientific research is crucial for effective data management. The data life cycle encompasses the stages through which data passes, from its initial generation to its eventual archiving and potential reuse. Here\u2019s a comprehensive approach to managing the data life cycle in scientific research:
"},{"location":"chats/chat2/#1-planning","title":"1. Planning","text":"By adhering to these guidelines, researchers can ensure robust data management practices that enhance the reliability, reproducibility, and impact of scientific research.
"},{"location":"chats/chat2/#at-which-stages-of-the-data-life-cycle-is-it-most-crucial-to-document-data","title":"At which stages of the data life cycle is it most crucial to document data?","text":"Documentation is crucial at all stages of the data life cycle, but there are specific stages where it is particularly essential to ensure data integrity, reproducibility, and usability. Here are the stages where documentation is most critical:
"},{"location":"chats/chat2/#1-planning_1","title":"1. Planning","text":"By thoroughly documenting at these crucial stages, researchers can enhance the transparency, reproducibility, and overall quality of their data management practices.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat3/","title":"chatGPT","text":""},{"location":"chats/chat3/#what-are-the-steps-to-building-our-own-controlled-vocabulary","title":"What are the steps to building our own controlled vocabulary ?","text":"Creating a controlled vocabulary is a systematic process that involves several key steps. Here\u2019s a detailed guide to help you build your own controlled vocabulary:
"},{"location":"chats/chat3/#1-define-the-purpose-and-scope","title":"1. Define the Purpose and Scope","text":"By following these steps, you can develop a controlled vocabulary that improves information retrieval, enhances data consistency, and supports effective communication within your organization.
"},{"location":"chats/chat3/#what-are-the-differences-between-thesauri-ontologies-or-simple-vocabulary-dictionaries-in-which-cases-should-each-of-them-be-favored","title":"What are the differences between thesauri, ontologies or simple vocabulary dictionaries? In which cases should each of them be favored?","text":"Thesauri, ontologies, and simple vocabulary dictionaries are all tools for organizing and managing information, but they differ in complexity, structure, and use cases. Here\u2019s a breakdown of their differences and when to use each:
"},{"location":"chats/chat3/#thesauri","title":"Thesauri","text":"Definition: - A thesaurus is a controlled vocabulary that organizes terms into a structured hierarchy, showing relationships such as synonyms, antonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Structure: - Synonyms: Provides alternative words with similar meanings. - Hierarchical Relationships: Shows broader and narrower terms. - Associative Relationships: Connects related terms that are not hierarchically linked.
Use Cases: - Information Retrieval: Enhances search capabilities by providing alternative terms and related concepts. - Indexing and Cataloging: Standardizes terms for consistent categorization of information. - Library and Archival Systems: Organizes collections for easier access and discovery.
When to Use: - When you need to manage a large set of terms with hierarchical and associative relationships. - When you want to improve search functionality by including synonyms and related terms.
"},{"location":"chats/chat3/#ontologies","title":"Ontologies","text":"Definition: - An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It includes not just the terms, but also the rules and constraints that govern their use.
Structure: - Concepts: Detailed definitions of domain-specific concepts. - Relationships: Various types of relationships between concepts, including hierarchical, associative, and more complex logical relationships. - Attributes: Properties and characteristics of concepts. - Rules and Constraints: Logical rules that define how concepts can interact.
Use Cases: - Semantic Web and AI: Facilitates data integration, sharing, and interoperability in web and AI applications. - Knowledge Representation: Provides a detailed and formalized structure for representing domain knowledge. - Complex Data Integration: Integrates heterogeneous data sources by providing a common framework.
When to Use: - When you need a highly detailed and formal representation of knowledge with complex relationships. - When integrating diverse data sources and ensuring semantic interoperability is crucial. - For advanced applications in artificial intelligence, natural language processing, and the Semantic Web.
"},{"location":"chats/chat3/#simple-vocabulary-dictionaries","title":"Simple Vocabulary Dictionaries","text":"Definition: - A simple vocabulary dictionary is a basic list of terms and their definitions, without structured relationships between terms.
Structure: - Terms: A list of words or phrases. - Definitions: Brief explanations or meanings of each term.
Use Cases: - Glossaries: Provides quick reference for understanding terms and definitions. - Basic Cataloging: Lists terms for simple categorization without the need for complex relationships. - Learning and Education: Helps learners understand basic concepts and terminology.
When to Use: - When you need a straightforward reference tool for terms and their definitions. - For simple documentation where relationships between terms are not necessary. - When building foundational knowledge without the need for detailed hierarchical or associative structures.
"},{"location":"chats/chat3/#summary","title":"Summary","text":"Choosing the right tool depends on your specific needs regarding the complexity of the relationships between terms, the level of detail required, and the intended application.
"},{"location":"chats/chat3/#concerning-descriptive-metadata-can-we-envisage-a-gradual-evolution-in-the-use-of-simple-dictionaries-then-moving-to-thesauri-or-even-ontologies","title":"Concerning descriptive metadata, can we envisage a gradual evolution in the use of simple dictionaries, then moving to thesauri, or even ontologies?","text":"Yes, it is both practical and beneficial to envisage a gradual evolution in the use of descriptive metadata, starting with simple dictionaries and progressing to thesauri and ontologies as the needs and complexity of your data management grow. Here\u2019s how you can approach this evolutionary path:
"},{"location":"chats/chat3/#stage-1-simple-vocabulary-dictionaries","title":"Stage 1: Simple Vocabulary Dictionaries","text":"Initial Implementation: - Scope: Begin with a simple vocabulary dictionary to establish basic terminology and definitions. - Purpose: Ensure consistency in the use of terms across your organization or system. - Structure: Create a straightforward list of terms with clear definitions.
Benefits: - Ease of Use: Simple to create and manage. - Foundation: Provides a foundational understanding of key terms. - Quick Implementation: Allows for immediate improvement in consistency and clarity of metadata.
Use Cases: - Early Project Stages: When starting a new project or system where the primary goal is to standardize terminology. - Training and Education: Helps new team members or users quickly understand the basic terms and definitions.
"},{"location":"chats/chat3/#stage-2-thesauri","title":"Stage 2: Thesauri","text":"Intermediate Implementation: - Scope: Expand your vocabulary to include hierarchical and associative relationships. - Purpose: Improve information retrieval and organization by showing broader, narrower, and related terms. - Structure: Develop a thesaurus that includes synonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Benefits: - Enhanced Search: Improves search accuracy by including synonyms and related concepts. - Better Organization: Provides a more structured and interconnected view of your terms. - Scalability: Allows for the gradual addition of new terms and relationships.
Use Cases: - Growing Projects: As your project or system grows and the need for more detailed metadata relationships emerges. - Information Retrieval Systems: When enhancing search functionalities and user experience is a priority.
"},{"location":"chats/chat3/#stage-3-ontologies","title":"Stage 3: Ontologies","text":"Advanced Implementation: - Scope: Develop a comprehensive and formal representation of your domain knowledge. - Purpose: Facilitate complex data integration, interoperability, and advanced reasoning. - Structure: Create an ontology with detailed concepts, relationships, properties, and logical rules.
Benefits: - Semantic Interoperability: Ensures data consistency and integration across different systems and platforms. - Advanced Reasoning: Supports sophisticated queries and inference capabilities. - Detailed Knowledge Representation: Provides a rich and nuanced understanding of domain concepts and their interrelationships.
Use Cases: - Complex Systems: For projects requiring detailed and formal knowledge representation, such as in AI, machine learning, and the Semantic Web. - Interoperable Data Systems: When integrating diverse data sources and ensuring seamless interoperability is essential.
"},{"location":"chats/chat3/#evolution-strategy","title":"Evolution Strategy","text":"Ensure all team members use consistent terminology.
Thesaurus:
Improve search functionality by adding synonyms and related terms.
Ontology:
By following this gradual evolution, you can systematically enhance your metadata management capabilities, ensuring your system remains adaptable and meets the growing complexity of your data and user needs.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat4/","title":"chatGPT","text":""},{"location":"chats/chat4/#what-is-metadata-crosswalk","title":"What is metadata crosswalk?","text":"A metadata crosswalk is a structured mapping between the metadata elements of two or more different systems or standards. Metadata, in this context, refers to descriptive information about data or resources, such as titles, authors, dates, keywords, etc.
When different systems or standards are used to manage data or resources, they often have their own unique metadata schemas or formats. A metadata crosswalk helps to bridge the gap between these different schemas by establishing equivalences or relationships between the metadata elements used in each system.
For example, if you have a digital library system that uses one set of metadata standards and you want to integrate it with a repository that uses a different set of standards, you might create a metadata crosswalk to map the elements from one standard to the other, ensuring interoperability and consistency in metadata representation.
Metadata crosswalks are particularly useful in data integration, migration, and interoperability scenarios, enabling smooth communication and exchange of information between disparate systems or standards.
"},{"location":"chats/chat4/#whats-the-point","title":"What's the point ?","text":"The primary purpose of a metadata crosswalk is to facilitate interoperability and data exchange between systems or standards that use different metadata schemas. Here are some specific points highlighting its importance:
Interoperability: Different organizations or systems may use varying metadata standards or schemas to describe their data or resources. By creating a metadata crosswalk, you ensure that these systems can communicate with each other effectively, exchanging information without loss of meaning.
Data Integration: When consolidating data from multiple sources or systems, a metadata crosswalk helps ensure that the metadata associated with each dataset is properly understood and mapped to a common format. This facilitates the integration process and allows for more efficient analysis and management of the combined datasets.
Standardization: By mapping metadata elements between different standards or schemas, a crosswalk promotes standardization and consistency in how data is described and represented. This consistency is essential for ensuring accurate interpretation and retrieval of information across different systems or platforms.
Migration and Upgrades: During system migrations or upgrades, metadata formats may change. A metadata crosswalk can aid in transitioning from one metadata schema to another by providing a clear mapping between the old and new formats, thus minimizing disruptions and ensuring continuity in data management practices.
Data Sharing and Reuse: When sharing data with external parties or making it available for reuse, a metadata crosswalk helps ensure that the metadata associated with the data is understandable and interpretable by others. This promotes data sharing and reuse, as users can easily understand the content and context of the data they are accessing.
Overall, the point of a metadata crosswalk is to promote seamless communication, integration, and interoperability between different systems or standards, ultimately enhancing the usability and accessibility of data across diverse environments.
"},{"location":"chats/chat4/#can-you-suggest-some-interesting-links-on-the-subject-related-to-life-science","title":"Can you suggest some interesting links on the subject related to life science ?","text":"Certainly! Here are some interesting links related to metadata crosswalks in the context of life sciences:
NCBI Metadata Crosswalk Tool:
Metadata Standards and Tools for Life Sciences:
FAIRsharing:
NIH Common Data Elements (CDE) Repository:
These resources should provide you with a good starting point for exploring metadata standards and tools relevant to life sciences research.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"definitions/","title":"Definition Files","text":""},{"location":"definitions/#metadata-definition-files","title":"Metadata definition files","text":"The Maggot tool offers great flexibility in configuration. It allows you to completely choose all the metadata you want to describe your data. You can base yourself on an existing metadata schema, invent your own schema or, more pragmatically, mix one or more schemas by introducing some metadata specific to your field of application. However, keep in mind that if you want to add descriptive metadata to your data then a certain amount of information is expected. But a completely different use of the tool is possible, it's up to you.
There are two levels of definition files, as shown in the figure below:
1 - The first level concerns the definition of terminology (metadata) similar to a descriptive metadata plan. Clearly, this category is more akin to configuration files. They represent the heart of the application around which everything else is based. The input and search interfaces are completely generated from these definition files (especially the web/conf/config_terms.txt file), thus defining each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms). This is why a configuration step is essential in order to be able to configure all the other modules.
2 - The second level concerns the definitions of the mapping to a differently structured metadata schema (metadata crosswalk, i.e. a specification for mapping one metadata standard to another), used either i) for metadata export to a remote repository (e.g. Dataverse, Zenodo) or ii) for metadata harvesting (e.g. JSON-LD, OAI-PMH). Simply place the definition files in the configuration directory (web/conf) for them to be taken into account, provided you have adjusted the configuration (see Settings).
All definition files are made using a simple spreadsheet then exported in TSV format.
The list of definition files in Maggot is given below. All must be put under the directory web/conf.
See an example online: https://pmb-bordeaux.fr/maggot/config/view, and the corresponding form based on these definition files.
"},{"location":"definitions/config_terms/","title":"Terminlogy Definition","text":""},{"location":"definitions/config_terms/#example-of-a-terminlogy-definition-file","title":"Example of a Terminlogy Definition file","text":"Field Section Required Search ShortView Type features Label Predefined terms title definition Y N 1 textbox width=350px Short name fulltitle definition Y Y 2 textbox Full title subject definition Y Y checkbox open=0 Subject Agricultural Sciences,Arts and Humanities,Astronomy and Astrophysics,Business and Management,Chemistry,Computer and Information Science,Earth and Environmental Sciences,Engineering,Law,Mathematical Sciences,Medicine Health and Life Sciences,Physics,Social Sciences,Other description definition Y Y areabox rows=6,cols=30 Description of the dataset note definition N Y areabox rows=4,cols=30 Notes status status N Y 3 dropbox width=350px Status of the dataset Processed,In progress,Unprocessed access_rights status N Y 4 dropbox width=350px Access rights to data Public,Mixte,Private language status N Y checkbox open=0 Language Czech,Danish,Dutch,English,Finnish,French,German,Greek,Hungarian,Icelandic,Italian,Lithuanian,Norwegian,Romanian,Slovenian,Spanish,Swedish lifeCycleStep status N Y multiselect autocomplete=lifecycle,min=1 Life cycle step license status N Y textbox autocomplete=license,min=1 License datestart status N Y datebox width=350px Start of collection dateend status N Y datebox width=350px End of collection dmpid status N Y textbox DMP identifier contacts management Y Y multiselect autocomplete=people,min=1 Contacts authors management Y Y multiselect autocomplete=people,min=1 Authors collectors management N Y multiselect autocomplete=people,min=1 Data collectors curators management N Y multiselect autocomplete=people,min=1 Data curators members management N Y multiselect autocomplete=people,min=1 Project members leader management N Y multiselect autocomplete=people,min=1 Project leader wpleader management N Y multiselect autocomplete=people,min=1 WP leader depositor management N Y textbox Depositor producer management N Y multiselect autocomplete=producer,min=1 Producer grantNumbers management N Y multiselect autocomplete=grant,min=1 Grant Information kindOfData descriptors Y Y checkbox open=0 Kind of Data Audiovisual,Collection,Dataset,Event,Image,Interactive Resource,Model,Physical Object,Service,Software,Sound,Text,Workflow,Other keywords descriptors N Y multiselect autocomplete=bioportal,onto=EFO:JERM:EDAM:MS:NMR:NCIT:OBI:PO:PTO:AGRO:ECOCORE:IOBC:NCBITAXON Keywords topics descriptors N Y multiselect autocomplete=VOvocab Topic Classification dataOrigin descriptors N Y checkbox open=0 Data origin observational data,experimental data,survey data,analysis data,text corpus,simulation data,aggregate data,audiovisual corpus,computer code,Other experimentfactor descriptors N Y multiselect autocomplete=vocabulary,min=1 Experimental Factor measurement descriptors N Y multiselect autocomplete=vocabulary,min=1 Measurement type technology descriptors N Y multiselect autocomplete=vocabulary,min=1 Technology type publication_citation descriptors N Y areabox rows=5,cols=30 Publication - Citation publication_idtype descriptors N Y dropbox width=200px Publication - ID Type -,ark,arXiv,bibcode,doi,ean13,eissn,handle,isbn,issn,istc,lissn,lsid,pmid,purl,upc,url,urn publication_idnumber descriptors N Y textbox width=400px Publication - ID Number publication_url descriptors N Y textbox Publication - URL comment other N Y areabox rows=15, cols=30 Additional 
information"},{"location":"definitions/dataverse/","title":"Dataverse Definition File","text":"Open source research data repository software, approved by Europe.
"},{"location":"definitions/dataverse/#dataverse-definition-file_1","title":"Dataverse definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Dataverse. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Dataverse, knowing that this metadata crosswalk was made possible by choosing the right metadata schema at upstream.
Since the structure of the Dataverse JSON output file is known internally, only a minimum of information is necessary to carry out the correspondence.
The file must have 4 columns with headers defined as follows:
Below is an example of a Dataverse definition file (TSV)
Example of a Dataverse JSON file generated from the definition file given as an example above.
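For orientation, a Dataverse dataset JSON nests the mapped fields under datasetVersion/metadataBlocks; the minimal sketch below only illustrates this general shape and is not the exact output produced by Maggot:

```json
{
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "displayName": "Citation Metadata",
        "fields": [
          { "typeName": "title", "typeClass": "primitive", "multiple": false,
            "value": "Example dataset described with Maggot" }
        ]
      }
    }
  }
}
```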
JSON-LD Definition File

This definition file will allow harvesters to collect structured metadata based on a semantic schema, i.e. the fields themselves, and not just their content, can be associated with a semantic definition (an ontology, for example), which then facilitates the link between the metadata and therefore the data (JSON-LD). The chosen semantic schema is based on several metadata schemas.
The full workflow to "climb the Link Open Data mountain" is summarized in the figure below:
Metadata schemas used to build the model proposed by default:
Definition of the JSON-LD context using the metadata schemas proposed by default
The structure of the JSON-LD output is not known internally, so information on the structure is necessary to carry out the correspondence.
Example of JSON-LD definition file (partial) using the metadata schemas proposed by default (TSV)
Example of a JSON-LD file generated from the definition file given as an example above.
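To give an idea of the target format, a harvested JSON-LD record typically declares a @context mapping the exported fields to a semantic schema. The sketch below uses schema.org purely as an illustration; the default Maggot context actually combines several metadata schemas, as mentioned above:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example dataset described with Maggot",
  "description": "Short description of the dataset.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "creator": [ { "@type": "Person", "name": "Jane Doe" } ],
  "keywords": [ "metabolomics" ]
}
```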
The mapping file is used, as its name indicates, to match a term chosen by the user during entry with a term from an ontology or a thesaurus, and thus to obtain a URL that will be used for referencing. It can be used for each metadata crosswalk requiring such a mapping (e.g. to the Dataverse, Zenodo or JSON-LD format).
The role of this definition file is illustrated in the figure above.
The file must have 5 columns with headers defined as follows:
Below is an example of a Mapping definition file (TSV)
"},{"location":"definitions/oai-pmh/","title":"OAI-PMH Definition File","text":"
OAI-PMH is a protocol developed for harvesting metadata descriptions of records in an archive so that services can be built using metadata from many archives.
"},{"location":"definitions/oai-pmh/#oai-pmh-definition-file_1","title":"OAI-PMH definition File","text":"This definition file will allow harvesters to collect metadata structured according to a standard schema (OAI-DC).
Based on the Open Archives Initiative Protocol for Metadata Harvesting - Version 2
Example of an OAI-PMH Data Provider Validation
Example of OAI-PMH output for a dataset
Since the structure of the OAI-PMH output file is known internally, only a minimum of information is necessary to carry out the correspondence.
Example of OAI-PMH definition file (TSV)
Another example of an OAI-PMH definition file (TSV) with identifiers & vocabulary mapping
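For reference, harvesting then boils down to standard OAI-PMH requests against the Maggot endpoint. The endpoint path below is an assumption; adapt it to your own instance:

```bash
# identify the data provider (hypothetical endpoint path)
curl "http://192.168.56.2:8087/oai?verb=Identify"

# harvest all records in the standard OAI-DC schema
curl "http://192.168.56.2:8087/oai?verb=ListRecords&metadataPrefix=oai_dc"
```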
"},{"location":"definitions/terminology/","title":"Terminology","text":""},{"location":"definitions/terminology/#definition-of-terminology","title":"Definition of terminology","text":"There are two definition files to set up.
Each time there is a change in these two definition files, it is necessary to convert them so that they are taken into account by the application.
Terminology is the set of terms used to define the metadata of a dataset. A single file (web/conf/config_terms.txt) contains all the terminology. The input and search interfaces (e.g. screenshot) are completely generated from this definition file, thus defining i) each of the fields and their input type (checkbox, dropbox, textbox, ...) and ii) the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms).
The metadata schema proposed by default is mainly established according to the DDI (Data Documentation Initiative) schema, which also largely corresponds to that adopted by the Dataverse software.
Terminology is organised in several sections. By default 6 sections are proposed, but you can redefine them as you wish:
For each section, fields are then defined. These fields can be defined according to the way they will be entered via the web interface. There are 6 different types of input: check boxes (checkbox), drop-down lists (dropbox), single-line text boxes (textbox), single-line text boxes with an additional box for multiple selection from a catalog of terms (multiselect), date pickers (datebox) and multi-line text boxes (areabox).
For two types (checkbox and dropbox), it is possible to define the values to be selected (predefined terms).
"},{"location":"definitions/terminology/#structure-of-the-terminology-definition-file-tsv","title":"Structure of the Terminology definition file (TSV)","text":"The file must have 9 columns with headers defined as follows:
Column 9 - Predefined terms: for fields defined with a type equal to checkbox or dropbox, a list of terms separated by commas can be given.
Notes
Below is an example of a Terminology definition file (TSV)
Example of Maggot JSON file generated based on the same definition file
The documentation definition file is used to provide online help for each field (a small icon placed next to each label on the form). It should therefore only be modified when a field is added, deleted, or moved to another section. This file is then used to generate the online metadata documentation according to the figure below (see Configuration to find out how to carry out this transformation).
The file must have 3 columns with headers defined as follows:
Below is an example of a Terminology documentation file (TSV).
Same example as above converted to HTML using the Markdown format.
1 - Vocabulary based on a list of terms fixed in advance (checkbox with feature open=0)
2 - Vocabulary open for addition (checkbox with feature open=1)
3 - Vocabulary based on a web API in a text field (textbox)
4 - Vocabulary based on a dictionary with multiple selection (multiselect)
5 - Vocabulary based on a SKOSMOS Thesaurus with multiple selection (multiselect)
6 - Vocabulary based on an OntoPortal with multiple selection (multiselect)
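To give an idea of how autocompletion against such vocabulary servers typically works, here is a hedged sketch of a term search against a SKOSMOS REST endpoint. The base URL and vocabulary identifier are placeholders, the exact API of your own server may differ, and the snippet assumes the requests package is installed.

import requests

# Placeholder SKOSMOS server and vocabulary id: adapt to your own thesaurus server.
SKOSMOS_BASE = "https://skosmos.example.org/rest/v1"

def search_terms(query, vocab="myvocab", lang="en"):
    # SKOSMOS exposes a /search endpoint returning candidate concepts with labels and URIs.
    r = requests.get(
        f"{SKOSMOS_BASE}/search",
        params={"query": query + "*", "vocab": vocab, "lang": lang},
        timeout=30,
    )
    r.raise_for_status()
    return [(hit.get("prefLabel"), hit.get("uri")) for hit in r.json().get("results", [])]

# Example: suggestions for a user typing "soil" in a multiselect field.
# print(search_terms("soil"))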
"},{"location":"definitions/zenodo/","title":"Zenodo Definition File","text":"
Open-source research data repository software, backed by the European Commission.
"},{"location":"definitions/zenodo/#zenodo-definition-file_1","title":"Zenodo definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Zenodo. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Zenodo.
Since the structure of the Zenodo JSON output file is not known internally, information on the structure is therefore necessary to carry out the correspondence.
Below is an example of a Zenodo definition file (TSV).
Example of a Zenodo JSON file generated from the definition file given as an example above.
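As a rough illustration of the target format of such a crosswalk, the sketch below builds a minimal Zenodo-style deposit metadata payload. The target field names (title, upload_type, description, creators) follow Zenodo's public deposit metadata, but the Maggot-side field names and the mapping itself are hypothetical and not the actual Maggot definition.

import json

# Hypothetical Maggot-side metadata (field names are illustrative).
maggot_metadata = {
    "fulltitle": "Example dataset title",
    "description": "Short description of the dataset.",
    "authors": [{"name": "Doe, Jane", "affiliation": "Example Institute"}],
}

# Minimal Zenodo-style deposit metadata built from it.
zenodo_payload = {
    "metadata": {
        "title": maggot_metadata["fulltitle"],
        "upload_type": "dataset",
        "description": maggot_metadata["description"],
        "creators": maggot_metadata["authors"],
    }
}

print(json.dumps(zenodo_payload, indent=2))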
Using an approach that might be called "machine-readable metadata", it is possible to push the metadata of a dataset to one of the supported data repositories via its web API, provided that you have taken care to define your metadata schema correctly so that a correspondence with the chosen data repository can be made using a mapping definition file.
The principle is illustrated by the figure above.
1 - To submit metadata to a Dataverse repository, you must first select a dataset either from the drop-down list corresponding to the datasets listed on the data storage space or a metadata file from your local disk.
2 - You then need to connect to the repository in order to retrieve the key (the API token) authorizing you to submit the dataset. This obviously assumes that you have the privileges (creation/modification rights) to do so.
3 - After choosing the repository URL, you must also specify on which dataverse collection you want to deposit the datasets. As previously, you must have write rights to this dataverse collection.
If you also want to deposit data files at the same time as the metadata, you will need:
1 - declare the files to be deposited in the resources; these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt); its mere presence, independently of its content (which may be empty), will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts on Maggot.
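For the record, creating a dataset through the Dataverse native API essentially boils down to a single authenticated POST, which is what happens behind the scenes when you click Publish. The sketch below assumes you already hold the Dataverse-formatted JSON exported by Maggot; the server URL, collection alias, API token and file name are placeholders.

import json
import requests

SERVER_URL = "https://demo.dataverse.org"   # placeholder repository URL
COLLECTION = "mycollection"                 # placeholder dataverse collection alias
API_TOKEN = "xxxx-xxxx"                     # placeholder API token

# Dataset JSON previously exported from Maggot in Dataverse format (placeholder path).
with open("dataset_dataverse.json", encoding="utf-8") as f:
    dataset_json = json.load(f)

# Dataverse native API: create a dataset inside a collection.
r = requests.post(
    f"{SERVER_URL}/api/dataverses/{COLLECTION}/datasets",
    headers={"X-Dataverse-key": API_TOKEN},
    json=dataset_json,
    timeout=60,
)
r.raise_for_status()
print(r.json())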
1 - To submit metadata to a Zenodo repository, you must first select a dataset either from the drop-down list corresponding to the datasets listed on the data storage space or a metadata file from your local disk.
2 - Unless you have previously saved your API token, you must create a new one and copy and paste it before validating it. Before validating, you must check the deposit:access and deposit:write boxes in order to obtain creation and modification rights with this token.
3 - After choosing the repository URL, you can optionally choose a community to which the dataset will be linked. By default, you can leave this field empty.
"},{"location":"publish/zenodo/#deposit-data-files","title":"Deposit data files","text":"
If you also want to deposit data files at the same time as the metadata, you will need to (see figure below):
1 - declare the files to be deposited in the resources (1); these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt) (2); its mere presence, independently of its content, will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts on Maggot.
Then, all you have to do is click on 'Publish' to \"push\" the metadata and data to the repository (3).
After submission and if everything went well, a link to the deposit will be given to you (4).
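To make steps (1) to (4) more tangible, here is a hedged sketch of the underlying Zenodo REST calls (create a deposition, attach a file, publish) that Maggot performs for you through its interface. The token and file names are placeholders, the snippet assumes the requests package is installed, and in practice the deposit metadata would be set before publishing.

import requests

ZENODO_URL = "https://zenodo.org"   # or another Zenodo-based instance
TOKEN = "xxxx-xxxx"                 # placeholder API token with deposit scopes
params = {"access_token": TOKEN}

# 1. Create an empty deposition.
r = requests.post(f"{ZENODO_URL}/api/deposit/depositions", params=params, json={}, timeout=60)
r.raise_for_status()
deposition = r.json()

# 2. Upload a data file to the deposition's bucket (placeholder file name).
with open("mydata.csv", "rb") as fp:
    requests.put(f"{deposition['links']['bucket']}/mydata.csv",
                 data=fp, params=params, timeout=600).raise_for_status()

# 3. Publish the deposition (metadata would normally be set beforehand via PUT).
requests.post(f"{ZENODO_URL}/api/deposit/depositions/{deposition['id']}/actions/publish",
              params=params, timeout=60).raise_for_status()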
"},{"location":"tutorial/","title":"Quick tutorial","text":""},{"location":"tutorial/#quick-tutorial_1","title":"Quick tutorial","text":"
This is a quick tutorial on how to use the Maggot tool in practice; it is therefore aimed primarily at the end user.
See a short Presentation and Poster if you want to have a more general overview of the tool.
"},{"location":"tutorial/#overview","title":"Overview","text":"The Maggot tool is made up of several modules, all accessible from the main page by clicking on the corresponding part of the image as shown in the figure below:
Configuration
This module mainly concerns the data manager and makes it possible to construct all the terminology definition files, i.e. the metadata and sources of associated vocabularies. See Definition files then Configuration.
Private Access
This module allows the data producer to temporarily protect access to metadata for the time necessary before sharing it within their collective. See Private access key management.
Dictionaries
This module allows the data producer to view the content of all dictionaries. It also allows the data steward to edit their content. See Dictionaries for technical details only.
Metadata Entry
This is the main module allowing the data producer to enter the metadata relating to a dataset. See the corresponding tutorial for Metadata Entry.
Search datasets
This module allows users to search for datasets based on the associated metadata, to see all the metadata and possibly to have access to the data itself. This obviously assumes that the metadata files have been deposited in the correct directory in the storage space dedicated to data management within your collective. See Infrastructure.
File Browser
This module gives users access to a file browser, provided that the data manager has installed it. See File Browser.
Publication
This module allows either the data producer or the data steward to publish the metadata, possibly with the corresponding data, to a suitable data repository. See Publication.
"},{"location":"tutorial/describe/","title":"Quick tutorial","text":""},{"location":"tutorial/describe/#metadata-entry","title":"Metadata Entry","text":"The figures are given here for illustration purposes but certain elements may be different for you given that this will depend on the configuration on your instance, in particular the choice of metadata, and the associated vocabulary sources.
Indeed, the choice of vocabulary sources (ontologies, thesauri, dictionaries) as well as the choice of metadata fields to enter must in principle have been the subject of discussion between data producers and data manager during the implementation of the Maggot tool in order to find the best compromise between the choice of sources and all the scientific fields targeted (see Definition files). However a later addition is always possible.
"},{"location":"tutorial/describe/#overview","title":"Overview","text":"When you enter the metadata entry module you should see a page that looks like the figure below:
All the fields (metadata) to be filled in are distributed between several tabs, also called sections. Each section tries to group together a set of fields relating to the same topic.
You can reload a previously created metadata file. All form fields will then be initialized with the value(s) defined in the metadata file.
You must at least complete the mandatory fields marked with a red star.
It is possible to obtain help for each field to be completed. A mini-icon with a question mark is placed after each field label. By clicking on this icon, a web page opens with the focus on the definition of the corresponding field. This help should provide you with at least a definition of a field and, if necessary, instructions on how to fill it in. It should be noted that the quality of the documentation depends on each instance and its configuration.
Once the form has been completed, even partially (at least the fields which are mandatory and marked with a red star), you can export your metadata in the form of a file. See Metadata File.
Dictionary-based metadata (e.g. people's names) can easily be entered by autocomplete in the 'Search value' box provided the name appears in the corresponding dictionary.
However, if the name does not yet appear in the dictionary, simply enter the full name (first name & last name) in the main box, making sure to separate each name with a comma and then a space as shown in the figure below.
Then you can request to add the additional person name(s) to the dictionary later as described below:
From the home page, select \"Dictionaries\". As username, just put \"maggot\" (this might be different within your instance).
Then after choosing the \"people\" dictionary, you can download the entire dictionary in a TSV file (Tab-Separated Values) ready to be edited with your favorite spreadsheet.
Add all the desired people's names with their institution, and possibly their ORCID and their email address. Please note that emails are required for authors and contacts.
You will then just have to send it to the data manager so that they can add the new names to the online dictionary.
Please proceed in the same way for all dictionaries (people, funders, producer, vocabulary).
"},{"location":"tutorial/describe/#controlled-vocabulary","title":"Controlled Vocabulary","text":"Depending on the configuration of your instance, it is very likely that certain fields (eg. keywords) are connected to a controlled vocabulary source (e.g. ontology, thesaurus). Vocabulary based on ontologies, thesauri or even dictionaries can easily be entered by autocomplete in the \"search for a value\" box provided that the term exists in the corresponding vocabulary source.
If a term cannot be found by autocomplete, you can enter the term directly in the main box, making sure to separate each term with a comma and a space as shown in the figure below.
The data steward will later try to link it to a vocabulary source that may be suitable for the domain in question. Furthermore, even if the choice of vocabulary sources was made before the tool was put into service, a later addition is always possible. You should make the request to your data manager.
"},{"location":"tutorial/describe/#resources","title":"Resources","text":"Because data is often scattered across various platforms, databases, and file formats, this making it challenging to locate and access. This is called data fragmentation. So the Maggot tool allows you to specify resources, i.e. data in the broader sense, whether external or internal, allowing to centralize all links towards data.
Four fields must be filled in :
Resource Type : Choose the type of the resource in the droplist.
Media Type : Choose a media type if applicable by autocomplete.
Description : Provide a concise and accurate description of the resource. Must not exceed 30 characters.
Location : Preferably indicate a URL to an external resource accessible to all. But it can also be a password-protected resource (e.g. a disk space on the cloud). It can also be text clearly indicating where the resource is located (internal disk space). Finally, it can be the name of a file deposited on the same disk space as the metadata file, in order to be able to push it to the data repository at the same time as the metadata (see Publication).
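As a quick sanity check of these constraints, a minimal sketch is given below. The dictionary keys are illustrative, not Maggot's actual JSON keys; only the rules themselves (a resource type, an optional media type, a description of at most 30 characters and a location) come from the text above.

# Illustrative resource entry; keys are hypothetical.
resource = {
    "type": "Dataset",
    "media_type": "text/csv",
    "description": "Raw phenotyping data 2023",
    "location": "https://example.org/data/raw_2023.csv",
}

def check_resource(res):
    errors = []
    if not res.get("type"):
        errors.append("Resource Type is required")
    if len(res.get("description", "")) > 30:
        errors.append("Description must not exceed 30 characters")
    if not res.get("location"):
        errors.append("Location is required")
    return errors

print(check_resource(resource) or "resource looks valid")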
Once the form has been completed, even partially (at least the fields which are mandatory and marked with a red star), you can export your metadata in the form of a file. The file is in JSON format and must have the prefix 'META_'.
By clicking on the \"Generate the metadata file\" button, you can save it on your disk space.
Furthermore, if email sending has been configured (see settings), you have the possibility of sending the metadata file to the data managers for safekeeping, and possibly also so that they can deposit it on the data storage space if specific rights are required.
In case you want to save the metadata file on your disk space, you have two ways to use this file:
1. The first use is the recommended one because it allows metadata management within your collective. You drop the metadata file directly under the data directory corresponding to the metadata. Indeed, when installing the tool, a storage space dedicated to the tool had to be provided for this purpose. See infrastructure. Once deposited, you just have to wait around 30 minutes at most so that the tool has had time to scan the root of the data directories looking for new files in order to update the database. After this period, the description of your dataset will be visible from the interface, and search criteria can be selected in order to narrow the search.
You will then have the possibility to publish the metadata later with possibly the corresponding data in a data repository such as Dataverse or Zenodo.
2. The second use is only to deposit the metadata into a data repository. Whether with Dataverse or Zenodo, you have the possibility of publishing metadata directly in one or the other of these repositories without using the storage space.
Please note that you cannot also deposit the data files in this way. You will have to do this manually for each of them directly online in the repository.
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\\\s\\\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":"An ecosystem for sharing metadata
"},{"location":"#foster-good-data-management-with-data-sharing-in-mind","title":"Foster good data management, with data sharing in mind","text":"
Sharing descriptive Metadata is the first essential step towards Open Scientific Data. With this in mind, Maggot was specifically designed to annotate datasets by creating a metadata file to attach to the storage space. Indeed, it allows users to easily add descriptive metadata to datasets produced within a collective of people (research unit, platform, multi-partner project, etc.). This approach fits perfectly into a data management plan as it addresses the issues of data organization and documentation, data storage and frictionless metadata sharing within this same collective and beyond.
"},{"location":"#main-features-of-maggot","title":"Main features of Maggot","text":"The main functionalities of Maggot were established according to a well-defined need (See Background).
See a short Presentation and Poster for a quick overview.
"},{"location":"#overview-of-the-different-stages-of-metadata-management","title":"Overview of the different stages of metadata management","text":"Note: The step numbers indicated in the figure correspond to the different points developed below
1 - First you must define all the metadata that will be used to describe your datasets. All metadata can be defined using a single file (in TSV format, therefore using a spreadsheet). This is a unavoidable step because both input and search interfaces are completely generated from these definition files, defining in this way each of the fields along with their input type and also the associated Controlled Vocabulary (ontology, thesaurus, dictionary, list of fixed terms). The metadata proposed by default was mainly established according to the DDI (Data Documentation Initiative) metadata schema. This schema also largely corresponds to that adopted by the Dataverse software. See the Terminology Definition section.
2 - Entering metadata will be greatly facilitated by the use of dictionaries. The dictionaries offered by default are: people, funders, data producers, as well as a vocabulary dictionary allowing you to mix ontologies and thesauri from several sources. Each of these dictionaries allows users, by entering a name by autocompletion, to associate information which will then be added when exporting the metadata either to a remote repository, or for harvesting the metadata. Thus this information, once entered into a dictionary, will not need to be re-entered again.
3 - The web interface for entering metadata is entirely built on the basis of definition files. The metadata are distributed according to the different sections chosen, each constituting a tab (see screenshot). Mandatory fields are marked with a red star and must be documented in order to be able to generate the metadata file. The entry of metadata governed by a controlled vocabulary is done by autocompletion from term lists (dictionary, thesaurus or ontology). We can also define external resources (URL links) relating to documents, publications or other related data. Maggot thus becomes a hub for your datasets connecting different resources, local and external. Once the mandatory fields (at least) and other recommended fields (at best) have been entered, the metadata file can be generated in JSON format.
4 - The file generated in JSON format must be placed in the storage space reserved for this purpose. The role played by this metadata file can be seen as a README file adapted for machines, but also readable by humans. With an internal structure, it offers coherence and consistency of information that a simple README file with a completely free and therefore unstructured text format does not allow. Furthermore, the central idea is to use the storage space as a local data repository, so that the metadata should go to the data and not the other way around.
5 - A search of the datasets can thus be carried out on the basis of the metadata. Indeed, all the JSON metadata files are scanned and parsed according to a fixed time interval (30 min) then loaded into a database. This allows you to perform searches based on predefined metadata. The search form, in a compact shape, is almost the same as the entry form (see a screenshot). Depending on the search criteria, a list of data sets is provided, with for each of them a link pointing to the detailed sheet.
6 - The detailed metadata sheet provides all the metadata divided by section. Unfilled metadata does not appear by default. When a URL can be associated with information (ORCID, Ontology, web site, etc.), you can click on it to go to the corresponding link. Likewise, it is possible to follow the associated link on each of the resources. From this sheet, you can also export the metadata according to different schemata (Dataverse, Zenodo, JSON-LD). See screenshot 1 & screenshot 2.
7 - Finally, once you have decided to publish your metadata with your data, you can choose the repository that suits you (currently repositories based on Dataverse and Zenodo are supported).
"},{"location":"#additional-key-points","title":"Additional key points","text":"Being able to generate descriptive metadata from the start of a project or study without waiting for all the data to be acquired or processed, nor for the moment when one wish to publish data, thus respecting the research data lifecycle as best as possible. Read more.
The implementation of the tool requires involving all data stakeholders upstream (definition of the metadata schema, vocabularies, targeted data repositories, etc.); everyone has their role: data manager/data steward on one side but also scientists and data producers on the other. Read more.
A progressive rise towards an increasingly controlled and standardized vocabulary is not only possible but even encouraged. First we can start with a simple vocabulary dictionary used locally and grouping together domain vocabularies. Then we can consider the creation of a thesaurus with or without mapping to ontologies. The promotion of ontologies must also be done gradually by selecting those which are truly relevant for the collective. A tool like Maggot makes it easy to implement them (See Vocabulary). Read more.
Concerning the second idea: Given the diversity of the fields, the approach chosen is to be both the most flexible and the most pragmatic possible by allowing users to choose their own vocabulary (controlled or not) corresponding to the reality of their field and their activities. However, a good approach is as much as possible to use only controlled vocabulary, that is to say relevant and sufficient vocabulary used as a reference in the field concerned to allow users to describe a project and its context without having to add additional terms. To this end, the tool must allow users a progressive approach towards the adoption of standardized controlled vocabularies (thesauri or even ontologies).
With the approach proposed by Maggot, initially there is no question of opening the data, but of managing metadata associated with the data on a storage space with a precise perimeter represented by the collective (unit, team, project, platform, ...). The main characteristic of the tool is, above all, to "capture" the metadata as easily as possible according to a well-chosen metadata schema. However, the opening of data via their metadata must be a clearly stated objective within the framework of projects financed by public institutions (e.g. Europe). Therefore, if you have taken care to correctly define your metadata schema so that it is possible to make a metadata crosswalk (using a mapping file) with a data repository recognized by the international community, then you can easily "push" your metadata with the data without having to re-enter anything.
Daniel Jacob, Francois Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil (2024) Maggot: An ecosystem for sharing metadata within the web of FAIR Data, BioRxiv, https://doi.org/10.1101/2024.05.24.595703
"},{"location":"about/#contacts","title":"Contacts","text":"Daniel Jacob (INRAE UMR BFP) | CATI PROSODIe
Fran\u00e7ois Ehrenmann (INRAE UMR BioGECO) | CATI GEDEOP
Philippe Chaumeil (INRAE UMR BioGECO)
Edouard Guitton (INRAE Dept. SA, Emerg'IN)
St\u00e9phane Bernillon (INRAE UR MycSA)
Joseph TRAN (INRAE UMR EGFV) | CATI BARIC
"},{"location":"bloxberg/","title":"Bloxberg Blockchain","text":""},{"location":"bloxberg/#experimental-certification-of-metadata-file-on-the-bloxberg-blockchain","title":"EXPERIMENTAL - Certification of metadata file on the bloxberg blockchain","text":""},{"location":"bloxberg/#motivation","title":"Motivation","text":"
To guarantee the authenticity and integrity of a metadata file by recording it permanently and immutably on the bloxberg blockchain.
Indeed, blockchain is a technology that makes it possible to keep track of a set of transactions (writings in the chain) in a decentralized, secure and transparent manner. A blockchain can therefore be compared to a large (public or private) unfalsifiable register. Blockchain is used today in many fields because it provides solutions to many problems. For example, in the field of Higher Education and Research, registering dataset metadata in the blockchain makes it possible to certify, in an inalienable, irrefutable and completely transparent manner, the ownership and authenticity of the data as well as, for example, the license of use and the date of production of the data. Research stakeholders are then more open to the dissemination of their data (files, results, protocols, publications, etc.) since they know that, in particular, the ownership, content and conditions of use of the data cannot be altered.
The Maggot tool could thus serve as a gateway to certify your data with the associated metadata. The complete process is schematized in the following figure:
"},{"location":"bloxberg/#about-bloxberg","title":"About bloxberg","text":"
bloxberg is the most important blockchain project in science. It was founded in 2019 by MPDL, which was looking for a way to store research results and make them available to other researchers. In this sense, bloxberg is a decentralized register in which results can be stored in a tamper-proof way with a time stamp and an identifier.
bloxberg is based on the Ethereum Blockchain. However, it makes use of a different consensus mechanism: instead of \u201cProof of Stake\u201d used by Ethereum since 2022, bloxberg validates blocks through \u201cProof of Authority\u201d. Each node is operated by one member. All members of the association are research institutions and are known in the network. Currently, bloxberg has 49 nodes. It is an international project with participating institutions from all over the world.
"},{"location":"bloxberg/#how-to-process","title":"How to process ?","text":"You will need a Ethereum address and an API key (must be requested via bloxberg-services (at) mpdl.mpg.de). See an example of pushing a metadata file to the bloxberg blockchain using Maggot.
"},{"location":"bloxberg/#useful-links","title":"Useful links","text":"A single file (web/conf/config_terms.txt) contains all the terminology. The input and search interfaces are completely generated from this definition file, thus defining each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms). This is why a configuration and conversion step into JSON format is essential in order to be able to configure all the other modules (example: creation of the MongoDB database schema when starting the application before filling it).
This function is used to generate the terminology definition file in JSON format (config_terms.json) and the corresponding JSON-Schema file (maggot-schema.json) from a tabulated file (1). You can either create a terminology definition file in TSV format from scratch (see below to have more details), or extract the file from the current configuration (see JSON to TSV).
Once the terminology definition file has been obtained (2), you can load it and press 'Submit'.
Three files are generated (3 & 5):
This function generates the markdown documentation file (doc.md) from the template file (config_doc.txt) which is itself generated from the metadata definition file (config_terms.txt, cf TSV to JSON).
Once the template file for the documentation (config_doc.txt) has been edited and documented (6) (see below for more details), you can load it and press the Submit button.
The documentation file in markdown format (doc.md) is thus generated (7) and must be placed in the web/docs directory (8). Users will have access to this documentation file via the web interface, in the documentation section, heading \"Metadata\".
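The TSV-to-JSON conversion described above is conceptually simple. As a rough, generic illustration (this is not the actual Maggot converter, the real config_terms.txt has its own 9 columns, and the file names below are placeholders):

import csv
import json

# Generic sketch: turn a tab-separated definition file into a JSON array of rows.
def tsv_to_json(tsv_path, json_path):
    with open(tsv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(rows, out, indent=2, ensure_ascii=False)

# Example (placeholder file names):
# tsv_to_json("config_terms.tsv", "config_terms.json")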
sh ./run passwd <user>\n
As for any dictionary, there must be 3 files (see below). Please note that the names of these files must always contain the name of the dictionary, i.e. the same name as the directory.
The format of the file containing the dictionary data (people.txt) is defined by another file (people_format.txt).
var people = [];\n// Each item in the 'people' list consists of the first two columns (0,1) separated by a space\nget_dictionary_values('people', merge=[0,' ',1])
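For illustration, a rough Python equivalent of what the JavaScript helper above does is sketched below, assuming a tab-separated people.txt whose first two columns hold the name parts. The separator and column layout of your own dictionary may differ, since they are declared in people_format.txt.

import csv

def load_people(path="people.txt"):
    # Merge the first two columns with a space, mirroring merge=[0,' ',1] above.
    with open(path, newline="", encoding="utf-8") as f:
        return [f"{row[0]} {row[1]}" for row in csv.reader(f, delimiter="\t") if len(row) >= 2]

def suggest(prefix, people):
    # Simple case-insensitive autocompletion over the merged names.
    prefix = prefix.lower()
    return [name for name in people if name.lower().startswith(prefix)]

# Example (assumes people.txt is present in the current directory):
# print(suggest("Jan", load_people()))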
Below, an example is given when modifying a record. When you click on the Institute field, which is connected to the ROR web API, a drop-down list of the research organizations that may match in the registry appears, if there are any.
Note: It is possible to edit dictionaries, by adding an entry for example, and at the same time be able to immediately find this new entry in the metadata entry in the Maggot tool. Indeed each dictionary is reloaded into memory as soon as the corresponding input box is clicked. See an illustration.
Funders : The dictionary of the funders allows you to define the funding agency, project ID and its corresponding URL.
Producers : The dictionary of the data producers allows you to define their Institute and project ID and their corresponding URL. Optionally, you can add the URL of the logo.
Vocabulary : Use this dictionary for mixing thesauri and ontologies in order to better target the entire controlled vocabulary of its field of application. Only the vocabulary is mandatory, the URL linked to an ontology or a thesaurus is optional. See Vocabulary section to learn the extent of the possibilities concerning vocabulary in Maggot.
The necessary Infrastructure involves 1) a machine running a Linux OS and 2) a dedicated storage space.
1 - The machine will most often be a virtual machine, because it is simpler to deploy, either locally (with VM providers such as VirtualBox, VMware Workstation or MS Hyper-V) or remotely (e.g. VMware ESXi, OpenStack: example of deployment). Moreover, the OS of your machine must allow the deployment of Docker containers. See "What is Docker" for more details. The minimum characteristics of the VM are: 2 CPUs, 2 GB RAM, 8 GB of disk.
2 - The dedicated storage space could be either in the local space of the VM, or in a remote place on the network.
Requirements: The installation must be carried out on a (virtual) machine with a recent Linux OS that supports Docker (see Infrastructure).
"},{"location":"installation/#retrieving-the-code","title":"Retrieving the code","text":"Go to the destination directory of your choice then clone the repository and cd
to your clone path:
git clone https://github.com/inrae/pgd-mmdt.git pgd-mmdt\ncd pgd-mmdt\n
"},{"location":"installation/#installation-of-docker-containers","title":"Installation of Docker containers","text":"MAGGOT uses 3 Docker images for 3 distinct services:
See Configuration settings
Warning: You must take care to use the same MongoDB settings in all the above configuration files. It is best not to change anything. It would have been preferable to have a single configuration file, but this has not yet been done given the different languages involved (bash, JavaScript, Python, PHP). To be done!
Note : If you want to run multiple instances, you will need to change in the run file, i) the container names, ii) the data path, iii) the MongoDB volume name and iv) the MongoDB port
The following two JSON files are defined by default but can be easily configured from the web interface. See the Terminology Configuration section.
The run shell script allows you to perform multiple actions by specifying an option:
cd pgd-mmdt\nsh ./run <option>\n
Options:
You must first build the 3 Docker container images, if this has not already been done, by:
sh ./run build\n
The application can then be started sequentially:
sh ./run start\n
sh ./run initdb\n
sh ./run scan\n
You can also launch these 3 steps with a single command:
sh ./run fullstart\n
Once the application is started, you can check whether the containers are running using the following command:
docker ps -a\n
which should produce a result similar to the following:
\n CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n 5914504f456d pgd-mmdt-web \"docker-php-entrypoi.\" 12 seconds ago Up 10 seconds 0.0.0.0:8087->80/tcp, :::8087->80/tcp mmdt-web\n 226b13ed9467 pgd-mmdt-scan \"cron -f\" 12 seconds ago Up 11 seconds mmdt-scan\n 81fecbb56d23 pgd-mmdt-db \"docker-entrypoint.s.\" 13 seconds ago Up 12 seconds 27017/tcp mmdt-db\n
On the first line, the one which corresponds to the web interface, we see that port 80 of the container is mapped to port 8087 of the VM. Let's say that the IP address of your VM is 192.168.56.2; then in your browser you will need to use the URL http://192.168.56.2:8087/. You can of course change the port number in the 'run' file.
It may be preferable to use a lightweight HTTP server like nginx, so that the Maggot URL will be http://192.168.56.2/maggot/. Below is an example configuration:
## /etc/nginx/nginx.conf\nhttp {\n\n...\n upstream maggot { server 127.0.0.1:8087; }\n...\n\n}\n\n## /etc/nginx/conf.d/my-site.conf\n\nserver {\nlisten 80 default;\nserver_name $host;\n\n...\n\n location /maggot/ {\nproxy_set_header Host $host;\nproxy_set_header X-App-Name 'maggot';\nproxy_set_header X-Real-Ip $remote_addr;\nproxy_set_header X-Forwarded-Host $host;\nproxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\nproxy_pass http://maggot/;\n}\n\n...\n\n}\n
sh ./run stop\n
When updating the application, it is imperative to preserve a whole set of configuration files as well as the content of certain directories (dictionaries, javascripts dedicated to vocabularies, etc.). An update script is available (./etc/update-maggot.sh) preferably placed under '/usr/local/bin'. To preserve your configuration, it is recommended to create local configuration files.
A first file 'local.conf' will contain all the parameters to be preserved, initially contained in the 'run' file. A small example could be as follow :
#!/bin/bash\n\n# Local HTTP Port for web application\nWEB_PORT=8088\n\n# Path to the data\nDATADIR=/media/Workdir/Share/DATA/\n
A second file './web/inc/config/local.inc' will contain all the parameters to be preserved, initially contained in the './web/inc/config/config.inc' file. A small example could be as follow :
<?php\n\n# Main title\n$TITLE ='Metadata management - My Labs';\n$MAINTITLE =$TITLE;\n\n# File Browser\n$FILEBROWSER=1;\n$URL_FILEBROWSER='/fb/';\n\n# Enable some functionalities\n$export_oai = 1;\n\n?>\n
Note: See how to proceed with the configuration steps.
"},{"location":"installation/#file-browser","title":"File Browser","text":"You can provide access to your data via a file browser. This application must be installed separately but can be connected to Maggot by specifying the corresponding URL in the configuration file. Users and their rights are managed in the filebrowser application. Likewise, we can also create links to the data without a password. These links can be usefully specified as external resources in the metadata managed by Maggot.
See how to install it on GitHub.
"},{"location":"private-access/","title":"Private access","text":""},{"location":"private-access/#private-access-key-management","title":"Private access key management","text":""},{"location":"private-access/#motivation","title":"Motivation","text":"Although the Maggot tool is designed to foster the sharing of metadata within a collective, it may be necessary to temporarily privatize access to the metadata of an ongoing project with confidentiality constraints. So even within our own collective, access to metadata must be restricted to authorized users only.
"},{"location":"private-access/#implementation","title":"Implementation","text":"The choice of not wanting to manage users in the Maggot tool was made in order to make the metadata completely open by default within a collective. Furthermore, access rights to the storage space are managed independently of the Maggot tool by the administrator of this space. It is therefore through the storage space that we must give or not access to the metadata via the web interface.
The chosen mechanism for privatizing access is described below. It has the dual advantage of being simple to implement and simple to use.
First, we have to generate a file containing the encrypted key for private access. This file must be generated from the web interface and then downloaded as shown in the figure below. Then this file must be manually deposited in the data directory corresponding to the dataset whose access we wish to privatize. The presence of this file within a directory is enough to block access to the metadata and data by default. It should be noted that we can put this same file containing the encrypted private key in several data directories (for example within the same project). The deposit must be done by hand because the Maggot tool must only have access to the storage space in read mode. This also guarantees that the user has write rights to this space, without having to manage user accounts on the Maggot side.
By default, \u2018untwist1\u2019 metadata are not accessible to anyone
When we want to have access to the metadata of this dataset, we simply have to enter the private key in the current session. This will have the effect of unlocking access to the metadata via the web interface only in the current session of our web browser. This means that we will have to enter the private key for each session (by default, a session lasts a maximum of 1 hour).
Now the \u2018untwist1\u2019 metadata are accessible only to us
When we want to give access to the metadata to the entire collective, we simply need to delete the private access file (named by default 'META_auth.txt') from the concerned data directory.
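Purely as an illustration of the general idea (this is not Maggot's actual code, and the hashing scheme shown is an assumption), the check performed when a private key is entered could look like the following sketch.

import hashlib

AUTH_FILE = "META_auth.txt"  # name taken from the default private_auth_file setting

def store_encrypted_key(private_key, path=AUTH_FILE):
    # Assumption: the file holds a one-way hash of the private key, never the key itself.
    with open(path, "w", encoding="utf-8") as f:
        f.write(hashlib.sha256(private_key.encode()).hexdigest())

def key_matches(entered_key, path=AUTH_FILE):
    with open(path, encoding="utf-8") as f:
        stored = f.read().strip()
    return hashlib.sha256(entered_key.encode()).hexdigest() == stored

# store_encrypted_key("my-project-secret")
# print(key_matches("my-project-secret"))  # True -> unlock metadata for this session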
Here is the list of all files that may be subject to adjustment of certain parameters according to the needs of the instance site.
"},{"location":"settings/#dockerscanpartscriptsconfigpy","title":"dockerscanpart/scripts/config.py","text":"This file defines the connection parameters to the Mongo database. Knowing that this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
- dbserver: Name of the MongoDB server (default: mmdt-db)
- database: Name of the MongoDB database (default: pgd-db)
- dbport: Port of the MongoDB server (default: 27017)
- username: Username of the Mongo database pgd-db with Read/Write access (default: userw-pgd)
- password: Password corresponding to the username of the Mongo DB pgd-db (default: wwwww)
"},{"location":"settings/#incconfigmongodbinc","title":"inc/config/mongodb.inc","text":"This file defines the connection parameters to the Mongo database. Knowing that this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
- docker_mode: Indicates whether the installation involves using docker containers. In this case, the Mongo DB IP address will be different from 127.0.0.1. (default: 1)
- uritarget: The Mongo DB IP address (default: mmdt-db if docker_mode=1, 127.0.0.1 if docker_mode=0)
- database: Name of the MongoDB database (default: pgd-db)
- collection: Name of the MongoDB collection (default: metadata)
- port: Port of the MongoDB server (default: 27017)
- username: Username of the Mongo database pgd-db with Read access only (default: userr-pgd)
- password: Password corresponding to the username of the Mongo DB pgd-db (default: rrrrr)
"},{"location":"settings/#incconfigconfiginc","title":"inc/config/config.inc","text":"This file defines parameters related to i) the web interface, ii) the functionalities allowed for users. Only the parameters that could be useful to be changed for the needs of an instance are described here.
- EXTERN: Indicates if the use of the tool is only for external use, i.e. without using a storage space. (default: 0)
- PRIVATE_ACCESS: Gives the possibility of managing private access to metadata (default: 0)
- ZOOMWP: Zoom level of the web interface. By reducing the size slightly, you get a better layout. (default: 90%)
- RESMEDIA: Gives the possibility of putting a MIME type on each resource in the metadata. (default: 1)
- TITLE: Title to display in the main banner (default: Metadata management)
- FILEBROWSER: Indicates whether the file browser is used. This assumes it is installed. (default: 0)
- URL_FILEBROWSER: File browser URL. It can be absolute or relative. (default: /fb/)
- APPNAME: Name given in the URL to access the web interface. (default: maggot)
- dataverse_urls: Array of Dataverse repository URLs where you can upload metadata and data (default: -)
- zenodo_urls: Array of Zenodo repository URLs where you can upload metadata and data (default: -)
- SERVER_URL: Default Dataverse repository URL (default: https://entrepot.recherche.data.gouv.fr)
- ZENODO_SERVER_URL: Default Zenodo repository URL (default: https://zenodo.org)
- export_dataverse: Indicates whether the Dataverse feature is enabled (default: 1)
- export_zenodo: Indicates whether the Zenodo feature is enabled (default: 1)
- export_jsonld: Indicates whether the JSON-LD feature is enabled (default: 1)
- export_oai: Indicates whether the OAI-PMH feature is enabled (default: 0)
- export_bloxberg: Indicates whether the Bloxberg Blockchain feature is enabled (Experimental) (default: 0)
- cvdir: Relative path of the Controlled Vocabulary Lists (cvlist) (default: cvlist/)
- maggot_fulltitle: Maggot name of the field corresponding to the title in Dataverse/Zenodo (default: fulltitle)
- auth_senddata_file: Name of the file that must be present in the data directory to authorize the transfer of the data files (default: META_datafile_ok.txt)
- private_auth_file: Name of the private access file (default: META_auth.txt)
- sendMail: Configuration of messaging for sending metadata to data managers (see below) (default: NULL)
The messaging configuration is done using the following array in the inc/config/config.inc file (or more judiciously in inc/config/local.inc in order to be preserved during an update). To understand how it works, see Send Emails using PHPMailer.
$sendMail['smtpHost'] = 'smtp.example.org'; // Set the SMTP server to send through\n$sendMail['smtpSecure'] = 'tls'; // Enable TLS encryption\n$sendMail['smtpPort'] = 587; // Set the TCP port to connect to\n$sendMail['CheckEmail'] = 'maggot@exemple.org'; // Email address authorized to send emails\n$sendMail['CheckPass'] = 'password'; // The corresponding password\n$sendMail['CheckName'] = 'Maggot'; // Alias name\n$sendMail['UserEmail'] = 'admin@exemple.org'; // Email of data managers, separated by a comma\n
"},{"location":"settings/#run","title":"run","text":"This file contains the essential parameters to be set before any use.
- WEB_PORT: Local HTTP Port for web application (default: 8087)
- DATADIR: Path to the data (default: /opt/data/)
- DB_IMAGE: Docker image name of the MongoDB (default: pgd-mmdt-db)
- SCAN_IMAGE: Docker image name of the Scan process (default: pgd-mmdt-scan)
- WEB_IMAGE: Docker image name of the Web interface (default: pgd-mmdt-web)
- DB_CONTAINER: Docker container name of the MongoDB (default: mmdt-db)
- SCAN_CONTAINER: Docker container name of the Scan process (default: mmdt-scan)
- WEB_CONTAINER: Docker container name of the Web interface (default: mmdt-web)
- MONGO_VOL: Volume name for MongoDB (default: mmdt-mongodb)
- MONGO_PORT: HTTP Port for MongoDB (default: 27017)
- USER: Admin user in the htpasswd file (default: admin)
"},{"location":"chats/chat1/","title":"chatGPT","text":""},{"location":"chats/chat1/#difference-between-data-manager-and-data-steward","title":"Difference Between Data Manager and Data Steward","text":""},{"location":"chats/chat1/#data-manager","title":"Data Manager","text":"Responsibilities:
Data Operations and Management:
Data Quality and Integrity:
Data Policies and Procedures:
Technical Oversight:
Performance Monitoring:
Skills:
Responsibilities:
Data Governance:
Data Quality Assurance:
Data Access and Security:
Liaison and Coordination:
Training and Support:
Skills:
Focus:
Scope of Work:
Technical vs. Governance:
Interaction with Stakeholders:
Data Generation:
Data Quality Assurance:
Data Documentation:
Adherence to Standards:
Data Security and Privacy:
Collaboration:
Foundation of Data Assets:
Enhancing Data Quality:
Supporting Data Governance:
Facilitating Data Integration:
Enabling Timely Insights:
Data Managers:
Data Stewards:
Data Consumers:
Time Constraints:
Lack of Expertise:
Inconsistent Practices:
Standardized Templates and Guidelines:
Training and Education:
Automated Documentation Tools:
Data Steward Support:
Integration into Workflow:
Incentives and Accountability:
Feedback Mechanism:
Documentation Checkpoints:
Collaboration Platforms:
Clear Roles and Responsibilities:
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat2/","title":"chatGPT","text":""},{"location":"chats/chat2/#how-can-we-properly-respect-the-data-life-cycle-in-scientific-research-in-order-to-carry-out-good-data-management","title":"How can we properly respect the data life cycle in scientific research in order to carry out good data management?","text":"Properly respecting the data life cycle in scientific research is crucial for effective data management. The data life cycle encompasses the stages through which data passes, from its initial generation to its eventual archiving and potential reuse. Here\u2019s a comprehensive approach to managing the data life cycle in scientific research:
"},{"location":"chats/chat2/#1-planning","title":"1. Planning","text":"By adhering to these guidelines, researchers can ensure robust data management practices that enhance the reliability, reproducibility, and impact of scientific research.
"},{"location":"chats/chat2/#at-which-stages-of-the-data-life-cycle-is-it-most-crucial-to-document-data","title":"At which stages of the data life cycle is it most crucial to document data?","text":"Documentation is crucial at all stages of the data life cycle, but there are specific stages where it is particularly essential to ensure data integrity, reproducibility, and usability. Here are the stages where documentation is most critical:
"},{"location":"chats/chat2/#1-planning_1","title":"1. Planning","text":"By thoroughly documenting at these crucial stages, researchers can enhance the transparency, reproducibility, and overall quality of their data management practices.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat3/","title":"chatGPT","text":""},{"location":"chats/chat3/#what-are-the-steps-to-building-our-own-controlled-vocabulary","title":"What are the steps to building our own controlled vocabulary ?","text":"Creating a controlled vocabulary is a systematic process that involves several key steps. Here\u2019s a detailed guide to help you build your own controlled vocabulary:
"},{"location":"chats/chat3/#1-define-the-purpose-and-scope","title":"1. Define the Purpose and Scope","text":"By following these steps, you can develop a controlled vocabulary that improves information retrieval, enhances data consistency, and supports effective communication within your organization.
"},{"location":"chats/chat3/#what-are-the-differences-between-thesauri-ontologies-or-simple-vocabulary-dictionaries-in-which-cases-should-each-of-them-be-favored","title":"What are the differences between thesauri, ontologies or simple vocabulary dictionaries? In which cases should each of them be favored?","text":"Thesauri, ontologies, and simple vocabulary dictionaries are all tools for organizing and managing information, but they differ in complexity, structure, and use cases. Here\u2019s a breakdown of their differences and when to use each:
"},{"location":"chats/chat3/#thesauri","title":"Thesauri","text":"Definition: - A thesaurus is a controlled vocabulary that organizes terms into a structured hierarchy, showing relationships such as synonyms, antonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Structure: - Synonyms: Provides alternative words with similar meanings. - Hierarchical Relationships: Shows broader and narrower terms. - Associative Relationships: Connects related terms that are not hierarchically linked.
Use Cases: - Information Retrieval: Enhances search capabilities by providing alternative terms and related concepts. - Indexing and Cataloging: Standardizes terms for consistent categorization of information. - Library and Archival Systems: Organizes collections for easier access and discovery.
When to Use: - When you need to manage a large set of terms with hierarchical and associative relationships. - When you want to improve search functionality by including synonyms and related terms.
"},{"location":"chats/chat3/#ontologies","title":"Ontologies","text":"Definition: - An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It includes not just the terms, but also the rules and constraints that govern their use.
Structure: - Concepts: Detailed definitions of domain-specific concepts. - Relationships: Various types of relationships between concepts, including hierarchical, associative, and more complex logical relationships. - Attributes: Properties and characteristics of concepts. - Rules and Constraints: Logical rules that define how concepts can interact.
Use Cases: - Semantic Web and AI: Facilitates data integration, sharing, and interoperability in web and AI applications. - Knowledge Representation: Provides a detailed and formalized structure for representing domain knowledge. - Complex Data Integration: Integrates heterogeneous data sources by providing a common framework.
When to Use: - When you need a highly detailed and formal representation of knowledge with complex relationships. - When integrating diverse data sources and ensuring semantic interoperability is crucial. - For advanced applications in artificial intelligence, natural language processing, and the Semantic Web.
"},{"location":"chats/chat3/#simple-vocabulary-dictionaries","title":"Simple Vocabulary Dictionaries","text":"Definition: - A simple vocabulary dictionary is a basic list of terms and their definitions, without structured relationships between terms.
Structure: - Terms: A list of words or phrases. - Definitions: Brief explanations or meanings of each term.
Use Cases: - Glossaries: Provides quick reference for understanding terms and definitions. - Basic Cataloging: Lists terms for simple categorization without the need for complex relationships. - Learning and Education: Helps learners understand basic concepts and terminology.
When to Use: - When you need a straightforward reference tool for terms and their definitions. - For simple documentation where relationships between terms are not necessary. - When building foundational knowledge without the need for detailed hierarchical or associative structures.
"},{"location":"chats/chat3/#summary","title":"Summary","text":"Choosing the right tool depends on your specific needs regarding the complexity of the relationships between terms, the level of detail required, and the intended application.
"},{"location":"chats/chat3/#concerning-descriptive-metadata-can-we-envisage-a-gradual-evolution-in-the-use-of-simple-dictionaries-then-moving-to-thesauri-or-even-ontologies","title":"Concerning descriptive metadata, can we envisage a gradual evolution in the use of simple dictionaries, then moving to thesauri, or even ontologies?","text":"Yes, it is both practical and beneficial to envisage a gradual evolution in the use of descriptive metadata, starting with simple dictionaries and progressing to thesauri and ontologies as the needs and complexity of your data management grow. Here\u2019s how you can approach this evolutionary path:
"},{"location":"chats/chat3/#stage-1-simple-vocabulary-dictionaries","title":"Stage 1: Simple Vocabulary Dictionaries","text":"Initial Implementation: - Scope: Begin with a simple vocabulary dictionary to establish basic terminology and definitions. - Purpose: Ensure consistency in the use of terms across your organization or system. - Structure: Create a straightforward list of terms with clear definitions.
Benefits: - Ease of Use: Simple to create and manage. - Foundation: Provides a foundational understanding of key terms. - Quick Implementation: Allows for immediate improvement in consistency and clarity of metadata.
Use Cases: - Early Project Stages: When starting a new project or system where the primary goal is to standardize terminology. - Training and Education: Helps new team members or users quickly understand the basic terms and definitions.
"},{"location":"chats/chat3/#stage-2-thesauri","title":"Stage 2: Thesauri","text":"Intermediate Implementation: - Scope: Expand your vocabulary to include hierarchical and associative relationships. - Purpose: Improve information retrieval and organization by showing broader, narrower, and related terms. - Structure: Develop a thesaurus that includes synonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Benefits: - Enhanced Search: Improves search accuracy by including synonyms and related concepts. - Better Organization: Provides a more structured and interconnected view of your terms. - Scalability: Allows for the gradual addition of new terms and relationships.
Use Cases: - Growing Projects: As your project or system grows and the need for more detailed metadata relationships emerges. - Information Retrieval Systems: When enhancing search functionalities and user experience is a priority.
"},{"location":"chats/chat3/#stage-3-ontologies","title":"Stage 3: Ontologies","text":"Advanced Implementation: - Scope: Develop a comprehensive and formal representation of your domain knowledge. - Purpose: Facilitate complex data integration, interoperability, and advanced reasoning. - Structure: Create an ontology with detailed concepts, relationships, properties, and logical rules.
Benefits: - Semantic Interoperability: Ensures data consistency and integration across different systems and platforms. - Advanced Reasoning: Supports sophisticated queries and inference capabilities. - Detailed Knowledge Representation: Provides a rich and nuanced understanding of domain concepts and their interrelationships.
Use Cases: - Complex Systems: For projects requiring detailed and formal knowledge representation, such as in AI, machine learning, and the Semantic Web. - Interoperable Data Systems: When integrating diverse data sources and ensuring seamless interoperability is essential.
"},{"location":"chats/chat3/#evolution-strategy","title":"Evolution Strategy","text":"Ensure all team members use consistent terminology.
Thesaurus:
Improve search functionality by adding synonyms and related terms.
Ontology:
By following this gradual evolution, you can systematically enhance your metadata management capabilities, ensuring your system remains adaptable and meets the growing complexity of your data and user needs.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat4/","title":"chatGPT","text":""},{"location":"chats/chat4/#what-is-metadata-crosswalk","title":"What is metadata crosswalk?","text":"A metadata crosswalk is a structured mapping between the metadata elements of two or more different systems or standards. Metadata, in this context, refers to descriptive information about data or resources, such as titles, authors, dates, keywords, etc.
When different systems or standards are used to manage data or resources, they often have their own unique metadata schemas or formats. A metadata crosswalk helps to bridge the gap between these different schemas by establishing equivalences or relationships between the metadata elements used in each system.
For example, if you have a digital library system that uses one set of metadata standards and you want to integrate it with a repository that uses a different set of standards, you might create a metadata crosswalk to map the elements from one standard to the other, ensuring interoperability and consistency in metadata representation.
Metadata crosswalks are particularly useful in data integration, migration, and interoperability scenarios, enabling smooth communication and exchange of information between disparate systems or standards.
"},{"location":"chats/chat4/#whats-the-point","title":"What's the point ?","text":"The primary purpose of a metadata crosswalk is to facilitate interoperability and data exchange between systems or standards that use different metadata schemas. Here are some specific points highlighting its importance:
Interoperability: Different organizations or systems may use varying metadata standards or schemas to describe their data or resources. By creating a metadata crosswalk, you ensure that these systems can communicate with each other effectively, exchanging information without loss of meaning.
Data Integration: When consolidating data from multiple sources or systems, a metadata crosswalk helps ensure that the metadata associated with each dataset is properly understood and mapped to a common format. This facilitates the integration process and allows for more efficient analysis and management of the combined datasets.
Standardization: By mapping metadata elements between different standards or schemas, a crosswalk promotes standardization and consistency in how data is described and represented. This consistency is essential for ensuring accurate interpretation and retrieval of information across different systems or platforms.
Migration and Upgrades: During system migrations or upgrades, metadata formats may change. A metadata crosswalk can aid in transitioning from one metadata schema to another by providing a clear mapping between the old and new formats, thus minimizing disruptions and ensuring continuity in data management practices.
Data Sharing and Reuse: When sharing data with external parties or making it available for reuse, a metadata crosswalk helps ensure that the metadata associated with the data is understandable and interpretable by others. This promotes data sharing and reuse, as users can easily understand the content and context of the data they are accessing.
Overall, the point of a metadata crosswalk is to promote seamless communication, integration, and interoperability between different systems or standards, ultimately enhancing the usability and accessibility of data across diverse environments.
"},{"location":"chats/chat4/#can-you-suggest-some-interesting-links-on-the-subject-related-to-life-science","title":"Can you suggest some interesting links on the subject related to life science ?","text":"Certainly! Here are some interesting links related to metadata crosswalks in the context of life sciences:
NCBI Metadata Crosswalk Tool:
Metadata Standards and Tools for Life Sciences:
FAIRsharing:
NIH Common Data Elements (CDE) Repository:
These resources should provide you with a good starting point for exploring metadata standards and tools relevant to life sciences research.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"definitions/","title":"Definition Files","text":""},{"location":"definitions/#metadata-definition-files","title":"Metadata definition files","text":"The Maggot tool offers great flexibility in configuration. It allows you to completely choose all the metadata you want to describe your data. You can base yourself on an existing metadata schema, invent your own schema or, more pragmatically, mix one or more schemas by introducing some metadata specific to your field of application. However, keep in mind that if you want to add descriptive metadata to your data then a certain amount of information is expected. But a completely different use of the tool is possible, it's up to you.
There are two levels of definition files, as shown in the figure below:
1 - The first level concerns the definition of the terminology (metadata), similar to a descriptive metadata plan. Strictly speaking, these files act as configuration files. They represent the heart of the application around which everything else is built. The input and search interfaces are completely generated from these definition files (especially the web/conf/config_terms.txt file), which define each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list based on a fixed list of terms). This is why a configuration step is essential before the other modules can be set up.
2 - The second level concerns the definition of mappings to differently structured metadata schemas (metadata crosswalks, i.e. specifications for mapping one metadata standard to another), used either i) for metadata export to a remote repository (e.g. Dataverse, Zenodo) or ii) for metadata harvesting (e.g. JSON-LD, OAI-PMH). Simply place the definition files in the configuration directory (web/conf) for them to be taken into account, provided you have adjusted the configuration (see Settings).
All definition files are created with a simple spreadsheet and then exported in TSV format.
The list of definition files used by Maggot is given below. All of them must be placed under the web/conf directory.
See an example online: https://pmb-bordeaux.fr/maggot/config/view and the corresponding form based on these definition files.
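To make the format concrete, here is a minimal sketch (not Maggot's actual code) of how such a TSV definition file could be read programmatically; the file path comes from this documentation, while the column names follow the Terminology example shown further below.

```python
# Minimal sketch: load a Maggot definition file (TSV) into a list of dicts.
# The path comes from this documentation; column names follow the
# Terminology example below (Field, Section, Required, Search, ...).
import csv

def load_definition(path="web/conf/config_terms.txt"):
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))

for field in load_definition():
    print(field["Field"], field["Section"], field["Type"])
```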
"},{"location":"definitions/config_terms/","title":"Terminlogy Definition","text":""},{"location":"definitions/config_terms/#example-of-a-terminlogy-definition-file","title":"Example of a Terminlogy Definition file","text":"Field Section Required Search ShortView Type features Label Predefined terms title definition Y N 1 textbox width=350px Short name fulltitle definition Y Y 2 textbox Full title subject definition Y Y checkbox open=0 Subject Agricultural Sciences,Arts and Humanities,Astronomy and Astrophysics,Business and Management,Chemistry,Computer and Information Science,Earth and Environmental Sciences,Engineering,Law,Mathematical Sciences,Medicine Health and Life Sciences,Physics,Social Sciences,Other description definition Y Y areabox rows=6,cols=30 Description of the dataset note definition N Y areabox rows=4,cols=30 Notes status status N Y 3 dropbox width=350px Status of the dataset Processed,In progress,Unprocessed access_rights status N Y 4 dropbox width=350px Access rights to data Public,Mixte,Private language status N Y checkbox open=0 Language Czech,Danish,Dutch,English,Finnish,French,German,Greek,Hungarian,Icelandic,Italian,Lithuanian,Norwegian,Romanian,Slovenian,Spanish,Swedish lifeCycleStep status N Y multiselect autocomplete=lifecycle,min=1 Life cycle step license status N Y textbox autocomplete=license,min=1 License datestart status N Y datebox width=350px Start of collection dateend status N Y datebox width=350px End of collection dmpid status N Y textbox DMP identifier contacts management Y Y multiselect autocomplete=people,min=1 Contacts authors management Y Y multiselect autocomplete=people,min=1 Authors collectors management N Y multiselect autocomplete=people,min=1 Data collectors curators management N Y multiselect autocomplete=people,min=1 Data curators members management N Y multiselect autocomplete=people,min=1 Project members leader management N Y multiselect autocomplete=people,min=1 Project leader wpleader management N Y multiselect autocomplete=people,min=1 WP leader depositor management N Y textbox Depositor producer management N Y multiselect autocomplete=producer,min=1 Producer grantNumbers management N Y multiselect autocomplete=grant,min=1 Grant Information kindOfData descriptors Y Y checkbox open=0 Kind of Data Audiovisual,Collection,Dataset,Event,Image,Interactive Resource,Model,Physical Object,Service,Software,Sound,Text,Workflow,Other keywords descriptors N Y multiselect autocomplete=bioportal,onto=EFO:JERM:EDAM:MS:NMR:NCIT:OBI:PO:PTO:AGRO:ECOCORE:IOBC:NCBITAXON Keywords topics descriptors N Y multiselect autocomplete=VOvocab Topic Classification dataOrigin descriptors N Y checkbox open=0 Data origin observational data,experimental data,survey data,analysis data,text corpus,simulation data,aggregate data,audiovisual corpus,computer code,Other experimentfactor descriptors N Y multiselect autocomplete=vocabulary,min=1 Experimental Factor measurement descriptors N Y multiselect autocomplete=vocabulary,min=1 Measurement type technology descriptors N Y multiselect autocomplete=vocabulary,min=1 Technology type publication_citation descriptors N Y areabox rows=5,cols=30 Publication - Citation publication_idtype descriptors N Y dropbox width=200px Publication - ID Type -,ark,arXiv,bibcode,doi,ean13,eissn,handle,isbn,issn,istc,lissn,lsid,pmid,purl,upc,url,urn publication_idnumber descriptors N Y textbox width=400px Publication - ID Number publication_url descriptors N Y textbox Publication - URL comment other N Y areabox rows=15, cols=30 Additional 
information"},{"location":"definitions/dataverse/","title":"Dataverse Definition File","text":"Open source research data repository software, approved by Europe.
"},{"location":"definitions/dataverse/#dataverse-definition-file_1","title":"Dataverse definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Dataverse. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Dataverse, knowing that this metadata crosswalk was made possible by choosing the right metadata schema at upstream.
The structure of the Dataverse JSON output file being known internally, a minimum of information is therefore necessary to carry out the correspondence.
The file must have 4 columns with headers defined as follows:
Below is an example of a Dataverse definition file (TSV).
Example of a Dataverse JSON file generated from the definition file given as an example above.
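As a rough illustration of what such a crosswalk does (the field pairs and the simplified Dataverse structure below are assumptions made for the example, not the content of the real definition file):

```python
# Hypothetical sketch of a Maggot -> Dataverse crosswalk. The field pairs in
# MAPPING and the simplified citation block are illustrative only; in Maggot
# the real correspondence is driven by the Dataverse definition file (TSV).
import json

MAPPING = {                 # Maggot field -> Dataverse typeName (assumed)
    "fulltitle": "title",
    "note": "notesText",
}

def crosswalk(maggot_record):
    fields = [{"typeName": target, "value": maggot_record[src]}
              for src, target in MAPPING.items() if maggot_record.get(src)]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}

record = {"fulltitle": "Soil survey 2023", "note": "Example record"}
print(json.dumps(crosswalk(record), indent=2))
```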
This definition file will allow harvesters to collect structured metadata based on a semantic schema, i.e. the fields themselves, and not just their content, can be associated with a semantic definition (an ontology, for example), which then facilitates the link between the metadata and therefore the data (JSON-LD). The chosen semantic schema is based on several metadata schemas.
The full workflow to \"climb the Linked Open Data mountain\" is summarized in the figure below:
Metadata schemas used to build the model proposed by default:
Definition of the JSON-LD context using the metadata schemas proposed by default
Because the structure of the JSON-LD output is not known internally, information about the structure must be provided to carry out the correspondence.
Example of JSON-LD definition file (partial) using the metadata schemas proposed by default (TSV)
Example of a JSON-LD file generated from the definition file given as an example above.
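For readers unfamiliar with JSON-LD, the sketch below shows the general idea of attaching a context to otherwise plain metadata; the context terms are only an example mixing Dublin Core and schema.org, not the exact default context shipped with Maggot.

```python
# Illustrative only: wrap plain metadata in a JSON-LD document by adding a
# context. The vocabulary terms below are an example (Dublin Core +
# schema.org); the actual default context is set by the JSON-LD definition file.
import json

CONTEXT = {
    "dct": "http://purl.org/dc/terms/",
    "schema": "https://schema.org/",
    "title": "dct:title",
    "description": "dct:description",
    "keywords": "schema:keywords",
}

def to_jsonld(record):
    doc = {"@context": CONTEXT, "@type": "schema:Dataset"}
    doc.update({k: v for k, v in record.items() if k in CONTEXT})
    return doc

print(json.dumps(to_jsonld({"title": "Soil survey 2023", "keywords": ["soil"]}), indent=2))
```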
As its name indicates, the mapping file is used to match a term chosen by the user during entry with a term from an ontology or a thesaurus, and thus to obtain a URL that will be used for referencing. It can be used for any metadata crosswalk requiring such a mapping (e.g. to the Dataverse, Zenodo or JSON-LD format).
The role of this definition file is illustrated in the figure above.
The file must have 5 columns with headers defined as follows:
Below is an example of a Mapping definition file (TSV).
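Conceptually, the mapping boils down to a lookup from a locally used term to a referenced URI, as in the hedged sketch below (the file name and column names are hypothetical placeholders; the real file has the 5 columns described above).

```python
# Hedged illustration of the mapping lookup: resolve a user-entered term to
# an ontology/thesaurus URI. File name and column names ("term", "uri") are
# hypothetical placeholders for the real 5-column mapping definition file.
import csv

def load_mapping(path="web/conf/mapping_example.txt"):
    with open(path, newline="", encoding="utf-8") as fh:
        return {row["term"].lower(): row["uri"]
                for row in csv.DictReader(fh, delimiter="\t")}

mapping = load_mapping()
print(mapping.get("metabolomics", "no match: term kept as free text"))
```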
"},{"location":"definitions/oai-pmh/","title":"OAI-PMH Definition File","text":"
OAI-PMH is a protocol developed for harvesting metadata descriptions of records in an archive so that services can be built using metadata from many archives.
"},{"location":"definitions/oai-pmh/#oai-pmh-definition-file_1","title":"OAI-PMH definition File","text":"This definition file will allow harvesters to collect metadata structured according to a standard schema (OAI-DC).
Based on the Open Archives Initiative Protocol for Metadata Harvesting - Version 2
Example of an OAI-PMH Data Provider Validation
Example of OAI-PMH output for a dataset
Because the structure of the OAI-PMH output file is known internally, only a minimal amount of information is needed to carry out the correspondence.
Example of an OAI-PMH definition file (TSV)
Another example of an OAI-PMH definition file (TSV) with identifiers & vocabulary mapping
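Since OAI-PMH is a plain HTTP protocol, a harvester needs nothing more than a GET request with the standard parameters; the endpoint URL below is a placeholder for a real Maggot instance.

```python
# Minimal OAI-PMH harvesting request using the standard protocol parameters.
# The base URL is a placeholder; replace it with the endpoint of your instance.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://example.org/maggot/oai"          # placeholder endpoint
params = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

with urlopen(f"{BASE_URL}?{params}") as response:
    xml_payload = response.read().decode("utf-8")    # OAI-DC records as XML

print(xml_payload[:500])
```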
"},{"location":"definitions/terminology/","title":"Terminology","text":""},{"location":"definitions/terminology/#definition-of-terminology","title":"Definition of terminology","text":"There are two definition files to set up.
Each time there is a change in these two definition files, it is necessary to convert them so that they are taken into account by the application.
Terminology is the set of terms used to define the metadata of a dataset. A single file (web/conf/config_terms.txt) contains all the terminology. The input and search interfaces (e.g. screenshot) are completely generated from this definition file, which defines i) each of the fields and their input type (checkbox, dropbox, textbox, ...) and ii) the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list based on a fixed list of terms).
The metadata schema proposed by default is mainly based on the DDI (Data Documentation Initiative) schema, which also largely corresponds to the schema adopted by the Dataverse software.
Terminology is organised in several sections. By default 6 sections are proposed, but you can redefine them as you wish:
For each section, fields are then defined. These fields are defined according to the way they will be entered via the web interface. There are 6 different input types: check boxes (checkbox), drop-down lists (dropbox), single-line text boxes (textbox), single-line text boxes with an additional box for multiple selection from a catalog of terms (multiselect), date pickers (datebox) and multi-line text boxes (areabox).
For two types (checkbox and dropbox), it is possible to define the values to be selected (predefined terms).
"},{"location":"definitions/terminology/#structure-of-the-terminology-definition-file-tsv","title":"Structure of the Terminology definition file (TSV)","text":"The file must have 9 columns with headers defined as follows:
column 9 - Predefined terms: for fields whose type is checkbox or dropbox, a list of terms separated by commas can be given.
Notes
Below is an example of a Terminology definition file (TSV).
Example of Maggot JSON file generated based on the same definition file
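As a hedged sketch of how the definition file drives the form (this is not Maggot's own code), the required-field check and the META_ naming convention described in this documentation could look like this:

```python
# Sketch only: validate a metadata record against the terminology definition
# (fields with Required=Y) and write it as META_<name>.json. Columns match
# the example definition file above; the logic is illustrative, not Maggot's.
import csv, json, sys

def required_fields(def_path="web/conf/config_terms.txt"):
    with open(def_path, newline="", encoding="utf-8") as fh:
        return [row["Field"] for row in csv.DictReader(fh, delimiter="\t")
                if row.get("Required", "").strip().upper() == "Y"]

def write_metadata(record, name):
    missing = [f for f in required_fields() if not record.get(f)]
    if missing:
        sys.exit(f"Missing mandatory fields: {', '.join(missing)}")
    with open(f"META_{name}.json", "w", encoding="utf-8") as out:
        json.dump(record, out, indent=2, ensure_ascii=False)

# With an incomplete record, this simply reports the missing mandatory fields.
write_metadata({"title": "soil-2023", "fulltitle": "Soil survey 2023"}, "soil-2023")
```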
The documentation definition file is used to provide online help for each field (a small icon placed next to each label on the form). It therefore only needs to be modified when a field is added, deleted, or moved to another section. This file is then used to generate the online metadata documentation, as shown in the figure below (see Configuration to find out how to carry out this transformation).
The file must have 3 columns with headers defined as follows:
Below is an example of a Terminology documentation file (TSV).
The same example as above, converted to HTML via Markdown.
1 - Vocabulary based on a list of terms fixed in advance (checkbox with feature open=0)
2 - Vocabulary open for addition (checkbox with feature open=1)
3 - Vocabulary based on a web API in a text field (textbox)
4 - Vocabulary based on a dictionary with multiple selection (multiselect)
5 - Vocabulary based on a SKOSMOS Thesaurus with multiple selection (multiselect)
6 - Vocabulary based on an OntoPortal with multiple selection (multiselect)
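To give an idea of what an autocomplete source looks like behind the scenes, here is a hedged example of querying a SKOSMOS thesaurus through its public REST API (the base URL is a placeholder, and Maggot's own autocomplete code may differ):

```python
# Hedged example of a SKOSMOS autocomplete lookup via its REST API.
# The instance URL is a placeholder; adjust the language to your thesaurus.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SKOSMOS_BASE = "https://skosmos.example.org/rest/v1"   # placeholder instance

def skosmos_search(term, lang="en"):
    query = urlencode({"query": term + "*", "lang": lang})
    with urlopen(f"{SKOSMOS_BASE}/search?{query}") as resp:
        data = json.load(resp)
    return [(hit.get("prefLabel", ""), hit["uri"]) for hit in data.get("results", [])]

for label, uri in skosmos_search("soil"):
    print(label, uri)
```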
"},{"location":"definitions/zenodo/","title":"Zenodo Definition File","text":"
Open research data repository operated by CERN (launched within the European OpenAIRE programme).
"},{"location":"definitions/zenodo/#zenodo-definition-file_1","title":"Zenodo definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Zenodo. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Zenodo.
Because the structure of the Zenodo JSON output file is not known internally, information about the structure must be provided to carry out the correspondence.
Below is an example of a Zenodo definition file (TSV).
Example of a Zenodo JSON file generated from the definition file given as an example above.
Using an approach that might be called \"machine-readable metadata\", it is possible to populate the metadata of a dataset into one of the supported data repositories via its web API, provided that you have taken care to define your metadata schema correctly so that a correspondence with the chosen data repository can be made using a mapping definition file.
The principle is illustrated by the figure above.
1 - To submit metadata to a Dataverse repository, you must first select either a dataset from the drop-down list of datasets listed on the data storage space, or a metadata file from your local disk.
2 - You then need to connect to the repository in order to retrieve the key (the API token) authorizing you to submit the dataset. This obviously assumes that you have the privileges (creation/modification rights) to do so.
3 - After choosing the repository URL, you must also specify on which dataverse collection you want to deposit the datasets. As previously, you must have write rights to this dataverse collection.
If you also want to deposit data files at the same time as the metadata, you will need to:
1 - declare the files to be deposited in the resources; these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt); its mere presence, regardless of its content (which may be empty), will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts in Maggot.
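Under the hood, the push to Dataverse amounts to a single call to the Dataverse native API, roughly as in the sketch below (server URL, collection alias and token are placeholders; Maggot's Publication module performs the equivalent steps for you):

```python
# Hedged sketch of a Dataverse deposit via the native API: POST the
# crosswalked dataset JSON to a collection. All values below are placeholders;
# Maggot's Publication module performs the equivalent calls for you.
import json
import requests

SERVER = "https://dataverse.example.org"   # placeholder repository URL
COLLECTION = "my-collection"               # placeholder collection alias
API_TOKEN = "xxxxxxxx-xxxx"                # placeholder API token

with open("dataverse_dataset.json", encoding="utf-8") as fh:
    payload = json.load(fh)

resp = requests.post(
    f"{SERVER}/api/dataverses/{COLLECTION}/datasets",
    headers={"X-Dataverse-key": API_TOKEN},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print("Draft dataset created:", resp.json()["data"]["persistentId"])
```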
1 - To submit metadata to a Zenodo repository, you must first select either a dataset from the drop-down list of datasets listed on the data storage space, or a metadata file from your local disk.
2 - Unless you have previously saved your API token, you must create a new one and copy and paste it before validating it. Before validating, you must check the deposit:access and deposit:write boxes in order to obtain creation and modification rights with this token.
3 - After choosing the repository URL, you can optionally choose a community to which the dataset will be linked. By default, you can leave this field empty.
"},{"location":"publish/zenodo/#deposit-data-files","title":"Deposit data files","text":"
If you also want to deposit data files at the same time as the metadata, you will need to (see figure below):
1 - declare the files to be deposited in the resources (1); these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt) (2); its mere presence, regardless of its content, will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts in Maggot.
Then, all you have to do is click on 'Publish' to \"push\" the metadata and data to the repository (3).
After submission, if everything went well, a link to the deposit will be provided (4).
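For reference, the corresponding Zenodo REST calls look roughly like the hedged sketch below (token, metadata and file name are placeholders; Maggot carries out the equivalent steps when you click 'Publish'):

```python
# Hedged sketch of a Zenodo deposit through its REST API: create a deposition,
# attach metadata, upload a file. All values are placeholders; Maggot performs
# the equivalent steps from its Publication module.
import requests

ZENODO = "https://zenodo.org/api"
TOKEN = {"access_token": "xxxxxxxx"}       # placeholder personal token

# 1) create an empty deposition
resp = requests.post(f"{ZENODO}/deposit/depositions", params=TOKEN, json={}, timeout=60)
resp.raise_for_status()
deposition = resp.json()

# 2) attach the (crosswalked) metadata
metadata = {"metadata": {
    "title": "Soil survey 2023",
    "upload_type": "dataset",
    "description": "Example dataset described with Maggot",
    "creators": [{"name": "Doe, Jane"}],
}}
requests.put(f"{ZENODO}/deposit/depositions/{deposition['id']}",
             params=TOKEN, json=metadata, timeout=60).raise_for_status()

# 3) upload a data file into the deposition bucket
with open("data.csv", "rb") as fh:
    requests.put(f"{deposition['links']['bucket']}/data.csv",
                 params=TOKEN, data=fh, timeout=300).raise_for_status()

print("Draft deposit ready:", deposition["links"]["html"])
```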
"},{"location":"tutorial/","title":"Quick tutorial","text":""},{"location":"tutorial/#quick-tutorial_1","title":"Quick tutorial","text":"
This is a quick tutorial on how to use the Maggot tool in practice; it is therefore aimed primarily at the end user.
See a short Presentation and Poster if you want to have a more general overview of the tool.
"},{"location":"tutorial/#overview","title":"Overview","text":"The Maggot tool is made up of several modules, all accessible from the main page by clicking on the corresponding part of the image as shown in the figure below:
Configuration
This module mainly concerns the data manager and makes it possible to construct all the terminology definition files, i.e. the metadata and sources of associated vocabularies. See Definition files then Configuration.
Private Access
This module allows a data producer to temporarily protect access to metadata for as long as necessary before sharing it within their collective. See Private access key management.
Dictionaries
This module allows data producers to view the content of all dictionaries. It also allows the data steward to edit their content. See Dictionaries for technical details only.
Metadata Entry
This is the main module, allowing data producers to enter the metadata relating to a dataset. See the corresponding tutorial for Metadata Entry.
Search datasets
This module allows users to search datasets based on the associated metadata, to see all the metadata and possibly to access the data itself. This obviously assumes that the metadata files have been deposited in the correct directory in the storage space dedicated to data management within your collective. See Infrastructure.
File Browser
This module gives users access to a file browser, provided that the data manager has installed it. See File Browser.
Publication
This module allows either the data producer or the data steward to publish the metadata, possibly with the corresponding data, to a suitable data repository. See Publication.
"},{"location":"tutorial/describe/","title":"Quick tutorial","text":""},{"location":"tutorial/describe/#metadata-entry","title":"Metadata Entry","text":"The figures are given here for illustration purposes but certain elements may be different for you given that this will depend on the configuration on your instance, in particular the choice of metadata, and the associated vocabulary sources.
Indeed, the choice of vocabulary sources (ontologies, thesauri, dictionaries) as well as the choice of metadata fields to enter must in principle have been the subject of discussion between data producers and data manager during the implementation of the Maggot tool in order to find the best compromise between the choice of sources and all the scientific fields targeted (see Definition files). However a later addition is always possible.
"},{"location":"tutorial/describe/#overview","title":"Overview","text":"When you enter the metadata entry module you should see a page that looks like the figure below:
All the fields (metadata) to be filled in are distributed between several tabs, also called sections. Each section tries to group together a set of fields relating to the same topic.
You can reload a previously created metadata file. All form fields will then be initialized with the value(s) defined in the metadata file.
You must at least complete the mandatory fields marked with a red star.
It is possible to obtain help for each field to be completed. A mini-icon with a question mark is placed after each field label. By clicking on this icon, a web page opens with the focus on the definition of the corresponding field. This help should provide you with at least a definition of a field and, if necessary, instructions on how to fill it in. It should be noted that the quality of the documentation depends on each instance and its configuration.
Once the form has been completed, even partially (at least the mandatory fields marked with a red star), you can export your metadata in the form of a file. See Metadata File
Dictionary-based metadata (e.g. people's names) can easily be entered by autocomplete in the 'Search value' box provided the name appears in the corresponding dictionary.
However, if the name does not yet appear in the dictionary, simply enter the full name (first name & last name) in the main box, making sure to separate each name with a comma and then a space as shown in the figure below.
Then you can request to add the additional person name(s) to the dictionary later as described below:
From the home page, select \"Dictionaries\". As username, just put \"maggot\" (this might be different within your instance).
Then after choosing the \"people\" dictionary, you can download the entire dictionary in a TSV file (Tab-Separated Values) ready to be edited with your favorite spreadsheet.
Add all the desired people's names with their institution, and possibly their ORCID and their email address. Please note that emails are required for authors and contacts.
You will then just have to send it to the data manager so that they can add the new names to the online dictionary.
Please proceed in the same way for all dictionaries (people, funders, producer, vocabulary).
"},{"location":"tutorial/describe/#controlled-vocabulary","title":"Controlled Vocabulary","text":"Depending on the configuration of your instance, it is very likely that certain fields (eg. keywords) are connected to a controlled vocabulary source (e.g. ontology, thesaurus). Vocabulary based on ontologies, thesauri or even dictionaries can easily be entered by autocomplete in the \"search for a value\" box provided that the term exists in the corresponding vocabulary source.
If a term cannot be found by autocomplete, you can enter the term directly in the main box, making sure to separate each term with a comma and a space as shown in the figure below.
The data steward will later try to link it to a vocabulary source that may be suitable for the domain in question. Furthermore, even if the choice of vocabulary sources was made before the tool was put into service, a later addition is always possible. You should make the request to your data manager.
"},{"location":"tutorial/describe/#resources","title":"Resources","text":"Because data is often scattered across various platforms, databases, and file formats, this making it challenging to locate and access. This is called data fragmentation. So the Maggot tool allows you to specify resources, i.e. data in the broader sense, whether external or internal, allowing to centralize all links towards data.
Four fields must be filled in :
Resource Type : Choose the type of the resource in the droplist.
Media Type : Choose a media type if applicable by autocomplete.
Description : Provide a concise and accurate description of the resource. Must not exceed 30 characters.
Location : Preferably indicate a URL to an external resource accessible to all. But it can also be a password-protected resource (e.g. a disk space on the cloud). It can also be text clearly indicating where the resource is located (internal disk space). Finally, it can be the name of a file deposited on the same disk space as the metadata file, so that it can be pushed to the data repository at the same time as the metadata (see Publication).
Once the form has been completed, even partially (at least the mandatory fields marked with a red star), you can export your metadata in the form of a file. The file is in JSON format and must have the prefix 'META_'.
By clicking on the \"Generate the metadata file\" button, you can save it on your disk space.
Furthermore, if email sending has been configured (see settings), then you have the possibility of sending the metadata file to the data managers for conservation, and possibly also for supporting its storage on data disk space if specific rights are required.
In case you want to save the metadata file on your disk space, you have two ways to use this file:
1. The first use is the recommended one because it allows metadata management within your collective. You drop the metadata file directly under the data directory of the dataset it describes. Indeed, when installing the tool, a storage space dedicated to the tool had to be provided for this purpose. See Infrastructure. Once deposited, you just have to wait around 30 minutes at most, so that the tool has had time to scan the root of the data directories looking for new files and update the database (a minimal sketch of such a scan is given after this list). After this period, the description of your dataset will be visible from the interface, and its metadata can be used as criteria to narrow down searches.
You will then have the possibility to publish the metadata later with possibly the corresponding data in a data repository such as Dataverse or Zenodo.
2. The second use is only to deposit the metadata into a data repository. Whether with Dataverse or Zenodo, you have the possibility of publishing metadata directly in one or the other of these repositories without using the storage space.
Please note that you cannot also deposit the data files in this way. You will have to do this manually for each of them directly online in the repository.
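A minimal sketch of the periodic scan mentioned in point 1 above (the storage root is a placeholder; Maggot itself parses the files and loads them into a database roughly every 30 minutes):

```python
# Illustrative sketch of the periodic scan described in point 1 above: walk
# the storage root, collect every META_*.json file and index its content.
# The storage root path is a placeholder for your own data storage space.
import json
from pathlib import Path

STORAGE_ROOT = Path("/data/maggot")        # placeholder storage space

def scan_metadata(root=STORAGE_ROOT):
    index = {}
    for meta_file in root.glob("**/META_*.json"):
        with open(meta_file, encoding="utf-8") as fh:
            index[str(meta_file.parent)] = json.load(fh)
    return index

for dataset_dir, record in scan_metadata().items():
    print(dataset_dir, "->", record.get("title", "untitled"))
```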
"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 187b23b8a4ba71e499c1bccf0c6c8c4200d7d76b..eb5e42a0cdfe2da80258fa96acb245d3178a0c9e 100755 GIT binary patch delta 13 Ucmb=gXP58h;5a1`F_FCj02~Vh1^@s6 delta 13 Ucmb=gXP58h;BfW~pU7ST02$^3vj6}9