From: "djacob65@gmail.com" Daniel Jacob, Francois Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil (2024) Maggot: An ecosystem for sharing metadata within the web of FAIR Data, BioRxiv, https://doi.org/10.1101/2024.05.24.595703 Daniel Jacob, Francois Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil (2024) Maggot: An ecosystem for sharing metadata within the web of FAIR Data, BioRxiv, https://doi.org/10.1101/2024.05.24.595703 An ecosystem for sharing metadata Sharing descriptive Metadata is the first essential step towards Open Scientific Data. With this in mind, Maggot was specifically designed to annotate datasets by creating a metadata file to attach to the storage space. Indeed, it allows users to easily add descriptive metadata to datasets produced within a collective of people (research unit, platform, multi-partner project, etc.). This approach fits perfectly into a data management plan as it addresses the issues of data organization and documentation, data storage and frictionless metadata sharing within this same collective and beyond. The main functionalities of Maggot were established according to a well-defined need (See Background). See a short Presentation and Poster for a quick overview. Note: The step numbers indicated in the figure correspond to the different points developed below 1 - First you must define all the metadata that will be used to describe your datasets. All metadata can be defined using a single file (in TSV format, therefore using a spreadsheet). This is a unavoidable step because both input and search interfaces are completely generated from these definition files, defining in this way each of the fields along with their input type and also the associated Controlled Vocabulary (ontology, thesaurus, dictionary, list of fixed terms). The metadata proposed by default was mainly established according to the DDI (Data Documentation Initiative) metadata schema. This schema also largely corresponds to that adopted by the Dataverse software. See the Terminology Definition section. 2 - Entering metadata will be greatly facilitated by the use of dictionaries. The dictionaries offered by default are: people, funders, data producers, as well as a vocabulary dictionary allowing you to mix ontologies and thesauri from several sources. Each of these dictionaries allows users, by entering a name by autocompletion, to associate information which will then be added when exporting the metadata either to a remote repository, or for harvesting the metadata. Thus this information, once entered into a dictionary, will not need to be re-entered again. 3 - The web interface for entering metadata is entirely built on the basis of definition files. The metadata are distributed according to the different sections chosen, each constituting a tab (see screenshot). Mandatory fields are marked with a red star and must be documented in order to be able to generate the metadata file. The entry of metadata governed by a controlled vocabulary is done by autocompletion from term lists (dictionary, thesaurus or ontology). We can also define external resources (URL links) relating to documents, publications or other related data. Maggot thus becomes a hub for your datasets connecting different resources, local and external. Once the mandatory fields (at least) and other recommended fields (at best) have been entered, the metadata file can be generated in JSON format. 
4 - The file generated in JSON format must be placed in the storage space reserved for this purpose. This metadata file can be seen as a README adapted for machines, yet still readable by humans. With an internal structure, it offers a coherence and consistency of information that a simple README file, with its completely free and therefore unstructured text format, does not allow. Furthermore, the central idea is to use the storage space as a local data repository, so that the metadata goes to the data and not the other way around.

5 - A search of the datasets can thus be carried out on the basis of the metadata. All the JSON metadata files are scanned and parsed at a fixed time interval (30 min), then loaded into a database. This allows you to perform searches based on the predefined metadata. The search form, in a compact shape, is almost the same as the entry form (see a screenshot). Depending on the search criteria, a list of datasets is provided, with a link for each of them pointing to the detailed sheet.

6 - The detailed metadata sheet provides all the metadata divided by section. Unfilled metadata does not appear by default. When a URL can be associated with a piece of information (ORCID, ontology, web site, etc.), you can click on it to go to the corresponding link. Likewise, it is possible to follow the associated link on each of the resources. From this sheet, you can also export the metadata according to different schemata (Dataverse, Zenodo, JSON-LD). See screenshot 1 & screenshot 2.

7 - Finally, once you have decided to publish your metadata with your data, you can choose the repository that suits you (currently repositories based on Dataverse and Zenodo are supported).

Descriptive metadata can thus be generated from the start of a project or study, without waiting for all the data to be acquired or processed, nor for the moment when one wishes to publish the data, thus respecting the research data lifecycle as closely as possible. Read more.

The implementation of the tool requires involving all data stakeholders upstream (definition of the metadata schema, vocabularies, targeted data repositories, etc.); everyone has their role: data manager/data steward on one side, but also scientists and data producers on the other. Read more.

A progressive rise towards an increasingly controlled and standardized vocabulary is not only possible but even encouraged. First we can start with a simple vocabulary dictionary used locally and grouping together domain vocabularies. Then we can consider the creation of a thesaurus, with or without mapping to ontologies. The promotion of ontologies must also be done gradually, by selecting those which are truly relevant for the collective. A tool like Maggot makes it easy to implement them (see Vocabulary). Read more.

Concerning the second idea: given the diversity of the fields, the chosen approach is to be both as flexible and as pragmatic as possible, by allowing users to choose their own vocabulary (controlled or not) corresponding to the reality of their field and their activities. However, a good practice is to use, as much as possible, only controlled vocabulary, that is to say relevant and sufficient vocabulary used as a reference in the field concerned, allowing users to describe a project and its context without having to add additional terms. To this end, the tool must allow users a progressive approach towards the adoption of standardized controlled vocabularies (thesauri or even ontologies).
With the approach proposed by Maggot, initially there is no question of opening the data, but of managing metadata associated with the data on a storage space with a precise perimeter represented by the collective (unit, team, project, platform, ...). The main characteristic of the tool is, above all, to "capture" the metadata as easily as possible according to a well-chosen metadata schema. However, the opening of data via their metadata must be a clearly stated objective within the framework of projects financed by public institutions (e.g. Europe). Therefore, if you have taken care to correctly define your metadata schema so that a metadata crosswalk (using a mapping file) is possible with a data repository recognized by the international community, then you can easily "push" the metadata together with the data without having to re-enter anything.

Daniel Jacob (INRAE UMR BFP) | CATI PROSODIe
François Ehrenmann (INRAE UMR BioGECO) | CATI GEDEOP
Philippe Chaumeil (INRAE UMR BioGECO)
Edouard Guitton (INRAE Dept. SA, Emerg'IN)
Stéphane Bernillon (INRAE UR MycSA)
Joseph TRAN (INRAE UMR EGFV) | CATI BARIC

To guarantee the authenticity and integrity of a metadata file, it can be recorded permanently and immutably on the bloxberg blockchain. A blockchain is a technology that makes it possible to keep track of a set of transactions (writings in the chain) in a decentralized, secure and transparent manner. A blockchain can therefore be compared to a large (public or private) unfalsifiable register. Blockchain is today used in many fields because it provides solutions to many problems. For example, in the field of Higher Education and Research, registering dataset metadata in the blockchain makes it possible to certify, in an inalienable, irrefutable and completely transparent manner, the ownership and authenticity of the data as well as, for example, the license of use and the date of production of the data. Research stakeholders are then more open to the dissemination of their data (files, results, protocols, publications, etc.) since they know that, in particular, the ownership, content and conditions of use of the data cannot be altered. The Maggot tool could thus serve as a gateway to certify data together with the associated metadata. The complete process is schematized in the following figure:

bloxberg is the most important blockchain project in science. It was founded in 2019 by MPDL, looking for a way to store research results and make them available to other researchers. In this sense, bloxberg is a decentralized register in which results can be stored in a tamper-proof way with a time stamp and an identifier. bloxberg is based on the Ethereum blockchain. However, it makes use of a different consensus mechanism: instead of "Proof of Stake", used by Ethereum since 2022, bloxberg validates blocks through "Proof of Authority". Each node is operated by one member. All members of the association are research institutions and are known in the network. Currently, bloxberg has 49 nodes. It is an international project with participating institutions from all over the world. You will need an Ethereum address and an API key (to be requested via bloxberg-services (at) mpdl.mpg.de). See an example of pushing a metadata file to the bloxberg blockchain using Maggot.

A single file (web/conf/config_terms.txt) contains all the terminology.
The input and search interfaces are completely generated from this definition file, thus defining each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms). This is why a configuration and conversion step into JSON format is essential in order to be able to configure all the other modules (for example, creation of the MongoDB database schema when starting the application, before filling it).

This function is used to generate the terminology definition file in JSON format (config_terms.json) and the corresponding JSON-Schema file (maggot-schema.json) from a tabulated file (1). You can either create a terminology definition file in TSV format from scratch (see below for more details), or extract the file from the current configuration (see JSON to TSV). Once the terminology definition file has been obtained (2), you can load it and press 'Submit'. Three files are generated (3 & 5):

This function generates the markdown documentation file (doc.md) from the template file (config_doc.txt), which is itself generated from the metadata definition file (config_terms.txt, cf. TSV to JSON). Once the template file for the documentation (config_doc.txt) has been edited and documented (6) (see below for more details), you can load it and press the Submit button. The documentation file in markdown format (doc.md) is thus generated (7) and must be placed in the web/docs directory (8). Users will have access to this documentation file via the web interface, in the documentation section, under the "Metadata" heading.

As for any dictionary, there must be 3 files (see below). Please note that the names of these files must always contain the name of the dictionary, i.e. the same as the directory. The format of the file containing the dictionary data (people.txt) is defined by another file (people_format.txt). Below, an example is given when modifying a record. When you click on the Institute field, which is connected to the ROR web API, the drop-down list of research organizations that may correspond in the register appears, if there are any. Note: It is possible to edit dictionaries, by adding an entry for example, and immediately find this new entry when entering metadata in the Maggot tool. Indeed, each dictionary is reloaded into memory as soon as the corresponding input box is clicked. See an illustration.

Funders: The dictionary of funders allows you to define the funding agency, the project ID and its corresponding URL.

Producers: The dictionary of data producers allows you to define their Institute, project ID and corresponding URL. Optionally, you can add the URL of the logo.

Vocabulary: Use this dictionary to mix thesauri and ontologies in order to better target the entire controlled vocabulary of your field of application. Only the vocabulary is mandatory; the URL linked to an ontology or a thesaurus is optional. See the Vocabulary section to learn the extent of the possibilities concerning vocabulary in Maggot.

The necessary infrastructure involves 1) a machine running a Linux OS and 2) a dedicated storage space.

1 - The machine will most often be "virtual" because it is simpler to deploy, either locally (with VM providers such as VirtualBox, VMware Workstation or MS Hyper-V) or remotely (e.g. VMware ESXi, Openstack: example of deployment). Moreover, the OS of your machine must allow the deployment of Docker containers.
See "What is Docker" for more details. The minimum characteristics of the VM are: 2 CPUs, 2 GB RAM, 8 GB of disk.

2 - The dedicated storage space can be either in the local space of the VM, or in a remote place on the network.

Requirements: The installation must be carried out on a (virtual) machine with a recent Linux OS that supports Docker (see Infrastructure). Go to the destination directory of your choice, then clone the repository (see the commands below).

MAGGOT uses 3 Docker images for 3 distinct services. See the Configuration settings.

Warning: You have to pay attention to put the same MongoDB settings in all the above configuration files. It is best not to change anything. It would have been preferable to have a single configuration file, but this has not yet been done given the different languages involved (bash, javascript, python, PHP). To be done!

Note: If you want to run multiple instances, you will need to change, in the run file, i) the container names, ii) the data path, iii) the MongoDB volume name and iv) the MongoDB port.

The following two JSON files are defined by default but can be easily configured from the web interface. See the Terminology Configuration section.

The run shell script allows you to perform multiple actions by specifying an option. You must first build the 3 docker container images, if this has not already been done, with the build command shown below.
Links: https://github.com/inrae/pgd-mmdt/issues
"},{"location":"about/","title":"About","text":""},{"location":"about/#background","title":"Background","text":""},{"location":"about/#motives","title":"Motives","text":"
"},{"location":"about/#state-of-need","title":"State of need","text":"
"},{"location":"about/#proposed-approach","title":"Proposed approach","text":"
"},{"location":"about/#links","title":"Links","text":"
"},{"location":"about/#contacts","title":"Contacts","text":"
"},{"location":"about/#designers-developers","title":"Designers / Developers","text":"
"},{"location":"about/#contributors","title":"Contributors","text":"
"},{"location":"configuration/","title":"Configuration","text":""},{"location":"configuration/#terminology-configuration","title":"Terminology configuration","text":"
"},{"location":"configuration/#tsv-to-json","title":"TSV to JSON","text":"
"},{"location":"configuration/#tsv-to-doc","title":"TSV to DOC","text":"
"},{"location":"configuration/#json-to-tsv","title":"JSON to TSV","text":"
"},{"location":"dictionaries/","title":"Dictionaries","text":""},{"location":"dictionaries/#presentation","title":"Presentation","text":"
"},{"location":"dictionaries/#the-people-dictionary","title":"The people dictionary","text":"sh ./run passwd <user>\n
"},{"location":"dictionaries/#other-dictionaries","title":"Other dictionaries","text":"var people = [];\n// Each item in the 'people' list consists of the first two columns (0,1) separated by a space\nget_dictionary_values('people', merge=[0,' ',1])
"},{"location":"gant/","title":"Gant","text":""},{"location":"gant/#gantt-diagrams-of-the-developments","title":"Gantt diagrams of the developments","text":"gantt dateFormat YYYY-MM-DD axisFormat %Y-%m title Diagrammes de Gantt pr\u00e9visionnel des d\u00e9veloppements section MongoDB 1: des1, 2023-11-01,60d 2: des2, 2023-12-01,90d 3: des3, 2023-12-01,90d section Couche API 4: des4, 2024-01-01,120d 5: des5, 2024-05-01,60d section Interface Web 6a: des6, 2024-06-01,60d 6b: des7, 2024-07-01,60d 6c: des8, 2024-09-01,60d"},{"location":"infrastructure/","title":"Infrastructure","text":""},{"location":"infrastructure/#infrastructure-local-remote-or-mixed","title":"Infrastructure : Local, Remote or Mixed","text":"
"},{"location":"installation/","title":"Installation","text":""},{"location":"installation/#install-on-your-linux-computer-or-linux-unix-server","title":"Install on your linux computer or linux / unix server","text":"cd
to your clone path:
"},{"location":"installation/#installation-of-docker-containers","title":"Installation of Docker containers","text":"git clone https://github.com/inrae/pgd-mmdt.git pgd-mmdt\ncd pgd-mmdt\n
"},{"location":"installation/#configuration","title":"Configuration","text":"
"},{"location":"installation/#commands","title":"Commands","text":"cd pgd-mmdt\nsh ./run <option>\n
"},{"location":"installation/#starting-the-application","title":"Starting the application","text":"
```
sh ./run build
```
The application can then be started sequentially:
```
sh ./run start
sh ./run initdb
sh ./run scan
```
You can also launch these 3 steps with a single command:
```
sh ./run fullstart
```
Once the application is launched, you can check that the containers are running with the following command:
```
docker ps -a
```
which should produce a result similar to the following:
```
CONTAINER ID   IMAGE           COMMAND                   CREATED          STATUS          PORTS                                   NAMES
5914504f456d   pgd-mmdt-web    "docker-php-entrypoi."    12 seconds ago   Up 10 seconds   0.0.0.0:8087->80/tcp, :::8087->80/tcp   mmdt-web
226b13ed9467   pgd-mmdt-scan   "cron -f"                 12 seconds ago   Up 11 seconds                                           mmdt-scan
81fecbb56d23   pgd-mmdt-db     "docker-entrypoint.s."    13 seconds ago   Up 12 seconds   27017/tcp                               mmdt-db
```
On the first line, which corresponds to the web interface, we see that port 80 of the container is mapped to port 8087 of the VM. Let's say the IP address of your VM is 192.168.56.2; then in your browser you will need to use the URL http://192.168.56.2:8087/. You can of course change the port number in the 'run' file.
It may be preferable to use a lightweight HTTP server like nginx so that the Maggot URL will be http://192.168.56.2/maggot/. Below is an example configuration:
```
## /etc/nginx/nginx.conf
http {

    ...
    upstream maggot { server 127.0.0.1:8087; }
    ...

}

## /etc/nginx/conf.d/my-site.conf

server {
    listen 80 default;
    server_name $host;

    ...

    location /maggot/ {
        proxy_set_header Host $host;
        proxy_set_header X-App-Name 'maggot';
        proxy_set_header X-Real-Ip $remote_addr;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://maggot/;
    }

    ...

}
```
To stop the application:

```
sh ./run stop
```
When updating the application, it is imperative to preserve a whole set of configuration files as well as the content of certain directories (dictionaries, javascripts dedicated to vocabularies, etc.). An update script is available (./etc/update-maggot.sh), preferably placed under '/usr/local/bin'. To preserve your configuration, it is recommended to create local configuration files.
A first file 'local.conf' will contain all the parameters to be preserved, initially contained in the 'run' file. A small example could be as follows:
```
#!/bin/bash

# Local HTTP Port for web application
WEB_PORT=8088

# Path to the data
DATADIR=/media/Workdir/Share/DATA/
```
A second file './web/inc/config/local.inc' will contain all the parameters to be preserved, initially contained in the './web/inc/config/config.inc' file. A small example could be as follows:
```
<?php

# Main title
$TITLE ='Metadata management - My Labs';
$MAINTITLE =$TITLE;

# File Browser
$FILEBROWSER=1;
$URL_FILEBROWSER='/fb/';

# Enable some functionalities
$export_oai = 1;

?>
```
Note: See how to proceed with the configuration steps.
"},{"location":"installation/#file-browser","title":"File Browser","text":"You can provide access to your data via a file browser. This application must be installed separately but can be connected to Maggot by specifying the corresponding URL in the configuration file. Users and their rights are managed in the filebrowser application. Likewise, we can also create links to the data without a password. These links can be usefully specified as external resources in the metadata managed by Maggot.
See how to install it on GitHub.
"},{"location":"private-access/","title":"Private access","text":""},{"location":"private-access/#private-access-key-management","title":"Private access key management","text":""},{"location":"private-access/#motivation","title":"Motivation","text":"Although the Maggot tool is designed to foster the sharing of metadata within a collective, it may be necessary to temporarily privatize access to the metadata of an ongoing project with confidentiality constraints. So even within our own collective, access to metadata must be restricted to authorized users only.
"},{"location":"private-access/#implementation","title":"Implementation","text":"The choice of not wanting to manage users in the Maggot tool was made in order to make the metadata completely open by default within a collective. Furthermore, access rights to the storage space are managed independently of the Maggot tool by the administrator of this space. It is therefore through the storage space that we must give or not access to the metadata via the web interface.
The chosen mechanism for privatizing access is described below. It has the dual advantage of being simple to implement and simple to use.
First we have to generate a file containing the encrypted key for private access. This file must be generated from the web interface and then downloaded, as shown in the figure below. This file must then be manually deposited in the data directory corresponding to the dataset whose access we wish to privatize. The presence of this file within a directory is enough to block access to the metadata and data by default. Note that the same file containing the encrypted private key can be put in several data directories (for example, within the same project). The deposit must be done by hand because the Maggot tool must only have access to the storage space in read mode. This also guarantees that the user has write rights to this space, without having to manage user accounts on the Maggot side.
By default, 'untwist1' metadata are not accessible to anyone
When we want to access the metadata of this dataset, we simply have to enter the private key in the current session. This unlocks access to the metadata via the web interface, but only in the current session of our web browser. This means that we will have to enter the private key for each session (by default, a session lasts a maximum of 1 hour).
Now the 'untwist1' metadata are accessible only to us
When we want to give access to the metadata to the entire collective, we simply need to delete the private access file (named by default 'META_auth.txt') from the concerned data directory.
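As a sketch (the paths are examples; adapt them to your own storage layout), privatizing and later re-opening a dataset therefore only involves copying or removing this file by hand on the storage space:

```bash
# privatize the 'untwist1' dataset (file generated and downloaded from the web interface)
cp ~/Downloads/META_auth.txt /opt/data/untwist1/

# give access back to the whole collective
rm /opt/data/untwist1/META_auth.txt
```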
Here is the list of all the files whose parameters may need to be adjusted according to the needs of a given instance.
"},{"location":"settings/#dockerscanpartscriptsconfigpy","title":"dockerscanpart/scripts/config.py","text":"This file defines the connection parameters to the Mongo database. Knowing that this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
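For illustration only (the actual layout of the file may differ), such a file simply assigns the parameters listed in the table below:

```python
# dockerscanpart/scripts/config.py -- illustrative sketch, not the actual file
dbserver = "mmdt-db"      # name of the MongoDB server (container name)
database = "pgd-db"       # name of the MongoDB database
dbport   = 27017          # port of the MongoDB server
username = "userw-pgd"    # user of the pgd-db database with read/write access
password = "wwwww"        # password corresponding to this user
```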
| Parameter | Description | Default value |
|---|---|---|
| dbserver | Name of the MongoDB server | mmdt-db |
| database | Name of the MongoDB database | pgd-db |
| dbport | Port of the MongoDB server | 27017 |
| username | Username of the Mongo database pgd-db with Read/Write access | userw-pgd |
| password | Password corresponding to the username of the Mongo DB pgd-db | wwwww |

inc/config/mongodb.inc

This file defines the connection parameters to the Mongo database. Since this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
| Parameter | Description | Default value |
|---|---|---|
| docker_mode | Indicates whether the installation uses docker containers. In this case, the Mongo DB IP address will be different from 127.0.0.1. | 1 |
| uritarget | The Mongo DB IP address | mmdt-db (docker_mode=1) or 127.0.0.1 (docker_mode=0) |
| database | Name of the MongoDB database | pgd-db |
| collection | Name of the MongoDB collection | metadata |
| port | Port of the MongoDB server | 27017 |
| username | Username of the Mongo database pgd-db with Read access only | userr-pgd |
| password | Password corresponding to the username of the Mongo DB pgd-db | rrrrr |

inc/config/config.inc

This file defines parameters related to i) the web interface and ii) the functionalities allowed for users. Only the parameters that may usefully be changed for the needs of an instance are described here.
| Parameter | Description | Default value |
|---|---|---|
| EXTERN | Indicates if the use of the tool is only for external use, i.e. without using a storage space. | 0 |
| PRIVATE_ACCESS | Gives the possibility of managing private access to metadata | 0 |
| ZOOMWP | Zoom level of the web interface. By reducing the size slightly, you get a better layout. | 90% |
| RESMEDIA | Gives the possibility of putting a MIME type on each resource in the metadata. | 1 |
| TITLE | Title to display in the main banner | Metadata management |
| FILEBROWSER | Indicates whether the file browser is used. This assumes it is installed. | 0 |
| URL_FILEBROWSER | File browser URL. It can be absolute or relative. | /fb/ |
| APPNAME | Name given in the URL to access the web interface. | maggot |
| dataverse_urls | Array of Dataverse repository URLs where you can upload metadata and data | - |
| zenodo_urls | Array of Zenodo repository URLs where you can upload metadata and data | - |
| SERVER_URL | Default Dataverse repository URL | https://entrepot.recherche.data.gouv.fr |
| ZENODO_SERVER_URL | Default Zenodo repository URL | https://zenodo.org |
| export_dataverse | Indicates whether the Dataverse feature is enabled | 1 |
| export_zenodo | Indicates whether the Zenodo feature is enabled | 1 |
| export_jsonld | Indicates whether the JSON-LD feature is enabled | 1 |
| export_oai | Indicates whether the OAI-PMH feature is enabled | 0 |
| export_bloxberg | Indicates whether the Bloxberg Blockchain feature is enabled (Experimental) | 0 |
| cvdir | Relative path of the Controlled Vocabulary lists (cvlist) | cvlist/ |
| maggot_fulltitle | Maggot name of the field corresponding to the title in Dataverse/Zenodo | fulltitle |
| auth_senddata_file | Name of the file that must be present in the data directory to authorize the transfer of the data file | META_datafile_ok.txt |
| private_auth_file | Name of the private access file | META_auth.txt |
| sendMail | Configuration of messaging for sending metadata to data managers (see below) | NULL |

The messaging configuration is done using the following array in the inc/config/config.inc file (or, more judiciously, in inc/config/local.inc in order to be preserved during an update). To understand how it works, see Send Emails using PHPmailer.
```
$sendMail['smtpHost'] = 'smtp.example.org';     // Set the SMTP server to send through
$sendMail['smtpSecure'] = 'tls';                // Enable TLS encryption
$sendMail['smtpPort'] = 587;                    // Set the TCP port to connect to
$sendMail['CheckEmail'] = 'maggot@exemple.org'; // Email address authorized to send emails
$sendMail['CheckPass'] = 'password';            // The corresponding password
$sendMail['CheckName'] = 'Maggot';              // Alias name
$sendMail['UserEmail'] = 'admin@exemple.org';   // Email of data managers, separated by a comma
```
"},{"location":"settings/#run","title":"run","text":"This file contains the essential parameters to be set before any use.
| Parameter | Description | Default value |
|---|---|---|
| WEB_PORT | Local HTTP Port for the web application | 8087 |
| DATADIR | Path to the data | /opt/data/ |
| DB_IMAGE | Docker image name of the MongoDB | pgd-mmdt-db |
| SCAN_IMAGE | Docker image name of the Scan process | pgd-mmdt-scan |
| WEB_IMAGE | Docker image name of the Web interface | pgd-mmdt-web |
| DB_CONTAINER | Docker container name of the MongoDB | mmdt-db |
| SCAN_CONTAINER | Docker container name of the Scan process | mmdt-scan |
| WEB_CONTAINER | Docker container name of the Web interface | mmdt-web |
| MONGO_VOL | Volume name for MongoDB | mmdt-mongodb |
| MONGO_PORT | HTTP Port for MongoDB | 27017 |
| USER | Admin user in the htpasswd file | admin |

Difference Between Data Manager and Data Steward (chat generated by chatGPT)

Data Manager

Responsibilities:
Data Operations and Management:
Data Quality and Integrity:
Data Policies and Procedures:
Technical Oversight:
Performance Monitoring:
Skills:
Responsibilities:
Data Governance:
Data Quality Assurance:
Data Access and Security:
Liaison and Coordination:
Training and Support:
Skills:
Focus:
Scope of Work:
Technical vs. Governance:
Interaction with Stakeholders:
Data Generation:
Data Quality Assurance:
Data Documentation:
Adherence to Standards:
Data Security and Privacy:
Collaboration:
Foundation of Data Assets:
Enhancing Data Quality:
Supporting Data Governance:
Facilitating Data Integration:
Enabling Timely Insights:
Data Managers:
Data Stewards:
Data Consumers:
Time Constraints:
Lack of Expertise:
Inconsistent Practices:
Standardized Templates and Guidelines:
Training and Education:
Automated Documentation Tools:
Data Steward Support:
Integration into Workflow:
Incentives and Accountability:
Feedback Mechanism:
Documentation Checkpoints:
Collaboration Platforms:
Clear Roles and Responsibilities:
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat2/","title":"chatGPT","text":""},{"location":"chats/chat2/#how-can-we-properly-respect-the-data-life-cycle-in-scientific-research-in-order-to-carry-out-good-data-management","title":"How can we properly respect the data life cycle in scientific research in order to carry out good data management?","text":"Properly respecting the data life cycle in scientific research is crucial for effective data management. The data life cycle encompasses the stages through which data passes, from its initial generation to its eventual archiving and potential reuse. Here\u2019s a comprehensive approach to managing the data life cycle in scientific research:
"},{"location":"chats/chat2/#1-planning","title":"1. Planning","text":"By adhering to these guidelines, researchers can ensure robust data management practices that enhance the reliability, reproducibility, and impact of scientific research.
"},{"location":"chats/chat2/#at-which-stages-of-the-data-life-cycle-is-it-most-crucial-to-document-data","title":"At which stages of the data life cycle is it most crucial to document data?","text":"Documentation is crucial at all stages of the data life cycle, but there are specific stages where it is particularly essential to ensure data integrity, reproducibility, and usability. Here are the stages where documentation is most critical:
"},{"location":"chats/chat2/#1-planning_1","title":"1. Planning","text":"By thoroughly documenting at these crucial stages, researchers can enhance the transparency, reproducibility, and overall quality of their data management practices.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat3/","title":"chatGPT","text":""},{"location":"chats/chat3/#what-are-the-steps-to-building-our-own-controlled-vocabulary","title":"What are the steps to building our own controlled vocabulary ?","text":"Creating a controlled vocabulary is a systematic process that involves several key steps. Here\u2019s a detailed guide to help you build your own controlled vocabulary:
"},{"location":"chats/chat3/#1-define-the-purpose-and-scope","title":"1. Define the Purpose and Scope","text":"By following these steps, you can develop a controlled vocabulary that improves information retrieval, enhances data consistency, and supports effective communication within your organization.
"},{"location":"chats/chat3/#what-are-the-differences-between-thesauri-ontologies-or-simple-vocabulary-dictionaries-in-which-cases-should-each-of-them-be-favored","title":"What are the differences between thesauri, ontologies or simple vocabulary dictionaries? In which cases should each of them be favored?","text":"Thesauri, ontologies, and simple vocabulary dictionaries are all tools for organizing and managing information, but they differ in complexity, structure, and use cases. Here\u2019s a breakdown of their differences and when to use each:
"},{"location":"chats/chat3/#thesauri","title":"Thesauri","text":"Definition: - A thesaurus is a controlled vocabulary that organizes terms into a structured hierarchy, showing relationships such as synonyms, antonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Structure: - Synonyms: Provides alternative words with similar meanings. - Hierarchical Relationships: Shows broader and narrower terms. - Associative Relationships: Connects related terms that are not hierarchically linked.
Use Cases: - Information Retrieval: Enhances search capabilities by providing alternative terms and related concepts. - Indexing and Cataloging: Standardizes terms for consistent categorization of information. - Library and Archival Systems: Organizes collections for easier access and discovery.
When to Use: - When you need to manage a large set of terms with hierarchical and associative relationships. - When you want to improve search functionality by including synonyms and related terms.
"},{"location":"chats/chat3/#ontologies","title":"Ontologies","text":"Definition: - An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It includes not just the terms, but also the rules and constraints that govern their use.
Structure: - Concepts: Detailed definitions of domain-specific concepts. - Relationships: Various types of relationships between concepts, including hierarchical, associative, and more complex logical relationships. - Attributes: Properties and characteristics of concepts. - Rules and Constraints: Logical rules that define how concepts can interact.
Use Cases: - Semantic Web and AI: Facilitates data integration, sharing, and interoperability in web and AI applications. - Knowledge Representation: Provides a detailed and formalized structure for representing domain knowledge. - Complex Data Integration: Integrates heterogeneous data sources by providing a common framework.
When to Use: - When you need a highly detailed and formal representation of knowledge with complex relationships. - When integrating diverse data sources and ensuring semantic interoperability is crucial. - For advanced applications in artificial intelligence, natural language processing, and the Semantic Web.
"},{"location":"chats/chat3/#simple-vocabulary-dictionaries","title":"Simple Vocabulary Dictionaries","text":"Definition: - A simple vocabulary dictionary is a basic list of terms and their definitions, without structured relationships between terms.
Structure: - Terms: A list of words or phrases. - Definitions: Brief explanations or meanings of each term.
Use Cases: - Glossaries: Provides quick reference for understanding terms and definitions. - Basic Cataloging: Lists terms for simple categorization without the need for complex relationships. - Learning and Education: Helps learners understand basic concepts and terminology.
When to Use: - When you need a straightforward reference tool for terms and their definitions. - For simple documentation where relationships between terms are not necessary. - When building foundational knowledge without the need for detailed hierarchical or associative structures.
"},{"location":"chats/chat3/#summary","title":"Summary","text":"Choosing the right tool depends on your specific needs regarding the complexity of the relationships between terms, the level of detail required, and the intended application.
"},{"location":"chats/chat3/#concerning-descriptive-metadata-can-we-envisage-a-gradual-evolution-in-the-use-of-simple-dictionaries-then-moving-to-thesauri-or-even-ontologies","title":"Concerning descriptive metadata, can we envisage a gradual evolution in the use of simple dictionaries, then moving to thesauri, or even ontologies?","text":"Yes, it is both practical and beneficial to envisage a gradual evolution in the use of descriptive metadata, starting with simple dictionaries and progressing to thesauri and ontologies as the needs and complexity of your data management grow. Here\u2019s how you can approach this evolutionary path:
"},{"location":"chats/chat3/#stage-1-simple-vocabulary-dictionaries","title":"Stage 1: Simple Vocabulary Dictionaries","text":"Initial Implementation: - Scope: Begin with a simple vocabulary dictionary to establish basic terminology and definitions. - Purpose: Ensure consistency in the use of terms across your organization or system. - Structure: Create a straightforward list of terms with clear definitions.
Benefits: - Ease of Use: Simple to create and manage. - Foundation: Provides a foundational understanding of key terms. - Quick Implementation: Allows for immediate improvement in consistency and clarity of metadata.
Use Cases: - Early Project Stages: When starting a new project or system where the primary goal is to standardize terminology. - Training and Education: Helps new team members or users quickly understand the basic terms and definitions.
"},{"location":"chats/chat3/#stage-2-thesauri","title":"Stage 2: Thesauri","text":"Intermediate Implementation: - Scope: Expand your vocabulary to include hierarchical and associative relationships. - Purpose: Improve information retrieval and organization by showing broader, narrower, and related terms. - Structure: Develop a thesaurus that includes synonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Benefits: - Enhanced Search: Improves search accuracy by including synonyms and related concepts. - Better Organization: Provides a more structured and interconnected view of your terms. - Scalability: Allows for the gradual addition of new terms and relationships.
Use Cases: - Growing Projects: As your project or system grows and the need for more detailed metadata relationships emerges. - Information Retrieval Systems: When enhancing search functionalities and user experience is a priority.
"},{"location":"chats/chat3/#stage-3-ontologies","title":"Stage 3: Ontologies","text":"Advanced Implementation: - Scope: Develop a comprehensive and formal representation of your domain knowledge. - Purpose: Facilitate complex data integration, interoperability, and advanced reasoning. - Structure: Create an ontology with detailed concepts, relationships, properties, and logical rules.
Benefits: - Semantic Interoperability: Ensures data consistency and integration across different systems and platforms. - Advanced Reasoning: Supports sophisticated queries and inference capabilities. - Detailed Knowledge Representation: Provides a rich and nuanced understanding of domain concepts and their interrelationships.
Use Cases: - Complex Systems: For projects requiring detailed and formal knowledge representation, such as in AI, machine learning, and the Semantic Web. - Interoperable Data Systems: When integrating diverse data sources and ensuring seamless interoperability is essential.
"},{"location":"chats/chat3/#evolution-strategy","title":"Evolution Strategy","text":"Ensure all team members use consistent terminology.
Thesaurus:
Improve search functionality by adding synonyms and related terms.
Ontology:
By following this gradual evolution, you can systematically enhance your metadata management capabilities, ensuring your system remains adaptable and meets the growing complexity of your data and user needs.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat4/","title":"chatGPT","text":""},{"location":"chats/chat4/#what-is-metadata-crosswalk","title":"What is metadata crosswalk?","text":"A metadata crosswalk is a structured mapping between the metadata elements of two or more different systems or standards. Metadata, in this context, refers to descriptive information about data or resources, such as titles, authors, dates, keywords, etc.
When different systems or standards are used to manage data or resources, they often have their own unique metadata schemas or formats. A metadata crosswalk helps to bridge the gap between these different schemas by establishing equivalences or relationships between the metadata elements used in each system.
For example, if you have a digital library system that uses one set of metadata standards and you want to integrate it with a repository that uses a different set of standards, you might create a metadata crosswalk to map the elements from one standard to the other, ensuring interoperability and consistency in metadata representation.
Metadata crosswalks are particularly useful in data integration, migration, and interoperability scenarios, enabling smooth communication and exchange of information between disparate systems or standards.
"},{"location":"chats/chat4/#whats-the-point","title":"What's the point ?","text":"The primary purpose of a metadata crosswalk is to facilitate interoperability and data exchange between systems or standards that use different metadata schemas. Here are some specific points highlighting its importance:
Interoperability: Different organizations or systems may use varying metadata standards or schemas to describe their data or resources. By creating a metadata crosswalk, you ensure that these systems can communicate with each other effectively, exchanging information without loss of meaning.
Data Integration: When consolidating data from multiple sources or systems, a metadata crosswalk helps ensure that the metadata associated with each dataset is properly understood and mapped to a common format. This facilitates the integration process and allows for more efficient analysis and management of the combined datasets.
Standardization: By mapping metadata elements between different standards or schemas, a crosswalk promotes standardization and consistency in how data is described and represented. This consistency is essential for ensuring accurate interpretation and retrieval of information across different systems or platforms.
Migration and Upgrades: During system migrations or upgrades, metadata formats may change. A metadata crosswalk can aid in transitioning from one metadata schema to another by providing a clear mapping between the old and new formats, thus minimizing disruptions and ensuring continuity in data management practices.
Data Sharing and Reuse: When sharing data with external parties or making it available for reuse, a metadata crosswalk helps ensure that the metadata associated with the data is understandable and interpretable by others. This promotes data sharing and reuse, as users can easily understand the content and context of the data they are accessing.
Overall, the point of a metadata crosswalk is to promote seamless communication, integration, and interoperability between different systems or standards, ultimately enhancing the usability and accessibility of data across diverse environments.
"},{"location":"chats/chat4/#can-you-suggest-some-interesting-links-on-the-subject-related-to-life-science","title":"Can you suggest some interesting links on the subject related to life science ?","text":"Certainly! Here are some interesting links related to metadata crosswalks in the context of life sciences:
NCBI Metadata Crosswalk Tool:
Metadata Standards and Tools for Life Sciences:
FAIRsharing:
NIH Common Data Elements (CDE) Repository:
These resources should provide you with a good starting point for exploring metadata standards and tools relevant to life sciences research.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"definitions/","title":"Definition Files","text":""},{"location":"definitions/#metadata-definition-files","title":"Metadata definition files","text":"The Maggot tool offers great flexibility in configuration. It allows you to completely choose all the metadata you want to describe your data. You can base yourself on an existing metadata schema, invent your own schema or, more pragmatically, mix one or more schemas by introducing some metadata specific to your field of application. However, keep in mind that if you want to add descriptive metadata to your data then a certain amount of information is expected. But a completely different use of the tool is possible, it's up to you.
There are two levels of definition files, as shown in the figure below:
1 - The first level concerns the definition of terminology (metadata) similar to a descriptive metadata plan. Clearly, this category is more akin to configuration files. They represent the heart of the application around which everything else is based. The input and search interfaces are completely generated from these definition files (especially the web/conf/config_terms.txt file), thus defining each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms). This is why a configuration step is essential in order to be able to configure all the other modules.
2 - The second level concerns the definitions of the mapping to a differently structured metadata schema (metadata crosswalk, i.e. a specification for mapping one metadata standard to another), used either i) for metadata export to a remote repository (e.g. Dataverse, Zenodo) or ii) for metadata harvesting (e.g. JSON-LD, OAI-PMH). Simply place the definition files in the configuration directory (web/conf) for them to be taken into account, provided you have adjusted the configuration (see Settings).
All definition files are made using a simple spreadsheet then exported in TSV format.
The list of definition files in Maggot is given below. All must be put under the directory web/conf.
See an example online: https://pmb-bordeaux.fr/maggot/config/view, and the corresponding form based on these definition files.
"},{"location":"definitions/config_terms/","title":"Terminlogy Definition","text":""},{"location":"definitions/config_terms/#example-of-a-terminlogy-definition-file","title":"Example of a Terminlogy Definition file","text":"Field Section Required Search ShortView Type features Label Predefined terms title definition Y N 1 textbox width=350px Short name fulltitle definition Y Y 2 textbox Full title subject definition Y Y checkbox open=0 Subject Agricultural Sciences,Arts and Humanities,Astronomy and Astrophysics,Business and Management,Chemistry,Computer and Information Science,Earth and Environmental Sciences,Engineering,Law,Mathematical Sciences,Medicine Health and Life Sciences,Physics,Social Sciences,Other description definition Y Y areabox rows=6,cols=30 Description of the dataset note definition N Y areabox rows=4,cols=30 Notes status status N Y 3 dropbox width=350px Status of the dataset Processed,In progress,Unprocessed access_rights status N Y 4 dropbox width=350px Access rights to data Public,Mixte,Private language status N Y checkbox open=0 Language Czech,Danish,Dutch,English,Finnish,French,German,Greek,Hungarian,Icelandic,Italian,Lithuanian,Norwegian,Romanian,Slovenian,Spanish,Swedish lifeCycleStep status N Y multiselect autocomplete=lifecycle,min=1 Life cycle step license status N Y textbox autocomplete=license,min=1 License datestart status N Y datebox width=350px Start of collection dateend status N Y datebox width=350px End of collection dmpid status N Y textbox DMP identifier contacts management Y Y multiselect autocomplete=people,min=1 Contacts authors management Y Y multiselect autocomplete=people,min=1 Authors collectors management N Y multiselect autocomplete=people,min=1 Data collectors curators management N Y multiselect autocomplete=people,min=1 Data curators members management N Y multiselect autocomplete=people,min=1 Project members leader management N Y multiselect autocomplete=people,min=1 Project leader wpleader management N Y multiselect autocomplete=people,min=1 WP leader depositor management N Y textbox Depositor producer management N Y multiselect autocomplete=producer,min=1 Producer grantNumbers management N Y multiselect autocomplete=grant,min=1 Grant Information kindOfData descriptors Y Y checkbox open=0 Kind of Data Audiovisual,Collection,Dataset,Event,Image,Interactive Resource,Model,Physical Object,Service,Software,Sound,Text,Workflow,Other keywords descriptors N Y multiselect autocomplete=bioportal,onto=EFO:JERM:EDAM:MS:NMR:NCIT:OBI:PO:PTO:AGRO:ECOCORE:IOBC:NCBITAXON Keywords topics descriptors N Y multiselect autocomplete=VOvocab Topic Classification dataOrigin descriptors N Y checkbox open=0 Data origin observational data,experimental data,survey data,analysis data,text corpus,simulation data,aggregate data,audiovisual corpus,computer code,Other experimentfactor descriptors N Y multiselect autocomplete=vocabulary,min=1 Experimental Factor measurement descriptors N Y multiselect autocomplete=vocabulary,min=1 Measurement type technology descriptors N Y multiselect autocomplete=vocabulary,min=1 Technology type publication_citation descriptors N Y areabox rows=5,cols=30 Publication - Citation publication_idtype descriptors N Y dropbox width=200px Publication - ID Type -,ark,arXiv,bibcode,doi,ean13,eissn,handle,isbn,issn,istc,lissn,lsid,pmid,purl,upc,url,urn publication_idnumber descriptors N Y textbox width=400px Publication - ID Number publication_url descriptors N Y textbox Publication - URL comment other N Y areabox rows=15, cols=30 Additional 
information"},{"location":"definitions/dataverse/","title":"Dataverse Definition File","text":"Open source research data repository software, approved by Europe.
"},{"location":"definitions/dataverse/#dataverse-definition-file_1","title":"Dataverse definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Dataverse. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Dataverse, knowing that this metadata crosswalk was made possible by choosing the right metadata schema at upstream.
Since the structure of the Dataverse JSON output file is known internally, only a minimum of information is necessary to carry out the correspondence.
The file must have 4 columns with headers defined as follows:
Below is an example of a Dataverse definition file (TSV)
Example of a Dataverse JSON file generated from the definition file given as an example above.
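For orientation, a Dataverse dataset JSON nests the mapped fields under datasetVersion/metadataBlocks; the minimal sketch below only illustrates this general shape and is not the exact output produced by Maggot:

```json
{
  "datasetVersion": {
    "metadataBlocks": {
      "citation": {
        "displayName": "Citation Metadata",
        "fields": [
          { "typeName": "title", "typeClass": "primitive", "multiple": false,
            "value": "Example dataset described with Maggot" }
        ]
      }
    }
  }
}
```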
JSON-LD Definition File

This definition file will allow harvesters to collect structured metadata based on a semantic schema, i.e. the fields themselves, and not just their content, can be associated with a semantic definition (an ontology, for example), which then facilitates the link between the metadata and therefore the data (JSON-LD). The chosen semantic schema is based on several metadata schemas.
The full workflow to "climb the Link Open Data mountain" is summarized in the figure below:
Metadata schemas used to build the model proposed by default:
Definition of the JSON-LD context using the metadata schemas proposed by default
The structure of the JSON-LD output is not known internally, so information on the structure is necessary to carry out the correspondence.
Example of JSON-LD definition file (partial) using the metadata schemas proposed by default (TSV)
Example of a JSON-LD file generated from the definition file given as an example above.
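To give an idea of the target format, a harvested JSON-LD record typically declares a @context mapping the exported fields to a semantic schema. The sketch below uses schema.org purely as an illustration; the default Maggot context actually combines several metadata schemas, as mentioned above:

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Example dataset described with Maggot",
  "description": "Short description of the dataset.",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "creator": [ { "@type": "Person", "name": "Jane Doe" } ],
  "keywords": [ "metabolomics" ]
}
```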
The mapping file is used, as its name indicates, to match a term chosen by the user during entry with a term from an ontology or a thesaurus, and thus to obtain a URL that will be used for referencing. It can be used for each metadata crosswalk requiring such a mapping (e.g. to the Dataverse, Zenodo or JSON-LD format).
The role of this definition file is illustrated in the figure above.
The file must have 5 columns with headers defined as follows:
Below is an example of a Mapping definition file (TSV)
"},{"location":"definitions/oai-pmh/","title":"OAI-PMH Definition File","text":"
OAI-PMH is a protocol developed for harvesting metadata descriptions of records in an archive so that services can be built using metadata from many archives.
"},{"location":"definitions/oai-pmh/#oai-pmh-definition-file_1","title":"OAI-PMH definition File","text":"This definition file will allow harvesters to collect metadata structured according to a standard schema (OAI-DC).
Based on the Open Archives Initiative Protocol for Metadata Harvesting - Version 2
Example of an OAI-PMH Data Provider Validation
Example of OAI-PMH output for a dataset
Since the structure of the OAI-PMH output file is known internally, only a minimum of information is necessary to carry out the correspondence.
Example of OAI-PMH definition file (TSV)
Another example of an OAI-PMH definition file (TSV) with identifiers & vocabulary mapping
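For reference, harvesting then boils down to standard OAI-PMH requests against the Maggot endpoint. The endpoint path below is an assumption; adapt it to your own instance:

```bash
# identify the data provider (hypothetical endpoint path)
curl "http://192.168.56.2:8087/oai?verb=Identify"

# harvest all records in the standard OAI-DC schema
curl "http://192.168.56.2:8087/oai?verb=ListRecords&metadataPrefix=oai_dc"
```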
"},{"location":"definitions/terminology/","title":"Terminology","text":""},{"location":"definitions/terminology/#definition-of-terminology","title":"Definition of terminology","text":"There are two definition files to set up.
Each time there is a change in these two definition files, it is necessary to convert them so that they are taken into account by the application.
Terminology is the set of terms used to define the metadata of a dataset. A single file (web/conf/config_terms.txt) contains all the terminology. The input and search interfaces (e.g. screenshot) are completely generated from this definition file, thus defining i) each of the fields and their input type (checkbox, dropbox, textbox, ...) and ii) the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms).
The metadata schema proposed by default is mainly established according to the DDI (Data Documentation Initiative) schema, which also largely corresponds to that adopted by the Dataverse software.
Terminology is organised in several sections. By default 6 sections are proposed, but you can redefine them as you wish:
For each section, fields are then defined. These fields can be defined according to the way they will be entered via the web interface. There are 6 different types of input: check boxes (checkbox), drop-down lists (dropbox), single-line text boxes (textbox), single-line text boxes with an additional box for multiple selection from a catalog of terms (multiselect), date pickers (datebox) and multi-line text boxes (areabox).
For two types (checkbox and dropbox), it is possible to define the values to be selected (predefined terms).
"},{"location":"definitions/terminology/#structure-of-the-terminology-definition-file-tsv","title":"Structure of the Terminology definition file (TSV)","text":"The file must have 9 columns with headers defined as follows:
Column 9 - Predefined terms: for fields defined with a type equal to checkbox or dropbox, a list of terms separated by commas can be given.
Notes
Below is an example of a Terminology definition file (TSV)
Example of Maggot JSON file generated based on the same definition file
The documentation definition file is used to provide online help for each field (a small icon placed next to each label on the form). It should therefore only be modified when a field is added, deleted, or moved to another section. This file is then used to generate the online metadata documentation according to the figure below (see Configuration to find out how to carry out this transformation).
The file must have 3 columns with headers defined as follows:
Below is an example of a Terminology documentation file (TSV).
Same example as above converted to HTML using the Markdown format.
1 - Vocabulary based on a list of terms fixed in advance (checkbox with feature open=0)
2 - Vocabulary open for addition (checkbox with feature open=1)
3 - Vocabulary based on a web API in a text field (textbox)
4 - Vocabulary based on a dictionary with multiple selection (multiselect)
5 - Vocabulary based on a SKOSMOS Thesaurus with multiple selection (multiselect)
6 - Vocabulary based on an OntoPortal with multiple selection (multiselect)
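To give an idea of how autocompletion against such vocabulary servers typically works, here is a hedged sketch of a term search against a SKOSMOS REST endpoint. The base URL and vocabulary identifier are placeholders, the exact API of your own server may differ, and the snippet assumes the requests package is installed.

import requests

# Placeholder SKOSMOS server and vocabulary id: adapt to your own thesaurus server.
SKOSMOS_BASE = "https://skosmos.example.org/rest/v1"

def search_terms(query, vocab="myvocab", lang="en"):
    # SKOSMOS exposes a /search endpoint returning candidate concepts with labels and URIs.
    r = requests.get(
        f"{SKOSMOS_BASE}/search",
        params={"query": query + "*", "vocab": vocab, "lang": lang},
        timeout=30,
    )
    r.raise_for_status()
    return [(hit.get("prefLabel"), hit.get("uri")) for hit in r.json().get("results", [])]

# Example: suggestions for a user typing "soil" in a multiselect field.
# print(search_terms("soil"))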
"},{"location":"definitions/zenodo/","title":"Zenodo Definition File","text":"
Open-source research data repository software, backed by the European Commission.
"},{"location":"definitions/zenodo/#zenodo-definition-file_1","title":"Zenodo definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Zenodo. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Zenodo.
Since the structure of the Zenodo JSON output file is not known internally, information on the structure is therefore necessary to carry out the correspondence.
Below is an example of a Zenodo definition file (TSV).
Example of a Zenodo JSON file generated from the definition file given as an example above.
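As a rough illustration of the target format of such a crosswalk, the sketch below builds a minimal Zenodo-style deposit metadata payload. The target field names (title, upload_type, description, creators) follow Zenodo's public deposit metadata, but the Maggot-side field names and the mapping itself are hypothetical and not the actual Maggot definition.

import json

# Hypothetical Maggot-side metadata (field names are illustrative).
maggot_metadata = {
    "fulltitle": "Example dataset title",
    "description": "Short description of the dataset.",
    "authors": [{"name": "Doe, Jane", "affiliation": "Example Institute"}],
}

# Minimal Zenodo-style deposit metadata built from it.
zenodo_payload = {
    "metadata": {
        "title": maggot_metadata["fulltitle"],
        "upload_type": "dataset",
        "description": maggot_metadata["description"],
        "creators": maggot_metadata["authors"],
    }
}

print(json.dumps(zenodo_payload, indent=2))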
Using an approach that might be called "machine-readable metadata", it is possible to push the metadata of a dataset to one of the supported data repositories via its web API, provided that you have taken care to define your metadata schema correctly so that a correspondence with the chosen data repository can be made using a mapping definition file.
The principle is illustrated by the figure above.
1 - To submit metadata to a Dataverse repository, you must first select a dataset either from the drop-down list corresponding to the datasets listed on the data storage space or a metadata file from your local disk.
2 - You then need to connect to the repository in order to retrieve the key (the API token) authorizing you to submit the dataset. This obviously assumes that you have the privileges (creation/modification rights) to do so.
3 - After choosing the repository URL, you must also specify on which dataverse collection you want to deposit the datasets. As previously, you must have write rights to this dataverse collection.
If you also want to deposit data files at the same time as the metadata, you will need:
1 - declare the files to be deposited in the resources; these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt); its mere presence, independently of its content (which may be empty), will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts on Maggot.
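For the record, creating a dataset through the Dataverse native API essentially boils down to a single authenticated POST, which is what happens behind the scenes when you click Publish. The sketch below assumes you already hold the Dataverse-formatted JSON exported by Maggot; the server URL, collection alias, API token and file name are placeholders.

import json
import requests

SERVER_URL = "https://demo.dataverse.org"   # placeholder repository URL
COLLECTION = "mycollection"                 # placeholder dataverse collection alias
API_TOKEN = "xxxx-xxxx"                     # placeholder API token

# Dataset JSON previously exported from Maggot in Dataverse format (placeholder path).
with open("dataset_dataverse.json", encoding="utf-8") as f:
    dataset_json = json.load(f)

# Dataverse native API: create a dataset inside a collection.
r = requests.post(
    f"{SERVER_URL}/api/dataverses/{COLLECTION}/datasets",
    headers={"X-Dataverse-key": API_TOKEN},
    json=dataset_json,
    timeout=60,
)
r.raise_for_status()
print(r.json())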
1 - To submit metadata to a Zenodo repository, you must first select a dataset either from the drop-down list corresponding to the datasets listed on the data storage space or a metadata file from your local disk.
2 - Unless you have previously saved your API token, you must create a new one and copy and paste it before validating it. Before validating, you must check the deposit:access and deposit:write boxes in order to obtain creation and modification rights with this token.
3 - After choosing the repository URL, you can optionally choose a community to which the dataset will be linked. By default, you can leave this field empty.
"},{"location":"publish/zenodo/#deposit-data-files","title":"Deposit data files","text":"
If you also want to deposit data files at the same time as the metadata, you will need to (see figure below):
1 - declare the files to be deposited in the resources (1); these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt) (2); its mere presence, independently of its content, will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts on Maggot.
Then, all you have to do is click on 'Publish' to \"push\" the metadata and data to the repository (3).
After submission and if everything went well, a link to the deposit will be given to you (4).
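To make steps (1) to (4) more tangible, here is a hedged sketch of the underlying Zenodo REST calls (create a deposition, attach a file, publish) that Maggot performs for you through its interface. The token and file names are placeholders, the snippet assumes the requests package is installed, and in practice the deposit metadata would be set before publishing.

import requests

ZENODO_URL = "https://zenodo.org"   # or another Zenodo-based instance
TOKEN = "xxxx-xxxx"                 # placeholder API token with deposit scopes
params = {"access_token": TOKEN}

# 1. Create an empty deposition.
r = requests.post(f"{ZENODO_URL}/api/deposit/depositions", params=params, json={}, timeout=60)
r.raise_for_status()
deposition = r.json()

# 2. Upload a data file to the deposition's bucket (placeholder file name).
with open("mydata.csv", "rb") as fp:
    requests.put(f"{deposition['links']['bucket']}/mydata.csv",
                 data=fp, params=params, timeout=600).raise_for_status()

# 3. Publish the deposition (metadata would normally be set beforehand via PUT).
requests.post(f"{ZENODO_URL}/api/deposit/depositions/{deposition['id']}/actions/publish",
              params=params, timeout=60).raise_for_status()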
"},{"location":"tutorial/","title":"Quick tutorial","text":""},{"location":"tutorial/#quick-tutorial_1","title":"Quick tutorial","text":"
This is a quick tutorial on how to use the Maggot tool in practice; it is therefore aimed primarily at the end user.
See a short Presentation and Poster if you want to have a more general overview of the tool.
"},{"location":"tutorial/#overview","title":"Overview","text":"The Maggot tool is made up of several modules, all accessible from the main page by clicking on the corresponding part of the image as shown in the figure below:
Configuration
This module mainly concerns the data manager and makes it possible to construct all the terminology definition files, i.e. the metadata and sources of associated vocabularies. See Definition files then Configuration.
Private Access
This module allows the data producer to temporarily protect access to metadata for the time necessary before sharing it within their collective. See Private access key management.
Dictionaries
This module allows the data producer to view the content of all dictionaries. It also allows the data steward to edit their content. See Dictionaries for technical details only.
Metadata Entry
This is the main module allowing the data producer to enter the metadata relating to a dataset. See the corresponding tutorial for Metadata Entry.
Search datasets
This module allows users to search for datasets based on the associated metadata, to see all the metadata and possibly to have access to the data itself. This obviously assumes that the metadata files have been deposited in the correct directory in the storage space dedicated to data management within your collective. See Infrastructure.
File Browser
This module gives users access to a file browser, provided that the data manager has installed it. See File Browser.
Publication
This module allows either the data producer or the data steward to publish the metadata, possibly with the corresponding data, to a suitable data repository. See Publication.
"},{"location":"tutorial/describe/","title":"Quick tutorial","text":""},{"location":"tutorial/describe/#metadata-entry","title":"Metadata Entry","text":"The figures are given here for illustration purposes but certain elements may be different for you given that this will depend on the configuration on your instance, in particular the choice of metadata, and the associated vocabulary sources.
Indeed, the choice of vocabulary sources (ontologies, thesauri, dictionaries) as well as the choice of metadata fields to enter must in principle have been the subject of discussion between data producers and data manager during the implementation of the Maggot tool in order to find the best compromise between the choice of sources and all the scientific fields targeted (see Definition files). However a later addition is always possible.
"},{"location":"tutorial/describe/#overview","title":"Overview","text":"When you enter the metadata entry module you should see a page that looks like the figure below:
All the fields (metadata) to be filled in are distributed between several tabs, also called sections. Each section tries to group together a set of fields relating to the same topic.
You can reload a previously created metadata file. All form fields will then be initialized with the value(s) defined in the metadata file.
You must at least complete the mandatory fields marked with a red star.
It is possible to obtain help for each field to be completed. A mini-icon with a question mark is placed after each field label. By clicking on this icon, a web page opens with the focus on the definition of the corresponding field. This help should provide you with at least a definition of a field and, if necessary, instructions on how to fill it in. It should be noted that the quality of the documentation depends on each instance and its configuration.
Once the form has been completed, even partially (at least the fields which are mandatory and marked with a red star), you can export your metadata in the form of a file. See Metadata File.
Dictionary-based metadata (e.g. people's names) can easily be entered by autocomplete in the 'Search value' box provided the name appears in the corresponding dictionary.
However, if the name does not yet appear in the dictionary, simply enter the full name (first name & last name) in the main box, making sure to separate each name with a comma and then a space as shown in the figure below.
Then you can request to add the additional person name(s) to the dictionary later as described below:
From the home page, select \"Dictionaries\". As username, just put \"maggot\" (this might be different within your instance).
Then after choosing the \"people\" dictionary, you can download the entire dictionary in a TSV file (Tab-Separated Values) ready to be edited with your favorite spreadsheet.
Add all the desired people's names with their institution, and possibly their ORCID and their email address. Please note that emails are required for authors and contacts.
You will then just have to send it to the data manager so that they can add the new names to the online dictionary.
Please proceed in the same way for all dictionaries (people, funders, producer, vocabulary).
"},{"location":"tutorial/describe/#controlled-vocabulary","title":"Controlled Vocabulary","text":"Depending on the configuration of your instance, it is very likely that certain fields (eg. keywords) are connected to a controlled vocabulary source (e.g. ontology, thesaurus). Vocabulary based on ontologies, thesauri or even dictionaries can easily be entered by autocomplete in the \"search for a value\" box provided that the term exists in the corresponding vocabulary source.
If a term cannot be found by autocomplete, you can enter the term directly in the main box, making sure to separate each term with a comma and a space as shown in the figure below.
The data steward will later try to link it to a vocabulary source that may be suitable for the domain in question. Furthermore, even if the choice of vocabulary sources was made before the tool was put into service, a later addition is always possible. You should make the request to your data manager.
"},{"location":"tutorial/describe/#resources","title":"Resources","text":"Because data is often scattered across various platforms, databases, and file formats, this making it challenging to locate and access. This is called data fragmentation. So the Maggot tool allows you to specify resources, i.e. data in the broader sense, whether external or internal, allowing to centralize all links towards data.
Four fields must be filled in :
Resource Type : Choose the type of the resource in the droplist.
Media Type : Choose a media type if applicable by autocomplete.
Description : Provide a concise and accurate description of the resource. Must not exceed 30 characters.
Location : Preferably indicate a URL to an external resource accessible to all. But it can also be a password-protected resource (e.g. a disk space on the cloud). It can also be text clearly indicating where the resource is located (internal disk space). Finally, it can be the name of a file deposited on the same disk space as the metadata file, in order to be able to push it to the data repository at the same time as the metadata (see Publication).
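As a quick sanity check of these constraints, a minimal sketch is given below. The dictionary keys are illustrative, not Maggot's actual JSON keys; only the rules themselves (a resource type, an optional media type, a description of at most 30 characters and a location) come from the text above.

# Illustrative resource entry; keys are hypothetical.
resource = {
    "type": "Dataset",
    "media_type": "text/csv",
    "description": "Raw phenotyping data 2023",
    "location": "https://example.org/data/raw_2023.csv",
}

def check_resource(res):
    errors = []
    if not res.get("type"):
        errors.append("Resource Type is required")
    if len(res.get("description", "")) > 30:
        errors.append("Description must not exceed 30 characters")
    if not res.get("location"):
        errors.append("Location is required")
    return errors

print(check_resource(resource) or "resource looks valid")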
Once the form has been completed, even partially (at least the fields which are mandatory and marked with a red star), you can export your metadata in the form of a file. The file is in JSON format and must have the prefix 'META_'.
By clicking on the \"Generate the metadata file\" button, you can save it on your disk space.
Furthermore, if email sending has been configured (see settings), you have the possibility of sending the metadata file to the data managers for safekeeping, and possibly also so that they can deposit it on the data storage space if specific rights are required.
In case you want to save the metadata file on your disk space, you have two ways to use this file:
1. The first use is the recommended one because it allows metadata management within your collective. You drop the metadata file directly under the data directory corresponding to the metadata. Indeed, when installing the tool, a storage space dedicated to the tool had to be provided for this purpose. See infrastructure. Once deposited, you just have to wait around 30 minutes at most so that the tool has had time to scan the root of the data directories looking for new files in order to update the database. After this period, the description of your dataset will be visible from the interface, and search criteria can be selected in order to narrow the search.
You will then have the possibility to publish the metadata later with possibly the corresponding data in a data repository such as Dataverse or Zenodo.
2. The second use is only to deposit the metadata into a data repository. Whether with Dataverse or Zenodo, you have the possibility of publishing metadata directly in one or the other of these repositories without using the storage space.
Please note that you cannot also deposit the data files in this way. You will have to do this manually for each of them directly online in the repository.
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\\\s\\\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":"An ecosystem for sharing metadata
"},{"location":"#foster-good-data-management-with-data-sharing-in-mind","title":"Foster good data management, with data sharing in mind","text":"
Sharing descriptive Metadata is the first essential step towards Open Scientific Data. With this in mind, Maggot was specifically designed to annotate datasets by creating a metadata file to attach to the storage space. Indeed, it allows users to easily add descriptive metadata to datasets produced within a collective of people (research unit, platform, multi-partner project, etc.). This approach fits perfectly into a data management plan as it addresses the issues of data organization and documentation, data storage and frictionless metadata sharing within this same collective and beyond.
"},{"location":"#main-features-of-maggot","title":"Main features of Maggot","text":"The main functionalities of Maggot were established according to a well-defined need (See Background).
See a short Presentation and Poster for a quick overview.
"},{"location":"#overview-of-the-different-stages-of-metadata-management","title":"Overview of the different stages of metadata management","text":"Note: The step numbers indicated in the figure correspond to the different points developed below
1 - First you must define all the metadata that will be used to describe your datasets. All metadata can be defined using a single file (in TSV format, therefore using a spreadsheet). This is a unavoidable step because both input and search interfaces are completely generated from these definition files, defining in this way each of the fields along with their input type and also the associated Controlled Vocabulary (ontology, thesaurus, dictionary, list of fixed terms). The metadata proposed by default was mainly established according to the DDI (Data Documentation Initiative) metadata schema. This schema also largely corresponds to that adopted by the Dataverse software. See the Terminology Definition section.
2 - Entering metadata will be greatly facilitated by the use of dictionaries. The dictionaries offered by default are: people, funders, data producers, as well as a vocabulary dictionary allowing you to mix ontologies and thesauri from several sources. Each of these dictionaries allows users, by entering a name by autocompletion, to associate information which will then be added when exporting the metadata either to a remote repository, or for harvesting the metadata. Thus this information, once entered into a dictionary, will not need to be re-entered again.
3 - The web interface for entering metadata is entirely built on the basis of definition files. The metadata are distributed according to the different sections chosen, each constituting a tab (see screenshot). Mandatory fields are marked with a red star and must be documented in order to be able to generate the metadata file. The entry of metadata governed by a controlled vocabulary is done by autocompletion from term lists (dictionary, thesaurus or ontology). We can also define external resources (URL links) relating to documents, publications or other related data. Maggot thus becomes a hub for your datasets connecting different resources, local and external. Once the mandatory fields (at least) and other recommended fields (at best) have been entered, the metadata file can be generated in JSON format.
4 - The file generated in JSON format must be placed in the storage space reserved for this purpose. The role played by this metadata file can be seen as a README file adapted for machines, but also readable by humans. With an internal structure, it offers coherence and consistency of information that a simple README file with a completely free and therefore unstructured text format does not allow. Furthermore, the central idea is to use the storage space as a local data repository, so that the metadata should go to the data and not the other way around.
5 - A search of the datasets can thus be carried out on the basis of the metadata. Indeed, all the JSON metadata files are scanned and parsed according to a fixed time interval (30 min) then loaded into a database. This allows you to perform searches based on predefined metadata. The search form, in a compact shape, is almost the same as the entry form (see a screenshot). Depending on the search criteria, a list of data sets is provided, with for each of them a link pointing to the detailed sheet.
6 - The detailed metadata sheet provides all the metadata divided by section. Unfilled metadata does not appear by default. When a URL can be associated with information (ORCID, Ontology, web site, etc.), you can click on it to go to the corresponding link. Likewise, it is possible to follow the associated link on each of the resources. From this sheet, you can also export the metadata according to different schemata (Dataverse, Zenodo, JSON-LD). See screenshot 1 & screenshot 2.
7 - Finally, once you have decided to publish your metadata with your data, you can choose the repository that suits you (currently repositories based on Dataverse and Zenodo are supported).
"},{"location":"#additional-key-points","title":"Additional key points","text":"Being able to generate descriptive metadata from the start of a project or study without waiting for all the data to be acquired or processed, nor for the moment when one wish to publish data, thus respecting the research data lifecycle as best as possible. Read more.
The implementation of the tool requires involving all data stakeholders upstream (definition of the metadata schema, vocabularies, targeted data repositories, etc.); everyone has their role: data manager/data steward on one side but also scientists and data producers on the other. Read more.
A progressive rise towards an increasingly controlled and standardized vocabulary is not only possible but even encouraged. First we can start with a simple vocabulary dictionary used locally and grouping together domain vocabularies. Then we can consider the creation of a thesaurus with or without mapping to ontologies. The promotion of ontologies must also be done gradually by selecting those which are truly relevant for the collective. A tool like Maggot makes it easy to implement them (See Vocabulary). Read more.
Concerning the second idea: Given the diversity of the fields, the approach chosen is to be both the most flexible and the most pragmatic possible by allowing users to choose their own vocabulary (controlled or not) corresponding to the reality of their field and their activities. However, a good approach is as much as possible to use only controlled vocabulary, that is to say relevant and sufficient vocabulary used as a reference in the field concerned to allow users to describe a project and its context without having to add additional terms. To this end, the tool must allow users a progressive approach towards the adoption of standardized controlled vocabularies (thesauri or even ontologies).
With the approach proposed by Maggot, initially there is no question of opening the data, but of managing metadata associated with the data on a storage space with a precise perimeter represented by the collective (unit, team, project, platform, ...). The main characteristic of the tool is, above all, to "capture" the metadata as easily as possible according to a well-chosen metadata schema. However, the opening of data via their metadata must be a clearly stated objective within the framework of projects financed by public institutions (e.g. Europe). Therefore, if you have taken care to correctly define your metadata schema so that it is possible to make a metadata crosswalk (using a mapping file) with a data repository recognized by the international community, then you can easily "push" your metadata with the data without having to re-enter anything.
Daniel Jacob, Francois Ehrenmann, Romain David, Joseph Tran, Cathleen Mirande-Ney, Philippe Chaumeil (2024) Maggot: An ecosystem for sharing metadata within the web of FAIR Data, BioRxiv, https://doi.org/10.1101/2024.05.24.595703
"},{"location":"about/#contacts","title":"Contacts","text":"Daniel Jacob (INRAE UMR BFP) | CATI PROSODIe
Fran\u00e7ois Ehrenmann (INRAE UMR BioGECO) | CATI GEDEOP
Philippe Chaumeil (INRAE UMR BioGECO)
Edouard Guitton (INRAE Dept. SA, Emerg'IN)
St\u00e9phane Bernillon (INRAE UR MycSA)
Joseph TRAN (INRAE UMR EGFV) | CATI BARIC
"},{"location":"bloxberg/","title":"Bloxberg Blockchain","text":""},{"location":"bloxberg/#experimental-certification-of-metadata-file-on-the-bloxberg-blockchain","title":"EXPERIMENTAL - Certification of metadata file on the bloxberg blockchain","text":""},{"location":"bloxberg/#motivation","title":"Motivation","text":"
To guarantee the authenticity and integrity of a metadata file by recording it permanently and immutably on the bloxberg blockchain.
Indeed, blockchain is a technology that makes it possible to keep track of a set of transactions (writings in the chain) in a decentralized, secure and transparent manner. A blockchain can therefore be compared to a large (public or private) unfalsifiable register. Blockchain is used today in many fields because it provides solutions to many problems. For example, in the field of Higher Education and Research, registering dataset metadata in the blockchain makes it possible to certify, in an inalienable, irrefutable and completely transparent manner, the ownership and authenticity of the data as well as, for example, the license of use and the date of production of the data. Research stakeholders are then more open to the dissemination of their data (files, results, protocols, publications, etc.) since they know that, in particular, the ownership, content and conditions of use of the data cannot be altered.
The Maggot tool could thus serve as a gateway to certify your data with the associated metadata. The complete process is schematized in the following figure:
"},{"location":"bloxberg/#about-bloxberg","title":"About bloxberg","text":"
bloxberg is the most important blockchain project in science. It was founded in 2019 by MPDL, which was looking for a way to store research results and make them available to other researchers. In this sense, bloxberg is a decentralized register in which results can be stored in a tamper-proof way with a time stamp and an identifier.
bloxberg is based on the Ethereum Blockchain. However, it makes use of a different consensus mechanism: instead of \u201cProof of Stake\u201d used by Ethereum since 2022, bloxberg validates blocks through \u201cProof of Authority\u201d. Each node is operated by one member. All members of the association are research institutions and are known in the network. Currently, bloxberg has 49 nodes. It is an international project with participating institutions from all over the world.
"},{"location":"bloxberg/#how-to-process","title":"How to process ?","text":"You will need a Ethereum address and an API key (must be requested via bloxberg-services (at) mpdl.mpg.de). See an example of pushing a metadata file to the bloxberg blockchain using Maggot.
"},{"location":"bloxberg/#useful-links","title":"Useful links","text":"A single file (web/conf/config_terms.txt) contains all the terminology. The input and search interfaces are completely generated from this definition file, thus defining each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list according to a list of fixed terms). This is why a configuration and conversion step into JSON format is essential in order to be able to configure all the other modules (example: creation of the MongoDB database schema when starting the application before filling it).
This function is used to generate the terminology definition file in JSON format (config_terms.json) and the corresponding JSON-Schema file (maggot-schema.json) from a tabulated file (1). You can either create a terminology definition file in TSV format from scratch (see below to have more details), or extract the file from the current configuration (see JSON to TSV).
Once the terminology definition file has been obtained (2), you can load it and press 'Submit'.
Three files are generated (3 & 5):
This function generates the markdown documentation file (doc.md) from the template file (config_doc.txt) which is itself generated from the metadata definition file (config_terms.txt, cf TSV to JSON).
Once the template file for the documentation (config_doc.txt) has been edited and documented (6) (see below for more details), you can load it and press the Submit button.
The documentation file in markdown format (doc.md) is thus generated (7) and must be placed in the web/docs directory (8). Users will have access to this documentation file via the web interface, in the documentation section, heading \"Metadata\".
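The TSV-to-JSON conversion described above is conceptually simple. As a rough, generic illustration (this is not the actual Maggot converter, the real config_terms.txt has its own 9 columns, and the file names below are placeholders):

import csv
import json

# Generic sketch: turn a tab-separated definition file into a JSON array of rows.
def tsv_to_json(tsv_path, json_path):
    with open(tsv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    with open(json_path, "w", encoding="utf-8") as out:
        json.dump(rows, out, indent=2, ensure_ascii=False)

# Example (placeholder file names):
# tsv_to_json("config_terms.tsv", "config_terms.json")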
sh ./run passwd <user>\n
As for any dictionary, there must be 3 files (see below). Please note that the names of these files must always contain the name of the dictionary, i.e. the same name as the directory.
The format of the file containing the dictionary data (people.txt) is defined by another file (people_format.txt).
var people = [];\n// Each item in the 'people' list consists of the first two columns (0,1) separated by a space\nget_dictionary_values('people', merge=[0,' ',1])
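For illustration, a rough Python equivalent of what the JavaScript helper above does is sketched below, assuming a tab-separated people.txt whose first two columns hold the name parts. The separator and column layout of your own dictionary may differ, since they are declared in people_format.txt.

import csv

def load_people(path="people.txt"):
    # Merge the first two columns with a space, mirroring merge=[0,' ',1] above.
    with open(path, newline="", encoding="utf-8") as f:
        return [f"{row[0]} {row[1]}" for row in csv.reader(f, delimiter="\t") if len(row) >= 2]

def suggest(prefix, people):
    # Simple case-insensitive autocompletion over the merged names.
    prefix = prefix.lower()
    return [name for name in people if name.lower().startswith(prefix)]

# Example (assumes people.txt is present in the current directory):
# print(suggest("Jan", load_people()))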
Below, an example is given when modifying a record. When you click on the Institute field, which is connected to the ROR web API, a drop-down list of the research organizations that may match in the registry appears, if there are any.
Note: It is possible to edit dictionaries, by adding an entry for example, and at the same time be able to immediately find this new entry in the metadata entry in the Maggot tool. Indeed each dictionary is reloaded into memory as soon as the corresponding input box is clicked. See an illustration.
Funders : The dictionary of the funders allows you to define the funding agency, project ID and its corresponding URL.
Producers : The dictionary of the data producers allows you to define their Institute and project ID and their corresponding URL. Optionally, you can add the URL of the logo.
Vocabulary : Use this dictionary for mixing thesauri and ontologies in order to better target the entire controlled vocabulary of its field of application. Only the vocabulary is mandatory, the URL linked to an ontology or a thesaurus is optional. See Vocabulary section to learn the extent of the possibilities concerning vocabulary in Maggot.
The necessary Infrastructure involves 1) a machine running a Linux OS and 2) a dedicated storage space.
1 - The machine will most often be a virtual machine, because it is simpler to deploy, either locally (with VM providers such as VirtualBox, VMware Workstation or MS Hyper-V) or remotely (e.g. VMware ESXi, OpenStack: example of deployment). Moreover, the OS of your machine must allow the deployment of Docker containers. See "What is Docker" for more details. The minimum characteristics of the VM are: 2 CPUs, 2 GB RAM, 8 GB of disk.
2 - The dedicated storage space could be either in the local space of the VM, or in a remote place on the network.
Requirements: The installation must be carried out on a (virtual) machine with a recent Linux OS that supports Docker (see Infrastructure).
"},{"location":"installation/#retrieving-the-code","title":"Retrieving the code","text":"Go to the destination directory of your choice then clone the repository and cd
to your clone path:
git clone https://github.com/inrae/pgd-mmdt.git pgd-mmdt\ncd pgd-mmdt\n
"},{"location":"installation/#installation-of-docker-containers","title":"Installation of Docker containers","text":"MAGGOT uses 3 Docker images for 3 distinct services:
See Configuration settings
Warning: You must take care to use the same MongoDB settings in all the above configuration files. It is best not to change anything. It would have been preferable to have a single configuration file, but this has not yet been done given the different languages involved (bash, JavaScript, Python, PHP). To be done!
Note : If you want to run multiple instances, you will need to change in the run file, i) the container names, ii) the data path, iii) the MongoDB volume name and iv) the MongoDB port
The following two JSON files are defined by default but can be easily configured from the web interface. See the Terminology Configuration section.
The run shell script allows you to perform multiple actions by specifying an option:
cd pgd-mmdt\nsh ./run <option>\n
Options:
You must first build the 3 Docker container images, if this has not already been done, by:
sh ./run build\n
The application can then be started sequentially:
sh ./run start\n
sh ./run initdb\n
sh ./run scan\n
You can also launch these 3 steps with a single command:
sh ./run fullstart\n
Once the application is started, you can check whether the containers are running using the following command:
docker ps -a\n
which should produce a result similar to the following:
\n CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES\n 5914504f456d pgd-mmdt-web \"docker-php-entrypoi.\" 12 seconds ago Up 10 seconds 0.0.0.0:8087->80/tcp, :::8087->80/tcp mmdt-web\n 226b13ed9467 pgd-mmdt-scan \"cron -f\" 12 seconds ago Up 11 seconds mmdt-scan\n 81fecbb56d23 pgd-mmdt-db \"docker-entrypoint.s.\" 13 seconds ago Up 12 seconds 27017/tcp mmdt-db\n
On the first line, the one which corresponds to the web interface, we see that port 80 of the container is mapped to port 8087 of the VM. Let's say that the IP address of your VM is 192.168.56.2; then in your browser you will need to use the URL http://192.168.56.2:8087/. You can of course change the port number in the 'run' file.
It may be preferable to use a lightweight HTTP server like nginx, so that the Maggot URL will be http://192.168.56.2/maggot/. Below is an example configuration:
## /etc/nginx/nginx.conf\nhttp {\n\n...\n upstream maggot { server 127.0.0.1:8087; }\n...\n\n}\n\n## /etc/nginx/conf.d/my-site.conf\n\nserver {\nlisten 80 default;\nserver_name $host;\n\n...\n\n location /maggot/ {\nproxy_set_header Host $host;\nproxy_set_header X-App-Name 'maggot';\nproxy_set_header X-Real-Ip $remote_addr;\nproxy_set_header X-Forwarded-Host $host;\nproxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\nproxy_pass http://maggot/;\n}\n\n...\n\n}\n
sh ./run stop\n
When updating the application, it is imperative to preserve a whole set of configuration files as well as the content of certain directories (dictionaries, javascripts dedicated to vocabularies, etc.). An update script is available (./etc/update-maggot.sh) preferably placed under '/usr/local/bin'. To preserve your configuration, it is recommended to create local configuration files.
A first file 'local.conf' will contain all the parameters to be preserved, initially contained in the 'run' file. A small example could be as follow :
#!/bin/bash\n\n# Local HTTP Port for web application\nWEB_PORT=8088\n\n# Path to the data\nDATADIR=/media/Workdir/Share/DATA/\n
A second file './web/inc/config/local.inc' will contain all the parameters to be preserved, initially contained in the './web/inc/config/config.inc' file. A small example could be as follow :
<?php\n\n# Main title\n$TITLE ='Metadata management - My Labs';\n$MAINTITLE =$TITLE;\n\n# File Browser\n$FILEBROWSER=1;\n$URL_FILEBROWSER='/fb/';\n\n# Enable some functionalities\n$export_oai = 1;\n\n?>\n
Note: See how to proceed with the configuration steps.
"},{"location":"installation/#file-browser","title":"File Browser","text":"You can provide access to your data via a file browser. This application must be installed separately but can be connected to Maggot by specifying the corresponding URL in the configuration file. Users and their rights are managed in the filebrowser application. Likewise, we can also create links to the data without a password. These links can be usefully specified as external resources in the metadata managed by Maggot.
See how to install it on GitHub.
"},{"location":"private-access/","title":"Private access","text":""},{"location":"private-access/#private-access-key-management","title":"Private access key management","text":""},{"location":"private-access/#motivation","title":"Motivation","text":"Although the Maggot tool is designed to foster the sharing of metadata within a collective, it may be necessary to temporarily privatize access to the metadata of an ongoing project with confidentiality constraints. So even within our own collective, access to metadata must be restricted to authorized users only.
"},{"location":"private-access/#implementation","title":"Implementation","text":"The choice of not wanting to manage users in the Maggot tool was made in order to make the metadata completely open by default within a collective. Furthermore, access rights to the storage space are managed independently of the Maggot tool by the administrator of this space. It is therefore through the storage space that we must give or not access to the metadata via the web interface.
The chosen mechanism for privatizing access is described below. It has the dual advantage of being simple to implement and simple to use.
First, we have to generate a file containing the encrypted key for private access. This file must be generated from the web interface and then downloaded as shown in the figure below. Then this file must be manually deposited in the data directory corresponding to the dataset whose access we wish to privatize. The presence of this file within a directory is enough to block access to the metadata and data by default. It should be noted that we can put this same file containing the encrypted private key in several data directories (for example within the same project). The deposit must be done by hand because the Maggot tool must only have access to the storage space in read mode. This also guarantees that the user has write rights to this space, without having to manage user accounts on the Maggot side.
By default, \u2018untwist1\u2019 metadata are not accessible to anyone
When we want to have access to the metadata of this dataset, we simply have to enter the private key in the current session. This will have the effect of unlocking access to the metadata via the web interface only in the current session of our web browser. This means that we will have to enter the private key for each session (by default, a session lasts a maximum of 1 hour).
Now the \u2018untwist1\u2019 metadata are accessible only to us
When we want to give access to the metadata to the entire collective, we simply need to delete the private access file (named by default 'META_auth.txt') from the concerned data directory.
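Purely as an illustration of the general idea (this is not Maggot's actual code, and the hashing scheme shown is an assumption), the check performed when a private key is entered could look like the following sketch.

import hashlib

AUTH_FILE = "META_auth.txt"  # name taken from the default private_auth_file setting

def store_encrypted_key(private_key, path=AUTH_FILE):
    # Assumption: the file holds a one-way hash of the private key, never the key itself.
    with open(path, "w", encoding="utf-8") as f:
        f.write(hashlib.sha256(private_key.encode()).hexdigest())

def key_matches(entered_key, path=AUTH_FILE):
    with open(path, encoding="utf-8") as f:
        stored = f.read().strip()
    return hashlib.sha256(entered_key.encode()).hexdigest() == stored

# store_encrypted_key("my-project-secret")
# print(key_matches("my-project-secret"))  # True -> unlock metadata for this session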
Here is the list of all files that may be subject to adjustment of certain parameters according to the needs of the instance site.
"},{"location":"settings/#dockerscanpartscriptsconfigpy","title":"dockerscanpart/scripts/config.py","text":"This file defines the connection parameters to the Mongo database. Knowing that this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
- dbserver: Name of the MongoDB server (default: mmdt-db)
- database: Name of the MongoDB database (default: pgd-db)
- dbport: Port of the MongoDB server (default: 27017)
- username: Username of the Mongo database pgd-db with Read/Write access (default: userw-pgd)
- password: Password corresponding to the username of the Mongo DB pgd-db (default: wwwww)
"},{"location":"settings/#incconfigmongodbinc","title":"inc/config/mongodb.inc","text":"This file defines the connection parameters to the Mongo database. Knowing that this database is only accessible internally, in principle they do not need to be changed.
Note: These settings must be the same as defined in dockerdbpart/initialisation/setupdb-js.template
- docker_mode: Indicates whether the installation involves using docker containers. In this case, the Mongo DB IP address will be different from 127.0.0.1. (default: 1)
- uritarget: The Mongo DB IP address (default: mmdt-db if docker_mode=1, 127.0.0.1 if docker_mode=0)
- database: Name of the MongoDB database (default: pgd-db)
- collection: Name of the MongoDB collection (default: metadata)
- port: Port of the MongoDB server (default: 27017)
- username: Username of the Mongo database pgd-db with Read access only (default: userr-pgd)
- password: Password corresponding to the username of the Mongo DB pgd-db (default: rrrrr)
"},{"location":"settings/#incconfigconfiginc","title":"inc/config/config.inc","text":"This file defines parameters related to i) the web interface, ii) the functionalities allowed for users. Only the parameters that could be useful to be changed for the needs of an instance are described here.
- EXTERN: Indicates if the use of the tool is only for external use, i.e. without using a storage space. (default: 0)
- PRIVATE_ACCESS: Gives the possibility of managing private access to metadata (default: 0)
- ZOOMWP: Zoom level of the web interface. By reducing the size slightly, you get a better layout. (default: 90%)
- RESMEDIA: Gives the possibility of putting a MIME type on each resource in the metadata. (default: 1)
- TITLE: Title to display in the main banner (default: Metadata management)
- FILEBROWSER: Indicates whether the file browser is used. This assumes it is installed. (default: 0)
- URL_FILEBROWSER: File browser URL. It can be absolute or relative. (default: /fb/)
- APPNAME: Name given in the URL to access the web interface. (default: maggot)
- dataverse_urls: Array of Dataverse repository URLs where you can upload metadata and data (default: -)
- zenodo_urls: Array of Zenodo repository URLs where you can upload metadata and data (default: -)
- SERVER_URL: Default Dataverse repository URL (default: https://entrepot.recherche.data.gouv.fr)
- ZENODO_SERVER_URL: Default Zenodo repository URL (default: https://zenodo.org)
- export_dataverse: Indicates whether the Dataverse feature is enabled (default: 1)
- export_zenodo: Indicates whether the Zenodo feature is enabled (default: 1)
- export_jsonld: Indicates whether the JSON-LD feature is enabled (default: 1)
- export_oai: Indicates whether the OAI-PMH feature is enabled (default: 0)
- export_bloxberg: Indicates whether the Bloxberg Blockchain feature is enabled (Experimental) (default: 0)
- cvdir: Relative path of the Controlled Vocabulary Lists (cvlist) (default: cvlist/)
- maggot_fulltitle: Maggot name of the field corresponding to the title in Dataverse/Zenodo (default: fulltitle)
- auth_senddata_file: Name of the file that must be present in the data directory to authorize the transfer of the data files (default: META_datafile_ok.txt)
- private_auth_file: Name of the private access file (default: META_auth.txt)
- sendMail: Configuration of messaging for sending metadata to data managers (see below) (default: NULL)
The messaging configuration is done using the following array in the inc/config/config.inc file (or more judiciously in inc/config/local.inc in order to be preserved during an update). To understand how it works, see Send Emails using PHPMailer.
$sendMail['smtpHost'] = 'smtp.example.org'; // Set the SMTP server to send through\n$sendMail['smtpSecure'] = 'tls'; // Enable TLS encryption\n$sendMail['smtpPort'] = 587; // Set the TCP port to connect to\n$sendMail['CheckEmail'] = 'maggot@exemple.org'; // Email address authorized to send emails\n$sendMail['CheckPass'] = 'password'; // The corresponding password\n$sendMail['CheckName'] = 'Maggot'; // Alias name\n$sendMail['UserEmail'] = 'admin@exemple.org'; // Email of data managers, separated by a comma\n
"},{"location":"settings/#run","title":"run","text":"This file contains the essential parameters to be set before any use.
- WEB_PORT: Local HTTP Port for web application (default: 8087)
- DATADIR: Path to the data (default: /opt/data/)
- DB_IMAGE: Docker image name of the MongoDB (default: pgd-mmdt-db)
- SCAN_IMAGE: Docker image name of the Scan process (default: pgd-mmdt-scan)
- WEB_IMAGE: Docker image name of the Web interface (default: pgd-mmdt-web)
- DB_CONTAINER: Docker container name of the MongoDB (default: mmdt-db)
- SCAN_CONTAINER: Docker container name of the Scan process (default: mmdt-scan)
- WEB_CONTAINER: Docker container name of the Web interface (default: mmdt-web)
- MONGO_VOL: Volume name for MongoDB (default: mmdt-mongodb)
- MONGO_PORT: HTTP Port for MongoDB (default: 27017)
- USER: Admin user in the htpasswd file (default: admin)
"},{"location":"chats/chat1/","title":"chatGPT","text":""},{"location":"chats/chat1/#difference-between-data-manager-and-data-steward","title":"Difference Between Data Manager and Data Steward","text":""},{"location":"chats/chat1/#data-manager","title":"Data Manager","text":"Responsibilities:
Data Operations and Management:
Data Quality and Integrity:
Data Policies and Procedures:
Technical Oversight:
Performance Monitoring:
Skills:
Responsibilities:
Data Governance:
Data Quality Assurance:
Data Access and Security:
Liaison and Coordination:
Training and Support:
Skills:
Focus:
Scope of Work:
Technical vs. Governance:
Interaction with Stakeholders:
Data Generation:
Data Quality Assurance:
Data Documentation:
Adherence to Standards:
Data Security and Privacy:
Collaboration:
Foundation of Data Assets:
Enhancing Data Quality:
Supporting Data Governance:
Facilitating Data Integration:
Enabling Timely Insights:
Data Managers:
Data Stewards:
Data Consumers:
Time Constraints:
Lack of Expertise:
Inconsistent Practices:
Standardized Templates and Guidelines:
Training and Education:
Automated Documentation Tools:
Data Steward Support:
Integration into Workflow:
Incentives and Accountability:
Feedback Mechanism:
Documentation Checkpoints:
Collaboration Platforms:
Clear Roles and Responsibilities:
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat2/","title":"chatGPT","text":""},{"location":"chats/chat2/#how-can-we-properly-respect-the-data-life-cycle-in-scientific-research-in-order-to-carry-out-good-data-management","title":"How can we properly respect the data life cycle in scientific research in order to carry out good data management?","text":"Properly respecting the data life cycle in scientific research is crucial for effective data management. The data life cycle encompasses the stages through which data passes, from its initial generation to its eventual archiving and potential reuse. Here\u2019s a comprehensive approach to managing the data life cycle in scientific research:
"},{"location":"chats/chat2/#1-planning","title":"1. Planning","text":"By adhering to these guidelines, researchers can ensure robust data management practices that enhance the reliability, reproducibility, and impact of scientific research.
"},{"location":"chats/chat2/#at-which-stages-of-the-data-life-cycle-is-it-most-crucial-to-document-data","title":"At which stages of the data life cycle is it most crucial to document data?","text":"Documentation is crucial at all stages of the data life cycle, but there are specific stages where it is particularly essential to ensure data integrity, reproducibility, and usability. Here are the stages where documentation is most critical:
"},{"location":"chats/chat2/#1-planning_1","title":"1. Planning","text":"By thoroughly documenting at these crucial stages, researchers can enhance the transparency, reproducibility, and overall quality of their data management practices.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat3/","title":"chatGPT","text":""},{"location":"chats/chat3/#what-are-the-steps-to-building-our-own-controlled-vocabulary","title":"What are the steps to building our own controlled vocabulary ?","text":"Creating a controlled vocabulary is a systematic process that involves several key steps. Here\u2019s a detailed guide to help you build your own controlled vocabulary:
"},{"location":"chats/chat3/#1-define-the-purpose-and-scope","title":"1. Define the Purpose and Scope","text":"By following these steps, you can develop a controlled vocabulary that improves information retrieval, enhances data consistency, and supports effective communication within your organization.
"},{"location":"chats/chat3/#what-are-the-differences-between-thesauri-ontologies-or-simple-vocabulary-dictionaries-in-which-cases-should-each-of-them-be-favored","title":"What are the differences between thesauri, ontologies or simple vocabulary dictionaries? In which cases should each of them be favored?","text":"Thesauri, ontologies, and simple vocabulary dictionaries are all tools for organizing and managing information, but they differ in complexity, structure, and use cases. Here\u2019s a breakdown of their differences and when to use each:
"},{"location":"chats/chat3/#thesauri","title":"Thesauri","text":"Definition: - A thesaurus is a controlled vocabulary that organizes terms into a structured hierarchy, showing relationships such as synonyms, antonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Structure: - Synonyms: Provides alternative words with similar meanings. - Hierarchical Relationships: Shows broader and narrower terms. - Associative Relationships: Connects related terms that are not hierarchically linked.
Use Cases: - Information Retrieval: Enhances search capabilities by providing alternative terms and related concepts. - Indexing and Cataloging: Standardizes terms for consistent categorization of information. - Library and Archival Systems: Organizes collections for easier access and discovery.
When to Use: - When you need to manage a large set of terms with hierarchical and associative relationships. - When you want to improve search functionality by including synonyms and related terms.
"},{"location":"chats/chat3/#ontologies","title":"Ontologies","text":"Definition: - An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. It includes not just the terms, but also the rules and constraints that govern their use.
Structure: - Concepts: Detailed definitions of domain-specific concepts. - Relationships: Various types of relationships between concepts, including hierarchical, associative, and more complex logical relationships. - Attributes: Properties and characteristics of concepts. - Rules and Constraints: Logical rules that define how concepts can interact.
Use Cases: - Semantic Web and AI: Facilitates data integration, sharing, and interoperability in web and AI applications. - Knowledge Representation: Provides a detailed and formalized structure for representing domain knowledge. - Complex Data Integration: Integrates heterogeneous data sources by providing a common framework.
When to Use: - When you need a highly detailed and formal representation of knowledge with complex relationships. - When integrating diverse data sources and ensuring semantic interoperability is crucial. - For advanced applications in artificial intelligence, natural language processing, and the Semantic Web.
"},{"location":"chats/chat3/#simple-vocabulary-dictionaries","title":"Simple Vocabulary Dictionaries","text":"Definition: - A simple vocabulary dictionary is a basic list of terms and their definitions, without structured relationships between terms.
Structure: - Terms: A list of words or phrases. - Definitions: Brief explanations or meanings of each term.
Use Cases: - Glossaries: Provides quick reference for understanding terms and definitions. - Basic Cataloging: Lists terms for simple categorization without the need for complex relationships. - Learning and Education: Helps learners understand basic concepts and terminology.
When to Use: - When you need a straightforward reference tool for terms and their definitions. - For simple documentation where relationships between terms are not necessary. - When building foundational knowledge without the need for detailed hierarchical or associative structures.
"},{"location":"chats/chat3/#summary","title":"Summary","text":"Choosing the right tool depends on your specific needs regarding the complexity of the relationships between terms, the level of detail required, and the intended application.
"},{"location":"chats/chat3/#concerning-descriptive-metadata-can-we-envisage-a-gradual-evolution-in-the-use-of-simple-dictionaries-then-moving-to-thesauri-or-even-ontologies","title":"Concerning descriptive metadata, can we envisage a gradual evolution in the use of simple dictionaries, then moving to thesauri, or even ontologies?","text":"Yes, it is both practical and beneficial to envisage a gradual evolution in the use of descriptive metadata, starting with simple dictionaries and progressing to thesauri and ontologies as the needs and complexity of your data management grow. Here\u2019s how you can approach this evolutionary path:
"},{"location":"chats/chat3/#stage-1-simple-vocabulary-dictionaries","title":"Stage 1: Simple Vocabulary Dictionaries","text":"Initial Implementation: - Scope: Begin with a simple vocabulary dictionary to establish basic terminology and definitions. - Purpose: Ensure consistency in the use of terms across your organization or system. - Structure: Create a straightforward list of terms with clear definitions.
Benefits: - Ease of Use: Simple to create and manage. - Foundation: Provides a foundational understanding of key terms. - Quick Implementation: Allows for immediate improvement in consistency and clarity of metadata.
Use Cases: - Early Project Stages: When starting a new project or system where the primary goal is to standardize terminology. - Training and Education: Helps new team members or users quickly understand the basic terms and definitions.
"},{"location":"chats/chat3/#stage-2-thesauri","title":"Stage 2: Thesauri","text":"Intermediate Implementation: - Scope: Expand your vocabulary to include hierarchical and associative relationships. - Purpose: Improve information retrieval and organization by showing broader, narrower, and related terms. - Structure: Develop a thesaurus that includes synonyms, broader terms (BT), narrower terms (NT), and related terms (RT).
Benefits: - Enhanced Search: Improves search accuracy by including synonyms and related concepts. - Better Organization: Provides a more structured and interconnected view of your terms. - Scalability: Allows for the gradual addition of new terms and relationships.
Use Cases: - Growing Projects: As your project or system grows and the need for more detailed metadata relationships emerges. - Information Retrieval Systems: When enhancing search functionalities and user experience is a priority.
"},{"location":"chats/chat3/#stage-3-ontologies","title":"Stage 3: Ontologies","text":"Advanced Implementation: - Scope: Develop a comprehensive and formal representation of your domain knowledge. - Purpose: Facilitate complex data integration, interoperability, and advanced reasoning. - Structure: Create an ontology with detailed concepts, relationships, properties, and logical rules.
Benefits: - Semantic Interoperability: Ensures data consistency and integration across different systems and platforms. - Advanced Reasoning: Supports sophisticated queries and inference capabilities. - Detailed Knowledge Representation: Provides a rich and nuanced understanding of domain concepts and their interrelationships.
Use Cases: - Complex Systems: For projects requiring detailed and formal knowledge representation, such as in AI, machine learning, and the Semantic Web. - Interoperable Data Systems: When integrating diverse data sources and ensuring seamless interoperability is essential.
"},{"location":"chats/chat3/#evolution-strategy","title":"Evolution Strategy","text":"Ensure all team members use consistent terminology.
Thesaurus:
Improve search functionality by adding synonyms and related terms.
Ontology:
By following this gradual evolution, you can systematically enhance your metadata management capabilities, ensuring your system remains adaptable and meets the growing complexity of your data and user needs.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"chats/chat4/","title":"chatGPT","text":""},{"location":"chats/chat4/#what-is-metadata-crosswalk","title":"What is metadata crosswalk?","text":"A metadata crosswalk is a structured mapping between the metadata elements of two or more different systems or standards. Metadata, in this context, refers to descriptive information about data or resources, such as titles, authors, dates, keywords, etc.
When different systems or standards are used to manage data or resources, they often have their own unique metadata schemas or formats. A metadata crosswalk helps to bridge the gap between these different schemas by establishing equivalences or relationships between the metadata elements used in each system.
For example, if you have a digital library system that uses one set of metadata standards and you want to integrate it with a repository that uses a different set of standards, you might create a metadata crosswalk to map the elements from one standard to the other, ensuring interoperability and consistency in metadata representation.
Metadata crosswalks are particularly useful in data integration, migration, and interoperability scenarios, enabling smooth communication and exchange of information between disparate systems or standards.
"},{"location":"chats/chat4/#whats-the-point","title":"What's the point ?","text":"The primary purpose of a metadata crosswalk is to facilitate interoperability and data exchange between systems or standards that use different metadata schemas. Here are some specific points highlighting its importance:
Interoperability: Different organizations or systems may use varying metadata standards or schemas to describe their data or resources. By creating a metadata crosswalk, you ensure that these systems can communicate with each other effectively, exchanging information without loss of meaning.
Data Integration: When consolidating data from multiple sources or systems, a metadata crosswalk helps ensure that the metadata associated with each dataset is properly understood and mapped to a common format. This facilitates the integration process and allows for more efficient analysis and management of the combined datasets.
Standardization: By mapping metadata elements between different standards or schemas, a crosswalk promotes standardization and consistency in how data is described and represented. This consistency is essential for ensuring accurate interpretation and retrieval of information across different systems or platforms.
Migration and Upgrades: During system migrations or upgrades, metadata formats may change. A metadata crosswalk can aid in transitioning from one metadata schema to another by providing a clear mapping between the old and new formats, thus minimizing disruptions and ensuring continuity in data management practices.
Data Sharing and Reuse: When sharing data with external parties or making it available for reuse, a metadata crosswalk helps ensure that the metadata associated with the data is understandable and interpretable by others. This promotes data sharing and reuse, as users can easily understand the content and context of the data they are accessing.
Overall, the point of a metadata crosswalk is to promote seamless communication, integration, and interoperability between different systems or standards, ultimately enhancing the usability and accessibility of data across diverse environments.
"},{"location":"chats/chat4/#can-you-suggest-some-interesting-links-on-the-subject-related-to-life-science","title":"Can you suggest some interesting links on the subject related to life science ?","text":"Certainly! Here are some interesting links related to metadata crosswalks in the context of life sciences:
NCBI Metadata Crosswalk Tool:
Metadata Standards and Tools for Life Sciences:
FAIRsharing:
NIH Common Data Elements (CDE) Repository:
These resources should provide you with a good starting point for exploring metadata standards and tools relevant to life sciences research.
Chat generated by chatGPT 3.5 - May 24, 2024
"},{"location":"definitions/","title":"Definition Files","text":""},{"location":"definitions/#metadata-definition-files","title":"Metadata definition files","text":"The Maggot tool offers great flexibility in configuration. It allows you to completely choose all the metadata you want to describe your data. You can base yourself on an existing metadata schema, invent your own schema or, more pragmatically, mix one or more schemas by introducing some metadata specific to your field of application. However, keep in mind that if you want to add descriptive metadata to your data then a certain amount of information is expected. But a completely different use of the tool is possible, it's up to you.
There are two levels of definition files, as shown in the figure below:
1 - The first level concerns the definition of the terminology (metadata), similar to a descriptive metadata plan. Strictly speaking, these files act as configuration files. They represent the heart of the application around which everything else is built. The input and search interfaces are completely generated from these definition files (especially the web/conf/config_terms.txt file), which define each of the fields, their input type (checkbox, dropbox, textbox, ...) and the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list based on a fixed list of terms). This is why a configuration step is essential before the other modules can be set up.
2 - The second level concerns the definition of mappings to differently structured metadata schemas (metadata crosswalks, i.e. specifications for mapping one metadata standard to another), used either i) for metadata export to a remote repository (e.g. Dataverse, Zenodo) or ii) for metadata harvesting (e.g. JSON-LD, OAI-PMH). Simply place the definition files in the configuration directory (web/conf) for them to be taken into account, provided you have adjusted the configuration (see Settings).
All definition files are created with a simple spreadsheet and then exported in TSV format.
The list of definition files used by Maggot is given below. All of them must be placed under the web/conf directory.
See an example online: https://pmb-bordeaux.fr/maggot/config/view and the corresponding form based on these definition files.
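To make the format concrete, here is a minimal sketch (not Maggot's actual code) of how such a TSV definition file could be read programmatically; the file path comes from this documentation, while the column names follow the Terminology example shown further below.

```python
# Minimal sketch: load a Maggot definition file (TSV) into a list of dicts.
# The path comes from this documentation; column names follow the
# Terminology example below (Field, Section, Required, Search, ...).
import csv

def load_definition(path="web/conf/config_terms.txt"):
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))

for field in load_definition():
    print(field["Field"], field["Section"], field["Type"])
```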
"},{"location":"definitions/config_terms/","title":"Terminlogy Definition","text":""},{"location":"definitions/config_terms/#example-of-a-terminlogy-definition-file","title":"Example of a Terminlogy Definition file","text":"Field Section Required Search ShortView Type features Label Predefined terms title definition Y N 1 textbox width=350px Short name fulltitle definition Y Y 2 textbox Full title subject definition Y Y checkbox open=0 Subject Agricultural Sciences,Arts and Humanities,Astronomy and Astrophysics,Business and Management,Chemistry,Computer and Information Science,Earth and Environmental Sciences,Engineering,Law,Mathematical Sciences,Medicine Health and Life Sciences,Physics,Social Sciences,Other description definition Y Y areabox rows=6,cols=30 Description of the dataset note definition N Y areabox rows=4,cols=30 Notes status status N Y 3 dropbox width=350px Status of the dataset Processed,In progress,Unprocessed access_rights status N Y 4 dropbox width=350px Access rights to data Public,Mixte,Private language status N Y checkbox open=0 Language Czech,Danish,Dutch,English,Finnish,French,German,Greek,Hungarian,Icelandic,Italian,Lithuanian,Norwegian,Romanian,Slovenian,Spanish,Swedish lifeCycleStep status N Y multiselect autocomplete=lifecycle,min=1 Life cycle step license status N Y textbox autocomplete=license,min=1 License datestart status N Y datebox width=350px Start of collection dateend status N Y datebox width=350px End of collection dmpid status N Y textbox DMP identifier contacts management Y Y multiselect autocomplete=people,min=1 Contacts authors management Y Y multiselect autocomplete=people,min=1 Authors collectors management N Y multiselect autocomplete=people,min=1 Data collectors curators management N Y multiselect autocomplete=people,min=1 Data curators members management N Y multiselect autocomplete=people,min=1 Project members leader management N Y multiselect autocomplete=people,min=1 Project leader wpleader management N Y multiselect autocomplete=people,min=1 WP leader depositor management N Y textbox Depositor producer management N Y multiselect autocomplete=producer,min=1 Producer grantNumbers management N Y multiselect autocomplete=grant,min=1 Grant Information kindOfData descriptors Y Y checkbox open=0 Kind of Data Audiovisual,Collection,Dataset,Event,Image,Interactive Resource,Model,Physical Object,Service,Software,Sound,Text,Workflow,Other keywords descriptors N Y multiselect autocomplete=bioportal,onto=EFO:JERM:EDAM:MS:NMR:NCIT:OBI:PO:PTO:AGRO:ECOCORE:IOBC:NCBITAXON Keywords topics descriptors N Y multiselect autocomplete=VOvocab Topic Classification dataOrigin descriptors N Y checkbox open=0 Data origin observational data,experimental data,survey data,analysis data,text corpus,simulation data,aggregate data,audiovisual corpus,computer code,Other experimentfactor descriptors N Y multiselect autocomplete=vocabulary,min=1 Experimental Factor measurement descriptors N Y multiselect autocomplete=vocabulary,min=1 Measurement type technology descriptors N Y multiselect autocomplete=vocabulary,min=1 Technology type publication_citation descriptors N Y areabox rows=5,cols=30 Publication - Citation publication_idtype descriptors N Y dropbox width=200px Publication - ID Type -,ark,arXiv,bibcode,doi,ean13,eissn,handle,isbn,issn,istc,lissn,lsid,pmid,purl,upc,url,urn publication_idnumber descriptors N Y textbox width=400px Publication - ID Number publication_url descriptors N Y textbox Publication - URL comment other N Y areabox rows=15, cols=30 Additional 
information"},{"location":"definitions/dataverse/","title":"Dataverse Definition File","text":"Open source research data repository software, approved by Europe.
"},{"location":"definitions/dataverse/#dataverse-definition-file_1","title":"Dataverse definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Dataverse. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Dataverse, knowing that this metadata crosswalk was made possible by choosing the right metadata schema at upstream.
The structure of the Dataverse JSON output file being known internally, a minimum of information is therefore necessary to carry out the correspondence.
The file must have 4 columns with headers defined as follows:
Below is an example of a Dataverse definition file (TSV).
Example of a Dataverse JSON file generated from the definition file given as an example above.
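As a rough illustration of what such a crosswalk does (the field pairs and the simplified Dataverse structure below are assumptions made for the example, not the content of the real definition file):

```python
# Hypothetical sketch of a Maggot -> Dataverse crosswalk. The field pairs in
# MAPPING and the simplified citation block are illustrative only; in Maggot
# the real correspondence is driven by the Dataverse definition file (TSV).
import json

MAPPING = {                 # Maggot field -> Dataverse typeName (assumed)
    "fulltitle": "title",
    "note": "notesText",
}

def crosswalk(maggot_record):
    fields = [{"typeName": target, "value": maggot_record[src]}
              for src, target in MAPPING.items() if maggot_record.get(src)]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}

record = {"fulltitle": "Soil survey 2023", "note": "Example record"}
print(json.dumps(crosswalk(record), indent=2))
```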
This definition file will allow harvesters to collect structured metadata based on a semantic schema, i.e. the fields themselves, and not just their content, can be associated with a semantic definition (an ontology, for example), which then facilitates the link between the metadata and therefore the data (JSON-LD). The chosen semantic schema is based on several metadata schemas.
The full workflow to \"climb the Linked Open Data mountain\" is summarized in the figure below:
Metadata schemas used to build the model proposed by default:
Definition of the JSON-LD context using the metadata schemas proposed by default
Because the structure of the JSON-LD output is not known internally, information about the structure must be provided to carry out the correspondence.
Example of JSON-LD definition file (partial) using the metadata schemas proposed by default (TSV)
Example of a JSON-LD file generated from the definition file given as an example above.
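For readers unfamiliar with JSON-LD, the sketch below shows the general idea of attaching a context to otherwise plain metadata; the context terms are only an example mixing Dublin Core and schema.org, not the exact default context shipped with Maggot.

```python
# Illustrative only: wrap plain metadata in a JSON-LD document by adding a
# context. The vocabulary terms below are an example (Dublin Core +
# schema.org); the actual default context is set by the JSON-LD definition file.
import json

CONTEXT = {
    "dct": "http://purl.org/dc/terms/",
    "schema": "https://schema.org/",
    "title": "dct:title",
    "description": "dct:description",
    "keywords": "schema:keywords",
}

def to_jsonld(record):
    doc = {"@context": CONTEXT, "@type": "schema:Dataset"}
    doc.update({k: v for k, v in record.items() if k in CONTEXT})
    return doc

print(json.dumps(to_jsonld({"title": "Soil survey 2023", "keywords": ["soil"]}), indent=2))
```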
As its name indicates, the mapping file is used to match a term chosen by the user during entry with a term from an ontology or a thesaurus, and thus to obtain a URL that will be used for referencing. It can be used for any metadata crosswalk requiring such a mapping (e.g. to the Dataverse, Zenodo or JSON-LD format).
The role of this definition file is illustrated in the figure above.
The file must have 5 columns with headers defined as follows:
Below is an example of a Mapping definition file (TSV).
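Conceptually, the mapping boils down to a lookup from a locally used term to a referenced URI, as in the hedged sketch below (the file name and column names are hypothetical placeholders; the real file has the 5 columns described above).

```python
# Hedged illustration of the mapping lookup: resolve a user-entered term to
# an ontology/thesaurus URI. File name and column names ("term", "uri") are
# hypothetical placeholders for the real 5-column mapping definition file.
import csv

def load_mapping(path="web/conf/mapping_example.txt"):
    with open(path, newline="", encoding="utf-8") as fh:
        return {row["term"].lower(): row["uri"]
                for row in csv.DictReader(fh, delimiter="\t")}

mapping = load_mapping()
print(mapping.get("metabolomics", "no match: term kept as free text"))
```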
"},{"location":"definitions/oai-pmh/","title":"OAI-PMH Definition File","text":"
OAI-PMH is a protocol developed for harvesting metadata descriptions of records in an archive so that services can be built using metadata from many archives.
"},{"location":"definitions/oai-pmh/#oai-pmh-definition-file_1","title":"OAI-PMH definition File","text":"This definition file will allow harvesters to collect metadata structured according to a standard schema (OAI-DC).
Based on the Open Archives Initiative Protocol for Metadata Harvesting - Version 2
Example of an OAI-PMH Data Provider Validation
Example of OAI-PMH output for a dataset
Because the structure of the OAI-PMH output file is known internally, only a minimal amount of information is needed to carry out the correspondence.
Example of an OAI-PMH definition file (TSV)
Another example of an OAI-PMH definition file (TSV) with identifiers & vocabulary mapping
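Since OAI-PMH is a plain HTTP protocol, a harvester needs nothing more than a GET request with the standard parameters; the endpoint URL below is a placeholder for a real Maggot instance.

```python
# Minimal OAI-PMH harvesting request using the standard protocol parameters.
# The base URL is a placeholder; replace it with the endpoint of your instance.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://example.org/maggot/oai"          # placeholder endpoint
params = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

with urlopen(f"{BASE_URL}?{params}") as response:
    xml_payload = response.read().decode("utf-8")    # OAI-DC records as XML

print(xml_payload[:500])
```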
"},{"location":"definitions/terminology/","title":"Terminology","text":""},{"location":"definitions/terminology/#definition-of-terminology","title":"Definition of terminology","text":"There are two definition files to set up.
Each time there is a change in these two definition files, it is necessary to convert them so that they are taken into account by the application.
Terminology is the set of terms used to define the metadata of a dataset. A single file (web/conf/config_terms.txt) contains all the terminology. The input and search interfaces (e.g. screenshot) are completely generated from this definition file, which defines i) each of the fields and their input type (checkbox, dropbox, textbox, ...) and ii) the associated controlled vocabulary (ontology and thesaurus by autocompletion, drop-down list based on a fixed list of terms).
The metadata schema proposed by default is mainly based on the DDI (Data Documentation Initiative) schema, which also largely corresponds to the schema adopted by the Dataverse software.
Terminology is organised in several sections. By default 6 sections are proposed, but you can redefine them as you wish:
For each section, fields are then defined. These fields are defined according to the way they will be entered via the web interface. There are 6 different input types: check boxes (checkbox), drop-down lists (dropbox), single-line text boxes (textbox), single-line text boxes with an additional box for multiple selection from a catalog of terms (multiselect), date pickers (datebox) and multi-line text boxes (areabox).
For two types (checkbox and dropbox), it is possible to define the values to be selected (predefined terms).
"},{"location":"definitions/terminology/#structure-of-the-terminology-definition-file-tsv","title":"Structure of the Terminology definition file (TSV)","text":"The file must have 9 columns with headers defined as follows:
column 9 - Predefined terms: for fields whose type is checkbox or dropbox, a list of terms separated by commas can be given.
Notes
Below is an example of a Terminology definition file (TSV).
Example of Maggot JSON file generated based on the same definition file
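As a hedged sketch of how the definition file drives the form (this is not Maggot's own code), the required-field check and the META_ naming convention described in this documentation could look like this:

```python
# Sketch only: validate a metadata record against the terminology definition
# (fields with Required=Y) and write it as META_<name>.json. Columns match
# the example definition file above; the logic is illustrative, not Maggot's.
import csv, json, sys

def required_fields(def_path="web/conf/config_terms.txt"):
    with open(def_path, newline="", encoding="utf-8") as fh:
        return [row["Field"] for row in csv.DictReader(fh, delimiter="\t")
                if row.get("Required", "").strip().upper() == "Y"]

def write_metadata(record, name):
    missing = [f for f in required_fields() if not record.get(f)]
    if missing:
        sys.exit(f"Missing mandatory fields: {', '.join(missing)}")
    with open(f"META_{name}.json", "w", encoding="utf-8") as out:
        json.dump(record, out, indent=2, ensure_ascii=False)

# With an incomplete record, this simply reports the missing mandatory fields.
write_metadata({"title": "soil-2023", "fulltitle": "Soil survey 2023"}, "soil-2023")
```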
The documentation definition file is used to provide online help for each field (a small icon placed next to each label on the form). It therefore only needs to be modified when a field is added, deleted, or moved to another section. This file is then used to generate the online metadata documentation, as shown in the figure below (see Configuration to find out how to carry out this transformation).
The file must have 3 columns with headers defined as follows:
Below is an example of a Terminology documentation file (TSV).
The same example as above, converted to HTML via Markdown.
1 - Vocabulary based on a list of terms fixed in advance (checkbox with feature open=0)
2 - Vocabulary open for addition (checkbox with feature open=1)
3 - Vocabulary based on a web API in a text field (textbox)
4 - Vocabulary based on a dictionary with multiple selection (multiselect)
5 - Vocabulary based on a SKOSMOS Thesaurus with multiple selection (multiselect)
6 - Vocabulary based on an OntoPortal with multiple selection (multiselect)
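To give an idea of what an autocomplete source looks like behind the scenes, here is a hedged example of querying a SKOSMOS thesaurus through its public REST API (the base URL is a placeholder, and Maggot's own autocomplete code may differ):

```python
# Hedged example of a SKOSMOS autocomplete lookup via its REST API.
# The instance URL is a placeholder; adjust the language to your thesaurus.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SKOSMOS_BASE = "https://skosmos.example.org/rest/v1"   # placeholder instance

def skosmos_search(term, lang="en"):
    query = urlencode({"query": term + "*", "lang": lang})
    with urlopen(f"{SKOSMOS_BASE}/search?{query}") as resp:
        data = json.load(resp)
    return [(hit.get("prefLabel", ""), hit["uri"]) for hit in data.get("results", [])]

for label, uri in skosmos_search("soil"):
    print(label, uri)
```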
"},{"location":"definitions/zenodo/","title":"Zenodo Definition File","text":"
Open research data repository operated by CERN (launched within the European OpenAIRE programme).
"},{"location":"definitions/zenodo/#zenodo-definition-file_1","title":"Zenodo definition File","text":"This definition file will allow Maggot to automatically export the dataset into a data repository based on Zenodo. The approach consists of starting from the Maggot metadata file in JSON format and transforming it into another JSON format compatible with Zenodo.
Because the structure of the Zenodo JSON output file is not known internally, information about the structure must be provided to carry out the correspondence.
Below is an example of a Zenodo definition file (TSV).
Example of a Zenodo JSON file generated from the definition file given as an example above.
Using an approach that might be called \"machine-readable metadata\", it is possible to populate the metadata of a dataset into one of the supported data repositories via its web API, provided that you have taken care to define your metadata schema correctly so that a correspondence with the chosen data repository can be made using a mapping definition file.
The principle is illustrated by the figure above.
1 - To submit metadata to a Dataverse repository, you must first select either a dataset from the drop-down list of datasets listed on the data storage space, or a metadata file from your local disk.
2 - You then need to connect to the repository in order to retrieve the key (the API token) authorizing you to submit the dataset. This obviously assumes that you have the privileges (creation/modification rights) to do so.
3 - After choosing the repository URL, you must also specify on which dataverse collection you want to deposit the datasets. As previously, you must have write rights to this dataverse collection.
If you also want to deposit data files at the same time as the metadata, you will need to:
1 - declare the files to be deposited in the resources; these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt); its mere presence, regardless of its content (which may be empty), will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts in Maggot.
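Under the hood, the push to Dataverse amounts to a single call to the Dataverse native API, roughly as in the sketch below (server URL, collection alias and token are placeholders; Maggot's Publication module performs the equivalent steps for you):

```python
# Hedged sketch of a Dataverse deposit via the native API: POST the
# crosswalked dataset JSON to a collection. All values below are placeholders;
# Maggot's Publication module performs the equivalent calls for you.
import json
import requests

SERVER = "https://dataverse.example.org"   # placeholder repository URL
COLLECTION = "my-collection"               # placeholder collection alias
API_TOKEN = "xxxxxxxx-xxxx"                # placeholder API token

with open("dataverse_dataset.json", encoding="utf-8") as fh:
    payload = json.load(fh)

resp = requests.post(
    f"{SERVER}/api/dataverses/{COLLECTION}/datasets",
    headers={"X-Dataverse-key": API_TOKEN},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print("Draft dataset created:", resp.json()["data"]["persistentId"])
```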
1 - To submit metadata to a Zenodo repository, you must first select either a dataset from the drop-down list of datasets listed on the data storage space, or a metadata file from your local disk.
2 - Unless you have previously saved your API token, you must create a new one and copy and paste it before validating it. Before validating, you must check the deposit:access and deposit:write boxes in order to obtain creation and modification rights with this token.
3 - After choosing the repository URL, you can optionally choose a community to which the dataset will be linked. By default, you can leave this field empty.
"},{"location":"publish/zenodo/#deposit-data-files","title":"Deposit data files","text":"
If you also want to deposit data files at the same time as the metadata, you will need to (see figure below):
1 - declare the files to be deposited in the resources (1); these same files must also be present in the storage space.
2 - create a semaphore file (META_datafile_ok.txt) (2); its mere presence, regardless of its content, will authorize the transfer. Indeed, the creation of such a file guarantees that the user actually has write rights to the storage space corresponding to their dataset. This prevents someone else from publishing the data without having the right to do so. This mechanism also avoids having to manage user accounts in Maggot.
Then, all you have to do is click on 'Publish' to \"push\" the metadata and data to the repository (3).
After submission, if everything went well, a link to the deposit will be provided (4).
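For reference, the corresponding Zenodo REST calls look roughly like the hedged sketch below (token, metadata and file name are placeholders; Maggot carries out the equivalent steps when you click 'Publish'):

```python
# Hedged sketch of a Zenodo deposit through its REST API: create a deposition,
# attach metadata, upload a file. All values are placeholders; Maggot performs
# the equivalent steps from its Publication module.
import requests

ZENODO = "https://zenodo.org/api"
TOKEN = {"access_token": "xxxxxxxx"}       # placeholder personal token

# 1) create an empty deposition
resp = requests.post(f"{ZENODO}/deposit/depositions", params=TOKEN, json={}, timeout=60)
resp.raise_for_status()
deposition = resp.json()

# 2) attach the (crosswalked) metadata
metadata = {"metadata": {
    "title": "Soil survey 2023",
    "upload_type": "dataset",
    "description": "Example dataset described with Maggot",
    "creators": [{"name": "Doe, Jane"}],
}}
requests.put(f"{ZENODO}/deposit/depositions/{deposition['id']}",
             params=TOKEN, json=metadata, timeout=60).raise_for_status()

# 3) upload a data file into the deposition bucket
with open("data.csv", "rb") as fh:
    requests.put(f"{deposition['links']['bucket']}/data.csv",
                 params=TOKEN, data=fh, timeout=300).raise_for_status()

print("Draft deposit ready:", deposition["links"]["html"])
```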
"},{"location":"tutorial/","title":"Quick tutorial","text":""},{"location":"tutorial/#quick-tutorial_1","title":"Quick tutorial","text":"
This is a quick tutorial on how to use the Maggot tool in practice; it is therefore aimed primarily at the end user.
See a short Presentation and Poster if you want to have a more general overview of the tool.
"},{"location":"tutorial/#overview","title":"Overview","text":"The Maggot tool is made up of several modules, all accessible from the main page by clicking on the corresponding part of the image as shown in the figure below:
Configuration
This module mainly concerns the data manager and makes it possible to construct all the terminology definition files, i.e. the metadata and sources of associated vocabularies. See Definition files then Configuration.
Private Access
This module allows a data producer to temporarily protect access to metadata for as long as necessary before sharing it within their collective. See Private access key management.
Dictionaries
This module allows data producers to view the content of all dictionaries. It also allows the data steward to edit their content. See Dictionaries for technical details only.
Metadata Entry
This is the main module, allowing data producers to enter the metadata relating to a dataset. See the corresponding tutorial for Metadata Entry.
Search datasets
This module allows users to search datasets based on the associated metadata, to see all the metadata and possibly to access the data itself. This obviously assumes that the metadata files have been deposited in the correct directory in the storage space dedicated to data management within your collective. See Infrastructure.
File Browser
This module gives users access to a file browser, provided that the data manager has installed it. See File Browser.
Publication
This module allows either the data producer or the data steward to publish the metadata, possibly with the corresponding data, to a suitable data repository. See Publication.
"},{"location":"tutorial/describe/","title":"Quick tutorial","text":""},{"location":"tutorial/describe/#metadata-entry","title":"Metadata Entry","text":"The figures are given here for illustration purposes but certain elements may be different for you given that this will depend on the configuration on your instance, in particular the choice of metadata, and the associated vocabulary sources.
Indeed, the choice of vocabulary sources (ontologies, thesauri, dictionaries) as well as the choice of metadata fields to enter must in principle have been the subject of discussion between data producers and data manager during the implementation of the Maggot tool in order to find the best compromise between the choice of sources and all the scientific fields targeted (see Definition files). However a later addition is always possible.
"},{"location":"tutorial/describe/#overview","title":"Overview","text":"When you enter the metadata entry module you should see a page that looks like the figure below:
All the fields (metadata) to be filled in are distributed between several tabs, also called sections. Each section tries to group together a set of fields relating to the same topic.
You can reload a previously created metadata file. All form fields will then be initialized with the value(s) defined in the metadata file.
You must at least complete the mandatory fields marked with a red star.
It is possible to obtain help for each field to be completed. A mini-icon with a question mark is placed after each field label. By clicking on this icon, a web page opens with the focus on the definition of the corresponding field. This help should provide you with at least a definition of a field and, if necessary, instructions on how to fill it in. It should be noted that the quality of the documentation depends on each instance and its configuration.
Once the form has been completed, even partially (at least the mandatory fields marked with a red star), you can export your metadata in the form of a file. See Metadata File
Dictionary-based metadata (e.g. people's names) can easily be entered by autocomplete in the 'Search value' box provided the name appears in the corresponding dictionary.
However, if the name does not yet appear in the dictionary, simply enter the full name (first name & last name) in the main box, making sure to separate each name with a comma and then a space as shown in the figure below.
Then you can request to add the additional person name(s) to the dictionary later as described below:
From the home page, select \"Dictionaries\". As username, just put \"maggot\" (this might be different within your instance).
Then after choosing the \"people\" dictionary, you can download the entire dictionary in a TSV file (Tab-Separated Values) ready to be edited with your favorite spreadsheet.
Add all the desired people's names with their institution, and possibly their ORCID and their email address. Please note that emails are required for authors and contacts.
You will then just have to send it to the data manager so that they can add the new names to the online dictionary.
Please proceed in the same way for all dictionaries (people, funders, producer, vocabulary).
"},{"location":"tutorial/describe/#controlled-vocabulary","title":"Controlled Vocabulary","text":"Depending on the configuration of your instance, it is very likely that certain fields (eg. keywords) are connected to a controlled vocabulary source (e.g. ontology, thesaurus). Vocabulary based on ontologies, thesauri or even dictionaries can easily be entered by autocomplete in the \"search for a value\" box provided that the term exists in the corresponding vocabulary source.
If a term cannot be found by autocomplete, you can enter the term directly in the main box, making sure to separate each term with a comma and a space as shown in the figure below.
The data steward will later try to link it to a vocabulary source that may be suitable for the domain in question. Furthermore, even if the choice of vocabulary sources was made before the tool was put into service, a later addition is always possible. You should make the request to your data manager.
"},{"location":"tutorial/describe/#resources","title":"Resources","text":"Because data is often scattered across various platforms, databases, and file formats, this making it challenging to locate and access. This is called data fragmentation. So the Maggot tool allows you to specify resources, i.e. data in the broader sense, whether external or internal, allowing to centralize all links towards data.
Four fields must be filled in :
Resource Type : Choose the type of the resource in the droplist.
Media Type : Choose a media type if applicable by autocomplete.
Description : Provide a concise and accurate description of the resource. Must not exceed 30 characters.
Location : Preferably indicate a URL to an external resource accessible to all. But it can also be a password-protected resource (e.g. a disk space on the cloud). It can also be text clearly indicating where the resource is located (internal disk space). Finally, it can be the name of a file deposited on the same disk space as the metadata file, so that it can be pushed to the data repository at the same time as the metadata (see Publication).
Once the form has been completed, even partially (at least the mandatory fields marked with a red star), you can export your metadata in the form of a file. The file is in JSON format and must have the prefix 'META_'.
By clicking on the \"Generate the metadata file\" button, you can save it on your disk space.
Furthermore, if email sending has been configured (see settings), then you have the possibility of sending the metadata file to the data managers for conservation, and possibly also for supporting its storage on data disk space if specific rights are required.
In case you want to save the metadata file on your disk space, you have two ways to use this file:
1. The first use is the recommended one because it allows metadata management within your collective. You drop the metadata file directly under the data directory of the dataset it describes. Indeed, when installing the tool, a storage space dedicated to the tool had to be provided for this purpose. See Infrastructure. Once deposited, you just have to wait around 30 minutes at most, so that the tool has had time to scan the root of the data directories looking for new files and update the database (a minimal sketch of such a scan is given after this list). After this period, the description of your dataset will be visible from the interface, and its metadata can be used as criteria to narrow down searches.
You will then have the possibility to publish the metadata later with possibly the corresponding data in a data repository such as Dataverse or Zenodo.
2. The second use is only to deposit the metadata into a data repository. Whether with Dataverse or Zenodo, you have the possibility of publishing metadata directly in one or the other of these repositories without using the storage space.
Please note that you cannot also deposit the data files in this way. You will have to do this manually for each of them directly online in the repository.
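A minimal sketch of the periodic scan mentioned in point 1 above (the storage root is a placeholder; Maggot itself parses the files and loads them into a database roughly every 30 minutes):

```python
# Illustrative sketch of the periodic scan described in point 1 above: walk
# the storage root, collect every META_*.json file and index its content.
# The storage root path is a placeholder for your own data storage space.
import json
from pathlib import Path

STORAGE_ROOT = Path("/data/maggot")        # placeholder storage space

def scan_metadata(root=STORAGE_ROOT):
    index = {}
    for meta_file in root.glob("**/META_*.json"):
        with open(meta_file, encoding="utf-8") as fh:
            index[str(meta_file.parent)] = json.load(fh)
    return index

for dataset_dir, record in scan_metadata().items():
    print(dataset_dir, "->", record.get("title", "untitled"))
```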
"}]} \ No newline at end of file diff --git a/sitemap.xml.gz b/sitemap.xml.gz index 187b23b8a4ba71e499c1bccf0c6c8c4200d7d76b..eb5e42a0cdfe2da80258fa96acb245d3178a0c9e 100755 GIT binary patch delta 13 Ucmb=gXP58h;5a1`F_FCj02~Vh1^@s6 delta 13 Ucmb=gXP58h;BfW~pU7ST02$^3vj6}9