Somesy (software metadata sync) is a CLI tool to avoid messy software project metadata by keeping it in sync.
Many development tools allow or require you to provide information about the software project they are used in. These tools are often very specific to the programming language and the task at hand, and they often come with their own configuration files. Emerging best practices for FAIR software metadata require adding even more files that provide information such as the project name, description, version, repository URL, license, or authors.
If it were enough to set up the different files only once, there would be no issue. But software is always in development and a moving target: maintainers can change, contributors come and go, the version number is regularly increased, and the project can be moved to a different location. Maintaining this kind of information and updating it by hand in the various files and formats used in the project is tedious, error-prone, and time-consuming. Somesy automates the synchronization of general software project metadata and frees your time to focus on your actual work.
Because the same information is represented in different ways and in more or less detail in different files, somesy requires all project information to be placed in a somesy-specific input section located in a supported input file. Somesy uses this section as the single source of truth for the supported project metadata fields and synchronizes this information into the different output files.
Somesy first converts the information as needed for an output, while trying to preserve as much information as possible. Then it carefully updates the file, while keeping all other fields in the target file unchanged. For files that are usually edited by hand, it will even make sure that the comments in your TOML and YAML files stay in place.
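To illustrate what "keeping all other fields unchanged" means, here is a deliberately naive, line-based Python sketch that rewrites one string value inside one TOML table and leaves every other line, including comments, intact. It is not somesy's implementation, which is not described here; update_toml_value is a hypothetical helper that only handles single-line key = "string" assignments under simple, unquoted table headers.

```python
import re


def update_toml_value(text: str, table: str, key: str, value: str) -> str:
    """Set key = "value" inside [table], keeping every other line (and comment) as-is.

    Hypothetical helper for illustration only: it assumes the value is a plain
    double-quoted string on a single line and does not escape quotes in *value*.
    """
    # A line consisting only of a [table] or [[table]] header opens a new table.
    header = re.compile(r"^\s*\[+([A-Za-z0-9_.\- ]+)\]+\s*(#.*)?$")
    # Capture the text before and after the quoted value so that spacing and
    # trailing comments survive the replacement.
    assign = re.compile(rf'^(\s*{re.escape(key)}\s*=\s*)"[^"]*"(.*)$', re.DOTALL)

    lines = text.splitlines(keepends=True)
    current_table = None
    for i, line in enumerate(lines):
        match = header.match(line)
        if match:
            current_table = match.group(1).strip()
            continue
        if current_table == table:
            assignment = assign.match(line)
            if assignment:
                lines[i] = f'{assignment.group(1)}"{value}"{assignment.group(2)}'
    return "".join(lines)


before = (
    "[tool.poetry]\n"
    'name = "test"  # project name\n'
    'version = "0.1.0"  # kept in sync\n'
    "\n"
    "[other]\n"
    'version = "9.9"\n'
)
after = update_toml_value(before, "tool.poetry", "version", "0.2.0")
print(after)
```

Only the version line in [tool.poetry] changes; its trailing comment, the other table, and the blank line are untouched. A real tool would use a comment-preserving TOML or YAML parser instead of regular expressions.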
Here is an overview of the supported files and formats.
Input Formats | Comment |
---|---|
pyproject.toml | tool.somesy section |
package.json | TBD |
.somesy.toml | ✓ |

Output Formats | Status |
---|---|
pyproject.toml (poetry) | ✓ |
pyproject.toml (setuptools) | ✓ |
package.json | TBD |
mkdocs.yml | TBD |
CITATION.cff | ✓ |
codemeta.json | TBD |
Somesy does not support setuptools dynamic fields in this version.
The table below shows which somesy field is mapped to which field in each of the currently supported formats. Some metadata fields are required in the somesy input file, and somesy gives an error if a required field is not filled.
Project Metadata | Poetry Config | setuptools Config | CITATION.cff | Requirement |
---|---|---|---|---|
name | name | name | title | required |
version | version | version | version | optional |
description | description | description | abstract | required |
authors | authors | authors | authors | required |
maintainers | maintainers | maintainers | contact | optional |
keywords | keywords | keywords | keywords | optional |
license | license | license | license | required |
repository | repository | urls.repository | repository_code | optional |
homepage | homepage | urls.homepage | url | optional |
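The mapping table can be read as a lookup from each somesy field to a key in each target format. The Python sketch below encodes the table as data; FIELD_MAP and map_fields are hypothetical names. It only renames keys and expands dotted setuptools keys such as urls.repository into nested tables; it does not perform the value conversion somesy does (for example, how persons are represented in each format).

```python
from __future__ import annotations

from typing import Any

# somesy field -> (Poetry key, setuptools key, CITATION.cff key), as in the table above
FIELD_MAP = {
    "name": ("name", "name", "title"),
    "version": ("version", "version", "version"),
    "description": ("description", "description", "abstract"),
    "authors": ("authors", "authors", "authors"),
    "maintainers": ("maintainers", "maintainers", "contact"),
    "keywords": ("keywords", "keywords", "keywords"),
    "license": ("license", "license", "license"),
    "repository": ("repository", "urls.repository", "repository_code"),
    "homepage": ("homepage", "urls.homepage", "url"),
}
TARGET_COLUMN = {"poetry": 0, "setuptools": 1, "cff": 2}


def map_fields(metadata: dict[str, Any], target: str) -> dict[str, Any]:
    """Rename somesy fields to the keys used by *target* (hypothetical helper)."""
    column = TARGET_COLUMN[target]
    result: dict[str, Any] = {}
    for field, value in metadata.items():
        if field not in FIELD_MAP:
            # Fields without a row in the table (e.g. contributors) are skipped here.
            continue
        *parents, leaf = FIELD_MAP[field][column].split(".")
        node = result
        for part in parents:
            # "urls.repository" becomes {"urls": {"repository": ...}}
            node = node.setdefault(part, {})
        node[leaf] = value
    return result
```

For example, map_fields({"name": "test", "repository": "https://github.com/xx/test"}, "cff") yields the keys title and repository_code.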
The somesy input contains the most important metadata fields, namely those that are common to the different file formats. The somesy input fields are explained below.
- name: Software name - String
- version: Software version - String
- description: Software description - String
- authors: Software authors - List of Persons
- maintainers: Software maintainers - List of Persons
- contributors: Software contributors - List of Persons
- keywords: Keywords that explain the software - List of strings
- license: SPDX string of the license - String in SPDX string format
- repository: The repository URL - String in URL format
- homepage: The software website - String in URL format
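A minimal Python sketch of the kind of checks these field types imply, using the required fields from the mapping table (name, description, authors, license). check_project is a hypothetical helper; it does not validate SPDX license strings or Person entries, and treating "URL format" as an http(s) prefix is an assumption made for illustration.

```python
from __future__ import annotations

REQUIRED = ("name", "description", "authors", "license")
STRING_FIELDS = {"name", "version", "description", "license", "repository", "homepage"}
LIST_FIELDS = {"authors", "maintainers", "contributors", "keywords"}
URL_FIELDS = ("repository", "homepage")


def check_project(meta: dict) -> list[str]:
    """Return a list of problems in the somesy input (empty if none were found)."""
    errors = []
    # Required fields must be present and non-empty.
    for field in REQUIRED:
        if not meta.get(field):
            errors.append(f"missing required field: {field}")
    # Basic type checks following the field list above.
    for field, value in meta.items():
        if field in STRING_FIELDS and not isinstance(value, str):
            errors.append(f"{field} must be a string")
        elif field in LIST_FIELDS and not isinstance(value, list):
            errors.append(f"{field} must be a list")
    # Assumed, simplified URL check.
    for field in URL_FIELDS:
        value = meta.get(field)
        if isinstance(value, str) and not value.startswith(("http://", "https://")):
            errors.append(f"{field} must be a URL")
    return errors
```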
Person is a component of the project metadata, based on the Person class of CFF version 1.2.0. We added contribution-related fields to this Person class to acknowledge all contributions to the project. The Person class fields are:
- address: The person's address. - String
- affiliation: The person's affiliation. - String
- alias: The person's alias. - String
- city: The person's city. - String
- country: The person's country, abbreviated as two capital letters. - String
- email: The person's email address. - String in email format
- family-names: The person's family names. - String
- fax: The person's fax number. - String
- given-names: The person's given names. - String
- name-particle: The person's name particle, e.g., a nobiliary particle or a preposition meaning 'of' or 'from' (for example, 'von' in 'Alexander von Humboldt'). - String
- name-suffix: The person's name-suffix, e.g. 'Jr.' for Sammy Davis Jr. or 'III' for Frank Edwin Wright III. - String
- orcid: The person's ORCID URL. - String in URL format
- post_code: The person's post-code. - String
- tel: The person's phone number. - String
- website: The person's website. - String in URL format
- contribution: Summary of how the person contributed to the project. - String
- contribution_type: The contributor's contribution type, given as an emoji key from the all-contributors specification. - String (emoji key name)
- contribution_begin: Beginning date of the contribution. - Date in YYYY-MM-DD format
- contribution_end: Ending date of the contribution. - Date in YYYY-MM-DD format
Input fields must adhere to the restrictions above; otherwise, somesy raises an error.
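The Person restrictions above can be checked with a few regular expressions and the standard datetime module. This is a minimal Python sketch: check_person is hypothetical, only a subset of fields is covered, and the exact patterns (in particular the ORCID URL pattern and the simple email pattern) are assumptions rather than somesy's actual rules.

```python
from __future__ import annotations

import re
from datetime import date

# Assumed patterns; somesy's own validation rules may differ.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
ORCID_RE = re.compile(r"https://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]")
COUNTRY_RE = re.compile(r"[A-Z]{2}")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_iso_date(value: str) -> bool:
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    if not DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as month 13
    except ValueError:
        return False
    return True


def check_person(person: dict) -> list[str]:
    """Return a list of problems found in a Person entry (empty if none were found)."""
    errors = []
    if "email" in person and not EMAIL_RE.fullmatch(person["email"]):
        errors.append("email: not a valid email address")
    if "orcid" in person and not ORCID_RE.fullmatch(person["orcid"]):
        errors.append("orcid: not an ORCID URL")
    if "country" in person and not COUNTRY_RE.fullmatch(person["country"]):
        errors.append("country: expected two capital letters")
    for key in ("contribution_begin", "contribution_end"):
        if key in person and not _is_iso_date(person[key]):
            errors.append(f"{key}: expected a date in YYYY-MM-DD format")
    return errors
```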
Somesy requires Python >= 3.8. You can install the package like any other package into your current Python environment using:
$ pip install git+ssh://git@github.com/Materials-Data-Science-and-Informatics/somesy.git
or, if you are adding it as a dependency to a Poetry project:
$ poetry add git+ssh://git@github.com/Materials-Data-Science-and-Informatics/somesy.git
After installation, you can use somesy as a CLI tool. By default, the somesy sync command looks for an input file in the current working directory, checking .somesy.toml and pyproject.toml in that order. Currently, somesy sync supports two outputs, CITATION.cff and pyproject.toml (in either poetry or setuptools format), and both are synced by default. CITATION.cff is created if it does not exist, but pyproject.toml has to be created beforehand in either poetry or setuptools format. You can disable either output with CLI options.
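The input lookup described above can be sketched in a few lines of Python with pathlib. find_input_file is a hypothetical helper; the lookup order (.somesy.toml before pyproject.toml) is assumed from the order in which the files are listed, and unlike somesy this sketch does not check whether pyproject.toml actually contains a tool.somesy section.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumed lookup order, following the order in which the files are listed above.
CANDIDATE_INPUTS = (".somesy.toml", "pyproject.toml")


def find_input_file(directory: Path) -> Path:
    """Return the first candidate input file that exists in *directory*."""
    for name in CANDIDATE_INPUTS:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no somesy input file found in {directory}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        (workdir / "pyproject.toml").write_text("[tool.somesy.project]\n")
        print(find_input_file(workdir).name)  # pyproject.toml
        (workdir / ".somesy.toml").write_text("[project]\n")
        print(find_input_file(workdir).name)  # .somesy.toml
```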
Command | Option | Argument | Description |
---|---|---|---|
somesy | --version, -v | - | Show the somesy version |
somesy sync | --input-file, -i | input file path | Set the input file |
somesy sync | --no-sync-cff, -C | - | Do not sync the CITATION.cff file |
somesy sync | --cff-file, -c | CITATION.cff file path | Set the CITATION.cff file to sync |
somesy sync | --no-sync-pyproject, -P | - | Do not sync the pyproject.toml file |
somesy sync | --pyproject-file, -p | pyproject.toml file path | Set the pyproject.toml file to sync |
somesy sync | --show-info, -s | - | Show basic information messages |
somesy sync | --verbose, -v | - | Show verbose messages |
somesy sync | --debug, -d | - | Show debug messages (overrides --verbose) |
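To show how the options in the table fit together, here is a hypothetical argparse model of them in Python. It is not somesy's real CLI code; build_parser and verbosity are illustrative names, and the precedence of --verbose over --show-info is an assumption (the table only states that --debug overrides --verbose). Note that -v means --version for the top-level command but --verbose for the sync subcommand.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse model of the documented options."""
    parser = argparse.ArgumentParser(prog="somesy")
    parser.add_argument("-v", "--version", action="store_true", help="show the somesy version")
    commands = parser.add_subparsers(dest="command")

    sync = commands.add_parser("sync", help="sync project metadata")
    sync.add_argument("-i", "--input-file", type=Path, help="set the input file")
    sync.add_argument("-C", "--no-sync-cff", action="store_true", help="do not sync CITATION.cff")
    sync.add_argument("-c", "--cff-file", type=Path, help="set the CITATION.cff file to sync")
    sync.add_argument("-P", "--no-sync-pyproject", action="store_true", help="do not sync pyproject.toml")
    sync.add_argument("-p", "--pyproject-file", type=Path, help="set the pyproject.toml file to sync")
    sync.add_argument("-s", "--show-info", action="store_true", help="show basic information messages")
    sync.add_argument("-v", "--verbose", action="store_true", help="show verbose messages")
    sync.add_argument("-d", "--debug", action="store_true", help="show debug messages")
    return parser


def verbosity(args: argparse.Namespace) -> str:
    """Resolve the message level: --debug overrides --verbose; the default is silent."""
    if args.debug:
        return "debug"
    if args.verbose:
        return "verbose"  # assumed to take precedence over --show-info
    if args.show_info:
        return "info"
    return "quiet"


if __name__ == "__main__":
    args = build_parser().parse_args(["sync", "-C", "-p", "pyproject.toml", "-v", "-d"])
    print(args.no_sync_cff, args.pyproject_file, verbosity(args))  # True pyproject.toml debug
```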
Somesy is designed to be used as a pre-commit tool, so it produces no output unless there is an error or one of the related flags is set. Somesy also gives an error if there is no output to sync.
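The output rules described above (CITATION.cff is created if missing, pyproject.toml must already exist, and having nothing to sync is an error) can be condensed into a small Python selection function. select_outputs and its choice of exception types are hypothetical, and reading "no output to sync" as "every output has been disabled" is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def select_outputs(
    no_sync_cff: bool = False,
    no_sync_pyproject: bool = False,
    cff_file: Path = Path("CITATION.cff"),
    pyproject_file: Path = Path("pyproject.toml"),
) -> list[Path]:
    """Return the output files to sync, applying the rules described above."""
    outputs: list[Path] = []
    if not no_sync_cff:
        # CITATION.cff may be missing; it would be created during the sync.
        outputs.append(cff_file)
    if not no_sync_pyproject:
        # pyproject.toml must already exist (in poetry or setuptools format).
        if not pyproject_file.is_file():
            raise FileNotFoundError(f"{pyproject_file} must exist before syncing")
        outputs.append(pyproject_file)
    if not outputs:
        # Every output is disabled, so there is nothing to sync.
        raise ValueError("no output to sync")
    return outputs
```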
Somesy can be used as a pre-commit hook. A pre-commit hook runs on every commit to automatically point out and/or fix issues. This way, somesy syncs your metadata on every commit in a deterministic way.
If you already use pre-commit, you can add somesy as a pre-commit hook. If you are new to pre-commit, create a .pre-commit-config.yaml file in the root folder of your repository. You can set CLI options in args, as in the example below.
repos:
  - repo: https://github.com/Materials-Data-Science-and-Informatics/somesy
    rev: "0.1.0"
    hooks:
      - id: somesy
        args: ["-C", "-p", "~/xx/xx/pyproject.toml"]
This repository has a .somesy.toml file that can be used as an example. Below are additional examples of somesy project metadata inputs. Note the TOML table headers in each example ([project] in .somesy.toml, [tool.somesy.project] in pyproject.toml); the input itself is the same.
.somesy.toml example:
[project]
name = "test"
version = "0.1.0"
description = "Test description."
authors = [
    {family-names = "Doe", given-names = "John", email = "test@test.test", orcid = "https://orcid.org/0000-0001-2345-5678", contribution = "The main author, maintainer and tester.", contribution_begin = "2023-03-01", contribution_type = "code"}
]
maintainers = [
    {family-names = "Doe", given-names = "John", email = "test@test.test", orcid = "https://orcid.org/0000-0001-2345-5678", contribution = "The main author, maintainer and tester.", contribution_begin = "2023-03-01", contribution_type = "code"}
]
contributors = [
    {family-names = "Doe", given-names = "John", email = "test@test.test", orcid = "https://orcid.org/0000-0001-2345-5678", contribution = "The main author, maintainer and tester.", contribution_begin = "2023-03-01", contribution_type = "code"},
    {family-names = "Dow", given-names = "John", email = "test2@test.test", orcid = "https://orcid.org/0000-0012-3456-7890", contribution = "Reviewer", contribution_begin = "2023-03-01", contribution_type = "review"}
]
keywords = ["key", "word"]
license = "MIT"
repository = "https://github.com/xx/test"
homepage = "https://xx.github.io/test"
pyproject.toml example:
[tool.somesy.project]
name = "test"
version = "0.1.0"
description = "Test description."
authors = [
    {family-names = "Doe", given-names = "John", email = "test@test.test", orcid = "https://orcid.org/0000-0001-2345-5678", contribution = "The main author, maintainer and tester.", contribution_begin = "2023-03-01", contribution_type = "code"}
]
maintainers = [
    {family-names = "Doe", given-names = "John", email = "test@test.test", orcid = "https://orcid.org/0000-0001-2345-5678", contribution = "The main author, maintainer and tester.", contribution_begin = "2023-03-01", contribution_type = "code"}
]
contributors = [
    {family-names = "Doe", given-names = "John", email = "test@test.test", orcid = "https://orcid.org/0000-0001-2345-5678", contribution = "The main author, maintainer and tester.", contribution_begin = "2023-03-01", contribution_type = "code"},
    {family-names = "Dow", given-names = "John", email = "test2@test.test", orcid = "https://orcid.org/0000-0012-3456-7890", contribution = "Reviewer", contribution_begin = "2023-03-01", contribution_type = "review"}
]
keywords = ["key", "word"]
license = "MIT"
repository = "https://github.com/xx/test"
homepage = "https://xx.github.io/test"
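Because the two examples differ only in the table header, reading the input amounts to looking up a different table path per file. A minimal Python sketch, assuming the file has already been parsed into a dict (for instance with tomllib on Python 3.11+); PROJECT_TABLE_PATH and project_metadata are hypothetical names, and the example dicts are shortened versions of the examples above.

```python
from __future__ import annotations

from typing import Any

# Where the somesy input lives in each supported input file (from the examples above).
PROJECT_TABLE_PATH = {
    ".somesy.toml": ("project",),
    "pyproject.toml": ("tool", "somesy", "project"),
}


def project_metadata(parsed: dict[str, Any], filename: str) -> dict[str, Any]:
    """Return the somesy project table from an already-parsed TOML document."""
    node: Any = parsed
    for key in PROJECT_TABLE_PATH[filename]:
        node = node[key]
    return node


somesy_toml = {"project": {"name": "test", "license": "MIT"}}
pyproject = {"tool": {"somesy": {"project": {"name": "test", "license": "MIT"}}}}

# Same input, different location in the file.
print(project_metadata(somesy_toml, ".somesy.toml") == project_metadata(pyproject, "pyproject.toml"))  # True
```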
This project uses Poetry for dependency management, so you need to have it installed to set up a development environment for this package.
Then you can run the following commands to set up the project:
$ git clone git@github.com:Materials-Data-Science-and-Informatics/somesy.git
$ cd somesy
$ poetry install
Common tasks are accessible via poethepoet, which can be installed by running poetry self add 'poethepoet[poetry_plugin]'.
- Use poetry poe init-dev after cloning to enable automatic linting before each commit.
- Use poetry poe lint to run the same linters manually.
- Use poetry poe test to run tests; add --cov to also show test coverage.
- Use poetry poe docs to generate local documentation.
If you want to cite this project in your scientific work, please use the citation file in the repository.
This project was developed at the Institute for Materials Data Science and Informatics (IAS-9) of the Jülich Research Center and funded by the Helmholtz Metadata Collaboration (HMC), an incubator-platform of the Helmholtz Association within the framework of the Information and Data Science strategic initiative.