A CRUD for substitution matrices like BLOSUM50, BLOSUM62, PAM250 and more; commonly used in Bioinformatics and Evolutionary Biology.
This has been built by ZENODE within the Hardhat environment and is licensed under the MIT-license (see LICENSE.md).
- hardhat (npm module)
- web3 (npm module)
- Uses the zenode-contracts repository, which is automatically included as a Git submodule.
- CRUD in Solidity; immutable code, but flexible by design.
- Modular; loose coupling and high cohesion promote easy implementation into other contracts.
- Re-usable; deploy only once and use in multiple contracts.
- Ownership; access control and administrative privilege management.
- AA (Amino acids; alphabet for Proteins)
- NT (Nucleotides; alphabet for DNA — also known as the 'Nucleic acid notation')
- Scripts
- deploy.js - deploys the contract to the configured network.
- insert.js - reads, parses and inserts matrices or alphabets.
- delete.js - deletes matrices or alphabets.
- Tasks for contract interaction (see 6. Interaction).
- Text parsers that convert matrices and alphabets into Solidity code.
0. Clone
--use the --recursive flag.
git clone --recursive https://github.com/zenodeapp/substitution-matrices.git <destination_folder>
1. Installation
--use npm, yarn or any other package manager.
npm install
yarn install
2. Run the test node
--do this in a separate terminal!
npx hardhat node
3. Deployment
npx hardhat run scripts/deploy.js
4. Configuration
--add the contract address to zenode.config.js.
...
contracts: {
  substitutionMatrices: {
    name: "SubstitutionMatrices",
    address: "ADD_YOUR_CONTRACT_ADDRESS_HERE",
  },
},
...
5. Population
npx hardhat run scripts/alphabets/insert.js
npx hardhat run scripts/matrices/insert.js
6. Interaction
--use the scripts provided in the Interaction phase.
To get started, clone the repository with the --recursive flag:
git clone --recursive https://github.com/zenodeapp/substitution-matrices.git <destination_folder>
This repository includes submodules, so it should be cloned with the --recursive flag.
If you've already downloaded, forked or cloned this repository without the --recursive flag, run this command from the root folder:
git submodule update --init --recursive
Read more on how to work with submodules in the zenode-contracts repository.
Install all dependencies using a package manager of your choosing:
npm install
yarn install
After having installed all dependencies, use:
npx hardhat node
Make sure to do this in a separate terminal!
This creates a test environment to which we can deploy our contract(s). By default, this repository is configured to use Hardhat's local test node, but this can be changed in the hardhat.config.js file. For more information on how to do this, see Hardhat's documentation.
Now that our node is up-and-running, we can deploy our contract using:
npx hardhat run scripts/deploy.js
You should see a message appear in your terminal, stating that the contract was deployed successfully.
Our CRUD is deployed, but doesn't contain any data whatsoever. Before we go ahead and populate it with alphabets and matrices, we'll have to make a couple of changes to the zenode.config.js file.
We add the address of our contract to the contracts object, so that the scripts and tasks know which deployed contract they should interact with.
...
contracts: {
  substitutionMatrices: {
    name: "SubstitutionMatrices",
    address: "ADD_YOUR_CONTRACT_ADDRESS_HERE",
  },
},
...
The contract address can be found in your terminal after deployment.
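A quick format check can catch a forgotten placeholder or a truncated paste before any script runs. The sketch below is only an illustration: `looksLikeAddress` is a hypothetical helper that is not part of this repository, and the second address is an example value. It checks the shape of the string only, not whether a contract actually lives at that address; web3's `utils.isAddress` offers a stricter check.

```javascript
// Hypothetical helper, not part of the repository: a cheap format check
// for the address configured in zenode.config.js.
function looksLikeAddress(value) {
  // "0x" followed by exactly 40 hexadecimal characters (20 bytes).
  return typeof value === "string" && /^0x[0-9a-fA-F]{40}$/.test(value);
}

// Example values only; use the address printed by your own deployment.
const config = {
  contracts: {
    substitutionMatrices: {
      name: "SubstitutionMatrices",
      address: "ADD_YOUR_CONTRACT_ADDRESS_HERE",
    },
  },
};

console.log(looksLikeAddress(config.contracts.substitutionMatrices.address)); // false
console.log(looksLikeAddress("0x5FbDB2315678afecb367f032d93F642f64180aa3")); // true
```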
By default, all known alphabets and matrices will be inserted upon running the insert.js scripts (in the Population phase).
If you would like to change this behavior, edit the following key-value pairs:
{
  // You could also pass in a string instead of an array
  alphabetsToInsert: ["ALPHABET_ID_1", "ALPHABET_ID_2", ...],
  matricesToInsert: ["MATRIX_ID_1", "MATRIX_ID_2", ...],
}
and for the delete.js scripts:
{
  alphabetsToDelete: ["ALPHABET_ID_1", "ALPHABET_ID_2", ...],
  matricesToDelete: ["MATRIX_ID_1", "MATRIX_ID_2", ...],
}
NOTE: IDs are only valid if they are present in the alphabets or matrices objects (see 4.3).
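The two rules above (a selection may be a string or an array, and every ID must exist in the alphabets or matrices object) could be checked roughly as follows. This is a minimal sketch under assumptions: `resolveIds` is a hypothetical name, and the repository's scripts may handle these cases differently.

```javascript
// Hypothetical helper, not taken from the repository's scripts.
// `selection` may be a single ID (string) or an array of IDs;
// `known` is the `alphabets` or `matrices` object from the config.
function resolveIds(selection, known) {
  const ids = Array.isArray(selection) ? selection : [selection];
  const unknown = ids.filter(
    (id) => !Object.prototype.hasOwnProperty.call(known, id)
  );
  if (unknown.length > 0) {
    throw new Error(`Unknown ID(s): ${unknown.join(", ")}`);
  }
  return ids;
}

const alphabets = { amino_acids: "dataset/alphabets/aa.txt" };

console.log(resolveIds("amino_acids", alphabets)); // [ 'amino_acids' ]
```

Passing an ID that is missing from `alphabets` makes the helper throw, which surfaces a typo before anything is sent to the contract.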
There are two steps to consider when adding new alphabets or matrices, namely:
- The creation of the actual file that represents our new dataset, and
- Creating a reference to this dataset in zenode.config.js.
For step one it's important to know what data our text parser expects. The best way to find out is to look at the files we've already included in the dataset folder. I also suggest reading more about the formatting of Alphabets and Matrices in the Appendix.
For the second step we add our new dataset to one of the following objects:
alphabets
alphabets: {
  ALPHABET_ID_1: "ALPHABET_ID_1_RELATIVE_PATH",
  ALPHABET_ID_2: "ALPHABET_ID_2_RELATIVE_PATH",
  ...
},
or matrices
matrices: {
  MATRIX_ID_1: {
    alphabet: "ALPHABET_ID_2",
    file: "MATRIX_ID_1_RELATIVE_PATH",
  },
  MATRIX_ID_2: {
    alphabet: "ALPHABET_ID_1",
    file: "MATRIX_ID_2_RELATIVE_PATH",
  },
  ...
},
- The alphabets object only requires an ID and RELATIVE_PATH.
- The matrices object, on the other hand, also requires you to add an ALPHABET_ID.
- The IDs can be used in alphabetsToInsert, alphabetsToDelete, matricesToInsert and matricesToDelete (see 4.2).
- alphabet amino_acids (protein sequence characters):
alphabets: {
  amino_acids: "dataset/alphabets/aa.txt",
}
- matrix blosum100 using alphabet amino_acids:
matrices: {
  blosum100: {
    alphabet: "amino_acids",
    file: "dataset/matrices/blosum100.txt",
  },
}
IMPORTANT: adding a new alphabet or matrix doesn't mean it gets inserted into the contract in the Population phase. For this it has to be included in the alphabetsToInsert or matricesToInsert key-value pair! (see 4.2)
Now that we've deployed our contract and configured our setup, we can start populating our CRUD with alphabets and matrices!
To insert all the alphabets/matrices you've configured in the key-value pairs alphabetsToInsert/matricesToInsert, use:
npx hardhat run scripts/alphabets/insert.js
npx hardhat run scripts/matrices/insert.js
NOTE: you cannot insert a matrix before having inserted the alphabet it belongs to!
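Because of this ordering constraint, it can help to check which alphabets are still missing before running the matrix insert script. The following is a rough sketch, not code from this repository: `missingAlphabets` is a hypothetical helper, and the list of already inserted alphabets is assumed to come from something like the getAlphabets task.

```javascript
// Hypothetical pre-flight check; assumes every ID in `matricesToInsert`
// is a key of `config.matrices`.
function missingAlphabets(config, insertedAlphabets) {
  // `matricesToInsert` may be a string or an array; concat handles both.
  const matrixIds = [].concat(config.matricesToInsert);
  const required = new Set(matrixIds.map((id) => config.matrices[id].alphabet));
  return [...required].filter((id) => !insertedAlphabets.includes(id));
}

const config = {
  matrices: {
    blosum100: { alphabet: "amino_acids", file: "dataset/matrices/blosum100.txt" },
  },
  matricesToInsert: ["blosum100"],
};

console.log(missingAlphabets(config, [])); // [ 'amino_acids' ]
console.log(missingAlphabets(config, ["amino_acids"])); // []
```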
To delete all the alphabets/matrices you've configured in the key-value pairs alphabetsToDelete/matricesToDelete, use:
npx hardhat run scripts/alphabets/delete.js
npx hardhat run scripts/matrices/delete.js
Deployed, populated and ready to explore!
Here are a few Hardhat tasks (written in hardhat.config.js) to test our contract with:
- getScore
  Get the alignment score of two characters based on the given substitution matrix.
  - input: --matrix string, --a char, --b char
  - output: int
  npx hardhat getScore --matrix "MATRIX_ID" --a "SINGLE_CHAR_A" --b "SINGLE_CHAR_B"
- getAlphabet
  Returns an alphabet-object based on the given ALPHABET_ID.
  - input: --id string
  - output: struct Alphabet (see libraries/Structs.sol)
  npx hardhat getAlphabet --id "ALPHABET_ID"
- getMatrix
  Returns a matrix-object based on the given MATRIX_ID.
  - input: --id string
  - output: struct Matrix (see libraries/Structs.sol)
  npx hardhat getMatrix --id "MATRIX_ID"
- getAlphabets
  Returns the list of inserted ALPHABET_IDs.
  - input: null
  - output: string[]
  npx hardhat getAlphabets
- getMatrices
  Returns the list of inserted MATRIX_IDs.
  - input: null
  - output: string[]
  npx hardhat getMatrices
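To illustrate what a score lookup such as getScore amounts to conceptually, here is an off-chain sketch. It assumes that each character maps to its position in the alphabet and that the score sits at that row and column of the grid. The alphabet and scores are made up for illustration, and the contract's internal storage may be organized differently.

```javascript
// Illustrative data only: a made-up 4-letter alphabet and score grid.
const alphabet = ["A", "C", "G", "T"];
const scores = [
  [ 1, -1, -1, -1],
  [-1,  1, -1, -1],
  [-1, -1,  1, -1],
  [-1, -1, -1,  1],
];

// Off-chain stand-in for the lookup: row = index of `a`, column = index of `b`.
function getScore(a, b) {
  const i = alphabet.indexOf(a);
  const j = alphabet.indexOf(b);
  if (i === -1 || j === -1) {
    throw new Error(`Character not in alphabet: ${i === -1 ? a : b}`);
  }
  return scores[i][j];
}

console.log(getScore("A", "A")); // 1
console.log(getScore("A", "G")); // -1
```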
Alphabets and Matrices are the two main components of the SubstitutionMatrices contract. Alphabets include but are not limited to nucleotide and protein sequence characters (e.g. C, T, A and G), while matrices are 2-dimensional scoring grids (e.g. BLOSUM62, PAM40, PAM120, etc.). To get a better (visual) understanding, you should check out the alphabets and matrices included in the dataset folder.
These components are simple .txt files that abide by the following formatting rules:
- An alphabet is a single line of characters, where the position of a character represents its numeric value.
- A matrix is a 2-dimensional grid, where the first row and first column consist only of alphabetical characters.
- The remaining positions of a matrix are integers (zero, negative or positive).
- The order of the alphabetical characters inside a matrix should be the same as in the alphabet it belongs to (horizontally and vertically).
- Every alphanumerical character, for both alphabet and matrix, is delimited by whitespace.
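The rules above can be turned into a small validating parser. The sketch below is written only against these rules and is not the repository's own text parser: `parseAlphabet` and `parseMatrix` are hypothetical names, the sample data is made up, and the header row is assumed to have no extra corner cell.

```javascript
// Hypothetical parsers written against the rules above; the repository's
// own text parsers may behave differently.

// An alphabet is one line of whitespace-delimited characters.
function parseAlphabet(text) {
  return text.trim().split(/\s+/);
}

// A matrix has a header row of characters, then one row per character:
// the character itself followed by its integer scores.
// Assumption: the header row has no extra corner cell.
function parseMatrix(text, alphabet) {
  const rows = text
    .trim()
    .split(/\r?\n/)
    .map((line) => line.trim().split(/\s+/));
  const [header, ...body] = rows;
  const firstColumn = body.map((row) => row[0]);

  // Header row and first column must both follow the alphabet's order.
  for (const labels of [header, firstColumn]) {
    const sameOrder =
      labels.length === alphabet.length &&
      labels.every((char, i) => char === alphabet[i]);
    if (!sameOrder) throw new Error("Matrix labels do not match the alphabet");
  }

  // Every remaining cell must be an integer (zero, negative or positive).
  return body.map((row) =>
    row.slice(1).map((cell) => {
      if (!/^-?\d+$/.test(cell)) throw new Error(`Not an integer: ${cell}`);
      return Number(cell);
    })
  );
}

// Made-up sample data, not a file from the dataset folder.
const alphabet = parseAlphabet("A C G T");
const matrix = parseMatrix(
  ["  A  C  G  T", "A 5 -4 -4 -4", "C -4 5 -4 -4", "G -4 -4 5 -4", "T -4 -4 -4 5"].join("\n"),
  alphabet
);
console.log(matrix[alphabet.indexOf("A")][alphabet.indexOf("G")]); // -4
```

Rejecting a matrix whose labels are out of order is a deliberate choice here: a silently reordered grid would still parse but would return the wrong score for every pair.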
The zenode.config.js file is where most of the personalization for contract deployment and filling takes place.
In the case of the substitution-matrices repository this includes:
- Choosing which alphabets/matrices get inserted or deleted in the Population phase.
- Configuring which contract we'll interact with in the Interaction phase.
- Expanding (or shrinking for that matter) the list of known alphabets and matrices.
- Hardhat's infrastructure! (https://hardhat.org/)
— ZEN
Copyright (c) 2022 ZENODE