A basic MATLAB package with VMD-like selection syntax for analyzing protein structures.
- Import path
- Read a local PDB file
- Get coordinate data
- Get other attributes of the PDB structure
- Get center of geometry
- Get center of mass
- Atom selection (`as`)
Add the bioStructureM folders to the MATLAB path:
```matlab
addpath('your_bioStructureM_root/core');
addpath('your_bioStructureM_root/atomselector');
```
Read a local PDB file (e.g. 1BFG) with `readPDB`. The returned `pdbStruct` is a MATLAB structure array with one element per atom. Each element has the following fields:
| Field Name | Data Type | Description |
|---|---|---|
| alternate | char | Alternate location indicator |
| atomno | double | Atom serial number |
| atmname | char | Atom name (at most 4 characters) |
| bval | double | Temperature factor (B-factor) |
| charge | char | Charge on the atom |
| coord | 3 x 1 double | Coordinates in Angstroms |
| elementSymbol | char | Element symbol |
| iCode | char | Residue insertion code |
| occupancy | double | Occupancy |
| record | char | Record name: ATOM or HETATM |
| resname | char | Residue name |
| resno | char | Residue sequence number |
| segment | char | Segment identifier |
| subunit | char | Chain identifier |
```matlab
pdbStruct = readPDB('1BFG.pdb');
```
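The sketch below shows how these fields map onto the file. It parses a single ATOM record into a struct with the same field names, using the fixed column positions of the PDB format. It is an illustration only and is not `readPDB`'s actual code. The record is made up, and multi-line files, HETATM quirks and malformed lines are not handled.

```matlab
% Illustrative only: parse one fixed-column PDB ATOM record into a struct
% with the field names from the table above (not readPDB's implementation).
rec = 'ATOM      2  CA  THR A  73      12.345  -6.789  10.111  1.00 40.50      PROA C  ';

atom.record        = strtrim(rec(1:6));      % columns 1-6
atom.atomno        = str2double(rec(7:11));  % serial number
atom.atmname       = strtrim(rec(13:16));    % atom name
atom.alternate     = strtrim(rec(17));       % alternate location
atom.resname       = strtrim(rec(18:20));    % residue name
atom.subunit       = strtrim(rec(22));       % chain ID
atom.resno         = strtrim(rec(23:26));    % kept as char, as in the table
atom.iCode         = strtrim(rec(27));       % insertion code
atom.coord         = str2double({rec(31:38); rec(39:46); rec(47:54)});  % 3 x 1
atom.occupancy     = str2double(rec(55:60));
atom.bval          = str2double(rec(61:66)); % temperature factor
atom.segment       = strtrim(rec(73:76));
atom.elementSymbol = strtrim(rec(77:78));
atom.charge        = strtrim(rec(79:80));
```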
`getCoord` returns an n-by-3 array, where n is the number of atoms in `pdbStruct`:
```matlab
crd = getCoord(pdbStruct);
```
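Each `coord` field is a 3 x 1 column, so you can build an n-by-3 array like this one by concatenating the fields and transposing the result. This is a minimal sketch on a hypothetical two-atom structure array. It is not `getCoord`'s actual implementation.

```matlab
% Hypothetical two-atom structure array with 3 x 1 coord fields.
atoms = struct('atmname', {'N', 'CA'}, ...
               'coord',   {[1.0; 2.0; 3.0], [4.0; 5.0; 6.0]});

% [atoms.coord] places the 3 x 1 columns side by side (3 x n);
% transposing gives one row per atom (n x 3).
crd = [atoms.coord]';
```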
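Numeric (double) fields can be collected into an array with square brackets:
```matlab
bfactor = [pdbStruct.bval]; % bfactor is a double array.
```
Character fields can be collected into a cell array with curly braces:
```matlab
atomName = {pdbStruct.atmname}; % atomName is a cell array.
```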
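Character fields go into a cell array because square brackets would join every atom name into a single character row. Once joined, names of different lengths can no longer be told apart. A small sketch with hypothetical data:

```matlab
% Hypothetical three-atom structure array.
atoms = struct('atmname', {'N', 'CA', 'CB'}, 'bval', {12.5, 15.0, 20.25});

bfactor  = [atoms.bval];     % 1 x 3 double
joined   = [atoms.atmname];  % one char row 'NCACB': atom boundaries are lost
atomName = {atoms.atmname};  % 1 x 3 cell: {'N','CA','CB'}

% The cell array works directly with string functions, e.g. finding CA atoms:
isCA = strcmp(atomName, 'CA');
```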
Compute the center of geometry:
```matlab
gcenter = getGeometrycenter(pdbStruct);
```
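The center of geometry is the unweighted mean of the atomic coordinates. Here is a minimal sketch on an n-by-3 array such as the output of `getCoord`. The coordinates are made up. This sketch returns a 1 x 3 row, and the package's own output shape may differ.

```matlab
crd = [0 0 0; 2 0 0; 0 4 0; 2 4 6];   % hypothetical n x 3 coordinates
gcenter = mean(crd, 1);               % average each column -> 1 x 3 [x y z]
```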
Before calling `getCenterOfMass`, assign a mass to each atom with `assignMass`:
```matlab
pdbStruct = assignMass(pdbStruct);
mcenter = getCenterOfMass(pdbStruct);
```
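The center of mass weights each atom's coordinates by its mass: sum(m_i * r_i) / sum(m_i). That is why masses must be assigned first. The sketch below uses approximate standard atomic weights for N, C and O, chosen for illustration only. `assignMass` may use a different table.

```matlab
crd  = [0 0 0; 1.5 0 0; 0 1.2 0];   % hypothetical coordinates of N, C, O
mass = [14.007; 12.011; 15.999];     % n x 1 masses

% (1 x n) * (n x 3) sums the mass-weighted coordinates over atoms,
% then dividing by the total mass gives a 1 x 3 center of mass.
mcenter = (mass' * crd) / sum(mass);
```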
Use VMD-like syntax to select specific atoms.
Select by atom name:
```matlab
CaStruct = as('name CA',pdbStruct);
```
Select by residue ID:
```matlab
T73 = as('resi 73',pdbStruct);
```
Select protein or water atoms:
```matlab
protein = as('protein',pdbStruct);
water = as('water',pdbStruct);
```
`as` returns a structure array with the same fields as the input structure, containing only the selected atoms.
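Conceptually, a selection such as `resi 73` is a logical mask over the atoms. Indexing the structure array with that mask yields a smaller structure array with the same fields. The sketch below uses hypothetical data and is not the package's selection engine. The field table lists `resno` as char, so the sketch converts it to numbers before the comparison.

```matlab
% Hypothetical four-atom structure array (resno stored as char).
atoms = struct('atmname', {'N', 'CA', 'N', 'CA'}, ...
               'resno',   {'72', '72', '73', '73'});

resids = str2double({atoms.resno});   % char residue numbers -> double
mask   = resids == 73;                % logical mask, one entry per atom
T73    = atoms(mask);                 % 1 x 2 struct array, same fields

% Combining two masks with & behaves like the "and" operator.
caMask = strcmp({atoms.atmname}, 'CA');
CaT73  = atoms(mask & caMask);
```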
- `name`, `atomname` {selected-atom-names}
  - Separate multiple atom names with spaces: `as('name CA C O N',pdbStruct)` or `as('atomname CA C O N',pdbStruct)`.
  - `name` and `atomname` support simple regular expressions. For example, `as('name H*',pdbStruct)` selects atoms whose names begin with H (H1, H2, HD1, ...).
  - Because of this pattern support, an atom literally named `H*` must be selected with `H\*`.
- `resi`, `resid`, `residue` {selected-resids}
  - Select by residue IDs: `as('resi 73 80',pdbStruct)` or `as('resid 73 80',pdbStruct)`.
  - Select a sequence of residue IDs with `start:end` or `start:step:end`: `as('resi 19:90',pdbStruct)`, `as('resi 19:2:90',pdbStruct)`.
  - Ranges and single IDs can be combined: `as('resi 19:31 40:60 144',pdbStruct)`.
- `record` {ATOM|HETATM}
  - Select by record name: `as('record HETATM',pdbStruct)`.
- `insertion` {single-character}
  - Select by the insertion code (`iCode`) of residues: `as('insertion A',pdbStruct)`, `as('insertion A \s',pdbStruct)`.
  - `\s` selects atoms whose insertion code is empty.
- `bval`, `beta` {<|<=|>|>=|=}{value}
  - Select by temperature factor: `as('bval >40',pdbStruct)`, `as('bval =40',pdbStruct)`.
  - Note: a space is required between `bval` and the condition; `bval>40` is invalid.
- `resn`, `restype` {residue-names}
  - Select by residue name: `as('resn ALA',pdbStruct)`, `as('resn ALA TYR',pdbStruct)`.
- `seq`, `sequence` {protein-sequence}
  - Select by protein sequence (one-letter codes): `as('seq GGFFLRIHPDGRVD',pdbStruct)`, `as('sequence GGFFLRIHPDGRVD',pdbStruct)`.
- `chain`, `c.` {one-character-chain-ID}
  - Select by chain ID: `as('c. A',pdbStruct)`, `as('c. A B',pdbStruct)`, `as('c. \s',pdbStruct)`.
  - `\s` selects atoms whose chain ID is empty.
- `segment`, `segid` {segids}
  - Select by segment ID: `as('segid PROA',pdbStruct)`, `as('segment PROA WAT',pdbStruct)`.
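Two of the rules above can be sketched in plain MATLAB. The first turns a `name` pattern into an anchored regular expression, so `H*` matches H1, H2 and HD1 while `H\*` matches only a literal `H*`. The second expands a `resi` list such as `19:31 40:60 144` into residue numbers. The helper names are hypothetical, and this is one possible way such parsing could work, not the package's parser.

```matlab
% --- Atom-name patterns (illustrative) ---
names = {'N', 'CA', 'H1', 'H2', 'HD1', 'H*'};
% An unescaped '*' becomes '.*'; an escaped '\*' is left alone, so the
% regex engine treats it as a literal '*'.
toRegex   = @(tok) ['^' regexprep(tok, '(?<!\\)\*', '.*') '$'];
matchName = @(tok) cellfun(@(n) ~isempty(regexp(n, toRegex(tok), 'once')), names);

wild    = names(matchName('H*'));    % atoms whose names begin with H
literal = names(matchName('H\*'));   % only the atom named exactly 'H*'

% --- Residue ranges (illustrative) ---
spec   = '19:31 40:60 144';
tokens = strsplit(spec, ' ');
ids    = [];
for k = 1:numel(tokens)
    parts = str2double(strsplit(tokens{k}, ':'));
    switch numel(parts)
        case 1   % single id
            ids = [ids, parts];
        case 2   % start:end
            ids = [ids, parts(1):parts(2)];
        case 3   % start:step:end
            ids = [ids, parts(1):parts(2):parts(3)];
    end
end
```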
Predefined keywords select common groups of atoms:
```matlab
as('protein',pdbStruct)
as('all',pdbStruct)
```
Keywords:
- `all`
- `protein`
- `backbone`
- `water`, `wat`
- `nucleic`
- `het.` (HETATM records)
Selections can be combined with logical operators and parentheses:
```matlab
as('protein or water',pdbStruct)
as('(protein and c. A) or water',pdbStruct)
```
- `and`, `&`
  - Select the intersection of two selections: `as('resi 73 and name CA CB',pdbStruct)`, `as('resi 73 and name CA CB and bval >10',pdbStruct)`.
- `or`, `|`
  - Select the union of two selections: `as('protein or water',pdbStruct)`.
- `not`
  - Select all atoms not in the selection: `as('not water',pdbStruct)`.
- `within` {distance} `of`
  - `as('water within 4 of protein',pdbStruct)`
  - In general, `sel1 within d of sel2` selects the atoms of sel1 that are within d Angstroms of any atom in sel2. Here it selects the water atoms within 4 Angstroms of any protein atom.
- `()`
  - Parentheses change the precedence of selection operators. Without them, `as('protein and name CA or water and name O',pdbStruct)` selects only the O atoms of water.
  - With parentheses, `as('(protein and name CA) or (water and name O)',pdbStruct)` selects the CA atoms of the protein and the O atoms of water.
- `byres`
  - Extend the selection to complete residues: `as('byres (protein within 4 of resi 73)',pdbStruct)`.
- `bychain`
  - Extend the selection to all atoms in the same chain: `as('bychain resi 73',pdbStruct)`.
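The `within` and `byres` operators reduce to a distance test and a residue-membership test. The sketch below uses made-up coordinates and hypothetical variable names. It keeps the atoms of sel1 that lie within d Angstroms of any atom in sel2, then extends a per-atom mask to whole residues. Treating the cutoff as inclusive (`<=`) is an assumption. Identifying residues by number alone is also a simplification, since a real structure also needs the chain ID and insertion code.

```matlab
% Hypothetical data: three water atoms (sel1) and two protein atoms (sel2).
crd1 = [0 0 0; 3 0 0; 10 0 0];      % sel1 coordinates, n1 x 3
crd2 = [0 3.5 0; 20 0 0];           % sel2 coordinates, n2 x 3
d    = 4;                           % cutoff in Angstroms

% Squared pairwise distances, n1 x n2 (bsxfun also works in older releases).
D2 = zeros(size(crd1, 1), size(crd2, 1));
for k = 1:3
    D2 = D2 + bsxfun(@minus, crd1(:, k), crd2(:, k)').^2;
end
withinMask = any(D2 <= d^2, 2);     % sel1 atoms near any sel2 atom

% byres: extend a per-atom mask to every atom of any residue it touches.
resno     = [73 73 73 74 74];       % residue number of each atom
hit       = logical([0 1 0 0 0]);   % e.g. one atom of residue 73 selected
byresMask = ismember(resno, resno(hit));
```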
This section shows how to set field values for selected atoms with `asSetAttribute`.
Change the segment field of protein atoms to "PROA":
```matlab
asSetAttribute('protein',pdbStruct,'segment','PROA')
```
Set the B-factor of protein CA atoms to 100 and of all other atoms to zero:
```matlab
asSetAttribute('all',pdbStruct,'bval',0)
newCABval = ones(1,144)*100; % one value per selected CA atom
asSetAttribute('protein and name CA',pdbStruct,'bval',newCABval)
```
Note: the assigned value must be either a single value or an array with one value per selected atom.
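The single-value-or-one-per-atom rule matches how values can be written into a MATLAB structure array. `deal` copies one value into every element, and a cell array of per-atom values can be expanded with `{:}`. Below is a sketch of that logic on hypothetical data, not `asSetAttribute`'s actual code.

```matlab
% Hypothetical four-atom structure array.
atoms = struct('atmname', {'N', 'CA', 'C', 'CA'}, 'bval', {5, 6, 7, 8});

% One value for every atom: deal copies it into each element.
[atoms.bval] = deal(0);

% Values for the selected atoms: a scalar, or one value per selected atom.
sel    = find(strcmp({atoms.atmname}, 'CA'));   % indices of CA atoms
newVal = ones(1, numel(sel)) * 100;
if isscalar(newVal)
    [atoms(sel).bval] = deal(newVal);
elseif numel(newVal) == numel(sel)
    vals = num2cell(newVal);
    [atoms(sel).bval] = vals{:};
else
    error('Expected a single value or one value per selected atom.');
end
```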
Save `pdbStruct` as a .pdb file:
```matlab
createPDB(pdbStruct,'output_path.pdb')
```
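A PDB ATOM record is fixed-width, so writing one comes down to a format string with the right column widths. The sketch below formats one hypothetical atom with `sprintf`. It is not `createPDB`'s code. It uses the common simplification of starting atom names shorter than four characters in column 14.

```matlab
% Hypothetical atom using the field names from the table above. Blank
% one-character fields are given as ' ' rather than '' so that every
% format specifier receives an argument.
atom = struct('record', 'ATOM', 'atomno', 2, 'atmname', 'CA', 'alternate', ' ', ...
              'resname', 'THR', 'subunit', 'A', 'resno', '73', 'iCode', ' ', ...
              'coord', [12.345; -6.789; 10.111], 'occupancy', 1.0, 'bval', 40.5, ...
              'segment', 'PROA', 'elementSymbol', 'C', 'charge', ' ');

nm = atom.atmname;
if numel(nm) < 4
    nm = [' ' nm];   % e.g. ' CA' so the name starts in column 14
end

% Columns: record 1-6, serial 7-11, name 13-16, altLoc 17, resName 18-20,
% chain 22, resSeq 23-26, iCode 27, x/y/z 31-54, occupancy 55-60,
% B-factor 61-66, segment 73-76, element 77-78, charge 79-80.
fmt = '%-6s%5d %-4s%1s%3s %1s%4s%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %-4s%2s%2s';
rec = sprintf(fmt, atom.record, atom.atomno, nm, atom.alternate, atom.resname, ...
              atom.subunit, atom.resno, atom.iCode, atom.coord, atom.occupancy, ...
              atom.bval, atom.segment, atom.elementSymbol, atom.charge);
```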