[WIP] Implementing gemmi-based mmcif reader (with easy extension to PDB/PDBx and mmJSON) #4712

Open: wants to merge 33 commits into base: develop

Commits (33)
aa2a88f
Start working on MMCIF parser
marinegor May 22, 2024
218cf43
Add first (not working) version of MMCIFReader and MMCIF topology parser
marinegor May 22, 2024
7f78e02
Do some squashing
marinegor May 22, 2024
6682d6e
Remove inherited docs
marinegor May 22, 2024
817f3a0
Try improving the parsing
marinegor May 22, 2024
3cc8c80
Try three independent loops over the model
marinegor May 30, 2024
f1bf325
Merge remote-tracking branch 'upstream/develop' into feature/mmcif
marinegor Jul 25, 2024
d21c220
Add gemmi dependency
marinegor Sep 13, 2024
2a1be15
necessary params
marinegor Sep 20, 2024
77645e6
finished sorting atom attrs
marinegor Sep 20, 2024
91e6942
add function for transformation into *idx
marinegor Sep 20, 2024
9a0c086
oh damn seems to finally be working
marinegor Sep 20, 2024
9c731df
remove TODOs
marinegor Sep 20, 2024
8b40ec7
Remove debug prints
marinegor Sep 20, 2024
bdcbd73
Merge branch 'develop' into feature/mmcif
marinegor Sep 22, 2024
401a4d3
try to pack things into separate class in utils?
marinegor Sep 22, 2024
9c336bd
remove unnecessary functions
marinegor Sep 22, 2024
def88e4
copy all loops into separate functions
marinegor Sep 23, 2024
cabfd37
Move loops over structures into functions
marinegor Sep 23, 2024
4c9d930
Move coordinate fetching into function for the coordinate reader as well
marinegor Sep 23, 2024
184491a
Fix imports
marinegor Sep 23, 2024
3de8565
Start adding documentation
marinegor Sep 30, 2024
ca6ebbb
Reference MMCIFParser in PDBParser
marinegor Oct 1, 2024
45077ad
Add documentation for trajectory and topology parsers
marinegor Oct 1, 2024
9a1a59a
Add mmcif tests
marinegor Oct 2, 2024
27c10d6
Update format specifications
marinegor Oct 2, 2024
950cfcf
Write simple tests
marinegor Oct 2, 2024
8d1a8b5
Merge remote-tracking branch 'upstream/develop' into feature/mmcif
marinegor Oct 24, 2024
ef29338
update github action with gemmi
marinegor Oct 24, 2024
caca17e
fix gemmi import errors
marinegor Oct 24, 2024
f0e49cc
add mmcif testfiles
marinegor Oct 24, 2024
b7ada7c
add mmcif to __all__
marinegor Oct 24, 2024
e80632c
add black instead of ruff
marinegor Oct 25, 2024
3 changes: 3 additions & 0 deletions .github/actions/setup-deps/action.yaml
@@ -23,6 +23,8 @@ inputs:
    default: 'cython'
  fasteners:
    default: 'fasteners'
  gemmi:
    default: 'gemmi'
  griddataformats:
    default: 'griddataformats'
  gsd:
@@ -130,6 +132,7 @@ runs:
${{ inputs.dask }}
${{ inputs.distopia }}
${{ inputs.gsd }}
${{ inputs.gemmi }}
${{ inputs.h5py }}
${{ inputs.hole2 }}
${{ inputs.joblib }}
85 changes: 85 additions & 0 deletions package/MDAnalysis/coordinates/MMCIF.py
@@ -0,0 +1,85 @@
import logging
import warnings

import numpy as np

from . import base

try:
    import gemmi

    HAS_GEMMI = True
except ImportError:
    HAS_GEMMI = False

logger = logging.getLogger("MDAnalysis.coordinates.MMCIF")
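
Note on the optional import: the `HAS_GEMMI` flag is set here but not checked in the reader shown in this diff, so a missing `gemmi` would surface as a `NameError` when a file is read. A minimal sketch of one way to turn a missing optional dependency into a clearer error follows. `require_optional` is a hypothetical helper used for illustration only; it is not part of MDAnalysis or of this PR.

```python
import importlib


def require_optional(module_name: str, feature: str):
    """Import an optional dependency or raise an explanatory ImportError.

    Hypothetical helper for illustration; not an MDAnalysis function.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that names the feature and the package.
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; "
            "install it to enable this feature"
        ) from err
```

A reader could call something like `require_optional("gemmi", "MMCIF reading")` at the start of `_read_first_frame`; whether that fits the project's conventions is for the maintainers to decide.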


def get_coordinates(model: "gemmi.Model") -> np.ndarray:
    """Get coordinates of all atoms in the `gemmi.Model` object.

    Parameters
    ----------
    model
        input `gemmi.Model`, e.g. `gemmi.read_structure('file.cif')[0]`

    Returns
    -------
    np.ndarray, shape [n, 3], where `n` is the number of atoms in the structure.
    """
    return np.array(
        [[*at.pos.tolist()] for chain in model for res in chain for at in res]
    )
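
Note on the traversal: `get_coordinates` relies on the model iterating over chains, chains over residues, and residues over atoms. The sketch below reproduces only that flattening order with stand-in classes, so it runs without `gemmi`. `ToyAtom`, `ToyResidue`, `ToyChain`, and `flatten_coordinates` are hypothetical names and do not mirror the real `gemmi` classes beyond being iterable in the same nested way.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ToyAtom:
    """Stand-in for an atom; holds x, y, z as a plain list."""
    xyz: List[float]


@dataclass
class ToyResidue:
    atoms: List[ToyAtom] = field(default_factory=list)

    def __iter__(self) -> Iterator[ToyAtom]:
        return iter(self.atoms)


@dataclass
class ToyChain:
    residues: List[ToyResidue] = field(default_factory=list)

    def __iter__(self) -> Iterator[ToyResidue]:
        return iter(self.residues)


def flatten_coordinates(model: List[ToyChain]) -> List[List[float]]:
    """Collect one [x, y, z] row per atom, in chain/residue/atom order."""
    return [list(at.xyz) for chain in model for res in chain for at in res]


toy_model = [
    ToyChain([ToyResidue([ToyAtom([0.0, 0.0, 0.0]), ToyAtom([1.5, 0.0, 0.0])])]),
    ToyChain([ToyResidue([ToyAtom([0.0, 2.0, 0.0])]), ToyResidue([])]),
]
rows = flatten_coordinates(toy_model)
```

An empty model yields an empty list here. With `np.array` that would give an array of shape `(0,)` rather than `(0, 3)`, so an explicit `reshape(-1, 3)` might be worth considering in the real function.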


class MMCIFReader(base.SingleFrameReaderBase):
    """Reads from an MMCIF file using the ``gemmi`` library as a backend.

    Notes
    -----

    If the structure represents an ensemble, only the first model in the ensemble
    is read here (and a warning is issued). Also, if the structure has a placeholder
    "CRYST1" record (1, 1, 1, 90, 90, 90), the unit cell dimensions are set to
    ``None`` instead.

    .. versionadded:: 2.8.0
    """

    format = ["cif", "cif.gz", "mmcif"]
    units = {"time": None, "length": "Angstrom"}

    def _read_first_frame(self):
        structure = gemmi.read_structure(self.filename)
        cell_dims = np.array(
            [
                getattr(structure.cell, name)
                for name in ("a", "b", "c", "alpha", "beta", "gamma")
            ]
        )
        if len(structure) > 1:
            # TODO: if the models represent timesteps, they can be parsed
            # with :func:`get_coordinates`.
            warnings.warn(
                f"File {self.filename} has {len(structure)} models, "
                "but only the first one will be read"
            )

        model = structure[0]
        coords = get_coordinates(model)
        self.n_atoms = len(coords)
        self.ts = self._Timestep.from_coordinates(coords, **self._ts_kwargs)
        if np.allclose(cell_dims, np.array([1.0, 1.0, 1.0, 90.0, 90.0, 90.0])):
            warnings.warn(
                "1 A^3 CRYST1 record,"
                " this is usually a placeholder."
                " Unit cell dimensions will be set to None."
            )
            self.ts.dimensions = None
        else:
            self.ts.dimensions = cell_dims
        self.ts.frame = 0

    def close(self):
        pass
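
Note on the placeholder cell: the reader treats a unit cell of (1, 1, 1, 90, 90, 90) as "no cell" and sets the dimensions to `None`. The same decision can be sketched with the standard library alone, which makes it easy to test in isolation. `PLACEHOLDER_CELL` and `normalize_cell` are hypothetical names, and the tolerances are chosen to roughly match the defaults of `np.allclose`; this is an illustration under those assumptions, not the PR's code.

```python
import math
import warnings
from typing import Optional, Sequence, Tuple

# Cell written by tools that have no real crystallographic information.
PLACEHOLDER_CELL = (1.0, 1.0, 1.0, 90.0, 90.0, 90.0)


def normalize_cell(cell: Sequence[float]) -> Optional[Tuple[float, ...]]:
    """Return the cell as a tuple of floats, or None for the placeholder cell.

    Hypothetical helper; expects (a, b, c, alpha, beta, gamma).
    """
    if len(cell) != 6:
        raise ValueError("expected six cell parameters")
    is_placeholder = all(
        math.isclose(value, ref, rel_tol=1e-5, abs_tol=1e-8)
        for value, ref in zip(cell, PLACEHOLDER_CELL)
    )
    if is_placeholder:
        warnings.warn(
            "Placeholder unit cell found; dimensions will be set to None."
        )
        return None
    return tuple(float(v) for v in cell)
```

Keeping the check in a small function like this would also make the currently uncovered warning branches straightforward to exercise in a unit test.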
1 change: 1 addition & 0 deletions package/MDAnalysis/coordinates/__init__.py
@@ -791,3 +791,4 @@ class can choose an appropriate reader automatically.
from . import NAMDBIN
from . import FHIAIMS
from . import TNG
from . import MMCIF
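
Note on why the import matters: importing the module is what makes the new reader discoverable by file extension, since the reader class declares its formats in a `format` attribute. The registration machinery itself is not part of this diff, so the sketch below is only a toy model of the idea. It uses `__init_subclass__` in place of whatever mechanism MDAnalysis actually uses, and `READERS`, `ReaderBase`, `ToyCIFReader`, and `guess_reader` are all hypothetical names.

```python
from typing import Dict, List, Type

# Hypothetical registry mapping upper-cased format names to reader classes.
READERS: Dict[str, Type["ReaderBase"]] = {}


class ReaderBase:
    format: List[str] = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register only formats declared directly on the subclass.
        for name in cls.__dict__.get("format", []):
            READERS[name.upper()] = cls


class ToyCIFReader(ReaderBase):
    format = ["cif", "cif.gz", "mmcif"]


def guess_reader(filename: str) -> Type[ReaderBase]:
    """Pick a reader class from the file extension."""
    lowered = filename.lower()
    # Try longer format names first so "cif.gz" is matched as a whole.
    for fmt in sorted(READERS, key=len, reverse=True):
        if lowered.endswith("." + fmt.lower()):
            return READERS[fmt]
    raise ValueError(f"no reader registered for {filename!r}")
```

In this toy version, defining the subclass is enough to register it, which is why a module that is never imported would never contribute its reader.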