PDBFixer cannot handle nucleic acids with residue name N #281

Open
jamesmkrieger opened this issue Nov 14, 2023 · 5 comments

Comments

@jamesmkrieger

   fixer.addMissingAtoms()
  File "/home/jkrieger/software/miniconda/envs/prody-github/lib/python3.9/site-packages/pdbfixer/pdbfixer.py", line 902, in addMissingAtoms
    (newTopology, newPositions, newAtoms, existingAtomMap) = self._addAtomsToTopology(True, True)
  File "/home/jkrieger/software/miniconda/envs/prody-github/lib/python3.9/site-packages/pdbfixer/pdbfixer.py", line 400, in _addAtomsToTopology
    self._addMissingResiduesToChain(newChain, insertHere, startPosition, endPosition, loopDirection, residue, newAtoms, newPositions, firstIndex)
  File "/home/jkrieger/software/miniconda/envs/prody-github/lib/python3.9/site-packages/pdbfixer/pdbfixer.py", line 511, in _addMissingResiduesToChain
    template = self.templates[residueName]
KeyError: 'N'

This was triggered by 7s7b.cif, downloaded from the PDB.

@peastman
Member

That's one I haven't seen before. What is N supposed to mean? Is this file trying to use the nucleotide sequence search codes, where N means "accept any nucleotide at this position"?

@jamesmkrieger
Author

Perhaps it means they don't know which nucleotide is there because the resolution isn't high enough.

@peastman
Member

Maybe, but the sequence in a PDB file is supposed to be a real sequence, not IUPAC codes. Oh well, I guess someone has figured out yet another way to make a messed up PDB file!

What should we do in this situation? You've told it to add missing residues based on the sequence, but the sequence doesn't tell us what to add at that position.

@jamesmkrieger
Author

Yeah, it is another strange thing to have.

Perhaps raise a warning and skip that residue rather than completely stopping?
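
As a rough illustration of the warn-and-skip idea, here is a minimal user-side sketch rather than anything PDBFixer does today. It assumes the gaps are held in a dict shaped like `fixer.missingResidues` (a `(chain index, residue index)` key mapped to a list of residue names) and that the buildable names are the keys of `fixer.templates`. The function name `drop_unbuildable_gaps` and the sample data are hypothetical. It drops the whole gap, not just the single unknown residue, since a partial fill would be hard to justify.

```python
import warnings


def drop_unbuildable_gaps(missing_residues, known_names):
    """Return a copy of missing_residues without gaps that name unknown residues.

    missing_residues: dict mapping (chain index, residue index) -> list of names
    known_names: collection of residue names that have a template
    """
    kept = {}
    for key, names in missing_residues.items():
        unknown = sorted({name for name in names if name not in known_names})
        if unknown:
            # Warn and skip the entire gap instead of failing with a KeyError.
            warnings.warn(f"Skipping gap at {key}: no template for {unknown}")
            continue
        kept[key] = list(names)
    return kept


# Hypothetical data: one nucleic acid gap containing 'N', one ordinary protein gap.
gaps = {(0, 0): ["DA", "N", "DT"], (1, 5): ["ALA", "GLY"]}
known = {"DA", "DT", "DG", "DC", "ALA", "GLY"}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    filtered = drop_unbuildable_gaps(gaps, known)
```

With a real fixer, the same filter could presumably be applied between `fixer.findMissingResidues()` and `fixer.addMissingAtoms()`, for example `fixer.missingResidues = drop_unbuildable_gaps(fixer.missingResidues, fixer.templates)`.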

@sukritsingh

Out of curiosity I dug through the source paper, and it's not even clear that only a single nucleotide is being skipped (it truly seems a bit of a sloppy entry).

I think skipping on "N" seems a bit dangerous because it's not even clear how many nucleotides are supposed to take its place. Perhaps the safest bet is to just throw a clearer error message indicating that the entry has unrecognized single-letter codes for nucleotides, and indicate the index where that happens?
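
A minimal sketch of what such a check could look like, assuming the sequence is available as a plain list of residue names and the known names are the template keys. `UnknownResidueError`, `check_sequence`, and the sample chain are hypothetical names for illustration, not part of PDBFixer.

```python
class UnknownResidueError(ValueError):
    """Raised when a sequence names a residue that has no template."""


def check_sequence(chain_id, residue_names, known_names):
    """Raise a descriptive error listing every unrecognized residue and its index."""
    bad = [(i, name) for i, name in enumerate(residue_names)
           if name not in known_names]
    if bad:
        details = ", ".join(f"'{name}' at index {i}" for i, name in bad)
        raise UnknownResidueError(
            f"Chain {chain_id}: unrecognized residue name(s) in sequence: {details}"
        )


# Hypothetical chain with an 'N' placeholder in the second position (index 1).
known = {"DA", "DT", "DG", "DC"}
try:
    check_sequence("B", ["DA", "N", "DT"], known)
    message = None
except UnknownResidueError as err:
    message = str(err)
```

Running a check like this before any residues are built would replace the bare `KeyError: 'N'` with a message that names the chain and position.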
