PDBs from RCSB failed to reconstruct #219

Open
slawek111 opened this issue Mar 30, 2021 · 7 comments

@slawek111

Hi,

I have been testing PDBFixer for some time and it works great in most cases. Unfortunately, I have also found several PDBs (6RPA, 6RPB and 1AO7) for which the missing residues are not reconstructed. PDBFixer does not find the missing parts of the backbone, so it does not rebuild them. I have tried several approaches. First, I checked whether the SEQRES records are well defined (I even created my own correct SEQRES records), and they were. Then I noticed that the residues in these PDBs are not numbered consecutively within the continuous (non-missing) segments, so I fixed that as well. Finally, I removed the letter suffixes from the residue numbers that mark the variant mutations made in the experiments (relative to the reference sequence).
Unfortunately, none of this worked, and I am currently going through the code to find the cause. But maybe you can help. Any suggestions would be very much appreciated!

All the best,
Sławek
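The first check described above (whether the SEQRES records are well defined) can be sketched roughly as follows. This is a hypothetical helper, not part of PDBFixer: it reads SEQRES lines using the fixed columns of the legacy PDB format (chain ID in column 12, declared residue count in columns 14-17, residue names from column 20 on) and compares the number of residues listed for each chain with the declared count. The example lines are made up for illustration.

```python
from collections import defaultdict


def parse_seqres(lines):
    """Collect SEQRES residue names and the declared residue count per chain.

    Uses the fixed PDB columns: chain ID at column 12, numRes at
    columns 14-17, residue names starting at column 20 (1-based).
    """
    residues = defaultdict(list)
    declared = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue  # ignore ATOM, HEADER, etc.
        chain = line[11]
        declared[chain] = int(line[13:17])
        residues[chain].extend(line[19:].split())
    return dict(residues), declared


def check_seqres(lines):
    """Map each chain to True if its listed residues match the declared count."""
    residues, declared = parse_seqres(lines)
    return {chain: len(residues[chain]) == declared[chain] for chain in declared}


if __name__ == "__main__":
    example = [
        "HEADER    HYPOTHETICAL EXAMPLE",
        "SEQRES   1 D    5  MET ALA GLN SER VAL",  # 5 declared, 5 listed
        "SEQRES   1 E    4  GLY ASN PRO",          # 4 declared, 3 listed
    ]
    print(check_seqres(example))
```

A chain that passes this check can still fail to align with the ATOM records, which is why a well-formed SEQRES alone did not solve the problem here.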

@peastman
Member

I downloaded 6RPA, and it looks to me like there are problems in the residue numbering. Take a look at chain D. The SEQRES and ATOM records match through residue 29 (GLY). Then the residue number jumps to 36, indicating six missing residues. But they don't appear in the SEQRES records. It carries right on at residue 36 as if nothing were there.

As a result, PDBFixer can't figure out any way to match up the sequence of chain D to the SEQRES records. If it can't align them, then it can't identify what residues are missing.
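The kind of inconsistency described above can be illustrated with a rough, count-based check. This is a hypothetical sketch, not PDBFixer's actual alignment logic: it assumes a chain is given as a list of `(residue_number, residue_name)` tuples from the ATOM records and the SEQRES sequence as a list of names. Every jump in the numbering implies some missing residues, and those must be available as SEQRES entries that have no coordinates. Terminal residues missing from the model do not create numbering gaps, so the implied count may be smaller than the number of unaccounted SEQRES entries, but it should never be larger.

```python
def numbering_gaps(residues):
    """Return (previous_number, next_number, implied_missing) for every jump
    in the residue numbering of one chain. `residues` is a list of
    (residue_number, residue_name) tuples in file order."""
    gaps = []
    for (prev_num, _), (next_num, _) in zip(residues, residues[1:]):
        if next_num - prev_num > 1:
            gaps.append((prev_num, next_num, next_num - prev_num - 1))
    return gaps


def numbering_consistent_with_seqres(residues, seqres):
    """True if SEQRES has enough residues without coordinates to fill every
    internal numbering gap (a count check only, not a sequence alignment)."""
    implied_missing = sum(missing for _, _, missing in numbering_gaps(residues))
    unaccounted = len(seqres) - len(residues)
    return implied_missing <= unaccounted


if __name__ == "__main__":
    # Made-up chain mirroring the situation in chain D: the numbering jumps
    # from 4 to 11, but SEQRES only has one residue (MET) without coordinates.
    seqres = ["MET", "ALA", "GLN", "SER", "GLY", "ASN", "PRO"]
    observed = [(1, "ALA"), (2, "GLN"), (3, "SER"), (4, "GLY"), (11, "ASN"), (12, "PRO")]
    print(numbering_gaps(observed))                            # [(4, 11, 6)]
    print(numbering_consistent_with_seqres(observed, seqres))  # False
```

When this check fails, no assignment of the SEQRES residues to the numbering gap exists, which matches the explanation that the sequences cannot be aligned.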

@slawek111
Author

slawek111 commented Apr 1, 2021 via email

@peastman
Member

peastman commented Apr 1, 2021

Can you post your modified PDB file where you've fixed those problems? I can investigate what else is happening.

@slawek111
Author

Hi, sorry for the late response; I had Covid and was unable to work on the issue. The 6RPA PDB with fixed numbering is attached.
6RPA_numbers_fixed.zip

@NatureGeorge

PDBFixer should use the _pdbx_poly_seq_scheme category in the mmCIF format rather than the SEQRES records in the legacy PDB format.

loop_
_pdbx_poly_seq_scheme.asym_id 
_pdbx_poly_seq_scheme.entity_id 
_pdbx_poly_seq_scheme.seq_id 
_pdbx_poly_seq_scheme.mon_id 
_pdbx_poly_seq_scheme.ndb_seq_num 
_pdbx_poly_seq_scheme.pdb_seq_num 
_pdbx_poly_seq_scheme.auth_seq_num 
_pdbx_poly_seq_scheme.pdb_mon_id 
_pdbx_poly_seq_scheme.auth_mon_id 
_pdbx_poly_seq_scheme.pdb_strand_id 
_pdbx_poly_seq_scheme.pdb_ins_code 
_pdbx_poly_seq_scheme.hetero 
...
D 4 1   MET 1   0   ?   ?   ?   D . n 
D 4 2   ALA 2   1   1   ALA ALA D . n 
D 4 3   GLN 3   2   2   GLN GLN D . n 
D 4 4   SER 4   3   3   SER SER D . n 
D 4 5   VAL 5   4   4   VAL VAL D . n 
D 4 6   ALA 6   5   5   ALA ALA D . n 
D 4 7   GLN 7   6   6   GLN GLN D . n 
D 4 8   PRO 8   7   7   PRO PRO D . n 
D 4 9   GLU 9   8   8   GLU GLU D . n 
D 4 10  ASP 10  9   9   ASP ASP D . n 
D 4 11  GLN 11  10  10  GLN GLN D . n 
D 4 12  VAL 12  11  11  VAL VAL D . n 
D 4 13  ASN 13  12  12  ASN ASN D . n 
D 4 14  VAL 14  13  13  VAL VAL D . n 
D 4 15  ALA 15  14  14  ALA ALA D . n 
D 4 16  GLU 16  15  15  GLU GLU D . n 
D 4 17  GLY 17  16  16  GLY GLY D . n 
D 4 18  ASN 18  17  17  ASN ASN D . n 
D 4 19  PRO 19  18  18  PRO PRO D . n 
D 4 20  LEU 20  19  19  LEU LEU D . n 
D 4 21  THR 21  20  20  THR THR D . n 
D 4 22  VAL 22  21  21  VAL VAL D . n 
D 4 23  LYS 23  22  22  LYS LYS D . n 
D 4 24  CYS 24  23  23  CYS CYS D . n 
D 4 25  THR 25  24  24  THR THR D . n 
D 4 26  TYR 26  25  25  TYR TYR D . n 
D 4 27  SER 27  26  26  SER SER D . n 
D 4 28  VAL 28  27  27  VAL VAL D . n 
D 4 29  SER 29  28  28  SER SER D . n 
D 4 30  GLY 30  29  29  GLY GLY D . n 
D 4 31  ASN 31  36  36  ASN ASN D . n       <---------------------------
D 4 32  PRO 32  37  37  PRO PRO D . n 
D 4 33  TYR 33  38  38  TYR TYR D . n 
D 4 34  LEU 34  39  39  LEU LEU D . n 
D 4 35  PHE 35  40  40  PHE PHE D . n 
D 4 36  TRP 36  41  41  TRP TRP D . n 
D 4 37  TYR 37  42  42  TYR TYR D . n 
D 4 38  VAL 38  43  43  VAL VAL D . n 
D 4 39  GLN 39  44  44  GLN GLN D . n 
D 4 40  TYR 40  45  45  TYR TYR D . n 
D 4 41  PRO 41  46  46  PRO PRO D . n 
D 4 42  ASN 42  47  47  ASN ASN D . n 
...

Here, a ? in the _pdbx_poly_seq_scheme.auth_seq_num column indicates a missing (unmodeled) residue.

Hoping everyone is alright.
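To illustrate how the excerpt above can be read, here is a minimal sketch, assuming the loop rows are plain whitespace-separated values with no quoted strings and no insertion codes. The helper names are hypothetical and not part of PDBFixer. Using a few rows copied from the chain D excerpt, it lists residues without coordinates (`auth_seq_num` is `?`) and flags places where `seq_id` is consecutive while the author numbering skips, which is the 29 to 36 jump: no residue is actually missing there.

```python
# Column order as listed in the loop_ header of the excerpt.
COLUMNS = [
    "asym_id", "entity_id", "seq_id", "mon_id", "ndb_seq_num", "pdb_seq_num",
    "auth_seq_num", "pdb_mon_id", "auth_mon_id", "pdb_strand_id",
    "pdb_ins_code", "hetero",
]


def parse_rows(text):
    """Split whitespace-separated loop rows into dicts keyed by column name.
    Assumes no quoted values containing spaces."""
    return [dict(zip(COLUMNS, line.split())) for line in text.strip().splitlines()]


def unmodeled(rows):
    """(seq_id, mon_id) for residues with no coordinates (auth_seq_num is '?')."""
    return [(int(r["seq_id"]), r["mon_id"]) for r in rows if r["auth_seq_num"] == "?"]


def numbering_jumps(rows):
    """(auth_from, auth_to) where seq_id is consecutive but auth_seq_num skips.
    Assumes no insertion codes (those would repeat auth_seq_num legitimately)."""
    modeled = [r for r in rows if r["auth_seq_num"] != "?"]
    jumps = []
    for a, b in zip(modeled, modeled[1:]):
        consecutive = int(b["seq_id"]) == int(a["seq_id"]) + 1
        if consecutive and int(b["auth_seq_num"]) != int(a["auth_seq_num"]) + 1:
            jumps.append((int(a["auth_seq_num"]), int(b["auth_seq_num"])))
    return jumps


# A few rows copied from the chain D excerpt.
EXCERPT = """
D 4 1   MET 1   0   ?   ?   ?   D . n
D 4 2   ALA 2   1   1   ALA ALA D . n
D 4 30  GLY 30  29  29  GLY GLY D . n
D 4 31  ASN 31  36  36  ASN ASN D . n
D 4 32  PRO 32  37  37  PRO PRO D . n
"""

rows = parse_rows(EXCERPT)
print(unmodeled(rows))        # [(1, 'MET')]
print(numbering_jumps(rows))  # [(29, 36)]
```

For complete files, a real mmCIF parser (for example Biopython's `MMCIF2Dict`) would be more robust than splitting lines by hand.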

@peastman
Member

I'm not sure what you mean. His input file is a PDB, not a PDBx/mmCIF.

@Ruibin-Liu
Contributor

Has anyone figured out how to solve the problem? I think 5J7S is problematic too.
