Unified access to up-to-date residue-level annotations from UniProt and other biological databases for PDB data via PDBx/mmCIF files

bioRxiv (2022)

Abstract
More than 58,000 proteins have an up-to-date correspondence between their amino acid sequence (UniProtKB) and their 3D structures (PDB), enabled by the Structure Integration with Function, Taxonomy and Sequences (SIFTS) resource. In addition to this fundamental mapping, SIFTS incorporates residue-level annotations from other biological resources such as Pfam, InterPro, SCOP, SCOP2, CATH, IntEnz, GO, PubMed, Ensembl, the NCBI taxonomy database and HomoloGene. The SIFTS data is exported in XML format for each individual PDB entry and is also accessible via the PDBe REST API. Until now, these mappings have been maintained separately from the structure data (PDBx/mmCIF files) in the PDB archive. In this work, taking advantage of the extensibility of the core PDBx/mmCIF framework, we extended the wwPDB PDBx/mmCIF data dictionary with additional categories to accommodate SIFTS data, and added the UniProt, Pfam, SCOP2 and CATH mapping information directly to the PDBx/mmCIF files from the PDB archive. Integrating the mapping data into the PDBx/mmCIF files provides consistent residue numbering across different PDB entries, allowing easy comparison of structure models. The extended PDBx/mmCIF format yields a more consistent, standardised metadata description without altering the core PDB information. This development provides up-to-date cross-reference information at the residue level, which improves data interoperability and supports better data analysis and visualisation.

Availability and implementation: We expanded the PDBe release pipeline with a process that adds SIFTS annotations to the PDBx/mmCIF files for individual structures in the PDB archive. The scientific community can download these updated PDBx/mmCIF files from the PDBe entry pages, through direct URLs, using the PDBe download service, or from the EMBL-EBI FTP area.

Competing Interest Statement: The authors have declared no competing interest.
Keywords
PDB databases, other biological databases, PDBx/mmCIF, up-to-date, residue-level
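
The abstract's claim that embedded mappings give consistent residue numbering across PDB entries can be illustrated with a minimal sketch. It assumes the extended files carry per-atom cross-reference columns named `pdbx_sifts_xref_db_name`, `pdbx_sifts_xref_db_acc` and `pdbx_sifts_xref_db_num` in the `_atom_site` category; these names are an assumption here and should be checked against the published dictionary extension. The two excerpts, the accession `P00000` and all residue numbers are synthetic, and `parse_loop` and `uniprot_index` are hypothetical helpers that only handle simple whitespace-separated loops without quoted values.

```python
# Synthetic _atom_site excerpts for two hypothetical entries of the same protein.
# Column names, the accession P00000 and all numbers are illustrative assumptions.
HEADER = """\
loop_
_atom_site.label_comp_id
_atom_site.auth_seq_id
_atom_site.label_atom_id
_atom_site.pdbx_sifts_xref_db_name
_atom_site.pdbx_sifts_xref_db_acc
_atom_site.pdbx_sifts_xref_db_num
"""

ENTRY_A = HEADER + """\
GLY 1 CA UNP P00000 25
SER 2 CA UNP P00000 26
HIS 3 CA ? ? ?
"""

ENTRY_B = HEADER + """\
SER 101 CA UNP P00000 26
THR 102 CA UNP P00000 27
"""


def parse_loop(text):
    """Parse one whitespace-delimited mmCIF loop into a list of row dicts."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "loop_" or line.startswith("#"):
            continue
        if line.startswith("_"):
            # Keep only the item name after the category prefix.
            names.append(line.split(".", 1)[1])
        else:
            rows.append(dict(zip(names, line.split())))
    return rows


def uniprot_index(rows):
    """Map UniProt residue number -> author residue number for mapped residues."""
    index = {}
    for row in rows:
        # Unmapped residues carry "?" and are skipped.
        if row["pdbx_sifts_xref_db_name"] == "UNP":
            index[int(row["pdbx_sifts_xref_db_num"])] = row["auth_seq_id"]
    return index


a = uniprot_index(parse_loop(ENTRY_A))
b = uniprot_index(parse_loop(ENTRY_B))

# Residues present in both entries, keyed by the shared UniProt numbering.
shared = sorted(a.keys() & b.keys())
for num in shared:
    print(num, a[num], b[num])
```

In this toy input the two entries use unrelated author numbering (2 versus 101) for the same residue, yet both resolve to UniProt position 26, which is the kind of cross-entry comparison the embedded mapping is meant to make straightforward. For real files, a full mmCIF parser is the safer choice than this simplified reader.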