Brief manual on the database ENPDB
Experimentally determined three-dimensional structures of proteins, DNA and RNA are accumulated in the Protein Data Bank (PDB). This databank serves as the single official worldwide source of scientific information on the three-dimensional structures of macromolecules. The PDB stores atomic coordinates, bibliographic citations, primary and secondary structure information, as well as crystallographic structure factors and NMR experimental data.
The EnPDB database is produced by reformatting PDB in a way that extends the search possibilities available through SRS. To support effective searching for three-dimensional structures, extra fields were introduced in EnPDB. First, there are fields describing structural features of the molecules: the number and lengths of chains, the number of helices, the number of beta-sheets, the presence of DNA/RNA molecules in complex with protein, the number of protein molecules in the complex, and the number of heteroatoms. Second, complex PDB fields were split into simpler ones: the SOURCE field was separated into the Gene, MolSource, Source and Synthesis fields; the BioUnit field was extracted from COMPND; and the Heterogen field was derived from HETNAM and HETSYN.
Atomic coordinates are not included in EnPDB. A detailed description of the fields is given below.
Complete list of the fields
ID, Header, Date, Title, Compound, Molecule, Synonym, EC, BioUnit, Gene, MolSource, Source, Synthesis, Keyword, Technique, Author, Jrnl, JrnlAuthor, JrnlTitle, JrnlRef, JrnlVolume, JrnlYear, Remark_1, Resolution, ChainAmount, ChainSizes, HelixAmount, SheetAmount, DnaRnaAmount, ProteinAmount, HetAmount, Heterogen, LinkEmbl, LinkPir, LinkSwissProt, LinkTransfac, LinkTrrd4.
PDB ID code. This identifier is unique within PDB.
PDB classification for the entry
Deposition date is the date when the coordinates were received by PDB.
In most cases the date field contains the date when the entry was created. It is always stored in the index as an eight-digit number in the format "yyyymmdd" (y = year, m = month, and d = day), e.g., "19940117". The date to be searched can also be typed in a different, more intuitive format: "dd-mmm-yy" or "dd-mmm-yyyy", e.g., "1-jan-97" or "01-jan-1997".
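As an illustration of the two accepted input styles, the following Python sketch normalizes a query date to the eight-digit index form. The function name `to_index_date` and the rule for expanding two-digit years (70-99 become 19xx, 00-69 become 20xx) are assumptions made for this example; SRS itself may resolve two-digit years differently.

```python
from datetime import date

# Three-letter month abbreviations, matched case-insensitively.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}


def to_index_date(text: str) -> str:
    """Return the 'yyyymmdd' form of a date typed as 'yyyymmdd',
    'dd-mmm-yy' or 'dd-mmm-yyyy'."""
    text = text.strip()
    if len(text) == 8 and text.isdigit():
        return text  # already in index form

    parts = text.split("-")
    if len(parts) != 3:
        raise ValueError(f"unrecognized date: {text!r}")
    day_text, month_text, year_text = parts

    month = MONTHS.get(month_text.lower())
    if month is None:
        raise ValueError(f"unknown month in date: {text!r}")

    year = int(year_text)
    if len(year_text) == 2:
        # Assumed pivot for two-digit years; not taken from the SRS documentation.
        year += 1900 if year >= 70 else 2000

    # date() raises ValueError for impossible dates such as 31-feb.
    checked = date(year, month, int(day_text))
    return checked.strftime("%Y%m%d")
```

With this sketch, both "1-jan-97" and "01-jan-1997" map to the same index value.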
Contains the title for experiment or analysis described in the entry. The field content corresponds to that in PDB.
Describes the macromolecules contained in the entry. Each macromolecule of the entry is defined with a set of token:value pairs, and is referred to as a component of the COMPOUND field. The field content corresponds to that in PDB. For each macromolecular component, the molecule name, synonyms, number assigned by the Enzyme Commission (EC), and other relevant details are specified.
This specialized field is not present in the entry as a separate line. It contains the names of macromolecules taken from the COMPND field of PDB and is designed for searching entries by macromolecule name.
This specialized field is not present in the entry as a separate line. It contains the synonyms of macromolecule names taken from the COMPND field of PDB and is designed for searching entries by synonym.
Contains the Enzyme Commission number associated with the molecule. If there is more than one EC number, they are presented as a comma-separated list.
If a MOLECULE functions as a part of a larger biological unit, the entire functional unit may be described. The field content is selected from the COMPND of PDB using the token BIOLOGICAL_UNIT.
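The way the COMPND-derived search fields could be obtained from token:value pairs can be sketched in Python as follows. This is only an illustration, not the EnPDB implementation: the function names are hypothetical, the text is assumed to start at column 11 of each COMPND record, and escaped semicolons inside values are not handled.

```python
def parse_compnd(lines):
    """Collect token:value pairs from COMPND records, one dict per component."""
    # Assumption: columns 1-10 hold the record name and continuation number.
    text = " ".join(line[10:].strip() for line in lines if line.startswith("COMPND"))
    components = []
    for spec in text.split(";"):
        if ":" not in spec:
            continue
        token, value = spec.split(":", 1)
        token, value = token.strip(), value.strip()
        # A MOL_ID token starts the description of a new component.
        if token == "MOL_ID" or not components:
            components.append({})
        components[-1][token] = value
    return components


def compnd_search_fields(components):
    """Gather the values that would feed the name, synonym, EC and
    biological-unit search fields."""
    fields = {"Molecule": [], "Synonym": [], "EC": [], "BioUnit": []}
    for comp in components:
        if "MOLECULE" in comp:
            fields["Molecule"].append(comp["MOLECULE"])
        if "SYNONYM" in comp:
            fields["Synonym"].extend(s.strip() for s in comp["SYNONYM"].split(","))
        if "EC" in comp:
            # Several EC numbers are given as a comma-separated list.
            fields["EC"].extend(s.strip() for s in comp["EC"].split(","))
        if "BIOLOGICAL_UNIT" in comp:
            fields["BioUnit"].append(comp["BIOLOGICAL_UNIT"])
    return fields
```

Keeping one dictionary per component preserves the association between a molecule and its own synonyms and EC numbers, while the flattened lists are what a search index would need.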
Identifies the gene through the gene names taken from the SOURCE field of PDB.
Specifies biological and/or chemical sources of all the biological molecules in the entry.
There are three possible values: BIOLOGICAL, SYNTHETIC, and MIXED. SYNTHETIC means that all the molecules in the entry were chemically synthesized; BIOLOGICAL means that none of the molecules were synthesized; and MIXED means that some molecules were synthesized, whereas the rest are natural.
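The three-valued classification can be expressed compactly. The Python sketch below assumes that a per-molecule flag (True if the molecule was chemically synthesized) has already been obtained; the function name and input shape are hypothetical.

```python
def classify_mol_source(synthesized_flags):
    """Return 'SYNTHETIC', 'BIOLOGICAL' or 'MIXED' for one entry.

    synthesized_flags: one boolean per molecule in the entry,
    True when that molecule was chemically synthesized.
    """
    flags = list(synthesized_flags)
    if not flags:
        raise ValueError("an entry must contain at least one molecule")
    if all(flags):
        return "SYNTHETIC"   # every molecule was synthesized
    if not any(flags):
        return "BIOLOGICAL"  # no molecule was synthesized
    return "MIXED"           # some synthesized, the rest natural
```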
This field specifies the biological source of each biological molecule in the entry. Sources are described by both the common and scientific names, e.g., genus and species. Strain and/or cell line for immortalized cells are given when they help to uniquely identify the biological entity studied.
Note that the content of this field is not a replica of the SOURCE field of PDB. The original PDB field is divided into two parts: all the information concerning the biological source is retained in this field, while all the data related to synthesis is placed in the new field SYNTHESIS. We believe that this division allows the user to specify the region to be searched more precisely.
This field specifies the data on expression systems, e.g., strain, variant, cell line, etc. The content originates from the SOURCE field of PDB. See also the description of the Source field.
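The division of SOURCE described above can be sketched as a simple routing of token:value pairs. The routing rule below (GENE to Gene, tokens beginning with EXPRESSION_SYSTEM to Synthesis, the remaining descriptive tokens to Source) is an assumption based on the field descriptions, not a statement of how EnPDB is actually built; the function name is hypothetical.

```python
def split_source(pairs):
    """Route (token, value) pairs of one SOURCE component to search fields."""
    out = {"Gene": [], "Source": [], "Synthesis": []}
    for token, value in pairs:
        if token == "GENE":
            out["Gene"].append(value)
        elif token.startswith("EXPRESSION_SYSTEM"):
            # Expression-system details describe how the molecule was produced.
            out["Synthesis"].append(value)
        elif token in ("MOL_ID", "SYNTHETIC"):
            # Bookkeeping tokens: not part of any free-text search field here.
            continue
        else:
            # Organism names, strain, cell line, tissue and similar tokens.
            out["Source"].append(value)
    return out
```

For example, a component with ORGANISM_SCIENTIFIC, GENE and EXPRESSION_SYSTEM tokens would contribute one value to each of the three fields.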
Contains keywords describing the macromolecule. The content corresponds to that of KEYWDS of PDB.
Identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or simulation technique. The content originates from the EXPDTA of PDB.
Indicates the names of the experts responsible for the contents of the entry and corresponds to the AUTHOR field in PDB.
Indicates the reference to original publication that describes the experiment and defines the coordinate set. Its content originates from the JRNL field of PDB.
Contains the list of authors of the cited paper or of the contribution to a larger work. Its content originates from the JRNL field of PDB.
Specifies the title of the reference and is used for the title of a journal article, chapter, or part of a book. Its content originates from the JRNL field of PDB.
Contains the name of the publication. Its content originates from the JRNL field of PDB.
Contains the volume of the publication. Its content originates from the JRNL field of PDB.
Indicates the year of the publication. Its content originates from the JRNL field of PDB.
Lists important publications related to the structure described in the entry. These citations are chosen by the depositor. The content originates from the REMARK 1 of PDB.
Indicates the highest resolution, in Angstroms, used in building the model. The value is derived from REMARK 2 of the PDB file. No resolution is given for NMR structures and models.
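A minimal Python sketch of extracting the resolution from REMARK 2 is given below. It assumes the common record wording "REMARK   2 RESOLUTION. 2.00 ANGSTROMS." and treats any other wording (for example "NOT APPLICABLE") as no resolution; the function name is hypothetical.

```python
import re

# Matches e.g. "REMARK   2 RESOLUTION. 2.00 ANGSTROMS."
_RESOLUTION = re.compile(r"^REMARK\s+2\s+RESOLUTION\.\s+(\d+(?:\.\d+)?)\s+ANGSTROMS")


def extract_resolution(lines):
    """Return the resolution in Angstroms as a float, or None if not given."""
    for line in lines:
        match = _RESOLUTION.match(line)
        if match:
            return float(match.group(1))
    return None  # NMR structures, models, or a missing REMARK 2
```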
Indicates the number of chains in the entry, calculated from the SEQRES field of PDB.
Specifies the lengths of the chains in the entry, calculated from the data contained in the SEQRES of PDB.
Indicates the number of helices in the entry and is derived from the MASTER field of PDB.
Indicates the number of beta-sheet structures in the entry and is derived from the MASTER field of PDB.
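The helix and sheet counts can be read from fixed columns of the MASTER record. The sketch below assumes the column layout of the PDB format description (helix count in columns 26-30, sheet count in columns 31-35), where these numbers are defined as counts of HELIX and SHEET records; the layout should be checked against the format version in use. The function name is hypothetical.

```python
def master_counts(lines):
    """Return (helix_count, sheet_count) from the MASTER record, or None."""
    for line in lines:
        if line.startswith("MASTER") and len(line) >= 35:
            # Python slices are 0-based: columns 26-30 -> [25:30], 31-35 -> [30:35].
            helices = int(line[25:30])
            sheets = int(line[30:35])
            return helices, sheets
    return None  # no usable MASTER record found
```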
Specifies the number of DNA/RNA strands in the entry, calculated from the SEQRES field of PDB.
Indicates the number of protein chains, calculated from the SEQRES field of PDB.
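The chain-related values calculated from SEQRES can be illustrated with the following Python sketch. It is an approximation under stated assumptions: the chain identifier is read from column 12, the declared chain length from columns 14-17, and residue names from column 20 onward; a chain is counted as DNA/RNA only if every residue name is in the small nucleotide set below, and every other chain is counted as protein. The function name and the nucleotide set are hypothetical choices for this example.

```python
# Residue names treated as nucleotides (assumed set for this sketch).
NUCLEOTIDES = {"A", "C", "G", "T", "U", "I", "DA", "DC", "DG", "DT", "DU", "DI"}


def chain_statistics(lines):
    """Derive chain counts and lengths from SEQRES records."""
    sizes = {}     # chain ID -> declared number of residues
    residues = {}  # chain ID -> residue names collected so far
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain = line[11]
        sizes[chain] = int(line[13:17])
        residues.setdefault(chain, []).extend(line[19:].split())

    nucleic = [
        chain for chain, names in residues.items()
        if names and all(name in NUCLEOTIDES for name in names)
    ]
    return {
        "ChainAmount": len(sizes),
        "ChainSizes": [sizes[chain] for chain in sizes],
        "DnaRnaAmount": len(nucleic),
        "ProteinAmount": len(sizes) - len(nucleic),
    }
```

Chains of modified or unknown residues would need additional rules; here they fall into the protein count by default.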
Indicates the number of unusual residues, such as prosthetic groups, inhibitors, solvent molecules, and ions, for which coordinates are supplied in PDB. The number is calculated from the HET field of PDB.
Gives the chemical names and synonyms of unusual residues, such as prosthetic groups, inhibitors, solvent molecules, and ions, for which coordinates are supplied in PDB. The data are taken from the HETNAM and HETSYN fields of PDB.
Links to EMBL Data Bank through SWISS-PROT.
For example, if a SWISS-PROT entry contains references to both a PDB entry and an EMBL entry, the PDB and EMBL entries are considered linked.
Links to PIR Data Bank through SWISS-PROT.
Links to SWISS-PROT Data Bank as its entries contain references to PDB.
Links to Transfac Data Bank through SWISS-PROT.
For example, if a SWISS-PROT entry contains references to both a PDB entry and a TRANSFAC entry, the PDB and TRANSFAC entries are considered linked.
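The indirect linking through SWISS-PROT described above amounts to a join on shared cross-references. The Python sketch below shows the idea under the assumption that the cross-references of each SWISS-PROT entry are already available as a dictionary; the function name, the data layout and all identifiers in the usage example are made up for illustration.

```python
def link_via_swissprot(swissprot_xrefs, target_db):
    """Map each PDB ID to the target-database IDs that share a SWISS-PROT entry.

    swissprot_xrefs: {swissprot_id: {database_name: [identifiers]}}
    target_db: name of the database to link to, e.g. "EMBL" or "TRANSFAC"
    """
    links = {}
    for xrefs in swissprot_xrefs.values():
        targets = xrefs.get(target_db, [])
        if not targets:
            continue  # this SWISS-PROT entry cannot connect PDB to target_db
        for pdb_id in xrefs.get("PDB", []):
            links.setdefault(pdb_id, set()).update(targets)
    return links


# Usage with invented identifiers:
example = {
    "SP_ENTRY_1": {"PDB": ["1ABC"], "EMBL": ["EMBL_X1", "EMBL_X2"]},
    "SP_ENTRY_2": {"PDB": ["2XYZ"], "TRANSFAC": ["TF_001"]},
}
embl_links = link_via_swissprot(example, "EMBL")
```

In this example only the PDB entry 1ABC receives EMBL links, because it is the only one whose SWISS-PROT entry also references EMBL.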
Links to TRRD Data Bank.