Contents

 

Theoretical background

Base (or direct) readout: FGDR concept

Shape (or indirect) readout: Physicochemical and conformational DNA descriptors (PCD)

Physicochemical parameters (PCP)

Conformational parameters (CP)

DRV visualisation concept

DNA PLOTTER

MOTIF PLOTTER

Position Frequency Matrix (PFM)

IUPAC

TRANSFAC database

JASPAR database

UniPROBE database

DNA Logo

INTERFACE PLOTTER

PDB database

X-Ray Crystallography

NMR spectroscopy

Hydrogen bonds

Hydrophobic contacts

3D view of molecular structures

External software

References

 

 

Theoretical background

 

 

Base (or direct) readout: FGDR concept

 

 

Fig. 1. The schematic structure of DNA

(Carl Ivar Branden, John Tooze Introduction to Protein Structure (1999) Garland Science)

 

 

 
The main elements of DNA structure are the two helical sugar-phosphate backbones located at the outer part of the macromolecule and the planar nucleotide pairs connecting the two backbones in the central part (Fig. 1). The entire structure resembles a very long winding staircase. The sugar-phosphate backbone shows no variability along the molecule, so the specificity features of a particular DNA segment depend only on the local base pair composition. The asymmetrical attachment of the base pairs to the sugar rings of the backbone results in two different grooves on the molecular surface that differ in width and depth (the major and the minor groove). The planar base pairs are packed tightly onto each other, so only their edges are accessible from the outer environment. These edges form the floors of the two grooves and carry different chemical functional groups. Most of the functional groups in the DNA grooves are hydrogen bond donors or acceptors suitable for establishing hydrogen bonds with the interacting protein or the surrounding water molecules. The edges of the base pairs in the DNA major groove are wider than those in the minor groove (Fig. 2).

 

 

Fig. 2. Topological orientation of base pairs within the DNA

(Carl Ivar Branden, John Tooze Introduction to Protein Structure (1999) Garland Science)

 

There are four different base pairs in DNA, each having a distinct hydrogen bond donor and acceptor pattern determined by the position and the chemical nature of the functional groups located on their groove-forming edges. In addition to the nitrogen- and oxygen-containing functional groups, there are two further pattern-forming molecular entities that can affect the specificity of DNA-protein interactions: the H atom at the fifth position of cytosine and the methyl group at the fifth position of thymine. In this document we collectively refer to all these pattern-forming chemical entities as functional groups. In Fig. 3 the functional groups of the major groove are denoted by W1, W2, W1’ and W2’, while the functional groups of the minor groove are marked by S1, S2 and S1’.

Fig. 3. Functional group composition of the different base pairs

(Carl Ivar Branden, John Tooze Introduction to Protein Structure (1999) Garland Science)

 

If the different hydrogen bond donors and the different hydrogen bond acceptors are considered as functionally equivalent, a simplified description of the chemical characteristics of a sequence stretch can be constructed as shown in Fig. 4.

 

 

Fig. 4. Functional group based representation of DNA
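The simplified representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-letter codes (A, D, M, H) and the dictionary and function names are hypothetical and are not taken from DRV, and the patterns follow the standard textbook assignment of groove functional groups rather than DRV's internal encoding. The major groove contributes four positions per base pair and the minor groove three, read from the base on the given strand towards its partner.

```python
# Functional groups on the base pair edges, one letter per position:
#   A = hydrogen bond acceptor, D = hydrogen bond donor,
#   M = methyl group of thymine, H = non-polar hydrogen (e.g. C5-H of cytosine).
# Keys are the base on the given strand; each pattern is read from that base
# towards its Watson-Crick partner. Codes and names are illustrative only.
MAJOR_GROOVE = {"A": "ADAM", "T": "MADA", "G": "AADH", "C": "HDAA"}
MINOR_GROOVE = {"A": "AHA", "T": "AHA", "G": "ADA", "C": "ADA"}


def functional_group_view(sequence):
    """Return (major, minor): per-base-pair functional group patterns."""
    seq = sequence.upper()
    unknown = set(seq) - set(MAJOR_GROOVE)
    if unknown:
        raise ValueError(f"unsupported symbols: {sorted(unknown)}")
    major = [MAJOR_GROOVE[base] for base in seq]
    minor = [MINOR_GROOVE[base] for base in seq]
    return major, minor


if __name__ == "__main__":
    major, minor = functional_group_view("GATC")
    for base, w, s in zip("GATC", major, minor):
        print(base, "major:", w, "minor:", s)
```

In this encoding the minor groove cannot distinguish A·T from T·A or G·C from C·G, while every base pair has a unique major groove pattern, which is consistent with most specificity-determining points lying in the major groove.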

 

 

In the base readout process, the specificity-determining points of a DNA motif participating in a sequence-specific DNA-protein interaction are determined exclusively by the functional groups located at the edges of the nucleotide pairs. The vast majority of these specificity-determining points are found in the major groove. The importance of these functional groups in establishing DNA-protein interactions has been demonstrated directly by determining the molecular structure of numerous DNA-protein complexes and by studies in which these chemical structures were experimentally altered [1]. Nature itself indicates the importance of base pair functional groups through the “biological utilization” of functional group addition and removal in the process of nucleotide methylation. The presence or absence of a methyl group on a cytosine or adenine at a given sequence position can drastically alter the protein binding characteristics of the DNA sequence involved [2,3]. From the perspective of protein binding, base pairs can be considered as a collection of “chemical tools” in which the individual tools are the different functional groups facing the major or the minor groove.

 

Base pairs that appear as individual entities in the classical nucleotide-based description of DNA show a more complex picture in the functional group based description. The new representation makes it explicit that the different base pairs carry partially identical and partially different sets of functional groups. The binding of distinct proteins to DNA may rely on different subsets of the functional groups present on a given base pair. Consequently, one base pair can be functionally equivalent to a second one when both possess the subset of functional groups needed for binding a particular protein, and equivalent to a third one when a different subset is required for anchoring another protein.
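The idea of functional equivalence can be made concrete with a small sketch. It assumes the standard textbook major groove patterns (A = acceptor, D = donor, M = methyl, H = non-polar hydrogen) and uses 0-based position indices; the function name and the way a binding requirement is expressed are hypothetical and are not part of DRV.

```python
# Major groove patterns per base (read from that base towards its partner).
# Letter codes are illustrative: A = acceptor, D = donor, M = methyl, H = hydrogen.
MAJOR_GROOVE = {"A": "ADAM", "T": "MADA", "G": "AADH", "C": "HDAA"}


def equivalent_bases(required):
    """Return the bases whose major groove pattern satisfies `required`.

    `required` maps a 0-based position in the pattern to the functional
    group a hypothetical protein needs at that position.
    """
    return sorted(
        base
        for base, pattern in MAJOR_GROOVE.items()
        if all(pattern[pos] == group for pos, group in required.items())
    )


if __name__ == "__main__":
    # A protein needing an acceptor at the first position accepts A or G ...
    print(equivalent_bases({0: "A"}))
    # ... while one needing a donor at the third position accepts G or T.
    print(equivalent_bases({2: "D"}))
```

Here G is interchangeable with A for the first requirement and with T for the second, which mirrors the situation described above: equivalence between base pairs depends on which subset of functional groups the protein actually uses.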

 

 

Shape (or indirect) readout: Physicochemical and conformational DNA descriptors (PCD)

 

In addition to the base readout of DNA, which is controlled by intermolecular hydrogen bond patterns, sequence-specific DNA recognition is also affected by the so-called DNA shape (or indirect) readout mechanisms. Shape readout involves DNA properties that influence the steric compatibility of the interacting molecular partners, such as shape, flexibility or stability [4]. One approach to modelling shape readout features is based on the properties of small di- or trinucleotide building blocks. Dozens of molecular parameters have been experimentally determined or calculated for DNA base pair dimer or trimer units. These numerical attributes of DNA fall into two classes. The physicochemical parameters (PCP) are defined on the basis of physical or chemical measurements or calculations, while the conformational parameters (CP) are mostly deduced from molecular modelling or from statistical analysis of existing 3D DNA structures.

 

Owing to the highly ordered structure of DNA, the parameter values of the predefined molecular units can be collected along the double helix, generating a unique pattern that can be used as a numerical fingerprint of the particular sequence region. In DRV this numerical fingerprint is produced by a sliding window algorithm and visualised as a stack of coloured stripes. Increasing the sliding window size adds an averaging layer, generating calculated parameter values that represent longer sequence fragments (Fig. 5). The size of the window used for PCD calculations can be set in the appropriate part of the input forms. The PCD pattern of the DNA recognition motifs affects the shape readout mechanisms, and is therefore an important specificity factor. A synchronized and comprehensive view of the FGDR and PCD patterns opens a new way to decipher the specificity factors involved in the DNA-protein recognition process. To achieve this goal, the visualisation concept of DRV offers a coordinated view of the FGDR and PCD patterns of the sequences of interest.

 

Fig. 5. Colour plotting of di- or trinucleotide physicochemical and conformational DNA descriptors (PCD) with or without sliding window averaging. Sliding window averaging works in a similar way for pentanucleotide-based PCDs.
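The sliding window calculation can be sketched as follows. Dinucleotide GC content is used as the example parameter because it can be computed directly from the sequence without a lookup table. The function names are hypothetical, and the treatment of the sequence ends (windows are only evaluated where they fit completely) is an assumption; DRV may align or pad the averaged values differently.

```python
def dinucleotide_gc(sequence):
    """GC fraction of every overlapping dinucleotide step along the sequence."""
    seq = sequence.upper()
    return [
        sum(base in "GC" for base in seq[i:i + 2]) / 2
        for i in range(len(seq) - 1)
    ]


def sliding_window_mean(values, window):
    """Average `values` over every complete window of the given size."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    steps = dinucleotide_gc("ATGCGCAT")
    print("per step:   ", steps)
    print("window of 3:", sliding_window_mean(steps, 3))
```

With a window of 1 the per-step values are returned unchanged; larger windows smooth the profile so that each value represents a longer sequence fragment.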

 

Physicochemical parameters (PCP)

 

The base pair dimers, trimers or other oligomers composing the DNA chain are individual molecular entities that can be distinguished by several physicochemical parameters (PCP). The PCP values can be collected along the DNA, generating different numerical descriptions of the particular DNA segment. Most of the PCPs used by DRV are taken from the scientific literature; one of them, Minor Groove Electrostatic Potential, is calculated on the fly with the DNAshapeR Bioconductor package [5]. This package uses a sliding pentamer window to look up structural features derived from all-atom Monte Carlo simulations. The list of PCPs available in DRV is redundant in the sense that several items describe the same phenomenon using different approaches, for example SantaLucia dS, Sugimoto dS and Breslauer dS.
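The pentamer-based lookup can be illustrated with a minimal Python sketch. It does not use or reproduce DNAshapeR: the function name is hypothetical, the lookup table must be supplied by the caller, and the numbers in the usage example are placeholders rather than real descriptor values. The sketch assumes that each value is assigned to the central nucleotide of its pentamer, so the first two and last two positions have no value.

```python
def pentamer_profile(sequence, table):
    """Assign to each position the table value of the pentamer centred on it.

    `table` maps 5-letter strings to numbers. Positions without a complete
    pentamer, or whose pentamer is missing from the table, are set to None.
    """
    seq = sequence.upper()
    profile = [None] * len(seq)
    for i in range(2, len(seq) - 2):
        profile[i] = table.get(seq[i - 2:i + 3])
    return profile


if __name__ == "__main__":
    # Placeholder values for illustration only, not real descriptor data.
    toy_table = {"AAAAA": 1.0, "AAAAT": 2.0}
    print(pentamer_profile("AAAAAT", toy_table))
```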

 

 

 

List of dinucleotide physicochemical descriptors employed in DRV:

 

 

Base stacking
Stacking energies between the complementary pairs of a dimer are calculated as a function of the rotational angle and separation distance.
Ornstein RL, Rein R, Breen DL, Macelroy RD: An optimized potential function for the calculation of nucleic acid interaction energies. Biopolymers 1978, 17(10):2341-2360.

Dinucleotide GC Content
Vlahovicek K, Kajan L, Pongor S: DNA analysis servers: plot.it, bend.it, model.it and IS. Nucleic Acids Res 2003, 31(13):3686-3687.

Duplex stability (free energy)
Regions with low free energy content will be more stable than regions with high thermodynamic energy content.
Sugimoto N, Nakano S, Yoneyama M, Honda K: Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res 1996, 24(22):4501-4505.

Duplex stability (disrupt energy)
Regions with a high disrupt energy value will be more stable than regions with a lower energy value.
Breslauer KJ, Frank R, Blocker H, Marky LA: Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A 1986, 83(11):3746-3750.

DNA denaturation
DNA regions with a low value are more likely to denature than regions with a higher value.
Blake R: Encyclopedia of molecular biology and molecular medicine. New York: Wiley; 1996.

Breslauer dG
Calculations were done by using the nearest-neighbour data to predict transition enthalpies and free energies for a series of DNA oligomers. These predicted values are in excellent agreement with the corresponding values determined experimentally. ΔG° (free energy) predicts the duplex stability.
Breslauer KJ, Frank R, Blocker H, Marky LA: Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A 1986, 83(11):3746-3750.

Breslauer dH
Calculations were done by using the nearest-neighbour data to predict transition enthalpies and free energies for a series of DNA oligomers. These predicted values are in excellent agreement with the corresponding values determined experimentally. ΔH° (enthalpy) predicts the melting behaviour of the DNA.
Breslauer KJ, Frank R, Blocker H, Marky LA: Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A 1986, 83(11):3746-3750.

Breslauer dS
Calculations were done by using the nearest-neighbour data to predict transition enthalpies and free energies for a series of DNA oligomers. These predicted values are in excellent agreement with the corresponding values determined experimentally (ΔS: entropy).
Breslauer KJ, Frank R, Blocker H, Marky LA: Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci U S A 1986, 83(11):3746-3750.

Electron interaction
Vlahovicek K, Kajan L, Pongor S: DNA analysis servers: plot.it, bend.it, model.it and IS. Nucleic Acids Res 2003, 31(13):3686-3687.

Hartman trans free energy
Calculated B-DNA to Z-DNA transition free energies.
Hartmann B, Malfoy B, Lavery R: Theoretical prediction of base sequence effects in DNA. Experimental reactivity of Z-DNA and B-Z transition enthalpies. J Mol Biol 1989, 207(2):433-444.

Polar interaction
Gromiha MM, Ponnuswamy PK: Hydrophobic distribution and spatial arrangement of amino acid residues in membrane proteins. Int J Pept Protein Res 1996, 48(5):452-460.

SantaLucia dG
Thermodynamic measurement results of 23 oligonucleotides together with data for 21 oligonucleotides from the literature were used to get improved nearest-neighbour parameters.
SantaLucia J, Jr., Allawi HT, Seneviratne PA: Improved nearest-neighbour parameters for predicting DNA duplex stability. Biochemistry 1996, 35(11):3555-3562.

SantaLucia dH
Thermodynamic measurement results of 23 oligonucleotides together with data for 21 oligonucleotides from the literature were used to get improved nearest-neighbour parameters.
SantaLucia J, Jr., Allawi HT, Seneviratne PA: Improved nearest-neighbour parameters for predicting DNA duplex stability. Biochemistry 1996, 35(11):3555-3562.

SantaLucia dS
Thermodynamic measurement results of 23 oligonucleotides together with data for 21 oligonucleotides from the literature were used to get improved nearest-neighbour parameters.
SantaLucia J, Jr., Allawi HT, Seneviratne PA: Improved nearest-neighbour parameters for predicting DNA duplex stability. Biochemistry 1996, 35(11):3555-3562.

Stability
Melting profiles were calculated for restriction fragments of ϕX174 and fd phage DNAs and compared with experimental profiles. The algorithm of Fixman and Freire was slightly modified so that a stability parameter was assigned not to a base pair but to each nearest-neighbour doublet. Stabilities of the 10 kinds of nearest‐neighbour doublets were estimated by fitting the calculated profiles to the observed ones.
Gotoh O, Tagashira Y: Stabilities of nearest neighbour doublets in double helical DNA determined by fitting calculated melting profiles to observed profiles. Biopolymers 1981, 20(5):1033-1042.

Stacking energy
Dinucleotide base stacking energy represents how easily parts of the DNA de-stack. A high peak for this value represents an unstable region, while a low peak represents a more stable region.
Ornstein RL, Rein R, Breen DL, Macelroy RD: An optimized potential function for the calculation of nucleic acid interaction energies. Biopolymers 1978, 17(10):2341-2360.

Sugimoto dG
To improve the previous DNA/DNA nearest-neighbour parameters, thermodynamic parameters (ΔH°, ΔS° and ΔG°) of 50 DNA/DNA duplexes were measured.
Sugimoto N, Nakano S, Yoneyama M, Honda K: Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res 1996, 24(22):4501-4505.

Sugimoto dH
To improve the previous DNA/DNA nearest-neighbour parameters, thermodynamic parameters (ΔH°, ΔS° and ΔG°) of 50 DNA/DNA duplexes were measured.
Sugimoto N, Nakano S, Yoneyama M, Honda K: Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res 1996, 24(22):4501-4505.

Sugimoto dS
To improve the previous DNA/DNA nearest-neighbour parameters, thermodynamic parameters (ΔH°, ΔS° and ΔG°) of 50 DNA/DNA duplexes were measured.
Sugimoto N, Nakano S, Yoneyama M, Honda K: Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes. Nucleic Acids Res 1996, 24(22):4501-4505.

Watson-Crick interaction
Lewis JP, Sankey OF: Geometry and energetics of DNA base pairs and triplets from first principles quantum molecular relaxations. Biophys J 1995, 69(3):1068-1076.

 

 

 

List of trinucleotide physicochemical parameters employed in DRV:

 

 

Trinucleotide GC Content
Vlahovicek K, Kajan L, Pongor S: DNA analysis servers: plot.it, bend.it, model.it and IS. Nucleic Acids Res 2003, 31(13):3686-3687.

 

 

List of pentanucleotide physicochemical descriptors employed in DRV:

 

 

Minor Groove Electrostatic Potential (DNAshapeR)
DNA descriptor derived from Monte Carlo simulation based pentamer data.
Tsu-Pei Chiu, Satyanarayan Rao, Richard S. Mann, Barry Honig and Remo Rohs: Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein–DNA binding. Nucleic Acids Res 2017, 45(21):12565–12576

 

 

Conformational parameters (CP)

 

Base pairs of the DNA double helix can adopt conformations that differ slightly from the ideal one, and the relative position of the two nucleotides within a base pair can also be unique to some extent. These base positions can be described by different distance and angle parameters. The geometric features of a base pair affect its neighbours, and the local conformation generated in this way is characteristic of the given sequence and strongly influences its functional properties. The relative position of two neighbouring base pairs can also deviate from the ideal structure in several ways. These positional deviations can be classified according to the axes of three-dimensional space (Fig. 6). In addition to the parameters mentioned above, there are other conformational variables describing further properties of DNA, such as the transition probability from one global DNA conformation to another, or the depth of the different DNA grooves. Most of the conformational parameters used by DRV are taken from the scientific literature; some of them are calculated on the fly with the DNAshapeR Bioconductor package [6,7]. This package uses a sliding pentamer window to look up structural features derived from all-atom Monte Carlo simulations. The list of CPs available in DRV is redundant in the sense that several items describe the same phenomenon using different approaches (Shift and DSR_Shift, Tilt and DSR_Tilt, Rise and DSR_Rise, etc.).

 

 

 

Fig. 6. Different torsion arrangements of single base pairs or neighbouring base pair duos

(Jinsen Li et al. Nucleic Acids Res 2017, 45(22):12877–12887)

 

 

 

List of dinucleotide conformational descriptors employed in DRV:

 

 

Protein induced deformability
With this property a larger value reflects a more deformable sequence while a smaller value indicates a region where the DNA helix is less likely to be changed dramatically by proteins.
Olson WK, Gorin AA, Lu XJ, Hock LM, Zhurkin VB: DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci U S A 1998, 95(19):11163-11168.

B-DNA twist
Structures with a low twist region appear to unwind in response to steric clashes of large exocyclic groups in the major and minor grooves, while those with high twist values are subject to less contact.
Gorin AA, Zhurkin VB, Olson WK: B-DNA twisting correlates with base-pair morphology. J Mol Biol 1995, 247(1):34-48.

Propeller twist
The dinucleotide propeller twist is the value for the flexibility of the helix. Low values indicate flexible areas whereas high values indicate rigid areas.
el Hassan MA, Calladine CR: Propeller-twisting of base-pairs and the conformational mobility of dinucleotide steps in DNA. J Mol Biol 1996, 259(1):95-103.

Bending stiffness
High values correspond to DNA regions that are more rigid, while low values correspond to regions that will bend more easily.
Sivolob AV, Khrapunov SN: Translational positioning of nucleosomes on DNA: the role of sequence-dependent isotropic DNA bending stiffness. J Mol Biol 1995, 247(5):918-931.

Protein DNA twist
Regions with high peak values are more likely to be deformed by proteins than regions with a lower peak value.
Olson WK, Gorin AA, Lu XJ, Hock LM, Zhurkin VB: DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc Natl Acad Sci U S A 1998, 95(19):11163-11168.

Stabilising energy of Z-DNA
Stretches of DNA with low values are more likely to form Z-DNA than a high-value region.
Ho PS, Ellison MJ, Quigley GJ, Rich A: A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences. EMBO J 1986, 5(10):2737-2744.

Aida BA transition
The stacking interaction energies between nucleic acid bases in A DNA and B DNA are calculated by means of the ab initio molecular orbital method.
Aida M: An ab initio molecular orbital study on the sequence-dependency of DNA conformation: an evaluation of intra- and inter-strand stacking interaction energy. J Theor Biol 1988, 130(3):327-335.

Helix-Coil transition
Thermodynamic changes that accompany helix-to-coil transitions were characterized by combination of calorimetric and volumetric techniques.
Chalikian TV, Volker J, Plum GE, Breslauer KJ: A more unified picture for the thermodynamics of nucleic acid duplex melting: a characterization by calorimetric and volumetric techniques. Proc Natl Acad Sci U S A 1999, 96(14):7853-7858.

Ivanov BA transition
Ivanov VI KD, Shchyolkina AK, Chernov BK, Minchenkov LE.: Decimal code controlling the B to A transition of DNA. J Biomol Struct Dynamics 1995, 12:105-108.

Lisser BZ transition
The B to Z transition tendencies for the dinucleotides have been determined by experimental measurements and by theoretical considerations.
Lisser S, Margalit H: Determination of common structural features in Escherichia coli promoters by computer analysis. Eur J Biochem 1994, 223(3):823-830.

Sarai flexibility
Parameter describing the sequence dependence of the conformational flexibility of DNA.
Sarai A, Mazur J, Nussinov R, Jernigan RL: Sequence dependence of DNA conformational flexibility. Biochemistry 1989, 28(19):7842-7849.

Twist
Dinucleotide flexibility information was obtained by molecular dynamics simulations of different DNA duplexes.
Goni JR, Perez A, Torrents D, Orozco M: Determining promoter location based on DNA structure first-principles calculations. Genome Biol 2007, 8(12):R263.

Tilt
Dinucleotide flexibility information was obtained by molecular dynamics simulations of different DNA duplexes.
Goni JR, Perez A, Torrents D, Orozco M: Determining promoter location based on DNA structure first-principles calculations. Genome Biol 2007, 8(12):R263.

Roll
Dinucleotide flexibility information was obtained by molecular dynamics simulations of different DNA duplexes.
Goni JR, Perez A, Torrents D, Orozco M: Determining promoter location based on DNA structure first-principles calculations. Genome Biol 2007, 8(12):R263.

Shift
Dinucleotide flexibility information was obtained by molecular dynamics simulations of different DNA duplexes.
Goni JR, Perez A, Torrents D, Orozco M: Determining promoter location based on DNA structure first-principles calculations. Genome Biol 2007, 8(12):R263.

Slide
Dinucleotide flexibility information was obtained by molecular dynamics simulations of different DNA duplexes.
Goni JR, Perez A, Torrents D, Orozco M: Determining promoter location based on DNA structure first-principles calculations. Genome Biol 2007, 8(12):R263.

Rise
Dinucleotide flexibility information was obtained by molecular dynamics simulations of different DNA duplexes.
Goni JR, Perez A, Torrents D, Orozco M: Determining promoter location based on DNA structure first-principles calculations. Genome Biol 2007, 8(12):R263.

Major Groove Width
Parameter was calculated using molecular modelling techniques.
Karas H, Knuppel R, Schulz W, Sklenar H, Wingender E: Combining structural analysis of DNA with search routines for the detection of transcription regulatory elements. Comput Appl Biosci 1996, 12(5):441-446.

Major Groove Depth
Parameter was calculated using molecular modelling techniques.
Karas H, Knuppel R, Schulz W, Sklenar H, Wingender E: Combining structural analysis of DNA with search routines for the detection of transcription regulatory elements. Comput Appl Biosci 1996, 12(5):441-446.

Major Groove Size
Parameter calculation was based on the interactions of exocyclic groups in the grooves.
Gorin AA, Zhurkin VB, Olson WK: B-DNA twisting correlates with base-pair morphology. J Mol Biol 1995, 247(1):34-48.

Major Groove Distance
Parameter calculation was based on the interactions of exocyclic groups in the grooves.
Gorin AA, Zhurkin VB, Olson WK: B-DNA twisting correlates with base-pair morphology. J Mol Biol 1995, 247(1):34-48.

Minor Groove Width
Parameter was calculated using molecular modelling techniques.
Karas H, Knuppel R, Schulz W, Sklenar H, Wingender E: Combining structural analysis of DNA with search routines for the detection of transcription regulatory elements. Comput Appl Biosci 1996, 12(5):441-446.

Minor Groove Depth
Parameter was calculated using molecular modelling techniques.
Karas H, Knuppel R, Schulz W, Sklenar H, Wingender E: Combining structural analysis of DNA with search routines for the detection of transcription regulatory elements. Comput Appl Biosci 1996, 12(5):441-446.

Minor Groove Size
Parameter calculation was based on the interactions of exocyclic groups in the grooves.
Gorin AA, Zhurkin VB, Olson WK: B-DNA twisting correlates with base-pair morphology. J Mol Biol 1995, 247(1):34-48.

Minor Groove Distance
Parameter calculation was based on the interactions of exocyclic groups in the grooves.
Gorin AA, Zhurkin VB, Olson WK: B-DNA twisting correlates with base-pair morphology. J Mol Biol 1995, 247(1):34-48.

 

 

 

List of trinucleotide conformational descriptors employed in DRV:

 

 

Bendability (DNAse)
The trinucleotide bendability models the bendability of the DNA towards the major groove. Sections with high values are more bendable than regions with a low value.
Munteanu MG, Vlahovicek K, Parthasarathy S, Simon I, Pongor S: Rod models of DNA: sequence-dependent anisotropic elastic modelling of local bending phenomena. Trends Biochem Sci 1998, 23(9):341-347.

Bendability (consensus)
The trinucleotide bendability models the bendability of the DNA towards the major groove. Sections with high values are more bendable than regions with a low value.
Munteanu MG, Vlahovicek K, Parthasarathy S, Simon I, Pongor S: Rod models of DNA: sequence-dependent anisotropic elastic modelling of local bending phenomena. Trends Biochem Sci 1998, 23(9):341-347.

Nucleosome positioning
Nucleosome positioning model based upon observations as to which triplet sequences tend to favour locations on the concave or the convex surface of a bent double helix.
Goodsell DS, Dickerson RE: Bending and curvature calculations in B-DNA. Nucleic Acids Res 1994, 22(24):5497-5503.

Consensus-Rigid
Vlahovicek K, Kajan L, Pongor S: DNA analysis servers: plot.it, bend.it, model.it and IS. Nucleic Acids Res 2003, 31(13):3686-3687.

Dnase I
Structural parameters characterizing the bending propensity of trinucleotides were deduced from DNase I digestion data using simple probabilistic models.
Brukner I, Sanchez R, Suck D, Pongor S: Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J 1995, 14(8):1812-1818.

Dnase I-Rigid
Structural parameters characterizing the bending propensity of trinucleotides were deduced from DNase I digestion data using simple probabilistic models.
Brukner I, Sanchez R, Suck D, Pongor S: Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J 1995, 14(8):1812-1818.

Nucleosome
NPP is a trinucleotide model that calculates the unlikeliness of the sequence being within a nucleosome. High values represent regions with a lower likelihood of nucleosome appearance.
Satchwell SC, Drew HR, Travers AA: Sequence periodicities in chicken nucleosome core DNA. J Mol Biol 1986, 191(4):659-675.

Nucleosome-Rigid
NPP is a trinucleotide model that calculates the unlikeliness of the sequence being within a nucleosome. High values represent regions with a lower likelihood of nucleosome appearance.
Satchwell SC, Drew HR, Travers AA: Sequence periodicities in chicken nucleosome core DNA. J Mol Biol 1986, 191(4):659-675.

 

 

List of pentanucleotide conformational descriptors employed in DRV:

 

 

Minor Groove Width (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Tianyin Zhou, Lin Yang, Yan Lu, Iris Dror, Ana Carolina Dantas Machado, Tahereh Ghane, Rosa Di Felice and Remo Rohs DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res 2013, 41 Web Server issue W56-W62

Helix Twist (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Tianyin Zhou, Lin Yang, Yan Lu, Iris Dror, Ana Carolina Dantas Machado, Tahereh Ghane, Rosa Di Felice and Remo Rohs DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res 2013, 41 Web Server issue W56-W62

Roll (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Tianyin Zhou, Lin Yang, Yan Lu, Iris Dror, Ana Carolina Dantas Machado, Tahereh Ghane, Rosa Di Felice and Remo Rohs DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res 2013, 41 Web Server issue W56-W62

Tilt (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Slide (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Shift (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Rise (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Propeller Twist (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Tianyin Zhou, Lin Yang, Yan Lu, Iris Dror, Ana Carolina Dantas Machado, Tahereh Ghane, Rosa Di Felice and Remo Rohs DNAshape: a method for the high-throughput prediction of DNA structural features on a genomic scale. Nucleic Acids Res 2013, 41 Web Server issue W56-W62

Buckle (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Opening (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Shear (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Stretch (DNAshapeR)
DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

Stagger (DNAshapeR)

DNA shape descriptor derived from Monte Carlo simulation based pentamer data.
Jinsen Li, Jared M. Sagendorf, Tsu-Pei Chiu, Marco Pasi, Alberto Perez and Remo Rohs: Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding Nucleic Acids Res 2017, 45(22):12877–12887

 

DRV visualisation concept

 

The DRV visualisation concept has four major components. The first two are coloured 2D displays of the FGDR and PCD representations of the input DNA fragment. All three DRV modules generate different types of FGDR representation, while the PCD view appears only in the DNA Plotter and Interface Plotter outputs. The third component is a 2D all-atom view of the DNA, which appears on the DNA Plotter and Interface Plotter result pages. The fourth component is a molecular graphics output that displays the 3D structure of DRV-compatible nucleoprotein complexes within the Interface Plotter module.

 

The FGDR view of the displayed DNA segment shows the functional group pattern of the major and minor grooves as a sequentially ordered stretch of coloured circles (Fig. 7). The different functional groups appear in different colours, as indicated by the colouring legend on the left-hand side of the panel. The colouring scheme can be customized in the “View settings” dialogue box, available through the right-hand side toolbar. The basic FGDR view is modified in the Motif Plotter and Interface Plotter outputs according to the special tasks of these modules.

 

 

Fig. 7. FGDR visualisation of a DNA fragment

 

 

DRV generates a colour representation of DNA sequences according to the selected PCDs. The different parameters are represented as a stack of coloured strips aligned to the sequence in the FGDR panel at the top of the page. The PCD colour strips are generated so that the minimum and maximum values of every parameter are shown in the same two colours, located at the opposite ends of the applied colour scale. Parameter values between the minimum and the maximum are displayed in proportional colour shades (Fig. 8). The PCD colouring scheme can be customized in the “View settings” dialogue box, available through the right-hand side toolbar.

 

Fig.8. PCD visualisation of a DNA fragment
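The proportional colouring described above can be sketched as a per-parameter min-max scaling followed by a linear interpolation between two end colours. This is only an illustration of the idea, not the DRV implementation: the function names, the blue and red end colours and the handling of constant tracks are assumptions.

```python
def scale_to_unit(values):
    """Min-max scale one parameter track to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant track: place every position in the middle of the scale (assumption).
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def to_rgb(t, low=(0, 0, 255), high=(255, 0, 0)):
    """Linearly interpolate between two RGB colours for t in [0, 1].

    The end colours (blue for the minimum, red for the maximum) are hypothetical.
    """
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def colour_strip(values):
    """Return one RGB colour per sequence position for a single parameter."""
    return [to_rgb(t) for t in scale_to_unit(values)]


# Hypothetical parameter values along a short DNA fragment.
strip = colour_strip([30.1, 33.4, 36.0, 31.2])
```

Because every parameter is scaled independently, the minimum of each track always receives the same end colour and the maximum the other one, whatever the physical units of the parameter are.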

 

 

Although most of the specificity determining contact points of DNA are considered to be in the major and minor grooves, many intermolecular hydrogen bonds and hydrophobic contact surfaces are located on the sugar-phosphate backbone of the double helix. The DNA backbone may undergo sequence dependent torsions that alter its local ability to participate in DNA-protein interaction, thereby becoming an indirect mediator of sequence specificity. In other DNA-protein complexes the backbone contacts are irrelevant with respect to sequence specificity, but play a role in strengthening the DNA-protein interaction. As these functions of the DNA backbone can be interesting for many researchers, DRV provides an overview of backbone contacts in its 2D all-atom DNA view pages (Fig. 9) and in the 3D molecular structure display as well.

 

 

 

Fig.9. All-atom visualisation of a DNA fragment

 

 

DNA PLOTTER

 

DNA Plotter provides a comprehensive view of the direct and indirect specificity determining features of native DNA. It can be used to visualise the functional group pattern of the major and minor grooves and the physicochemical and conformational descriptor fingerprint of a DNA segment. This function can be useful for identifying matching functional group positions and similar molecular descriptor patterns in different DNA segments.

 

DNA fragments are usually defined by a particular sequence of four characters (A, C, G, T) representing the nucleotide monomers that build up one of the strands in the highly ordered double helical DNA structure. On the sub-nucleotide level, however, a more complex sequence dependent molecular pattern can be observed along the DNA major and minor grooves. This functional group texture strongly influences the molecular function of the DNA, as it participates in DNA-protein interactions by providing donor or acceptor sites for intermolecular hydrogen bonding networks. In addition to the molecular patterns of the DNA grooves, the local conformational and physicochemical properties of DNA can also strongly influence the functional characteristics of a DNA fragment. Although the functional group pattern and the conformational and physicochemical parameters can be directly deduced from the nucleotide sequence, the conventional four letter DNA representation keeps these important functional components mostly obscured. DNA Plotter is suitable for the simultaneous visualisation of the functional group pattern of an input DNA sequence together with numerous different conformational and physicochemical DNA parameters.

 

DNA Plotter’s output screen is composed of two distinct DNA sequence visualisation panels. The upper panel contains the FGDR, and the lower one displays the PCD representation of the input nucleotide fragment. The visualisation concept of the two panels is described in the previous section of this help document. The two panels are aligned to each other on the output screen. By hovering the cursor over a position in the PCD colour strips, the actual parameter value is shown and the base pair range that was used for the calculation is highlighted in the FGDR panel (Fig. 10).

 

 

Fig.10. DNA Plotter output screen

 

MOTIF PLOTTER

 

The information storage function of DNA is highly controlled, accomplished by numerous distinct nucleotide motifs mostly responsible for anchoring regulatory proteins. Protein binding motifs of DNA are often ambiguous in their sequence, meaning that more than one nucleotide can be located in certain positions without compromising the function. DRV’s Motif Plotter module visualises the conventional nucleotide-based matrices in a new functional group based representation. This makes it possible to generate a consensus functional group description of binding sites and to reveal the core set of functional groups essential for the sequence specific docking of the different transcription factors. These consensus patterns can then be used to predict new, hitherto unidentified motifs with potentially similar protein binding characteristics. Conventional nucleotide based PFMs (Position Frequency Matrix) or IUPAC-encoded motif definitions are accepted as input. In the Motif Plotter output the circles of the FGDR schematic view serve as pie charts indicating the occurrence probability of the different functional groups in the given position. Above the FGDR panel an aligned nucleotide sequence logo is displayed, generated by the WebLogo 3 software8 (Fig. 11).

 

 

 

 

Fig. 11. Motif Plotter output screen
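The pie-chart idea can be illustrated with a short sketch that converts the nucleotide counts of one motif position into functional group probabilities for the major groove. The four-letter patterns below are the commonly cited major-groove patterns of the four base pairs (A: acceptor, D: donor, M: methyl, H: non-polar hydrogen); the reading direction, the data layout and the function name are assumptions and do not necessarily match the internal representation used by DRV.

```python
# Major-groove functional group pattern of the base pair formed by each
# nucleotide of the plotted strand (textbook patterns, assumed ordering).
MAJOR_GROOVE = {"A": "ADAM", "T": "MADA", "G": "AADH", "C": "HDAA"}


def group_probabilities(column):
    """Turn nucleotide counts of one motif position into functional group
    probabilities for each of the four major-groove positions.

    column: dict such as {"A": 3, "C": 2, "G": 1, "T": 4}
    returns: list of four dicts, one per groove position (one pie chart each)
    """
    total = sum(column.values())
    pies = [dict() for _ in range(4)]
    for base, count in column.items():
        if count == 0:
            continue  # a nucleotide that never occurs contributes no slice
        for i, group in enumerate(MAJOR_GROOVE[base]):
            pies[i][group] = pies[i].get(group, 0.0) + count / total
    return pies


# Hypothetical motif position with mixed nucleotide counts.
pies = group_probabilities({"A": 3, "C": 2, "G": 1, "T": 4})
```

A position that is ambiguous on the nucleotide level can still be fully conserved on the functional group level, which is the kind of pattern this representation is meant to expose.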

 

 

Position Frequency Matrix (PFM)

 

Many DNA binding proteins can specifically interact with more than one DNA sequence motif. This phenomenon is an especially important feature of transcription factors, the main protein components regulating the gene expression activity of cells. Position Frequency Matrices (PFM) are widely used as consensus representations of sequence motifs (Fig. 12). A PFM is a convenient tool for describing the degree to which a sequence motif can be changed at different positions without losing its functionality. There are several specialized databases (e.g. JASPAR, TRANSFAC, UniPROBE) that collect the recognition sequence information of transcription factors.

 

M={\begin{matrix}A\\C\\G\\T\end{matrix}}{\begin{bmatrix}3&6&1&0&0&6&7&2&1\\2&2&1&0&0&2&1&1&2\\1&1&7&10&0&1&1&5&1\\4&1&1&0&10&1&1&2&6\end{bmatrix}}.

Fig. 12. Position Frequency Matrix (PFM)

(https://en.wikipedia.org/wiki/Position_weight_matrix)
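As a small worked example, the matrix of Fig. 12 can be converted into per-position probabilities, and the most frequent nucleotide of each position can be read out as a consensus. The function names and the dictionary layout are illustrative assumptions and are not tied to any particular PFM file format.

```python
# The position frequency matrix shown in Fig. 12 (rows A, C, G, T).
PFM = {
    "A": [3, 6, 1, 0, 0, 6, 7, 2, 1],
    "C": [2, 2, 1, 0, 0, 2, 1, 1, 2],
    "G": [1, 1, 7, 10, 0, 1, 1, 5, 1],
    "T": [4, 1, 1, 0, 10, 1, 1, 2, 6],
}


def pfm_to_probabilities(pfm):
    """Divide every count by the column total to obtain probabilities."""
    length = len(pfm["A"])
    totals = [sum(pfm[b][i] for b in "ACGT") for i in range(length)]
    return {b: [pfm[b][i] / totals[i] for i in range(length)] for b in "ACGT"}


def consensus(pfm):
    """Return the most frequent nucleotide of every position."""
    length = len(pfm["A"])
    return "".join(max("ACGT", key=lambda b: pfm[b][i]) for i in range(length))


probabilities = pfm_to_probabilities(PFM)
motif = consensus(PFM)
```

Positions 4 and 5 of this matrix are fully conserved (probability 1.0 for G and T, respectively), while the first position tolerates all four nucleotides.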

 

Related links:

Wikipedia: PWM

 

 

IUPAC

 

IUPAC is a qualitative description of sequence uncertainty (Fig.13.). It only denotes the presence of different nucleotides in the given sequence positions but provides no quantitative information about their relative ratio.

 

Fig.13. IUPAC Nucleotide ambiguity codes
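The qualitative nature of the IUPAC notation can be shown by expanding every ambiguity code into the set of nucleotides it permits. The code table follows the standard IUPAC nucleotide codes; the helper names are illustrative assumptions.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(motif):
    """Return the allowed nucleotides for each position of an IUPAC motif."""
    return [IUPAC[code] for code in motif.upper()]


def matches(motif, sequence):
    """Check whether a plain A/C/G/T sequence is compatible with the motif."""
    allowed = expand(motif)
    return len(allowed) == len(sequence) and all(
        base in options for base, options in zip(sequence.upper(), allowed)
    )


allowed = expand("TATAWR")
```

The expansion only states which nucleotides may occur; unlike a PFM, it carries no information on how often each of them occurs.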

Related links:

IUPAC
IUPAC recommendation 1970(Biochemical Journal)
IUPAC recommendation 1984(PNAS)
IUPAC recommendation 1984

 

 

TRANSFAC database

TRANSFAC is a proprietary database of eukaryotic transcription factors, the position frequency matrices of their recognition sequences, and the genomic positions they can be associated with. The TRANSFAC PFM format has become one of the gold standards of the field, and is also accepted as input by the Motif Plotter tool of DRV.

 

JASPAR database

JASPAR is a publicly accessible repository of expert curated position frequency matrices representing DNA targeting sequence motifs for proteins involved in different DNA related functions. JASPAR has a widely used PFM format that is accepted as input by the Motif Plotter tool of DRV. A local copy of the Core section of the JASPAR database can be searched from the DRV Motif Plotter input form, and the JASPAR PFM of interest can be directly uploaded for visualisation.

 

 

UniPROBE database

UniPROBE collects data on the DNA binding specificities of proteins, generated by universal protein-binding microarray (PBM) technology. It provides information on k-mers, position weight matrices and graphical sequence logos. PBM is a highly parallel in vitro microarray technology for the high-throughput characterization of the sequence specificities of DNA-protein interactions. In PBM studies a combinatorial oligonucleotide library is synthesized on the microarray surface and the binding of the labelled protein is monitored by fluorescent signal detection. A local copy of the UniPROBE database can be browsed from the DRV Motif Plotter input form, and the UniPROBE PFM of interest can be directly submitted for visualisation. The UniPROBE database is organised according to a publication-based logic. In the publications of the UniPROBE database different algorithms (Seed and Wobble and BEEML) are used for processing the PBM data, resulting in different PFM formats. DRV can display both formats; however, if a Seed and Wobble database entry contains more than one PFM, only the one with the highest enrichment score is displayed9–11.

 

 

DNA Logo

 

The sequence logo is an intuitive graphical representation of a multiple alignment of generally short nucleotide sequences, reflecting the conservation of nucleotides in the different sequence positions. There are different sequence logo types. In the information logo the vertical axis represents the information content of the individual sequence positions in bits. The maximal value is two, and the varying height of the letter stack is proportional to the conservation of the particular sequence position (Fig. 14, A). Error bars indicating the Bayesian 95% confidence intervals can optionally be plotted on top of the letter stacks (Fig. 14, B). In the frequency logo the overall height of the letter stack, symbolising the nucleotide composition of a sequence position, is the same throughout the entire logo, and the extent of conservation is indicated by the size ratio of the different letters (Fig. 14, C). Sequence logos appearing in DRV were generated by the WebLogo 3 software.

 

 

 

Fig. 14. Sequence logo types: A) Information logo, B) Information logo with error bars, C) Frequency logo
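The stack height of an information logo can be approximated as two bits minus the Shannon entropy of the nucleotide distribution at the given position. The sketch below uses this simplified formula; it omits the small-sample correction that logo generators may apply, so the values are not expected to reproduce the WebLogo 3 output exactly. The function names are assumptions.

```python
import math


def information_content(counts):
    """Information content (bits) of one position from its A, C, G, T counts.

    Simplified formula: 2 - H, where H is the Shannon entropy in bits.
    """
    total = sum(counts)
    entropy = 0.0
    for count in counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


def letter_heights(counts, letters="ACGT"):
    """Height of each letter in the stack: relative frequency times information."""
    total = sum(counts)
    info = information_content(counts)
    return {letter: count / total * info for letter, count in zip(letters, counts)}


conserved = information_content([0, 0, 10, 0])   # fully conserved position
uniform = information_content([5, 5, 5, 5])      # no preference at all
```

A fully conserved position reaches the maximum of two bits, whereas a position with four equally frequent nucleotides has zero information content and therefore an empty letter stack.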

 

 

 

INTERFACE PLOTTER

 

Atomic level investigation of interacting molecular interfaces is essential for revealing the mechanisms behind sequence specific DNA-protein recognition. There are thousands of DNA-protein structures deposited in the PDB database, which is an extensive source of structural information for studies aiming to decipher the specificity determining positions of DNA-protein interacting surfaces. However, the complexity of the 3D structure of macromolecules makes it very difficult to recognise the meaningful patterns and regularities by eye, even with special computer tools developed for molecular visualisation. One possible way to overcome this problem is to generate simplified representations of DNA motifs that schematically display the molecular interaction network connecting the two macromolecules to each other. The Interface Plotter tool of DRV achieves this goal by projecting the intermolecular hydrogen bond pattern onto a schematic view of the chemical functional group texture characteristic of the major and minor grooves of the interacting DNA segment. In addition, the coordinated display of colour stripes representing an extensive set of physicochemical and structural DNA descriptors helps to identify the corresponding components of the base and shape readout mechanisms (Fig. 15). The Interface Plotter module also provides realistic 3D views preconfigured for the visualisation of the functional elements participating in the establishment of the DNA-protein contact interfaces. The 3D viewer function of DRV is described in detail in a separate section below.

 

 

 

Fig. 15. Interface Plotter output screen

 

 

Please consider that the Interface Plotter displays the total collection of potential intermolecular H-bonding possibilities rather than a definite H-bonding state. According to the visualisation concept of DRV, all potential DNA-protein intermolecular hydrogen bonds conforming to the defined geometrical H-bond prediction criteria are displayed. DRV doesn’t investigate whether a hydrogen bond interferes with another one, or whether they can coexist simultaneously. This is also true for the hydrogen bonds of water bridges. In this case all potential hydrogen bond pair combinations of a water molecule connecting the interacting macromolecular partners are shown as separate water bridges. DRV investigates neither whether they can coexist, nor whether there are any other factors affecting the probability of their establishment.
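The enumeration of water bridges described above can be sketched as a Cartesian product: every DNA atom hydrogen-bonded to a water molecule is paired with every protein atom hydrogen-bonded to the same water. The data layout, the atom labels and the function name are hypothetical and serve only to illustrate the counting.

```python
from itertools import product


def water_bridges(water_bonds):
    """List every DNA-water-protein combination as a separate water bridge.

    water_bonds: dict mapping a water identifier to the DNA atoms and the
    protein atoms it is predicted to hydrogen bond with (hypothetical layout).
    No check is made on whether the bridges can coexist.
    """
    bridges = []
    for water, partners in water_bonds.items():
        for dna_atom, protein_atom in product(partners["dna"], partners["protein"]):
            bridges.append((dna_atom, water, protein_atom))
    return bridges


# Hypothetical example: one water contacts two DNA atoms and two protein atoms,
# another water contacts the DNA only and therefore forms no bridge.
bridges = water_bridges({
    "HOH 501": {"dna": ["DG5:N7", "DG5:O6"], "protein": ["ARG12:NH1", "ASN40:ND2"]},
    "HOH 502": {"dna": ["DT8:O4"], "protein": []},
})
```

In this example the first water molecule yields four separate water bridges, although a single water can take part in only a limited number of hydrogen bonds at a time.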

 

 

PDB database

 

The Protein Data Bank (PDB) archive is the major repository of macromolecular three-dimensional data. It is managed by wwPDB, an organization founded by the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), the Protein Data Bank in Europe (PDBe), the Protein Data Bank Japan (PDBj) and the Biological Magnetic Resonance Data Bank (BMRB). All the three-dimensional DNA-protein structure data processed and visualised by DRV are derived from PDB. PDB files are filtered to obtain a specific dataset that is compatible with the DRV visualisation concept and optimized for getting the most reliable information for graphical presentation. The current version of the DRV database contains 4250 pre-processed DNA-protein structure files originating from the PDB database. 2155 of the 4250 structure files are compatible with the visualisation concept of the DRV server.

 

The reasons why PDB structures can be excluded from the DRV visualisation pipeline are listed below (Fig. 16).

 

Fig. 16. List of possible DRV incompatibility situations

 

Custom DNA-protein structure files can also be uploaded to DRV for visualisation. DRV requires a "clean" Brookhaven (PDB) file, where all the atoms are accurately named and ordered. The background software tools processing the uploaded PDB files are expected to work with most "uncleaned" Brookhaven files as well, but it is advisable to run a utility program to clean up the PDB file before uploading. Before uploading please ensure that the structure is complete and that its information is correct and consistent in every respect. Please note that DRV was developed especially for the visualisation of DNA-protein complexes that contain a regular double helical DNA structure. DRV processes only those PDB files where two complementary DNA strands are present, each with a unique chain ID, and both nucleotide sequences are listed as SEQRES records. The maximum size of an uploaded PDB file is 100 MB.

X-Ray Crystallography

 

X-Ray Crystallography (XRC) is the leading methodology for the determination of macromolecular 3D structures; it acquires the structural information from the crystallized form of the investigated macromolecules. Within the crystals the macromolecules are organized into a structured hierarchical system, resulting in a regular diffraction image upon exposure to an X-ray beam. The unit cell of the crystal, and then the entire crystal, can be built up from the so-called asymmetric units. Although this is the smallest compartment that can be used to produce the crystal by applying the standard symmetry and transformation operations, asymmetric units often contain more than one equivalent copy of the macromolecular assembly. These macromolecular assemblies are referred to as biological units (or biological assemblies) and always hold the same set of macromolecular chains. The computer files deposited in structural databases contain the conformational data of an asymmetric unit, which often consists of more than one biological unit. The biological units within an asymmetric unit might differ from each other. These variations mostly affect the flexible parts of the macromolecular structures. DRV can independently visualise the intermolecular DNA-protein hydrogen bonding networks of the different biological units, allowing users to get an impression of the extent to which the dynamic nature of the given structure might contribute to the sequence specific DNA recognition process.

 

Most of the macromolecular structures determined by XRC have a resolution of around 2 Angstrom. This resolution is generally inadequate for detecting the hydrogen atoms of the macromolecules, the ligands or the water molecules around them. Most of the XRC structures are deposited without hydrogen atoms, which can later be added by using computational tools.

 

In a subset of the XRC determined molecular structures some atoms could not be completely resolved, so alternate atomic locations may appear in the PDB data files. This means that the molecules within the crystal differ in the exact position of the particular atom, and "partial occupancies" are assigned to the alternate locations according to how often the atom is found in each of them. These alternate locations often reflect the flexibility of the monomers, or of bigger segments of the macromolecules. PDB files record this information by an AltLoc indicator. In PDB data files the distinct locations of an atom are distinguished by letter marks (A, B, C,…). Many structural biology software tools deal with only one of the AltLoc atom locations, generally the one marked with letter A or the most frequent one. Since the potential molecular flexibility can be an important component of DNA-protein recognition, DRV explicitly visualises the AltLoc related structural and hydrogen bonding properties, which may appear in the output as shown in Fig. 17 and Fig. 18.

 

 

 

Fig. 17. Indication of AltLoc information in DRV visual reports

 

 

Fig. 18. Indication of AltLoc information in Bond Info popup table
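As an illustration of how AltLoc information is stored, the sketch below collects the alternate locations and occupancies of atoms from fixed-column ATOM/HETATM records (alternate location indicator in column 17, occupancy in columns 55-60 of the PDB format). It is not the parser used by DRV; the helper that builds example records and all atom data are hypothetical.

```python
from collections import defaultdict


def make_atom_line(serial, name, alt, res, chain, resseq, x, y, z, occ):
    """Build a fixed-column ATOM record for the example (hypothetical helper)."""
    return (f"ATOM  {serial:>5} {name:<4}{alt:1}{res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{0.0:6.2f}")


def altloc_atoms(lines):
    """Group alternate locations by (chain, residue number, residue, atom)."""
    groups = defaultdict(list)
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 60:
            continue
        alt = line[16]                      # column 17: AltLoc indicator
        if alt == " ":
            continue                        # atom has a single location
        key = (line[21], int(line[22:26]), line[17:20].strip(), line[12:16].strip())
        groups[key].append((alt, float(line[54:60])))   # columns 55-60: occupancy
    return dict(groups)


# Hypothetical records: one atom with two alternate locations, one without.
records = [
    make_atom_line(1, "NZ", "A", "LYS", "C", 45, 11.104, 6.134, -6.504, 0.60),
    make_atom_line(2, "NZ", "B", "LYS", "C", 45, 11.639, 6.071, -5.147, 0.40),
    make_atom_line(3, "N7", " ", "DG", "A", 5, 9.500, 3.200, -2.100, 1.00),
]
alternates = altloc_atoms(records)
```

Tools that keep only one location would typically retain location A in this example, whereas a display that keeps both can show hydrogen bonds that exist only for location B.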

 

 

NMR spectroscopy

               

NMR spectroscopy is one of the most important methodologies used to obtain structural information on molecular systems12. Although NMR is mostly used for the investigation of small molecules, it is also suitable for macromolecular structure determination. In contrast to XRC, NMR can record structural information in solution, which is much closer to the physiological condition of proteins than the crystal environment. An important advantage of NMR is that it is suitable for investigating the dynamic features of macromolecules. The major bottleneck of macromolecular NMR studies is that the maximal size of target molecules that can be resolved in a routine NMR structure determination process is around 25 kDa. Usually only a part of the hydrogen positions is established during NMR spectroscopic investigations, and the remaining hydrogens are inserted computationally during the NMR data processing workflow. The majority of NMR models submitted to the PDB contain most of the hydrogen atoms that occur in the investigated macromolecule.

 

The final part of the NMR structure determination workflow consists of computational modelling steps, where the macromolecules are folded in silico into conformations compatible with the inter-atomic restraints determined during the first, experimental phase. The restraints are generally compatible with several different macromolecular conformations, so the output of an NMR measurement often contains an ensemble of multiple molecular models. The number of models in an NMR ensemble deposited in the molecular structure databases depends on the authors and usually varies between 1 and 100, with an average of 20. The different models in the ensemble show a certain level of structural variation that might reflect either the flexibility and the thermal motion of the investigated structure, or just the uncertainty of the atomic positions resulting from the insufficient number of available conformational restraints. Since the flexibility and the dynamic nature of the intermolecular interface can be an important specificity factor, DRV provides the opportunity to visualise the hydrogen bonding pattern of the different NMR models either individually or as part of an animated slideshow.
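In PDB files the members of an NMR ensemble are delimited by MODEL and ENDMDL records. The sketch below splits the coordinate records of such a file into individual models, which is the kind of separation needed before each model can be analysed on its own. The function name and the example records are assumptions, and the sketch is not the DRV processing code.

```python
def split_models(lines):
    """Group ATOM/HETATM records by the MODEL serial number they belong to."""
    models = {}
    current = None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line.split()[1])   # model serial number
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif current is not None and record in ("ATOM", "HETATM"):
            models[current].append(line)
    return models


# Hypothetical two-model ensemble (coordinate records shortened for brevity).
ensemble = [
    "MODEL        1",
    "ATOM      1  N7   DG A   5 ...",
    "ATOM      2  NZ  LYS C  45 ...",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N7   DG A   5 ...",
    "ATOM      2  NZ  LYS C  45 ...",
    "ENDMDL",
]
models = split_models(ensemble)
```

Each entry of the resulting dictionary can then be passed separately to a hydrogen bond prediction step, giving one hydrogen bonding pattern per model.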

 

 

Hydrogen bonds

 

A hydrogen bond is a relatively weak, dipole-dipole type of connection that plays an essential role in establishing and stabilizing the conformation of macromolecules. Hydrogen bonds have an important function in the ligand binding process and in the mechanisms leading to the formation of macromolecular interactions. Hydrogen bonds are always formed between two electronegative atoms, one bonded to a hydrogen and the other having a lone pair of electrons. The strength of hydrogen bonds varies within a range; however, a hydrogen bond is always stronger than an ordinary dipole-dipole contact and always weaker than a covalent bond. Hydrogen bonds play an important role in the DNA-protein recognition process. Some of the hydrogen bonds are formed between the protein and the sugar-phosphate backbone of the DNA, while others are located in the major or minor grooves of the double helix. As the backbones are invariant along the DNA, the former contacts can have a role solely in strengthening the DNA-protein interactions, while hydrogen bonds formed with the chemically very diverse surface of the DNA grooves allow the establishment of a very sophisticated specificity controlling molecular mechanism. In addition to the direct DNA-protein hydrogen bonds, water molecules can bind the two macromolecules together by forming simultaneous hydrogen bonds with both interacting partners. These indirect, water mediated DNA-protein connections are known as water bridges. Since a water molecule can theoretically donate or accept up to four hydrogen bonds at the same time, a very complex hydrogen bonding network can be built up around the water molecules located in the contact interfaces. It has to be mentioned that presumably not all the potential hydrogen bonds exist at the same time, and the intermolecular hydrogen bond network connecting the DNA and the protein is continuously changing and dynamically reorganized13.
The part of sequence specific DNA recognition established by the major and minor groove hydrogen bonding pattern is often referred to as base (or direct) DNA readout14.

 

Hydrogen bonds can only be formed if the geometric parameters of the participating atoms meet certain requirements. These geometric parameters have to be within distance and angle constraints, outside of which hydrogen bonds cannot be established. Such geometric considerations provide the theoretical background of software tools designed for supplementing the deposited macromolecular structures with the missing hydrogens and for predicting the missing hydrogen bonds. There are several tools for predicting hydrogen bonds, such as HBPLUS15, HBAT16 or pyrHBfind17. HBPLUS, a very popular and widely used hydrogen bond calculation software developed at the European Bioinformatics Institute (Hinxton, UK), was integrated into the data processing pipeline of DRV. HBPLUS performs geometry-based hydrogen atom positioning and hydrogen bond prediction. For a hydrogen bond to be defined, 5 parameters have to meet the predefined requirements. These parameters are shown in Fig. 19.

 


Fig. 19. Geometric parameters of the HBPLUS hydrogen bond prediction

(Zho H et al. 2008 PLoS ONE 3(4): e1926.)

 

 

DistMaxDA: Maximum possible distance of the H-bond Donor and Acceptor atoms. [default value: 3.9 Å]

DistMaxHA: Maximum possible distance of the H atom and the Acceptor atom. [default value: 2.5 Å]

AngleMinDHA: Minimum possible angle defined by the Donor-Hydrogen-Acceptor atom triplet. [default value: 90°]

AngleMinHAA: Minimum possible angle defined by the Hydrogen-Acceptor-Acceptor Antecedent atom triplet. [default value: 90°]

AngleMinDAA: Minimum possible angle defined by the Donor-Acceptor-Acceptor Antecedent atom triplet. [default value: 90°]
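The five criteria listed above can be sketched as a simple geometric test on the coordinates of the donor (D), the hydrogen (H), the acceptor (A) and the acceptor antecedent (AA) atoms. This is a minimal illustration using the default values of the list, not the HBPLUS algorithm itself; the function names, the inclusive comparisons and the example coordinates are assumptions.

```python
import math

# Default thresholds taken from the parameter list above.
DEFAULTS = {
    "DistMaxDA": 3.9,     # Å
    "DistMaxHA": 2.5,     # Å
    "AngleMinDHA": 90.0,  # degrees
    "AngleMinHAA": 90.0,  # degrees
    "AngleMinDAA": 90.0,  # degrees
}


def angle(a, b, c):
    """Angle a-b-c in degrees, measured at the middle atom b."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding errors
    return math.degrees(math.acos(cosine))


def is_hydrogen_bond(d, h, a, aa, params=DEFAULTS):
    """Return True if all five geometric criteria are satisfied."""
    return (
        math.dist(d, a) <= params["DistMaxDA"]
        and math.dist(h, a) <= params["DistMaxHA"]
        and angle(d, h, a) >= params["AngleMinDHA"]
        and angle(h, a, aa) >= params["AngleMinHAA"]
        and angle(d, a, aa) >= params["AngleMinDAA"]
    )


# Hypothetical, perfectly linear arrangement: D-H...A-AA along the x axis.
linear = is_hydrogen_bond((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0), (4.1, 0, 0))
# Same atoms, but with the acceptor moved too far away from the donor.
too_far = is_hydrogen_bond((0, 0, 0), (1.0, 0, 0), (5.0, 0, 0), (6.2, 0, 0))
```

Tightening any of the thresholds in the `params` dictionary can only remove hydrogen bonds from the prediction, never add new ones.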

 

There is another geometric constraint used for predicting the water bridge connections:

 

DistMinWater: Minimum possible distance of water bridge. Distance of the water bridge means the sum of the length of the two hydrogen bonds involved. [default value: 5 Å]

 

The parameters listed here can be used to fine tune the hydrogen bond prediction of the Interface Plotter module of DRV. The default values of the parameters represent the maximum values that, according to our current knowledge, can rationally be applied for hydrogen bond predictions15,18. The default values result in the highest number of predicted hydrogen bonds, and constraining these parameters further will reduce this number.

 

 

In the FGDR panel of the DRV Interface plotter tool the different types of hydrogen bonds and water bridges are indicated as shown in Fig. 20.

 

 

Fig. 20. Visualisation of the different hydrogen bond types in the graphical output of the DRV Interface Plotter module

 

Hydrophobic contacts

Hydrophobic interactions are based on the phenomenon that apolar structures tend to attach to each other in aqueous solution, preserving in this way as much as possible of the water H-bonding network of the surrounding environment. Hydrophobic surfaces that are close enough to each other tend to exclude water molecules from the enclosed volume, thereby increasing the entropy of the system. Such hydrophobic surface contacting events often occur during the DNA-protein recognition process. In addition to direct H-bond contacts and water bridges, hydrophobic interactions can also be an important component of the base readout mechanism. The sequence specific hydrophobic interactions are located on the apolar structures of the major and minor grooves, such as the C7 methyl group of thymine (major groove), the C5 carbon atom of cytosine (major groove) or the C2 carbon atom of adenine (minor groove); however, hydrophobic contacts can also appear on the invariant sugar components of the DNA backbones. An individual hydrophobic interaction is much weaker than a hydrogen bond; however, the relative abundance of hydrophobic contacts might result in a considerable attraction force. In DRV hydrophobic contacts are indicated in the 2D FGDR, the all-atom DNA and the 3D representations of the investigated structure (Fig. 21). The applied conditions for detecting hydrophobic contacts are as follows: I) both partners are carbon atoms, II) the distance between them is less than the value defined by the DistMax HpC parameter (default: 4 Å). The atomic distances are calculated by the HBPLUS software.

 

 

 

Fig. 21. Hydrophobic contacts in the FGDR, 2D all-atom and 3D DNA representations
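The two detection conditions for hydrophobic contacts (both partners are carbon atoms, and their distance is below the DistMax HpC threshold) can be sketched as a pairwise distance check. In DRV the distances come from HBPLUS; the sketch below computes them directly from coordinates, and its data layout, atom labels and function name are hypothetical.

```python
import math


def hydrophobic_contacts(dna_atoms, protein_atoms, dist_max_hpc=4.0):
    """Return (DNA atom, protein atom, distance) for every carbon-carbon pair
    closer than dist_max_hpc (Å). Atoms are dicts with name, element and xyz."""
    contacts = []
    for dna in dna_atoms:
        if dna["element"] != "C":
            continue
        for prot in protein_atoms:
            if prot["element"] != "C":
                continue
            distance = math.dist(dna["xyz"], prot["xyz"])
            if distance < dist_max_hpc:
                contacts.append((dna["name"], prot["name"], round(distance, 2)))
    return contacts


# Hypothetical coordinates: a thymine methyl carbon near two protein carbons.
dna = [
    {"name": "DT8:C7", "element": "C", "xyz": (0.0, 0.0, 0.0)},
    {"name": "DT8:O4", "element": "O", "xyz": (0.0, 0.0, 2.3)},
]
protein = [
    {"name": "ALA30:CB", "element": "C", "xyz": (3.5, 0.0, 0.0)},
    {"name": "LEU33:CD1", "element": "C", "xyz": (6.0, 0.0, 0.0)},
    {"name": "SER31:OG", "element": "O", "xyz": (1.0, 0.0, 0.0)},
]
contacts = hydrophobic_contacts(dna, protein)
```

Only the C7 to CB pair is reported here: the second carbon is farther away than 4 Å, and the oxygen atoms are skipped regardless of their distance.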

 

 

3D view of molecular structures

 

DRV Interface Plotter offers an internal 3D molecular viewer for the visualisation of the PDB structure used for the generation of the FGDR based hydrogen bond map. This function can be accessed with the “3D view” button on the right-hand side control panel of the Interface Plotter result page. After pressing this button an NGL based 3D structure viewer application opens in a new browser tab, showing the actual DNA-protein complex (Fig. 22). The structure viewer is preconfigured to focus the 3D display on the specificity determining contact structures of the DNA-protein interaction. In the 3D structure viewer all the hydrogen bonds of the DNA-protein interface are displayed except those that are located on the invariant DNA backbone. The default view shows the DNA-protein complex in a way that draws attention to the intermolecular contact regions. The functional groups of the major and minor grooves in these areas are indicated by coloured spheres (red: donor, blue: acceptor, yellow: methyl, green: hydrogen), and the backbone of the participating protein regions is highlighted. The amino acid sidechains of the interacting peptide fragments are displayed, and the direct intermolecular hydrogen bonds and the water bridges connecting the protein and DNA components are shown as pink and orange rods, respectively. Pressing the “Settings” button in the upper left corner of the structure viewer window opens a configuration panel, where the 3D display parameters can be adjusted. Among other options, the display can be restricted here to the DNA or to the protein partners alone. The visualisation style of the macromolecular structures (cartoon, surface, trace) and the appearance of the hydrogen bond and functional group indicators can also be selected here. For DNA there are two special surface colouring options, one for displaying the functional groups, and another for showing the PCD pattern of the double helix.

 

 

 

Fig. 20. 3D structure viewer module

 

 

The 3D structure viewer can zoom and rotate the displayed structure by mouse and keyboard actions. The mouse and keyboard controls of the 3D structure viewer are listed below (Fig. 22).

 

 

Fig. 22. The mouse and keyboard controls of DRV 3D structure viewer

 

The DRV Interface Plotter module projects the hydrogen bonding pattern of the PDB structure file onto the FGDR texture of the major and minor grooves. The FGDR display approach assumes that the entire DNA adopts either the B or the A DNA conformation, where the major and minor grooves are intact. However, sometimes this assumption is not completely fulfilled, since part of the DNA adopts a special conformation where the DNA grooves are compromised. This might result in an unrealistic hydrogen bond pattern in the affected part of the FGDR representation. Since the remaining part of the FGDR visualisation can be plotted correctly and remains informative, these structures were not excluded from the DRV database, and were kept available for visualisation. This can be the case, for example, when methyltransferases flip out certain bases of the DNA helix (Fig. 23)19, or when the DNA has single stranded flanking sequences20 (Fig. 24). Flanking sequences clearly appear in FGDR plots, but all the other irregularities can only be recognized in the 3D view. Hydrogen bonds of functional groups that are normally buried in the canonical double helix structure and become accessible only in irregular DNA structures are not displayed in the FGDR image, but remain visible in the 3D view. In this situation a “Bond ignored” error message appears above the FGDR plot. It is strongly recommended to always use the 3D structure viewer after generating the FGDR hydrogen bond map, in order to verify that the DNA adopts a canonical conformation with intact major and minor grooves and to reveal possible structural anomalies similar to those mentioned above.

 

 

Fig. 23. PDB (2y7h): DNA-bound methylase complex from the Type I restriction-modification enzyme

 

Fig. 24. PDB (2is6): UvrD-DNA-ADPMgF3 ternary complex
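The distinction between hydrogen bonds that can be plotted on the FGDR map and those that are ignored can be sketched in Python as follows. This is an illustration under stated assumptions, not the DRV algorithm: the groove tables list only the polar base-edge atoms that are exposed in a canonical double helix (standard PDB atom and residue names), backbone contacts are assumed to have been removed beforehand, and the function names and the condition that triggers the message are hypothetical.

```python
# Polar base-edge atoms exposed in each groove of a canonical double helix.
# Residue names follow the PDB convention for DNA (DA, DG, DC, DT).
MAJOR_GROOVE = {
    "DA": {"N7", "N6"},
    "DG": {"N7", "O6"},
    "DC": {"N4"},
    "DT": {"O4"},
}
MINOR_GROOVE = {
    "DA": {"N3"},
    "DG": {"N3", "N2"},
    "DC": {"O2"},
    "DT": {"O2"},
}


def classify(residue, atom):
    """Assign a DNA base atom to a groove, or mark it as ignored.

    Atoms that are buried in the canonical helix (for example N1 of
    adenine, which sits in the Watson-Crick pairing interface) belong
    to neither groove and therefore cannot be placed on the FGDR map.
    """
    if atom in MAJOR_GROOVE.get(residue, set()):
        return "major"
    if atom in MINOR_GROOVE.get(residue, set()):
        return "minor"
    return "ignored"


def split_bonds(contacts):
    """Separate plottable contacts from those that must be ignored."""
    plotted, ignored = [], []
    for residue, atom in contacts:
        target = ignored if classify(residue, atom) == "ignored" else plotted
        target.append((residue, atom))
    return plotted, ignored


# Hypothetical contacts; the last one involves a flipped-out adenine.
contacts = [("DG", "O6"), ("DT", "O2"), ("DA", "N1")]
plotted, ignored = split_bonds(contacts)
if ignored:
    print("Bond ignored:", ignored)
```

A non-empty `ignored` list corresponds to the situation in which the 3D view should be consulted, because the affected bonds exist in the structure but have no position on the groove map.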

 

 

 

 

External software

 

The following external software was integrated into the background data processing pipeline of DRV or was used during program development:

 

Hydrogen bond prediction: HBPLUS

Sequence logo generation:  WebLogo 3

Generation of some physicochemical and conformational DNA descriptors: DNAshapeR

3D molecular graphics: NGL

 

JVM-based programming language: Kotlin

Statistical computing environment: R

Application framework: Vert.x

Icons: Icons8

 

References

 

1.     Bogdanove, A. J., Bohm, A., Miller, J. C., Morgan, R. D. & Stoddard, B. L. Engineering altered protein-DNA recognition specificity. Nucleic Acids Res 46, 4845–4871 (2018).

2.     Dantas Machado, A. C. et al. Evolving insights on how cytosine methylation affects protein-DNA binding. Brief Funct Genomics 14, 61–73 (2015).

3.     Jin, J. et al. The effects of cytosine methylation on general transcription factors. Scientific Reports 6, 29119 (2016).

4.     Meysman, P., Marchal, K. & Engelen, K. DNA Structural Properties in the Classification of Genomic Transcription Regulation Elements. Bioinformatics and Biology Insights 6, 155–168 (2012).

5.     Chiu, T.-P., Rao, S., Mann, R. S., Honig, B. & Rohs, R. Genome-wide prediction of minor-groove electrostatic potential enables biophysical modeling of protein–DNA binding. Nucleic Acids Research 45, 12565–12576 (2017).

6.     Chiu, T.-P. et al. DNAshapeR: an R/Bioconductor package for DNA shape prediction and feature encoding. Bioinformatics (2015) doi:10.1093/bioinformatics/btv735.

7.     Li, J. et al. Expanding the repertoire of DNA shape features for genome-scale studies of transcription factor binding. Nucleic Acids Research 45, 12877–12887 (2017).

8.     Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res 14, 1188–1190 (2004).

9.     Hume, M. A., Barrera, L. A., Gisselbrecht, S. S. & Bulyk, M. L. UniPROBE, update 2015: new tools and content for the online database of protein-binding microarray data on protein-DNA interactions. Nucleic Acids Res 43, D117–D122 (2015).

10.   Newburger, D. E. & Bulyk, M. L. UniPROBE: an online database of protein binding microarray data on protein–DNA interactions. Nucleic Acids Research 37, D77–D82 (2009).

11.   Berger, M. F. & Bulyk, M. L. Universal protein-binding microarrays for the comprehensive characterization of the DNA-binding specificities of transcription factors. Nat. Protocols 4, 393–411 (2009).

12.   Kwan, A. H., Mobli, M., Gooley, P. R., King, G. F. & Mackay, J. P. Macromolecular NMR spectroscopy for the non‐spectroscopist. The FEBS Journal 278, 687–703 (2011).

13.   Velmurugu, Y. Dynamics and Mechanism of DNA-Bending Proteins in Binding Site Recognition. (Springer International Publishing, 2017). doi:10.1007/978-3-319-45129-9.

14.   Baker, E. N. Hydrogen bonding in biological macromolecules. in International Tables for Crystallography (eds. Rossmann, M. G. & Arnold, E.) vol. F 546–552 (International Union of Crystallography, 2006).

15.   McDonald, I. K. & Thornton, J. M. Satisfying Hydrogen Bonding Potential in Proteins. Journal of Molecular Biology 238, 777–793 (1994).

16.   Tiwari, A. & Panigrahi, S. K. HBAT: a complete package for analysing strong and weak hydrogen bonds in macromolecular crystal structures. In Silico Biol 7, 651–661 (2007).

17.   Mukherjee, S., Majumdar, S. & Bhattacharyya, D. Role of hydrogen bonds in protein-DNA recognition: effect of nonplanar amino groups. J Phys Chem B 109, 10484–10492 (2005).

18.   Stickle, D. F., Presta, L. G., Dill, K. A. & Rose, G. D. Hydrogen bonding in globular proteins. Journal of Molecular Biology 226, 1143–1159 (1992).

19.   Kennaway, C. K. et al. The structure of M.EcoKI Type I DNA methyltransferase with a DNA mimic antirestriction protein. Nucleic Acids Research 37, 762–770 (2008).

20.   Lee, J. Y. & Yang, W. UvrD Helicase Unwinds DNA One Base Pair at a Time by a Two-Part Power Stroke. Cell 127, 1349–1360 (2006).