A package to evaluate structural models using chemical crosslinking distance constraints.

Institute of Chemistry
University of Campinas


How to use TOPOLINK

The package is accompanied by an example input file, found in the topolink/input directory. It is mostly self-explanatory, and probably the best way to get started is to open the input file and read it. An example input is available [here], and the structure file to be used in this example is [here].

The executable must be run with:
topolink inputfile.inp > topolink.log 
Alternatively, the PDB file of the structure can be provided on the command line as the second argument, overriding the definition given in the input file:
topolink inputfile.inp model.pdb > topolink.log
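To evaluate several models with the same input file, the command above can be wrapped in a small script. The following is a minimal Python sketch, assuming the topolink executable is on the PATH. The function names and the one-log-per-model naming scheme are illustrative choices, not part of TopoLink.

```python
import subprocess
from pathlib import Path


def build_command(input_file, pdb_file):
    """Return the argument list for: topolink inputfile.inp model.pdb"""
    return ["topolink", str(input_file), str(pdb_file)]


def run_models(input_file, pdb_files, log_dir="."):
    """Run TopoLink once per model, writing one log file per model.

    Assumes the `topolink` executable is on the PATH. The log name
    (<model>.log) is an illustrative convention, not a TopoLink rule.
    """
    logs = []
    for pdb_file in pdb_files:
        log_path = Path(log_dir) / (Path(pdb_file).stem + ".log")
        with open(log_path, "w") as log:
            # stdout goes to the log file, as with "> topolink.log"
            subprocess.run(build_command(input_file, pdb_file),
                           stdout=log, check=True)
        logs.append(log_path)
    return logs
```

For example, run_models("inputfile.inp", ["model1.pdb", "model2.pdb"]) would write model1.log and model2.log in the current directory.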



Basic structure of the input file

The basic TopoLink input file contains:
1. The name of the PDB file of the model to be evaluated.
2. The type of links to be computed.
3. The specification of the linker used and experimental observations.
4. Technical options.

1. Name of model file and output of linkers computed.

The name of the PDB file of the model is provided with:
pdbfile model.pdb
This definition can be overridden by providing the name of the file on the command line, as described above, which makes it easy to run TopoLink on multiple models with the same input file.

TopoLink can output coordinates for the topological paths obtained, by using
printlinks yes
linkdir ./links
where ./links is the directory where the PDB files of the links will be written (the directory must exist). Note that there can be many links, so multiple files will be created. The files are simple PDB files that can be opened in any structure visualization software, together with the model PDB file, to visualize the topological distance. When running TopoLink on multiple models, it is recommended not to write link files, by setting printlinks no.

2. Types of links to be computed

The next important parameter of the TopoLink input file is the definition of which links are to be searched and computed. There are three options:
compute observed
compute all
compute reactive
When using compute observed, only the links that were observed experimentally (see below) will be computed.

If compute all is used instead, all possible crosslinks will be computed. That means that, given the definition of the linker used, TopoLink will search for consistent topological distances for every pair of residues that could, by the chemical nature of the linker and the residues involved, be attached by the linker.

Finally, compute reactive tells TopoLink to consider that only residues that were experimentally observed to react (by participating in observed crosslinks or dead-ends) are reactive. Then, TopoLink will search for topological distances consistent with the linker used only between these pairs of "observed-reactive" residues.
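The difference between the three modes can be illustrated by the set of residue pairs each one selects. The following Python sketch is only a conceptual model of the description above, not TopoLink's implementation. The function candidate_pairs, the residue tuples, and the can_link predicate are hypothetical names introduced for illustration.

```python
from itertools import combinations


def candidate_pairs(mode, residues, can_link, observed, deadends):
    """Return the residue pairs that would be searched in each mode.

    residues : list of hashable residue identifiers
    can_link : function (a, b) -> True if some linktype allows the pair
    observed : list of (a, b) pairs observed as crosslinks
    deadends : list of residues observed as dead-ends
    """
    if mode == "observed":
        return [tuple(pair) for pair in observed]

    # Every pair the linker chemistry allows
    possible = [(a, b) for a, b in combinations(residues, 2) if can_link(a, b)]
    if mode == "all":
        return possible

    if mode == "reactive":
        # Reactive = seen in an observed crosslink or in a dead-end
        reactive = set(deadends)
        for a, b in observed:
            reactive.update((a, b))
        return [(a, b) for a, b in possible if a in reactive and b in reactive]

    raise ValueError(f"unknown compute mode: {mode}")


# Illustrative data: a linker that joins any LYS to any SER
residues = [("LYS", "A", 6), ("SER", "A", 9), ("SER", "A", 71), ("LYS", "A", 113)]
can_link = lambda a, b: {a[0], b[0]} == {"LYS", "SER"}
observed = [(("LYS", "A", 6), ("SER", "A", 9))]
deadends = [("SER", "A", 71)]

for mode in ("observed", "all", "reactive"):
    print(mode, len(candidate_pairs(mode, residues, can_link, observed, deadends)))
```

In this toy example LYS 113 is chemically linkable but was never observed to react, so its pairs are searched with compute all and skipped with compute reactive.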

Additionally, the user may choose to compute only the crosslinks between different chains of the PDB file, for instance to compute inter-protein crosslinks in a complex. To do so, just add the
option to the input file. All intra-protein links will be skipped.

3. Specification of linker types and experimental observations

The linker types and observations are specified in the TopoLink input file using an experiment-based structure, as follows:
experiment DSS
  # ResType Chain ResNum AtomType ResType Chain ResNum AtomType MaxDist
  linktype MET all 1 N LYS all all CB 30.
  linktype LYS all all CB SER all all CB 24.
  ...
  # Observed cross-links
  observed LYS A 6 SER A 113
  observed SER A 9 LYS A 113
  ...
  # Observed dead-ends
  deadend LYS A 6
  deadend SER A 71
  ...
end experiment DSS
The experiment line defines the name of the experiment, for example the name of the reactant used.

The linktype lines define the reactivity of the linker model used in this experiment. For example, the first linktype line here specifies that Methionine 1 can be linked by this reactant to any Lysine residue. This link can occur if the distance between atom N of the MET residue and atom CB of the LYS residue is at most 30 Angstroms. The second linktype line in the example defines that any LYS residue can be linked to any SER residue, and that this link can be formed if the CB atoms of these residues are at most 24 Angstroms apart. Add linktype lines until the reactivity of the linker used is completely defined.

Next, there is a report of what was experimentally observed. Each observed line specifies an observed crosslink. For example, the first observed line in the example shows that a crosslink was found experimentally between LYS 6 of chain A and SER 113 of chain A. Note that each of these observations must be consistent with one of the linktypes listed.
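The layout of the experiment block, and the requirement that every observation match a linktype, can be made concrete with a short Python sketch. It is not TopoLink's parser. The functions parse_experiment, matches and is_consistent are hypothetical, and treating a linktype as matching a pair in either order is an assumption made here.

```python
def parse_experiment(text):
    """Parse one experiment block into a dictionary (illustrative only)."""
    exp = {"name": None, "linktypes": [], "observed": [], "deadends": []}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, split on spaces
        if not fields:
            continue
        key, args = fields[0], fields[1:]
        if key == "experiment":
            exp["name"] = args[0]
        elif key == "linktype":
            # (ResType, Chain, ResNum), AtomType, twice, then MaxDist
            exp["linktypes"].append((tuple(args[0:3]), args[3],
                                     tuple(args[4:7]), args[7], float(args[8])))
        elif key == "observed":
            exp["observed"].append((tuple(args[0:3]), tuple(args[3:6])))
        elif key == "deadend":
            exp["deadends"].append(tuple(args[0:3]))
        elif key == "end":
            break
    return exp


def matches(selector, residue):
    """A selector field equal to 'all' matches anything."""
    return all(s == "all" or s == r for s, r in zip(selector, residue))


def is_consistent(observation, linktypes):
    """True if some linktype allows the observed pair (in either order)."""
    a, b = observation
    return any((matches(s1, a) and matches(s2, b)) or
               (matches(s1, b) and matches(s2, a))
               for s1, _, s2, _, _ in linktypes)


example = """
experiment DSS
  linktype MET all 1 N LYS all all CB 30.
  linktype LYS all all CB SER all all CB 24.
  observed LYS A 6 SER A 113
  observed SER A 9 LYS A 113
  deadend LYS A 6
  deadend SER A 71
end experiment DSS
"""
exp = parse_experiment(example)
for obs in exp["observed"]:
    print(obs, is_consistent(obs, exp["linktypes"]))
```

Both observations in the example are covered by the second linktype line. A hypothetical observation such as MET A 5 to LYS A 10 would be rejected, because the only MET linktype is restricted to residue 1.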

Finally, the observed reactivity of the side chains can be reported in terms of dead-ends. This matters for the following reason: if a dead-end was observed for a residue, this residue must be solvent exposed. Therefore, if it is close to another reactive residue, the pair of residues should have formed a crosslink. If TopoLink finds that two residues are reactive and are close enough to form the link, but the link was not observed, it will report that the link is missing from the observations. This is one of the statistics that can be used to evaluate the consistency of the model with the experimental observations.
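The reasoning behind a "missing" link can be written as a simple filter. The sketch below is a conceptual illustration only. The function missing_links, the residue labels, and the distance values are made up, and a single maximum distance is used for all pairs to keep the example short, whereas in the input file each linktype has its own limit.

```python
def missing_links(distances, max_dist, reactive, observed):
    """Pairs that are reactive and within reach but were not observed.

    distances : dict mapping (res_a, res_b) -> topological distance (Angstroms)
    max_dist  : maximum linker length (simplified to one value here)
    reactive  : set of residues known to be reactive
    observed  : list of observed (res_a, res_b) crosslinks
    """
    observed_set = {frozenset(pair) for pair in observed}  # order-independent
    missing = []
    for (a, b), dist in distances.items():
        within_reach = dist <= max_dist
        both_reactive = a in reactive and b in reactive
        if both_reactive and within_reach and frozenset((a, b)) not in observed_set:
            missing.append((a, b))
    return missing


# Made-up distances for illustration
distances = {("K6", "S113"): 18.0, ("K6", "S71"): 12.5, ("S9", "K113"): 40.0}
reactive = {"K6", "S113", "S71", "S9", "K113"}
observed = [("S113", "K6")]

print(missing_links(distances, 24.0, reactive, observed))
```

Here K6 and S71 are both reactive and within reach, yet no crosslink was observed between them, so this pair is the one reported as missing.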

4. Technical options

The TopoLink example input file also contains several technical options which generally should not be of concern. Some of them, however, deserve some attention:
search_limit relative 1.5
This option specifies the maximum distance up to which the search for paths is performed. In the example, TopoLink searches for paths up to 1.5 times the maximum linker length given in each 'linktype' definition. Increase this number if you want to know the topological distances even when they are much greater than the linker length. The greater the search range, the longer the search for paths takes. Instead of 'relative', this keyword accepts two other alternatives: 'sum' and 'fixed'. If 'sum' is chosen, followed by a distance in Angstroms, the search is performed up to the length of the linker increased by that distance (5 Angstroms, for example). If 'fixed' is chosen, the value is the absolute distance for the path search (40 Angstroms, for example).
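The three alternatives amount to the following arithmetic, shown here as a small Python sketch. The function name search_limit is hypothetical, and 'relative' is read as a multiplicative factor, as described above.

```python
def search_limit(mode, value, linker_length):
    """Maximum path-search distance (Angstroms) for one linktype."""
    if mode == "relative":
        return value * linker_length   # factor times the linker length
    if mode == "sum":
        return linker_length + value   # linker length plus a margin
    if mode == "fixed":
        return value                   # absolute distance, linker ignored
    raise ValueError(f"unknown search_limit mode: {mode}")


# For a linktype with a maximum distance of 30 Angstroms:
print(search_limit("relative", 1.5, 30.0))  # 45.0
print(search_limit("sum", 5.0, 30.0))       # 35.0
print(search_limit("fixed", 40.0, 30.0))    # 40.0
```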
endread ENDMDL
Sometimes a PDB file contains more than one structure (two chains, for example). If only one of the structures should be read, the user can provide a keyword (ENDMDL in the example) at which TopoLink stops reading the PDB file; everything after that keyword is ignored. If no keyword is defined, or the keyword is not found, the file is read to the end and all structures are considered in the calculations.
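The effect of endread can be pictured as truncating the file at the first line carrying the keyword. The Python sketch below illustrates this behavior under the assumption that the keyword is matched at the start of a line. How TopoLink matches it internally is not stated here, and read_until is a hypothetical helper.

```python
def read_until(lines, keyword=None):
    """Keep lines up to (not including) the first one starting with keyword.

    With keyword=None, or if the keyword never appears, all lines are kept.
    """
    kept = []
    for line in lines:
        if keyword is not None and line.startswith(keyword):
            break
        kept.append(line)
    return kept


# Abbreviated, illustrative PDB-like records
pdb_lines = [
    "MODEL        1",
    "ATOM      1  N   MET A   1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1",
    "ENDMDL",
]
print(len(read_until(pdb_lines, "ENDMDL")))  # 2
print(len(read_until(pdb_lines)))            # 6
```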
readatoms [all/heavy/backbone/backplusCB]
Defines the protein atoms to be considered when computing surface-accessible topological distances. Normally the reasonable choice is to consider heavy atoms only, since the volume of hydrogen atoms is not important. However, other choices are available.
pgood 0.70 # Probability of observing a link which is within linker reach
pbad  0.01 # Probability of observing a link which is NOT within linker reach
If the experimentalist can estimate the sensitivity and false-positive probability of the experiment, these parameters can be used to estimate the likelihood of the experimental result. pgood is the probability of observing a link if two atoms are reactive (that is, are exposed to solvent and within linker reach), and pbad is the experimental probability of a false-positive (that is, that the experiment reports the existence of a link for two residues that are not within linker reach).
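One simple way to turn these two probabilities into a likelihood is to treat each reactive pair as an independent yes/no observation. The Python sketch below implements that independent-pair model as an illustration. It is an assumption made for this example and is not necessarily the formula TopoLink uses. The function log_likelihood and its input format are hypothetical.

```python
import math


def log_likelihood(pairs, pgood=0.70, pbad=0.01):
    """Log-likelihood of the observations under an independent-pair model.

    pairs : iterable of (within_reach, observed) booleans, one per
            reactive residue pair.
    A pair within linker reach is observed with probability pgood;
    a pair out of reach is (falsely) observed with probability pbad.
    """
    total = 0.0
    for within_reach, observed in pairs:
        p = pgood if within_reach else pbad
        total += math.log(p if observed else 1.0 - p)
    return total


# Four illustrative pairs: a true positive, a missing link,
# a correctly absent link, and a false positive.
pairs = [(True, True), (True, False), (False, False), (False, True)]
print(log_likelihood(pairs))
```

Under this model, a structure for which the observed links are within reach and the unobserved ones are not receives a higher likelihood than one that requires many false positives or missing links.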