Today, virtual screening (VS) is a standard technique used in almost every drug discovery campaign, in both industrial and academic environments. Its main goal is to narrow the vastness of chemical space down to a number of substances that experimental scientists can realistically test.
VS can be roughly divided into receptor-based and ligand-based approaches.
Receptor-based VS relies on a three-dimensional structure of the biological target, which may be determined experimentally by X-ray diffraction or NMR spectroscopy, or estimated theoretically by homology modelling. Collections of substances, so-called “screening libraries”, are then “docked” into the binding pocket of the respective protein, and the library is ranked according to the estimated free energies of the predicted binding modes. Since both the geometric operations needed to fit the ligand into the pocket and the estimation of the binding free energy are very complex, receptor-based VS is computationally expensive.
A potential inhibitor of the SARS coronavirus main protease docked into the binding pocket of the target. The ligand is bound tightly by several hydrogen bonds (green dashes) and hydrophobic interactions (orange dashes). Finding new antiviral compounds and supporting their optimisation is a main focus of our group's research.
In ligand-based VS we make use of knowledge derived from substances that have been experimentally proven to be active against the target of the campaign. These substances (the “query”) are encoded using so-called “descriptors”. Descriptors are vectors of numbers that represent distinct properties of the encoded molecules. These can be actual physico-chemical properties such as molecular weight, molecular volume or logP, but also the presence or absence of certain substructural features, or the mutual distances between pharmacophore patterns. Screening libraries are then encoded in the same way and ranked according to their similarity to the actives in descriptor space, following the paradigm that similar chemical properties cause similar activity.
Descriptors are vectors of numbers that represent certain molecular features. When the presence or absence of substructures is encoded in binary vectors, or “bitstrings”, they are called “molecular fingerprints”. When the descriptor is calculated from physico-chemical properties of the molecule, it is called a “molecular property descriptor”.
Thousands of descriptors have been developed for the encoding of molecules, but they all face a common dilemma: on the one hand they must reflect a molecule's properties as accurately as possible, but on the other hand they have to abstract and generalize them in order to find new chemical entities.
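The ligand-based ranking described above can be illustrated with a minimal sketch. It assumes that fingerprints are stored as Python integers used as bitstrings, that similarity is measured with the Tanimoto coefficient (a common choice, but not one named in the text), and that each library molecule is scored by its best match against any query molecule. The function names and the toy fingerprints are hypothetical.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bitstrings."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both fingerprints
    total = bin(fp_a | fp_b).count("1")   # bits set in at least one of them
    return common / total if total else 0.0


def rank_library(query_fps, library):
    """Rank library molecules by their highest similarity to any query molecule.

    `library` maps a molecule name to its fingerprint. Scoring by the maximum
    over all query molecules is an assumption made for this sketch.
    """
    scored = [
        (name, max(tanimoto(q, fp) for q in query_fps))
        for name, fp in library.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example with 6-bit fingerprints (illustrative values only).
query = [0b101100]
library = {"mol_a": 0b101100, "mol_b": 0b101000, "mol_c": 0b010011}
ranking = rank_library(query, library)
```

Molecules that share more set bits with the query end up at the top of `ranking`. A real fingerprint would have hundreds or thousands of bits, but the ranking logic stays the same.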
Validation of VS techniques: Estimating what we're gonna get...
When starting a campaign to find a new drug against a target, one has to decide which method to use for virtual screening. With the plethora of methods available for receptor-based and ligand-based VS and their various flavours, a scientist easily faces hundreds of options. Considering the huge amounts of money that have to be invested in the development and market introduction of a potential drug, it is essential to choose the method that is likely to yield the largest number of substances that prove active in experiments. To achieve this, methods for virtual screening are tested and compared using benchmark datasets. These datasets usually consist of several hundred substances of known activity against a given target. A few of them are chosen at random to act as the query, and the rest are mixed with a large number of molecules that have no activity against the target (the “background”) to form a “test database”. After the test database has been searched with the query, the performance of a method is measured by the number of active substances retrieved in the top one percent of the ranked test database.
Based on this measure of virtual screening performance, it is possible to select the best of a number of available methods, as well as to optimize the parameter set of a given method in order to achieve the best performance.
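The performance measure described above can be sketched as follows. The sketch assumes the ranked test database is a list of molecule identifiers, best first, and that the hidden actives are given as a set of identifiers. The recovery rate follows the description in the text. The enrichment factor is a closely related, widely used measure that the text does not name and is included here only as an assumption. All function names are hypothetical.

```python
def _top_slice(ranked_ids, fraction):
    """Return the top `fraction` of the ranked list (at least one entry)."""
    n_top = max(1, round(len(ranked_ids) * fraction))
    return ranked_ids[:n_top]


def recovery_in_top_fraction(ranked_ids, active_ids, fraction=0.01):
    """Share of all hidden actives that appear in the top `fraction` of the ranking."""
    if not active_ids:
        raise ValueError("need at least one active in the test database")
    top = _top_slice(ranked_ids, fraction)
    hits = sum(1 for mol_id in top if mol_id in active_ids)
    return hits / len(active_ids)


def enrichment_factor(ranked_ids, active_ids, fraction=0.01):
    """Hit rate in the top `fraction` divided by the hit rate of the whole database."""
    if not active_ids:
        raise ValueError("need at least one active in the test database")
    top = _top_slice(ranked_ids, fraction)
    hits = sum(1 for mol_id in top if mol_id in active_ids)
    return (hits / len(top)) / (len(active_ids) / len(ranked_ids))
```

With a test database of 1,000 molecules containing 10 actives, the top one percent holds 10 molecules. If 4 of them are actives, the recovery is 0.4, and the enrichment over a random selection is about 40-fold.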
Although this methodology is commonly used for the validation and optimization of VS methods, it has several weaknesses. For instance, it depends heavily on the composition of the active and background datasets. A second problem, often neglected when validating VS methods, is that results may change dramatically when a different set of query molecules is chosen from the set of actives. These and several other effects make an unbiased validation of virtual screening procedures quite difficult. Our group is working on methodologies for the standardization and normalization of datasets and validation results that allow a correct and objective evaluation and optimisation of present and future VS methods.
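The dependence on the choice of query molecules can be made visible by repeating the benchmark with several random queries and looking at the spread of the results. The following sketch is only an illustration of that idea, not the group's methodology. It assumes integer-bitstring fingerprints, Tanimoto similarity with best-match scoring, and distinct identifiers for actives and background. All names and defaults are hypothetical.

```python
import random
import statistics


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints stored as integer bitstrings."""
    common = bin(fp_a & fp_b).count("1")
    total = bin(fp_a | fp_b).count("1")
    return common / total if total else 0.0


def benchmark_once(actives, background, n_query, rng, fraction=0.01):
    """One run: draw a random query, rank the test database, score the top fraction."""
    query_ids = set(rng.sample(sorted(actives), n_query))
    query_fps = [actives[mol_id] for mol_id in query_ids]
    # The remaining actives are hidden among the background molecules.
    hidden = {m: fp for m, fp in actives.items() if m not in query_ids}
    test_db = {**hidden, **background}
    ranked = sorted(
        test_db,
        key=lambda m: max(tanimoto(q, test_db[m]) for q in query_fps),
        reverse=True,
    )
    n_top = max(1, round(len(ranked) * fraction))
    hits = sum(1 for m in ranked[:n_top] if m in hidden)
    return hits / len(hidden)


def query_sensitivity(actives, background, n_query=5, n_repeats=20, seed=0):
    """Mean and standard deviation of the recovery over repeated random queries."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    scores = [
        benchmark_once(actives, background, n_query, rng)
        for _ in range(n_repeats)
    ]
    return statistics.mean(scores), statistics.stdev(scores)
```

A large standard deviation relative to the mean indicates that a single benchmark run says little about the method itself and mostly reflects which query molecules happened to be drawn.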
Virtual Screening for new anti-infective compounds
In its first outbreak in 2003, the viral disease Severe Acute Respiratory Syndrome (SARS) infected more than 8,000 people worldwide. Nearly 800 of them died, which makes SARS a life-threatening disease for which no sufficient treatment is available yet.
In the search for new anti-SARS drugs, one of the promising targets currently under investigation is the main protease (Mpro). Integrated into a large group of structural biologists, microbiologists and medicinal chemists, we contribute virtual screening, QSAR analyses and other theoretical calculations.
Furthermore, our workgroup is part of the SFB 630 (http://www.sfb-630.uni-wuerzburg.de/). This project addresses infectious diseases, e.g. malaria or dengue fever, and investigates new leads for drug discovery.