of bio-assays were extracted from PubChem. The requirement was that bioactivity against the same protein target had first been determined in a high-throughput primary screen and then in a follow-up, low-throughput confirmation screen.
Inactives from the primary screens were used as "Potential Decoys" (PD), and actives from the confirmation screens as "Potential Actives" (PA). Potential actives were additionally required to have associated dose-response data and EC50 values.
2. From the datasets of potential actives, all compounds with questionable bioactivities were removed by a set of automated filters.
This included compounds with suspicious Hill slopes, compounds active in an unusually high number of PubChem assays (i.e. frequent hitters), and compounds known to interfere with optical detection methods.
3. Actives not adequately embedded in the available decoys were removed from the datasets. This is essential because no decoy set can be designed that prevents such actives from being artificially enriched.
4. Subsets of k = 30 actives were extracted for each dataset with a common spread, as measured by the Refined Nearest Neighbor analysis figure ΣG.
5. Subsets of d = 15000 decoys were extracted for each dataset with a common separation from the actives, as measured by the Refined Nearest Neighbor analysis figure ΣF.
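The labelling rule of step 1 can be sketched in Python as a simple partition of compound records. The `Record` class and its fields are hypothetical stand-ins for data that would first have to be parsed out of the PubChem primary and confirmatory AIDs; the sketch only illustrates the rule described above (primary-screen inactives become PD, confirmation-screen actives with dose-response data and an EC50 become PA).

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass
class Record:
    """Hypothetical per-compound record for one target (not a PubChem schema)."""
    cid: int                          # PubChem compound ID
    primary_active: bool              # outcome in the high-throughput primary screen
    confirmed_active: Optional[bool]  # outcome in the confirmation screen; None = not tested
    has_dose_response: bool           # whether a dose-response curve was deposited
    ec50: Optional[float]             # EC50 value, if one was reported


def split_potential_sets(records: Iterable[Record]) -> Tuple[Set[int], Set[int]]:
    """Return (potential_actives, potential_decoys) as sets of CIDs."""
    pa: Set[int] = set()
    pd: Set[int] = set()
    for r in records:
        if r.confirmed_active and r.has_dose_response and r.ec50 is not None:
            # Confirmed active with dose-response data and an EC50 -> "Potential Active"
            pa.add(r.cid)
        elif not r.primary_active:
            # Inactive in the primary screen -> "Potential Decoy"
            pd.add(r.cid)
        # Everything else (e.g. unconfirmed primary hits) is left out of both sets.
    return pa, pd
```

Compounds that were hits in the primary screen but failed confirmation, or lack dose-response data, end up in neither set, so PA and PD stay disjoint.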
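The three filters of step 2 could be combined as in the following minimal sketch. The source does not state the cut-offs that were used, so `HILL_RANGE`, `MAX_HIT_RATE` and the `interferers` set of known optical-interference compounds are hypothetical placeholders chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical cut-offs for illustration only; the source does not give the real values.
HILL_RANGE = (0.5, 2.0)  # acceptable Hill slope interval
MAX_HIT_RATE = 0.25      # max fraction of tested PubChem assays a compound may be active in


@dataclass
class ActiveInfo:
    cid: int
    hill_slope: float      # Hill slope of the fitted dose-response curve
    n_assays_active: int   # number of PubChem assays in which the compound is active
    n_assays_tested: int   # number of PubChem assays in which it was tested


def passes_filters(a: ActiveInfo, interferers: Set[int]) -> bool:
    lo, hi = HILL_RANGE
    if not lo <= a.hill_slope <= hi:
        return False  # suspicious Hill slope
    if a.n_assays_tested > 0 and a.n_assays_active / a.n_assays_tested > MAX_HIT_RATE:
        return False  # frequent hitter
    if a.cid in interferers:
        return False  # known to interfere with optical detection
    return True


def purge(actives: List[ActiveInfo], interferers: Set[int]) -> List[int]:
    """Return the CIDs of potential actives that survive all filters."""
    return [a.cid for a in actives if passes_filters(a, interferers)]
```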
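Step 3 does not specify how "adequately embedded" was quantified. The sketch below assumes one plausible criterion, which is not necessarily the one used for MUV: an active is kept only if its distance to the nearest decoy in descriptor space is no larger than a high percentile (here a hypothetical 95%) of the decoy-to-nearest-decoy distances. The function names and the Euclidean metric are illustrative assumptions.

```python
import numpy as np


def nearest_distances(queries: np.ndarray, refs: np.ndarray, exclude_self: bool = False) -> np.ndarray:
    """Euclidean distance from each query point to its nearest reference point (brute force)."""
    d = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=-1)
    if exclude_self:
        np.fill_diagonal(d, np.inf)  # queries and refs are the same set: ignore self-distances
    return d.min(axis=1)


def embedded_actives(actives: np.ndarray, decoys: np.ndarray, percentile: float = 95.0) -> np.ndarray:
    """Indices of actives lying within the typical spacing of the decoy cloud (assumed criterion)."""
    decoy_spacing = nearest_distances(decoys, decoys, exclude_self=True)
    cutoff = np.percentile(decoy_spacing, percentile)
    active_to_decoy = nearest_distances(actives, decoys)
    return np.flatnonzero(active_to_decoy <= cutoff)
```

The full pairwise distance matrix is built by brute force, which is fine for toy data; at the scale of tens of thousands of decoys a spatial index such as a k-d tree would be the practical choice.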
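Step 4 relies on the Refined Nearest Neighbor analysis. The sketch assumes the usual spatial-statistics definition of the nearest neighbor function, G(t) = fraction of actives whose nearest other active lies within distance t, and takes ΣG to be the sum of G(t) over a fixed grid of distances. The grid, the Euclidean metric and the random search in the hypothetical `pick_active_subset` are assumptions, not the published procedure. Under this definition a smaller ΣG means more widely spread actives.

```python
import numpy as np


def nn_distances_within(points: np.ndarray) -> np.ndarray:
    """Distance from each point to its nearest *other* point in the same set."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1)


def sigma_g(actives: np.ndarray, t_grid: np.ndarray) -> float:
    """Sum of G(t) over t_grid, where G(t) is the fraction of actives whose
    nearest other active lies within distance t (assumed definition)."""
    nn = nn_distances_within(actives)
    g = (nn[None, :] <= t_grid[:, None]).mean(axis=1)  # one G(t) value per grid point
    return float(g.sum())


def pick_active_subset(actives, k, target, t_grid, n_trials=2000, seed=0):
    """Hypothetical random search for k actives whose sigma_g is closest to target."""
    rng = np.random.default_rng(seed)
    best_idx, best_err = None, np.inf
    for _ in range(n_trials):
        idx = rng.choice(len(actives), size=k, replace=False)
        err = abs(sigma_g(actives[idx], t_grid) - target)
        if err < best_err:
            best_idx, best_err = idx, err
    return np.sort(best_idx), best_err
```

Using the same `target` and `t_grid` for every target would give all datasets a comparable spread of actives, which is the intent of step 4.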
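For step 5, the analogous quantity is assumed here to be the empty-space function F(t), the fraction of decoys whose nearest active lies within distance t, with ΣF again summed over a fixed distance grid; a larger ΣF then means decoys lying closer to the actives. Because every decoy contributes independently to F(t), ΣF of a subset equals the mean of a per-decoy score, which the hypothetical `pick_decoy_subset` exploits by sliding a window of d decoys over the score-sorted list. This is one simple heuristic, not the method used to build MUV.

```python
import numpy as np


def nearest_active_distance(decoys: np.ndarray, actives: np.ndarray) -> np.ndarray:
    """Euclidean distance from each decoy to its nearest active (brute force)."""
    d = np.linalg.norm(decoys[:, None, :] - actives[None, :, :], axis=-1)
    return d.min(axis=1)


def sigma_f(decoys: np.ndarray, actives: np.ndarray, t_grid: np.ndarray) -> float:
    """Sum of F(t) over t_grid, where F(t) is the fraction of decoys whose
    nearest active lies within distance t (assumed definition)."""
    nn = nearest_active_distance(decoys, actives)
    f = (nn[None, :] <= t_grid[:, None]).mean(axis=1)
    return float(f.sum())


def pick_decoy_subset(decoys, actives, d, target, t_grid):
    """Hypothetical heuristic: choose d decoys whose sigma_f is close to target."""
    nn = nearest_active_distance(decoys, actives)
    # Per-decoy score = number of grid distances t with nn <= t.
    # sigma_f of any subset equals the mean score of its members.
    scores = (nn[None, :] <= t_grid[:, None]).sum(axis=0)
    order = np.argsort(scores, kind="stable")
    # Mean score of every window of d consecutive decoys in score order
    window_means = np.convolve(scores[order], np.ones(d) / d, mode="valid")
    start = int(np.argmin(np.abs(window_means - target)))
    return np.sort(order[start:start + d])
```

Restricting the choice to contiguous windows keeps the search linear in the number of decoys, at the cost of not exploring every possible subset.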
| Target | Mode of Interaction | Target Class | Primary Assay (AID) | Confirmatory Assay (AID) | Assay Type | Actives (original dataset) | Decoys (original dataset) | Actives (MUV) | Decoys (MUV) | Scaffolds (MUV) |
|---|---|---|---|---|---|---|---|---|---|---|
| S1P1 rec. | Agonists | GPCR | 449 | 466 | Reporter Gene | 223 | 55395 | 30 | 15000 | 28 |
| SF1 | Inhibitors | Nuclear Receptor | 525 | 600 | Reporter Gene | 213 | 64550 | 30 | 15000 | 24 |
| Eph rec. A4 | Inhibitors | Receptor Tyrosine Kinase | 689 | 689 | Enzyme | 80 | 61480 | 30 | 15000 | 29 |
| SF1 | Agonists | Nuclear Receptor | 522 | 692 | Reporter Gene | 75 | 63683 | 30 | 15000 | 30 |
| D1 rec. | Allosteric Modulators | GPCR | 641 | 858 | Reporter Gene | 226 | 54292 | 30 | 15000 | 24 |
| M1 rec. | Allosteric Inhibitors | GPCR | 628 | 859 | Reporter Gene | 231 | 61477 | 30 | 15000 | 29 |

Good, A. &
Optimization of CAMD techniques 3. Virtual screening enrichment studies: a help or hindrance in tool selection?
J. Comput.-Aided Mol. Des., 2008, 22, 169-178

Verdonk, M. L.; Berdini, V.; Hartshorn, M. J.; Mooij, W. T. M.; Murray, C. W.; Taylor, R. D. & Watson, P.
Virtual screening using protein-ligand docking: avoiding artificial enrichment.
J. Chem. Inf. Comput. Sci., 2004, 44, 793-806

Rohrer, S. G. & Baumann, K.
Impact of Benchmark Data Set Topology on the Validation of Virtual Screening Methods: Exploration and Quantification by Spatial Statistics.
J. Chem. Inf. Model., 2008, 48, 704-71

Rohrer, S. G. & Baumann, K.
Maximum Unbiased Validation (MUV) Datasets for Virtual Screening Based on PubChem Bioactivity Data.
J. Chem. Inf. Model., in press