MUV

Impact of Benchmark Dataset Topology on VS Validation Results

We introduced Refined Nearest Neighbor analysis methods for the analysis of chemical datasets in the paper:

Rohrer, S.G.; Baumann, K.
Impact of Benchmark Data Set Topology on the Validation of Virtual Screening Methods: Exploration and Quantification by Spatial Statistics.
J. Chem. Inf. Model., 2008, 48, 704-71
doi: 10.1021/ci700099u (Open Access)

The paper showed that datasets with a "clumpy" topology in descriptor space, where the actives lie closer to one another than to the decoys, bias the validation of virtual screening methods toward over-optimistic results. This led to the rationale that unbiased datasets, i.e. Maximum Unbiased Validation (MUV) datasets, should have a non-clumpy, spatially random topology.
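The contrast between "clumpy" and "spatially random" can be illustrated with two standard functions from spatial point-pattern statistics: G(t), the fraction of actives whose nearest other active lies within distance t, and F(t), the fraction of reference points (here, decoys) whose nearest active lies within t. For a spatially random set of actives the two curves roughly coincide, while for clumped actives G rises well above F at short distances. The sketch below is a toy illustration under stated assumptions, not the authors' implementation or their exact statistics: it uses a 2-D unit square as a stand-in for descriptor space, applies no edge correction, and the function names and the `clumping_score` summary (mean of G(t) - F(t) over a distance grid) are hypothetical.

```python
import math
import random

def nn_within(points):
    """Distance from each point to its nearest *other* point in the same set."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def nn_between(probes, points):
    """Distance from each probe (e.g. a decoy) to its nearest point in `points`."""
    return [min(math.dist(p, q) for q in points) for p in probes]

def ecdf(dists, t):
    """Empirical CDF: fraction of distances that are <= t."""
    return sum(d <= t for d in dists) / len(dists)

def clumping_score(actives, probes, grid):
    """Hypothetical summary: mean of G(t) - F(t) over the distance grid.

    G(t): fraction of actives whose nearest other active is within t.
    F(t): fraction of probes whose nearest active is within t.
    Roughly 0 for spatially random actives, clearly > 0 for clumped ones.
    """
    g_d = nn_within(actives)
    f_d = nn_between(probes, actives)
    return sum(ecdf(g_d, t) - ecdf(f_d, t) for t in grid) / len(grid)

def uniform_points(rng, n):
    """n points drawn uniformly from the unit square (a toy 'descriptor space')."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def clumped_points(rng, n_clusters, per_cluster, sd):
    """Tight Gaussian clusters around random centres: a 'clumpy' set of actives."""
    centers = [(0.1 + 0.8 * rng.random(), 0.1 + 0.8 * rng.random())
               for _ in range(n_clusters)]
    return [(cx + rng.gauss(0, sd), cy + rng.gauss(0, sd))
            for cx, cy in centers for _ in range(per_cluster)]

rng = random.Random(0)                      # fixed seed for reproducibility
probes = uniform_points(rng, 500)           # stand-in for the decoys
grid = [i / 100 for i in range(21)]         # t = 0.00, 0.01, ..., 0.20

random_score = clumping_score(uniform_points(rng, 50), probes, grid)
clumped_score = clumping_score(clumped_points(rng, 5, 10, 0.02), probes, grid)
print(f"random actives:  {random_score:+.2f}")
print(f"clumped actives: {clumped_score:+.2f}")
```

The clumped set should score clearly higher than the random one: its actives find a close neighbor among themselves almost immediately, while most decoys sit far from every cluster. This is the kind of imbalance that lets a similarity-based method look better than it is.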

Both the original findings on the impact of dataset topology and the rationale behind the MUV dataset design are summarized concisely in a talk Knut Baumann gave at EuroQSAR 2008 in Uppsala: Slides.

Please contact muv@tu-bs.de with any questions or suggestions.