Systematic analysis of large screening sets for drug discovery
Paul Blower 1, Kevin Cross 1, Michael Fligner 2, Glenn Myatt 1, Joseph Verducci 2, and Chihae Yang 1
We have developed a novel, systematic process for analysing
the structure-activity relationships (SAR) of large, heterogeneous
data sets. It leverages existing techniques as components
in a common, overall analysis process that can be applied
differently in individual cases. We first filter the initial
screening set to remove compounds that would be unsuitable
as lead compounds. Then we group active compounds into structurally
meaningful categories and perform an outlier analysis of the
classification results. For each active class, we identify
key macrostructural features by reassembling common structural
building blocks in the class. The algorithm can be parameterized
to meet differing objectives: (1) features that discriminate
for biological activity, (2) scaffolds for R-group analysis,
and (3) features that discriminate for membership in the class.
The macrostructures that discriminate for activity are useful
descriptors for building local prediction models; others provide
the basis for R-group analysis to further refine the SAR within
the class. The suite of tools can assist pharmaceutical researchers
in making better use of the vast quantities of information
already residing in pharmaceutical databases, in selecting more
and better-qualified lead series, and in designing follow-up
experiments more effectively for optimization studies.
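
The filtering step and the search for features that discriminate for biological activity can be illustrated with a minimal sketch. It is not the authors' algorithm: plain feature labels stand in for the macrostructures assembled from structural building blocks, and the toy records, the molecular-weight cutoff of 500, the lift score, and the function names are all hypothetical choices made only for illustration.

```python
from collections import defaultdict

# Hypothetical toy screening set; real data would carry chemical structures.
compounds = [
    {"id": "c1", "mw": 310.0, "features": {"benzamide", "piperidine"}, "active": True},
    {"id": "c2", "mw": 355.0, "features": {"benzamide", "piperidine"}, "active": True},
    {"id": "c3", "mw": 402.0, "features": {"benzamide"}, "active": False},
    {"id": "c4", "mw": 290.0, "features": {"piperidine", "furan"}, "active": True},
    {"id": "c5", "mw": 330.0, "features": {"furan"}, "active": False},
    {"id": "c6", "mw": 720.0, "features": {"benzamide", "piperidine"}, "active": True},
    {"id": "c7", "mw": 345.0, "features": {"furan"}, "active": False},
]


def filter_leadlike(records, max_mw=500.0):
    """Drop compounds unsuitable as leads (assumed criterion: molecular weight)."""
    return [r for r in records if r["mw"] <= max_mw]


def discriminating_features(records, min_support=2, min_lift=1.5):
    """Return (feature, lift) pairs whose active rate exceeds the baseline.

    lift = (fraction of compounds with the feature that are active)
           / (fraction of all compounds that are active)
    """
    baseline = sum(r["active"] for r in records) / len(records)
    if baseline == 0:
        return []  # no actives, so nothing can discriminate for activity
    counts = defaultdict(lambda: [0, 0])  # feature -> [n_active, n_total]
    for r in records:
        for feature in r["features"]:
            counts[feature][1] += 1
            if r["active"]:
                counts[feature][0] += 1
    hits = []
    for feature, (n_active, n_total) in counts.items():
        lift = (n_active / n_total) / baseline
        if n_total >= min_support and lift >= min_lift:
            hits.append((feature, lift))
    # Highest lift first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


kept = filter_leadlike(compounds)
hits = discriminating_features(kept)
print([r["id"] for r in kept])
print(hits)
```

In this sketch, raising `min_support` favors features shared by many compounds, while raising `min_lift` favors features that are more strongly associated with activity; the two thresholds loosely mirror the idea of parameterizing the search to meet different objectives.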
1 LeadScope, Inc., 2 The Ohio State University, Columbus, OH, USA