MULTIVARIATE LINEAR QSPR/QSAR MODELS:

RIGOROUS EVALUATION OF VARIABLE SELECTION FOR PLS

 

Computational and Structural Biotechnology Journal, 5 [6], e201302007, 1-10 (2013).

Open access online journal.

DOI: http://dx.doi.org/10.5936/csbj.201302007

Kurt Varmuza*, Peter Filzmoser, Matthias Dehmer

* Corresponding author

 

 

Supplementary Material:  QSPR example using R

GC retention indices of polycyclic aromatic compounds (PACs),

modeled by molecular descriptors (data, software, user guide)

 

R_for_QSPR_UserGuide.pdf      User Guide (130308)

 

R_scripts_for_QSPR.zip            R scripts (140214)   New version of go_rdcv() for R 3.0.2; errors in plots 6 and 7 corrected.

 

PAC209_3D.zip                         Chemical structures of 209 PACs (polycyclic aromatic compounds) in Molfile format

                                                 contains PAC209_3D_all_H.SDF,

                                                 with approximate 3D atom coordinates and all H-atoms (by software Corina)

 

PAC209_dragon_2772.zip        Molecular descriptors (m = 2772) calculated by software Dragon from PAC209_3D_all_H.SDF.

                                                 Contains three files:

                                                    Descriptors                              PAC209_dragon_2772.txt (3.5 MB)

                                                    Descriptor (variable) names          PAC209_dragon_2772_descriptors.txt

                                                    Structure (object) names          PAC209_dragon_2772_molecules.txt

 

PAC209_X2772.zip                   Molecular descriptors imported into R from the three Dragon output files by Dragon60_import(),

                                                 contains PAC209_X_2772.RData, load() of this RData-file gives a matrix object x (209x2772)
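
The load() step described above can be sketched as follows. The helper load_x() and the small synthetic matrix are illustrative assumptions and are not part of the published R scripts; the sketch restores an RData file into a separate environment, so that no existing object in the workspace is overwritten, and returns the object named x.

```r
# Hypothetical helper (not from the published scripts):
# restore an .RData file into its own environment and return the object "x".
load_x <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # load() returns the names of the restored objects
  stopifnot("x" %in% loaded)         # the file is expected to contain an object named x
  get("x", envir = env)
}

# Demonstration with a small synthetic stand-in (NOT the real descriptor data)
tmp <- tempfile(fileext = ".RData")
x <- matrix(rnorm(20), nrow = 4, ncol = 5)
save(x, file = tmp)
rm(x)

x_demo <- load_x(tmp)
print(dim(x_demo))

# With the real file (after unzipping), the call would look like this;
# according to the description above, the dimensions should be 209 x 2772:
# x <- load_x("PAC209_X_2772.RData")
# dim(x)
```

Checking dim(x) directly after loading is a cheap way to confirm that the expected matrix was restored.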

 

PAC209_X2688_y.zip               Data for variable selection and PLS models (contains 2 files):

                                                 PAC209_X_2688.RData               Molecular descriptors after cleaning, matrix object x (209x2688),

                                                                                                    obtained from PAC209_X_2772.RData by varsel_almost_const()

                                                                                                    as described in the User Guide

                                                 PAC209_y.txt                             Properties (GC retention indices) of the 209 PACs,

                                                                                                    as described in the User Guide

                                                 These two files are used throughout the example for variable selection and model evaluation.
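
To illustrate the kind of cleaning step that reduces 2772 descriptors to 2688, here is a minimal sketch of removing almost constant variables. It is not the authors' varsel_almost_const(), whose criterion and arguments are documented in the User Guide; the function name drop_almost_constant(), the threshold max_frac, and the toy data are assumptions made for this example only.

```r
# Hypothetical stand-in (NOT varsel_almost_const() from the published scripts):
# drop columns in which the most frequent value covers more than
# max_frac of all objects (rows).
drop_almost_constant <- function(x, max_frac = 0.9) {
  # fraction of rows occupied by the most frequent value, per column
  top_frac <- apply(x, 2, function(col) max(table(col)) / length(col))
  x[, top_frac <= max_frac, drop = FALSE]
}

# Toy data with 20 objects and 3 variables (not the PAC data)
set.seed(1)
x_toy <- cbind(a = rnorm(20),           # varies across all objects
               b = rep(1, 20),          # constant
               c = c(rep(0, 19), 5))    # almost constant: 95% identical values

x_clean <- drop_almost_constant(x_toy)
print(colnames(x_clean))
```

In this toy example only the column a remains; the number of rows is unchanged, so the cleaned matrix stays aligned with the property vector y.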

  Last change to this website: 200131