MULTIVARIATE LINEAR QSPR/QSAR MODELS: RIGOROUS EVALUATION OF VARIABLE SELECTION FOR PLS
Computational and Structural Biotechnology Journal, 5 [6], e201302007, 1-10 (2013).
Link to this open access online journal. Direct download of PDF.
DOI: http://dx.doi.org/10.5936/csbj.201302007
Kurt Varmuza*, Peter Filzmoser, Matthias Dehmer
* Corresponding author
Supplementary Material: QSPR example using R
GC retention indices of polycyclic aromatic compounds (PACs), modeled by molecular descriptors (data, software, user guide).
R_for_QSPR_UserGuide.pdf User Guide (130308)
R_scripts_for_QSPR.zip R scripts (140214). New version of go_rdcv() for R 3.0.2; errors in plots 6 and 7 corrected.
PAC209_3D.zip Chemical structures of 209 PACs (polycyclic aromatic compounds) in Molfile format. Contains PAC209_3D_all_H.SDF, with approximate 3D atom coordinates and all H-atoms (by software Corina).
PAC209_dragon_2772.zip Molecular descriptors (m = 2772) calculated by software Dragon from PAC209_3D_all_H.SDF. Contains three files:
  Descriptors: PAC209_dragon_2772.txt (3.5 MB)
  Descriptor (variable) names: PAC209_dragon_2772_descriptors.txt
  Structure (object) names: PAC209_dragon_2772_molecules.txt
PAC209_X2772.zip Molecular descriptors imported into R from the three Dragon output files by Dragon60_import(). Contains PAC209_X_2772.RData; load() of this RData file gives a matrix object x (209 x 2772).
PAC209_X2688_y.zip Data for variable selection and PLS models (contains two files):
  PAC209_X_2688.RData Molecular descriptors after cleaning, matrix object x (209 x 2688), obtained from PAC209_X_2772.RData by varsel_almost_const() as described in the User Guide
  PAC209_y.txt Properties (GC retention indices) of the 209 PACs, as described in the User Guide
These two files are used throughout the example for variable selection and model evaluation.
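
As a starting point, the two files from PAC209_X2688_y.zip might be read into R roughly as follows. This is only a sketch, not part of the supplied R scripts: the helper name load_pac_data() and the directory argument data_dir are hypothetical, and the layout of PAC209_y.txt (one numeric value per compound, no header) is an assumption that should be checked against the User Guide.

```r
# Sketch: load the cleaned descriptor matrix and the retention indices.
# load_pac_data() and data_dir are hypothetical names, not from the R scripts.
load_pac_data <- function(data_dir = ".") {
  x_file <- file.path(data_dir, "PAC209_X_2688.RData")
  y_file <- file.path(data_dir, "PAC209_y.txt")
  if (!file.exists(x_file) || !file.exists(y_file)) {
    message("Data files not found in: ", data_dir)
    return(NULL)
  }
  env <- new.env()
  load(x_file, envir = env)        # the page states this yields a matrix object x
  x <- get("x", envir = env)
  # Assumption: plain text, one numeric value per compound, no header line.
  y <- scan(y_file, quiet = TRUE)
  # Dimensions as listed on this page: 209 compounds, 2688 descriptors.
  stopifnot(is.matrix(x), nrow(x) == 209, ncol(x) == 2688,
            length(y) == nrow(x))
  list(x = x, y = y)
}

# Usage: assumes the zip archive was unpacked into the working directory.
pac <- load_pac_data(".")
if (!is.null(pac)) print(dim(pac$x))
```

Loading into a separate environment avoids overwriting an existing object named x in the workspace; the dimension checks simply restate the sizes given in the file list above.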
Last changes of this website 200131