Weighted distance functions improve analysis of high-dimensional data: Application to molecular dynamics simulations
Title | Weighted distance functions improve analysis of high-dimensional data: Application to molecular dynamics simulations |
Publication Type | Journal Article |
Year of Publication | 2015 |
Authors | Blöchliger N., Caflisch A., Vitalis A. |
Journal | Journal of Chemical Theory and Computation |
Volume | 11 |
Issue | 11 |
Pagination | 5481-5492 |
Date Published | 2015 Nov 10 |
Type of Article | Research Article |
Keywords | Data Analysis, Feature Selection, Feature Weighting, Progress Index, Protein Folding, Scalable Algorithm, Time Series Data |
Abstract | Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propose to globally or locally weight simulation features based on effective rates. This emphasizes, in a data-driven manner, slow degrees of freedom that often report on the metastable states sampled by the molecular system. We couple this idea to several unsupervised learning protocols. Our approach unmasks slow side chain dynamics within the native state of a miniprotein and reveals additional metastable conformations of a protein. The approach can be combined with most algorithms for clustering or dimensionality reduction. |
DOI | 10.1021/acs.jctc.5b00618 |
pubindex | 0203 |
Alternate Journal | J. Chem. Theory Comput. |
PubMed ID | 26574336 |
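
The abstract's central idea, weighting features so that slow degrees of freedom dominate the distance between snapshots, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the lag-1 autocorrelation is used as a stand-in for slowness (the paper derives its weights from effective rates, whose exact definition is not given in this record), and the function names `lag_autocorrelation`, `global_weights`, and `weighted_distance` are hypothetical. Only global weights are shown.

```python
import numpy as np

def lag_autocorrelation(X, lag=1):
    """Per-feature autocorrelation of a time series X (n_frames, n_features) at a fixed lag."""
    Xc = X - X.mean(axis=0)
    var = (Xc ** 2).mean(axis=0)
    cov = (Xc[:-lag] * Xc[lag:]).mean(axis=0)
    # Constant features have zero variance; give them zero autocorrelation.
    return np.divide(cov, var, out=np.zeros_like(cov), where=var > 0)

def global_weights(X, lag=1):
    """Illustrative global weights (an assumption, not the paper's formula):
    features with high autocorrelation, i.e. slow features, weigh more."""
    w = np.clip(lag_autocorrelation(X, lag), 0.0, None)
    total = w.sum()
    # Fall back to uniform weights if no feature shows positive autocorrelation.
    return w / total if total > 0 else np.full(X.shape[1], 1.0 / X.shape[1])

def weighted_distance(x, y, w):
    """Weighted Euclidean distance between two snapshots x and y."""
    return float(np.sqrt(np.sum(w * (x - y) ** 2)))

# Toy trajectory: feature 0 switches slowly between two states,
# feature 1 is uncorrelated noise.
rng = np.random.default_rng(0)
n = 2000
slow = np.where((np.arange(n) // 100) % 2 == 0, -1.0, 1.0) + 0.1 * rng.normal(size=n)
fast = rng.normal(size=n)
X = np.column_stack([slow, fast])

w = global_weights(X)
d = weighted_distance(X[0], X[150], w)  # snapshots taken from different slow states
```

In this toy case nearly all of the weight falls on the slowly switching feature, so the distance between two snapshots mainly reflects which slow state each belongs to. Such a weighted distance could then be passed to a clustering or dimensionality reduction routine in place of the unweighted one.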