Weighted distance functions improve analysis of high-dimensional data: Application to molecular dynamics simulations

Title: Weighted distance functions improve analysis of high-dimensional data: Application to molecular dynamics simulations
Publication Type: Journal Article
Year of Publication: 2015
Authors: Blöchliger N., Caflisch A., Vitalis A.
Journal: Journal of Chemical Theory and Computation
Volume: 11
Issue: 11
Pagination: 5481-5492
Date Published: 2015 Nov 10
Type of Article: Research Article
Keywords: Data Analysis, Feature Selection, Feature Weighting, Progress Index, Protein Folding, Scalable Algorithm, Time Series Data
Abstract

Data mining techniques depend strongly on how the data are represented and how distance between samples is measured. High-dimensional data often contain a large number of irrelevant dimensions (features) for a given query. These features act as noise and obfuscate relevant information. Unsupervised approaches to mine such data require distance measures that can account for feature relevance. Molecular dynamics simulations produce high-dimensional data sets describing molecules observed in time. Here, we propose to globally or locally weight simulation features based on effective rates. This emphasizes, in a data-driven manner, slow degrees of freedom that often report on the metastable states sampled by the molecular system. We couple this idea to several unsupervised learning protocols. Our approach unmasks slow side chain dynamics within the native state of a miniprotein and reveals additional metastable conformations of a protein. The approach can be combined with most algorithms for clustering or dimensionality reduction.
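As an illustration of the idea of weighting features by how slowly they evolve, the following is a minimal sketch and not the authors' method. The abstract does not define the effective rates, so the sketch assumes a stand-in: each feature's lag autocorrelation, clipped to [0, 1], serves as its weight. The function names `slowness_weights` and `weighted_distance`, the `lag` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def slowness_weights(X, lag=1):
    """Assumed proxy for slowness (not the paper's effective rates):
    lag autocorrelation of each feature column, clipped to [0, 1]."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    var = (Xc ** 2).mean(axis=0)
    cov = (Xc[:-lag] * Xc[lag:]).mean(axis=0)
    # Constant features (zero variance) receive weight 0.
    acf = np.divide(cov, var, out=np.zeros_like(var), where=var > 0)
    return np.clip(acf, 0.0, 1.0)


def weighted_distance(x, y, w):
    """Weighted Euclidean distance between two samples."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(w, dtype=float) * diff ** 2)))


# Synthetic time series: one slowly varying feature, one white-noise feature.
rng = np.random.default_rng(0)
n = 5000
noise = rng.normal(size=n)
slow = np.zeros(n)
for t in range(1, n):
    slow[t] = 0.99 * slow[t - 1] + noise[t]  # AR(1) process with long memory
fast = rng.normal(size=n)
X = np.column_stack([slow, fast])

w = slowness_weights(X)
print("weights:", w)
print("distance between first and last frame:", weighted_distance(X[0], X[-1], w))
```

In this toy setting the slowly varying feature receives a weight close to 1 and the noise feature a weight close to 0, so distances are dominated by the slow coordinate. Such a weighted distance could then be passed to a clustering or dimensionality reduction routine in place of the unweighted one.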

DOI: 10.1021/acs.jctc.5b00618
pubindex: 0203

Alternate Journal: J. Chem. Theory Comput.
PubMed ID: 26574336