Title Požymių išskyrimas optimizuojant priklausomumo struktūrą
Translation of Title Feature extraction via dependence structure optimization.
Authors Daniušis, Povilas
Pages 29
Keywords [eng] feature extraction ; dimensionality reduction ; dependence maximization ; dependence optimization ; HSIC
Abstract [eng] In many important real-world applications, the initial representation of the data is inconvenient, or even prohibitive, for further analysis. For example, in image analysis, text analysis, and computational genetics, high-dimensional, massive, structured, incomplete, and noisy data sets are common. Therefore, feature extraction, that is, revealing informative features from raw data, is one of the fundamental machine learning problems. Efficient feature extraction helps to understand the data and the process that generates it, and reduces the cost of future measurements and data analysis. Representing structured data as a compact set of informative numeric features allows well-studied machine learning techniques to be applied instead of developing new ones. The dissertation focuses on supervised and semi-supervised feature extraction methods that optimize the dependence structure of features. Dependence is measured using a kernel estimator of the Hilbert-Schmidt norm of the cross-covariance operator (the HSIC measure). Two dependence structures are investigated: in the first, we seek features that maximize the dependence on the dependent variable; in the second, we additionally minimize the mutual dependence among the features. Linear and kernel formulations of HBFE and HSCA are provided. Using the Laplacian regularization framework, we construct semi-supervised variants of HBFE and HSCA. The proposed algorithms were investigated experimentally on conventional and multilabel classification data sets. The extracted features were classified by a k-nearest-neighbor classifier, and their quality was evaluated using classification performance measures. The experiments show that in certain cases our algorithms are more efficient than PCA or LDA.
Type Summaries of doctoral thesis
Language Lithuanian
Publication date 2012
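Note [eng] The HSIC measure named in the abstract can be illustrated with a minimal sketch of the commonly used biased empirical estimator, trace(K H L H) / (n - 1)^2, where K and L are kernel matrices of the two samples and H is the centering matrix. The Gaussian kernels, the bandwidths, the toy data, and the function names `rbf_kernel` and `hsic` are assumptions made for illustration only; the record does not state which kernels or estimator variant the dissertation uses.

```python
import numpy as np


def rbf_kernel(a, sigma=1.0):
    """Gaussian (RBF) kernel matrix of a sample; the kernel choice is an assumption."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat a 1-D array as n samples of one feature
    sq = np.sum(a ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (a @ a.T)  # squared pairwise distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    K = rbf_kernel(x, sigma_x)
    L = rbf_kernel(y, sigma_y)
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Toy data (hypothetical): y_dep depends nonlinearly on x, y_ind does not.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dep = x ** 2 + 0.1 * rng.normal(size=200)
y_ind = rng.normal(size=200)

hsic_dep = hsic(x, y_dep)
hsic_ind = hsic(x, y_ind)
print("HSIC(x, y_dep) =", hsic_dep)
print("HSIC(x, y_ind) =", hsic_ind)
```

In this sketch a larger value indicates stronger estimated dependence, which is the quantity that dependence-maximizing feature extraction would try to increase between the extracted features and the labels.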