Classifying Incomplete Data Using Group Difference Detection with Parimputation Approach
Shichao Zhang,Jilian Zhang. Classifying Incomplete Data Using Group Difference Detection with Parimputation Approach. International Journal of Software and Informatics, 2012,6(4):535~552
Fund: This work is supported in part by the Australian Research Council (ARC) under large grant DP0985456, the China "1000-Plan" National Distinguished Professorship, the China 863 Program under grant 2012AA011005, the Natural Science Foundation of China (NSFC)u
Abstract: We propose an efficient approach for classifying an insufficient dataset with missing data (incomplete data) using group difference detection. Specifically, missing data in the insufficient dataset are first completed with the parimputation strategy. The insufficient dataset is then grouped by contrasting it with a known dataset (transfer learning). Finally, to assess the quality of the induced models, empirical likelihood (EL) inference is used to estimate confidence intervals for the structural differences between the insufficient dataset and the known dataset. Classifying incomplete data in this way can benefit industry by making information easier and smarter to use. Applications include evaluating a new medical product for pharmaceutical companies by detecting differences between the new product and an old one, and identifying fraud by detecting abnormal operations. To illustrate the benefits experimentally, we evaluate the proposed approach on UCI datasets and demonstrate that our method works much better than the bootstrap resampling method at, for example, distinguishing spam from non-spam emails and benign breast cancer from malignant.
Keywords: incomplete data; missing data imputation; group difference detection
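
The abstract compares the proposed method against bootstrap resampling for detecting differences between two groups. As a rough illustration of that baseline only, the Python sketch below fills missing values and then computes a percentile bootstrap confidence interval for the difference in group means. It is not the authors' method: simple mean imputation is swapped in for parimputation, the bootstrap stands in for empirical likelihood inference, and the function names (`mean_impute`, `bootstrap_mean_diff_ci`) and the toy data are hypothetical.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries.

    Assumption: plain mean imputation, used here only as a stand-in.
    It is NOT the parimputation strategy described in the paper.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def bootstrap_mean_diff_ci(group_a, group_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical toy data: a small "new" group with missing values
# and a fully observed "known" group.
new_group = [5.1, None, 4.8, 5.5, None, 5.0, 5.3, 4.9]
known_group = [3.9, 4.1, 4.0, 3.8, 4.2, 4.0, 3.7, 4.3]

imputed_new = mean_impute(new_group)
low, high = bootstrap_mean_diff_ci(imputed_new, known_group)
print(f"95% bootstrap CI for the mean difference: [{low:.3f}, {high:.3f}]")
```

An interval that excludes zero suggests the two groups differ in mean. Note that resampling imputed values as if they were observed tends to understate uncertainty, which is one reason to treat this sketch as a baseline illustration only.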