Mining positive and negative patterns for relevance feature discovery

Authors: Yuefeng Li, Abdulmohsen Algarni, Ning Zhong
DOI: 10.1145/1835804.1835900

Citations: 1
It is a big challenge to guarantee the quality of relevance features discovered in text documents for describing user preferences, because of the large number of terms, patterns, and noise. Most existing popular text mining and classification methods have adopted term-based approaches. However, they have all suffered from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern-based methods should perform better than term-based ones in describing user preferences, but many experiments do not support this hypothesis. The innovative technique presented in this paper makes a breakthrough on this difficulty. It discovers both positive and negative patterns in text documents as higher-level features, and uses them to accurately weight low-level features (terms) based on their specificity and their distributions in the higher-level features. Substantial experiments using this technique on Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms both the state-of-the-art term-based methods underpinned by Okapi BM25, Rocchio, or Support Vector Machine and pattern-based methods on precision, recall, and F-measure.
Conference: Knowledge Discovery and Data Mining - KDD, pp. 753-762, 2010
    • ...The low-level term weights are evaluated according to both their specificity and their distributions in the higher level features, where the higher level features include both positive and negative patterns [9, 1]...
    • ...A promising model Relevance Feature Discovery (RFD) was a model that has been proposed for IF within the data mining community and has shown encouraging improvements of effectiveness [9]...
    • ...The features were extracted from selected negative documents used for groups and low-level features (terms) were devised based on both their appearances in the higher-level features (patterns) and their categories [9, 1]...
    • ...For a given topic, relevance feature discovery in text documents aims to find a set of features, including patterns, terms and their weights, in a training set D, which consists of a set of positive documents, D+, and a set of negative documents, D-. The following definitions can be found in [9] and [11]...
    • ...In this paper, we also use the same techniques as used for mining the initial training set to extract knowledge from Ds (see Section 3, or [9])...
    • ...All closed sequential patterns DP+ in D+_b extracted used SPMining algorithm [9]...
    • ...Some of the negative documents are also selected (call offender) [9] to update and revise the terms in T_b; and we can have closed patterns DP_b extracted from the selected negative documents...
    • ...If the term t_b appear only in D+ then coverset-(t_b) = ∅; and also the term appear only in D- then coverset+(t_b) = ∅. Term weights are also revised in Algorithm NRevision() (see [9]) based on their specificity, spe function (see Eq. (3))...
    • ...Then used negative feedback to group and revised the extracted features from positive documents as shown in section 3.4 as well [9]...

    Abdulmohsen Algarni et al. Selected new training documents to update user profile
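
The excerpts above refer to coversets and to a specificity function (spe) but do not reproduce their definitions, which are given in [9]. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes that the positive (or negative) coverset of a term is the set of positive (or negative) training documents containing that term, and that specificity is the difference between the two coverset sizes divided by the number of positive documents. The function names coversets and specificity, and the representation of documents as sets of terms, are hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple


def coversets(term: str,
              positive_docs: List[Set[str]],
              negative_docs: List[Set[str]]) -> Tuple[Set[int], Set[int]]:
    """Return (positive coverset, negative coverset) as sets of document indices.

    Assumption: a coverset is the set of documents that contain the term.
    """
    pos = {i for i, doc in enumerate(positive_docs) if term in doc}
    neg = {i for i, doc in enumerate(negative_docs) if term in doc}
    return pos, neg


def specificity(term: str,
                positive_docs: List[Set[str]],
                negative_docs: List[Set[str]]) -> float:
    """Assumed specificity: (|coverset+| - |coverset-|) / |D+|."""
    if not positive_docs:
        raise ValueError("at least one positive document is required")
    pos, neg = coversets(term, positive_docs, negative_docs)
    return (len(pos) - len(neg)) / len(positive_docs)


# Toy training set: each document is reduced to its set of terms.
d_plus = [{"mining", "pattern"}, {"mining", "text"}]
d_minus = [{"text", "sport"}]

for t in ("mining", "text", "sport"):
    print(t, specificity(t, d_plus, d_minus))
```

Under these assumptions, a term that occurs only in positive documents has an empty negative coverset and receives the highest score, which matches the case distinction described in the excerpt on coversets; a term that occurs only in negative documents receives a negative score.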
