Academic
Publications
Approximating a collection of frequent sets

Approximating a collection of frequent sets,10.1145/1014052.1014057,Foto N. Afrati,Aristides Gionis,Heikki Mannila

Approximating a collection of frequent sets   (Citations: 54)
BibTex | RIS | RefWorks Download
One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem.In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.
Conference: Knowledge Discovery and Data Mining - KDD , pp. 12-19, 2004
Cumulative Annual
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
    • ...The problem of summarizing frequent graph patterns is related to that of summarizing frequent item sets, which has been extensively investigated in [15], [12], [16], and [17]...
    • ...Afrati et al. [15] uses K frequent item sets at border to approximate the collection of frequent item sets...
    • ...When the method in [15] is applied to graph databases, support information in a large number of frequent graph patterns will be lost because it is aimed at obtaining a subset of maximal frequent patterns...

    Jianzhong Liet al. Efficient Algorithms for Summarizing Graph Patterns

    • ...To achieve a concise summary or representatives of frequent itemsets, recent works such as the closed pattern (Pasquier et al. 1999), the spanning set approach (Afrati et al. 2004), the clusteringbased approach (Xin et al. 2005), the profile-based approach (Yan et al. 2005), the Markov random field approach (Wang and Parthasarathy 2006) and the regression-based approach (Jin et al. 2008) are developed...

    Guanhua Chenet al. Efficient approaches for summarizing subspace clusters into

    • ...The concept of summarization is also common in frequent itemset analysis, where various techniques have been developed to generate a compact approximation to a collection of frequent itemsets [5][6], or other multidimensional summaries [7]...

    A. N. Mahmoodet al. Hierarchical summarization techniques for network traffic

    • ...Summarization is a common method for representing huge amounts of patterns [5,6,7,8,9,10]...
    • ...Note that most of the existing methods use the same language for P and S [5,6,7,8]...
    • ...Table 2 highlights some characteristics of the approaches closest to ours [10,6,5,8,7,9] . We comment the table below...
    • ...each pattern is covered by at least a pattern of the summary, contrary to other methods which provide summaries that approximately cover the set of association rules [5,8]...
    • ...Reference [9] [6] [5] [10] [7] [8] Our method...
    • ...Regeneration: The main objective of some approaches is to build summaries so that they can be used to regenerate the original itemsets [5] or their frequency [8,7,10] whereas the others are interested in the representation of the patterns for future exploration [6,9]...
    • ...The measure used in [5] calculates the size of the subset of patterns which are really covered by at least one pattern of the summary...

    Marie Ndiayeet al. Cube Based Summaries of Large Association Rule Sets

    • ...Frequent Itemsets are huge when the given threshold is low; consequently, the condensed representations of frequent itemsets including closed itemsets[19], maximal itemsets[6][7][8][9][18][22], free itemsets[12], approximate k-sets[10], weighted itemsets[21], and non-derivable itemsets[14] were proposed; in addition, [20] focused on discovering a minimal set of unexpected itemsets...

    Haifeng Liet al. Mining maximal frequent itemsets on graphics processors

Sort by: