Sign in
Author
|
Conference
|
Journal
|
Organization
|
Year
|
DOI
Look for results that meet for the following criteria:
since
equal to
before
between
and
Search in all fields of study
Limit my searches in the following fields of study
Agriculture Science
Arts & Humanities
Biology
Chemistry
Computer Science
Economics & Business
Engineering
Environmental Sciences
Geosciences
Material Science
Mathematics
Medicine
Physics
Social Science
Multidisciplinary
Keywords
(7)
Approximate Algorithm
Approximation Method
Best Approximation
Correspondence Problem
Data Mining
Empirical Evidence
frequent itemset
Related Publications
(15)
Mining association rules between sets of items in large databases
Learning nonstationary models of normal network traffic for detecting novel attacks
Mining Compressed Frequent-Pattern Sets
Summarization - Compressing Data into an Informative Representation
Modern Information Retrieval: A Brief Overview
Subscribe
Academic
Publications
Approximating a collection of frequent sets
Approximating a collection of frequent sets,10.1145/1014052.1014057,Foto N. Afrati,Aristides Gionis,Heikki Mannila
Edit
Approximating a collection of frequent sets
(
Citations: 54
)
BibTex
|
RIS
|
RefWorks
Download
Foto N. Afrati
,
Aristides Gionis
,
Heikki Mannila
One of the most well-studied problems in
data mining
is computing the collection of frequent item sets in large transactional databases. One obstacle for the applicability of frequent-set mining is that the size of the output collection can be far too large to be carefully examined and understood by the users. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem.In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide simple polynomial-time approximation algorithms. We give
empirical evidence
showing that the approximation methods work well in practice.
Conference:
Knowledge Discovery and Data Mining - KDD
, pp. 12-19, 2004
DOI:
10.1145/1014052.1014057
Cumulative
Annual
View Publication
The following links allow you to view full publications. These links are maintained by other sources not affiliated with Microsoft Academic Search.
(
portal.acm.org
)
(
portal.acm.org
)
(
doi.acm.org
)
(
portal.acm.org
)
(
cs-people.bu.edu
)
(
www.informatik.uni-trier.de
)
(
www.cs.uiuc.edu
)
(
cchen1.csie.ntust.edu.tw
)
More »
Citation Context
(40)
...The problem of summarizing frequent graph patterns is related to that of summarizing frequent item sets, which has been extensively investigated in [
15
], [12], [16], and [17]...
...Afrati et al. [
15
] uses K frequent item sets at border to approximate the collection of frequent item sets...
...When the method in [
15
] is applied to graph databases, support information in a large number of frequent graph patterns will be lost because it is aimed at obtaining a subset of maximal frequent patterns...
Jianzhong Li
,
et al.
Efficient Algorithms for Summarizing Graph Patterns
...To achieve a concise summary or representatives of frequent itemsets, recent works such as the closed pattern (Pasquier et al. 1999), the spanning set approach (
Afrati et al. 2004
), the clusteringbased approach (Xin et al. 2005), the profile-based approach (Yan et al. 2005), the Markov random field approach (Wang and Parthasarathy 2006) and the regression-based approach (Jin et al. 2008) are developed...
Guanhua Chen
,
et al.
Efficient approaches for summarizing subspace clusters into
...The concept of summarization is also common in frequent itemset analysis, where various techniques have been developed to generate a compact approximation to a collection of frequent itemsets [5][
6
], or other multidimensional summaries [7]...
A. N. Mahmood
,
et al.
Hierarchical summarization techniques for network traffic
...Summarization is a common method for representing huge amounts of patterns [
5
,6,7,8,9,10]...
...Note that most of the existing methods use the same language for P and S [
5
,6,7,8]...
...Table 2 highlights some characteristics of the approaches closest to ours [10,6,
5
,8,7,9] . We comment the table below...
...each pattern is covered by at least a pattern of the summary, contrary to other methods which provide summaries that approximately cover the set of association rules [
5
,8]...
...Reference [9] [6] [
5
] [10] [7] [8] Our method...
...Regeneration: The main objective of some approaches is to build summaries so that they can be used to regenerate the original itemsets [
5
] or their frequency [8,7,10] whereas the others are interested in the representation of the patterns for future exploration [6,9]...
...The measure used in [
5
] calculates the size of the subset of patterns which are really covered by at least one pattern of the summary...
Marie Ndiaye
,
et al.
Cube Based Summaries of Large Association Rule Sets
...Frequent Itemsets are huge when the given threshold is low; consequently, the condensed representations of frequent itemsets including closed itemsets[19], maximal itemsets[6][7][8][9][18][22], free itemsets[12], approximate k-sets[
10
], weighted itemsets[21], and non-derivable itemsets[14] were proposed; in addition, [20] focused on discovering a minimal set of unexpected itemsets...
Haifeng Li
,
et al.
Mining maximal frequent itemsets on graphics processors
References
(19)
Advances in frequent itemset mining implementations: report on FIMI'03
(
Citations: 59
)
Bart Goethals
,
Mohammed Javeed Zaki
Journal:
Sigkdd Explorations
, vol. 6, no. 1, pp. 109-117, 2004
Discovery of Frequent Episodes in Event Sequences
(
Citations: 678
)
Heikki Mannila
,
Hannu Toivonen
,
A. Inkeri Verkamo
Journal:
Data Mining and Knowledge Discovery - DATAMINE
, vol. 1, no. 3, pp. 259-289, 1997
Approximation algorithms for np-hard problems
(
Citations: 397
)
D. S. Hochbaum
Published in 1997.
Mining Sequential Patterns
(
Citations: 2019
)
Rakesh Agrawal
,
Ramakrishnan Srikant
Conference:
International Conference on Data Engineering - ICDE
, pp. 3-14, 1995
Mining associations between sets of items in massive databases
(
Citations: 220
)
R. Agrawal
,
T. Imielinski
,
A. Swami
Conference:
International Conference on Management of Data - SIGMOD
, 1993
Sort by:
Citations
(54)
Summarizing transactional databases with overlapped hyperrectangles
(
Citations: 1
)
Yang Xiang
,
Ruoming Jin
,
David Fuhry
,
Feodor F. Dragan
Journal:
Data Mining and Knowledge Discovery - DATAMINE
, vol. 23, no. 2, pp. 215-251, 2011
Efficient Algorithms for Summarizing Graph Patterns
Jianzhong Li
,
Yong Liu
,
Hong Gao
Journal:
IEEE Transactions on Knowledge and Data Engineering - TKDE
, vol. 23, no. 9, pp. 1388-1405, 2011
Efficient approaches for summarizing subspace clusters into
Guanhua Chen
,
Xiuli Ma
,
Dongqing Yang
,
Shiwei Tang
,
Meng Shuai
,
Kunqing Xie
Journal:
Soft Computing - SOCO
, vol. 15, no. 5, pp. 845-853, 2011
Hierarchical summarization techniques for network traffic
A. N. Mahmood
,
C. Leckie
,
R. Islam
,
Z. Tari
Conference:
IEEE Conference on Industrial Electronics and Applications - ICIEA
, pp. 2474-2479, 2011
Cube Based Summaries of Large Association Rule Sets
(
Citations: 1
)
Marie Ndiaye
,
Cheikh Talibouya Diop
,
Arnaud Giacometti
,
Patrick Marcel
,
Arnaud Soulet
Conference:
Advanced Data Mining and Applications - ADMA
, pp. 73-85, 2010