Distributed association rules mining using non-derivable frequent patterns



Mining association rules in distributed databases is an interesting problem in the context of parallel and distributed data mining. A number of approaches have, so far, been proposed for distributed mining of association rules. However, most of them consider all types of frequent itemsets the same, even though there are different types of itemsets in distributed databases, e.g., derivable and non-derivable. In this study, a new application of deduction rules is introduced for distributed mining of association rules which exploits the derivability of itemsets to reduce communication overhead and to enhance response time. A new algorithm is proposed which mines derivable and non-derivable frequent itemsets in a distributed database. Since the collection of derivable and non-derivable frequent itemsets form all frequent itemsets, our algorithm mines all frequent itemsets rather than a subset of them. In the algorithm, there is no need to scan local databases and exchange messages in order to obtain support counts of derivable frequent itemsets, since each site can produce them autonomously. Experimental evaluations on horizontally partitioned real-life datasets show that such exploitation drastically reduces communication and also improves response time.  Therefore the new algorithm is useful when communication bandwidth is the main bottleneck.