Knowledge-Intensive Subgroup Mining
Techniques for Automatic and Interactive Discovery
Subgroup mining is a powerful and broadly applicable data mining approach: in general, the goal is to efficiently discover novel, potentially useful, and ultimately interesting knowledge in the form of subgroup patterns. In real-world situations, however, these requirements often cannot be met, e.g., if the applied methods do not scale to large data sets, if too many results are presented, or if many of the discovered patterns are already known to the user. This work proposes a combination of several techniques to cope with these problems. For automatic methods, we present the novel SD-Map algorithm, which is fast and effective. We describe interactive techniques for subgroup introspection and analysis, and we present advanced visualization methods that can be used for subgroup optimization, comparison, and exploration. Furthermore, we propose including several classes and types of background knowledge in the mining process. The techniques are combined into a knowledge-intensive process supporting both automatic and interactive methods for subgroup mining. The evaluation consists of two parts. With respect to objective evaluation criteria (efficiency and effectiveness), we provide a thorough experimental evaluation using synthetic data that demonstrates the benefit of the presented methods. Subjective evaluation criteria include user acceptance, the benefit, and finally the interestingness of the results. The approach has been successfully implemented in medical and technical applications, for which we present five case studies using real-world data.