Combining multiple clustering results into a single consensus output is challenging because there is no explicit correspondence between the clusters of different clusterings. Traditional approaches often assume direct one-to-one mappings or rely on indirect transformations that reduce interpretability. These methods can also incur high computational cost and show limited robustness on heterogeneous or distributed datasets. As clustering is increasingly applied in large-scale and privacy-sensitive environments, there is a need for more flexible and efficient techniques that can accurately align and integrate multiple clustering outputs without restrictive assumptions.
The invention introduces a soft correspondence framework that models relationships between clusters using weighted correspondence matrices rather than one-to-one mappings. It defines an optimization problem that minimizes the distance between transformed and target membership matrices. An iterative algorithm based on multiplicative update rules jointly computes the consensus clustering and the correspondence matrices. This approach enables more accurate alignment across clusterings while accommodating varying cluster structures and incomplete labeling. By directly modeling inter-cluster relationships, the framework improves robustness and interpretability while maintaining computational efficiency across diverse datasets.
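As a rough illustration of the soft correspondence idea, the sketch below fits a non-negative correspondence matrix S that maps one clustering's membership matrix A onto a fixed target membership matrix, by minimizing the squared Frobenius distance between the target and A·S with a standard NMF-style multiplicative update. This is a simplified sketch and not the patented algorithm: the target is held fixed here, whereas the invention computes the consensus clustering jointly with the correspondence matrices, and any constraints it uses are not reproduced. The function names (membership_matrix, fit_correspondence), the toy labels, and the parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def membership_matrix(labels, k):
    """Build a one-hot n x k membership matrix from integer labels in [0, k)."""
    labels = np.asarray(labels)
    M = np.zeros((labels.size, k))
    M[np.arange(labels.size), labels] = 1.0
    return M


def fit_correspondence(A, target, n_iter=200, eps=1e-12, seed=0):
    """Fit non-negative S minimizing ||target - A @ S||_F^2.

    Uses the standard multiplicative update S <- S * (A^T target) / (A^T A S),
    which keeps S non-negative. Assumption: the target is fixed; the joint
    update of the consensus matrix is not shown.
    """
    rng = np.random.default_rng(seed)
    S = rng.random((A.shape[1], target.shape[1])) + 0.1  # strictly positive start
    AtT = A.T @ target
    AtA = A.T @ A
    for _ in range(n_iter):
        S *= AtT / (AtA @ S + eps)  # eps guards against division by zero
    return S


# Toy example (hypothetical data): clustering B uses permuted cluster ids
# relative to clustering A and disagrees on the last object.
labels_a = [0, 0, 0, 1, 1, 2, 2, 2]
labels_b = [2, 2, 2, 0, 0, 1, 1, 0]

target = membership_matrix(labels_a, 3)  # treated as the target clustering
A = membership_matrix(labels_b, 3)       # clustering to be aligned

S = fit_correspondence(A, target)
distance = np.linalg.norm(target - A @ S) ** 2

print(np.round(S, 3))      # row i: soft weights linking cluster i of B to clusters of A
print(round(distance, 3))  # residual left by the disagreement
```

Each row of S can be read as a soft assignment of one source cluster to the target clusters. A row with a single dominant weight indicates a near one-to-one match, while a row with spread-out weights shows a cluster that is split across several target clusters, which a hard one-to-one mapping could not express.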
• Directly addresses the correspondence problem without restrictive assumptions
• Improves clustering robustness and stability across diverse inputs
• Reduces computational complexity compared to co-association methods
• Provides interpretable correspondence matrices between clusterings
• Handles missing labels effectively without data loss
• Supports distributed and privacy-preserving data scenarios
• Scales efficiently to large datasets
• United States 8,195,73 Issued 6/5/2012
• United States 8,499,022 Issued 7/30/2013
Evaluated on three real-world benchmark datasets (IRIS, PENDIG, ISOLET) with varying sizes and feature types; compared against four clustering ensemble methods (CSPA, MCLA, QMI, MMEC).
This technology is available for licensing.
Strong potential for adoption by data analytics providers, AI developers, and organizations working with distributed or privacy-sensitive datasets seeking robust and interpretable clustering ensemble solutions.
Benchmark evaluation results and algorithmic implementation details available upon request.