Multimedia databases contain diverse and interrelated data types, such as images and text. This makes it difficult to extract meaningful relationships efficiently and to perform accurate retrieval or inference. Existing multimodal data mining approaches often suffer from poor scalability, slow convergence, and high computational cost. These limitations degrade performance and responsiveness, particularly as databases grow. As a result, current methods are poorly suited to real-time querying and large-scale applications. There is a need for frameworks that handle complex multimodal data efficiently while maintaining speed and accuracy.
The invention is an Enhanced Max Margin Learning (EMML) framework that formulates multimodal data mining as a structured prediction problem. It learns the relationships between data modalities by optimizing a max margin objective. To reduce the computational burden, it selectively focuses the optimization on the active constraints. Because far fewer constraints are considered during optimization, the framework trains more efficiently without sacrificing accuracy. The system also decouples query response time from database size, which enables rapid retrieval and inference regardless of dataset scale. Together, these properties support efficient training and high-performance analysis of large, complex multimodal datasets.
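The general idea of max margin learning restricted to active constraints can be illustrated on a toy multiclass problem. The sketch below is a generic working-set (cutting-plane style) structural SVM loop, not the patented EMML algorithm. Each round runs loss-augmented inference over the full label space to find each example's most violated constraint. That constraint is added to a per-example working set only if it is violated by more than the current slack. The restricted problem is then approximately re-solved using only the constraints in the working sets. Several choices here are assumptions for illustration: the 0/1 task loss, the block joint feature map, and the cyclic stochastic subgradient (Pegasos-style) inner solver. The names `train_working_set`, `hinge`, and `predict` and the hyperparameters `lam`, `eta`, and `eps` are also hypothetical.

```python
"""Hypothetical working-set max-margin sketch (not the patented EMML method)."""
from typing import List, Sequence, Set, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def score(w: List[Vector], x: Vector, y: int) -> float:
    # w . phi(x, y): the assumed joint feature map places x in the block for label y.
    return dot(w[y], x)


def delta(y_true: int, y: int) -> float:
    # Assumed 0/1 task loss; structured outputs would use e.g. a Hamming loss.
    return 0.0 if y == y_true else 1.0


def hinge(w: List[Vector], x: Vector, y_true: int, y: int) -> float:
    # Violation of the constraint "y_true must beat y by a margin of delta(y_true, y)".
    return delta(y_true, y) + score(w, x, y) - score(w, x, y_true)


def train_working_set(data: Sequence[Tuple[Vector, int]], num_labels: int,
                      lam: float = 0.01, eta: float = 0.2, eps: float = 1e-3,
                      max_rounds: int = 50, inner_epochs: int = 100):
    dim = len(data[0][0])
    w = [[0.0] * dim for _ in range(num_labels)]
    working: List[Set[int]] = [set() for _ in data]  # active labels per example
    for _ in range(max_rounds):
        added = 0
        for i, (x, y_true) in enumerate(data):
            # Loss-augmented inference over ALL labels finds the most violated constraint.
            y_hat = max(range(num_labels), key=lambda y: hinge(w, x, y_true, y))
            slack = max([0.0] + [hinge(w, x, y_true, y) for y in working[i]])
            if hinge(w, x, y_true, y_hat) > slack + eps:
                working[i].add(y_hat)
                added += 1
        if added == 0:
            break  # no constraint outside the working sets is violated by more than eps
        # Approximately re-solve lam/2*||w||^2 + mean hinge using only working-set constraints.
        for _ in range(inner_epochs):
            for i, (x, y_true) in enumerate(data):
                y_bad = max(working[i], key=lambda y: hinge(w, x, y_true, y), default=None)
                violated = y_bad is not None and hinge(w, x, y_true, y_bad) > 0
                for row in w:  # regularizer part of the subgradient step
                    for k in range(dim):
                        row[k] *= 1.0 - eta * lam
                if violated:  # hinge part: raise the true label, lower the violator
                    for k in range(dim):
                        w[y_true][k] += eta * x[k]
                        w[y_bad][k] -= eta * x[k]
    return w, working


def predict(w: List[Vector], x: Vector) -> int:
    return max(range(len(w)), key=lambda y: score(w, x, y))


if __name__ == "__main__":
    # Toy data: 2-D features plus a constant bias feature, one point per label.
    data = [([1.0, 0.0, 1.0], 0), ([0.0, 1.0, 1.0], 1), ([-1.0, -1.0, 1.0], 2)]
    w, working = train_working_set(data, num_labels=3)
    used = sum(len(s) for s in working)
    print("constraints in working set:", used, "of", len(data) * (3 - 1))
    print("predictions:", [predict(w, x) for x, _ in data])
```

With only three labels, the working set may end up holding every constraint. The savings described above come from large structured output spaces, such as sets of annotation words, where the number of candidate outputs can be exponential and only a small fraction of constraints ever becomes violated.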
• Improves learning efficiency through faster convergence rates
• Enables query response time independent of database size
• Supports scalable multimodal data mining for large datasets
• Enhances accuracy in image annotation and retrieval tasks
• Provides a generalizable framework for structured prediction problems
• Reduces the number of optimization constraints by approximately 70×, improving computational efficiency
• U.S. Patent No. 8,463,053, issued 6/11/2013
• U.S. Patent No. 8,923,630, issued 12/30/2014
• U.S. Patent No. 10,007,679, issued 6/26/2018
Tested on a real-world annotated image dataset (Berkeley Drosophila embryo database) with ~36,000 images and associated text labels.
This technology is available for licensing.
Strong potential for adoption by developers of multimedia search platforms, AI and machine learning companies, and organizations managing large-scale image and text databases seeking scalable, high-efficiency multimodal data mining and retrieval solutions.
Validation performed using a large annotated image dataset; additional performance details and implementation information available upon request.