Building an Integrative System: Enriching Gesture Language Video Recognition through the Multi-Stream of Hybrid and Improved Deep Learning Models with an Adaptive Decision System

Authors

  • Anwar Mira, University of Babylon, Iraq
  • Olaf Hellwich, Technische Universität Berlin, Germany

DOI:

https://doi.org/10.4114/intartif.vol27iss74pp181-213

Keywords:

Recurrent Neural Networks, Self Organizing Network, Radial Basis Function Network, Video Gesture Recognition, Adaptive Decision System

Abstract

The recognition of hand gestures is of growing importance in developing human-machine interfaces that rely on hand motions for communication. However, recognizing hand gesture motions is challenging because gestures from different categories can overlap when they share similar hand poses. Temporal information has proven more effective for distinguishing such sequences of hand gestures. To address these challenges, this research presents an adaptive decision-making system that aims to enhance gesture recognition within the same category. The system exploits the variation in recognition outcomes across a diverse set of time-sharing neural network models, each employing a different network type and trained on distinct input features. Incorporating such diverse input features makes the recognition decisions more robust and lets the system capture subtle differences within internal video representations. To this end, we investigate deep convolutional neural networks (CNNs) trained on videos for hand gesture recognition. We also enhance the deep CNN features using two standard neural networks, namely the Self Organizing Network and the Radial Basis Function Network. By combining these enhanced features in various configurations, we develop novel frame-wise features. These frame-wise features are used to train diverse sets of recurrent neural network models, yielding novel ensembles of composite models built from recurrent neural networks with different configurations. Some models are trained using multiple streams, while others use a single stream.
To integrate these models effectively, we implement a novel adaptive decision system that improves the performance of weak prediction models and enhances overall recognition capability by making a collective prediction decision. Experimental results demonstrate the contribution of each proposed recurrent neural network model and the effectiveness of the new frame-wise features in enabling accurate decisions. This research achieves state-of-the-art performance in hand gesture recognition, highlighting the potential of combining different neural network architectures and feature representations.
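
The abstract does not specify how the adaptive decision system combines model outputs. As a rough illustration only, the sketch below shows one common way a collective decision can be formed from several classifiers: each model's class probabilities are weighted by a prior reliability score and by how confident that model is on the current video. The function name `adaptive_fusion`, the `val_accuracy` input, and the top-two margin weighting are all assumptions made for this example, not the authors' method.

```python
import numpy as np

def adaptive_fusion(probs, val_accuracy):
    """Fuse per-model class probabilities for one video into a single decision.

    probs:        (n_models, n_classes) array, one probability row per model.
    val_accuracy: (n_models,) array, hypothetical held-out accuracy per model.
    Returns the fused class index and the fused probability vector.
    """
    probs = np.asarray(probs, dtype=float)
    val_accuracy = np.asarray(val_accuracy, dtype=float)

    # Per-video confidence: gap between each model's top two class probabilities.
    sorted_probs = np.sort(probs, axis=1)
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]

    # Illustrative weighting (an assumption): prior reliability times confidence.
    weights = val_accuracy * margin
    if weights.sum() == 0:
        # Every model is maximally unsure; fall back to a plain average.
        weights = np.ones_like(weights)
    weights = weights / weights.sum()

    # Weighted average of the probability rows, then pick the best class.
    fused = weights @ probs
    return int(np.argmax(fused)), fused

# Three hypothetical models scoring one video over two gesture classes.
example_probs = [[0.95, 0.05], [0.45, 0.55], [0.48, 0.52]]
example_accuracy = [0.9, 0.6, 0.6]
label, fused = adaptive_fusion(example_probs, example_accuracy)
print(label, fused)
```

In this example a simple majority vote would pick class 1, because two of the three models lean slightly that way, whereas the weighted fusion follows the single confident model and picks class 0. This is the kind of behavior a per-sample adaptive scheme is meant to provide, under the assumptions stated above.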

Author Biography

Olaf Hellwich, Technische Universität Berlin, Germany

Technische Universität Berlin, Computer Vision & Remote Sensing, Germany

Published

2024-09-04

How to Cite

Mira, A., & Hellwich, O. (2024). Building an Integrative System: Enriching Gesture Language Video Recognition through the Multi-Stream of Hybrid and Improved Deep Learning Models with an Adaptive Decision System. Inteligencia Artificial, 27(74), 181–213. https://doi.org/10.4114/intartif.vol27iss74pp181-213