Human Action Recognition (HAR) has grown into one of the most active areas of computer vision, with applications in healthcare, smart homes, security, autonomous driving, and human–robot interaction. Over the past decade, deep learning has transformed how HAR is approached: instead of relying on handcrafted features, modern models learn directly from raw data, whether it comes from RGB videos, skeleton sequences, depth maps, wearable devices, or wireless signals. Existing surveys typically focus on either technical architectures or specific modalities, and rarely integrate recent advances, practical applications, and explainability in a single treatment. This survey addresses that gap by examining state-of-the-art deep learning methods alongside their real-world deployment in fall detection, rehabilitation monitoring, and navigation systems. We analyze the emerging techniques driving HAR forward: transformer architectures for temporal modeling, self-supervised learning that reduces annotation requirements, contrastive learning for robust representations, and graph neural networks that excel at skeleton-based recognition by modeling joint relationships. Advanced approaches such as few-shot and meta-learning enable recognition of novel activities from limited data, while cross-modal learning facilitates knowledge transfer between sensor modalities. Federated learning preserves privacy across distributed devices, neural architecture search automates model design, and domain adaptation improves generalization across environments and populations, collectively moving HAR toward efficient, adaptable, deployment-ready solutions. By synthesizing recent advances, real-world applications, and explainability requirements, this survey provides researchers and practitioners with a consolidated roadmap for developing HAR systems that are accurate, interpretable, and ready for practical deployment across diverse domains.