A Comprehensive Review of Real-Time Stream Data Analysis and Processing Frameworks for Anomaly Detection

With the rise of IoT devices, wearable sensors, and real-time monitoring systems, vast amounts of human activity data are being continuously generated. Streaming data analysis has become essential in applications such as healthcare, cybersecurity, fraud detection, and smart environments. However, detecting anomalies (outliers) in these real-time data streams remains a significant challenge due to high velocity, dynamic distributions, and resource constraints.

This survey paper provides a comprehensive review of:

Big data stream processing architectures (Apache Spark, Flink, Storm).
Anomaly detection approaches for human activity streams.
Machine learning-based frameworks for human activity recognition (HAR).

The study also presents a systematic taxonomy of HAR techniques and proposes a conceptual framework for integrating big data analytics with AI-driven anomaly detection.

Big Data Stream Processing Frameworks

Streaming data frameworks provide the foundation for real-time analytics and anomaly detection in human activity datasets. This section evaluates Apache Spark, Apache Flink, and Apache Storm, comparing their scalability, fault tolerance, and efficiency in handling HAR data streams.

Key Frameworks Evaluated:

1- Apache Spark Structured Streaming:

Strengths: Efficient micro-batch processing, strong machine learning integration, fault tolerance.
Limitations: Medium latency.
Key Finding: Processes 1,000-byte records 6x faster than Storm (benchmarked in Yahoo studies).

2- Apache Flink:

Strengths: Event-driven processing, low latency.
Limitations: Limited built-in ML libraries.
Key Finding: Outperforms Spark in real-time advertising analytics (30-minute latency tests).

3- Apache Storm:

Strengths: Continuous processing, low overhead.
Limitations: Complexity in scaling.
Key Finding: Struggles beyond 135,000 events/sec (Chintapalli et al., 2016).
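
The latency gap between micro-batch and event-driven processing noted above can be illustrated with a toy model. The sketch below is not Spark or Flink code; the function name `queueing_delays` and the arrival times are hypothetical, and processing time is ignored. It only shows how long a record waits before processing can begin under each model.

```python
import math
from typing import List


def queueing_delays(arrivals: List[float], batch_interval: float) -> List[float]:
    """Return how long each record waits before processing can begin.

    batch_interval == 0 models event-at-a-time processing (no waiting).
    batch_interval > 0 models micro-batching: a record waits until the
    interval it arrived in has closed.
    """
    if batch_interval < 0:
        raise ValueError("batch_interval must be non-negative")
    if batch_interval == 0:
        return [0.0 for _ in arrivals]
    delays = []
    for t in arrivals:
        # End time of the batch interval that contains this arrival.
        batch_end = (math.floor(t / batch_interval) + 1) * batch_interval
        delays.append(batch_end - t)
    return delays


# Hypothetical arrival times in seconds.
arrivals = [0.1, 0.4, 0.9, 1.2]
print(queueing_delays(arrivals, 0.0))  # event-at-a-time
print(queueing_delays(arrivals, 0.5))  # micro-batches of 0.5 s
```

Under this simplified model, a record's added wait is bounded by the batch interval, which is one way to read the "medium latency" limitation listed for micro-batch processing.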

📊 Framework Comparison:

Anomaly Detection in Streaming Human Activity Data

Anomaly detection techniques are crucial for identifying irregular patterns in HAR data streams, such as abnormal walking patterns, sudden falls, or unauthorized movements. This section provides a detailed comparison of the most effective anomaly detection methods applied to streaming HAR.

Types of Anomaly Detection Approaches:

1- Statistical Methods (e.g., Z-score):

Limitations: Assumes normal distribution; ineffective for multivariate data.

2- Density-Based Methods (e.g., LOF):

Limitations: High computational cost; struggles with dynamic data streams.

3- Isolation Forest (IF):

Strengths: Linear time complexity, scalable for high-dimensional data.
Case Study: Detected credit card fraud with higher accuracy than distance-based methods (AUC = 0.89).
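
As an illustration of the simplest approach in the list above, here is a minimal sketch of a Z-score detector adapted to a stream. It assumes a single numeric sensor channel and uses Welford's online algorithm so that no history is stored. The class name `StreamingZScore`, the threshold of 3.0, and the warm-up length are hypothetical choices, not values from the survey.

```python
import math


class StreamingZScore:
    """Flag readings that lie far from the running mean (univariate only)."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging anything
        self.n = 0                  # number of readings seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x: float) -> bool:
        """Score x against the history, then fold it into the statistics."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update: constant memory, one pass over the stream.
        # Note that flagged readings are still included in the statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Hypothetical accelerometer magnitudes with one spike at the end.
detector = StreamingZScore()
readings = [1.0, 1.1, 0.9, 1.05, 0.95] * 4 + [5.0]
flags = [detector.update(r) for r in readings]
print(flags.index(True))
```

The sketch inherits the limitations stated above: it treats one variable at a time and implicitly assumes roughly normal readings, so it is a baseline rather than a replacement for density-based or isolation-based methods.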

📌 Table 3.1: Anomaly Detection Method Comparison

Human Activity Recognition (HAR) Taxonomy & Applications

HAR plays a key role in healthcare, security, and smart home automation. This section categorizes sensor-based, vision-based, and hybrid HAR techniques, highlighting their advantages and limitations.

HAR Classification Approaches:

1- Sensor-Based HAR:

Data Sources: Smartphone accelerometers, gyroscopes.
Accuracy: Up to 94% with time-domain features (Casale et al., 2011).

2- Vision-Based HAR:

Limitations: Privacy concerns, high computational cost.

3- Hybrid Systems:

Advantage: Combines sensor/vision data for improved reliability.

📌 HAR Taxonomy Diagram

The following diagram summarizes most of the research conducted in the area of human activity recognition. It shows the points to keep in mind when planning new work in this area, such as recognition types, techniques used, algorithms applied, data sources, and application areas.

📌 Human Activity Recognition Framework Design

To summarize the typical design and implementation of a pattern recognition system such as human activity recognition, the following framework design shows how the main modules of a human activity recognition system fit together to accomplish the recognition task. The system consists of a data collection step, data segmentation, feature engineering, and training of the selected model. As a result of these steps, the system can detect and recognize the desired activity.
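
The modules described above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the framework from the survey: the signal is a single hypothetical accelerometer channel, the features are two time-domain statistics (mean and standard deviation), and a nearest-centroid rule stands in for whichever model is selected. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, float]


def sliding_windows(signal: Sequence[float], size: int, step: int) -> List[Sequence[float]]:
    """Data segmentation: cut the signal into fixed-size, possibly overlapping windows."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def extract_features(window: Sequence[float]) -> Features:
    """Feature engineering: two simple time-domain features per window."""
    return (mean(window), pstdev(window))


def train_centroids(labeled: List[Tuple[Sequence[float], str]]) -> Dict[str, Features]:
    """Model training: average the feature vectors of each activity label."""
    sums: Dict[str, List[float]] = {}
    for window, label in labeled:
        f = extract_features(window)
        acc = sums.setdefault(label, [0.0, 0.0, 0.0])
        acc[0] += f[0]
        acc[1] += f[1]
        acc[2] += 1
    return {label: (a[0] / a[2], a[1] / a[2]) for label, a in sums.items()}


def predict(centroids: Dict[str, Features], window: Sequence[float]) -> str:
    """Recognition: return the label whose centroid is closest in feature space."""
    f = extract_features(window)
    return min(
        centroids,
        key=lambda lbl: (f[0] - centroids[lbl][0]) ** 2 + (f[1] - centroids[lbl][1]) ** 2,
    )


# Data collection is simulated with two hypothetical recordings.
rest = [1.0, 1.01, 0.99, 1.0] * 8
walk = [0.5, 1.5, 0.4, 1.6] * 8
labeled = [(w, "rest") for w in sliding_windows(rest, 8, 4)]
labeled += [(w, "walk") for w in sliding_windows(walk, 8, 4)]
model = train_centroids(labeled)
print(predict(model, [0.6, 1.4, 0.5, 1.5, 0.6, 1.4, 0.5, 1.5]))
```

In a streaming deployment, the same segmentation and feature steps would run over windows as they arrive, and the per-window features could also feed an anomaly detector.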

Note on Methodology

The information and conclusions presented in the comparisons above are based on a comprehensive review of existing research and studies in the field of anomaly detection. No experiments were conducted by the authors as part of this survey. The findings and insights are synthesized from the analysis of prior work, including benchmark studies, comparative evaluations, and theoretical frameworks published in reputable journals and conferences.

Final Thoughts

This survey establishes a foundation for real-time anomaly detection in human activity recognition, addressing key challenges in stream data analytics, anomaly detection, and AI-driven behavior modeling. It serves as a valuable resource for researchers, data scientists, and engineers working in the field of real-time AI-powered HAR systems.

📌 Key Research Contributions:
✅ A detailed comparative study of real-time stream processing frameworks.
✅ A comprehensive review of anomaly detection methods tailored for HAR applications.
✅ A systematic HAR taxonomy, classifying techniques based on data sources and algorithms.
