Research
Work & Interests
Research assistant at CMATER Lab, Jadavpur University. Broadly interested in multimodal learning, efficient deep learning, and AI for embodied systems.
Publications
Wilson-Prime Channel Attention and Gini-Adaptive Dual-Backbone Fusion of Deep Models for Medical Image Classification
Sajjad Ahmed Shaaz et al. — CMATER Lab, Jadavpur University
Proposes a dual-backbone architecture (MobileNetV2 + DenseNet121) with novel Wilson-Prime Channel Attention, Prime-Gini Adaptive Fusion, and linear-complexity global context modeling for medical image classification.
Multimodal Deepfake Detection via Audio-Visual Causal Divergence
Sajjad Ahmed Shaaz — CMATER Lab, Jadavpur University
Frames deepfake detection as identifying broken causal links between audio and visual modalities. Uses WhisperX forced phoneme alignment, CLIP ViT encoders, and Riemannian manifold representations to measure cross-modal synchrony divergence.
Research Areas
Multimodal Deepfake Detection
Investigating audio-visual causal divergence as a signal for deepfake detection. Current work explores cross-modal synchrony via WhisperX forced phoneme alignment, CLIP ViT encoders, and Riemannian manifold representations of causal link integrity between modalities.
Medical Image Classification
Dual-backbone deep learning architectures with attention mechanisms for robust medical image classification across pathology, histology, and radiology domains.
On-Device AI & Model Compression
Deploying deep learning models on constrained hardware via structured pruning, quantization, and knowledge distillation. Current focus on FPGA deployment with Vitis AI and TADNet for task-aware open-vocabulary detection.
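To give a flavor of the model-compression side, here is a minimal sketch of symmetric post-training int8 quantization of a weight tensor. This is an illustrative toy, not the Vitis AI quantizer or any specific project code; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization: map float weights to int8
    using a single per-tensor scale derived from the max magnitude."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Quantizing and dequantizing introduces at most ~scale/2 error per weight.
w = np.linspace(-1.0, 1.0, 5).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

Production toolchains add per-channel scales, zero points for asymmetric ranges, and calibration over activation statistics, but the core idea is this scale-and-round mapping.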
Autonomous Systems & Robotics
Computer vision pipelines and reinforcement learning for autonomous drone and rover systems. Includes GPS-resilient navigation via IMU-based dead reckoning and RL-based manipulator control.
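IMU-based dead reckoning, mentioned above, integrates gyroscope yaw rates into a heading and body-frame accelerations into velocity and position. A minimal planar (2-D) sketch, with hypothetical function and argument names and no sensor-noise handling:

```python
import numpy as np

def dead_reckon(accels, yaw_rates, dt, pos0=(0.0, 0.0), vel0=(0.0, 0.0), yaw0=0.0):
    """Planar dead reckoning from IMU samples.

    accels:    (N, 2) body-frame [forward, lateral] accelerations, m/s^2
    yaw_rates: (N,) gyro yaw rates, rad/s
    dt:        sample period, s
    Returns the (N+1, 2) estimated world-frame trajectory.
    """
    pos = np.array(pos0, dtype=float)
    vel = np.array(vel0, dtype=float)
    yaw = yaw0
    path = [pos.copy()]
    for a_body, wz in zip(np.asarray(accels, dtype=float), yaw_rates):
        yaw += wz * dt                       # integrate yaw rate -> heading
        c, s = np.cos(yaw), np.sin(yaw)
        R = np.array([[c, -s], [s, c]])      # body-to-world rotation
        vel += (R @ a_body) * dt             # integrate acceleration -> velocity
        pos += vel * dt                      # integrate velocity -> position
        path.append(pos.copy())
    return np.array(path)
```

Because each step integrates the previous one, errors accumulate quadratically in position, which is why real GPS-resilient pipelines fuse dead reckoning with other cues (e.g. via a Kalman filter) rather than running it open loop.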
Interested in collaborating? Reach out.