SNIFR : Boosting Fine-Grained Child Harmful Content Detection Through Audio-Visual Alignment with Cascaded Cross-Transformer

arXiv:2506.03378
Authors
  • Orchid Chetia Phukan
  • Mohd Mujtaba Akhtar
  • Girish
  • Swarup Ranjan Behera
  • Abu Osama Siddiqui
  • Sarthak Jain
  • Priyabrata Mallick
  • Jaya Sai Kiran Patibandla
  • Pailla Balakrishna Reddy
  • Arun Balaji Buduru
  • Rajesh Sharma
As video-sharing platforms have grown over the past decade, child viewership has surged, increasing the need for precise detection of harmful content such as violence or explicit scenes. Malicious users exploit moderation systems by embedding unsafe content in only a few frames to evade detection. Prior research has advanced such fine-grained detection using visual cues, but audio features remain underexplored. In this study, we combine audio cues with visual cues for fine-grained detection of content harmful to children and introduce SNIFR, a novel framework for effectively aligning the two modalities. SNIFR employs a transformer encoder for intra-modality interaction, followed by a cascaded cross-transformer for inter-modality alignment. Our approach outperforms unimodal and baseline fusion methods, setting a new state of the art.
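
The two-stage design described above (self-attention within each modality, then cross-attention between modalities) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not specify layer counts, projections, or how the cascade is ordered. The function names (`self_block`, `cross_block`, `align`), the identity projections, the single attention head, the audio-then-visual ordering of the cascade, and the toy feature values are all assumptions made for illustration. Both modalities are assumed to be already projected to a shared feature dimension.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors.

    Assumption: no learned projections (identity Q/K/V weights).
    """
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def residual(x, y):
    """Element-wise sum of two equally shaped sequences."""
    return [[a + b for a, b in zip(xr, yr)] for xr, yr in zip(x, y)]


def self_block(seq):
    """Intra-modality interaction: each token attends to its own modality."""
    return residual(seq, attention(seq, seq, seq))


def cross_block(query_seq, context_seq):
    """Inter-modality alignment: queries from one modality, keys/values from the other."""
    return residual(query_seq, attention(query_seq, context_seq, context_seq))


def align(audio, visual):
    """Hypothetical cascade: audio attends to visual, then visual attends to that result."""
    a = self_block(audio)
    v = self_block(visual)
    audio_aligned = cross_block(a, v)          # stage 1 (assumed ordering)
    fused = cross_block(v, audio_aligned)      # stage 2 consumes stage-1 output
    dim = len(fused[0])
    # Mean-pool over time to get one clip-level vector for a downstream classifier.
    return [sum(row[d] for row in fused) / len(fused) for d in range(dim)]


# Toy features: 3 audio frames and 2 video frames, both with 4 dimensions.
audio = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
visual = [[0.2, 0.1, 0.4, 0.0], [0.0, 0.6, 0.1, 0.3]]
pooled = align(audio, visual)
print(len(pooled))  # 4
```

In a real system the identity projections would be replaced by learned multi-head attention layers with normalization and feed-forward sublayers, and the pooled vector would feed a classification head. The sketch only shows how the cascade passes the output of one cross-attention stage into the next.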