AIM Intelligence Exposes Major Safety Flaw in AI Video Models: SceneSplit Achieves 77-84% Filter Bypass Rate
Research accepted to ICLR 2026, world's top ML conference, reveals how harmless scenes evade detection but combine into harmful videos
SAN FRANCISCO, CA, UNITED STATES, February 27, 2026 /EINPresswire.com/ -- AIM Intelligence, a company specializing in AI safety, in collaboration with researchers from Yonsei University, the Korea Institute of Science and Technology (KIST), Seoul National University, and Kyung Hee University, has identified structural vulnerabilities in the safety systems of major commercial Text-to-Video (T2V) models - including Google DeepMind's Veo2, Luma's Ray2, and Minimax's Hailuo - that allow content safety filters to be bypassed with success rates ranging from 77% to 84%.
The research paper, "Jailbreaking on Text-to-Video Models via Scene Splitting Strategy," has been officially accepted to ICLR 2026 (International Conference on Learning Representations), one of the world's most prestigious machine learning conferences. Approximately 19,000 papers were submitted to ICLR 2026, of which only 28.2% were accepted. The full paper is available at https://velpegor.github.io/SceneSplit/.
Structural Blind Spot Where Harmless Scene Combinations Transform into Harmful Content
The "SceneSplit" technique developed by the research team divides a single harmful video request into 2-5 individual scenes, with each scene constructed to appear harmless on its own. For example, individual descriptions such as "smoke rising into the sky," "people lying on the ground," and "red liquid" pass safety filters independently, but when these scenes are sequentially combined, they transform into a video reminiscent of an explosion scene.
Current safety filters in commercial T2V models examine input prompts only at the level of individual units. SceneSplit exploits exactly this vulnerability: each scene passes the filter on its own, yet the combined sequence ultimately produces harmful content, exposing a structural blind spot in narrative-context evaluation.
Up to 84% Attack Success Rate Recorded Across Major Commercial Models
The research team evaluated five major commercial models using 220 prompts across 11 safety categories including pornography, violence, discrimination, and illegal activities. The results were as follows:
- Minimax Hailuo: 84.1% success rate
- Kling v1.0: 78.6%
- Google DeepMind Veo2: 78.2%
- Luma Ray2: 77.2%
- OpenAI Sora2: 68.6%
While existing single-prompt-based attack techniques achieved success rates of only 33-41%, SceneSplit recorded more than double the success rate across all models. Notably, the technique achieved 60% success in Hailuo's pornography category where existing methods had 0% success, and in Veo2's illegal activities category, the success rate surged from 10% to 90%.
Three-Stage Automated Attack System That Learns from Failures
SceneSplit goes beyond simple prompt manipulation, consisting of a three-stage system that learns from failed attempts:
1. Scene Splitting: Reconstruct harmful prompts into multiple harmless scenes
2. Scene Manipulation: Analyze the generated videos and selectively modify only the most influential scenes, making expressions more direct when the output is too weak and more oblique when a scene is filtered, thereby probing the boundaries of the safety filters
3. Strategy Update: Store successful patterns for reuse in similar attacks
The research team's ablation experiments confirmed that each stage independently improves performance by approximately 17-18 percentage points, indicating that all three components play a critical role in attack success.
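As a rough illustration of the loop described above, the following Python sketch shows how a prompt-level filter that judges each scene in isolation can pass individually harmless scenes even when the direct request is blocked. All function names, the toy filter, and the hard-coded scenes are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of the three-stage SceneSplit loop; everything
# here is an illustrative stand-in, not the paper's actual code.

BLOCKED_TERMS = {"explosion", "bomb"}  # toy stand-in for a prompt-level filter


def passes_filter(scene: str) -> bool:
    """Judge each prompt in isolation, mirroring the per-scene blind spot."""
    return not any(term in scene.lower() for term in BLOCKED_TERMS)


def scene_split(request: str) -> list[str]:
    """Stage 1: recast one blocked request as individually harmless scenes
    (hard-coded here purely for illustration)."""
    return [
        "smoke rising into the sky",
        "people lying on the ground",
        "red liquid spreading slowly",
    ]


def manipulate(scenes: list[str]) -> list[str]:
    """Stage 2 (stub): soften a filtered scene, or sharpen a weak one."""
    return scenes


def scenesplit(request: str, strategy_bank: list, max_attempts: int = 3):
    scenes = scene_split(request)                   # Stage 1: split
    for _ in range(max_attempts):
        if all(passes_filter(s) for s in scenes):   # each scene checked alone
            strategy_bank.append(scenes)            # Stage 3: store for reuse
            return scenes                           # would go to the T2V model
        scenes = manipulate(scenes)                 # Stage 2: adjust and retry
    return None


bank: list = []
blocked_request = "an explosion in a city street"
split_scenes = scenesplit(blocked_request, bank)
```

In this toy setup the direct request is rejected by the per-prompt check, while every scene produced by the split passes it, which is the structural blind spot the paper targets.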
Need for Next-Generation Safety Filters That Understand Narrative Context
This research reveals a fundamental limitation of current video generation models' safety systems, which remain confined to prompt-level censorship and fail to comprehensively evaluate cross-scene context. As T2V models from Google DeepMind, Luma, Minimax, and other major companies rapidly expand into advertising, media, and social media content creation, advanced safety technologies capable of evaluating entire narrative contexts are urgently needed.
"While text LLMs generate harmful information, video models create the harmful content itself," said Ha-on Park, CTO of AIM Intelligence. "As these models are rapidly being deployed in real-world industrial settings, automated red-teaming that proactively identifies vulnerabilities, together with safety control technologies built on those findings, is essential, not optional. Building on this research, AIM Intelligence will continue to advance safety verification technologies across multimodal AI systems."
Research Collaboration Information
This research was conducted jointly by Ha-on Park, CTO of AIM Intelligence (first author), Wonjun Lee from Yonsei University and Korea Institute of Science and Technology (KIST), and Doehyeon Lee from Seoul National University, under the supervision of Professor Suhyun Kim from Kyung Hee University.
About AIM Intelligence
AIM Intelligence is a Seoul-based AI safety company specializing in automated red-teaming, real-time guardrails, and AI monitoring solutions. Founded in 2024, the company has collaborated with global industry leaders including BMW, OpenAI, and LG Electronics, and has conducted safety assessments on Anthropic's private models. AIM Intelligence conducts research across large language models, multimodal systems, autonomous agents, and physical AI, and has published over 15 papers at top-tier conferences including ICLR, ICML, ACL, NeurIPS, and CVPR.
Team Cookie Official
Team Cookie
Legal Disclaimer:
EIN Presswire provides this news content "as is" without warranty of any kind. We do not accept any responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you have any complaints or copyright issues related to this article, kindly contact the author above.
