NTU CCTV-Fights Dataset

The CCTV-Fights dataset contains 1,000 videos of real-world fights, recorded by CCTV or mobile cameras. We also provide frame-level annotations for each fight instance in the videos, marking its exact start and end points.
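
As an illustration of what frame-level annotation allows, the sketch below expands fight segments into one binary label per frame. The annotation file format is not described here, so the input layout (a list of start and end times in seconds), the frame rate, and the function name frame_labels are all assumptions made for this example rather than the dataset's actual schema.

```python
def frame_labels(segments, num_frames, fps):
    """Return one 0/1 label per frame; 1 marks a frame inside a fight segment.

    segments: list of (start_seconds, end_seconds) pairs (hypothetical layout).
    """
    labels = [0] * num_frames
    for start_s, end_s in segments:
        # Map times to the nearest frame index and clamp to the video length.
        first = max(0, round(start_s * fps))
        last = min(num_frames - 1, round(end_s * fps))
        for i in range(first, last + 1):
            labels[i] = 1
    return labels


# Hypothetical 10-second clip at 10 fps with two fight instances.
example = frame_labels([(1.0, 2.0), (5.0, 6.5)], num_frames=100, fps=10)
```

Labels in this form can be compared frame by frame with a detector's output, which is the kind of evaluation that exact start and end points make possible.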

The videos were collected from YouTube using search keywords such as CCTV Fight, Mugging, Violence, Surveillance, and Physical violence. The fights cover a diverse range of actions and attributes, for example punching, kicking, pushing, and wrestling, involving two or more persons. We discarded videos that did not come directly from a CCTV recording (e.g., footage made with a mobile camera recording a screen), as well as videos with heavy special effects (e.g., shaded borders, slow motion).

The dataset consists of 280 CCTV videos containing different types of fights, ranging from 5 seconds to 12 minutes in length, with an average of 2 minutes. It also contains 720 videos of real fights from other sources (hereinafter referred to as Non-CCTV), mainly mobile cameras, with a few from car cameras (dash-cams) and drones or helicopters. These videos are shorter, ranging from 3 seconds to 7 minutes with an average length of 45 seconds, but some still contain multiple fight instances, and they can help a model generalize better.

The table below presents a summary of the dataset statistics.

          Videos  Duration (hours)  Fight instances  Average instances per video
All        1,000             17.68            2,414                         2.41
CCTV         280              8.54              747                         2.67
Non-CCTV     720              9.13            1,667                         2.32
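
The derived figures in the table can be rechecked from its raw columns. The short sketch below recomputes the average number of fight instances per video and, as a cross-check on the stated average lengths, the mean duration per video in seconds. The dictionary layout and variable names are illustrative only.

```python
# (videos, duration in hours, fight instances), copied from the table above.
stats = {
    "All": (1000, 17.68, 2414),
    "CCTV": (280, 8.54, 747),
    "Non-CCTV": (720, 9.13, 1667),
}

# Average fight instances per video, rounded to two decimals as in the table.
avg_instances = {
    name: round(instances / videos, 2)
    for name, (videos, _, instances) in stats.items()
}

# Mean duration per video in whole seconds.
mean_seconds = {
    name: round(hours * 3600 / videos)
    for name, (videos, hours, _) in stats.items()
}

for name in stats:
    print(name, avg_instances[name], mean_seconds[name])
```

The recomputed instance averages match the last column, and the mean durations (about 110 seconds for CCTV and 46 seconds for Non-CCTV) agree with the rounded averages of 2 minutes and 45 seconds quoted in the text.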

The overall size of the dataset is 7.2 GB.

How to obtain the dataset:
Interested researchers can register an account, submit the request form, and accept the Release Agreement. We will validate your request and grant approval for downloading the dataset.

Sample Frames

Sample Videos


Usage for Academic Research

This video dataset is released for academic research only, and is free to researchers from educational or research institutes for non-commercial purposes.

Related Publications
All publications using the NTU CCTV-Fights dataset should include the following acknowledgement: “(Portions of) the research in this paper used the NTU CCTV-Fights Dataset made available by the ROSE Lab at the Nanyang Technological University, Singapore.”

Furthermore, these publications should cite the following reference:
Mauricio Perez, Alex C. Kot, Anderson Rocha, “Detection of Real-world Fights in Surveillance Videos”, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.