NTU RGB+D Action Recognition Dataset

The NTU RGB+D action recognition dataset consists of 56,880 action samples, each containing an RGB video, a depth map sequence, 3D skeletal data, and an infrared (IR) video. The dataset was captured concurrently by three Microsoft Kinect v2 cameras. RGB videos have a resolution of 1920×1080, depth maps and IR videos are 512×424, and the 3D skeletal data contain the three-dimensional locations of 25 major body joints in each frame.
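To make the per-frame skeleton structure concrete, here is a minimal sketch that parses one sample's skeleton file into frames, bodies, and 25 (x, y, z) joints per body. It assumes a plain-text `.skeleton` layout: a frame count, then for each frame a body count, and for each body a 10-value info line, a joint count, and one 12-value line per joint whose first three values are the camera-space x, y, z coordinates. That layout, the `Body` class, and the `parse_skeleton` function are assumptions for illustration, not an official API; check them against the dataset's own reading code before relying on them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Body:
    body_id: str                              # first value of the (assumed) body info line
    joints: List[Tuple[float, float, float]]  # camera-space (x, y, z) of each joint


def parse_skeleton(text: str) -> List[List[Body]]:
    """Parse an assumed .skeleton text layout into frames -> bodies -> joints."""
    # Split into whitespace-separated fields per line, ignoring blank lines.
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    pos = 0

    def next_line() -> List[str]:
        nonlocal pos
        fields = lines[pos]
        pos += 1
        return fields

    frames: List[List[Body]] = []
    num_frames = int(next_line()[0])
    for _ in range(num_frames):
        bodies: List[Body] = []
        num_bodies = int(next_line()[0])
        for _ in range(num_bodies):
            info = next_line()                # assumed 10-value body info line
            num_joints = int(next_line()[0])  # expected to be 25 joints
            joints = []
            for _ in range(num_joints):
                # Assumed 12 values per joint; only the first three (x, y, z) are kept.
                x, y, z = (float(v) for v in next_line()[:3])
                joints.append((x, y, z))
            bodies.append(Body(body_id=info[0], joints=joints))
        frames.append(bodies)
    return frames


if __name__ == "__main__":
    # Synthetic one-frame, one-body input; the numbers are made up, not real data.
    joint_line = "0.1 0.2 3.0 250 200 960 540 1 0 0 0 2"
    body_info = "72057594037931101 0 1 1 1 1 0 0.1 0.2 2"
    sample = "\n".join(["1", "1", body_info, "25"] + [joint_line] * 25)
    frames = parse_skeleton(sample)
    print(len(frames), len(frames[0]), len(frames[0][0].joints))  # 1 1 25
```

Plain tuples keep the sketch dependency-free; for training, each body's joints would typically be stacked into an array of shape (frames, 25, 3).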

Size of the data:
To ease downloading, we have split the modalities of the samples into separate files.
The size of each modality is shown in the table below:
Data Modality                  Size
3D skeletons (body joints)     5.8 GB
Masked depth maps              83 GB
Full depth maps                886 GB
RGB videos                     136 GB
IR videos                      221 GB
Total                          1.3 TB
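Because the modalities are downloaded separately, it can help to total only the ones you need before requesting access. The sketch below simply sums the sizes listed in the table; the dictionary and the `total_size_gb` helper are hypothetical, and it assumes decimal units (1 TB = 1000 GB).

```python
# Per-modality download sizes in GB, copied from the size table.
MODALITY_SIZES_GB = {
    "3D skeletons (body joints)": 5.8,
    "Masked depth maps": 83,
    "Full depth maps": 886,
    "RGB videos": 136,
    "IR videos": 221,
}


def total_size_gb(modalities):
    """Sum the sizes (GB) of the selected modalities."""
    return sum(MODALITY_SIZES_GB[m] for m in modalities)


if __name__ == "__main__":
    everything = total_size_gb(MODALITY_SIZES_GB)
    # Assumes decimal units: 1 TB = 1000 GB.
    print(f"All modalities: {everything:.1f} GB ≈ {everything / 1000:.1f} TB")  # 1331.8 GB ≈ 1.3 TB
    # A lighter subset, e.g. skeletons + masked depth + RGB.
    subset = total_size_gb(["3D skeletons (body joints)", "Masked depth maps", "RGB videos"])
    print(f"Subset: {subset:.1f} GB")  # 224.8 GB
```

The sum (about 1331.8 GB) is consistent with the stated total of 1.3 TB, and most of it comes from the full depth maps.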

Masked depth maps are foreground-masked versions of the depth maps.
Masking is based on the locations of the detected body joints; it removes the background and less important parts of the depth maps and improves the compression rate.
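The exact masking procedure is not described here, so the following is only a hypothetical illustration of the general idea of joint-based masking: keep the depth pixels inside a padded bounding box around the body joints and zero out everything else. It assumes the joints are already given as (column, row) pixel positions in depth-image coordinates; `mask_depth_by_joints` and the `margin` padding are made-up names and parameters, not the dataset authors' method.

```python
from typing import List, Sequence, Tuple


def mask_depth_by_joints(
    depth: List[List[int]],
    joints_px: Sequence[Tuple[float, float]],
    margin: int = 20,
) -> List[List[int]]:
    """Zero out depth pixels outside a padded bounding box around the joints.

    Hypothetical illustration: `joints_px` are (column, row) positions of the body
    joints in the depth image, and `margin` is an assumed padding in pixels.
    """
    height, width = len(depth), len(depth[0])
    cols = [c for c, _ in joints_px]
    rows = [r for _, r in joints_px]
    # Padded bounding box, clipped to the image borders.
    c0 = max(0, int(min(cols)) - margin)
    c1 = min(width - 1, int(max(cols)) + margin)
    r0 = max(0, int(min(rows)) - margin)
    r1 = min(height - 1, int(max(rows)) + margin)
    # Keep pixels inside the box, set the rest (background) to 0.
    return [
        [value if (r0 <= r <= r1 and c0 <= c <= c1) else 0 for c, value in enumerate(row)]
        for r, row in enumerate(depth)
    ]


if __name__ == "__main__":
    depth = [[1000] * 8 for _ in range(6)]  # toy 8x6 map, not a real 512x424 frame
    masked = mask_depth_by_joints(depth, [(3, 2), (4, 3)], margin=1)
    print(sum(v > 0 for row in masked for v in row))  # 16 pixels kept (4 columns x 4 rows)
```

Zeroing the background leaves long runs of identical values, which is why this kind of masking tends to compress much better than the full depth maps.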

How to obtain the dataset:
Please click on the "Request Dataset" hyperlink at the bottom of the page. We will then send you the LoginID and password to download the dataset.

More info:
We provide more information about the data, answers to FAQs, sample code for reading the data, and the latest published results on our dataset here.

Usage for Academic Research

The dataset is released for academic research only and is free to researchers from educational or research institutes for non-commercial purposes.


Related Publications

All publications using the NTU RGB+D Action Recognition Dataset should include the following acknowledgement: “(Portions of) the research in this paper used the NTU RGB+D Action Recognition Dataset made available by the ROSE Lab at the Nanyang Technological University, Singapore.”

Furthermore, these publications should cite the following reference:
Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang, "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Requestors may also wish to cite the following related work:

Request Dataset