NTU RGB+D Action Recognition Dataset

The NTU RGB+D action recognition dataset consists of 56,880 action samples, each containing an RGB video, a depth map sequence, 3D skeletal data, and an infrared (IR) video. The dataset was captured by three Microsoft Kinect v2 cameras concurrently. RGB videos have a resolution of 1920×1080, depth maps and IR videos are both 512×424, and the 3D skeletal data contains the three-dimensional locations of 25 major body joints at each frame.
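As a rough illustration of the figures above, the following Python sketch records the stated resolutions and joint count in a small in-memory layout. The container name `SkeletonFrame`, the helper `pixels_per_frame`, and the (x, y, z) ordering of a joint are assumptions made for this example; they do not describe the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

NUM_JOINTS = 25  # major body joints per frame, as stated above

# (width, height) in pixels for each image-based modality, as stated above
RESOLUTIONS: Dict[str, Tuple[int, int]] = {
    "rgb": (1920, 1080),
    "depth": (512, 424),
    "ir": (512, 424),
}

# Hypothetical joint representation: an (x, y, z) triple of floats
Joint = Tuple[float, float, float]


@dataclass
class SkeletonFrame:
    """Illustrative container for the 3D joints of one body in one frame."""

    joints: List[Joint]

    def __post_init__(self) -> None:
        # Reject frames that do not carry exactly 25 joints
        if len(self.joints) != NUM_JOINTS:
            raise ValueError(f"expected {NUM_JOINTS} joints, got {len(self.joints)}")


def pixels_per_frame(modality: str) -> int:
    """Return the number of pixels in a single frame of the given modality."""
    width, height = RESOLUTIONS[modality]
    return width * height


# A placeholder frame with all joints at the origin
frame = SkeletonFrame(joints=[(0.0, 0.0, 0.0)] * NUM_JOINTS)
```

Under these assumptions, one RGB frame holds roughly ten times as many pixels as one depth or IR frame.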

Available Action Classes
Our dataset contains 60 different action classes in three broad categories: daily actions, mutual actions, and medical conditions.

Daily Actions (40)
drink water | reading | take off a hat/cap | pointing to something
eat meal | writing | cheer up | taking selfie
brushing teeth | tear up paper | hand waving | check time (from watch)
brushing hair | wear jacket | kicking something | rub two hands
drop | take off jacket | reach into pocket | nod head/bow
pickup | wear a shoe | hopping | shake head
throw | take off shoe | jump up | wipe face
sitting down | wear on glasses | phone call | salute
standing up | take off glasses | playing with phone | put the palms together
clapping | put on hat/cap | typing | cross hands in front

Medical Conditions (9)
sneeze/cough | headache | neck pain
staggering | chest pain | vomiting
falling | back pain | fan self

Mutual Actions (11)
punch/slap | pat on the back | giving something | walking towards
kicking | point finger | touch pocket | walking apart
pushing | hugging | handshaking
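The class lists above can be kept as a simple lookup table, which also makes it easy to confirm that the three categories add up to 60 classes. This is only a sketch: the dictionary name `ACTION_CLASSES` and the helper `category_of` are hypothetical, the names are copied from this page, and no official numeric class labels are implied.

```python
from typing import Dict, List

# Class names grouped by category, as listed on this page (row by row)
ACTION_CLASSES: Dict[str, List[str]] = {
    "daily actions": [
        "drink water", "reading", "take off a hat/cap", "pointing to something",
        "eat meal", "writing", "cheer up", "taking selfie",
        "brushing teeth", "tear up paper", "hand waving", "check time (from watch)",
        "brushing hair", "wear jacket", "kicking something", "rub two hands",
        "drop", "take off jacket", "reach into pocket", "nod head/bow",
        "pickup", "wear a shoe", "hopping", "shake head",
        "throw", "take off shoe", "jump up", "wipe face",
        "sitting down", "wear on glasses", "phone call", "salute",
        "standing up", "take off glasses", "playing with phone",
        "put the palms together",
        "clapping", "put on hat/cap", "typing", "cross hands in front",
    ],
    "medical conditions": [
        "sneeze/cough", "headache", "neck pain",
        "staggering", "chest pain", "vomiting",
        "falling", "back pain", "fan self",
    ],
    "mutual actions": [
        "punch/slap", "pat on the back", "giving something", "walking towards",
        "kicking", "point finger", "touch pocket", "walking apart",
        "pushing", "hugging", "handshaking",
    ],
}


def category_of(action: str) -> str:
    """Return the category that lists the given action name."""
    for category, actions in ACTION_CLASSES.items():
        if action in actions:
            return category
    raise KeyError(action)


# Number of classes per category and overall
counts = {category: len(actions) for category, actions in ACTION_CLASSES.items()}
total_classes = sum(counts.values())
```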

Size of the data:
To ease the downloading, we separate the modalities of the samples into different files.
The size of each modality is shown in the table below:
Data Modality | Size
3D skeletons (body joints) | 5.8 GB
Masked depth maps | 83 GB
Full depth maps | 886 GB
RGB videos | 136 GB
IR videos | 221 GB
Total: 1.3 TB
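Because the modalities are packaged separately, the storage needed for a partial download can be estimated by adding up the listed sizes. The Python sketch below does this arithmetic; the names `MODALITY_SIZE_GB` and `download_size_gb` are made up for illustration, and the conversion assumes 1 TB = 1000 GB.

```python
from typing import Dict, Iterable

# Sizes in GB, copied from the table above
MODALITY_SIZE_GB: Dict[str, float] = {
    "3D skeletons": 5.8,
    "Masked depth maps": 83.0,
    "Full depth maps": 886.0,
    "RGB videos": 136.0,
    "IR videos": 221.0,
}


def download_size_gb(modalities: Iterable[str]) -> float:
    """Sum the listed sizes (in GB) of the selected modalities."""
    return sum(MODALITY_SIZE_GB[name] for name in modalities)


# All modalities together: 1331.8 GB, which rounds to the 1.3 TB total above
total_gb = download_size_gb(MODALITY_SIZE_GB)
total_tb = round(total_gb / 1000, 1)

# A lighter selection that skips the full depth maps and IR videos
light_gb = download_size_gb(["3D skeletons", "Masked depth maps", "RGB videos"])
```

Skipping the full depth maps alone removes about two thirds of the total volume.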

Masked depth maps are the foreground-masked versions of the depth maps.
Masking is done based on the locations of the detected body joints, to remove the background and less important parts of the depth maps and to improve the compression rate.
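The exact masking rule is not given here, so the following Python sketch shows only one plausible reading: keep the depth pixels inside a padded bounding box around the detected joints and set everything else to zero. The function `mask_depth_map`, the `margin` parameter, and the (column, row) pixel convention for joints are all assumptions of this example, not the procedure used to produce the released files.

```python
from typing import List, Sequence, Tuple


def mask_depth_map(
    depth: Sequence[Sequence[int]],
    joints_px: Sequence[Tuple[int, int]],
    margin: int = 1,
) -> List[List[int]]:
    """Zero out depth pixels outside a padded bounding box around the joints.

    depth     -- 2D grid of depth values, indexed as depth[row][col]
    joints_px -- joint positions in pixel coordinates as (col, row) pairs
    margin    -- number of pixels to pad the bounding box on every side
    """
    height, width = len(depth), len(depth[0])
    cols = [c for c, _ in joints_px]
    rows = [r for _, r in joints_px]

    # Bounding box of the joints, padded by the margin and clipped to the image
    left = max(min(cols) - margin, 0)
    right = min(max(cols) + margin, width - 1)
    top = max(min(rows) - margin, 0)
    bottom = min(max(rows) + margin, height - 1)

    return [
        [
            depth[r][c] if top <= r <= bottom and left <= c <= right else 0
            for c in range(width)
        ]
        for r in range(height)
    ]


# Toy example: a 5x6 depth map with two joints near the centre
toy_depth = [[1000] * 6 for _ in range(5)]
masked = mask_depth_map(toy_depth, joints_px=[(2, 1), (3, 2)], margin=1)
kept_pixels = sum(1 for row in masked for value in row if value != 0)
```

Large uniform regions of zeros are what make such a masked map easier to compress than the full depth map.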

How to obtain the dataset:
Please click on the "Request Dataset" hyperlink at the bottom of the page. We will then send you the LoginID and password to download the dataset.

More info:
We provide more information about the data, answers to FAQs, sample code to read the data, and the latest published results on our dataset here.

Usage for Academic Research

The dataset is released for academic research only and is free to researchers from educational or research institutes for non-commercial purposes.

If interested, please click on the "Request Dataset" hyperlink at the bottom of the page. We will then send you the LoginID and password to download the dataset.

Related Publications

All publications using the NTU RGB+D Action Recognition Database should include the following acknowledgement: “(Portions of) the research in this paper used the NTU RGB+D Action Recognition Dataset made available by the ROSE Lab at the Nanyang Technological University, Singapore.”

Furthermore, these publications should cite the following reference:
Amir Shahroudy, Jun Liu, Tian-Tsong Ng, and Gang Wang, "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

Requestor may also wish to cite the following related work:

Request Dataset

Authorised to Download