This tutorial aims to provide comprehensive coverage of major techniques for image and video semantic analytics. I will first describe a list of applications with strong market demand and commercial value. Some of these applications can already be fully or at least partially realized by existing image and video analytics techniques, while others remain hot topics for future research and development. By taking a closer look at these diverse and seemingly unrelated applications, we can identify a few key technical components that are essential to all of them. Among these key components, I will focus on image features that have proven effective for object detection and semantic analysis, such as SIFT, HOG, LBP, Sparse Coding, and Fisher Vectors. I will then present several state-of-the-art object detection and image semantic classification techniques, including AdaBoost, the cascade method, Deformable Part Models (DPM), and Deep Convolutional Neural Networks (DCNN). My presentation will cover both classical works from the literature and innovative research studies conducted by my team. I will also present experimental evaluation results obtained by my team that reveal the characteristics of the above image features, object detection and image semantic classification methods, as well as their combinations. If time allows, I will present our latest research on the multi-object tracking task and show its performance evaluation results.
Yihong GONG received his B.S., M.S., and Ph.D. degrees in Electrical and Electronic Engineering from the University of Tokyo in 1987, 1989, and 1992, respectively. He then joined Nanyang Technological University, Singapore, where he worked as an assistant professor in the School of Electrical and Electronic Engineering for four years. From 1996 to 1998, he worked at the Robotics Institute, Carnegie Mellon University, as a project scientist. He was a principal investigator for both the Informedia Digital Video Library project and the Experience-On-Demand project, funded with multi-million-dollar grants by NSF, DARPA, NASA, and other government agencies. In 1999, he joined NEC Laboratories America and established the lab's multimedia analytics group. In 2006, he became site manager, leading the entire Cupertino branch of the lab. In 2012, he joined Xi'an Jiaotong University in China as a professor under the "Thousand Talents Program". His research interests include image/video content analysis and machine learning applications. He was among the first researchers worldwide to initiate studies on content-based image retrieval, sports video highlight detection, and text/video content summarization. He has published more than 100 technical papers, authored two monographs, and contributed chapters to two multimedia handbooks. To date, his works have been cited over 12,000 times, and his most cited paper has received more than 1,200 citations from researchers around the world. He led the team that developed the SmartCatch intelligent video surveillance system, which resulted in a successful spin-off; the spin-off company ranked among the top three in market share in the intelligent video surveillance sector. He also led the team that developed the world's first commercial human gender/age recognition software, which was widely reported by major Japanese and US TV stations.
Under his leadership, his teams won first place in the 2008 and 2009 TRECVID Event Detection contests, the 2009 PASCAL VOC Challenge, and the 2014 Chinese Graduate Student Video Content Analytics Contest.