Program

| Time      | Mon, July 4 | Tue, July 5 | Wed, July 6 | Thu, July 7 | Fri, July 8 |
|-----------|-------------|-------------|-------------|-------------|-------------|
| 0900-1030 | Chiou-Ting HSU: Visual Analysis | Antonio ORTEGA: Graph Signal Processing, Part 1 | Xudong JIANG: Dimensionality Reduction | Junsong YUAN: Data Mining, Part 1 | Simon SEE: Deep Learning |
| 1030-1100 | Refreshment Break | Refreshment Break | Refreshment Break | Refreshment Break | Refreshment Break |
| 1100-1230 | Nikolaos BOULGOURIS: Gait Recognition 1 | Antonio ORTEGA: Graph Signal Processing, Part 2 | Xudong JIANG: Sparse Coding | Junsong YUAN: Data Mining, Part 2 | Ettikan KARUPPIAH: DIGITS Deep Learning |
| 1230-1400 | Lunch | Lunch | Lunch | Lunch | Lunch |
| 1400-1530 | C.-C. Jay KUO: CNN, Pedestrian Detection | Nikolaos BOULGOURIS: Gait Recognition 2 | Irene Yu-Hua GU: Visual Analysis | Wenjun ZENG: Video Understanding | Tiejun HUANG: Visual Information |
| 1530-1600 | Refreshment Break | Refreshment Break | Refreshment Break | Refreshment Break | Refreshment Break |
| 1600-1730 | C.-C. Jay KUO: CNN, Road Scene Understanding | Chiou-Ting HSU: Visual Recognition | Irene Yu-Hua GU: Machine Learning | Tour: S.E.A. Aquarium | Yonghong TIAN: Multi-Task Learning |

Speakers

 

Nikolaos BOULGOURIS
Senior Lecturer at Brunel University


Gait Recognition: Fundamental techniques and architectures (Part 1)

Gait Recognition: Advanced systems (Part 2)

Abstract:
Gait analysis and recognition are relevant to a number of applications in several areas, including security, healthcare, and biomechanics. In this tutorial, we discuss application areas of gait analysis and recognition, and we go on to focus on the use of gait as a biometric trait.
Part I: Fundamental techniques and architectures
In Part I of the tutorial we first review applications of gait processing. We discuss application cases in healthcare and biomechanics, as well as security and defence scenarios that would benefit from gait recognition technology. Subsequently, we explore gait as a biometric trait that can be used for the identification of individuals. Although numerous generic techniques have been applied to vision problems, gait has its own characteristics, which necessitate the development of systems and processes tailored to the particular requirements of gait recognition. To this end, we examine fundamental problems, such as the exploitation of gait periodicity in background subtraction and the compensation of speed variations, and we study several instances of features or templates, as well as suitable classification strategies, that have been experimentally shown to yield good recognition performance.
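As a concrete illustration of the template-based features mentioned above, the sketch below computes a Gait Energy Image (GEI), a classic gait template formed by averaging aligned silhouettes over one gait cycle. GEI is a standard example from the literature, not necessarily the specific template treated in the tutorial, and the random frames here stand in for real background-subtracted silhouettes.

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a stack of aligned binary silhouettes from one gait cycle.

    silhouettes: array of shape (T, H, W) with values in {0, 1}.
    Returns a float image in [0, 1]; in a real GEI, bright pixels are
    body parts that stay static over the cycle (torso) and grey pixels
    the swinging limbs.
    """
    sils = np.asarray(silhouettes, dtype=np.float64)
    return sils.mean(axis=0)

# Toy example: 10 random "silhouettes" standing in for one gait cycle.
rng = np.random.default_rng(0)
frames = (rng.random((10, 64, 44)) > 0.5).astype(np.uint8)
gei = gait_energy_image(frames)
print(gei.shape, gei.min(), gei.max())

# Recognition then reduces to comparing GEI templates, e.g. by
# Euclidean distance after dimensionality reduction.
```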
Part II: Advanced systems
Whether or not vision-based gait recognition is inherently less accurate than other visual recognition technologies, there is no doubt that imperfect video capture conditions, posture variations of the walking individual (e.g., due to fatigue), and walking while carrying objects or wearing heavy clothes all have a negative impact on system performance. In Part II of this tutorial we discuss advanced solutions that have been proposed for dealing with the problems expected in real-life situations. The advanced systems that we explore are based on 3D/depth processing, model-based techniques, region partitioning techniques, hidden Markov modelling, deep neural networks, and other techniques that have proven effective in gait recognition. These techniques achieve significantly better results than basic techniques and therefore provide directions for future research and applications.

Biography:
Nikolaos BOULGOURIS is an academic member of staff in the Department of Electronic and Computer Engineering at Brunel University London, which he joined in September 2010 and where he currently serves as Senior Lecturer. Between December 2004 and August 2010 he was an academic member of staff at King's College London, U.K. Prior to that, he was a Post-Doctoral Fellow with the Department of Electrical and Computer Engineering at the University of Toronto, Canada. Dr Boulgouris has published more than 90 papers in international journals and conferences, and his work has been widely cited by other researchers in the field. His research interests are in the areas of biometrics, signal/image processing, biomedical signal processing, and machine learning. He co-edited the book Biometrics: Theory, Methods, and Applications, published by Wiley-IEEE Press. He was Principal Investigator for the UK group in the EC-funded project ACTIBIO, where he led the development of gait recognition systems. He is an elected member of the IEEE Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP-TC). He is a Senior Area Editor for the IEEE Transactions on Image Processing and an Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology, and previously served as Associate Editor for the IEEE Transactions on Image Processing and the IEEE Signal Processing Letters.

 

Dr. Boulgouris is a Senior Member of the IEEE.

 

 

Irene Yu-Hua GU
Professor at Chalmers University of Technology

Video Analysis & Action Recognition: Stochastic Modeling and Analysis for Visual Image Analysis (Part 1)

Machine Learning / Visual Recognition: Machine Learning Methods for Automatic Traffic Sign Detection and Recognition (Part 2)

 

Abstract:
Video Analytics: Stochastic Modeling and Sequential Bayesian Estimation: from the vector space to smooth manifolds
There has been growing research interest in image/video analytics for computer vision, machine learning, and artificial intelligence. This tutorial addresses video analytics based on stochastic modeling and sequential Bayesian estimation, and is subdivided into two parts. In the first part, we review and explain several methods: we start with models and estimation methods in a vector space, and then move on to their manifold versions. A brief introduction to Riemannian/Grassmann manifolds, their mapping functions, and their metrics is also included. In the second part, we present several application examples and results, including human fall detection, human activity classification, and manifold object tracking from videos.
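To make the sequential Bayesian estimation concrete, here is a minimal bootstrap particle filter in the vector-space setting; manifold variants would instead propagate particles along geodesics. The scalar random-walk model and the noise levels are illustrative assumptions, not material from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter(observations, n_particles=500, q=0.1, r=0.5):
    """Bootstrap particle filter for x_t = x_{t-1} + N(0, q^2),
    y_t = x_t + N(0, r^2). Returns posterior-mean estimates."""
    particles = rng.normal(0.0, 1.0, n_particles)
    estimates = []
    for y in observations:
        # Predict: propagate particles through the motion model.
        particles = particles + rng.normal(0.0, q, n_particles)
        # Update: weight particles by the observation likelihood.
        w = np.exp(-0.5 * ((y - particles) / r) ** 2)
        w /= w.sum()
        estimates.append(np.sum(w * particles))
        # Resample to avoid weight degeneracy.
        idx = rng.choice(n_particles, n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)

true_x = np.cumsum(rng.normal(0, 0.1, 50))   # hidden trajectory
obs = true_x + rng.normal(0, 0.5, 50)        # noisy measurements
est = particle_filter(obs)
print("mean abs error:", np.mean(np.abs(est - true_x)))
```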

Image Analytics: Saliency Object Detection Methods with Application to Automatic Classification of Traffic Signs
There has been growing research interest in video/image analytics for computer vision and machine learning. Detection and segmentation of salient objects have a wide variety of applications, for example in pattern recognition, content-based image retrieval and editing, video summarization, and object enhancement. This part of the tutorial addresses some recently developed methods for salient visual object detection. In the first part, we describe several salient object detection methods, including methods based on geodesic propagation, normalized graph cuts, adaptive graph weight reconstruction, and full learning of continuous CRFs. In the second part, we present an application of saliency-enhanced machine learning to automatic traffic sign recognition.
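The toy sketch below illustrates the simplest form of the background prior that underlies geodesic and graph-based saliency methods: pixels whose colour differs strongly from the image boundary are marked salient. The single mean-colour background model is a deliberate simplification of the much richer propagation schemes covered in the talk.

```python
import numpy as np

def boundary_prior_saliency(img):
    """Toy saliency map: distance of each pixel's colour to the mean
    colour of the image border (the 'background prior' used, in much
    richer form, by geodesic and graph-based saliency methods).

    img: float array (H, W, 3) in [0, 1]."""
    border = np.concatenate([img[0], img[-1], img[:, 0], img[:, -1]])
    bg_mean = border.mean(axis=0)
    sal = np.linalg.norm(img - bg_mean, axis=2)
    return sal / (sal.max() + 1e-12)

rng = np.random.default_rng(2)
image = rng.random((48, 48, 3)) * 0.2          # dark background
image[16:32, 16:32] = [0.9, 0.1, 0.1]          # bright red "object"
saliency = boundary_prior_saliency(image)
print("object mean:", saliency[16:32, 16:32].mean(),
      "background mean:", saliency[:8, :8].mean())
```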

 

Biography:
Irene Yu-Hua GU received the Ph.D. degree in electrical engineering from the Eindhoven University of Technology, Eindhoven, The Netherlands, in 1992. From 1992 to 1996, she was a Research Fellow at Philips Research Institute IPO, Eindhoven, The Netherlands, a postdoctoral researcher at Staffordshire University, Staffordshire, U.K., and a Lecturer at the University of Birmingham, Birmingham, U.K. Since 1996, she has been with the Department of Signals and Systems, Chalmers University of Technology, Gothenburg, Sweden, where she has been a full professor since 2004. Her research interests include statistical image and video modeling and analysis, object detection and tracking, pattern classification, and machine learning. Applications include video surveillance, assisted living, road traffic analysis and traffic sign classification, MRI medical image analysis, and diagnostic systems for electric power system disturbances. She has published about 200 papers in international conferences and journals, and 4 book chapters. She is a co-author of the book “Signal Processing of Power Quality Disturbances” (John Wiley & Sons / IEEE Press).

Dr. Gu was an Associate Editor for the IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, and Part B: Cybernetics, from 2000 to 2005, and was the Chair-elect of the IEEE Swedish Signal Processing Chapter from 2002 to 2004. She has been an Associate Editor of the EURASIP Journal on Advances in Signal Processing since 2005, and on the editorial board of the Journal of Ambient Intelligence and Smart Environments since 2011.

 

Chiou-Ting HSU
Professor at National Tsing Hua University



Semantic Image Segmentation

 

Abstract:
Automatic image or scene understanding is essential to many computer vision applications. The goal of semantic image segmentation, or scene parsing, is to assign dense semantic labels to each pixel in an image. Different mechanisms, including parametric and nonparametric methods, have been proposed for semantic segmentation. Parametric methods adopt fully supervised algorithms to infer pixel-level labels: given a densely annotated image dataset, a multi-category classifier is trained offline and then used to label each pixel. Although existing parametric methods have achieved remarkable performance on datasets with a moderate number of category labels, they are not easy to adapt to ever-increasing real-world data. Nonparametric methods, on the other hand, do not pre-define the semantic labels and instead rely on transferring labels from similar images. Under this training-free scenario, nonparametric methods are more efficient and adaptive to real-world data, but their performance is inherently inferior to that of parametric methods. This tutorial will give an overview of existing parametric and nonparametric methods, discuss their respective challenges, and present possible directions toward jointly exploiting both for labeling real-world data.
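A minimal sketch of the parametric pipeline described above: a multi-category classifier is trained offline on per-pixel features from an annotated image and then labels every pixel of a new image. Raw RGB features and a random forest are illustrative stand-ins for the far stronger features and models used in practice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

# Toy "densely annotated" training image: class 0 = dark, class 1 = bright.
train_img = rng.random((32, 32, 3))
train_lbl = (train_img.mean(axis=2) > 0.5).astype(int)

# Parametric semantic segmentation: one feature vector per pixel.
X = train_img.reshape(-1, 3)
y = train_lbl.reshape(-1)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Label every pixel of a test image with the trained classifier.
test_img = rng.random((32, 32, 3))
pred = clf.predict(test_img.reshape(-1, 3)).reshape(32, 32)
print("predicted label map:", pred.shape, "classes:", np.unique(pred))
```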

 

Biography:
Chiou-Ting HSU is a professor in the Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan. She received the Ph.D. degree from National Taiwan University in 1997. From 1998 to 1999, she was with Philips Innovation Center, Taipei, Philips Research, as a senior research engineer. Her research interests include image processing, image analysis, computer vision, digital image forensics, and machine learning. She received the Citation Classic Award from Thomson ISI in 2001 for her paper “Hidden digital watermarks in images.” She was an elected member of the IEEE Information Forensics and Security Technical Committee (2013-2015). She has served on the editorial boards of Advances in Multimedia (2006-2012) and the IEEE Transactions on Information Forensics and Security (2012-2015), and is currently an Associate Editor of the Journal of Visual Communication and Image Representation and the EURASIP Journal on Image and Video Processing.

 

Tiejun HUANG
Professor at Peking University



Rethinking Coding and Analysis of Visual Information

 

Abstract:

 

Biography:
Tiejun HUANG is a professor and the director of the Institute for Digital Media Technology at the School of Electronic Engineering and Computer Science, Peking University. He has also been the vice director of the National Engineering Laboratory for Video Technology of China since 2009. He was named a New Century Excellent Talent by the Ministry of Education of China in 2011.

Prof. Huang is a member of the Advisory Board of IEEE Computing Now, the Board of Directors of the Digital Media Project, and the Editorial Board of Springer's 3D Research journal. In China, he is a council member of the Chinese Institute of Electronics, the head of the China delegation to MPEG, and a senior member of the China Computer Federation. He is also a member of the China National Standardization Theory and Methodology Standardization Technical Committee and of the multimedia subgroup of the China National Information Technology Standardization Technical Committee.

Professor Tiejun Huang’s research areas are video coding, image understanding, digital rights management (DRM), and digital libraries. In the last ten years, as principal investigator, he has led ten research projects funded by the Ministry of Science and Technology, the Natural Science Foundation, and the Ministry of Education of China, and has been involved in three cooperative projects between China and the USA, Europe, and Korea. He has published more than one hundred peer-reviewed papers and three books as author or co-author, and holds twenty-three patents in the multimedia field.

 

Xudong JIANG
Professor at Nanyang Technological University


Dimensionality Reduction: extract discriminative information or remove misleading information?

Sparse Coding for Classification: its merits, problems and recent developments

 

Abstract:
Dimensionality Reduction: extract discriminative information or remove misleading information?
Images contain rich information, and fully utilizing this information undoubtedly increases the chances of solving difficult real-world problems. It also, however, makes it difficult to design a robust automated recognition system, owing to the complex characteristics of images and the large variations among images taken under different conditions. Machine learning from a training database is one solution for extracting effective features from high-dimensional images for recognition. It is thus no surprise that learning-based dimensionality reduction approaches keep emerging in research journals, many of them prestigious. Yet many researchers and engineers find it difficult, and even confusing, to choose a proper approach from these numerous diverse techniques, owing to a lack of thorough understanding of the roles of feature extraction and dimensionality reduction in statistical inference and recognition. The different roles and effects of various dimensionality reduction techniques in facilitating better detection and recognition have not been well studied, and many fundamental yet critical issues remain open or not thoroughly analyzed. This talk analyzes the fundamental problems of feature extraction and dimensionality reduction for automated image recognition and, based on this analysis, clarifies doubts, confusions, and misunderstandings about the roles of learning-based dimensionality reduction. It aims to help the audience gain an in-depth understanding and a clear picture of machine learning-based feature extraction. A novel concept will be presented in this talk: “removing misleading information” as a replacement for the conventional “extracting discriminative information” in machine learning-based image recognition.
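As a concrete reference point for this discussion, the sketch below contrasts two standard learning-based dimensionality reductions on the same data: unsupervised PCA, which keeps high-variance directions, and supervised LDA, which keeps class-discriminative ones. The synthetic data are constructed so that the high-variance direction is exactly the misleading one; this is a generic illustration, not the method proposed in the talk.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(4)

# Two classes separated along a LOW-variance axis: axis 0 has large
# variance but is useless for recognition ("misleading information"),
# axis 1 has small variance but carries the class difference.
n = 200
X0 = rng.normal([0.0, -1.0], [5.0, 0.3], (n, 2))
X1 = rng.normal([0.0, +1.0], [5.0, 0.3], (n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

z_pca = PCA(n_components=1).fit_transform(X)          # keeps variance
z_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

def separability(z):
    return abs(z[y == 0].mean() - z[y == 1].mean()) / z.std()

print("PCA class separation:", separability(z_pca))   # near 0
print("LDA class separation:", separability(z_lda))   # large
```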

Sparse Coding for Classification: its merits, problems and recent developments
High data dimensionality and the lack of human knowledge about effective features for classifying the data are two challenging problems in computer vision and pattern recognition. The sparse representation-based classifier (SRC) differentiates itself from other classifiers in three respects: it uses the training samples of all classes collaboratively to represent the query image; its sparse representation code coincides with the general classification target; and its L1-norm minimization of the representation error enables it to recognize query images heavily corrupted by outlier pixels and occlusions. These three merits of SRC have led to encouraging and impressive image recognition results, which have attracted great interest in further research on SRC, and many extensions of SRC have been proposed in recent years and published in prestigious journals. In this talk, we first help the audience gain a deep understanding of the underlying principles of SRC, i.e., how and why sparse representation can be used to solve classification problems, and the key advantages of this approach. This deep understanding of SRC is necessary for analyzing and identifying its problems and limitations. These analyses and findings pave the way for investigating how recent developments solve these problems and overcome these limitations, bringing sparse representation-based image classification to a significantly higher level.
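A minimal sketch of the SRC decision rule summarized above, using scikit-learn's Lasso as the l1 solver: the query is sparsely coded over the pooled training samples of all classes and assigned to the class whose samples yield the smallest reconstruction residual. The dictionary, noise level, and regularization weight are arbitrary toy choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)

d, per_class, classes = 30, 8, 3
# Dictionary: columns are training samples, grouped by class.
centers = rng.normal(0, 1, (classes, d))
D = np.hstack([centers[k][:, None] + 0.1 * rng.normal(0, 1, (d, per_class))
               for k in range(classes)])
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
labels = np.repeat(np.arange(classes), per_class)

# Query from class 1, lightly corrupted.
y = centers[1] + 0.1 * rng.normal(0, 1, d)

# Sparse coding: min ||c||_1 + lambda * ||y - D c||^2 (Lasso form).
code = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000).fit(D, y).coef_

# Classify by class-wise reconstruction residual.
residuals = [np.linalg.norm(y - D[:, labels == k] @ code[labels == k])
             for k in range(classes)]
print("residuals:", np.round(residuals, 3), "-> class", int(np.argmin(residuals)))
```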

 

Biography:
Xudong Jiang received the B.Eng. and M.Eng. degrees from the University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 1983 and 1986, respectively, and the Ph.D. degree from Helmut Schmidt University, Hamburg, Germany, in 1997, all in electrical engineering. From 1986 to 1993, he was a Lecturer with UESTC, where he received two Science and Technology Awards from the Ministry for Electronic Industry of China. From 1993 to 1997, he was a Scientific Assistant with Helmut Schmidt University. From 1998 to 2004, he was with the Institute for Infocomm Research, A*STAR, Singapore, as a Lead Scientist and the Head of the Biometrics Laboratory, where he developed a system that was the most efficient and the second most accurate at the International Fingerprint Verification Competition in 2000. He joined Nanyang Technological University (NTU), Singapore, as a faculty member in 2004, and served as the Director of the Centre for Information Security from 2005 to 2011. He is currently a Tenured Associate Professor with the School of EEE, NTU. He holds seven patents and has authored over 100 papers, with 25 papers in IEEE journals, including 8 in the IEEE Transactions on Image Processing, 5 in the IEEE Transactions on Pattern Analysis and Machine Intelligence, and 3 in the IEEE Transactions on Signal Processing. His research interests include signal/image processing, pattern recognition, computer vision, machine learning, and biometrics. He is an elected voting member of the IFS Technical Committee of the IEEE Signal Processing Society, and serves as an Associate Editor of IEEE Signal Processing Letters and IET Biometrics.

 

Ettikan KARUPPIAH
Director, Developers Ecosystem at NVIDIA



Hands-on Workshop: Introduction to the DIGITS Deep Learning Training System

Abstract:
This workshop demonstrates introductory concepts of deep neural networks for image classification workflows using DIGITS. The NVIDIA Deep Learning GPU Training System (DIGITS) lets you quickly design the best deep neural network (DNN) for image classification and object detection tasks using real-time network behavior visualization. You will go through two labs: first, using DIGITS for an image classification problem, and second, using DIGITS for object detection.
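DIGITS itself is driven through a web browser, so there is no lab script to reproduce here; as a rough stand-in for the workflow it automates, the sketch below trains a small neural network for image classification with scikit-learn. The dataset and network are arbitrary choices, not the workshop materials.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Small handwritten-digit images (8x8), flattened to feature vectors.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X / 16.0, y, test_size=0.25, random_state=0)

# A small fully-connected network; DIGITS would instead configure and
# train a convolutional net on the GPU and plot loss/accuracy curves
# in real time.
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
net.fit(X_train, y_train)
print("test accuracy:", net.score(X_test, y_test))
```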

Biography:
Dr. Ettikan Kandasamy KARUPPIAH serves as Director, Developers Ecosystem at NVIDIA APAC South, assisting innovators and techno-entrepreneurs in accelerating GPU adoption for their R&D and software solution needs. He has direct experience in both HPC and BDA, with library/software design and development covering end-to-end needs, and has published numerous papers, patents, and software libraries from his past work.

 

C.-C. Jay KUO
Professor at University of Southern California



Boosted Convolutional Neural Networks (BCNN) for Pedestrian Detection (Part 1)

A Convolutional Neural Network Approach to Road Scene Understanding (Part 2)

 

Abstract:
Boosted Convolutional Neural Networks (BCNN) for Pedestrian Detection
With the emerging popularity of advanced driver assistance systems (ADAS) and autonomous cars, pedestrian detection has received a lot of attention in the computer vision field. In this talk, I present a boosted convolutional neural network (BCNN) system that enhances pedestrian detection performance. Inspired by the classic boosting idea, we develop a weighted loss function that emphasizes challenging training samples. Three types of challenging samples are examined: 1) samples with low detection scores, 2) temporally associated samples with inconsistent detection scores, and 3) samples of different sizes. A weighting scheme is designed for each type of challenging sample. Finally, we train a boosted fusion scheme to integrate the different trained detectors. We use Fast-RCNN as the baseline and show a significant performance gain of the BCNN over this baseline on the Caltech pedestrian dataset.
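A minimal sketch of the boosting-style reweighting idea: per-sample weights grow for challenging samples (here, simply low-confidence detections) and scale a cross-entropy loss. The weighting function is an illustrative guess; the talk's scheme distinguishes three types of challenging samples and operates inside a Fast-RCNN pipeline.

```python
import numpy as np

def weighted_cross_entropy(scores, labels, weights):
    """Binary cross-entropy where each sample's loss is scaled by a
    boosting-style weight emphasising challenging samples."""
    p = 1.0 / (1.0 + np.exp(-scores))            # sigmoid
    ce = -(labels * np.log(p + 1e-12) + (1 - labels) * np.log(1 - p + 1e-12))
    return np.sum(weights * ce) / np.sum(weights)

rng = np.random.default_rng(6)
scores = rng.normal(0, 2, 100)                   # detector outputs
labels = (rng.random(100) > 0.5).astype(float)

# Challenging-sample heuristic: low-confidence detections get big weights.
confidence = np.abs(scores)
weights = np.exp(-confidence)                    # hard -> ~1, easy -> ~0

print("plain loss   :", weighted_cross_entropy(scores, labels, np.ones(100)))
print("boosted loss :", weighted_cross_entropy(scores, labels, weights))
```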

A Convolutional Neural Network Approach to Road Scene Understanding
The problem of street scene understanding is critical to advanced driver assistance systems (ADAS) and autonomous cars. Several datasets, such as CamVid, KITTI, and Cityscapes, are available for the research community to study this problem, and several algorithms based on the convolutional neural network (CNN) technique have been developed. One solution, SegNet, learns to predict pixel-wise class labels through supervised learning. In this talk, I will first review state-of-the-art street scene understanding algorithms and point out their strengths and weaknesses. Then, I will present a solution developed by my lab at the University of Southern California (USC).

 

Biography:
C.-C. Jay KUO received his Ph.D. degree from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as Director of the Media Communications Laboratory and Dean’s Professor in Electrical Engineering-Systems. His research interests are in the areas of digital media processing, compression, communication, and networking technologies. Dr. Kuo was the Editor-in-Chief of the IEEE Transactions on Information Forensics and Security in 2012-2014 and of the Journal of Visual Communication and Image Representation in 1997-2011, and has served as Editor for 10 other international journals. He received the National Science Foundation Young Investigator (NYI) Award and Presidential Faculty Fellow (PFF) Award in 1992 and 1993, respectively. He was an IEEE Signal Processing Society Distinguished Lecturer in 2006, the recipient of the Electronic Imaging Scientist of the Year Award in 2010, and the holder of the 2010-2011 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies. Dr. Kuo is a Fellow of AAAS, IEEE, and SPIE. He has guided 134 students to their Ph.D. degrees and supervised 25 postdoctoral research fellows, and is a co-author of about 240 journal papers, 880 conference papers, 30 patents, and 13 books.

 

Antonio ORTEGA
Professor at University of Southern California


An introduction to Graph Signal Processing (GSP): Basic concepts, sampling and transform design

Applications of GSP in machine learning, computer vision and image and video coding

 

Abstract:
An Introduction to Graph Signal Processing (GSP): Basic concepts, Sampling and Transform Design
Graph Signal Processing (GSP) is an emerging research area that seeks to extend core digital signal processing concepts (sampling, transforms, etc.) to signals defined on graphs. This research builds on results in multiple areas, including spectral graph theory, and provides a unifying framework for graph-based processing methods developed over the years. In this talk, we introduce basic GSP concepts, provide a historical perspective on work in various fields that led to GSP, and discuss recent results in areas such as sampling and transform design.
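To ground the basic concepts, the sketch below builds the combinatorial Laplacian of a small path graph, uses its eigenvectors as the graph Fourier basis, and low-pass filters a noisy graph signal. The graph and the cutoff are arbitrary toy choices.

```python
import numpy as np

# Path graph on n nodes: adjacency, degree, and combinatorial Laplacian.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis: Laplacian eigenvectors, ordered by graph
# frequency (eigenvalue); eigh returns ascending eigenvalues.
eigvals, U = np.linalg.eigh(L)

rng = np.random.default_rng(7)
smooth = np.sin(np.linspace(0, np.pi, n))        # slowly varying signal
x = smooth + 0.3 * rng.normal(0, 1, n)           # plus noise

x_hat = U.T @ x                                  # forward GFT
x_hat[5:] = 0.0                                  # keep 5 lowest frequencies
x_denoised = U @ x_hat                           # inverse GFT

print("noise error   :", np.linalg.norm(x - smooth))
print("filtered error:", np.linalg.norm(x_denoised - smooth))
```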

Applications of GSP in Machine Learning, Computer Vision and Image and Video Coding
This second talk focuses on applications of GSP that have been proposed recently. In the context of machine learning, we describe how semi-supervised learning can be formulated as a graph signal sampling problem, and provide some insights into how label complexity may relate to the bandwidth of the label signal. We also describe preliminary work that shows promising results for human activity analysis using graphs, and we end with an overview of recent applications of GSP to image and video compression.
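A small sketch of the "semi-supervised learning as graph signal sampling" view mentioned above: assuming the label signal is bandlimited to the first k Laplacian eigenvectors, labels observed on a few sampled nodes are extended to the whole graph by least squares. The two-clique graph, the bandwidth, and the sample set are illustrative assumptions.

```python
import numpy as np

# Two 10-node cliques joined by a single bridge edge.
n = 20
A = np.zeros((n, n))
A[:10, :10] = 1.0
A[10:, 10:] = 1.0
np.fill_diagonal(A, 0.0)
A[9, 10] = A[10, 9] = 1.0
L = np.diag(A.sum(axis=1)) - A
_, U = np.linalg.eigh(L)

labels = np.array([0.0] * 10 + [1.0] * 10)       # true label signal

# View SSL as sampling: observe labels on a few nodes, assume the label
# signal is bandlimited to the first k Laplacian eigenvectors, and
# reconstruct it everywhere by least squares.
k = 2
sampled = np.array([0, 5, 12, 19])               # the "labelled set"
coeffs, *_ = np.linalg.lstsq(U[sampled, :k], labels[sampled], rcond=None)
recovered = (U[:, :k] @ coeffs > 0.5).astype(float)
print("label error rate:", np.mean(recovered != labels))
```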

 

Biography:
Antonio ORTEGA received the Telecommunications Engineering degree from the Universidad Politecnica de Madrid, Madrid, Spain in 1989 and the Ph.D. in Electrical Engineering from Columbia University, New York, NY in 1994. At Columbia he was supported by a Fulbright scholarship.

In 1994 he joined the Electrical Engineering department at the University of Southern California (USC), where he is currently a Professor. He has served as Associate Chair of EE-Systems and director of the Signal and Image Processing Institute at USC. He is a Fellow of the IEEE, and a member of ACM and APSIPA. He has been Chair of the Image and Multidimensional Signal Processing (IMDSP) technical committee, a member of the Board of Governors of the IEEE Signal Processing Society (SPS), and chair of the SPS Big Data Special Interest Group. He has been technical program co-chair of MMSP 1998, ICME 2002, ICIP 2008 and PCS 2013. He has been Associate Editor for the IEEE Transactions on Image Processing (IEEE TIP) and the IEEE Signal Processing Magazine, among others. He is the inaugural Editor-in-Chief of the APSIPA Transactions on Signal and Information Processing, an Associate Editor of IEEE T-SIPN and Senior Area Editor of IEEE TIP. He received the NSF CAREER award, the 1997 IEEE Communications Society Leonard G. Abraham Prize Paper Award, the IEEE Signal Processing Society 1999 Magazine Award, the 2006 EURASIP Journal of Advances in Signal Processing Best Paper Award, the ICIP 2011 best paper award, and a best paper award at Globecom 2012. He was a plenary speaker at ICIP 2013 and APSIPA ASC 2015.

 

His research interests are in the areas of signal compression, representation, communication, and analysis. His recent work focuses on distributed compression, multiview coding, error-tolerant compression, information representation in wireless sensor networks, and graph signal processing. Almost 40 PhD students have completed their PhD theses under his supervision at USC, and his work has led to over 300 publications in international conferences and journals, as well as several patents.

His work at USC has been or is being funded by agencies such as NSF, NASA, and DOE, and by companies such as HP, Samsung, Chevron, and Texas Instruments.

 

Simon SEE
Chief Solution Architect/Technologist, NVIDIA

Talk on Deep Learning: Past, Present and Future

 

Yonghong TIAN
Professor at Peking University



Multi-Task Learning for Precise Object Search from Massive Surveillance Videos

 

Abstract:
Precise object search from large-scale camera networks is an important yet very challenging task in the field of computer vision. Different from the traditional visual object search task, which aims to find visually similar objects in a collection, precise object search is to find exactly the different occurrences of a given object in large-scale surveillance networks by elaborately distinguishing visually similar but non-identical objects. Technologically, person re-identification (Re-ID) can be viewed as a special solution to precise object search on a small-scale dataset. However, most existing person re-identification approaches follow a supervised learning framework in which a large number of labelled matching pairs are required for training, which severely limits their scalability in real-world applications. To address this limitation, we formulate precise object search as a multi-task learning problem that solves both recognition (i.e., object re-identification) and search (i.e., fine-grained object retrieval) simultaneously in one framework. In this talk, I will first discuss several technological challenges of precise object search, and then present several recent developments in multi-task learning that tackle these challenges. Moreover, I will introduce a large-scale vehicle image database captured by different real-world cameras in a city, which can be used to evaluate algorithmic performance for precise object search.
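A minimal sketch of the multi-task formulation described above: one shared embedding feeds both a recognition objective (identity classification) and a search objective (triplet-style retrieval ranking), combined into a single loss. The numpy version below is architecture-free, and the loss weight and margin are arbitrary; it is not the speaker's actual model.

```python
import numpy as np

def softmax_ce(logits, label):
    """Recognition loss: cross-entropy over identity classes."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label] + 1e-12)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Search loss: a same-ID pair must be closer than a different-ID pair."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(9)
emb = rng.normal(0, 1, (3, 128))                 # anchor, positive, negative
W = rng.normal(0, 0.1, (128, 100))               # classifier over 100 identities

# Multi-task objective: recognition + fine-grained retrieval in one loss.
loss = softmax_ce(emb[0] @ W, label=7) + 0.5 * triplet_loss(*emb)
print("joint loss:", loss)
```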

 

Biography:
Yonghong Tian is currently a full professor with the National Engineering Laboratory for Video Technology, School of Electronics Engineering and Computer Science, Peking University, Beijing, China. He received the Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences, China, in 2005. His research interests include machine learning, computer vision, and multimedia big data. He is the author or coauthor of over 120 technical articles in refereed journals and conferences, and holds more than 25 patents. Dr. Tian is currently an Associate Editor of the IEEE Transactions on Multimedia and the International Journal of Multimedia Data Engineering and Management (IJMDEM), and a Young Associate Editor of Frontiers of Computer Science. He has served as Technical Program Co-chair of IEEE ICME 2015, IEEE BigMM 2015, and IEEE ISM 2015, and on the organizing committees of ACM Multimedia 2009, IEEE MMSP 2011, IEEE ISCAS 2013, and IEEE ISM 2016. He has received several national and ministerial prizes in China, and received the 2015 Best Paper Award of the EURASIP Journal on Image and Video Processing. His team was ranked among the best performers in the TRECVID CCD/SED tasks from 2009 to 2012, PETS 2012, and the WikipediaMM task in ImageCLEF 2008. He is a senior member of IEEE and CIE, and a member of ACM and CCF.

 

Junsong YUAN
Professor at Nanyang Technological University



Mining Image and Video Data

 

Abstract:
Motivated by previous successes in mining structured data (e.g., transaction data) and semi-structured data (e.g., text), there is growing curiosity about mining meaningful patterns in unstructured multimedia data such as images and videos. Although the discovery of visual patterns from images and videos appears quite exciting, data mining techniques that are successful on business and text data cannot simply be applied to image and video data, which contain high-dimensional features and have spatial or spatio-temporal structures. Unlike transaction and text data, which are composed of discrete elements with little ambiguity (i.e., predefined items and vocabularies), visual patterns generally exhibit large variability in visual appearance and thus challenge existing data mining and pattern discovery algorithms. This tutorial will discuss the state of the art in image and video data mining and provide in-depth studies of some recently developed techniques. The topics cover bottom-up common visual pattern discovery, top-down visual pattern discovery using topic models, and abnormal video pattern discovery, as well as their applications in image search and recognition, scene understanding, video summarization and anomaly detection, intelligent video surveillance, etc.
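As a pointer to the top-down, topic-model-based pattern discovery mentioned above, the sketch below runs latent Dirichlet allocation over toy bag-of-visual-words histograms, treating each image as a document of quantized local features. The codebook size, topic count, and synthetic counts are illustrative; real pipelines quantize local descriptors (e.g., SIFT) into the visual words.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(10)

# Toy bag-of-visual-words counts: 40 images over a 50-word codebook.
# Half the images overuse words 0-9, half overuse words 40-49,
# mimicking two recurring visual patterns.
counts = rng.poisson(1.0, (40, 50))
counts[:20, :10] += rng.poisson(10.0, (20, 10))
counts[20:, 40:] += rng.poisson(10.0, (20, 10))

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
for t, topic in enumerate(lda.components_):
    top_words = np.argsort(topic)[::-1][:5]
    print(f"visual pattern {t}: top visual words {top_words}")
```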

 

Biography:
Junsong Yuan is currently an associate professor and program director of video analytics at the School of Electrical and Electronics Engineering (EEE), Nanyang Technological University (NTU), Singapore. He received his Ph.D. from Northwestern University and his M.Eng. from the National University of Singapore; before that, he graduated from the Special Class for the Gifted Young of Huazhong University of Science and Technology in China. His research interests include computer vision, video analytics, gesture and action analysis, and large-scale visual search and mining. He has published 150 conference and journal papers and filed several patents, with technology licensed by industry. He serves as a guest editor of the International Journal of Computer Vision (IJCV), and is currently an associate editor of the IEEE Transactions on Image Processing (T-IP), the IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), and The Visual Computer journal (TVC). He has also served on the organizing committees of several conferences, including ACCV'14, VCIP'15, ICME'16, and CVPR'17. He received the Nanyang Assistant Professorship and the Tan Chin Tuan Exchange Fellowship from Nanyang Technological University, the Outstanding EECS Ph.D. Thesis award from Northwestern University, a Best Paper Award from the IEEE Transactions on Multimedia, the Doctoral Spotlight Award from the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09), and the National Outstanding Student award from the Ministry of Education, P.R. China.

 

Wenjun ZENG
Principal Research Manager at Microsoft Research Asia



Human Understanding in Video

 

Abstract:
Video is the biggest of big data, containing an enormous amount of information. Recently, computer vision and deep learning technologies have been significantly leveraged to turn raw video data into insights that facilitate various applications and services. Since humans are the main subject of many videos, understanding humans becomes a critical step in video understanding. In this tutorial, I will introduce recent efforts in human-centric video analysis and present some of the latest technologies for understanding humans in videos, including face/human detection, tracking, and identification, human attribute extraction, human pose estimation, skeleton-based human action recognition, and real-time human action detection and forecasting in streaming video. I will also shed some light on the go-to-market aspects of this exciting field.

 

Biography:
Wenjun (Kevin) ZENG is a Principal Research Manager overseeing the Internet Media Group and the Media Computing Group at Microsoft Research Asia, while on leave from the University of Missouri (MU), where he is a Full Professor. He worked for PacketVideo Corp., Sharp Labs of America, Bell Labs, and Panasonic Technology prior to joining MU in 2003. Wenjun has contributed significantly to the development of international standards (ISO MPEG, JPEG2000, and OMA). He received his B.E., M.S., and Ph.D. degrees from Tsinghua University, the University of Notre Dame, and Princeton University, respectively. His current research interests include mobile-cloud media computing, computer vision, social network/media analysis, multimedia communications, and content/network security.

He is a Fellow of the IEEE. He is an Associate Editor-in-Chief of IEEE Multimedia Magazine, and was an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), the IEEE Transactions on Information Forensics and Security, and the IEEE Transactions on Multimedia (TMM). He is/was on the Steering Committee of the IEEE Transactions on Mobile Computing (current) and IEEE TMM (2009-2012). He served as the Steering Committee Chair of IEEE ICME in 2010 and 2011, and has served as the TPC Chair of several IEEE conferences (e.g., ChinaSIP’15, WIFS’13, ICME’09, CCNC’07). He will be a general co-chair of ICME 2018. He is currently guest editing an IEEE Communications Magazine Special Issue on the Impact of Next-Generation Mobile Technologies on IoT-Cloud Convergence and a TCSVT Special Issue on Visual Computing in the Cloud - Mobile Computing, and was a Special Issue Guest Editor for the Proceedings of the IEEE, IEEE TMM, and ACM TOMCCAP.