Time      | Tue, Dec 16          | Wed, Dec 17                      | Thu, Dec 18            | Fri, Dec 19
----------|----------------------|----------------------------------|------------------------|-----------------------
0900-0930 | Registration         | Registration                     |                        |
0930-1100 | Yihong GONG (Part 1) | Yihong GONG (Part 3)             | Pierre MOULIN (Part 1) | David FORSYTH (Part 1)
1100-1130 | Refreshment Break    | Refreshment Break                | Refreshment Break      | Refreshment Break
1130-1300 | Yihong GONG (Part 2) | Yihong GONG (Part 4)             | Pierre MOULIN (Part 2) | David FORSYTH (Part 2)
1300-1400 | Lunch provided at Venue | Lunch provided at ROSE Lab Foyer | Lunch provided at Venue | Lunch provided at Venue
1400-1530 | Dong XU (Part 1)     | ROSE Lab Demo                    | Xiaogang WANG (Part 1) | Z. Jane WANG (Part 1)
1530-1600 | Refreshment Break    |                                  | Refreshment Break      | Refreshment Break
1600-1730 | Dong XU (Part 2)     |                                  | Xiaogang WANG (Part 2) | Z. Jane WANG (Part 2)

Abstract:
Rooms and indoor spaces are important because humans live and work there. Robots built to help humans with day-to-day tasks, from cleaning to care of the frail, will need to work in rooms. There is now a rich literature studying methods to understand these indoor spaces from pictures and video. I will discuss methods to: model the overall shape of the space; identify important objects situated in the space; identify what people are doing, or could be doing, in the space; and investigate what the space would look like if objects were inserted or removed. My talk will provide a general review of methods and problems, with some particular application examples.


Biography:
David FORSYTH is a computer scientist and full professor at the University of Illinois at Urbana-Champaign (UIUC). He holds a BSc and an MSc in Electrical Engineering from the University of the Witwatersrand, Johannesburg, and an MA and a PhD from Oxford University. He was a full professor at U.C. Berkeley before moving to UIUC. With UIUC CS Professor Jean Ponce, he co-authored "Computer Vision: A Modern Approach" (2002), one of the leading texts on the topic. He has published over 100 papers on computer vision, computer graphics, and machine learning. He served as program co-chair for IEEE CVPR 2000, general co-chair for IEEE CVPR 2006, program co-chair for ECCV 2008, program co-chair for IEEE CVPR 2011, and is a regular member of the program committees of all major international conferences on computer vision. He served on the NRC Committee on "Protecting Kids from Pornography and Other Inappropriate Material on the Internet", which sat for three years and produced a study widely praised for its sensible content. He has received best paper awards at the International Conference on Computer Vision and at the European Conference on Computer Vision. His research interests also include graphics and machine learning; he served as a committee member of ICML 2008. In 2013, he became a Fellow of the Association for Computing Machinery.

Abstract:
This tutorial aims to provide comprehensive coverage of major techniques for image and video semantic analytics. I will first describe a list of applications with huge market demand and commercial value. Some of these applications can already be fully or at least partially realized with existing image and video analytics techniques, while others are hot topics for future research and development. By taking a closer look at these diverse and apparently unrelated applications, we can identify a few key technical components that are essential to all of them. Among these key components, I will focus on image features that have proven effective for object detection and semantic analytics, such as SIFT, HOG, LBP, sparse coding, and Fisher vectors. I will then present several state-of-the-art object detection and image semantic classification techniques, including AdaBoost, the cascade method, Deformable Part Models (DPM), and deep learning convolutional neural networks (DLCNN). My presentation will cover both classical works from the literature and innovative research studies conducted by my team. I will also present experimental evaluations performed by my team that reveal the characteristics of the above image features, object detection and image semantic classification methods, and their combinations. If time allows, I will present our latest research on multi-object tracking and its performance evaluation results.
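To give a concrete feel for one of the features named above, here is a minimal sketch of a basic local binary pattern (LBP) computation in plain Python. The 3x3 neighbourhood and clockwise bit order used here are one common convention, not necessarily the variant covered in the tutorial:

```python
def lbp_codes(img):
    """Basic 3x3 LBP: compare each interior pixel's 8 neighbours
    to the centre and pack the comparisons into an 8-bit code."""
    # Neighbour offsets, clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

# A tiny 3x3 grayscale image has a single interior pixel.
img = [[10, 20, 30],
       [40, 25, 20],
       [10, 10, 50]]
print(lbp_codes(img))  # -> [[148]]
```

In practice, a histogram of these codes over an image region serves as a texture descriptor for a downstream classifier.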


Biography:
Yihong GONG received his B.S., M.S., and Ph.D. degrees in Electrical and Electronic Engineering from the University of Tokyo in 1987, 1989, and 1992, respectively. He then joined Nanyang Technological University, Singapore, where he worked as an assistant professor in the School of Electrical and Electronic Engineering for four years. From 1996 to 1998, he worked at the Robotics Institute, Carnegie Mellon University, as a project scientist. He was a principal investigator for both the Informedia Digital Video Library project and the Experience-On-Demand project, funded at multi-million-dollar levels by NSF, DARPA, NASA, and other government agencies. In 1999, he joined NEC Laboratories America and established the lab's multimedia analytics group. In 2006, he became the site manager leading the entire Cupertino branch of the lab. In 2012, he joined Xi'an Jiaotong University in China as a professor under the "Thousand Talents Program". His research interests include image/video content analysis and machine learning applications. He was among the first researchers in the world to initiate studies on content-based image retrieval, sports video highlight detection, and text/video content summarization. He has published more than 100 technical papers, authored two monographs, and contributed chapters to two multimedia handbooks. To date, his works have been cited over 12,000 times, and his most cited paper has received more than 1,200 citations from peer researchers around the world. He led the team that developed the SmartCatch intelligent video surveillance system, which led to a successful spin-off; the spin-off company was among the top three in market share in the intelligent video surveillance sector. He also led the team that developed the world's first commercial human gender/age recognition software, which was widely reported by major Japanese and US TV stations.
Under his leadership, his teams won first place in the 2008 and 2009 TRECVID Event Detection evaluations, the 2009 PASCAL VOC Challenge, and the 2014 Chinese Graduate Student Video Content Analytics Contest.

Abstract:
The first part of this tutorial will introduce the fundamentals of action recognition, which finds applications in areas as diverse as surveillance and monitoring of streets and buildings, rehabilitation of hospital patients, and database indexing. Action recognition algorithms typically reduce the visual input to robust, discriminative features, which are then fed to a classifier; both the features and the classifier are learned during a training phase. The second part of this tutorial will introduce the fundamentals of human gait analysis, which is increasingly used in biometrics applications. For both action recognition and gait analysis, the fundamental concepts will be applied to RGB images as well as to image sequences acquired with the Kinect camera, which outputs both RGB and depth data.
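The feature-then-classifier pipeline described above can be sketched with a deliberately tiny stand-in classifier. The nearest-centroid rule, the 2-D "motion features", and the action labels below are all hypothetical illustrations, not the methods from the tutorial:

```python
def train_centroids(features, labels):
    """Toy 'training phase': store the per-class mean feature vector."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], f)]
        counts[y] += 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def classify(centroids, f):
    """Predict the class whose centroid is closest to the feature vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda y: dist2(centroids[y], f))

# Hypothetical 2-D motion features extracted from video clips of two actions.
feats  = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]]
labels = ['walk', 'walk', 'wave', 'wave']
model = train_centroids(feats, labels)
print(classify(model, [0.95, 0.95]))  # -> wave
```

Real action recognition systems replace both stages with far richer features (e.g., spatio-temporal descriptors) and stronger classifiers, but the train-then-classify structure is the same.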

Biography:
Pierre MOULIN received his doctoral degree from Washington University in St. Louis in 1990, after which he joined Bell Communications Research in Morristown, New Jersey, as a Research Scientist. In 1996, he joined the University of Illinois at Urbana-Champaign, where he is currently Professor in the Department of Electrical and Computer Engineering, Research Professor at the Beckman Institute and the Coordinated Science Laboratory, and affiliate professor in the Department of Statistics.

His fields of professional interest include image and video processing, compression, statistical signal processing and modeling, media security, decision theory, and information theory.

Dr. Moulin has served on the editorial boards of the IEEE Transactions on Information Theory, the IEEE Transactions on Image Processing, and the Proceedings of IEEE. He currently serves on the editorial board of Foundations and Trends in Signal Processing. He was co-founding Editor-in-Chief of the IEEE Transactions on Information Forensics and Security (2005-2008), member of the IEEE Signal Processing Society Board of Governors (2005-2007), and has served IEEE in various other capacities.

He received a 1997 CAREER award from the National Science Foundation and an IEEE Signal Processing Society 1997 Senior Best Paper award. He is also co-author (with Juan Liu) of a paper that received an IEEE Signal Processing Society 2002 Young Author Best Paper award. In 2003, he became an IEEE Fellow and a Beckman Associate of UIUC's Center for Advanced Study. From 2007 to 2009, he was Sony Faculty Scholar at UIUC. He was a plenary speaker at ICASSP 2006, ICIP 2011, and several other conferences, and was a Distinguished Lecturer of the IEEE Signal Processing Society for 2012-2013.

Abstract:
Digital media has profoundly changed our daily lives during the last decade. However, the convenient distribution and easy manipulation of digital media data also raise critical research concerns in multimedia security and management. How to manage large-scale media data through efficient indexing, searching, and retrieval remains an important topic to be explored.

This talk contains two major parts. (1) We present novel image hashing algorithms that achieve superior robustness (against various perceptually insignificant manipulations of and distortions to image content) and higher image identification accuracy. Its uniqueness and compactness make image hashing attractive for efficient image indexing and retrieval applications. We extend the image hashing concept to content-based fingerprinting and propose a generalized framework that combines different types of image hashes to generate a robust, fixed-length binary signature. (2) We develop novel machine learning algorithms to deal with situations where training data for visual object recognition is limited. The increasing amount of information and data builds an illusion that we will have enough data to solve all data-driven problems. Unfortunately, this is not true: sufficient high-quality training data does not necessarily come with big data, and it is difficult or sometimes impossible to collect the sufficient training samples that most computational algorithms depend on. We investigate three object recognition problems involving limited training data: (a) one-shot object recognition, (b) cross-domain object recognition, and (c) object recognition for images with different picture styles.
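To illustrate the robustness property that image hashing targets, here is a sketch of one of the simplest perceptual hashes, the "average hash". This is a textbook baseline chosen for brevity, not the algorithm presented in the talk, and it assumes the image has already been downsampled to a small grayscale grid:

```python
def average_hash(pixels):
    """Toy perceptual 'average hash': threshold each pixel of a small
    grayscale grid at the image mean, yielding a compact binary
    signature that tolerates perceptually insignificant distortions."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return ''.join('1' if p > mean else '0' for p in flat)

def hamming(h1, h2):
    """Number of differing bits; a small distance suggests the same image."""
    return sum(a != b for a, b in zip(h1, h2))

original = [[12, 200], [180, 15]]
# A slight global brightness change barely moves the hash,
# because thresholding at the mean absorbs the shift.
brighter = [[15, 205], [184, 18]]
print(hamming(average_hash(original), average_hash(brighter)))  # -> 0
```

For retrieval, such signatures are indexed so that near-duplicate images can be found by small Hamming distance rather than by comparing raw pixels.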

Biography:
Z. Jane WANG received the B.Sc. degree from Tsinghua University, China, in 1996, and the M.Sc. and Ph.D. degrees from the University of Connecticut in 2000 and 2002, respectively, all in electrical engineering. She was a Research Associate in the Department of Electrical & Computer Engineering at the University of Maryland, College Park. Since August 2004, she has been with the Department of Electrical and Computer Engineering at the University of British Columbia, Canada, where she is currently a Professor. Her research interests are in the broad areas of statistical signal processing theory and applications. She co-received the EURASIP Journal on Applied Signal Processing (JASP) Best Paper Award in 2004 and the IEEE Signal Processing Society Best Paper Award in 2005. She has published over 80 journal papers and about 90 conference papers. She has served or is serving as an Associate Editor for IEEE journals including the IEEE Transactions on Signal Processing, IEEE Transactions on Information Forensics & Security, IEEE Transactions on Biomedical Engineering, IEEE Signal Processing Letters, and IEEE Transactions on Multimedia.

Abstract:
Deep learning has become a major breakthrough in artificial intelligence, achieving amazing success on grand challenges in many fields, including computer vision. Its success benefits from big training data and the massively parallel computational power that has emerged in recent years, as well as from advanced model designs and training strategies. In this talk, I will introduce deep learning and explain the magic behind it in layman's terms. Through concrete examples of computer vision applications, I will illustrate four key points about deep learning. (1) Unlike traditional pattern recognition systems, which rely heavily on manually designed features, deep learning automatically learns hierarchical feature representations from data and disentangles hidden factors of the input through multi-level nonlinear mappings. (2) Unlike existing pattern recognition systems, which design or train their key components sequentially, deep learning can jointly optimize all components and create synergy through close interactions among them. (3) While most machine learning tools can be approximated with shallow neural networks, for some tasks the expressive power of deep models increases exponentially as their architectures go deeper. (4) Benefitting from the large learning capacity of deep models, we also recast some classical computer vision challenges as high-dimensional data transformation problems and solve them from new perspectives. The applications of deep learning in computer vision introduced here will focus on object detection, segmentation, and recognition. Some open questions related to deep learning will also be discussed at the end.
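The "multi-level nonlinear mappings" of point (1) can be made concrete with a minimal forward pass through two stacked fully connected layers. The weights and inputs below are arbitrary illustrative numbers, not learned values, and real deep models use many more layers and units:

```python
def layer(x, weights, biases):
    """One fully connected layer followed by a ReLU nonlinearity."""
    out = []
    for w_row, b in zip(weights, biases):
        z = sum(wi * xi for wi, xi in zip(w_row, x)) + b
        out.append(max(0.0, z))  # ReLU: the nonlinearity between levels
    return out

# Stacking layers means the output of one nonlinear mapping feeds the
# next; composing such mappings is what yields hierarchical features.
x = [1.0, -2.0]
h1 = layer(x, [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.5])   # first-level features
h2 = layer(h1, [[1.0, -1.0]], [0.0])                   # second-level features
print(h2)  # -> [1.5]
```

Without the ReLU, the two layers would collapse into a single linear map; the nonlinearity at each level is what lets depth add representational power.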


Biography:
Xiaogang WANG received his Bachelor's degree in Electrical Engineering and Information Science from the Special Class for the Gifted Young at the University of Science and Technology of China in 2001, his MPhil degree in Information Engineering from the Chinese University of Hong Kong in 2004, and his PhD degree in Computer Science from the Massachusetts Institute of Technology in 2009. He has been an assistant professor in the Department of Electronic Engineering at the Chinese University of Hong Kong since August 2009. He received the Outstanding Young Researcher in Automatic Human Behaviour Analysis Award in 2011, the Hong Kong RGC Early Career Award in 2012, and the Young Researcher Award of the Chinese University of Hong Kong. He is an associate editor of the Image and Vision Computing journal. He was an area chair of ICCV 2011, ECCV 2014, and ACCV 2014. His research interests include computer vision, deep learning, crowd video surveillance, object detection, and face recognition.

Abstract:
Domain adaptation (also called transfer learning) is an emerging research topic in computer vision. In some vision applications, the domain of interest (i.e., the target domain) contains very few or even no labeled samples, while an existing domain (i.e., the auxiliary domain) offers a large number of labeled examples. For example, millions of loosely labeled Flickr photos or YouTube videos can be readily obtained by keyword-based search. On the other hand, users may be interested in retrieving and organizing their own multimedia collections of images and videos at the semantic level, but may be reluctant to put forth the effort to annotate their photos and videos themselves. This problem becomes even more challenging because the feature distributions of training samples from the web domain and the consumer domain may differ tremendously in their statistical properties. To explicitly cope with this feature distribution mismatch between domains, in this talk I will describe our SVM-based approaches to domain adaptation under different settings, as well as their interesting applications in computer vision.
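One common way to quantify the feature distribution mismatch mentioned above is the maximum mean discrepancy (MMD), which some domain adaptation methods use as a measure or regularizer. This is a general illustration, not necessarily the criterion used in the speaker's approaches; with a linear kernel, the squared MMD reduces to the squared distance between the two domains' mean feature vectors:

```python
def mmd_linear(source, target):
    """Squared maximum mean discrepancy with a linear kernel:
    the squared Euclidean distance between the two domains' feature
    means. Larger values indicate a bigger source/target mismatch."""
    def mean_vec(samples):
        d = len(samples[0])
        return [sum(s[i] for s in samples) / len(samples) for i in range(d)]
    mu_s, mu_t = mean_vec(source), mean_vec(target)
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))

# Hypothetical 2-D features: web-domain samples vs consumer-domain samples.
web      = [[0.0, 0.0], [2.0, 0.0]]
consumer = [[3.0, 4.0], [5.0, 4.0]]
print(mmd_linear(web, consumer))  # means (1,0) vs (4,4) -> 9 + 16 = 25.0
```

A classifier trained to minimize its loss plus such a mismatch term is pushed toward feature representations under which the web and consumer domains look alike.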


Biography:
Dong XU is currently an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. His research focuses on new theories, algorithms and systems for intelligent processing and understanding of visual data such as images and videos. One of his co-authored works on domain adaptation for video event recognition won the Best Student Paper Award in CVPR 2010. His co-authored work also won the Prize Paper Award in IEEE Transactions on Multimedia (T-MM) in 2014. He is on the editorial boards of IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Neural Networks and Learning Systems, and Machine Vision and Applications (Springer).