Scientific Program

Conference Series Ltd invites all the participants across the globe to attend the 2nd Global Summit and Expo on Multimedia & Applications at the Crowne Plaza, Heathrow, London, UK.

Day 2:

Keynote Forum

Vijayan K Asari

University of Dayton, USA

Keynote: A nonlinear manifold learning strategy for robust face recognition

Time : 10:05-10:35

Biography:

Dr Vijayan Asari is a Professor in Electrical and Computer Engineering and Ohio Research Scholars Endowed Chair in Wide Area Surveillance at the University of Dayton, Dayton, Ohio, USA. He is the director of the Center of Excellence for Computer Vision and Wide Area Surveillance Research (Vision Lab) at UD. A leader in innovation and algorithm development, the UD Vision Lab specializes in object detection, recognition and tracking in wide area surveillance imagery captured by visible, infrared, thermal, hyperspectral, LiDAR (Light Detection and Ranging) and EEG (electroencephalograph) sensors. Dr Asari's research activities include development of novel algorithms for human identification by face recognition, human action and activity recognition, brain signal analysis for emotion recognition and brain machine interface, 3D scene creation from 2D video streams, 3D scene change detection, and automatic visibility improvement of images captured in various weather conditions. Dr Asari received his BS in electronics and communication engineering from the University of Kerala, India, and his M Tech and PhD degrees in Electrical Engineering from the Indian Institute of Technology, Madras. Prior to joining UD in February 2010, Dr Asari worked as a Professor in Electrical and Computer Engineering at Old Dominion University, Norfolk, Virginia, for 10 years. Dr Asari worked at the National University of Singapore during 1996-98 and led a research team for the development of a vision-guided microrobotic endoscopy system. He also worked at Nanyang Technological University, Singapore during 1998-2000 and led the computer vision and image processing related research activities in the Center for High Performance Embedded Systems at NTU. Dr Asari holds three patents and has published more than 500 research papers, including 85 peer-reviewed journal papers in the areas of image processing, pattern recognition, machine learning and high performance embedded systems. Dr Asari has supervised 22 PhD dissertations and 35 MS theses during the last 15 years. Currently, 18 graduate students are working with him on different sponsored research projects. He is participating in several federally and privately funded research projects and has so far managed around $15M in research funding. Dr Asari has received several teaching, research, advising and technical leadership awards. He is a Senior Member of IEEE and SPIE, and a member of the IEEE Computational Intelligence Society. Dr Asari is the co-organizer of several SPIE and IEEE conferences and workshops.

Abstract:

The human brain processes enormous volumes of high-dimensional data for everyday perception. To humans, a picture is worth a thousand words, but to a machine, it is just a seemingly random array of numbers. Although machines are very fast and efficient, they are vastly inferior to humans for everyday information processing. Algorithms that mimic the way the human brain computes and learns may be the solution. In this paper we present a theoretical model based on the observation that images of similar visual perceptions reside in a complex manifold in a low-dimensional image space. The perceived features are often highly structured and hidden in a complex set of relationships or high-dimensional abstractions. To model the pattern manifold, we present a novel learning algorithm using a recurrent neural network. The brain memorizes information using a dynamical system made of interconnected neurons. Retrieval of information is accomplished in an associative sense. It starts from an arbitrary state that might be an encoded representation of a visual image and converges to another state that is stable. The stable state is what the brain remembers. In designing a recurrent neural network, it is usually of prime importance to guarantee the convergence in the dynamics of the network. We propose to modify this picture: if the brain remembers by converging to the state representing familiar patterns, it should also diverge from such states when presented with an unknown encoded representation of a visual image belonging to a different category. That is, the identification of an instability mode is an indication that a presented pattern is far away from any stored pattern and therefore cannot be associated with current memories. These properties can be used to circumvent the plasticity-stability dilemma by using the fluctuating mode as an indicator to create new states. We capture this behavior using a novel neural architecture and learning algorithm, in which the system performs self-organization utilizing a stability mode and an instability mode for the dynamical system. Based on this observation, we developed a self-organizing line attractor, which is capable of generating new lines in the feature space to learn unrecognized patterns. Experiments performed on various lighting-variant, pose-variant and expression-variant face databases have shown that the proposed nonlinear line attractor is able to successfully identify individuals and provides a better recognition rate than state-of-the-art face recognition techniques. These results show that the proposed model is able to create nonlinear manifolds in a multidimensional feature space to distinguish complex patterns.
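To make the convergence/divergence idea above concrete, the sketch below uses a classical Hopfield-style associative memory as a stand-in for the talk's self-organizing nonlinear line attractor (which is not reproduced here): familiar inputs converge to a stored state, while inputs that fail to settle near any stored state are flagged as unfamiliar, the cue the abstract uses for creating new states. All data and thresholds are illustrative.

```python
# Illustrative sketch only: a Hopfield-style associative memory standing in
# for the self-organizing nonlinear line attractor described in the abstract.
import numpy as np

def train_hebbian(patterns):
    """Store +/-1 patterns with a Hebbian outer-product rule."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, x, steps=50):
    """Run the recurrent dynamics until a fixed point is reached."""
    s = x.copy()
    for _ in range(steps):
        s_new = np.sign(W @ s)
        s_new[s_new == 0] = 1
        if np.array_equal(s_new, s):      # converged to a stable memory state
            return s, True
        s = s_new
    return s, False                       # did not settle: treat as unfamiliar

def is_known(W, patterns, x, max_dist=2):
    """Familiar if the dynamics converge close to some stored pattern."""
    s, converged = recall(W, x)
    if not converged:
        return False
    dists = [(p != s).sum() for p in patterns]
    return min(dists) <= max_dist

rng = np.random.default_rng(0)
stored = np.sign(rng.standard_normal((3, 64)))       # three stored "face" codes
W = train_hebbian(stored)
probe = stored[0].copy()
probe[:5] *= -1                                       # noisy version of a known code
print(is_known(W, stored, probe))                     # expected: True
print(is_known(W, stored, np.sign(rng.standard_normal(64))))  # may be flagged unfamiliar
```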

Keynote Forum

Robert S Laramee

Swansea University, UK

Keynote: Visual analytics for big video data

Time : 10:35-11:05

Biography:

Robert S Laramee received a bachelor's degree in physics, cum laude, from the University of Massachusetts, Amherst (ZooMass). He received a master's degree in computer science from the University of New Hampshire, Durham. He was awarded a PhD from Vienna University of Technology (Gruess Gott TUWien), Austria, at the Institute of Computer Graphics and Algorithms in 2005. From 2001 to 2006 he was a researcher at the VRVis Research Center (www.vrvis.at) and a software engineer at AVL (www.avl.com) in the department of Advanced Simulation Technologies. Currently, he is an Associate Professor in Data Visualization at Swansea University (Prifysgol Cymru Abertawe), Wales, in the Department of Computer Science (Adran Gwyddor Cyfrifiadur). His research interests are in the areas of big data visualization, visual analytics, and human-computer interaction. He has published more than 100 peer-reviewed papers in scientific conferences and journals and served as Conference Chair of EuroVis 2014, the premier conference on data visualization in Europe.

Abstract:

With advancements in multimedia and data storage technologies and the ever-decreasing costs of hardware, our ability to generate and store ever more video and other multimedia data is unprecedented. YouTube, for example, has over 1 billion users. However, a very large gap remains between our ability to generate and store large collections of complex, time-dependent video and multimedia data and our ability to derive useful information and knowledge from it. Viewing video and multimedia as a data source, visual analytics exploits our most powerful sense, vision, in order to derive information and knowledge and gain insight into big multimedia data sets that record complicated and often time-dependent events. This talk presents a case study of state-of-the-art visualization and visual analytics techniques applied to video multimedia in order to explore, analyze, and present video data. In this case, we show how glyph-based visualization can be used to convey the most important information and events from videos of rugby games. The talk showcases some of visualization's strengths, weaknesses, and goals. We describe an interdisciplinary case study based on rugby sports analytics, where visual analytics and visualization are used to address fundamental questions, the answers of which we hope to discover in various large, complex, and time-dependent multimedia data.
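As a rough illustration of what glyph-based event visualization means in practice (this is not the speaker's rugby system, and all event data below are invented), the following matplotlib sketch encodes each detected match event as a glyph whose position, size, shape and colour carry the event time, importance and type.

```python
# Toy glyph-based event visualization: one glyph per detected video event.
import matplotlib.pyplot as plt

events = [  # (time in minutes, importance 0-1, event type) -- made-up data
    (3, 0.4, "pass"), (11, 0.9, "try"), (27, 0.6, "scrum"),
    (41, 0.8, "try"), (55, 0.3, "pass"), (63, 0.7, "scrum"),
]
markers = {"pass": "o", "try": "*", "scrum": "s"}
colors = {"pass": "tab:blue", "try": "tab:red", "scrum": "tab:green"}

fig, ax = plt.subplots(figsize=(8, 2))
for t, importance, kind in events:
    ax.scatter(t, 0, s=800 * importance, marker=markers[kind],
               c=colors[kind], alpha=0.7, label=kind)
ax.set_xlabel("match time (minutes)")
ax.set_yticks([])

# de-duplicate legend entries so each event type appears once
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
ax.legend(unique.values(), unique.keys(), loc="upper right", ncol=3)
plt.tight_layout()
plt.show()
```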

Break: Networking and Refreshments: 11:05-11:25 @ Foyer
  • Computer Vision
  • Multimedia Health Computation and its Applications
  • Multimedia Systems and Applications
  • Human-Computer Interaction
  • Multimedia Content Analysis
Speaker

Chair

Ghyslain Gagnon

École de technologie supérieure, Canada

Speaker

Co-Chair

Changsoo Je

Sogang University, Korea

Session Introduction

Leonel Antonio Toledo Díaz

Instituto Tecnológico de Estudios Superiores de Monterrey, Mexico

Title: Visualization Techniques for Crowd Simulation
Biography:

Leonel Toledo received his PhD from the Instituto Tecnológico de Estudios Superiores de Monterrey, Campus Estado de México, in 2014, where he is currently a full-time professor. From 2012 to 2014 he was an assistant professor and researcher. He has devoted most of his research work to crowd simulation and visualization optimization. He has worked at the Barcelona Supercomputing Center using general-purpose graphics processors for high-performance graphics. His thesis work was on the use of level of detail to create varied animated crowds. His research interests include crowd simulation, animation, visualization, high-performance computing and HCI.

Abstract:

Animation and simulation of crowds finds applications in many areas, including entertainment (e.g., animation of large numbers of people in movies and games), creation of immersive virtual environments, and evaluation of crowd management techniques (for instance, simulation of the flow of people leaving a football stadium after a match). In order to have a persuasive application using crowds in virtual environments, various aspects of the simulation have to be addressed, including behavioral animation, environment modelling, and crowd rendering. Real-time graphics systems are required to render millions of polygons to the screen per second. Real-time computer-animated graphics relies heavily on the current generation of graphics hardware. However, like many fields in computing science, the requirements of computer graphics software far outstrip hardware capabilities.

Procedural generation techniques are widely used in computer graphics to model systems of high complexity. Many of these techniques target the generation of natural phenomena in high complexity and detail to achieve realistic results. Procedural generation can be computationally intensive and is not commonly used in real-time systems to generate entire virtual worlds. However, advancements in processing speed and graphics hardware now make it possible to generate three-dimensional models in real time on commodity hardware. Applications can range from entertainment to urban design or crisis management; traffic simulation in big cities can also benefit from these visualization techniques.
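A minimal sketch of one of the standard tricks behind real-time crowd rendering mentioned above, distance-based level of detail: agents near the camera get full geometry, mid-range agents a reduced mesh, and far agents an impostor billboard. The thresholds and representation names are made up for illustration.

```python
# Distance-based level-of-detail selection for crowd rendering (illustrative
# thresholds, not taken from the speaker's system).
import math

LOD_LEVELS = [
    (20.0, "full_mesh"),      # closer than 20 units: full animated mesh
    (60.0, "reduced_mesh"),   # 20-60 units: simplified geometry
    (1e9, "impostor"),        # beyond 60 units: billboard impostor
]

def select_lod(agent_pos, camera_pos):
    dist = math.dist(agent_pos, camera_pos)
    for threshold, level in LOD_LEVELS:
        if dist < threshold:
            return level
    return "impostor"

camera = (0.0, 0.0, 0.0)
agents = [(5.0, 0.0, 3.0), (40.0, 0.0, 10.0), (200.0, 0.0, -50.0)]
print([select_lod(a, camera) for a in agents])
# ['full_mesh', 'reduced_mesh', 'impostor']
```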

Ghyslain Gagnon

École de technologie supérieure, Canada

Title: Robust multiple-instance learning ensembles using random subspace instance selection

Time : 11:25-11:50

Speaker
Biography:

Ghyslain Gagnon received the PhD degree in electrical engineering from Carleton University, Canada, in 2008. He is now an Associate Professor at École de technologie supérieure, Montreal, Canada. He is an executive committee member of ReSMiQ and Director of the research laboratory LACIME, a group of 10 professors and nearly 100 highly dedicated students and researchers in microelectronics, digital signal processing and wireless communications. Highly inclined towards research partnerships with industry, his research aims at digital signal processing and machine learning with various applications, from media art to building energy management.

Abstract:

Many real-world pattern recognition problems can be modeled using multiple-instance learning (MIL), where instances are grouped into bags, and each bag is assigned a label. State-of-the-art MIL methods provide a high level of performance when strong assumptions are made regarding the underlying data distributions, and the proportion of positive to negative instances in positive bags. In this paper, a new method called Random Subspace Instance Selection (RSIS) is proposed for the robust design of MIL ensembles without any prior assumptions on the data structure and the proportion of instances in bags. First, instance selection probabilities are computed based on training data clustered in random subspaces. A pool of classifiers is then generated using the training subsets created with these selection probabilities. By using RSIS, MIL ensembles are more robust to many data distributions and noise, and are not adversely affected by the proportion of positive instances in positive bags because training instances are repeatedly selected in a probabilistic manner. Moreover, RSIS also allows the identification of positive instances on an individual basis, as required in many practical applications. Results obtained with several real-world and synthetic databases show the robustness of MIL ensembles designed with the proposed RSIS method over a range of witness rates, noisy features and data distributions compared to reference methods in the literature
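The sketch below illustrates the general idea described in the abstract: cluster training instances in random feature subspaces to estimate how likely each instance in a positive bag is to be truly positive, then sample training subsets with those probabilities to build an ensemble. The cluster-scoring rule, parameters and classifiers are simplified guesses, not the authors' exact RSIS algorithm.

```python
# Illustrative sketch of an RSIS-style ensemble for multiple-instance learning.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def instance_positivity(X, bag_labels, n_subspaces=10, n_clusters=5, seed=0):
    """Score each instance by how often it falls in clusters dominated by
    instances from positive bags, over several random feature subspaces."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(len(X))
    for _ in range(n_subspaces):
        dims = rng.choice(X.shape[1], size=max(2, X.shape[1] // 2), replace=False)
        labels = KMeans(n_clusters=n_clusters, n_init=5,
                        random_state=seed).fit_predict(X[:, dims])
        for c in range(n_clusters):
            members = labels == c
            scores[members] += bag_labels[members].mean()
    return scores / n_subspaces

def train_pool(X, bag_labels, n_classifiers=10, subset_size=200, seed=0):
    """Sample training subsets with the selection probabilities and train a pool."""
    rng = np.random.default_rng(seed)
    p = instance_positivity(X, bag_labels)
    p = p / p.sum()
    pool = []
    for _ in range(n_classifiers):
        idx = rng.choice(len(X), size=min(subset_size, len(X)), p=p, replace=True)
        clf = DecisionTreeClassifier(max_depth=5, random_state=seed)
        clf.fit(X[idx], bag_labels[idx])   # bag label used as instance label
        pool.append(clf)
    return pool

# toy data: 300 instances, each carrying the label of the bag it came from
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 10))
bag_labels = rng.integers(0, 2, size=300)
pool = train_pool(X, bag_labels)
votes = np.mean([clf.predict(X) for clf in pool], axis=0)  # ensemble scores
```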

Vijayan K Asari

University of Dayton, USA

Speaker
Biography:

Please see Dr Vijayan Asari's biography under his keynote presentation above.

Abstract:

The amazing progress in sensor technology has made it possible to capture gigabyte-sized frames at a reasonable frame rate in wide area motion imagery (WAMI) processing scenarios. Automatic detection, tracking and identification of objects in this imagery in real time are becoming a necessity for security and surveillance applications. Feature extraction and classification of moving objects in WAMI data are challenging, as the objects in the image may be very small and may appear at different viewing angles and in varying environmental conditions. We present a new framework for detection and tracking of such low-resolution objects in wide area imagery. The motivation behind the development of this algorithm is to utilize all the information that is available about the object of interest in the detection and tracking processes. The proposed method makes use of a dense version of localized histograms of gradients on the difference images. A Kalman filter based predictive mechanism is employed in the tracking methodology. The feature-based tracking mechanism can track all the moving objects. The robustness of the proposed methodology is illustrated with the help of detection and tracking of several objects of interest in varying situations. It is observed that the new method can even track pedestrians in WAMI data. We also present the effect of our shadow illumination and super-resolution techniques to improve object detection and tracking in very long range videos. The processing steps include stitching of images captured by multiple sensors, video stabilization and distortion correction, frame alignment for registration and moving object detection, tracking of multiple objects and humans in the motion imagery, classification of objects and identification of humans in the scene, and communication of large image data and decisions to multiple destinations. In addition, information extracted from video streams captured by sensors located at different regions could also be used for accurate decision making.
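As a pointer to how the predictive part of such a tracker is typically built, here is a minimal constant-velocity Kalman filter in NumPy. The detection stage (dense localized histograms of gradients on difference images) and the full WAMI pipeline are not reproduced; the detections below are synthetic.

```python
# Minimal constant-velocity Kalman filter used as the predictive element of a
# tracker; detections are synthetic, not from a WAMI detector.
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],   # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],    # we only observe position
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01           # process noise
R = np.eye(2) * 1.0            # measurement noise

x = np.zeros(4)                # initial state
P = np.eye(4) * 10.0           # initial uncertainty

def kalman_step(x, P, z):
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update with the new detection z = [x, y]
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

detections = [np.array([i * 2.0 + np.random.randn() * 0.5,
                        i * 1.0 + np.random.randn() * 0.5]) for i in range(10)]
for z in detections:
    x, P = kalman_step(x, P, z)
print("estimated position/velocity:", np.round(x, 2))
```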


Ching Y. Suen

Concordia University, Canada

Title: Digital Fonts and Reading
Speaker
Biography:

Ching Y. Suen is the Director of CENPARMI and the Concordia Honorary Chair on AI & Pattern Recognition. He received his Ph.D. degree from UBC (Vancouver) and his Master's degree from the University of Hong Kong. He has served as the Chairman of the Department of Computer Science and as the Associate Dean (Research) of the Faculty of Engineering and Computer Science of Concordia University. Prof. Suen has served in numerous national and international professional societies as President, Vice-President, Governor, and Director. He has given 45 invited/keynote papers at conferences and 200 invited talks at various industries and academic institutions around the world. He has been the Principal Investigator or Consultant of 30 industrial projects. His research projects have been funded by the ENCS Faculty and the Distinguished Chair Programs at Concordia University, FCAR (Quebec), NSERC (Canada), the National Networks of Centres of Excellence (Canada), the Canadian Foundation for Innovation, and the industrial sectors in various countries, including Canada, France, Japan, Italy, and the United States. Currently, he is the Editor-in-Chief of the journal Pattern Recognition, an Adviser or Associate Editor of 5 journals, and Editor of a new book series on Language Processing and Pattern Recognition. He has previously held positions as Editor-in-Chief, Associate Editor, or Adviser of 5 other journals. He is not only the founder of three conferences (ICDAR, IWFHR/ICFHR, and VI) but has also organized numerous international conferences, including ICPR, ICDAR, ICFHR, and ICCPOL, and has served as Honorary Chair of numerous international conferences.

Abstract:

Thousands of years ago, humans started to create symbols to represent things they saw, heard, touched, found, remembered, imagined, and talked about. We can see them carved on rocks, walls, shells, and other materials. From these symbols, words and different languages were invented, modified, expanded, and evolved over the years. Following the invention of paper and writing instruments, different ways of representing the same symbol started to appear, forming the basis of different stylistic variations and font types. As time went by, computers and digital technology emerged, with which the alphabets of all languages in the world can be printed digitally. Once a symbol has been represented in a digital format, there are infinite ways of representing it in unlimited type fonts for publishing. This talk summarizes the evolution of fonts, their characteristics and their personality traits. Aspects such as font styles and their effects on reading and eyesight, legibility and comprehension, will be discussed with experimental results.

Takashi Nakamura

Niigata University, Japan

Speaker
Biography:

Takashi Nakamura completed his PhD at the age of 28 at Kobe University. He is a professor of media studies in the Faculty of Humanities at Niigata University. He has published more than 20 papers (including ones in Japanese) and two books (one as sole author and the other as sole editor) in Japanese. He is an editorial board member of Annals of Behavioural Science.

Abstract:

This presentation focused on the action of looking at a mobile phone display as a type of nonverbal behavior/communication and compared it cross-culturally. The diversity of nonverbal behavior/communication was considered to be caused by the difference between Western and non-Western cultures. The questionnaire was conducted in three countries (the USA, Hong Kong and Japan), and a total of 309 subjects participated. The participants were required to record their opinions for the action according to the situation with ‘co-present’ familiar persons. The analysis declared that the difference between the USA and Japan was more pronounced as the relationship with the ‘co-present’ person was more intimate. The results of the Hong Kong sample were intermediate between those of the other two countries. The diversity was discussed in terms of independent/interdependent self in the perspective of cultural comparison and of mobile phone usage. The analysis revealed that the action as a form of nonverbal behavior/communication has functioned in human relationships and has been deeply embedded into culture in the mobile phone era.

Changyu Liu

South China Agricultural University, China

Title: Complex Event Detection via Bank based Multimedia Representation

Time : 12:40-13:05

Speaker
Biography:

Changyu Liu received the PhD degree in 2015 from South China University of Technology, where he worked under the supervision of Prof. Shoubin Dong. He is currently a lecturer at the College of Mathematics and Informatics, South China Agricultural University. He was a visiting scholar at the School of Computer Science, Carnegie Mellon University, from September 2012 to October 2013, advised by Dr. Alex Hauptmann. Then, he worked with Prof. Mohamed Abdel-Mottaleb and Prof. Mei-Ling Shyu at the Department of Electrical and Computer Engineering, University of Miami, from October 2013 to September 2014. He serves as a reviewer for many international journals, such as Neural Computing and Applications, Security and Communication Networks, KSII Transactions on Internet and Information Systems, Journal of Computer Networks and Communications, and Tumor Biology. He is a Technical Program Committee member for many international conferences, such as GMEE2015, PEEM2016, and ICEMIE2016. His research interests include computer vision, pattern recognition, multimedia analysis, bioinformatics, virtual reality, and machine learning.

Abstract:

Along with the advent of the big data era, available multimedia collections are expanding. To meet the increasingly diversified demands of multimedia applications from the public, effective multimedia analysis approaches are urgently required. Multimedia event detection, as an emerging branch of multimedia analysis, is gaining considerable attention from both industrial and academic researchers. However, much current effort on multimedia event detection has been dedicated to detecting complex events in controlled video clips or simple events in uncontrolled video clips. In order to perform complex event detection tasks in uncontrolled video clips, we propose an event bank descriptor approach, published in the journal Neurocomputing, for multimedia representation. The approach divides the spatial temporal objects of an event into objects, described by a latent group logistic regression mixture model trained on a large number of labeled images which can be obtained very easily from standard image datasets, and spatial temporal relationships, described by spatial temporal grids trained on a relatively small number of labeled videos which can also be obtained very easily from standard video datasets. Furthermore, we combine the coordinate descent approach and the gradient descent approach to develop an efficient iterative training algorithm to learn model parameters in the event bank descriptor, and conduct extensive experiments on the ImageNet challenge 2012 dataset and the TRECVID MED 2012 dataset. The results showed that the proposed approach outperformed state-of-the-art approaches for complex event detection in uncontrolled video clips. The benefits of our approach are mainly threefold: firstly, outliers in training examples are removed; secondly, subtle structural variations are allowed for detection; thirdly, the feature vectors of the event bank are jointly sparse.
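The toy sketch below shows what a "bank"-style representation can look like in code: per-object classifier score maps are max-pooled over a coarse spatio-temporal grid and concatenated into one descriptor for event classification. The latent group logistic regression mixture model and the coordinate/gradient descent training from the talk are not reproduced; the score maps are random placeholders.

```python
# Toy "event bank" style descriptor: pool object-classifier scores over a
# coarse spatio-temporal grid and concatenate them.
import numpy as np

def event_bank_feature(frame_scores, t_cells=4, s_cells=2):
    """frame_scores: array (T, H, W, C) of per-frame, per-location scores for
    C object classifiers.  Returns a 1-D descriptor of length
    t_cells * s_cells * s_cells * C (max-pooled within each cell)."""
    T, H, W, C = frame_scores.shape
    feats = []
    for ti in range(t_cells):
        t0, t1 = ti * T // t_cells, (ti + 1) * T // t_cells
        for yi in range(s_cells):
            y0, y1 = yi * H // s_cells, (yi + 1) * H // s_cells
            for xi in range(s_cells):
                x0, x1 = xi * W // s_cells, (xi + 1) * W // s_cells
                cell = frame_scores[t0:t1, y0:y1, x0:x1, :]
                feats.append(cell.max(axis=(0, 1, 2)))  # best response per object
    return np.concatenate(feats)

# synthetic example: 60 frames, 8x8 score maps, a bank of 20 object classifiers
scores = np.random.rand(60, 8, 8, 20)
descriptor = event_bank_feature(scores)
print(descriptor.shape)   # (4 * 2 * 2 * 20,) = (320,)
```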

Break: Lunch: 13:05-13:45 @ Orwell's Brasserie
AKM Mahbubur Rahman

Eyelock LLC, USA

Speaker
Biography:

AKM Mahbubur Rahman has completed his PhD at the age of 32 years. He is working as Senior Research Scientist in Eyelock LLC, an acknowledged leader in advanced iris authentication for the Internet of Things (IoT). He has published more than 10 papers in reputed conferences and journals.

Abstract:

Disabilities related to congenital blindness, vision loss or partial sight disturb not only one's physical body, but the trajectory of one's social interactions, due to lack of perception of a partner's facial behavior, head pose, and body movements. It is well documented in the literature that sight loss can lead to depression, loneliness, and anxiety. Complex emotional states are recognized by sighted people disproportionately by processing the visual cues from the eye and mouth regions of the face. For instance, social communications with eye-to-eye contact provide information about concentration, confidence, and engagement. Smiles are universally recognized as signs of pleasure and welcome. In contrast, looking away for a long time is perceived as lack of concentration, break of engagement, or boredom. However, visually impaired people have no access to these cues from the eye and mouth regions, and this non-verbal information is less likely to be communicated through the voice. Additionally, if the interlocutor is silent (listening), the blind individual would have no clue about the interlocutor's mental state. The scenario might be more complex where a group of people including visually impaired persons are interacting in a discussion, debate, etc. The disability of perceiving emotions and epistemic states can be mitigated by a computer vision and machine learning based assistive technology solution that is capable of processing facial behavior, head pose, facial expressions, and physiological signals in real time. A practical and portable system is desired that would predict VAD dimensions as well as facial events from the interlocutor's facial behavior and head pose in natural environments (for instance, conversation in a building corridor, asking questions to a stranger in a street, discussing topics of interest on a university campus, etc.). Building social assistive technologies using computer vision and machine learning techniques is rather new and unexplored and poses complex research challenges. However, these challenges have been overcome to implement such a robust system for real-world deployment. Research challenges are identified and divided into three categories: a) system and face-tracker related challenges; b) classification and prediction related challenges; c) deployment related issues. This paper presents the design and implementation of EmoAssist: a smartphone-based system to assist in dyadic conversations. The main goal of the system is to provide access to more non-verbal communication options to people who are blind or visually impaired. The key functionalities of the system are to predict behavioral expressions (such as a yawn, a closed-lip smile, an open-lip smile, looking away, sleepiness, etc.) and 3-D affective dimensions (valence, arousal, and dominance) from visual cues in order to provide the correct auditory feedback or response. A number of challenges related to the data communication protocols, efficient tracking of the face, modeling of behavioral expressions/affective dimensions, the feedback mechanism and system integration were addressed to build an effective and functional system. In addition, orientation sensor information from the smartphone was used to correct image alignment to improve robustness for real-world application.
Empirical studies show that EmoAssist can predict affective dimensions with acceptable accuracy (maximum correlation coefficients of 0.76 for valence, 0.78 for arousal, and 0.76 for dominance) in natural dyadic conversation. The overall minimum and maximum response times are 64.61 milliseconds and 128.22 milliseconds, respectively. The integration of sensor information for correcting the orientation improved the accuracy in recognizing behavioral expressions by 16% on average. A usability study with ten blind people in social interaction shows that EmoAssist is highly acceptable, with an average acceptability rating of 6.0 on a Likert scale (where 1 and 7 are the lowest and highest possible ratings, respectively).

Dongkyu Lee

Kwangwoon University, Korea

Title: Fast motion estimation for HEVC on graphics processing unit (GPU)

Time : 13:45-14:10

Speaker
Biography:

Dongkyu Lee received his B.S. and M.S. degrees in Electronic Engineering from Kwangwoon University, Seoul, Korea, in 2012 and 2014, respectively. He is a Ph.D. candidate at the Kwangwoon University. His research interests are image and video processing, video compression, and video coding

Abstract:

The recent video compression standard, HEVC (High Efficiency Video Coding), will most likely be used in various applications in the near future. However, the encoding process is far too slow for real-time applications. At the same time, the computing capabilities of GPUs (graphics processing units) have become much more powerful. In this talk, we present a GPU-based parallel motion estimation (ME) algorithm to enhance the performance of an HEVC encoder. A frame is partitioned into two subframes for pipelined execution to improve GPU utilization, and the processing flow is restructured to resolve data hazards in the pipelined execution. Two new methods are introduced in the proposed ME: decision of a representative search center position (RSCP) and warp-based concurrent parallel reduction (WCPR). The RSCP employs motion vectors of a co-located CTU (coding tree unit) in a previously encoded frame to solve a dependency problem in parallel computation with negligible coding loss. WCPR concurrently executes several parallel reduction operations, which increases thread utilization from 20% to 89% without any thread synchronization. The proposed encoder makes the ME portion of the encoding time negligible at the cost of a 2.2% bitrate increase against the HEVC test model (HM) encoder. In terms of ME alone, the proposed ME is 130.7 times faster than that of the HM encoder.
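To make concrete what the GPU kernels are accelerating, here is a plain NumPy sketch of full-search block motion estimation with a SAD cost. The paper's actual contributions (representative search center positions, warp-based concurrent parallel reduction, pipelined subframes) are not reproduced here.

```python
# Full-search block motion estimation with a SAD cost (serial reference sketch).
import numpy as np

def sad(block_a, block_b):
    return np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum()

def full_search(cur, ref, bx, by, block=16, search=8):
    """Find the motion vector for the block at (bx, by) in the current frame
    by exhaustively testing candidates within +/- search pixels in the
    reference frame."""
    target = cur[by:by + block, bx:bx + block]
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cost = sad(target, ref[y:y + block, x:x + block])
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))    # simulate global motion
print(full_search(cur, ref, bx=16, by=16))        # expected mv close to (-3, -2)
```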

Speaker
Biography:

Dr Morrow is a specialist in paediatric rehabilitation. She completed her PhD in 2010 and is head of the Brain Injury Service at the Children’s Hospital at Westmead, Sydney, Australia. Her research interests include the role of applications in the delivery of paediatric health services and consumer engagement in the design and development of health interventions.

Abstract:

The BrightHearts app has been developed to teach children biofeedback-assisted relaxation techniques (BART) to manage pain and anxiety in health care settings. This digital artwork, which responds to changes in heart rate transmitted via a wireless pulse oximeter, was developed through an iterative design process incorporating qualitative data from health professionals and children and prototype exhibitions in hospital waiting areas. The final iteration of the work used in the pilot trial comprised an iPad app used in conjunction with a custom-built Bluetooth 4.0 wireless pulse oximeter that measures and transmits inter-beat interval data, which is then used to control the changes in the appearance and sound of the app. In contrast to the object- and/or character-driven visuals used in many computer games and biofeedback displays, BrightHearts focuses the user's attention on the gradual changes in a 'mandala'-like circular interface, encouraging a more relaxed quality of engagement. Users can contract successive layers of overlapping circular shapes using gentle, sustained exhalations. The more relaxed they become, the slower their average heart rate and the more layers they can draw inwards toward the center of the screen. BrightHearts has been successfully piloted for the management of procedural pain and anxiety in children aged 7-18 years and for the management of pain and anxiety associated with vaccination in a school-based vaccination programme for adolescents. BrightHearts is currently being evaluated in three randomised controlled trials, including a study evaluating the efficacy of BrightHearts for managing chronic pain in children with cerebral palsy.

Zhaoming Guo

Speaker
Biography:

Zhaoming Guo began to study narrowband mobile data communication systems at a communication corporation in 1995. In March 1997, he raised the concept of the "mobile network computer", combining his research experience with narrowband mobile data communication systems and the concept of the "network computer" raised by the ORACLE CEO. In June 2000 and July 2001, he also published articles about the mobile network computer in "China Computer Newspaper" and "China Wireless Communication". Today, 20 years later, the concept of the mobile network computer still remains vigorous. He received his MS degree from Nanjing University of Science and Technology, China, in 1995, and completed his PhD degree at Beijing Institute of Technology, China, in 2016. In July 2015, Zhaoming Guo again published an article on the mobile network computer in the SCI-indexed journal "Wireless Personal Communications", titled "Mobile Network Computers Should be the Terminal of Mobile Communication Networks". In June 2016, another article about the mobile network computer written by Zhaoming Guo was accepted by the SCI-indexed journal "China Communications" and will be published at the end of the year, titled "Mobile Network Computer Can Better Describe the Future of Information Society".

Abstract:

The concept of a network computer was proposed by ORACLE CEO Larry Ellison in 1995, and the concept of a mobile network computer was put forward by Mr. Guo Zhaoming of China in March 1997. Today, nearly 20 years later, the concept of a mobile network computer still remains vigorous. We illustrate the importance of the concept of the mobile network computer from a technological perspective. Because of the usefulness of mobile network computers, and with the growth of the Internet of Things, a modern mobile communication network should be referred to as a mobile computer network and a modern mobile communication terminal should be referred to as a "mobile network computer", rather than any other name. Mobile network computers may include not only TV-box audio-visual equipment, wireless household appliances, and mobile communication equipment, but also devices such as intelligent foot rings, smart watches, smart glasses, smart shoes and smart coats. In a word, every device on the mobile Internet is a mobile network computer. We aim to popularize the concept of the mobile network computer for its accuracy and importance, as it better defines modern mobile terminals and reflects the nature of multiple mobile terminals based on the structure of their integrated computers and their capabilities for processing multimedia. Also, an introduction to the integration of mobile communication and computer networks is provided, including technology integration, business integration, network integration, and IMS technology, for the purpose of providing people with the opportunity to learn more about mobile computer networks and thereby better understand the concept of a mobile network computer. In the computer and Internet age, network computers and mobile network computers may be the main terminals of fixed and mobile networks. Therefore, based on the concept of mobile network computers, we discuss the future of the information society.

Chetan Bhole

A9.com (Amazon's search engine subsidiary), USA

Title: Automated Person Segmentation in Unconstrained Video
Speaker
Biography:

Chetan Bhole is currently a machine learning scientist at A9.com (Amazon's search engine subsidiary) working on ranking problems. He completed his PhD in Computer Science from the University of Rochester in 2013, specializing in applying machine learning to computer vision. He has published more than 10 papers in conferences and journals, is a reviewer for reputed journals and a contributor to open source software.

Abstract:

Segmentation of people is an important problem in computer vision, with uses in image understanding, graphics, security applications, sports analysis, education, etc. In this talk, I will summarize work done in this area and our contributions. We have focused on automatically segmenting a person from challenging video sequences. To have a general solution, we place no constraint on camera viewpoint, camera motion or the movements of a person in the scene. Our approach uses the most confident predictions from a pose or stick-figure detector in key frames as anchors that help guide the segmentation of other, more challenging frames in the video. Due to the unreliability of state-of-the-art pose detectors on general frames, only the highest-confidence pose detections (key frames) are used. Features like color, position and optical flow are extracted from key frames, and multiple conditional random fields (CRFs) are used to process blocks of video in batches: 2D CRFs for detailed key frame segmentations and 3D CRFs for propagating segmentations to the entire sequence of frames belonging to each batch. Location information derived from the pose detector is also used to refine the results. As an important note, no hand-labeled segmentation training data is required by our method. We discuss variants of the model and comparison to prior work. We also contribute our evaluation data to the community to facilitate further experiments.
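A toy sketch of the anchoring idea only (confidence values are invented and the 2D/3D CRF machinery is not shown): keep the most confident pose detections as key frames, and let every other frame be guided by its nearest key frame.

```python
# Select key frames by pose-detector confidence and assign each frame to its
# nearest key frame anchor (illustrative values only).
import numpy as np

pose_confidence = np.array([0.2, 0.9, 0.4, 0.3, 0.85, 0.1, 0.7, 0.95])
key_frames = np.where(pose_confidence > 0.8)[0]          # frames 1, 4, 7

def nearest_anchor(frame_idx, key_frames):
    return int(key_frames[np.argmin(np.abs(key_frames - frame_idx))])

assignments = {f: nearest_anchor(f, key_frames) for f in range(len(pose_confidence))}
print(key_frames, assignments)
```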

Xintao Ding

Anhui Normal University, China

Title: The global performance evaluation for local descriptors

Time : 15:00-15:25

Speaker
Biography:

Xintao Ding is an Associate Professor at Anhui Normal University. He completed his PhD at Anhui Normal University. He has spent his entire career working in the field of computer vision and machine learning. He holds three patents and has published more than 10 papers in the areas of image processing and computer vision. He has worked on and managed many funded research projects developing computer vision for use across a range of applications.

Abstract:

Interest point descriptors have become popular for obtaining image-to-image correspondence for computer vision tasks. Traditionally, local descriptors are mainly evaluated in a local scope, using measures such as repeatability, ROC curves, and recall versus 1-precision curves. These local evaluations do not take into account the application fields of the descriptors. Generally, local descriptors have to be refined before application so that they meet the needs of the global task. The toughness of the correspondence between two images depends on the number of true matches. Therefore, the number of correctly detected true matches (NoCDTM), which is the number of matches remaining after random sample consensus (RANSAC) refinement, is proposed as a global score to evaluate descriptor performance. A larger NoCDTM suggests a larger number of true matches and yields a tougher correspondence. When the evaluation is run over a set of images, all the NoCDTM values may be shown directly in a pseudo-color image, in which the pseudo-color of each pixel shows the NoCDTM of one image. In order to show descriptor performance over an image set in an overall way, a histogram of NoCDTM may be employed for evaluation. After dividing the range of the obtained NoCDTM values into several intervals, the occurrences of NoCDTM in every interval are counted to generate the histogram. A descriptor whose histogram has a fat tail suggests high performance. It may be more reasonable to go beyond the local attributes of descriptors and evaluate their performance in a global scope.
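The following sketch shows how a NoCDTM score could be computed for one image pair with OpenCV, using ORB as an example local descriptor: match descriptors, estimate a homography with RANSAC, and count the surviving inlier matches. The image file names are placeholders.

```python
# NoCDTM for one image pair: matches remaining after RANSAC refinement.
import cv2
import numpy as np

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)   # placeholder file names
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC refinement: inlier matches are the "correctly detected true matches"
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
nocdtm = int(mask.sum()) if mask is not None else 0
print("NoCDTM for this image pair:", nocdtm)
# Repeating this over an image set and histogramming the scores gives the
# global evaluation described above.
```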

Break: Networking and Refreshments: 15:25-15:45 @ Foyer
El Habib Nfaoui

University of Sidi Mohammed Ben Abdellah, Morocco

Speaker
Biography:

El Habib Nfaoui is currently an associate Professor of Computer Science at the University of Sidi Mohammed Ben Abdellah. He obtained his PhD in Computer Science from University of Sidi Mohamed Ben Abdellah in Morocco and University of Lyon (LIESP Laboratory) in France under a COTUTELLE (co-advising) agreement. His current research interests are Information Retrieval, Semantic Web, Social networks, Machine learning, Web services, Multi-Agent Systems, Decision-making and modeling. He is a Guest Editor at the International Journal of Intelligent Engineering Informatics (ACM, DBLP…). He co-founded the International Conference on Intelligent Systems and Computer Vision (ISCV2015) and has served as Program Committee of various conferences. He has published several papers in reputed journals and international conferences

Abstract:

Microblogging platforms allow users to post short messages and content of interest, such as tweets and user statuses in friendship networks. Searching and mining microblog streams offer interesting technical challenges in many microblog search scenarios, where the goal is to determine what people are saying about concepts such as products, brands, persons, etc. However, retrieving short text and determining the subject of an individual micropost present a significant research challenge owing to several factors: creative language usage, high contextualization, the informal nature of microblog posts and the limited length of this form of communication. Thus, microblog retrieval systems suffer from the problems of data sparseness and the semantic gap. To overcome these problems, recent studies on content-based microblog search have focused on adding semantics to microposts by linking short text to knowledge base resources. Moreover, previous studies use a bag-of-concepts representation by linking named entities to their corresponding knowledge base concepts. In the first part of this talk, we review the drawbacks of these approaches. In the second part, we present a graph-of-concepts method that considers the relationships among the concepts that match named entities in short text and their related concepts, and contextualizes each concept in the graph by leveraging the linked nature of DBpedia as a Linked Open Data knowledge base and graph-based centrality theory. Finally, we introduce some experimental results, using a real Twitter dataset, to show the effectiveness of our approach.
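A toy sketch of the graph-of-concepts idea (the entity linking step and the real DBpedia lookups are mocked with hand-written data, and PageRank stands in for the centrality measure): concepts matched in a short post and their related concepts form a graph, and centrality ranks which concepts best describe the post.

```python
# Toy graph-of-concepts ranking with networkx; DBpedia data is mocked.
import networkx as nx

# concepts matched in the tweet (mocked entity linking output)
matched = ["dbr:IPhone", "dbr:Apple_Inc."]
# related concepts and relations that would come from DBpedia (mocked)
related_edges = [
    ("dbr:IPhone", "dbr:Apple_Inc."),
    ("dbr:IPhone", "dbr:Smartphone"),
    ("dbr:Apple_Inc.", "dbr:Steve_Jobs"),
    ("dbr:Smartphone", "dbr:Mobile_operating_system"),
    ("dbr:Apple_Inc.", "dbr:IOS"),
    ("dbr:IPhone", "dbr:IOS"),
]

G = nx.Graph()
G.add_nodes_from(matched)
G.add_edges_from(related_edges)

# centrality contextualizes each concept within the post's concept graph
scores = nx.pagerank(G)
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{concept:35s} {score:.3f}")
```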


Yolanda Mafikeni

Oodua Technologies and Investment Pty Ltd

Speaker
Biography:

Yolanda Mafikeni has completed her Diploma at the age of 22 years from Hartland Training & Development Centre. She is the Supervisor at Oodua Technologies and Investment Pty Ltd, a premier Information Technology, Management and Media service organization.

Abstract:

Multimedia learning is innovative and has revolutionised the way we learn online. It is important to create a multimedia learning environment that stimulates active participation and effective learning. The significance of multimedia learning extends to include the cultivation of professional and personal experiences that reflect the reality of a traditional face-to-face classroom milieu. The difficulties with e-learning often relate to the absence of human-like presence and characteristics (Woo, 2009), leading to a need for research investigation into this area. A number of strategies have been used to foster effective learning and to create a social learning environment that reflects humanistic characteristics. The purpose of this article is twofold: (i) to examine the cognitive theory of multimedia learning (Mayer, 2001, 2002) and its relevance to multimedia presentations, and (ii) to discuss the strategies of visualisation (e.g., static and dynamic visual representations) and their relationship to multimedia learning, and the applicability and importance of multimedia learning to the enhancement of effective learning. Drawing from the evidence examined, we provide a conceptualised framework that accentuates the integration of the cognitive load theory and the theory of multimedia learning in e-learning. We discuss, for example, the use of animated pedagogical agents (APAs) to help establish a social learning environment that is conducive to learning and the promotion of critical thinking.

Dimitrios A. Karras

Sterea Hellas Institute of Technology, Greece

Speaker
Biography:

Dimitrios A. Karras received his Diploma and MSc degree in Electrical and Electronic Engineering from the National Technical University of Athens, Greece, in 1985, and his PhD degree in Electrical Engineering from the National Technical University of Athens, Greece, in 1995, with honours. From 1990 to 2004 he collaborated as a visiting professor and researcher with several universities and research institutes in Greece. Since 2004, after his election, he has been with the Sterea Hellas Institute of Technology, Automation Department, Greece, as an associate professor in Digital Systems and Signal Processing, as well as with the Hellenic Open University, Department of Informatics, as a visiting professor in Communication Systems (the latter from 2002 up to 2010). He has published more than 65 refereed journal papers in various areas of pattern recognition, image/signal processing and neural networks, as well as in bioinformatics, and more than 170 research papers in international refereed scientific conferences. His research interests span the fields of pattern recognition and neural networks, image and signal processing, image and signal systems, biomedical systems, communications, networking and security. He has served as a program committee member in many international conferences, as well as program chair and general chair in several international workshops and conferences in the fields of signal, image, communication and automation systems. He is also Editor-in-Chief of the International Journal in Signal and Imaging Systems Engineering (IJSISE), academic editor of the TWSJ, ISRN Communications and the Applied Mathematics Hindawi journals, as well as associate editor of various scientific journals. He has been cited in more than 1400 research papers; his H/G-indices are 16/27 (Google Scholar) and his Erdos number is 5. His RG score is 30.07. He has been an industry consultant since 2009 and is a senior consultant at E.S.E.E. (Hellenic Confederation of Commerce & Entrepreneurship).

Abstract:

A novel methodology is outlined herein for multimedia data mining problems, based on designing a hierarchical pattern mining neural system. The proposed system combines the data mining decisions of different neural network pattern mining systems. Instead of the usual approach of applying voting schemes to the decisions of their output layer neurons, the proposed methodology integrates higher order features extracted by their upper hidden layer units. More specifically, different instances (cases) of each such pattern mining system, derived from the same training process but with different training parameters, are investigated in terms of their higher order features, through similarity analysis, in order to find repeated and stable higher order features. Then, all such higher order features are integrated through a second stage neural network pattern mining system having as inputs suitable similarity features of them. The suggested hierarchical pattern mining neural system for multimedia data mining applications shows improved pattern mining performance in a series of experiments on computer vision and face recognition databases. The validity of this novel combination approach of pattern mining neural systems has been investigated when the first stage neural pattern mining systems involved correspond to different feature extraction methodologies (FEM) for either shape or face classification. The experimental study illustrates that such an approach, integrating higher order features through similarity analysis of a committee of the same pattern mining instances (cases) and a second stage neural pattern mining integration system, outperforms other combination methods, such as voting combination schemes, as well as single neural network pattern mining systems having as inputs all FEM-derived features. In addition, it outperforms hierarchical combination methods that do not perform integration of cases through similarity analysis.
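The simplified sketch below captures the two-stage idea with scikit-learn: several first-stage networks are trained with different parameters, their hidden-layer ("higher order") activations are extracted, and a second-stage classifier is trained on those features instead of on output votes. The similarity analysis used to keep only repeated, stable higher-order features is reduced here to plain concatenation, so this is an illustration of the architecture, not the authors' method.

```python
# Two-stage integration of hidden-layer features instead of output voting.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def hidden_features(clf, X):
    """Hidden-layer activations of a single-hidden-layer ReLU MLP."""
    return np.maximum(0, X @ clf.coefs_[0] + clf.intercepts_[0])

# first stage: same task, different training parameters (the "instances")
stage_one = [
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=s).fit(X_tr, y_tr)
    for s in (0, 1, 2)
]

# second stage: integrate the higher-order features rather than voting
Z_tr = np.hstack([hidden_features(m, X_tr) for m in stage_one])
Z_te = np.hstack([hidden_features(m, X_te) for m in stage_one])
stage_two = LogisticRegression(max_iter=1000).fit(Z_tr, y_tr)
print("second-stage accuracy:", stage_two.score(Z_te, y_te))
```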

Speaker
Biography:

Gregor has been working in the games industry since 1999 for studios like Crytek, Westka, Web.de AG, Acclaim, Ninja Theory, Visual Science, proper games ltd, MAXStudios.de and Chronos-Games, on AAA games like FarCry and others. He has also been involved in research and development projects with Mozilla related to WebGL and HTML5. He received his MSc in ISM while already working in the industry for 10 years, and his dissertation was about business models in the games industry.

Abstract:

Over the past four years we have seen the big players in the middleware and game engine licensing and development business change in a drastic way. Rather than charging substantial amounts of money (relative to a game developer's budget) up front, they have adopted per-developer-seat licensing and subscription models, and in some cases no up-front charges at all in return for a revenue share. This change has also extended beyond the core game engine and direct development tools sector. But what does this mean beyond the obvious potential to save money during the production time of a project? The question becomes significant when looking at a game project over its entire lifetime. For example, releasing a game on terms where the middleware or engine provider gets a revenue share in return for waiving licensing fees during development has a significant impact on the relationship between a studio and its publisher. Beyond those implications that are directly related to the budget and monetisation of a project, this also has a drastic effect on the entire market landscape. Previously, the game engine and middleware providers acted as gatekeepers that prevented small teams with low budgets from accessing the up-to-date technology used by the big studios. This has now changed and can be seen as a democratization of the technology side of the game development business. So the question becomes: who will benefit from this change, and in what way?

Jamie Denham

Sliced Bread Animation, UK

Title: Animation: The Reality of Emotion
Speaker
Biography:

Jamie studied animation on the acclaimed course at Farnham in Surrey and has worked in animation production for over 18 years, during which time he has contributed to a number of broadcast and commercial productions. He is now Managing Director of the London-based animation studio Sliced Bread Animation. The studio offers 2D and 3D animation, illustration and design for digital media projects, including virtual reality and augmented reality. Over the last 13 years they have successfully applied animation to all media platforms, from motion graphics, title sequences and TV commercials to online animation series and events.

Abstract:

Animation has long played an integral part in generating an emotional response in cinematic storytelling, but the mould has now become more fragmented: we are beginning to immerse ourselves in virtual worlds and to distort our own. What role, then, does animation play in manipulating and managing emotional levels? As humans we interact through connection, and ways of establishing that connection can be joy, sadness and anger; is there a danger that these are amplified through audio and visual manipulation in the virtual space? Is there an onus on the auteur to show restraint and responsibility with cognitive stimulus? In this talk I plan to explore the connective aspects of the emotional states, the fabric of storytelling and the virtual constructs we are beginning to enter.

Speaker
Biography:

Oscar Koller is a doctoral student researcher in the Human Language Technology and Pattern Recognition Group led by Prof. Ney at RWTH Aachen University, Germany. He joined the group in 2011 and is jointly supervised by Prof. Bowden and his Cognitive Vision group at the University of Surrey, UK, where he spent 12 months as a visiting researcher. His main research interests include sign language and gesture recognition, lip reading, speech recognition and machine translation.

Abstract:

Observing nature inspires us to find answers to difficult technical problems. Gesture recognition is a difficult problem, and sign language is its natural source of inspiration. Sign languages, the natural languages of the Deaf, are as grammatically complete and rich as their spoken language counterparts. Science discovered sign languages only a few decades ago, and research promises new insights into many different fields, from automatic language processing to action recognition and video processing. In this talk, we will present our recent advances in the field of automatic gesture and sign language recognition. As sign language conveys information through different articulators in parallel, we process it multi-modally. In addition to hand shape, this includes hand orientation, hand position (with respect to the body and to each other), hand movement, the shoulders and the head (orientation, eyebrows, eye gaze, mouth). Multi-modal streams occur partly synchronously, partly asynchronously. One of our major contributions is an approach to training statistical models that generalise across different individuals while only having access to weakly annotated video data. We will focus on a new approach to learning a frame-based classifier on weakly labelled sequence data by embedding a CNN within an iterative EM algorithm. This allows the CNN to be trained on a vast number of example images when only loose sequence-level information is available for the source videos. Although we demonstrate this in the context of sign language, the approach has wider application to any video recognition task where frame-level labelling is not available.
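The weakly supervised EM idea can be sketched schematically. This is not the authors' RWTH system: the synthetic data, the monotone three-class alignment, and the logistic-regression stand-in for the CNN are all assumptions made to keep the sketch small and runnable.

```python
# Illustrative EM-style weak supervision loop: frames are only labelled at the
# sequence level; we alternate between (E) realigning frame labels given the
# current classifier and (M) retraining the classifier on the realigned labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
seq_level_labels = [0, 1, 2]   # the only supervision: each "video" contains 0, then 1, then 2

def make_sequence(n_frames=30):
    """Synthetic video: a run of class 0, then 1, then 2, with noisy 5-D frame features."""
    bounds = np.sort(rng.choice(np.arange(5, n_frames - 5), size=2, replace=False))
    labels = np.concatenate([np.full(bounds[0], 0),
                             np.full(bounds[1] - bounds[0], 1),
                             np.full(n_frames - bounds[1], 2)])
    feats = rng.normal(loc=labels[:, None] * 2.0, scale=1.0, size=(n_frames, 5))
    return feats, labels

sequences = [make_sequence() for _ in range(50)]

def uniform_alignment(n_frames):
    """Initial alignment: split every sequence uniformly among the sequence-level labels."""
    return np.repeat(seq_level_labels, np.diff(np.linspace(0, n_frames, 4).astype(int)))

frame_labels = [uniform_alignment(len(f)) for f, _ in sequences]
clf = LogisticRegression(max_iter=1000)

for em_iter in range(5):
    # M-step: train the frame classifier on the current alignment.
    X = np.vstack([f for f, _ in sequences])
    clf.fit(X, np.concatenate(frame_labels))
    # E-step: realign each sequence under the monotone 0->1->2 constraint by choosing
    # the two boundaries that maximise the summed frame log-probabilities.
    new_labels = []
    for feats, _ in sequences:
        logp, n = clf.predict_log_proba(feats), len(feats)
        best, best_score = None, -np.inf
        for b1 in range(1, n - 1):
            for b2 in range(b1 + 1, n):
                score = logp[:b1, 0].sum() + logp[b1:b2, 1].sum() + logp[b2:, 2].sum()
                if score > best_score:
                    best_score, best = score, (b1, b2)
        b1, b2 = best
        new_labels.append(np.concatenate([np.zeros(b1, int), np.ones(b2 - b1, int),
                                          np.full(n - b2, 2)]))
    frame_labels = new_labels

true = np.concatenate([l for _, l in sequences])
pred = np.concatenate(frame_labels)
print("frame-level agreement with ground truth:", round(float((true == pred).mean()), 3))
```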

Gayane Shalunts

Sail Labs Technology, Austria

Title: Segmentation of Building Facade Tower
Speaker
Biography:

Gayane Shalunts completed her PhD in Computer Vision at the Institute of Computer Aided Automation. She has been working as a Software Engineer and Researcher at Sail Labs Technology in Austria since May 2013.

Abstract:

Architectural styles are phases of development that classify architecture in the sense of historic periods, regions and cultural influences. The article presents the first approach performing automatic segmentation of building facade towers within an image-based architectural style classification system. The observed buildings featuring towers belong to the Romanesque, Gothic and Baroque architectural styles. The method is a pipeline unifying bilateral symmetry detection, graph-based segmentation approaches, and image analysis and processing techniques. It exploits the specific visual features of the tower as an outstanding architectural element: vertical bilateral symmetry, rising out of the main building, and solidity. The approach is robust to strong perspective distortions. It comprises two branches, targeting facades with single and double towers respectively. The performance evaluation on a large number of images reports extremely high segmentation precision.
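One ingredient of such a pipeline, vertical bilateral symmetry detection, can be illustrated with a very small sketch. This is not the author's method; it simply scores candidate vertical axes by correlating the strip left of the axis with the mirrored strip to its right, which is one naive way to localise a symmetric tower.

```python
# Naive vertical-symmetry scoring: for each candidate axis column, compare the strip
# on its left with the mirrored strip on its right (illustrative only).
import numpy as np

def vertical_symmetry_scores(gray, half_width=40):
    h, w = gray.shape
    scores = np.full(w, -np.inf)
    for x in range(half_width, w - half_width):
        left = gray[:, x - half_width:x]
        right = gray[:, x:x + half_width][:, ::-1]      # mirror the right strip
        a, b = left - left.mean(), right - right.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-9
        scores[x] = float((a * b).sum() / denom)        # normalised cross-correlation
    return scores

if __name__ == "__main__":
    img = np.random.rand(200, 300)                       # stand-in for a grayscale facade image
    img[:, 140:160] += 2.0                               # a bright, symmetric "tower" around x = 150
    axis = int(np.argmax(vertical_symmetry_scores(img)))
    print("best symmetry axis at column", axis)
```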

Speaker
Biography:

Xiaosong Yang is currently a Principal Academician at the National Centre for Computer Animation, Bournemouth University, United Kingdom. He has produced more than 60 peer reviewed publications that include international journal articles and conference papers. He has secured over 10 research grants from the European Commission, Wessex AHSN, the British Academy, Leverhulme, the Department for Business, Innovation & Skills (UK), the Higher Education Innovation Fund, etc. He is a member of the International Program Committee for several international conferences, and a reviewer for many peer reviewed journals. He has given several invited talks and keynote presentations internationally.

Abstract:

The introduction of animation techniques such as motion capture, virtual reality, modelling and simulation into film production has revolutionized the entire film industry. We (the National Centre for Computer Animation, NCCA), as the No. 1 UK research and education base for computer animation, are endeavouring to bring these state-of-the-art animation techniques into the health industry and benefit more people by improving the efficiency and efficacy of healthcare services. Since 1989, the NCCA (winner of the UK Queen's Anniversary Prize in 2012) has been at the forefront of computer animation education and research in the UK, and our graduates have made a global impact on the film industry, with contributions to films such as Gravity, Inception and Avatar. We have prioritised multidisciplinary applications of our computer animation technology beyond film production, especially in the digital health area. In the past five years, we have successfully developed several medical projects in cooperation with doctors and local hospitals. For example, the "Augury project", a sophisticated colorectal surgery simulator developed in collaboration with consultant surgeons from Bournemouth & Poole NHS; "Neuravatar", an intelligent virtual avatar that guides GPs in making neurological diagnoses in their clinical practice, guided by Dr. Rupert Page and funded by AHSN; and "Digital Psychiatrist", a facial and emotional recognition system that performs a Mental State Examination based on videos and images of patients, developed in collaboration with Dr. Wai Chen.

Cyrille Gaudin

University of Toulouse Jean Jaurès, France

Title: The use of video technology in teacher training
Speaker
Biography:

Cyrille Gaudin completed his PhD in education sciences at the University of Toulouse Jean Jaurès. He is the Head of a Master's Program in the Toulouse High School of Teaching and Education. He has helped to organize international seminars for the Consortium of Institutions for Development and Research in Education in Europe (CIDREE). He is also a member of the European Association for Research in Learning and Instruction (EARLI). He recently published a literature review about the use of video technology in teacher training in the Educational Research Review.

Abstract:

A review of the research literature reveals that video technology has been increasingly employed over the past 10 years in the training of teachers, in all subject areas, at all grade levels, and all over the world (Gaudin & Chaliès, 2015). The literature presents three main reasons for the growing reliance on videos in teacher training. First, videos give teachers greater access to classroom events than classic observation without sacrificing "authenticity." This method thus constitutes a choice "artifact of practice" that creates a link between the traditional theoretical education at the university and classroom practice. Second, technical progress has greatly facilitated video use. Digitalization, vastly improved storage capacities, and sophisticated software have all contributed to the development of video in the framework of professional practice analysis. Last, video technology is increasingly used as a means to facilitate the implementation of institutional reforms. The principal aim of this communication is first to present the different possible uses of video technology for teacher training, and then to identify new avenues for innovation.

Speaker
Biography:

Gholamreza Anbarjafari received his B.Sc., M.Sc. and Ph.D. degrees from the Department of Electrical and Electronic Engineering at Eastern Mediterranean University, North Cyprus, Turkey, in 2007, 2008 and 2010 respectively. He works in the field of image processing and is currently focusing on research related to multimodal emotion recognition, image illumination enhancement, super resolution, image compression, watermarking, visualization and 3D modelling, and computer vision for robotics. He is involved in many national and international projects. He is currently head of the iCV Research Group at the University of Tartu and works as an Associate Professor in the Institute of Technology.

Abstract:

The Internet has affected our everyday life drastically. An extensive amount of information is continuously exchanged over the Internet, which raises numerous security concerns. Issues such as content identification, document and image security, audience measurement, ownership and copyright, among others, can be settled by the use of digital watermarking. In this talk, a robust and imperceptible non-blind color image watermarking algorithm is discussed, which benefits from the fact that the watermark can be hidden in different color channels, resulting in further robustness of the proposed technique to attacks. The method uses techniques such as entropy analysis, the discrete wavelet transform, the chirp z-transform, orthogonal-triangular (QR) decomposition and singular value decomposition (SVD) in order to embed the watermark in a color image. As the values on the main diagonal of the R matrix in the QR decomposition, as well as the singular values obtained via SVD, are very large, the changes caused by the aforementioned attacks will not alter those values significantly. The most common signal processing attacks will be discussed, and the robustness of the technique against them will be explained.
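As a rough illustration of why large singular values make a good embedding target, the following sketch embeds a watermark into the singular values of one DWT subband of an image. It is a generic DWT + SVD watermarking sketch, not the speaker's algorithm; the PyWavelets haar transform, the embedding strength alpha and the use of the LL subband are assumptions made for the example.

```python
# Generic non-blind DWT + SVD watermark embedding/extraction sketch (illustrative only).
import numpy as np
import pywt

def embed(cover, watermark, alpha=0.05):
    LL, (LH, HL, HH) = pywt.dwt2(cover, "haar")         # one-level DWT of the cover channel
    U, S, Vt = np.linalg.svd(LL, full_matrices=False)   # singular values of the LL subband
    Uw, Sw, Vtw = np.linalg.svd(watermark, full_matrices=False)
    S_marked = S + alpha * Sw                           # perturb the (large) singular values
    LL_marked = (U * S_marked) @ Vt
    marked = pywt.idwt2((LL_marked, (LH, HL, HH)), "haar")
    return marked, (S, Uw, Vtw)                         # side info needed for non-blind extraction

def extract(marked, side_info, alpha=0.05):
    S, Uw, Vtw = side_info
    LL, _ = pywt.dwt2(marked, "haar")
    _, S_marked, _ = np.linalg.svd(LL, full_matrices=False)
    Sw_rec = (S_marked - S) / alpha
    return (Uw * Sw_rec) @ Vtw                          # reconstructed watermark

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cover = rng.uniform(0, 255, size=(256, 256))        # stand-in for one color channel
    wm = (rng.uniform(size=(128, 128)) > 0.5) * 255.0
    marked, info = embed(cover, wm)
    wm_rec = extract(marked, info)
    print("watermark reconstruction error:", float(np.abs(wm_rec - wm).mean()))
```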

Speaker
Biography:

Menna Sadek received a B.Sc. (Honours) in 2009 and an M.Sc. in 2015, both in Computer Science, from the Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt. She currently works as a teaching assistant in the Basic Sciences department. She has published three papers in reputed international journals and local conferences. Her research interests include steganography, encryption and information security.

Abstract:

Steganography is the art and science of secret communication. Modern cover types can take different forms. Nowadays, video streams are transmitted more frequently on internet websites, giving video steganography greater practical significance. A video can be considered a sequence of images, and information hiding in video spans a variety of techniques. Although great efforts have been made in developing these techniques, most of them suffer from intolerance to video processing attacks and lack any intelligent processing of the cover video. Adaptive video steganography was recently proposed in the literature. It aims to achieve better quality of the stego-video by intelligently processing the cover according to some criteria. This helps to identify the best regions for data hiding, referred to as Regions Of Interest (ROI). Recent research showed that data embedding in human skin regions as ROI yields better imperceptibility and increases hiding robustness. In this work, a blind adaptive algorithm for robust video steganography is proposed. The proposed algorithm adaptively processes the cover video and hides data in its human skin regions. A skin map is created for each frame using a fast adaptive skin detection method. A blocking step is then applied to the produced skin map, converting it into a skin-block-map that discards the error-prone skin pixels and enhances extraction quality. Next, the skin-block-map is used to guide the embedding procedure. Finally, the secret bits are embedded in the detail coefficients of the red and blue components of each frame using a wavelet quantization-based algorithm to achieve robustness against MPEG-4 compression. The hiding capacity, imperceptibility, extraction accuracy and robustness against MPEG-4 compression of the proposed algorithm were tested. Results show the high imperceptibility of the proposed algorithm and its robustness against MPEG-4 compression.
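The skin-map and block-map steps can be pictured with a small sketch. This is not the proposed algorithm: the fixed YCbCr thresholds for skin detection, the 8x8 block size and the "almost entirely skin" rule are illustrative assumptions only.

```python
# Illustrative skin-map -> skin-block-map construction for one video frame.
import numpy as np

def skin_map(frame_rgb):
    """Very rough skin detection via fixed YCbCr chrominance thresholds (assumed values)."""
    r, g, b = [frame_rgb[..., i].astype(float) for i in range(3)]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)

def skin_block_map(skin, block=8, min_ratio=0.9):
    """Keep only blocks that are almost entirely skin, discarding error-prone border pixels."""
    h, w = skin.shape
    out = np.zeros_like(skin)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if skin[y:y + block, x:x + block].mean() >= min_ratio:
                out[y:y + block, x:x + block] = True
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(144, 176, 3), dtype=np.uint8)   # stand-in QCIF frame
    frame[40:100, 60:120] = (200, 140, 120)                            # a skin-coloured patch
    blocks = skin_block_map(skin_map(frame))
    print("embeddable skin blocks cover", int(blocks.sum()), "pixels")
```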

Biography:

Dr.-Ing. Mohamed A. Karali is a Lecturer at the Mechanical Engineering Department, Faculty of Engineering and Technology, Future University in Egypt, specializing in Mechanical Power Engineering. Dr. Karali received his Ph.D. degree from the Institute of Fluid Dynamics and Thermodynamics, Otto von Guericke University Magdeburg, Germany, in 2015, where he participated as a lecturer for undergraduate and postgraduate students and joined projects with industry. He received his Bachelor of Science and Master's degrees in Mechanical Engineering in 2001 and 2007, respectively, from the Faculty of Engineering El-Mataria, Helwan University in Cairo, Egypt. His recent research interests include image processing techniques and their applications in rotary drum studies.

Abstract:

Image analysis is a powerful tool for solving different engineering problems in particle technology. It is the process of extracting important information from a digital image. Different image analysis methods have been used in studies of rotary drums; they can mainly be classified as manual and automated methods. The manual method depends on using appropriate software (such as ImageJ) for the manual selection of the material. The automated method is a combination of ImageJ and the Matlab image processing toolbox. In the present research, the two methods were used and compared for studying flighted rotary drums under a variation of operating parameters. The varied parameters are the number of flights used (12 and 18) and the rotational speed (from 1 to 5 rpm). The comparison between the two methods revealed that the manual method is more reliable and precise; however, it is much more time-consuming than the automated one, especially in light of the numerous photographs to be analyzed.

Speaker
Biography:

Ab Al-Hadi Ab Rahman obtained his Ph.D. degree from the École Polytechnique Fédérale de Lausanne, Switzerland, in 2013, his M.Eng. degree from Universiti Teknologi Malaysia in 2008, and his B.S. degree from the University of Wisconsin-Madison, USA, in 2004. His current research is mainly focused on algorithms and architectures for the new HEVC/H.265 video coding standard. He has authored and co-authored more than 25 journal and conference papers in the field. He is also a member of IEEE and the Board of Engineers Malaysia. He is currently a lecturer at Universiti Teknologi Malaysia.

Abstract:

The new HEVC/H.265 video coding standard was launched in November 2013 and promises to improve compression efficiency by more than 50% over existing media files. With it, however, comes a multitude of challenges on both the encoding and decoding sides. One of the major challenges we are currently tackling is computational complexity, which leads to roughly a 400% increase in encoding time compared to the current AVC/H.264 standard. To put this in perspective, a raw video of 100 frames (i.e., about 4 seconds of playback) at CIF resolution takes about 30 minutes to encode with the standard's reference software; this latency can be extrapolated to estimate the time it would take to encode a UHD video of 180,000 frames, which could take at least a day. In this talk, I will present some of the new algorithms that we have developed and thoroughly tested, which can reduce the encoding time by up to 72% with negligible loss in coding efficiency and video quality. Our solution is based on three key areas of HEVC: 1) a new motion estimation algorithm to quickly obtain the motion vectors, 2) a new inter-prediction mode selection scheme to quickly determine the optimal PU mode, and 3) a new intra-prediction technique to quickly find the optimal CU mode. The application of these algorithms would enhance the performance of the HEVC compression standard and make it adaptable to mobile and handheld devices with resource constraints.
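A back-of-the-envelope extrapolation from the per-frame figure quoted above (ignoring that UHD frames are far larger than CIF frames, which would make the real figure much worse) illustrates why such encoder speed-ups matter:

\[
\frac{30\ \text{min}}{100\ \text{frames}} \times 180{,}000\ \text{frames} = 54{,}000\ \text{min} \approx 900\ \text{h} \approx 37.5\ \text{days},
\]

which is indeed "at least a day" of encoding time for a single video, even before accounting for the higher resolution.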

Speaker
Biography:

Kun Guo completed his PhD in Cognitive Neuroscience at the Shanghai Institute of Physiology, Chinese Academy of Sciences, and received postdoctoral training at the University of Oxford and the University of Newcastle. He is currently a Reader and lead of the Perception, Action and Cognition research group in the School of Psychology at the University of Lincoln. His research focuses on visual information processing and its relation to environmental statistics and human adaptive behavior. He has published more than 50 papers in leading academic journals and serves as an academic editor of PLoS ONE.

Abstract:

A central research question in natural vision is how to allocate fixations to extract informative cues for scene perception. With high-quality images, psychological and computational studies have made significant progress in understanding and predicting human gaze allocation in scene exploration and understanding. However, it is unclear whether these findings generalise to degraded naturalistic visual inputs. Here we combined psychophysical, eye-tracking and computational approaches to systematically examine the impact of image resolution and image noise (Gaussian low-pass filter, circular averaging filter, additive Gaussian white noise) on observers' gaze allocation and subsequent scene perception when inspecting both man-made and natural scenes. Compared with high-quality images, degraded scenes reduced the perceived image quality and affected scene categorization, but this deterioration effect was scene content-dependent. Distorted images also attracted fewer fixations but longer fixation durations, shorter saccade distances and a stronger central fixation bias. The impact of the image noise manipulation on gaze distribution was mainly determined by noise intensity rather than noise type, and was more pronounced for natural scenes than for man-made scenes. We further compared four high-performing visual attention models in predicting human gaze allocation in degraded scenes, and found that model performance lacked human-like sensitivity to noise type and intensity, and was considerably worse than human performance measured as inter-observer variance. Our results indicate a crucial role of external noise intensity in determining scene-viewing gaze behaviour and scene understanding, which should be considered in the development of realistic human-vision-inspired attention models.
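The three image degradations named above can be reproduced with standard tools. The sketch below only shows what those manipulations are; the kernel sizes, noise variance and use of OpenCV are assumptions for illustration, not the parameters used in the study.

```python
# The three degradations mentioned in the abstract, applied to a grayscale image
# (illustrative parameters only).
import numpy as np
import cv2

def gaussian_lowpass(img, sigma=3.0):
    """Gaussian low-pass filter (blur)."""
    return cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=sigma)

def circular_averaging(img, radius=5):
    """Disk-shaped ('pillbox') averaging filter."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = ((xx**2 + yy**2) <= radius**2).astype(np.float32)
    kernel /= kernel.sum()
    return cv2.filter2D(img, -1, kernel)

def additive_white_gaussian_noise(img, sigma=20.0, rng=np.random.default_rng(0)):
    """Additive Gaussian white noise, clipped back to the valid intensity range."""
    noisy = img.astype(np.float32) + rng.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(img.dtype)

if __name__ == "__main__":
    scene = np.full((240, 320), 128, dtype=np.uint8)      # stand-in for a scene photograph
    for name, f in [("gaussian blur", gaussian_lowpass), ("disk average", circular_averaging),
                    ("white noise", additive_white_gaussian_noise)]:
        out = f(scene)
        print(name, out.shape, out.dtype)
```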

Pascal Lorenz

University of Haute Alsace, France

Title: Architectures of Next Generation Wireless Networks
Speaker
Biography:

Pascal Lorenz received his M.Sc. (1990) and Ph.D. (1994) from the University of Nancy, France. Between 1990 and 1995 he was a research engineer at WorldFIP Europe and at Alcatel-Alsthom. He has been a professor at the University of Haute-Alsace, France, since 1995. His research interests include QoS, wireless networks and high-speed networks. He is the author/co-author of 3 books, 3 patents and 200 international publications in refereed journals and conferences. He was Technical Editor of the IEEE Communications Magazine Editorial Board (2000-2006), Chair of the Vertical Issues in Communication Systems Technical Committee Cluster (2008-2009), Chair of the Communications Systems Integration and Modeling Technical Committee (2003-2009) and Chair of the Communications Software Technical Committee (2008-2010).

Abstract:

Emerging Internet Quality of Service (QoS) mechanisms are expected to enable widespread use of real-time services such as VoIP and videoconferencing. The "best effort" Internet delivery model cannot be used for the new multimedia applications. New technologies and new standards are necessary to offer Quality of Service (QoS) for these multimedia applications. Therefore, new communication architectures integrate mechanisms allowing guaranteed QoS services as well as high-rate communications. The service level agreement with a mobile Internet user is hard to satisfy, since there may not be enough resources available in some parts of the network the mobile user is moving into. The emerging Internet QoS architectures, differentiated services and integrated services, do not consider user mobility. QoS mechanisms enforce a differentiated sharing of bandwidth among services and users. Thus, there must be mechanisms available to identify traffic flows with different QoS parameters, and to make it possible to charge users based on the requested quality. The integration of fixed and mobile wireless access into IP networks presents a cost-effective and efficient way to provide seamless end-to-end connectivity and ubiquitous access in a market where demand for mobile Internet services has grown rapidly and is predicted to generate billions of dollars in revenue.

Syed Afaq Ali Shah

The University of Western Australia, Australia

Title: Deep Learning for Image set based Face and Object Classification
Speaker
Biography:

Syed Afaq Ali Shah completed his PhD in 3D computer vision (feature extraction, 3D object recognition, reconstruction) and machine learning in the School of Computer Science and Software Engineering (CSSE), University of Western Australia, Perth. He held highly competitive Australian scholarships, including the Scholarship for International Research Fees (SIRF) and the Research Training Scheme (RTS). He has published several research papers in high impact factor journals and reputable conferences. Afaq has developed machine learning systems and various feature extraction algorithms for 3D object recognition. He is a reviewer for IEEE Transactions on Cybernetics, the Journal of Real-Time Image Processing and the IET Image Processing journal.

Abstract:

I shall present a novel technique for image set based face/object recognition, where each gallery and query example contains a face/object image set captured from different viewpoints, backgrounds, facial expressions, resolutions and illumination levels. While several image set classification approaches have been proposed in recent years, most of them represent each image set as a single linear subspace, a mixture of linear subspaces, or a Lie group of a Riemannian manifold. These techniques make prior assumptions about the specific category of geometric surface on which images of the set are believed to lie, which can result in a loss of discriminative information for classification. The proposed technique alleviates these limitations with an Iterative Deep Learning Model (IDLM) that automatically and hierarchically learns discriminative representations from raw face and object images. In the proposed approach, low-level translationally invariant features are learnt by a Pooled Convolutional Layer (PCL), followed by Artificial Neural Networks (ANNs) applied iteratively in a hierarchical fashion to learn a discriminative non-linear feature representation of the input image sets. The proposed technique was extensively evaluated for the task of image set based face and object recognition on the YouTube Celebrities, Honda/UCSD, CMU Mobo and ETH-80 (object) datasets. Experimental results and comparisons with state-of-the-art methods show that our technique achieves the best performance on all these datasets.

Biography:

Qin-Zhen Guo received the B. S. degree in Automation from Hunan University in 2011. He is currently pursuing the Ph.D. degree at the High-tech Innovation Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include image retrieval, machine learning, and pattern recognition.

Abstract:

Fast approximate nearest neighbor search techniques play an important role in large-scale database search. Hashing-based methods, which convert the original data into binary codes, have two advantages: high retrieval efficiency and low memory cost. However, due to the thick boundary in Hamming space, hashing-based methods cannot achieve ideal retrieval precision. Vector quantization, especially product quantization (PQ), based methods, which use a large codebook to quantize the data and thus reduce the cardinality of the original data space, are another class of approximate nearest neighbor search methods. PQ-based methods also have two advantages: low memory cost and high retrieval precision. However, compared to hashing-based methods, their retrieval efficiency is lower. Considering the strengths and weaknesses of hashing and PQ methods, we have proposed a hierarchical method which combines the two. Since hashing methods have high retrieval efficiency, we first use them to filter out the obviously distant data; we then use PQ-based methods, with their better retrieval precision, to search the data retained by the hashing stage. Experiments have shown that on large-scale databases, the hierarchical method achieves better results than hashing-based methods and higher retrieval efficiency than PQ-based methods.
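The coarse-to-fine idea can be sketched as a Hamming-distance prefilter followed by product-quantization re-ranking with asymmetric distances. The random-hyperplane hash, the 4-subvector PQ codebook and the shortlist size below are illustrative assumptions, not the authors' configuration.

```python
# Hierarchical ANN search sketch: binary-hash prefilter, then PQ re-ranking.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
d, n = 32, 5000
db = rng.normal(size=(n, d)).astype(np.float32)

# Stage 1: random-hyperplane hashing (LSH-style binary codes).
n_bits = 16
hyperplanes = rng.normal(size=(d, n_bits)).astype(np.float32)
db_codes = (db @ hyperplanes > 0)                       # n x n_bits boolean codes

# Stage 2: product quantization (4 subvectors, 32 centroids each).
m, k = 4, 32
sub_d = d // m
codebooks, pq_codes = [], []
for i in range(m):
    centroids, labels = kmeans2(db[:, i * sub_d:(i + 1) * sub_d], k, minit="points")
    codebooks.append(centroids)
    pq_codes.append(labels)
pq_codes = np.stack(pq_codes, axis=1)                   # n x m codes

def search(query, shortlist=500, topk=5):
    # Hashing prefilter: keep the candidates with the smallest Hamming distance.
    q_code = (query @ hyperplanes > 0)
    hamming = (db_codes != q_code).sum(axis=1)
    cand = np.argsort(hamming)[:shortlist]
    # PQ re-ranking with asymmetric distances (query-to-centroid lookup tables).
    tables = [np.linalg.norm(codebooks[i] - query[i * sub_d:(i + 1) * sub_d], axis=1) ** 2
              for i in range(m)]
    dist = sum(tables[i][pq_codes[cand, i]] for i in range(m))
    return cand[np.argsort(dist)[:topk]]

query = rng.normal(size=d).astype(np.float32)
print("approximate top-5 neighbours:", search(query))
```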

Yao-Jen Chang

Industrial Technology Research Institute (ITRI), Taiwan

Title: Uni-/Bi-/Multi-Color Intra Modes for HEVC Screen Content Coding
Speaker
Biography:

Yao-Jen Chang received the M.S. and Ph.D. degrees in communication engineering from National Central University, Taiwan, in 2006 and 2010, respectively. Since 2011, he has been a researcher in Industrial Technology Research Institute (ITRI), Taiwan. He has published over 30 refereed papers and filed 29 patents in multiple engineering fields from communications to video coding technologies. He has actively contributed over 50 proposals to joint meetings of ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) for developing the H.265/HEVC and its extensions. His current research interests include H.265/HEVC, Screen Content Coding, HEVC Encoder Optimization, Future Video Codec, Machine Learning, and Adaptive Filtering.

Abstract:

High Efficiency Video Coding (HEVC) Screen Content Coding (SCC) has been standardized for screen-captured content. Because many areas are composed of text and lines featuring non-smooth textures, traditional intra prediction is not suitable for those areas. Several new coding tools, such as the palette mode, intra block copy, the string matching mode, and the uni-color, bi-color and multi-color intra modes, were developed to address these issues during the joint meetings of MPEG and VCEG from 2014 to 2015. ITRI contributed many techniques for the uni-/bi-/multi-color intra modes to improve compression performance, and coordinated the core experiment activities on these modes to study their performance under HEVC SCC standard draft text 1. The concept of the color intra modes is to select a few samples out of the neighboring coding units to predict the pixels inside the current coding unit. In this talk, I will elaborate on the uni-/bi-/multi-color intra modes.
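As a toy illustration of the "predict from a few neighboring samples" idea (and not the standardized SCC tools themselves), a uni-color style predictor can simply fill the current block with the most frequent reconstructed value found along its top and left neighbors:

```python
# Toy "uni-color" style intra predictor: fill the current block with the dominant
# value among the reconstructed top/left neighboring samples (illustrative only).
import numpy as np

def uni_color_predict(recon, y, x, size):
    """recon: already-reconstructed picture; (y, x): top-left corner of the current block."""
    top = recon[y - 1, x:x + size] if y > 0 else np.array([], dtype=recon.dtype)
    left = recon[y:y + size, x - 1] if x > 0 else np.array([], dtype=recon.dtype)
    neighbors = np.concatenate([top, left])
    if neighbors.size == 0:
        return np.full((size, size), 128, dtype=recon.dtype)   # default for the first block
    values, counts = np.unique(neighbors, return_counts=True)
    return np.full((size, size), values[np.argmax(counts)], dtype=recon.dtype)

if __name__ == "__main__":
    screen = np.full((32, 32), 255, dtype=np.uint8)   # white background, like a text document
    screen[:, :8] = 0                                  # a black stripe of "text"
    pred = uni_color_predict(screen, 8, 16, 8)
    print("predicted block is uniform with value", int(pred[0, 0]))
```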

Speaker
Biography:

Gang Wu is a Professor in the Department of Mathematics, School of Science, China University of Mining and Technology. He received the B.S. degree from the School of Mathematics, Shandong University, in 1998, the M.S. degree from the Department of Applied Mathematics, Dalian University of Technology, in 2001, and the Ph.D. degree from the Institute of Mathematics, Fudan University, in 2004. His current research mainly focuses on large sparse matrix computations, pattern recognition and data mining.

Abstract:

Recently, matrix-based methods have gained wide attention in the pattern recognition and machine learning communities. The generalized low rank approximations of matrices (GLRAM) and the bilinear Lanczos components (BLC) algorithm are two popular algorithms that treat data as native two-dimensional matrix patterns. However, these two algorithms often require heavy computation time and memory space in practice, especially for large-scale problems. In this talk, we propose inexact and incremental bilinear Lanczos components algorithms for high-dimensionality reduction and image reconstruction. We first introduce the thick-restarting strategy into the BLC algorithm and present a thick-restarted Lanczos components algorithm (TRBLC). In this algorithm, we use the Ritz vectors as approximations to dominant eigenvectors instead of the Lanczos vectors. In our implementation, the iteration matrices are neither formed nor stored explicitly, thanks to the characteristics of the Lanczos procedure. Then, we explore the relationship between the reconstruction error and the accuracy of the Ritz vectors, so that the computational complexity of the eigenpairs can be reduced significantly. As a result, we propose an inexact thick-restarted Lanczos components algorithm (Inex-TRBLC). Moreover, we investigate the problem of incremental generalized low rank approximations of matrices and propose an incremental and inexact TRBLC algorithm (Incr-TRBLC). Numerical experiments illustrate the superiority of the new algorithms over the GLRAM algorithm and its variants, as well as the BLC algorithm, on some real-world image reconstruction and face recognition problems.
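For readers unfamiliar with the GLRAM baseline that the talk improves upon, the following is a minimal numpy sketch of its alternating update; the matrix sizes, target ranks and iteration count are arbitrary choices for the example, and the BLC/TRBLC algorithms discussed in the talk are not shown here.

```python
# Minimal GLRAM sketch: find common left/right projections L (r x l) and R (c x q)
# maximizing sum_i ||L^T A_i R||_F^2 by alternating eigen-decompositions.
import numpy as np

def glram(matrices, l, q, n_iter=10):
    r, c = matrices[0].shape
    R = np.eye(c, q)                                    # simple initialisation
    for _ in range(n_iter):
        ML = sum(A @ R @ R.T @ A.T for A in matrices)   # update L given the current R
        _, L = np.linalg.eigh(ML)
        L = L[:, -l:]                                   # top-l eigenvectors
        MR = sum(A.T @ L @ L.T @ A for A in matrices)   # update R given the current L
        _, R = np.linalg.eigh(MR)
        R = R[:, -q:]                                   # top-q eigenvectors
    return L, R

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = [rng.normal(size=(20, 25)) for _ in range(30)]   # stand-in image matrices
    L, R = glram(imgs, l=5, q=5)
    recon_err = np.mean([np.linalg.norm(A - L @ (L.T @ A @ R) @ R.T) for A in imgs])
    print("mean reconstruction error:", round(float(recon_err), 3))
```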

Speaker
Biography:

Takashi Nakamura completed his PhD at the age of 28 at Kobe University. He is a professor of media studies in the Faculty of Humanities at Niigata University. He has published more than 20 papers (including ones in Japanese) and two books in Japanese (one as sole author and the other as sole editor). He is an editorial board member of the Annals of Behavioural Science.

Abstract:

This presentation focuses on the action of looking at a mobile phone display as a type of nonverbal behavior/communication and compares it cross-culturally. The diversity of nonverbal behavior/communication was considered to be caused by the difference between Western and non-Western cultures. A questionnaire was conducted in three countries (the USA, Hong Kong and Japan), and a total of 309 subjects participated. The participants were asked to record their opinions of the action according to the situation with 'co-present' familiar persons. The analysis showed that the difference between the USA and Japan was more pronounced the more intimate the relationship with the 'co-present' person was. The results of the Hong Kong sample were intermediate between those of the other two countries. The diversity was discussed in terms of the independent/interdependent self, from the perspectives of cultural comparison and of mobile phone usage. The analysis revealed that the action, as a form of nonverbal behavior/communication, has functioned in human relationships and has become deeply embedded in culture in the mobile phone era.

Wang Xufeng

Air Force Engineering University, School of Aeronautics and Astronautics Engineering, China

Title: Real-time Drogue Measurement for Autonomous Aerial Refueling Based on Computer Vision
Speaker
Biography:

Wang Xufeng received the B.S. and M.S. degrees from Air Force Engineering University in 2011 and 2013, respectively, where he is currently pursuing the Ph.D. degree. He has been a visiting scholar with the Department of Computer Science and Technology, Tsinghua University since 2014. His research interests include autonomous aerial refueling, computer vision and deep learning.

Abstract:

Autonomous aerial refueling (AAR) has been playing an increasingly important role in extending the capability of aircraft. During the docking phase of probe-and-drogue AAR, one of the key problems is drogue measurement, which includes drogue detection and recognition, drogue spatial locating and drogue attitude estimation. To solve this problem, a novel and effective method based on computer vision is presented. For drogue detection and recognition, considering safety and robustness to drogue diversity under changing environmental conditions, a highly reflective red ring-shaped feature, instead of a set of infrared light emitting diodes (LEDs), is placed on the parachute part of the drogue to achieve optimal performance, using computer vision with prior domain knowledge incorporated. For drogue spatial locating and attitude estimation, in order to ensure the accuracy and real-time performance of the entire system, a monocular vision method is designed based on a camera calibration model that accounts for lens distortion, in view of its simple structure and high operation speed. Experiments demonstrate the effectiveness of the proposed method, and a practical implementation that considers the effect of airflow is provided. The results of the drogue measurement are analyzed, together with a comparison between the proposed method and competing methods. The results show that the proposed method can perform drogue measurement efficiently and satisfies the requirements of AAR.
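The spatial locating and attitude estimation step described above is, in essence, a monocular pose estimation problem under a distortion-aware camera model. A generic OpenCV sketch is shown below; the ring radius, the number of marker points, the camera matrix and the distortion coefficients are all made-up values, and this is not the authors' calibrated system.

```python
# Generic monocular pose estimation sketch for a circular (ring-shaped) marker,
# using a calibrated camera model that includes lens distortion (illustrative values).
import numpy as np
import cv2

# Assumed 3D model: 8 points evenly spaced on a ring of radius 0.3 m (drogue-like target).
ring_radius = 0.3
angles = np.linspace(0, 2 * np.pi, 8, endpoint=False)
object_points = np.stack([ring_radius * np.cos(angles),
                          ring_radius * np.sin(angles),
                          np.zeros_like(angles)], axis=1).astype(np.float32)

# Assumed intrinsics (focal length, principal point) and radial/tangential distortion.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.2, 0.05, 0.0, 0.0, 0.0])

# Synthesize image points from a known pose, then recover the pose with solvePnP.
true_rvec = np.array([0.1, -0.2, 0.05])
true_tvec = np.array([0.2, -0.1, 5.0])                 # ring 5 m in front of the camera
image_points, _ = cv2.projectPoints(object_points, true_rvec, true_tvec,
                                    camera_matrix, dist_coeffs)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
print("pose recovered:", ok)
print("estimated position (m):", tvec.ravel().round(3))
print("estimated rotation vector:", rvec.ravel().round(3))
```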

Balamuralidhar P

Tata Consultancy Services, India

Title: Low Altitude Aerial Vision
Biography:

Balamuralidhar P completed his PhD at Aalborg University, Denmark. He is a Principal Scientist and Head of TCS Innovation Labs Bangalore. He leads several research topics related to cyber-physical systems, including aerial sensing, cognitive computer vision, sensor informatics, and security & privacy. He has published more than 90 papers in reputed journals and has over 20 patents to his credit.

Abstract:

Low altitude aerial imaging and analytics is attracting much business interest these days, driven by the availability of affordable unmanned aerial vehicles and miniaturized sensors for cost-effective spatial data collection. Applications include the inspection of infrastructure and spatially distributed systems such as power lines, wind farms, pipelines, railways, buildings, farms and forests. These applications predominantly use vision-based sensing, though multispectral, hyperspectral and laser-based mapping are also used in certain cases. Advances in image processing and computer vision research, coupled with high performance embedded computing platforms, are generating interesting possibilities in this area. Traditional techniques along with deep learning and cognitive architectures are being explored to provide automatic analysis and assessment of the huge volumes of data acquired. In this talk some of our experiences and learnings on computer vision applications in related areas will be presented.