Scientific Program

Conference Series Ltd invites all the participants across the globe to attend the 2nd Global Summit and Expo on Multimedia & Applications at Crowne Plaza, Heathrow, London, UK.

Day 17:

  • Imaging and Image Processing
  • Multimedia in Computer Games
  • Mobile Multimedia
  • Multimedia Tools and Applications
  • Multimedia Signal Processing
Location: Madison

Chair

Vijayan K Asari

University of Dayton, USA


Co-Chair

Margrit Gelautz

Vienna University of Technology, Austria

Session Introduction

Yoichiro Kawaguchi

The University of Tokyo, Japan

Title: Art and Multimedia

Time : 10:35-11:05

Biography:

Born on Tanegashima Island, Kawaguchi has been working in computer graphics since 1975 and has been recognized as a pioneer and world-class authority in CG art for his unique style. Using his ‘GROWTH Model’, a self-organizing procedural modeling algorithm, he has been creating various artificial complex life forms. Recent work includes the development of CG expression of natural beauty based on basic physical models, 8K Ultra High Definition CG art, the creation of a new traditional art form incorporating traditional craftsmanship and advanced IT-based expression, the creation of an artistic and primitive robot for planet exploration, and the development of the ‘Gemotion’ (Gene/Growth + Emotion) 3D bumpy display, which reacts to emotion like a living being. He won the ACM SIGGRAPH Distinguished Artist Award for Lifetime Achievement in 2010 for creative and innovative artistry, giving life to a stunning aesthetic derived from his dedicated research in computer technology, biological forms, and contemporary artistic practice. In 2013, he received the Minister of Education Award in the Art Encouragement Prizes and the Medal with Purple Ribbon.

Abstract:

I would like to reflect on computer graphics and multimedia arts from the vantage point of my own 40 years of association with them. First of all, a person must seek out the creative things he or she is aiming for from a point removed from principle-like habits and the tendencies of software programs created in advance. Namely, I think it is important that a person does not let his or her own creative urge get covered by a program with rather technical things pushed into the foreground. Independence and originality concerning form and color have no meaning unless completely controlled by the person who stands at the core of the art. One should never create a structure embodying something developed through easy, wild numbers and chance. Moreover, there is a method of creating living things and/or nature under a quasi-reconstruction of the laws of the natural world. This method, which approaches the observed values as closely as possible, will continue for a long time hence. This is to pursue why materialistic shapes and colors exist in this transcendent sphere which includes the natural world and the cosmos. The enlargement of those realms of existence will certainly render possible a moulding cut off from the framework of earth or mankind. This will come to question its main constituent in the very process of selecting various subjects. Consequently, the way of existence of the object itself is already a product of conception, which is not cut off at all from the artistic qualities from which it should be separate. The very laws themselves concerning form have arisen and grown, creating their own present system which is self-propagating. In other words, the process has hypothesized something which retains energy within. It is something which has advanced one step beyond a simulation of a cross-section of the natural world. It is an approach to nature in another sense. That is to say, because circumstantial stimulation can be called a hypothesis within the framework of fixed time, it becomes equipped with the same time as the observer despite the lack of a non-present system of simultaneity. Thus, the immediacy and the extemporaneousness of formation come to have direct perceptions and connections. My “Growth” series lies indeed at the starting point. Demands will be made from now on in the form of self-propagation and natural occurrence, and they appear to have active bearings on the sixth sense in human beings and aesthetics.

David Xu

Regent University, USA

Title: Dynamic simulation and effects in animation and computer games

Time : 11:55-12:20

Biography:

Professor David Xu is a tenured associate professor at Regent University, specializing in computer 3D animation and special effects. He received an MFA in Computer Graphics in 3D Animation from Pratt Institute in New York. He has served as a senior 3D animator at Sega, Japan; a senior CG special effects artist at Pacific Digital Image Inc., Hollywood; and as a professor of animation at several colleges and universities, where he developed 3D animation programs and curricula. He has been a committee member of the computer graphics organization SIGGRAPH Electronic Theater, where he was recognized with an award for his work. He is also an organizing committee member of CG2016. He published the book Mastering Maya: The Special Effects Handbook at the invitation of Shanghai People's Fine Arts Publishing House.

Abstract:

Dynamics is the simulation of motion through the application of the principles of physics. Instead of assigning keyframes to objects to animate them, you assign physical characteristics that define how an object behaves in a simulated world. Dynamic bodies are converted from the objects you create and are defined through dynamic attributes, which affect how the objects behave in a dynamic simulation. With dynamic simulation, you can create many impressive effects, such as explosions, floods, storms, tornadoes and oceans, for animations and computer games. In this presentation, Professor Xu will give an overview of the tools and techniques used to simulate and render hair, fur, feathers, cloth, liquids, fluids, particles, and rigid and soft bodies. He will demonstrate how to use the Dynamic Relationships Editor to connect and disconnect dynamic relationships between dynamic objects such as particles, nParticles, fluids and emitters, and non-Nucleus collision objects; how to use the Collision Events Editor to create collision events for nParticles; and how to use the Sprite Wizard to simplify the process of displaying a texture image or image sequence on nParticles. The applications of dynamic simulation and effects in animations and computer games will also be explored.
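
To make the keyframe-free idea concrete, here is a toy Python sketch (not Maya code; the class, attribute names and constants are all illustrative) in which a ball's motion over 48 frames emerges entirely from physical attributes such as gravity and bounciness:

```python
GRAVITY, DT = -9.8, 1.0 / 24.0            # gravity (m/s^2) and one frame at 24 fps

class DynamicBody:
    """An object animated by dynamic attributes instead of keyframes."""
    def __init__(self, y, vy=0.0, bounciness=0.6):
        self.y, self.vy, self.bounciness = y, vy, bounciness

    def step(self):
        self.vy += GRAVITY * DT           # integrate acceleration into velocity
        self.y += self.vy * DT            # integrate velocity into position
        if self.y < 0.0:                  # collision with the ground plane
            self.y = 0.0
            self.vy = -self.vy * self.bounciness

ball = DynamicBody(y=10.0)
for frame in range(48):                   # two seconds of animation at 24 fps
    ball.step()
    print(frame, round(ball.y, 3))        # positions emerge from the simulation
```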

Margrit Gelautz

Vienna University of Technology, Austria

Title: Algorithms for 3D film content generation and post-processing

Time : 12:20-12:45

Biography:

Margrit Gelautz is an associate professor at Vienna University of Technology, Austria, where she directs a research group on Image and Video Analysis & Synthesis with a focus on 3D film/TV applications. She is co-founder of emotion3D, a spin-off company working in the field of 3D imaging/displaying and mobile vision. Margrit Gelautz has directed a number of research projects in national and international collaboration and recently co-edited a book on Advances in Embedded Computer Vision. She was director of TU Vienna’s Doctoral College Computational Perception (2010-2013) and Vice-chair of the IEEE Austria Section (2012-2014).

Abstract:

In this talk we present algorithms for 3D reconstruction and novel view synthesis that form part of a 3D film processing chain which aims at generating high-quality 3D content for different types of 3D displays. The first part of the talk focuses on a trifocal camera set-up consisting of a high-end main camera and a compact assistant stereo rig which are aligned in an L-shaped configuration. We discuss different strategies for stereo matching between the main and assistant cameras, along with challenges posed by our specific camera arrangement. We further address the need for depth map post-processing and inpainting techniques in order to generate high-quality novel views. The results of our trifocal system with virtual view synthesis are evaluated by a user study. In the second part of the talk, we concentrate on 3D content generation from monoscopic film material. We describe a method for converting original 2D image sequences to 3D content based on user scribbles comfortably placed on key frames. The initially sparse depth values are propagated throughout the video volume to obtain temporally and perceptually coherent 2D-to-3D conversions. The results of this approach are illustrated on a variety of test material.
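
As a rough illustration of the scribble-propagation idea, here is a simplified, single-frame stand-in for the video-volume propagation described above; the color-affinity weighting is our own assumption, not the authors' algorithm:

```python
import numpy as np

def propagate_depth(gray, scribble_depth, scribble_mask, iters=200, sigma=0.1):
    """Spread sparse scribbled depth values over a grayscale key frame
    (values in [0, 1]) by repeated color-weighted averaging of the four
    neighbors; scribbled pixels stay fixed."""
    depth = np.where(scribble_mask, scribble_depth, 0.0).astype(float)
    for _ in range(iters):
        acc = np.zeros_like(depth)
        wacc = np.full_like(depth, 1e-9)
        for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nd = np.roll(depth, (dy, dx), axis=(0, 1))
            ng = np.roll(gray, (dy, dx), axis=(0, 1))
            w = np.exp(-((gray - ng) ** 2) / (2 * sigma ** 2))  # similar color -> high weight
            acc += w * nd
            wacc += w
        depth = acc / wacc
        depth[scribble_mask] = scribble_depth[scribble_mask]     # re-pin the scribbles
    return depth
```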

Break: Lunch: 12:45-13:25 @ Orwell’s Brasserie

Bogdan Smolka

Silesian University of Technology, Poland

Title: On the fast impulsive noise removal in color digital images

Time : 13:25-13:50

Biography:

Bogdan Smolka received the Diploma in Physics degree from the Silesian University, Katowice, Poland, in 1986 and the Ph.D. degree in computer science from the Department of Automatic Control, Silesian University of Technology, Gliwice, Poland, in 1998. Since 1994, he has been with the Silesian University of Technology. In 2007, Dr. Smolka was promoted to Professor at the Silesian University of Technology. He has published over 250 papers on digital signal and image processing in refereed journals and conference proceedings. His current research interests include low-level color image processing, human-computer interaction, and visual aspects of image quality.

Abstract:

Noise reduction in color images is still an important research field of image processing and computer vision. Recent advances in imaging technology have been accompanied by the miniaturization of optical systems and the shrinkage of pixel sensor area to accommodate increasing spatial resolution. As a result, many imaging devices produce shots of poor quality in low-light situations. Therefore, fast and effective image denoising techniques are still of vital importance for the performance of the imaging pipeline and the successive steps of image analysis. The novel filter presented in this talk is based on the concept of exploring the local pixel neighborhood by digital paths which start from the boundary of a filtering window and reach its central element. The minimal cost of the paths ending at a given pixel serves as a measure of its impulsiveness. To decrease the computational complexity, the proposed approach utilizes only the shortest paths joining the border of a window with its center. In the 5x5 filtering window, only 8 paths consisting of 2 nodes have to be examined, which leads to the computation of only 16 Euclidean distances between pixels in a given color space. To determine the filter output, a soft-switching scheme is applied, which is a compromise between the identity filter and the weighted average of uncorrupted pixels in the processing window. A comparison with state-of-the-art algorithms revealed the excellent properties of the proposed denoising framework. The filtering operations can be easily parallelized and can thus be utilized for real-time image and video enhancement.
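
A minimal Python sketch of the path-based detector and soft switching described above (the switching function and its parameter h are our assumptions; the paper's exact weighting is not reproduced):

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def impulsiveness(img, i, j):
    """Minimal cost over the 8 two-node digital paths joining the border of
    the 5x5 window with its central pixel (i, j): 16 distances in total.
    (i, j) must lie at least 2 pixels from the image border."""
    c = img[i, j].astype(float)
    costs = []
    for di, dj in OFFSETS:
        n = img[i + di, j + dj].astype(float)           # inner-ring node
        b = img[i + 2 * di, j + 2 * dj].astype(float)   # border node, same direction
        costs.append(np.linalg.norm(b - n) + np.linalg.norm(n - c))
    return min(costs)   # large value -> poorly connected pixel, likely an impulse

def denoise_pixel(img, i, j, h=60.0):
    """Soft switching between the identity filter and the neighborhood mean."""
    alpha = np.exp(-impulsiveness(img, i, j) / h)       # hypothetical switching curve
    ring = np.array([img[i + di, j + dj] for di, dj in OFFSETS], dtype=float)
    return alpha * img[i, j] + (1 - alpha) * ring.mean(axis=0)
```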

Ruzena Bajcsy

University of California, Berkeley, USA

Biography:

Ruzena Bajcsy received Master's and Ph.D. degrees in electrical engineering from Slovak Technical University, Bratislava, Slovak Republic, in 1957 and 1967, respectively, and a Ph.D. in computer science from Stanford University, Stanford, CA, in 1972. She is a Professor of Electrical Engineering and Computer Sciences at the University of California, Berkeley, and Director Emeritus of the Center for Information Technology Research in the Interest of Society (CITRIS). Prior to joining Berkeley, she headed the Computer and Information Science and Engineering Directorate at the National Science Foundation. Dr. Bajcsy is a member of the National Academy of Engineering and of the National Academy of Sciences' Institute of Medicine, as well as a Fellow of the Association for Computing Machinery (ACM) and the American Association for Artificial Intelligence. In 2001, she received the ACM/Association for the Advancement of Artificial Intelligence Allen Newell Award, and was named one of the 50 most important women in science in the November 2002 issue of Discover magazine. She is the recipient of the Benjamin Franklin Medal in Computer and Cognitive Sciences (2009) and the IEEE Robotics and Automation Award (2013) for her contributions to the field of robotics and automation.

Abstract:

By now it has become a cliché to say that the population of the industrial world is aging and hence that the loss of physical agility is a serious health problem. Moreover, this issue affects even the younger population due to our sedentary lifestyle. It is also an undeniable (perhaps even too obvious) fact that every human's anatomy and physiology is different. In recognition of this fact, we are focusing our efforts on the development of personalized models of the kinematics and dynamics of an individual during physical activities. For this purpose we rely on non-invasive observations in order to extract the physical parameters necessary to develop veritable kinematic and dynamic models of a person's physical capabilities. These kinematic and dynamic models are facilitated by the availability of various relatively inexpensive, affordable and noninvasive devices that can deliver the necessary parameters: the position, velocity, acceleration and masses of not only the body but individual limbs, and the forces generated during various physical activities. These devices include not only standard cameras, motion capture, force plates and force sensors, and inertial measurement units, but also hand-held ultrasound cameras and infrared sensors measuring oxygen in the blood. More advanced sensors, such as glucose measurement devices, are developing rapidly. In this presentation we shall show how these multimedia observations of people enable us to develop individual kinematic and dynamic predictive models of physical performance. These models predict not only the physical performance of the individual but also delineate the boundaries of the stable reachable space, both for the kinematic workspace and for the dynamic workspace.

Robert S Laramee

Swansea University, UK

Title: Flow as you've never seen it

Time : 13:50-14:15

Biography:

Robert S Laramee received a bachelor's degree in physics, cum laude, from the University of Massachusetts, Amherst (ZooMass). He received a master's degree in computer science from the University of New Hampshire, Durham. He was awarded a PhD from the Vienna University of Technology (Gruess Gott TUWien), Austria, at the Institute of Computer Graphics and Algorithms in 2005. From 2001 to 2006 he was a researcher at the VRVis Research Center (www.vrvis.at) and a software engineer at AVL (www.avl.com) in the department of Advanced Simulation Technologies. Currently he is an Associate Professor in Data Visualization at Swansea University (Prifysgol Cymru Abertawe), Wales, in the Department of Computer Science (Adran Gwyddor Cyfrifiadur). His research interests are in the areas of big data visualization, visual analytics, and human-computer interaction. He has published more than 100 peer-reviewed papers in scientific conferences and journals and served as Conference Chair of EuroVis 2014, the premier conference on data visualization in Europe.

Abstract:

With the advancement of simulation and big data storage technologies and the ever-decreasing costs of hardware, our ability to derive and store data is unprecedented. However, a large gap remains between our ability to generate and store large collections of complex, time-dependent flow simulation data and our ability to derive useful knowledge from them. Flow visualization leverages our most powerful sense, vision, in order to derive knowledge and gain insight into large, multivariate flow simulation data sets that describe complicated and often time-dependent behavior. This talk presents a selection of state-of-the-art flow visualization techniques and applications in the area of computational fluid dynamics (CFD), showcasing some of visualization's strengths, weaknesses, and goals. We describe interdisciplinary projects based on flow, where visualization is used to address fundamental questions, the answers to which we hope to discover in various large, complex, and time-dependent phenomena. It's flow like you've never seen it before.

Leonardo Sacht

Federal University of Santa Catarina, Brazil

Title: Real-time correction of panoramic images using hyperbolic Möbius transformations

Time : 14:15-14:40

Biography:

Leonardo Sacht is an adjunct professor at the Federal University of Santa Catarina (UFSC) in Florianopolis, Brazil. He received a bachelor's degree in Mathematics and Scientific Computing from UFSC in 2008 and MSc and DSc degrees in Mathematics from the Brazilian Institute for Pure and Applied Mathematics (IMPA) in 2010 and 2014, respectively. He also spent a year between 2012 and 2013 as a visiting student at ETH Zurich, Switzerland. Dr. Sacht has recently published papers in important journals such as ACM Transactions on Graphics, the Journal of Real-Time Image Processing and IEEE Transactions on Image Processing.

Abstract:

Wide-angle images have gained huge popularity in recent years due to developments in computational photography and technological advances in imaging. They present the information of a scene in a way that is more natural for the human eye but, on the other hand, they introduce artifacts such as bent lines. These artifacts become more and more unnatural as the field of view increases. In this work, we present a technique aimed at improving the perceptual quality of panorama visualization. The main ingredients of our approach are, on one hand, considering the viewing sphere as a Riemann sphere, which makes the application of Möbius (complex) transformations to the input image natural, and, on the other hand, a projection scheme which changes as a function of the field of view used. We also introduce an implementation of our method, compare it against images produced with other methods, and show that the transformations can be done in real time, which makes our technique very appealing for new settings as well as for existing interactive panorama applications.
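
The core mapping can be sketched in a few lines of Python: lift a viewing direction to the Riemann sphere, treat it as a complex number via stereographic projection, apply a Möbius transformation, and project back. The particular hyperbolic transform and the FOV-dependent projection scheme of the paper are not reproduced here; the transform parameters below are illustrative only.

```python
import numpy as np

def to_plane(lat, lon):
    """Stereographic projection: viewing-sphere direction -> complex number."""
    return np.tan((np.pi / 2 - lat) / 2.0) * np.exp(1j * lon)

def to_sphere(z):
    """Inverse stereographic projection: complex number -> (lat, lon)."""
    return np.pi / 2 - 2.0 * np.arctan(np.abs(z)), np.angle(z)

def mobius(z, a, b, c, d):
    """w = (a z + b) / (c z + d), a Mobius (complex) transformation."""
    return (a * z + b) / (c * z + d)

# Example: a pure scaling (b = c = 0), which pulls content toward the
# projection center and can relax the stretching near the panorama borders.
lat, lon = to_sphere(mobius(to_plane(0.3, 1.0), a=2.0, b=0.0, c=0.0, d=1.0))
```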

Changsoo Je

Sogang University, Korea

Title: Homographic p-norms: Metrics of homographic image transformation

Time : 14:40-15:05

Biography:

Changsoo Je received the BS degree in Physics and the MS and PhD degrees in Media Technology from Sogang University, Seoul, Korea, in 2000, 2002, and 2008, respectively. He is currently a Research Professor of Electronic Engineering at Sogang University, Seoul, Korea. He was a Senior Analyst of Standard and Patent at Korea Institute of Patent Information (2010-2011), a Research Professor (2009-2010), and a Postdoctoral Research Fellow (2008-2009) of Computer Science and Engineering at Ewha Womans University. His research interests include computer vision, computer graphics, and image processing. He received an Outstanding Paper Award at the Korea Computer Graphics Society Conference in 2008 and a Samsung Humantech Thesis Prize Award from Samsung Electronics Co., Ltd. in 2004.

Abstract:

Images often need to be aligned in a single coordinate system, and homography is one of the most efficient geometric models for aligning images. In this talk, we present homographic p-norms, scalar metrics of homographic image transformation; to the best of our knowledge, these are the most rigorously defined scalar metrics quantifying homographic image transformations. We first define a metric between two homography matrices and show that it satisfies the metric properties. Then we propose metrics of a single homography matrix for a general planar region, and ones for a usual rectangular image. For use of the proposed metrics, we provide useful homographic 2-norm expressions derived from the definition of the metrics, and compare the approximation errors of the metrics with respect to the exact metric. As a result, the discrete version of the metric obtained by pixel-wise computation is very close to the exact metric. The proposed metrics can be applied to the evaluation of transformation magnitude, image closeness estimation, evaluation of camera pose difference, selection of image pairs in stereo vision, panoramic image mosaics, and deblurring. Experimental results show the efficacy of the proposed metrics.
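
One plausible reading of the pixel-wise discrete version mentioned above, sketched under our own simplifying assumptions (this is not the paper's exact definition): take the p-norm of the displacement between the two warps, averaged over the image rectangle.

```python
import numpy as np

def apply_h(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of (x, y) points."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]

def homographic_pnorm(H1, H2, width, height, p=2):
    """Discrete p-norm distance between two homographies over a
    width x height image: pixel-wise displacement, averaged."""
    xs, ys = np.meshgrid(np.arange(width), np.arange(height))
    pts = np.c_[xs.ravel(), ys.ravel()].astype(float)
    d = np.linalg.norm(apply_h(H1, pts) - apply_h(H2, pts), axis=1)
    return (np.mean(d ** p)) ** (1.0 / p)

# A single homography H can then be scored against the identity:
# magnitude = homographic_pnorm(H, np.eye(3), 640, 480)
```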

Toshie Takahashi

Waseda University, Japan

Title: The complexity model of communication with computer images

Time : 15:05-15:30

Biography:

Toshie Takahashi is Professor in the School of Culture, Media and Society, Waseda University, Tokyo, Japan. She was appointed faculty fellow at the Berkman Center for Internet and Society at Harvard University, 2010-2011 and, before that, visiting research fellow at the Department of Education in the University of Oxford. Her current research is an ethnography centred on cross-cultural research into youth and digital media among the US, UK and Japan. She graduated with a PhD in Media and Communications from the London School of Economics and Political Science and an MA in Sociology from the University of Tokyo.

Abstract:

Social and natural scientists have used the complexity paradigm to address issues of the complexity and dynamism of phenomena which hitherto, in traditional approaches, had been made invisible or had been regarded as aberrant, thereby adding to our explanatory and manipulative power (Eve, 1997). As Appadurai (1996) calls for a human version of complexity theory in order to further the theory of global cultural interactions, Takahashi (2009) has applied a non-linear, non-reductionist model to human communication, using the ethnographic method, which is a non-linear methodology. Takahashi has provided an integrated framework for the demonstration of three dimensions of complex systems: individuals, social groups and cultures, and the paths of dynamic interaction between these in terms of interactivity, self-organisation, adaptivity and the notion of the edge of chaos, thus contributing to the idea of a human version of complexity theory. There are numerous complex systems that exist between the micro and macro levels, and each level is not discrete but rather intra- and inter-connected, and moreover dynamically interacts with the others. In this presentation, we will demonstrate the complexity model of communication with some computer images to understand the diversity, dynamism and complexity of human communication in the global digital world.

Xiaobo Zhang

Xianyang Normal University, China

Title: Gradient domain image denoising

Time : 15:30-15:55

Biography:

Xiaobo Zhang received his Ph.D. degree from the Department of Mathematics at Xidian University, Xi'an, China, in March 2014. He is currently an associate professor at Xianyang Normal University, Xianyang, China. His research interests include wavelets, partial differential equations and statistical theory for image processing. He has published more than 14 papers in reputed journals and conferences as first author and corresponding author. He is a reviewer for Computers and Electrical Engineering.

Abstract:

Image denoising is one of the oldest topics in image processing and has been widely studied. The main aim of an image denoising algorithm is to reduce the noise level while preserving image features (such as edges, textures, etc.). A good denoising algorithm should achieve high-quality results with a low computational load. Recently, image denoising methods based on the gradient domain have shown superior performance in this respect. This presentation aims to establish a universal framework for image denoising algorithms, from local schemes to nonlocal ones, via the gradient domain. First, a gradient-domain local adaptive layered Wiener filter is presented, based on a study of the statistical properties of gradient coefficients. Second, a multi-direction gradient-domain scheme is proposed, based on a study of the classical nonlocal means method. Finally, the multi-direction gradient-domain scheme is extended to the wavelet domain because of the statistical characteristics of wavelet coefficients, yielding a multiple-step local Wiener filter method in the wavelet domain. Experimental results and comparisons with related state-of-the-art methods show that the proposed techniques achieve good performance with high efficiency.
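
The first ingredient can be sketched as a local, adaptive Wiener shrinkage applied to gradient coefficients. This is a textbook-style simplification under a zero-mean signal assumption, not the talk's exact filter; reconstructing the image from the filtered gradients (e.g., by a Poisson solve) is omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def wiener_shrink(coeff, noise_var, win=5):
    """Local adaptive Wiener shrinkage of (zero-mean) gradient coefficients:
    g_hat = max(var_local - noise_var, 0) / var_local * g."""
    local_var = uniform_filter(coeff ** 2, win)           # local second moment
    signal_var = np.maximum(local_var - noise_var, 0.0)   # noise-free variance estimate
    return signal_var / np.maximum(local_var, 1e-12) * coeff

# Usage sketch on an image's gradient field:
# gx, gy = np.gradient(noisy)
# gx_hat, gy_hat = wiener_shrink(gx, nv), wiener_shrink(gy, nv)
```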

Break: Networking and Refreshments: 15:55-16:15 @ Foyer
Mohamed A. Naiel

Concordia University, Canada

Biography:

Mohamed A. Naiel received the B.Sc. degree in electronics and communications engineering from Tanta University, Tanta, Egypt in June 2006 and the M.Sc. degree in communications and information technology from Nile University, Giza, Egypt in June 2010. He is currently pursuing the Ph.D. degree in electrical and computer engineering with Concordia University, Montreal, QC, Canada. He has been a research assistant with the center for signal processing and communications, Concordia University, since 2011. His research interests include image and video processing, computer vision, human action recognition, object detection and recognition, and multi-object tracking.

Abstract:

Feature extraction from each scale of an image pyramid to construct a feature pyramid is a computational bottleneck for many object detectors. In this paper, we present a novel technique for the approximation of feature pyramids in the transform domain, namely, the 2D discrete Fourier transform (2DDFT) or the 2D discrete cosine transform (2DDCT) domain. The proposed method is based on a feature resampling technique in the 2DDFT or 2DDCT domain, exploiting the effect that resampling an image has on the feature responses. Experimental results show that the proposed scheme provides higher feature approximation accuracy than its spatial-domain counterpart when gradient magnitude or gradient histogram features are used. Further, when the proposed method is employed for object detection, it provides detection accuracy superior to that of the spatial-domain counterpart and compares favorably with state-of-the-art techniques, while performing in real time.
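
The transform-domain resampling that the method builds on can be illustrated with the 2DDCT: resizing an image amounts to truncating or zero-padding its DCT coefficients. This is a generic sketch, not the paper's feature-pyramid approximation, and the energy rescaling factor is our assumption.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_resample(img, new_h, new_w):
    """Resample a 2D image by truncating or zero-padding its 2D-DCT
    coefficients, then inverting the transform at the new size."""
    h, w = img.shape
    C = dctn(img, norm='ortho')
    out = np.zeros((new_h, new_w))
    hh, ww = min(h, new_h), min(w, new_w)
    out[:hh, :ww] = C[:hh, :ww]              # keep the low-frequency block
    out *= np.sqrt((new_h * new_w) / (h * w))  # assumed energy rescaling
    return idctn(out, norm='ortho')
```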

Ahmad A. Mazhar

Saudi Electronic University College of Computing and Informatics, KSA

Title: Efficient video compression techniques

Time : 16:40-17:05

Biography:

Dr. Ahmad A. Mazhar has been a member of the College of Computing and Informatics at Saudi Electronic University since 2015. He has more than ten years of teaching experience. He received his Ph.D. in 2013 from De Montfort University, UK; his Master's degree in computer science from Al-Balqa' Applied University, Salt, Jordan; and his Bachelor's degree in computer science from Al-Zaytoonah University, Amman, Jordan. Dr. Ahmad has several publications in video compression and analysis.

Abstract:

Video coding is widely used in a variety of applications such as TV streaming, online gaming, virtual reality tours and video conferencing. These applications require good compression techniques so that the communication bitrate is reduced without compromising quality. H.264 has dominated many video applications since it was released in 2003, showing high coding efficiency and reliability, especially for standard-definition streaming. H.264 and VP8 are designed mainly for resolutions lower than High Definition (HD); however, the resolutions of today and the near future demand codecs designed to support HD and Ultra High Definition (UHD). This led to one of the most popular codecs, High-Efficiency Video Coding (HEVC), whose first edition was released in 2013. Video compression remains an open, competitive area, and many developers continue to work on new codecs. Google also has an important share in the field with its codec VP9: it started developing the codec in 2011 as an improved successor to VP8 and released it in 2013. Many video coding techniques are available nowadays; however, coding efficiency and complexity are very important factors in selecting a codec. New approaches have been proposed to decrease the time complexity of encoding; one does so by jointly reducing the number of inter-modes and reference frames. After analyzing the likelihood of selecting inter-modes and reference frames, we arrange them in likelihood levels and check lower levels only if a rate-distortion (RD) cost condition is satisfied.
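
The early-termination idea can be sketched as follows (function names and the threshold test are illustrative, not taken from a specific codec implementation): candidates are grouped into likelihood levels, and a less likely level is examined only while the best RD cost so far is not yet good enough.

```python
def choose_mode(levels, rd_cost, threshold):
    """levels: list of lists of (inter_mode, ref_frame) pairs, most likely first.
    rd_cost: callable returning the rate-distortion cost of a candidate."""
    best, best_cost = None, float('inf')
    for level in levels:
        for candidate in level:
            cost = rd_cost(candidate)
            if cost < best_cost:
                best, best_cost = candidate, cost
        if best_cost <= threshold:   # RD condition met: skip less likely levels
            break
    return best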

Biography:

Director of the Imagineering Institute, Iskandar Malaysia, and Chair Professor of Pervasive Computing at City University London. Founder and Director of the Mixed Reality Lab, Singapore. Previously Full Professor at Keio University, Graduate School of Media Design, Associate Professor at the National University of Singapore, and Engineer at Mitsubishi Electric, Japan. His research covers mixed reality, human-computer interfaces, wearable computers, and pervasive and ubiquitous computing. Featured in worldwide broadcasts such as CNN, BBC, National Geographic and the Discovery Channel. Recipient of awards and prizes including: A-STAR Young Scientist of the Year, Hitachi Fellowship, SCS Young Professional of the Year, Fellow in Education, World Technology Network, Microsoft Research Award in Gaming and Graphics, C4C Children Competition Prize, Integrated Art Competition Prize, Creativity in Action Award, first prize at the Mindtrek Awards, first prize at the Milan International InventiON, the Keio University Japan Gijyuju-sho award, the SIP Distinguished Fellow Award, and Young Global Leader by the World Economic Forum. Fellow of the Royal Society for the encouragement of Arts, Manufactures and Commerce (RSA), an organisation committed to finding innovative practical solutions to today's social challenges. Editor in Chief of the academic journals ACM Computers in Entertainment, Transactions on Edutainment (Springer), and Lovotics: Academic Studies of Love and Friendship with Robots. 1994, Bachelor's (First Class Hons) in Engineering, University of Adelaide; 1999, PhD in Engineering.


Abstract:

This talk outlines new facilities that are arising in the hyper-connected internet era within human media spaces. These allow new embodied interaction between humans, species, and computation, both socially and physically, with the aim of novel interactive communication and entertainment. Humans can develop new types of communication environments using all the senses, including touch, taste, and smell, which can increase support for multi-person multi-modal interaction and remote presence. In this talk, we present an alternative ubiquitous computing environment and space based on an integrated design of real and virtual worlds. We discuss several different research prototype systems for interactive communication, culture, and play.

Ramesh Jain

University of California, USA

Title: Building Visual Web
Biography:

Ramesh Jain is an entrepreneur, researcher, and educator. He is a Donald Bren Professor in Information & Computer Sciences at the University of California, Irvine, where he is doing research in the Event Web and experiential computing. Earlier he served on the faculty of Georgia Tech, the University of California at San Diego, the University of Michigan, Ann Arbor, Wayne State University, and the Indian Institute of Technology, Kharagpur. He is a Fellow of the ACM, IEEE, AAAI, IAPR, and SPIE. His current research interests are in processing massive numbers of geo-spatial heterogeneous data streams for building Social Life Networks. He is the recipient of several awards, including the ACM SIGMM Technical Achievement Award 2010. Ramesh co-founded several companies, managed them in their initial stages, and then turned them over to professional management. These companies include PRAJA, Virage, and ImageWare. Currently he is involved in building Krumbs, a company building a personalized visual web. He has also been an advisor to several other companies, including some of the largest companies in the media and search space.

Abstract:

The 21st century began with a major disruption: the rapid rise of smartphones meant that capturing, storing, and sharing photos and their context became easier than using text. Photos and videos communicate directly, without the need for language or literacy. Thus, everyone in the world with a phone is a potential prosumer who can generate as well as consume these new documents. A photo represents information and experience related to a moment. A photo may be linked to many other photos along different dimensions. One may also create explicit links among photos or objects in photos. All photos on the Web form a Visual Web that links photos with other photos and other information elements, including all documents on the WWW. This Visual Web offers opportunities to address many difficult, as yet unsolved problems. We will discuss the role of photos as powerful information sources, technical challenges, and some interesting opportunities in this area. We will present a prototype system that could be used for building such a Visual Web and outline the technical challenges.

Susan Johnston

CEO, Select Services Films Inc. & Founder/Director, New Media Film Festival, USA

Title: Commerce and Engagement for Entertainment, The New Frontier
Biography:

Susan Johnston is President of the award-winning Select Services Films, Inc. (DBE) and Founder/Director of the New Media Film Festival. Johnston has a background in the traditional film and TV industry and in recent years has been a pioneering new media producer, including Stan Lee's Comikaze Expo panel for Independent Creators, co-producing the feature film Dreams Awake, and currently producing the Marvel Comic feature Prey: Origin of the Species. While the industry was changing from standard definition to HD, Johnston produced the 1st series for mobile, Mini-Bikers; produced the 1st live-stream talk show in HD with a Panasonic Varicam; and tested the Panasonic DVX100, which led to some changes on the DVX100A. She was on the SAG New Media committee in 2003. She serves on the editorial board of the Encyclopedia of Computer Graphics and Games and is a member of The Caucus New Media Steering Committee, the Academy of Television Arts & Sciences New Media Interactive group, BMI and SAG/AFTRA, an advisory board member for the SET Awards (Entertainment Industry Council), a Machinima Expo judge, an advisory board member for the Woodbury University Digital Media Lab, and Professor Emeritus in New Media.

Abstract:

Engagement, monetization and technology are the stepping stones of building awareness and longevity for your brand. Interactive experiences over multiple platforms are becoming the norm. Which platforms are best for your content? Which strategies speak to your demographic? Technology has aligned with viewers' objectives, and curated content continues to gain popularity. There are too many games and TV programs online to watch them all, but the really good ones can attract major attention. We will discuss how to take your content to the next level. We will share companies' successes and failures in this area to learn what they did right, so you can match their successes and avoid their failures. In doing so, you will learn how to make your content not only stand out from all the rest but build a loyal following that helps you spread the word and engage the masses.

Will Kim

Riverside City College, USA

Title: From Paper to Screen
Biography:

Will Kim is a Los Angeles based artist and filmmaker. Will is Associate Professor of Art at Riverside City College, where he also directs the Riverside City College Animation Showcase. Will Kim received an MFA ('09) in Animation from UCLA and a BFA ('07) in Character Animation from CalArts. In recent years, he also taught at CalArts, the Community Arts Partnership and Sitka Fine Arts Camp as a media art instructor. Kim's work has been shown in over 100 international film/animation festivals and auditoriums, including the Directors Guild of America (DGA) Theater, the Academy of TV Arts and Sciences Theater, The Getty Center, The USC Arts and Humanities Initiative, and the Museum of Photographic Arts San Diego. As an animation supervisor and a lead animator, Will has participated in various feature and short live-action films that were selected for the New York Times' Critic's Pick, the United Nations' Climate Change Conference, the Los Angeles Film Festival, the Tribeca Film Festival, and Cannes. Also, Will has painted and animated for companies like 'Adidas' and 'Ito En.'

Abstract:

A definition of multimedia is ‘the use of a variety of artistic or communicative media.’ Multimedia, just like drawing and sculpture, is one of many tools for visually communicating ideas, stories, and feelings. When discussing multiple media sources, it is important to keep in mind that audiences and consumers receive information and ideas very differently from one another. The presentation will focus on using painting and drawing mediums to create graphics and animation that are digitally compiled, edited, and converted to be viewed in galleries, on social networking sites, on TV, in theaters, etc.

Ben Kenwright

Edinburgh Napier University, UK

Biography:

Ben Kenwright (MEng,PhD,CEng,SFHEA,FBCS,SMIEEE) is the Programme Leader for the undergraduate and postgraduate games technology degrees at Edinburgh Napier University. He is also Head of Infrastructure in the School of Computing. He studied at both the University of Liverpool and Newcastle University before moving on to work in the game industry and eventually joining the Department of Computing at Edinburgh Napier University in February 2014. His research interests include real-time systems, evolutionary computation, and interactive animation.

Abstract:

The emergence of evolving search techniques (e.g., genetic algorithms) has paved the way for innovative character animation solutions; for example, generating human movements without key-frame data. Instead, character animations can be created using biologically inspired algorithms in conjunction with physics-based systems. Meanwhile, the development of highly parallel processors, such as the graphics processing unit (GPU), has opened the door to performance-accelerated techniques, allowing us to solve complex physical simulations in reasonable time frames. These acceleration techniques, in conjunction with sophisticated planning and control methodologies, enable us to synthesize ever more realistic characters that go beyond pre-recorded ragdolls towards more self-driven, problem-solving avatars. While traditional data-driven applications of physics within interactive environments have largely been confined to producing puppets and rocks, we explore a constrained autonomous procedural approach. The core difficulty is that simulating an animated character is easy, while controlling one is difficult. Since the control problem is not confined to human-type models (e.g., creatures with multiple legs, such as dogs and spiders), ideally there would be a way of producing motions for arbitrary physically simulated agents. This presentation focuses on evolutionary algorithms (i.e., genetic algorithms) as compared to the traditional data-driven approach. We explain how generic evolutionary techniques are able to produce physically plausible and life-like animations for a wide range of articulated creatures in dynamic environments. We also explain the computational bottlenecks of evolutionary algorithms and possible solutions, such as exploiting massively parallel computational environments (i.e., the GPU).
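
A minimal sketch of the kind of evolutionary loop the talk discusses, assuming the physics simulation is wrapped in a fitness function that scores a controller's parameter vector (all names and constants here are illustrative):

```python
import random

def evolve(fitness, genome_len=16, pop_size=64, generations=200,
           mutation_rate=0.1, elite=4):
    """Evolve a controller parameter vector that maximizes `fitness`,
    a callable scoring one genome via the physics-based simulation."""
    pop = [[random.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # best controllers first
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size - elite:
            a, b = random.sample(parents, 2)
            cut = random.randrange(genome_len)       # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(genome_len):              # Gaussian mutation
                if random.random() < mutation_rate:
                    child[i] += random.gauss(0, 0.2)
            children.append(child)
        pop = pop[:elite] + children                 # elitism keeps the best
    return max(pop, key=fitness)
```

Each fitness evaluation runs one physics simulation, which is exactly the bottleneck the talk proposes to parallelize on the GPU.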

Huiyu Zhou

Queen’s University Belfast, UK

Title: Event reasoning for transport video surveillance
Biography:

Dr. Huiyu Zhou obtained a Bachelor of Engineering degree in Radio Technology from the Huazhong University of Science and Technology, China, and a Master of Science degree in Biomedical Engineering from the University of Dundee, United Kingdom. He was awarded a Doctor of Philosophy degree in Computer Vision from Heriot-Watt University, Edinburgh, United Kingdom. Dr. Zhou is presently an assistant professor at Queen's University Belfast, UK. He has published over 120 peer-reviewed papers in the field. He serves or has served on the technical program committees of 300 conferences in signal and image processing.

Abstract:

The aim of transport video surveillance is to provide robust security camera solutions for mass transit systems, ports, subways, city buses and train stations. As is well known, numerous security threats exist within the transportation sector, including crime, harassment, liability suits and vandalism. Possible solutions have been directed at insulating transportation systems from security threats and making them safer for passengers. In this talk, I will introduce our solution for dealing with these challenges in transport, in particular on city buses. For ease of understanding, I will structure the talk into the following four sections: (1) the techniques that we developed to automatically extract and select features from face images for robust age recognition; (2) an effective combination of facial and full-body measurements for gender classification; (3) human tracking and trajectory clustering approaches that handle challenging circumstances such as occlusions and pose variations; and (4) event reasoning in smart transport video surveillance.

Xiaohao Cai

University of Cambridge, UK

Biography:

Xiaohao Cai received his MS degree in Mathematics from Zhejiang University, China, in 2008, and PhD degree in Mathematics from The Chinese University of Hong Kong, Hong Kong in 2012. He is currently a Research Associate in the Department of Applied Mathematics and Theoretical Physics, University of Cambridge. His research interests include image processing, numerical analysis and their applications in processing of digital image, video, biomedical imaging, remote sensing data, etc.

Abstract:

Image segmentation and image restoration are two important topics in image processing with a number of important applications. In this paper, we propose a new multiphase segmentation model by combining image restoration and image segmentation models. Utilizing aspects of image restoration, the proposed segmentation model can effectively and robustly tackle images with a high level of noise or blurriness, or with missing pixels or vector values. In particular, one of the most important segmentation models, the piecewise constant Mumford–Shah model, can easily be extended in this way to segment gray and vector-valued images corrupted, for example, by noise, blur or information loss, after coupling a new data fidelity term borrowed from the field of image restoration. The model can be solved efficiently using the alternating minimization algorithm, and we prove the convergence of this algorithm with 3 variables under mild conditions. Experiments on many synthetic and real-world images demonstrate that our method gives better segmentation results, in terms of quality and quantity, in comparison with other state-of-the-art segmentation models, especially for blurry images and those with information loss.
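
A schematic of the alternating idea, under our own heavily simplified assumptions (a quadratic coupling term and gradient-descent restoration; this is not the authors' algorithm): alternate between restoring a smooth image that fits the degraded data and re-estimating piecewise-constant phases.

```python
import numpy as np

def segment_restore(f, A, At, n_phases=2, iters=20, lam=1.0):
    """f: degraded image; A, At: blur operator and its adjoint (callables).
    Alternates a restoration step with a piecewise-constant phase update."""
    u = f.copy().astype(float)
    labels = np.zeros(f.shape, dtype=int)
    c = np.linspace(f.min(), f.max(), n_phases)      # phase constants
    for _ in range(iters):
        # (1) restoration: gradient step on ||A u - f||^2 plus attraction
        # to the current piecewise-constant phase image c[labels]
        u -= 0.1 * (At(A(u) - f) + lam * (u - c[labels]))
        # (2) segmentation: nearest phase constant per pixel,
        # then refresh the constants as per-phase means
        labels = np.argmin((u[..., None] - c) ** 2, axis=-1)
        for k in range(n_phases):
            if np.any(labels == k):
                c[k] = u[labels == k].mean()
    return labels, u
```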

Michael Alexander Riegler

Simula Research Laboratory, Norway

Biography:

Michael Alexander Riegler is a PhD student at Simula Research Laboratory. He received his master's degree from Klagenfurt University with distinction. His master's thesis, written at the Technical University of Delft under the supervision of Martha Larson, was about large-scale content-based image retrieval. He is part of the EONS project in the Media Performance Group. His research interests are endoscopic video analysis and understanding, image processing, image retrieval, parallel processing, gamification and serious games, crowdsourcing, social computing and user intentions. Furthermore, he is involved in several initiatives, such as the MediaEval Benchmarking initiative for Multimedia Evaluation.

Abstract:

Nowadays, multimedia is characterized by a very complex nature due to the combination of different types of media, data sources, formats, resolutions, etc. Moreover, performance is an important factor because of the sheer scale of the data to process. Therefore, the area of high-performance and scalable multimedia systems is becoming more important. One of the most important, complex and rapidly growing areas of multimedia processing is the medical field. In most hospitals, the potential of the large amount of collected multimedia data is ignored, very often because of the difficulties that processing such amounts of data implies and the lack of efficient and simple-to-use analysis systems. On the other hand, medical experts have become used to interacting with multimedia content in their daily lives and want to use it in their work as well. Most of the time this is a problem, and the most common multimedia problems lie unsolved in this area. This talk puts the problem into the spotlight and presents a multimedia system that tackles automatic analysis of the gastrointestinal tract as part of it. The focus lies on the presentation and evaluation of the capabilities of multimedia systems in the medical field. A novel system, which utilizes the benefits of heterogeneous architectures and can be used to automatically analyse high-definition colonoscopies and large amounts of capsule endoscopy data, is presented as a use case. Further, it will be shown that improving multimedia system performance via GPU-based processing can help to reach real-time, live multimedia stream processing and low resource consumption, which is important for the medical field and can help to save lives.

Hedva Spitzer

Tel Aviv University, Israel

Biography:

Hedva Spitzer completed her PhD at the age of 32 at the Hebrew University in Jerusalem, followed by postdoctoral studies at the National Institutes of Health (NIH) in Maryland. She is the head of the Vision Research Lab in the School of Electrical Engineering at Tel Aviv University. She has published more than 40 papers in reputed journals and holds more than 10 granted patents. Her research interests in recent years have focused on luminance, color, visual adaptation mechanisms, companding of natural and medical HDR images, illusory contours, lateral facilitation mechanisms, chromatic aberrations, and medical image segmentation and classification.

Abstract:

We propose a segmentation algorithm and algorithms for companding high-dynamic-range images, which build on a crucial multi-scale texture component. This component serves as an adaptation measure both for the algorithm that compands the high dynamic range and for the segmentation algorithm for non-homogeneous edges and regions. It was originally inspired by a computational model of the adaptation mechanisms underlying assimilation effects in the visual system, which are probably functional in enhancing the differences between adjacent texture regions. Many algorithms have been suggested to cope with the challenge of effective companding (compressing and expanding) of high-dynamic-range images. However, the need for a simple, robust and fully automatic algorithm for a large repertoire of images has not yet been met. An algorithm for multi-scale adaptive contrast companding (ACC) is suggested, inspired by the adaptation mechanisms in the visual system, where a texture contrast in a specific region appears stronger or weaker according to its own value and according to its contextual contrast. The ACC algorithm successfully compands a large variety of LDR and HDR images, including medical images. The segmentation of B-mode ultrasonographic breast lesions is very challenging. Many studies have addressed this aim, attempting to achieve maximal resemblance to the contour delineated manually by a trained radiologist. In this study, we have developed an algorithm designed to capture the spiculated boundary of the lesion using the above unique multi-scale texture identifier integrated in a level-set framework. The algorithm's performance has been evaluated quantitatively via contour-based and region-based error metrics, on both the algorithm-generated contour and the manual contour delineated by an expert radiologist. The evaluation showed: 1) a mean absolute error of 0.5 pixels between the original and the corrected contour; 2) an overlapping area of 99.2% between the lesion regions obtained by the algorithm and the corrected contour. These results are significantly better than those previously reported.

Iris Galloso

Universidad Politecnica de Madrid, Spain

Title: Sensorially-enhanced media: one size fits all
Biography:

Iris Galloso graduated with honors as Engineer and Master in Telecommunications from CUJAE and received her PhD in Telecommunications from Universidad Politécnica de Madrid (Outstanding Cum Laude, awarded unanimously for her doctoral thesis "User experience with immersive and interactive media"). She has contributed to more than 35 collaborative Spanish and European R&D projects. She has authored one book chapter, three research papers (two in Q1 journals) and more than 21 contributions to international events (6 as invited speaker). She has served as reviewer and member of the Scientific Committee in 7 international conferences and is a Substitute Member of the MC of the COST Action 3D-ConTourNet.

Abstract:

Recent studies encourage the development of sensorially-enriched media to enhance the user experience by stimulating senses other than sight and hearing. Sensory effects have been found to favour media enjoyment and to have a positive influence on the sense of Presence and on the perceived quality, relevance and reality of a multimedia experience. Sports is among the genres that could benefit the most from these solutions. However, scientific evidence on the impact of human factors on the user experience with multi-sensorial media is insufficient and, sometimes, contradictory. Furthermore, the associated implications, with regard to the potential adoption of these technologies, have been widely ignored. In this talk I will present the results of an experimental study analysing the impact of binaural audio and sensory (light and olfactory) effects on the sports (football) media experience. We consider the impact on the quality and Presence dimensions, both at the overall level (average effect) and as a function of users' characteristics (heterogeneous effects). Along the quality dimension, we look for possible variations in the quality scores assigned to the overall media experience and to the media components: content, image, audio and sensory effects. The potential impact on Presence is analyzed in terms of Spatial Presence and Engagement. The users' characteristics considered encompass personal affective, cognitive and behavioral attributes. In this talk I will: i) present our experimental study and its outcomes; ii) discuss and contextualize our results; and iii) highlight open issues, research challenges and specific hypotheses for future research.

S.-H. Gary Chan

The Hong Kong University of Science and Technology, HK

Title: Scaling up Video Streaming with Cloud and Fog: From Research To Deployment
Biography:

Dr. S.-H. Gary Chan is currently Professor at the Department of Computer Science and Engineering, The Hong Kong University of Science and Technology (HKUST), Hong Kong. He is also Chair of the Task Force on Entrepreneurship Education at HKUST. He received MSE and PhD degrees in Electrical Engineering from Stanford University (Stanford, CA), with a minor in Business Administration. He obtained his B.S.E. degree (highest honors) in Electrical Engineering from Princeton University (Princeton, NJ), with certificates in Applied and Computational Mathematics, Engineering Physics, and Engineering and Management Systems. His research on wireless and streaming networks has led to a number of startups and received several industrial innovation awards in Hong Kong, the Pan Pearl River Delta and the Asia-Pacific region for its commercial impact. His research interests include multimedia networking, wireless networks, mobile computing and IT entrepreneurship.

Abstract:

Due to the popularity of both live and on-demand streaming content, video has become the mainstream of Internet traffic. Video traffic is expected to continue to grow due to our demand for better video quality and the penetration of mobile devices. Of all the approaches to scaling video services up to millions of users, over-the-top (OTT) delivery emerges as a promising way to achieve deep user penetration with a wide geographical footprint. In order to tame network congestion and make better use of network bandwidth in large-scale deployment, we have been conducting research and development of a novel OTT video streaming network using cloud and fog optimization. For live broadcasting, the major concern is to achieve low source-to-end delay and bandwidth cost. For on-demand content, the major concern is where to replicate and access content to achieve low interactive delay. In this talk, I will first highlight our research results in cloud and fog streaming. Over the past years, we have been working with industry to deploy the technology at large scale. I will share our experience from our deployment trials, and our technology transfer and commercialization activities in this regard.

Alcides Xavier Benicasa

Federal University of Sergipe, Brazil

Title: An object-based visual selection framework
Biography:

Alcides Xavier Benicasa received his Ph.D. degree in Mathematics and Computer Sciences in 2013 from the Institute of Mathematics and Computer Sciences (ICMC), University of São Paulo (USP). He is currently an Adjunct Professor at the Department of Information Systems of the Federal University of Sergipe, Itabaiana, Sergipe, Brazil. His current research interests include artificial neural networks, computer vision, visual attention, bioinformatics, and pattern recognition.

Abstract:

Real scenes are composed of multiple points possessing distinct characteristics. Selectively, only part of the scene undergoes scrutiny at a time, and the mechanism responsible for this task is called selective visual attention. The spatial location with the highest contrast may stand out from the scene, reaching the level of awareness (bottom-up attention). On the other hand, attention may also be voluntarily directed to a particular object in the scene (object-based attention), which requires the recognition of a specific target (top-down modulation). In this work, a new visual selection model is proposed which combines both early visual features and object-based visual selection modulations. The possibility of modulation with regard to specific features enables the model to be applied to different domains. The proposed model integrates three main mechanisms. The first handles the segmentation of the scene, allowing the identification of objects. In the second, the average saliency of each object is computed, which provides the modulation of visual attention for one or more features. Finally, the third builds the object-saliency map, which highlights the salient objects in the scene. We show that top-down modulation has a stronger effect than bottom-up saliency when a memorized object is selected, and this evidence is clearer in the absence of any bottom-up clue. Experiments with synthetic and real images are conducted, and the obtained results demonstrate the effectiveness of the proposed approach for visual selection.
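
The third mechanism can be sketched directly, assuming a labeled segmentation and a pixel saliency map are already available (a minimal version of the idea, not the authors' implementation):

```python
import numpy as np

def object_saliency_map(labels, saliency):
    """Average the pixel saliency over each segmented object and paint the
    whole object with that value, yielding the object-saliency map."""
    out = np.zeros_like(saliency, dtype=float)
    for obj in np.unique(labels):
        mask = labels == obj
        out[mask] = saliency[mask].mean()   # object-level saliency
    return out
```

Top-down modulation would then bias these per-object averages toward the features of a memorized target before the most salient object is selected.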

Speaker
Biography:

Andrey Zakharoff completed his PhD in 2004 at Moscow State Forest University and then worked as an image processing engineer with the Samsung HDTV R&D lab. He is now a computer vision engineer at the Softerra Complex Pro company in Moscow. Being a hands-on programmer, he cannot devote much time to preparing papers for publication.

Abstract:

As a result of significant JPEG or MPEG lossy compression and decompression, specific artifacts such as quadrate blocks or false contours may appear on an image; they become especially visible when the image is stretched onto a larger HDTV panel. In this work we detect these artifacts and reduce their visual impact by smoothing, while leaving natural edges untouched (an illustrative sketch follows). Detection is accomplished by analyzing the global histogram and comparing it with local features of the image. A measure of false-edge smoothing efficacy is also suggested and demonstrated with different combinations of smoothing methods.
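
An illustrative stand-in for this kind of pipeline, not the authors' exact method: blockiness is estimated from gradient energy at 8-pixel block boundaries, and an edge-preserving bilateral filter performs the smoothing so natural edges stay sharp. The file name and threshold are hypothetical.

```python
import cv2
import numpy as np

img = cv2.imread("decoded_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder file

gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)
# Energy of horizontal gradients at 8-pixel block boundaries vs. elsewhere:
boundary = np.abs(gx[:, 7::8]).mean()
interior = np.abs(gx).mean()
blockiness = boundary / (interior + 1e-6)   # >1 hints at JPEG/MPEG blocking

if blockiness > 1.2:                        # ad-hoc threshold for the sketch
    # bilateral filter smooths flat regions but preserves strong edges
    img = cv2.bilateralFilter(img, d=9, sigmaColor=30, sigmaSpace=9)
cv2.imwrite("deblocked.png", img)
```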

Zhenzhou Wang

Chinese Academy of Sciences Shenyang Institute of Automation, China

Title: A New Approach for automatic and robust segmentation and quantification of cells and nanoparticles
Speaker
Biography:

ZhenZhou Wang received his bachelor's and master's degrees from the Department of Electronics and Information Engineering at Tianjin University, China, in 2000 and 2003 respectively. He received his Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Kentucky in 2007. He was selected for the "Hundred Talents Plan, A-Class" of the Chinese Academy of Sciences in 2013 and has worked as research fellow/professor at the Shenyang Institute of Automation. He also serves as a panelist for the National Natural Science Foundation of China. He has published more than 30 papers in reputed journals as first author.

Abstract:

With the rapid development of microscopy and nanoscale imaging technology, the requirement for automatic and robust segmentation and quantification of cells or nanoparticles is growing greatly. Because of the vast variety of cell and nanoparticle images, most existing methods are only capable of segmenting some specific type of cells or nanoparticles. In this paper, we propose a more versatile and generalized method that is capable of segmenting and quantifying a variety of cells and nanoparticles automatically and robustly. It consists of five parts: (1) automatic gradient image formation; (2) automatic threshold selection; (3) manual calibration of the threshold selection method for each specific type of cell or nanoparticle image; (4) manual determination of the segmentation case for each specific type of cell or nanoparticle image; (5) automatic quantification by iterative morphological erosion. After the parameter N is calibrated and the segmentation case is determined manually for each specific type of cell or nanoparticle image using one or several typical images, only parts (1), (2) and (5) are needed for the rest of the processing, and they are fully automatic (a minimal sketch follows). The proposed approach is tested with different types of cell and nanoparticle images. Experimental results verify its effectiveness and generality. The quantitative results show that the proposed approach also achieves significantly better accuracy than state-of-the-art methods.
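
A minimal sketch of the flavor of parts (2) and (5), under stated assumptions: Otsu's method stands in for the paper's calibrated threshold selection, and iterative morphological erosion splits touching particles so they can be counted, keeping the iteration with the largest count.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu

def count_particles(gray, max_iters=20):
    """Binarize, then erode iteratively so touching blobs come apart."""
    mask = gray > threshold_otsu(gray)        # part (2): global threshold
    best = ndi.label(mask)[1]                 # initial connected-component count
    for _ in range(max_iters):                # part (5): iterative erosion
        mask = ndi.binary_erosion(mask)
        if not mask.any():
            break
        best = max(best, ndi.label(mask)[1])  # touching particles separate
    return best
```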

Jon-Chao Hong

National Taiwan Normal University, Taiwan

Title: How to gamify learning contents in the flipped classroom
Speaker
Biography:

Jon-Chao Hong received his doctoral degree in Education from the University of Illinois at Urbana-Champaign, and is currently a Chair Professor in the Department of Industrial Education at National Taiwan Normal University (NTNU). As the director of the Digital Game-based Learning Laboratory (GBL), he has developed 9 web games and 15 educational apps, built on game-based learning theory to increase students' learning motivation. As the President of the Taiwan Creativity Development Association, he also organizes several creative contests, such as the PowerTech Contest and the World Bond Robot Contest, which invite elementary, junior and senior high school students to build robots or miniatures in the morning and compete with them in the afternoon, ensuring students' hands-on creation without parents' or teachers' assistance. In addition, he has published a number of academic articles in international journals on digital game-based learning, thinking skills and creativity. Within the last three years, he has published twenty-nine articles in Social Sciences Citation Index (SSCI) journals and received the Outstanding Research Prize from the Ministry of Science and Technology in Taiwan.

Abstract:

Gamification typically involves applying game design thinking to non-game applications to make them more fun and engaging. It can potentially be applied to almost any educational purpose to create fun and engaging experiences. In addition, game-based learning is aimed at creating interest. This includes giving students the chance to pose questions, which can help them activate prior knowledge and link new information to existing knowledge structures with deeper elaboration. In the process of posing and gamifying questions, factors such as learning anxiety, learning motivation and epistemic curiosity can affect students' attitudes and learning outcomes. This will be examined through various app games developed by the Digital Game-based Learning Laboratory, including "TipOn", "Whywhy", "Garden Science" and "Mastering the World".

Speaker
Biography:

Mohamed G. El-mashed received the B.Sc. (Honors), M.Sc. and Ph.D. degrees from the Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt, in 2008, 2012 and 2016, respectively. He joined the teaching staff of the Dept. of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University. His research areas of interest include: Ultra-Wide Band (UWB) radar applications, radar signal processing and imaging, MIMO radar systems, SAR imaging techniques, digital signal processing, advanced digital communication systems, wireless communication systems, WiMAX, LTE, LTE-A and FPGA implementation of communication systems.

Abstract:

One of the most challenging tasks of advanced MIMO signal processing, with respect to computational requirements, is data detection at the receiver side. The transmitted data have to be detected with a low probability of error. For high-rate MIMO transmission schemes using spatial multiplexing, optimum data detection can easily become prohibitively complex, since one has to deal with very strong spatial interference between the multiple transmitted data streams. These systems require a receiver with a high probability of detection and high performance in order to estimate the transmitted data streams. We propose a scalable and flexible detection algorithm with higher performance, characterized by dividing the total detection problem into sub-problems; each sub-problem is solved separately to reduce complexity (a generic sketch of this divide-and-cancel strategy follows). The proposed detection algorithm consists of five stages: preprocessing, group interference suppression, a sub-optimum low-dimension detection algorithm for the first data streams, interference cancellation, and a linear detector for the last data streams. Each stage can be upgraded independently to enhance performance without affecting the other stages. The algorithm is applicable to advanced communication systems that deploy multiple antennas at transmitter and receiver. This paper investigates the performance of the proposed algorithm and compares it with other detection algorithms.
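
To illustrate the divide-into-sub-problems strategy (not the authors' five-stage algorithm), here is a generic zero-forcing detector with ordered successive interference cancellation: it suppresses the remaining streams, detects the most reliable one as a small sub-problem, cancels its interference, and repeats. BPSK symbols and a known channel matrix are assumed.

```python
import numpy as np

def zf_sic_detect(H, y):
    """Zero-forcing + ordered successive interference cancellation (BPSK)."""
    y = y.astype(complex).copy()
    remaining = list(range(H.shape[1]))
    s_hat = np.zeros(H.shape[1])
    while remaining:
        W = np.linalg.pinv(H[:, remaining])             # suppress other streams
        k = int(np.argmin(np.linalg.norm(W, axis=1)))   # most reliable first
        z = W[k] @ y
        sym = 1.0 if z.real >= 0 else -1.0              # BPSK slicing
        idx = remaining.pop(k)
        s_hat[idx] = sym
        y -= H[:, idx] * sym                            # cancel detected stream
    return s_hat

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
s = rng.choice([-1.0, 1.0], size=4)
y = H @ s + 0.05 * (rng.normal(size=4) + 1j * rng.normal(size=4))
print("sent:", s, "detected:", zf_sic_detect(H, y))
```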

  • Computer Vision
    Multimedia Health Computation and its Applications
    Multimedia Systems and Applications
    Human-Computer Interaction | Multimedia Content Analysis
Speaker

Chair

Ghyslain Gagnon

Ecole de Technologie Supérieure, Canada

Speaker

Co-Chair

Changsoo Je

Sogang University, Korea

Session Introduction

Leonel Antonio Toledo Díaz

Instituto Tecnológico de Estudios Superiores de Monterrey, Mexico

Title: Visualization Techniques for Crowd Simulation
Biography:

Leonel Toledo received his PhD from Instituto Tecnológico de Estudios Superiores de Monterrey, Campus Estado de México, in 2014, where he is currently a full-time professor. From 2012 to 2014 he was an assistant professor and researcher. He has devoted most of his research work to crowd simulation and visualization optimization. He has worked at the Barcelona Supercomputing Center using general-purpose graphics processors for high-performance graphics. His thesis work was on level of detail used to create varied animated crowds. His research interests include crowd simulation, animation, visualization, high-performance computing and HCI.

Abstract:

Animation and simulation of crowds find applications in many areas, including entertainment (e.g., animation of large numbers of people in movies and games), creation of immersive virtual environments, and evaluation of crowd management techniques (for instance, simulation of the flow of people leaving a football stadium after a match). In order to have a persuasive application using crowds in virtual environments, various aspects of the simulation have to be addressed, including behavioral animation, environment modelling and crowd rendering. Real-time graphics systems are required to render millions of polygons to the screen per second, so real-time computer-animated graphics rely heavily on the current generation of graphics hardware. However, as in many fields of computing science, the requirements of computer graphics software far outstrip hardware capabilities.

Procedural generation techniques are widely used in computer graphics to model systems of high complexity. Many of these techniques target the generation of natural phenomena in high complexity and detail to achieve realistic results. Procedural generation can be computationally intensive and is not commonly used in real-time systems to generate entire virtual worlds. However, advancements in processing speed and graphics hardware make it possible to generate three-dimensional models in real time on commodity hardware. Applications range from entertainment to urban design and crisis management; traffic simulation in big cities can also benefit from these visualization techniques.

Ghyslain Gagnon

École de technologie supérieure, Canada

Title: Robust multiple-instance learning ensembles using random subspace instance selection

Time : 11:25-11:50

Speaker
Biography:

Ghyslain Gagnon received the Ph.D. degree in electrical engineering from Carleton University, Canada in 2008. He is now an Associate Professor at École de technologie supérieure, Montreal, Canada. He is an executive committee member of ReSMiQ and Director of the research laboratory LACIME, a group of 10 professors and nearly 100 highly dedicated students and researchers in microelectronics, digital signal processing and wireless communications. Highly inclined towards research partnerships with industry, his research focuses on digital signal processing and machine learning with various applications, from media art to building energy management.

Abstract:

Many real-world pattern recognition problems can be modeled using multiple-instance learning (MIL), where instances are grouped into bags, and each bag is assigned a label. State-of-the-art MIL methods provide a high level of performance when strong assumptions are made regarding the underlying data distributions and the proportion of positive to negative instances in positive bags. In this paper, a new method called Random Subspace Instance Selection (RSIS) is proposed for the robust design of MIL ensembles without any prior assumptions on the data structure or the proportion of instances in bags. First, instance selection probabilities are computed based on training data clustered in random subspaces. A pool of classifiers is then generated using the training subsets created with these selection probabilities (a simplified sketch follows). By using RSIS, MIL ensembles are more robust to many data distributions and noise, and are not adversely affected by the proportion of positive instances in positive bags, because training instances are repeatedly selected in a probabilistic manner. Moreover, RSIS also allows the identification of positive instances on an individual basis, as required in many practical applications. Results obtained with several real-world and synthetic databases show the robustness of MIL ensembles designed with the proposed RSIS method over a range of witness rates, noisy features and data distributions compared to reference methods in the literature.
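
A simplified sketch of the RSIS idea, not the paper's exact formulation: instances are clustered in random feature subspaces, clusters dominated by positive-bag instances raise the positive-selection probability of their members, and classifiers are trained on subsets sampled with those probabilities. Function names and parameters are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def rsis_probabilities(X, inst_in_pos_bag, n_subspaces=10, k=5, seed=0):
    """Per-instance positive-selection probability via random-subspace clustering."""
    rng = np.random.default_rng(seed)
    score = np.zeros(len(X))
    for _ in range(n_subspaces):
        dims = rng.choice(X.shape[1], size=max(2, X.shape[1] // 2),
                          replace=False)
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=seed).fit_predict(X[:, dims])
        for c in range(k):
            in_c = labels == c
            # clusters dominated by positive-bag instances look "positive"
            score[in_c] += inst_in_pos_bag[in_c].mean()
    return score / n_subspaces

def train_pool(X, inst_in_pos_bag, probs, n_clf=10, seed=0):
    """Sample instance subsets with the selection probabilities; train a pool."""
    rng = np.random.default_rng(seed)
    pool = []
    for _ in range(n_clf):
        # probabilistic selection of positive-bag instances; keep all negatives
        pick = (rng.random(len(X)) < np.clip(probs, 0.05, 1.0)) | \
               (inst_in_pos_bag == 0)
        pool.append(DecisionTreeClassifier(random_state=seed)
                    .fit(X[pick], inst_in_pos_bag[pick]))
    return pool
```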

Speaker
Biography:

Dr Vijayan Asari is a Professor in Electrical and Computer Engineering and Ohio Research Scholars Endowed Chair in Wide Area Surveillance at the University of Dayton, Dayton, Ohio, USA. He is the director of the Center of Excellence for Computer Vision and Wide Area Surveillance Research (Vision Lab) at UD. As leaders in innovation and algorithm development, the UD Vision Lab specializes in object detection, recognition and tracking in wide area surveillance imagery captured by visible, infrared, thermal, hyperspectral, LiDAR (Light Detection and Ranging) and EEG (electroencephalograph) sensors. Dr Asari's research activities include development of novel algorithms for human identification by face recognition, human action and activity recognition, brain signal analysis for emotion recognition and brain-machine interfaces, 3D scene creation from 2D video streams, 3D scene change detection, and automatic visibility improvement of images captured in various weather conditions. Dr Asari received his BS in electronics and communication engineering from the University of Kerala, India, and M Tech and PhD degrees in Electrical Engineering from the Indian Institute of Technology, Madras. Prior to joining UD in February 2010, Dr Asari worked as Professor in Electrical and Computer Engineering at Old Dominion University, Norfolk, Virginia for 10 years. Dr Asari worked at the National University of Singapore during 1996-98 and led a research team developing a vision-guided microrobotic endoscopy system. He also worked at Nanyang Technological University, Singapore during 1998-2000 and led the computer vision and image processing research activities in the Center for High Performance Embedded Systems at NTU. Dr Asari holds three patents and has published more than 500 research papers, including 85 peer-reviewed journal papers, in the areas of image processing, pattern recognition, machine learning and high performance embedded systems. Dr Asari has supervised 22 PhD dissertations and 35 MS theses during the last 15 years. Currently 18 graduate students are working with him on different sponsored research projects. He is participating in several federally and privately funded research projects and has so far managed around $15M in research funding. Dr Asari has received several teaching, research, advising and technical leadership awards. He is a Senior Member of IEEE and SPIE, and a member of the IEEE Computational Intelligence Society. Dr Asari is a co-organizer of several SPIE and IEEE conferences and workshops.

Abstract:

The amazing progress in sensor technology has made it possible to capture gigabyte-sized frames at reasonable frame rates in wide area motion imagery (WAMI) processing scenarios. Automatic detection, tracking and identification of objects in this imagery in real time are becoming a necessity for security and surveillance applications. Feature extraction and classification of moving objects in WAMI data are challenging, as the objects in the image may be very small and appear at different viewing angles and in varying environmental conditions. We present a new framework for the detection and tracking of such low-resolution objects in wide area imagery. The motivation behind this algorithm is to utilize all the information available about the object of interest in the detection and tracking processes. The proposed method makes use of a dense version of localized histograms of gradients on the difference images. A Kalman filter based predictive mechanism is employed in the tracking methodology (a minimal sketch follows). The feature-based tracking mechanism can track all the moving objects. The robustness of the proposed methodology is illustrated by the detection and tracking of several objects of interest in varying situations; the new method can even track pedestrians in WAMI data. We also present the effect of our shadow-illumination and super-resolution techniques on improving object detection and tracking in very long range videos. The processing steps include stitching of images captured by multiple sensors, video stabilization and distortion correction, frame alignment for registration and moving object detection, tracking of multiple objects and humans in the motion imagery, classification of objects and identification of humans in the scene, and communication of large image data and decisions to multiple destinations. In addition, information extracted from video streams captured by sensors located in different regions can be used for accurate decision making.
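
A minimal sketch of the kind of predictive mechanism mentioned above, assuming a standard constant-velocity Kalman filter over detected object centroids (the noise covariances are tuning assumptions, not values from the talk):

```python
import numpy as np

dt = 1.0                                   # frame interval (assumed)
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], float)       # constant-velocity state transition
Hm = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0]], float)       # we only measure position
Q = 0.01 * np.eye(4)                       # process noise (tuning assumption)
R = 4.0 * np.eye(2)                        # measurement noise in pixels^2

x = np.zeros(4)                            # state: [x, y, vx, vy]
P = 10.0 * np.eye(4)
for z in [np.array([10., 5.]), np.array([12., 6.]), np.array([14., 7.])]:
    x = F @ x                              # predict where the object will be
    P = F @ P @ F.T + Q
    K = P @ Hm.T @ np.linalg.inv(Hm @ P @ Hm.T + R)
    x = x + K @ (z - Hm @ x)               # correct with the new detection
    P = (np.eye(4) - K @ Hm) @ P
print("estimated state [x, y, vx, vy]:", x)
```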

Robert S Laramee

Swansea University, UK

Title: Visual analytics for big video data

Time : 10:35-11:05

Speaker
Biography:

Robert S Laramee received a bachelor's degree in physics, cum laude, from the University of Massachusetts, Amherst (ZooMass). He received a master's degree in computer science from the University of New Hampshire, Durham. He was awarded a PhD by the Vienna University of Technology (Gruess Gott TUWien), Austria, at the Institute of Computer Graphics and Algorithms in 2005. From 2001 to 2006 he was a researcher at the VRVis Research Center (www.vrvis.at) and a software engineer at AVL (www.avl.com) in the department of Advanced Simulation Technologies. Currently he is an Associate Professor in Data Visualization at Swansea University (Prifysgol Cymru Abertawe), Wales, in the Department of Computer Science (Adran Gwyddor Cyfrifiadur). His research interests are in the areas of big data visualization, visual analytics and human-computer interaction. He has published more than 100 peer-reviewed papers in scientific conferences and journals and served as Conference Chair of EuroVis 2014, the premiere conference on data visualization in Europe.

Abstract:

With advancements in multimedia and data storage technologies and the ever-decreasing costs of hardware, our ability to generate and store ever more video and other multimedia data is unprecedented. YouTube, for example, has over 1 billion users. However, a very large gap remains between our ability to generate and store large collections of complex, time-dependent video and multimedia data and our ability to derive useful information and knowledge from them. Viewing video and multimedia as a data source, visual analytics exploits our most powerful sense, vision, in order to derive information, knowledge and insight from big multimedia data sets that record complicated and often time-dependent events. This talk presents a case study of state-of-the-art visualization and visual analytics techniques applied to video multimedia in order to explore, analyze and present video data. In this case, we show how glyph-based visualization can be used to convey the most important information and events from videos of rugby games. The talk showcases some of visualization's strengths, weaknesses and goals. We describe an interdisciplinary case study based on rugby sports analytics, where visual analytics and visualization are used to address fundamental questions, the answers to which we hope to discover in various large, complex, and time-dependent multimedia data.

Ching Y. Suen

Concordia University, Canada

Title: Digital Fonts and Reading
Speaker
Biography:

Ching Y. Suen is the Director of CENPARMI and the Concordia Honorary Chair on AI & Pattern Recognition. He received his Ph.D. degree from UBC (Vancouver) and his Master's degree from the University of Hong Kong. He has served as the Chairman of the Department of Computer Science and as the Associate Dean (Research) of the Faculty of Engineering and Computer Science of Concordia University. Prof. Suen has served numerous national and international professional societies as President, Vice-President, Governor, and Director. He has given 45 invited/keynote papers at conferences and 200 invited talks at various industries and academic institutions around the world. He has been the Principal Investigator or Consultant of 30 industrial projects. His research projects have been funded by the ENCS Faculty and the Distinguished Chair Programs at Concordia University, FCAR (Quebec), NSERC (Canada), the National Networks of Centres of Excellence (Canada), the Canadian Foundation for Innovation, and the industrial sectors of various countries, including Canada, France, Japan, Italy, and the United States. Currently, he is the Editor-in-Chief of the journal Pattern Recognition, an Adviser or Associate Editor of 5 journals, and Editor of a new book series on Language Processing and Pattern Recognition. He has previously served as Editor-in-Chief, Associate Editor or Adviser of 5 other journals. He is not only the founder of three conferences (ICDAR, IWFHR/ICFHR, and VI) but has also organized numerous international conferences including ICPR, ICDAR, ICFHR, and ICCPOL, and has served as Honorary Chair of numerous international conferences.

Abstract:

Thousands of years ago, humans started to create symbols to represent things they saw, heard, touched, found, remembered, imagined, and talked about. We can see them carved on rocks, walls, shells, and other materials. From these symbols, words and different languages were invented, modified, expanded, and evolved over the years. Following the invention of paper and writing instruments, different ways of representing the same symbol started to appear, forming the basis of different stylistic variations and font types. As time went by, computers and digital technology emerged with which the alphabets of all languages in the world can be printed digitally. Once a symbol has been represented in a digital format, there are infinite ways of representing it in unlimited type fonts for publishing. This talk summarizes the evolution of fonts, their characteristics and their personality traits. Aspects such as font styles and their effects on reading and eyesight, legibility and comprehension will be discussed with experimental results.

Speaker
Biography:

Takashi Nakamura completed his PhD at the age of 28 at Kobe University. He is a professor of media studies in the Faculty of Humanities at Niigata University. He has published more than 20 papers (including ones in Japanese) and two books in Japanese (one as sole author and the other as sole editor). He is an editorial board member of the Annals of Behavioural Science.

Abstract:

This presentation focuses on the action of looking at a mobile phone display as a type of nonverbal behavior/communication and compares it cross-culturally. The diversity of nonverbal behavior/communication was considered to be caused by the difference between Western and non-Western cultures. A questionnaire was conducted in three countries (the USA, Hong Kong and Japan), and a total of 309 subjects participated. The participants were required to record their opinions of the action according to the situation with 'co-present' familiar persons. The analysis showed that the difference between the USA and Japan was more pronounced the more intimate the relationship with the 'co-present' person was. The results of the Hong Kong sample were intermediate between those of the other two countries. The diversity was discussed in terms of the independent/interdependent self from the perspective of cultural comparison and of mobile phone usage. The analysis revealed that the action, as a form of nonverbal behavior/communication, has functioned in human relationships and has become deeply embedded in culture in the mobile phone era.

Changyu Liu

South China Agricultural University, China

Title: Complex Event Detection via Bank based Multimedia Representation

Time : 12:40-13:05

Speaker
Biography:

Changyu Liu received the PhD degree in 2015 from South China University of Technology, where he worked under the supervision of Prof. Shoubin Dong. He is currently a lecturer at the College of Mathematics and Informatics, South China Agricultural University. He was a visiting scholar at the School of Computer Science, Carnegie Mellon University, from September 2012 to October 2013, advised by Dr. Alex Hauptmann. Then, he worked with Prof. Mohamed Abdel-Mottaleb and Prof. Mei-Ling Shyu at the Department of Electrical and Computer Engineering, University of Miami, from October 2013 to September 2014. He serves as a reviewer for many international journals, such as Neural Computing and Applications, Security and Communication Networks, KSII Transactions on Internet and Information Systems, Journal of Computer Networks and Communications, and Tumor Biology. He is a Technical Program Committee member for many international conferences, such as GMEE2015, PEEM2016, and ICEMIE2016. His research interests include computer vision, pattern recognition, multimedia analysis, bioinformatics, virtual reality, and machine learning.

Abstract:

With the advent of the big data era, available multimedia collections are expanding. To meet the increasingly diversified demands of multimedia applications from the public, effective multimedia analysis approaches are urgently required. Multimedia event detection, as an emerging branch of multimedia analysis, is gaining considerable attention from both industrial and academic researchers. However, much current effort on multimedia event detection has been dedicated to detecting complex events in controlled video clips or simple events in uncontrolled video clips. In order to perform complex event detection in uncontrolled video clips, we propose an event bank descriptor approach, published in Neurocomputing, for multimedia representation. The approach divides the spatial-temporal objects of an event into objects, described by a latent group logistic regression mixture model trained on a large number of labeled images that can be obtained very easily from standard image datasets, and spatial-temporal relationships, described by spatial-temporal grids trained on a relatively small number of labeled videos that can likewise be obtained easily from standard video datasets. Furthermore, we combine the coordinate descent approach and the gradient descent approach into an efficient iterative training algorithm that learns the model parameters of the event bank descriptor, and conduct extensive experiments on the ImageNet Challenge 2012 dataset and the TRECVID MED 2012 dataset. The results show that the proposed approach outperforms state-of-the-art approaches for complex event detection in uncontrolled video clips. The benefits of our approach are mainly threefold: first, outliers in training examples are removed; second, subtle structural variations are allowed during detection; third, the feature vectors of the event bank are jointly sparse.

Break: Lunch: 13:05-13:45 @ Orwell's Brasserie
Biography:

AKM Mahbubur Rahman completed his PhD at the age of 32. He is working as a Senior Research Scientist at Eyelock LLC, an acknowledged leader in advanced iris authentication for the Internet of Things (IoT). He has published more than 10 papers in reputed conferences and journals.

Abstract:

Disabilities related to congenital blindness, vision loss or partial sight disturb not only one's physical body but also the trajectory of one's social interactions, owing to a lack of perception of a partner's facial behavior, head pose and body movements. It is well documented in the literature that sight loss can lead to depression, loneliness and anxiety. Sighted people recognize complex emotional states largely by processing visual cues from the eye and mouth regions of the face. For instance, social communication with eye-to-eye contact provides information about concentration, confidence and engagement, and smiles are universally recognized as signs of pleasure and welcome. In contrast, looking away for a long time is perceived as lack of concentration, broken engagement or boredom. Visually impaired people, however, have no access to these cues from the eye and mouth regions, and this nonverbal information is unlikely to be communicated through the voice. Additionally, if the interlocutor is silent (listening), a blind individual has no clue about the interlocutor's mental state. The scenario is even more complex when a group of people including visually impaired persons interact in a discussion or debate. The inability to perceive emotions and epistemic states can be addressed by a computer vision and machine learning based assistive technology solution capable of processing facial behavior, head pose, facial expressions and physiological signals in real time. A practical and portable system is desired that predicts VAD dimensions as well as facial events from an interlocutor's facial behavior and head pose in natural environments (for instance, conversation in a building corridor, asking questions of a stranger in the street, or discussing topics of interest on a university campus). Building social assistive technologies using computer vision and machine learning techniques is rather new and unexplored, and poses complex research challenges, which we identify in three categories: (a) system and face-tracker related challenges; (b) classification and prediction related challenges; (c) deployment related issues. These challenges have been overcome to implement a robust system for real-world deployment. This paper presents the design and implementation of EmoAssist, a smartphone-based system to assist in dyadic conversations. The main goal of the system is to provide access to more nonverbal communication options to people who are blind or visually impaired. The key functionalities are to predict behavioral expressions (such as a yawn, a closed-lip smile, an open-lip smile, looking away, or sleepiness) and 3-D affective dimensions (valence, arousal and dominance) from visual cues, in order to provide appropriate auditory feedback. A number of challenges related to data communication protocols, efficient face tracking, modeling of behavioral expressions and affective dimensions, the feedback mechanism and system integration were addressed to build an effective and functional system. In addition, orientation sensor information from the smartphone was used to correct image alignment and improve robustness in real-world use.
Empirical studies show that EmoAssist can predict affective dimensions with acceptable accuracy (maximum correlation coefficients of 0.76 for valence, 0.78 for arousal, and 0.76 for dominance) in natural dyadic conversation. The overall minimum and maximum response times are 64.61 milliseconds and 128.22 milliseconds, respectively. The integration of sensor information for orientation correction improved the accuracy of recognizing behavioral expressions by 16% on average. A usability study with ten blind people in social interaction shows that EmoAssist is highly acceptable, with an average acceptability rating of 6.0 on a Likert scale (where 1 and 7 are the lowest and highest possible ratings, respectively).

Dongkyu Lee

Kwangwoon University, Korea

Title: Fast motion estimation for HEVC on graphics processing unit (GPU)

Time : 13:45-14:10

Speaker
Biography:

Dongkyu Lee received his B.S. and M.S. degrees in Electronic Engineering from Kwangwoon University, Seoul, Korea, in 2012 and 2014, respectively. He is a Ph.D. candidate at Kwangwoon University. His research interests are image and video processing, video compression, and video coding.

Abstract:

The recent video compression standard, HEVC (High Efficiency Video Coding), will most likely be used in various applications in the near future. However, the encoding process is far too slow for real-time applications. At the same time, the computing capabilities of GPUs (graphics processing units) have become ever more powerful. In this talk, we present a GPU-based parallel motion estimation (ME) algorithm to enhance the performance of an HEVC encoder. A frame is partitioned into two subframes for pipelined execution to improve GPU utilization, and the processing flow is restructured to remove data hazards in the pipelined execution. Two new methods are introduced in the proposed ME: decision of a representative search center position (RSCP) and warp-based concurrent parallel reduction (WCPR). The RSCP employs motion vectors of a co-located CTU (coding tree unit) in a previously encoded frame to resolve a dependency problem in parallel computation with negligible coding loss. WCPR concurrently executes several parallel reduction operations, which increases thread utilization from 20% to 89% without any thread synchronization (the sequential equivalent of the reduced computation is sketched below). The proposed encoder makes the ME portion of the encoding time negligible at a 2.2% bitrate increase over the HEVC test model (HM) encoder. In terms of ME alone, the proposed ME is 130.7 times faster than that of the HM encoder.
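
For orientation, here is the sequential CPU equivalent of the per-block work that such a GPU ME parallelizes: full-search block matching by sum of absolute differences (SAD). On the GPU, the SAD of each candidate is computed by a group of threads and the minimum is found with a parallel reduction (the step WCPR accelerates); this numpy version only illustrates the computation being distributed, not the authors' kernel.

```python
import numpy as np

def full_search_mv(cur, ref, bx, by, bs=16, sr=8):
    """Find the motion vector minimizing SAD for one bs x bs block.

    cur, ref : current and reference frames (2-D uint8 arrays)
    bx, by   : top-left corner of the block; sr: search range in pixels
    """
    block = cur[by:by + bs, bx:bx + bs].astype(np.int32)
    best, best_mv = None, (0, 0)
    for dy in range(-sr, sr + 1):
        for dx in range(-sr, sr + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue
            cand = ref[y:y + bs, x:x + bs].astype(np.int32)
            sad = np.abs(block - cand).sum()
            if best is None or sad < best:   # the "reduction": keep the minimum
                best, best_mv = sad, (dx, dy)
    return best_mv, best
```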

Speaker
Biography:

Dr Morrow is a specialist in paediatric rehabilitation. She completed her PhD in 2010 and is head of the Brain Injury Service at the Children’s Hospital at Westmead, Sydney, Australia. Her research interests include the role of applications in the delivery of paediatric health services and consumer engagement in the design and development of health interventions.

Abstract:

The BrightHearts app has been developed to teach children biofeedback-assisted relaxation techniques (BART) to manage pain and anxiety in health care settings. This digital artwork, which responds to changes in heart rate transmitted via a wireless pulse oximeter, was developed through an iterative design process incorporating qualitative data from health professionals and children as well as prototype exhibitions in hospital waiting areas. The final iteration of the work used in the pilot trial comprised an iPad app used in conjunction with a custom-built Bluetooth 4.0 wireless pulse oximeter that measures and transmits inter-beat interval data, which is then used to control changes in the appearance and sound of the app. In contrast to the object- and/or character-driven visuals used in many computer games and biofeedback displays, BrightHearts focuses the user's attention on gradual changes in a 'mandala'-like circular interface, encouraging a more relaxed quality of engagement. Users can contract successive layers of overlapping circular shapes using gentle, sustained exhalations: the more relaxed they become, the slower their average heart rate and the more layers they can draw inwards toward the center of the screen. BrightHearts has been successfully piloted for the management of procedural pain and anxiety in children aged 7-18 years and for the management of pain and anxiety associated with vaccination in a school-based vaccination programme for adolescents. BrightHearts is currently being evaluated in three randomised controlled trials, including a study evaluating its efficacy for managing chronic pain in children with cerebral palsy.

Speaker
Biography:

Zhaoming Guo began studying narrowband mobile data communication systems at a communication corporation in 1995. In March 1997, he proposed the concept of the ''mobile network computer'', combining his research experience with narrowband mobile data communication systems and the concept of the ''network computer'' raised by ORACLE's CEO. In June 2000 and July 2001, he published articles about the mobile network computer in ''China Computer Newspaper'' and ''China Wireless Communication''. Today, 20 years later, the concept of the mobile network computer still remains vigorous. He received his MS degree from Nanjing University of Science and Technology, China in 1995, and completed his PhD degree at Beijing Institute of Technology, China in 2016. In July 2015, he published another article on the mobile network computer in the SCI journal Wireless Personal Communications, titled "Mobile Network Computers Should be the Terminal of Mobile Communication Networks". In June 2016, a further article on the mobile network computer was accepted by the SCI journal China Communications, to be published at the end of the year, titled "Mobile Network Computer Can Better Describe the Future of Information Society".

Abstract:

The concept of a network computer was proposed by ORACLE CEO Larry Ellison in 1995, and the concept of a mobile network computer was put forward by Mr. Guo Zhaoming of China in March 1997. Today, nearly 20 years later, the concept of a mobile network computer still remains vigorous. We illustrate the importance of the concept from a technological perspective. Given the usefulness of mobile network computers and the growth of the Internet of Things, a modern mobile communication network should be referred to as a mobile computer network, and a modern mobile communication terminal should be referred to as a "mobile network computer" rather than by any other name. Mobile network computers may include not only TV-box audio-visual equipment, wireless household appliances and mobile communication equipment, but also devices such as intelligent foot rings, smart watches, smart glasses, smart shoes and smart coats. In a word, every device of the mobile Internet is a mobile network computer. We aim to popularize the concept of the mobile network computer for its accuracy and importance: it better defines modern mobile terminals and reflects their nature as integrated computers capable of processing multimedia. Also, an introduction to the integration of mobile communication and computer networks is provided, covering technology integration, business integration, network integration and IMS technology, for the purpose of giving people the opportunity to learn more about mobile computer networks and thereby better understand the concept of a mobile network computer. In the computer and Internet age, network computers and mobile network computers may be the main terminals of fixed and mobile networks. Therefore, based on the concept of the mobile network computer, we discuss the future of the information society.

Chetan Bhole

A9.com (Amazon's search engine subsidiary), USA

Title: Automated Person Segmentation in Unconstrained Video
Speaker
Biography:

Chetan Bhole is currently a machine learning scientist at A9.com (Amazon's search engine subsidiary) working on ranking problems. He completed his PhD in Computer Science at the University of Rochester in 2013, specializing in applying machine learning to computer vision. He has published more than 10 papers in conferences and journals, is a reviewer for reputed journals and a contributor to open source software.

Abstract:

Segmentation of people is an important problem in computer vision with uses in image understanding, graphics, security applications, sports analysis, education, etc. In this talk, I will summarize work done in this area and our contributions. We have focused on automatically segmenting a person from challenging video sequences. To keep the solution general, we place no constraint on camera viewpoint, camera motion or the movements of the person in the scene. Our approach uses the most confident predictions from a pose or stick-figure detector in key frames as anchors that guide the segmentation of other, more challenging frames in the video. Owing to the unreliability of state-of-the-art pose detectors on general frames, only the highest-confidence pose detections (key frames) are used. Features such as color, position and optical flow are extracted from key frames, and multiple conditional random fields (CRFs) are used to process blocks of video in batches: 2D CRFs produce detailed key-frame segmentations, and 3D CRFs propagate segmentations to the entire sequence of frames in each batch. Location information derived from the pose detector is also used to refine the results. Notably, no hand-labeled segmentation training data is required by our method. We discuss variants of the model and comparisons to prior work, and we contribute our evaluation data to the community to facilitate further experiments.

Xintao Ding

Anhui Normal University, China

Title: The global performance evaluation for local descriptors

Time : 15:00-15:25

Speaker
Biography:

Xintao Ding is an associate professor at Anhui Normal University, where he also completed his PhD. He has spent his entire career working in the field of computer vision and machine learning. He holds three patents and has published more than 10 papers in the areas of image processing and computer vision. He has worked on and managed many funded research projects developing computer vision for use across a range of applications.

Abstract:

Interest descriptors have become popular for obtaining image-to-image correspondence in computer vision tasks. Traditionally, local descriptors are evaluated mainly in a local scope, through repeatability, ROC curves, and recall versus 1-precision curves. These local evaluations do not take into account the application fields of the descriptors. Generally, local descriptors have to be refined before application so that they meet the demands of the global task. The toughness of the correspondence between two images depends on the number of true matches. Therefore, the number of correctly detected true matches (NoCDTM), i.e., the number of matches remaining after random sample consensus (RANSAC) refinement, is proposed as a global score of descriptor performance (a minimal sketch follows). A larger NoCDTM suggests a larger number of true matches and therefore a tougher correspondence. When the evaluation is run over a set of images, all the NoCDTM values may be shown directly in a pseudo-color image, in which the pseudo-color of each pixel gives the NoCDTM of one image. To show descriptor performance over an image set in an overall way, a histogram of NoCDTM may be employed: after dividing the range of the obtained NoCDTM values into several intervals, the occurrences of NoCDTM in every interval are counted to generate the histogram. A descriptor whose histogram has a fat tail performs well. It may thus be more reasonable to go beyond descriptors' local attributes and evaluate their performance in a global scope.
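
A minimal sketch of computing this global score for one image pair, assuming OpenCV: ORB stands in for whichever local descriptor is under evaluation, and the inliers surviving RANSAC homography estimation are counted as the NoCDTM. File names are placeholders.

```python
import cv2
import numpy as np

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)   # placeholder files
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)                 # descriptor being evaluated
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)

src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

nocdtm = int(inlier_mask.sum())   # matches kept by RANSAC = the global score
print("NoCDTM:", nocdtm)
```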

Break: Networking and Refreshments: 15:25-15:45 @ Foyer
Speaker
Biography:

El Habib Nfaoui is currently an Associate Professor of Computer Science at the University of Sidi Mohammed Ben Abdellah. He obtained his PhD in Computer Science from the University of Sidi Mohamed Ben Abdellah in Morocco and the University of Lyon (LIESP Laboratory) in France under a COTUTELLE (co-advising) agreement. His current research interests are information retrieval, the Semantic Web, social networks, machine learning, Web services, multi-agent systems, decision-making and modeling. He is a Guest Editor at the International Journal of Intelligent Engineering Informatics (ACM, DBLP…). He co-founded the International Conference on Intelligent Systems and Computer Vision (ISCV2015) and has served on the program committees of various conferences. He has published several papers in reputed journals and international conferences.

Abstract:

Microblogging platforms allow users to post short messages and content of interest, such as tweets and user statuses in friendship networks. Searching and mining microblog streams offer interesting technical challenges in many microblog search scenarios, where the goal is to determine what people are saying about concepts such as products, brands, persons, etc. However, retrieving short texts and determining the subject of an individual micropost present a significant research challenge owing to several factors: creative language usage, high contextualization, the informal nature of microblog posts and the limited length of this form of communication. Thus, microblog retrieval systems suffer from data sparseness and the semantic gap. To overcome these problems, recent studies on content-based microblog search have focused on adding semantics to microposts by linking short texts to knowledge base resources; previous studies use a bag-of-concepts representation obtained by linking named entities to their corresponding knowledge base concepts. In the first part of this talk, we review the drawbacks of these approaches. In the second part, we present a graph-of-concepts method that considers the relationships among the concepts matching named entities in the short text and their related concepts, and contextualizes each concept in the graph by leveraging the linked nature of DBpedia as a Linked Open Data knowledge base together with graph-based centrality theory (a toy sketch follows). Finally, we present experimental results on a real Twitter dataset that show the effectiveness of our approach.
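
A toy sketch of the graph-of-concepts idea: nodes are DBpedia concepts matched in a post plus their linked neighbours, edges are DBpedia links, and a centrality measure ranks the concepts as candidate subjects. The concept names and links below are hard-coded stand-ins for what a real DBpedia lookup would return.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([                       # assumed DBpedia link structure
    ("dbr:IPhone", "dbr:Apple_Inc."),
    ("dbr:IPhone", "dbr:Smartphone"),
    ("dbr:Apple_Inc.", "dbr:Smartphone"),
    ("dbr:Apple_Inc.", "dbr:Steve_Jobs"),
])
rank = nx.pagerank(g)                    # graph-based centrality over concepts
print(max(rank, key=rank.get))           # most central concept ~ post subject
```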


Speaker
Biography:

Yolanda Mafikeni completed her Diploma at the age of 22 at Hartland Training & Development Centre. She is a Supervisor at Oodua Technologies and Investment Pty Ltd, a premier information technology, management and media service organization.

Abstract:

Multimedia learning is innovative and has revolutionised the way we learn online. It is important to create a multimedia learning environment that stimulates active participation and effective learning. The significance of multimedia learning extends to the cultivation of professional and personal experiences that reflect the reality of a traditional face-to-face classroom milieu. The difficulties of e-learning often relate to the absence of human-like presence and characteristics (Woo, 2009), leading to a need for research investigation in this area. A number of strategies have been used to foster effective learning and to create a social learning environment that reflects humanistic characteristics. The purpose of this article is twofold: (i) to examine the cognitive theory of multimedia learning (Mayer, 2001, 2002) and its relevance to multimedia presentations, and (ii) to discuss strategies of visualisation (e.g., static and dynamic visual representations), their relationship to multimedia learning, and the applicability and importance of multimedia learning to the enhancement of effective learning. Drawing from the evidence examined, we provide a conceptual framework that accentuates the integration of cognitive load theory and the theory of multimedia learning in e-learning. We discuss, for example, the use of animated pedagogical agents (APAs) to help establish a social learning environment that is conducive to learning and the promotion of critical thinking.

Speaker
Biography:

Dimitrios A. Karras received his Diploma and M.Sc. degree in Electrical and Electronic Engineering from the National Technical University of Athens, Greece in 1985, and his Ph.D. degree in Electrical Engineering from the National Technical University of Athens, Greece in 1995, with honours. From 1990 to 2004 he collaborated as visiting professor and researcher with several universities and research institutes in Greece. Since his election in 2004, he has been with the Sterea Hellas Institute of Technology, Automation Dept., Greece as associate professor in Digital Systems and Signal Processing, as well as with the Hellenic Open University, Dept. of Informatics, as a visiting professor in Communication Systems (the latter from 2002 to 2010). He has published more than 65 refereed journal papers in various areas of pattern recognition, image/signal processing and neural networks, as well as in bioinformatics, and more than 170 research papers in international refereed scientific conferences. His research interests span the fields of pattern recognition and neural networks, image and signal processing, image and signal systems, biomedical systems, communications, networking and security. He has served as program committee member of many international conferences, as well as program chair and general chair of several international workshops and conferences in the fields of signal, image, communication and automation systems. He is also editor-in-chief of the International Journal in Signal and Imaging Systems Engineering (IJSISE), academic editor in the TWSJ, ISRN Communications and the Applied Mathematics Hindawi journals, as well as associate editor of various scientific journals. He has been cited in more than 1400 research papers; his H/G-indices are 16/27 (Google Scholar) and his Erdos number is 5. His RG score is 30.07. He has been an industry consultant since 2009 and is a senior consultant at E.S.E.E. (Hellenic Confederation of Commerce & Entrepreneurship).

Abstract:

A novel methodology is outlined herein for multimedia data mining problems, based on designing a hierarchical pattern mining neural system. The proposed system combines the data mining decisions of different neural network pattern mining systems. Instead of the usual approach of applying voting schemes to the decisions of their output-layer neurons, the proposed methodology integrates higher-order features extracted by their upper-hidden-layer units. More specifically, different instances (cases) of each such pattern mining system, derived from the same training process but with different training parameters, are investigated in terms of their higher-order features through similarity analysis, in order to find repeated and stable higher-order features. All such higher-order features are then integrated through a second-stage neural network pattern mining system whose inputs are suitable similarity features of them (a rough sketch follows). The suggested hierarchical pattern mining neural system shows improved performance in a series of experiments on computer vision and face recognition databases. The validity of this combination approach has been investigated in cases where the first-stage neural pattern mining systems correspond to different Feature Extraction Methodologies (FEM) for either shape or face classification. The experimental study illustrates that such an approach, integrating higher-order features through similarity analysis of a committee of instances of the same pattern mining system together with a second-stage neural integration system, outperforms other combination methods, such as voting schemes, as well as single neural network pattern mining systems taking all FEM-derived features as inputs. In addition, it outperforms hierarchical combination methods that do not integrate cases through similarity analysis.
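
A rough sketch of the two-stage idea under stated assumptions (a generic illustration, not the authors' system): several instances of the same network are trained with different seeds, their hidden-layer features are compared across instances, only units that are stable (highly correlated) in every instance are kept, and a second-stage classifier is trained on those instead of voting on outputs.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

def hidden(clf, X):
    # hidden-layer activations via sklearn's public coefs_/intercepts_;
    # relu is MLPClassifier's default activation
    return np.maximum(0, X @ clf.coefs_[0] + clf.intercepts_[0])

# committee: same architecture and data, different training seeds
nets = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                      random_state=s).fit(X, y) for s in range(3)]
feats = [hidden(n, X) for n in nets]

# similarity analysis: keep units of net 0 that some unit of every other
# instance reproduces almost exactly (|correlation| > 0.9)
corr01 = np.corrcoef(feats[0].T, feats[1].T)[:32, 32:]
corr02 = np.corrcoef(feats[0].T, feats[2].T)[:32, 32:]
stable = (np.abs(corr01).max(1) > 0.9) & (np.abs(corr02).max(1) > 0.9)
if not stable.any():
    stable[:] = True          # fallback so the sketch always runs

second = LogisticRegression(max_iter=1000).fit(feats[0][:, stable], y)
print("stable units:", int(stable.sum()),
      "second-stage accuracy:", second.score(feats[0][:, stable], y))
```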

Speaker
Biography:

Gregor has been working in the games industry since 1999 for studios such as Crytek, Westka, Web.de AG, Acclaim, Ninja Theory, Visual Science, proper games ltd, MAXStudios.de and Chronos-Games, on AAA games like FarCry and others. He has also been involved in research and development projects with Mozilla related to WebGL and HTML5. He received his MSC in ISM while already working in the industry for 10 years, and his dissertation was about business models in the games industry.

Abstract:

Over the past four years we have seen the big players in the middleware and game engine licensing and development business change in a drastic way. Rather than charging substantial amounts of money (relative to a game developer's budget) up front, they have adopted per-developer-seat licenses and subscription models, and in some cases no up-front charges at all in return for a revenue share. This change has also extended beyond the core game engine and direct development tools sector. But what does this mean beyond the obvious potential to save money during the production of a project? The question becomes significant when looking at a game project over its entire lifetime. For example, releasing a game on a license where the middleware or engine provider takes a revenue share in return for waiving licensing fees during development has a significant impact on the relationship between a studio and its publisher. Beyond these implications for budget and monetisation, the change also has a drastic effect on the entire market landscape. Previously, game engine and middleware providers acted as gatekeepers that prevented small teams with low budgets from accessing the up-to-date technology used by the big studios. This has now changed, and can be seen as a democratization of the technology side of the game development business. So the question becomes: who will benefit from this change, and in what way?

Jamie Denham

Sliced Bread Animation, UK

Title: Animation: The Reality of Emotion
Speaker
Biography:

Jamie studied animation on the acclaimed course at Farnham in Surrey and has been in the field of animation production for over 18 years, during which time he has worked on a number of broadcast and commercial productions. He is now Managing Director of the London-based animation studio Sliced Bread Animation. They offer 2D and 3D animation, illustration and design for digital media projects, including virtual reality and augmented reality. Over the last 13 years they have successfully applied animation to all media platforms, from motion graphics, title sequences and TV commercials to online animation series and events.

Abstract:

Animation has long played an integral part in generating an emotional response to cinematic storytelling, but now the mould has become more fragmented: we are beginning to immerse ourselves in virtual worlds and to distort our own. What role, then, does animation play in manipulating and managing emotional levels? As humans we interact through connection, and ways of establishing that connection include joy, sadness and anger; is there a danger that these are amplified through audio and visual manipulation in the virtual space? Is there an onus on the auteur to show restraint and responsibility with cognitive stimulus? In my talk I plan to explore the connective aspects of emotional states, the fabric of storytelling and the virtual constructs we are beginning to enter.

Speaker
Biography:

Oscar Koller is a doctoral student researcher in the Human Language Technology and Pattern Recognition Group led by Prof. Ney at RWTH Aachen University, Germany. He joined the group in 2011 and is jointly supervised by Prof. Bowden and his Cognitive Vision group at the University of Surrey, UK, where he spent 12 months as a visiting researcher. His main research interests include sign language and gesture recognition, lip reading, speech recognition and machine translation.

Abstract:

Observing nature inspires us to find answers to difficult technical problems. Gesture recognition is a difficult problem, and sign language is its natural source of inspiration. Sign languages, the natural languages of the Deaf, are as grammatically complete and rich as their spoken language counterparts. Science discovered sign languages a few decades ago, and research promises new insights into many different fields, from automatic language processing to action recognition and video processing. In this talk, we will present our recent advances in the field of automatic gesture and sign language recognition. As sign language conveys information through different articulators in parallel, we process it multi-modally. In addition to hand shape, this includes hand orientation, hand position (with respect to the body and to each other), hand movement, the shoulders and the head (orientation, eyebrows, eye gaze, mouth). Multi-modal streams occur partly synchronously, partly asynchronously. One of our major contributions is an approach to training statistical models that generalise across different individuals while only having access to weakly annotated video data. We will focus on a new approach to learning a frame-based classifier on weakly labelled sequence data by embedding a CNN within an iterative EM algorithm. This allows the CNN to be trained on a vast number of example images when only loose sequence-level information is available for the source videos. Although we demonstrate this in the context of sign language, the approach has wider application to any video recognition task where frame-level labelling is not available.
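As a rough illustration of embedding a classifier inside an iterative EM loop over weakly labelled sequences, the following Python sketch uses an sklearn MLP as a stand-in for the CNN; the uniform initialisation and the per-frame relabelling rule are simplified assumptions, not the authors' exact algorithm.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def em_train(sequences, weak_labels, n_iter=5):
    """sequences: list of (n_frames_i, n_feat) arrays.
    weak_labels: list of label sets, one per sequence (weak annotation)."""
    # Initialisation: spread each sequence's labels uniformly over its frames.
    frame_y = [np.resize(sorted(ls), len(seq))
               for seq, ls in zip(sequences, weak_labels)]
    clf = None
    for _ in range(n_iter):
        X = np.vstack(sequences)
        y = np.concatenate(frame_y)
        # M-step: retrain the frame classifier on the current frame labels.
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, y)
        # E-step: relabel each frame with its most probable label,
        # restricted to the sequence's weak label set.
        for i, (seq, ls) in enumerate(zip(sequences, weak_labels)):
            proba = clf.predict_proba(seq)
            cols = [list(clf.classes_).index(l) for l in sorted(ls)]
            frame_y[i] = np.array(sorted(ls))[proba[:, cols].argmax(axis=1)]
    return clf
```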

Gayane Shalunts

Sail Labs Technology, Austria

Title: Segmentation of Building Facade Tower
Speaker
Biography:

Gayane Shalunts has completed her PhD in Computer Vision at the Institute of Computer Aided Automation. She has been working as a Software Engineer and Researcher at Sail Labs Technology in Austria since May 2013.

Abstract:

Architectural styles are phases of development that classify architecture in the sense of historic periods, regions and cultural influences. The article presents the first approach performing automatic segmentation of building facade towers in the framework of an image-based architectural style classification system. The observed buildings, featuring towers, belong to the Romanesque, Gothic and Baroque architectural styles. The method is a pipeline unifying bilateral symmetry detection, graph-based segmentation and image analysis and processing techniques. It employs the specific visual features of the tower, an outstanding architectural element: vertical bilateral symmetry, rising out of the main building, and solidity. The approach is robust to strong perspective distortions. It comprises two branches, targeting facades with single and double towers respectively. The performance evaluation on a large number of images reports very high segmentation precision.
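The vertical bilateral symmetry cue at the heart of the pipeline can be illustrated with a short Python sketch that scores each image column as a candidate mirror axis; the real system combines this with graph-based segmentation, which is omitted here.

```python
import numpy as np

def best_symmetry_axis(gray, half_width=40):
    """Score each column as a mirror axis; lower score = more symmetric."""
    gray = gray.astype(float)
    h, w = gray.shape
    scores = np.full(w, np.inf)
    for c in range(half_width, w - half_width):
        left = gray[:, c - half_width:c]
        right = gray[:, c:c + half_width][:, ::-1]  # mirrored right half
        scores[c] = np.mean((left - right) ** 2)
    return int(np.argmin(scores))
```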

Speaker
Biography:

Xiaosong Yang is currently a Principal Academician at the National Centre for Computer Animation, Bournemouth University, United Kingdom. He has produced more than 60 peer-reviewed publications, including international journal articles and conference papers. He has secured over 10 research grants from the European Commission, Wessex AHSN, the British Academy, Leverhulme, the Department for Business, Innovation & Skills (UK), the Higher Education Innovation Fund, etc. He is a member of the International Program Committee for several international conferences, and a reviewer for many peer-reviewed journals. He has given several invited talks and keynote presentations internationally.

Abstract:

The introduction of animation techniques such as motion capture, virtual reality, modelling and simulation into film production has revolutionized the entire film industry. We (the National Centre for Computer Animation, NCCA), as the No. 1 UK research and education base for computer animation, are endeavouring to bring these state-of-the-art animation techniques into the health industry and benefit more people by improving the efficiency and efficacy of healthcare services. Since 1989, the NCCA (winner of the UK Queen's Anniversary Prize in 2012) has been at the forefront of computer animation education and research in the UK, and our graduates have made a global impact on the film industry, including work on the films Gravity, Inception and Avatar. We have prioritised multidisciplinary applications of our computer animation technology beyond film production, especially in the digital health area. In the past five years, we have successfully developed several medical projects in cooperation with doctors and local hospitals. For example, the "Augury" project, a sophisticated colorectal surgery simulator developed with consultant surgeons from Bournemouth & Poole NHS; "Neuravatar", an intelligent virtual avatar that guides GPs in making neurological diagnoses in clinical practice, guided by Dr. Rupert Page and funded by AHSN; and "Digital Psychiatrist", a facial and emotional recognition system performing Mental State Examination based on videos and images of patients, developed in collaboration with Dr. Wai Chen.

Cyrille Gaudin

University of Toulouse Jean Jaurès, France

Title: The use of video technology in teacher training
Speaker
Biography:

A review of the research literature reveals that video technology has been increasingly employed over the past 10 years in the training of teachers, in all subject areas, at all grade levels, and all over the world (Gaudin & Chaliès, 2015). The literature presents three main reasons for the growing reliance on video in teacher training. First, video gives teachers greater access to classroom events than classic observation without sacrificing "authenticity"; it thus constitutes a choice "artifact of practice" that creates a link between traditional theoretical education at the university and classroom practice. Second, technical progress has greatly facilitated video use: digitalization, vastly improved storage capacities and sophisticated software have all contributed to the development of video in the framework of professional practice analysis. Last, video technology is increasingly used as a means to facilitate the implementation of institutional reforms. The principal aim of this communication is first to present the different possible uses of video technology for teacher training, and then to identify new avenues for innovation.

Abstract:

Cyrille Gaudin has completed his PhD in Education Sciences at the University of Toulouse Jean Jaurès. He is the Head of a Master's Program at the Toulouse High School of Teaching and Education. He has helped organize international seminars for the Consortium of Institutions for Development and Research in Education in Europe (CIDREE). He is also a member of the European Association for Research in Learning and Instruction (EARLI). He recently published a literature review on the use of video technology in teacher training in Educational Research Review.

Speaker
Biography:

Gholamreza Anbarjafari received his B.Sc., M.Sc., and Ph.D. degrees from the Department of Electrical and Electronic Engineering at Eastern Mediterranean University, North Cyprus, Turkey, in 2007, 2008 and 2010, respectively. He has been working in the field of image processing and currently focuses on research related to multimodal emotion recognition, image illumination enhancement, super resolution, image compression, watermarking, visualization and 3D modelling, and computer vision for robotics. He is involved in many national and international projects. He is currently Head of the iCV Research Group at the University of Tartu and works as an Associate Professor in the Institute of Technology.

Abstract:

The Internet has affected our everyday life drastically. An extensive amount of information is continuously exchanged over the Internet, which raises numerous security concerns. Issues such as content identification, document and image security, audience measurement, ownership and copyright, among others, can be settled by the use of digital watermarking. In this talk, a robust and imperceptible non-blind color image watermarking algorithm is discussed, which benefits from the fact that the watermark can be hidden in different color channels, resulting in further robustness of the technique against attacks. The method uses algorithms such as entropy, the discrete wavelet transform, the Chirp z-transform, orthogonal-triangular (QR) decomposition and singular value decomposition (SVD) in order to embed the watermark in a color image. As the values on the main diagonal of the R matrix in the QR decomposition, as well as the singular values obtained via SVD, are very large, the changes caused by the aforementioned attacks will not alter them significantly. The most famous signal processing attacks will be discussed, and the robustness of the technique against them will be explained.
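A minimal Python sketch of the DWT+SVD portion of such an embedding is given below; the entropy-based channel selection, Chirp z-transform and QR steps are omitted, and the Haar wavelet and strength parameter alpha are illustrative assumptions.

```python
import numpy as np
import pywt

def embed(channel, watermark, alpha=0.05):
    """channel: 2-D float array; watermark: 1-D array, long enough
    to cover the singular values of the LL subband."""
    LL, (LH, HL, HH) = pywt.dwt2(channel.astype(float), 'haar')
    U, S, Vt = np.linalg.svd(LL, full_matrices=False)
    Sw = S + alpha * watermark[:len(S)]   # perturb the (large) singular values
    LLw = (U * Sw) @ Vt
    # Keep S for non-blind extraction, as the method is non-blind.
    return pywt.idwt2((LLw, (LH, HL, HH)), 'haar'), S

def extract(channel_w, S_orig, alpha=0.05):
    LL, _ = pywt.dwt2(channel_w.astype(float), 'haar')
    Sw = np.linalg.svd(LL, compute_uv=False)
    return (Sw - S_orig) / alpha
```

Because attacks perturb the large singular values only slightly in relative terms, the recovered watermark stays close to the embedded one, which is the robustness argument sketched in the abstract.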

Speaker
Biography:

Menna Sadek received a B.Sc. (Honors) in Computer Science in 2009 and an M.Sc. in Computer Science in 2015, both from the Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt. She currently works as a Teaching Assistant at the Basic Sciences Department. She has published three papers in reputed international journals and local conferences. Her research interests include steganography, encryption and information security.

Abstract:

Steganography is the art and science of secret communication. Modern cover types can take different forms. Nowadays, video streams are transmitted ever more frequently on internet websites, giving video steganography greater practical significance. A video can be considered a sequence of images, and information hiding in video encompasses a variety of techniques. Although great efforts have gone into developing these techniques, most of them suffer from intolerance to video processing attacks and lack any intelligent processing of the cover video. Adaptive video steganography, recently proposed in the literature, aims to achieve better stego-video quality by intelligently processing the cover according to some criteria. This helps to identify the best regions for data hiding, referred to as Regions Of Interest (ROI). Recent research showed that embedding data in human skin regions as ROI yields better imperceptibility and increases hiding robustness. In this work, a blind adaptive algorithm for robust video steganography is proposed. The proposed algorithm adaptively processes the cover video and hides data in its human skin regions. A skin map is created for each frame using a fast adaptive skin detection method. A blocking step then converts the skin map into a skin-block-map, discarding error-prone skin pixels and enhancing extraction quality. Next, the skin-block-map guides the embedding procedure. Finally, the secret bits are embedded in the detail coefficients of the red and blue components of each frame using a wavelet quantization-based algorithm to achieve robustness against MPEG-4 compression. The hiding capacity, imperceptibility, extraction accuracy and robustness against MPEG-4 compression of the proposed algorithm were tested. Results show the high imperceptibility of the proposed algorithm and its robustness against MPEG-4 compression.
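The following Python fragment sketches two of the described stages: a simple RGB skin rule with block-level filtering, and quantisation-based (QIM) embedding of bits in detail coefficients. The thresholds, block size and quantisation step are illustrative assumptions rather than the paper's exact values.

```python
import numpy as np

def skin_block_map(frame, block=8):
    """frame: (H, W, 3) uint8 RGB image -> boolean block map of skin blocks."""
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & (abs(r - g) > 15)
    h, w = skin.shape
    bm = skin[:h - h % block, :w - w % block]
    bm = bm.reshape(h // block, block, w // block, block)
    return bm.mean(axis=(1, 3)) > 0.9   # keep only solidly-skin blocks

def qim_embed(coeffs, bits, step=8.0):
    """Dithered quantisation of wavelet detail coefficients;
    assumes len(bits) <= coeffs.size."""
    flat = coeffs.ravel().astype(float).copy()
    for i, bit in enumerate(bits):
        q = np.round(flat[i] / step) * step
        flat[i] = q + (step / 4 if bit else -step / 4)  # odd/even dither
    return flat.reshape(coeffs.shape)
```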

Biography:

Dr.-Ing. Mohamed A. Karali is a Lecturer at the Mechanical Engineering Department, Faculty of Engineering and Technology, Future University in Egypt, specializing in Mechanical Power Engineering. Dr. Karali received his PhD from the Institute of Fluid Dynamics and Thermodynamics, Otto von Guericke University Magdeburg, Germany, in 2015, where he lectured to undergraduate and postgraduate students and joined projects with industry. He received his Bachelor of Science and Master's degrees in Mechanical Engineering in 2001 and 2007, respectively, from the Faculty of Engineering El-Mataria, Helwan University, Cairo, Egypt. His recent research interests include image processing techniques and their applications in rotary drum studies.

Abstract:

Image analysis is a powerful tool for solving different engineering problems in particle technology. It is the process of extracting important information from a digital image. Different image analysis methods have been used in studies of rotary drums; they can mainly be classified as manual and automated methods. The manual method depends on using appropriate software (such as ImageJ) for manual selection of the material. The automated method combines ImageJ with the Matlab image processing toolbox. In the present research the two methods were used and compared for studying flighted rotary drums under varying operating parameters: the number of flights used (12 and 18) and the rotational speed (from 1 to 5 rpm). The comparison between the two methods revealed that the manual method is more reliable and precise; however, it is much more time-consuming than the automated one, especially in light of the numerous photographs to be analyzed.
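For a flavour of what the automated route involves, the sketch below (in Python with scikit-image, standing in for the ImageJ/Matlab combination used in the study) thresholds a drum photograph and reports the material area fraction; the filename and the use of Otsu thresholding are assumptions for illustration.

```python
import numpy as np
from skimage import io, filters

# Hypothetical input image of a flighted drum cross-section.
img = io.imread('drum_frame.png', as_gray=True)
mask = img > filters.threshold_otsu(img)   # bright material vs dark background
print('material area fraction:', mask.mean())
```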

Speaker
Biography:

Ab Al-Hadi Ab Rahman obtained his Ph.D. degree from the École Polytechnique Fédérale de Lausanne, Switzerland, in 2013, his M.Eng. degree from Universiti Teknologi Malaysia in 2008, and his B.S. degree from the University of Wisconsin-Madison, USA, in 2004. His current research is mainly focused on algorithms and architectures for the new HEVC/H.265 video coding standard. He has authored and co-authored more than 25 journal and conference papers in the related field. He is also a member of IEEE and the Board of Engineers Malaysia. He is currently a Lecturer at Universiti Teknologi Malaysia.

Abstract:

The new HEVC/H.265 video coding standard was launched in November 2013 and promises more than 50% improved compression efficiency over existing media files. With it, however, comes a multitude of challenges on both the encoding and decoding sides. One of the major challenges we are currently tackling is computational complexity, which leads to roughly 400% longer video encoding times compared to the current AVC/H.264 standard. For perspective, a raw video of 100 frames (about 4 seconds of playback) at CIF resolution takes about 30 minutes to encode with the standard's reference software; extrapolating this latency, a UHD video of 180,000 frames could take at least a day to encode. In this talk, I will present some of the new algorithms we have developed and thoroughly tested, which can reduce encoding time by up to 72% with negligible loss in coding efficiency and video quality. The solution is based on three key areas of HEVC: 1) a new motion estimation algorithm to quickly obtain the motion vectors, 2) a new inter-prediction mode selection scheme to quickly determine the optimal PU mode, and 3) a new intra-prediction technique to quickly find the optimal CU mode. The application of these algorithms enhances the performance of the HEVC compression standard and makes it adaptable to mobile and handheld devices with resource constraints.
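As one concrete example of the kind of fast search that cuts encoding time, here is a generic small-diamond motion estimation sketch in Python; it is a textbook scheme given for orientation, not the talk's actual algorithm.

```python
import numpy as np

def sad(block, ref, y, x):
    """Sum of absolute differences between a block and a reference window."""
    h, w = block.shape
    return np.abs(block.astype(int) - ref[y:y + h, x:x + w].astype(int)).sum()

def diamond_search(block, ref, y0, x0, max_iter=16):
    """Greedy small-diamond search for the best-matching block in `ref`."""
    best, best_cost = (y0, x0), sad(block, ref, y0, x0)
    for _ in range(max_iter):
        y, x = best
        cands = [(y + dy, x + dx) for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                 if 0 <= y + dy <= ref.shape[0] - block.shape[0]
                 and 0 <= x + dx <= ref.shape[1] - block.shape[1]]
        cost, pos = min((sad(block, ref, cy, cx), (cy, cx)) for cy, cx in cands)
        if cost >= best_cost:
            break                      # centre already optimal: converged
        best_cost, best = cost, pos
    return best, best_cost
```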

Speaker
Biography:

Kun Guo has completed his PhD in Cognitive Neuroscience at the Shanghai Institute of Physiology, Chinese Academy of Sciences, and undertook postdoctoral training at the University of Oxford and the University of Newcastle. He is currently a Reader and lead of the Perception, Action and Cognition research group in the School of Psychology at the University of Lincoln. His research focuses on visual information processing and its relation to environmental statistics and human adaptive behavior. He has published more than 50 papers in leading academic journals and serves as an academic editor of PLoS One.

Abstract:

A central research question in natural vision is how to allocate fixations to extract informative cues for scene perception. With high-quality images, psychological and computational studies have made significant progress in understanding and predicting human gaze allocation in scene exploration and understanding. However, it is unclear whether these findings generalise to degraded naturalistic visual inputs. Here we combined psychophysical, eye-tracking and computational approaches to systematically examine the impact of image resolution and image noise (Gaussian low-pass filtering, circular averaging filtering, additive Gaussian white noise) on observers' gaze allocation and subsequent scene perception when inspecting both man-made and natural scenes. Compared with high-quality images, degraded scenes reduced perceived image quality and affected scene categorization, but this deterioration effect was scene content-dependent. Distorted images also attracted fewer fixations but longer fixation durations, shorter saccade distances and a stronger central fixation bias. The impact of image noise manipulation on gaze distribution was mainly determined by noise intensity rather than noise type, and was more pronounced for natural scenes than for man-made scenes. We further compared four high-performing visual attention models in predicting human gaze allocation in degraded scenes, and found that model performance lacked human-like sensitivity to noise type and intensity and was considerably worse than human performance measured as inter-observer variance. Our results indicate a crucial role of external noise intensity in determining scene-viewing gaze behaviour and scene understanding, which should be considered in the development of realistic human-vision-inspired attention models.
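The three noise manipulations named above can be reproduced with a few lines of Python; sigma, the disk radius and the SNR level are illustrative parameters.

```python
import numpy as np
from scipy import ndimage

def gaussian_lowpass(img, sigma=2.0):
    """Gaussian low-pass filtering (blur)."""
    return ndimage.gaussian_filter(img, sigma)

def circular_average(img, radius=3):
    """Circular (disk) averaging filter."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    disk = (x * x + y * y <= radius * radius).astype(float)
    return ndimage.convolve(img, disk / disk.sum())

def additive_white_gaussian(img, snr_db=20):
    """Additive Gaussian white noise at a chosen signal-to-noise ratio."""
    noise_power = img.var() / (10 ** (snr_db / 10))
    return img + np.random.normal(0, np.sqrt(noise_power), img.shape)
```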

Pascal Lorenz

University of Haute Alsace, France

Title: Architectures of Next Generation Wireless Networks
Speaker
Biography:

Pascal Lorenz received his M.Sc. (1990) and Ph.D. (1994) from the University of Nancy, France. Between 1990 and 1995 he was a research engineer at WorldFIP Europe and at Alcatel-Alsthom. He has been a professor at the University of Haute-Alsace, France, since 1995. His research interests include QoS, wireless networks and high-speed networks. He is the author/co-author of 3 books, 3 patents and 200 international publications in refereed journals and conferences. He was Technical Editor of the IEEE Communications Magazine Editorial Board (2000-2006), Chair of the Vertical Issues in Communication Systems Technical Committee Cluster (2008-2009), Chair of the Communications Systems Integration and Modeling Technical Committee (2003-2009) and Chair of the Communications Software Technical Committee (2008-2010).

Abstract:

Emerging Internet Quality of Service (QoS) mechanisms are expected to enable widespread use of real-time services such as VoIP and videoconferencing. The "best effort" Internet delivery model cannot support the new multimedia applications; new technologies and new standards are necessary to offer them Quality of Service. Therefore, new communication architectures integrate mechanisms that allow guaranteed QoS as well as high-rate communications. The service level agreement with a mobile Internet user is hard to satisfy, since there may not be enough resources available in some parts of the network the mobile user is moving into. The emerging Internet QoS architectures, differentiated services and integrated services, do not consider user mobility. QoS mechanisms enforce a differentiated sharing of bandwidth among services and users. Thus, there must be mechanisms available to identify traffic flows with different QoS parameters and to make it possible to charge users based on the requested quality. The integration of fixed and mobile wireless access into IP networks presents a cost-effective and efficient way to provide seamless end-to-end connectivity and ubiquitous access in a market where demand for mobile Internet services has grown rapidly and is predicted to generate billions of dollars in revenue.

Syed Afaq Ali Shah

The University of Western Australia, Australia

Title: Deep Learning for Image set based Face and Object Classification
Speaker
Biography:

Syed Afaq Ali Shah completed his PhD in 3D computer vision (feature extraction, 3D object recognition, reconstruction) and machine learning in the School of Computer Science and Software Engineering (CSSE), University of Western Australia, Perth. He held Australia's most competitive scholarships, including the Scholarship for International Research Fees (SIRF) and the Research Training Scheme (RTS). He has published several research papers in high-impact-factor journals and reputable conferences. Afaq has developed machine learning systems and various feature extraction algorithms for 3D object recognition. He is a reviewer for IEEE Transactions on Cybernetics, the Journal of Real-Time Image Processing and the IET Image Processing journal.

Abstract:

I shall present a novel technique for image set based face/object recognition, where each gallery and query example contains a face/object image set captured from different viewpoints, backgrounds, facial expressions, resolutions and illumination levels. While several image set classification approaches have been proposed in recent years, most of them represent each image set as a single linear subspace, a mixture of linear subspaces or a Lie group on a Riemannian manifold. These techniques make prior assumptions about the specific category of geometric surface on which images of the set are believed to lie, which can result in a loss of discriminative information for classification. The proposed technique alleviates these limitations with an Iterative Deep Learning Model (IDLM) that automatically and hierarchically learns discriminative representations from raw face and object images. In the proposed approach, low-level translationally invariant features are learnt by a Pooled Convolutional Layer (PCL), followed by Artificial Neural Networks (ANNs) applied iteratively in a hierarchical fashion to learn a discriminative non-linear feature representation of the input image sets. The proposed technique was extensively evaluated for image set based face and object recognition on the YouTube Celebrities, Honda/UCSD, CMU MoBo and ETH-80 (object) datasets. Experimental results and comparisons with state-of-the-art methods show that our technique achieves the best performance on all these datasets.
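A compact Python sketch of the IDLM structure described above follows, with random filters plus max-pooling standing in for the Pooled Convolutional Layer and sklearn MLPs for the iterative ANN stages; the filter counts, sizes and stage depth are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 5, 5))   # random convolutional bank (PCL)

def pcl(image, pool=4):
    """Pooled Convolutional Layer: conv + ReLU + max-pooling per filter."""
    maps = []
    for f in filters:
        m = np.maximum(convolve2d(image, f, mode='valid'), 0)
        h, w = m.shape
        m = m[:h - h % pool, :w - w % pool]
        maps.append(m.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3)))
    return np.concatenate([m.ravel() for m in maps])

def idlm_fit(images, y, n_stages=3):
    """Iterative hierarchical ANNs: each stage re-encodes the previous one."""
    X = np.array([pcl(im) for im in images])
    stages = []
    for _ in range(n_stages):
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, y)
        stages.append(clf)
        # Feed the hidden-layer code forward as the next stage's input.
        X = np.maximum(X @ clf.coefs_[0] + clf.intercepts_[0], 0)
    return stages
```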

Biography:

Qin-Zhen Guo received the B.S. degree in Automation from Hunan University in 2011. He is currently pursuing the Ph.D. degree at the High-tech Innovation Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include image retrieval, machine learning, and pattern recognition.

Abstract:

Fast approximate nearest neighbor search techniques play an important role in large-scale database search. Hashing-based methods, which convert the original data into binary codes, have two advantages: high retrieval efficiency and low memory cost. But due to the thick boundary in Hamming space, hashing-based methods cannot achieve ideal retrieval precision. Vector quantization methods, especially product quantization (PQ), which use a large codebook to quantize the data and thereby reduce the cardinality of the original data space, are another class of approximate nearest neighbor search methods. PQ-based methods also have two advantages: low memory cost and high retrieval precision. However, compared to hashing-based methods, their retrieval efficiency is lower. Considering the strengths and weaknesses of hashing and PQ methods, we have proposed a hierarchical method that combines the two. Since hashing methods have high retrieval efficiency, we first use them to filter out the obviously distant data, and then use PQ-based methods, with their better retrieval precision, to search the data retrieved by hashing. Experiments have shown that on large-scale databases, the hierarchical method achieves better results than hashing-based methods and higher retrieval efficiency than PQ-based methods.
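A toy version of this hash-then-PQ cascade can be written in a few lines of Python; the random-projection hashes, codebook sizes and shortlist length are illustrative assumptions (and a recent SciPy is assumed for kmeans2's seed argument).

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)

def fit(db, n_bits=16, n_sub=4, k=64):
    """db: (n, d) float array; d must be divisible by n_sub."""
    H = rng.normal(size=(db.shape[1], n_bits))   # random projections
    codes_h = db @ H > 0                         # binary hash codes
    books, labels = [], []
    for s in np.split(db, n_sub, axis=1):        # one codebook per subspace
        cb, lab = kmeans2(s, k, minit='++', seed=0)
        books.append(cb)
        labels.append(lab)
    return H, codes_h, books, np.stack(labels, axis=1)

def search(q, H, codes_h, books, codes_pq, shortlist=100):
    # Stage 1: cheap Hamming filter keeps only a shortlist of candidates.
    cand = np.argsort((codes_h != (q @ H > 0)).sum(axis=1))[:shortlist]
    # Stage 2: asymmetric PQ distances re-rank the shortlist precisely.
    qsub = np.split(q, len(books))
    tables = [((b - s) ** 2).sum(axis=1) for b, s in zip(books, qsub)]
    d = sum(t[codes_pq[cand, i]] for i, t in enumerate(tables))
    return cand[np.argsort(d)]
```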

Yao-Jen Chang

Industrial Technology Research Institute (ITRI), Taiwan

Title: Uni-/Bi-/Multi-Color Intra Modes for HEVC Screen Content Coding
Speaker
Biography:

Yao-Jen Chang received the M.S. and Ph.D. degrees in communication engineering from National Central University, Taiwan, in 2006 and 2010, respectively. Since 2011, he has been a researcher in Industrial Technology Research Institute (ITRI), Taiwan. He has published over 30 refereed papers and filed 29 patents in multiple engineering fields from communications to video coding technologies. He has actively contributed over 50 proposals to joint meetings of ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) for developing the H.265/HEVC and its extensions. His current research interests include H.265/HEVC, Screen Content Coding, HEVC Encoder Optimization, Future Video Codec, Machine Learning, and Adaptive Filtering.

Abstract:

High Efficiency Video Coding (HEVC) Screen Content Coding (SCC) has been standardized for screen-captured content. Because many areas are composed of texts and lines featuring non-smooth textures, traditional intra prediction is not suitable for those areas. Several new coding tools, such as the palette mode, intra block copy, the string matching mode, and the uni-color, bi-color and multi-color intra modes, were developed to address these issues during the joint meetings of MPEG and VCEG from 2014 to 2015. ITRI contributed many techniques to the uni-/bi-/multi-color intra modes to improve compression performance, and coordinated the core experiment activities of the uni-/bi-/multi-color intra modes to study their performance under HEVC SCC standard draft text 1. The concept of the color intra modes is to select a few samples out of the neighboring coding units to predict the pixels inside the current coding unit. In this talk, I will elaborate on the uni-/bi-/multi-color intra modes.
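To illustrate the flavour of these modes, here is a toy uni-color predictor in Python that fills a coding unit with the dominant neighbouring sample; the standardized modes additionally signal bi-/multi-color sets and per-pixel selection, which this sketch omits.

```python
import numpy as np

def unicolor_predict(neighbors, block_shape):
    """neighbors: 1-D array of reconstructed samples from adjacent CUs.
    Returns a prediction block filled with the most frequent colour."""
    vals, counts = np.unique(neighbors, return_counts=True)
    return np.full(block_shape, vals[np.argmax(counts)])
```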

Speaker
Biography:

Gang Wu is a Professor in the Department of Mathematics, School of Science, China University of Mining and Technology. He received the B.S. degree from the School of Mathematics, Shandong University, in 1998, the M.S. degree from the Department of Applied Mathematics, Dalian University of Technology, in 2001, and the Ph.D. degree from the Institute of Mathematics, Fudan University, in 2004. His current research mainly focuses on large sparse matrix computations, pattern recognition and data mining.

Abstract:

Recently, matrix-based methods have gained wide attention in the pattern recognition and machine learning communities. The generalized low rank approximations of matrices (GLRAM) and the bilinear Lanczos components (BLC) algorithm are two popular algorithms that treat data as native two-dimensional matrix patterns. However, these two algorithms often require heavy computation time and memory space in practice, especially for large-scale problems. In this talk, we propose inexact and incremental bilinear Lanczos components algorithms for high dimensionality reduction and image reconstruction. We first introduce the thick-restarting strategy into the BLC algorithm and present a thick-restarted Lanczos components algorithm (TRBLC). In this algorithm, we use Ritz vectors as approximations to dominant eigenvectors instead of the Lanczos vectors. In our implementation, the iterative matrices are neither formed nor stored explicitly, thanks to the characteristics of the Lanczos procedure. We then explore the relationship between the reconstruction error and the accuracy of the Ritz vectors, so that the computational complexity of the eigenpairs can be reduced significantly. As a result, we propose an inexact thick-restarted Lanczos components algorithm (Inex-TRBLC). Moreover, we investigate the problem of incremental generalized low rank approximations of matrices and propose an incremental and inexact TRBLC algorithm (Incr-TRBLC). Numerical experiments illustrate the superiority of the new algorithms over the GLRAM algorithm and its variations, as well as the BLC algorithm, for some real-world image reconstruction and face recognition problems.
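For reference, the plain GLRAM iteration that the talk improves upon can be sketched in a few lines of Python with dense eigendecompositions; the Lanczos-based variants replace exactly these dense steps.

```python
import numpy as np

def glram(As, l, n_iter=10):
    """As: list of equally sized (r, c) data matrices; l: target rank.
    Alternately updates left/right projections L (r x l) and R (c x l)."""
    r, c = As[0].shape
    R = np.eye(c, l)
    for _ in range(n_iter):
        ML = sum(A @ R @ R.T @ A.T for A in As)
        L = np.linalg.eigh(ML)[1][:, -l:]     # top-l eigenvectors
        MR = sum(A.T @ L @ L.T @ A for A in As)
        R = np.linalg.eigh(MR)[1][:, -l:]
    return L, R, [L.T @ A @ R for A in As]    # projections + compressed cores
```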

Speaker
Biography:

Takashi Nakamura completed his PhD at the age of 28 at Kobe University. He is a Professor of Media Studies in the Faculty of Humanities at Niigata University. He has published more than 20 papers (including some in Japanese) and two books in Japanese (one as sole author and the other as sole editor). He is an editorial board member of Annals of Behavioural Science.

Abstract:

This presentation focuses on the action of looking at a mobile phone display as a type of nonverbal behavior/communication and compares it cross-culturally. The diversity of nonverbal behavior/communication was considered to be caused by the difference between Western and non-Western cultures. A questionnaire was conducted in three countries (the USA, Hong Kong and Japan), and a total of 309 subjects participated. The participants were asked to record their opinions of the action according to the situation with "co-present" familiar persons. The analysis showed that the difference between the USA and Japan was more pronounced the more intimate the relationship with the "co-present" person was. The results of the Hong Kong sample were intermediate between those of the other two countries. The diversity is discussed in terms of the independent/interdependent self, from the perspectives of cultural comparison and of mobile phone usage. The analysis revealed that the action, as a form of nonverbal behavior/communication, functions in human relationships and has become deeply embedded in culture in the mobile phone era.

Wang Xufeng

School of Aeronautics and Astronautics Engineering, Air Force Engineering University, China

Title: Real-time Drogue Measurement for Autonomous Aerial Refueling Based on Computer Vision
Speaker
Biography:

Wang Xufeng received the B.S. and M.S. degrees from Air Force Engineering University in 2011 and 2013, respectively, where he is currently pursuing the Ph.D. degree. He has been a visiting scholar with the Department of Computer Science and Technology, Tsinghua University since 2014. His research interests include autonomous aerial refueling, computer vision and deep learning.

Abstract:

Autonomous aerial refueling (AAR) has been playing an increasingly important role in improving aircraft capability. During the docking phase of probe-and-drogue AAR, one of the key problems is drogue measurement, which includes drogue detection and recognition, drogue spatial locating and drogue attitude estimation. To solve this problem, a novel and effective method based on computer vision is presented. For drogue detection and recognition, considering safety and robustness to drogue diversity under changing environmental conditions, a highly reflective red-ring-shaped feature is placed on the parachute part of the drogue instead of a set of infrared light emitting diodes (LEDs), and computer vision with prior domain knowledge incorporated is used to achieve optimal performance. For drogue spatial locating and attitude estimation, in order to ensure the accuracy and real-time performance of the entire system, a monocular vision method is designed based on a camera calibration model that accounts for lens distortion, chosen for its simple structure and high operation speed. Experiments demonstrate the effectiveness of the proposed method, and a practical implementation that considers the effect of airflow is provided. The results of the drogue measurement are analyzed, together with a comparison between the proposed method and competing methods. The results show that the proposed method can realize drogue measurement efficiently and satisfies the requirements of AAR.
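The monocular locating step can be illustrated with OpenCV's standard pose estimation; the marker geometry, calibration inputs and the function below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
import cv2

def drogue_pose(ring_pts_3d, image_pts, K, dist):
    """ring_pts_3d: (N, 3) marker positions on the red ring, drogue frame.
    image_pts: (N, 2) detected projections; K, dist: camera calibration."""
    ok, rvec, tvec = cv2.solvePnP(ring_pts_3d.astype(np.float32),
                                  image_pts.astype(np.float32), K, dist)
    R, _ = cv2.Rodrigues(rvec)   # attitude as a rotation matrix
    return R, tvec               # drogue attitude and spatial position
```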

Balamuralidhar P

Tata Consultancy Services, India

Title: Low Altitude Aerial Vision
Biography:

Balamuralidhar P has completed his PhD at Aalborg University, Denmark. He is a Principal Scientist and Head of TCS Innovation Labs Bangalore. He leads several research topics related to cyber-physical systems, including aerial sensing, cognitive computer vision, sensor informatics, and security & privacy. He has published more than 90 papers in reputed journals and has over 20 patents to his credit.

Abstract:

Low-altitude aerial imaging and analytics is attracting much business interest these days, thanks to the availability of affordable unmanned aerial vehicles and miniaturized sensors for cost-effective spatial data collection. Applications include the inspection of infrastructure and spatially distributed systems such as power lines, wind farms, pipelines, railways, buildings, farms and forests. These applications predominantly use vision-based sensing; however, multispectral, hyperspectral and laser-based mapping are also used in certain cases. Advances in image processing and computer vision research, coupled with high-performance embedded computing platforms, are generating interesting possibilities in this area. Traditional techniques along with deep learning and cognitive architectures are being explored to provide automatic analysis and assessment of the huge volumes of data acquired. In this talk some of our experiences and learnings on computer vision applications in these areas will be presented.