Scientific Program

Conference Series Ltd invites participants from across the globe to attend the 4th Global Summit and Expo on Multimedia & Artificial Intelligence in Rome, Italy.

Day 2:

Keynote Forum

Alfonso Iniguez

Swarm Technology, USA

Keynote: Distributed artificial intelligence in robotics

Time: 10:00-10:45

Biography:

Alfonso Iniguez is the Founder of Swarm Technology, a company that focuses on intent-based computing and swarm robotics. He has published research papers in the areas of distributed artificial intelligence, computer modeling and design verification. Drawing inspiration from ants and octopuses, he originated the five principles of swarm intelligence. His patented technology enables the dynamic addition of processors for uninterrupted distributed processing within intent-based IoT edge processing and swarm robotics. He has held diverse engineering positions at Motorola, Freescale, Integrated Device Technology, and Microchip Technology. He holds an MS degree in Electrical Engineering from the University of Arizona and a BS degree in Computer Engineering from the Universidad Autónoma de Guadalajara, Mexico.

Abstract:

Background: Various companies and academic institutions are actively researching the field of swarm robotics. A survey of the topic reveals two distinct approaches: (A) each swarm member behaves autonomously without a central computer, e.g. Harvard University's 1024-robot swarm; (B) each swarm member is controlled through a central computer, e.g. the Intel drones showcased in Disney's light shows and at Super Bowl 2017.

Description of the Problem: In case A, the system falls into the realm of flocking behavior. Such a system suffers from limitations in: 1. Awareness: members are not aware of their available capabilities. 2. Autonomy: members must be told what to do. 3. Solidarity: members lack the ability to accomplish a mission using collective intelligence. In case B, members are slaves in a system controlled by a central computer. Such a system suffers from limitations in: 4. Expandability: members cannot be added dynamically. 5. Resiliency: the system lacks the ability to self-heal when members are removed.


Description of the Solution: Alfonso Iniguez is the first researcher to design an architecture that complies with the five principles of swarm intelligence: 1. Awareness: each member is aware of its available capabilities. 2. Autonomy: each member operates autonomously; this is essential to self-coordinate allocation of labor. 3. Solidarity: each member continuously volunteers its available capabilities until the mission is accomplished. 4. Expandability: members can be dynamically aggregated ad infinitum. 5. Resiliency: members can be removed while the system self-heals ad infinitum. The proposed solidarity cell architecture goes beyond flocking behavior and spectacular light shows. The technology will enable unmanned ground-air reconnaissance missions, precision farming, manufacturing robots, autonomous fleet management, and interplanetary exploration.
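
To make the five principles concrete, the toy sketch below (a hypothetical illustration, not the patented solidarity cell architecture) shows members that know their own capabilities, volunteer for work without a central dispatcher, and can be added or removed while the mission continues:

```python
# Toy illustration of the five principles (hypothetical; not the patented
# solidarity cell architecture). Members know their capabilities (awareness),
# self-assign work (autonomy, solidarity), can join at any time
# (expandability), and unfinished tasks survive removals (resiliency).
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    capabilities: set   # awareness: each member knows what it can do

class Swarm:
    def __init__(self, mission):
        self.mission = list(mission)   # tasks, each needing one capability
        self.members = []

    def add(self, member):             # expandability: join at any time
        self.members.append(member)

    def remove(self, member):          # resiliency: pending work stays pooled
        self.members.remove(member)

    def step(self):
        for member in self.members:
            # solidarity: each member volunteers for any task it can handle
            task = next((t for t in self.mission
                         if t["needs"] in member.capabilities), None)
            if task:                   # autonomy: no central dispatcher
                self.mission.remove(task)
                print(f"{member.name} completed {task['id']}")

swarm = Swarm([{"id": "map-area", "needs": "camera"},
               {"id": "lift-crate", "needs": "arm"}])
swarm.add(Member("drone-1", {"camera"}))
swarm.add(Member("rover-1", {"arm"}))
while swarm.mission:
    swarm.step()
```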


Keynote Forum

Daphne Economou

University of Westminster, UK

Keynote: Trends and challenges in virtual reality technology

Time: 10:45-11:30

Biography:

Daphne Economou has been a Senior Lecturer in the Department of Computer Science, Faculty of Science and Technology, University of Westminster since January 2006. She has a PhD in Virtual Reality Systems Design from Manchester Metropolitan University and an MA in Design for Interactive Media (Multimedia) from Middlesex University, and she is a Senior Fellow of the Higher Education Academy. She has published a long list of journal papers and peer-reviewed international conference papers, and has served on the programme committees of several international conferences. She has industrial experience as a Human Factors Engineer at Sony Broadcast and Development Research Labs, Basingstoke, UK, and is a member of the British Computer Society, IEEE and the British Interactive Media Association (BIMA). She has organized and chaired workshops at IEEE international conferences related to serious games.

Abstract:

For the last four decades, computer science researchers and industry have been working intensively to develop technology that would revolutionize the human experience of interacting with computers, as well as with each other, focusing their efforts and hopes on virtual reality (VR), augmented reality and mixed reality to realize this vision. Nowadays, with advances in head-mounted displays, mobile and networking technology, wearables, smart environments, artificial intelligence and machine learning, the infrastructure required to support seamless human interaction in VR and a rich user experience is falling into place. Application domains where VR has great impact span education and training, culture, e-commerce, tourism, healthcare, entertainment and new forms of broadcasting. However, the new advances of this technology and the application requirements create new challenges in terms of the interaction styles and design approaches that need to be adopted to ensure that users feel fully immersed in the computer-simulated or mixed-reality environment they interact with, and fully engaged in the activities in which they participate. There is a need for a user-centered design framework and design guidelines to support VR designers in creating stimulating environments and applications and to drive further VR technological development. The keynote speech will present the state of the art of VR technology, discuss the user experience challenges that derive from current trends in VR, and present some attempts of the Serious Games at Westminster Research Group (SG@W) to develop design guidelines for virtual human representation in VR and for the use of gamification as a design element to enhance user engagement in VR.

Break: Networking & Refreshments 11:30-11:50 @ Foyer
Aniello R Patrone

Anyline GmbH, Austria
Biography:

Aniello R Patrone is a Computer Vision Engineer at Anyline GmbH, Vienna, Austria. A computer scientist by education, he completed his Master's studies in Computer Vision in Naples, Italy, where he worked on the development of a marketed eye-tracker solution. His curiosity led him to pursue a PhD in Computer Vision at the Computational Science Center of the University of Vienna, Austria. He has a proven record of publications in scientific journals and presentations at international conferences. Stepping out of academia into industry, he worked on video surveillance systems and recently joined the innovative company Anyline GmbH.

Abstract:

The path of a research idea becoming a market product for customers' use is filled with unexpected events and challenges. This presentation will offer a look at the evolution of machine learning over the last ten years and at the technological shift from external devices to mobile devices for document scanning purposes. The story of the Document Scanner developed at Anyline GmbH in Vienna, Austria will start with an exemplary initial approach based on pure computer vision, analyzing its limitations and real-life issues. It will continue with the next step, the deep learning approach, in which some interesting CNN architectures will be presented and analyzed. Finally, a closer look will be taken at how to define image quality and how to implement it in a marketed product.
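
As a rough illustration of what such a pure computer-vision starting point can look like (an assumed sketch, not Anyline's actual pipeline), the classic recipe is adaptive thresholding followed by contour-based character candidate filtering:

```python
# Hypothetical pure computer-vision baseline for finding character candidates,
# of the kind a deep learning approach later replaces (assumed illustration,
# not Anyline's actual pipeline).
import cv2

def character_candidates(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Adaptive thresholding copes with the uneven lighting of mobile captures.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 15)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours]
    # Size filtering is where real-life issues bite: glare, blur and clutter
    # produce spurious or merged contours that simple rules cannot separate.
    return [(x, y, w, h) for (x, y, w, h) in boxes if 8 < h < 120 and w > 2]
```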


  • Virtual reality | Neural Networks | Artificial Intelligence | Image Processing | Computer Vision & Pattern Recognition | Multimedia Networking
Location: Olimpica 2
Speaker

Chair

Daphne Economou

University of Westminster, UK

Session Introduction

Hector Perez Meana

National Polytechnic Institute, Mexico

Title: Face expression recognition in constrained and unconstrained environments
Speaker
Biography:

Hector Perez Meana received his PhD degree in Electrical Engineering from the Tokyo Institute of Technology, Tokyo, Japan, in 1989. He is the Dean of the Graduate Studies and Research Section of the Mechanical and Electrical Engineering School, Culhuacan Campus, of the National Polytechnic Institute of Mexico. In 1991 he received the IEICE Excellent Paper Award, and in 2000 the IPN Research Award and the IPN Research Diploma. In 1998 he was the Chair of ISITA'98, and in 2009 the General Chair of the IEEE Midwest Symposium on Circuits and Systems (MWSCAS). He has published more than 150 papers in indexed journals and two books, and has supervised 20 PhD theses. He is a senior member of the IEEE and a member of the IEICE, the Mexican Researcher System and the Mexican Academy of Sciences. His principal research interests are adaptive systems, image processing, pattern recognition, watermarking and related fields.


Abstract:

Facial expression recognition (FER) systems are used to recognize a person's mood. Because determining the mood of a given person can be important in several practical applications, a number of efficient algorithms have been proposed to this end. Most of them achieve high recognition rates under controlled conditions of lighting and of the person's position with respect to the camera. Most FER systems use the Viola-Jones algorithm for face detection in both images and video frames. However, because the eye and mouth regions provide the most relevant information for FER, some segmentation scheme must be used to estimate the ROI used for feature extraction. Besides ROI estimation, the orientation of the face relative to the camera is another important issue, because if the person is not looking straight at the camera, partial occlusion of the face may occur, as may shadows due to poor illumination conditions. To reduce these problems, we propose an algorithm that detects the face orientation in the frame under analysis, such that the ROI is estimated only if the face is frontal to the camera. After ROI estimation, each region is segmented into a set of N×M blocks, and the feature vector is built from each block's modal value. The resulting feature matrix is then passed to PCA and LDA for dimensionality reduction. The proposed algorithm was trained on the KDEF database, which consists of 490 images divided into 7 facial expressions (afraid, angry, disgusted, happy, sad, surprised and neutral) of 70 people. Finally, the proposed system was tested on the HOHA database, which consists of 150 videos from 32 movies. The evaluation results show that the proposed system provides recognition rates of about 90%.
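
A minimal sketch of the described pipeline follows (reconstructed from the abstract, not the authors' code): Viola-Jones detection, block-wise modal values over the detected region, and PCA/LDA for dimensionality reduction; the grid size is an assumption.

```python
# Sketch of the described FER feature pipeline (reconstructed from the
# abstract, not the authors' code): Viola-Jones face detection, block-wise
# modal values, then PCA followed by LDA.
import cv2
import numpy as np

def block_modal_features(roi, n=8, m=8):
    """Split the ROI into an n-by-m grid and take each block's modal value."""
    h, w = roi.shape
    feats = []
    for i in range(n):
        for j in range(m):
            block = roi[i*h//n:(i+1)*h//n, j*w//m:(j+1)*w//m]
            values, counts = np.unique(block, return_counts=True)
            feats.append(values[np.argmax(counts)])   # modal gray level
    return np.array(feats, dtype=np.float32)

detector = cv2.CascadeClassifier(cv2.data.haarcascades +
                                 "haarcascade_frontalface_default.xml")

def face_features(gray_frame):
    for (x, y, w, h) in detector.detectMultiScale(gray_frame, 1.1, 5):
        yield block_modal_features(gray_frame[y:y+h, x:x+w])

# Dimensionality reduction on the stacked feature matrix X with labels y:
#   from sklearn.decomposition import PCA
#   from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
#   X = LinearDiscriminantAnalysis().fit_transform(PCA(50).fit_transform(X), y)
```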


Break: Lunch Break 13:00-14:00 @ Hotel Restaurant
Speaker
Biography:

Benjamin Seide has expertise in animation, visual effects and virtual reality. His visual effects work as an animation practitioner contributed to international feature films such as Roman Polanski's Oliver Twist, Wim Wenders' Don't Come Knocking and Martin Scorsese's Hugo. His research focuses on interdisciplinary collaboration between art and technology and on immersive media experiences such as 360° films, 3D stereoscopy and virtual reality.

Abstract:

Cultural heritage work commonly utilizes laser scanning, CGI, 360-degree imagery and photogrammetry, aiming to create photorealistic and accurate representations of historical environments. The goal of being as accurate and realistic as possible has not been fully accomplished yet, but considering the rate of improvement, virtual environments and augmented extensions will become indistinguishable from reality. A countermovement of artists and researchers creates artistic impressions of virtual environments, aiming not for photorealistic perfection but to add an interpretation to the debate on how the deeper meaning beyond the visual representation can best be conveyed. The philosopher Merleau-Ponty quotes Rodin: "It is the artist who is truthful, while the photograph lies; for, in reality, time never stops." This research project investigates the possibilities of impressionism in virtual reality by exploring and comparing the effect of stylized interpretations, photorealistic representations, and attempted but failed photorealism in virtual reality environments. I propose that the meaning of heritage is not just the form of a heritage site but can be understood as different layers. Interactive and immersive applications, such as augmented and virtual reality applications, enable us to explore alternative layers beyond basic image acquisition. These layers are commonly understood as additional information layers, from superimposed text providing more detailed information to animated CG characters performing a relevant historic scene inside the virtual environment. Layers of meaning can also be interpreted in a more artistic sense by creating impressions rather than photorealistic representations. These artistic impressions utilize animation, laser scanning and photogrammetry to create representations at an abstract, interpretive level, aiming to create a sense of atmosphere and trigger a stronger emotional response.

Speaker
Biography:

Shi Jinn Horng received the BS degree in Electronic Engineering from National Taiwan Institute of Technology, Taiwan, the MS degree in Information Engineering from National Central University, Taiwan, and the PhD degree in Computer Science from National Tsing Hua University, Taiwan, in 1980, 1984, and 1989, respectively. He is currently a Chair Professor in the Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology. His research interests include deep learning, biometric recognition and image processing.

Abstract:

Due to the difficulty of finding specific facial features, low-resolution face image recognition is one of the challenging problems in computer vision, and its recognition accuracy is still quite low. We have tried to solve this problem using deep learning techniques. The proposed method has two major parts: first, a restricted Boltzmann machine is used to preprocess the face images; then a deep convolutional neural network is used for classification. The data set was assembled from images provided by the Georgia Institute of Technology, Aleix Martinez, and Robert Benavente, and the training and testing processes were conducted on this combined data. The proposed method is the first to combine a restricted Boltzmann machine with deep convolutional neural networks for low-resolution face image recognition. The experimental results show that, compared to existing methods, the proposed method greatly improves recognition accuracy. The proposed method is shown in Figure 1, and the experimental results are shown in Table 1.
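
A rough sketch of this two-stage idea is given below (reconstructed from the abstract; the image size, layer sizes and hyperparameters are assumptions, not the authors' architecture), pairing scikit-learn's BernoulliRBM with a small convolutional classifier:

```python
# Two-stage sketch reconstructed from the abstract (sizes and
# hyperparameters are assumptions, not the authors' architecture).
import torch
import torch.nn as nn
from sklearn.neural_network import BernoulliRBM

# Stage 1: RBM preprocessing of flattened low-resolution faces.
# X_lowres: (n_samples, 32*32) arrays with pixel values scaled to [0, 1];
# the hidden activations act as an enhanced representation of each face.
rbm = BernoulliRBM(n_components=32 * 32, learning_rate=0.05, n_iter=20)
# X_rbm = rbm.fit_transform(X_lowres)   # then reshape to (n, 1, 32, 32)

# Stage 2: a small deep CNN classifies the preprocessed faces.
class FaceCNN(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.classifier = nn.Linear(64 * 8 * 8, n_classes)

    def forward(self, x):               # x: (batch, 1, 32, 32)
        return self.classifier(self.features(x).flatten(1))
```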



Kai Lung Hua

National Taiwan University of Science and Technology, Taiwan

Title: Multimodal image popularity prediction on social media
Speaker
Biography:

Kai Lung Hua received the BS degree in Electrical Engineering from National Tsing Hua University in 2000 and the MS degree in Communication Engineering from National Chiao Tung University in 2002, both in Hsinchu, Taiwan. He received the PhD degree from the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, in 2010. Since 2010, he has been with National Taiwan University of Science and Technology, where he is currently an Associate Professor in the Department of Computer Science and Information Engineering. He is a member of Eta Kappa Nu and Phi Tau Phi, as well as a recipient of the MediaTek Doctoral Fellowship. His current research interests include digital image and video processing, computer vision, and multimedia networking. He has received several research awards, including the Top 10% Paper Award at the 2015 IEEE International Workshop on Multimedia Signal Processing, the Second Award of the 2014 ACM Multimedia Grand Challenge, the Best Paper Award at the 2013 IEEE International Symposium on Consumer Electronics, and the Best Poster Paper Award at the 2012 International Conference on 3D Systems and Applications.


Abstract:

Social media websites are among the most important channels for content sharing and communication between users on social networks. Images posted on these websites, even ones from the same user, generally obtain very different numbers of views. This motivates researchers to predict the popularity of a candidate image on social media. To address this task, we investigate the effects of multimodal features drawn from the user profile, post metadata, and photo aesthetics. The proposed method is evaluated on a large number of real image posts from Flickr, and the experimental results verify its effectiveness.
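
A minimal sketch of this kind of multimodal fusion follows (the features and the regressor are assumptions for illustration, not the authors' exact model):

```python
# Minimal multimodal fusion sketch (the features and the regressor are
# assumptions for illustration, not the authors' exact model).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fuse(user_profile, post_meta, aesthetics):
    """Concatenate the per-modality feature vectors of one image post."""
    return np.concatenate([user_profile, post_meta, aesthetics])

# X: fused vectors for many Flickr posts; y: log(1 + view count), a common
# target transform for heavy-tailed popularity counts.
model = GradientBoostingRegressor(n_estimators=300, max_depth=4)
# model.fit(X, y); predicted_views = np.expm1(model.predict(X_new))
```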

Speaker
Biography:

Xiaodong Huang is an Associate Professor of Capital Normal University, China. He received his PhD degree in Computer Science from the Beijing University of Posts and Telecommunications in 2010, MS degree in Computer Science from the Beijing University of Posts and Telecommunications in 2006 and BS degree in Computer Science from Wuhan University of Technology in 1995. His research interests include pattern recognition and computer vision.

Abstract:

Compared with other video semantic clues such as gestures and motions, video text generally provides highly useful and fairly precise semantic information, whose analysis can to a great extent facilitate video and scene understanding. It can be observed that video text shows stronger edges. The nonsubsampled contourlet transform (NSCT) is a fully shift-invariant, multi-scale, and multi-direction expansion that preserves the edges/silhouettes of text characters well. Therefore, in this paper a new approach is proposed to detect video text based on the NSCT. First, the 8 directional coefficients of the NSCT are combined to build a directional edge map (DEM), which keeps the horizontal, vertical and diagonal edge features while suppressing edge features in other directions. The various directional pixels of the DEM are then integrated into a single binary edge image (BE). Based on the BE, text frame classification is carried out to determine whether a video frame contains text lines. Finally, text detection based on the BE is performed on consecutive frames to discriminate video text from non-text regions. Experimental evaluations on our collected TV video data set demonstrate that our method significantly outperforms three other video text detection algorithms in both detection speed and accuracy, especially under challenges such as video text with various sizes, languages, colors, fonts, and short or long text lines.
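
Since NSCT implementations are not available in mainstream Python libraries, the sketch below approximates the 8-direction decomposition with oriented Gabor filters; the DEM-to-BE construction follows the abstract, while all filter parameters are assumptions:

```python
# Approximate sketch of the DEM -> BE construction: mainstream Python
# libraries lack an NSCT implementation, so 8 oriented Gabor filters
# stand in for the 8 directional NSCT coefficients.
import cv2
import numpy as np

def binary_edge_map(gray, n_dirs=8):
    responses = []
    for k in range(n_dirs):
        theta = k * np.pi / n_dirs
        kern = cv2.getGaborKernel((15, 15), 3.0, theta, 8.0, 0.5)
        responses.append(np.abs(cv2.filter2D(gray.astype(np.float32), -1, kern)))
    dem = np.max(responses, axis=0)        # directional edge map (DEM)
    dem = cv2.normalize(dem, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    # Integrate the directional responses into one binary edge image (BE);
    # text pixels carry strong, consistent edge energy.
    _, be = cv2.threshold(dem, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return be
```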


Break: Networking & Refreshments 15:40-16:05 @ Foyer
Biography:

Daijin Kim received the BS degree in Electronic Engineering from Yonsei University, Seoul, South Korea, in 1981, and the MS degree in Electrical Engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, in 1984. In 1991, he received the PhD degree in Electrical and Computer Engineering from Syracuse University, Syracuse, NY. During 1992-1999, he was an Associate Professor in the Department of Computer Engineering at DongA University, Pusan, Korea. He is currently a Professor in the Department of Computer Science and Engineering at POSTECH, Pohang, Korea. His research interests include face and human analysis, machine intelligence and advanced driver assistance systems.

Abstract:

Recently, many face alignment methods using convolutional neural networks (CNNs) have been introduced due to their high accuracy. However, they do not achieve real-time processing due to their high computational cost. In this paper, we propose a three-stage convolutional neural regression network (CNRN) to achieve highly accurate face alignment in real time. The first stage consists of one CNRN that maps the facial image to the center positions of seven facial parts such as the eyes, nose and mouth. We obtain 68 local facial patches by aligning the center positions of the seven facial parts onto the mean shape. The second stage consists of seven independent CNRNs, where each CNRN maps the local facial patches within its facial part to their x and y displacements toward the target positions. We obtain the fitted whole facial features and produce a warped facial image from them. The third stage consists of one CNRN that maps the warped facial image to the appearance error. We repeat the second and third stages until the appearance error becomes small. The proposed method is fast because it first fits the facial parts and then the facial features within each part, in a coarse-to-fine manner, and because each CNRN is relatively simple. It is highly accurate because it trains the facial features iteratively, performing local regression on the facial features and global regression on the warped appearance image. In the experiments, the proposed method yields more accurate and stable face alignment and tracking under heavy occlusion and large pose variation than existing state-of-the-art methods, and runs in real time.
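
A minimal sketch of one second-stage regressor is given below (the patch and layer sizes are assumptions read from the abstract, not the authors' architecture): a small CNN maps a local facial patch to the (dx, dy) displacement of its landmark.

```python
# Sketch of one second-stage patch regressor (a minimal reading of the
# abstract, not the authors' architecture): a small CNN maps a local
# facial patch to the (dx, dy) displacement toward the target position.
import torch
import torch.nn as nn

class PatchCNRN(nn.Module):
    def __init__(self, patch=24):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * (patch // 4) ** 2, 64), nn.ReLU(),
            nn.Linear(64, 2))   # regress (dx, dy) for this landmark

    def forward(self, patch):   # patch: (batch, 1, 24, 24) grayscale crop
        return self.net(patch)

# The full method runs one such CNRN per facial part and iterates stages
# two and three until the third-stage appearance error becomes small.
```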


Yaqi Mi

Beijing University of Posts and Telecommunications, China

Title: A new matrix operation in compressed sensing
Speaker
Biography:

Yaqi Mi received her Bachelor of Science degree in Computer Science and Technology from Zhejiang University of Finance & Economics, Hangzhou, China in 2016. She is now working towards the Master of Science degree in Information Security at Beijing University of Posts and Telecommunications, Beijing, China. Her main interests are compressive sensing and signal processing.

Abstract:

In this speech, we propose a new matrix operation called the P-tensor product (PTP) and apply it to compressed sensing (CS); the resulting CS model is named PTP-CS. To break the restrictions of traditional matrix multiplication, the PTP matches the dimensions of two matrices by means of the Kronecker product. To address the large storage cost of the random matrix in CS, the PTP can construct a high-dimensional matrix from a small matrix, which can be chosen as a random matrix or a generalized permutation matrix. As in traditional CS, we analyze some reconstruction conditions of PTP-CS, such as the spark, the coherence and the restricted isometry property (RIP). The experimental results demonstrate that our PTP-CS model can not only increase the choice of the Kronecker matrix and decrease the storage of traditional CS, but also maintain considerable recovery performance.
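
A small numerical illustration of the Kronecker construction follows (the dimensions and the recovery solver are assumptions for demonstration, not the PTP-CS algorithm itself): only a small seed matrix is stored, yet it expands into a high-dimensional sensing matrix from which a sparse signal can be recovered.

```python
# Numerical illustration of building a high-dimensional sensing matrix
# from a small stored seed via the Kronecker product (dimensions and the
# OMP solver are assumptions for demonstration, not the PTP-CS algorithm).
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
seed = rng.standard_normal((4, 8))       # only this small matrix is stored
Phi = np.kron(seed, np.eye(16))          # expands to a 64 x 128 sensing matrix

x = np.zeros(128)                        # 5-sparse signal to recover
x[rng.choice(128, size=5, replace=False)] = rng.standard_normal(5)
y = Phi @ x                              # 64 compressed measurements

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False)
omp.fit(Phi, y)
print("recovery error:", np.linalg.norm(omp.coef_ - x))
```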