Embedded Machine Learning for Data Quality Enhancement in Smart Systems

Antonio Liotta 

Faculty of Computer Science, Free University of Bozen-Bolzano, Italy



The Internet of Things, the idea that the physical world around us can be digitized, monitored and controlled, is as fascinating as it is complex. IoT is a mix of smart and dumb ‘things’, a digital ecosystem that keeps growing in size and complexity, generating a vast variety of incomplete, unstructured data. IoT is emerging as one of the biggest big-data problems at hand, yet it is unlike any other data science project. It is a complex spatio-temporal problem, whereby data sources are heterogeneous, unreliable, unreliably connected, and often hard to correlate. So how can we make sense of IoT data? How can we avoid turning it into an unpredictable mess? And which hurdles do we need to overcome when it comes to smart systems?
In this talk, I explore the missed potential of Cloud-based smart systems, whereby the sensed data is transferred largely unprocessed to the Cloud. I argue that, to gain significant insights from IoT data, we need to initiate intelligent processes at the micro-edge (at the sensor nodes). By means of recent pilot studies, I illustrate the value of shallow learning and other lightweight learning methods, which may be employed to improve data quality and address communication and energy bottlenecks in typical smart systems. I advocate an extensive use of embedded machine learning to perform a range of data analysis tasks at the very edge of the IoT, employing intelligent processes for tasks such as data cleaning, missing-data management, compression, anomaly detection, and for self-tuning the data collection itself. All in all, this talk is about smart methods to enhance data quality in smart systems.


Antonio Liotta is Full Professor at the Faculty of Computer Science, Free University of Bozen-Bolzano (Italy), where he teaches Data Science and Machine Learning. Antonio’s passion for artificial intelligence has driven his academic career through the meanders of artificial vision, e-health, intelligent networks and intelligent systems. Antonio’s team is renowned for its contributions to micro-edge intelligence and miniaturized machine learning, which have significant potential for harnessing data-intensive systems, for instance in the context of smart cities, cyber-physical systems, the Internet of Things, smart energy, and machine learning with humans in the loop. He led the international team that recently made a breakthrough in artificial neural networks, initiating a new research strand on “sparse neural networks for embedded learning”. Antonio was the founding director of the Data Science Research Centre at the University of Derby. He has set up several cross-border virtual teams and has been credited with over 350 publications involving, overall, more than 150 co-authors. Antonio is Editor-in-Chief of the Springer Internet of Things book series and associate editor of several prestigious journals. He is co-author of the books “Networks for Pervasive Services: Six Ways to Upgrade the Internet” and “Data Science and Internet of Things”.


Data Integration, Cleaning, and Deduplication: Research versus Reality

Robert Wrembel

Poznan University of Technology, Poland



In business applications, data integration is typically implemented as a data warehouse architecture. In this architecture, heterogeneous and distributed data sources are accessed and integrated by means of an Extract-Transform-Load (ETL) layer. This layer runs ETL processes whose tasks include data integration and homogenization, data cleaning, and deduplication. At the end of an ETL process, homogeneous and clean data are uploaded into a central repository - a data warehouse. Designing such processes is challenging due to the heterogeneity of data models and formats, data errors and missing values, and multiple data pieces representing the same real-world objects. As a consequence, ETL processes are very complex, which results in high development and maintenance costs as well as long runtimes.
To ease the development of ETL processes, various research solutions have been developed. They include, among others: (1) ETL design methods, (2) data cleaning pipelines, (3) data deduplication pipelines, and (4) performance optimization techniques. Although some of these solutions have been included in commercial ETL design environments and ETL engines, there are still multiple open issues in this research and technological area. Moreover, the application of these solutions in commercial projects reveals that they frequently do not fit business user requirements.
In this talk, I will provoke a discussion on the problems one can encounter while implementing ETL pipelines. The presented findings are based on my experience from research and commercial data integration projects in the financial, healthcare, and software development sectors. In particular, I will cover the following: (1) challenges in designing ETL processes, (2) faulty data and cleaning, (3) deduplicating large row-like data, and (4) open problems in optimizing ETL processes.


Robert Wrembel (PhD, Dr. Habil.) is an associate professor in the Faculty of Computing and Telecommunications at Poznan University of Technology (Poland). In 2008 he received a post-doctoral degree in computer science (habilitation), specializing in database systems and data warehouses. He was a deputy dean of the Faculty of Computing and Management (2008-2012) and the Faculty of Computing (2012-2016).
He was a consultant at the software house Rodan Systems (2002-2003) and a lecturer at Oracle Poland (1998-2005). Within the last 10 years he has carried out three research projects (two international) and three R&D projects (two for Samsung Electronics and one for a Polish company in the energy production sector). Currently he is carrying out a fourth R&D project for the largest Polish bank. At his university, he led the Erasmus Mundus Joint Doctorate Programme "Information Technologies for Business Intelligence - Doctoral College" (2013-2020). He cooperates with the IBM Software Lab in Kraków, Poland, and is an IT consultant at a private hospital.
Robert has visited numerous research and education centers, including: Universitat Politècnica de Catalunya - BarcelonaTech (Spain), Université Lyon 2 (France), Universidad de Costa Rica (Costa Rica), Klagenfurt University (Austria), Loyola University (USA), INRIA Paris-Rocquencourt (France), and Université Paris Dauphine (France). In 2012 he graduated from a two-month innovation and entrepreneurship program at Stanford University. In 2013 he did an internship at the BI company Targit (USA).

Are Telepresence Robots Here to Stay?

Janika Leoste

Tallinn University of Technology, Estonia



Hybrid education, both praised and cursed, seems set to become a reality, at least in higher education, in the upcoming years. The biggest threat we have seen so far is the low engagement of remote students when they are mediated via a teleconferencing system: they tend to be forgotten by the professor while the in-person students are in class. This, of course, leads to incomplete learning gains. Another threat remote learners tend to experience is social isolation, because it is difficult to immerse yourself in another location when you do not control your senses and body movements at that other location. Based on my recent research, I will contemplate whether a telepresence robot could cultivate a stronger feeling of social presence and enable richer communication, closer to that of in-person communication.


Janika Leoste is an associate professor of educational robotics at Tallinn University and a post-doctoral researcher of IT Didactics at Tallinn University of Technology, Estonia.
Her research interests lie in developing innovative didactical methods for technology-enhanced learning, technological educational innovations, and their sustainability. She also leads the “Creativity Matters” IT didactics research group at Tallinn University of Technology, whose main research focus is the didactical use of telepresence robots. In addition, Janika leads the interdisciplinary educational innovation collaboration cluster STEAM4EDU at Tallinn University, which aims to develop knowledge transfer with EdTech companies and to influence the digital education landscape in Estonia and across borders.