Table of Contents
- Mining Hidden Structures from Massive Unstructured Text
- Deep Knowledge from Shallow Data: Machine Learning on Wearables Data for Medical Insights into Chronic Conditions
- AI and BigData for Molecular Diagnostics, a Seegene’s Approach
- AsterixDB Meets Machine Learning
- AI-Powered Network Security
- Applying Deep Learning to New Vision Sensors for Extreme Imaging Conditions
- Towards Trustworthy Data Science
Mining Hidden Structures from Massive Unstructured Text
Prof. Jiawei Han
(University of Illinois at Urbana-Champaign, USA)
Real-world big data is largely dynamic, interconnected, unstructured text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data. Such approaches, however, are not scalable. We envision that massive text data itself may disclose a large body of hidden structures and knowledge. With pretrained language models and text embedding methods, transforming unstructured data into structured knowledge looks promising. In this talk, we introduce a set of methods recently developed in our group for such an exploration, including joint spherical text embedding, discriminative topic mining, taxonomy construction, text classification, and taxonomy-guided text analysis. We show that a data-driven approach is promising for transforming massive text data into structured knowledge.
Jiawei Han is the Michael Aiken Chair Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. He received his B.S. degree from the University of Science and Technology of China and his Ph.D. from the University of Wisconsin at Madison.
He served as Director of the Information Network Academic Research Center (INARC) (2009-2016), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab, and as co-Director of KnowEnG, a Center of Excellence in Big Data Computing (2014-2019) funded by the NIH Big Data to Knowledge (BD2K) Initiative.
His first-authored book, “Data Mining: Concepts and Techniques”, has been adopted as a data mining textbook worldwide and has received over 50,000 citations. His h-index is 185, which ranks him among the top five computer scientists worldwide. He received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and Japan's Funai Achievement Award (2018). He is a Fellow of ACM and a Fellow of IEEE.
Deep Knowledge from Shallow Data: Machine Learning on Wearables Data for Medical Insights into Chronic Conditions
Prof. Jaideep Srivastava
(University of Minnesota, USA)
There has been a recent epidemiological transition in the leading causes of death, from acute infectious diseases to chronic, non-communicable diseases, with an estimated cumulative output loss of over $47 trillion in the next two decades. There is increasing realization that healthcare needs to become more proactive and preventive. The conventional therapy model is episodic and reactive, with care provided when an event like a clinic or hospital visit happens. The next increment of patient data is collected at the next clinic/hospital visit, and the patient history, augmented with the latest increment of data, is used for deciding the course of action.
With patient information being collected only upon hospital/clinic visits, a barrier to proactive healthcare is the lack of visibility into the patient’s status during the long stretches of time between visits. The rising popularity and functionality of wearable devices, e.g., watches from Apple, Fitbit, etc., make them the perfect tool for ubiquitous sensing to fill this gap. Hence, an important component of the NIH’s implementation of the Precision Medicine Initiative is to collect data from biometric and physiological sensors, such as wearable devices and mobile phones. For widespread and longitudinal data collection, health science is moving towards the use of wearable devices. Monitoring people during their daily lives can provide valuable insight into the behavioral patterns related to various chronic conditions.
Continuous sensing generates large amounts of multi-modal data. A major drawback of the emerging scenario, however, is that medical professionals can be completely overwhelmed by the new data, which can be very large in scale and, being new, is not well understood. Without a set of tools that can help make sense of this data, it will remain largely unused. Scalable machine learning techniques have the potential to address this dilemma by providing insights from this massive data with varying levels of guidance required, i.e., unsupervised, semi-supervised, and supervised learning. For example, automated actigraphy can allow sleep disorder screening based on data from wearable devices, enabling proactive and early detection of sleep disorders like obstructive sleep apnea (OSA), a condition which affects over 22 million Americans and, if left untreated, can lead to choking as well as severe neurological and cardiac conditions.
The novel analytical power provided by machine learning can translate simple monitoring into medical knowledge discovery. These devices provide a platform for affordable, widespread population screening, diagnosis, prognosis, monitoring of patients on therapy, and detection of impending therapy non-adherence. In this talk we draw upon examples from sleep science and medicine, intensive care, and diabetes monitoring to illustrate how treatment decisions and therapy management programs can be improved, empowering clinicians, therapy program managers, and patients to move towards more proactive healthcare.
Jaideep Srivastava is a Professor in the Department of Computer Science and Engineering, University of Minnesota. He directs a laboratory focusing on applied machine learning research in the areas of Social Media and Health Informatics.
He received his B.S. degree from IIT-Kanpur, and M.S. and Ph.D. from UC Berkeley, all in computer science.
He has authored over 460 papers in journals and conferences and has been awarded 6 patents. Seven of his papers have won best paper awards. He has delivered over 150 invited talks around the world, including over a dozen keynote addresses. He has supervised 43 PhD dissertations and 68 MS theses. He has mentored a number of BS students as well as over a dozen high schoolers for science competitions. He has advised a number of large companies as well as startups and founded two startups himself. He has held advisory positions with the State of Minnesota and is an advisor to the UID project of the Government of India, whose goal is to provide biometrics-based identification to the 1.30+ billion citizens of India.
He is a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), has been an IEEE Distinguished Visitor and has been awarded the Distinguished Research Contributions Award of the PAKDD, for his lifetime contributions to the field of machine learning and data mining.
AI and BigData for Molecular Diagnostics, a Seegene’s Approach
Dr. Kyungoh Min
(Seegene, Inc., South Korea)
Molecular diagnostics is a rapidly developing area of medicine that investigates human, viral and microbial genomes. With the novel coronavirus afflicting every corner of the world, PCR (Polymerase Chain Reaction), the technology used for its diagnosis, has become a common term. This technology uses oligonucleotide reagents that help detect the presence of target pathogens in a sample by identifying and amplifying their genetic material.
Seegene’s proprietary multiplex techniques enable such detection to be carried out rapidly. This talk presents how Seegene uses computing technologies to design reagents for this purpose. Technologies such as machine learning and big data are currently being investigated to improve the quality of the ‘in silico’ system the company is building.
Kyungoh Min is the head of the Institute of Diagnosis Platforms of Seegene, Inc., a world leader in molecular diagnostics. He holds a PhD from the University of Illinois, Urbana-Champaign.
He recently joined Seegene, Inc. Seegene’s platform engineering organization is developing an AI-based computing system for designing oligonucleotide reagents, which are used to detect infectious diseases caused by viruses such as SARS-CoV-2.
Over the years, he has worked on various software projects in many different application areas, ranging from computerizing business processes to developing a subsystem of an operating system for high performance computing and IPTV set-top boxes. He also led embedded software engineering teams at Samsung Electronics and LG Electronics, both of which are renowned leaders in the consumer electronics industry. In particular, while at LG Electronics, he spearheaded an effort to develop a corporate-wide software platform for its product lines by adopting the webOS operating system, which was originally developed by Palm Inc. for mobile phones.
AsterixDB Meets Machine Learning
Prof. Michael J. Carey
(University of California at Irvine, USA)
Co-contributors: Ian Maxon and Phanwadee (Gift) Singthong (UC Irvine)
In the last few years, the field of data science has grown rapidly as businesses have adopted statistical and machine learning techniques to help drive their analyses and decision-making. Scaling data analyses to large volumes of data today typically involves the use and management of distributed frameworks; this can pose technical challenges for data analysts and reduce their productivity. This talk will briefly describe two ways in which the AsterixDB project has been working to alleviate this issue. The first is the development of AFrame (and now PolyFrame), a data analytics library that provides data scientists with the familiar Pandas DataFrame interface but transparently scales out the evaluation of analytical operations via a backend database system. This enables AFrame to leverage database management facilities (such as indexes and query optimization in AsterixDB) so that users can analyze large volumes of database-resident data as though it were "small data". The second is the provision of Python user-defined function (UDF) support in AsterixDB's SQL++ query language. This enables machine-learned models, such as models trained using scikit-learn or PyTorch, to be deployed as functions that can be called like other functions in SQL++ queries and thus applied in parallel to large volumes of data by AsterixDB's parallel query engine.
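As a rough illustration of the first idea, the sketch below shows how a pandas-like object could record operations and compile them into a single SQL++-style query string for a backend to evaluate. It is not the actual API of AFrame or PolyFrame: the `LazyFrame` class, its methods, the `Tweets` dataset, and the field names are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyFrame:
    """Hypothetical dataframe handle: records operations, runs nothing locally."""
    dataset: str
    predicates: Tuple[str, ...] = ()
    columns: Tuple[str, ...] = ()
    limit: Optional[int] = None

    def filter(self, predicate: str) -> "LazyFrame":
        # Remember the predicate; the backend will evaluate it later.
        return replace(self, predicates=self.predicates + (predicate,))

    def select(self, *columns: str) -> "LazyFrame":
        # Remember which fields to project.
        return replace(self, columns=tuple(columns))

    def head(self, n: int = 5) -> "LazyFrame":
        # Remember the row limit.
        return replace(self, limit=n)

    def to_query(self) -> str:
        # Compile all recorded operations into one SQL++-style query string.
        if self.columns:
            projection = ", ".join(f"t.{c}" for c in self.columns)
        else:
            projection = "VALUE t"
        parts = [f"SELECT {projection}", f"FROM {self.dataset} t"]
        if self.predicates:
            parts.append("WHERE " + " AND ".join(self.predicates))
        if self.limit is not None:
            parts.append(f"LIMIT {self.limit}")
        return " ".join(parts) + ";"


# Hypothetical dataset and field names, for illustration only.
query = (
    LazyFrame("Tweets")
    .filter('t.lang = "en"')
    .select("id", "text")
    .head(5)
    .to_query()
)
print(query)
```

Because nothing is evaluated until the query is requested, a backend database would be free to apply its own indexes and optimizer to the whole expression rather than to one operation at a time.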
Michael J. Carey is Bren Professor of Information and Computer Sciences and Distinguished Professor of Computer Science at UC Irvine, where he leads the AsterixDB project, as well as a Consulting Architect at Couchbase, Inc.
He received his B.S. and M.S. degrees from Carnegie Mellon University and his Ph.D. in 1983 from the University of California, Berkeley.
Before joining UCI in 2008, he worked at BEA Systems for seven years and led the development of their AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble. He is an ACM Fellow, an IEEE Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests center around data-intensive computing and scalable data management (a.k.a. Big Data).
AI-Powered Network Security
Prof. Elisa Bertino
(Purdue University, USA)
Networks are today a critical infrastructure. Their resilience against attacks is thus critical. Protecting networks requires a comprehensive security life cycle and the deployment of different protection techniques. To make defenses more effective, recent solutions leverage AI techniques. In this talk, we first discuss relevant directions for AI-based protection techniques, according to a security life cycle. We then present an overview of Polisma, a framework to learn access control policies from data; such an approach is critical to enable zero-trust architecture (ZTA). Polisma is based on a pipeline of different techniques to learn attribute-based access control (ABAC) rules from logs of access control decisions and potential context information obtained from external sources (e.g., LDAP directories). Polisma combines data mining, statistical, and machine learning techniques to learn access control rules that can then be easily understood by end users, auditors, and system administrators. We have experimentally evaluated Polisma using two datasets (real and synthetic). Experimental results show that Polisma is able to generate ABAC policies that accurately control access requests.
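To make the idea of learning ABAC rules from decision logs concrete, here is a deliberately simple sketch. It is not Polisma's pipeline: it swaps in a basic frequent-combination miner that keeps attribute-value combinations seen in enough permitted requests and in no denied ones, then retains only the most general of them. The attribute names, the log entries, and the `learn_rules` and `permits` helpers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_rules(log, max_conditions=2, min_support=2):
    """Mine attribute-value combinations that only ever appear in permitted requests."""
    permit_counts = Counter()
    denied = set()
    for attributes, decision in log:
        items = sorted(attributes.items())
        for size in range(1, max_conditions + 1):
            for combo in combinations(items, size):
                if decision == "permit":
                    permit_counts[combo] += 1
                else:
                    denied.add(combo)
    candidates = [
        frozenset(combo)
        for combo, count in permit_counts.items()
        if count >= min_support and combo not in denied
    ]
    # Keep only the most general rules: drop any rule that has a smaller rule inside it.
    rules = [r for r in candidates if not any(other < r for other in candidates)]
    return sorted(rules, key=lambda r: sorted(r))


def permits(rules, attributes):
    """A request is permitted if it satisfies every condition of at least one rule."""
    items = set(attributes.items())
    return any(rule <= items for rule in rules)


# Hypothetical access log: (request attributes, recorded decision).
log = [
    ({"role": "doctor", "dept": "cardiology", "resource": "record"}, "permit"),
    ({"role": "doctor", "dept": "oncology", "resource": "record"}, "permit"),
    ({"role": "nurse", "dept": "cardiology", "resource": "record"}, "deny"),
    ({"role": "nurse", "dept": "oncology", "resource": "record"}, "deny"),
    ({"role": "nurse", "dept": "oncology", "resource": "schedule"}, "permit"),
    ({"role": "nurse", "dept": "cardiology", "resource": "schedule"}, "permit"),
]

rules = learn_rules(log)
for rule in rules:
    print(" AND ".join(f"{k}={v}" for k, v in sorted(rule)), "-> permit")
```

Each learned rule is a short conjunction of attribute conditions, which is what makes this style of policy readable by auditors and administrators; a real system would also need to handle noisy logs, sparse coverage, and context attributes from external sources.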
Elisa Bertino is the Samuel D. Conte Professor of Computer Science at Purdue University and research director of CERIAS (the Center for Education and Research in Information Assurance and Security), an institute attached to Purdue University. She received her Ph.D. from the University of Pisa, Italy.
Prior to joining Purdue, she was a professor and department head at the Department of Computer Science and Communication of the University of Milan. She has been a visiting researcher at the IBM Research Laboratory (now Almaden) in San Jose, at the Microelectronics and Computer Technology Corporation, at Telcordia Technologies, and visiting professor at the Singapore Management University and the National University of Singapore.
Elisa Bertino is a Fellow of IEEE, ACM, and AAAS. She received the 2002 IEEE Computer Society Technical Achievement Award “For outstanding contributions to database systems and database security and advanced data management systems”, the 2005 IEEE Computer Society Tsutomu Kanai Award for “Pioneering and innovative research contributions to secure distributed systems”, the 2014 ACM SIGSAC Outstanding Contributions Award with citation “For her seminal research contributions and outstanding leadership to data security and privacy for the past 25 years”, and the 2019-2020 ACM Athena Lecturer Award. She recently received the IEEE 2021 Innovation in Societal Infrastructure Award “For advancing the security and privacy of new-generation cellular networks.” Her recent research focuses on cybersecurity and privacy of cellular networks and IoT systems, and edge analytics and machine learning for cybersecurity.
Applying Deep Learning to New Vision Sensors for Extreme Imaging Conditions
Prof. Yong Ju Jung
(Gachon University, South Korea)
Recent deep learning-based approaches have shown outstanding performance in generating visually plausible image content for various low-level vision tasks, such as image super-resolution, inpainting, colorization, high dynamic range imaging, and multi-image fusion. These approaches allow us to overcome the limitations of conventional image sensors when taking photos in extreme shooting conditions, such as low light, high dynamic range, and fast motion.
In this talk, I will summarize recent deep learning-based approaches that can be applied to computational photography and imaging. I will introduce two new methods inspired by human vision, which enable us to design new types of cameras for better imaging quality. I will discuss the novel concept of a peripheral vision sensor, which uses deep image colorization techniques to provide better low-light shooting performance than conventional Bayer cameras. I will also discuss a depth sensing approach that uses deep stereo matching algorithms with stereo event streams captured by event-driven vision sensors. Experimental results have shown that the proposed deep learning models can provide better imaging performance and hence allow the development of such new imaging devices.
Yong Ju Jung is an Associate Professor in the School of Computing, Gachon University. He received his Ph.D. degree from the Korea Advanced Institute of Science and Technology (KAIST) in 2005. From 2005 to 2010, he was a Principal Research Scientist with the Samsung Advanced Institute of Technology. From 2010 to 2014, he was a Research Professor at KAIST. From 2014 to 2015, he was a Principal Engineer with the System LSI division of Samsung Electronics, contributing to the development of image sensors and multi-camera solutions. His current research interests include image processing and computer vision, such as computational photography and imaging.
Towards Trustworthy Data Science
Prof. Jian Pei
(Simon Fraser University, Burnaby, British Columbia, Canada)
We believe data science and AI will change the world. No matter how smart and powerful an AI model we can build, the ultimate testament to the success of data science and AI is users' trust. How can we build trustworthy data science? At the level of user-model interaction, how can we convince users that a data analytic result is trustworthy? At the level of group-wise collaboration for data science and AI, how can we ensure that the parties and their contributions are recognized fairly, and establish trust between the outcome (e.g., a model built) of the group collaboration and the external users? At the level of data science eco-systems, how can we effectively and efficiently connect many participants in various roles and match the supply of and demand for data and models?
In this talk, I will brainstorm possible directions for addressing the above questions in the context of an end-to-end data science pipeline. To strengthen trustworthy interactions between models and users, I will advocate exact and consistent interpretation of machine learning models. Our recent results show that exact and consistent interpretations are not just theoretically feasible, but also practical even for API-based AI services. To build trust in collaboration among multiple participants in a coalition, I will review some progress in ensuring fairness in federated learning, including fair assessment of contributions and fairness enforcement in collaboration outcomes. Last, to address the need for trustworthy data science eco-systems, I will review some of the latest efforts in building data and model marketplaces and preserving fairness and privacy. Through reflection I will discuss some challenges and opportunities in building trustworthy data science for possible future work.
Jian Pei is a Professor in the School of Computing Science at Simon Fraser University. He received his B.Eng. and M.Eng. degrees in Computer Science from Jiao Tong University, China, and his Ph.D. from Simon Fraser University.
He is a leading researcher in the general areas of data science, big data, data mining, and database systems. His expertise is in developing effective and efficient data analysis techniques for novel data-intensive applications and transferring his research results to products and business practice.
He has published one textbook, two monographs and over 300 research papers in refereed journals and conferences, which have been cited extensively by others. His research has generated remarkable impact substantially beyond academia. For example, his algorithms have been adopted by industry in production and popular open-source software suites.
He is recognized as a Fellow of the Royal Society of Canada (Canada's national academy), the Canadian Academy of Engineering, the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). He was the editor-in-chief of the IEEE Transactions on Knowledge and Data Engineering (TKDE) in 2013-16, the chair of the Special Interest Group on Knowledge Discovery in Data (SIGKDD) of the Association for Computing Machinery (ACM) in 2017-2021, and a general co-chair or program committee co-chair of many premier conferences. He maintains a wide spectrum of industry relations with both global and local industry partners. He is an active consultant and coach for industry.
He has received many prestigious awards, including the 2017 ACM SIGKDD Innovation Award, the 2015 ACM SIGKDD Service Award, the 2014 IEEE ICDM Research Contributions Award, the British Columbia Innovation Council 2005 Young Innovator Award, an NSERC 2008 Discovery Accelerator Supplements Award (100 awards across the whole country), an IBM Faculty Award (2006), a KDD Best Application Paper Award (2008), an ICDE Influential Paper Award (2018), a PAKDD Best Paper Award (2014), and a PAKDD Most Influential Paper Award (2009).