



SIGMOD 2021: Tutorials

Cohesive Subgraph Search over Big Heterogeneous Information Networks: Applications, Challenges, and Solutions

Abstract

With the advent of a wide spectrum of recent applications, querying heterogeneous information networks (HINs) has received a great deal of attention from both the academic and industrial communities. HINs involve objects (vertices) and links (edges) that are classified into multiple types; examples include bibliographic networks, knowledge networks, and user-item networks in e-business. An important component of these HINs is the cohesive subgraph, that is, a subgraph containing vertices that are densely connected internally. Searching for cohesive subgraphs over HINs has found many real applications, such as community search, event organization, and friend recommendation. Consequently, how to design effective cohesive subgraph models and how to efficiently search for cohesive subgraphs on large HINs have become important research topics in the era of big data. In this tutorial, we first highlight the importance of cohesive subgraph search over HINs in various applications and the unique challenges that need to be addressed. Subsequently, we conduct a thorough review of existing work on cohesive subgraph search over HINs. Then, we analyze and compare the models and solutions in these works. Finally, we point out new research directions. We believe that this tutorial not only helps researchers gain a better understanding of existing cohesive subgraph search models and solutions, but also provides them with insights for future study.

Presenters

Yixiang Fang (The Chinese University of Hong Kong, Shenzhen) 
Kai Wang (University of New South Wales) 
Xuemin Lin (University of New South Wales) 
Wenjie Zhang (University of New South Wales) 



Permissioned Blockchains: Properties, Techniques and Applications

Abstract

The unique features of blockchains such as immutability, transparency, provenance, and authenticity have been used by many large-scale data management systems to deploy a wide range of distributed applications, including supply chain management, healthcare, and crowdworking, in permissioned settings. Unlike permissionless settings, e.g., Bitcoin, where the network is public and anyone can participate without a specific identity, a permissioned blockchain system consists of a set of known, identified nodes that might not fully trust each other. While the characteristics of permissioned blockchains are appealing to a wide range of large-scale data management systems, these systems have to satisfy four main requirements: confidentiality, verifiability, performance, and scalability. Various approaches have been developed in industry and academia to satisfy these requirements with varying assumptions and costs. The focus of this tutorial is on presenting many of these techniques while highlighting the trade-offs among them. We demonstrate the practicality of such techniques in real life by presenting three different applications, i.e., supply chain management, large-scale databases, and multi-platform crowdworking environments, and show how those techniques can be utilized to meet the requirements of such applications.

Presenters

Mohammad Javad Amiri (University of Pennsylvania)
Divyakant Agrawal (University of California Santa Barbara)
Amr El Abbadi (University of California Santa Barbara)



A Deep Dive into Deep Learning Approaches for Text-to-SQL Systems

Abstract

Data is a prevalent part of every business and scientific domain, but its explosive volume and increasing complexity make data querying challenging even for experts. For this reason, numerous text-to-SQL systems have been developed that enable querying relational databases using natural language. The recent advances in deep neural networks, along with the creation of two large datasets specifically made for training text-to-SQL systems, have paved the way for a novel and very promising research area. The purpose of this tutorial is a deep dive into this area, covering state-of-the-art techniques for natural language representation in neural networks, benchmarks that sparked research and competition, recent text-to-SQL systems using deep learning techniques, as well as open problems and research opportunities.

Presenters

George Katsogiannis-Meimarakis (Athena Research Center, Greece)
Georgia Koutrika (Athena Research Center, Greece)



Practical Security and Privacy for Database Systems

Abstract

Computing technology has enabled massive digital traces of our personal lives to be collected and stored. These datasets play an important role in numerous real-life applications and research analyses, such as contact tracing for COVID-19, but they contain sensitive information about individuals. When managing these datasets, privacy is usually addressed as an afterthought, engineered on top of a database system optimized for performance and usability. This has led to a plethora of unexpected privacy attacks in the news. Specialized privacy-preserving solutions usually require a group of privacy experts, and they are not directly transferable to other domains. There is an urgent need for a general trustworthy database system that offers end-to-end security and privacy guarantees. In this tutorial, we will first describe the security and privacy requirements for database systems in different settings and cover the state-of-the-art tools that achieve these requirements. We will also show the challenges in integrating these techniques and demonstrate the design principles and optimization opportunities for these security- and privacy-aware database systems. This is designed to be a three-hour tutorial.

Presenters

Xi He (University of Waterloo)
Jennie Rogers (Northwestern University)
Johes Bater (Duke University)
Ashwin Machanavajjhala (Duke University)
Chenghong Wang (Duke University)
Xiao Wang (Northwestern University)



Not your Grandpa's SSD: The Era of Co-Designed Storage Devices

Abstract

The Solid-State Drive (SSD) landscape is in constant evolution. For years, this evolution was hidden behind the unchanging abstractions of block devices and POSIX I/O. However, these abstractions have become problematic. They hinder performance and no longer reduce software complexity. Such a state of affairs impacts the database community in at least two ways.
First, using SSDs through legacy interfaces that hide internal mechanisms invariably results in erratic performance. The blame often goes to SSDs' notoriously expensive garbage collection. In truth, several other complex processes result in non-linear effects in terms of latency and bandwidth. In this tutorial, we describe these processes and how they are implemented in modern devices. This knowledge will help system designers better choose SSDs and shape database workloads to match their performance characteristics.
Second, the inadequacy of the traditional I/O abstractions opens up an entire research field focused on the co-design of SSDs and database management systems (DBMSs). Such research aims at devising mechanisms and policies coupling the storage manager of a DBMS and SSD internals: e.g., placing an SSD FTL (its "brains") under the control of an application, changing SSD subsystems in response to the workload, or executing logic within an SSD on a database's behalf. In this tutorial, we describe the research opportunities and challenges through this continuum of DBMS/SSD co-design techniques, and present platforms supporting their simulation and prototyping.
We believe that those two areas (a more seamless integration of database and storage, and the study of SSD variations adapted to database computations) are central to the development of the next generation of database systems. This (opinionated) survey will equip both researchers and practitioners to enter the field.

Presenters

Alberto Lerner (University of Fribourg, Switzerland)
Philippe Bonnet (IT Univ Copenhagen, Denmark)



Deep Learning: Systems and Responsibility

Abstract

Deep neural networks enable numerous and diverse applications of machine learning. We present a tutorial on deep learning, highlighting the data systems nature of neural networks and research opportunities for the data management community. In particular, we focus on three critical aspects: 1) classic tradeoffs and design problems in neural networks which can be enriched if seen through a systems and data management perspective, e.g., thinking critically about storage, data movement, and computation; 2) classic data systems design problems which can be reconsidered if neural networks are treated as a viable design option, e.g., to replace or help system components that make decisions, such as a database optimizer; 3) important ethics considerations for the application of neural networks in critical human-facing problems in society and how these also link to data management and performance. While these are a diverse set of rich topics, their combination offers additional rich opportunities for future research. The tutorial is designed to be accessible to data management researchers with no background in neural networks. This tutorial can be offered in either a 1.5-hour or a 3-hour version.

Presenters

Abdul Wasay (Harvard University, USA)
Subarna Chatterjee (Harvard University, USA)
Stratos Idreos (Harvard University, USA)



AI Meets Database: AI4DB and DB4AI

Abstract

Databases and artificial intelligence (AI) can benefit from each other. On one hand, AI can make databases more intelligent (AI4DB). For example, traditional empirical database optimization techniques (e.g., cost estimation, join order selection, knob tuning, index and view advisors) cannot meet the high-performance requirements of large-scale database instances, various applications, and diversified users, especially on the cloud. Fortunately, learning-based techniques can alleviate this problem. On the other hand, database techniques can optimize AI models (DB4AI). For example, AI is hard to deploy because it requires developers to write complex code and train complicated models. Database techniques can be used to reduce the complexity of using AI models, accelerate AI algorithms, and provide AI capability inside databases. DB4AI and AI4DB have been extensively studied recently. In this tutorial, we review existing studies on AI4DB and DB4AI. For AI4DB, we review the techniques on learning-based database configuration, optimization, design, monitoring, and security. For DB4AI, we review AI-oriented declarative languages, data governance, training acceleration, and inference acceleration. Finally, we provide research challenges and future directions in AI4DB and DB4AI.

Presenters

Guoliang Li (Tsinghua University, China)
Xuanhe Zhou (Tsinghua University, China)
Lei Cao (MIT, USA)



Querying in the age of Graph Databases and Knowledge Graphs

Abstract

Graphs have become the best way we know of representing knowledge. The computing community has investigated and developed support for managing graphs by means of digital technology. Graph databases and knowledge graphs have surfaced as the most successful solutions to this program. This tutorial will provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.

Presenters

Marcelo Arenas (PUC Chile)
Claudio Gutierrez (Universidad de Chile, Chile)
Juan Sequeda (data.world)
