Ofey Chan, aka 'ofey404'

Pretending a subtitle is out there...



CMU 15-721 paper takeaway 1-5

Youtube Playlist: 15-721 Advanced Database Systems (Spring 2020)

01 - History of Databases

What goes around comes around

M. Stonebraker, et al., What Goes Around Comes Around, in Readings in Database Systems, 4th Edition, 2006 (Optional)

Main idea:

  1. Understand the history of databases so as not to repeat it.
  2. Most database ideas concern either the data model or the interface; few genuinely new ideas have appeared.
  3. Advantages in the database field: logical/physical (L/P) isolation (for agility and optimization) and ease of standardization.

Takeaway:

Systems:

| Data model | System | Interface |
| --- | --- | --- |
| Hierarchical tree | IBM IMS | DL/1, a record at a time, limited physical/logical independence |
| Hyperspace network | CODASYL | Navigating in the hyperspace, no physical/logical independence |
| Relational | System on VAX, IBM DB/2 | SQL, QUEL… |
| Entity-Relationship | Schema normalization tools | DBA tools |
| Object-oriented | Garden and Exodus | Certain programming language |
| Object-relational | Sybase | SQL + user-defined components |
| Semi-structured and XML | | |

What’s Really New with NewSQL?

A. Pavlo, et al., What’s Really New with NewSQL?, in SIGMOD Record (vol. 45, iss. 2), 2016 (Optional)

02 - In-Memory Databases (No In-Class Lecture)

Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores

X. Yu, et al., Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores, in VLDB, 2014

Main idea:

Seven concurrency control algorithms, in two schemes (2PL and timestamp ordering), evaluated on a 1024-core simulator.

Bottlenecks to scalability: lock-thrashing, preemptive abort, deadlock, timestamp allocation, memory copying.

| 2PL | T/O |
| --- | --- |
| low contention | higher contention |
| short transactions | longer transactions |
| KV workload | OLTP workload |
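To make the T/O side of this comparison concrete, here is a minimal sketch of the textbook basic timestamp-ordering rule, assuming a single in-memory record and no concurrency. The names `Record`, `to_read`, `to_write` and `Abort` are made up for illustration and are not taken from the paper's test bed.

```python
from dataclasses import dataclass


class Abort(Exception):
    """Raised when a transaction must abort and retry with a new timestamp."""


@dataclass
class Record:
    value: object = None
    read_ts: int = 0   # largest timestamp that has read this record
    write_ts: int = 0  # timestamp of the most recent writer


def to_read(rec: Record, ts: int):
    # A reader older than the last writer would observe a "future" value.
    if ts < rec.write_ts:
        raise Abort(f"read at ts={ts} is older than write_ts={rec.write_ts}")
    rec.read_ts = max(rec.read_ts, ts)
    return rec.value


def to_write(rec: Record, ts: int, value) -> None:
    # A writer older than a later reader or writer would break the order.
    if ts < rec.read_ts or ts < rec.write_ts:
        raise Abort(f"write at ts={ts} conflicts with a newer access")
    rec.value = value
    rec.write_ts = ts
```

No locks are held here: conflicts are resolved by aborting the transaction whose timestamp arrives "too late". For example, after a write at timestamp 5 and a read at timestamp 7, a write at timestamp 6 is rejected.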

Takeaway:

| Bottleneck | Direction |
| --- | --- |
| timestamp allocation | hardware counter, clock, and atomic addition |
| memory allocation, copying | CPU background copier, thread-local memory pool |
| no superior scheme | switch between schemes or hybrid approach |
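As a rough illustration of why timestamp allocation becomes a bottleneck, the sketch below contrasts a shared counter with a per-thread allocator that reserves timestamps in batches. Batching is an assumption added here for illustration, and `SharedCounter` and `BatchedAllocator` are hypothetical names; a lock stands in for a hardware atomic add, which Python does not expose.

```python
import threading


class SharedCounter:
    """Stand-in for an atomic fetch-and-add; a lock emulates atomicity."""

    def __init__(self) -> None:
        self._next = 1
        self._lock = threading.Lock()
        self.acquisitions = 0  # how many times the shared counter was touched

    def fetch_add(self, n: int) -> int:
        with self._lock:
            start = self._next
            self._next += n
            self.acquisitions += 1
            return start


class BatchedAllocator:
    """Per-thread allocator that reserves `batch` timestamps per shared access."""

    def __init__(self, counter: SharedCounter, batch: int) -> None:
        self._counter = counter
        self._batch = batch
        self._next = 0
        self._limit = 0  # exclusive upper bound of the reserved range

    def next_ts(self) -> int:
        if self._next >= self._limit:
            # Local range exhausted: reserve a fresh one from the shared counter.
            self._next = self._counter.fetch_add(self._batch)
            self._limit = self._next + self._batch
        ts = self._next
        self._next += 1
        return ts
```

With a batch size of 4, each allocator touches the shared counter once per four timestamps. The trade-off is that timestamps stay unique but are no longer handed out in global arrival order across threads.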

System:

Workload:

03 - Multi-Version Concurrency Control (Design Decisions)

An Empirical Evaluation of In-Memory Multi-Version Concurrency Control

Main idea:

Scaling MVCC in a modern multi-core, in-memory setting.

Key design decisions:

Takeaway:

System:

| Configuration | CC protocol | Storage scheme | GC | Index |
| --- | --- | --- | --- | --- |
| Oracle/MySQL | MV2PL | Delta | Vacuum | Logical |
| Postgres | MV2PL/MV-TO | Append-Only | Vacuum | Physical |
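To illustrate the append-only storage scheme in the table, here is a minimal sketch of a version chain with a snapshot visibility check. It assumes newest-to-oldest chain ordering and integer commit timestamps; `Version` and `VersionChain` are illustrative names, not any system's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional

INF = float("inf")


@dataclass
class Version:
    value: object
    begin_ts: int                      # commit timestamp that created this version
    end_ts: float = INF                # timestamp at which it was superseded
    older: Optional["Version"] = None  # next (older) version in the chain


class VersionChain:
    """Newest-to-oldest chain for one logical tuple (append-only storage)."""

    def __init__(self) -> None:
        self.head: Optional[Version] = None

    def install(self, value, commit_ts: int) -> None:
        # Close the validity interval of the current head, then prepend.
        if self.head is not None:
            self.head.end_ts = commit_ts
        self.head = Version(value, commit_ts, INF, self.head)

    def read(self, snapshot_ts: int):
        # Walk the chain until a version whose interval covers the snapshot.
        v = self.head
        while v is not None:
            if v.begin_ts <= snapshot_ts < v.end_ts:
                return v.value
            v = v.older
        return None  # the tuple did not exist at that snapshot
```

Every update adds a full new version, so long chains make reads slower until old versions are reclaimed, which is where the GC column comes in.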

Workload:

YCSB and TPC-C

04 - Multi-Version Concurrency Control [Protocols]

Fast Serializable Multi-Version Concurrency Control for Main-Memory Database Systems

Main idea:

Takeaway:

It’s good to know the internals of current database implementations. They may be simple, and outdated relative to state-of-the-art hardware.

E.g., the existing serializability validation approach described in Section 2.3 tracks the entire read set and re-checks it at the end of the transaction. That may have been suitable in the disk-based era.
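The read-set re-check mentioned above can be sketched as follows. This is a single-threaded illustration of the older validation style, not the approach the paper proposes; `Store`, `Txn` and `ValidationFailed` are hypothetical names.

```python
class ValidationFailed(Exception):
    """Raised at commit when something the transaction read has changed."""


class Store:
    def __init__(self) -> None:
        self.data = {}  # key -> (value, version number)

    def get(self, key):
        return self.data.get(key, (None, 0))

    def put(self, key, value) -> None:
        _, version = self.get(key)
        self.data[key] = (value, version + 1)


class Txn:
    def __init__(self, store: Store) -> None:
        self.store = store
        self.read_set = {}   # key -> version number observed at first read
        self.write_set = {}  # key -> buffered new value

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.get(key)
        self.read_set.setdefault(key, version)
        return value

    def write(self, key, value) -> None:
        self.write_set[key] = value

    def commit(self) -> None:
        # Validation: re-check every key read; cost grows with the read set.
        for key, seen in self.read_set.items():
            if self.store.get(key)[1] != seen:
                raise ValidationFailed(key)
        for key, value in self.write_set.items():
            self.store.put(key, value)
```

The commit cost is proportional to everything the transaction read, which is cheap next to disk I/O but becomes noticeable for large scans in memory.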

System used:

Research on HyPer.

This MVCC model suits HTAP databases such as SAP HANA best. It can also be implemented in high-performance transactional systems such as H-Store/VoltDB. There should be little need to prefer snapshot isolation in the future.

Workload evaluated:

05 - Multi-Version Concurrency Control [Garbage Collection]

Scalable Garbage Collection for In-Memory MVCC Systems

J. Böttcher, et al., Scalable Garbage Collection for In-Memory MVCC Systems, in VLDB, 2019

Main idea:

Takeaway:

In-place GC would make the system more robust to skew.

System used:

HyPer.

Workload evaluated:

CH benchmark: a stress test for GC.

TPC-C: scalability and overhead.

August 06, 2021


* Style sheet refers to Dr. Brian Robert Callahan