Ofey Chan, aka 'ofey404'
Pretending a subtitle is out there...
YouTube playlist: 15-721 Advanced Database Systems (Spring 2020)
M. Stonebraker, et al., What Goes Around Comes Around, in Readings in Database Systems, 4th Edition, 2006 (Optional)
Main idea:
Takeaway:
Systems:
Data model | System | Interface |
---|---|---|
Hierarchical (tree) | IBM IMS | DL/1, record-at-a-time, limited physical/logical data independence |
Network | CODASYL | Record-at-a-time navigation in "hyperspace", no physical/logical data independence |
Relational | System on VAX, IBM DB/2 | SQL, QUEL… |
Entity-Relationship | Schema normalization tools | DBA tools |
Object-oriented | Garden and Exodus | Tied to a specific programming language |
Object-Relational | Sybase | SQL + user-defined components |
Semi-structured (XML) | | |
A. Pavlo, et al., What’s New with NewSQL?, in SIGMOD Record (vol. 45, iss. 2), 2016 (Optional)
X. Yu, et al., Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores, in VLDB, 2014
Main idea:
7 concurrency control algorithms from 2 schemes (2PL and timestamp ordering), evaluated on a 1024-core simulator.
Bottlenecks to scalability: lock-thrashing, preemptive abort, deadlock, timestamp allocation, memory copying.
2PL works best with | T/O works best with |
---|---|
low contention | higher contention |
short transactions | longer transactions |
key-value workloads | OLTP workloads |
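To make the T/O side of the comparison concrete, here is a minimal sketch of basic timestamp ordering (the TIMESTAMP scheme among the seven). It assumes a single thread, no latches, no commit protocol, and no Thomas write rule; `Record`, `to_read`, `to_write` and `AbortError` are hypothetical names, not code from the paper.

```python
class AbortError(Exception):
    """Raised when a transaction must abort under the T/O rules."""


class Record:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp that has read this record
        self.write_ts = 0  # timestamp of the last write to this record


def to_read(record, ts):
    # Reading "in the past" would miss a write that already happened.
    if ts < record.write_ts:
        raise AbortError(f"read ts={ts} < write_ts={record.write_ts}")
    record.read_ts = max(record.read_ts, ts)
    return record.value


def to_write(record, ts, value):
    # Writing "in the past" would invalidate a later read or write.
    if ts < record.read_ts or ts < record.write_ts:
        raise AbortError(f"write ts={ts} conflicts with a later access")
    record.value = value
    record.write_ts = ts
```

Late operations abort instead of waiting, so T/O never deadlocks; the price is extra aborts and a shared source of timestamps, which is exactly the allocation bottleneck listed above.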
Takeaway:
Bottleneck | Direction |
---|---|
timestamp allocation | hardware counter, clock, and atomic addition |
memory allocation and copying | CPU background copier, thread-local memory pools |
No single superior scheme | switch between schemes or use a hybrid approach |
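A sketch of two of the timestamp allocation strategies from the table: a shared counter bumped with (emulated) atomic addition, and a decentralized clock-based allocator that appends a thread id to a local clock reading. Python has no fetch-and-add, so a lock stands in for it; the class names and the 10-bit thread id field (enough for 1024 threads) are assumptions for illustration.

```python
import threading
import time


class AtomicCounterAllocator:
    """Centralized allocator: every transaction bumps one shared integer.

    Python has no atomic fetch-and-add, so a lock emulates it here. On real
    hardware, the cache line holding this counter is the contended resource.
    """

    def __init__(self):
        self._next = 0
        self._lock = threading.Lock()

    def allocate(self):
        with self._lock:
            self._next += 1
            return self._next


class ClockAllocator:
    """Decentralized allocator: local clock reading with the thread id appended.

    Assumes one allocator per worker thread, at most 1024 threads, and clocks
    that are synchronized across cores.
    """

    THREAD_BITS = 10  # 2**10 = 1024 threads

    def __init__(self, thread_id):
        if not 0 <= thread_id < (1 << self.THREAD_BITS):
            raise ValueError("thread_id out of range")
        self.thread_id = thread_id
        self._last = 0

    def allocate(self):
        now = time.monotonic_ns()
        # Never reuse a clock value, even if two calls land in the same tick.
        now = max(now, self._last + 1)
        self._last = now
        # Low bits hold the thread id, so different threads never collide.
        return (now << self.THREAD_BITS) | self.thread_id
```

The counter yields dense, totally ordered timestamps but every allocation touches the same shared variable; the clock allocator touches no shared state, at the cost of relying on synchronized clocks.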
System:
Workload:
Main idea:
How to scale MVCC on modern multi-core, in-memory hardware.
Key design decisions:
Takeaway:
MVTO works well for most workloads.
Transaction-level GC has a small memory footprint.
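A minimal sketch of the MVTO read/write rules, assuming a single thread and ignoring write locks, commit, and abort cleanup. Each version records the timestamp that created it and the largest timestamp that has read it; `MVTORecord` and `Version` are hypothetical names.

```python
import math


class AbortError(Exception):
    pass


class Version:
    def __init__(self, value, begin_ts):
        self.value = value
        self.begin_ts = begin_ts  # ts of the transaction that created it
        self.end_ts = math.inf    # set when a newer version supersedes it
        self.read_ts = 0          # largest ts that has read this version


class MVTORecord:
    def __init__(self, value):
        self.versions = [Version(value, begin_ts=0)]  # oldest -> newest

    def read(self, ts):
        # Readers never block: pick the version valid at their timestamp.
        for v in reversed(self.versions):
            if v.begin_ts <= ts < v.end_ts:
                v.read_ts = max(v.read_ts, ts)
                return v.value
        raise AbortError(f"no version visible at ts={ts}")

    def write(self, ts, value):
        latest = self.versions[-1]
        # A younger transaction already read the latest version, or this
        # write is not newer than it (each txn writes a record once here).
        if ts < latest.read_ts or ts <= latest.begin_ts:
            raise AbortError(f"write ts={ts} arrives too late")
        latest.end_ts = ts
        self.versions.append(Version(value, begin_ts=ts))
```

Old snapshots stay readable while newer versions are installed, which is why readers in MVTO do not conflict with writers.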
System:
Configuration | CC protocol | Storage Scheme | GC | Index |
---|---|---|---|---|
Oracle/MySQL | MV2PL | Delta | Vacuum | Logical |
Postgres | MV2PL/SSI | Append-Only | Vacuum | Physical |
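The two storage schemes in the table differ in what an update writes. In this sketch (hypothetical classes, timestamps only, no concurrency control), append-only storage copies the whole tuple on every update, while delta storage updates the main tuple in place and keeps before-images of only the modified attributes.

```python
class AppendOnlyTable:
    """Every update appends a full copy of the tuple."""

    def __init__(self, row):
        self.versions = [(0, dict(row))]  # (begin_ts, full tuple), oldest first

    def update(self, ts, changes):
        _, latest = self.versions[-1]
        self.versions.append((ts, {**latest, **changes}))

    def read(self, ts):
        for begin_ts, row in reversed(self.versions):
            if begin_ts <= ts:
                return dict(row)
        return None


class DeltaTable:
    """Main table holds the newest tuple; old values live in delta records."""

    def __init__(self, row):
        self.main = dict(row)
        self.main_ts = 0
        self.deltas = []  # newest first: (begin_ts of old version, before-image)

    def update(self, ts, changes):
        before = {k: self.main[k] for k in changes}  # only modified attributes
        self.deltas.insert(0, (self.main_ts, before))
        self.main.update(changes)
        self.main_ts = ts

    def read(self, ts):
        row, row_ts = dict(self.main), self.main_ts
        for old_ts, before in self.deltas:
            if row_ts <= ts:
                break
            row.update(before)  # roll back one version
            row_ts = old_ts
        return row if row_ts <= ts else None
```

Reading an old version is a single lookup under append-only storage but requires replaying deltas under delta storage; in exchange, delta storage writes less per update.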
Workload:
Main idea:
Model an update as a delete followed by an insert.
Takeaway:
It’s worth knowing the internals of current database implementations: they can be simplistic and outdated relative to state-of-the-art hardware.
E.g., the existing serializability validation approaches discussed in Section 2.3 check the entire read set and re-check it at commit time. That may have been a suitable approach in the disk-based era.
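The read-set re-check criticized above can be sketched as a generic optimistic validation step: remember the version of every record read, then compare each one again at commit. This is a simplified, single-threaded illustration under assumed names (`Store`, `Transaction`), not the validation scheme the paper proposes.

```python
class AbortError(Exception):
    pass


class Store:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}  # key -> version observed at first read
        self.writes = {}    # buffered writes, applied at commit

    def read(self, key):
        version, value = self.store.get(key)
        self.read_set.setdefault(key, version)
        return self.writes.get(key, value)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Re-check every record ever read: the cost grows with the read set,
        # which is painful for large analytical reads.
        for key, seen in self.read_set.items():
            current, _ = self.store.get(key)
            if current != seen:
                raise AbortError(f"{key!r} changed since it was read")
        for key, value in self.writes.items():
            version, _ = self.store.get(key)
            self.store.data[key] = (version + 1, value)
```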
System used:
Built and evaluated on HyPer.
This MVCC model suits HTAP databases such as SAP HANA best. It could also be implemented in high-performance transactional systems such as H-Store/VoltDB. There is little reason to prefer snapshot isolation over full serializability in the future.
Workload evaluated:
J. Böttcher, et al., Scalable Garbage Collection for In-Memory MVCC Systems, in VLDB, 2019
Main idea:
Takeaway:
In-place GC would make the system more robust to skew.
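A sketch contrasting horizon-based GC with pruning a single version chain in place, assuming each version is a `(begin_ts, value)` pair and that the GC knows the start timestamps of all active transactions. Pruning versions in the middle of a chain, not only those below the oldest active transaction, is the kind of in-place cleanup the takeaway refers to; both functions are hypothetical simplifications, not the paper's algorithm.

```python
import math
from bisect import bisect_right


def prune_below_horizon(versions, active_ts):
    """Classic GC: drop versions superseded before the oldest active txn."""
    horizon = min(active_ts, default=math.inf)
    keep_from = 0
    for i in range(len(versions) - 1):
        # Version i is dead once its successor began at or before the horizon.
        if versions[i + 1][0] <= horizon:
            keep_from = i + 1
    return versions[keep_from:]


def prune_chain(versions, active_ts):
    """In-place pruning: keep only versions some transaction can still see.

    versions:  list of (begin_ts, value), oldest first, begin_ts increasing.
    A txn with timestamp t sees the last version with begin_ts <= t.
    """
    begins = [b for b, _ in versions]
    needed = {len(versions) - 1}  # the newest version serves new transactions
    for t in active_ts:
        i = bisect_right(begins, t) - 1
        if i >= 0:
            needed.add(i)
    return [versions[i] for i in sorted(needed)]
```

With one long-running transaction at timestamp 5 and another at 25, horizon-based GC keeps the whole chain `[(0, "a"), (10, "b"), (20, "c"), (30, "d")]`, while in-place pruning still drops the version at 10 that nobody can see.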
System used:
HyPer.
Workload evaluated:
CH-benCHmark (mixed TPC-C and TPC-H), as a stress test for GC.
TPC-C, for scalability and overhead.