Distributed Database Architectures + DataStax JVector Talk (CMU Intro to Database Systems)
Resources:
#22 - Distributed Database Architectures + DataStax JVector Talk (CMU Intro to Database Systems)
Key Takeaways
From my point of view, this lecture is a supplementary resource for polishing what you may already know about distributed databases and their architectures. The most interesting part is the JVector segment. The takeaways are outlined below:
- Basic overview of the distributed architectures: Shared-Nothing, Shared-Disk, Shared-Memory.
A brief, limited set of examples of real database engines based on these architectures.
- Basic knowledge of vertical and horizontal partitioning and consistent hashing, along with a description of their purpose.
- Watch Jonathan Ellis’ part, where he describes JVector and the papers behind it:
  - Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
  - Understanding Hierarchical Navigable Small Worlds (HNSW)
  - LM-DiskANN: Low Memory Footprint in Disk-Native Dynamic Graph-Based ANN Indexing (2019)
  - Quicker ADC (Asymmetric Distance Computation) for the ANN search.
  - Product Quantization (PQ) for the ANN search.
- It was noted in the talk that JVector-style implementations in GC-based languages like Java are more efficient
than C++ implementations due to garbage collection and memory management.
It would be interesting to investigate the reasons behind this statement.
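
To make the consistent hashing takeaway above more concrete, here is a minimal sketch of a hash ring with virtual nodes. It is not taken from the lecture or from any particular database engine: the class name `ConsistentHashRing`, the `vnodes` parameter, the node names, and the use of MD5 as the hash function are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Toy hash ring (illustrative names, not a real engine's API)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes   # virtual nodes (tokens) per physical node
        self._tokens = []      # sorted token positions on the ring
        self._owner = {}       # token -> physical node
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        # Assumption: any stable, well-distributed hash works; MD5 is used
        # only for placement here, not for security.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Each physical node owns several tokens to even out the load.
        for i in range(self.vnodes):
            token = self._hash(f"{node}#{i}")
            self._owner[token] = node
            bisect.insort(self._tokens, token)

    def remove_node(self, node):
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._tokens:
            raise LookupError("ring is empty")
        # A key belongs to the first token clockwise from its hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_right(self._tokens, self._hash(key))
        return self._owner[self._tokens[idx % len(self._tokens)]]


keys = [f"user:{i}" for i in range(1000)]
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.get_node(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch, every key in `moved` ends up on `node-d`, and all other keys stay where they were. That is the purpose of the technique: adding or removing a node relocates only the keys adjacent to that node's tokens, instead of reshuffling almost everything as `hash(key) % node_count` would.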
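
Product Quantization and Asymmetric Distance Computation from the paper list above fit together as follows; this is a rough pure-Python sketch. It is not JVector's implementation, and it shows only plain table-lookup ADC, not the SIMD-oriented Quicker ADC variant. All function names are hypothetical, the codebooks are trained with a basic Lloyd's k-means, and the vector dimension is assumed to be divisible by the number of subspaces `m`.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def split(vec, m):
    """Cut a vector into m equal-length subvectors (len(vec) % m == 0 assumed)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def train_pq(vectors, m, k):
    """Train one codebook of k centroids per subspace."""
    return [kmeans([split(v, m)[j] for v in vectors], k, seed=j)
            for j in range(m)]


def encode(vec, codebooks):
    """Replace each subvector with the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]


def adc_table(query, codebooks):
    """Distances from each raw query subvector to every centroid."""
    return [[sq_dist(sub, centroid) for centroid in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]


def adc_distance(code, table):
    """Approximate squared distance: m table lookups, no decompression."""
    return sum(table[j][c] for j, c in enumerate(code))


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]   # 4 small integers per vector

query = [rng.random() for _ in range(8)]
table = adc_table(query, codebooks)            # built once per query
approx = [adc_distance(code, table) for code in codes]
```

The "asymmetric" part is that the query stays uncompressed while the stored vectors are compressed codes. After the per-query table is built, each candidate distance costs `m` lookups and additions, which is what makes compressed vectors practical for scoring many candidates in memory.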
See also: