Key Moments
MySQL Cluster 5.1: in-memory core with disk data, online ops, and robust failover.
Key Insights
Architecture is a shared-nothing, synchronous cluster: storage nodes hold data, MySQL servers handle SQL, and a management node distributes config and coordinates failover.
5.1 adds data on disk and variable-sized rows, enabling larger datasets with better memory efficiency and new disk-based indexing plans.
Online operations are emphasized: online add/drop of indices, online table-partitioning control, and online table space and log file management.
Performance enhancements include engine condition pushdown, batch read interfaces, and tight query-cache integration for cluster tables.
Cross-cluster replication and NDB API integrations enable geo-redundancy and offload/reporting workloads while maintaining a unified binary log for replication.
Failover and recovery are epoch-based and highly automatic in design, but some operations (like automatic failover) are still evolving, with manual steps documented.
INTRODUCTION AND CONTEXT
The speaker, Stewart Smith, introduces the topic of MySQL Cluster (NDB) and places it within the lineage of MySQL storage engines. He outlines the progression from 4.1’s early in-memory clustering to 5.0’s performance improvements and 5.1’s new features, including disk-backed data and richer functionality. The talk is framed as part of a trilogy, with emphasis on practical architecture, deployment scenarios, and how a cluster storage engine integrates with standard MySQL servers. The goal is to give attendees a working mental model of what a cluster does, how it is structured, and where it adds value for high-availability, high-throughput workloads.
ARCHITECTURE AND COMPONENTS OF MYSQL CLUSTER
The core architecture is laid out: storage nodes form the data backbone, MySQL server nodes handle SQL traffic, and a separate management server handles configuration distribution and monitoring interfaces. Data is distributed across node groups and replicated to achieve high availability. The model is shared-nothing, with commodity hardware and Ethernet interconnects. The transport layer uses TCP by default, with SCI available as an optional interconnect. The data path is clear: client MySQL servers issue queries, which are translated into operations on data nodes, and every transaction is synchronously applied to all replicas to keep them consistent.
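The data path described above can be pictured with a toy model. The sketch below is only an illustration in Python, not how NDB is implemented: `ToyCluster`, `node_group_for`, and the MD5-based placement are hypothetical names and choices made for this example. It shows a key being hashed to a node group and the write being applied to every replica in that group before it returns.

```python
import hashlib


def node_group_for(key, num_node_groups):
    """Map a primary key to a node group with a stable hash (illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_node_groups


class ToyCluster:
    """Hypothetical in-process stand-in for a set of storage nodes."""

    def __init__(self, num_node_groups, replicas):
        # storage[group][replica] is the private dict of one storage node
        # (shared-nothing: no node sees another node's dict).
        self.storage = [
            [{} for _ in range(replicas)] for _ in range(num_node_groups)
        ]

    def write(self, key, value):
        group = node_group_for(key, len(self.storage))
        # Synchronous replication: every replica in the group applies the
        # write before the call returns.
        for node in self.storage[group]:
            node[key] = value
        return group

    def read(self, key, failed_replica=None):
        group = node_group_for(key, len(self.storage))
        for index, node in enumerate(self.storage[group]):
            if index == failed_replica:
                continue  # skip a node we pretend has crashed
            return node.get(key)
        raise RuntimeError("no surviving replica in node group")


cluster = ToyCluster(num_node_groups=2, replicas=2)
cluster.write("user:1", "alice")
print(cluster.read("user:1", failed_replica=0))
```

Because both replicas hold the row when `write` returns, the read still succeeds when one replica in the group is treated as failed.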
NEW FEATURES IN 5.1: DISK-BASED DATA AND VARIABLE-SIZED ROWS
5.1 introduces disk-backed data (Disk Data) alongside the traditional in-memory model, enabling larger datasets without needing all data in RAM. Variable-sized rows reduce wasted memory, improving RAM efficiency and capacity for big workloads. The talk also explains disk-based index considerations and the groundwork for more flexible index storage. The architecture maintains fast recovery via checkpoints, and the memory footprint becomes more predictable with the ability to size data and index memory more precisely.
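To make the memory argument for variable-sized rows concrete, here is a rough back-of-the-envelope calculation in Python. The 255-byte declared width, the 2-byte length prefix, and the sample values are assumptions chosen for illustration. They are not figures from the talk and do not describe NDB's actual row format.

```python
def fixed_size_bytes(values, declared_max):
    """Every value occupies the full declared column width."""
    return len(values) * declared_max


def variable_size_bytes(values, length_prefix=2):
    """Each value occupies its actual byte length plus a small length prefix
    (the prefix size is an assumption for this sketch)."""
    return sum(len(v.encode("utf-8")) + length_prefix for v in values)


names = ["alice", "bob", "a much longer display name"]
fixed = fixed_size_bytes(names, declared_max=255)
variable = variable_size_bytes(names)
print(fixed, variable)
```

The gap between the two totals grows with the number of rows and with how far typical values fall below the declared maximum, which is where the RAM savings come from.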
ONLINE OPERATIONS AND PARTITIONING
A major theme is online manageability: you can add and drop indexes online without copying entire tables, create or alter table spaces and log file groups, and perform user-defined partitioning. Partitioning by key, range, or list allows explicit control over data placement across node groups, enabling load balancing and targeted performance tuning. Online index creation and drop avoid the traditional lock-and-copy approach, significantly reducing maintenance windows and improving availability during schema changes.
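The three partitioning schemes mentioned above differ only in how a row's partitioning value is mapped to a partition number. The Python sketch below illustrates that mapping under stated assumptions: the function names are hypothetical, CRC32 stands in for whatever hash the server actually uses, and range bounds are treated as exclusive upper limits in the style of `VALUES LESS THAN`.

```python
import bisect
import zlib


def partition_by_key(key, num_partitions):
    """KEY-style: a hash of the key picks the partition (CRC32 is a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_by_range(value, upper_bounds):
    """RANGE-style: upper_bounds are ascending, exclusive upper limits."""
    index = bisect.bisect_right(upper_bounds, value)
    if index == len(upper_bounds):
        raise ValueError(f"{value!r} is above the last range bound")
    return index


def partition_by_list(value, value_lists):
    """LIST-style: each partition owns an explicit set of values."""
    for index, values in enumerate(value_lists):
        if value in values:
            return index
    raise ValueError(f"{value!r} is not assigned to any partition")


print(partition_by_range(1985, [1990, 2000, 2010]))
print(partition_by_list("SE", [{"SE", "NO"}, {"US", "CA"}]))
print(partition_by_key("order:42", 4))
```

Range and list give the operator direct control over which rows land together, while key-based partitioning spreads rows evenly without manual assignment.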
PERFORMANCE OPTIMIZATIONS AND QUERY PROCESSING
The talk covers several performance enhancements: engine condition pushdown sends unindexed-field predicates down to data nodes, enabling parallel evaluation and reducing wire traffic. Batch read interfaces reduce the number of network hops by fetching multiple keys in one request. Query cache integration and improved metadata handling speed up common patterns for cluster tables. The combination of in-node processing and batched lookups yields tangible speedups on typical workloads.
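A small simulation can show why these two optimizations cut network traffic. This is a sketch with made-up data and hypothetical function names, not the NDB wire protocol. It counts the rows shipped to the SQL node with and without pushing the predicate down, and the round trips needed for a set of key lookups at different batch sizes.

```python
def scan_without_pushdown(data_nodes, predicate):
    """The SQL node pulls every row over the wire, then filters locally."""
    shipped = [row for node in data_nodes for row in node]
    matches = [row for row in shipped if predicate(row)]
    return matches, len(shipped)


def scan_with_pushdown(data_nodes, predicate):
    """Each data node filters its own rows; only matches are shipped."""
    shipped = []
    for node in data_nodes:
        shipped.extend(row for row in node if predicate(row))
    return shipped, len(shipped)


def round_trips(keys, batch_size):
    """Number of requests needed to fetch all keys, batch_size keys at a time."""
    return -(-len(keys) // batch_size)  # ceiling division


data_nodes = [
    [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lund"}],
    [{"id": 3, "city": "Oslo"}, {"id": 4, "city": "Rome"}],
]


def in_oslo(row):
    return row["city"] == "Oslo"


print(scan_without_pushdown(data_nodes, in_oslo)[1])
print(scan_with_pushdown(data_nodes, in_oslo)[1])
print(round_trips(list(range(10)), batch_size=1), round_trips(list(range(10)), batch_size=4))
```

Both scans return the same matching rows. The difference is only in how many rows cross the network and, for the batched lookups, how many requests are issued.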
REPLICATION MODEL, NDB API, AND FAILOVER SCENARIOS
Beyond intra-cluster replication, there is replication between clusters for geo-redundancy. The NDB API allows programming direct data operations in C++ and feeding changes into the cluster’s binary log via a new injector thread, resulting in a canonical, row-based binary log for slaves. Failover is epoch-based, with a defined sequence to synchronize binary log positions and epochs across master and backup instances. While some failover automation is still evolving, the architecture supports robust redundancy through multi-channel replication and manual failover workflows.
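The epoch-based failover step can be sketched as a lookup: find the last epoch the replica cluster has applied, then find where the first later epoch begins in the backup master's binary log. The Python below is a simplified model, loosely based on the apply-status and binlog-index bookkeeping that NDB replication keeps. The function name, field names, and sample values are illustrative assumptions, not the real table schemas.

```python
def failover_start_point(applied_epochs, binlog_index):
    """Return (file, position) of the first epoch the replica has not applied.

    applied_epochs: epochs already applied on the replica cluster.
    binlog_index:   per-epoch entries recorded by the backup master, each a
                    dict with 'epoch', 'file' and 'position' (illustrative).
    Returns None when the replica is already caught up.
    """
    latest = max(applied_epochs, default=-1)
    pending = [entry for entry in binlog_index if entry["epoch"] > latest]
    if not pending:
        return None
    first = min(pending, key=lambda entry: entry["epoch"])
    return first["file"], first["position"]


backup_master_index = [
    {"epoch": 100, "file": "binlog.000001", "position": 4},
    {"epoch": 101, "file": "binlog.000001", "position": 320},
    {"epoch": 102, "file": "binlog.000002", "position": 4},
]

print(failover_start_point([99, 100, 101], backup_master_index))
```

In a manual failover, the returned file and position would be the point at which the replica is told to resume reading from the backup master's log.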
Common Questions
NDB Cluster is the storage engine that provides a high-availability, in-memory distributed database. It partitions data across storage nodes and replicates it, enabling fast primary key/index lookups with configurable redundancy. It also supports a MySQL front-end and a C++ API for direct data access.
Mentioned in this video
High-availability clustered storage engine for MySQL; foundational to the MySQL Cluster architecture.
C++ API for direct cluster interaction; can drive operations without going through MySQL server.
High-performance interconnect hardware used to reduce latency between nodes.
Utility script to estimate memory usage for a given database dataset across cluster versions.