At the highest level of abstraction, it is a database that shards data across many sets of Paxos [21] state machines in datacenters spread all over the world. Replication is used for global availability and geographic locality; clients automatically failover between replicas. Spanner automatically reshards data across machines as the amount of data or the number of servers changes, and it automatically migrates data across machines (even across datacenters) to balance load and in response to failures. Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows.

"We believe it is better to have application programmers deal with per- formance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems."

My understanding:

  • Global database.

  • Sharded, each shard is replicated, possibly across data centers and even continents.

  • Writes are standard two-phase commits. Can be rather slow because of the whole cross-continent thing.

  • Reads are fast because we don't use locks and can read from the nearest replica.

  • The tricks are:

    • Versioned storage. Associate each update with a timestamp. When a read starts, capture the timestamp, and get latest before it.

      • It increases storage, but it's ok-ish. We can garbage collect.

      • Call this snapshot isolation.

    • We can know if the replica is guaranteed to have all updates up to the time. If it doesn't - wait.

  • For times, we need good clock. TrueTime to the rescue. It offers a confidence interval.

    • Time sync is important for RO. RW - they use 2PC, they don't care.

Last updated