1. MapReduce

    1. Simple interface to hide the complexity of mass parallelism, fault tolerance, scale, etc.

    2. Data and computation share a cluster. The scheduler tries to run each task on a machine that already holds its input data.
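
A minimal sketch of the "simple interface" idea, using word count as the job. The user writes only `map_fn` and `reduce_fn`; `run_mapreduce` is a hypothetical single-process stand-in for the framework, which in a real deployment would also handle partitioning, scheduling, and retries.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """User-supplied map: emit (word, 1) for every word in one input split."""
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """User-supplied reduce: combine all values emitted for one key."""
    return sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy driver (an assumption, not the real framework)."""
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for name, text in inputs.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)
    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog the end"}
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

Everything about parallelism and failure handling would live inside the driver, which is why the user-facing code stays this small.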

  2. GFS

    1. Filesystem-like interface, though not a full POSIX filesystem.

    2. Purpose-built for Google's use cases. Because of this, it can afford a lot of weirdness (e.g. non-identical replicas).

    3. Replicate data.

    4. Consistency comes at the cost of performance, so GFS settles for relaxed consistency.

  3. Primary-Backup Replication

    1. Represent a VM as a state machine.

    2. Make it fully deterministic.

    3. Apply the same actions, in the same order, to both the primary and the backup.
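
A rough sketch of why determinism matters here. `Machine` is a hypothetical toy state machine, and the list `log` stands in for the channel between primary and backup. The one non-deterministic input (a clock reading) is captured once by the primary and replayed by the backup instead of being read locally.

```python
import time

class Machine:
    """Deterministic state machine: state depends only on the events applied."""
    def __init__(self):
        self.counter = 0
        self.last_tick = None

    def apply(self, event):
        kind, value = event
        if kind == "add":
            self.counter += value
        elif kind == "tick":
            # The clock value is an input recorded by the primary,
            # never read from the local clock during replay.
            self.last_tick = value

    def snapshot(self):
        return (self.counter, self.last_tick)

primary, backup = Machine(), Machine()
log = []  # stands in for the primary-to-backup channel (an assumption)

for event in [("add", 5), ("tick", time.time()), ("add", -2)]:
    log.append(event)      # primary records the event, clock reading included
    primary.apply(event)

for event in log:          # backup replays the identical sequence
    backup.apply(event)
```

If `apply` read `time.time()` itself, the two machines would diverge, which is the reason non-deterministic inputs have to travel through the log.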

  4. Raft

    1. Replicate the log across peers.

    2. Elect leader.

    3. Consistency guarantees come from restrictions on who can win an election and from consistency checks when appending to the log.

    4. Elections and commits both require a majority. Any two majorities overlap in at least one peer, so a committed entry can't be overwritten.

    5. Reply to the client only when committed.

    6. Any application, such as a key-value server, is built on top of the replicated log.
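
A small sketch of the majority rule. `majority` and `commit_index` are hypothetical helper names; `match_index` is assumed to hold, for each peer (leader included), the highest log index known to be stored on that peer.

```python
from itertools import combinations

def majority(n_peers):
    """Smallest group size that is more than half of the cluster."""
    return n_peers // 2 + 1

def commit_index(match_index):
    """Highest log index stored on a majority of peers (leader included).

    Simplification: real Raft also requires the entry at that index to be
    from the leader's current term before committing it.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[majority(len(match_index)) - 1]

# Every pair of majorities in a 5-peer cluster shares at least one peer.
quorums = [set(q) for q in combinations(range(5), majority(5))]
all_overlap = all(a & b for a in quorums for b in quorums)

# Peers have replicated up to these indices; index 5 is on three of five.
committed = commit_index([7, 7, 5, 4, 2])
```

Because any later majority must include a peer that already stores the committed entry, a new leader cannot be elected without that entry being visible to the election.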

  5. Object Storage on CRAQ

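    1. Builds on chain replication.

    2. The head of the chain handles writes.

    3. In basic chain replication, the last node in the chain (the tail) handles reads. CRAQ also lets other nodes serve reads of clean (committed) data.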
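
A toy sketch of a chain with CRAQ-style reads, under strong simplifying assumptions: one in-flight write per key, no failures, and synchronous calls instead of network messages. `Node`, `Chain`, `forward`, and `acknowledge` are hypothetical names. A write enters at the head and is held as dirty until the tail has it; a read at a node that holds a dirty copy falls back to the tail's committed value.

```python
class Node:
    def __init__(self):
        self.clean = {}   # key -> committed value
        self.dirty = {}   # key -> value not yet acknowledged by the tail

class Chain:
    def __init__(self, length):
        self.nodes = [Node() for _ in range(length)]
        self.tail = self.nodes[-1]

    def forward(self, key, value, hops):
        """Push a write from the head through the first `hops` nodes."""
        for node in self.nodes[:hops]:
            if node is self.tail:
                node.clean[key] = value    # reaching the tail commits the write
            else:
                node.dirty[key] = value    # upstream nodes hold it as dirty

    def acknowledge(self, key):
        """The tail's ack travels back toward the head; dirty becomes clean."""
        for node in reversed(self.nodes[:-1]):
            if key in node.dirty:
                node.clean[key] = node.dirty.pop(key)

    def read(self, key, at):
        """Read at any node; a dirty key is resolved by asking the tail."""
        node = self.nodes[at]
        if key in node.dirty:
            return self.tail.clean.get(key)
        return node.clean.get(key)

chain = Chain(3)
chain.forward("x", "v1", hops=3)
chain.acknowledge("x")

chain.forward("x", "v2", hops=2)          # v2 has not reached the tail yet
in_flight = [chain.read("x", at=i) for i in range(3)]

chain.forward("x", "v2", hops=3)          # v2 reaches the tail and commits
chain.acknowledge("x")
settled = [chain.read("x", at=i) for i in range(3)]
```

While the write is in flight, every node still answers with the last committed value, so readers never observe a value the tail has not committed.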

  6. Aurora

    1. Decouple storage from the rest.

    2. Replicate only the log, which drastically reduces network traffic.
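
A back-of-the-envelope sketch of why shipping log records is cheaper than shipping pages. `LogRecord` and `StorageNode` are hypothetical names, and the page size, replica count, and record contents are illustrative assumptions rather than Aurora's actual parameters. Each storage node rebuilds its pages by applying the records it receives.

```python
from dataclasses import dataclass

PAGE_SIZE = 8192   # illustrative page size in bytes (an assumption)

@dataclass(frozen=True)
class LogRecord:
    lsn: int        # log sequence number
    page_id: int
    offset: int
    data: bytes

class StorageNode:
    """Materializes pages from log records; whole pages are never shipped."""
    def __init__(self):
        self.pages = {}
        self.applied_lsn = 0

    def receive(self, record):
        page = self.pages.setdefault(record.page_id, bytearray(PAGE_SIZE))
        page[record.offset:record.offset + len(record.data)] = record.data
        self.applied_lsn = record.lsn

records = [LogRecord(1, 0, 0, b"alice"), LogRecord(2, 0, 100, b"bob")]
replicas = [StorageNode() for _ in range(3)]   # replica count is illustrative
for rec in records:
    for node in replicas:
        node.receive(rec)

# Payload bytes sent when shipping records vs. one full page per change.
log_bytes = sum(len(r.data) for r in records) * len(replicas)
page_bytes = PAGE_SIZE * len(records) * len(replicas)
```

The comparison ignores record headers, but the gap between a few bytes per change and a full page per change is the point.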

  7. Cache Consistency: Frangipani

    1. Each workstation caches data locally and operates on its cache.

    2. Achieve cache coherence with clever locking.

    3. Write dirty data back to shared storage only when releasing a lock.
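
A simplified sketch of lock-driven write-back. All class and method names are hypothetical, locks are exclusive only, and there is one lock per key; the real system is more refined than this. A workstation modifies its cache while it holds the lock and flushes only when the lock server revokes the lock for someone else.

```python
class SharedStore:
    """Stands in for the shared storage layer (an assumption)."""
    def __init__(self):
        self.blocks = {}
        self.writes = 0

    def write(self, key, value):
        self.blocks[key] = value
        self.writes += 1

class LockServer:
    def __init__(self):
        self.holder = {}   # lock name -> workstation currently holding it

    def acquire(self, name, ws):
        current = self.holder.get(name)
        if current is not None and current is not ws:
            current.release(name)     # revoke: the holder flushes first
        self.holder[name] = ws

class Workstation:
    def __init__(self, store, locks):
        self.store, self.locks = store, locks
        self.cache, self.dirty = {}, set()

    def write(self, key, value):
        self.locks.acquire(key, self)
        self.cache[key] = value       # local only, nothing sent to the store
        self.dirty.add(key)

    def read(self, key):
        self.locks.acquire(key, self)
        if key not in self.cache:
            self.cache[key] = self.store.blocks.get(key)
        return self.cache[key]

    def release(self, key):
        if key in self.dirty:
            self.store.write(key, self.cache[key])   # write back before release
            self.dirty.discard(key)
        self.cache.pop(key, None)     # the cached copy is invalid without the lock
        self.locks.holder.pop(key, None)

store, locks = SharedStore(), LockServer()
a, b = Workstation(store, locks), Workstation(store, locks)
a.write("f", "v1")
a.write("f", "v2")
writes_before_revoke = store.writes
seen_by_b = b.read("f")               # revokes a's lock, forcing the flush
```

Two local writes cost a single write to shared storage, and the second workstation still sees the latest value because the flush happens before the lock changes hands.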
