Distributed Transactions
Concurrency control.
Atomic commit.
ACID: Atomic, Consistent, Isolated (serializable), Durable.
Serializable: there exists a serial order of execution of the transactions that yields the same result.
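A minimal sketch of that definition, assuming two made-up transactions (`transfer` and `audit`) over a small in-memory dict. It compares final state only, which is a simplification; the helper names are illustrative and not from any library.

```python
from itertools import permutations

def transfer(db):
    # T1 (hypothetical): move 1 unit from x to y.
    db["x"] -= 1
    db["y"] += 1

def audit(db):
    # T2 (hypothetical): record the sum of x and y.
    db["total"] = db["x"] + db["y"]

def serial_results(initial, transactions):
    """Final states produced by running the transactions in every serial order."""
    results = []
    for order in permutations(transactions):
        db = dict(initial)
        for txn in order:
            txn(db)
        results.append(db)
    return results

def is_serializable(initial, transactions, observed):
    """True if the observed final state matches some serial order."""
    return observed in serial_results(initial, transactions)

initial = {"x": 10, "y": 10, "total": 0}
# An interleaving where audit saw x after the debit but y before the credit.
interleaved = {"x": 9, "y": 11, "total": 19}
# The outcome of running transfer and audit one after the other.
serial = {"x": 9, "y": 11, "total": 20}
```

Here `is_serializable(initial, [transfer, audit], interleaved)` is false because no serial order can produce a total of 19.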
Concurrency control:
Pessimistic.
Optimistic, OCC.
If conflicts are frequent - pessimistic is good. If they are rare - optimistic is good.
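To make the optimistic side concrete, here is a rough sketch of commit-time validation with per-key version numbers. `VersionedStore` and its methods are assumptions made up for illustration; real OCC systems differ in how they validate.

```python
class VersionedStore:
    """Toy store (hypothetical): each key maps to (value, version)."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        # Missing keys read as (None, version 0).
        return self.data.get(key, (None, 0))

    def commit(self, read_set, write_set):
        """read_set: key -> version the transaction saw.
        write_set: key -> new value.
        Returns True on commit, False if validation fails (caller retries)."""
        # Validation: abort if anything we read has changed since.
        for key, seen_version in read_set.items():
            if self.read(key)[1] != seen_version:
                return False
        # No conflict: install writes and bump versions.
        for key, value in write_set.items():
            _, version = self.read(key)
            self.data[key] = (value, version + 1)
        return True
```

No locks are taken while the transaction runs; the cost of a conflict is a wasted execution and a retry, which is why this pays off only when conflicts are rare.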
Two-phase locking:
Acquire lock before using record.
Hold lock until after the transaction commits or aborts. It's bad for concurrency, but required for correctness.
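The two rules above can be sketched on a single machine as follows. `LockManager` and `Transaction` are hypothetical names, the locks are exclusive only, and the sketch assumes every key already exists in `db`; there is no deadlock handling.

```python
import threading
from collections import defaultdict

class LockManager:
    """One exclusive lock per record key."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, key):
        with self._guard:
            return self._locks[key]

class Transaction:
    def __init__(self, db, locks):
        self.db, self.locks = db, locks
        self.held = {}   # key -> lock this transaction holds
        self.undo = {}   # key -> value before our first write

    def _acquire(self, key):
        # Rule 1: acquire the lock before using the record.
        if key not in self.held:
            lock = self.locks.lock_for(key)
            lock.acquire()   # blocks while another transaction holds it
            self.held[key] = lock

    def read(self, key):
        self._acquire(key)
        return self.db[key]

    def write(self, key, value):
        self._acquire(key)
        self.undo.setdefault(key, self.db[key])
        self.db[key] = value

    def _release_all(self):
        # Rule 2: locks are released only here, at commit or abort.
        for lock in self.held.values():
            lock.release()
        self.held.clear()

    def commit(self):
        self.undo.clear()
        self._release_all()

    def abort(self):
        for key, old in self.undo.items():
            self.db[key] = old   # roll back our writes
        self.undo.clear()
        self._release_all()
```

Releasing a lock earlier would let another transaction see a value that might later be rolled back, which is why the locks are held to the end.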
Distributed transactions:
Transaction participants.
Transaction coordinator (TC).
Each message is tagged with transaction id.
First send prepare message to participants. Make sure participants can do it.
If they all reply yes, send Commit message. If any replies no, send Abort message. Participants reply.
Participants unlock when they see either Commit or Abort.
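The message flow above, without failures, might look like the sketch below. Method calls stand in for network messages, and `Participant`, `Coordinator` and the `can_commit` flag are illustrative assumptions, not a real API.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # hypothetical switch to force a "no" vote
        self.state = {}                # tid -> "prepared" | "committed" | "aborted"

    def prepare(self, tid):
        if not self.can_commit:
            self.state[tid] = "aborted"
            return False               # vote no
        self.state[tid] = "prepared"   # locks stay held from here on
        return True                    # vote yes

    def commit(self, tid):
        self.state[tid] = "committed"  # apply changes, release locks
        return True

    def abort(self, tid):
        self.state[tid] = "aborted"    # discard changes, release locks
        return True

class Coordinator:
    def __init__(self, participants):
        self.participants = participants

    def run(self, tid):
        # Phase 1: ask every participant whether it can commit.
        votes = [p.prepare(tid) for p in self.participants]
        # Phase 2: commit only if all voted yes, otherwise abort everywhere.
        if all(votes):
            for p in self.participants:
                p.commit(tid)
            return "committed"
        for p in self.participants:
            p.abort(tid)
        return "aborted"
```

For example, `Coordinator([Participant("A"), Participant("B")]).run("t1")` commits, while a single participant created with `can_commit=False` makes the whole transaction abort.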
What if a participant crashes after replying yes to PREPARE but before receiving COMMIT? When recovering, it must still be prepared to commit. So the participant must make its state and locks durable on disk before replying to PREPARE.
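What if a participant crashes after making the change but before replying to COMMIT? The TC gets no reply and resends COMMIT, so after recovery the participant must remember that it already committed and just reply again.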
What if TC crashes? Before sending COMMIT messages, it must write the commit decision to its durable log, so that after recovery it can resend COMMIT. If some participant never replies to PREPARE, it must abort.
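A sketch of the "write to disk before you send" rule for both sides, assuming a simple append-only file of JSON lines. `DurableLog` and the four functions are hypothetical, and the TC and each participant are assumed to keep separate log files.

```python
import json
import os

class DurableLog:
    """Append-only log; append() returns only after the record is on disk."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())   # force to disk before returning

    def records(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]

def participant_prepare(log, tid, writes):
    # Persist the pending writes first; only then is it safe to vote yes.
    log.append({"tid": tid, "state": "prepared", "writes": writes})
    return True

def participant_recover(log):
    """After a restart: {tid: writes} for transactions still awaiting a decision."""
    pending = {}
    for rec in log.records():
        if rec["state"] == "prepared":
            pending[rec["tid"]] = rec["writes"]
        else:   # "committed" or "aborted"
            pending.pop(rec["tid"], None)
    return pending

def coordinator_decide(log, tid, votes):
    # Log the decision first; only then may COMMIT or ABORT be sent.
    decision = "commit" if votes and all(votes) else "abort"
    log.append({"tid": tid, "decision": decision})
    return decision

def coordinator_recover(log, tid):
    """After a restart: resend a logged decision, otherwise abort."""
    for rec in log.records():
        if rec["tid"] == tid:
            return rec["decision"]
    return "abort"   # nothing logged, so no COMMIT was ever sent
```

A recovered participant would also need to re-acquire the locks for everything returned by `participant_recover`; that part is left out here.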
If a participant has replied yes to PREPARE and is waiting for COMMIT but hasn't received it for some time, it's not entitled to abort the transaction unilaterally, because the TC may already have decided to commit. It must keep waiting.
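The timeout rule can be written as a small decision function. The state names are assumptions for illustration; the point is that the allowed action depends on whether the participant has already voted yes.

```python
def on_timeout(state):
    """What a participant may do after waiting too long to hear from the TC."""
    if state == "working":    # has not voted yes yet
        return "abort"        # safe: the TC cannot have decided to commit
    if state == "prepared":   # already voted yes
        return "wait"         # the TC may have committed; must block, locks held
    return "none"             # already committed or aborted, nothing to do
```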
Blocking while holding locks is a fundamental property of two-phase commit. But it's not a good property. It makes it slow.
The decision is made by a single entity - TC.
Sort of looks similar to Raft, but it's entirely different. It solves a very different problem.
Raft: high availability by replicating data. Can operate even though some servers are unreachable. In 2PC, you need to wait for all participants. In Raft, every server does the same thing; in 2PC, each participant does a different part of the work. Raft is all about availability; 2PC is not highly available at all. 2PC is correct with failures, but not available with failures.
You can use Raft to replicate each participant (and the TC), and coordinate cross-shard transactions with 2PC. This can be highly available and correct.