Questions: https://pdos.csail.mit.edu/6.824/quizzes/q21-1.pdf

Answers: https://pdos.csail.mit.edu/6.824/quizzes/q21-1-sol.pdf


MapReduce

A. Briefly describe the advantage of allowing reduce workers to start reading intermediate files early.

The entire job finishes faster. The job is done only when the last reduce task finishes, so the sooner reduce tasks start, the sooner the whole job can finish. An early reduce task might have to wait for map output, but that is never slower than waiting for all maps to finish before starting any reduce.

B. Describe how you would modify your lab implementation to allow reading of intermediate files early, as Google's MapReduce library does, while maintaining correctness. (You don't have to write code, but sketch out a design in words. Keep your answer brief.)

Modifications to the master:

  • Start assigning Reduce tasks before all Map tasks are done. It should still prefer handing out Map tasks over Reduce tasks.

Modifications to reduce workers:

  • We can no longer sort the whole input by key, since new map output keeps arriving. Instead, maintain a map from key to accumulated state.

  • Find the first unprocessed Map output, iterate over its values, and update the state for each corresponding key. Repeat until all map outputs are processed.

  • reducef must support streaming, e.g. a signature like reducef(key, state, values) -> updatedState.
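A minimal sketch of this streaming-reduce idea in Go. The names (`streamingReducef`, `consumeMapOutput`) and the running-count state are illustrative assumptions, not the lab's actual interface:

```go
package main

import "fmt"

// KeyValue mirrors the lab's intermediate key/value pair type.
type KeyValue struct {
	Key   string
	Value string
}

// streamingReducef folds a batch of values into the running state for a key.
// Here the "state" is just a running count (as in word count); a real
// streaming reducef signature is an assumption, not the lab's interface.
func streamingReducef(key string, state int, values []string) int {
	return state + len(values)
}

// consumeMapOutput processes one map task's output as soon as it is
// available, updating per-key state instead of sorting all input first.
func consumeMapOutput(state map[string]int, kvs []KeyValue) {
	for _, kv := range kvs {
		state[kv.Key] = streamingReducef(kv.Key, state[kv.Key], []string{kv.Value})
	}
}

func main() {
	state := make(map[string]int)
	// Map task 0 finishes first...
	consumeMapOutput(state, []KeyValue{{"a", "1"}, {"b", "1"}})
	// ...map task 1 finishes later; nothing needs to be re-sorted.
	consumeMapOutput(state, []KeyValue{{"a", "1"}})
	fmt.Println(state["a"], state["b"]) // 2 1
}
```

The key design point is that per-key state replaces the sort: the reduce worker can consume map outputs in any order and at any time.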

Hmm, the official answers only suggest copying the intermediate files over early, not actually starting the Reduce computation... I guess that makes sense.

MapReduce (Lab)

When testing their implementation, Zara notices that a job with map tasks numbered 0, 1, and 2 often fails.

A. In some scenarios, some reduce workers cannot open intermediate file mr-1-x (where x is the Reduce task number). Explain a scenario that could cause this to occur.

I see two problems:

  • The loop over map tasks starts from numCompletedMapTasks. This assumes tasks are completed in the same order as they are assigned, which is not guaranteed.

  • numCompletedMapTasks can be incremented twice for the same task if one worker times out but later still reports the task as done, and another worker also completes it.

Scenario where this output can happen:

  • Task 0 is assigned to worker 0, task 1 is assigned to worker 1.

  • Worker 0 dies.

  • Worker 1 completes successfully, numCompletedMapTasks is set to 1.

  • Then task 0 will never be finished: the assignment loop starts at numCompletedMapTasks = 1, so task 0 is never reassigned. Any reduce worker trying to read its output will fail.

B. Briefly describe what Zara should do to fix this problem.

  • Start the first loop from 0.

  • Maintain a "completed" flag for each task. Increment numCompletedMapTasks only the first time a task is reported complete.
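A sketch of this fix in Go, assuming a hypothetical `Master` struct; the lab's real coordinator has different fields and locking, which are omitted here:

```go
package main

import "fmt"

// Master tracks per-task completion so a late or duplicate "done" report
// cannot bump the counter twice. Field names are illustrative.
type Master struct {
	mapDone              []bool
	numCompletedMapTasks int
}

// ReportMapDone records a finished map task. It increments the counter only
// the first time a given task is reported complete, so a timed-out worker
// reporting late cannot double-count.
func (m *Master) ReportMapDone(task int) {
	if !m.mapDone[task] {
		m.mapDone[task] = true
		m.numCompletedMapTasks++
	}
}

// AllMapOutputsReady scans every task starting from 0, not from
// numCompletedMapTasks, since tasks do not finish in assignment order.
func (m *Master) AllMapOutputsReady() bool {
	for task := 0; task < len(m.mapDone); task++ {
		if !m.mapDone[task] {
			return false
		}
	}
	return true
}

func main() {
	m := &Master{mapDone: make([]bool, 3)}
	m.ReportMapDone(1)
	m.ReportMapDone(1) // duplicate report from a timed-out worker
	fmt.Println(m.numCompletedMapTasks, m.AllMapOutputsReady()) // 1 false
}
```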


GFS

A. The GFS file system doesn't guarantee linearizability. Give an example of how the lack of linearizability complicates writing programs with GFS. That is, describe a feature that one must implement when using GFS that one wouldn't have to implement if GFS had provided linearizability. Briefly explain your answers.

Users need to implement checksums to verify data consistency, and/or use appends with deduplication where they would otherwise use writes.

B. Briefly explain why append operations are not allowed to span chunk boundaries.

Can't dedup if something is written twice. (???)

Wrong: the real reason is to keep appends atomic. A record spanning a chunk boundary would live on two different chunkservers and could not be written atomically; instead GFS pads the current chunk and retries the append in the next one.

VMware FT

In VM-FT (as described in the paper "Fault-Tolerant Virtual Machines" by Scales et al.), the backup lags behind the primary to ensure, for example, that network interrupts are delivered at exactly the same instruction at the backup as on the primary. Briefly explain what could go wrong if the backup delivered interrupts at a different instruction than the primary.

Interrupt handlers on the two replicas would run in different replica states and produce different results; the replicas would diverge forever.


Raft

Consider figure 7 of the Raft paper "In Search of an Understandable Consensus Algorithm (Extended Version)". The last follower (at the bottom of the figure) has term 3 at its last index, 11. Could there have been a scenario under which that follower's log had entries in indexes 12, 13, and 14 in figure 7? What terms could those entries have? (Briefly explain your answer.)

It could have more entries with term 3, if, as the leader of term 3, it received more entries from clients before committing them. The terms could only be 3: it was not the leader of any other later term, and if it had received entries as a follower from a newer leader, its term-2 and term-3 entries would have been overwritten.

Raft (Lab)

A. No failures of any kind

5 (eventually). Once for each server.

B. Network partitions possible (but no crashes)

0, or 3 to 5: 0 if it never commits; 3 or more if it commits, since commitment requires replication to a majority.

C. Crashes possible (but no network partitions)

0 is possible: the entry is still not guaranteed to replicate anywhere. The leader can crash before replicating it, another leader gets elected, and the entry is lost forever.

If it does replicate, it will replicate everywhere eventually. So, 5.

Can it be greater than 5? Yes. It can be any number greater than 5 if it restarts.

Raft (Persistence)

While implementing Raft, Connie Consensus decided that instead of persisting the entire log, she would persist only committed entries. Connie claims that uncommitted entries could have been lost because of crashes or network issues so it's safe to ignore them until they've been committed. Does this change affect Raft's correctness? If so, provide a sequence of events that would lead to unsafe behavior. You can assume that the rest of Connie's implementation is correct.

When a leader sends an entry to a follower and the follower replies with success, the leader assumes the entry is durable on that follower. Otherwise correctness breaks.

  • 3 servers.

  • S1 is elected the leader.

  • S3 is partitioned away.

  • S1 sends an entry to S2, S2 replies success.

  • S1 commits the entry and replies to the client with success.

  • S2 crashes and "loses" the entry: S2 does not yet know the entry is committed, so under Connie's scheme it was never persisted.

  • S3 wakes up, S2 wakes up, S1 is partitioned.

  • S3 becomes the leader.

  • S2 and S3 keep committing new entries. The previously committed entry is lost.
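This scenario is exactly why a correct implementation persists the entire log, including uncommitted entries, alongside currentTerm and votedFor. A minimal round-trip sketch, using the standard library's encoding/gob rather than the lab's labgob, with made-up type names:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// LogEntry and RaftState are illustrative stand-ins for the lab's types.
type LogEntry struct {
	Term    int
	Command string
}

type RaftState struct {
	CurrentTerm int
	VotedFor    int
	Log         []LogEntry // the ENTIRE log, including uncommitted entries
}

// persist encodes everything a restarted peer needs to keep its promises:
// if it acknowledged an AppendEntries, that entry must survive a crash.
func persist(s RaftState) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

// readPersist restores the saved state after a restart.
func readPersist(data []byte) RaftState {
	var s RaftState
	if err := gob.NewDecoder(bytes.NewBuffer(data)).Decode(&s); err != nil {
		panic(err)
	}
	return s
}

func main() {
	before := RaftState{CurrentTerm: 2, VotedFor: 1,
		Log: []LogEntry{{Term: 1, Command: "x=1"}, {Term: 2, Command: "y=2"}}}
	after := readPersist(persist(before))
	// After a crash and restart, S2 still has the entry it acknowledged.
	fmt.Println(len(after.Log), after.Log[1].Term) // 2 2
}
```

The point is that the persisted log must include entries the server has merely acknowledged, not just those it knows are committed.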


ZooKeeper

A. Zookeeper's read throughput scales with the number of servers (e.g., see the red line in Figure 5). The Raft paper describes an optimization for read-only operations. Would that optimization allow Raft to scale read throughput with the number of peers as in Zookeeper? (Briefly explain your answer.)

Yes, but only at the expense of weaker consistency guarantees: it would allow stale reads.

B. For this example, briefly explain what could go wrong if ZooKeeper didn't guarantee FIFO client order. In particular, what values could the read of f by client 2 return?

0, or 1.
