📓
ainzzorl's wiki
  • Intro
  • Pet Projects
    • Good Code
    • LangFlipFlop
    • Soccer Adjectives
    • Word Highlighter
    • News Digest
    • Algorithms in Kotlin
    • rustorcli
  • Tech
    • Tech Articles
    • Algorithms
      • Distributed Hash Table (DHT)
      • RSA
      • Seam Carving
      • Fibonacci Heaps
      • Suffix trees, suffix arrays, etc.
    • Technologies
      • Threads vs Events
      • TLS
      • GPU
      • Web Sockets
      • OSI Model
    • Open Source
      • Homebrew
      • Standard Notes
      • LiChess
    • Bash
    • Raft
    • Quantum Computing
    • GFS
    • MapReduce
    • ZooKeeper
  • Courses
    • MIT 6.824: Distributed Systems
      • Primary-Backup Replication
      • Object Storage on CRAQ
      • Aurora
      • Cache Consistency: Frangipani
      • Takeways
      • Distributed Transactions
      • Midterm
      • Spanner
      • FaRM
      • Spark
      • Cache Consistency: Memcached at Facebook
    • Page 2
    • MIT 6.854/18.415: Advanced Algorithms + Stanford CS168: The Modern Algorithmic Toolbox
  • Math
    • Fourier Transform
    • Probabilities & Statistics
  • Places
    • Moscow
  • Books
    • Page 1
    • Page 1
    • Page 1
    • Page 1
    • Page 1
    • Tobol Mnogo Zvannyh - Ivanov
    • The Twelve Chairs/12 стульев - Ilf, Petrov
    • Beauty is a Wound - Eka Kurniawan
    • The Queen of Spades/Пиковая Дама - Pushkin
    • The Sirens of Titan - Kurt Vonnegut
    • Обитель - Захар Прилепин
    • The Faithful Executioner - Joel Harrington
    • City of Lies: Love, Sex, Death, and the Search for Truth in Tehran - Ramita Navai
    • June/Июль - Dmitry Bykov/Дмитрий Быков
    • East of Eden - John Steinbeck
    • Como Agua Para Chocolate/Like Water for Chocolate - Laura Esquivel Valdés
    • The Kukotski Enigma/Казус Кукоцкого - Lyudmila Ulitskaya/Людмила Улицкая
    • Ancillary Justice - Ann Leckie
    • Career of Evil - JK Rowling
    • The Signal and the Noise - Nate Silver
    • Don't Sleep, There are Snakes - Daniel Everett
    • Оправдание Острова - Eugene Vodolazkin
    • A Place Called Winter - Patrick Gale
    • 1491: New Revelations of the Americas Before Columbus - Charles C. Mann
    • Трудно Отпускает Антарктида - Vladimir Sanin
    • Klara and the Sun - Kazuo Ishiguro
    • The History of My Contemporary/История Моего Современника - Vladimir Korolenko
    • Life at the Speed of Light - Craig Venter
    • Misery - Stephen King
    • And Then There Were None - Agatha Christie
    • A Kim Jong-Il Production - Paul Fischer
  • Cooking
    • Marinated Mushrooms
    • Pea Soup
    • Fried Potato
    • Chimichurri
    • Komendantsky Sauce
    • Spicy mushroom marinara
    • Bruschetta
    • Solyanka with Mushrooms
    • Tofu Scramble
    • Bean Spaghetti
    • Salsa
    • Baked Mushrooms
    • Lentil Soup
    • Веганство в Москве
  • Misc Research
    • Recycling
    • Sailing Upwind
    • Viruses
  • Misc Reading
    • Harry Potter - Rowling Writing
  • Fitness
  • Languages
    • Spanish
    • Language Classification
  • Juggling
  • Life Advice
Powered by GitBook
On this page

Was this helpful?

  1. Tech
  2. Algorithms

Distributed Hash Table (DHT)

PreviousAlgorithmsNextRSA

Last updated 3 years ago

Was this helpful?

DHT in BitTorrent

Problem: how do I find peers for a torrent I'm trying to download? Trackers are not in the spirit of P2P and are SPOFs.

Solution: distribute the knowledge about peers for torrents across all participating nodes. Nodes and torrent infohashes are mapped to the same space. Nodes closer to the torrent are expected to store peers for it. Each node maintains a routing table of known nodes, and a map of peers for some torrents. It traverses the nodes until it finds one that knows the peers. Then it announces itself.

RPCs:

  • ping

  • announce_peer

    • Announce that the peer, controlling the querying node, is downloading a torrent on a port. announce_peer has four arguments: "id" containing the node ID of the querying node, "info_hash" containing the infohash of the torrent, "port" containing the port as an integer, and the "token" received in response to a previous get_peers query. The queried node must verify that the token was previously sent to the same IP address as the querying node. Then the queried node should store the IP address of the querying node and the supplied port number under the infohash in its store of peer contact information.

  • get_peers

    • Get peers associated with a torrent infohash. "q" = "get_peers" A get_peers query has two arguments, "id" containing the node ID of the querying node, and "info_hash" containing the infohash of the torrent. If the queried node has peers for the infohash, they are returned in a key "values" as a list of strings. Each string containing "compact" format peer information for a single peer. If the queried node has no peers for the infohash, a key "nodes" is returned containing the K nodes in the queried nodes routing table closest to the infohash supplied in the query. In either case a "token" key is also included in the return value. The token value is a required argument for a future announce_peer query. The token value should be a short binary string.

  • find_nodes

    • Find node is used to find the contact information for a node given its ID. "q" == "find_node" A find_node query has two arguments, "id" containing the node ID of the querying node, and "target" containing the ID of the node sought by the queryer. When a node receives a find_node query, it should respond with a key "nodes" and value of a string containing the compact node info for the target node or the K (8) closest good nodes in its own routing table.

Protocol:

  • Bootstrap.

    • find_nodes(self)

  • Refresh buckets.

    • find_nodes(target bucket)

  • Announce.

    • get_peers(infohash) + announce_peer(infohash)

A "peer" is a client/server listening on a TCP port that implements the BitTorrent protocol. A "node" is a client/server listening on a UDP port implementing the distributed hash table protocol.

A trackerless torrent dictionary does not have an "announce" key. Instead, a trackerless torrent has a "nodes" key. This key should be set to the K closest nodes in the torrent generating client's routing table. Alternatively, the key could be set to a known good node such as one operated by the person generating the torrent. Please do not automatically add "router.bittorrent.com" to torrent files or automatically add this node to clients routing tables.

Spoof protection:

Before we add someone to the peer-list, we verify that that ip is really requesting it.

  • get_peers responds with a write-token.

  • Write token is typically a MAC of:

    • source(ip, port)

    • target info-hash

    • local_secrets (experes in tens of minutes)

  • announce_peer requires a valid write token to insert the node into the peer list.

Topology. Describes how nodes can repond to recursive lookup queries like this, with nodes closer and closer to the target.

The DHT is made up by all bittorrent peers, across all swarms. Each npde has a self-assigned address, or node id. Id space - [0, 2^160). All nodes appear uniformly distributed. Same space as infohash space. Nodes whose ID is close to an info-hash are responsible for storing information about it.

Routing. It's impractical for every node to know about every other node. There are millions of nodes, they come and go constantly.

Every node specializes in knowing about all nodes close to itself. The routing table orders nodes based on their distance from oneself.

XOR distance metric: d(a, b) = a xor b.

The distance space is divided into buckets, each no more than 8 nodes. Each bucket is half as the previous. Fartherst 1/2 nodes: 1 bucket. You know more about nodes closer to you.

For every hop in a recursive lookup, the nodes distance is cut in half. Lookup complexity: O(log n).

Routing table. The XOR distance metric applied to the routing table just counts the length of the common bit-prefix. Max of 160 buckets. Most buckets will be empty (too few nodes!), so an array of 160 buckets is not efficient.

A typical routing table starts with only bucket 0. When 9th bucket is added, the bucket is split into bucket 0 and bucket 1, with the nodes moved into the respective buckets. Only the highest number bucket is ever split.

Traversal algorithm. You join the swarm - you need ips for the infohash + announce yourself. Sort nodes in your routing table by distance to the target. Pick, say, 3 nodes. Send get_peers requests to these nodes at the same time. Keep 3 outstanding requests. They come back with nodes, we insert them into the routing table. Hopefully closer to the info-hash. Mark stale nodes (don't remove, to save time when it's reinserted). When we have 8 nodes on top, all queried successfully and no closer nodes, we stop. We announce ourselves to these 8 nodes. Re-announce every 15 minutes.

https://www.cs.princeton.edu/courses/archive/fall18/cos418/docs/L6-dhts.pdf
https://www.ietf.org/proceedings/65/slides/plenaryt-2.pdf
https://www.usenix.org/legacy/publications/library/proceedings/osdi2000/full_papers/gribble/gribble_html/node4.html
https://stackoverflow.com/questions/1332107/how-does-dht-in-torrents-work
https://en.wikipedia.org/wiki/Mainline_DHT
https://engineering.bittorrent.com/2013/01/22/bittorrent-tech-talks-dht/
http://www.bittorrent.org/beps/bep_0005.html
https://vimeo.com/56044595