IPFS
Globally distributed file system: IPFS decentralizes the distribution of files.
Content-based identification with the secure hash of contents.
Resolving locations using Distributed Hash Table (DHT).
Block exchange modeled on the popular BitTorrent peer-to-peer file distribution protocol.
Incentivized block exchange using Bitswap protocol.
Merkle DAG (Directed Acyclic Graph) version-based organization of files, similar to Git version control system.
Self-certifying identifiers for the storage nodes, for security.
Files in distributed storage.
A distributed hash table uses the hash of the file as a key to return the file's location.
Once the location is determined, the transfer takes place peer-to-peer as a decentralized transfer.
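The lookup-then-transfer flow above can be sketched as follows. This is a toy model, not the IPFS implementation: ToyDHT, Peer, content_key and fetch are hypothetical names, and a plain SHA-256 hex digest stands in for a real content identifier.

```python
import hashlib

def content_key(data: bytes) -> str:
    # Assumption: a bare SHA-256 hex digest stands in for a real CID.
    return hashlib.sha256(data).hexdigest()

class ToyDHT:
    """Maps a content key to the names of peers that hold the content."""
    def __init__(self):
        self.providers = {}

    def provide(self, key, peer_name):
        self.providers.setdefault(key, set()).add(peer_name)

    def find_providers(self, key):
        return sorted(self.providers.get(key, ()))

class Peer:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def add(self, data, dht):
        key = content_key(data)
        self.blocks[key] = data
        dht.provide(key, self.name)  # announce the location in the DHT
        return key

def fetch(key, dht, peers):
    # Step 1: resolve the key to locations. Step 2: transfer peer-to-peer.
    for name in dht.find_providers(key):
        data = peers[name].blocks.get(key)
        # The receiver re-hashes the data, so a wrong block is rejected.
        if data is not None and content_key(data) == key:
            return data
    return None

dht = ToyDHT()
peers = {"alice": Peer("alice"), "bob": Peer("bob")}
key = peers["alice"].add(b"hello ipfs", dht)
fetched = fetch(key, dht, peers)
```

Because the key is derived from the content, the requester can verify what it receives without trusting the peer that served it.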
Peer nodes holding the data blocks are incentivized by a protocol called Bitswap.
Peer nodes have a want_list and have_list.
Any imbalance is recorded as Bitswap credit or debt.
Bitswap protocol manages the block exchanges involving the nodes accordingly.
The nodes in the network then have to provide value in the form of blocks.
If you send a block, you earn credit with the receiving peer, and that credit counts in your favor when you later need a block.
The Bitswap protocol has provisions for handling exceptions such as freeloading nodes, nodes wanting nothing, and nodes having nothing.
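The credit and debt bookkeeping can be sketched as a per-peer ledger. The Ledger class and its field names are hypothetical. The debt ratio and the sigmoid below follow the example strategy suggested in the IPFS whitepaper as best recalled here, so treat the constants as an assumption; a real node is free to use a different strategy.

```python
import math
from dataclasses import dataclass

@dataclass
class Ledger:
    """What the local node has exchanged with one particular peer."""
    bytes_sent: int = 0   # bytes we sent to the peer (the peer's debt to us)
    bytes_recv: int = 0   # bytes the peer sent to us (the peer's credit)

    def debt_ratio(self) -> float:
        # +1 avoids division by zero for a peer we have never heard from.
        return self.bytes_sent / (self.bytes_recv + 1)

    def send_probability(self) -> float:
        # Assumed strategy: generous to new or balanced peers, and
        # increasingly unwilling as the peer's debt ratio grows.
        r = self.debt_ratio()
        return 1 - 1 / (1 + math.exp(6 - 3 * r))

balanced = Ledger(bytes_sent=1000, bytes_recv=999)     # debt ratio 1.0
freeloader = Ledger(bytes_sent=3000, bytes_recv=999)   # debt ratio 3.0
p_balanced = balanced.send_probability()
p_freeloader = freeloader.send_probability()
```

Under this strategy a peer that has returned roughly as much as it received is served almost every time, while a peer that has taken three times what it gave is served only occasionally.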
When the local node receives a block, it broadcasts a cancel message for that block CID to all connected peers.
However, the recipient peer may not process the cancel before it has already sent out the block, so the local node can still receive duplicates.
The local node keeps track of the ratio of duplicates / received blocks and adjusts the split factor.
If the ratio goes above 4 (a large number of duplicates), the split factor is increased - the same CID will be sent to fewer peers.
If the ratio goes below 2 (few duplicates) the split factor is decreased - the same CID will be sent to more peers.
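The feedback loop above can be sketched as follows. The function names are hypothetical, the thresholds are the ones quoted in the text, and the round-robin grouping is only one assumed way to divide the wanted CIDs and the peers.

```python
def adjust_split(split, duplicates, received, n_peers, high=4.0, low=2.0):
    """Return the new split factor after observing duplicate traffic."""
    if received == 0:
        return split
    ratio = duplicates / received
    if ratio > high and split < n_peers:
        return split + 1   # many duplicates: ask fewer peers per CID
    if ratio < low and split > 1:
        return split - 1   # few duplicates: ask more peers per CID
    return split

def split_wants(cids, peers, split):
    """Partition CIDs and peers into `split` groups, paired one to one."""
    split = max(1, min(split, len(peers)))
    return [(peers[i::split], cids[i::split]) for i in range(split)]

cids = ["cid-a", "cid-b", "cid-c", "cid-d"]
peers = ["p1", "p2", "p3", "p4"]
one_group = split_wants(cids, peers, 1)   # every CID goes to all 4 peers
two_groups = split_wants(cids, peers, 2)  # every CID goes to 2 peers
```

With a split factor of 1 every want is sent to every peer, which is fast but wasteful. Each increase of the split factor shrinks the number of peers asked for any single CID.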
The cluster facilitates the replication of content across multiple nodes.
All cluster peers need to share the same cluster secret in order to be a part of the same cluster.
Each cluster peer has its own unique ID.
When new data is added and pinned to one of the peers of the cluster, all the other peers of that cluster receive the data.
The peer that initiates the cluster creates the cluster secret and is selected as the cluster leader.
Every peer of the cluster is able to modify, add or remove data from the cluster.
When a peer is added or removed from the cluster, the cluster continues working normally.
If the leader node goes down, a new leader is elected based on a consensus algorithm.
A cluster facilitates the management of groups of nodes that should receive the same content, for example, software updates, multicast content, and location-based content.
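The cluster behavior described above can be modeled in a few lines. This is a toy, in-memory sketch: ToyCluster and its methods are hypothetical names, the secret check is centralized here for brevity (a real cluster has no such central object), and the 32-byte hex secret is an assumption about the secret's format.

```python
import hmac
import secrets

class ToyCluster:
    def __init__(self, founder_id: str):
        # The founding peer creates the shared secret and starts as leader.
        self.secret = secrets.token_hex(32)
        self.leader = founder_id
        self.pinset = set()
        self.peers = {founder_id: set()}   # peer ID -> CIDs pinned locally

    def join(self, peer_id: str, secret: str):
        if not hmac.compare_digest(secret.encode(), self.secret.encode()):
            raise PermissionError("wrong cluster secret")
        # A new peer catches up with everything already pinned.
        self.peers[peer_id] = set(self.pinset)

    def leave(self, peer_id: str):
        self.peers.pop(peer_id, None)

    def pin(self, via_peer: str, cid: str):
        if via_peer not in self.peers:
            raise KeyError(via_peer)
        # Any peer may pin, and the pin is replicated to all peers.
        self.pinset.add(cid)
        for pins in self.peers.values():
            pins.add(cid)

cluster = ToyCluster("peer-1")
cluster.join("peer-2", cluster.secret)
cluster.pin("peer-2", "cid-update-1.0")
cluster.join("peer-3", cluster.secret)
```

After these calls all three peers hold cid-update-1.0, including peer-3, which joined after the pin was made.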
A leader election is started by a candidate server.
A server becomes a candidate if it receives no communication from the leader over a period called the election timeout, so it assumes there is no acting leader anymore.
It starts the election by increasing the term counter, voting for itself as the new leader, and sending a message to all servers requesting their vote.
A server will vote only once per term, on a first-come-first-served basis.
If a candidate receives a message from another server with a term number larger than the candidate's current term, the candidate's election is defeated and the candidate changes into a follower and recognizes the leader as legitimate.
If a candidate receives a majority of votes, it becomes the new leader.
If neither happens, e.g., because of a split vote, then a new term starts, and a new election begins.
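The voting rules above can be sketched as a single-process simulation. Server and run_election are hypothetical names. The sketch leaves out timeouts, networking, and the log comparison that a full Raft implementation performs before granting a vote.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Server:
    name: str
    term: int = 0
    voted_for: Optional[str] = None

    def request_vote(self, candidate: str, term: int) -> bool:
        if term < self.term:
            return False              # stale candidate
        if term > self.term:
            self.term = term          # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for in (None, candidate):
            self.voted_for = candidate   # one vote per term, first come
            return True
        return False

def run_election(candidate: Server, servers: list) -> bool:
    candidate.term += 1                   # start a new term
    candidate.voted_for = candidate.name  # vote for itself
    votes = 1
    for server in servers:
        if server is not candidate and server.request_vote(
            candidate.name, candidate.term
        ):
            votes += 1
    return votes > len(servers) // 2      # strict majority wins

a, b, c = Server("a"), Server("b"), Server("c")
a_won = run_election(a, [a, b, c])
late_vote = c.request_vote("b", 1)   # c already voted for a in term 1
```

A second candidate asking for votes in the same term is refused, which is what makes a split vote possible and forces a new term when no one reaches a majority.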
Nodes are identified by the cryptographic hash of their public key.
They hold the objects that form the files to be exchanged.
Objects are identified by a secure hash, and an object may contain sub-objects each with its own hash that is used in the creation of the root hash of the object.
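A minimal sketch of how sub-object hashes feed into a root hash is given below. The object_hash function and its byte layout are assumptions made for illustration; IPFS uses its own object encoding and hash format.

```python
import hashlib

def object_hash(data: bytes, child_hashes=()) -> str:
    """Hash an object's own data together with its children's hashes."""
    h = hashlib.sha256()
    h.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
    h.update(data)
    for child in child_hashes:
        h.update(bytes.fromhex(child))
    return h.hexdigest()

leaf_a = object_hash(b"chunk A")
leaf_b = object_hash(b"chunk B")
root = object_hash(b"file", [leaf_a, leaf_b])

# Changing one sub-object changes its hash and therefore the root hash.
leaf_b_edited = object_hash(b"chunk B, edited")
root_edited = object_hash(b"file", [leaf_a, leaf_b_edited])
```

Because the root depends on every sub-object hash, verifying the root is enough to detect a change anywhere below it.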
IPFS identifies the resources by a hash.
Instead of identifying the resource by its location as in HTTP, IPFS identifies it by its content or by the secure hash of its content.
To fetch a resource, a node sends a request around the network asking for anyone who holds the resource with that hash identifier.
The routing part of the IPFS protocol maintains a DHT to locate the nodes as well as the file objects.
A simple DHT holds the hash as the key and location as the value.
Alternatively, the key can hash directly into the location.
The DHT resolves a key to the location closest to that key value.
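One common way to define "closest" is the XOR distance used by Kademlia-style DHTs, the family that the IPFS DHT builds on. The sketch below assumes that metric. The function names are hypothetical, and small integers are used as IDs so the arithmetic can be checked by hand.

```python
import hashlib

def id_from_name(name: str) -> int:
    # Assumption: node IDs and keys live in the same SHA-256 ID space.
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def closest_nodes(key: int, node_ids, k: int = 2):
    """Return the k node IDs with the smallest XOR distance to the key."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]

# Hand-checkable example with 4-bit IDs.
nodes = [0b1000, 0b0010, 0b1011, 0b0111]
nearest = closest_nodes(0b1010, nodes, k=2)
# Distances to 0b1010: 0b1000 -> 2, 0b0010 -> 8, 0b1011 -> 1, 0b0111 -> 13

# The same function works on full-size IDs.
big_ids = [id_from_name(n) for n in ("node-a", "node-b", "node-c")]
nearest_big = closest_nodes(id_from_name("some-file"), big_ids, k=1)
```

The nodes returned for a key are the ones expected to store, or know about, the location record for that key.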
Object pinning: Nodes that wish to ensure the survival of particular objects can do so by pinning the objects.
Pinned objects are kept in the node’s local storage.
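Pinning can be pictured as a set of protected keys that a cleanup pass must skip. The sketch below is a toy model with hypothetical names (BlockStore, gc). It assumes the node periodically discards objects that are not pinned, which is the situation pinning guards against.

```python
class BlockStore:
    def __init__(self):
        self.blocks = {}      # key -> object bytes held locally
        self.pinned = set()   # keys that must survive cleanup

    def put(self, key: str, data: bytes):
        self.blocks[key] = data

    def pin(self, key: str):
        if key not in self.blocks:
            raise KeyError(key)
        self.pinned.add(key)

    def unpin(self, key: str):
        self.pinned.discard(key)

    def gc(self):
        """Drop every object that is not pinned; return the removed keys."""
        removed = sorted(k for k in self.blocks if k not in self.pinned)
        for key in removed:
            del self.blocks[key]
        return removed

node_store = BlockStore()
node_store.put("key-keep", b"important")
node_store.put("key-cache", b"just passing through")
node_store.pin("key-keep")
removed = node_store.gc()
```

Only the unpinned object is removed, so the pinned one stays available to the rest of the network through this node.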
Object publishing: DHT, with content-hash addressing, allows publishing objects in a distributed way.
Anyone can publish an object by simply adding its key to the DHT, adding themselves as a peer, and giving other users the object’s path.
New versions hash differently and thus are new objects. Tracking versions is the job of additional versioning objects.
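A versioning object can be sketched as a small record that points at the content hash and at the previous version, similar in spirit to a Git commit. The layout below, and the names put, commit and history, are assumptions made for illustration.

```python
import hashlib
import json

store = {}  # content-addressed store: hash -> object

def put(obj: dict) -> str:
    raw = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(raw).hexdigest()
    store[key] = obj
    return key

def commit(content: str, parent=None) -> str:
    """Store the content, then a version object that links to it."""
    content_key = put({"data": content})
    return put({"content": content_key, "parent": parent})

def history(head):
    """Walk the parent links from the newest version to the oldest."""
    versions = []
    while head is not None:
        version = store[head]
        versions.append(store[version["content"]]["data"])
        head = version["parent"]
    return versions

v1 = commit("hello")
v2 = commit("hello world", parent=v1)
log = history(v2)
```

Each edit produces a new content hash and a new version object, while the old objects remain addressable by their original hashes.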