"Open source" and "decentralization" are the two labels that give a project a good first impression. Popular projects are often disingenuous on both points, or outright mislead their users while being proprietary and partially centralized: consider the semi-open Telegram and the quite noticeably centralized Tor.
What do we know about I2P? I propose a detailed look at what keeps a fully decentralized network connected and working: floodfills, which act as a kind of message board.
Introduction to the topic
Let's start with the fact that I2P in-network names do not correspond to IP addresses in any way, unlike names on the regular Internet. Consequently, there is no routing across networks and subnets of the kind the IP model provides. Moreover, the network's claim to anonymity has to rest on something. You will agree that the goal here is clearly not the shortest path from point to point but rather the opposite: unpredictable routes and a ton of tricky cryptography.
Identifiers of in-network resources, such as website addresses, contain no information about the physical location of the server. A typical address ending in “.b32.i2p” is just the SHA-256 hash of the full address, encoded in Base32; the full address itself consists of a set of cryptographic keys. The full address together with information about the incoming tunnels is called a LeaseSet, and it is required to establish communication with a hidden service.
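As a rough illustration of how a short address is derived from a full one, here is a minimal Python sketch. The function name and the placeholder bytes are hypothetical; a real full address is a binary structure of keys produced by the router, not a test pattern.

```python
import base64
import hashlib


def b32_address(destination: bytes) -> str:
    """Return the .b32.i2p name for a binary full address (destination)."""
    digest = hashlib.sha256(destination).digest()        # always 32 bytes
    encoded = base64.b32encode(digest).decode("ascii")   # 52 symbols plus '=' padding
    return encoded.rstrip("=").lower() + ".b32.i2p"


# Placeholder bytes standing in for a real full address (an assumption for the demo).
fake_destination = bytes(range(256)) + bytes(131)
address = b32_address(fake_destination)
print(address)
```

The same input always yields the same 52-character name, and nothing in that name points back to the keys or to the server behind them.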
A hash function is known to be irreversible: the original data cannot be reconstructed from the hash value. It follows that a b32 address must be resolved, much as on the regular Internet, where the IP address of a target resource is determined from its domain name: a DNS server returns the information in response to a user's request for a specific name. In I2P, instead of well-known DNS servers, floodfills are used: message boards where hidden resources publish information about themselves so that they can be reached.
There are several fundamental differences from the traditional domain name system:
In I2P there is no registrar (not to be confused with registering short “.i2p” names, which, simply put, are bound free of charge to an already existing long “.b32.i2p” address);
It is fundamentally wrong to keep a LeaseSet permanently in one place, because any floodfill could be malicious. Where a LeaseSet is placed must be unpredictable for floodfills but deterministic for those looking for it;
A LeaseSet contains not only the full resource address, made up of cryptographic keys, but also information about incoming tunnels. Every tunnel lives for only 10 minutes, so the LeaseSet must be refreshed regularly to keep the resource reachable from outside.
How a local I2P router finds a hidden service
The logic that lets a hidden resource (a destination) publish itself on an unpredictable floodfill, and then lets another user find the published LeaseSet, is quite elegant in practice:
The router takes the target address (for example, the one the user typed into the browser's address bar) and today's date, and derives a SHA-256 hash from this block of data. It then goes through every floodfill in its local database; on average there are from a few hundred to a couple of thousand of them, depending on the conditions a particular I2P router operates in. The LeaseSet is requested from the floodfill whose identifier gives the smallest value when XORed with the hash of “target b32 address + today's date”.
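The selection step can be sketched in a few lines of Python. This is a simplified model rather than the code of any real router: the function names are hypothetical, the date is assumed to be appended as a UTC "yyyyMMdd" string, and floodfill identifiers are simulated with hashes of made-up labels.

```python
import hashlib
from datetime import date, datetime, timezone


def routing_key(target_hash: bytes, day: date) -> bytes:
    """SHA-256 over the 32-byte target hash plus the date (assumed 'yyyyMMdd')."""
    return hashlib.sha256(target_hash + day.strftime("%Y%m%d").encode("ascii")).digest()


def xor_distance(a: bytes, b: bytes) -> int:
    """Interpret both values as big integers and XOR them."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_floodfill(target_hash: bytes, floodfill_hashes: list, day: date) -> bytes:
    """Pick the floodfill whose identifier is XOR-closest to the daily key."""
    key = routing_key(target_hash, day)
    return min(floodfill_hashes, key=lambda ff: xor_distance(key, ff))


# Simulated data: one target and a hundred floodfills known to the local router.
target = hashlib.sha256(b"example destination").digest()
floodfills = [hashlib.sha256(f"floodfill-{i}".encode()).digest() for i in range(100)]
today = datetime.now(timezone.utc).date()
chosen = closest_floodfill(target, floodfills, today)
print(chosen.hex())
```

Because the date is mixed into the key, the "right" floodfill for a given address shifts from day to day, which is what makes the placement hard to predict for floodfills yet easy to compute for anyone who knows the address.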
If the queried floodfill does not have the requested LeaseSet, it returns three floodfills from its own list that it considers closest to the address we are looking for. If querying these additional floodfills also fails, the address is considered unavailable.
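The fallback described here can be modeled with a toy in-memory network. Everything in this Python sketch (the Floodfill class, find_leaseset, the string standing in for a LeaseSet) is hypothetical and only mirrors the steps in the text: one query to the closest known floodfill, then one round of queries to the three floodfills it suggests.

```python
import hashlib


def ident(label: str) -> bytes:
    return hashlib.sha256(label.encode()).digest()


def distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


class Floodfill:
    def __init__(self, my_ident, known_floodfills):
        self.ident = my_ident
        self.known = [k for k in known_floodfills if k != my_ident]
        self.store = {}  # key -> LeaseSet

    def lookup(self, key):
        """Return (leaseset, []) on a hit, or (None, three closer floodfills)."""
        if key in self.store:
            return self.store[key], []
        return None, sorted(self.known, key=lambda k: distance(key, k))[:3]


def find_leaseset(key, network, locally_known):
    """Ask the closest known floodfill, then the three it refers us to."""
    first = min(locally_known, key=lambda i: distance(key, i))
    leaseset, referrals = network[first].lookup(key)
    if leaseset is not None:
        return leaseset
    for referred in referrals:
        leaseset, _ = network[referred].lookup(key)
        if leaseset is not None:
            return leaseset
    return None  # the address is considered unavailable


key = ident("example.b32.i2p + date")
idents = sorted((ident(f"ff-{i}") for i in range(8)), key=lambda i: distance(key, i))
network = {i: Floodfill(i, idents) for i in idents}
network[idents[0]].store[key] = "leaseset-for-example"  # held by the closest floodfill
local_view = idents[3:6]  # our router happens not to know the closest ones
found = find_leaseset(key, network, local_view)
missing = find_leaseset(ident("nobody published this"), network, local_view)
print(found, missing)
```

In the demo the local router does not know the floodfill that holds the record, so the first query misses and the referral leads to it.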
Publishing a LeaseSet follows similar logic. The two most suitable floodfills are selected for publication. On receiving a new LeaseSet, each of the two passes it on to the three most suitable floodfills from its own database. This is where the distribution ends.
The destination publishing its LeaseSet checks the quality of the publication. After sending the LeaseSet to a floodfill, it waits for a reply. If none arrives, the LeaseSet is sent to the next floodfill. If the publication succeeds, the floodfill returns the list of neighbors with which it shared the LeaseSet. A control request is then made to those neighbors. The LeaseSets received are checked and, if all are copies of what the publisher originally sent, the publication is considered complete. The LeaseSet is republished every ten minutes as the tunnels expire.
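The publish-and-verify cycle can be imitated with the same kind of toy model. The Python sketch below is an assumption-laden illustration, not real router code: the class, the online flag, and the rule that a floodfill floods only to peers it can reach are all invented for the example.

```python
import hashlib


def ident(label: str) -> bytes:
    return hashlib.sha256(label.encode()).digest()


def distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


class Floodfill:
    def __init__(self, my_ident, online=True):
        self.ident = my_ident
        self.online = online
        self.peers = []   # other floodfills this one knows
        self.store = {}   # key -> LeaseSet

    def store_and_flood(self, key, leaseset):
        """Store the record, pass it to the three closest peers, name them."""
        if not self.online:
            return None   # no reply: the publisher moves on
        self.store[key] = leaseset
        reachable = [p for p in self.peers if p.online]
        neighbours = sorted(reachable, key=lambda p: distance(key, p.ident))[:3]
        for peer in neighbours:
            peer.store[key] = leaseset   # flooded copies are not forwarded further
        return [p.ident for p in neighbours]


def publish(key, leaseset, floodfills, copies=2):
    """Publish to the closest floodfills until `copies` of them are verified."""
    by_ident = {f.ident: f for f in floodfills}
    confirmed = 0
    for candidate in sorted(floodfills, key=lambda f: distance(key, f.ident)):
        neighbours = candidate.store_and_flood(key, leaseset)
        if neighbours is None:
            continue      # try the next most suitable floodfill
        # Control request: every named neighbour must hold an identical copy.
        if all(by_ident[n].store.get(key) == leaseset for n in neighbours):
            confirmed += 1
        if confirmed == copies:
            return True
    return False


key = ident("example.b32.i2p + date")
floodfills = sorted((Floodfill(ident(f"ff-{i}")) for i in range(6)),
                    key=lambda f: distance(key, f.ident))
floodfills[0].online = False   # the best candidate does not answer
for ff in floodfills:
    ff.peers = [p for p in floodfills if p is not ff]
ok = publish(key, b"signed leaseset bytes", floodfills)
holders = [ff for ff in floodfills if key in ff.store]
print(ok, len(holders))
```

In the demo the most suitable floodfill is offline, so the publisher falls through to the next candidates, and the record still ends up on several nodes close to the key.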
Where do floodfills come from?
Floodfill is a role that any router can take on. Naturally, such a router must be easy to reach from outside: it needs an open, working port and a public IP address. All the configuration required is a single parameter in the configuration file: floodfill = true.
Router address
We have sorted out endpoint identifiers: they are entities without physical addresses, hosted on unknown I2P routers. But what about the addresses and identifiers of the routers themselves, especially since floodfills are, in essence, routers too?
A home Internet router has an IP address; server applications on the regular Internet have an IP address and a port.
I2P routers also have an address: a set of public cryptographic keys, almost the same as the base64 address of an endpoint discussed above. For a short notation, the SHA-256 hash is used, which yields 32 bytes; in effect, this is the analogue of a b32 address. To avoid confusion between router addresses and endpoint addresses, router idents are written in base64 (first line in the screenshot).
Traffic between routers is encrypted at the level of the network transport protocols: NTCP2 (an encrypted analogue of TCP) and SSU (an encrypted analogue of UDP). This hides the contents of all I2P traffic from an outside observer, such as a home Internet provider. At the same time, transport cryptography is not tied to the router keys. Router keys are used optionally, depending on the type of traffic passing through. Going deeper into the different layers of encryption can overwhelm an unprepared reader, so let's get back to the topic.
A router address contains no connection details, only public keys. The same is true of hidden service addresses, where reaching a resource requires a LeaseSet with incoming tunnels. Instead of a LeaseSet, a router has a RouterInfo (RI): a file that includes the full address and information about how the router can be physically reached. This file is static, so there is no need to re-request it every ten minutes.
A RouterInfo obtained this way is stored in the local “netDb” directory. The RI contains special flags (router caps) that indicate the router's bandwidth and its floodfill status (or lack of it). From then on, the new router can be used as a transit node when building tunnels.
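To make the idea of caps more tangible, here is a small Python sketch that reads such a flag string. The letter meanings are assumptions based on how the flags are commonly documented (lowercase "f" for floodfill, "R" for reachable, and bandwidth classes ascending from "K" to "X"); check them against the current network database documentation before relying on them.

```python
# Bandwidth class letters in ascending order (assumed, see the note above).
BANDWIDTH_CLASSES = "KLMNOPX"


def parse_caps(caps: str) -> dict:
    """Extract the floodfill flag, reachability and highest bandwidth class."""
    classes = [c for c in caps if c in BANDWIDTH_CLASSES]
    return {
        "floodfill": "f" in caps,
        "reachable": "R" in caps,
        # A router may list several letters; keep the highest one.
        "bandwidth": max(classes, key=BANDWIDTH_CLASSES.index) if classes else None,
    }


print(parse_caps("XfR"))
print(parse_caps("LU"))
```

A router deciding whom to query as a floodfill, or whom to pick as a fast transit node, only needs this short string rather than any live measurement of the peer.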
A LeaseSet always contains only the short identifier of the first router of the incoming tunnel (the gateway). To contact this router, the last node of our outgoing tunnel resolves that identifier through a floodfill it knows and, if successful, starts sending data into the recipient's incoming tunnel.
Likewise, when a tunnel is built, each participant resolves the identifier of the next router in the same way if the required RouterInfo is not in its local storage. But this is again a slight digression from floodfills to tunnels.
Network exploration
The term network exploration refers to each router regularly contacting the floodfills it knows in search of new routers. The process is also called expanding the picture of the network, or probing.
The gist is this: a random value is sent, and in response the floodfill returns the three RouterInfos that are mathematically closest to it. Proximity is computed with the XOR operation. Essentially this is an ordinary floodfill lookup, except that the address is randomly generated, so the floodfill returns the three neighbors we could have contacted next to continue searching for the “address”.
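A toy version of this exchange in Python might look as follows. The function names and the simulated router list are hypothetical, and the sketch leaves out everything a real router does around the request (tunnels, encryption, excluding already known peers).

```python
import hashlib
import os


def distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def floodfill_answer(random_key: bytes, routers_known_to_floodfill: list) -> list:
    """The floodfill's side: return the three identifiers XOR-closest to the key."""
    return sorted(routers_known_to_floodfill, key=lambda r: distance(random_key, r))[:3]


def explore(local_netdb: set, floodfill_routers: list, rounds: int) -> set:
    """The router's side: send random keys and remember whatever comes back."""
    for _ in range(rounds):
        random_key = os.urandom(32)   # not a real address, just a random point
        local_netdb.update(floodfill_answer(random_key, floodfill_routers))
    return local_netdb


# Simulated identifiers of routers that one floodfill knows about.
floodfill_routers = [hashlib.sha256(f"router-{i}".encode()).digest() for i in range(200)]
netdb = explore(set(), floodfill_routers, rounds=20)
print(len(netdb), "routers learned")
```

Since every key is random, each round samples a different corner of the identifier space, and the local database gradually fills up with routers the node had never heard of.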
Exploratory requests go through special tunnels that are undemanding in terms of throughput. As a rule, if transit tunnels show up on a low-bandwidth router, they are exploratory ones.
Exploring the network through tunnels (rather than contacting floodfills directly) is a countermeasure against Eclipse and Sybil attacks, which aim to isolate a user inside a perimeter of malicious floodfills that may deliberately ignore requests from a particular router. Thanks to the chain of transit nodes in the exploratory tunnel, a floodfill does not know which router is actually asking for three new routers to expand its picture of the network. I2P attack models are discussed in detail in a separate article.
This, in general terms, is what a fully independent routing system for the anonymous I2P network, built on floodfills, looks like. And the floodfills are run not by ten trusted organizations, Comrade Major or Angela Merkel, but by thousands upon thousands of volunteers.
A few words for floodfill operators
Floodfill mode increases the resources the router consumes. This must be taken into account on very weak devices such as single-board computers. Or rather, it had to be taken into account before... Thanks to the introduction of modern cryptography in I2P, the issue is becoming so minor that it may soon be dropped altogether.
The point is that all requests to a floodfill are encrypted with its asymmetric encryption key. In the old implementation this is an ElGamal key, which demands a lot of CPU time. The current mode uses elliptic-curve cryptography (encryption type ECIES_X25519_AEAD, code 4). As the network upgrades, the old resource-hungry encryption is becoming a thing of the past, since the new type is used by default.
There are already successful examples of i2pd running in floodfill mode on OpenWRT routers. We will talk about the prospects of turning a coffee maker into an anonymizer in one of the following articles.