BitTorrent with dignity (aka privacy)

Posted on 2023-02-26

This is a Work In Progress.

I'm working on a rough software spec for a new privacy application, and I need help from other, more experienced privacy and software designers. This should be considered a living, dynamic document until a formal spec can be agreed upon. All of these ideas are rough and will most likely change.

The big idea

A torified BitTorrent application, written in Rust, that only interoperates with other instances of itself while strengthening the Tor network. The app should be modular enough to support other transport types, such as mixnets, but will initially be designed around well-known network technology such as Tor.

The goal is not to be interoperable with legacy BitTorrent over the clear web. Those platforms are negligent in protecting users and must be abandoned. The design choices described here aim to solve problems that are easy to anticipate, so please read all of it.

Every node on the network must be:

a BitTorrent client/server
a tracker
a tor middle relay

1. a BitTorrent client/server

Onion services: the Rust crate "arti" (yes, I know it's not production-ready yet; that's fine)
UDP-to-HTTP/2 conversion (Tor only carries TCP streams): the Rust crate "hyper"
Message parsing: the Rust crate "serde"
Piece management and peer selection: the Rust crate "rust-torrent"

To use Arti with Hyper and Rust-Torrent, Hyper would need a custom transport that uses Arti for its underlying network connections. Note that "hyper::client::connect" is a module, not a trait: in hyper 0.14, the way to do this is to write a custom connector, i.e. a "tower::Service<Uri>" whose connections implement AsyncRead, AsyncWrite, and "hyper::client::connect::Connection". That would let Hyper's client use Arti as its transport. Serde could be responsible for serializing and deserializing the data exchanged between Rust-Torrent and Hyper. Serde itself is format-agnostic; paired with a format crate (JSON, or a compact binary format), it would ensure that data is structured correctly and can be understood by other nodes in the network.

In this design, serde serializes and deserializes the data passed between Rust-Torrent and Hyper so that the different layers of the application stack can communicate. Rust-Torrent handles the BitTorrent protocol, which defines a set of messages exchanged between peers in a swarm. These messages say which pieces of a shared file a peer has, which pieces it still needs, and so on. When these messages pass between Rust-Torrent and Hyper, they must be serialized from Rust-Torrent's internal representation into a format that can be sent over the network, then deserialized back into that internal representation on the receiving end. Serde provides a convenient and efficient way to do this, and so acts as the bridge between Rust-Torrent and Hyper.
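To make that serialization step concrete: classic BitTorrent peer messages (BEP 3) are framed as a 4-byte big-endian length, a 1-byte message ID, and a payload. The std-only sketch below hand-encodes a few of them, which is the job serde plus a format crate would be asked to automate. "PeerMessage", "encode", and "decode" are illustrative names, not part of Rust-Torrent's or serde's API, and only four message types are covered.

```rust
/// A few BitTorrent peer wire messages (BEP 3); the real protocol has more.
#[derive(Debug, PartialEq)]
pub enum PeerMessage {
    KeepAlive,
    Interested,
    Have { piece: u32 },
    Request { index: u32, begin: u32, length: u32 },
}

/// Frame a message as <4-byte big-endian length><1-byte id><payload>.
/// Keep-alive is the special case of length 0 with no id.
pub fn encode(msg: &PeerMessage) -> Vec<u8> {
    let (id, payload): (Option<u8>, Vec<u8>) = match msg {
        PeerMessage::KeepAlive => (None, Vec::new()),
        PeerMessage::Interested => (Some(2), Vec::new()),
        PeerMessage::Have { piece } => (Some(4), piece.to_be_bytes().to_vec()),
        PeerMessage::Request { index, begin, length } => {
            let mut p = Vec::with_capacity(12);
            for field in [index, begin, length].iter() {
                p.extend_from_slice(&field.to_be_bytes());
            }
            (Some(6), p)
        }
    };
    let body_len = id.map_or(0, |_| 1) + payload.len();
    let mut out = Vec::with_capacity(4 + body_len);
    out.extend_from_slice(&(body_len as u32).to_be_bytes());
    out.extend(id); // an Option<u8> yields zero or one byte
    out.extend_from_slice(&payload);
    out
}

/// Read a big-endian u32 at `at`, or None if the slice is too short.
fn read_u32(bytes: &[u8], at: usize) -> Option<u32> {
    let b = bytes.get(at..at + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse one complete frame; None if truncated, unknown, or malformed.
pub fn decode(bytes: &[u8]) -> Option<PeerMessage> {
    let len = read_u32(bytes, 0)? as usize;
    let body = bytes.get(4..4 + len)?;
    if len == 0 {
        return Some(PeerMessage::KeepAlive);
    }
    match (body[0], len) {
        (2, 1) => Some(PeerMessage::Interested),
        (4, 5) => Some(PeerMessage::Have { piece: read_u32(body, 1)? }),
        (6, 13) => Some(PeerMessage::Request {
            index: read_u32(body, 1)?,
            begin: read_u32(body, 5)?,
            length: read_u32(body, 9)?,
        }),
        _ => None,
    }
}
```

A round trip such as decode(&encode(&msg)) should return the original message; anything truncated or unrecognized decodes to None rather than panicking.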

For Rust-Torrent, the networking code would also need to be modified to use Arti when connecting to other peers in the BitTorrent network. This could mean writing a custom networking layer that uses Arti to establish connections with other peers and handle incoming data. Overall, using Arti with Hyper and Rust-Torrent would require significant modifications to both libraries, but it should be possible to make it work.
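One way to keep the networking layer swappable (Tor now, possibly mixnets later) is to hide it behind a small trait that the peer logic depends on. This is a synchronous, std-only sketch under that assumption: "Transport", "PeerStream", "LoopbackTransport", and "exchange" are hypothetical names, and the only implementation shown is a localhost-only TCP test double. An Arti-backed implementation would wrap Arti's async client instead and is not shown here.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpStream};

/// Any byte stream the BitTorrent layer can read from and write to.
pub trait PeerStream: Read + Write + Send {}
impl<T: Read + Write + Send> PeerStream for T {}

/// Hypothetical pluggable transport: Tor via Arti, a mixnet, or a test double.
pub trait Transport {
    fn name(&self) -> &'static str;
    fn connect(&self, peer: &str) -> io::Result<Box<dyn PeerStream>>;
}

/// Test-only transport: plain TCP, restricted to localhost so it can never
/// open a connection onto the clear web.
pub struct LoopbackTransport;

impl Transport for LoopbackTransport {
    fn name(&self) -> &'static str {
        "loopback-test"
    }

    fn connect(&self, peer: &str) -> io::Result<Box<dyn PeerStream>> {
        let addr = peer
            .parse::<SocketAddr>()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidInput, e))?;
        // Refuse before any packet is sent if the target is not localhost.
        if !addr.ip().is_loopback() {
            return Err(io::Error::new(io::ErrorKind::PermissionDenied, "loopback only"));
        }
        Ok(Box::new(TcpStream::connect(addr)?))
    }
}

/// Peer logic sees only `dyn Transport`, so swapping Tor for a mixnet would
/// not touch it. Sends `msg` and reads back a reply of the same length.
pub fn exchange(transport: &dyn Transport, peer: &str, msg: &[u8]) -> io::Result<Vec<u8>> {
    let mut stream = transport.connect(peer)?;
    stream.write_all(msg)?;
    let mut reply = vec![0u8; msg.len()];
    stream.read_exact(&mut reply)?;
    Ok(reply)
}
```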

Rust-Torrent's peer selection algorithm decides which peers to connect to based on factors such as piece availability, download speed, and number of active connections. It matters a great deal for the client's overall performance and efficiency, since choosing good peers can significantly improve download speed and reliability.
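As an illustration of the kind of logic involved (not Rust-Torrent's actual algorithm), the sketch below keeps only peers that have at least one piece still needed, prefers faster peers, breaks ties by how many needed pieces a peer holds, and caps the result at a connection limit. "PeerInfo" and "select_peers" are hypothetical, and the ordering rule is an assumption.

```rust
use std::collections::HashSet;

/// What we know about one candidate peer (hypothetical fields).
#[derive(Debug, Clone)]
pub struct PeerInfo {
    pub onion: String,
    pub pieces: HashSet<u32>,
    pub rate_bytes_per_sec: u64,
}

/// Keep peers that have at least one needed piece, order them by observed
/// rate (fastest first), then by how many needed pieces they hold, and
/// return at most `max_connections` of them.
pub fn select_peers<'a>(
    peers: &'a [PeerInfo],
    needed: &HashSet<u32>,
    max_connections: usize,
) -> Vec<&'a PeerInfo> {
    let mut useful: Vec<(&'a PeerInfo, usize)> = peers
        .iter()
        .map(|p| (p, p.pieces.intersection(needed).count()))
        .filter(|&(_, n)| n > 0)
        .collect();
    useful.sort_by(|a, b| {
        b.0.rate_bytes_per_sec
            .cmp(&a.0.rate_bytes_per_sec)
            .then(b.1.cmp(&a.1))
    });
    useful.into_iter().take(max_connections).map(|(p, _)| p).collect()
}
```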

Multiplexing onions

Data passing through an onion service connection traverses six relays (a standard Tor onion service circuit), typically run by different operators on different ISPs around the world. Because of the resulting performance limits, multiplexing download/upload streams seems prudent. This has the added benefit of spreading data across more network paths around the world, making the traffic analysis needed to deanonymize users significantly harder. Onion services can be created dynamically and automatically depending on factors such as network performance and file size. By default, the number of streams should be two. Generating multiple Tor onion services and multiplexing network streams across them would require custom code, which would need to interface with several parts of the application:

Arti: The code would need to interface with Arti, the Rust implementation of Tor, to generate multiple Tor onion services (per file? per GB?). This would involve using the Arti API to create and manage Tor circuits and onion services.
Hyper: The code would also need to interface with Hyper, the HTTP library for Rust, to multiplex network streams. This would involve using the Hyper API to manage HTTP/2 streams, which allow multiple concurrent requests and responses over a single connection; the design should also be able to scale this out across multiple onion services.
Rust-torrent: The code would also need to interface with Rust-torrent, the Rust implementation of the BitTorrent protocol. This would involve using the Rust-torrent API to manage peer connections and piece selection, as well as to handle the actual data transfer over the network.
Serde: Finally, the code would need to use Serde, the Rust library for serializing and deserializing data, to encode and decode messages between peers. This would be necessary to communicate information about available pieces and to negotiate the transfer of data between peers.

Overall, the custom code would need to coordinate these components so that multiple concurrent transfers take place over multiple Tor onion services, with data efficiently multiplexed to maximize download and upload performance.
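The spec leaves the scaling rule open ("per file? per GB?"). Assuming a per-GB rule, the sketch below picks how many onion services to use for a transfer and stripes piece indices across them round-robin, so no single circuit carries the whole file. "stream_count" and "assign_pieces" are hypothetical names, and the 1 GiB step and the cap used in the usage note are placeholders, not recommendations.

```rust
/// Default from the spec: two streams (onion services) per transfer.
const DEFAULT_STREAMS: usize = 2;

/// Hypothetical scaling rule: one extra onion service per full
/// `bytes_per_extra` of file size, never more than `max_streams`.
pub fn stream_count(file_size: u64, bytes_per_extra: u64, max_streams: usize) -> usize {
    let extra = if bytes_per_extra == 0 {
        0
    } else {
        (file_size / bytes_per_extra) as usize
    };
    DEFAULT_STREAMS.saturating_add(extra).min(max_streams).max(1)
}

/// Stripe piece indices across `streams` onion services round-robin.
/// Lane i carries pieces i, i + streams, i + 2 * streams, ...
pub fn assign_pieces(piece_count: u32, streams: usize) -> Vec<Vec<u32>> {
    let streams = streams.max(1);
    let mut lanes = vec![Vec::new(); streams];
    for piece in 0..piece_count {
        lanes[piece as usize % streams].push(piece);
    }
    lanes
}
```

For example, with a 1 GiB step and a cap of 8, a 3 GiB file would use five onion services, and assign_pieces(5, 2) splits pieces into [0, 2, 4] and [1, 3].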

2. a tracker

To make the tracker functionality of BitTorrent distributed, an existing tracker application (written in Rust) would need to be forked and modified to run as a distributed system. In a fully distributed model, the tracker must run on every node in the network automatically. This approach has several advantages over traditional centralized trackers, including increased resilience, scalability, privacy, and plausible deniability. Making every node a tracker also makes it easy to self-host files without depending on someone else's tracker. Because Tor onion services only make outbound connections (to introduction and rendezvous points), this works trivially from any network, even behind NAT.

A modified version of Torrust Tracker could be used. To make it distributed, the following modifications might be needed:

Peer-to-peer communication: Torrust Tracker would need to be modified to support peer-to-peer communication between all nodes in the network. This would involve implementing a distributed messaging protocol that allows nodes to communicate with each other directly, without the need for a central server.
Distributed data storage: The tracker would need to be modified to store its data in a distributed manner, such as in a distributed hash table (DHT). A DHT is a decentralized system for storing and retrieving key-value pairs that can be used to store information about the torrents being shared on the network.
Load balancing: In a distributed system, load balancing becomes important to ensure that no single node becomes overloaded with requests. To achieve this, Torrust Tracker would need to be modified to distribute incoming requests across all the nodes in the network, using a load balancing algorithm.
Fault tolerance: To ensure that the tracker remains available in the event of node failures, it would need to be modified to handle node failures gracefully, by redistributing the workload across the remaining nodes in the network.
Security: To ensure the security and privacy of the users on the network, the tracker would need to be modified to run over the Tor network using Arti, providing end-to-end encryption and anonymization.
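The spec doesn't fix a DHT design. One common choice is Kademlia's XOR distance metric, which BitTorrent's Mainline DHT (BEP 5) uses: the nodes whose 160-bit IDs are closest to a torrent's infohash store its peer list. A minimal sketch of that lookup rule, assuming node IDs are plain 20-byte arrays (how they would be derived from onion addresses is left open); "NodeId", "xor_distance", and "k_closest" are hypothetical names.

```rust
/// 160-bit node or key identifier, as in BitTorrent's Mainline DHT.
pub type NodeId = [u8; 20];

/// Kademlia XOR distance. Byte arrays compare lexicographically, which is
/// the same as comparing the distances as big-endian 160-bit integers.
pub fn xor_distance(a: &NodeId, b: &NodeId) -> NodeId {
    let mut d = [0u8; 20];
    for (out, (x, y)) in d.iter_mut().zip(a.iter().zip(b.iter())) {
        *out = x ^ y;
    }
    d
}

/// The `k` known nodes closest to `key` (e.g. a torrent's infohash): under
/// Kademlia, these are the nodes responsible for storing its peer list.
pub fn k_closest(nodes: &[NodeId], key: &NodeId, k: usize) -> Vec<NodeId> {
    let mut sorted = nodes.to_vec();
    sorted.sort_by_key(|n| xor_distance(n, key));
    sorted.truncate(k);
    sorted
}
```

Because every node can compute the same distances, any node can work out which peers should hold a given torrent's data without asking a central server, which also spreads load across the network.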

Every node on the network is a tracker. In addition, every tracker can choose to become a mirror of any other tracker (this copies only the metadata, not the file data). Becoming a tracker mirror should be as simple as copying and pasting the tracker's Tor onion address, which is the only identifier of a node. Every node operator can therefore run multiple instances and trivially copy tracker data between them. This way, when an operator needs to restart hardware or software, they can leave one instance online so that the related tracker data stays accessible to the rest of the network. If the original tracker never comes back online, that is not a problem. Copying tracker data is a one-time event (a full backup), and an operator can choose to keep the tracker data up to date automatically or manually. Each copy of a tracker becomes its own new onion service. While mirroring tracker data is a one-time event, keeping track of the peers that hold copies of the related file data is not: address data must be shared synchronously, in near real time, between all peers that share the tracker data and file data.
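Since an onion address is a node's only identifier, a mirror-by-paste flow should at least reject malformed addresses before trying to connect. A v3 onion address is 56 base32 characters plus ".onion", encoding 35 bytes: a 32-byte public key, a 2-byte checksum, and a version byte equal to 3. This sketch checks the structure and version byte only; verifying the SHA3-256-based checksum needs a SHA3 implementation from outside the standard library and is deliberately left out. "is_plausible_v3_onion" and "base32_decode" are hypothetical names.

```rust
/// Decode unpadded RFC 4648 base32 using the lowercase alphabet (a-z, 2-7).
/// Returns None on any character outside that alphabet.
fn base32_decode(s: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(s.len() * 5 / 8);
    let mut buffer: u32 = 0;
    let mut bits: u32 = 0;
    for c in s.bytes() {
        let value: u8 = match c {
            b'a'..=b'z' => c - b'a',
            b'2'..=b'7' => c - b'2' + 26,
            _ => return None,
        };
        buffer = (buffer << 5) | u32::from(value);
        bits += 5;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
            buffer &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Structural check of a Tor v3 onion address: 56 base32 characters plus
/// ".onion", decoding to 35 bytes whose last (version) byte is 3.
/// The 2-byte checksum is NOT verified here; real code must verify it.
pub fn is_plausible_v3_onion(addr: &str) -> bool {
    let lower = addr.trim().to_ascii_lowercase();
    let label = match lower.strip_suffix(".onion") {
        Some(label) => label,
        None => return false,
    };
    if label.len() != 56 {
        return false;
    }
    match base32_decode(label) {
        Some(bytes) => bytes.len() == 35 && bytes[34] == 3,
        None => false,
    }
}
```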

3. a tor middle relay

Classically, on BitTorrent private trackers, the share ratio determines how much someone can download. In this torified version of a BitTorrent application, the share ratio should instead be determined in advance by how much Tor middle relay traffic a user has provided to the network. The Tor network has limited bandwidth and resources, and using it for high-volume file sharing could degrade its performance. Using middle relay traffic as the measure of contribution would incentivize users to provide resources to the network instead of overburdening it. Determining a fair and effective share ratio based on middle relay traffic could be challenging and would require careful consideration and testing. Remember that Tor onion service circuits never use exit relays. So substantially increasing the size of the network with thousands of new middle relays of this type would neither affect nor contribute to exit relaying.
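The spec leaves the ratio formula open. Assuming the simplest version, a linear allowance such as one byte of download per two bytes relayed, the accounting could look like this sketch. "Contribution", "allowance", and "may_download" are hypothetical, and the sketch takes the relayed-byte count as given: measuring and verifying that number without trusting the node that reports it is the hard, unsolved part.

```rust
/// Hypothetical per-node accounting record.
pub struct Contribution {
    /// Bytes relayed as a Tor middle relay (taken as given here).
    pub relayed_bytes: u64,
    /// Bytes already downloaded through the app.
    pub downloaded_bytes: u64,
}

/// Total download allowance = relayed * ratio_num / ratio_den.
/// Computed in u128 so the multiplication cannot overflow.
pub fn allowance(c: &Contribution, ratio_num: u64, ratio_den: u64) -> u64 {
    if ratio_den == 0 {
        return 0;
    }
    let scaled = u128::from(c.relayed_bytes) * u128::from(ratio_num) / u128::from(ratio_den);
    scaled.min(u128::from(u64::MAX)) as u64
}

/// Would downloading `request_bytes` more stay within the allowance?
pub fn may_download(c: &Contribution, request_bytes: u64, ratio_num: u64, ratio_den: u64) -> bool {
    c.downloaded_bytes.saturating_add(request_bytes) <= allowance(c, ratio_num, ratio_den)
}
```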

How does it all get started?

UI

The app depends on a web browser as an app interface.

First use

Once launched, the app (node):

Lets the user view and manage configuration, search, file management, and share management, either locally (http://127.0.0.1:port) or via an onion URI (http://v3onion.onion:port/management/token).
Becomes a Tor middle relay
Displays basic statistics and log output from various Tor services.
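Because the local interface exposes configuration and share management, it should listen on the loopback interface only, never on all interfaces. This std-only sketch shows just that binding choice, with a one-request plain-text status page; the real UI would more likely sit on hyper, and the onion-URI management path with its token is not covered. "bind_management_ui" and "serve_one" are hypothetical names.

```rust
use std::io::{self, Read, Write};
use std::net::{Ipv4Addr, SocketAddrV4, TcpListener};

/// Bind the local management UI to 127.0.0.1 only, so it is never reachable
/// from the LAN. Port 0 lets the OS pick a free port.
pub fn bind_management_ui(port: u16) -> io::Result<TcpListener> {
    TcpListener::bind(SocketAddrV4::new(Ipv4Addr::LOCALHOST, port))
}

/// Accept one connection and answer it with a plain-text status page.
/// The request itself is read but not parsed in this sketch.
pub fn serve_one(listener: &TcpListener, status: &str) -> io::Result<()> {
    let (mut stream, _) = listener.accept()?;
    let mut request = [0u8; 1024];
    let _ = stream.read(&mut request)?;
    let response = format!(
        "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        status.len(),
        status
    );
    stream.write_all(response.as_bytes())
}
```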

A node operator must know at least one other existing, online node (from friends, a trusted clear-web website, Reddit, etc.). Every node that a user adds manually is considered a trusted node. Once a trusted node is added, the user's local app will use Tor onion services to do two things:

Check whether the trusted node runs a newer software version. If it does, the local app will automatically track (i.e. redistribute via its tracker), download, and seed the newer software. It will not install it automatically: the user must initiate the update, and the app will let the user manually validate checksums against the app maintainers' website.
Tracker data from the trusted node becomes searchable upon connecting, but the local node does not become a mirror of that tracker data.
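The update check reduces to comparing version numbers and, at most, fetching and seeding the newer release, never installing it. This sketch assumes plain "major.minor.patch" version strings (the spec doesn't define a versioning scheme) and uses the hypothetical names "parse_version", "UpdateAction", and "check_update". Comparing parsed tuples rather than raw strings matters: as text, "1.10.0" sorts before "1.2.0".

```rust
/// Parse "major.minor.patch" (an optional leading 'v' is allowed).
/// Pre-release and build suffixes are out of scope for this sketch.
pub fn parse_version(v: &str) -> Option<(u64, u64, u64)> {
    let mut parts = v.trim().trim_start_matches('v').split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

#[derive(Debug, PartialEq)]
pub enum UpdateAction {
    /// The trusted node has nothing newer.
    UpToDate,
    /// Track, download, and seed the newer release. Installing stays a manual
    /// step after the user has checked the published checksums.
    FetchAndSeed { version: (u64, u64, u64) },
}

/// Compare the local version with the one a trusted node advertises.
/// Returns None if either version string is malformed.
pub fn check_update(local: &str, advertised: &str) -> Option<UpdateAction> {
    let local = parse_version(local)?;
    let remote = parse_version(advertised)?;
    Some(if remote > local {
        UpdateAction::FetchAndSeed { version: remote }
    } else {
        UpdateAction::UpToDate
    })
}
```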

Web of trust

Connecting to a trusted node (one degree of separation) can provide further access to that node's own trusted nodes (two degrees of separation). However, by default trust extends only one degree of separation, to minimize performance issues on the local app. After adding a first trusted node (1deg), the user can opt in to extending trust up to N degrees of separation. In other words: if a trusted node (1deg) trusts two nodes (2deg), the user will in effect trust three nodes in total. If the user allows up to three degrees of separation and each of the two 2deg nodes trusts two more nodes (3deg), the user will in effect trust seven nodes (1 + 2 + 4). If any node along the way trusts 10,000 nodes, this could quickly overwhelm the user's local app, which is why users need to be careful about extending trust given their hardware, software, and network limitations. Limiting searchable and shareable access through delegated trust also helps keep the network somewhat flat (preventing extreme bloat) while still letting users easily access and share data.
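The degree arithmetic in the paragraph is a breadth-first walk of the trust graph that stops at the chosen depth. A sketch, assuming the trust graph is available locally as a map from each node to the nodes it trusts (in practice it would be fetched over onion services); "trusted_within" and the short string node names are hypothetical. Counting distinct nodes also means a node reachable by two paths is only trusted once.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk of the trust graph. Nodes the user added manually are
/// degree 1; their trusted nodes are degree 2, and so on up to `max_degree`.
/// Returns the distinct set of nodes the user ends up trusting.
pub fn trusted_within<'a>(
    direct: &[&'a str],
    trusts: &HashMap<&'a str, Vec<&'a str>>,
    max_degree: u32,
) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    if max_degree == 0 {
        return seen;
    }
    for &node in direct {
        if seen.insert(node) {
            queue.push_back((node, 1));
        }
    }
    while let Some((node, degree)) = queue.pop_front() {
        if degree >= max_degree {
            continue; // do not expand past the user's chosen depth
        }
        if let Some(next) = trusts.get(node) {
            for &n in next {
                if seen.insert(n) {
                    queue.push_back((n, degree + 1));
                }
            }
        }
    }
    seen
}
```

With one directly trusted node A, where A trusts B and C, and B and C each trust two further nodes, depths 1, 2, and 3 yield 1, 3, and 7 trusted nodes, matching the 1 + 2 + 4 example.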

Leeching

A user who wants to download file data must first mirror the tracker data for that file, which further spreads tracker data across the network. Once the tracker data is 100% mirrored, the source node adds the user's onion service to its tracker table, and all nodes that trust and mirror the source node become aware of this new node and the files it offers. The new node is not, however, trusted by any node.
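The ordering in this step (mirror first, then get listed) can be enforced on the source node with a simple gate. A sketch under the assumption that mirror progress is tracked as bytes received versus the total size of the tracker data; "MirrorProgress" and "register_leecher" are hypothetical names. The tracker table here is kept separate from any trust list, so being listed does not make the new node trusted.

```rust
use std::collections::BTreeSet;

/// How much of a file's tracker data the leecher has mirrored so far.
pub struct MirrorProgress {
    pub received_bytes: u64,
    pub total_bytes: u64,
}

impl MirrorProgress {
    pub fn is_complete(&self) -> bool {
        self.total_bytes > 0 && self.received_bytes >= self.total_bytes
    }
}

/// Source-node side of the leeching step: the leecher's onion service is only
/// added to the tracker table once its mirror is 100% complete. Returns
/// Ok(true) if newly added, Ok(false) if it was already listed.
pub fn register_leecher(
    tracker_table: &mut BTreeSet<String>,
    leecher_onion: &str,
    progress: &MirrorProgress,
) -> Result<bool, &'static str> {
    if !progress.is_complete() {
        return Err("tracker data not fully mirrored yet");
    }
    // Listing a peer is not the same as trusting it; no trust is granted here.
    Ok(tracker_table.insert(leecher_onion.to_string()))
}
```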

Seeding

Something

UX

From the UI:

Users (node operators) can point the app to any local file or folder that they wish to share.
The application will automatically generate a tracker file for the user, which is self-hosted.
The user will be required to enter information about the file(s) they are about to share. This is where the app should provide additional user education about not deanonymizing oneself, if applicable.
After confirming the files to be shared, the app will automatically make the tracker data and file data ready to be shared (as a private tracker by default).

Private trackers

Being able to keep shared data limited (not publicly shared) is an important feature. By default, data that is ready to be shared will only be privately available. Meaning:

the tracker for this data will effectively be a private tracker and a random onion URI (http://v3onion.onion:port/private/token) will be generated exclusively for this tracker and file.
in order to share access to this file in its default state, a user must share the onion URI out-of-band from the application.
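Generating the per-share private URI mostly comes down to an unguessable token drawn from a cryptographically secure random source; a general-purpose, non-cryptographic RNG would not be suitable here. This sketch reads 32 bytes from /dev/urandom, which assumes a Unix-like OS (a cross-platform build would more likely use a crate such as "getrandom"). "random_token" and "private_share_uri" are hypothetical names, and the onion address and port would come from the node's own onion service.

```rust
use std::fs::File;
use std::io::{self, Read};

/// 32 bytes from the OS CSPRNG, hex-encoded into 64 characters.
/// Assumes a Unix-like OS that provides /dev/urandom.
pub fn random_token() -> io::Result<String> {
    let mut bytes = [0u8; 32];
    File::open("/dev/urandom")?.read_exact(&mut bytes)?;
    Ok(bytes.iter().map(|b| format!("{:02x}", b)).collect())
}

/// Build a fresh private URI of the form http://<onion>:<port>/private/<token>.
pub fn private_share_uri(onion: &str, port: u16) -> io::Result<String> {
    Ok(format!("http://{}:{}/private/{}", onion, port, random_token()?))
}
```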

Public trackers

Being able to trivially share data with the whole world is also an important feature. Once data has been made available for private sharing, a user can opt-in to making it publicly available.

With a single click, a user can convert something from a dedicated private tracker into a public share via a new onion URI (http://v3onion.onion:port/public/token).
Sharing this onion URI with anyone, or with the general public, will allow any app user to access this user's public trackers and any publicly shared data.
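The one-click conversion can be modelled as a state change on the share. The spec doesn't say whether the original private URI keeps working after publishing; this sketch assumes it does and attaches the new public URI alongside it. "Visibility", "Share", "new_private", and "publish" are hypothetical names.

```rust
/// Sharing state of one share (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
pub enum Visibility {
    /// Default: reachable only by people given this URI out-of-band.
    Private { uri: String },
    /// Opt-in public share. Assumption: the old private URI keeps working.
    Public { private_uri: String, public_uri: String },
}

#[derive(Debug)]
pub struct Share {
    pub visibility: Visibility,
}

impl Share {
    /// New shares start private, as the spec requires.
    pub fn new_private(uri: String) -> Self {
        Share { visibility: Visibility::Private { uri } }
    }

    /// One-click publish: attach a new, separate public URI.
    /// Publishing an already-public share changes nothing.
    pub fn publish(&mut self, public_uri: String) {
        let private_uri = match &self.visibility {
            Visibility::Private { uri } => uri.clone(),
            Visibility::Public { .. } => return,
        };
        self.visibility = Visibility::Public { private_uri, public_uri };
    }
}
```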
