Nine Years of UDT

This August marks the 9th year of the development of UDT. The project originated from SABUL (Simple Available Bandwidth Utilization Library), which I worked on together with Xinwei Hong (now at Microsoft) between 2001 and 2003. Around the year 2000, researchers had noticed that stock TCP (NewReno) was not efficient on the increasingly widespread OC-12 and 1GE networks connecting research labs around the world. SABUL was one of the first research projects to address the problem.

SABUL used UDP to transfer large data blocks and TCP to transfer control packets (e.g., for reliability). It ran very well for our own applications on private networks, but three areas required significant further work: the congestion control algorithm was not suitable for shared networks, the use of TCP for the control channel limited the protocol’s design choices, and the API was not friendly to generic application development.

In the second half of 2002, I started to design a new protocol to remove these limitations. The protocol was later named UDT by Bob because it is built entirely on top of UDP and uses a single UDP socket for both data and control packets. The first version of UDT was released in early 2003. Compared to SABUL, UDT provides a streaming-style API that simulates TCP socket semantics, which was an important step toward gaining a large user community.

I spent about a year investigating the congestion control algorithm. Having considered many approaches, I chose to modify the traditional loss-based AIMD algorithm, which had worked stably in TCP. Delay-based approaches have several attractions, especially that they are less affected by non-congestion-related packet loss, but they face a fundamental problem: learning the “base” delay value. An inaccurate base value makes a delay-based algorithm either too aggressive (if the base is overestimated) or too “friendly” (co-existing flows may simply starve it).
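The core of loss-based AIMD can be sketched in a few lines. This is an illustrative Python sketch of the general additive-increase/multiplicative-decrease pattern, not UDT’s actual code; the constants (increase step, decrease factor) are made up for the example:

```python
class AIMDWindow:
    """Minimal loss-based AIMD: grow the window additively while the
    path is loss-free, cut it multiplicatively when loss is detected.
    Constants are illustrative, not UDT's actual parameters."""

    def __init__(self, cwnd=1.0, alpha=1.0, beta=0.5):
        self.cwnd = cwnd      # congestion window, in packets
        self.alpha = alpha    # additive increase per RTT
        self.beta = beta      # multiplicative decrease factor on loss

    def on_rtt_without_loss(self):
        # No loss this RTT: probe for more bandwidth additively.
        self.cwnd += self.alpha

    def on_loss(self):
        # Packet loss signals congestion: back off multiplicatively,
        # but never below one packet in flight.
        self.cwnd = max(1.0, self.cwnd * self.beta)
```

The multiplicative decrease is what makes co-existing AIMD flows converge toward a fair share, which is exactly the stability property that made it a safe base to modify.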

UDT uses packet pairs to estimate the available bandwidth, so it can rapidly probe high-bandwidth links while still sharing them fairly, and in a friendly manner, with other flows.
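The packet-pair idea is simple: the sender occasionally emits probe packets back to back, and the gap between their arrivals at the receiver reflects the bottleneck link’s transmission time for one packet. A rough sketch of the receiver-side estimate (illustrative; UDT’s actual filtering of samples differs):

```python
import statistics

def packet_pair_bandwidth(arrival_times, packet_size):
    """Estimate bottleneck bandwidth (bytes/sec) from arrival times of
    back-to-back probe packets. Each inter-arrival gap approximates the
    time the bottleneck needs to transmit one packet; taking the median
    filters out noisy samples (an illustrative filter, not UDT's exact one)."""
    gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:])]
    return packet_size / statistics.median(gaps)
```

For example, 1500-byte probes arriving 1 ms apart suggest a bottleneck of about 1.5 MB/s. Feeding this estimate back to the sender lets the increase step scale with the remaining headroom, which is how a flow can ramp up quickly on a 10GE path without doing so on DSL.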

In addition to its native congestion control algorithm, I have also implemented most of the major congestion control algorithms available by 2005. To this end, I made UDT a composable framework in which a new control algorithm can be implemented simply by overriding several callback functions. Due to feedback delay and the unknown behavior of coexisting flows, there is no “perfect” control algorithm: each algorithm may work well in some situations but behave poorly in others. UDT can be quickly customized to suit a specific environment.
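The callback pattern looks roughly like this. UDT’s real framework is a C++ base class; the following is an illustrative Python analogue with made-up names, showing how an algorithm author overrides only the event hooks it cares about while the protocol engine handles everything else:

```python
class CongestionControlBase:
    """Sketch of a pluggable congestion control framework: the protocol
    engine invokes these hooks on network events, and an algorithm sets
    self.cwnd (window size) and/or self.pkt_send_period (packet pacing)
    in response. Names and defaults are illustrative, not UDT's API."""

    def __init__(self):
        self.cwnd = 2.0           # congestion window, in packets
        self.pkt_send_period = 0  # inter-packet gap; 0 means window-limited

    def on_ack(self, ack_seq):
        pass

    def on_loss(self, loss_list):
        pass

    def on_timeout(self):
        pass


class SimpleAIMD(CongestionControlBase):
    """A new algorithm only overrides the callbacks it needs."""

    def on_ack(self, ack_seq):
        self.cwnd += 1.0 / self.cwnd  # roughly +1 packet per RTT

    def on_loss(self, loss_list):
        self.cwnd = max(2.0, self.cwnd / 2)
```

Because the engine owns reliability, buffering, and timing, a few dozen lines like `SimpleAIMD` are enough to drop a different control law into the same transport, which is what made it practical to re-implement so many published algorithms for comparison.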

By 2005, UDT3 had become production ready and had gained a large user community. As the community grew, people started to use UDT on commodity networks with relatively small bandwidth (e.g., cable, DSL). An important feature of UDP accelerated this change: it is much easier to punch through a firewall with UDP than with TCP, and UDT is built entirely on UDP.

This is a completely different usage scenario from the original design goal. While UDT can easily scale to inter-continental multi-10GE networks, it did not scale well to high concurrency. This motivated the birth of UDT4 in 2007.

UDT4 introduces UDP multiplexing and buffer sharing to allow an application to open a very large number of UDT connections. UDP multiplexing makes firewall management and NAT punching much easier, because a single UDP socket can carry multiple UDT connections. Buffer sharing significantly reduces memory usage. Today UDT can efficiently support 100,000 concurrent connections on a commodity computer, and scalability will improve further when the epoll API and session multiplexing over UDT connections are completed.
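The multiplexing idea can be sketched as follows: one UDP socket receives all datagrams, and a demultiplexer routes each one to the right logical connection using an identifier in the packet header. This is an illustrative Python sketch with a made-up 4-byte ID header, not UDT’s actual wire format or code:

```python
import socket

class UDPMultiplexer:
    """One UDP socket shared by many logical connections. Incoming
    datagrams are routed by a connection ID carried in the first
    4 bytes of each packet (an illustrative header, not UDT's format)."""

    def __init__(self):
        # A single shared UDP socket; in a real system it would be
        # bound to one port, which is what simplifies firewall/NAT setup.
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.connections = {}  # connection ID -> received payloads

    def register(self, conn_id):
        self.connections[conn_id] = []

    def dispatch(self, datagram):
        # Demultiplex: first 4 bytes select the connection, the rest
        # is payload. Datagrams for unknown IDs are dropped.
        conn_id = int.from_bytes(datagram[:4], "big")
        if conn_id in self.connections:
            self.connections[conn_id].append(datagram[4:])
```

Since every connection shares the one socket (and, in UDT4, shared buffers), per-connection cost drops from a socket plus private buffers to little more than a table entry, which is what makes six-figure connection counts feasible.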

Over these years, I have received a great deal of useful feedback and learned a lot from the users. In particular, many features were motivated by users’ requirements. Several users have even developed their own UDT implementations, while others have created and shared wrappers for non-C++ programming languages.

UDT also benefits greatly from the open source approach, which helps it reach a wider community. Users can review the code, debug it, and submit bug fixes whenever necessary, which greatly improves code quality.

As a user-space protocol, UDT can incorporate new technologies from computer networking and adapt to new network environments and usage scenarios. I am confident that UDT will continue to evolve and serve the data transfer needs of more and more applications.

A High Performance Data Distribution and Sharing Solution with Sector

Over the years, many users have used UDT to power their high-speed data transfer applications and tools. Today, with Sector, we can provide an advanced data distribution and sharing application in addition to the UDT library. This solution currently works only on Linux, but we will port it to Windows in the near future, client side first, then server side.

Here are the simple steps you can follow to set up a free, open source, advanced, high performance, and simple-to-use data distribution and sharing platform:

1. Download Sector from here, then compile and configure the software following the manual.

2. Set up a security server, which lets you control data access permissions, including user accounts, passwords, IP access control lists, etc. You can also set up an anonymous account for your public data.

3. Set up one or more Sector master servers, which can run on the same computer that hosts the security server and the data (slave server).

4. Set up Sector slave servers on the computers that host your data. Unlike FTP and most commercial applications, which support only a single server, Sector allows you to install servers on multiple computers, even thousands of them, while still providing a uniform namespace for the complete system.

5. Install the client software on your users’ computers and mount the Sector file system as a local directory using the Sector-FUSE module. Your users can then browse and access data in Sector just as they would data in a local directory, using file system commands such as “ls” and “cp”. They will not notice Sector at all, even though this “local directory” may actually run on 1,000 servers across multiple continents!

All data transfer between the clients and the slave servers runs on top of UDT. Therefore, high data transfer throughput can be sustained even over wide area networks.

If you have any questions, please post them on the Sector project forum.