August 18, 2010
This August marks the 9th year of UDT development. The project originated from SABUL (Simple Available Bandwidth Utilization Library), which I worked on together with Xinwei Hong (now at Microsoft) between 2001 and 2003. Around 2000, researchers had noticed that stock TCP (NewReno) was not efficient on the increasingly widespread OC-12 and 1GE networks connecting research labs around the world. SABUL was one of the first research projects to address this problem.
SABUL used UDP to transfer large data blocks and TCP to transfer control packets (e.g., for reliability). It ran very well for our own applications on private networks, but there were three areas that would require (significant) future work. The congestion control algorithm was not suitable for shared networks. The use of TCP for its control channel limited the protocol’s design choices. And the API was not friendly to generic application development.
In the second half of 2002, I started to design a new protocol to remove these limitations. This protocol was later named UDT by Bob because it is built entirely on top of UDP and uses a single UDP socket for both data and control packets. The first version of UDT came out in early 2003. Compared to SABUL, UDT provides a streaming-style API so that it can simulate TCP socket semantics, which was an important step toward gaining a large user community.
I spent about one year investigating the congestion control algorithm. Having considered many approaches, I chose to modify the traditional loss-based AIMD algorithm, which had worked stably with TCP. Delay-based approaches have several attractions, especially that they are less affected by packet loss unrelated to congestion, but they face a fundamental problem: learning the “base” delay value. An inaccurate “base” value can make a delay-based algorithm either too aggressive, if the base is overestimated, or too “friendly” (co-existing flows may simply kill it) otherwise.
UDT uses packet pairs to estimate the available bandwidth so that it can rapidly explore large bandwidth while still sharing it fairly and in a friendly manner with other flows.
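To make the packet-pair idea concrete, here is a minimal sketch rather than UDT's actual code. It assumes the sender emits two packets back to back and the receiver records the arrival gap for each pair; the estimate is the packet size divided by a typical gap. The function name, the median filter, and the sample numbers are my own illustrative choices.

```python
from statistics import median


def packet_pair_estimate(arrival_gaps_s, packet_size_bytes):
    """Estimate bandwidth in bits per second from packet-pair arrival gaps.

    arrival_gaps_s: receiver-side gaps, in seconds, between the two packets
    of each back-to-back pair (hypothetical measurements).
    """
    gaps = [g for g in arrival_gaps_s if g > 0]
    if not gaps:
        raise ValueError("need at least one positive gap")
    # Assumption: take the median so that one pair delayed by cross
    # traffic does not dominate the estimate.
    typical_gap = median(gaps)
    return packet_size_bytes * 8 / typical_gap


# Five pairs of 1500-byte packets; the third pair was delayed in a queue.
gaps = [0.000012, 0.000012, 0.000250, 0.000012, 0.000011]
estimate_bps = packet_pair_estimate(gaps, 1500)
```

With these made-up samples the typical gap is 12 microseconds, so the estimate comes out at roughly 1 Gb/s, and the single delayed pair has no effect on it.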
In addition to its native congestion algorithm, I have also implemented most of the major congestion control algorithms available by 2005. To this end, I have made UDT a composable framework so that a new control algorithm can be easily implemented by overriding several callback functions. Due to the nature of the feedback delay and unknown coexisting flows, there is no “perfect” control algorithm. Each algorithm may work well in some situations but may behave poorly in others. UDT can be quickly customized to suit a specific environment.
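The "override a few callbacks" design can be sketched as follows. This is not UDT's real interface: the class names, the on_ack and on_loss hooks, and the replay helper are all hypothetical, and the subclass is a textbook-style AIMD rule, not UDT's native algorithm.

```python
class CongestionControl:
    """Hypothetical base class: the transport calls these hooks on events."""

    def __init__(self, initial_window=2.0):
        self.window = initial_window  # congestion window, in packets

    def on_ack(self):
        """Called when an acknowledgement arrives. Default: do nothing."""

    def on_loss(self):
        """Called when packet loss is detected. Default: do nothing."""


class SimpleAIMD(CongestionControl):
    """Additive increase on each ACK event, multiplicative decrease on loss."""

    def on_ack(self):
        self.window += 1.0

    def on_loss(self):
        # Halve the window, but never drop below one packet.
        self.window = max(1.0, self.window / 2.0)


def replay(controller, events):
    """Feed a sequence of 'ack'/'loss' events and return the final window."""
    for event in events:
        if event == "ack":
            controller.on_ack()
        elif event == "loss":
            controller.on_loss()
        else:
            raise ValueError(f"unknown event: {event}")
    return controller.window


final_window = replay(SimpleAIMD(), ["ack", "ack", "ack", "ack", "loss", "ack"])
```

Swapping in a different algorithm then means writing another subclass; the code that drives the hooks stays the same.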
By 2005, UDT3 had already become production-ready and had a large user community. As the user community grew, people started to use UDT in commodity networks with relatively small bandwidth (e.g., Cable, DSL, etc.). There was an important feature of UDP that accelerated this change: it is much easier to punch through a firewall with UDP than with TCP, and UDT is built entirely on UDP.
This was a completely different usage scenario from the original design goal. While UDT could easily scale to inter-continental multi-10GE networks, it did not scale well to high concurrency. This motivated the birth of UDT4 in 2007.
UDT4 introduces UDP multiplexing and buffer sharing to allow an application to start a very large number of UDT connections. UDP multiplexing makes it much easier to support firewall management and NAT punching because a single UDP socket can carry multiple UDT connections. Buffer sharing significantly reduces memory usage. Today UDT can efficiently support 100,000 concurrent connections on a commodity computer. The scalability will be further improved when the epoll API and session multiplexing over UDT connections are completed.
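A rough sketch of what carrying several connections over one UDP socket involves: each datagram carries a connection identifier, and the receiver dispatches on it. The 4-byte header, the Multiplexer class, and the encode helper below are hypothetical and are not UDT's wire format. To keep the example self-contained, it works on in-memory byte strings instead of a real socket.

```python
import struct
from collections import defaultdict

# Hypothetical header: a 4-byte connection ID in network byte order.
HEADER = struct.Struct("!I")


def encode(conn_id, payload):
    """Prefix a payload with the ID of the connection it belongs to."""
    return HEADER.pack(conn_id) + payload


class Multiplexer:
    """Dispatches datagrams from one shared socket to per-connection queues."""

    def __init__(self):
        self.queues = defaultdict(list)

    def dispatch(self, datagram):
        if len(datagram) < HEADER.size:
            raise ValueError("datagram shorter than header")
        (conn_id,) = HEADER.unpack_from(datagram)
        # Strip the header and queue the payload for its connection.
        self.queues[conn_id].append(datagram[HEADER.size:])
        return conn_id


mux = Multiplexer()
for datagram in (encode(7, b"hello"), encode(9, b"data"), encode(7, b" world")):
    mux.dispatch(datagram)
```

Because every connection shares one socket, a firewall or NAT only has to admit a single UDP port, which is why this arrangement helps with NAT punching.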
Over these years, I have received a great deal of useful feedback and learned a lot from the users. In particular, many features were motivated by users’ requirements. Several users have even developed their own UDT implementations, while others have created and shared wrappers for non-C++ programming languages.
UDT also benefits a lot from the open source approach, which helps it reach a larger community. Users can review the code, debug it, and submit bug fixes whenever necessary, which greatly increases the code quality.
As a user-space protocol, UDT is able to incorporate new technologies from computer networking and adapt to new network environments and usage scenarios. I am confident that UDT will continue to evolve and serve data transfer jobs in more and more applications.