14 terabytes per second: Google lifts the veil of secrecy over Effingo

Effingo delivers unprecedented data transfer speeds, changing the rules of the game in large-scale data management.

Google has revealed the technical details of its internal data transfer tool called Effingo, which moves an average of 1.2 exabytes of information daily.

At the SIGCOMM 2024 conference in Sydney, Google presented a paper explaining that bandwidth limits and the finite speed of light force it to replicate data closer to where it is processed or served. Effingo reduces network latency from hundreds of milliseconds to tens of milliseconds within a continent.

Conventional data transfer tools either optimize transfer times or process point-to-point data streams, but they cannot handle the volume Effingo moves daily: 14 terabytes per second on average. Effingo also weighs the importance of tasks, reserving resources for high-priority work such as disaster recovery over planned data migrations.
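The priority-aware scheduling described above can be sketched with a simple priority queue. This is a hypothetical illustration, not Google's implementation; the class names and priority tiers are invented for the example.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical priority tiers: lower number = more important.
PRIORITY = {"disaster_recovery": 0, "user_initiated": 1, "planned_migration": 2}

@dataclass(order=True)
class TransferTask:
    priority: int
    name: str = field(compare=False)  # compared by priority only

class TransferScheduler:
    """Serves higher-priority copy jobs before lower-priority ones."""

    def __init__(self):
        self._queue = []

    def submit(self, name, kind):
        heapq.heappush(self._queue, TransferTask(PRIORITY[kind], name))

    def next_task(self):
        return heapq.heappop(self._queue).name

sched = TransferScheduler()
sched.submit("migrate-logs", "planned_migration")
sched.submit("restore-db", "disaster_recovery")
print(sched.next_task())  # disaster recovery runs before the planned migration
```

In this sketch, a disaster-recovery copy submitted after a planned migration is still dispatched first.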

Effingo is optimized for Colossus, the file system Google developed, and is deployed in clusters of thousands of machines. Each cluster runs the Effingo software, which consists of a control plane and a transport plane. The control plane manages the copy lifecycle, while the transport plane moves the data and monitors its status. The transport plane accounts for 99% of CPU usage but less than 7% of the lines of code.
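The control/transport split might look roughly like the following sketch, where a thin lifecycle manager delegates the byte-moving work to a separate component. All names here are assumptions for illustration, not Effingo's actual interfaces.

```python
class TransportPlane:
    """Moves data and reports status; in the real system this is where
    ~99% of the CPU time goes."""

    def __init__(self):
        self.bytes_moved = 0

    def move(self, chunks):
        for chunk in chunks:
            self.bytes_moved += len(chunk)  # stand-in for real network I/O
        return self.bytes_moved

class ControlPlane:
    """Manages the copy lifecycle: queued -> running -> done."""

    def __init__(self, transport):
        self.transport = transport
        self.state = {}

    def start_copy(self, copy_id, chunks):
        self.state[copy_id] = "running"
        moved = self.transport.move(chunks)  # heavy lifting delegated
        self.state[copy_id] = "done"
        return moved

cp = ControlPlane(TransportPlane())
print(cp.start_copy("copy-1", [b"abcd", b"efgh"]))  # 8
```

The asymmetry in the article (most CPU in few lines of code) falls out of this design: lifecycle bookkeeping is cheap, while the transport loop touches every byte.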

Each cluster is connected to others via low-latency, high-bandwidth networks or via WAN links running over Google and third-party infrastructure. Bandwidth Enforcer (BWe), another tool developed by Google, allocates bandwidth based on service priorities and the value that additional bandwidth would add.

When a user initiates a transfer, Effingo requests a traffic allocation from BWe and begins copying data as quickly as possible. The allocation can be based on predefined quotas, throughput metrics, and the Effingo worker resources available; the workers perform data-movement jobs as Borg tasks (Borg is Google's cluster management platform, the predecessor of Kubernetes).
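The request-allocation step could be modeled as a broker that grants the caller the minimum of what it asked for, its quota, and the remaining link capacity. This is a minimal sketch under those assumptions; the class and method names are illustrative, not BWe's real API.

```python
class BandwidthBroker:
    """Toy stand-in for a bandwidth allocator: grants up to the caller's
    quota, capped by the remaining capacity of the shared link."""

    def __init__(self, capacity_gbps):
        self.capacity = capacity_gbps

    def request_allocation(self, wanted_gbps, quota_gbps):
        granted = min(wanted_gbps, quota_gbps, self.capacity)
        self.capacity -= granted  # reserve the granted share
        return granted

broker = BandwidthBroker(capacity_gbps=100)
print(broker.request_allocation(wanted_gbps=80, quota_gbps=50))  # 50
print(broker.request_allocation(wanted_gbps=80, quota_gbps=70))  # 50, capacity limit
```

The second request is clipped by remaining capacity rather than quota, which mirrors the idea that a transfer starts "as quickly as possible" within whatever the network can actually spare.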

Effingo can use best-effort resources for less critical tasks and request quotas for tasks that need a guaranteed level of network performance. Quotas are allocated months in advance, and Effingo is only one of many consumers in the central planning system. Unused quota is reallocated, but it can be reclaimed quickly when needed.

Despite all these efforts to allocate resources, Effingo's average global queue holds 12 million files, equivalent to about eight petabytes. At peak times, when the top 10 users initiate new transfers, the queues grow by a further 12 petabytes and nine million files.
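A quick back-of-the-envelope check puts these figures in perspective (decimal units assumed, and the drain-time estimate hypothetically assumes the full aggregate rate is applied to the queue):

```python
PB = 10**15
TB = 10**12

queue_bytes = 8 * PB
queue_files = 12_000_000

# Average size of a queued file
avg_file_gb = queue_bytes / queue_files / 10**9
print(round(avg_file_gb, 2))  # ~0.67 GB per file

# If the full 14 TB/s aggregate rate drained this queue, it would clear in:
drain_seconds = queue_bytes / (14 * TB)
print(round(drain_seconds))  # ~571 seconds, i.e. under ten minutes
```

In other words, even an eight-petabyte backlog is less than ten minutes of Effingo's aggregate throughput, which suggests the queue reflects scheduling and quota constraints rather than raw capacity.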

Google plans to improve the integration of Effingo with resource management systems and optimize CPU usage for inter-cluster transfers. Improvements are also planned to scale data transfers faster.
