Researchers have presented new methods for accelerating the processing of sparse tensors in large-scale AI models.
Scientists from MIT and NVIDIA have developed two techniques that speed up the processing of sparse tensors, the data structures used in high-performance computing. The techniques can significantly improve the performance and energy efficiency of systems such as the large-scale machine learning models behind generative artificial intelligence.
Tensors are data structures used by machine learning models. Both new methods aim to exploit sparsity in tensors: by skipping zero values, computation and memory can be saved. Exploiting sparsity is not without its problems, however. For example, locating the nonzero values in a large tensor is itself a difficult task.
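The basic idea is easiest to see in software. The following sketch is a generic illustration of sparse storage and computation in Python (a coordinate-format matrix-vector product), not the researchers' hardware design: only nonzero entries are stored and multiplied, so zeros cost neither memory nor arithmetic.

```python
import numpy as np

# A small dense matrix with many zeros, standing in for a sparse tensor.
dense = np.array([
    [0.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
    [1.0, 0.0, 0.0, 0.0],
])

# Coordinate (COO) representation: keep only the nonzero values
# together with their positions, skipping every zero entry.
rows, cols = np.nonzero(dense)
values = dense[rows, cols]

# A matrix-vector product that touches only the stored nonzeros,
# so the work scales with the number of nonzeros, not the full size.
x = np.array([2.0, 1.0, 4.0, 0.5])
y = np.zeros(dense.shape[0])
for r, c, v in zip(rows, cols, values):
    y[r] += v * x[c]

assert np.allclose(y, dense @ x)
```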
Researchers from MIT and NVIDIA have proposed two solutions. The first allows the hardware to efficiently find the nonzero values for a wide range of sparsity patterns. The second improves utilization of the on-chip storage buffer and reduces traffic to off-chip memory.
One of the developed accelerators, HighLight, can handle a variety of sparsity patterns and still run efficiently on models that contain no zero values at all. The researchers use "hierarchical structured sparsity" to represent the different sparsity patterns.
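One plausible way to read "hierarchical structured sparsity" is that a simple constraint such as "at most N nonzeros per block of M" is applied at several nested levels, so that complex patterns are built from simple ones. The sketch below illustrates that reading only; the function name, the (nonzeros, block_size) encoding, and the example levels are illustrative assumptions, not the published HighLight format.

```python
import numpy as np

def conforms(vec, levels):
    """Check whether a 1-D vector follows a hierarchical structured
    sparsity pattern given as (nonzeros, block_size) pairs, applied
    from the innermost blocks outward. Illustrative only; assumes the
    vector length divides evenly into blocks at every level."""
    current = np.asarray(vec, dtype=float)
    for n, m in levels:
        blocks = current.reshape(-1, m)
        # Each block of size m may contain at most n nonzero entries.
        if np.any((blocks != 0).sum(axis=1) > n):
            return False
        # At the next level, a whole block counts as "nonzero"
        # if any of its entries is nonzero.
        current = (np.abs(blocks).sum(axis=1) > 0).astype(float)
    return True

# Example: inner level allows 2 nonzeros per 4 elements (2:4),
# outer level allows 1 nonzero block out of every 2 blocks (1:2).
v = np.array([0, 3, 0, 1,   0, 0, 0, 0])
print(conforms(v, [(2, 4), (1, 2)]))  # True
```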
The other approach, called "Tailors and Swiftiles", speeds up workloads by letting the hardware "overbook" its buffer, in the way an airline overbooks seats. The method quickly estimates the ideal data block (tile) size, saving computing resources. Together, the two techniques roughly double the speed and halve the energy consumption compared with existing accelerators.
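The tile-size estimation can be illustrated with a toy software estimator: sample a few blocks of the tensor, measure their nonzero density, and size the tile for that typical density plus a small overbooking margin rather than for the worst case. The function name, parameters, and the 10% margin below are illustrative assumptions, not values from the paper, and the real Tailors/Swiftiles mechanism is a hardware design.

```python
import numpy as np

def estimate_tile_size(matrix, buffer_capacity, tile_rows=32,
                       samples=8, overbook=1.10, rng=None):
    """Roughly estimate how many columns of a tile fit in an on-chip
    buffer that stores only nonzero values. Sampling a few random row
    blocks gives a typical nonzero density; the tile is sized for that
    density times an overbooking factor, not for the worst case."""
    rng = np.random.default_rng(rng)
    n_rows, _ = matrix.shape
    densities = []
    for _ in range(samples):
        r = rng.integers(0, max(1, n_rows - tile_rows))
        block = matrix[r:r + tile_rows]
        densities.append((block != 0).mean())
    density = max(np.mean(densities), 1e-6)
    # Expected nonzeros per tile column, padded by the overbooking margin.
    nnz_per_col = density * tile_rows * overbook
    return max(1, int(buffer_capacity / nnz_per_col))

# Example: a 1024x1024 matrix that is ~5% dense, buffer for 4096 values.
m = (np.random.default_rng(0).random((1024, 1024)) < 0.05).astype(float)
print(estimate_tile_size(m, buffer_capacity=4096))
```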
"Swiftiles allows us to estimate what the size of these blocks should be without having to repeatedly refine the estimate. This is made possible by supporting rebooking," says Xue, one of the authors of the development.
In the future, the researchers plan to apply the idea of overbooking to other aspects of computer architecture and to improve the process of estimating the optimal level of overbooking.