Wavelet Meets Adam: Compressing Gradients for Memory-Efficient Training
Ziqing Wen and 7 other authors
Abstract: Large language models (LLMs) have shown impressive performance across a range of natural language processing tasks. However, their vast number of parameters introduces significant memory challenges during training, particularly when using memory-intensive optimizers like Adam. Existing memory-efficient algorithms often rely on techniques such as singular value decomposition projection or weight freezing. While these approaches help alleviate memory constraints, they generally produce suboptimal results compared to full-rank updates. In this paper, we investigate memory-efficient methods beyond low-rank training and propose a novel solution called Gradient Wavelet Transform (GWT), which applies wavelet transforms to gradients to substantially reduce the memory required for maintaining optimizer states. We demonstrate that GWT can be seamlessly integrated with memory-intensive optimizers, enabling efficient training without sacrificing performance. Through extensive experiments on both pre-training and fine-tuning tasks, we show that GWT achieves state-of-the-art results relative to advanced memory-efficient optimizers and full-rank approaches, in terms of both memory usage and training performance.
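Conceptually, GWT transforms each gradient into a smaller wavelet-domain representation, maintains the Adam moment estimates in that compressed domain, and maps the resulting update back to the full parameter shape. The sketch below is only an illustration of that idea under stated assumptions: a single-level Haar transform along the column dimension, detail coefficients discarded, and standard Adam moment updates. The class name HaarAdamSketch and all hyperparameters are hypothetical and do not reproduce the paper's exact algorithm.

# Minimal sketch (not the authors' implementation): compress each gradient
# row to half width with a single-level Haar approximation, keep Adam
# moments in that compressed domain, and reconstruct a full-size update.
import torch

class HaarAdamSketch:
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.params = [p for p in params if p.requires_grad]
        self.lr, self.betas, self.eps = lr, betas, eps
        self.state = {}  # per-parameter moments, stored in the wavelet domain
        self.t = 0

    @staticmethod
    def _haar_fwd(g):
        # Approximation coefficients: scaled sum of adjacent column pairs (half width).
        g = g.reshape(g.shape[0], -1, 2)
        return (g[..., 0] + g[..., 1]) / 2 ** 0.5

    @staticmethod
    def _haar_inv(a):
        # Inverse transform with detail coefficients treated as zero.
        out = torch.stack((a, a), dim=-1) / 2 ** 0.5
        return out.reshape(a.shape[0], -1)

    @torch.no_grad()
    def step(self):
        self.t += 1
        b1, b2 = self.betas
        for p in self.params:
            if p.grad is None or p.ndim != 2 or p.shape[1] % 2:
                continue  # sketch handles even-width weight matrices only
            a = self._haar_fwd(p.grad)                      # compressed gradient
            m, v = self.state.setdefault(
                p, (torch.zeros_like(a), torch.zeros_like(a)))
            m.mul_(b1).add_(a, alpha=1 - b1)                # first moment
            v.mul_(b2).addcmul_(a, a, value=1 - b2)         # second moment
            m_hat = m / (1 - b1 ** self.t)                  # bias correction
            v_hat = v / (1 - b2 ** self.t)
            update = self._haar_inv(m_hat / (v_hat.sqrt() + self.eps))
            p.add_(update, alpha=-self.lr)                  # full-size update

Because the moments live in the compressed domain, their memory footprint is roughly halved per transform level in this sketch; the trade-off between compression level and update quality is what the paper's experiments evaluate.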
Submission history
From: Ziqing Wen
[v1] Mon, 13 Jan 2025 11:35:09 UTC (322 KB)
[v2] Tue, 29 Jul 2025 09:24:44 UTC (216 KB)
[v3] Wed, 30 Jul 2025 01:07:39 UTC (216 KB)