A team of researchers at Stanford University has published a paper describing a novel training methodology that reduces the energy consumption of large language model training by up to 60%. The technique, called Sparse Gradient Cascading, selectively activates portions of the neural network during training rather than updating all parameters simultaneously.
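The paper's exact algorithm is not reproduced here, but the general idea of updating only a subset of parameters on each training step can be illustrated with a short PyTorch sketch. Everything in it is an assumption for illustration: the `sparse_update_step` helper name, the random subset selection, and the 40% update fraction are not taken from the paper, and this toy version still computes full gradients, so it does not capture the compute or energy savings the researchers report.

```python
# Illustrative sketch only -- not the published Sparse Gradient Cascading
# algorithm. It gates which parameter tensors receive an optimizer update
# each step; gradients are still computed for all parameters.
import random
import torch
import torch.nn as nn


def sparse_update_step(model, optimizer, loss_fn, batch, update_fraction=0.4):
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # Pick a random subset of parameter tensors to update this step;
    # clearing the gradient on the rest makes the optimizer skip them.
    params = [p for p in model.parameters() if p.grad is not None]
    n_keep = max(1, int(update_fraction * len(params)))
    keep = set(random.sample(range(len(params)), n_keep))
    for i, p in enumerate(params):
        if i not in keep:
            p.grad = None  # skipped parameters are left unchanged this step

    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    batch = (torch.randn(8, 16), torch.randint(0, 4, (8,)))
    print(sparse_update_step(model, opt, nn.CrossEntropyLoss(), batch))
```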
The approach was validated by training a 70-billion-parameter model using only 40% of the compute resources typically required. On standard benchmarks, the resulting model matched conventionally trained equivalents, with no measurable degradation in quality.