Marginally Interesting - The Newsletter - Issue #5
At this year's Google I/O, Google announced TPU v4, doubling the computing power of TPU v3, and my immediate reaction was: who is going to need this? Google has been churning out a string of papers over the past few years with ever-increasing parameter counts in their models, which already run into the millions.
MIT researchers have started a meta-study extrapolating what further improvements in ML are going to cost (link below), and they predict that we are entering a regime of diminishing returns. The amount of computing power required to gain a few more percentage points on tasks like object recognition will soon become prohibitive.
The computational limits of deep learning | MIT CSAIL — www.csail.mit.edu
A new project led by MIT researchers argues that deep learning is reaching its computational limits, which they say will result in one of two outcomes: deep learning being forced towards less computationally-intensive methods of improvement, or else machine learning being pushed towards techniques that are more computationally-efficient than deep learning.
The story that Google and other big companies are telling is pretty compelling: "Look, we're getting closer and closer to creating real intelligence! And look, we've upgraded our cloud offerings so that even you can run those models in the cloud. For a small fee, of course!"
I'm wondering: is all that computing power really necessary? And if not, would we know? Big companies like Google have come to dominate deep learning research over the past few years (see this study of big AI conferences in 2020). Increasingly, it seems only they have the money, data, and computing power to push the envelope.
Can we really count on Google to also explore other, more data- and energy-efficient approaches? From the standpoint of scientific progress, yes. But from a business point of view, I have my doubts. The car industry, too, only started looking into more efficient engines after oil prices got too high and the public became more environmentally conscious.
Then there is the question of how many people really need these kinds of super-advanced deep learning models. For many business applications, classical methods like linear models, decision trees, and their variants are still all you need (a small sketch below).
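To make that concrete, here is a minimal, hypothetical sketch of what the classical route looks like with scikit-learn. The synthetic dataset and hyperparameters are made up for illustration, not taken from any real business case:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for a typical tabular business dataset.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Two classical baselines; trains in seconds, no GPU in sight.
    for model in (LogisticRegression(max_iter=1000),
                  DecisionTreeClassifier(max_depth=5)):
        model.fit(X_train, y_train)
        print(type(model).__name__, round(model.score(X_test, y_test), 3))

Models like these also come with interpretability and cheap serving for free, which is often worth more to a business than a few points of accuracy.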
Large-scale learning used to mean something else. It was about finding very efficient optimization methods, like coordinate descent for logistic regression, or approximate second-order methods. Big Data changed that, and then GPUs did. Admittedly, those kinds of optimization methods probably wouldn't work for the complex deep learning models we have today. And it is much easier to mechanically scale out and accelerate a computation with hardware than to hunt for algorithmic and computational tricks. Still, scaling out is not always the most effective solution, but that art seems to be lost on the mainstream (a sketch of the old style follows below).
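As an illustration of that older style, here is a hedged, from-scratch sketch of coordinate descent for unregularized logistic regression: each step is a one-dimensional Newton update of a single weight with all the others held fixed, and the margins are cached and updated incrementally instead of being recomputed. A toy, not a tuned solver like liblinear:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def logreg_coordinate_descent(X, y, epochs=50):
        # X: (n, d) feature matrix, y: labels in {-1, +1}.
        n, d = X.shape
        w = np.zeros(d)
        margin = X @ w  # cached x_i . w, updated incrementally below
        for _ in range(epochs):
            for j in range(d):
                p = sigmoid(-y * margin)           # per-example "mistake" probability
                grad = -np.sum(y * X[:, j] * p)    # dL/dw_j of the logistic loss
                hess = np.sum(X[:, j] ** 2 * p * (1 - p)) + 1e-12
                step = grad / hess                 # 1-D Newton step on coordinate j
                w[j] -= step
                margin -= step * X[:, j]           # keep the cache in sync
        return w

The trick is entirely algorithmic: no extra hardware, just a cheap exact update per coordinate and some bookkeeping.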
Scientific progress is one thing, but increasingly with AI we're in a world where research and business goals are closely intertwined.
There is ongoing research, also within Google, to make neural network training more efficient. For example, this paper by Mingxing Tan and Quoc V. Le on EfficientNetV2 discusses in depth strategies like progressively adjusting image size during training and more efficient convolutional layer architectures to improve training times fourfold.
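In the spirit of that idea, here is a minimal, made-up sketch of a progressive schedule that ramps image size and regularization up together as training proceeds. The stage boundaries and values are my own assumptions for illustration, not the paper's settings:

    def progressive_schedule(epoch, total_epochs=100,
                             min_size=128, max_size=300,
                             min_dropout=0.1, max_dropout=0.3):
        # Linearly interpolate image size and dropout with training progress:
        # small, lightly regularized images early; large, heavily
        # regularized images late. All boundary values are invented.
        t = epoch / max(total_epochs - 1, 1)
        size = int(min_size + t * (max_size - min_size))
        dropout = min_dropout + t * (max_dropout - min_dropout)
        return size, dropout

    for epoch in (0, 25, 50, 99):
        print(epoch, progressive_schedule(epoch))

Most of the compute savings come from the early epochs, where the small images make each step much cheaper.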
More recently, Lee-Thorp et al. have proposed FNet, a variant of the Transformer architecture that replaces the self-attention sublayer with a fast Fourier transform, training seven times faster on GPUs.
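The core of the trick is almost embarrassingly small. Here is a hedged NumPy sketch of just the Fourier mixing sublayer (in the full model, the learned feed-forward sublayers remain; the shapes here are assumptions):

    import numpy as np

    def fourier_mixing(x):
        # FNet-style token mixing: a 2-D FFT over the sequence and hidden
        # dimensions, keeping only the real part. No learned parameters.
        return np.real(np.fft.fft2(x))

    tokens = np.random.randn(512, 768)   # (seq_len, d_model); sizes assumed
    mixed = fourier_mixing(tokens)
    print(mixed.shape)                   # (512, 768)

Since there is nothing to learn in this sublayer, the savings show up in both compute and parameter count.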
I personally don't have an answer as to what kinds of new approaches we might find that are less insane in terms of resource usage, but until we have them, I guess we'll continue handing over vast amounts of money to cloud providers for that sweet GPU computing time.
That's all I wanted to say :)
Hope you liked this slightly more opinionated edition, have a nice weekend!