Not so long ago, I wanted to make an algorithm run on a particular GPU. So I spent about a month setting up a special C++ compiler, tuning it with its special flags, debugging, and optimizing the GPU code. I was very pleased in the end, since not only did I manage to make it work, but the resulting implementation, thanks to all the optimizations, boiled down to ~400 lines of the GPU’s native machine code.
Except, when you look at it from another angle, if I had been writing in assembly from the start, 400 lines of code in one month wouldn’t seem like much of an achievement.
We used to write GPU shaders in assembly not so long ago. In 2006, I was translating some from assembly to GLSL for maintainability reasons. Maintainability, and rapid development in general, was king back then. Now, however, with cloud-based SaaS gaining popularity, the focus is shifting from rapid development to optimization.
Desktop software brings in money from sales, and software sells because of its features. The more features you ship in a unit of time, the more money you make.
Software as a service is slightly different. You charge for the service, but you also pay for the run time. The faster your software runs, the more you save. The more you save — the cheaper you can sell your service while maintaining a profit. The cheaper your service is — the more clients you attract. The fastest code wins the market.
Compilers were made to automate tedious assembly programming. To make development faster. Not the code. Sure, compilers can do a lot of optimizations by themselves, but it’s more or less auxiliary to their main function which is saving cost on coding.
As the market model changes, so do the priorities. At this rate, it won’t take long before we start translating GLSL shaders back into assembly.
Unless something new replaces compilers as a technology.
I think that not only C++ in particular but optimizing compilers in general have overstayed their welcome. They come from the 1950s, and although every particular compiler has advanced greatly over the years, the core of the technology has remained mostly unchallenged.
But what exactly would we want to challenge? What is a compiler?
I’d like to propose a mental model. It might feel alien at first but hear me out.
An optimizing compiler is a constrained minimization problem solver.
The target function, the thing we want to minimize, is the run time of the resulting machine code.
The constraints are our source code. We are used to seeing it as text, as language constructs, but essentially it’s just a set of constraints. We don’t care how exactly it is executed or how exactly the machine code will look; we only want that, if there is 2+2 in the code, there will be 4 in the output.
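As a toy illustration of the constraint view (the function name here is hypothetical, just for the example):

```cpp
// The source only constrains the observable result to be 4; it does
// not demand an actual addition at run time. An optimizing build of
// GCC or clang satisfies the constraint with a single constant load
// (e.g. `mov eax, 4` on x86-64 at -O2), no add instruction at all.
int four() {
    return 2 + 2;
}
```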
And the mathematical space where we’re looking for the minimum is the space of all possible valid machine code.
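In symbols, the same idea could be sketched like this (s is the source program, M the set of all valid machine-code programs, and the double brackets denote observable behavior; the notation is mine, not standard compiler terminology):

```latex
\min_{m \in \mathcal{M}} \; \mathrm{time}(m)
\quad \text{subject to} \quad
\forall x :\; \llbracket m \rrbracket(x) = \llbracket s \rrbracket(x)
```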
Of course, you can’t just compile a program by running some gradient descent algorithm in this space. There are too many constraints, the space is too large, and none of the classical minimization algorithms are realistic here. So compilers use a special strategy to stay within the constraints: they translate code from one representation to another and only “travel” the optimization space sparingly within the current representation.
This “travel” means that in every given representation, for instance, one of GCC intermediate languages, or LLVM intermediate representation, or even the machine code at the back end, the code is altered to run faster while maintaining some level of semantic equivalency.
E.g., the Intel Atom Z has a very small fetch frame: only 8 bytes. This means that, given the average size of an Intel instruction, it normally can’t take more than one instruction per fetch. GCC knows that. It rearranges instructions so that the longer ones are followed by the shortest ones, and now more pairs are fetched together. The program is semantically the same, but it runs faster.
Sometimes you have to sacrifice some of the equivalency for performance. Arithmetic optimizations on floating-point numbers are one example. Floating-point addition is not strictly associative, meaning that (a+b)+c ≠ a+(b+c). Floating-point numbers are “almost associative,” but due to the way they accumulate rounding error, they aren’t quite. So if you want to rearrange your instructions to make the code faster, you have to give up some strictness of equivalency.
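A concrete demonstration, with values picked so the rounding is visible (the helper function is just for illustration):

```cpp
// With a = 1e16, b = -1e16, c = 1.0:
//   (a + b) + c  computes 0.0 + 1.0, which is exactly 1.0, while
//   a + (b + c)  first rounds -1e16 + 1.0 back to a neighbor of
//                -1e16, so the final sum is not 1.0.
// This is exactly why a compiler can't freely reassociate
// floating-point sums without permission.
bool floats_associate(double a, double b, double c) {
    return (a + b) + c == a + (b + c);
}
```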
Also, with compilers, all the optimizations they do are programmed by their programmers. It’s essentially the refined wisdom of the crowd. People think of smart ways to make code faster, and these ways end up in compilers’ code.
Sometimes, they surpass all expectations. For instance, clang can rearrange an array traversal into a binary search by itself!
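The same kind of built-in cleverness shows up in simpler cases too. Here is a sketch (the function name is mine): LLVM’s scalar-evolution analysis can typically prove what this loop computes and replace it with the closed form.

```cpp
// Written as an O(n) loop; clang can typically prove the result
// equals n*(n+1)/2 and drop the loop entirely. Someone once derived
// that trick by hand and then taught it to the compiler.
long triangular(long n) {
    long sum = 0;
    for (long i = 1; i <= n; ++i)
        sum += i;
    return sum;
}
```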
But in the end, all that any specific optimizing compiler does is programmed by people. There is no intelligence in compilers.
- Compilers don’t really optimize the run time of your code; they only apply heuristics that typically lead to optimization. Some compilers allow profile-guided optimization, which is a great step forward, but it’s not an inherent trait of compilation.
- Compilers presume the level of deviation acceptable for semantic equivalency. Again, this is a known problem, and there are compiler flags like GCC’s -funsafe-math-optimizations. But there is no way to set the acceptable deviation explicitly. “I want this formula to be precise up to 1e-5, and that one up to 1e-12” is something you simply can’t say.
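To make the wish concrete, here is the kind of trade such a setting would govern. The classic fast inverse square root (the well-known 0x5f3759df trick) carries a relative error on the order of 1e-3; a precision-aware toolchain could decide per call site whether that fits your stated tolerance or whether it must fall back to the exact 1.0f / std::sqrt(x).

```cpp
#include <cstdint>
#include <cstring>

// Classic fast inverse square root: a bit-level initial guess
// plus one Newton-Raphson step. Relative error is on the order
// of 1e-3: fine under a 1e-2 budget, unacceptable under 1e-12.
float fast_rsqrt(float x) {
    float half = 0.5f * x;
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);  // reinterpret the bits, no UB
    i = 0x5f3759df - (i >> 1);      // magic initial estimate
    std::memcpy(&x, &i, sizeof x);
    x *= 1.5f - half * x * x;       // one Newton step refines it
    return x;
}
```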
- Compilers don’t learn. They only get better when people learn how to make them better, which isn’t fast at all. This is a huge problem, since it means no vendor can drastically improve their hardware anymore. Any significant change in the instruction set requires at least the back end to be completely re-engineered, and any change beyond that, like introducing a new type of co-processor, may require a completely new compiler.
The thing that will eventually replace compilers should:
- Require a target profile: a set of performance tests that unambiguously establish the target function.
- Have explicit precision constraints: how accurate you want your computation to be or, in other words, how many meaningful bits you are willing to sacrifice for performance.
- Learn by itself.
Also, given the current technological landscape, the last requirement almost certainly means that this should be a cloud service. Whether it comes along with the development tools is irrelevant. The language is irrelevant. What’s important is that this self-learning code generator really benefits from hoarding all the knowledge it gets in one instance.
So the thing that will replace C++ will:
- be a cloud-based service;
- optimize your code to run specific scenarios faster;
- be guided by explicit precision constraints;
- use the lessons from all the other builds it has done before.
There is only one minor problem. It doesn’t have a name yet. How would you name this thing?