A proof of concept superoptimizer that considers all features of a modern processor.
This Wiki is for a research project that is developing a proof of concept superoptimizer, as the basis of a future more complete implementation that can be integrated with existing compilers and used in production environments.
The initial research project will run from 02nd June to 1st October 2014 and is being undertaken by Embecosm.
The goals of the project can be summarised as:
- Identify code that can be superoptimized (for speed, size or energy) focusing on finding awkward constructs
- Design a peephole optimization to integrate a superoptimizer into GCC and/or LLVM
- Hand-calculate benefit from superoptimization, comparing the different techniques
- Document the generalized process to demonstrate repeatability
Superoptimization is an idea which produces perfectly optimal code, in place of the code we currently have generated by compilers. This is typically done via a bruteforce search of every possible instruction sequence, checking whether it performs the desired actions and accepting the sequence if it is the optimal one.
The problem is clearly very difficult, exploding in size as the length of the instruction sequence increases. However, superoptimization gaining traction as a method of optimizing programs. In this feasibility study we hope to find which techniques are extensible enough to bring out of the academic community to be using in a commercial setting. This means that the superoptimizer needs to be able to handle all the corner cases that may arise in production code. If it turns out that superoptimization is not currently feasibly, by the end of the project we will have a research roadmap describing what will need to be done next.
Superoptimization was first conceived by Henry Massalin, where a bruteforce search was conducted through the arithmetic instructions of the Motorola 68000. The example given in the original paper is given below.
The function on the left is compiled, giving the piece of code on the right, with multiple branches and comparisons. This could perhaps be reduced by an expert assembly programmer, but was generally accepted as close to optimal. When the superoptimizer was run on this program, a much shorter version was found - 4 instructions long and with no branching.
On the left is the instruction sequence found by the superoptimizer (Motorola 68000), with the definitions of the instructions on the far right. In the middle are three cases, computed step by step to illustrate how the code sequence works. How to instructions compute their result is not obvious exploiting the carry flag ('x') in combination with arithmetic instructions.
This research is supported by the UK Technology Strategy Board.
Any enquiries should be directed in the first instance to firstname.lastname@example.org.