Superoptimization – User contributions [en] (James), updated 2024-03-29T00:38:01Z, MediaWiki 1.23.0
http://superoptimization.org/w/api.php?action=feedcontributions&user=James&feedformat=atom

Work Package 3 (http://superoptimization.org/wiki/Work_Package_3) – revised 2014-10-02T16:04:17Z by James (/* Deliverables */)
<hr />
<div>== Work package summary ==<br />
<br />
{|<br />
|-<br />
| ''Start date:'' || 21 July 2014<br />
|-<br />
| ''End date:'' || 15 August 2014<br />
|}<br />
<br />
The dates have been revised, since [[Work Package 2|work package 2]] has implemented prototypes for the various superoptimization techniques.<br />
<br />
=== Description ===<br />
<br />
Measure the benefit to be achieved from superoptimization, using the [[Work Package 2|work package 2]] prototypes.<br />
<br />
=== Deliverables ===<br />
<br />
A report on the findings of these measurements.<br />
<br />
=== Dependencies ===<br />
<br />
[[Work Package 1|work package 1]] and [[Work Package 2|work package 2]], although likely to be done in parallel.<br />
<br />
=== Status ===<br />
<br />
Complete.<br />
<br />
== Approach ==<br />
<br />
Hand-calculate the benefit from superoptimization, comparing the different techniques. Stretch goal: measure an implementation (if WP2 completed an implementation).<br />
<br />
== Notes ==<br />
<br />
The benefit is likely to be largest for inner loops and other frequently executed portions of code.<br />
<br />
A prototype superoptimizer has been constructed, to allow short AVR instruction sequences to be evaluated.<br />
<br />
== Deliverables ==<br />
<br />
Rolled into WP4.</div>

Superoptimizing Compilers (http://superoptimization.org/wiki/Superoptimizing_Compilers) – revised 2014-10-02T15:57:18Z by James (/* Support */)
<hr />
<div>A proof of concept superoptimizer that considers ''all'' features of a modern processor.<br />
<br />
== Welcome ==<br />
<br />
This Wiki is for a research project that is developing a proof of concept superoptimizer, as the basis of a future more complete implementation that can be integrated with existing compilers and used in production environments.<br />
<br />
The initial research project will run from 2 June to 1 October 2014 and is being undertaken by [http://www.embecosm.com Embecosm].<br />
<br />
== Goals ==<br />
<br />
The goals of the project can be summarised as:<br />
<br />
# Identify code that can be superoptimized (for speed, size or energy) focusing on finding awkward constructs<br />
# Design a peephole optimization pass (or similar) to integrate a superoptimizer into GCC and/or LLVM<br />
# Hand-calculate benefit from superoptimization, comparing the different techniques<br />
# Document the generalized process to demonstrate repeatability<br />
<br />
== Introduction ==<br />
<br />
Superoptimization is a technique for producing truly optimal code in place of the code currently generated by compilers. This is typically done via a brute-force search of every possible instruction sequence, checking whether each sequence performs the desired computation and accepting it if it is the optimal one.<br />
<br />
The problem is clearly very difficult, exploding in size as the length of the instruction sequence increases. However, superoptimization is gaining traction as a method of optimizing programs. In this feasibility study we hope to find which techniques are extensible enough to bring out of the academic community and use in a commercial setting. This means that the superoptimizer needs to be able to handle all the corner cases that may arise in production code. If it turns out that superoptimization is not currently feasible, by the end of the project we will have a research roadmap describing what needs to be done next.<br />
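<br />
The core search loop can be sketched in a few lines of Python. This is only a minimal illustration, under assumptions of our own: a hypothetical three-instruction subset acting on two 8-bit registers, with candidates filtered against test vectors rather than proved equivalent.<br />
<pre>
from itertools import product

# Hypothetical toy "ISA": each instruction transforms a (r0, r1) register pair.
INSTRUCTIONS = {
    "add r0,r1": lambda r0, r1: ((r0 + r1) & 0xFF, r1),
    "mov r1,r0": lambda r0, r1: (r0, r0),
    "lsl r0":    lambda r0, r1: ((r0 << 1) & 0xFF, r1),
}

def run(sequence, r0, r1):
    for name in sequence:
        r0, r1 = INSTRUCTIONS[name](r0, r1)
    return r0

def superoptimize(target, test_inputs, max_length=3):
    """Return the shortest sequence matching `target` on the test vectors."""
    for length in range(1, max_length + 1):      # shortest first, so the first hit is optimal
        for sequence in product(INSTRUCTIONS, repeat=length):
            if all(run(sequence, r0, r1) == target(r0, r1) for r0, r1 in test_inputs):
                return sequence
    return None

# Example: find a sequence computing (r0 + r1) * 2 modulo 256.
tests = [(1, 2), (10, 20), (200, 100), (0, 0)]
print(superoptimize(lambda r0, r1: ((r0 + r1) * 2) & 0xFF, tests))   # ('add r0,r1', 'lsl r0')
</pre>
Because candidates are enumerated shortest-first, the first sequence surviving the test vectors is the cheapest under this (very crude) cost model; a real superoptimizer would follow up with a proper equivalence check.<br />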
<br />
Superoptimization was first conceived by Henry Massalin[http://web.stanford.edu/class/cs343/resources/superoptimizer.pdf], who conducted a brute-force search through the arithmetic instructions of the Motorola 68000. The example given in the original paper is shown below.<br />
<br />
<br />
[[File:Signum_compiler.png|center]]<br />
<br />
<br />
The function on the left is compiled, giving the piece of code on the right, with multiple branches and comparisons. This could perhaps be reduced by an expert assembly programmer, but was generally accepted as close to optimal. When the superoptimizer was run on this program, a much shorter version was found - 4 instructions long and with no branching.<br />
<br />
<br />
[[File:Signum_sopt.png|center|700px]]<br />
<br />
<br />
On the left is the instruction sequence found by the superoptimizer (Motorola 68000), with the definitions of the instructions on the far right. In the middle are three cases, computed step by step to illustrate how the code sequence works. How the instructions compute their result is not obvious: the sequence exploits the carry flag ('x') in combination with arithmetic instructions.<br />
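<br />
The effect can be illustrated at a higher level. The sketch below is not Massalin's 68000 sequence; it is just a branch-free reformulation of the same signum function in Python, showing the kind of transformation the superoptimizer discovered:<br />
<pre>
def signum_branchy(x: int) -> int:
    # The straightforward version, mirroring the compiled code on the left.
    if x > 0:
        return 1
    elif x < 0:
        return -1
    return 0

def signum_branchfree(x: int) -> int:
    # Branch-free equivalent: the comparisons produce 0/1 values which are combined
    # arithmetically, much as the superoptimized sequence folds the comparisons into
    # flag-based arithmetic instead of control flow.
    return (x > 0) - (x < 0)

assert all(signum_branchy(x) == signum_branchfree(x) for x in range(-1000, 1000))
</pre>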
<br />
== Project execution ==<br />
<br />
The project runs from 1 June to 30 September 2014. The lead investigator is [[User:James|James Pallister]]. The [[project plan]] describes five work packages and includes a [[risk analysis]]. These are living documents, which evolve with the project.<br />
# [[Work Package 1]]<br />
# [[Work Package 2]]<br />
# [[Work Package 3]]<br />
# [[Work Package 4]]<br />
# [[Work Package 5]]<br />
<br />
Halfway through the project (31 July 2014) we produced an [[interim report]]. There will be a [[final report]] at the end of the project.<br />
<br />
== Support ==<br />
<br />
This research is supported by the [https://www.innovateuk.org/ Innovate UK (Technology Strategy Board)].<br />
<br />
== Contact ==<br />
<br />
Any enquiries should be directed in the first instance to [mailto:info@embecosm.com info@embecosm.com].</div>

Work Package 4 (http://superoptimization.org/wiki/Work_Package_4) – revised 2014-10-02T15:53:35Z by James (/* Deliverables */)
<hr />
<div>== Work package summary ==<br />
<br />
{|<br />
|-<br />
| ''Start date:'' || 2 June 2014<br />
|-<br />
| ''End date:'' || 19 September 2014<br />
|}<br />
<br />
The dates have been revised, since the deliverable of [[Work Package 2|work package 2]] forms an appropriate interim report.<br />
<br />
=== Description ===<br />
<br />
Document the generalized process to demonstrate repeatability. This is the main engineering documentation of the project, and should provide sufficient information to guide a future product implementation. This will include appropriate reference to the work in [[Work Package 1|work package 1]], [[Work Package 2|work package 2]] and [[Work Package 3|work package 3]].<br />
<br />
=== Deliverables ===<br />
<br />
A report as described, with the deliverable of [[Work Package 2|work package 2]] forming the interim report, to be delivered by 25 July 2014.<br />
<br />
=== Dependencies ===<br />
<br />
Runs concurrently with [[Work Package 1|work package 1]], [[Work Package 2|work package 2]] and [[Work Package 3|work package 3]].<br />
<br />
=== Status ===<br />
<br />
In progress. Interim report (deliverable of [[Work Package 2|work package 2]]) complete.<br />
<br />
== Approach ==<br />
Document the generalized process to demonstrate repeatability. This is the main engineering documentation of the project, and should provide sufficient information to guide a future product implementation. This will include appropriate reference to the work in WP1, WP2 and WP3.<br />
== Notes ==<br />
This will provide an extended version of the WP2 deliverable.<br />
<br />
== Deliverables ==<br />
Available: [[Final report]]</div>

Final report (http://superoptimization.org/wiki/Final_report) – revised 2014-10-02T15:51:20Z by James
<hr />
<div>The final report for this project was delivered on 30 September 2014 and can be viewed [[media:Wp4.pdf|here]].</div>

File:Wp4.pdf (http://superoptimization.org/wiki/File:Wp4.pdf) – uploaded 2014-09-30T08:58:51Z by James
<hr />
<div></div>

Work Package 2 (http://superoptimization.org/wiki/Work_Package_2) – revised 2014-08-01T19:17:13Z by James (/* Deliverables */)
<hr />
<div>== Description ==<br />
<br />
:Design a peephole optimization to integrate a superoptimizer into GCC and/or LLVM.<br />
<br />
== Notes ==<br />
<br />
A peephole optimizer is no longer relevant - not enough potential for gains.<br />
<br />
Optimizing simultaneous sequences.<br />
<br />
:The peephole superoptimizer study could bruteforce multiple sequences simultaneously, by hashing the test vectors (see the sketch after this list)<br />
::Unclear how this can be done for SMT-based methods<br />
::Naively attempting to find multiple solutions at the same time will limit the 'branch pruning' of the solver – it will be unable to rule out large areas of the search space due to multiple divergent solutions<br />
::Could be the thing that limits the effectiveness of the SMT method<br />
::Possible solutions<br />
:::Attempt to find multiple solutions to 'similar' sequences. Based on the assumption that the solutions will be similar. How to define a similarity metric?<br />
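<br />
A rough sketch of the test-vector hashing idea (hypothetical toy instruction set; the enumeration and canonicalisation details are elided). One brute-force pass builds a table keyed on behaviour over the test vectors; every target sequence harvested from compiled code then becomes a table lookup, so many sequences are handled simultaneously:<br />
<pre>
from itertools import product

TEST_VECTORS = [(0, 0), (1, 2), (7, 200), (255, 255)]   # (r0, r1) inputs

INSTRUCTIONS = {
    "add r0,r1": lambda r0, r1: ((r0 + r1) & 0xFF, r1),
    "eor r0,r1": lambda r0, r1: (r0 ^ r1, r1),
    "lsl r0":    lambda r0, r1: ((r0 << 1) & 0xFF, r1),
    "mov r1,r0": lambda r0, r1: (r0, r0),
}

def fingerprint(sequence):
    """r0 after running the sequence on each test vector (assuming, for this
    sketch, that only r0 is live afterwards)."""
    outputs = []
    for r0, r1 in TEST_VECTORS:
        for name in sequence:
            r0, r1 = INSTRUCTIONS[name](r0, r1)
        outputs.append(r0)
    return tuple(outputs)

table = {}
for length in range(1, 3):
    for seq in product(INSTRUCTIONS, repeat=length):
        table.setdefault(fingerprint(seq), seq)     # keep the shortest (found first)

target = ("mov r1,r0", "add r0,r1")                 # hypothetical sequence from compiled code
print(table.get(fingerprint(target)))               # ('lsl r0',) -- a shorter candidate
</pre>
Matches found this way are only candidates; they would still need a verification step, since agreeing on the test vectors does not prove equivalence.<br />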
<br />
== Deliverables ==<br />
<br />
The above notes, together with the notes on the [[Meetings]] page, are the output of WP2, to be formulated into the project report (WP4).</div>

Work Package 1 (http://superoptimization.org/wiki/Work_Package_1) – revised 2014-08-01T19:16:58Z by James (/* Deliverables */)
<hr />
<div><br />
== Description ==<br />
:Identify code that can be superoptimized (for speed, size or energy), focusing on finding awkward constructs.<br />
<br />
:Use rapid prototyping to evaluate key algorithms. Consider novel constructive techniques as well as exploring existing approaches.<br />
<br />
== Notes ==<br />
Difficult constructs<br />
<br />
:Memory<br />
::Possibly not too difficult if only considering a single basic block<br />
::Memory liveness needs to be taken into account (To discuss next week)<br />
:Skip instructions<br />
::Possibly difficult to encode. Consider as separate basic blocks for now?<br />
:Flags<br />
::Naively modelling flags results in a large number of constraints<br />
::However, the constraints should fall naturally out of the bitvector arithmetic in most cases (see the sketch after this list)<br />
::Increase in constraint-generator complexity for reduction in final constraint complexity (which should reduce superoptimization time)<br />
::Few instructions use the flags (except branches), so the effectiveness for superoptimization may be low<br />
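<br />
As an illustration of the flags falling out of the bitvector arithmetic, here is a small sketch using the Z3 Python bindings (an assumed dependency; this is not the project's actual encoding). It models an 8-bit add together with its carry and zero flags:<br />
<pre>
from z3 import BitVec, BitVecVal, ZeroExt, Extract, If, ULT, Solver

rd, rr = BitVec('rd', 8), BitVec('rr', 8)

# 8-bit addition: the carry is simply bit 8 of the 9-bit result, and the zero
# flag is a comparison on the truncated result, so no extra case analysis is needed.
wide   = ZeroExt(1, rd) + ZeroExt(1, rr)                     # 9-bit result of rd + rr
result = Extract(7, 0, wide)                                 # value written back to rd
carry  = Extract(8, 8, wide)                                 # C flag
zero   = If(result == 0, BitVecVal(1, 1), BitVecVal(0, 1))   # Z flag

# Sanity check: can the carry be set while both inputs are (unsigned) below 16?
s = Solver()
s.add(carry == 1, ULT(rd, 16), ULT(rr, 16))
print(s.check())   # unsat: two values below 16 cannot produce an 8-bit carry
</pre>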
<br />
<br />
How can we handle larger basic blocks?<br />
<br />
:Hopefully we will be able to generate instruction sequences up to length 10 (maybe 15).<br />
:The set of instructions used for superoptimization should be restricted (conservatively).<br />
::E.g. if the sequence does not load or store memory, then perhaps we do not need the memory instructions<br />
:Some basic blocks are larger; however, we may be able to split the basic block<br />
:Consider the dataflow DAG and its cyclomatic complexity. Can the DAG be partitioned (possibly by making a cut that crosses the fewest edges – where, how, how many?) See the sketch after this list.<br />
::Potential for superoptimization is often inhibited by loads/stores dispersed through the code.<br />
::The loads/stores form a single chain which cannot be reordered/eliminated, since the volatility of the addresses is not known (see Illustration 1).<br />
::Code to be superoptimized could be chosen by selecting a subgraph with few connections to the load/store chain (highlighted area on the right).<br />
::The complexity of the DAG and size of the basic blocks are higher than expected. The number of instructions per basic block could greatly slow down superoptimization, although partitioning the DAG into smaller areas may help.<br />
:The edges represent the live registers/flags. The search can be restricted based on liveness information<br />
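<br />
A sketch of the region-selection idea using networkx (an assumed dependency; the node names and dependency graph are invented for illustration). Candidate regions are the non-memory parts of the DAG, scored by how many edges tie them to the load/store chain:<br />
<pre>
import networkx as nx

# Hypothetical dataflow DAG for one basic block: nodes are instructions, edges
# are data dependencies.  The memory operations form a chain that must stay in
# order, so we look for a region of non-memory nodes with few edges into it.
dag = nx.DiGraph()
dag.add_edges_from([
    ("ld1", "add1"), ("add1", "sub1"), ("sub1", "mul1"), ("mul1", "st1"),
    ("ld1", "ld2"), ("ld2", "st1"), ("st1", "st2"),        # the load/store chain
    ("ld2", "xor1"), ("xor1", "and1"), ("and1", "st2"),
])
memory_ops = {"ld1", "ld2", "st1", "st2"}

def chain_connections(region):
    """Number of DAG edges between the candidate region and the load/store chain."""
    return sum(1 for u, v in dag.edges()
               if (u in region) != (v in region) and (u in memory_ops or v in memory_ops))

# Candidate regions: connected components of the DAG once the memory ops are removed.
candidates = list(nx.weakly_connected_components(dag.subgraph(set(dag) - memory_ops)))
best = min(candidates, key=lambda region: (chain_connections(region), -len(region)))
print(best, "- connections to the memory chain:", chain_connections(best))
</pre>
Only the selected region would be handed to the superoptimizer; the edges crossing into it become the live-in/live-out values that constrain the search.<br />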
<br />
== Deliverables ==<br />
<br />
The above notes, together with the notes on the [[Meetings]] page, are the output of WP1, to be formulated into the project report (WP4).</div>

Work Package 5 (http://superoptimization.org/wiki/Work_Package_5) – created 2014-08-01T17:13:20Z by James
<hr />
<div>== Description ==<br />
<br />
== Notes ==<br />
<br />
== Deliverables ==</div>

Meetings (http://superoptimization.org/wiki/Meetings) – created 2014-07-08T09:20:07Z by James
<hr />
<div>== Kickoff Meeting JP, SJH, KIE, JB ==<br />
Actions for two weeks' time<br />
<br />
* Where should the optimization pass go?<br />
* What representation should the superoptimizer operate on?<br />
* Does the SMT approach work for AVR?<br />
* Analyse some basic blocks<br />
* Average instruction count<br />
* Metrics for complexity<br />
* Number of memory ops<br />
* Interdependency between memory ops and rest of instructions<br />
* Find a set of basic blocks for further discussion<br />
<br />
== Bristol Meeting #1 JP, SJH, KIE ==<br />
Discussed cost models to bias the search for instructions<br />
* Instruction sequence length. Generating instruction sequences which are the same size or shorter is likely to be a good heuristic. Instruction size needs to be considered, however, since some instructions could be 2 words<br/> <br />
How can we handle larger basic blocks?<br />
* Hopefully we will be able to generate instruction sequences up to length 10 (maybe 15).<br />
* The set of instructions used for superoptimization should be restricted (conservatively).<br />
* E.g. if the sequence does not load or store memory, then perhaps we do not need the memory instructions<br />
* Some basic blocks are larger; however, we may be able to split the basic block<br />
* Consider the dataflow DAG and its cyclomatic complexity. Can the DAG be partitioned (possibly by making a cut that crosses the fewest edges – where, how, how many?)<br />
* The edges represent the live registers/flags. The search can be restricted based on liveness information<br />
<br />
Difficult constructs<br />
<br />
* Memory<br />
* Possibly not too difficult if only considering a single basic block<br />
* Memory liveness needs to be taken into account '''(To discuss next week)'''<br />
* Skip instructions<br />
* Possibly difficult to encode. Consider as separate basic blocks for now?<br />
* Flags<br />
* Naively modelling flags results in a large number of constraints<br />
* However, the constraints should fall naturally out of the bitvector arithmetic in most cases.<br />
* Increase in constraint-generator complexity for reduction in final constraint complexity (which should reduce superoptimization time)<br />
* Few instructions that use the flags (except branches), so maybe low effectiveness in superoptimization<br />
<br />
Optimizing simultaneous sequences.<br />
<br />
* The peephole superoptimizer study could bruteforce multiple sequences simultaneously, by hashing the test vectors<br />
* Unclear how this can be done for SMT-based methods<br />
* Naively attempting to find multiple solutions at the same time will limit the 'branch pruning' of the solver – it will be unable to rule out large areas of the search space due to multiple divergent solutions<br />
* '''Could be the thing that limits the effectiveness of the SMT method'''<br />
* Possible solutions<br />
* Attempt to find multiple solutions to 'similar' sequences. Based on the assumption that the solutions will be similar. How to define a similarity metric?<br />
<br />
== Bristol Meeting #2 JP, KIE ==<br />
Dataflow DAGs<br />
<br />
* Potential for superoptimization is often inhibited by loads/stores dispersed through the code.<br />
* The loads/stores form a single chain which cannot be reordered/eliminated, since the volatility of the addresses is not known (see Illustration 1).<br />
* Code to be superoptimized could be chosen by selecting a subgraph with few connections to the load/store chain (highlighted area on the right).<br />
* The complexity of the DAG and size of the basic blocks are higher than expected. The number of instructions per basic block could greatly slow down superoptimization, although partitioning the DAG into smaller areas may help.<br />
* [[Image:|thumb|''Illustration 1: This basic block has a sequence of memory operations which will be difficult to superoptimize'']]Many basic blocks are either too small, dominated by memory operations, or too closely coupled with the memory operations (see Illustration 2).<br />
<br />
[[Image:|thumb|''Illustration 2: Dataflow DAG. Arrows indicate a data dependency. Blue arrows are register dependencies, red are flags, and green is memory'']]<br />
<br />
<br />
AVR instruction set<br />
<br />
* The simplicity of the AVR instruction set means there are few instructions with 'interesting' side effects.<br />
* It is possible that since the instructions are simple, there is not much scope for superoptimization.<br />
* The simplicity of the instructions further hinders superoptimization since a large number of operations is required to perform one 'high level' operation – this slows down the search for a replacement sequence.<br />
* Since the operations are simple, is the compiler already exploiting them close to optimality?<br />
* '''Survey other instruction sets – is there a more appropriate instruction set?'''<br />
** '''Needs scope for superoptimization'''<br />
** '''Trade offs between number of registers, number of instructions and simplicity of instructions'''<br />
<br />
Higher level superoptimization<br />
<br />
* The SMT solver will find a sequence of instructions for any given relationship between input and output machine states. Can we use information from earlier in the compiler (perhaps before register allocation and spillcode generation?)<br />
* Explore intermediate representations in GCC – however, these may suffer from the lack of instruction complexity, similar to AVR assembly, and a disconnect from the assembly instructions.<br />
* Does unoptimized code have more 'scope' for superoptimization? If the optimization process loses some information about the sequence, or transforms it in a way that makes superoptimization difficult, then operating on unoptimized code may be better.<br />
<br />
Superoptimizing 'components'<br />
<br />
* In much of BEEBS the operations are on 16- or 32-bit integers, resulting in many 8-bit integer ops to emulate the wider bit width (see the sketch below).<br />
* It may be that the superoptimizer can cope with restricted instances like this, which can then be reused, rather than generic instruction sequences.<br />
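<br />
For instance, a single 16-bit addition on an 8-bit target becomes an add followed by an add-with-carry. A minimal Python model of that pattern (illustrative only, not real AVR semantics) is:<br />
<pre>
def add8(a, b, carry_in=0):
    """8-bit add with carry, returning (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def add16_via_8bit(x, y):
    """16-bit addition built from two 8-bit adds, as an 8-bit machine must do it
    (conceptually an 'add' on the low bytes followed by an 'adc' on the high bytes)."""
    lo, carry = add8(x & 0xFF, y & 0xFF)
    hi, _ = add8(x >> 8, y >> 8, carry)
    return (hi << 8) | lo

assert all(add16_via_8bit(x, y) == (x + y) & 0xFFFF
           for x, y in [(0, 0), (255, 1), (0x1234, 0x00FF), (0xFFFF, 0xFFFF)])
</pre>
These short, self-contained patterns are the kind of restricted instance a superoptimizer could handle once and then reuse.<br />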
<br />
== Lymington Meeting #1 JP, JB ==<br />
'''Where should the optimization pass go?'''<br />
<br />
* Intermediate representations suffer from the same problems as the AVR assembly noted in Bristol Meeting #2: low instruction complexity and a disconnect from the final assembly instructions<br />
<br />
'''What representation should the superoptimizer operate on?'''<br />
<br />
* The selection of the region to superoptimize should be done via the dataflow DAG. This allows only the necessary instructions to be superoptimized, and composes nicely if the whole basic block is not to be superoptimized.<br />
* To enable multiple basic blocks to be considered, the representation needs to be simplified. With the DAG representation, tracking register assignments is unnecessary, since the data dependencies are encoded in the edges of the DAG.<br />
<br />
'''Does the SMT approach work for AVR?'''<br />
<br />
* Early experimental work suggests there isn't any particular difficulty in encoding the AVR instruction set; however, the efficiency of the approach suffers from AVR's lack of complexity.<br />
<br />
'''Analyse some basic blocks'''<br />
<br />
* Overall instruction count per basic block is high for AVR<br />
* Instruction complexity is low for AVR – more varied types of instructions for similar operations<br />
<br />
=== Summary + Next ===<br />
It seems that optimizing basic blocks is not productive enough. The compiler does a good job with what is possible over the basic block, minimizing what is left for the superoptimizer to find. Rather than changing the instruction set, is it possible to consider more than just a single basic block?<br />
<br />
<br />
More than basic blocks<br />
<br />
* To superoptimize AVR code we need to consider multiple basic blocks as input – this is where many other superoptimizers found potential. For example, Massalin's superoptimizer considered a target function with branches, even if the superoptimizer itself could not generate branches.<br />
* Loops would be the best target, since these are where the majority of the time is spent. However there is significant complexity in verifying loop equivalence, and likely even more complexity generating loops. Is there a hybrid approach that can be used, between SMT and bruteforce?<br />
* Just optimizing the sequential code inside a loop (without changing the overall loop structure) may be sufficient for performance gains.<br />
* At the very least straight-line code should be considered as input (even if the final output cannot contain branching).<br />
<br />
Generality<br />
<br />
* Optimizing larger amounts of code will make it more difficult to find common sequences between different programs, resulting in the superoptimizer having to be rerun more often.<br />
<br />
== Lymington Meeting #1.5 JP, JB ==<br />
Tasks<br />
<br />
* Talk to Andrew – superoptimization wiki, put meeting notes on the wiki<br />
* Talk to Kerstin about relational verification, verification of programs with loops<br />
* Email David Greaves<br />
* Analyse loops, regions bigger than basic blocks, common structures before and after optimization.<br />
* Look at libgcc + compiler-rt (longer, one-shot superoptimization)<br />
* Comparison with hand-written, compiler generated and superoptimized<br />
<br />
== Bristol Meeting #3 JP, SJH ==<br />
What are we aiming to speed up?<br />
<br />
* Are we targeting the common case? This should be handled well by the compiler already, since these are operations that are commonly performed.<br />
* Can we target more specific things, but which happen less frequently?<br />
<br />
Speculative execution<br />
<br />
* The speed-up in Henry Massalin's signum example came not just from exploiting the hardware flags, but from eliminating the control flow.<br />
* The calculations performed by the branches became implicit, handled by values stored in the flags. Similarly, superoptimizers have been used to eliminate branches by executing multiple possible basic blocks and then conditionally moving the results into the appropriate registers (if conversion; see the sketch below).<br />
* A more generic form of this could be speculative execution, which potentially eliminates difficult-to-handle control flow for the superoptimizer. Merge in a probable basic block, allowing more extensive superoptimization, then undo the result if the branch should not have been taken?<br />
* Memory operations remain difficult to handle.<br />
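<br />
A small illustration of the if-conversion idea (plain Python standing in for conditional-move style code; not tied to any particular instruction set). Both arms are computed unconditionally and the result is selected without a branch:<br />
<pre>
def clamp_branchy(x, limit):
    # Original control flow: a branch the superoptimizer would otherwise have to model.
    if x > limit:
        return limit
    return x

def clamp_ifconverted(x, limit):
    # Both "basic blocks" are executed, then the result is selected arithmetically,
    # the way a conditional-move or flag-based sequence would.
    take_limit = int(x > limit)                    # the branch condition as a 0/1 value
    return take_limit * limit + (1 - take_limit) * x

assert all(clamp_branchy(x, 10) == clamp_ifconverted(x, 10) for x in range(-20, 21))
</pre>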
<br />
Metrics for superoptimization potential<br />
<br />
* How does the instruction set relate to the potential of superoptimization?<br />
* The size of the search space<br />
* A larger search space means it is more difficult to search (see the sketch below)<br />
* More registers → larger search space<br />
* Larger instruction set → larger search space<br />
* The complexity of the operations that operate on the search space<br />
* More complex instructions → smaller instruction set (?)<br />
* More complex instructions → shorter instruction sequence<br />
* Each instruction transforms the search space<br/> <br />
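<br />
A back-of-the-envelope illustration of how the search space grows with instruction-set size and sequence length (the figures ignore operand choices, so real numbers are far larger):<br />
<pre>
# Number of candidate sequences is roughly |instruction set| ** sequence_length.
for isa_size in (16, 64, 256):              # hypothetical numbers of distinct opcodes
    for length in (3, 5, 10):
        print(f"{isa_size:3d} opcodes, length {length:2d}: {isa_size ** length:.2e} sequences")
</pre>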
<br />
<br />
== Bristol Meeting #4 JP, KIE ==<br />
Superoptimizing loops<br />
<br />
* What do we expect we can do by superoptimizing loops? To be discussed on Tuesday.<br />
* Different iteration patterns<br />
* Different numbers of loops<br />
<br />
Verification of two programs with loops<br />
<br />
* Verification is usually only possible if you already know the loop invariant. The loop invariant is very difficult to calculate automatically from existing code. See Alan Bundy's work on rippling.<br />
* “Relational verification using product programs” may provide insight into whether programs with loops can be proved to be equivalent.<br />
* If there is little to be gained by changing the loop structure, only consider the loop body<br />
<br />
Generating programs with loops<br />
<br />
* Harder than verifying whether two loops are the same<br />
* Ideas<br />
* Iteratively unroll loops, constraining each unroll to the same instructions, until a fixed point is reached? Constraining the loop to the same instructions reduces the search space, but the number of constraints increases (see the sketch below).
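<br />
A sketch of the bounded flavour of this idea using the Z3 Python bindings (an assumed dependency): both candidate loop bodies are unrolled k times and the solver is asked for an input on which they diverge. An 'unsat' answer only shows equivalence up to k iterations, so this is a filter rather than a proof of full loop equivalence:<br />
<pre>
from z3 import BitVec, Solver, unsat

def unroll(body, x, k):
    """Apply one loop body (a function on an 8-bit bitvector) k times."""
    for _ in range(k):
        x = body(x)
    return x

body_a = lambda x: (x + x) + 1          # original loop body: x = 2*x + 1
body_b = lambda x: (x << 1) | 1         # candidate replacement body

x0 = BitVec('x0', 8)
k = 4                                   # number of unrollings to check

s = Solver()
s.add(unroll(body_a, x0, k) != unroll(body_b, x0, k))
if s.check() == unsat:
    print(f"bodies agree on every input for {k} unrollings")
else:
    print("counterexample:", s.model())
</pre>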
</div>

File:Signum_sopt.png (http://superoptimization.org/wiki/File:Signum_sopt.png) – uploaded 2014-07-02T11:05:29Z by James
<hr />
<div>The signum function superoptimized</div>

File:Signum_compiler.png (http://superoptimization.org/wiki/File:Signum_compiler.png) – uploaded 2014-07-02T11:04:10Z by James
<hr />
<div>The signum function as compiled naively</div>