Optimization options, particularly those not enabled at -O2
-finline-functions (enabled at -O3)
Integrate all simple functions into their callers. The compiler heuristically decides which functions are simple enough to be worth integrating in this way.
If all calls to a given function are integrated, and the function is declared static, then the function is normally not output as assembler code in its own right.
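As a rough illustration of the case described above, the following sketch uses a small static helper whose only calls sit in one function. The names square and sum_of_squares are hypothetical, and whether GCC actually integrates the helper depends on its heuristics and version.

```c
#include <stddef.h>

/* Small static helper: a plausible candidate for integration into its caller. */
static int square(int x)
{
    return x * x;
}

/* All calls to square() are in this function. If GCC integrates every one of
   them, it normally does not emit a standalone copy of square(). */
int sum_of_squares(const int *v, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += square(v[i]);
    return total;
}
```

Comparing the assembler output of gcc -O2 -S and gcc -O3 -S for such a file is one way to see whether a separate square symbol remains.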
-funswitch-loops (enabled at -O3)
Move branches with loop-invariant conditions out of the loop, with duplicates of the loop on both branches (modified according to the result of the condition).
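The transformation can be pictured by writing it out by hand. In this sketch (function names are hypothetical), the first function tests the loop-invariant flag negate on every iteration; the second shows roughly the shape the compiler may produce, with the test hoisted and the loop duplicated.

```c
#include <stddef.h>

/* Original form: `negate` never changes inside the loop. */
void copy_signed(int *out, const int *in, size_t n, int negate)
{
    for (size_t i = 0; i < n; i++) {
        if (negate)
            out[i] = -in[i];
        else
            out[i] = in[i];
    }
}

/* Hand-unswitched form: the branch is taken once, and each arm has its own
   copy of the loop specialized for that outcome. */
void copy_signed_unswitched(int *out, const int *in, size_t n, int negate)
{
    if (negate) {
        for (size_t i = 0; i < n; i++)
            out[i] = -in[i];
    } else {
        for (size_t i = 0; i < n; i++)
            out[i] = in[i];
    }
}
```

The duplicated loop is why this option tends to increase code size.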
-fpredictive-commoning (enabled at -O3)
Perform predictive commoning optimization, i.e., reusing computations (especially memory loads and stores) performed in previous iterations of loops.
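A hand-written sketch of the idea, with hypothetical function names: in the first loop, the value loaded as b[i + 1] is loaded again as b[i] on the next iteration. The second version carries it over in a local variable, which is roughly the effect the optimization aims for.

```c
#include <stddef.h>

/* Straightforward form: two loads from b per iteration.
   b must have at least n + 1 elements. */
void pair_sums(int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + b[i + 1];
}

/* Hand-commoned form: the load of b[i + 1] is reused as b[i] in the
   following iteration, leaving one load per iteration. */
void pair_sums_commoned(int *a, const int *b, size_t n)
{
    if (n == 0)
        return;
    int prev = b[0];
    for (size_t i = 0; i < n; i++) {
        int next = b[i + 1];
        a[i] = prev + next;
        prev = next;
    }
}
```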
-fgcse-after-reload (enabled at -O3)
When -fgcse-after-reload is enabled, a redundant load elimination pass is performed after reload. The purpose of this pass is to clean up redundant spilling.
-ftree-vectorize (enabled at -O3)
Perform loop vectorization on trees.
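A loop of the following kind is a typical candidate for vectorization. This is only a sketch: the function name saxpy is an illustrative choice, and the restrict qualifiers are an assumption made here to tell the compiler that the two arrays do not overlap. Whether the loop is actually vectorized depends on the target and the GCC version.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]: independent iterations over contiguous arrays,
   so several elements can be processed per SIMD instruction. */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```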
-mtune=cpu-type
Tune to cpu-type everything applicable about the generated code, except for the ABI and the set of available instructions. The choices for cpu-type are:
generic
Produce code optimized for the most common IA32/AMD64/EM64T processors. If you know the CPU on which your code will run, then you should use the corresponding -mtune option instead of -mtune=generic. But, if you do not know exactly what CPU users of your application will have, then you should use this option.
As new processors are deployed in the marketplace, the behavior of this option will change. Therefore, if you upgrade to a newer version of GCC, the code generated by this option will change to reflect the processors that were most common when that version of GCC was released.
native
This selects the CPU to tune for at compilation time by determining the processor type of the compiling machine. Using -mtune=native will produce code optimized for the local machine under the constraints of the selected instruction set. Using -march=native will enable all instruction subsets supported by the local machine (hence the result might not run on different machines).
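One way to observe the difference between tuning and enabling instruction subsets is through the macros GCC predefines on x86 targets, such as __SSE2__, __AVX__ and __AVX2__. The sketch below (the function name is hypothetical) reports the widest of these three that was enabled at compile time. With -mtune=native alone the answer should not change, since tuning does not alter the available instruction set, whereas -march=native may change it.

```c
/* Reports which of a few x86 SIMD instruction sets the compiler was allowed
   to use for this translation unit. Only three macros are checked here; on
   non-x86 targets none of them is defined. */
const char *enabled_simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of SSE2/AVX/AVX2";
#endif
}
```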
-fmodulo-sched
Perform swing modulo scheduling immediately before the first scheduling pass. This pass looks at innermost loops and reorders their instructions by overlapping different iterations.
-fmodulo-sched-allow-regmoves
Perform more aggressive SMS-based modulo scheduling with register moves allowed. By setting this flag, certain anti-dependence edges will be deleted, which will trigger the generation of reg-moves based on the life-range analysis. This option is effective only with -fmodulo-sched enabled.
-funroll-loops
Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop. This option makes code larger, and may or may not make it run faster.
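The effect of unrolling can be approximated by hand. In this sketch the unroll factor of 4 and the function names are arbitrary choices for illustration; the compiler picks its own factor. The unrolled version does less loop-control work per element but is visibly larger.

```c
#include <stddef.h>

/* Rolled form: one addition and one loop test per element. */
long sum(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hand-unrolled by 4: one loop test per four elements, followed by a
   remainder loop for the last n % 4 elements. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}
```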
-fprefetch-loop-arrays
If supported by the target machine, generate instructions to prefetch memory to improve the performance of loops that access large arrays.
This option may generate better or worse code; results are highly dependent on the structure of loops within the source code.
Disabled at level -Os.
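What the option inserts automatically can be imitated by hand with GCC's __builtin_prefetch. The following is a sketch under assumptions: the look-ahead distance of 16 elements and the function name are hypothetical, and a good distance depends on the machine and the loop. On targets without a prefetch instruction the builtin generates no code.

```c
#include <stddef.h>

/* Hypothetical look-ahead distance, in elements. */
#define PREFETCH_DISTANCE 16

long sum_with_prefetch(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Request the element needed PREFETCH_DISTANCE iterations from now.
           Second argument 0 = prefetch for reading; third argument 1 = low
           temporal locality. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&v[i + PREFETCH_DISTANCE], 0, 1);
        total += v[i];
    }
    return total;
}
```

For a simple sequential scan like this one, hardware prefetchers often do the job already, which is one reason the option can make code worse as well as better.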
-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range.
This option causes the preprocessor macro __FAST_MATH__ to be defined.
This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.
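The __FAST_MATH__ macro lets a program detect at compile time that it was built with this option. The sketch below (function names are hypothetical) exposes that as a function and pairs it with a summation loop, the kind of code whose result can differ under -ffast-math because -funsafe-math-optimizations permits the additions to be reordered.

```c
#include <stddef.h>

/* Returns 1 if this translation unit was compiled with -ffast-math. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Floating-point addition is not associative, so the strict left-to-right
   order written here is part of the result under IEEE rules. With
   -ffast-math the compiler may reassociate the additions (for example to
   vectorize the loop), which can change the low-order bits of the sum. */
double sum_doubles(const double *v, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```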
Last updated: 2010-02-17 18:07