JIT optimizer for C/C++

I was reading about the advantages of JIT compilation over precompiled code, and one of those mentioned was that a JIT can adjust branch predictions based on actual runtime data. Now it’s been a long time since I wrote a compiler in college, but it seems to me that something similar can be achieved for precompiled code as well in most cases (where there are no explicit gotos).

Consider the following code:

   test x
   jne L2
L1: ...
   jmp L3
L2: ...
L3:

If we have some runtime instrumentation that counts how often the ‘jne L2’ branch is taken, it could physically swap all the instructions in the L1: block and the L2: block so that the more common path becomes the fall-through. Of course, it would have to know that no thread is within either block during the swap, but those are details…

   test x
   je L1
L2: ...
   jmp L3
L1: ...
L3:

I understand there are also issues when the program code is loaded in read-only memory, etc., but it’s an idea.
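
The counting half of this idea can be sketched in C. This is only a sketch under assumptions: the names branch_profile, profile_branch, should_swap and run_workload are made up for illustration, and the code only decides whether a swap looks worthwhile. It does not rewrite any machine code.

```c
/* Hypothetical per-branch counters; all names here are illustrative only. */
struct branch_profile {
    unsigned long taken;      /* condition was true (the "jne L2" path) */
    unsigned long not_taken;  /* execution fell through to the L1 block */
};

/* Record one execution of the branch and return the condition unchanged. */
int profile_branch(struct branch_profile *p, int cond)
{
    if (cond)
        p->taken++;
    else
        p->not_taken++;
    return cond;
}

/* Decide whether the L2 block should become the fall-through path. */
int should_swap(const struct branch_profile *p)
{
    return p->taken > p->not_taken;
}

/* Example workload: x is nonzero in 9 of every 10 iterations. */
struct branch_profile run_workload(int iterations)
{
    struct branch_profile p = {0, 0};
    for (int i = 0; i < iterations; i++) {
        int x = i % 10;
        if (profile_branch(&p, x != 0)) {
            /* L2 block would go here */
        } else {
            /* L1 block would go here */
        }
    }
    return p;
}
```

With run_workload(100) the condition is true in 90 of the 100 iterations, so should_swap reports that the L2 block is the better fall-through candidate.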

So my question is, is such a JIT optimization feasible for C/C++ or am I missing some fundamental reason why this cannot be done? Are there any JIT optimizers for C/C++ out there?

4 solutions collected from the web for “JIT optimizer for C/C++”

    Most modern CPUs support branch prediction. They have a small cache which allows the CPU to notionally give you the benefits of re-ordering at runtime. This cache is fairly limited in size, but it may mean you gain less from swapping the blocks than you might imagine. Some CPUs can even start executing both branches and discard the work done on the branch not taken.


    EDIT: The biggest advantage in using a JIT compiler comes from code like this.

    if (debug) {
       // do something
    }
    

    JITs are very good at detecting and optimising code which doesn’t do anything. (If you have a micro-benchmark which suggests Java is much faster than C, most likely the JIT has detected that your test isn’t doing anything where the C compiler didn’t.)

    You might ask, why doesn’t C have something like this? Because it has something “better”

    #if DEBUG
        // do something
    #endif
    

    This is optimal provided DEBUG rarely changes and you have very few of these flags so you can compile every useful combination.

    The problem with this approach is scalability. Every flag you add can double the number of pre-compiled binaries you have to produce.

    If you have many such flags and it is impractical to compile every combination, you need to rely on branch prediction to optimise your code dynamically.
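
    The two approaches can be put side by side in a short C sketch. The function names (process_runtime, process_compiled, binaries_needed) and the log counter are assumptions made up for illustration, and DEBUG is assumed to be supplied at build time, for example with -DDEBUG=1.

```c
#include <stdbool.h>

/* DEBUG is assumed to come from the build, e.g. -DDEBUG=1; default is off. */
#ifndef DEBUG
#define DEBUG 0
#endif

/* Runtime flag: one binary, but the check is executed on every call. */
int process_runtime(int value, bool debug, int *log_count)
{
    if (debug) {
        (*log_count)++;   /* stands in for "do something" */
    }
    return value * 2;
}

/* Compile-time flag: the debug code is absent unless DEBUG is nonzero. */
int process_compiled(int value, int *log_count)
{
#if DEBUG
    (*log_count)++;       /* stands in for "do something" */
#else
    (void)log_count;      /* silence the unused-parameter warning */
#endif
    return value * 2;
}

/* Upper bound on the binaries needed to cover every combination of
 * n_flags independent on/off flags (valid while n_flags is smaller
 * than the bit width of unsigned long). */
unsigned long binaries_needed(unsigned n_flags)
{
    return 1UL << n_flags;
}
```

    Three independent flags already give binaries_needed(3) == 8 builds, which is why the compile-time route stops being practical as flags accumulate.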

    There is no JIT compiler for C++ that I am aware of; however, GCC does support feedback-directed optimization (FDO), which can use runtime profiling to optimize branch layout and the like.

    See the GCC options starting with “-fprofile” (HINT: “-fprofile-use” uses the generated runtime profile to perform the optimization, while “-fprofile-generate” is used to generate the runtime profile).
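
    As a rough sketch of that workflow, the C file below has a heavily biased branch of the kind a recorded profile can inform. The file name demo.c, the function names and the workload are assumptions for illustration, and a main() that calls the two functions is assumed but not shown.

```c
/*
 * Assumed build steps with GCC (file and binary names are placeholders):
 *
 *   gcc -O2 -fprofile-generate demo.c -o demo   instrumented build
 *   ./demo                                      run a representative workload;
 *                                               profile data (.gcda) is written
 *   gcc -O2 -fprofile-use demo.c -o demo        rebuild using that profile
 */
#include <stddef.h>

/* Counts the values above a threshold. For the workload below the
 * branch is rarely taken, which is what the profile would record. */
size_t count_above(const int *data, size_t n, int threshold)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] > threshold)
            count++;
    }
    return count;
}

/* Fills the buffer so that 1 entry in every 16 is 1 and the rest are 0. */
void fill_workload(int *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        data[i] = (i % 16 == 0) ? 1 : 0;
}
```

    The profile is only as good as the training run, so the workload used for the instrumented build should resemble what the program sees in production.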

    You are referring to tracing or reoptimizing JITs, not just any old JIT. Something like this hasn’t been made for C or C++ (at least not publicly). However, you might want to check whether LLVM is headed that way with a branch (considering it’s both a compiler and a JIT) using the Clang or GCC front ends, as I’ve seen some topics suggesting it might be implemented.

    The HP Dynamo binary recompiler demonstrated that it is possible to achieve speed-ups of up to 20% on optimized code produced by a C++ compiler. Dynamo isn’t exactly a JIT compiler, since it starts with arbitrary machine code instead of some higher-level representation such as JVM bytecode or .NET CIL, but in principle a JIT for C++ could only be more efficient than Dynamo. See: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.12.7138&rank=1

    Dynamo was created for the HP PA-RISC architecture and was never offered as a commercial product, so it isn’t of much use in the current world dominated by x86 variants. I wonder whether VMware, Connectix or Parallels have ever played around with adding optimization passes to their recompilers, or whether they have already dropped binary translation in favour of the virtualization features in the latest x86 CPUs.
