LLVM, a toolbox for code portability

Figure: LLVM logo emerging from a computer

LLVM achieves this through the IR (Intermediate Representation) code it produces. This object language is quite close to assembly language, and LLVM optimizes it in ways that would be difficult to write and maintain by hand in a high-level language, which provides the necessary speed. The IR is then executed by the Just-In-Time compiler, which translates it into machine code almost immediately.

LLVM (Low Level Virtual Machine) is more than a virtual machine: it is also a compiler infrastructure, a set of tools written in C++ that run on Unix and Linux systems as well as on Windows. These tools work with LLVM bitcode (not bytecode), which packages IR code into a distributable module.
The tools include the Clang compiler, which supports four languages and produces bitcode (containing IR) or a binary executable, along with a code optimizer, the LLDB debugger, a linker, JIT virtual machines and an interpreter.
LLVM was created at the University of Illinois and has received contributions from Apple, which notably wrote Clang.

LLVM gives a new lease of life to older programs written in C, C++ or other languages (all statically typed languages can be compiled into IR). It lets you run them on new systems. They can be converted to JavaScript (with Emscripten) and run in browsers. Or, thanks to Portable Native Client, once converted to IR, they can also run on any system...
Thus, LLVM is an alternative to Java, with the Web as an additional possible target: its IR code can be compiled into asm.js or WebAssembly.

However, there are also disadvantages. The LLVM runtime does not include a garbage collector; one must be provided by the runtime of the compiled language. In addition, the intermediate code is not portable: code specific to each processor architecture has to be generated. That is why WebAssembly was invented. LLVM is also called a moving target because the code it produces changes over time.

In 2018, contributors unexpectedly left the project, saying that it had become too political and put a social agenda above competence, which raised doubts about the future quality of the code.

LLVM operation diagram

Figure: diagram of how LLVM works

LLVM can generate bitcode from many statically typed languages: C and Objective-C with Clang; Java, Ada and Fortran with GCC; and other languages with other compilers, provided they support bitcode output.

This bitcode is optimized and can be used directly by the LLVM virtual machine. With linking and static compilation, it can become an executable binary. You can also use Emscripten to convert it to JavaScript or asm.js, which lets you run the program in a browser.

LLVM includes the JIT virtual machine used by Mono, Julia and many other projects.

Difference between Java bytecode and LLVM IR

The difference is best seen by comparing the code each one produces, for example:

Take a simple function, written in C or Java, to compile:

int arith(int x, int y, int z) {    
    return(x * y + z);  
}

LLVM produces this IR code:

define i32 @arith(i32 %x, i32 %y, i32 %z) {  
   entry:    
   %tmp = mul i32 %x, %y    
   %tmp2 = add i32 %tmp, %z    
   ret i32 %tmp2  
}

While Java produces this bytecode (it can be displayed with the JDK's javap tool):

public class demo.Demo {
  public static int arith(int, int, int);
    Code:
       0: iload_0
       1: iload_1
       2: imul
       3: iload_2
       4: iadd
       5: ireturn
}

It can be seen that bytecode, in addition to being closer to machine language, uses a stack to store data and perform operations on it, while IR uses registers and memory areas.
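The contrast can be sketched in C. This is an illustration only, not how the JVM or LLVM are implemented: the tiny stack interpreter and its opcode names (OP_LOAD, OP_MUL, OP_ADD, OP_RET) are hypothetical and merely mimic the shape of the bytecode listing above, while arith_registers mirrors the IR, where each intermediate result gets its own name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes that mimic the bytecode listing above. */
enum op { OP_LOAD, OP_MUL, OP_ADD, OP_RET };

struct insn {
    enum op op;
    int arg;            /* local variable index, used by OP_LOAD only */
};

#define STACK_MAX 8

/* Stack style: operands are pushed, operators pop their inputs. */
static int run_stack(const struct insn *code, const int *locals)
{
    int stack[STACK_MAX];
    size_t sp = 0;      /* number of values currently on the stack */

    for (;;) {
        struct insn in = *code++;
        switch (in.op) {
        case OP_LOAD:
            assert(sp < STACK_MAX);
            stack[sp++] = locals[in.arg];
            break;
        case OP_MUL:
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            sp--;
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            sp--;
            break;
        case OP_RET:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
}

/* Same sequence as the bytecode: load x, load y, mul, load z, add, return. */
static const struct insn arith_code[] = {
    { OP_LOAD, 0 }, { OP_LOAD, 1 }, { OP_MUL, 0 },
    { OP_LOAD, 2 }, { OP_ADD, 0 }, { OP_RET, 0 },
};

int arith_stack(int x, int y, int z)
{
    int locals[3] = { x, y, z };
    return run_stack(arith_code, locals);
}

/* Register style: each result has a name and is assigned once, as in the IR. */
int arith_registers(int x, int y, int z)
{
    const int tmp = x * y;      /* %tmp  = mul i32 %x, %y   */
    const int tmp2 = tmp + z;   /* %tmp2 = add i32 %tmp, %z */
    return tmp2;                /* ret i32 %tmp2            */
}
```

Both functions compute the same result; for example, arith_stack(2, 3, 4) and arith_registers(2, 3, 4) each return 10. The stack version never names its intermediate values, whereas the register version names every one of them.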

Difference between bitcode and bytecode

What is the difference between bitcode and IR? Why are we talking about bitcode and not bytecode, as happens with Java?

In both cases, the code is executed by a virtual machine, JIT or not. The name bytecode comes from the fact that the instruction set was originally encoded in bytes. This is no longer necessarily the case, but the stream remains a byte stream, whereas the name bitcode indicates that the stream is expressed in bits, that is, not in bytes but in units of variable size.
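To make "units of variable size" concrete, here is a minimal sketch of a bit reader in C. It assumes the conventions described in the LLVM bitstream documentation: fields are read starting from the least significant bit, and a variable-width (VBR) field of width n is a run of n-bit chunks whose top bit means that another chunk follows. The names bit_reader, read_fixed and read_vbr are hypothetical and are not LLVM's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader over an in-memory buffer; not LLVM's API. */
struct bit_reader {
    const uint8_t *data;
    size_t size_bits;   /* total number of bits available */
    size_t pos_bits;    /* index of the next bit to read  */
};

/* Read a fixed-width field of 1..32 bits, least significant bit first. */
uint32_t read_fixed(struct bit_reader *r, unsigned width)
{
    uint32_t value = 0;

    assert(width >= 1 && width <= 32);
    assert(r->pos_bits + width <= r->size_bits);

    for (unsigned i = 0; i < width; i++) {
        size_t bit = r->pos_bits++;
        uint32_t b = (r->data[bit / 8] >> (bit % 8)) & 1u;
        value |= b << i;
    }
    return value;
}

/* Read a variable-width field made of chunks of `width` bits (2..32):
 * the top bit of each chunk flags a continuation, the rest is payload,
 * least significant chunk first. */
uint64_t read_vbr(struct bit_reader *r, unsigned width)
{
    uint64_t value = 0;
    unsigned shift = 0;

    assert(width >= 2 && width <= 32);
    const uint32_t more = UINT32_C(1) << (width - 1);

    for (;;) {
        uint32_t chunk = read_fixed(r, width);
        assert(shift < 64);
        value |= (uint64_t)(chunk & (more - 1)) << shift;
        if (!(chunk & more))
            return value;
        shift += width - 1;
    }
}
```

For instance, the single byte 0x3B read as a 4-bit VBR field yields 27: the first chunk, 1011, carries the payload 3 plus a continuation flag, and the second chunk, 0011, adds 3 << 3 = 24.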

IR (Intermediate Representation) is a language designed for a virtual machine or compiler. It is encapsulated in a file which, in the case of LLVM, is called bitcode. Bitcode is encoded as a bitstream made up of blocks and records.
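As a small illustration of this file encapsulation, a raw LLVM bitcode file begins with the four bytes 'B', 'C', 0xC0, 0xDE. The sketch below checks a buffer for that signature. The function name looks_like_bitcode is hypothetical, and the check deliberately ignores the optional wrapper header that some toolchains place in front of the bitcode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if the buffer starts with the raw bitcode
 * signature 'B' 'C' 0xC0 0xDE. It does not validate anything beyond that. */
bool looks_like_bitcode(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 'B', 'C', 0xC0, 0xDE };

    assert(buf != NULL);
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

A textual IR file, such as the arith listing shown earlier, starts with ordinary characters and would be rejected by this check.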

Toolbox

On Windows, you can use Visual Studio or Eclipse CDT with the LLVM plugin. Qt Creator can also use Clang.