October 1, 2012

Getting started with OCaml bindings for LLVM

I am fascinated by static analysis. Over the weekend, I learnt the basics of OCaml and its LLVM bindings for that purpose. This post shows how to install the relevant tools on Ubuntu 12.04 (Precise Pangolin), and gets you started analysing some simple "hello world" code.

The first thing you need to do is make sure you have OCaml and LLVM installed.

sudo apt-get install ocaml llvm-3.0 llvm-3.0-dev clang

Notice that you've installed version 3.0 of LLVM. This is because, on the current version of Ubuntu, clang is also version 3.0, so it is possible to compile with clang (to LLVM bitcode or assembly) and read the result with the corresponding OCaml bindings.

Next, you'll need a file to analyse. For any interesting project you're going to be using existing code, but to demonstrate the tools working, I created a simple "Hello World" file, hello.c:

#include <stdio.h>

int
main(int argc, char *argv[])
{
    printf("Hello world!\n");
    return 0;
}

I then compiled it with clang to a bitcode file:

clang -emit-llvm -c hello.c -o hello.bc

We could instead have compiled to human-readable LLVM assembly like this (-S already implies stopping before linking, so -c is not needed):

clang -emit-llvm -S hello.c -o hello.ll

Now we have an LLVM bitcode file, which we would like to begin to analyse with the OCaml bindings. In a new OCaml file, bc.ml, we first need to access the LLVM libraries:

(* LLVM libraries *)
open Llvm;;
open Llvm_bitreader;;

Then it is simply a matter of reading in the bitcode file we just created. For that we first need a memory buffer (of the file itself), and a context.

(* Load the bitcode file *)
let mem = MemoryBuffer.of_file "./hello.bc";;
let context = create_context ();;
let modl = parse_bitcode context mem;;
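The three lines above assume that hello.bc exists and is valid bitcode. As far as I can tell from the bindings, a missing file raises Llvm.IoError and a malformed one raises Llvm_bitreader.Error, so a more defensive loader might look like the sketch below. The helper name load_module is my own invention for illustration, not part of the bindings.

```ocaml
(* Hypothetical helper: load a bitcode file, returning None on failure.
   Assumes of_file raises Llvm.IoError and parse_bitcode raises
   Llvm_bitreader.Error, each carrying an error message. *)
let load_module context path =
    try
        let mem = Llvm.MemoryBuffer.of_file path in
        Some (Llvm_bitreader.parse_bitcode context mem)
    with
    | Llvm.IoError msg ->
        prerr_endline ("Could not read " ^ path ^ ": " ^ msg);
        None
    | Llvm_bitreader.Error msg ->
        prerr_endline ("Could not parse " ^ path ^ ": " ^ msg);
        None;;
```

The rest of this post sticks with the simpler three-line version above.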

This gives us a module, modl, which contains the information we want. Using it we can find out all sorts of information about the bitcode file, from the global variables used to the CFG. For example, we could print out all the global variables in the file like this:

(* Show all global variables *)
let show_global var =
    print_string "\n";
    print_string "Found global variable:\t";
    print_string (value_name var);
    print_string "\n";;

iter_globals show_global modl;;
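The same pattern extends to functions, which is a first step towards the CFG. The sketch below uses iter_functions, is_declaration and basic_blocks from the Llvm module to report, for each function, whether it is only declared (like an external library routine) or defined, and how many basic blocks a definition has. The names show_function and show_functions are mine, chosen to mirror show_global.

```ocaml
open Llvm;;

(* Describe one function: declarations have no body, definitions do *)
let show_function f =
    if is_declaration f then
        print_endline ("Declared only:\t" ^ value_name f)
    else
        Printf.printf "Defined:\t%s (%d basic blocks)\n"
            (value_name f) (Array.length (basic_blocks f));;

(* Hypothetical helper: report every function in a module *)
let show_functions m = iter_functions show_function m;;
```

Adding show_functions modl;; to the end of bc.ml should then list the functions in hello.bc alongside the globals.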

Compiling our OCaml file takes slightly more care than usual, because we need to tell the compiler where the LLVM bindings are installed (via the -I flag). To compile:

ocamlc -I +llvm-3.0 -c bc.ml

To link, we also need to name the libraries themselves:

ocamlc -cc g++ -I +llvm-3.0 llvm.cma llvm_bitreader.cma bc.cmo -o bc

This produces an executable, bc, which prints out all the globals in "hello.bc".

I've only had a chance to play for a single weekend, so I apologise for any newbie mistakes or things which could be done better. Feel free to comment to correct me! OCaml seems like a powerful, yet understandable, functional language and, in combination with LLVM, there are more interesting projects to do here than I have free time.