The Just In Time (JIT) compiler

The JIT compiler translates statement blocks, which have been opened using the compile keyword, to an internal bytecode (p-code) representation. This p-code is optimized by some post processing routines and then finally translated to native CPU code thus speeding up execution time by up to 50 times.

By using a CPUTable to emit the actual machine code, most of the platform-dependent code is removed from the JIT compiler at source level. The CPUTable contains the micro-programs for all p-code instructions. In order to add support for other CPU architectures, this table can be exchanged for a more suitable set of micro programs.

P-Code virtual machines have been around for some decades (read a paper saying 1950ies). Nevertheless I first became aware of VMs when I heard of Sun Microsystems Java language. Actually, I have never been much into Java programming (had to learn it for school once) and I did not look at the Java p-code instruction set or Java VM when I started the TKScript JIT (to be true: When I started optimizing the JIT later on, I used a Java decompiler and actually *had* a look at the Java VM (a bit). Guess the JIT design is quite obvious anyway since the parser tree is stack based, too..). The main sources of inspiration for the JIT were

classic arcade/home computer emulators, most noticably the Un*x Amiga Emulator (UAE) (features a kewl 68K JIT).
some years ago a friend of mine told me about his hardware Forth stack cpu university project, which seemed very suitable for ..stack based languages :) Hi MiG!

Some funny thing I just noticed (as of 10-Apr-2004) is that the Forth language had "Integrated access to assembly language.". Originally, the TkScript JIT compiler also included an X86/X87 inline assembler which was removed in 2003 to ease portability.

Probably the most important reference for bytecode virtual machines is UCSD Pascal, read more about it here.

Please notice that TkScript is basically not a bytecode-interpreted language. The bytecode used here is only an intermediate step for the native code generator. It is generated only for compile{/*...*/} statement sequences.

The native-code generator uses a peephole optimizer, which moves an optimization "window" over the generated code and collapses n instructions to a single one. Take a look at the difference between these two examples: sieve_dasm_opt and sieve_dasm_noopt.

Usage

The programmer must tag the statement blocks which are to be compiled to native code by opening them using the keyword compile.

Example

Example:

        int flags[8191]; flags.numElements=8191;

        compile
        {
                int size= 8190;
                int i, prime, k, count, iter;
                trace "Eratosthenes Sieve prime number calculation";
                trace "10 iterations<br>";

                for (iter = 1; iter <= 10; iter++) 
                {
                        count = 0;

                        for (i = 0; i <= size; i++) flags[i] = true;

                        for (i = 0; i <= size; i++) 
                        {
                                if (flags[i]) 
                                {
                                        prime = i + i + 3;
                                        k = i + prime;
                                        while (k <= size) 
                                        {
                                                flags[k] = false;
                                                k += prime;
                                        }
                                        count ++;
                                }
                        }
                }
                trace count + " primes";
        }

Example disassembly

"sieve.tks" p-code disassembly:

(#0000)00000000: movvc size 8190 (0.000000f);
(#0001)00000003: pushc 9659160 (0.000000f);
(#0002)00000005: loadc 4370488 (0.000000f);
(#0003)00000007: apicall;
(#0004)00000008: incstp;
(#0005)00000009: pushc 9659320 (0.000000f);
(#0006)0000000b: loadc 4370488 (0.000000f);
(#0007)0000000d: apicall;
(#0008)0000000e: incstp;
(#0009)0000000f: movvc iter 1 (0.000000f);
(#0010)00000012: jivic iter > 10 005c;
(#0011)00000016: movvc count 0 (0.000000f);
(#0012)00000019: movvc i 0 (0.000000f);
(#0013)0000001c: jiviv i > size 002b;
(#0014)00000020: pushc 1 (0.000000f);
(#0015)00000022: pushv i;
(#0016)00000024: iasel 0;
(#0017)00000026: siapopl;
(#0018)00000027: inciv i;
(#0019)00000029: bra 001c;
(#0020)0000002b: movvc i 0 (0.000000f);
(#0021)0000002e: jiviv i > size 0058;
(#0022)00000032: pushv i;
(#0023)00000034: siapushl;
(#0024)00000035: sitestz 0054;
(#0025)00000037: pushivaddiv i i;
(#0026)0000003a: pushc 3 (0.000000f);
(#0027)0000003c: siadd;
(#0028)0000003d: popv prime;
(#0029)0000003f: pushivaddiv i prime;
(#0030)00000042: popv k;
(#0031)00000044: jiviv k > size 0052;
(#0032)00000048: pushc 0 (0.000000f);
(#0033)0000004a: pushv k;
(#0034)0000004c: siapopl;
(#0035)0000004d: ivaddiv k prime;
(#0036)00000050: bra 0044;
(#0037)00000052: inciv count;
(#0038)00000054: inciv i;
(#0039)00000056: bra 002e;
(#0040)00000058: inciv iter;
(#0041)0000005a: bra 0012;
(#0042)0000005c: pushc 9594240 (0.000000f);
(#0043)0000005e: loadc 4370488 (0.000000f);
(#0044)00000060: apicall;
(#0045)00000061: incstp;
(#0046)00000062: halt;

Example disassembly (no optimization)

compiler output without optimization ("sieve.tks" disassembly page):

(#0000)00000000: pushc 8190 (0.000000f);
(#0001)00000002: popv size;
(#0002)00000004: pushc 9663256 (0.000000f);
(#0003)00000006: loadc 4370488 (0.000000f);
(#0004)00000008: apicall;
(#0005)00000009: incstp;
(#0006)0000000a: pushc 9663416 (0.000000f);
(#0007)0000000c: loadc 4370488 (0.000000f);
(#0008)0000000e: apicall;
(#0009)0000000f: incstp;
(#0010)00000010: pushc 1 (0.000000f);
(#0011)00000012: popv iter;
(#0012)00000014: pushv iter;
(#0013)00000016: pushc 10 (0.000000f);
(#0014)00000018: sicmpb <=;
(#0015)00000019: sitestz 0075;
(#0016)0000001b: pushc 0 (0.000000f);
(#0017)0000001d: popv count;
(#0018)0000001f: pushc 0 (0.000000f);
(#0019)00000021: popv i;
(#0020)00000023: pushv i;
(#0021)00000025: pushv size;
(#0022)00000027: sicmpb <=;
(#0023)00000028: sitestz 0035;
(#0024)0000002a: pushc 1 (0.000000f);
(#0025)0000002c: pushv i;
(#0026)0000002e: iasel 0;
(#0027)00000030: siapopl;
(#0028)00000031: inciv i;
(#0029)00000033: bra 0023;
(#0030)00000035: pushc 0 (0.000000f);
(#0031)00000037: popv i;
(#0032)00000039: pushv i;
(#0033)0000003b: pushv size;
(#0034)0000003d: sicmpb <=;
(#0035)0000003e: sitestz 0071;
(#0036)00000040: pushv i;
(#0037)00000042: siapushl;
(#0038)00000043: sitestz 006d;
(#0039)00000045: pushv i;
(#0040)00000047: pushv i;
(#0041)00000049: siadd;
(#0042)0000004a: pushc 3 (0.000000f);
(#0043)0000004c: siadd;
(#0044)0000004d: popv prime;
(#0045)0000004f: pushv i;
(#0046)00000051: pushv prime;
(#0047)00000053: siadd;
(#0048)00000054: popv k;
(#0049)00000056: pushv k;
(#0050)00000058: pushv size;
(#0051)0000005a: sicmpb <=;
(#0052)0000005b: sitestz 006b;
(#0053)0000005d: pushc 0 (0.000000f);
(#0054)0000005f: pushv k;
(#0055)00000061: siapopl;
(#0056)00000062: pushv k;
(#0057)00000064: pushv prime;
(#0058)00000066: siadd;
(#0059)00000067: popv k;
(#0060)00000069: bra 0056;
(#0061)0000006b: inciv count;
(#0062)0000006d: inciv i;
(#0063)0000006f: bra 0039;
(#0064)00000071: inciv iter;
(#0065)00000073: bra 0014;
(#0066)00000075: pushc 9598296 (0.000000f);
(#0067)00000077: loadc 4370488 (0.000000f);
(#0068)00000079: apicall;
(#0069)0000007a: incstp;
(#0070)0000007b: halt;

Limitations and hints

the return statement may not be used
not all control structures can be compiled to native code. A hybrid execution mixes C function calls with native code (apicall p-code).
There following statements/expressions can be compiled to native machine code:+
- ARGBExpr
- BuiltinFun (sin, cos, tan, sqrt, rad, deg, abs, frac, tcint, tcfloat, round, rnd)
- Call (C++ method call statement)
- ClassMemberExpr
- ClassMemberOpAssign
- ConstVal
- DoubleArgExpr
- DoWhile
- ECall (C++ method call expression)
- ECallY (new style C++ method call expression)
- ExtFunctionExpr (C function call)
- FunctionECallY (new style C call expression)
- For
- IfElse
- IndexedVarExpr (int and float only)
- IndexedVarAssign (int and float only)
- LoopStatement
- OpObjectMemberAssign
- OpObjectMemberAssignY
- OpVarAssign
- PointerAssign
- PostDecVarExpr
- PostDecVarStat
- PostIncVarExpr
- PostIncVarStat
- PreDecVarExpr
- PreDecVarStat
- PreIncVarExpr
- PreIncVarStat
- RGBExpr
- SingleArgExpr
- UnresolvedModuleMemberExpr
- VarAssign
- VarExpr
- While
the command line option "--forceint" is used to turn off the JIT compiler. Debug tools like e.g. pointer validation are only available when using the interpreted mode.
The JIT compiler can access script-class members but not methods (resp. a hybrid node is inserted instead).
Array boundary checks can be turned off by using the command line switch "--noboundarycheck". This can speed up critical loops by up to 30+%.
Array variables may not be (re-)allocated within a compiled block (JIT program) if they are used in the same block
The current array pointer is stored in an internal structure each time a JIT program is called, so that changes to the array pointers are not recognized until the next program call
the JIT compiler is currently only available for Intel/AMD 80486+ and Motorola 68040+ compatible CPUs
the JIT cpu-table is just a simple binary assembly table, porting the JIT is easy (the 68040 port took ~4 hours and I had to re-learn Motorola assembler after having not programmed it for ~7 years;)
The byte/word array instructions were removed because the philosophy is to leave all low-level number crunching to C++ extensions (which can also be very portable if their source is publically available)
The -da command line switch turns on the disassembler

P-Code instruction set

The following intermediate byte code instruction set is used by the JIT compiler to translate source code to native CPU code. Arguments are usually passed on the hardware stack. Script classes require a static array to keep track of nested class calls. The list may be a little outdated but basically nothing has gravely changed.

    apicall <function_adr> - call API function (first argument last)
    bra <label>  - branch to user defined label  
    fvaddc <var> <const>
    fvaddfv <var1> <var2>
    fvdivc <var> <const>
    fvdivfv <var1> <var2>
    fvmulc <var> <const>
    fvmulfv <var1> <var2>
    fvsubc <var> <const>
    fvsubfv <var1> <var2>
    halt - terminate the VM. 
    iasel <var> load int array variable var into array base register  
    ivaddc <var> <const>
    ivaddiv <var1> <var2>
    ivandc <var> <const>
    ivandiv <var1> <var2>
    ivaslc <var> <const>
    ivasliv <var1> <var2>
    ivasrc <var> <const>
    ivasriv <var1> <var2>
    ivdivc <var> <const>
    ivdiviv <var1> <var2>
    iveorc <var> <const>
    iveoriv <var1> <var2>
    ivmodc <var> <const>
    ivmodiv <var1> <var2>
    ivmulc <var> <const>
    ivmuliv <var1> <var2>
    ivorc <var> <const>
    ivoriv <var1> <var2>
    ivsubc <var> <const>
    ivsubiv <var1> <var2>
    jeqivic <var1> <iconst> <label>
    jeqiviv <var1> <var2> <label>
    jgeivic <var1> <iconst> <label>
    jgeiviv <var1> <var2> <label>
    jgtivic <var1> <iconst> <label>
    jgtiviv <var1> <var2> <label>
    jltivic <var1> <iconst> <label>
    jltiviv <var1> <var2> <label>
    jleivic <var1> <iconst> <label>
    jleiviv <var1> <var2> <label>
    jneivic <var1> <iconst> <label>
    jneiviv <var1> <var2> <label>
    fasel <var> load float array variable var into array base register  
    poplv <lvar> - pop local variable lvar from stack 
    popv <var> -pop global/function variable var from stack 
    pushc <const> - push constant value const onto stack 
    pushfvaddc <var> <const>
    pushfvaddfv <var1> <var2>
    pushfvdivc <var> <const>
    pushfvdivfv <var1> <var2>
    pushfvmulc <var> <const>
    pushfvmulfv <var1> <var2>
    pushfvsubc <var> <const>
    pushfvsubfv <var1> <var2>
    pushivandc <var1> <const>
    pushivandiv <var1> <var_2>
    pushivorc <var1> <const>
    pushivoriv <var1> <var_2>
    pushiveorc <var1> <const>
    pushiveoriv <var1> <var_2>
    pushivmodc <var1> <const>
    pushivmodiv <var1> <var_2>
    pushivaddc <var1> <const>
    pushivaddiv <var1> <var_2>
    pushivsubc <var1> <const>
    pushivsubiv <var1> <var_2>
    pushivmulc <var1> <const>
    pushivmuliv <var1> <var_2>
    pushivdivc <var1> <const>
    pushivdiviv <var1> <var_2>
    pushivaslc <var1> <const>
    pushivasliv <var1> <var_2>
    pushivasrc <var1> <const>
    pushivasriv <var1> <var_2>
    pushs - push stackpointer onto stack 
    pushv <var> - push global/function variable var onto stack
    pushdeciv <var> - decrement and push global/function int variable var onto stack 
    pushinciv <var> - increment and push global/function int variable var onto stack 
    pushivdec <var> - push global/function int variable var onto stack and decrement it 
    pushivinc <var> - push global/function int variable var onto stack and  increment it 
    movlv <lvar> <lvar2> - move content of lvar2 to lvar 
    movlvc <lvar> <const> - move constant value const to lvar 
    movv <var> <var2> - move content of var2 to var  
    movvc <var> <const> - move constant value const to var 
    sfatan2 - calc arc tan of st[1]/st[0] (both float)  
    sfabs - calc absolute value of stack float val  
    sfadd - add stack float values  
    sfcmp <relop> <label> - compare float values from stack and branch to label if comparison evaluates to 1 
    sfcos - calc cos of stack float val (rad)  
    sfdiv sp[0]=sp[1]/sp[0]  - 
    sffrac - calc remainder of stack float val  
    sfneg - change sign of stack float val 
    sfmul - multiplicate stack float values  
    sfpow - st[0]= st[0] raised to the power of st[1]
    sfround - round stack float val  
    sfsin - calc sin of stack float val (rad) 
    sfsub sp[0]=sp[1]-sp[0]  - 
    sfsqrt - calc square root of stack float val  
    sftan - calc tan of stack float val (rad)  
    sfquad - multiply stack float val with itself 
    sfcmpb <relop> - compare stack float values and store result (1,0)  
    siabs - calc absolute value of stack int val  
    siadd - add stack int values  
    siand - bitwise and stack int values  
    siasl sp[0]=sp[1]<<sp[0]  - 
    siasr sp[0]=sp[1]>>sp[0]  - 
    sicmp <relop> <label> - compare int values from stack and branch to label if comparison evaluates to 1  
    sicmpb  compare stack int values and store result (1,0)  
    sidiv sp[0]=sp[1]/sp[0]. - 
    sieor - bitwise eor/xor stack int values  
    siinv - bitwise not int val from stack 
    siloop <label> - decrement int value from stack and branch to label if value>0
    simod sp[0]=sp[1]%sp[0] - 
    simul - multiplicate stack int values  
    sineg - change sign of stack int val  
    sinot - C-like logical not operator. pops int val from stack and pushes either 0 or 1 
    sior - bitwise or stack int values  
    siapopl - read array element
    siapushl - store array element
    sipackargb - pop 4 (byte) ints from stack and push packed 32bit int 
    sipop - pop int/float val from stack and discard it. 
    sipush[0..3] - push constant int val [0..3] onto stack 
    sirnd - push new random int val onto stack  
    sisub sp[0]=sp[1]-sp[0] -  
    siswap - swaps swap upper stack values  
    siswapw - swap upper and lower word of int stack value 
    sitestbz - check if stack int val >0 and set it to either 0 or 1 
    sitestbz2 - check if stack+1 int val >0 and set it to either 0 or 1 
    sitestz <label> - pop int from stack, compare with 0 and branch to label if 1 
    sitestzp <label> - compare int value from stack and branch to label if comparison evaluates to 1 
    siquad - multiply stack int val with itself 
    siunpackargb - pop packed 32bit int from stack and push 4 (byte) ints 
    sreti - grab int/object result value from last C/C++ call (usually in eax)
    sretf - grab float result value from last C/C++ call (usually in st0)
    stcif - typecast stack int val to float  
    stcif2 - typecast stack+1 int val to float  
    stcfi - typecast stack float val to int  
    stcfi2 - typecast stack+1 float val to int

Magic symbols in the CPUtable

Variables, constants and jump/function call addresses in the native code are later replaced by their actual values with the help of some magic (32Bit) place holder symbols that are stored in the CPUTable:

"magic" symbol instruction

0xC0DE1234 halt

0xC0DEC0DE bra, sfcmp, sicmp, sitestz, sitestzp, siloop

0xC0FFC0FE movvc

0xC0FFC4C4 pushv, pushivinc, pushinciv, pushivdec, pushdeciv, popv, movv, movvc, inciv,

0xC0FFC4C4 deciv, pushv

0xC0FFC0FF pushc, loadc

0xC0FFC4C5 movv

0xC0FFAAA1 iasel, iaselbc

0xC0FFABA1 iaselbc

0xC0FFAAA2 fasel

0xC0FFABA2 faselbc

0xC0FFC7C7 incliv, decliv, siapushl, siapushlbc, siapopl, siapoplbc

"magic" symbol	instruction
0xC0DE1234	halt
0xC0DEC0DE	bra, sfcmp, sicmp, sitestz, sitestzp, siloop
0xC0FFC0FE	movvc
0xC0FFC4C4	pushv, pushivinc, pushinciv, pushivdec, pushdeciv, popv, movv, movvc, inciv,
0xC0FFC4C4	deciv, pushv
0xC0FFC0FF	pushc, loadc
0xC0FFC4C5	movv
0xC0FFAAA1	iasel, iaselbc
0xC0FFABA1	iaselbc
0xC0FFAAA2	fasel
0xC0FFABA2	faselbc
0xC0FFC7C7	incliv, decliv, siapushl, siapushlbc, siapopl, siapoplbc

back to index