Course web page for Fall 2021.
Last time we finished with a discussion of taking a function’s address and then doing an indirect call.
The address of a label can be obtained using the leaq
instruction
and PC-relative addressing:
leaq add1(%rip), %rbx
Calling a function whose address is in a register, i.e., indirect function call.
callq *%rbx
arg ::= ... | (FunRef label)
instr ::= ... | (IndirectCallq arg) | (TailJmp arg)
| (Instr 'leaq (list arg arg))
def ::= (Def label '() '() info ((label . block) ...))
pseudo-x86 ::= (ProgramDefs info (def...))
The callq
instruction
But there is more to do to make a function call:
The C calling convention uses the following six registers (in that order) for argument passing:
rdi, rsi, rdx, rcx, r8, r9
The calling convention says that the stack may be used for argument
passing if there are more than six arguments, but we shall take an
alternate approach that makes it easier to implement efficient tail
calls. If there are more than six arguments, then r9
will store a
tuple containing the sixth argument and the rest of the arguments.
The instructions for each function will have a prelude and conclusion
similar to the one we’ve been generating for main
.
The most important aspect of the prelude is moving the stack pointer down by the size needed the function’s frame. Similarly, the conclusion needs to move the stack pointer back up.
Recall that we are storing variables of vector type on the root stack.
So the prelude needs to move the root stack pointer r15
up and the
conclusion needs to move the root stack pointer back down. Also, in
the prelude, this frame’s slots in the root stack must be initialized
to 0
to signal to the garbage collector that those slots do not yet
contain a pointer to a vector.
As we did for main
, the prelude must also save the contents of the
old base pointer rbp
and set it to the top of the frame, so that we
can use it for accessing local variables that have been spilled to the
stack.
Caller View | Callee View | Contents | Frame |
---|---|---|---|
8(%rbp) | return address | ||
0(%rbp) | old rbp | ||
-8(%rbp) | callee-saved | Caller (e.g. map) | |
… | … | ||
-8(j+1)(%rbp) | spill | ||
… | … | ||
8(%rbp) | return address | ||
0(%rbp) | old rbp | ||
-8(%rbp) | callee-saved | Callee (e.g. add1 as f) | |
… | … | ||
-8(j+1)(%rbp) | spill | ||
… | … |
Recall that the registers are categorized as either caller-saved or callee-saved.
If the function uses any of the callee-saved registers, then the previous contents of those registers needs to be saved and restored in the prelude and conclusion of the function.
Regarding caller-saved registers, nothing new needs to be done. Recall that we make sure not to assign call-live variables to caller-saved registers.
Normally the amount of stack space used by a program is O(d) where d is the depth of nested function calls.
This means that recursive functions almost always use at least O(n) space.
However, we can sometimes use much less space.
A tail call is a function call that is the last thing to happen inside another function.
Example: the recursive call to tail_sum
is a tail call.
(define (tail_sum [n : Integer] [r : Integer]) : Integer
(if (eq? n 0)
r
(tail_sum (- n 1) (+ n r))))
(+ (tail_sum 3 0) 36)
In Python:
def tail_sum(n : int, r : int) -> int:
if n == 0:
return r
else:
return tail_sum(n - 1, n + r)
print( tail_sum(3, 0) + 36)
Because a tail call is the last thing to happen, we no longer need the caller’s frame and can reuse that stack space for the callee’s frame. So we can clean up the current frame and then jump to the callee. However, some care must be taken regarding argument passing.
The standard convention for passing more than 6 arguments is to use slots in the caller’s frame. But we’re deleting the caller’s frame. We could use the callee’s frame, but its difficult to move all the variables without stomping on eachother because the caller and callee frames overlap in memory. This could be solved by using auxilliary memory somewhere else, but that increases the amount of memory traffic.
We instead recommend using the heap to pass the arguments that don’t fit in the 6 registers.
Instead of callq
, use jmp
for the tail call because the return
address that is already on the stack is the correct one.
Use rax
to hold the target address for an indirect jump.
(ProgramDefsExp info defs exp)
==>
(ProgramDefs info (append defs (list mainDef)))
where mainDef
is
(Def 'main '() 'Integer '() exp')
We’ll need to generate leaq
instructions for references to
functions, so it makes sense to differentiate them from let-bound
variables.
(Var x)
==>
(Var x)
(Var f)
==>
(FunRef f)
(Let x (Bool #t)
(Apply (If (Var x) (Var 'add1) (Var 'sub1))
(Int 41)))
=>
(Let x (Bool #t)
(Apply (If (Var x) (FunRef 'add1) (FunRef 'sub1))
(Int 41)))
Transform functions so that have at most 6 parameters.
(Def f ([x1 : t1] ... [xn : tn]) rt info body)
==>
(Def f ([x1 : t1] ... [x5 : t5] [vec : (Vector t6 ... tn)]) rt info
new-body)
and transform the body
, replace occurences of parameters x6
and
higher as follows
x6
==>
(vector-ref vec 0)
x7
==>
(vector-ref vec 1)
...
If there are more than 6 arguments, pass arguments 6 and higher in a vector:
(Apply e0 (e1 ... en))
==>
(Apply e0 (e1 ... e5 (vector e6 ... en)))
Treat FunRef
and Apply
as complex operands.
(Prim '+ (list (Int 5) (FunRef add1)))
=>
(Let ([tmp (FunRef add1)])
(Prim '+ (list (Int 5) (Var tmp))))
Arguments of Apply
need to be atomic expressions.
Add cases for FunRef
and Apply
to the three helper functions
for assignment, tail, and predicate contexts.
In assignment and predicate contexts, Apply
becomes Call
.
In tail contexts, Apply
becomes TailCall
.
You’ll need a new helper function for function definitions.
The code will be similar to the previous code for Program
Previous assignment:
(define/override (explicate-control p)
(match p
[(Program info body)
(set! control-flow-graph '())
(define-values (body-block vars) (explicate-tail body))
(define new-info (dict-set info 'locals vars))
(Program new-info
(CFG (dict-set control-flow-graph 'start body-block)))]
))
adapt the above to process every function definition.
FunRef
becomes leaq
We’ll keep FunRef
as an instruction argument for now,
placing it in a leaq
instruction.
(Assign lhs (FunRef f))
==>
(Instr 'leaq (list (FunRef f) lhs'))
Call
becomes IndirectCallq
(Assign lhs (Call fun (arg1 ... argn)))
==>
movq arg'1 rdi
movq arg'2 rsi
...
(IndirectCallq fun')
(Instr 'movq (Reg 'rax) lhs')
TailCall
becomes TailJmp
We postpone the work of popping the frame until later by inventing an
instruction we’ll call TailJmp
.
(TailCall fun (arg1 ... argn))
==>
movq arg'1 rdi
movq arg'2 rsi
...
(TailJmp fun')
(Def f ([x1 : T1] ... [xn : Tn]) rt info blocks)
1. blocks => blocks'
2. prepend to start block from blocks'
movq rdi x1
...
4. parameters get added to the list of local variables
=>
(Def f '() '() new-info new-CFG)
alternative: replace parameters (in the CFG) with argument registers
New helper function for function definitions.
leaq
reads from the first argument and writes to the second.
IndirectCallq
and TailJmp
read from their argument and you must
assume they write to all the caller-saved registers.
New helper function for function definitions.
Compute one interference graph per function.
Spill vector-typed variables that are live during a function call.
(Because our functions make trigger collect
.) So add interference
edges between those variables and the callee-saved registers.
The destination of leaq
must be a register.
The destination of TailJmp
should be rax
.
(TailJmp %rbx)
==>
movq %rbx, %rax
(TailJmp rax)
(FunRef label) => label(%rip)
(IndirectCallq arg) => callq *arg
(TailJmp rax)
=>
addq frame-size, %rsp move stack pointer up
popq %rbx callee-saved registers
...
subq root-frame-size, %r15 move root-stack pointer
popq %rbp restore rbp
jmp *%rax jump to the target function