-
Notifications
You must be signed in to change notification settings - Fork 0
some inteview questions
Q1: Assume your assembly language includes ONLY the following instructions:
'inc REG': increments a given register by one. 'dec REG': decrement a given register by one. 'jnz LABEL': jumps to a given LABEL if the previous instruction's result was not zero. 'HALT': stops running.
Task: A and B registers hold non-negative values. The program should calculate the value of |A-B| and locate the result in C. In addition, the language holds registers C,D,...,Z, which you can assume are initialized at program start to zero.
answer:
// save the A B first
SaveA:
int D
int E
dec A
jnz SaveA
Save_B:
inc F
int G
dec B
jnz Save_B
D: // set A
inc C
dec D
jnz D
DECB:
dec C
jnz T2
B:
inc C
dec G
jnz G
DECA:
dec D
jnz DECA
halt
T2:
dec G
jnz DECB
Q2 :The performance of a program is slower than the same program before the backend of the compiler has changed. What direction would you look into this issue and solve it?
Maybe the IR is a good start point for llvm
Q3: Optimize this code (like what compiler would reconstruct an equivalent source code as):
int fn (int a, int b) {
int sum = 0;
for(int i=4*a;i>0;i--) {
sum+=b*i*i;
}
return sum;
}
I'm not sure the correct answer
what i can do
1 i=4*a; ==> i= a>>2;
3 loop expand
4 sse optimize
Q4 Given a sequence of instructions, form a control flow graph (CFG) for it. Then write the instructions in SSA form. Then perform a specified code transformation on the CFG. Often this will be PRE - partial redundancy elimination.
Q5 Given two sequences of instructions that are functionally the same but have reordered operations, describe why you’d prefer one over the other and under which conditions. I usually ask this in the context of GPUs with Apple Metal or CUDA.
Q6 Given a (bad) sequence of code, describe what it does. Tell me what’s wrong with it, and how a static analysis could (or could not) detect the error at compile time. Build the static analysis.
Q7 For compiler engineers that know Python: Here is some Python code that appends to a string. It takes a really long time to run. Why? For backend compiler work: what are the limitations of X architecture (GPU, VLIW, out-of-order superscalar, FPGA, something else)? What is it good at running? What is it not good at running? Which code transformations, manual or automatic tend to improve performance?
Q8 Some code doesn’t run fast enough, so the developer added explicit prefetching. Now the code runs even slower. Why? For positions that may require adapting or creating a programming language: tell me what feature Y does for language Z, what it enables, and how does it complicate your job? Can you think of alternatives?