Issues in the Design of Code Generator

Pranav BawgikarPranav Bawgikar
3 min read

[39]

Input to Code Generator

The input to code generator is the intermediate code generated by the front end, along with information in the symbol table that determines the run-time addresses of the data-objects denoted by the names in the intermediate representation. Intermediate codes may be represented mostly in quadruples, triples, indirect triples, postfix notation, syntax trees, DAG’s etc. Assume that they are free from all of syntactic and state semantic errors, the necessary type checking has taken place and the type-conversion operators have been inserted wherever necessary.

Target program

Target program is the output of the Code Generator. The output may be absolute machine language, relocatable machine language or assembly language.

  • Absolute machine language as an output has advantages that it can be placed in a fixed memory location and can be immediately executed.

  • Relocatable machine language as an output allows subprograms and subroutines to be compiled separately. Relocatable object modules can be linked together and loaded by linking loader.

  • Assembly language as an output makes the code generation easier. We can generate symbolic instructions and use macro-facilities of assembler in generating code.

Instruction selection

Selecting best instructions will improve the efficiency of the program. It should include the instructions that should be complete and uniform. Instruction speeds and machine idioms also plays a major role when efficiency is considered. But, if we do not care about the efficiency of the target program then instruction selection is straight-forward.

For example, x = x + 1

MOV x, R0
ADD #1, RO
MOV R0, x

Can be replaced by

INC x

Another example, the respective three-address statements would be translated into latter code sequence as shown below:

P := Q + R

S := P + T

MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S

Here the fourth statement is redundant as the value of the P is loaded again in that statement that just has been stored in the previous statement. It leads to an inefficient code sequence. A given intermediate representation can be translated into many code sequences, with significant cost differences between the different implementations. A prior knowledge of instruction cost is needed in order to design good sequences, but accurate cost information is difficult to predict.

Register allocation issues

Use of registers make the computations faster in comparison to that of memory, so efficient utilization of registers is important. The use of registers are subdivided into two subproblems:

  • During Register allocation, we select only those set of variables that will reside in the registers at each point in the program.

  • During a subsequent Register assignment phase, the specific register is picked to access the variable.

For example,

t = x + y

t = t * z

t = t / w

MOV  x, R0
ADD y, R0
MUL z, R0
DIV w, R0
MOV  R0,t

Evaluation order

The code generator decides the order in which the instruction will be executed. The order of computations affects the efficiency of the target code. Among many computational orders, some will require only fewer registers to hold the intermediate results. However, picking the best order in general case is a difficult NP-complete problem.

0
Subscribe to my newsletter

Read articles from Pranav Bawgikar directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Pranav Bawgikar
Pranav Bawgikar

Hiya 👋 I'm Pranav. I'm a recent computer science grad who loves punching keys, napping while coding and lifting weights. This space is a collection of my journey of active learning from blogs, books and papers.