Generating assembler with a DSL

In the end everything boils down to risc instruction generation. From there on down, it is pretty much mechanics. And everything between ruby and the risc layer is to make the step towards risc easier.

Risc is after all very much like assembler, and programming any system in assembler is challenging. The simple object model helps, and so does the simple risc instruction set, but for a long time I was struggling to create readable code that generates the risc.

The main place we find this code is the SlotMachine layer, e.g. in the calling convention, and in the Builtin module, which generates code that cannot be expressed in ruby.

Compiler

When the code goes from ruby to Sol, or Sol to SlotMachine, methods on the previous (tree) structure return the next structure. But going to risc, the code evolved to use a different approach. The unit of compilation is a block or method, and the respective Compiler is passed into the generating ruby method.

The risc code is then added directly to the compiler. If anything is returned, it is used by the caller for other things. This change happened because the compiler had to be passed in anyway: it carries the scope, i.e. it is needed for variable resolution.

It is also easier to understand (albeit maybe after getting used to it), as generating the structure after the return was not intuitive. The Compiler's central method to capture the code is add_code, and the argument must be a risc instruction.

While this worked (and works), the resulting code is not very concise, and is cluttered with all kinds of namespace issues and details.
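The compiler-passing style described above can be sketched roughly as follows. This is a toy illustration, not the real rubyx code: the ToyRisc module, its Instruction struct, the instruction names and generate_load_and_store are all hypothetical stand-ins. Only the idea of a Compiler with an add_code method that accepts nothing but instructions comes from the text.

```ruby
# Toy stand-ins for illustration only; these are not the rubyx classes.
module ToyRisc
  # A risc-like instruction: a name plus its operands.
  Instruction = Struct.new(:name, :args)

  # Collects the instructions for one unit of compilation (a method or block).
  class Compiler
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    # Central capture method: anything that is not an instruction is rejected.
    def add_code(instruction)
      unless instruction.is_a?(Instruction)
        raise ArgumentError, "not an instruction: #{instruction.inspect}"
      end
      @instructions << instruction
      instruction
    end
  end
end

# A generating method receives the compiler and adds code to it directly,
# instead of returning the next structure.
def generate_load_and_store(compiler)
  compiler.add_code ToyRisc::Instruction.new(:load_constant, [:some_constant, :r1])
  compiler.add_code ToyRisc::Instruction.new(:reg_to_slot, [:r1, :r0, 2])
end

toy_compiler = ToyRisc::Compiler.new
generate_load_and_store(toy_compiler)
puts toy_compiler.instructions.map(&:name).inspect
```

Even in this small form, every instruction has to be spelled out with its full namespace and raw register names, which is the clutter mentioned above.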

Builder

From these difficulties was born the DSL approach that is implemented in the Builder. The Builder uses method_missing to resolve names to registers. Those registers, or more precisely RegisterValues, are then used to generate risc instructions by overloading operators.

Most of the work at the risc level, probably 60% or more, is shuffling data around. This is covered by the four instructions SlotToReg, RegToSlot, Transfer and LoadConstant, using the << operator with different arguments. See below:

  def build_message_data( builder )
    builder.build do
      space? << Parfait.object_space
      next_message? << space[:next_message]

      next_message_reg! << next_message[:next_message]
      space[:next_message] << next_message_reg
      ....
    end
  end

Variables are defined with ? or ! and must be defined before use.
space becomes a variable, or a named register.
The first line generates a LoadConstant, because the right side is a constant.
The second (and third) line generates a SlotToReg, because the left side is a register and the right side is a slot.
The fourth line generates a RegToSlot, as the left side is a slot and the right side is a register.
All generated instructions are automatically added to the compiler.

As you can see, the code is quite readable. Mapping names to registers makes them feel like normal variables. (They are not, of course; they are instances of RegisterValue stored in a hash in the Builder.) The overloading of [], which just creates an intermediate RValue, makes it look like the result of the array access is transferred to the register. And that is exactly what is happening. The next_message on the right is a register, and for the array indexing that "register" is treated as an object. The "index", also called next_message, is an instance variable. The builder magic looks up the index in the type (Message) and does an indexed memory access, which is exactly what a SlotToReg is.

There are also other ways to use the Builder. One is to just use the instance methods, i.e. any code can be added with add_code, also inside the block:
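To make the mechanics a little more concrete, here is a minimal sketch of how method_missing and an overloaded << could work together. It is a toy under stated assumptions, not the rubyx Builder: the ToyDsl module, the SlotTarget class and the symbolic instruction names are invented for illustration, ? and ! are treated identically, and the index is simply recorded rather than looked up in a type.

```ruby
# Toy DSL for illustration only; names and behaviour are assumptions.
module ToyDsl
  Instruction = Struct.new(:name, :from, :to)

  # A named register that knows its builder, so operators can emit code.
  class RegisterValue
    attr_reader :name

    def initialize(name, builder)
      @name = name
      @builder = builder
    end

    # reg[:ivar] only creates an intermediate value, nothing is emitted yet.
    def [](index)
      SlotTarget.new(self, index, @builder)
    end

    # reg << source picks the instruction from the kind of the right side.
    def <<(source)
      kind = case source
             when SlotTarget then :slot_to_reg
             when RegisterValue then :transfer
             else :load_constant
             end
      @builder.add_code Instruction.new(kind, source, self)
      self
    end
  end

  # Intermediate result of reg[:ivar]; used as source or as target of <<.
  class SlotTarget
    attr_reader :register, :index

    def initialize(register, index, builder)
      @register = register
      @index = index
      @builder = builder
    end

    # slot << reg stores a register into the slot.
    def <<(register)
      @builder.add_code Instruction.new(:reg_to_slot, register, self)
      self
    end
  end

  class Builder
    attr_reader :instructions

    def initialize
      @names = {}
      @instructions = []
    end

    def add_code(instruction)
      @instructions << instruction
    end

    # Run the block with the builder as self, so bare names hit method_missing.
    def build(&block)
      instance_eval(&block)
      self
    end

    # name? or name! defines a register, a bare name looks it up.
    def method_missing(name, *args)
      return super unless args.empty?
      key = name.to_s
      if key.end_with?("!", "?")
        key = key.chop
        @names[key] ||= RegisterValue.new(key, self)
      else
        @names.fetch(key) { raise NameError, "register #{key} used before definition" }
      end
    end

    def respond_to_missing?(name, include_private = false)
      name.to_s.end_with?("!", "?") || @names.key?(name.to_s) || super
    end
  end
end

toy_builder = ToyDsl::Builder.new.build do
  space? << :object_space
  next_message? << space[:next_message]
  next_message_reg! << next_message[:next_message]
  space[:next_message] << next_message_reg
end
puts toy_builder.instructions.map(&:name).inspect
```

The four lines in the block mirror the example above and produce a constant load, two slot-to-register moves and one register-to-slot move, in that order.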

  if_zero ok_label
  ...
  branch  while_start_label
  add_code exit_label

Here we use the if_zero method, which generates an IfZero instruction.
The second line creates a Branch, in much the same way.
The last line adds a label, using add_code.

And in these last examples we see how operators can be called, or indeed how any generating function can be called:
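A rough sketch of why these calls work inside the block: if the block is evaluated with the builder as self, then if_zero, branch and add_code are ordinary instance methods. The ToyBranching module and its structs below are hypothetical, and the use of instance_eval is an assumption about the mechanism, not a statement about the rubyx source.

```ruby
# Toy illustration only; the classes here are not the rubyx ones.
module ToyBranching
  Label  = Struct.new(:name)
  IfZero = Struct.new(:label)
  Branch = Struct.new(:label)

  class Builder
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    # Everything ends up here, whichever helper was used.
    def add_code(instruction)
      @instructions << instruction
      instruction
    end

    # Thin helpers that create one instruction and add it.
    def if_zero(label)
      add_code IfZero.new(label)
    end

    def branch(label)
      add_code Branch.new(label)
    end

    # The block runs with the builder as self, so helpers need no receiver.
    def build(&block)
      instance_eval(&block)
      self
    end
  end
end

ok_label = ToyBranching::Label.new("ok")
while_start_label = ToyBranching::Label.new("while_start")
exit_label = ToyBranching::Label.new("exit")

loop_builder = ToyBranching::Builder.new.build do
  if_zero ok_label
  branch while_start_label
  add_code exit_label
end
puts loop_builder.instructions.map { |i| i.class.name.split("::").last }.inspect
```

Local variables such as the labels remain visible inside the block because Ruby blocks are closures, even when self is changed.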

  builder.build do
    integer_const! << 1
    integer_tmp.op :>> , integer_const
    ...
    Risc::Builtin::Object.emit_syscall( builder , :exit )
    ...

The first line loads a fixnum, which is LoadData in risc terms.
The second line calls an operator >>, which gets translated to an OperatorInstruction that leaves the result in the second argument.
The last line invokes some shared code that does a system exit. This is in essence much like inlining the exit code.
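The following sketch shows the same three steps with toy classes, mainly to illustrate how a shared generating function can take the builder and "inline" its instructions. ToyOps, its structs, the register helper and emit_exit are hypothetical names. The sketch only records the operator and its operands and makes no claim about where the real OperatorInstruction stores its result.

```ruby
# Toy illustration only; names and structure are assumptions.
module ToyOps
  LoadData            = Struct.new(:value, :register)
  OperatorInstruction = Struct.new(:operator, :left, :right)
  Syscall             = Struct.new(:name)

  class RegisterValue
    attr_reader :name

    def initialize(name, builder)
      @name = name
      @builder = builder
    end

    # Loading a plain integer is recorded as LoadData.
    def <<(value)
      raise ArgumentError, "integer expected" unless value.is_a?(Integer)
      @builder.add_code LoadData.new(value, self)
      self
    end

    # op records an operator applied to this register and another one.
    def op(operator, right)
      @builder.add_code OperatorInstruction.new(operator, self, right)
    end
  end

  class Builder
    attr_reader :instructions

    def initialize
      @instructions = []
    end

    def add_code(instruction)
      @instructions << instruction
      instruction
    end

    # Explicit register creation, instead of method_missing, to keep this short.
    def register(name)
      RegisterValue.new(name, self)
    end
  end

  # Shared generating code: any caller can add the exit sequence to its builder.
  def self.emit_exit(builder)
    builder.add_code Syscall.new(:exit)
  end
end

ops_builder = ToyOps::Builder.new
integer_const = ops_builder.register(:integer_const)
integer_tmp = ops_builder.register(:integer_tmp)

integer_const << 1
integer_tmp.op(:>>, integer_const)
ToyOps.emit_exit(ops_builder)
puts ops_builder.instructions.map { |i| i.class.name.split("::").last }.inspect
```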