
Memory layout and management

Memory management must be one of the main horrors of computing. That’s why garbage-collected languages like Ruby are so great. Even simple malloc implementations tend to be quite complicated. Unnecessarily so, if one used object-oriented principles of data hiding.

Objects

As has been mentioned, in a true OO system, object tagging is not really an option. Tagging is the technique of using the lowest bit of a word as a marker to tell integers from pointers, and thus having to shift ints, losing a bit, and needing a check before any pointer access.
MRI does this for Integers but not other value types. We accept this and work with it and just say “of course”, but it’s not modelled well.
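
To make the cost of tagging concrete, here is a minimal sketch of the technique in plain Ruby. The module and method names are made up for illustration and are not taken from MRI or RubyX; the point is only the shift, the lost bit and the check.

```ruby
# Hypothetical illustration of low-bit tagging; not MRI or RubyX code.
module Tagging
  # Encode an integer as a tagged word: shift left, set the lowest bit.
  def self.tag(int)
    (int << 1) | 1
  end

  # The check that has to happen before a word is treated as a pointer.
  def self.integer?(word)
    (word & 1) == 1
  end

  # Decode a tagged word back into the integer (the shift keeps the sign).
  def self.untag(word)
    raise ArgumentError, "not a tagged integer" unless integer?(word)
    word >> 1
  end
end

word = Tagging.tag(21)      # 43, the value shifted with the marker bit set
Tagging.integer?(word)      # true
Tagging.untag(word)         # 21
Tagging.integer?(0x1000)    # false, an aligned pointer has a zero low bit
```

Every integer pays for the shift in both directions, and every pointer access pays for the check, which is the overhead the paragraph above refers to.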

In a real OO system, everything really is an object. Strings are objects, as are floats, symbols and arrays, and yes, Integers are normal Objects.

The difference with Integers is that they are immutable, as are Symbols in Ruby and Strings in JavaScript (Ruby string literals are immutable too when the frozen_string_literal pragma is enabled). Sensibly so, and in general the property of immutability should be modelled explicitly.

Object Memory Layout

When we say everything is an Object, what does that mean in practice? Well, in short it means every Object has a Type, and the Type is the first word in the memory layout.

A Type stores the instance variable names and the methods, and refers to a class, which in turn defines behaviour in Ruby.

As a further stipulation, making our life easier, we define objects to be of fixed size (according to type) and a multiple of a cache line long. Objects are managed in Pages of same-sized objects and in the ObjectSpace, see below.
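
The layout rule can be sketched in plain Ruby as follows. This is only a model under stated assumptions: the word and cache line sizes are assumed values, and SketchType, new_object and index_of are hypothetical names, not the RubyX classes.

```ruby
WORD_BYTES = 8         # assumed word size
CACHE_LINE_BYTES = 64  # assumed cache line size

# Hypothetical stand-in for a Type: just the ordered instance variable names.
SketchType = Struct.new(:name, :variable_names) do
  # One word for the type itself plus one per instance variable.
  def words_needed
    1 + variable_names.length
  end

  # Object length in words, rounded up to a whole number of cache lines.
  def object_words
    per_line = CACHE_LINE_BYTES / WORD_BYTES
    ((words_needed + per_line - 1) / per_line) * per_line
  end

  # Slot 0 is reserved for the type, variables follow in declaration order.
  def index_of(variable)
    1 + variable_names.index(variable)
  end
end

# An object is modelled as a flat array of words, with the type first.
def new_object(type)
  memory = Array.new(type.object_words)
  memory[0] = type
  memory
end

point_type = SketchType.new(:Point, [:x, :y])
point = new_object(point_type)
point[point_type.index_of(:y)] = 7
```

With these assumed sizes a Point needs three words but occupies a full line of eight, so five slots stay free for variables added later.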

Continuations

But (I hear), Ruby is dynamic, we must be able to add variables and methods to an object at any time. So the type, or length, can’t be fixed. Ok, we can change the Type every time, but when all empty slots have been used up, what then?

Then we use Continuations: instead of adding a new variable to the end of the object, we use a new object and store a reference to it in the original object, thus extending the object. A linked list, basically.

Continuations are pretty normal objects and it is just up to the object to manage the redirection. Of course this may scatter objects a little, but in a running application this does not really happen much. Most instance variables are added quite soon after startup, just as functions are usually parsed in the beginning.
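
The redirection can be pictured with a small model. This is a sketch only: SlottedObject, its capacity of four slots and its method names are assumptions made for illustration, and a real implementation would keep the variable names in the Type rather than in the object.

```ruby
# Hypothetical sketch: a fixed number of slots per object, with overflow
# chained into a continuation object. Names and sizes are illustrative.
class SlottedObject
  SLOTS = 4  # assumed fixed capacity per object

  def initialize
    @names = []          # stands in for the Type's variable names
    @values = []
    @continuation = nil  # next object in the chain, created on demand
  end

  def set(name, value)
    index = @names.index(name)
    if index
      @values[index] = value
    elsif @names.length < SLOTS
      @names << name
      @values << value
    else
      # No free slot left: redirect to the continuation.
      @continuation ||= SlottedObject.new
      @continuation.set(name, value)
    end
  end

  def get(name)
    index = @names.index(name)
    return @values[index] if index
    @continuation&.get(name)
  end

  # Number of objects in the chain, including this one.
  def chain_length
    1 + (@continuation ? @continuation.chain_length : 0)
  end
end

object = SlottedObject.new
6.times { |i| object.set(:"var#{i}", i * 10) }
object.get(:var5)     # 50, found in the continuation
object.chain_length   # 2
```

The first four variables are reached directly; only the late additions pay for the extra hop.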

We can avoid the added redirection of Continuations by clever code analysis and over-dimensioning. While this, and the whole concept of fixed size objects, may seem wasteful at first sight, it is much more efficient than using a hash, as MRI does: a hash not only stores all those names over and over, but also has buckets and list functionality, and uses just about a cache line for a single variable.

Data

So if we’re not tagging and we only have Objects, where is the data? Where is that int, that char, the byte-buffer?

Just to make that totally clear: the OO level has no access to data, and no idea of it.

Data does of course exist, but it is hidden, beyond the instance variables, inaccessible to normal Ruby code.

The way this works is that all access to data, or one should really say all functionality that needs to be performed on data, is implemented in the lower levels, mostly the Risc layer.

In the Builtin module, we can define methods in purely Risc terms. The Risc layer does have access to the memory, and can thus do things with it. Let's look at the simple example of Integer addition. The method is defined in Builtin, on the Integer type. The method requires one argument and checks that that too is an Integer. Then it loads the data from both objects, performs the operation, "allocates" a new Integer object and saves the machine word into it.
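
The sequence of steps just described can be mirrored in plain Ruby. To be clear, this is not the Builtin or Risc API: objects are modelled as two-element arrays, and INTEGER_TYPE, DATA_INDEX, allocate_integer and builtin_plus are hypothetical names chosen for the sketch.

```ruby
# Hypothetical model of the steps described above, in plain Ruby.
INTEGER_TYPE = :Integer_Type  # stand-in for the real Type object
TYPE_INDEX = 0                # the type is the first word
DATA_INDEX = 1                # assumed position of the hidden machine word

# "Allocation" here just builds a fresh two-word object.
def allocate_integer
  [INTEGER_TYPE, 0]
end

def builtin_plus(receiver, argument)
  # Check that the argument is an Integer too.
  unless argument[TYPE_INDEX] == INTEGER_TYPE
    raise TypeError, "argument is not an Integer"
  end
  left = receiver[DATA_INDEX]    # load the data from the receiver
  right = argument[DATA_INDEX]   # load the data from the argument
  sum = left + right             # perform the operation
  result = allocate_integer      # "allocate" a new Integer object
  result[DATA_INDEX] = sum       # save the machine word into it
  result
end

three = [INTEGER_TYPE, 3]
four = [INTEGER_TYPE, 4]
builtin_plus(three, four)[DATA_INDEX]   # 7
```

Neither operand is modified; the result is always a new object, which fits the immutability of Integers mentioned earlier.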

We can also define SlotMachine instructions to manipulate data, but as work is done in methods, the Builtin approach has been sufficient up to now.

Again one may think this is wasteful, the simplest Integer operation thus taking 10-20 CPU instructions instead of one. But not only are we speed-wise up against interpretation (i.e. not one instruction either), but number crunching is not really what Ruby is made for. And if it ever is, there is always the possibility to optimise those Builtin methods.

Pages, Space and object allocation

A Page manages a fixed number of objects of the same size. They do not need to be of the same Type, just the same memory length.

The Space manages Pages, and is ultimately responsible for "allocating" new memory.

Objects are not allocated in the same way as in MRI, but rather recycled. MRI uses C, and specifically malloc, to allocate and free memory.

RubyX only ever allocates Pages, one or several at a time (depending on object size), and does so by getting the memory directly from the operating system (system call).

Objects are only ever recycled. Pages keep free lists of the objects (of the size they manage) that are not used, and hand them out upon request. When the garbage collector deems an object to be "freed" it is put back on the free-list of the appropriate Page. This is done by changing the type of the object.
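
A rough model of the recycling idea, leaving out the Space and the system call. The names SketchPage, FREE_TYPE, get and release are assumptions for this sketch and are not the RubyX implementation.

```ruby
# Hypothetical sketch of a Page and its free list; names are illustrative.
FREE_TYPE = :Free_Type  # assumed marker type for unused objects

class SketchPage
  def initialize(object_words, count)
    # All objects exist up front; none is created or destroyed later.
    @objects = Array.new(count) do
      Array.new(object_words).tap { |object| object[0] = FREE_TYPE }
    end
    @free_list = @objects.dup
  end

  def free_count
    @free_list.length
  end

  # Hand out a recycled object, stamping it with its new type.
  def get(type)
    object = @free_list.pop
    raise "page exhausted" unless object
    object[0] = type
    object
  end

  # Called once the collector deems an object free: change the type back
  # and put the object on the free list again.
  def release(object)
    object[0] = FREE_TYPE
    @free_list.push(object)
  end
end

page = SketchPage.new(8, 2)
first = page.get(:Point_Type)
page.free_count                         # 1
page.release(first)
page.free_count                         # 2
page.get(:String_Type).equal?(first)    # true, the same memory is reused
```

Because the type is the first word, handing out and taking back an object is a single write plus a list operation.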

Status

Not all of this has been implemented yet, only the static side. Pages and Space barely exist in terms of functionality, and objects are only statically allocated at the moment.

But fixed size objects (and of course the type system) are done. When creating a binary, only fixed size objects are written. The next step will be to sort them according to size and arrange them in Pages.

Integers and their basic operations are done, and strings have basic read/write access, but no allocation yet.