Pharo VM Transpiler: My First Six Weeks
In this article I'll share the progress I made during the first six weeks of the GSoC coding period, in which I've been contributing to the Pharo-to-C VM Transpiler project.
My goals for this period
My proposal mentioned several inlining issues that were meant to help me "catch up" with the project. However, during the community bonding period my mentors and I decided to steer towards issues centered on preventing translation errors, as they were a much better fit for my proposal's theme.
Some context about Slang's ASTs
As part of its transpilation pipeline, Slang uses three different ASTs, and understanding their responsibilities is key:
RbAST (Refactoring Browser AST) is the exact representation of a Pharo project and, as its name indicates, it is used by the IDE for tasks such as refactoring, suggestions, etc.
TAST is an intermediate representation between Pharo and C. Its nodes hold information that serves the C transpilation, such as type system information. This AST also goes through several transformations such as renamings, inlinings, etc.
CAST is the C code representation. Its sole responsibility is to be able to write C code.
Development
Locals type declarations
Pull Requests: #603
Slang has several pragmas (or annotations) for providing information needed for translation. When declaring a variable or an argument (a.k.a. a local) in Slang you must use the var:type: pragma so the generated C code can type the corresponding variable.
In this example, we are using var:type: to declare the type of the foo temporary variable:
AClass >> aMethod
    | foo |
    <var: 'foo' type: 'char *'>
which would be translated to:
void aMethod() {
    char *foo;
}
A validation was implemented so that declaring a variable or argument without its corresponding type declaration throws an error, avoiding a problem that would otherwise only show up when compiling the generated C code.
Instance variables type declarations
Pull Requests: #607
Instance variables are the object's attributes. In C, however, this isn't so obvious because classes can be translated in two very different ways:
Most classes are translated as a set of functions, and they declare a set of global variables.
In some special cases classes are marked as structs (this happens when they inherit from the SlangStructType class). Here instance variables are translated as members of the struct.
This development focused on structs. Types are assigned to instance variables by implementing a class-side method called instVarTypeDeclarationsDo that matches instance variable names with their types.
A validation was implemented in SlangStructType so that emitting the struct code fails if any instance variable doesn't have a corresponding type. As with the previous issue, this prevents problems down the line.
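To make the struct case concrete, here is a rough sketch of the kind of C that a struct class could map to. The names ExampleStruct, count, label and example_struct_init are hypothetical, and the exact layout Slang emits is not shown in this article, so treat the shape as an assumption rather than real generated output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct class with two instance variables.
   Every member needs a C type in front of its name, which is why an
   instance variable without a type declaration cannot be emitted. */
typedef struct {
    int count;          /* instance variable typed as 'int' */
    const char *label;  /* instance variable typed as 'const char *' */
} ExampleStruct;

/* Hypothetical helper that fills every member of the struct. */
void example_struct_init(ExampleStruct *self, int count, const char *label) {
    assert(self != NULL);
    self->count = count;
    self->label = label;
}
```

If either member were missing its type, there would be nothing valid to write before the member name, so failing early during translation is cheaper than chasing a C compiler error later.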
C Reserved Words
Pull Requests: #613
Because Smalltalk and C have different keywords, some C keywords may be used as identifiers in Smalltalk, which can cause the C compilation to fail (this issue had already been encountered, see issue #429).
For example, the following method would break when compiling the generated C because register is a reserved word in C:
AClass >> aMethod
    | register |
This pull request implemented several validations that run when creating the TAST nodes and throw an error when an identifier (selector, argument, temporary variable, etc.) would conflict with a C keyword.
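The check itself boils down to a lookup against a list of reserved words. The sketch below shows that idea in C; it is not Slang's implementation (which lives in Pharo), and the names C_KEYWORDS and is_c_keyword are made up for illustration. The table only holds the 32 keywords of C89, so a real check would likely need the additions from later standards as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 32 reserved words of C89. Later standards add more
   (for example inline and restrict), which are left out here. */
static const char *const C_KEYWORDS[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"
};

/* Returns 1 when identifier is one of the keywords above, 0 otherwise.
   The comparison is case-sensitive, as C keywords are. */
int is_c_keyword(const char *identifier) {
    assert(identifier != NULL);
    size_t count = sizeof C_KEYWORDS / sizeof C_KEYWORDS[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(identifier, C_KEYWORDS[i]) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With this helper, the temporary named register from the example above would be flagged, while a name such as foo would pass.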
C Conflicts Renamings
As a continuation of the previous work, a possible improvement was to automatically rename the identifier during translation instead of throwing an error, giving the developer complete freedom when naming.
To implement this behavior I modified the method CCodeGenerator>>emitCCodeOn:doInlining:doAssertions:, which orchestrates a large part of the translation: it collects all the necessary TMethods (the TAST node that represents a method), does the inlinings, and then emits the corresponding CAST and C code for the whole program. This worked great because I could perform the renames right before any code was emitted.
The renamings themselves were pretty straightforward, as the idea of renaming a variable or a selector already existed. The only logic I had to add was the renaming of instance variables.
After finishing with the C keyword conflicts I did the same for conflicting selectors, which conflict when they share the same name but take a different number of arguments. Take this example, where we define both aMethod and aMethod: in the same class.
AClass >> aMethod
    ^true
AClass >> aMethod: anArgument
    ^true
When translated to functions, both would be named aMethod, which causes a conflict and makes the C compilation fail. The same applies to locals that clash with selectors.
The final renaming rules ended up like this:
Identifier conflicting with a keyword: append "_1" to it.
Selector conflicting with other selectors: append the number of arguments to the conflicting selectors. In the previous example the functions would end up as aMethod0 and aMethod1.
Local variable conflicting with selectors: prefix it with "l_".
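The three rules can be sketched as small string helpers. This is only an illustration in C of the naming scheme described above, not the Pharo code inside Slang; the function names are hypothetical, and the selector helper assumes it receives the selector name with its colons already stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Rule 1: an identifier that clashes with a C keyword gets "_1" appended.
   Returns the length of the new name, as reported by snprintf. */
int rename_keyword_conflict(const char *name, char *out, size_t out_size) {
    assert(name != NULL && out != NULL && out_size > 0);
    return snprintf(out, out_size, "%s_1", name);
}

/* Rule 2: a selector that clashes with another selector gets its
   number of arguments appended (aMethod -> aMethod0, aMethod: -> aMethod1). */
int rename_selector_conflict(const char *base_name, int argument_count,
                             char *out, size_t out_size) {
    assert(base_name != NULL && out != NULL && out_size > 0);
    assert(argument_count >= 0);
    return snprintf(out, out_size, "%s%d", base_name, argument_count);
}

/* Rule 3: a local variable that clashes with a selector gets an "l_" prefix. */
int rename_local_conflict(const char *name, char *out, size_t out_size) {
    assert(name != NULL && out != NULL && out_size > 0);
    return snprintf(out, out_size, "l_%s", name);
}
```

Applied to the earlier examples, the temporary register would become register_1, and the two selectors would become aMethod0 and aMethod1.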
Expectations for the remaining weeks
Although I got pretty sidetracked with renamings, I'm really happy with the result, and it has helped me understand Slang much better. The plan is to continue with type-related issues, such as type validations or type-guided translations.
Written by Ivan Jawerbaum