A Gentle Introduction to Core Erlang: Part 2

In the last post, we learnt the basics of Core Erlang, the intermediate language used by the Erlang compiler. In this part, we will explain how to write Erlang programs that transform Core Erlang programs.

Core Erlang Program Transformation

In general, it is more convenient to perform source-to-source transformations in Core Erlang than in Erlang (especially if our goal is to generate BEAM bytecode). This is partly due to the simplicity of the Core Erlang language.

However, it can still be an arduous task at times, and the scarce, outdated documentation does not help. Moreover, examples of this kind of code are quite hard to find. The first time I worked on Core Erlang program transformation, I spent most of my time browsing code on GitHub and developing by trial and error.

Abstract Syntax Trees

As is common in programming languages, a Core Erlang program can be represented with an Abstract Syntax Tree (AST for short). The nodes in this tree represent the constructs that occur in the source code. In our context, the terms AST and forms are interchangeable.

For example, the AST that represents the sum_tuple module is a module declaration (root node) with two function definitions: main/2 and sum_tup/1. A function definition consists of a pair {Name, Body} where Name is the name/arity of a function (e.g., sum_tup/1), and Body is the body of the function (e.g., fun (_@c0) -> case _@c0 of ... end).

module 'sum_tuple' ['main'/2,
                    'sum_tup'/1]
    attributes []
'main'/2 =
    fun (_@c1,_@c0) ->
        let <Tup> = {_@c1,_@c0}
        in  apply 'sum_tup'/1
                (Tup)
'sum_tup'/1 =
    fun (_@c0) ->
        case _@c0 of
          <{X,Y}> when 'true' ->
              call 'erlang':'+'
                  (X, Y)
          <_@c1> when 'true' ->
              primop 'match_fail'
                  ({'function_clause',_@c1})
        end
end

Now, let us focus on the main/2 function, which is represented as an AST in the following diagram.

Figure: The Abstract Syntax Tree for the sum_tuple program in Core Erlang.

Here, the left child of the root node is the list of arguments of the function (a list of variables), and the right subtree is the body of the function, which in turn is a let expression. The subtrees of a let expression are (from left to right) the variables, the argument and the body. In this let expression, Tup (the variable) is bound to the tuple {_@c1,_@c0} (the argument), so that the application sum_tup(Tup) (the body) is evaluated with this binding in scope.
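To make this structure a bit more concrete, here is a minimal sketch that builds the fun of main/2 by hand with the constructor functions of the cerl module, and then takes the let node apart with the matching accessors (let_vars, let_arg and let_body, in the same left-to-right order as in the diagram). The module name let_example is made up for illustration; the variable names simply mirror the Core Erlang code above.

```erlang
-module(let_example).
-export([main_fun/0, let_parts/1]).

%% Build, with cerl constructors, roughly the fun of main/2 above:
%%   fun (_@c1,_@c0) -> let <Tup> = {_@c1,_@c0} in apply 'sum_tup'/1 (Tup)
main_fun() ->
    C1 = cerl:c_var('_@c1'),
    C0 = cerl:c_var('_@c0'),
    Tup = cerl:c_var('Tup'),
    Body = cerl:c_let([Tup],                                   % variables
                      cerl:c_tuple([C1, C0]),                  % argument
                      cerl:c_apply(cerl:c_fname(sum_tup, 1),   % body
                                   [Tup])),
    cerl:c_fun([C1, C0], Body).

%% Return the three subtrees of the let node in the body of Fun:
%% {Variables, Argument, Body}. 'let' must be quoted: it is a reserved word.
let_parts(Fun) ->
    Let = cerl:fun_body(Fun),
    'let' = cerl:type(Let),
    {cerl:let_vars(Let), cerl:let_arg(Let), cerl:let_body(Let)}.
```

Calling let_example:let_parts(let_example:main_fun()) should return a one-element list with the Tup variable, a tuple node and an apply node.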

Data Types

Since some constructs in Erlang can be considered syntactic sugar, these can be compiled into simpler ones when translated to Core Erlang. Thus, there are only a few Core Erlang AST data types. The main ones (and what they are used for) are:

  • 'apply' for function applications.
  • call for function calls.
  • 'case' for case statements.
  • clause for clauses in case and receive statements.
  • cons for list constructors.
  • 'fun' for function definitions.
  • 'let' for let expressions.
  • literal for atoms, numbers, characters…
  • module for module definitions.
  • 'receive' for receive statements.
  • tuple for tuples.
  • var for variables.

Note that this is not an exhaustive list of the Core Erlang AST data types. Nevertheless, I consider these to be the most important ones, although the others (binary, catch, etc.) may also appear in your programs, depending on your problem.

The type of a node can be obtained with the cerl:type/1 function. Another option is to match on the record of a node directly (#c_apply{...}, #c_call{...}, etc.), but I generally prefer the former because I find it easier to read.
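As a small illustration of cerl:type/1, here is a sketch (the module name node_types is hypothetical) that counts how many nodes of each type appear in a tree. It relies on cerl_trees:fold/3, which visits every node of the tree in post-order and threads an accumulator through.

```erlang
-module(node_types).
-export([count_types/1]).

%% Return a map from node type (as given by cerl:type/1) to the number
%% of nodes of that type in Tree.
count_types(Tree) ->
    cerl_trees:fold(
      fun(Node, Acc) ->
              Type = cerl:type(Node),
              maps:update_with(Type, fun(N) -> N + 1 end, 1, Acc)
      end,
      #{},
      Tree).
```

For a tree representing call 'erlang':'+' (X, 1), this gives one call, one var and three literal nodes: the module and function names of a call are literal nodes themselves, in addition to the integer 1.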

Core Erlang Metaprogramming in Erlang

In our case, the metaprogramming process can be divided into three steps:

  • Translate Erlang source code to Core Erlang forms.
  • Manipulate the Core Erlang forms.
  • Generate Core Erlang code from forms.

These steps can be thought of as read, modify and write if you regard code and forms as two representations of the same thing: a program. In addition, I will also describe a final step: compiling the Core Erlang code (i.e., the code that we have written) to generate a BEAM file.

From Erlang code to Core Erlang forms

In the last part, we saw how to obtain the forms associated with the Core Erlang code using the compile:file/2 function:

case compile:file(File, [to_core, binary, no_copt]) of
    {ok, _, CoreForms} ->
        CoreForms;
    _ ->
        io:fwrite("Error: Could not compile file.~n")
end.

We already know that the to_core option in the compile:file/2 call is used to obtain the Core Erlang forms, but there are some other options that can be useful here:

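  • binary: By default, compile:file/2 writes the result of the compilation to a file and returns {ok, ModuleName}. With this option, no file is written, and the result is returned instead as the third element of {ok, ModuleName, Result}.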
  • no_copt: Disables compiler optimizations.

If we aim at generating a modified version of our program and we are not interested in the unchanged version, then we should include the binary option.

Occasionally, the translated forms do not have a clear correspondence with the original Erlang code (because of compiler optimizations). In these cases, you can use no_copt to obtain forms that are closer to the original code.

Manipulation of Core Erlang forms

Once you have read the forms, it is possible to perform the transformation node by node, starting from the module node and propagating it to its subnodes. This way, you get full control over the transformation applied to each node, and over whether or not it is propagated to the node's subtrees.

However, if your transformation is going to focus on a particular type of node, I would recommend using the cerl_trees:map/2 function instead. cerl_trees:map/2 receives a function and a tree (in that order) and traverses the tree bottom-up, replacing each node with the result of applying the function to it.
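To compare both approaches, here is a minimal sketch of the node-by-node style, written by hand with cerl:subtrees/1 and cerl:update_tree/2. It behaves roughly like cerl_trees:map/2 (transform the subtrees first, rebuild the node, then apply the function to it), but since the recursion is explicit, you could add a case to stop descending into certain node types. The module manual_map and the bump_ints/1 transformation (add 1 to every integer literal) are hypothetical examples.

```erlang
-module(manual_map).
-export([map/2, bump_ints/1]).

%% Bottom-up traversal: transform every group of subtrees, rebuild the
%% node from the new subtrees, then apply F to the rebuilt node.
%% Leaves (literals, variables) have no subtrees.
map(F, Tree) ->
    case cerl:subtrees(Tree) of
        [] ->
            F(Tree);
        Groups ->
            NewGroups = [[map(F, T) || T <- Group] || Group <- Groups],
            F(cerl:update_tree(Tree, NewGroups))
    end.

%% Example transformation: add 1 to every integer literal node.
bump_ints(Tree) ->
    map(fun(Node) ->
                case cerl:is_c_int(Node) of
                    true -> cerl:c_int(cerl:int_val(Node) + 1);
                    false -> Node
                end
        end, Tree).
```

Note that integers nested inside a compound literal (e.g., the literal list [1,2]) are not visited, because a literal node is a leaf, whatever its value.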

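Suppose we have been asked to replace every appearance of the concatenation operator (here, a call to the ++/2 function) with a list constructor (i.e., convert [X] ++ Xs to [X|Xs]) in some module. To solve this problem, we could write:

%% Rewrite calls erlang:'++'([X], Xs) into [X|Xs]; leave other nodes as they are.
replace_concat(Node) ->
  case cerl:type(Node) of
    call ->
      Args = cerl:call_args(Node),
      case is_single_concat(cerl:call_module(Node), cerl:call_name(Node), Args) of
        true ->
          [FstArg, SndArg] = Args,
          Head = cerl:cons_hd(FstArg),
          cerl:c_cons(Head, SndArg);
        false -> Node
      end;
    _ -> Node
  end.

%% True only for erlang:'++'/2 when the first argument is a one-element list.
is_single_concat(Mod, Name, [FstArg, _SndArg]) ->
  cerl:is_c_atom(Mod) andalso cerl:atom_val(Mod) =:= erlang
    andalso cerl:is_c_atom(Name) andalso cerl:atom_val(Name) =:= '++'
    andalso cerl:is_c_cons(FstArg)
    andalso cerl:is_c_nil(cerl:cons_tl(FstArg));
is_single_concat(_Mod, _Name, _Args) ->
  false.

The replace_concat/1 function receives Node as the input and checks its type. Then, if Node turns out to be a call to erlang:'++'/2 whose first argument is a one-element list, we return a new list constructor node (created with cerl:c_cons/2) where:

  • Head is the head (i.e., the only element) of the first call argument.
  • SndArg is the second call argument.

Otherwise, we return Node as it is. Thus, a call like [1] ++ [2,3] will be replaced by [1|[2,3]] at the Core Erlang level. Note that we must take the head of the first argument; otherwise we would obtain [[1]|[2,3]] instead, a different result. For the same reason, calls whose first argument is not a one-element list are left untouched: for [1,2] ++ Xs, keeping only the head would silently drop the 2. Checking the module and function name with cerl:is_c_atom/1 before reading them also avoids a crash on calls where either of them is a variable.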

Now, we can use the cerl_trees:map/2 function to apply replace_concat/1 to each node in the Core Erlang forms we have obtained from the Erlang code (here, CoreForms):

TransForms = cerl_trees:map(fun replace_concat/1, CoreForms).

Clearly, this is just a simple example to illustrate how to manipulate Core Erlang forms, but we can think of more realistic ones: program instrumentation (for tracing or profiling), pattern matching transformations (for program analysis or optimization), etc.

From Core Erlang forms to code

In this step, we simply write the Core Erlang code associated with the transformed forms to a file. Here, we can rely on the cerl_prettypr:format/1 function to generate this code (a text string) from the transformed forms:

file:write_file("transformed.core",
                cerl_prettypr:format(TransForms)).
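If you want to see what cerl_prettypr:format/1 produces without compiling a real source file, a quick sketch is to build a tiny module by hand with cerl:c_module/4 and print it. The module print_core and the function answer/0 are made up for illustration.

```erlang
-module(print_core).
-export([tiny_module/0, to_text/1]).

%% Build a minimal Core Erlang module, roughly:
%%   module 'tiny' ['answer'/0] attributes [] 'answer'/0 = fun () -> 42 end
tiny_module() ->
    Name = cerl:c_fname(answer, 0),
    Fun = cerl:c_fun([], cerl:c_int(42)),
    cerl:c_module(cerl:c_atom(tiny), [Name], [], [{Name, Fun}]).

%% Forms -> Core Erlang source text (a string).
to_text(Forms) ->
    cerl_prettypr:format(Forms).
```

Printing the result with io:format("~s~n", [print_core:to_text(print_core:tiny_module())]) shows the module in the same textual syntax as the .core files generated by the compiler.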

I have created a gist for this example (up to this step) so that you can replicate it on your computers.

From Core Erlang code to BEAM bytecode

Compiling a Core Erlang file works the same way as compiling an Erlang file, except that you must specify +from_core as an option when you run the compiler:

erlc +from_core module.core

The same thing applies to compiling from the Erlang shell:

c('module.core', [from_core]).

Alternatively, you can again use the compile:file/2 function in your program, this time with the from_core option (and without binary), to compile the file that you have just written, thus automating every step of the metaprogramming procedure.
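Putting the four steps together, an automated version could look like the following sketch. It is not the code from my gist: the module core_pipeline and its run/3 function are hypothetical, Transform is any function of one node (replace_concat/1, for instance), and, unlike the suggestion above, it keeps the binary option in the last step so that the BEAM code can be loaded directly with code:load_binary/3 instead of being written to disk.

```erlang
-module(core_pipeline).
-export([run/3]).

%% Read -> modify -> write -> compile.
%% ErlFile: Erlang source; CoreFile: where the transformed code is written;
%% Transform: fun applied to every node with cerl_trees:map/2.
run(ErlFile, CoreFile, Transform) ->
    %% 1. Erlang source -> Core Erlang forms (in memory, unoptimized)
    {ok, _Mod, CoreForms} = compile:file(ErlFile, [to_core, binary, no_copt]),
    %% 2. Manipulate the forms
    TransForms = cerl_trees:map(Transform, CoreForms),
    %% 3. Forms -> Core Erlang code on disk
    ok = file:write_file(CoreFile, cerl_prettypr:format(TransForms)),
    %% 4. Core Erlang code -> BEAM bytecode, loaded into the running node
    {ok, Mod, Beam} = compile:file(CoreFile, [from_core, binary]),
    {module, Mod} = code:load_binary(Mod, CoreFile, Beam),
    Mod.
```

With the identity function fun(N) -> N end as Transform, the loaded module should behave exactly like the original one, which is a handy sanity check before plugging in a real transformation.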

Node annotations

Apart from its type and its subtrees, each node has another field dedicated to annotations. By default, annotations contain the name of the file and the corresponding line in the Erlang source code.

Node annotations can be quite useful in some cases. For instance, if we have to perform an analysis on some Core Erlang code before running a second-pass algorithm, we can use annotations to store any data that could be useful during the second pass.
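As a sketch of this two-pass idea, the first function below tags every call node with an annotation {visited, N} (a made-up term, N being a post-order counter) using cerl:add_ann/2 and cerl_trees:mapfold/3, and the second one reads those annotations back with cerl:get_ann/1. The module ann_passes and the visited tag are hypothetical; any term can be stored this way.

```erlang
-module(ann_passes).
-export([mark_calls/1, marked_calls/1]).

%% First pass: add {visited, N} to the annotations of every call node,
%% numbering the calls in post-order (inner calls first).
mark_calls(Tree) ->
    {Marked, _Count} =
        cerl_trees:mapfold(
          fun(Node, N) ->
                  case cerl:type(Node) of
                      call -> {cerl:add_ann([{visited, N}], Node), N + 1};
                      _ -> {Node, N}
                  end
          end, 0, Tree),
    Marked.

%% Second pass: collect the numbers stored in the annotations.
marked_calls(Tree) ->
    cerl_trees:fold(
      fun(Node, Acc) ->
              [N || {visited, N} <- cerl:get_ann(Node)] ++ Acc
      end, [], Tree).
```

Since the annotations travel with the nodes, the second pass does not need any extra table mapping nodes to analysis results.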

Other resources

I find that the most useful resources when working on Core Erlang metaprogramming are the Erlang docs for the compiler. Basically, these docs are the APIs for modules that we have already used (cerl, cerl_trees, etc.) and some others that can be useful in some cases (cerl_clauses, core_pp, etc.). However, most of the functions are not documented, and some testing is required to find out what they do.

My colleague Salvador Tamarit, who has been working on Erlang for a long time now, prepared some slides (with examples) about metaprogramming in both Erlang and Core Erlang some years ago:

Finally, if you are interested in manipulating Erlang code instead, smerl is a library that eases this process. However, I am not aware of similar libraries for Core Erlang metaprogramming. Perhaps I will write one at some point 😉

A Gentle Introduction to Core Erlang: Part 1

Back when I started my PhD (2.5 years ago), I was assigned the task of instrumenting Core Erlang code. Since then, we have used Core Erlang in several other projects, but people still ask me why we chose Core Erlang over Erlang.

The Core Erlang language

In the beginning, working with Core Erlang proved to be quite a hard task due to the lack of documentation, and I do not think it has improved much since then. These entries aim to provide a basic understanding of the Core Erlang language and its capabilities.

In particular, this post introduces the Core Erlang language.

Erlang made simpler

Core Erlang is a functional and concurrent programming language. In a nutshell, it is a complete language, but simpler than Erlang, since many Erlang constructs are just syntactic sugar from a compiler perspective. In fact, Core Erlang is so simple that we refer to it as a subset of Erlang in many of our works.

Generally, Core Erlang is used as an intermediate language by the Erlang compiler. In summary, Erlang code is translated to Core Erlang before the generation of BEAM bytecode. The image shown below illustrates this process. Here, blue elements are human-readable, and red ones are not.

Figure: Erlang code is translated to Core Erlang before the generation of BEAM bytecode.

Actually, this diagram is a simplified version of the one in Fonseca’s thread in the Elixir forum which, in turn, could be expanded to include other compilation stages.

Therefore, developers are not expected to write Core Erlang code directly. Instead, this code is usually generated, and any change you want to make can be achieved by manipulating the resulting data structure. Some of its features make it convenient for specific tasks, but impractical for most other applications.

From Erlang to Core Erlang

There are several ways to obtain the Core Erlang translation from an Erlang source file.

From the command line

If you only want to examine the generated Core Erlang code, this is the easiest method. Just type the following command in the command line:

erlc +to_core module.erl

The +to_core option makes the Erlang compiler generate the module.core file instead of module.beam. If you open module.core, you should be able to read the Core Erlang code translated from your module.

Alternatively, you can perform this translation from the Erlang shell. This method is similar to the previous one, but I find it more convenient when you have to pass a large set of options:

c('module.erl', [to_core]).

The result will be the same as in the previous case.

From an Erlang program

You can also get the associated Core Erlang code from within an Erlang program using the compile:file/2 function (its documentation lists the available options and possible outputs).

% binary: return the Core Erlang forms instead of writing module.core
case compile:file(File, [to_core, binary]) of
    {ok, _, CoreForms} ->
        CoreForms;
    _Other ->
        io:fwrite("Error: Could not compile file.~n", [])
end.

In this case, the variable CoreForms stores the forms that represent your code, and you can manage this data structure in order to perform analyses or modifications in your program (in the next part we will go into more detail about this).
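If your Erlang code is not in a file (for instance, it is generated by another program), a similar result can be obtained with compile:forms/2, after scanning and parsing the source text yourself. The following is a minimal sketch under that assumption; the module core_from_string and its helpers are hypothetical, and error handling is reduced to pattern matching on the expected results.

```erlang
-module(core_from_string).
-export([to_core/1]).

%% Erlang source (a string) -> Core Erlang forms.
%% compile:forms/2 always returns a binary result, like compile:file/2
%% with the binary option.
to_core(Source) ->
    Forms = parse_forms(Source),
    {ok, _Mod, CoreForms} = compile:forms(Forms, [to_core, binary]),
    CoreForms.

%% Scan the whole string, then parse one form (ending in a dot) at a time.
parse_forms(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    split_forms(Tokens, []).

split_forms([], Acc) ->
    lists:reverse(Acc);
split_forms(Tokens, Acc) ->
    {FormTokens, Rest} = take_form(Tokens, []),
    {ok, Form} = erl_parse:parse_form(FormTokens),
    split_forms(Rest, [Form | Acc]).

take_form([{dot, _} = Dot | Rest], Acc) ->
    {lists:reverse([Dot | Acc]), Rest};
take_form([Token | Rest], Acc) ->
    take_form(Rest, [Token | Acc]).
```

The result is a module node, so the cerl functions (cerl:module_name/1, cerl:module_defs/1, etc.) can be applied to it right away.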

An example

Let’s consider the well-known factorial example, which can be written in Erlang as:

fact(0) -> 1;
fact(N) -> N * fact(N-1).

We can obtain the corresponding Core Erlang code by using any of the methods previously mentioned.

'fact'/1 =
    fun (_@c0) ->
	case _@c0 of
	  <0> when 'true' ->
	      1
	  <N> when 'true' ->
	      let <_@c1> =
		  call 'erlang':'-'
		      (N, 1)
	      in  let <_@c2> =
		      apply 'fact'/1
			  (_@c1)
		  in  call 'erlang':'*'
			  (N, _@c2)
	end

Even though this is a basic example, you can already see why Core Erlang is much simpler. Here are some things that happen when you translate Erlang into Core Erlang:

  • Pattern matching, wherever it occurs in Erlang (e.g., in function heads), is moved into case statements.
  • Guards are added to every clause in a case statement (when 'true' if the original clause had none).
  • Function calls (to external modules) and function applications (calls to the same module) have a different syntax.
  • Function calls (including built-in functions) are fully qualified.

Besides, a catch-all clause (i.e., a clause with a pattern <X> when 'true') is added to case statements that do not include one.
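The different syntax for calls and applications also shows up in the AST: they are separate node types. As a sketch (the module calls_vs_applies is hypothetical), the following function collects fully qualified calls as {Module, Function, Arity} and local applications as {Function, Arity}; anything whose target is not a literal name or a function name is reported as dynamic.

```erlang
-module(calls_vs_applies).
-export([classify/1]).

%% Return {Calls, Applies} for all call and apply nodes in Tree.
classify(Tree) ->
    cerl_trees:fold(
      fun(Node, {Calls, Applies}) ->
              case cerl:type(Node) of
                  call ->
                      Key = {name_of(cerl:call_module(Node)),
                             name_of(cerl:call_name(Node)),
                             cerl:call_arity(Node)},
                      {[Key | Calls], Applies};
                  apply ->
                      Op = cerl:apply_op(Node),
                      Key = case cerl:is_c_fname(Op) of
                                true -> {cerl:fname_id(Op), cerl:fname_arity(Op)};
                                false -> {dynamic, cerl:apply_arity(Node)}
                            end,
                      {Calls, [Key | Applies]};
                  _ ->
                      {Calls, Applies}
              end
      end, {[], []}, Tree).

%% Literal module/function names are read; variables become 'dynamic'.
name_of(Node) ->
    case cerl:is_literal(Node) of
        true -> cerl:concrete(Node);
        false -> dynamic
    end.
```

Applied to the let expressions in the fact/1 example above, this would report the calls to erlang:'-'/2 and erlang:'*'/2 and the application of fact/1.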

These changes make Core Erlang more convenient for certain tasks. For instance, program transformation is easier to perform on Core Erlang programs, since there are fewer constructs and pattern matching always occurs in case statements.

Other resources

If you want to read more about the Core Erlang language, you can check these resources:

Besides, I cannot finish without mentioning Kofi Gumbs' post, The Core of Erlang. I found it when I started writing my own, and I used it as a reference while writing this post.

There, he goes into more detail on some points, but I wanted to keep this introduction as simple as possible. In the next part, I intend to go a bit further and explain how to manipulate Core Erlang programs in Erlang.

Hello, World!

Welcome to my blog! I am Adrian, a PhD student from Valencia, Spain.

I have wanted to start this blog for a long time, and I have finally found the time to set it up and write a few posts. I decided to begin working on it after reading several times about the benefits of blogging for PhD students. In summary, as a PhD student, blogging allows you to:

  • Practice and improve your writing skills.
  • Disseminate your work.
  • Talk about what you like from your research.
  • Make your CV look nicer.

Given these, one could say that blogging should be mandatory if you care about your research.

In this blog I intend to write mostly about the practical aspects of my research, which are usually summarized or omitted in our papers. Here, I have the chance to discuss them with a larger audience and collect feedback from both the professional and academic communities.

My research focuses on the Erlang programming language. The name of the blog, Trial & Erlang, is a reference to the “Trial & Error” method for solving problems. This is the method I tend to use whenever we start a new Erlang project where I have to employ some library with little or no documentation at all (something too common in the life of an Erlang developer). Thus, I also intend to write some instructive posts, so that other users can profit from my experience.

I hope you enjoy my writing and find the posts here interesting. Do not miss the chance to comment, even if it’s just to say “hi”!