
So many notations

One of the things I like about Perl is its enthusiasm for embedding multiple little languages into the main language. Actually it’s more like they are all woven together. Erlang shares a bit of that, though it’s more ad hoc. The result is that when you start reading Erlang code you discover that all these different things are going on at the same time.

There is the usual code vs. comments. There is no multiline comment syntax. Comments begin with a percent sign and run to the end of the line. By a convention that all the editing modes enforce, one, two, and three percent signs mark comments at the level of the line, the statement, and the file respectively. File comments tend to be implicitly associated with the code that follows.

The comments are often written in a stylized documentation language. That documentation language has a sublanguage that is wiki-like, and some tags that document data types and function signatures. As far as I know, the things you say in the documentation about types and function signatures have no effect on the behavior of the program, i.e. as usual your comments can be lies.

In a manner reminiscent of the C preprocessor’s use of hash, as in #define or #ifdef, the Erlang compiler uses a prefix dash on tags to mark up the code. Unlike the comments, this markup does affect behavior. In addition to the kinds of things C does with its preprocessor (simple macros, conditional compilation, include files), the Erlang compiler uses these to define records, do some limited type declarations, handle imports and exports, and assert metadata about the file being compiled (e.g. version, author, module, etc.). And finally this is where the behaviour mechanism lives.

Together this makes simple Erlang programs much more complex to read.  The simplest program will usually have one or two statements in each of 3 notations: doc, Erlang, compiler annotations.  And both the doc and compiler annotations have their own embedded micro languages for special functions.
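To make the three notations concrete, here is a small sketch of my own. The module name notations_demo, the circle record, and the area function are all invented for illustration; none of it comes from a real library.

```erlang
%%% File-level comment (three percent signs).
%%% A made-up module showing the three notations side by side.

%% Compiler annotations: a leading dash and a trailing period.
-module(notations_demo).
-author("someone").
-export([area/1]).

-define(PI, 3.14159).               % a simple macro, used as ?PI
-record(circle, {radius = 1.0}).    % a record definition

%% Documentation notation: EDoc tags inside ordinary comments.
%% @doc Returns the area of a `#circle{}' record or a `{rect, W, H}' tuple.
-spec area(#circle{} | {rect, number(), number()}) -> number().
area(#circle{radius = R}) ->        % plain Erlang from here on
    ?PI * R * R;                    % line-level comment (one percent sign)
area({rect, W, H}) ->
    W * H.
```

Even in a dozen lines you are reading doc tags, dash-prefixed attributes, a macro, a record, and ordinary function clauses.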

[Image: erlangNotes]

I don’t know what’s going on in that example with the “%% userdevguid-begin rt_simple:types” line.

Type Structure

Let’s be honest, Erlang’s type system is a bit of a hack. Its basic type system reminds me of Lisp circa 1970. It’s dynamically typed, of course, and there is a suite of built in types (integers, floats, process ids, etc.), including some composite types (lists, tuples, etc.). Everything else is built out of these basic types and, to tell the truth, there isn’t any support for extending the set of types. There are two schemes to gloss over that (records and behaviours). What’s striking is the lack of support for classes or object oriented programming. If Erlang had such support then some of the built in classes (process identifiers, references) might have been …

You can get a sense of the built in types from this code in Erlang’s own term printer. It’s not actually printing; rather it is converting an Erlang term into a data structure in preparation for printing. But it also helps to reveal some of the rough edges.

Erlang does not have a string or character type.  It papers over this with some syntactic sugar on input and some heuristics in the term printer.  The handling of unicode is what it is.
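A minimal sketch of what that means in practice. The module name string_demo is made up; io_lib:printable_list/1 is the standard library’s “does this look like text” check, which I am using here to stand in for the printer’s heuristic.

```erlang
-module(string_demo).
-export([facts/0]).

%% Every element of the returned list is true.
facts() ->
    [ "abc" =:= [97, 98, 99],                  % a string literal is a list of integers
      $a =:= 97,                               % a character literal is an integer
      is_list("abc"),                          % so the list predicate accepts a string
      io_lib:printable_list([104, 105]),       % these integers look like text ("hi")
      not io_lib:printable_list([1, 2, 3]) ].  % these do not, so they print as a list
```

So whether a list of small integers shows up as text or as numbers depends on what the integers happen to be.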

There is a lot of mechanism around modules.  They provide namespaces, and compilation units, and stuff for loading and versioning code.  But modules are not a built in type, instead they are always denoted by their name, an atom.

Behaviours, one of the two schemes that help to compensate for the primitive type system, work by mixing modules together. Behaviours remind one, a bit, of abstract classes, i.e. where an abstract class is never instantiated but instead subclasses complete the abstract class and are then instantiated. Erlang’s behaviours work much the same way. A behaviour module defines an abstract behaviour (say a server framework) and then later modules flesh that behaviour out by defining N functions that specify the details. Possibly the abstract class analogy is too nice; these are more like the callbacks seen in GUI frameworks.

Records are implemented on tuples at compile time, so they are mostly just syntactic sugar, sugar that the compiler and documentation tools support. The interpreter therefore knows nothing about them unless you explicitly load up some knowledge to help it (search for “record” here). The shell’s init files are your friend.
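Here is a small sketch of the record-is-a-tuple point. The module record_demo and the point record are invented for the example.

```erlang
-module(record_demo).
-export([demo/0]).

-record(point, {x = 0, y = 0}).

demo() ->
    P = #point{x = 1, y = 2},
    Tuple = {point, 1, 2},
    {P =:= Tuple,            % the record is exactly this tagged tuple
     P#point.x,              % field access by name...
     element(2, P),          % ...is the same as positional access
     is_record(P, point),    % checks the tag and the size
     tuple_size(P)}.         % the tag plus two fields
```

In the shell, rr(record_demo). loads the record definitions from the compiled module, after which such tuples print as #point{...} rather than as bare tuples.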

There are type predicates for the individual built in types (is_atom, is_binary, is_bitstring, is_float, is_function, is_integer, is_list, is_pid, is_record, is_reference, is_tuple), but no function which tells you the type of the term you have in hand.
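If you want such a function you end up writing it yourself out of the predicates. A rough sketch, where the module and function names are my own and the set of clauses is just the types mentioned in these notes plus is_boolean and is_port; anything else falls through to unknown.

```erlang
-module(type_of_demo).
-export([type_of/1]).

%% Clause order matters where one type is a special case of another.
type_of(T) when is_boolean(T)   -> boolean;    % before atom: true and false are atoms
type_of(T) when is_atom(T)      -> atom;
type_of(T) when is_integer(T)   -> integer;
type_of(T) when is_float(T)     -> float;
type_of(T) when is_binary(T)    -> binary;     % before bitstring: binaries are bitstrings
type_of(T) when is_bitstring(T) -> bitstring;
type_of(T) when is_list(T)      -> list;       % includes strings
type_of(T) when is_tuple(T)     -> tuple;      % includes records
type_of(T) when is_function(T)  -> function;
type_of(T) when is_pid(T)       -> pid;
type_of(T) when is_port(T)      -> port;
type_of(T) when is_reference(T) -> reference;
type_of(_)                      -> unknown.
```

Note that it cannot tell a string from a list or a record from a tuple, which is the point of the two preceding paragraphs.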

Note that the messages passed between Erlang processes are not their own type, they are just arbitrary terms.

Note that there are no built in types that provide for introspection, i.e. arbitrary memory pointers, or pointers for inspecting stack frames.

Erlang does not have ratios or complex numbers, at least not out of the box. Integers are arbitrary precision, i.e. they behave as bignums.

Process State

This post is an attempt to tease out all the state of an Erlang process.

The overarching design pattern used to organize Erlang processes is known as the supervision tree. The interior nodes in that tree are supervisors while the leaves are workers. Supervisors concern themselves with reliability and scaling, while the workers consume most of the cycles. A nice feature of this design is that workers can often be written so they just commit suicide when things go wrong. But supervision trees deserve their own discussion; what’s described here includes the raw materials from which that scheme is built.

Each process includes:

  • a process id
  • the node the process, along with others, resides within
  • a status which is one of runnable, running, waiting, suspended
  • a bundle of process information which can be read for introspective purposes
  • its call stack
  • an incoming message Q
  • an optional name, i.e. an atom; such processes are said to be registered
  • an optional globally registered name.
  • a rarely used associative store known as a dictionary
  • symmetric links to zero or more other processes which configure how to tear things down when a process exits
  • zero or more monitors; when a monitored process exits a message is sent to any process monitoring it.
  • zero or more timers
  • a rarely used flag, trap_exit, that converts exit signals into a message to the process.
  • a rarely used flag, error_handler, which provides a hook to catch and resolve errors prior to the stack unwinding.  By default this is used to load code.
  • a min_heap_size (another flag)
  • the integer save_calls flag provides for capturing a debug trace
  • a priority, one of low, normal, high, or max, the last being reserved for the runtime system (another flag)
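Much of the list above can be read back with the process_info BIF. A minimal sketch, in which the module name, the registered name demo_worker, and the dictionary key color are all invented; the sleep is a crude way to let the worker reach its receive before we look.

```erlang
-module(proc_state_demo).
-export([inspect/0]).

inspect() ->
    Pid = spawn(fun() ->
                    put(color, blue),          % one entry in the process dictionary
                    receive stop -> ok end     % then block, ignoring other messages
                end),
    register(demo_worker, Pid),                % give it a (local) registered name
    Pid ! hello,                               % never matched, so it stays queued
    timer:sleep(50),                           % crude: let the worker settle
    Info = process_info(Pid, [registered_name, status, message_queue_len,
                              dictionary, links, monitored_by,
                              trap_exit, priority, min_heap_size]),
    Pid ! stop,                                % let the worker finish
    Info.
```

The result is a list of {Item, Value} pairs in the order requested. A freshly spawned process like this one has no links, is not trapping exits, and runs at normal priority.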

A simple erlang node has 24 processes, 15 of which are named; but no globally registered processes.

The inter-process links are bidirectional and may cross between Erlang nodes. If a process exits these links work to assure that all the reachable processes also exit, but that can be frustrated by setting the trap_exit flag. A simple spawn creates a process with no links, but such processes are rare; in practice nearly all processes will have one or more links.

Monitors are directional and only trigger a message when the monitored process exits.  They also can cross between Erlang nodes.
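A small sketch contrasting the two, using only BIFs. The module and function names are made up, and the exit reason boom is arbitrary.

```erlang
-module(link_monitor_demo).
-export([monitor_demo/0, link_demo/0]).

%% A monitor is one-way: the watcher receives a 'DOWN' message and nothing else happens.
monitor_demo() ->
    {Pid, Ref} = spawn_monitor(fun() -> exit(boom) end),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> {down, Reason}
    after 1000 -> timeout
    end.

%% A link is two-way. With trap_exit set, the exit signal arrives as an
%% 'EXIT' message instead of taking the calling process down as well.
link_demo() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(boom) end),
    Result = receive
                 {'EXIT', Pid, Reason} -> {exit_signal, Reason}
             after 1000 -> timeout
             end,
    process_flag(trap_exit, Old),              % restore the previous setting
    Result.
```

Without the trap_exit flag, the second function would kill its caller, which is exactly the tear-everything-down behavior links exist to provide.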

At first blush the dictionary looks to be Erlang’s version of thread local storage. But since most Erlang programs strive to be written in a functional style the dictionary is rarely used. Instead, the typical long lived Erlang process will have its persistent state kept in a data structure, and at each heartbeat of the process that data structure will be replaced with an incrementally modified fresh version which is then passed on to the next cycle. For example:

server(State) ->
    receive
        %% A one-way notification: fold it into the state.
        {From, Info} ->
            NextState = process_info(From, Info, State),
            server(NextState);
        %% A query: reply to the sender, tagged with its reference.
        {From, Ref, Query} ->
            {Reply, NextState} = process_query(From, Query, State),
            From ! {Ref, Reply},
            server(NextState)
    end.

In that example the server’s state is bound to the variable State. The state for the following heartbeat is bound to NextState. The server function is crafted to assure that it loops to the next heartbeat by virtue of a tail recursive call.

Patterns like this are common enough that Erlang’s support libraries have abstractions for them; see for example the generic server behaviour. These abstractions, along with the supervision tree design pattern, mean that Erlang programmers rarely use links, monitors, spawn, etc. directly, except when debugging.
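For comparison, here is roughly what the same state-threading loop looks like when handed over to gen_server. This is a sketch of my own: the module counter_server and its bump/value API are invented, and only the callbacks this example needs are defined.

```erlang
-module(counter_server).
-behaviour(gen_server).

-export([start_link/0, bump/1, value/1]).          % client API (made-up names)
-export([init/1, handle_call/3, handle_cast/2]).   % gen_server callbacks

start_link() -> gen_server:start_link(?MODULE, 0, []).
bump(Pid)    -> gen_server:cast(Pid, bump).        % one-way message
value(Pid)   -> gen_server:call(Pid, value).       % request and wait for the reply

%% Each callback returns the next state; gen_server does the looping.
init(Start) ->
    {ok, Start}.

handle_call(value, _From, State) ->
    {reply, State, State}.

handle_cast(bump, State) ->
    {noreply, State + 1}.
```

The receive, the reply tagging, and the tail call have all disappeared into the library; what is left is just the function from message and state to next state.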

The Call Stack

Each of Erlang’s lightweight processes has, unsurprisingly, a call stack. This stack grows by function calls and it unwinds in one of four ways.

  • Single values are returned from functions
  • The process exits: e.g. exit({ok, Result}).
  • Throwing a value: e.g. throw({error, key_missing, Key}).
  • Signaling an error: e.g. erlang:error(badarg).

As is conventional, the latter means tend to be used more for errors.

There are a few syntactic forms that allow you to assure that you can gain control as the stack is unwound. For example:

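try do_activity()
catch
    throw:Term -> Term;
    %% On OTP 21 and later write error:Reason:Stack and use Stack instead;
    %% erlang:get_stacktrace() is deprecated there.
    error:Reason -> {'EXIT',{Reason,erlang:get_stacktrace()}};
    exit:Reason -> {'EXIT',Reason}
end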

The result of that expression is the value of do_activity(), or, if one of the three latter cases occurs, the failure is coerced into a value, as if do_activity() had returned the value shown. The use of colon, as in throw:, is a bit odd; that’s not a module name. Are the error:Reason cases only signaled by the runtime system?

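Note how a wildcard catch with no class, as in catch _ ->, does not catch exit:Reason. A pattern written without a class is shorthand for throw:Pattern, so it catches only thrown values; likewise a clause for one class ignores the others.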

1> spawn(fun() -> try exit(bye) catch _ -> io:format("hi~n", []) end end).
<0.115.0>
2> spawn(fun() -> try exit(bye) catch exit:_ -> io:format("hi~n", []) end end).
hi
<0.117.0>

3> spawn(fun() -> try throw(bye) catch exit:_ -> io:format("hi~n", []) end end).
<0.119.0>
4>
=ERROR REPORT==== 24-Aug-2009::10:26:20 ===
Error in process <0.119.0> with exit value: {{nocatch,bye},[{erl_eval,expr,3}]}

5> spawn(fun() -> try throw(bye) catch throw:_ -> io:format("hi~n", []) end end).
hi
<0.121.0>
6>

The try expression shown earlier is equivalent to the older (original?) syntax: catch do_activity().
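To pull the classes apart outside the shell, here is a small sketch. The module and function names are invented; each function takes a fun to run so the three ways of failing can be compared.

```erlang
-module(catch_classes_demo).
-export([classify/1, wildcard/1, old_catch/1]).

%% Report how the fun finished: a normal return or one of the three classes.
classify(F) ->
    try F() of
        Value -> {returned, Value}
    catch
        throw:T -> {thrown, T};
        exit:R  -> {exited, R};
        error:E -> {errored, E}
    end.

%% A pattern with no class means throw:Pattern, so only throws are caught here;
%% exits and errors pass straight through to the caller.
wildcard(F) ->
    try F()
    catch
        _ -> caught
    end.

%% The older form: every outcome comes back as an ordinary value.
old_catch(F) ->
    catch F().
```

With the old form a thrown value is indistinguishable from a returned one, and exits and errors both arrive wrapped in an 'EXIT' tuple, which is presumably why try was added.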

Erlang does not support multiple value returns, as seen in Common Lisp say. So only a single value is passed back via return/throw/exit. But it does use the usual workaround of passing back small short lived data structures. Typically these are tuples.

Return, rather than throw, is often used for error results. Various conventions are widely used to help distinguish these errors. See these examples, where the atom ok is used with or without a tuple. {error, …} is common as a way to signal that the value being sent back to the caller(s) denotes a problem.
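A minimal sketch of that convention as I understand it. The module, the function names, and the key_missing reason are my own choices, not taken from any particular library.

```erlang
-module(result_demo).
-export([lookup/2, describe/2]).

%% Success is {ok, Value}; failure is {error, Reason}. Nothing is thrown.
lookup(Key, Pairs) ->
    case lists:keyfind(Key, 1, Pairs) of
        {Key, Value} -> {ok, Value};
        false        -> {error, {key_missing, Key}}
    end.

%% The caller pattern-matches on the tag rather than catching anything.
describe(Key, Pairs) ->
    case lookup(Key, Pairs) of
        {ok, Value}     -> io_lib:format("~p = ~p", [Key, Value]);
        {error, Reason} -> io_lib:format("failed: ~p", [Reason])
    end.
```

A caller that does not want to handle the failure can simply write {ok, V} = lookup(Key, Pairs) and let the mismatch turn into an error, which fits the let-it-crash style.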

I have not found a good discussion of the coding conventions for error vs. usual value returning, and in the code I’ve read it seems a bit ad hoc. There is some discussion of errors more generally here and here. It is notable that those highlight that in addition to the return/throw/exit choices you also have the option of packing the error handling into your inter-process architecture, i.e. letting worker processes error and exit while supervisor processes deal with the consequences. This posting is about what happens on the call stack of a single Erlang process, but that does highlight that messages and signals passed between processes have very similar return/error/signal issues for which conventions are needed.

In addition to catch, Erlang also has syntax to encourage cleaning up when the stack is unwound.

borrow_tool(),
try use_tool()
after return_tool()
end.

Many Moving Parts

A real world Erlang system will have far more moving parts than most programmers are used to.

  • Multiple machines
  • Multiple operating system processes on those machines
    • One epmd process on each machine
    • One or more erlang nodes
      • Most nodes are running the erlang bytecode interpreter known as beam.
        • An optional monitoring process (a heartbeat) for each beam node.
      • Nodes in other languages which implement the inter-node protocol.
        • A common example is inet_gethost used for DNS lookups
  • Inside of each beam node are many erlang processes, often thousands.

For example, on one of my machines I’m running two Erlang systems: ejabberd and rabbitmq. This machine happens to be running FreeBSD as its operating system. Two FreeBSD users (known as ejabberd and rabbitmq) run these two systems. So if we look at their processes we see this:


[bhyde@elm ~]$ ps -axugwwU rabbitmq,ejabberd | sort
USER       PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
ejabberd   919  0.0  0.1  1448   880  ??  S    Sun05PM   0:01.67 /usr/local/lib/erlang/erts-5.6.5/bin/epmd -daemon
ejabberd   920  0.0  2.8 37984 28796  ??  S    Sun05PM   0:45.78 [beam]
ejabberd  1165  0.0  0.1  1428   680  ??  Is   Sun05PM   0:00.04 inet_gethost 4
ejabberd  1166  0.0  0.1  1532   864  ??  I    Sun05PM   0:00.03 inet_gethost 4
ejabberd  1167  0.0  0.1  1532   868  ??  I    Sun05PM   0:00.03 inet_gethost 4
ejabberd  3082  0.0  0.1  1532   888  ??  I    Sun06PM   0:00.05 inet_gethost 4
ejabberd 49972  0.0  0.1  1532   912  ??  I     7:22AM   0:00.00 inet_gethost 4
rabbitmq   969  0.0  1.2 17600 12112  ??  Ss   Sun05PM   0:30.35 [beam]
rabbitmq   982  0.0  0.1  1432   680  ??  Is   Sun05PM   0:00.01 inet_gethost 4
rabbitmq   983  0.0  0.1  1468   848  ??  I    Sun05PM   0:00.00 inet_gethost 4
[bhyde@elm ~]$

There is only a single epmd, which everybody uses to rendezvous; presumably the system of the user who owns it just happened to be started first.

My guess for why there are so many inet_gethost processes? Well I’m assuming they never die; and when ejabberd and rabbitmq started up they had a few DNS lookups they needed to do.

Each of those beam Erlang nodes has a couple of dozen Erlang processes inside of it.


[bhyde@elm ~]$ echo 'i().' | erl -sname foo1 -remsh rabbitmq | grep -c '^<'
28
[bhyde@elm ~]$ echo 'i().' | erl -sname foo1 -remsh ejabberd | grep -c '^<'
28
[bhyde@elm ~]$ echo 'i().' | erl -sname foo1 -remsh ejabberd | tail
<0.30.0>              kernel_config:init/1                   233       49    0
                      gen_server:loop/6                        9
<0.31.0>              supervisor:kernel/1                    233       65    0
kernel_safe_sup       gen_server:loop/6                        9
<0.35.0>              erlang:apply/2                        1597    10318    0
                      rpc:local_call/3                        49
Total                                                      31487   279423    0
                                                             255
ok
(foo1@elm)2> *** Terminating erlang (foo1@elm)
[bhyde@elm ~]$

By the way, in that last transcript I invoked erl -sname foo1 -remsh … a few times. That fired up yet another Erlang beam process each time. In this case those ran under my FreeBSD user account (bhyde). They chatted with epmd to find the node named after the -remsh argument, ejabberd for example. At that point they created a remote shell into that node. The ‘i().’ command asked for a list of the Erlang processes in the node.

Punctuation

It’s a source of perverse amusement to pick apart how differing languages use the punctuation characters. Of course APL gets the prize for the most audacious use of punctuation.

For example Erlang uses the exclamation mark (aka bang) as a binary operator for sending a message to a process.  Language designers, particularly in mature languages, tend to use up all the punctuation characters in short order.  In Erlang the tilde character (i.e. ~) is used inside of format strings; but surprisingly it doesn’t appear to be used otherwise.

Of course single punctuation characters can be combined to get yet more punctuation. Recently it’s become a bit of a fad to use multiple quotes to provide different flavors of strings. Erlang, being an older language, hasn’t done that yet. But it does have ->, <-, and <= for pattern matching.
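Here is a tiny sketch that exercises the punctuation mentioned so far in one function. The module name punct_demo and the note message tag are made up.

```erlang
-module(punct_demo).
-export([demo/0]).

demo() ->
    Doubled = [X * 2 || X <- [1, 2, 3]],        % <- draws elements from a list
    Bytes   = [B || <<B>> <= <<10, 20, 30>>],   % <= draws segments from a binary
    self() ! {note, Doubled},                   % ! sends a message, here to ourselves
    Got = receive {note, L} -> L end,           % -> separates a pattern from its body
    %% ~ introduces a directive inside a format string
    Text = lists:flatten(io_lib:format("~p and ~w", [Got, Bytes])),
    {Doubled, Bytes, Text}.
```

Beware that <= with the arrow pointing left is the binary generator, while less-than-or-equal is spelled =< in Erlang.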

I’ve got a little table where I’ve made a stab at collecting all these, it’s over in the associated wiki.

Offering Load

This blog, Offering Load, is some notes on Erlang, the programming language.

I’m hoping that a somewhat more organized approach to learning Erlang will help strip away some of my confusion.