LaTeX is a fantastic tool for typesetting, but there seems to be a serious gap in the documentation for using it to lay out programming language semantics. Moreover, getting started is often opaque, obtuse, or generally just difficult. This guide is intended to alleviate some of that annoyance and to describe how LaTeX can be used to quickly and easily lay out language semantics.
There is a repository on GitHub that contains the LaTeX I'm going to be presenting here, if you'd like to follow along.
Getting Started
First, you’re going to want a few packages, including proof, amsmath, listings, amsthm, and amssymb. These should have come with your LaTeX install.
Now, we start our document in the usual way:
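The repository's preamble will differ in its details, but a minimal sketch looks something like this (the exact displayskip lengths are a matter of taste):

```latex
\documentclass{article}

% defs.tex pulls in every package we need and holds all of the
% language definitions, semantic step definitions, and so on.
\input{defs}

\begin{document}

% Trim the default spacing around display-math environments.
% (Set after \begin{document}, since the size commands run at
% document start would otherwise reset these lengths.)
\setlength{\abovedisplayskip}{4pt}
\setlength{\belowdisplayskip}{4pt}

% ... paper body ...

\end{document}
```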
Notice the input here: we’ll use a definition file to keep us sane while defining new forms and structures for our language. This definition file will contain all of the language form definitions, semantic step definitions, etc., organized as a sort of reference guide with easy extensibility. (It will also perform all of the package requirements we may need.)
The two displayskip definitions are for using math display environments (which we will leverage a lot) without a ton of extra space floating around.
Defining a Language
Next, we begin by defining a few helpers for language definition in defs.tex. These helpers are meant primarily for layout and ease of writing later: the alternatives definition will allow us to properly lay out our semantics, while the compiler operator is just a nice extra. The \inlinexp and \inlinexpa definitions are for placing expressions in the middle of prose, which we'll get to later.
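Here is a sketch of what such a helper file might look like; \alt and \compile are my stand-in names for the alternatives and compiler operators, while \inlinexp and \inlinexpa match the names above:

```latex
% defs.tex -- all package requirements live here.
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{proof}
\usepackage{listings}
\usepackage{xspace}

% Alternatives separator for grammar productions.
\newcommand{\alt}{\;\big|\;}
% A "compiles to" operator; just a nice extra.
\newcommand{\compile}{\rightsquigarrow}
% Inline expressions in the middle of prose.
\newcommand{\inlinexp}[1]{\ensuremath{#1}\xspace}
% As above, but for the array-based (multi-line) forms.
\newcommand{\inlinexpa}[1]{\ensuremath{\begin{array}[t]{l}#1\end{array}}\xspace}
```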
With these forms defined, we turn to our language forms, first in defs.tex and then as a reference table in the main document. For this example, I will be constructing a lambda calculus with a few extra terms.
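The repository has the full set; here is a sketch of such definitions, following the naming conventions discussed below (\thenline and \elseline are the names referenced later, while the rest are illustrative):

```latex
% Keywords, abstracted so syntax changes propagate everywhere.
\newcommand{\ifop}{\mathtt{if}}
\newcommand{\thenline}{\mathbin{\mathtt{then}}}
\newcommand{\elseline}{\mathbin{\mathtt{else}}}
\newcommand{\letop}{\mathtt{let}}
\newcommand{\inop}{\mathbin{\mathtt{in}}}

% Values.
\newcommand{\truev}{\mathtt{true}}
\newcommand{\falsev}{\mathtt{false}}

% Expressions, each in a single-line and an array (multi-line) variant.
\newcommand{\lame}[2]{\lambda #1.\,#2}
\newcommand{\lamea}[2]{\begin{array}[t]{l}\lambda #1.\\ \quad #2\end{array}}
\newcommand{\ife}[3]{\ifop\;#1 \thenline #2 \elseline #3}
\newcommand{\ifea}[3]{\begin{array}[t]{l}\ifop\;#1\\ \thenline #2\\ \elseline #3\end{array}}
\newcommand{\lete}[3]{\letop\;#1 = #2 \inop #3}
\newcommand{\letea}[3]{\begin{array}[t]{l}\letop\;#1 = #2\\ \inop #3\end{array}}
```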
A few things may stick out immediately:
- There are multiple definition forms for if, let, and lambda. This is intentional: sometimes I want a lambda laid out on a single line, but while writing a paper it may be convenient to add a line break. This can get truly messy if done inside an existing array environment, and the guesswork for spacing can be a nightmare. With a macro, you need only work out the layout once (array environments nest well).
- The keyword operators are all extracted into their own macros. Since we are sometimes using multiple possible language definition forms for typesetting, it is important to abstract out the keywords. Then if we want to change the syntax of if from Haskell-style if E then E else E to Scheme's if E E E, we need only modify the \thenline and \elseline commands to have our changes propagate.
- The names all end with funny letters. Since LaTeX lacks a type system, I find it helpful to keep track of which operators belong to which part of my language. To this end, expressions end with e (or ea for array-based layouts), values end with v, and operators all end with op. Pick a naming suffix or prefix and stick with it. (I prefer the suffix for later readability.)
With these forms in mind, we now turn to constructing our first piece of LaTeX: the language definition. We'll do this with an array environment.
I'm assuming you're at least a little familiar with math environments and arrays [2, 3]. If you're not, the quick explanation is that \[ and \] enclose a display-math environment, and our array environment has three columns, which are right-aligned, center-aligned, and left-aligned respectively.
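Putting it together with the sketch macros from above, the language definition might look something like this:

```latex
\[
\begin{array}{rcl}
e & ::= & x \alt v \alt e~e \alt \ife{e}{e}{e} \alt \lete{x}{e}{e} \\
v & ::= & \truev \alt \falsev \alt \lame{x}{e}
\end{array}
\]
```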
The important draw here, though, is the language forms: they simply use our definitions. We have, in essence, used the LaTeX macro system to build a sort of “domain-specific language” that allows us to (more or less) write the language we are describing. Furthermore, consistently using these definitions means that changing the syntax is as easy as changing the definition and rebuilding the file.
As an aside, the application rule should seem a bit dissatisfying if you have been following along: until now, I have insisted that every piece I build be modular and abstracted, but here I reduce application to ~. This is, for me, a matter of convenience: I most often typeset languages that use juxtaposition for application, and so do not often define an application operation. If I were working with an OO language or the like, I would define something such as \newcommand{\invoke}[3]{#1.#2(#3)}.
Building Semantic Definitions and Reductions, Part 1
With language definitions in place, we turn our focus to semantic reductions. For this purpose I am going to construct a call-by-value reduction model, first with denotational semantics and then with contexts and a reduction relation.
For the denotational approach, we will use inference-style judgments. To do this, we will first construct a reduction relation and a small layout helper in the definition file:
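A sketch of those definitions (the \Downarrow symbol and the \subst name are my choices; \dstep and \iand match what follows):

```latex
% Denotational (big-step) reduction relation.
\newcommand{\dstep}{\;\Downarrow\;}
% Spacer between premises in a judgment.
\newcommand{\iand}{\qquad}
% Substitution: replace #3 with #2 in #1.
\newcommand{\subst}[3]{#1[#2/#3]}
```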
Now we're ready to write the full set of denotational rules, using the substitution command for substitution and \dstep for denotational reductions.
For the reduction rules, we have two layout options:
- We can use a gather* environment to allow LaTeX to flow our definitions.
- We can use an array environment to precisely control how the rules get laid out.
I'm partial to the latter for papers where space is a problem, because it allows us to strictly control the space our rules take up. That said, I am going to demonstrate gather* when writing the denotational semantics to give you a feel for each. (Without the asterisk, the gather environment numbers the lines, which I find distasteful.) Here is a definition of our language semantics:
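Using the proof package's \infer command, an abbreviated sketch of those rules might look like:

```latex
\begin{gather*}
% Values reduce to themselves; the optional argument names the rule.
\infer[\textsc{D-Val}]{v \dstep v}{} ~
\infer{\ife{e_1}{e_2}{e_3} \dstep v}
      {e_1 \dstep \truev \iand e_2 \dstep v} \\
\infer{e_1~e_2 \dstep v}
      {e_1 \dstep \lame{x}{e} \iand e_2 \dstep v' \iand
       \subst{e}{v'}{x} \dstep v \iand \cdots}
\end{gather*}
```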
There are several pieces I should mention here:
- The \iand command serves as a spacer for premises in judgments.
- The \cdots in the premises of the application and let rules stand in for the substitution constraints that I have left out.
- Rules may be named, as in the first rule, by passing an additional, optional argument.
- In a gather* environment, rules must have spaces inserted between them. I use ~ between rules on the same line and \\ for line breaks. This ends up a bit cramped in my opinion, and if I were to stick with this approach I'd likely define \newcommand{\rulespacer}{\qquad} and use that in place of the ~.
Building Semantic Definitions and Reductions, Part 2
For the small-step operational semantics (SSOS), we will, as promised, use an array environment. While I have more recently taken to laying out SSOS using judgment rules as well, for the sake of demonstration I will use the more traditional layout approach.
Before we begin, we must, of course, define a new stepping relation for each of the redex step and the context step. While I'm at it, I will also define a form for easy layout of each line, as well as the language's evaluation contexts:
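A sketch of those definitions (the arrow symbols and the shape of the \steprow helper are my own choices):

```latex
% Redex (local) step and context (top-level) step relations.
\newcommand{\redexstep}{\;\rightharpoonup\;}
\newcommand{\ctxstep}{\;\longrightarrow\;}
% Layout helper: one redex rule per row, for use inside an rcl array.
\newcommand{\steprow}[2]{#1 & \redexstep & #2}
% Evaluation contexts: the hole and plugging an expression in.
\newcommand{\hole}{[\,]}
\newcommand{\plug}[2]{#1[#2]}
```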
Next, we introduce evaluation contexts to our language definitions:
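In the grammar table, this amounts to one more production, along these lines:

```latex
\[
\begin{array}{rcl}
E & ::= & \hole \alt E~e \alt v~E \alt \ife{E}{e}{e} \alt \lete{x}{E}{e}
\end{array}
\]
```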
Now we’ll introduce the small-step semantics with our array environment, defining both the redexes and the context reduction:
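An abbreviated sketch, with \steprow laying out the redexes and the context rule written directly:

```latex
\[
\begin{array}{rcl}
\steprow{(\lame{x}{e})~v}{\subst{e}{v}{x}} \\
\steprow{\ife{\truev}{e_1}{e_2}}{e_1} \\
\steprow{\ife{\falsev}{e_1}{e_2}}{e_2} \\
\steprow{\lete{x}{v}{e}}{\subst{e}{v}{x}} \\[0.75em]
\plug{E}{e} & \ctxstep & \plug{E}{e'} \quad \text{when } e \redexstep e'
\end{array}
\]
```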
Constructing Typing Judgments
The last thing I will do is lay out the typing rules for our little calculus. Since booleans are our only base type, this should be fairly easy. As before, we first introduce a series of definitions to our definition file. While I’m at it, I will also define the type environment and judgment relation for types.
The macros we will use are the last three defined:
- envent, short for 'environment entails'
- extenvent, short for 'extended environment entails'
- envlookup, for auto-generating environment lookups
Now we extend the definitions in our main file and scribble out the typing judgments. I will again use an array environment, this time demonstrating how one can be used to lay out inference rules.
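A sketch of what that can look like (the rule names and the mathtt bool notation are mine):

```latex
\[
\begin{array}{cc}
\infer[\textsc{T-Var}]{\envent{x}{T}}{\envlookup{x}{T}} &
\infer[\textsc{T-True}]{\envent{\truev}{\mathtt{bool}}}{} \\[1.5em]
\infer[\textsc{T-Abs}]{\envent{\lame{x}{e}}{T_1 \rightarrow T_2}}
      {\extenvent{x}{T_1}{e}{T_2}} &
\infer[\textsc{T-App}]{\envent{e_1~e_2}{T_2}}
      {\envent{e_1}{T_1 \rightarrow T_2} \iand \envent{e_2}{T_1}}
\end{array}
\]
```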
Conclusion
I hope you have found this little tutorial insightful and helpful. Again, the code for all of this is available on GitHub.
If I've made any mistakes in the presentation here versus the LaTeX code, or in the semantic definitions I've put forth, please let me know.
In closing, here are some parting comments:
- If you'd like to write SSOS using the judgment approach, simply write something like \dstep and use it.
- If you need an additional environment (e.g., for channels) in your reduction relation, I find it convenient to build two versions: one with the default symbol and one that takes the symbol as input, as with the rules for type environment extension above.
- The xspace package is invaluable for using commands and keywords within prose.
- All of the commands above, properly, should use \ensuremath. I'm too lazy to type that everywhere, so I don't, but if you are going to share such a definition set with other people, you may want to consider it.
- You may consider the stmaryrd package for additional symbols (as needed).
- A lot of people seem to like Benjamin C. Pierce's rules (the bcprules package) for typing judgments. They don't require a math environment.