Deep dive into Meta-Theory à la Carte (part 1)
Meta-Theory à la Carte
I have been quite interested in “Meta-Theory à la Carte”, published at POPL 2013 by Benjamin Delaware, Bruno C. d. S. Oliveira, and Tom Schrijvers. Unfortunately, the paper is quite complicated, contains a couple typos in the code fragments, and the associated code base is very sparsely commented, and does not compile with modern versions of the Coq proof assistant.
Fortunately, the paper is written well enough that it was not too complicated to rebuild a subset of their results, with the help of their code base. In the process, I have learned a lot about the design philosophy of the paper, and will try to convey a deeper understanding to anyone reading this series of posts.
Let’s get started with a motivating example.
Open recursion for extensibility
Say you want to study two languages and their respective expressivity. For a contrived example, say the simply-typed lambda calculus with boolean constants and operations, versus the simply-typed lambda calculus with natural number constants and operations. You could define these two languages as:
But you now have two full-fledged, unconvertible copies of the simply-typed lambda calculus: you will most likely need to implement its static and dynamic semantics twice, resulting in frustrating code duplication. The dream would be to be able to define the lambda calculus package once, and reuse it in two larger languages. What would this look like? Let’s focus on the easier case of the boolean sub-language (the lambda calculus will add extra noise due to variable binding). If we had to define a standalone data type for just the boolean constructors, it would look like:
But how to fill the ellipsis? Clearly, putting Boolean
is unsatisfactory,
because this would create a closed language of just the boolean constructor and
the if-then-else operation, no other constants or operations allowed in!
In order to allow any expression of a given larger language E
, we must take as
an argument the type of this yet-to-be-determined larger expression language:
Note that the MkBoolean
constructor does not make use of the type E
in its
arguments: it asks for a bool
value b
, and is happy being a value of type
Boolean E
with no other condition. On the other hand, the IfThenElse
constructor wants its three arguments to be values of type E
, the expression
language we will eventually construct.
What type to give extensible terms?
At this point, we should expect to be able to create values that only require
features of the Boolean
language, for instance, we ought to be able to encode
the expression:
if true then false else true
in our language. Naively, it should be:
but what type should this expression have? It actually admits an infinity of types, of the form:
where _
can be any type. This may seem extremely surprising! But if you
think about the type constraints, because e
is constructed with IfThenElse
,
it must have type Boolean ?E1
for some E1
. Because the condition
argument
is built using MkBoolean
, said E1
must be equal to Boolean ?E2
for some
E2
. The other two arguments to IfThenElse
add the exact same constraint,
and so we are left with the conclusion that e
must have type Boolean (Boolean
?E2)
for any type E2
.
This is flexible, but also problematic. While we could give the expression the following type:
it would quickly become irritating to annotate every concrete term with an
explicit description of its layers. We can do better, by creating a fixed point
for types, that is, given Boolean : Set -> Set
, we would like to build a type
BooleanFix : Set
, such that BooleanFix
behaves just as if it were the
infinite type Boolean (Boolean (Boolean (Boolean ...)))
. Coq’s type system
will not let us “simply” (that is, equi-recursively) define such a BooleanFix
as Boolean BooleanFix
. Instead, we have to use what is called an
iso-recursive encoding:
It’s almost the same, except the user is left to call Wrap
and unwrap
appropriately to indicate their intent to the type system. Given such a
type operator, we can finally define expressions of different depths with a
unifying type, BooleanFix
:
The price to pay is the presence of Wrap
at every stage. This can be
alleviated with smart constructors:
While this Wrap
business may sound annoying, we will soon replace it with a
more abstract concept that will prove necessary to making the language
extensible. So don’t fret too much about it!
Abstracting to extensible languages
Recall BooleanFix
’s definition:
It let us inject as many layers of Boolean
inside an outer layer of Boolean
as we need. However, if we intend for our language to be extensible, we should
not inject a copy of the language feature Boolean
itself, but rather a copy of
the entire expression language! We would like to generalize BooleanFix
so
that it accepts as an argument the expression language E
, and uses it for its
sub-terms:
Unfortunately, Coq will not accept this generalization, as Fix E
is being
passed as an argument to an abstract E : Set -> Set
, which is considered an
unsafe occurrence in negative position. If you do not understand what this
means, you don’t need to for the purpose of everything we are doing here, no
worries! Luckily, there are tricks we can use to get something morally
equivalent.
Instead of requiring the type to be explicitly recursive, we can use its Church encoding, which is:
This may look very familiar to people having studied abstract algebra. Indeed,
if F
is a Functor
, the encoding of the constructor, of type F A -> A
looks exactly like an F-algebra, and this whole type looks like it is performing
a fold with this algebra.
As you see, a value of type Fix Boolean
is now a function that receives two
arguments: first, the carrier type of the algebra (which we discard here with
_
), then, the algebra alg : Boolean A -> A
, that carries the knowledge of
what to do to produce a A
for all possible Boolean
values, as long as the
recursive occurrences have themselves been replaced with results of type A
.
We will later see that this algebra can be used to describe a large class of
recursive computations over extensible data types.
Let us now actually achieve our goal of extensibility: instead of returning a
value of type Fix Boolean
, we would like to return a value of type Fix E
for
some language E
. Of course, this will require that the language E
has some
means of storing Boolean
expressions. It is therefore reasonable to ask for
such means as an argument to our smart constructor: tell me how to store boolean
expressions, and I will build boolean expressions.
This takes care of MkBoolean
, but IfThenElse
is more complicated. When the
constructor has recursive occurrences, the smart constructor must do additional
work. We would like to pass in sub-expressions of the full language, that is,
values of type Fix E
. But because our encoding of Fix
abstracts over the
concrete type of the language, there is a loss of information. It is no longer
known that the type of the arguments to IfThenElse
is Fix E
: it instead
requires arguments of type S
, where S
is the universally-quantified type in
the definition of Fix
(confusingly, it is the type named E
in the definition
of Fix
).
Let this not dissuade us: we have access to the function wrap : E S -> S
, and
our argument c : Fix E
, is, by definition, capable of targetting any language
when given the appropriate wrapper. In particular, c S
has type (E S -> S)
-> S
, and so providing it with wrap
will allow c
to convert itself into a
value of this abstract type S
.
We now have enough infrastructure to define complex extensible terms, for instance:
As you can see, it is pretty verbose to manually pass inject
to every smart
constructor. This calls for turning it into a type class method, and have our
smart constructors be ad-hoc polymorphic.
Combining two languages into one
So far, we have seen that we can define extensible language features as parameterized inductive types that receive the full language as input. What we have not covered is how to combine two such language features into a language containing both. To do so, we will use a type-level sum:
Try and ignore the idiosyncracies of defining nice notations in Coq, and focus
on the definition of the variant Sum1
. It takes two extensible languages F
and G
, and creates a new extensible language F + G
. To be an expression of
the language F + G
, you can either be a F
expression, stored in the inl1
constructor, or a G
expression, stored in the inr1
constructor. Remember
that A
will eventually be instantiated to be the full language, so you can
think of it as being itself F + G
(or an even larger sum of features if we
were to nest additional language features).
We can now define a compound language, comprised of both Boolean
and
Natural
:
Let’s put this to the test by instantiating our extensible term from the
previous section! Remember, we need to provide two injections, one from
Natural
into BooleanAndNatural
, and one from Boolean
into
BooleanAndNatural
. We could provide them as:
but notice that this does not actually depend on the concrete
BooleanAndNatural
or Boolean
types, just on the presence of the inl1
constructor. Therefore, it is more clever to define two generic instances to
cover all possible combinations:
To convince ourselves that this indeed covers our needs, let’s put it to the test:
Indeed, these two generic instances can provide injections from any language
feature into a larger language of which it is one summand. And so, with all
we’ve seen so far, we can now define a concrete term, of type Fix
BooleanAndNatural
, by instantiating the extensibleTerm
from the previous
section:
If you would like to try this for yourself, here is the accompanying code, tested in Coq v8.9.
To be continued…
This post is getting quite long already, yet we just scratched the surface. So far, we have seen:
-
that language features can be defined as extensible inductive types, whose type is
Set -> Set
, taking for input the extended language that they are to be part of, -
how to turn such an extensible inductive type into a concrete type via a
Fix
operator, -
and how to define smart constructors, that inject any language feature construct into any language that supports such feature.
Now here is a teaser of what remains to be seen:
-
how to use the algebra in our
Fix
encoding to compute a whole class of recursive functions over extensible data types, -
how this
Fix
encoding, however, is not up to the task of defining appopriate semantics for interesting language features (like our if-then-else operation), -
how to define properties of extensible data types, and how to prove those properties.
© 2024 Coq en Stock ― Powered by Jekyll and Textlog theme