Adding First-Order Functions
These are lecture notes for the “Programming Languages and Types” at the University of Marburg loosely based on Sec. 4 of the book “Programming Languages: Application and Interpretation” by Shriram Krishnamurthi.
Please comment/correct/improve these notes via github. Proposals or questions can be submitted as an “issue”; proposals for corrections / extensions / improvements can be submitted as a “pull request”. You can of course also send an email to Klaus Ostermann
In the last lecture we have seen how we can give commonly occuring
(sub)expressions a name via the With
construct. However, often we can
identify patterns of expressions that occur in many places, such as 5*5/2
,
7*7/2
and 3*3/2
, the common pattern being x*x/2
. In this case, the
abstraction capabilities of With
are not sufficient.
One way to enable more powerful abstractions are functions. Depending on the context of use and the interaction with other language features (such as imperative features or objects), functions are also sometimes called procedures or methods.
Here we consider so-called first-order functions, that – unlike higher-order functions – are not expressions and can hence not be passed to or be returned from other functions. First-order functions are simply called by name.
To introduce first-order functions, we need two new things:
- The possibility to define functions and
- the possibility to call functions.
A call to a function is an expression, whereas functions are defined separately. Functions can have an arbitrary number of arguments.
The following definitions are the language we have analyzed so far:
sealed abstract class Exp
case class Num(n: Int) extends Exp
case class Add(lhs: Exp, rhs: Exp) extends Exp
case class Mul(lhs: Exp, rhs: Exp) extends Exp
case class Id(x: Symbol) extends Exp
case class With(x: Symbol, xdef: Exp, body: Exp) extends Exp
// We use implicits again to make example programs less verbose.
implicit def num2exp(n: Int) = Num(n)
implicit def sym2exp(x: Symbol) = Id(x)
The new language constructs for first-order functions are:
Function definitions.
A function has a number of formal arguments and a body. A first-order
function also has a name. To make the invariant that there can only be one
function for each name explicit, we store functions in the form of a map from
function names to FunDef
s.
case class FunDef(args: List[Symbol], body: Exp)
type Funs = Map[Symbol,FunDef]
Function calls. Functions are called by name, here represented as a Symbol. Any valid expressions may be passed as arguments.
case class Call(f: Symbol, args: List[Exp]) extends Exp
A substitution based implementation
The substitution for the new language is a straightforward extension of the former one (3-wae.scala#subst5)
def subst(e: Exp,i: Symbol,v: Num) : Exp = e match {
case Num(n) => e
case Id(x) => if (x == i) v else e
case Add(l,r) => Add( subst(l,i,v), subst(r,i,v))
case Mul(l,r) => Mul( subst(l,i,v), subst(r,i,v))
case With(x,xdef,body) => With(x,
subst(xdef,i,v),
if (x == i) body else subst(body,i,v))
case Call(f,args) => Call(f, args.map(subst(_,i,v)))
}
But is the extension really so straightforward? It can be seen in the last line that our substitution deliberately ignores the function name f'. We say in this case that function names and variable names live in different “name spaces”. An alternative would be to have a common namespace.
We will first study a “reference interpreter” based on substitution. We pass
the map of functions as an additional parameter funs
.
def eval(funs: Funs, e: Exp): Int = e match {
case Num(n) => n
case Id(x) => sys.error("unbound identifier: "+x)
case Add(l,r) => eval(funs,l) + eval(funs,r)
case Mul(l,r) => eval(funs,l) * eval(funs,r)
case With(x, xdef, body) => eval(funs,subst(body,x,Num(eval(funs,xdef))))
case Call(f,args) => {
// lookup function definition
val fd = funs(f)
// evaluate function arguments
val vargs = args.map( eval(funs,_))
if (fd.args.size != vargs.size)
sys.error("number of paramters in call to "+f+" does not match")
// We construct the function's body by subsequently substituting all formal
// parameters with their respective argument values.
// If there is only one single argument `fd.arg` and one single argument
// value `varg`, the next line of code is equivalent to:
// `val substbody = subst(fd.body, fd.arg, Num(varg))`
val substbody = fd.args.zip(vargs).foldLeft(fd.body)(
(b,av) => subst(b,av._1,Num(av._2))
)
eval(funs,substbody)
}
}
Two functions adder
and doubleadder
which may be used for testing:
val someFuns = Map(
'adder -> FunDef(List('a,'b), Add('a,'b)),
'doubleadder -> FunDef(List('a,'x), Add(Call('adder, List('a,5)),Call('adder, List('x,7))))
)
assert( eval(someFuns,Call('doubleadder,List(2,3))) == 17)
As can be seen in the example above, each function can “see” the other functions. We say that in this language functions have a global scope.
Exercise: Can a function also invoke itself? Is this useful?
An Environment based Implementation
We will now study an environment-based version of the interpreter. To motivate environments, consider the following sample program:
val testProg = With('x, 1, With('y, 2, With('z, 3, Add('x,Add('y,'z)))))
When considering the With
case of the interpreter, the interpreter will
subsequently produce and evaluate the following intermediate expressions:
val testProgAfterOneStep = With('y, 2, With('z, 3, Add(1,Add('y,'z))))
val testProgAfterTwoSteps = With('z, 3, Add(1,Add(2,'z)))
val testProgAfterThreeSteps = Add(1,Add(2,3))
At this point only pure arithmetic is left. But we see that the interpreter had to apply subsitution three times. In general, if the program size is n, then the interpreter may perform up to O(n) substitutions, each of which takes O(n) time. This quadratic complexity seems rather wasteful.
Can we do better?
We can avoid the redundancy by deferring the substitutions until they are really needed. Concretely, we define a repository of deferred substitutions, called environment.
It tells us which identifiers are supposed to be eventually substituted by which value.
This idea is captured in the following type definition:
type Env = Map[Symbol,Int]
Initially, we have no substitutions to perform, so the repository is empty.
Every time we encounter a construct (a With
or Call
) that requires
substitution, we augment the repository by adding one more entry. The entry
records the identifier’s name and either the value (if eager) or expression
(if lazy) it should eventually be substituted with. We continue to evaluate
without actually performing substitution.
This strategy breaks a key invariant we had established earlier: Any identifier the interpreter encounters must be free, since every bound identifier would have already been substituted. Now that we’re no longer using the substitution-based model, we may encounter bound identifiers during interpretation. How do we handle them? We must substitute them by consulting the repository.
def evalWithEnv(funs: Funs, env: Env, e: Exp) : Int = e match {
case Num(n) => n
// look up in repository of deferred substitutions
case Id(x) => env(x)
case Add(l,r) => evalWithEnv(funs,env,l) + evalWithEnv(funs,env,r)
case Mul(l,r) => evalWithEnv(funs,env,l) * evalWithEnv(funs,env,r)
case With(x, xdef, body) => evalWithEnv( funs,
env + ((x,evalWithEnv(funs,env,xdef))),
body )
case Call(f,args) => {
// lookup function definition
val fd = funs(f)
// evaluate function arguments
val vargs = args.map(evalWithEnv(funs,env,_))
if (fd.args.size != vargs.size)
sys.error("number of paramters in call to "+f+" does not match")
// We construct the environment by associating each formal argument to its
// actual (evaluated) value
val newenv = Map() ++ fd.args.zip(vargs)
evalWithEnv(funs,newenv,fd.body)
}
}
val test1 = Call('doubleadder,List(2,3))
assert( evalWithEnv(someFuns,Map.empty,test1) == 17) //=> It works!
Dynamic Scoping
In the interpreter above, we have extended the empty environment when
constructing newenv
. A conceivable alternative is to extend env
instead:
def evalDynScope(funs: Funs, env: Env, e: Exp) : Int = e match {
case Num(n) => n
case Id(x) => env(x)
case Add(l,r) => evalDynScope(funs,env,l) + evalDynScope(funs,env,r)
case Mul(l,r) => evalDynScope(funs,env,l)// evalDynScope(funs,env,r)
case With(x, xdef, body) => evalDynScope( funs,
env + ((x,evalDynScope(funs,env,xdef))),
body )
case Call(f,args) => {
val fd = funs(f)
val vargs = args.map(evalDynScope(funs,env,_))
if (fd.args.size != vargs.size)
sys.error("number of paramters in call to "+f+" does not match")
// Extending env instead of using Map() !!
val newenv = env ++ fd.args.zip(vargs)
evalDynScope(funs,newenv,fd.body)
}
}
assert( evalDynScope(someFuns,Map.empty,test1) == 17)
Does this make a difference? Let’s take a look at the following example:
val funnyFun = Map(
'funny -> FunDef(List('a), Add('a,'b))
)
val test2 = With('b, 3, Call('funny, List(4)))
assert(evalDynScope(funnyFun, Map.empty, test2) == 7)
In this example, the free variable b
inside of funny
is bound to 3.
Obviously this interpreter is “buggy” in the sense that it does not agree
with the substitution-based interpreter. But is this semantics reasonable?
Let’s introduce some additional terminology to make the discussion simpler:
Definition (Static Scope). In a language with static scope, the scope of an identifier’s binding is a syntactically delimited region.
A typical region would be the body of a function or other binding construct.
Definition (Dynamic Scope): In a language with dynamic scope, the scope of an identifier’s binding is the entire remainder of the execution during which that binding is in effect.
We see that eval
and evalWithEnv
give our language static scoping,
whereas evalDynScope
gives our language dynamic scoping.
Armed with this terminology, we claim that dynamic scope is entirely unreasonable. The problem is that we simply cannot determine what the value of a program will be without knowing everything about its execution history.
If the function funny
were invoked by some other sequence of functions that
did not bind a value for b
, then that particular application of funny
would result in an error, even though a previous application of f
in the
very same program’s execution completed successfully!
In other words, simply by looking at the source text of funny
, it would be
impossible to determine one of the most rudimentary properties of a program:
“Whether or not a given identifier was bound.”
You can only imagine the mayhem this would cause in a large software system, especially with multiple developers and complex flows of control. In this lecture we will therefore regard dynamic scope as an error.