Dependency Analysis for Standard ML

MATTHIAS BLUME
Princeton University

Automatic dependency analysis is a useful addition to a system like CM, our compilation manager for Standard ML of New Jersey. It relieves the programmer of the tedious and error-prone task of having to specify compilation dependencies by hand and thereby makes the system more user-friendly. But dependency analysis is not easy, as the general problem for Standard ML is NP-complete. Therefore, CM has to impose certain restrictions on the programming language to recover tractability. We prove the NP-completeness result, discuss the restrictions on ML that are used by CM, and provide the resulting analysis algorithms.

Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features—Modules, packages; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems

General Terms: Algorithms, Languages, Theory

Additional Key Words and Phrases: Compilation management, dependency analysis, NP-completeness

This work was supported in part by NSF Grant CCR-9625413. Author's address: Research Institute for Mathematical Sciences, Kyoto University, Kyoto, 606-01 Japan; [email protected].
1. INTRODUCTION

For programs written in Standard ML [Milner et al. 1997], the order of compilation matters. But the task of maintaining order within collections of sources can be tedious. Therefore, CM [Blume 1995], the compilation manager for Standard ML of New Jersey [Appel and MacQueen 1991], offers automatic dependency analysis.

CM provides a language for specifying the semantic structure of large programs that consist of many separately compiled modules by arranging these modules into a hierarchy of groups. The hierarchy more or less directly reveals dependencies between groups, while dependencies between individual source files within each group are not given explicitly. Here CM's dependency analysis maintains the illusion of unordered source collections.

This balancing act—explicit dependencies between groups but implicit dependencies within groups—is important. As we will see, one must impose certain restrictions on the source language to make dependency analysis tractable. However, some of these restrictions should not be applied indiscriminately to the entire program but only within groups. Otherwise, they will have a negative impact on modularity [Blume and Appel 1999].
Dependency analysis within groups significantly simplifies the task of writing group descriptions. It makes CM easier to use and therefore more attractive as a tool.

Many other compilation and configuration management tools either assume that dependency analysis for the underlying programming language is tractable and straightforward, or they require the programmer to specify dependencies explicitly. Since dependency information is usually coded in some specification language, one can imagine adding dependency analysis by means of an auxiliary program that calculates and generates such specifications. Examples are the imake and makedepend tools that generate input for make from C source code [DuBois 1996]. For C programs this is a perfectly adequate solution because the outcome of compiling a C source file does not depend on the order in which other C source files are compiled. This means that in this particular case derived objects do not depend on the makefile itself. The example of Standard ML demonstrates that for other languages this is not necessarily so. But if the makefile is calculated as a function of all sources, and all derived objects depend on the makefile, then any modification at all would require the entire system to be rebuilt from scratch.

To avoid this problem, CM remembers for each source which other sources it depended on when it was compiled the previous time. That makes it possible to analyze a modification's effect on dependencies for each source file separately and to decide locally whether recompilation has become necessary.

CM also employs another optimization to reduce the number of recompilation steps: a technique called cutoff recompilation [Adams et al. 1994]. Binfiles are the binary results of compiling ML source files and consist of two parts: executable machine code on the one hand and a static environment on the other. But only the environment is examined when dependent source files are compiled. If only the executable code but not the export environment changes, then it is not necessary to recompile other compilation units. Abadi, Lampson, and Lévy have investigated how this type of optimization can be generalized [Abadi et al. 1996]. They observed that not every change to the input of an operation also changes its output and developed a mechanism for finding those parts of an expression that contribute to its value. Their work was done in the framework of the Vesta system [Levin and McJones 1993] and its configuration language [Hanna and Levin 1993].

A prerequisite for cutoff recompilation to work is the presence of accurate and detailed dependency information. This information could be supplied by the programmer, but when creating CM we were interested in deriving that information directly from a given unordered set of source files. In what follows we discuss how this can be done for programs written in Standard ML.

Section 2 gives an intuition for why some of ML's language features make dependency analysis difficult. In Section 3 we introduce the notion of a feasible ordering. The next two sections prove two NP-completeness results and propose restrictions on the programming language that make dependency analysis tractable. Section 6 presents an efficient analysis algorithm. We then take a look at some other programming languages to see whether similar problems arise there. Finally, we conclude with a brief discussion of some of the necessary practical optimizations and with lessons for language designers.
2. THE CASE OF STANDARD ML

Given a set S of ML source files, we want to determine an order in which they can be compiled. Source files are permitted to import definitions that are not exported by any one of them. Such definitions must be provided by the "context environment" ρ, which is given along with S.

Dependency analysis for Standard ML is not as straightforward as it may seem: the analyzer must find an ordering of source files using its knowledge about name visibility, but in general name visibility is affected by that ordering. There are two main reasons for this circularity: multiple top-level definitions and open.

Multiple Top-Level Definitions. In ML, each top-level definition implicitly starts a new nested scope. Thus, an existing definition does not preclude subsequent redefinitions of the same name. Later definitions do not affect earlier uses. Consider the following code that declares three variables a, x, and f using ML's val declaration:

    val a = 1
    val x = a
    val f = fn () => a

Later, in a different section of the program, one can "recycle" the name a by giving it a new definition for a new purpose:

    val a = "hello, world"

x and f are not affected by the new definition of a; x still evaluates to 1 as before. Function f was also defined in terms of a and refers to the old definition.

Had we done the same in another language such as Scheme [Clinger and Rees 1991], we would find different behavior. In particular, the reference to a from within the body of function f would be "redirected" to point to the new definition. In essence, a redefinition in Scheme acts like an assignment to the existing variable. In contrast, redefining a variable in Standard ML creates a fresh binding and starts a new scope. In a strongly typed setting this is perhaps more sensible, since otherwise changing the type of a would also entail changing the type of f, but previous uses of f might not be compatible with such a change.

While ML's behavior avoids complications related to the type system, it also means that rearranging the order of source files amounts to rearranging name visibility and scopes. The following code ultimately binds x to 1:

    1  val a = 1
    2  val x = a
    3  val a = 2
If one rearranges the definitions by exchanging lines 2 and 3 (or the source files containing lines 2 and 3), then x, like a, will be bound to 2.

Dependency analysis must find an ordering that is feasible and unique. By definition, a feasible ordering allows the program to be compiled successfully. The uniqueness requirement means that all feasible orderings have equivalent use-def graphs. Without it there would be a danger of several different meanings for the same program. Moreover, dependency analysis should also identify sources that do not depend on each other. This can help minimize the work that is necessary when source code is modified; only files affected by a change will have to be recompiled.

Open. In ML we can take any number of declarations and bundle them as a named structure. A structure definition is introduced by the keyword structure; the body of the structure is enclosed within struct and end. For example, we can write

    structure S = struct
      val a = 5
      val b = 7
    end

and later refer to elements of the structure using "qualified" names like S.a and S.b:

    val c = S.a + S.b

Alternatively, one can also "open" the structure and then refer to its elements directly without the prefix:

    open S
    val c = a + b

Opening structures in ML causes problems for the dependency analyzer because one source file can freely refer to identifiers declared in other compilation units without having to explicitly name that compilation unit. It can then become difficult to determine which definition corresponds to a given use of an identifier. Consider the code at the top of Figure 1. Depending on the contents of
structure B, variable x that is mentioned on line 7 could refer to line 1, to line 3, a member of structure B, or even a member of some structure B.A.

    1  val x = 1
    2  structure A = struct
    3    val x = 2
    4  end
    5  open B
    6  open A
    7  val y = x

    definition of B                                                  value of x on line 7
    structure B = struct structure A = struct end end                         1
    structure B = struct end                                                  2
    structure B = struct val x = 3 structure A = struct end end               3
    structure B = struct val x = 3                                            4
                         structure A = struct val x = 4 end end

Fig. 1. The problem with open. Depending on the contents of structure B, there are several different possible meanings for the variable x that appears on line 7.

How Dependency Analysis Fits into the CM Group Model. In the absence of open, dependency analysis is tractable if multiple top-level definitions for the same symbol are ruled out. But if one were to enforce this rule globally, then a definition in one part of the program would prevent definitions for the same name in other, unrelated parts. It is important to CM's group model [Blume and Appel 1999] that such restrictions be applied only locally, because otherwise they would inhibit modularity.

Every CM group consists of a set of ML source files and a set of subgroups. The set of source files plays the role of the set S from above, while the exports of the subgroups determine the context environment ρ. CM avoids a global restriction on multiple top-level definitions by allowing sources in S to override definitions in ρ. This could cause ambiguities for names that are defined by both ρ and some s ∈ S if they are also imported by another source s' ∈ S. In this case we therefore require that the definition exported from s ∈ S take precedence.

Unfortunately, if open is allowed back into the language, then dependency analysis again becomes intractable. In Section 5, we prove that certain uses of open can make dependency analysis NP-hard and show how to solve this dilemma by restricting the use of open.

3. FEASIBLE ORDERINGS

The dependency analyzer A takes a set of sources S = {s1, ..., sn} together with some context environment ρ0 and produces a linear arrangement S⃗ = ⟨sp1, ..., spn⟩ of the sources in S, such that all variables are defined at the time when they are used:

    A({s1, ..., sn}, ρ0) = ⟨sp1, ..., spn⟩

The sequence S⃗ implies a total order ≺ on S. We call this order feasible.

The definition for a variable x that is referred to in spi can be provided either by the context ρ0 or by another source spj where j < i. However, this is not sufficient. Let x be a long identifier of the form Y.z, referring to a definition in structure Y. Yet another source spk with j < k < i could define structure Y without defining Y.z. This would make previous definitions for Y.z unavailable; the sketch at the end of this section illustrates the situation.

Without structure definitions it is relatively straightforward to implement dependency analysis. Unfortunately, this is of no help if one wants to deal with languages that have module systems similar to Standard ML. As we have informally discussed in the previous section, in a language with ML-like structure definitions there are two aspects that complicate dependency analysis: symbols can be defined at the top level in more than one source, and structures can be "opened." We will see that either feature independently makes the problem intractable. Therefore, we will now address them separately.
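To make the shadowing hazard with long identifiers concrete, here is a minimal sketch (the file names and their contents are hypothetical):

    (* a.sml *)  structure Y = struct val z = 1 end
    (* b.sml *)  structure Y = struct val w = 2 end  (* defines Y again, without a member z *)
    (* c.sml *)  val u = Y.z

Any feasible ordering must compile a.sml before c.sml, but it must also avoid placing b.sml between the two, because the second definition of Y shadows the first and makes Y.z unavailable.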
4. MULTIPLE DEFINITIONS

Multiple definitions of the same symbol at the top level in separate sources can introduce ambiguities into the association between uses of a variable and the corresponding definition. For example, consider three sources s1, s2, and s3:

    s1:  val x = 1
    s2:  val x = 2
    s3:  val y = x
In this simple example it is not hard to spot the problem: which definition for x is the code in s3 referring to? If we prove that ambiguities can always be detected easily when they exist, then the detection algorithm can be built into the dependency analyzer, and ambiguous specifications could be rejected gracefully. However, the situation is more complicated:

Claim 1. With multiple top-level definitions for the same identifier the problem of finding a feasible ordering is NP-complete.

Proof. The problem is in NP because one can simply pick some ordering and check its feasibility. This can be done in polynomial time by processing

    ⟨sp1, ..., spn⟩ = A(S, ρ0)

from left to right, checking each source for undefined names.

To prove the problem to be NP-hard, we use a reduction from the satisfiability problem (SAT), more specifically 3-SAT: for any formula in conjunctive normal form with n variables v1, ..., vn and m clauses c1, ..., cm, where each clause contains exactly three literals, there is a set of 2n + 2 sources s̃, s1, s'1, ..., sn, s'n, ŝ for which a feasible ordering corresponds to a satisfying truth assignment.

Let us first illustrate the main idea of the construction by looking at clauses with only two variables. Suppose v1 ∨ v2 is such a clause. Each variable will be represented by two structures in two corresponding sources. For v1 there are sources s1 and s'1 with

    s1:   structure A = struct val x = 1 end
    s'1:  structure A = struct (* empty *) end

s2 and s'2 are constructed according to similar principles, but instead of declaring an empty structure B, s'2 now contains a reference to structure A:

    s2:   structure B = struct val x = 1 end
    s'2:  structure B = A

Now consider the variable B.x. It can be defined either because s2 was compiled after s'2 or because s'2 was compiled after s2 and, at the same time, s1 was compiled after s'1. Thus, the relative orders of si and s'i play the role of boolean switches. They model the behavior of the boolean variables vi. The availability of B.x corresponds to the truth value of the entire clause.
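As a worked example (ours, not part of the original construction): under the assignment v1 = true, v2 = false, that is, the ordering s'1 ≺ s1 ≺ s2 ≺ s'2, the source s'2 is compiled last. At that point structure A is the one exported by s1 and therefore contains x, so B = A provides a definition for B.x, just as v1 alone satisfies v1 ∨ v2. Under v1 = v2 = false, that is, s1 ≺ s'1 and s2 ≺ s'2, structure B is bound to the empty structure A exported by s'1, and B.x remains undefined.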
This construction can be extended easily to handle negative literals. Suppose the clause was v1 ∨ v̄2. In this case the contents of s2 and s'2 must be exchanged. Three or more literals can be accounted for by adding more sources and structures. For a three-literal clause v1 ∨ v2 ∨ v3 one would add s3 and s'3:

    s3:   structure C = struct val x = 1 end
    s'3:  structure C = B
The complete construction uses multiple versions of structures A, B, and C to handle more than one clause. Additional definitions, a "header" source s̃, and a sentinel ŝ constrain the system so that only the relative order of si and s'i (for each i) remains unrestricted.

Construction. s̃ contains

    val z0 = 0
    val z'0 = 0
    structure X1 = struct end
    ...
    structure Xn = struct end

Note that subscripted names like z0 are used as a metanotation to specify the ML symbols that need to be generated and inserted. No clause may contain both v and v̄ for any variable v, but that is not a restriction because such clauses are always satisfied and may therefore be ignored.

Let {ck1, ..., ckj} with k1 < ··· < kj be the set of clauses where the variable vi occurs (either directly or in negated form v̄i). The corresponding sources si and s'i will then have the following general form:

    si:   val zi = zi−1 + z'i−1
          structure Yi = Xi
          structure Xi = struct val x = 1 end
          Xk1 ... Xkj

    s'i:  val z'i = zi−1 + z'i−1
          structure Yi = Xi
          structure Xi = struct val x = 1 end
          X'k1 ... X'kj

The Xk are chunks associated with the clauses in which vi appears. In particular, a clause ck = xl1 ∨ xl2 ∨ xl3 with xl ∈ {vl, v̄l} and l1 < l2 < l3 is represented by a combination of chunks of ML constructs in sources sl1, s'l1, sl2, s'l2, sl3, and s'l3. Suppose all three literals in ck are positive. The following table gives the chunks of ML code for each of the respective files. However, if literal xl is negative, then the corresponding chunks for sl and s'l must be exchanged:
    Source    Chunk
    sl1       structure Ak = struct val x = 1 end
    s'l1      structure Ak = struct end
    sl2       structure Bk = struct val x = 1 end
    s'l2      structure Bk = Ak
    sl3       structure Ck = struct val x = 1 end
    s'l3      structure Ck = Bk
Finally, ŝ is given as

    val z = zn + z'n
    val c1 = C1.x
    ...
    val cm = Cm.x
    val y = Y1.x + ··· + Yn.x

The constraints imposed by auxiliary definitions for z0, ..., zn, z'0, ..., z'n, and z restrict any feasible ordering to one where si and s'i precede sj and s'j whenever i < j, so sk and s'k are adjacent for all k. s̃ is the least, and ŝ is the largest element:

    s̃ ≺ {s1, s'1} ≺ ··· ≺ {sn, s'n} ≺ ŝ

(the order within each pair {si, s'i} is unconstrained).
The definitions for the structures X1, ..., Xn, Y1, ..., Yn and the variable y guarantee that for each i ∈ {1, ..., n} either si ≺ s'i or s'i ≺ si. In other words, every feasible ordering will have to be total. A total ordering under which s'i precedes si corresponds to an assignment where vi is true. On the other hand, if si ≺ s'i, then vi is false under the corresponding truth assignment. A definition for Ck.x exists if and only if clause ck is satisfied under this interpretation. Therefore, a feasible ordering defines a satisfying assignment and vice versa. This concludes the reduction of 3-SAT to the ordering problem.

The correspondence between feasible orderings and satisfying assignments gives rise to the following corollary:

Corollary 2. Proving a feasible ordering unique is co-NP-complete.

Unfortunately, in general it does not help to rely on ML types to resolve situations that otherwise look ambiguous. The programs constructed for the NP-completeness proof would not benefit from such additional information. Besides, it would be better not to rely on types because ML type inference itself is a hard problem; it has been shown to be DEXPTIME-complete [Mairson 1990; Kfoury et al. 1994].

We felt that it is reasonable to require at most one definition for each top-level symbol per group. Although there are circumstances when one would want to override a given definition with a new one, CM addresses this issue adequately by introducing the notion of subgroups. In this discussion of dependency analysis, imported groups of a group are represented abstractly as part of the context environment. Top-level definitions for symbols that were already defined by the context are permitted in this model.
Restriction 1. In each group there can be at most one source that provides a top-level definition for any given symbol. A source can define a name that was already defined by the context, but uses of that name in any of the sources will then refer to the new definition.

We implemented this restriction in CM. To our knowledge there has never been an instance where it caused difficulties for users. It provides a well-defined association of defined symbols with the sites of their definitions. Therefore, there exists a unique use-definition graph which—although not given explicitly—can be traced out by a depth-first search. The edges of the graph correspond to the free occurrences of symbols in ML sources, and the depth-first search takes time proportional to the number of such edges.

Having a single top-level definition corresponds well with the C model. However, it is important that such a restriction not be enforced globally but only within each group [Blume and Appel 1999], and the C model does not have a notion of groups.

4.1 Partial Orders

Intuitively, s1 ≺ s2 means that s2 "depends" on s1. If we want to avoid unnecessary recompilation, then we must capture the idea that two sources do not depend on each other. Total orders contain "too many" relations, so we will consider partial orders instead. For the sake of semantic predictability we desire a feasible partial order ≺min that contains the fewest relations and is unique. One can represent ≺min as a DAG of sources given by the "predecessor" function P : Source → 2^Source that is calculated by (a modified form of) dependency analysis:

    P = A(S, ρ0)

4.2 Uniqueness and Use-Def Mappings

Partial orders can be extended to become total, but in general there will be more than one way of doing so. However, in some sense one would like to think of all total orders that are compatible with a given partial order as being equivalent. Therefore, it is necessary to clarify the uniqueness requirement.

During compilation every use of an identifier will be resolved by the compiler by looking it up in the current compile-time environment. Thus, it will associate each use with a corresponding definition. To identify each definition and each use of a name, we assume that all definitions and all uses are marked with some label l ∈ Lab (e.g., its source location). The ordering, partial or total, of sources induces a particular use-def mapping M. M maps the label of a variable's use to the label of its corresponding definition:

    M : Lab → Lab

Note that Restriction 1 guarantees that the use-def mapping induced by a feasible partial order is well defined. Moreover, a feasible use-def mapping reveals the underlying partial order on sources P if one collapses the uses of all free variables of each source. Dependency analysis must reveal a unique use-def mapping.
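As a tiny illustration (the labels l1 and l2 are ours):

    val a = 1    (* definition of a, label l1 *)
    val b = a    (* use of a, label l2 *)

Here any feasible use-def mapping must satisfy M(l2) = l1.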
5. OPENING STRUCTURES

Standard ML programs that do not make use of the open syntax have the convenient property that both the set of free variables of a source and the set of exported top-level definitions can be determined by scanning only the source itself. It is not necessary to know the definitions for any of the free variables.

The ability to open a structure, thereby making its constituent definitions directly available without the need for long identifiers, comes at the cost of losing this property. The problem is that open introduces a number of definitions, but the names so defined are not lexically apparent. In the scope of such a set of "indirect" definitions it may be that what looks like a free variable is actually bound, and what looks bound under superficial inspection may in certain cases actually be a free occurrence. A slight variation of the example given in Figure 1 shows the latter:

    structure S = struct val x = 1 end
    open X
    open S
    val y = x + 1

Opening structure S seems to bind variable x, but if structure X, about which nothing is known, contains a substructure S without a variable x, then x is actually free in this code. The following definition of X shows this:

    structure X = struct
      structure S = struct end
    end

In the context of dependency analysis it is especially troublesome that open at the top level takes away the analyzer's ability to determine the set of exported names by simply scanning the source code. Instead, it will have to process open as it goes, which is complicated by the fact that in general yet-to-be-determined knowledge about the dependencies would be required for this.

Even after banning multiple definitions for the same name, dependency analysis is NP-complete if the use of open is not restricted.

Claim 3. Dependency analysis is NP-complete for programs with open where multiple definitions for top-level names (including those introduced by top-level open) are prohibited, but where a top-level definition can override a binding in the context.

Proof. By the same argument used in the proof of Claim 1 the problem is in NP. To prove it NP-hard, we will reduce SAT to the dependency analysis problem. Consider a formula in conjunctive normal form with n variables vi and m clauses ck.

Here is the heart of the construction. Suppose there are structures A and A' defined by the context as follows:
    structure A = struct
      structure A' = struct end
      val c1 = 1
    end

    structure A' = struct
      structure A = struct end
      val c2 = 1
      val c3 = 1
    end

Depending on whether A or A' is opened first, there will be a definition for either c1 or both c2 and c3. We can think of these variables as "satisfied clauses"; the order of opening A and A' corresponds to the truth assignment for a variable.

However, this construction is slightly flawed because a clause can be satisfied by more than one variable, but due to Restriction 1 one cannot define the corresponding identifier more than once in different sources. To fix this technical problem, one can delay the definition(s) of variables ck until the last source ŝ by wrapping them into structures Ck_i; there is one such structure per clause ck and variable vi. Ck_i contains a definition for ck if clause ck is satisfied by the value of vi. The sentinel source ŝ eventually opens all structures Ck_i in a local scope, thus adhering to Restriction 1. The context provides a dummy definition to make sure that all the structures opened in ŝ exist.

Suppose v7 appears in clause c1 and v̄7 in c2 as well as c3. The revised version of a "gadget" for v7 would then be

    structure A7 = struct
      structure A'7 = struct end
      structure C1_7 = struct val c1 = 1 end
    end

    structure A'7 = struct
      structure A7 = struct end
      structure C2_7 = struct val c2 = 1 end
      structure C3_7 = struct val c3 = 1 end
    end
Construction. A variable vi is encoded as a pair of sources si and s'i:

    si:   val zi = zi−1 + z'i−1
          open Ai

    s'i:  val z'i = zi−1 + z'i−1
          open A'i

Let {ck1, ..., cka} with k1 < ··· < ka be the set of clauses that contain the literal vi. Then A'i is defined by the context as follows:
    structure A'i = struct
      structure Ai = struct val yi = 1 end
      structure Ck1_i = struct val ck1 = 1 end
      ...
      structure Cka_i = struct val cka = 1 end
    end

Likewise, let {ck'1, ..., ck'b} be the set of clauses containing v̄i. Ai becomes

    structure Ai = struct
      structure A'i = struct val yi = 1 end
      structure Ck'1_i = struct val ck'1 = 1 end
      ...
      structure Ck'b_i = struct val ck'b = 1 end
    end

Furthermore, the context also defines val z0 = 0, val z'0 = 0, and empty structures Ck_i for i = 1, ..., n and k = 1, ..., m. An additional source ŝ has the form

    val z = zn + z'n
    val y = y1 + ··· + yn
    structure L = struct
      open C1_1 ··· open Cm_n
      val c = c1 + ··· + cm
    end

The definitions for z0, ..., zn, z'0, ..., z'n, and z restrict any feasible ordering to one where only the relative order of sk and s'k for any k is not yet determined. Similarly, y1, ..., yn and y guarantee that any feasible ordering will be total, because either si ≺ s'i or s'i ≺ si must be true. Structures Ck_i are opened within the body of structure L. This avoids a violation of Restriction 1.

si ≺ s'i corresponds to vi being false, because structure Ai will be opened first, providing definitions for the "clauses" that contain v̄i, thereby satisfying the corresponding requirements imposed by the code in ŝ. It also overrides the existing definition for A'i, so the subsequent opening of that structure will not be able to introduce definitions for any ck. In a completely symmetrical fashion, one can argue that s'i ≺ si corresponds to an assignment under which vi is true.

A definition for ck is available if at least one of the structures containing such a definition is opened. By construction, that will be the case precisely when the corresponding literal becomes true. Therefore, we have created a set of sources for which a feasible ordering exists if and only if there is a satisfying assignment for the given satisfiability problem. This reduces SAT to the dependency analysis problem and, thus, renders the latter NP-complete.
To make dependency analysis tractable, one must impose a restriction that, at least, prevents the construction of the program that was used in the proof. The heart of the problem is the ability to open certain structures at the top level. If open is banned from the top level, then the definitions exported by a source can be determined by looking at just that source. Whenever the dependency analyzer has to process an internal open it will already know where to find the definition of the structure that is being opened. The problem is tractable again.

Restriction 2a. The open syntax cannot be used at the top level.

Claim 4. Under Restrictions 1 and 2a any feasible use-def mapping is unique if it exists.

Proof. We refer to the proof of the stronger Claim 5.

The current implementation of CM enforces Restriction 2a. We believe in a programming style that uses ML's module language extensively, so there is no need for open at the top level. However, in some instances such a complete ban was prohibitive. These cases have been rare, but occasionally it is important to support them. For example, someone who is using Concurrent ML [Reppy 1991] extensively, as a programming language in its own right, might want to open the CML structure to have more convenient access to its components.

There are several ways of restricting the use of top-level open in a more relaxed way. The drawback is that it becomes increasingly difficult to specify the rules precisely and to explain them to the user. The latter has an impact on, for example, the quality of error messages and therefore on the overall acceptance of the dependency analyzer as a tool. Here is an alternative to Restriction 2a, which also leads to a tractable dependency analysis problem:

Restriction 2b. Instances of the open syntax at the top level are not permitted to introduce definitions for names that are already defined by the context.

This restriction can be weakened some more by limiting its scope to structure definitions only. In fact, it can be relaxed even further by only considering definitions for those structures that are also used (as opposed to just being reexported):

Restriction 2c. Instances of the open syntax at the top level are not permitted to introduce definitions for structure names that are used somewhere within the group if the context already provides a definition for them.

Restriction 2a is strictly stronger than Restriction 2b, and Restriction 2c is a further relaxation of the latter.
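To see the difference between the variants (an illustration with hypothetical context contents): suppose the context defines a structure Queue. Then a top-level open CML violates Restriction 2b as soon as the CML structure happens to export a substructure named Queue, even if no source ever mentions Queue. Under Restriction 2c the same open is permitted as long as the name Queue is not used anywhere within the group.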
Claim 5. Under Restrictions 1 and 2c any feasible use-def mapping is unique if it exists.

Proof. Suppose there are two feasible use-def mappings M and M'. To be different, they must disagree on at least one use of an identifier. We shall show that this is not possible.

Consider the partial order ≺M on sources that is induced by M. We use the notation x ∈ s for uses x of identifiers that occur in s and M(x) ∈ s' for the corresponding definitions M(x) that occur in s'. Some definitions are given by the context environment ρ0; in this case we write M(x) ∈ ρ0.

Let there be at least one use where M and M' disagree. Then there must be a source ŝ and a use x̂ ∈ ŝ such that

(1) M(x̂) ≠ M'(x̂),
(2) no ancestor of ŝ reveals discrepancies between M and M':

        ∀s, x : s ≺M ŝ ∧ x ∈ s ⇒ M(x) = M'(x),

(3) x̂ is the (textually) earliest use of a name in ŝ for which M and M' disagree.

Let us find the location of M(x̂). There are three possible cases:

(1) M(x̂) ∈ ŝ
(2) ∃s : M(x̂) ∈ s ∧ s ≺M ŝ
(3) M(x̂) ∈ ρ0

But none of these cases can actually occur:

(1) If M(x̂) ∈ ŝ, then M' must induce a different scope for x̂ in ŝ. The only language construct that is capable of inducing different scoping for different use-def mappings is open, and such an open must textually precede the use x̂. But we picked x̂ to be the textually earliest use of a name where M and M' disagree.

(2) If ∃s : M(x̂) ∈ s ∧ s ≺M ŝ, then s exports different definitions under M than it does under M'. This can only happen if s opens a structure Y with M(Y) ≠ M'(Y). But ŝ was picked to be minimal; no ancestor of ŝ can contain a use of such a Y for which M and M' disagree.

(3) If M(x̂) ∈ ρ0, then M'(x̂) ∉ ρ0. Therefore, there must be a source s' exporting M'(x̂) under M': ∃s' : M'(x̂) ∈ s'. Because of Restriction 2c, no top-level open can provide the definition M'(x̂). Therefore, there must be an explicit definition for (the head component of) x̂ in s'. There is no language construct capable of wiping out such an explicit definition, even under different use-def mappings. Thus, s' exports M'(x̂) under any mapping, including M. But this is impossible because Restriction 1 would then demand x̂ to refer to M'(x̂), which we assumed it does not.

Thus, if both M and M' are feasible use-def mappings, then they must coincide.

6. THE ANALYSIS ALGORITHM

The previous discussion has established that if a use-def mapping exists, then it must be unique under Restrictions 1 and 2c. Suppose the mapping is already known. One could then verify it by processing individual sources in topological order. From this idea one can derive a quadratic-time algorithm for discovering the correct partial order.

To present the algorithm formally, let us consider a simplified language that only contains structure declarations, sequences of declarations, and opening of structures. Each source of the group to be analyzed is represented by a declaration (decl). The definition of decl is shown in Figure 2.
    lid     =  id × id*

    decl    →  structure id strexp
            |  seq decl decl
            |  empty
            |  open strexp

    strexp  →  name lid
            |  struct decl

Fig. 2. Simple module language. This figure shows the abstract syntax of a module language that has been simplified for expository purposes. However, the language still has nested structures and the ability to open them. Therefore, it exhibits the same intrinsic problems with respect to dependency analysis that are present in Standard ML.

    Direct-Decl(structure (v, d))  =  {v}
    Direct-Decl(seq (d1, d2))      =  Direct-Decl(d1) ∪ Direct-Decl(d2)
    Direct-Decl(open s)            =  {}
    Direct-Decl(empty)             =  {}

    Lookup-Rest(ρ, ⟨⟩)             =  ρ
    Lookup-Rest(ρ, ⟨v1, v2, ...⟩)  =  if v1 ∈ dom(ρ)
                                      then Lookup-Rest(ρ(v1), ⟨v2, ...⟩)
                                      else abort "member not found in structure"

Fig. 3. Auxiliary functions for dependency analysis. Direct-Decl calculates the set of names bound by "direct" definitions. A direct definition is one that is not introduced via open. Given the environment for the head component of a long identifier, we use Lookup-Rest to complete the lookup operation for the entire name. Note that in correct programs this operation must always succeed.
Omitted from this language are signatures, signature constraints on structures, functors, and functor applications. They do not complicate matters further and would only add bulk to the exposition.

Structures can be defined to contain any number of other declarations (possibly zero), or they can be equal to previously defined structures. Structure declarations assign a structure expression (strexp) to a simple identifier (id). A structure expression is either a long identifier (lid) that refers to some previously defined structure or it is a declaration (decl) that provides definitions for the members of the structure.

Environments ρ : U map simple identifiers to other environments: ρ(v) represents the definitions for members of the structure that is named v in environment ρ. A name that is mapped to an empty environment is different from a name that is not mapped at all. The notation ρ[v ↦ ρ'] refers to the environment ρ augmented with a new binding that maps v to ρ'—possibly overriding an existing binding for the same variable v. The operator + denotes environment layering, and dom(ρ) is the set of names bound in ρ.

Figure 3 shows two auxiliary functions. Direct-Decl calculates the set of simple identifiers for which there is a definition that was provided directly and not by opening some structure, while Lookup-Rest resolves the remaining components of a long identifier once the environment representing its head component is known.
    Analyze-Source(D,          set of names that have an explicit definition
                   Analyzed,   results from successful analyses
                   ρ0,         context environment
                   d) =        current source
      let P ← {}                                     initialize dependencies for this source
      and Analyze-Decl(structure (v, s), ρ) =        explicit definition
            return ρ[v ↦ Analyze-Strexp(s, ρ)]
          Analyze-Decl(seq (s1, s2), ρ) =            sequential definitions
            return Analyze-Decl(s2, Analyze-Decl(s1, ρ))
          Analyze-Decl(open s, ρ) =                  opening a structure
            return Analyze-Strexp(s, ρ) + ρ
          Analyze-Decl(empty, ρ) =                   empty definition
            return ρ
      and Analyze-Strexp(struct d, ρ) =              new structure body
            return Analyze-Decl(d, ρ)                analyze body of the structure
          Analyze-Strexp(name (v, v*), ρ) =          name of existing structure
            if v ∈ dom(ρ) then                       is defined in same source
              return Lookup-Rest(ρ(v), v*)
            else if v ∉ D ∧ v ∈ dom(ρ0) then         has no direct definition but is defined in context
              return Lookup-Rest(ρ0(v), v*)
            else if ∃(j, ρj) ∈ Analyzed : v ∈ dom(ρj) then   defined in other source (already analyzed)
              P ← P ∪ {j}                            register dependency
              return Lookup-Rest(ρj(v), v*)
            else                                     so far, no definition is known
              return "abandon"                       defer current analysis
      in
        ρ ← Analyze-Decl(d, ∅_U)                     run analysis, gather dependencies
        return (ρ, P)                                return export environment and dependencies

Fig. 4. Syntax-directed traversal as performed by the dependency analyzer.
The input to the algorithm is a set {d1, ..., dn} of sources (represented by decls) and the context environment ρ0. The objective is to calculate the partial order Depend, where Depend[i] gives the indices of those sources that di depends on. The set denoted by D is used to remember all simple names for which there is a direct definition in one of the sources. The variable Analyzed keeps track of sources that have already been analyzed successfully. Each element (i, ρi) ∈ Analyzed contains the environment ρi representing the definitions exported from di.

Function Analyze-Source is implemented in terms of two mutually recursive functions Analyze-Decl and Analyze-Strexp, which are used to process decls and strexps, respectively. These functions are shown in Figure 4. The result obtained from a call to Analyze-Decl represents the definitions exported from a decl, while the value returned from Analyze-Strexp corresponds to the members of a given structure. The environment argument implements scope rules by keeping track of local definitions.

The important aspect of the algorithm is the way it handles names that are not found in the local environment. First it checks D and ρ0. Restriction 2c guarantees that a binding in ρ0 is the correct one to be used if the variable is not in D. Otherwise the definition must be exported from one of the other sources. Analyzed is checked to see whether a previously analyzed source has already revealed it. If this is not the case, then the current source was processed prematurely; analysis must be repeated later.
    Analyze(⟨d1, ..., dn⟩,   representation of n sources
            ρ0) =            context environment
      let D ← Direct-Decl(d1) ∪ ··· ∪ Direct-Decl(dn)    calculate set of explicitly defined names
      and FindNext(Analyzed, {}) =                       all sources have been analyzed
            return Depend                                return final dependency graph
          FindNext(Analyzed, R) =                        more sources to be analyzed
            return Try(Analyzed, R, R)                   find source where analysis succeeds
      and Try(Analyzed, {}, R) =                         search was unsuccessful
            abort "undefined variable or cyclic reference"
          Try(Analyzed, {(i, d)} ∪ R', R) =              pick arbitrary element
            case Analyze-Source(D, Analyzed, ρ0, d)      try analyzing this source
              of (ρ, Dep) ⇒                              analysis was successful
                   Depend[i] ← Dep                       remember dependencies
                   return FindNext(Analyzed ∪ {(i, ρ)}, R \ {(i, d)})   analyze rest
               | "abandon" ⇒                             analysis was not successful
                   return Try(Analyzed, R', R)           keep searching
      in
        return FindNext({}, {(1, d1), ..., (n, dn)})     start analysis for all sources

Fig. 5. Dependency analysis. Dependency analysis consists of two nested loops. Function FindNext loops over the set of sources that are yet to be analyzed. The inner loop, represented by function Try, repeatedly invokes Analyze-Source until it finds a source where it succeeds. The algorithm calls Try O(n²) times in the worst case.

The remainder of the algorithm, shown in Figure 5, consists of two nested loops represented by FindNext and Try. R holds pairs (i, di) of indices and sources that still need to be processed. The inner loop repeatedly calls Analyze-Source until it finds a source for which this analysis succeeds. The restrictions guarantee that names will be resolved correctly if they are resolved at all. Therefore, the algorithm will indeed discover the desired partial order if it exists. In the worst case it will take O(n²) calls to Analyze-Source to do so.
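To make the pseudocode concrete, here is a small executable sketch in Standard ML of the same quadratic scheme. All names are ours, the module language follows Figure 2, environments are simple association lists, and the nested loops of Figure 5 are collapsed into repeated passes over the not-yet-analyzed sources; this is an illustration under these assumptions, not CM's actual implementation.

    (* Abstract syntax of the simplified module language (Figure 2). *)
    datatype decl
      = STRUCTURE of string * strexp
      | SEQ of decl * decl
      | EMPTY
      | OPEN of strexp
    and strexp
      = NAME of string * string list   (* long identifier: head and rest *)
      | STRUCT of decl

    (* Environments map identifiers to the environments of their members. *)
    datatype env = ENV of (string * env) list

    fun look (ENV bs, v) =
        Option.map (fn (_, e) => e) (List.find (fn (v', _) => v' = v) bs)
    fun bind (ENV bs, v, e) = ENV ((v, e) :: bs)    (* new binding shadows old *)
    fun layer (ENV new, ENV old) = ENV (new @ old)  (* the "+" of Figure 4 *)

    exception Abandon                 (* defer: some name is not yet known *)
    exception Undefined of string

    fun lookupRest (e, []) = e        (* Lookup-Rest of Figure 3 *)
      | lookupRest (e, v :: vs) =
          (case look (e, v) of
               SOME e' => lookupRest (e', vs)
             | NONE => raise Undefined v)

    fun directDecl (STRUCTURE (v, _)) = [v]    (* Direct-Decl of Figure 3 *)
      | directDecl (SEQ (d1, d2)) = directDecl d1 @ directDecl d2
      | directDecl _ = []

    (* Analyze-Source of Figure 4: one traversal of a single source. *)
    fun analyzeSource (directs, analyzed, rho0, d) =
        let val deps = ref []
            fun decl (STRUCTURE (v, s), rho) = bind (rho, v, strexp (s, rho))
              | decl (SEQ (d1, d2), rho) = decl (d2, decl (d1, rho))
              | decl (OPEN s, rho) = layer (strexp (s, rho), rho)
              | decl (EMPTY, rho) = rho
            and strexp (STRUCT d', rho) = decl (d', rho)
              | strexp (NAME (v, vs), rho) =
                  (case look (rho, v) of
                       SOME e => lookupRest (e, vs)  (* defined in same source *)
                     | NONE =>
                         if not (List.exists (fn v' => v' = v) directs)
                            andalso isSome (look (rho0, v))
                         then lookupRest (valOf (look (rho0, v)), vs)  (* context *)
                         else
                           (case List.find (fn (_, rj) => isSome (look (rj, v)))
                                           analyzed of
                                SOME (j, rj) =>       (* exported by source j *)
                                  (deps := j :: !deps;
                                   lookupRest (valOf (look (rj, v)), vs))
                              | NONE => raise Abandon))  (* not known yet: defer *)
        in (decl (d, ENV []), !deps) end

    (* The nested loops of Figure 5, as repeated passes over the sources
       that are still unanalyzed, until no further progress is possible. *)
    fun analyze (ds, rho0) =
        let val directs = List.concat (map directDecl ds)
            fun pass (_, depend, [], [], _) = depend
              | pass (analyzed, depend, [], deferred, progress) =
                  if progress
                  then pass (analyzed, depend, rev deferred, [], false)
                  else raise Fail "undefined variable or cyclic reference"
              | pass (analyzed, depend, (i, d) :: pending, deferred, progress) =
                  (let val (rho, deps) = analyzeSource (directs, analyzed, rho0, d)
                   in pass ((i, rho) :: analyzed, (i, deps) :: depend,
                            pending, deferred, true)
                   end
                   handle Abandon =>
                     pass (analyzed, depend, pending, (i, d) :: deferred, progress))
            val indexed = ListPair.zip (List.tabulate (length ds, fn i => i + 1), ds)
        in pass ([], [], indexed, [], false) end

    (* Example: the first source opens structure A defined by the second. *)
    val d1 = OPEN (NAME ("A", []))
    val d2 = STRUCTURE ("A", STRUCT (STRUCTURE ("B", STRUCT EMPTY)))
    val depend = analyze ([d1, d2], ENV [])
    (* = [(1, [2]), (2, [])]: source 1 depends on source 2. *)

On the first pass the analysis of d1 is abandoned and deferred; after d2 succeeds, the deferred source is retried and now finds A among the analyzed results, which is exactly the behavior of the FindNext/Try loop.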
It is possible to reduce the running time to O(n) calls by avoiding repeated invocations of the analyzer for the same source. The trick is to run the analysis algorithm on all sources simultaneously. Instead of abandoning a computation and later duplicating work that already had been accomplished, the algorithm will simply wait until definitions for previously unknown names become available.

A version of such an algorithm had been implemented in SC, which was CM's precursor [Harper et al. 1994]. But the authors of SC were not aware of the general problem's complexity class, so they only enforced Restriction 1 and made no attempt to restrict the use of top-level open. As a result, the analysis performed by SC was incomplete in the sense that for certain programs it would fail to find an existing feasible ordering. Furthermore, for some programs with ambiguous dependencies, it would silently pick one of the choices without warning the user about the existence of others that differ semantically.

To present the algorithm, we rely on a small number of primitives for non-preemptive concurrency. Threads are created using fork and collected using join. A thread can wait on an event using wait. The resume operation unblocks all threads that wait on one of the events in the specified set; it also prevents any future wait operations on these events from blocking.
    global Analyzed     results of successful analyses
    global Depend       dependency graph to be constructed

    Analyze-Source'(d,     current source
                    i,     index of current source
                    D,     set of names that have an explicit definition
                    ρ0) =  context environment
      let Analyze-Decl(structure (v, s), ρ) =        explicit definition
            return ρ[v ↦ Analyze-Strexp(s, ρ)]
          Analyze-Decl(seq (s1, s2), ρ) =            sequential definitions
            return Analyze-Decl(s2, Analyze-Decl(s1, ρ))
          Analyze-Decl(open s, ρ) =                  opening a structure
            return Analyze-Strexp(s, ρ) + ρ
          Analyze-Decl(empty, ρ) =                   empty definition
            return ρ
      and Analyze-Strexp(struct d, ρ) =              new structure body
            return Analyze-Decl(d, ρ)                analyze body of the structure
          Analyze-Strexp(name (v, v*), ρ) =          name of existing structure
            if v ∈ dom(ρ) then                       is defined in same source
              return Lookup-Rest(ρ(v), v*)
            else if v ∉ D ∧ v ∈ dom(ρ0) then         has no direct definition but is defined in context
              return Lookup-Rest(ρ0(v), v*)
            else
              wait v                                 block if no other thread has revealed a definition yet
              ∃!(j, ρj) ∈ Analyzed : v ∈ dom(ρj)     definition provided by source dj
              Depend[i] ← Depend[i] ∪ {j}            register dependency
              return Lookup-Rest(ρj(v), v*)
      in
        ρ ← Analyze-Decl(d, ∅_U)                     run analysis, gather dependencies
        Analyzed ← Analyzed ∪ {(i, ρ)}               register successful analysis
        resume dom(ρ)                                restart waiting threads; no future wait on these events will block
        return                                       terminate thread

Fig. 6. Syntax-directed traversal modified for concurrent analysis.
    Analyze(⟨d1, ..., dn⟩,   representation of n sources
            ρ0) =            context environment
      D ← Direct-Decl(d1) ∪ ··· ∪ Direct-Decl(dn)    calculate set of explicitly defined names
      for i ← 1 to n do Depend[i] ← {}               initialize dependency graph
      Analyzed ← {}                                  initialize analysis results
      Threads ← {}
      for i ← 1 to n do                              start all analysis threads
        T ← fork Analyze-Source'(di, i, D, ρ0)
        Threads ← Threads ∪ {T}
      join Threads                                   collect threads after termination
      if deadlock then                               deadlock indicates error in sources
        abort "undefined variable or cyclic reference"
      return Depend                                  return dependency graph

Fig. 7. Concurrent dependency analysis. The concurrent version of dependency analysis creates one thread per source. Therefore, it only incurs O(n) calls to Analyze-Source.
Figure 6 shows a concurrent version of Analyze-Source. As before, Analyze-Strexp goes through a case analysis for resolving structure names. However, in the absence of a definition for a name, it does not abandon the computation but waits for such a definition to arrive. Identifiers play the role of events in this algorithm. The new main loop simply spawns one thread for each source, waits for their completion, and returns the result. This is shown in Figure 7.

7. OTHER LANGUAGES

Because of subtle differences in design, some popular languages, for example C [Kernighan and Ritchie 1988], Ada [Ada 1980], or Java [Arnold and Gosling 1996], do not exhibit the same problems with dependency analysis. In particular, these languages often require globally unique names, provide only a restricted version of ML's open, or do not have such a feature at all.

A C object file (.o) depends on its source file (.c) and on other files that are included via preprocessor directives (#include). Therefore, one must calculate the transitive closure of the "#include" relation to find dependencies for each object. With current practice, a dependency analyzer for C never needs to look at the names of program variables or functions. Only in a hypothetical system, where #include directives are added automatically by some tool, would the requirement for uniqueness of top-level definitions regain its relevance for dependency analysis.

An Ada compilation unit depends on other compilation units only if they are named explicitly. Furthermore, names for compilation units are globally unique. Consequently, there is no potential for ambiguities.

Language constructs that, like Pascal's with [Jensen and Wirth 1978], bind identifiers implicitly have been criticized before [Tennent 1981, Section 6.2.3] because they can make it more difficult for the human reader to understand the code. As we have seen, at least in the case of ML's open they can also make automatic dependency analysis hard. C lacks a language construct like open, but namespaces in C++ [Stroustrup 1997] are very similar to ML's structures. If we tried to adapt CM's group model to C++, then we would have to struggle with the using construct the same way we struggled with ML's open.

However, the case of Java's package system and import is different. Although writing something like

    import java.util.*;

will probably prompt the human reader to hunt for documentation of package java.util, it does not come with the same drawbacks as far as dependency analysis is concerned. This is explained by the fact that Java's namespace for packages is flat, even though the dot notation seems to suggest otherwise. Java packages are not nested, and thus one cannot write

    import java.*;

because there is no package called java. But our second NP-completeness proof does rely on such "partial" opening of nested modules.
Ada's use is unable to override earlier definitions. When we go back to the example in Figure 1, we discover that under such a rule, regardless of B's contents, line 7 would always refer to the definition of x on line 1 because neither open B nor open A would be able to override it. However, one subtlety still remains there as well. Consider

    use A;
    use B;
    ... use of x ...

If x is supposed to be taken from package B, then a bug is introduced into this program if a modification to package A causes A to define x as well, because in this case it is impossible for the second use-clause to override the existing definition.

Modula-2 [Wirth 1982] and Modula-3, on the other hand, do not have any of these problems. In Modula-2 one must write

    FROM M IMPORT a, b, c;

in order for a, b, and c—and only those—to become directly accessible without having to prefix them by M. Therefore, the identifiers defined are lexically apparent even without knowing the definition of module M. Of course, other languages have an equivalent of the from-import construct. In Java we would write

    import M.a;
    import M.b;
    import M.c;

and in ML the same can be achieved by explicit rebinding:

    val a = M.a
    val b = M.b
    val c = M.c

But the point is that if the language does not enforce such usage, then a dependency analyzer cannot rely on it. The design of Oberon takes restrictiveness much further by discarding both with and from-import, leaving the language without any facility for circumventing the qualification of identifiers [Wirth 1988a; 1988b].

ML's approach is a lot more flexible (perhaps even unnecessarily so), hence harder to analyze. However, by imposing only two simple restrictions (see Sections 4 and 5) we were still able to make dependency analysis tractable.

Recent implementations of Scheme include a hygienic macro system. Hygienic macros are macros that do not suffer from the otherwise common problem of inadvertent name capture. In particular, no hygienic macro can bind a name that is not lexically apparent. If we look at hygienic macros with dependency analysis in mind, then this property seems to automatically rule out the type of problem that we have seen in the case of ML's open. But even though all names that are bound by a macro's invocation must appear lexically, not every name that appears lexically will also be bound. This insight can be used to reduce the satisfiability problem to
a dependency analysis problem of Scheme with hygienic macros in a manner very similar to the proofs that have been presented earlier in this article [Blume 1997, Chapter 6]. However, NP-hardness may not even be the worst of Scheme's problems: the author is not aware of a reliable dependency analysis algorithm for Scheme that does not involve expanding macros at analysis time, and, in general, termination of macro expansion is not decidable.

8. OPTIMIZATIONS

It is relatively expensive to parse an entire source file every time the dependency analyzer needs it. Therefore, CM calculates a condensed version of the source, which sheds all parts of the abstract syntax tree that are not necessary for dependency analysis. The much smaller result is kept in a cache.

Without open it would be very easy to "compress" the per-file dependency information. All the dependency analyzer needs to know is the set of names that occur free in a source and the set of names defined and exported by the source. In the absence of open and other constructs with similar behavior it is straightforward to calculate these sets for each given source.

With open it is not possible to precompute free and bound variables for each source. As we have seen before, without prior knowledge of the structure being opened the analyzer will potentially lose track of what is currently bound or free. Furthermore, the condensed version of the source must still maintain knowledge about the constituent parts of a structure in order to be able to handle cases where dependency analysis eventually reveals that it is being opened somewhere. Even worse, the fact that open may be used locally (e.g., inside a let expression) means that information about nested scopes in ordinary program code must also be retained. For example, in the expression

    let
      open X
      val b = a + 1
    in
      b + c
    end

it is not clear whether a and c are actually free until X becomes available. Summary information about definitions and uses can still be obtained for the code between separate occurrences of open. For instance, in the previous example we know that b is not free because there is no open separating its definition from its use. Fortunately, the savings will normally be substantial because programmers tend not to use open locally very often.

The current implementation of CM uses a different strategy for avoiding this problem. It simply ignores all names that are not structures, signatures, functors, or functor signatures. Again, for this to be useful, one must rely on a programming style where everything is defined in modules, but it avoids the need to keep track of anything but module definitions and uses. The size of typical dependency files is only 1–4% of the corresponding ML source code size.
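To illustrate the module-only strategy just described (a hypothetical example; CM's actual dependency-file format is not shown here), a source such as

    structure Stack = struct
      open ListUtil
      val push = fn (s, x) => x :: s
    end

would be condensed to a summary recording only that it defines structure Stack and mentions structure ListUtil; value-level names like push are ignored entirely.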
9. CONCLUSIONS AND LESSONS FOR LANGUAGE DESIGN

Dependency analysis for ML is not easy because the definition-use relationship that is responsible for intermodule dependencies—and therefore decides whether a particular ordering is feasible and unique—itself depends on that ordering. In addition to being able to create and use bindings for identifiers, in a language like ML it is also possible for existing bindings to be "cancelled." In some sense, this observation lies at the heart of our NP-completeness proofs and must be seen as the prime reason why dependency analysis can become hard. However, we were able to overcome these difficulties by imposing two simple restrictions on the source language which restored tractability.

As we have seen, the language feature most troublesome for the dependency analyzer is open. SML'97 has introduced a construct called "datatype replication" which, regrettably, has similar properties with respect to dependency analysis. In particular, writing

    datatype t = datatype S.t

establishes t as an alias for S.t but at the same time also rebinds all of S.t's constructor names in the current scope. For example, suppose the constructors of S.t were S.A and S.B. In this case datatype replication will also cause the unqualified names A and B to share bindings with S.A and S.B, respectively. Thus, the exact set of names that are being bound depends on the definition of S.t, much like the bindings introduced by open X depend on the definition of X; the sketch at the end of this section makes this concrete. Fortunately, datatype replication is no more difficult to deal with than open. The algorithms presented in Section 6 would work just as well under a suitable (but straightforward) extension of Restriction 2c.

To incorporate our new restrictions directly into ML it would be necessary to explain the notion of compilation units and the idea of groups as part of the language definition. The original definition and commentary [Milner and Tofte 1991; Milner et al. 1990] only briefly discussed separate compilation; the revised definition [Milner et al. 1997] has dropped every mention of it.
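As a small sketch of the datatype-replication hazard (the structure and constructor names are ours):

    structure S = struct datatype t = A | B end
    datatype t = datatype S.t
    val flag = A    (* A is bound here, although no definition of A is
                       lexically apparent in this source *)

Whether the names A and B become bound at all cannot be determined without first knowing the definition of S.t.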
REFERENCES

Abadi, M., Lampson, B., and Lévy, J.-J. 1996. Analysis and caching of dependencies. In Proceedings of the 1996 ACM SIGPLAN International Conference on Functional Programming (ICFP'96). ACM Press, 83–91.

Ada 1980. Military standard: Ada programming language. Tech. Rep. MIL-STD-1815, Department of Defense, Naval Publications and Forms Center, Philadelphia, PA.

Adams, R., Tichy, W., and Weinert, A. 1994. The cost of selective recompilation and environment processing. ACM Transactions on Software Engineering and Methodology 3, 1 (January), 3–28.

Appel, A. W. and MacQueen, D. B. 1991. Standard ML of New Jersey. In 3rd International Symposium on Programming Language Implementation and Logic Programming, M. Wirsing, Ed. Springer-Verlag, New York, 1–13.

Arnold, K. and Gosling, J. 1996. The Java Programming Language. Addison-Wesley, Reading, MA.

Blume, M. 1995. Standard ML of New Jersey compilation manager. Manual accompanying SML/NJ software.

Blume, M. 1997. Hierarchical modularity and intermodule optimization. Ph.D. thesis, Princeton University.
Blume, M. and Appel, A. W. 1999. Hierarchical modularity. ACM Transactions on Programming Languages and Systems 21, 4 (July), 813–847.

Clinger, W. and Rees, J. 1991. Revised⁴ report on the algorithmic language Scheme. LISP Pointers IV, 3 (July–September), 1–55.

DuBois, P. 1996. Software Portability with imake, 2nd ed. O'Reilly and Associates, Sebastopol, CA.

Hanna, C. B. and Levin, R. 1993. The Vesta language for configuration management. Tech. Rep. 107, Digital Equipment Corp. Systems Research Center. June.

Harper, R., Lee, P., Pfenning, F., and Rollins, E. 1994. A compilation manager for Standard ML of New Jersey. In 1994 ACM SIGPLAN Workshop on ML and its Applications. 136–147.

Jensen, K. and Wirth, N. 1978. Pascal: User Manual and Report, 2nd ed. Springer-Verlag, New York.

Kernighan, B. W. and Ritchie, D. M. 1988. The C Programming Language, 2nd ed. Prentice Hall, Englewood Cliffs, NJ.

Kfoury, A. J., Tiuryn, J., and Urzyczyn, P. 1994. An analysis of ML typability. Journal of the ACM 41, 2 (Mar.), 368–398.

Levin, R. and McJones, P. R. 1993. The Vesta approach to precise configuration of large software systems. Tech. Rep. 105, Digital Equipment Corp. Systems Research Center. June.

Mairson, H. G. 1990. Deciding ML typability is complete for deterministic exponential time. In 17th Annual ACM Symposium on Principles of Programming Languages. ACM Press, New York, 382–401.

Milner, R. and Tofte, M. 1991. Commentary on Standard ML. MIT Press, Cambridge, MA.

Milner, R., Tofte, M., and Harper, R. 1990. The Definition of Standard ML. MIT Press, Cambridge, MA.

Milner, R., Tofte, M., Harper, R., and MacQueen, D. 1997. The Definition of Standard ML (Revised). MIT Press, Cambridge, MA.

Reppy, J. H. 1991. CML: A higher-order concurrent language. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation. SIGPLAN Notices 26, 6, 293–305.

Stroustrup, B. 1997. The C++ Programming Language, 3rd ed. Addison-Wesley, Reading, MA.

Tennent, R. D. 1981. Principles of Programming Languages. Prentice-Hall, Englewood Cliffs, NJ.

Wirth, N. 1982. Programming in Modula-2, 2nd ed. Springer-Verlag, Berlin.

Wirth, N. 1988a. From Modula to Oberon. Software—Practice and Experience 18, 7 (July).

Wirth, N. 1988b. The programming language Oberon. Software—Practice and Experience 18, 7 (July).
Received July 1998; revised November 1998; accepted February 1999