• Runtime-sized Arrays
    • They are parameterized by expressions that are evaluated at runtime to compute the array's size.
    • There will be ways to specify constraints on the sizes of such arrays, that should be met when arguments are bound to parameters.
    • Under some circumstances those constraints will be solvable at compile-time, in others runtime checks will be emitted.
    • Use cases
      • string operations (e.g. concatenation)
      • vector operations (e.g. addition)
      • spline operations (e.g. raise or lower degree, evaluate, etc.)
      • OS APIs (e.g. functions that need you to call them once to get the size of an array, allocate the array and call them again with the array)
    • Implementation
      • The actual array sizes will be passed as extra hidden arguments.
      • Examples
        • merge(T a[n], T b[k])⟶T[n+k]
        • {
          • ...
        • }
        • In the above we have 3 different lengths, so 4 hidden arguments will be passed (the 4th is for the returned array's address).
        • op+(double a[n], double b[n])⟶doube[n]
        • {
          • for(int c=0 ; c < a.size ; ++c)
            • retval[c] <~ a[c] + b[c];
        • }
        • Here we have one hidden argument for the base address of the resulting array and one hidden argument for the common length of the input and output arrays.
        • In reality, two functions are defined: the explicit op+ and an implicit size calculation function (a.size, b.size)⟶retval.size.
      • Alternatively, we could put the lengths in the first word of the array representation, like Java does.
        • This might be wasteful if not all lengths are needed, but may be faster if they are used frequently.
      • For multidimensional arrays it is preferable to put all dimensions in the beginning than to have arrays of arrays, each with its own length like in Java.
    • Size variables
      • There are a number of options when it comes to how they are defined:
        • Don't use them: use |name_of_array| instead. For short array names it'll be almost as short.
        • Whenever an undeclared identifier appears in an array size expression, assume it names a size variable.
        • Prefix size variable names with a special character, so that they can be searched in their own namespace and they don't get captured when someone defines a normal variable of the same name in the scope. e.g. @n, @k, etc.
        • Require them to be explicitly defined, either as normal parameters or using a special syntax.
    • Return Value
      • When arrays are returned by external reference (or by value too?) there are the following possibilities:
        • An expression is given so that the caller can calculate the size. It must be side-effect free.
        • The size expression is omitted, but the size expression can be inferred. The inferred expression will be made part of the interface of the function. It can't depend on retval.size.
        • An expression is not given and can't be inferred from the function body. The function compiles, but can't be used unless the result is assigned to an object of known size: that size will be passed to the function. A warning about limited usability of the function can be issued.
        • An error is issued.
    • Constraints
      • Linear constraints should be solvable automatically at compile-time and be sufficient for 80% of the applications including the ones mentioned above.
      • It may also be possible to combine one-way constraints with linear ones in the same functions.
      • In some cases even non-linear constraints are solvable for some of their variables.
      • Linear
        • Goals
          • Identify cases when not all sizes need to be passed as hidden variables, but some are identical or can be computed by the others.
            • e.g. We may decide that sizes that can be computed from a base size using an addition or subtraction need not be passed individually, while others that need a multiplication should.
          • Statically check for correctness instead of emitting runtime checks.
          • Allow some sizes to be left out and inferred from others.
          • Generate the function that will be used to calculate the size of the output as a function of the sizes of the inputs.
        • Notation
          • @ for internal size variables produced by the system.
          • $ for explicit size variables defined by the programmer.
          • # for program values of integral type that can't be known at compile-time and are used as array sizes. Those known at compile-time appear verbatim.
        • Algorithm
          • The solver algorithm should be able to treat each function's body individually so that systems of equations don't get too big.
          • When solving the system for an individual function body the following should be considered known:
            • Internal size variables associated with input parameters for this body.
            • Values of normal expressions, used as array sizes.
          • Everything else should be considered unknown, including internal size variables associated with input parameters of recursive calls.
          • Deterministic expressions that are written in an identical way (save for whitespace) should result in the same # parameter.
            • The compiler might be able to prove that different computations result in the same value and in such a case it should use the same # parameter as well, but this is not guaranteed.
          • An error is issued if the system has zero solutions and a different one if it has infinitely many.
          • If the system has a unique solution then:
            • If it imposes additional constraints to a # parameter, an error is issued.
            • If it imposes additional constraints to a @ parameter, an informative message can be printed if the corresponding bounds expression was originally omitted.
            • If a public size depends on a # parameter whose computation depends on private values or involves side-effects, an error is issued.
          • Otherwise the function type checks.
        • Examples
          • merge : (a : int[na], b : int[nb]) ⟶ external ref int[na+nb]
          • No binding of arrays takes place in the function body so only the header is relevant here.
          • The following system of equations is constructed:
            • @m-0-a = $na
            • @m-0-b = $nb
            • @m-0-ret = $na + $nb
          • The system has a unique solution that depends on the 2 known @ parameters:
            • @m-0-ret = @m-0-a + @m-0-b
            • $na = @m-0-a
            • $nb = @m-0-b
          • Only the first equation is kept since na and nb are not directly used in the body. It will be used by the called for allocation.
          • @m-0-a and @m-0-b will be passed as hidden variables while @m-0-ret will be computed when needed.
          • +function mergesort :
            • (a : int[n]) ⟶ external ref int[n]
          • =code
            • +if n ≤ 1:
              • return a;
            • =else
              • val m = n/2+1;
              • return merge(mergesort(a[0..m], mergesort(a[m..$]));
            • −if
          • −function mergesort
          • The following system of equations is constructed:
            • @ms-0-a = $ms-0-n
            • @ms-0-ret = $ms-0-n
            • @ms-0-ret = @ms-0-a
            • @ms-1-a = #m−0
            • @ms-1-a = $ms-1-n
            • @ms-1-ret = $ms-1-n
            • @ms-2-a = @ms-0-a − #m
            • @ms-2-a = $ms-2-n
            • @ms-2-ret = $ms-2-n
            • @m-0-a = @ms-1-ret
            • @m-0-b = @ms-2-ret
            • @m-0-ret = @m-0-a + @m-0-b
            • @ms-0-ret = @m-0-ret
          • The system has a unique solution that depends on 1 @ parameter and 1 # parameter:
            • @ms-0-ret = @ms-0-a
            • $ms-0-n = @ms-0-a
            • @ms-1-a = #m
            • $ms-1-n = #m
            • @ms-1-ret = #m
            • @ms-2-a = @ms-0-a − #m
            • $ms-2-n = @ms-0-a − #m
            • @ms-2-ret = @ms-0-a − #m
            • @m-0-a = #m
            • @m-0-b = @ms-0-a − #m
            • @m-0-ret = @ms-0-a
          • No public sizes depend on private # parameters.
          • The following equations will be kept for code generation:
            • @ms-0-ret = @ms-0-a
            • @ms-1-a = #m
            • @ms-2-a = $ms-0-a − #m
          • Only @ms-0-a will be passed as hidden variable. @ms-0-ret is identical.
          • +function bad_mergesort :
            • (a : int[n]) ⟶ external ref int[n]
          • =code
            • +if n ≤ 1:
              • return a;
            • =else
              • val m = n/2+1;
              • return merge(bad_mergesort(a[0..m], bad_mergesort(a[m+1..$]));
            • −if
          • −function bad_mergesort
          • The following system of equations is constructed:
            • @ms-0-a = $ms-0-n
            • @ms-0-ret = $ms-0-n
            • @ms-0-ret = @ms-0-a
            • @ms-1-a = #m−0
            • @ms-1-a = $ms-1-n
            • @ms-1-ret = $ms-1-n
            • @ms-2-a = @ms-0-a − #m − 1
            • @ms-2-a = $ms-2-n
            • @ms-2-ret = $ms-2-n
            • @m-0-a = @ms-1-ret
            • @m-0-b = @ms-2-ret
            • @m-0-ret = @m-0-a + @m-0-b
            • @ms-0-ret = @m-0-ret
          • The system has no solutions:
            • @ms-0-ret = @ms-0-ret − #m − 1 + #m
              • or
            • 0 = 1
          • +function split :
            • (list : int[n+m]) ⟶ (int[n], int[m])
          • =code
            • auto (mut i, mut j) = (0, 0);
            • +∀e ∈ list:
              • +if e ≥ 0:
                • retval[0][i++];
              • =else
                • retval[1][j++];
              • −if
            • −∀
          • −function split
          • The following system of equations is constructed:
            • @sp-0-a = $sp-0-n + $sp-0-m
            • @sp-0-ret0 = $sp-0-n
            • @sp-0-ret1 = $sp-0-m
          • The system has infinitely many solutions:
            • @sp-0-ret0 = @sp-0-a − @sp-0-ret1
            • $sp-0-n = @sp-0-a − @sp-0-ret1
            • $sp-0-m = @sp-0-ret1
          • An error should be issued explaining the ambiguity. Alternatively, one could investigate further to see that making @sp-0-ret0 or @sp-0-ret1 known results in a unique solution and only warn about the function's limited applicability. In such a scenario the function result could be assigned to an array of known size without an error, but the function would still likely fail with array-out-of-bounds at run-time.
          • +function zip :
            • (T, S : Type) ⟶ (a : T[n], b : S[n]) ⟶ (T, S)[n]
          • =code
            • ∀i ∈ retval.indices():
              • retval[i] ↜ (a[i], b[i]);
          • −function zip
          • The following system of equations is constructed:
            • @zip-0-a = $zip-0-n
            • @zip-0-b = $zip-0-n
            • @zip-0-ret = $zip-0-n
          • The system has a unique solution that depends on 1 @ parameter:
            • @zip-0-ret = @zip-0-a
            • $zip-0-n = @zip-0-a
            • @zip-0-b = @zip-0-a
          • An additional constraint has been imposed on @zip-0-b, but since the array bound was filled, that was probably the intention and no message is printed.
          • The following equations are kept as part of the function's interface:
            • @zip-0-ret = @zip-0-a
            • @zip-0-b = @zip-0-a
          • Only @zip-0-a is passed as a hidden variable.
          • itoa : (n : uint) ⟶ ubyte[floor(log10(n))+1]
            • @itoa-0-ret = #itoa-0-ret
          • +function ascending_digits :
            • (n : uint) ⟶ (ubyte, bool)[]
          • =code
            • auto digits = itoa(n);
            • auto ascending : bool[];
            • ascending[0] = true;
            • ∀i ∈ 1..|digits|:
              • ascending[i] = digits[i−1] < digits[i];
            • return zip(digits, ascending);
          • −function ascending_digits
          • The following system of equations is constructed:
            • @ad-0-dg = @itoa-0-ret
            • @itoa-0-ret = #itoa-0-ret
            • @zip-0-a = @ad-0-dg
            • @zip-0-b = @ad-0-as
            • @zip-0-ret = @zip-0-a
            • @zip-0-b = @zip-0-a
            • @ad-0-ret = @zip-0-ret
          • The system has a unique solution that depends on 1 # parameter:
            • @ad-0-ret = #itoa-0-ret
            • @ad-0-dg = #itoa-0-ret
            • @ad-0-as = #itoa-0-ret
            • @zip-0-ret = #itoa-0-ret
            • @zip-0-a = #itoa-0-ret
            • @zip-0-b = #itoa-0-ret
            • @itoa-0-ret = #itoa-0-ret
          • #itoa-0-ret is computed by the expression "floor(log10(n))+1", which only depends on ascending_digits' parameter and is side-effect free and thus, is suitable for evaluation by an arbitrary caller.
          • The following equations are kept as part of the function's interface:
            • @ad-0-ret = #itoa-0-ret
          • Only @ad-0-ret is passed as a hidden variable.
          • #itoa-0-ret = floor(log10(n))+1 is inferred as the size of both ascending and retval arrays.
          • Additionally the following linear inequalities could have been inferred:
            • @ad-0-as > 0
          • An extension to the system that would flag this as an error could lead us to revise the type of itoa to something that exposes the fact that ∀n.|itoa(n)|>0.
          • If f ∈ T ⟶ (int[n], uint[n+10]) and g ∈ (int[m−10], uint[m]), then g(f(x)) might need to solve the system
            • l1 = n
            • l2 = n+10
            • l3 = m−10
            • l4 = m
            • n = m−10
            • n+10 = m
          • assuming l1, l2 and n are known and l3, l4 and m are unknowns. It should conclude that the system has exactly one solution (expressed as a function of one of the knowns), namely l4=m=n+10 and l3 = n.
      • One-Way
        • The simplest case of a one-way constraint is the expression that computes the size of a returned array from the function arguments.
        • Should one-way constraints between parameters or computing a parameter size from that of the return value be written in the type, or as a preconditions?
  • Generalized Runtime-sized Types
    • Any type whose size is not determined until the program runs.
      • Any structure that has runtime-sized arrays as fields qualifies. Is there any other case?
    • In general same rules apply as with runtime-sized arrays.
    • Todo
      • Can a structure contain more than one runtime-sized member? Can they appear anywhere, or just at the end?
      • How are they defined and used? In particular how is their size parameter tied to the parameters of sub-objects?