Select Contemporary Features

This page briefly presents a select number of features adopted from other contemporary languages and how they will be implemented in Arcadia.

  1. Foreach Loops

    This control flow statement has spread among imperative languages like the railway during the industrial revolution. Combined with ranges and a standard library of algorithms, it will allow code like:

    # initialize array so that array[i] = i
    +∀ (mut element, index) ∈ zip(array, array.indices):
        element ← index;
    −∀

    Here zip lazily converts a tuple of ranges to a range of tuples. element and index are variables declared in a pattern. Only element is mutable.
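    Python's built-in zip is already lazy in this sense. To make the mechanics visible, here is a minimal hand-written version (the name lazy_zip is hypothetical) together with a Python rendering of the loop above. It is only an analogy: Python loop variables are bound to values rather than to array slots, so the write has to go through the index instead of a mutable element.

```python
import itertools


def lazy_zip(*ranges):
    """Turn a tuple of ranges into a range of tuples, one tuple per step.

    Nothing is computed until the caller asks for the next tuple, so an
    input may be unbounded as long as another one ends.
    """
    iterators = [iter(r) for r in ranges]
    while True:
        row = []
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                return  # stop as soon as the shortest range is exhausted
        yield tuple(row)


# Python rendering of the Arcadia loop: `element` holds a value, not a
# reference to the slot, so the assignment is done through `index`.
array = [None] * 5
for element, index in lazy_zip(array, range(len(array))):
    array[index] = index

# Laziness in action: pairing a finite range with an infinite counter works.
pairs = list(lazy_zip("abc", itertools.count()))
print(array, pairs)
```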

    The difference in Arcadia is that foreach loops will also be available at top level, where they will be executed during compilation, potentially declaring functions in each iteration! For example:

    # overload trigonometric functions to work on arrays
    +∀fun ∈ [sin, cos, tan, cot]:
        +function fun :
            (ref array : double[n]) → external ref double[n]
        =code
            ∀i ∈ retval.indices:
                retval[i] ← fun(array[i]);
        function fun
    −∀

    [sin, cos, tan, cot] is a literal representing an array of functions. ref means pass-by-reference, and external ref is a form of explicit RVO (return value optimization): space for the returned array will be allocated (if needed) on the caller's stack and only its address passed to the function. We'll talk more about arrays later.

    When combined with static reflection, the above loop is powerful enough to replace inheritance for code reuse as explained in DelphJ.
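    Python has no separate compile phase and no overloading by argument type, so the closest analogue to the top-level loop above is a module-level loop that runs once at import time and defines one function per iteration. In this sketch the generated functions get a hypothetical _array suffix instead of overloading sin, cos, tan and cot, and cot is defined from tan because Python's math module has no such function.

```python
import math


def cot(x):
    # math has no cot, so this sketch derives it from tan.
    return 1.0 / math.tan(x)


def make_elementwise(fun):
    """Return a function that applies `fun` to every element of a list."""
    def over_array(array):
        retval = [0.0] * len(array)  # result sized from the argument, like double[n]
        for i in range(len(retval)):
            retval[i] = fun(array[i])
        return retval
    over_array.__name__ = fun.__name__ + "_array"
    return over_array


# Module-level loop: executes once, at import time, and defines one
# new function (sin_array, cos_array, tan_array, cot_array) per iteration.
for _fun in [math.sin, math.cos, math.tan, cot]:
    globals()[_fun.__name__ + "_array"] = make_elementwise(_fun)

print(cos_array([0.0, math.pi]))  # [1.0, -1.0]
```

    The helper make_elementwise gives each generated function its own binding of fun; defining the inner function directly in the loop body would make every generated function see only the last value of the loop variable.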

  2. Limited type inference

    This will be similar to C#, C++11, D and Rust, i.e.:

    1. A variable's type can be inferred from the type of the initializer. e.g.
      auto i = 1;

      Note that auto replaces the storage class, not the type. Since i is an immutable variable whose value is known at compile time, the above is equivalent to:

      static i : int = 1;

    2. An anonymous function's type can be inferred from the type of the variable it is assigned to or of the parameter it is bound to.
    3. A function's return type can be inferred from the type of the expression used in the return statement.

    Additionally, Arcadia will infer some attributes, like "pure", and some array bounds.

  3. Dimensional Analysis

    This is a concept coming from physics and implemented in libraries like Boost.Units. It allows interesting things like:

    1 auto d = 5m cos(60°); # Fine. 60° converted to radians. d = 2.5m.
    2 d +↜ 50cm; # Fine. Now d = 3m.
    3 auto speed = d + 5s; # Illegal: Can't add quantities of different dimension.
    4 console << "Speed is: " d/5s "\n"; # Fine. Will print "0.6m/s".
    

    Note that Arcadia will support juxtaposition, so there will be no need for an explicit multiplication or concatenation in lines 1 and 4 respectively.
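    The bookkeeping behind dimensional analysis amounts to carrying a vector of dimension exponents alongside each value: multiplication adds exponents, division subtracts them, and addition requires them to match. The Python sketch below tracks only length and time; the Quantity class and the m, cm, s and deg names are assumptions of the sketch, and since Python has no juxtaposition, 5m is written 5 * m. Boost.Units rejects an expression like d + 5s at compile time, whereas this sketch can only detect the mismatch at run time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float  # magnitude in SI base units (metres, seconds)
    dims: tuple   # exponents of (length, time)

    def __add__(self, other):
        if self.dims != other.dims:
            raise TypeError(f"can't add dimensions {self.dims} and {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        if isinstance(other, Quantity):
            dims = tuple(a + b for a, b in zip(self.dims, other.dims))
            return Quantity(self.value * other.value, dims)
        return Quantity(self.value * other, self.dims)  # scale by a plain number

    __rmul__ = __mul__  # lets `5 * m` work as well as `m * 5`

    def __truediv__(self, other):
        if isinstance(other, Quantity):
            dims = tuple(a - b for a, b in zip(self.dims, other.dims))
            return Quantity(self.value / other.value, dims)
        return Quantity(self.value / other, self.dims)


m = Quantity(1.0, (1, 0))
cm = Quantity(0.01, (1, 0))
s = Quantity(1.0, (0, 1))
deg = math.pi / 180  # degrees: a dimensionless factor converting to radians

d = 5 * m * math.cos(60 * deg)  # about 2.5 m
d = d + 50 * cm                 # about 3 m
speed = d / (5 * s)             # about 0.6 m/s, dims (1, -1)

try:
    d + 5 * s                   # length + time is rejected
except TypeError as err:
    print("rejected:", err)
```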

  4. Compile-time Evaluation

    We saw examples of this earlier. It will be similar to what D and Seed7 support, but the aim here is to allow any function to execute during compilation, even those that do I/O or access global state.

    All security checks will be enabled in this mode to ensure an erroneous program doesn't corrupt the compiler itself, and an option similar to Clang's -fconstexpr-depth (but extending to loops as well) will prevent the compiler from entering an infinite loop.

    Moreover, in addition to implicit compile-time execution in certain contexts, you'll be able to use a variation of the function call syntax to explicitly request it.

    Lastly, the compiler will evaluate functions at compile time whenever possible during constant folding, obviating the need for things like constexpr functions.
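    Compile-time evaluation during constant folding can be sketched with Python's ast module: the folder below rewrites calls and arithmetic whose operands are already constants into their results, and stops after a fixed number of evaluations, loosely mirroring the evaluation limit described above. The whitelist FOLDABLE_CALLS and the max_steps budget are assumptions of this sketch; Arcadia's stated goal is to allow any function.

```python
import ast
import math
import operator

# Calls the folder may evaluate early (an assumption of this sketch).
FOLDABLE_CALLS = {"factorial": math.factorial, "sqrt": math.sqrt}
FOLDABLE_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
                ast.Mult: operator.mul, ast.Div: operator.truediv}


class ConstantFolder(ast.NodeTransformer):
    def __init__(self, max_steps=10_000):
        self.steps_left = max_steps  # evaluation budget, like a depth/loop limit

    def _spend_step(self):
        self.steps_left -= 1
        if self.steps_left < 0:
            raise RuntimeError("compile-time evaluation limit exceeded")

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        op = FOLDABLE_OPS.get(type(node.op))
        if op and isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            self._spend_step()
            try:
                value = op(node.left.value, node.right.value)
            except (ArithmeticError, TypeError):
                return node  # leave failing expressions for run time
            return ast.copy_location(ast.Constant(value), node)
        return node

    def visit_Call(self, node):
        self.generic_visit(node)  # fold the arguments first
        if (isinstance(node.func, ast.Name) and node.func.id in FOLDABLE_CALLS
                and not node.keywords
                and all(isinstance(arg, ast.Constant) for arg in node.args)):
            self._spend_step()
            try:
                value = FOLDABLE_CALLS[node.func.id](*(arg.value for arg in node.args))
            except (ArithmeticError, TypeError, ValueError):
                return node
            return ast.copy_location(ast.Constant(value), node)
        return node


def fold(source):
    """Return the source text with every foldable sub-expression pre-computed."""
    tree = ConstantFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(fold("x = factorial(5) + 1"))  # x = 121
print(fold("y = sqrt(16) * z"))      # y = 4.0 * z
```

    Operations that would fail, such as a division by zero, are left unfolded, so the error surfaces when the program runs instead of aborting the folder.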

  5. Vectors, Matrices and Colors

    Most of the conveniences for working with multidimensional entities found in MATLAB and OpenCL C will be available here, together with other enhancements, e.g.:

    auto mut color = #0080ff; # azure
    color ← color.bgra; # swap red and blue components. orange

    auto v : Vector = [0, π/6, π/4, π/3, π/2]; # initialize vector from array
    auto u = 3cos(v) + 2; # u[i] = 3cos(v[i]) + 2
    auto mat : Matrix = [[0, 1, 0, 1, 1],
                         [1, 0, 2, 0, 1]]; # initialize matrix from array
    auto w = mat*v; # matrix multiplication. w = [π, π]

    auto mut a = [-2, 10, 35, -15, 22, 7];
    sort(a[$0 ≥ 0]); # sort only elements ≥ 0. a = [-2, 7, 10, -15, 22, 35]

    End-of-line comments are introduced by "#␣", not just "#", so there is no conflict with hex color literals.
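    The swizzle, element-wise call, matrix product and masked sort from the example can be emulated in plain Python. The helper names swizzle, matvec and sort_where are hypothetical, and the colour is given an assumed opaque alpha component so that the bgra pattern has four components to pick from.

```python
import math


def swizzle(color, pattern):
    """Reorder colour components by name, as in `color.bgra`."""
    slot = {"r": 0, "g": 1, "b": 2, "a": 3}
    return tuple(color[slot[c]] for c in pattern)


def matvec(mat, v):
    """Matrix-vector product: each output is one row of `mat` dotted with `v`."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in mat]


def sort_where(a, predicate):
    """Sort in place only the elements satisfying `predicate`; others stay put."""
    positions = [i for i, x in enumerate(a) if predicate(x)]
    values = sorted(a[i] for i in positions)
    for i, x in zip(positions, values):
        a[i] = x
    return a


azure = (0x00, 0x80, 0xFF, 0xFF)   # #0080ff with an assumed opaque alpha
orange = swizzle(azure, "bgra")    # (0xFF, 0x80, 0x00, 0xFF)

v = [0, math.pi / 6, math.pi / 4, math.pi / 3, math.pi / 2]
u = [3 * math.cos(x) + 2 for x in v]  # element-wise, like 3cos(v) + 2
w = matvec([[0, 1, 0, 1, 1],
            [1, 0, 2, 0, 1]], v)      # both rows sum to pi

a = sort_where([-2, 10, 35, -15, 22, 7], lambda x: x >= 0)
print(orange, w, a)
```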

  6. Runtime-Sized Arrays

    They are also known as variable-length arrays (C99) and arrays of run-time bound (C++ proposal). They live on the stack, and their size remains constant throughout their lifetime but need not be known at compile time. They are more flexible than compile-time-sized arrays and more efficient than dynamic memory allocation on the heap. This makes them crucial for functions like the following, where a dynamic memory allocation could easily take more time than the actual computation:

    +function deCasteljau :
        (controlPoints : Point[], u : double) → Point
    =attributes
        pure, nothrow
    =preconditions
        0 ≤ u ≤ 1
    =postconditions
        retval ∈ convexHull(controlPoints)
    =documentation
        Evaluates a polynomial curve in Bézier form at position u
        using De Casteljau's algorithm.

        @complexity O(d⋅n²) where d is the dimensionality of Point
                    and n is the number of control points.

        @param controlPoints The control points representing the curve.
        @param u The parameter value for which to evaluate the curve.
        @return C(u) where C is the Bézier curve.
    =code
        auto mut intermediatePoints = controlPoints;
            # intermediatePoints size depends on controlPoints size
            # which will become known when the program runs.
        +∀i ∈ 1..|controlPoints|:
            +∀j ∈ 0..|controlPoints|−i:
                intermediatePoints[j] ← (1−u)⋅intermediatePoints[j]
                                        + u⋅intermediatePoints[j+1];
            −∀
        −∀

        return intermediatePoints[0];
    function deCasteljau

    Here |·| denotes the cardinality (i.e. the number of elements) of a set... or anything that contains elements really. .. is the range operator and returns a range consisting of all the integers from its left operand up to and excluding its right operand. return implicitly assigns to retval before returning control to the caller.
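    For comparison, here is a Python sketch of the same algorithm, with points as tuples of numbers. Python lists always live on the heap, so it shows only the arithmetic, not the stack-allocated runtime-sized array that motivates the Arcadia version.

```python
def de_casteljau(control_points, u):
    """Evaluate a Bézier curve at parameter u by repeated linear interpolation."""
    if not 0 <= u <= 1:
        raise ValueError("u must lie in [0, 1]")
    points = [tuple(p) for p in control_points]  # working copy, same size as the input
    n = len(points)
    for i in range(1, n):        # n - 1 rounds
        for j in range(n - i):   # each round shortens the active prefix by one
            points[j] = tuple((1 - u) * a + u * b
                              for a, b in zip(points[j], points[j + 1]))
    return points[0]


print(de_casteljau([(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)], 0.5))  # (1.0, 1.0)
```

    The two nested loops perform about n²/2 interpolations, each touching all d coordinates, which is where the O(d⋅n²) bound comes from.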

    Where things get really interesting is when runtime-sized arrays are combined with external ref to allow returning such arrays from functions! We actually showed an example of this earlier, in the trigonometric overloads, without mentioning that the returned array was a runtime-sized one. Another application is to conveniently and efficiently wrap low-level C APIs such as the one offered by OpenCL:

     1 # Wrap clGetPlatformIDs/3 to handle allocation automatically.
     2 +function clGetPlatformIDs :
     3     () → cl_platform_id[() → uint {clGetPlatformIDs(0, null, &retval);}()]
     4 =code
     5     clGetPlatformIDs(|retval|, &retval, null);
     6 function clGetPlatformIDs
     7 
     8 # Use wrapper to loop over available platforms
     9 +∀platformID ∈ clGetPlatformIDs():
    10     # call clGetPlatformInfo to get information about the platform
    11 −∀
    

    The expression inside the brackets in line 3 defines a lambda function and calls it immediately, just to introduce a context where the API version of clGetPlatformIDs can store the number of available platforms. That number is then returned to become the array size. The computation plus the accompanying allocation are automatically inserted at each call site, before the call to our version of clGetPlatformIDs. The external ref qualifier was omitted because it is the default for large types.

    & returns the address of its operand and is required in line 5 too, as no automatic conversion from array to pointer is defined. The final version of the language will probably require a little more convincing to allow binding something that does bounds checking to something that doesn't even know its size, but that's not important for this example.
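    The "ask for the size, allocate, then fill" protocol that the wrapper hides can be shown without OpenCL by using a stand-in function. In the Python sketch below, fake_get_platform_ids and get_platform_ids are hypothetical: the former only imitates the calling convention of clGetPlatformIDs, with a one-element list playing the role of an out-pointer.

```python
def fake_get_platform_ids(num_entries, platforms, num_platforms):
    """Hypothetical stand-in imitating the clGetPlatformIDs calling convention."""
    available = ["platform-A", "platform-B"]
    if platforms is not None:
        for i in range(min(num_entries, len(available))):
            platforms[i] = available[i]
    if num_platforms is not None:
        num_platforms[0] = len(available)  # write through the "out-pointer"
    return 0  # success code


def get_platform_ids():
    """Wrapper that hides the size query and the allocation from its callers."""
    count = [0]
    fake_get_platform_ids(0, None, count)                   # 1st call: how many?
    platforms = [None] * count[0]                           # allocate exactly that many
    fake_get_platform_ids(len(platforms), platforms, None)  # 2nd call: fill them
    return platforms


for platform_id in get_platform_ids():
    print(platform_id)
```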

    APIs that require such generation of boilerplate code are quite common. Examples include:

    • WinAPI (GetRawInputDeviceList, GetRawInputDeviceInfo, GetRawInputData,...)
    • OpenGL (glGetIntegerv(GL_COMPRESSED_TEXTURE_FORMATS,...))
    • OpenCL (clGetPlatformIDs, clGetPlatformInfo, clGetDeviceIDs, clGetDeviceInfo, clGetContextInfo,...)
    • Wintab (WTInfo(WTI_INTERFACE,IFC_WINTABID,...))

  7. Unicode Support

    Every self-respecting language needs to support Unicode nowadays, but the level of support differs. Arcadia will natively support Unicode concepts like code units, code points and grapheme clusters, while making Unicode the default (e.g. you won't have to prefix strings with 'u').

    As an example, there will be a function from a range of characters to a range of grapheme clusters. Without getting into details, when you press the right arrow key in a text editor, the cursor moves one grapheme cluster (not one character) to the right, so that you don't awkwardly end up between 'α' and '´' in 'ά'!
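    Python's unicodedata module is enough to show why cursor movement should follow grapheme clusters. The grapheme_clusters function below is a deliberately rough, hypothetical approximation that attaches combining marks to the preceding code point; real segmentation follows the rules in Unicode Standard Annex #29 and covers many more cases, such as emoji sequences and Hangul syllables.

```python
import unicodedata


def grapheme_clusters(text):
    """Rough approximation: each base code point plus the combining marks after it."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the mark to the previous cluster
        else:
            clusters.append(ch)
    return clusters


precomposed = "\u03ac"                                  # 'ά' as a single code point
decomposed = unicodedata.normalize("NFD", precomposed)  # 'α' followed by U+0301

print(len(precomposed), len(decomposed))   # 1 2
print(len(grapheme_clusters(decomposed)))  # 1: one cursor step covers both code points
```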

  8. Pattern Matching

    Pattern matching will be similar to that found in functional programming languages, but will support opaque types by allowing a special function, which I call a 'deconstructor', to take care of the decomposition.

    If the constructor for a class C is responsible for initializing an object of that class using the information provided by its arguments, then the deconstructor is responsible for the opposite flow of information, from the object to the provided variables. At the risk of the special syntax acting as a distraction, I'll use the tuple class as an example:

    auto (mut x, mut y) = (1,2); # two variables are declared. x=1 and y=2
    (x,y) ← (y,x); # the variables swap values. x=2 and y=1

    Everything should be removable by the optimizer, but the semantics would be as follows: In the first line, two integer variables are allocated and constructed. Then '1' and '2' are packed into a tuple by the class constructor and finally the addresses of the newly constructed variables together with that of the tuple are passed to the class deconstructor. The latter will assign the values to the variables.

    In the second line, the values of y and x are used (copied) to construct a tuple. Then the address of that tuple and those of the variables are passed to the deconstructor.
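    The constructor/deconstructor pair described above can be imitated in Python by passing mutable cells where Arcadia would pass addresses. In the sketch below, Ref and Tuple are hypothetical classes, and deconstruct is the method that plays the deconstructor's role.

```python
class Ref:
    """A mutable cell standing in for the address of a variable."""
    def __init__(self, value=None):
        self.value = value


class Tuple:
    def __init__(self, *items):
        # constructor: information flows from the arguments into the object
        self._items = tuple(items)

    def deconstruct(self, *refs):
        # deconstructor: information flows from the object into the variables
        if len(refs) != len(self._items):
            raise ValueError("pattern has the wrong number of variables")
        for ref, item in zip(refs, self._items):
            ref.value = item


# Declaration: two variables, filled by deconstructing the tuple (1, 2).
x, y = Ref(), Ref()
Tuple(1, 2).deconstruct(x, y)

# Swap: copies of y and x are packed into a temporary tuple first,
# so overwriting x cannot affect the value later written to y.
Tuple(y.value, x.value).deconstruct(x, y)
print(x.value, y.value)
```

    Python 3.10's match statement has a related hook, __match_args__, although it names attributes to read rather than calling a user-defined function.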

    Besides declarations and the left-hand side of the assignment operator, pattern matching will appear in the case statement and maybe in function definitions too.