Introducing Arcadia

An Introduction to Arcadia

First, a disclaimer: the design of Arcadia isn't complete. Completing the design is exactly what I'm raising money for! Consequently, the examples I'm about to show you are likely to change somewhat as the design progresses, and even later, once I start gathering feedback on how the language is used... but let's not get ahead of ourselves!

We'll start with the usual. Only... there are two ways to say "Hello World!" in Arcadia!

One of the pillars of Arcadia's design is compile-time evaluation — i.e. the ability to run normal, unmodified functions and control-flow statements during compilation. Together with top-level statements, this gives us our first version of the classic:

1 # Print a message during compilation.
2 import io.*;
3 
4 console << "Hello World!\n";

Line 2 will import all symbols exported by the io module, including console and <<. Line 4 will output the message between the double quotes, or throw an exception if something goes wrong.

Since we are at top level, the statement will execute at compile time and the message will be printed by the compiler process, effectively making the compiler an interpreter and the program a script!

\n is an escape sequence representing the newline character, just like in the C family of languages.

Now let's rewrite the above example to instead build an executable that will print the message:

# Print a message during execution.
import io.*;

+function main :
    (arguments : String[]) ⟶ ()
=attributes
    entry
=code
    console << "Hello World!\n";
−function main

The name main has no special meaning here; instead, the entry attribute marks the function as the place where execution starts. As in Java, you are limited to one such function per compilation unit (not per program), the aim being to facilitate exploratory testing of individual modules.

Its type is, predictably, a function taking an array of strings (which, by the way, are initialized to the program's command-line arguments) and returning nothing.

At first glance this might seem unnecessarily verbose, but Arcadia is being designed not with program size in mind but with how that size grows, and this example is too small to show any asymptotic effects. Still, code golfers might be disappointed. :)

Let's look at another example:

1 +function factorial : 
2     (n : ubyte) ⟶ ulong
3 =code
4     retval = 1;
5     ∀i ∈ 2,...,n:
6         retval *↜ i;
7 −function factorial

ubyte and ulong are unsigned (hence the 'u') integer types, 8 and 64 bits wide respectively.

retval represents the return value of the function (so its type here is ulong) and, so long as you assign to it somewhere in the function body, you don't have to include a return statement.

Next we get to ∀ and another central theme in Arcadia: Unicode. An assortment of mathematical operators and symbols will be available, while foreach, in, <~, -> and - will be syntactically and semantically equivalent to ∀, ∈, ↜, ⟶ and − respectively in environments where the latter aren't easy to type.
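To make the equivalence concrete, here is a minimal Python sketch of how a front end might fold the ASCII spellings into their Unicode counterparts before parsing. The function name to_unicode and the regex-based approach are assumptions for illustration only; a real lexer would work on tokens and leave string literals and comments untouched.

```python
import re

# ASCII spellings and the Unicode symbols they stand for (from the list above).
ASCII_TO_UNICODE = {
    "foreach": "∀",
    "in": "∈",
    "<~": "↜",
    "->": "⟶",
    "-": "−",
}

# Hypothetical token pattern. Alternatives are tried left to right, so "->"
# is matched before the lone "-". The keywords only match as whole words,
# leaving identifiers such as "main" or "index" alone.
_ASCII_TOKEN = re.compile(r"\bforeach\b|\bin\b|<~|->|-")


def to_unicode(source: str) -> str:
    """Replace every ASCII spelling in source with its Unicode equivalent."""
    return _ASCII_TOKEN.sub(lambda match: ASCII_TO_UNICODE[match.group(0)], source)


print(to_unicode("foreach i in 2,...,n:"))  # ∀ i ∈ 2,...,n:
print(to_unicode("retval *<~ i;"))          # retval *↜ i;
```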

Unfortunately, full typesetting of mathematical formulas, like that found in Mathematica, won't be available, primarily so that programs can still be edited as plain text.

Line 5 is a foreach loop, similar to those found in Python, D, etc.; it executes line 6 for each element of a range.

,..., is the simplest form of the sequence operator, specifying an arithmetic progression with step 1. Here it returns a range that lazily generates the numbers 2, 3, and so on, up to and including n.
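As a rough Python analogue, the lazy range can be modelled with a generator. The name sequence is hypothetical, and so is the rule that a three-term form such as 0,2,...,20 (used in the production example further down) takes its step from the difference between the first two terms; only ascending progressions are handled.

```python
from typing import Iterator


def sequence(*terms: int) -> Iterator[int]:
    """Lazily yield an inclusive arithmetic progression.

    sequence(a, z)    models  a,...,z    (step 1)
    sequence(a, b, z) models  a,b,...,z  (step b - a)
    """
    if len(terms) == 2:
        first, last = terms
        step = 1
    elif len(terms) == 3:
        first, second, last = terms
        step = second - first
    else:
        raise TypeError("expected sequence(a, z) or sequence(a, b, z)")
    if step <= 0:
        raise ValueError("this sketch only handles ascending progressions")
    value = first
    while value <= last:  # inclusive upper bound, unlike Python's range()
        yield value
        value += step


print(list(sequence(2, 6)))  # [2, 3, 4, 5, 6]
```

Because nothing is computed until the loop asks for the next element, sequence(2, n) costs nothing up front, and when n is 0 or 1 it yields nothing at all, so the loop body never runs.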

*↜ is the multiplication assignment operator, similar to C's *=, so line 6 could have been written as retval ↜ retval * i;.
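For readers coming from Python, here is a line-by-line translation of the function, a sketch for comparison only. Python's range excludes its end point, hence n + 1, and Python integers never overflow, so the ubyte and ulong widths have no counterpart here.

```python
def factorial(n: int) -> int:
    retval = 1                 # retval = 1;
    for i in range(2, n + 1):  # ∀i ∈ 2,...,n:  (range's end is exclusive)
        retval *= i            # retval *↜ i;
    return retval              # implicit in Arcadia once retval is assigned


print(factorial(10))  # 3628800
```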

There are two more things to note before moving on to the next example:

If the multiplication overflows, an exception will be thrown. Yes, you'll be able to omit the related checks from release builds, just as you'll be able to omit bounds, assertion and contract checks, but it's important that these checks can be emitted at all. And with automated benchmarking, you'll know exactly how much you pay for them!
The syntax has been designed from scratch. Apart from avoiding some well-known pitfalls like the dangling else and = vs == in conditions, I believe that an original syntax will help set the language apart and provide a fresh perspective on old concepts.
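The first point can be illustrated with a small Python sketch of a checked 64-bit unsigned multiplication. In C, unsigned arithmetic silently wraps around modulo 2^64; the helper below (checked_mul_ulong, a hypothetical name) raises instead, which is the behavior described above for ulong. Python's own integers are unbounded, so the 64-bit limit has to be checked explicitly.

```python
ULONG_MAX = 2**64 - 1  # largest value a 64-bit unsigned integer can hold


def checked_mul_ulong(a: int, b: int) -> int:
    """Multiply two ulong values, raising OverflowError instead of wrapping."""
    if not (0 <= a <= ULONG_MAX and 0 <= b <= ULONG_MAX):
        raise ValueError("operands must be ulong values")
    product = a * b
    if product > ULONG_MAX:
        raise OverflowError(f"{a} * {b} does not fit in 64 bits")
    return product


def factorial_checked(n: int) -> int:
    """Factorial with every multiplication overflow-checked."""
    retval = 1
    for i in range(2, n + 1):
        retval = checked_mul_ulong(retval, i)
    return retval


print(factorial_checked(20))  # 2432902008176640000
```

Calling factorial_checked(21) raises OverflowError, since 21! is about 5.1 × 10^19 while ULONG_MAX is about 1.8 × 10^19.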

One of Arcadia's goals is to keep you sane as program size grows, even after documentation, tests, attributes, contracts, logging and benchmarks are in place. So let's look at how the above function might look in production:

+function factorial :
    (n : ubyte) ⟶ ulong
=attributes
    pure, strictly_ascending, nothrow
=preconditions
    n ≤ 20 # otherwise it will overflow
=postconditions
    n > 5 ⟹ (n/3.0)^n < retval < (n/2.0)^n # non-tight bounds
=documentation
    Returns the factorial n! of a non-negative integer n.

    @complexity O(n)

    @param n The natural number whose factorial should be returned.
    @return n! = n⋅(n−1)⋅...⋅1

    @see binomialCoefficient
=code
    retval = 1;

    ∀i ∈ 2,...,n:
        retval *↜ i;
=tests
    0 ⇒ 1,
    1 ⇒ 1,
    2 ⇒ 2,
    3 ⇒ 6,
    10 ⇒ 3628800
=benchmarks
    inputs: 0,2,...,20 # the running time for each input in the
                       # range will be measured (multiple times).
    size: $0 # the function that takes the input and returns the
             # size used for fitting.
    model: a*$0 + b # meaning a linear model.
                    # The fitter will produce values for a and b.
−function factorial

In Arcadia you'll be able to annotate various entities with attributes that will be accessible through reflection, much like annotations in Java, but you'll additionally be able to specify the meaning of those attributes in the language itself rather than in an external tool. Here we have three standard library attributes:

pure is the combination of two attributes: deterministic (the function always returns the same value for the same arguments) and side_effect_free (the function doesn't change any external state). The compiler will verify those properties and, in fact, would have added the attribute by itself if we hadn't.
strictly_ascending is a mathematical property and basically means n₁ < n₂ ⟹ factorial(n₁) < factorial(n₂). It will be used at call sites, as part of value range propagation (VRP), to determine the minimum and maximum values the expression can take.
nothrow means the function should never throw exceptions.
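To see how an ascending attribute could feed value range propagation, here is a Python sketch. If a caller knows that n lies in some interval [lo, hi], the result of an ascending function lies in [f(lo), f(hi)], which may be enough to decide a comparison at compile time. The helper names are hypothetical, and this particular bound only requires that f never decreases.

```python
from math import factorial
from typing import Callable, Tuple

Interval = Tuple[int, int]  # inclusive [low, high]


def propagate_ascending(f: Callable[[int], int], arg: Interval) -> Interval:
    """Range of f(x) for x in arg, assuming f never decreases."""
    lo, hi = arg
    if lo > hi:
        raise ValueError("empty interval")
    return f(lo), f(hi)


def always_less_than(f: Callable[[int], int], arg: Interval, bound: int) -> bool:
    """True if f(x) < bound holds for every x in arg, so the test can be folded."""
    return propagate_ascending(f, arg)[1] < bound


# Suppose the compiler has proved 3 ≤ n ≤ 5 at some call site.
print(propagate_ascending(factorial, (3, 5)))     # (6, 120)
print(always_less_than(factorial, (3, 5), 1000))  # True
```

With such information, a check like factorial(n) < 1000 at that call site could be replaced by the constant true.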

In the precondition block we specify that the function isn't suitable for integers greater than 20 (21! doesn't fit in 64 bits), while in the postcondition block we have a conditional property: if the input is large enough (n > 5), the function's value will lie strictly between those of two other functions.
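Checked at run time, the two contract blocks amount to roughly the Python sketch below. ContractViolation and factorial_with_contracts are hypothetical names, and Arcadia's compiler may well prove or discharge these conditions statically instead. The precondition limit follows from 20! = 2,432,902,008,176,640,000 fitting below 2^64 − 1 ≈ 1.8 × 10^19 while 21! ≈ 5.1 × 10^19 does not; the postcondition bounds are loose, e.g. for n = 6 they give 64 < 720 < 729.

```python
class ContractViolation(Exception):
    """Raised when a precondition or postcondition does not hold."""


def factorial_with_contracts(n: int) -> int:
    # =preconditions: n ≤ 20 (n is a ubyte, so it is also non-negative)
    if not 0 <= n <= 20:
        raise ContractViolation(f"precondition n <= 20 violated by n = {n}")

    # =code
    retval = 1
    for i in range(2, n + 1):
        retval *= i

    # =postconditions: n > 5 ⟹ (n/3.0)^n < retval < (n/2.0)^n
    if n > 5 and not ((n / 3.0) ** n < retval < (n / 2.0) ** n):
        raise ContractViolation(f"postcondition violated for n = {n}")
    return retval


print(factorial_with_contracts(6))  # 720
```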

In general, you'll be able to write class invariants, refer to old argument values in postconditions, have postconditions in interfaces bind implementors... the works! The optimizer will also be able to use the information provided by postconditions, for example to optimize away preconditions that are already known to hold.

The compiler itself will largely ignore documentation comments, except for making them available verbatim to the program through reflection. They are meant for external tools like Doxygen.

Since our function is pure, we can write unit tests for it quite concisely, by just listing the valid input-output combinations we want to test. If our function threw exceptions, we could have used a different symbol to also list (invalid input, exception) pairs. For non-pure functions, test blocks containing arbitrary code will also be supported.
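A table-driven runner in Python gives a feel for what such a block boils down to. The helper run_io_tests, the factorial below (which mirrors the precondition by raising ValueError above 20), and the separate list of (input, exception) pairs are illustrative assumptions; they don't represent Arcadia's actual test harness or the symbol it will use for exception pairs.

```python
from typing import Any, Callable, List, Sequence, Tuple, Type


def factorial(n: int) -> int:
    if not 0 <= n <= 20:
        raise ValueError("n must be in 0..20")
    retval = 1
    for i in range(2, n + 1):
        retval *= i
    return retval


# The =tests block of the production example, as (input, expected output) pairs.
FACTORIAL_CASES = [(0, 1), (1, 1), (2, 2), (3, 6), (10, 3628800)]


def run_io_tests(
    f: Callable[[Any], Any],
    cases: Sequence[Tuple[Any, Any]],
    raising_cases: Sequence[Tuple[Any, Type[BaseException]]] = (),
) -> List[Tuple[Any, Any, Any]]:
    """Return (input, expected, actual) for every case that failed."""
    failures = []
    for arg, expected in cases:
        actual = f(arg)
        if actual != expected:
            failures.append((arg, expected, actual))
    for arg, exc_type in raising_cases:
        try:
            actual = f(arg)
        except exc_type:
            continue  # raised the expected exception
        except Exception as exc:
            failures.append((arg, exc_type, exc))  # raised something else
        else:
            failures.append((arg, exc_type, actual))  # did not raise at all
    return failures


print(run_io_tests(factorial, FACTORIAL_CASES, [(21, ValueError)]))  # []
```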

Like tests, benchmarks only run on request. If we were to run them, the function would be called multiple times for each prescribed input so that its running time could be measured; the given size function (which could have been omitted in this case, since it's the identity) would then map inputs to their sizes; and finally a linear regression algorithm would find the values of the model's parameters that best connect sizes with running times.
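The pipeline just described (time each input several times, map inputs to sizes, fit the model) can be sketched in Python with an ordinary least-squares fit of a*size + b. The names benchmark and fit_linear, keeping the fastest of several runs, and the default of five repeats are assumptions made for this sketch, not details of Arcadia's benchmarking design.

```python
import time
from typing import Any, Callable, Iterable, List, Tuple


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys ≈ a*xs + b; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all sizes are equal, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def benchmark(
    f: Callable[[Any], Any],
    inputs: Iterable[Any],
    size: Callable[[Any], float] = lambda x: x,  # identity, like `size: $0`
    repeats: int = 5,
) -> Tuple[Tuple[float, float], List[Tuple[float, float]]]:
    """Time f on each input and fit running time ≈ a*size + b.

    Returns the fitted (a, b) and the (size, seconds) table.
    """
    table = []
    for x in inputs:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            f(x)
            best = min(best, time.perf_counter() - start)
        table.append((size(x), best))
    sizes = [s for s, _ in table]
    seconds = [t for _, t in table]
    return fit_linear(sizes, seconds), table
```

Calling benchmark(factorial, range(0, 21, 2)) would mirror the inputs: 0,2,...,20 line; the returned table and the fitted (a, b) correspond to the tabular and model parts of the report.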

The result would be a report containing the running times in tabular format, a chart showing how running time changes with size, the fitted model, or a combination of these. You can read more about automated benchmarks here.

$n is a placeholder similar to _n in Boost.Lambda or argn in Boost.Phoenix. (Or x when we say "the function x²+1", if you think about it!) Its type is something along the lines of "function taking at least n arguments and returning the type of its nth argument", and its value is a projection function returning the nth of its arguments. It's useful for very succinctly specifying anonymous functions, as long as they are not nested.

You may have noticed that I mentioned logging above but haven't included any in the example. That's because the only kind of logging that would make sense here is trace-level, and probably not even that. The idea is that instrumentation for debug- and trace-level logging shouldn't be intermixed with regular code, where it makes the code's semantics harder to follow; instead it should be added centrally or, better yet, using the same point-and-click or textual mechanisms used today to add things like conditional breakpoints.

For the first option, things like logging the entry and exit (including arguments) of a set of functions should be easy to implement using Arcadia's powerful meta-programming capabilities. The second is mostly a debugger issue, and it's not clear at the moment whether Arcadia will have its own debugger.
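As a rough analogue of the first option, the Python sketch below adds entry and exit logging to a chosen set of functions from one central place, without touching their bodies. Python decorators stand in here for Arcadia's meta-programming facilities, and the names traced and instrument are hypothetical.

```python
import functools
from typing import Any, Callable, Dict, Iterable

Logger = Callable[[str], None]


def traced(fn: Callable[..., Any], log: Logger) -> Callable[..., Any]:
    """Wrap fn so its entry (with arguments) and exit (with result) are logged."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        shown = [repr(a) for a in args] + [f"{k}={v!r}" for k, v in kwargs.items()]
        log(f"enter {fn.__name__}({', '.join(shown)})")
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            log(f"raise {fn.__name__}: {exc!r}")
            raise
        log(f"exit {fn.__name__} -> {result!r}")
        return result

    return wrapper


def instrument(namespace: Dict[str, Any], names: Iterable[str], log: Logger) -> None:
    """Centrally replace the named functions in namespace with traced versions."""
    for name in names:
        namespace[name] = traced(namespace[name], log)


def add(a: int, b: int) -> int:
    return a + b


instrument(globals(), ["add"], print)
add(2, 3)  # prints "enter add(2, 3)" then "exit add -> 5"
```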