Basic blocks of a Catala program

In this section, the tutorial introduces the basic building blocks of a Catala program: the difference between law and code, data structures, scopes, variables and formulas. By the end of the section, you should be able to write a simple Catala program equivalent to a single function with local variables whose definitions can refer to one another.

Tutorial cheat code

A recap of the tutorial section with the full code that is expected to be in your tutorial.catala_en companion file is attached at the end of this page.

Please refer to it if you feel lost while reading and want to check whether you are on track to complete your tutorial.catala_en file.

Mixing law and code

Catala is a language designed around the concept of literate programming, that is, the mixing of the computer code and its specification in a single document. Why literate programming? Because it enables a fine-grained correspondence between the specification and the code. Whenever the specification is updated, knowing where to update the code is trivial with literate programming. This is absolutely crucial for enabling long-term maintenance of complex and high-assurance programs like tax or social benefits computation.

Hence, a Catala source code file looks like a regular Markdown document, with the specification written down and styled as Markdown text, with the Catala code only present in well-bounded Catala code blocks introduced by a line with ```catala and ended by a line with ```.

Before writing any Catala code, we must introduce the specification of the code for this tutorial. This specification will be based on a fictional Tax Code defining a simple income tax. But in general, anything can be used as a specification for a Catala program: laws, executive orders, court case motivations, legal doctrine, internal instructions, technical specifications, etc. These sources can be mixed to form a complete Catala program that relies on all of them. Concretely, incorporating a legal source of specification into the Catala program amounts to copy-pasting the text and formatting it in Markdown syntax inside the source code file.

Without further ado, let us introduce the first bit of specification for our fictional income tax, article 1 of the CTTC (Catala Tutorial Tax Code):

Article 1

The income tax for an individual is defined as a fixed percentage of the individual's income over a year.

Formatting legal text in Catala

Catala uses Markdown-like formatting for the legal text in the .catala_en files. So, to copy the text of the article into your tutorial.catala_en file, mark up the article header with ## and put the text below, as such:

## Article 1

The income tax for an individual is defined as a fixed percentage of the
individual's income over a year.

The spirit of writing code in Catala is to stick to the specification at all times in order to put the code snippets where they belong. Hence, we will introduce below the Catala code snippets that translate article 1, which should be put just below article 1 in the Catala source code file.

These code snippets should describe the program that computes the income tax, and contain the rule defining it as a multiplication of the income by the rate. It is time to dive into Catala as a programming language.

# We will soon learn what to write here in order to translate the meaning
# of article 1 into Catala code.

# To create a block of Catala code in your file, bound it with Markdown-style
# "```catala" and "```" delimiters. You can write comments in Catala code blocks
# by prefixing lines with "#"

# In the rest of the tutorial, when presenting Catala code snippets, it is
# assumed implicitly that you should copy-paste them into your
# tutorial.catala_en file inside a Catala code block enclosed between
# "```catala" and "```" delimiters, and placed near the article of law that
# it implements.

Setting up data structures

The content of article 1 assumes a lot of implicit context: there exists an individual with an income, as well as an income tax that the individual has to pay each year. Even if this implicit context is not stated verbatim in the law, we have to make it explicit in the computer code, in the form of data structures and function signatures.

Catala is a strongly-typed, statically compiled language, so all data structures and function signatures have to be explicitly declared. So, we begin by declaring the type information for the individual, the taxpayer that will be the subject of the tax computation. This individual has an income and a number of children, two pieces of information that will be needed for tax purposes:

Declaring a structure

# Data structure declarations, and declarations in general, often do not
# match any specific article of law. Hence, you can put all the
# declarations at the top of your tutorial.catala_en file, before article 1.

# The name of the structure, "Individual", must start with an
# uppercase letter: this is the CamelCase convention.
declaration structure Individual:
  # In this line, "income" is the name of the structure field and
  # "money" is the type of what is stored in that field.
  # Available types include: "integer", "decimal", "money", "date",
  # "duration", and any other structure or enumeration that you declare.
  data income content money
  # The field names "income" and "number_of_children" start with a lowercase
  # letter; they follow the snake_case convention.
  data number_of_children content integer

This structure contains two data fields, income and number_of_children. Structures are useful to group together data that goes together. Usually, you get one structure per concrete object on which the law applies (like the individual). It is up to you to decide how to group the data together, but we advise you to aim at optimizing code readability.

Declaring an enumeration

Sometimes, the law gives an enumeration of different situations. These enumerations are modeled in Catala using an enumeration type, like:

# The name "TaxCredit" is also written in CamelCase.
declaration enumeration TaxCredit:
  # The line below says that "TaxCredit" can be a "NoTaxCredit" situation.
  -- NoTaxCredit
  # The line below says that alternatively, "TaxCredit" can be a
  # "ChildrenTaxCredit" situation. This situation carries a content
  # of type integer corresponding to the number of children concerned
  # by the tax credit. This means that if you're in the "ChildrenTaxCredit"
  # situation, you will also have access to this number of children.
  -- ChildrenTaxCredit content integer

In computer science terms, such an enumeration is called a "sum type" or simply an enum. The combination of structures and enumerations allows the Catala programmer to declare all possible shapes of data, as they are equivalent to the powerful notion of algebraic data types.

Notice that the data structures we have declared cannot always be attached naturally to a particular piece of the specification text. So, where should you put these declarations in your literate programming file? Since you will often go back to these data structure declarations while programming, we advise you to group them together in some sort of prelude in your source code file. Concretely, this prelude section containing the data structure declarations will be your one-stop shop when trying to understand the data manipulated by the rules elsewhere in the source code file.

Scopes as basic computation blocks

We've defined and typed the data that the program will manipulate. Now, we have to define the logical context in which this data will evolve. Because Catala is a functional programming language, all code exists within a function. And the equivalent of a function in Catala is called a scope. A scope is made up of:

a name,
input variables (similar to function arguments),
internal variables (similar to local variables),
output variables (that together form the return type of the function).

For instance, article 1 declares a scope for computing the income tax:

Declaring a scope

# Scope names use the CamelCase naming convention, like names of structs
# or enums. Scope variables, on the other hand, use the snake_case naming
# convention, like struct fields.
declaration scope IncomeTaxComputation:
  # The following line declares an input variable of the scope, which is akin to
  # a function parameter in computer science terms. This is the piece of
  # data on which the scope will operate.
  input individual content Individual
  internal tax_rate content decimal
  output income_tax content money

The scope is the basic abstraction unit in Catala programs, and scopes can be composed. Since a function can call other functions, scopes can also call other scopes. We will see later how to do this, but first let us focus on the inputs and outputs of scopes.

The declaration of the scope is akin to a function signature: it contains a list of all the arguments along with their types. But in Catala, scope variables can be input, internal or output. input means that the variable has to be provided whenever the scope is called, and cannot be defined within the scope. internal means that the variable is defined within the scope and cannot be seen from outside the scope; it's not part of the return value of the scope. output means that a caller can retrieve the computed value of the variable. Note that a variable can also be simultaneously an input and an output of the scope; in that case it should be annotated with input output.
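As a minimal sketch of the input output annotation, consider the declaration below. The scope name IncomeTaxSummary is hypothetical and not part of the tutorial's Tax Code; it only illustrates how the three kinds of variables can sit side by side with a variable that is both an input and an output.

```catala
# Hypothetical scope, for illustration only: it is not needed in
# your tutorial.catala_en file.
declaration scope IncomeTaxSummary:
  # Provided by the caller, and also returned as part of the result.
  input output individual content Individual
  # Only visible inside the scope.
  internal tax_rate content decimal
  # Computed by the scope and returned to the caller.
  output income_tax content money
```

Since individual is an input, it still cannot be defined within the scope; the annotation only makes the value supplied by the caller visible in the result.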

Once the scope has been declared, we can use it to define our computation rules and finally code up article 1!

Defining variables and formulas

Article 1 actually gives the formula to define the income_tax variable of scope IncomeTaxComputation, which translates to the following Catala code:

Article 1

The income tax for an individual is defined as a fixed percentage of the individual's income over a year.

Defining a variable

scope IncomeTaxComputation:
  definition income_tax equals
    individual.income * tax_rate

Let us unpack the code above. Each definition of a variable (here, income_tax) is attached to a scope that declares it (here, IncomeTaxComputation). After equals, we have the actual expression for the variable: individual.income * tax_rate. The syntax for formulas uses the classic arithmetic operators. Here, * means multiplying an amount of money by a decimal, returning a new amount of money. The exact behavior of each operator depends on the types of the values it is applied to. For instance, here, because a value of the money type is always an integer number of cents, * rounds the result of the multiplication to the nearest cent to provide the final value of type money (see the FAQ for more information about rounding in Catala). As for individual.income, the . notation lets us access the income field of individual, which is a structure of type Individual.

Using enumerations

Similarly to struct field access, Catala lets you inspect the contents of an enumeration value with pattern matching, as is usual in functional programming languages. Concretely, if tax_credit is a variable whose type is TaxCredit as declared above, then you can define the amount of a tax credit that depends on a number of eligible children with the following pattern matching:

match tax_credit with pattern
-- NoTaxCredit: $0
-- ChildrenTaxCredit of number_of_eligible_children:
  $10,000 * number_of_eligible_children

In the branch -- ChildrenTaxCredit of number_of_eligible_children:, you know that tax_credit is in the variant ChildrenTaxCredit, and number_of_eligible_children lets you bind the integer payload of the variant. Like in a regular functional programming language, you can give any name you want to number_of_eligible_children, which is useful if you're nesting pattern matching and want to differentiate the contents of two different variant payloads.
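To see where such an expression can live, here is a sketch that wraps the pattern matching above in a complete scope. The scope names TaxCreditComputation and TestTaxCredit and the variable tax_credit_amount are hypothetical, introduced only for this illustration; the test scope borrows the calling syntax explained in the testing part of this section.

```catala
# Hypothetical scope, for illustration only.
declaration scope TaxCreditComputation:
  input tax_credit content TaxCredit
  output tax_credit_amount content money

scope TaxCreditComputation:
  definition tax_credit_amount equals
    match tax_credit with pattern
    -- NoTaxCredit: $0
    -- ChildrenTaxCredit of number_of_eligible_children:
      $10,000 * number_of_eligible_children

# Hypothetical test scope that calls the scope above.
declaration scope TestTaxCredit:
  output computation content TaxCreditComputation

scope TestTaxCredit:
  definition computation equals
    output of TaxCreditComputation with {
      # An enumeration value is built from the variant name and,
      # when the variant carries one, its content.
      -- tax_credit: ChildrenTaxCredit content 2
    }
```

With two eligible children, the second branch applies and the tax credit amounts to $10,000 * 2, that is $20,000.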

Now, back to our scope IncomeTaxComputation. At this point we're still missing the definition of tax_rate. This is a common pattern when coding the law: the definitions for various variables are scattered in different articles. Fortunately, the Catala compiler automatically collects all the definitions for each scope and puts them in the right order. Here, even if we define tax_rate after income_tax in our source code, the Catala compiler will switch the order of the definitions internally because tax_rate is used in the definition of income_tax. More generally, the order of toplevel definitions and declarations in Catala source code files does not matter, and you can move code around freely without having to care about dependency order.

In this tutorial, we'll suppose that our fictional CTTC specification defines the percentage in the next article. The Catala code below should not surprise you at this point.

Article 2

The fixed percentage mentioned at article 1 is equal to 20 %.

scope IncomeTaxComputation:
  # Writing 20% is just an alternative for the decimal "0.20".
  definition tax_rate equals 20 %

Common values and computations in Catala

So far, we have seen values that have types like decimal, money, integer. One could object that there is no point in distinguishing these three concepts, as they are merely numbers. However, the philosophy of Catala is to make every choice that affects the result of the computation explicit, and the representation of numbers does affect the result of the computation. Indeed, financial computations vary according to whether we consider money amounts as an exact number of cents, or whether we store additional fractional digits after the cent. Since the kind of programs Catala is designed for has heavy consequences for a lot of users, the language is quite strict about how numbers are represented. The rule of thumb is that, in Catala, numbers behave exactly according to the common mathematical semantics one can associate with basic arithmetic computations (+, -, *, /).

In particular, that means that integer values are unbounded and can never overflow. Similarly, decimal values can be arbitrarily precise (although they are always rational, belonging to ℚ) and do not suffer from floating-point imprecisions. For money, the language makes an opinionated decision: a value of type money is always an integer number of cents.

These choices have several consequences:

integer divided by integer gives a decimal;
money cannot be multiplied by money (instead, multiply money by decimal);
money multiplied (or divided) by decimal rounds the result to the nearest cent;
money divided by money gives a decimal (that is not rounded whatsoever).

Types, values and operations

Concretely, this gives:

10 / 3 = 3.333333333...
$10 / 3.0 = $3.33
$20 / 3.0 = $6.67
$10 / $3 = 3.33333333...
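If you want to observe these results yourself, one possible sketch is a small scope without any input, which the interpreter can therefore execute directly. The scope name ArithmeticExamples and its variable names are hypothetical and not part of the tutorial's Tax Code.

```catala
# Hypothetical scope, for illustration only.
declaration scope ArithmeticExamples:
  output integer_division content decimal
  output money_by_decimal content money
  output money_ratio content decimal

scope ArithmeticExamples:
  # integer divided by integer gives a decimal: 3.333...
  definition integer_division equals 10 / 3
  # money divided by decimal is rounded to the nearest cent: $6.67
  definition money_by_decimal equals $20 / 3.0
  # money divided by money gives an unrounded decimal: 3.333...
  definition money_ratio equals $10 / $3
```

Running the interpret command with --scope=ArithmeticExamples on a file containing this block should display the three values.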

The Catala compiler will guide you into using the correct operations explicitly, by reporting compiler errors when that is not the case.

Resolving typing errors on operations

For instance, trying to add an integer and a decimal gives the following error message from the Catala compiler:

┌─[ERROR]─
│
│  I don't know how to apply operator + on types integer and decimal
│
├─➤ tutorial_en.catala_en
│    │
│    │   definition x equals 1 + 2.0
│    │                       ‾‾‾‾‾‾‾
│
│ Type integer coming from expression:
├─➤ tutorial_en.catala_en
│    │
│    │   definition x equals 1 + 2.0
│    │                       ‾
│
│ Type decimal coming from expression:
├─➤ tutorial_en.catala_en
│    │
│    │   definition x equals 1 + 2.0
│    │                           ‾‾‾
└─

To fix this error, you need to use explicit casting, for instance by replacing 1 by decimal of 1. Refer to the language reference for all possible casts and operations, and their associated semantics.
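As a sketch, the faulty definition from the error message could be corrected as follows. The scope name CastExample is hypothetical, and the parentheses around the cast are only there to make the grouping obvious.

```catala
# Hypothetical scope, for illustration only.
declaration scope CastExample:
  output x content decimal

scope CastExample:
  # "decimal of 1" converts the integer 1 into a decimal, so that
  # both operands of + have the same type.
  definition x equals (decimal of 1) + 2.0
```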

Catala also has built-in date and duration types with the common associated operations (adding a duration to a date, subtracting two dates to get a duration, etc.). For a deeper look at date computations (which are very tricky!), look at the language reference.
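The two operations mentioned above can be sketched as follows. The scope name DateExamples and its variable names are hypothetical, and the literal syntax used here (|YYYY-MM-DD| for dates, a number followed by day for durations) should be checked against the language reference for your version of Catala.

```catala
# Hypothetical scope, for illustration only.
declaration scope DateExamples:
  output deadline content date
  output elapsed content duration

scope DateExamples:
  # Adding a duration to a date gives a date
  # (here, January 31st, 2024).
  definition deadline equals |2024-01-01| + 30 day
  # Subtracting two dates gives a duration
  # (here, 60 days, since 2024 is a leap year).
  definition elapsed equals |2024-03-01| - |2024-01-01|
```

Only day-based durations are used here on purpose: adding months or years to a date raises rounding questions that the language reference covers in detail.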

Testing the code

Now that we have implemented a few articles in Catala, it is time to test our code to check that it behaves correctly. We encourage you to test your code a lot, as early as possible, and check the test result in a continuous integration system to prevent regressions.

The testing of Catala code is done with the interpreter inside the compiler, accessible with the interpret command and the --scope option that specifies the scope to be interpreted.

Why can't I test IncomeTaxComputation directly?

The reflex at this point is to execute the following command:

$ catala interpret tutorial.catala_en --scope=IncomeTaxComputation
┌[ERROR]─
│
│  This scope needs input arguments to be executed. But the Catala built-in interpreter does not have a way to retrieve input values from the command line, so it cannot execute this scope.
│  Please create another scope that provides the input arguments to this one and execute it instead.
│
├─➤ tutorial.catala_en
│   │
│   │   input individual content Individual
│   │         ‾‾‾‾‾‾‾‾‾‾
└─

As the error message says, trying to interpret IncomeTaxComputation directly is like trying to compute the taxes of somebody without knowing the income of the person! To be executed, the scope needs to be called with concrete values for the income and the number of children of the individual. Otherwise, Catala will complain that input variables of the scope are missing for the interpretation.

The pattern for testing makes use of concepts that will be seen later in the tutorial, so it is okay to treat part of the following as some mysterious syntax that does what we want. Basically, we will create for our test case a new scope, Test, that passes specific arguments to IncomeTaxComputation, the scope being tested:

Defining a test

declaration scope Test:
  # The following line is mysterious for now
  output computation content IncomeTaxComputation

scope Test:
  definition computation equals
    # The following line is mysterious for now
    output of IncomeTaxComputation with {
      # Below, we pass the input variables for "IncomeTaxComputation"
      -- individual:
        # "individual" has a structure type, so we need to build the
        # structure "Individual" with the following syntax
        Individual {
          # "income" and "number_of_children" are the fields of the structure;
          # we give them the values we want for our test
          -- income: $20,000
          -- number_of_children: 0
        }
    }

This test can now be executed through the Catala interpreter:

$ catala interpret tutorial.catala_en --scope=Test
┌─[RESULT]─
│ computation = IncomeTaxComputation { -- income_tax: $4,000.00 }
└─

We can now check that $4,000 = $20,000 * 20%; the result is correct.
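Rather than checking the result by eye, you can also let Catala perform the comparison. The sketch below assumes the Test scope defined above and relies on the assertion keyword of the language, which this section does not otherwise cover; treat it as a preview rather than as a required part of your tutorial.catala_en file.

```catala
scope Test:
  # The interpreter should report an error if the computed
  # income tax differs from the expected amount.
  assertion computation.income_tax = $4,000
```

Such an assertion turns the manual check into something a continuous integration system can run for you.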

Test, test, test!

Use this test to regularly play with the code during the tutorial and inspect its results under various input scenarios. This will help you understand the behavior of Catala programs, and spot errors in your code 😀

You can also check that there is no syntax or typing error in your code, without testing it, with the following command:

$ catala typecheck tutorial.catala_en
┌─[RESULT]─
│ Typechecking successful!
└─

Checkpoint

This concludes the first section of the tutorial. By declaring data structures (structures and enumerations), declaring scopes with typed variables, and defining formulas for these variables, you should now be able to code in Catala the equivalent of single-function programs that perform common arithmetic operations and define local variables.

Recap of the current section

For reference, here is the final version of the Catala code consolidated at the end of this section of the tutorial.

```catala
declaration structure Individual:
  data income content money
  data number_of_children content integer

declaration scope IncomeTaxComputation:
  input individual content Individual
  internal tax_rate content decimal
  output income_tax content money
```

## Article 1

The income tax for an individual is defined as a fixed percentage of the
individual's income over a year.

```catala
scope IncomeTaxComputation:
  definition income_tax equals
    individual.income * tax_rate
```

## Article 2

The fixed percentage mentioned at article 1 is equal to 20 %.

```catala
scope IncomeTaxComputation:
  definition tax_rate equals 20%
```

## Test

```catala
declaration scope Test:
  output computation content IncomeTaxComputation

scope Test:
  definition computation equals
    output of IncomeTaxComputation with {
      -- individual:
        Individual {
          -- income: $20,000
          -- number_of_children: 0
        }
    }
```
