This document was produced from a set of source files written in the Catala programming language, mixing together the legislative text and the computer code that translates it. For more information about the methodology and how to read the code, please visit https://catala-lang.org.
Welcome to this tutorial, whose objective is to guide you through the features of the Catala language and teach you how to annotate a legislative text using the language. This document is addressed primarily to developers or people that have a programming background, though tech-savvy lawyers will probably figure things out.
To begin writing a Catala program, you must start from the text of the legislative source that will justify the code that you will write. Concretely, that means copy-pasting the text of the law into a Catala source file and formatting it so that Catala can understand it. Catala source files have the “.catala_en” extension. If you were to write a Catala program for a French law, you would use the “.catala_fr” extension; for Polish law, “.catala_pl”, etc.
You can write any kind of plain text in Catala, and it will be printed as is in PDF or HTML output. You can split your text into short lines of fewer than 80 characters, terminal-style, and those will appear as a single paragraph in the output. If you want to create a new paragraph, you have to leave a blank line in the source. The syntax for non-code text in a Catala program follows a subset of Markdown that supports titles and Catala code blocks.
Catala allows you to declare section or subsection headers as it is done here, with the “#” symbol. You can define headings of lower importance by adding more “#” symbols before the title of the heading. Generally, the literate programming syntax of Catala follows the syntax of Markdown (though it does not have all of its features).
In the rest of this tutorial, we will analyze a fictional example that defines an income tax. This income tax is defined through several articles of law, each of them introduced by a header. Here is the first one:
The income tax for an individual is defined as a fixed percentage of the individual’s income over a year.
To translate that fictional income tax definition into a Catala program, we will intertwine short snippets of code between the sentences of the legislative text. Each snippet of code should be as short as possible and as close as possible to the actual sentence that justifies the code. This style is called literate programming, a programming paradigm invented by the famous computer scientist Donald Knuth in the 70s.
The content of article 1 uses a lot of implicit context: there exists an individual with an income, as well as an income tax that the individual has to pay each year. Even if this implicit context is not verbatim in the law, we have to make it explicit for programming purposes. Concretely, we need a “metadata” section that defines the shape and types of the data used inside the law.
Let’s start our metadata section by declaring the type information for the individual, the taxpayer that will be the subject of the tax computation. This individual has an income and a number of children, both pieces of information which will be needed for tax purposes:
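A minimal sketch of such a declaration, assuming the structure is named Individual (the name used for it in the rest of this tutorial) and carries the two fields described just below:

```catala-metadata
declaration structure Individual:
  # Structure names start with an uppercase letter (CamelCase)
  data income content money
  # Field names are lowercase (snake_case); "money" and "integer"
  # are the types of the values stored in each field
  data number_of_children content integer
```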
This structure contains two data fields, “income” and “number_of_children”. Structures are useful to group together data that goes together. Usually, you get one structure per concrete object on which the law applies (like the individual). It is up to you to decide how to group the data together, but you should aim to optimize code readability.
Sometimes, the law gives an enumeration of different situations. These enumerations are modeled in Catala using an enumeration type, like:
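As an illustration, here is a sketch of an enumeration; the name TaxCredit and its two cases are hypothetical and are not used elsewhere in this tutorial:

```catala-metadata
# Each case is introduced by "--"; a case may carry a content,
# here the number of children concerned by the tax credit
declaration enumeration TaxCredit:
  -- NoTaxCredit
  -- ChildrenTaxCredit content integer
```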
In computer science terms, such an enumeration is called a “sum type” or simply an enum. The combination of structures and enumerations allows the Catala programmer to declare all possible shapes of data, as they are equivalent to the powerful notion of “algebraic data types”.
We’ve defined and typed the data that the program will manipulate. Now we have to define the logical context in which this data will evolve. This is done in Catala using “scopes”. Scopes are close to functions in terms of traditional programming. Scopes also have to be declared in metadata, so here we go:
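A sketch of this scope declaration, assuming the variable names used in the rest of this tutorial (individual, fixed_percentage and income_tax) and an Individual structure declared in the metadata:

```catala-metadata
declaration scope IncomeTaxComputation:
  # Provided by the caller of the scope
  input individual content Individual
  # Only visible inside the scope
  internal fixed_percentage content decimal
  # Retrievable by the caller once computed
  output income_tax content money
```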
The scope is the basic abstraction unit in Catala programs, and the declaration of the scope is akin to a function signature: it contains a list of all the arguments along with their types. But in Catala, scopes’ variables stand for three things: input arguments, local variables and outputs. The difference between these three categories can be specified by the different input/output attributes preceding the variable names. “input” means that the variable has to be defined only when the scope IncomeTaxComputation is called. “internal” means that the variable cannot be seen from outside the scope: it is neither an input nor an output of the scope. “output” means that a caller scope can retrieve the computed value of the variable. Note that a variable can also be simultaneously an input and an output of the scope; in that case it should be annotated with “input output”.
We now have everything to annotate the contents of article 1, which is copied over below.
The income tax for an individual is defined as a fixed percentage of the individual’s income over a year.
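The annotation of this article can be sketched as follows, assuming the IncomeTaxComputation scope and its variables are declared as described above:

```catala
scope IncomeTaxComputation:
  # money * decimal yields an amount of money
  definition income_tax equals
    individual.income * fixed_percentage
```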
In the code, we are defining inside our scope the amount of the income tax according to the formula described in the article. When defining formulas, you have access to all the usual arithmetic operators: addition “+”, subtraction “-”, multiplication “*” and division “/”.
However, in the Catala code, you should be aware that these operators can behave differently depending on the quantities considered: indeed, money for example is rounded to the cent. The Catala compiler is able to automatically select the appropriate operation: here it can detect that money is being multiplied by a decimal, which is a known operation that yields an amount of money, rounded to the cent. Some other operations are not allowed, like multiplying two amounts of money together, or adding two dates.
Coming back to article 1, one question remains unanswered: what is the value of the fixed percentage? Often, precise values are defined elsewhere in the legislative source. Here, let’s suppose we have:
The fixed percentage mentioned at article 1 is equal to 20 %.
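A sketch of the corresponding annotation, placed right under the article that justifies it:

```catala
scope IncomeTaxComputation:
  # Percentage literals are decimals: 20% is the same value as 0.20
  definition fixed_percentage equals 20%
```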
You can see here that Catala allows definitions to be scattered throughout the annotation of the legislative text, so that each definition is as close as possible to its location in the text.
So far so good, but now the legislative text introduces some trickiness. Let us suppose the third article says:
If the individual is in charge of 2 or more children, then the fixed percentage mentioned at article 1 is equal to 15 %.
This article actually gives another definition for the fixed percentage, which was already defined in article 2. However, article 3 defines the percentage conditionally on the individual having 2 or more children. Catala allows you to do precisely that, by redefining a variable under a condition:
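A sketch of such a conditional definition, assuming the number_of_children field of the Individual structure:

```catala
scope IncomeTaxComputation:
  definition fixed_percentage under condition
    individual.number_of_children >= 2
  consequence equals 15%
```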
While conditional definitions are a powerful tool for expressing legal conditions, a correctly drafted legislative source should always ensure that at most one condition is true at all times. This way, when the Catala program is executed, the right definition will be dynamically chosen by looking at which condition is true.
Here, however, our definition of fixed_percentage conflicts with the more general definition that we gave above, so Catala will give us an error if we try and use this definition as is. In situations like this, Catala allows us to define a precedence on the conditions, which has to be justified by the law. But we will see how to do that later.
So far, you’ve learnt how to declare a scope with some variables, and give definitions to these variables scattered across the text of the law at the relevant locations. But one pattern is very frequent in legislative texts: conditions. A condition is a value that can be either true or false, like a boolean in programming. However, the law implicitly assumes a condition is false unless specified otherwise. This pattern is so common in law that Catala gives it a special syntax. More precisely, it calls the definition of conditions “rules”, which coincides with the usual meaning lawyers would give the word.
Here is an example of a condition that might arise in the law:
The children eligible for application of article 3
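A sketch of how such a condition could be declared and given a rule. The scope name Child, the variable names and the age threshold are all assumptions made for illustration, since the article above does not state the eligibility criterion:

```catala-metadata
declaration scope Child:
  input age content integer
  # "condition" declares a boolean that is false unless a rule applies
  output is_eligible_article_3 condition
```

```catala
scope Child:
  # Hypothetical criterion: the child is under 18
  rule is_eligible_article_3 under condition age < 18
  consequence fulfilled
```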
When interacting with other elements of the code, condition values behave like boolean values.
Catala lets you declare functions among your scope variables. Indeed, Catala is a functional language and encourages using functions to describe relationships between data. Here’s what it looks like in the metadata definition when we want to define a two-brackets tax computation:
The tax amount for a two-brackets computation is equal to the amount of income in each bracket multiplied by the rate of each bracket.
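A sketch of the declaration and of the annotation of article 4, assuming a TwoBrackets structure holding the breakpoint and the two rates, and a helper scope named TwoBracketsTaxComputation:

```catala-metadata
declaration structure TwoBrackets:
  data breakpoint content money
  data rate1 content decimal
  data rate2 content decimal

declaration scope TwoBracketsTaxComputation:
  input brackets content TwoBrackets
  # "depends on" makes tax_formula a function of income
  output tax_formula content money depends on income content money
```

```catala
scope TwoBracketsTaxComputation:
  definition tax_formula of income equals
    if income <= brackets.breakpoint then
      income * brackets.rate1
    else (
      brackets.breakpoint * brackets.rate1 +
      (income - brackets.breakpoint) * brackets.rate2
    )
```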
Now that we’ve defined our helper scope for computing a two-brackets tax, we want to use it in our main tax computation scope. As mentioned before, Catala’s scopes can also be thought of as big functions. And these big functions can call each other, which is what we’ll see in the article below.
For individuals whose income is greater than $100,000, the income tax of article 1 is 40% of the income above $100,000. Below $100,000, the income tax is 20% of the income.
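A sketch of this scope inclusion, assuming a helper scope TwoBracketsTaxComputation with an input brackets and a function output tax_formula; two_brackets is the name chosen here for the subscope:

```catala-metadata
declaration scope NewIncomeTaxComputation:
  # Declares a subscope call named two_brackets
  two_brackets scope TwoBracketsTaxComputation
  input individual content Individual
  output income_tax content money
```

```catala
scope NewIncomeTaxComputation:
  # Inputs of the subscope are defined from the calling scope
  definition two_brackets.brackets equals TwoBrackets {
    -- breakpoint: $100,000
    -- rate1: 20%
    -- rate2: 40%
  }
  # Outputs of the subscope are read with the same dot notation
  definition income_tax equals
    two_brackets.tax_formula of individual.income
```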
Now that we’ve successfully defined our income tax computation, the legislator inevitably comes to disturb our beautiful and regular formulas to add a special case! The article below is a really common pattern in statutes, and let’s see how Catala handles it.
Individuals earning less than $10,000 are exempted of the income tax mentioned at article 1.
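A direct translation of this article could be sketched as follows, assuming the NewIncomeTaxComputation scope and the Individual structure used throughout this tutorial:

```catala
scope NewIncomeTaxComputation:
  definition income_tax under condition
    individual.income < $10,000
  consequence equals $0
```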
That’s it! We’ve defined a two-brackets tax computation with a special case simply by annotating legislative articles with snippets of Catala code. However, attentive readers may have caught something weird in the Catala translation of articles 5 and 6. What happens when the income of the individual is less than $10,000? Right now, the two definitions at articles 5 and 6 for income_tax apply, and they’re in conflict.
This is a flaw in the Catala translation, but the language can help you find this sort of error via simple testing or even formal verification. Let’s start with the testing.
Testing Catala programs can be done directly in Catala. Indeed, writing test cases for each Catala scope that you define is a good practice called “unit testing” in the software engineering community. A test case is defined as another scope:
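A sketch of such a test scope. The input values are assumptions, chosen so that the two-brackets formula gives $100,000 × 20% + $130,000 × 40% = $72,000:

```catala
declaration scope Test1:
  tax_computation scope NewIncomeTaxComputation
  output income_tax content money

scope Test1:
  definition tax_computation.individual equals
    Individual {
      -- income: $230,000
      -- number_of_children: 0
    }
  definition income_tax equals tax_computation.income_tax
  # The assertion is checked when the scope is executed
  assertion income_tax = $72,000
```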
This test should pass. We can validate it by inserting a special test snippet in the file as follows:
$ catala interpret --scope Test1 --disable-warnings
┌─[RESULT]─
│ income_tax = $72,000.00
└─
Running clerk test will automatically check that the given command matches its expected output when run on this file (in practice, it’s generally preferred to put such tests in a separate file).
Let us now consider a failing test case:
$ catala interpret --scope Test2 --disable-warnings
┌─[ERROR]─
│
│ During evaluation: conflict between multiple valid consequences for
│ assigning the same variable.
│
├─➤ tutorial_en/tutorial_en.catala_en:318.7-318.30:
│ │
│ 318 │ income * brackets.rate1
│ │ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
├─ The Catala language tutorial
│ └┬ Functions
│ └─ Article 4
│
├─➤ tutorial_en/tutorial_en.catala_en:387.22-387.24:
│ │
│ 387 │ consequence equals $0
│ │ ‾‾
└─ The Catala language tutorial
└┬ Scope inclusion
└─ Article 6
#return code 123#
This test case should compute a $0 income tax because of Article 6. But instead, execution yields an error stating that there is a conflict between rules.
Indeed, the definition of the income tax in article 6 conflicts with the definition of income tax in article 5. But actually, article 6 is just an exception to article 5. In the law, it is implicit that if article 6 is applicable, then it takes precedence over article 5.
This implicit precedence has to be explicitly declared in Catala. Here is a fixed version of the NewIncomeTaxComputation scope:
To define an exception to a rule, you have to first label the rule that you want to attach the exception to. You can put any snake_case identifier for the label:
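A sketch of the labelled definition and its exception, assuming that NewIncomeTaxComputationFixed is declared like NewIncomeTaxComputation and that two_brackets.brackets is defined as before; article_5 is the label chosen here:

```catala
scope NewIncomeTaxComputationFixed:
  # The base case, labelled so that exceptions can refer to it
  label article_5
  definition income_tax equals
    two_brackets.tax_formula of individual.income

  # Takes precedence over article_5 whenever its condition holds
  exception article_5
  definition income_tax under condition
    individual.income < $10,000
  consequence equals $0
```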
And the test that should now work:
$ catala interpret --scope Test3 --disable-warnings
┌─[RESULT]─
│ income_tax = $0.00
└─
Note that the label system also lets you define more complicated exception patterns. Sometimes, you want to declare an exception to a group of piecewise definitions. To do that, simply use the same label for all the piecewise definitions.
As we have seen, two exceptions applying at the same time to a given rule are in conflict, and trigger an error. It happens, though, that these exceptions yield the same end result: for convenience, Catala tolerates this case and returns the common result, as long as there is a strict syntactic equality.
Individuals with 7 children or more are exempted of the income tax mentioned at article 1.
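This second exemption could be sketched as another exception to the same label, assuming the base definition of income_tax is labelled article_5:

```catala
scope NewIncomeTaxComputationFixed:
  exception article_5
  definition income_tax under condition
    individual.number_of_children >= 7
  # Written exactly like the consequence of the other exemption
  consequence equals $0
```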
The same problem as above would be triggered for families with an income below $10,000 and 7 children or more. But here Catala can detect that it won’t matter since the result in both cases is an exemption.
$ catala interpret --scope Test4 --disable-warnings
┌─[RESULT]─
│ income_tax = $0.00
└─
In some cases, it is useful to apply the computation of a scope only under specific circumstances. For example, some social benefits may have different computation modes depending on the situation of the beneficiary. In this case, defining each of the different modes as subscopes can be tedious for two reasons: first, some input values may not be relevant for the cases the beneficiary is not concerned with, and the language will still enforce that you leave nothing undefined; second, unnecessary computation will take place.
For these cases, it is possible to call a scope directly, specifying all its input variables at once, and getting back its output variables in a way similar to usual subscopes.
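A sketch of such a direct call. The scope name Test5 and the boolean normal_mode are hypothetical; the call is only evaluated when normal_mode is true:

```catala
declaration scope Test5:
  input individual content Individual
  input normal_mode content boolean
  output income_tax content money

scope Test5:
  definition income_tax equals
    if normal_mode then
      let result_of_tax_computation equals
        output of NewIncomeTaxComputationFixed with
          { -- individual: individual }
      in
      result_of_tax_computation.income_tax
    else $0
```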
There are three important bits here:
- the output of NewIncomeTaxComputationFixed with { ... } is the most interesting part: it triggers a call of the named scope while setting its input variable individual
- it is wrapped in a let <name> equals <expression> in ... construct we haven’t seen yet: this allows us to give a name to the result of the call so that we can refer to it within the last part
- result_of_tax_computation.income_tax re-uses the name we have just been defining, which holds the results of the scope call, and retrieves the output of the scope we are interested in, here income_tax
You can check that income_tax was defined as an output with content money, which matches what we are expecting here.
It’s not by chance that result_of_tax_computation.income_tax looks similar to the way we access a field in a structure: in fact, when defining any scope (here we are concerned with NewIncomeTaxComputationFixed) a structure with the same name, and containing all of its outputs as fields, was automatically defined by Catala. In our case there is just one output, so the structure NewIncomeTaxComputationFixed has a single field data income_tax content money.
The result_of_tax_computation intermediate name in practice holds this structure, as returned by the direct scope call. And the .income_tax syntax is indeed an access to the income_tax field of this structure.
In addition, when declaring a subscope, it is possible to prefix it with output:
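A sketch of such a declaration; the input values given to the subscope are assumptions made for illustration:

```catala
declaration scope Test4bis:
  # The subscope's outputs become an output of Test4bis
  output tax_computation scope NewIncomeTaxComputationFixed

scope Test4bis:
  definition tax_computation.individual equals
    Individual {
      -- income: $7,000
      -- number_of_children: 7
    }
```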
Besides defining the subscope, this declares tax_computation as an output variable that will contain the structure holding the outputs of NewIncomeTaxComputationFixed.
$ catala interpret --scope Test4bis --disable-warnings
┌─[RESULT]─
│ tax_computation =
│ NewIncomeTaxComputationFixed {
│ -- tax_formula: <function>
│ -- income_tax: $0.00
│ }
└─
With its “input”, “output” and “internal” variables, Catala’s scopes are close to regular functions with arguments, local variables and outputs. However, the law can sometimes be adversarial to good programming practices and define provisions that break the abstraction barrier normally associated with a function.
This can be the case when an outside body of legislative text “reuses” a legal concept but adds a twist to it. Consider the following fictional (but not quite pathological, computation-wise) example.
The justice system delivers fines to individuals when they committed an offense. The fines are determined based on the amount of taxes paid by the individual. The more taxes the individual pays, the higher the fine. However, the determination of the amount of taxes paid by an individual in this context shall include a flat tax fee of $500 for individuals earning less than $10,000.
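One way to express this can be sketched as follows. It assumes that income_tax is declared with the context attribute in NewIncomeTaxComputationFixed (for instance context output income_tax content money), which lets a calling scope supply an additional definition for it. The names BasisForFineDetermination and basis_for_fine are hypothetical:

```catala-metadata
declaration scope BasisForFineDetermination:
  tax_computation scope NewIncomeTaxComputationFixed
  input individual content Individual
  output basis_for_fine content money
```

```catala
scope BasisForFineDetermination:
  definition tax_computation.individual equals individual
  definition basis_for_fine equals tax_computation.income_tax
  # Caller-side definition of the subscope's context variable
  definition tax_computation.income_tax under condition
    individual.income < $10,000
  consequence equals $500
```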
When a quantity is mentioned in the law, it does not always map exactly to a unique Catala variable. More precisely, it often happens that the law defines a unique quantity with multiple computation steps, each new one building on the previous one. Here is an example of such a setup and how to deal with it thanks to a dedicated Catala feature.
Under the hood, the different states of a Catala variable are implemented by distinct variables inside the lower intermediate representations of the language.
For taxation purposes, the values of the building operated for charity purposes can be deducted from the wealth of the individual, which is then capped at $2,500,000.
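A sketch of this article using variable states. The scope name, the input names and the states other than after_charity_deductions are assumptions:

```catala-metadata
declaration scope WealthTax:
  input value_of_buildings_used_for_charity content money
  input total_wealth content money
  # States are computed in the order in which they are declared
  internal wealth content money
    state total
    state after_charity_deductions
    state after_capping
```

```catala
scope WealthTax:
  definition wealth state total equals total_wealth
  # Inside a state definition, "wealth" refers to the previous state
  definition wealth state after_charity_deductions equals
    wealth - value_of_buildings_used_for_charity
  definition wealth state after_capping equals
    if wealth >= $2,500,000 then $2,500,000 else wealth
```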
Any reference to wealth outside of its definition refers to its last, or “outcome”, state. (If the variable were marked input or context, any such definition of wealth would affect its first, or “starting”, state.) But it may happen that the law explicitly refers to the “uncapped wealth after charity reductions”. This can be accessed by explicitly writing wealth state after_charity_deductions.
There are two limitations to this:
- it is only available within the scope the variable is defined in: even if wealth were an output, its intermediate states remain opaque to the outside.
- it is not allowed within definitions of the variable itself, where the preceding state is always implicitly used.
So far, this tutorial has introduced you to the basic structure of Catala programs with scopes, definitions and exceptions. But to be able to handle most statutes, Catala comes with support for the usual types of values on which legislative computations operate.
Booleans are the most basic type of value in Catala: they can be either true or false. Conditions are simply booleans with a default value of false, see the section about conditions above.
Integers in Catala are infinite precision: they behave like true mathematical integers, and not like computer integers that are bounded by a maximum value due to them being stored in 32 or 64 bits. Integers can be negative.
Decimals in Catala also have infinite precision, behaving like true mathematical rational numbers, and not like computer floating point numbers that perform approximate computations.
In Catala, money is simply represented as an integer number of cents. You cannot in Catala have a more precise amount of money than a cent. However, you can multiply an amount of money by a decimal, and the result is rounded to the nearest cent. This principled behaviour lets you keep track of where you need precision in your computations, and select decimals instead of money figures where sub-cent precision matters. Two money amounts can be divided, yielding a decimal.
Multiplying two money amounts is not supported, as it is most likely to be a coding error; if processing a computation complex enough to require it, convert them to decimals first using the syntax decimal of $1.00.
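A small sketch of these operations; the scope and variable names are hypothetical:

```catala
declaration scope MoneyValues:
  internal rate content decimal
  output share content money
  output ratio content decimal

scope MoneyValues:
  definition rate equals 2.0 / 3.0
  # money * decimal yields money, rounded to the nearest cent
  definition share equals $1.00 * rate
  # money / money yields a decimal
  definition ratio equals $10.00 / $4.00
```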
Catala has support for Gregorian calendar dates as well as duration computations in terms of days, months and years. A difference between dates is a duration measured in days, and the addition of a date and a duration yields a new date. Durations measured in days, months or years are not mixable with each other, as months and years do not always have the same number of days. This non-mixability is not captured by the type system of Catala but will yield errors at runtime. Date literals are specified using the ISO 8601 standard to avoid confusion between American and European notations.
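A sketch of date and duration computations; the scope and variable names are hypothetical:

```catala
declaration scope DateValues:
  output one_year_later content date
  output within_45_days content boolean

scope DateValues:
  # date + duration yields a date
  definition one_year_later equals |2000-01-01| + 1 year
  # date - date yields a duration in days, which can be compared
  # with another duration expressed in days
  definition within_45_days equals
    (|2000-02-01| - |2000-01-01|) < 45 day
```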
Date computations can be ambiguous. What’s |2024-02-29| + 1 year?
$ catala interpret --scope AmbiguousDate --disable-warnings
┌─[ERROR]─
│
│ During evaluation: ambiguous date computation, and rounding mode was not
│ specified.
│
├─➤ tutorial_en/tutorial_en.catala_en:962.41-962.42:
│ │
│ 962 │ definition result equals |2000-02-29| + 1 year
│ │ ‾
└─ The Catala language tutorial
└┬ Catala values
└┬ Dates and durations
└─ Ambiguities
#return code 123#
Indeed, there are two possible answers to this, and neither is canonically better than the other. Depending on the jurisdiction or jurisprudence, you can specify which one is right for the case at hand like so:
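A sketch of a scope that selects a rounding mode, assuming the scope-level date round directive; rounding down is what yields the last day of February:

```catala
declaration scope AmbiguousDate2:
  output result content date

scope AmbiguousDate2:
  # Applies to the date computations of this scope: when the computed
  # day does not exist, fall back to the closest earlier valid day
  date round decreasing
  definition result equals |2000-02-29| + 1 year
```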
$ catala interpret --scope AmbiguousDate2 --disable-warnings
┌─[RESULT]─
│ result = 2001-02-28
└─
Some other cases are always ambiguous and will inevitably result in an error, like asking whether 30 day < 1 month.
$ catala interpret --scope AmbiguousDate3 --disable-warnings
┌─[ERROR]─
│
│ During evaluation: ambiguous comparison between durations in different
│ units (e.g. months vs. days).
│
├─➤ tutorial_en/tutorial_en.catala_en:1011.35-1011.36:
│ │
│ 1011 │ definition result equals 30 day < 1 month
│ │ ‾
└─ The Catala language tutorial
└┬ Catala values
└┬ Dates and durations
└─ Ambiguities
#return code 123#
Often, Catala programs need to work with collections of data because the law is about the number of children, the maximum of a list, etc. Catala features first-class support for lists. You can create a list, filter its elements but also aggregate over its contents to compute all sorts of values.
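A sketch of a few list operations; the scope name, variable names and values are hypothetical:

```catala
declaration scope ListValues:
  internal children_ages content list of integer
  output number_of_children content integer
  output has_minor_child content boolean

scope ListValues:
  # List elements are separated by semicolons
  definition children_ages equals [3; 12; 19]
  definition number_of_children equals number of children_ages
  definition has_minor_child equals
    exists age among children_ages such that age < 18
```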
While lists are for variable-size collections of values of the same type, sometimes it is useful to group together a fixed number of values of different, known types. These are called pairs when the number is two, tuples in the general case.
In general, you should prefer explicitly declaring a named structure, with named data fields to hold each of those. But sometimes, for intermediate values, that can become cumbersome. Elements of a tuple are simply separated by commas, and delimited by parentheses.
Access to a member of a tuple is simply done using the syntax .N, where N is the index in the tuple, starting at 1.
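A sketch of a pair and of an access to its second member. The names are hypothetical, and the tuple type is assumed to be written like the tuple itself, with parentheses and commas:

```catala
declaration scope TupleValues:
  internal pair content (integer, money)
  output amount content money

scope TupleValues:
  definition pair equals (2, $100)
  # Members are numbered from 1, so this is the money component
  definition amount equals pair.2
```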
Note that applying a function to multiple arguments or to a tuple is the same thing (as long as the number and types of the arguments match). It is even possible to process a tuple of lists as if it were a list of tuples:
This will create a new list where the first element is the product of the first elements of values and rates respectively, and so on. Be wary that the two lists must be of the same length, or this would result in a runtime error.
This tutorial presents the basic concepts and syntax of the Catala language features. It is then up to you to use them to annotate legislative texts with their algorithmic translation.
There is no single way to write Catala programs, as the program style should be adapted to the legislation it annotates. However, Catala is a functional language at heart, so following standard functional programming design patterns should help achieve concise and readable code.
Toplevel definitions provide a way to define values or functions directly in the program, without putting them in a scope. This is useful for constants or helper functions, as shown in the examples below.
Toplevel definitions are available by name throughout the program; they can depend on each other (as long as there are no cycles), but are not allowed to rely on any scope evaluations.
They can be useful both for defining constants and for pure utility functions:
« Throughout this corpus, the number of workdays per week is assumed to be 5. »
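Such a constant could be sketched as a toplevel declaration; the name workdays_per_week is hypothetical:

```catala
# A toplevel constant, usable by name anywhere in the program
declaration workdays_per_week content integer equals 5
```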
« […] The final allocation is rounded up to the next multiple of $100 »
The point here is that the computation formula is purely technical, so once justified where it is written, and when actually needing this computation in the translation of the law, you just need to write round_up_100 of value, which is much more self-explanatory.
« The amount to include in gross income is the excess of the fair market value of the property over the amount paid. »
Here we could write the formula if fair_market_value > amount_paid then fair_market_value - amount_paid else $0, which is the definition of “excess of”. However, it requires careful reading to make sure there was no mistake, and takes us further from the law and into technical detail. It’s better to define a named function as such:
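A sketch of such a helper and of its use. The names excess_of, GrossIncome and amount_to_include are hypothetical:

```catala
declaration excess_of content money
  depends on x content money,
             y content money
  equals
    if x > y then x - y else $0

declaration scope GrossIncome:
  input fair_market_value content money
  input amount_paid content money
  output amount_to_include content money

scope GrossIncome:
  # Reads almost like the text of the law
  definition amount_to_include equals
    excess_of of fair_market_value, amount_paid
```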
Here the function definition is easier to verify, and the final definition of the scope is more straightforward, while staying closer to the original text.
Modules are “compilation units” in Catala: they can be compiled separately and reused within different programs. A self-contained corpus of texts that can be referred to from other corpuses is a good candidate for a module. It’s possible for a module to rely on other modules, as long as there is no cyclic dependency.
A module is declared with the following syntax:
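For instance, assuming a hypothetical source file named foo.catala_en, the declaration is a directive written in the text part of the file:

```
> Module Foo
```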
This is typically written at the top of the file, and the name of the module must match the name of the file (except for the first letter, which must be a capital).
Anything that is declared in catala-metadata sections will be made available to users of the module; things that only appear in catala sections won’t be visible. For example, a scope that is defined in a catala section can be executed from outside of the module as long as it has been declared in a catala-metadata section.
If a module needs to be split in several files, there must be a main file declaring the module, in which the basic, textual inclusion is used with the > Include: file.catala_en directive.
The usage of an external module must be declared, preferably at the beginning of the file as well. This is done with the directive > Using SomeModule, with SomeModule the name of the needed module. To access the declarations from SomeModule (enumerations, structures, scopes and top-level values) without mixing them up with the ones from the current, or other imported modules, they must be prefixed with SomeModule., e.g. SomeModule.scope_from_that_module.
For convenience, it is possible to alias, or locally rename the module using > Using SomeModule as NewName, and henceforth refer to it with NewName.scope_from_that_module instead.
To compile multi-module Catala programs, use the dedicated clerk build system. It automatically detects and finds the modules being used and ensures a consistent build of the whole program. See clerk --help for more detail (if the modules are scattered across directories, use the -I <directory> option to look up modules inside them).