Basic blocks of a Catala program
In this section, the tutorial introduces the basic blocks of a Catala program : the difference between law and code, data structures, scopes, variables and formulas. By the end of the section, you should be able to write a simple Catala program equivalent to a single function with local variables whose definitions can refer to one another.
A recap of the tutorial section with the full code that is expected to be
in your tutorial.catala_en
companion file is attached at the end of this page.
Pease refer to it if you feel lost during the reading and want to have a
vibes check of whether you are on track to complete your tutorial.catala_en
file.
Mixing law and code
Catala is a language designed around the concept of literate programming, that is the mixing between the computer code and its specification in a single document. Why literate programming? Because it enables a fine-grained correspondance between the specification and the code. Whenever the specification is updated, knowing where to update the code is trivial with literal programming. This is absolutely crucial for enabling long-term maintenance of complex and high-assurance programs like tax or social benefits computation.
Hence, a Catala source code file looks like a regular Markdown
document, with the specification written down and styled as Markdown text,
with the Catala code only present in well-bounded Catala code blocks introduced
by a line with ```catala
and ended by a line with ```
.
Before writing any Catala code, we must introduce the specification of the code for this tutorial. This specification will be based on a fictional Tax Code defining a simple income tax. But in general, anything can be used as a specification for a Catala program: laws, executive orders, court cases motivations, legal doctrine, internal instructions, technical specifications, etc. These sources can be mixed to form a complete Catala program that relies on these multiple sources. Concretely, incorporating a legal source of specification into the Catala program amounts to copy-pasting the text and formatting it in Markdown syntax inside the source code file.
Without further ado, let us introduce the first bit of specification for our fictional income tax, article 1 of the CTTC (Catala Tutorial Tax Code):
The income tax for an individual is defined as a fixed percentage of the individual's income over a year.
Catala uses Markdown-like formatting for the legal text in the .catala_en
files. So, to copy the text of the article into your tutorial.catala_en
file, mark up the article header with ##
and put the text below, as such:
## Article 1
The income tax for an individual is defined as a fixed percentage of the
individual's income over a year.
The spirit of writing code in Catala is to stick to the specification at all times in order to put the code snippets where they belong. Hence, we will introduce below the Catala code snippets that translate article 1, which should be put just below article 1 in the Catala source code file.
These code snippets should describe the program that computes the income tax, and contain the rule defining it as a multiplication of the income as rate. It is time to dive into Catala as a programming language.
# We will soon learn what to write here in order to translate the meaning
# of article 1 into Catala code.
# To create a block of Catala code in your file, bound it with Markdown-style
# "```catala" and "```" delimiters. You can write comments in Catala code blocks
# by prefixing lines with "#"
# In the rest of the tutorial, when presenting Catala code snippets, it is
# assumed implicitly that you should copy-paste them into your
# tutorial.catala_en file inside a Catala code block enclosed between
# "```catala" and "```" delimiters, and placed near the article of law that
# it implements.
Setting up data structures
The content of article 1 assumes a lot of implicit context: there exists an individual with an income, as well as an income tax that the individual has to pay each year. Even if this implicit context is not verbatim in the law, we have to explicit it in the computer code, in the form of data structures and function signatures.
Catala is a strongly-typed, statically compiled language, so all data structures and function signatures have to be explicitly declared. So, we begin by declaring the type information for the individual, the taxpayer that will be the subject of the tax computation. This individual has an income and a number of children, both pieces of information which will be needed for tax purposes :
# Data structure declarations and generally any declaration in Catala often
# does not match any specific article of law. Hence, you can put all the
# declarations at the top of your tutorial.catala_en file, before article 1.
# The name of the structure, "Individual", must start with an
# uppercase letter: this is the CamelCase convention.
declaration structure Individual:
# In this line, "income" is the name of the structure field and
# "money" is the type of what is stored in that field.
# Available types include: "integer", "decimal", "money", "date",
# "duration", and any other structure or enumeration that you declare.
data income content money
# The field names "income" and "number_of_children" start by a lowercase
# letter, they follow the snake_case convention.
data number_of_children content integer
This structure contains two data fields, income
and number_of_children
.
Structures are useful to group together data that goes together. Usually, you
get one structure per concrete object on which the law applies (like the
individual). It is up to you to decide how to group the data together, but we
advise you to aim at optimizing code readability.
Sometimes, the law gives an enumeration of different situations. These enumerations are modeled in Catala using an enumeration type, like:
# The name "TaxCredit" is also written in CamelCase.
declaration enumeration TaxCredit:
# The line below says that "TaxCredit" can be a "NoTaxCredit" situation.
-- NoTaxCredit
# The line below says that alternatively, "TaxCredit" can be a
# "ChildrenTaxCredit" situation. This situation carries a content
# of type integer corresponding to the number of children concerned
# by the tax credit. This means that if you're in the "ChildrenTaxCredit"
# situation, you will also have access to this number of children.
-- ChildrenTaxCredit content integer
In computer science terms, such an enumeration is called a "sum type" or simply an enum. The combination of structures and enumerations allow the Catala programmer to declare all possible shapes of data, as they are equivalent to the powerful notion of algebraic data types.
Notice that these data structures that we have declared cannot always be attached naturally to a particular piece of the specification text. So, where to put these declarations in your literate programming file? Since you will be often going back to these data structure declarations during programming, we advise you to group them together in some sort of prelude in your code source file. Concretely, this prelude section containing the data structure declaration will be your one stop shop when trying to understand the data manipulated by the rules elsewhere in the source code file.
Scopes as basic computation blocks
We've defined and typed the data that the program will manipulate. Now, we have to define the logical context in which this data will evolve. Because Catala is a functional programming language, all code exists within a function. And the equivalent to a function in Catala is called a scope. A scope is comprised of :
- a name,
- input variables (similar to function arguments),
- internal variables (similar to local variables),
- output variables (that together form the return type of the function).
For instance, article 1 declares a scope for computing the income tax:
# Scope names use the CamelCase naming convention, like names of structs
# or enums Scope variables, on the other hand, use the snake_case naming
# convention, like struct fields.
declaration scope IncomeTaxComputation:
# The following line declares an input variable of the scope, which is akin to
# a function parameter in computer science term. This is the piece of
# data on which the scope will operate.
input individual content Individual
internal tax_rate content decimal
output income_tax content money
The scope is the basic abstraction unit in Catala programs, and scopes can be composed. Since a function can call other functions, scopes can also call other scopes. We will see later how to do this, but first let us focus on the inputs and outputs of scopes.
The declaration of the scope is akin to a function signature: it contains a list
of all the arguments along with their types. But in Catala, scope variables can
be input
, internal
or output
. input
means that the variable has to be
provided whenever the scope is called, and cannot be defined within the scope.
internal
means that the variable is defined within the scope and cannot be
seen from outside the scope; it's not part of the return value of the scope.
output
means that a caller can retrieve the computed value of the variable.
Note that a variable can also be simultaneously an input and an output of the
scope, in that case it should be annotated with input output
.
Once the scope has been declared, we can use it to define our computation rules and finally code up article 1!
Defining variables and formulas
Article 1 actually gives the formula to define the income_tax
variable of
scope IncomeTaxComputation
, which translates to the following Catala code:
The income tax for an individual is defined as a fixed percentage of the individual's income over a year.
scope IncomeTaxComputation:
definition income_tax equals
individual.income * tax_rate
Let us unpack the code above. Each definition
of a variable (here,
income_tax
) is attached to a scope that declares it (here,
IncomeTaxComputation
). After equals
, we have the actual expression for the
variable : individual.income * tax_rate
. The syntax for formulas uses
the classic arithmetic operators. Here, *
means multiplying an amount of
money
by a decimal
, returning a new amount of money
. The exact behavior of
each operator depends on the types of values it is applied on. For instance,
here, because a value of the money
type is always an integer number of cents,
*
rounds the result of the multiplication to the nearest cent to provide the
final value of type money
(see the FAQ for more information
about rounding in Catala). About individual.income
, we see that the .
notation
lets us access the income
field of individual
, which is actually a structure
of type Individual
.
Similarly to struct field access, Catala lets you inspect the contents of
a enumeration value with pattern matching, as it is usual in functional programming
language. Concretely, if tax_credit
is a variable whose type is TaxCredit
as
declared above, then you can define the amount of a tax credit that depends
on a number of eligible children with the following pattern matching:
match tax_credit with pattern
-- NoTaxCredit: $0
-- ChildrenTaxCredit of number_of_eligible_children:
$10,000 * number_of_eligible_children
In the branch -- ChildrenTaxCredit of number_of_eligible_children:
, you know
that tax_credit
is in the variant ChildrenTaxCredit
, and
number_of_eligible_children
lets you bind the integer
payload of the
variant. Like in a regular functional programming language, you can give any
name you want to number_of_eligible_children
, which is useful if you're
nesting pattern matching and want to differentiate the contents of two different
variant payloads.
Now, back to our scope IncomeTaxComputation
. at this point we're still missing
the definition of tax_rate
. This is a common pattern when coding the
law: the definitions for various variables are scattered in different articles.
Fortunately, the Catala compiler automatically collects all the definitions for
each scope and puts them in the right order. Here, even if we define
tax_rate
after income_tax
in our source code, the Catala compiler
will switch the order of the definitions internally because tax_rate
is used in the definition of income_tax
. More generally, the order of toplevel
definitions and declarations in Catala source code files does not matter, and
you can refactor code around freely without having to care about dependency
order.
In this tutorial, we'll suppose that our fictional CTTC specification defines the percentage in the next article. The Catala code below should not surprise you at this point.
The fixed percentage mentioned at article 1 is equal to 20 %.
scope IncomeTaxComputation:
# Writing 20% is just an alternative for the decimal "0.20".
definition tax_rate equals 20 %
Common values and computations in Catala
So far, we have seen values that have types like decimal
, money
, integer
.
One could object that there is no point in distinguishing these three concepts,
as they are merely numbers. However, the philosophy of Catala is to make every
choice that affects the result of the computation explicit, and the
representation of numbers does affect the result of the computation. Indeed,
financial computations vary according to whether we consider money amount as an
exact number of cents, or whether we store additional fractional digits after
the cent. Since the kind of programs Catala is designed for
implies heavy consequences for a lot of users, the language is quite strict
about how numbers are represented. The rule of thumb is that, in Catala,
numbers behave exactly according to the common mathematical semantics one
can associate to basic arithmetic computations (+
, -
, *
, /
).
In particular, that means that integer
values are unbounded and can never
overflow. Similarly, decimal
values can be arbitrarily precise (although they are always rational, belonging
to ℚ) and do not suffer from floating-point imprecisions. For money
, the
language makes an opinionated decision: a value of type money
is always
an integer number of cents.
These choices has several consequences:
integer
divided byinteger
gives adecimal
;money
cannot be multiplied bymoney
(instead, multiplymoney
bydecimal
) ;money
multiplied (or divided) bydecimal
rounds the result to the nearest cent ;money
divided bymoney
gives adecimal
(that is not rounded whatsoever).
Concretely, this gives:
10 / 3 = 3.333333333...
$10 / 3.0 = $3.33
$20 / 3.0 = $6.67
$10 / $3 = 3.33333333...
The Catala compiler will guide you into using the correct operations explicitly, by reporting compiler errors when that is not the case.
For instance, typing to add an integer
and a decimal
gives the
following error message from the Catala compiler:
┌─[ERROR]─
│
│ I don't know how to apply operator + on types integer and decimal
│
├─➤ tutorial_en.catala_en
│ │
│ │ definition x equals 1 + 2.0
│ │ ‾‾‾‾‾‾‾
│
│ Type integer coming from expression:
├─➤ tutorial_en.catala_en
│ │
│ │ definition x equals 1 + 2.0
│ │ ‾
│
│ Type decimal coming from expression:
├─➤ tutorial_en.catala_en
│ │
│ │ definition x equals 1 + 2.0
│ │ ‾‾‾
└─
To fix this error, you need to use explicit casting, for instance by replacing
1
by decimal of 1
. Refer to the language reference for all
possible casting, operations and their associated semantics.
Catala also has built-in date
and duration
types with the common associated
operations (adding a duration to a date, substracting two dates to get a
duration, etc.). For a deeper look at date computations (which are very tricky!), look at the language reference.
Testing the code
Now that we have implemented a few articles in Catala, it is time to test our code to check that it behaves correctly. We encourage you to test your code a lot, as early as possible, and check the test result in a continuous integration system to prevent regressions.
The testing of Catala code is done with the interpreter inside the compiler,
accessible with the interpret
command and the --scope
option that specifies
the scope to be interpreted?
Why can't I test IncomeTaxComputation
directly?
Why can't I test IncomeTaxComputation
directly?
The reflex at this point is to execute the following command:
$ catala interpret tutorial.catala_en --scope=IncomeTaxComputation
┌[ERROR]─
│
│ This scope needs input arguments to be executed. But the Catala built-in interpreter does not have a way to retrieve input values from the command line, so it cannot execute this scope.
│ Please create another scope that provides the input arguments to this one and execute it instead.
│
├─➤ tutorial.catala_en
│ │
│ │ input individual content Individual
│ │ ‾‾‾‾‾‾‾‾‾‾
└─
As the error message says, trying to interpret directly IncomeTaxComputation
is like
trying to compute the taxes of somebody without knowing the income of the person!
To be executed, the scope needs to be called with concrete values for the income
and the number of children of the individual. Otherwise, Catala will complain
that input
variables of the scope are missing for the interpretation.
The pattern for testing make use of concepts that will be seen
later in the tutorial, so it is okay to take part of
the following as some mysterious syntax that performs what we want. Basically,
we will be creating for our test case a new test that will pass specific
arguments to IncomeTaxComputation
which is being tested:
declaration scope Test:
# The following line is mysterious for now
output computation content IncomeTaxComputation
scope Test:
definition computation equals
# The following line is mysterious for now
output of IncomeTaxComputation with {
# Below, we pass the input variables for "IncomeTaxComputation"
-- individual:
# "individual" has a structure type, so we need to build the
# structure "Individual" with the following syntax
Individual {
# "income" and "number_of_children" are the fields of the structure;
# we give them the values we want for our test
-- income: $20,000
-- number_of_children: 0
}
}
This test can now be executed through the Catala interpreter:
$ catala interpret tutorial.catala_en --scope=Test
┌─[RESULT]─
│ computation = IncomeTaxComputation { -- income_tax: $4,000.00 }
└─
We can now check that $4,000 = $20,000 * 20%
; the result is correct.
Use this test to regularly play with the code during the tutorial and inspect its results under various input scenarios. This will help you understand the behavior of Catala programs, and spot errors in your code 😀
You can also check that there is no syntax or typing error in your code, without testing it, with the following command:
$ catala typecheck tutorial.catala_en
┌─[RESULT]─
│ Typechecking successful!
└─
Checkpoint
This concludes the first section of the tutorial. By setting up data structures
like structure
and enumeration
, representing the types of scope
variables, and definition
of formulas for these variables, you should now be able to
code in Catala the equivalent of single-function programs that perform common
arithmetic operations and define local variables.
Recap of the current section
Recap of the current section
For reference, here is the final version of the Catala code consolidated at the end of this section of the tutorial.
```catala
declaration structure Individual:
data income content money
data number_of_children content integer
declaration scope IncomeTaxComputation:
input individual content Individual
internal tax_rate content decimal
output income_tax content money
```
## Article 1
The income tax for an individual is defined as a fixed percentage of the
individual's income over a year.
```catala
scope IncomeTaxComputation:
definition income_tax equals
individual.income * tax_rate
```
## Article 2
The fixed percentage mentioned at article 1 is equal to 20 %.
```catala
scope IncomeTaxComputation:
label article_2
definition tax_rate equals 20%
```
## Test
```catala
declaration scope Test:
output computation content IncomeTaxComputation
scope Test:
definition computation equals
output of IncomeTaxComputation with {
-- individual:
Individual {
-- income: $20,000
-- number_of_children: 0
}
}
```