Tutorial: computing your taxes
Welcome to this tutorial! Its objective is to guide you through the features of the Catala language, teach you how to annotate a simple legislative text with it, and get out an executable program that computes your taxes.
This tutorial does not cover the installation of Catala. For more information about this, refer to the Getting started chapter. If you want to follow this tutorial locally, simply create an empty file with the extension .catala_en, which you will fill in as you read the tutorial by copy-pasting the relevant sections.
At any point, please refer to the Catala syntax cheat sheet or the reference guide for an exhaustive view of the syntax and features of Catala; this tutorial is rather designed to ease you into the language and its common use patterns.
Mixing law and code
Catala is a language designed around the concept of literate programming, that is, the mixing of computer code and its specification in a single document. Why literate programming? Because it enables a fine-grained correspondence between the specification and the code. Whenever the specification is updated, knowing where to update the code is trivial with literate programming. This is absolutely crucial for enabling long-term maintenance of complex and high-assurance programs like tax or social benefits computation.
Hence, a Catala source code file looks like a regular Markdown document, with the specification written down and styled as Markdown text, and the Catala code only present in well-bounded Catala code blocks introduced by ```catala.
Before writing any Catala code, we must then introduce the specification of the code for this tutorial. This specification will be based on a fictional Tax Code defining a simple income tax. But in general, anything can be used as a specification for a Catala program: laws, executive orders, court cases motivations, legal doctrine, internal instructions, technical specifications, etc. These sources can also be mixed to form a complete Catala program that relies on multiple sources of specification. Concretely, just copy-paste the text of the specification and format it in Markdown syntax inside a Catala source code file.
Without further ado, let us introduce the first bit of specification for our fictional income tax, Article 1 of the CTTC (Catala Tutorial Tax Code):
Article 1
The income tax for an individual is defined as a fixed percentage of the individual’s income over a year.
The spirit of writing code in Catala is to stick to the specification at all times in order to put the code snippets where they belong. Hence, we will introduce below the Catala code snippets that translate Article 1, which should be put just below Article 1 in the Catala source code file.
These code snippets should describe the program that computes the income tax, and contain the rule defining it as a multiplication of the income by the rate. It is time to dive into Catala as a programming language.
# We will soon learn what to write here in order to translate the meaning
# of Article 1 into Catala code.
# To create a block of Catala code in your file, bound it with Markdown-style
# “```catala” and “```” delimiters. You can write comments in Catala code blocks
# by prefixing lines with “#”
Setting up data structures
The content of Article 1 assumes a lot of implicit context: there exists an individual with an income, as well as an income tax that the individual has to pay each year. Even if this implicit context is not stated verbatim in the law, we have to make it explicit in the computer code, in the form of data structures and function signatures.
Catala is a strongly typed, statically compiled language, so all data structures and function signatures have to be explicitly declared. So, we begin by declaring the type information for the individual, the taxpayer that will be the subject of the tax computation. This individual has an income and a number of children, both pieces of information which will be needed for tax purposes:
# The name of the structure, “Individual”, must start with an
# uppercase letter: this is the CamelCase convention.
declaration structure Individual:
  # In this line, “income” is the name of the structure field and
  # “money” is the type of what is stored in that field.
  # Available types include: “integer”, “decimal”, “money”, “date”,
  # “duration”, and any other structure or enumeration that you declare.
  data income content money
  # The field names “income” and “number_of_children” start with a lowercase
  # letter, they follow the snake_case convention.
  data number_of_children content integer
This structure contains two data fields, income and number_of_children.
Structures are useful to group together data that goes together. Usually, you
get one structure per concrete object on which the law applies (like the
individual). It is up to you to decide how to group the data together, but we
advise you to aim at optimizing code readability.
Sometimes, the law gives an enumeration of different situations. These enumerations are modeled in Catala using an enumeration type, like:
# The name “TaxCredit” is also written in CamelCase.
declaration enumeration TaxCredit:
  # The line below says that “TaxCredit” can be a “NoTaxCredit” situation.
  -- NoTaxCredit
  # The line below says that alternatively, “TaxCredit” can be a
  # “ChildrenTaxCredit” situation. This situation carries a content
  # of type integer corresponding to the number of children concerned
  # by the tax credit. This means that if you’re in the “ChildrenTaxCredit”
  # situation, you will also have access to this number of children.
  -- ChildrenTaxCredit content integer
In computer science terms, such an enumeration is called a “sum type” or simply an enum. The combination of structures and enumerations allows the Catala programmer to declare all possible shapes of data, as they are equivalent to the powerful notion of “algebraic data types”.
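For readers more familiar with mainstream languages, the two declarations above can be loosely mirrored in Python. This is only an analogy and not how Catala represents data: Decimal stands in for the money type, and the describe helper is a hypothetical function added for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Union


@dataclass(frozen=True)
class Individual:
    # Mirrors "declaration structure Individual": a product of two fields.
    income: Decimal  # stand-in for Catala's "money" type (assumption)
    number_of_children: int


@dataclass(frozen=True)
class NoTaxCredit:
    # Enumeration case that carries no content.
    pass


@dataclass(frozen=True)
class ChildrenTaxCredit:
    # Enumeration case that carries an integer content.
    number_of_children: int


# A "TaxCredit" value is exactly one of the two cases above (a sum type).
TaxCredit = Union[NoTaxCredit, ChildrenTaxCredit]


def describe(credit: TaxCredit) -> str:
    # Hypothetical helper: the content is only reachable once we know
    # that we are in the "ChildrenTaxCredit" situation.
    if isinstance(credit, ChildrenTaxCredit):
        return f"credit for {credit.number_of_children} children"
    return "no credit"
```

The structure groups fields that always exist together, while the enumeration forces the reader of a value to check which situation applies before using its content.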
Notice that the data structures we have declared cannot always be attached naturally to a particular piece of the specification text. So, where should you put these declarations in your literate programming file? Since you will often go back to these data structure declarations during programming, we advise you to group them together in some sort of prelude in your source code file.
Scopes as basic computation blocks
We’ve defined and typed the data that the program will manipulate. Now we have to define the logical context in which this data will evolve. Because Catala is a functional programming language, all code exists within a function. And the equivalent to a function in Catala is called a scope. Every scope has a name, input variables (similar to function arguments), internal variables (similar to local variables), and output variables (that together form the return type of the function). For instance, Article 1 defines a scope for computing the income tax:
declaration scope IncomeTaxComputation:
  # Scope names use CamelCase.
  input individual content Individual
  # The line above declares a scope variable of the scope, which is akin to
  # a function parameter in computer science terms. This is the piece of
  # data on which the scope will operate.
  internal fixed_percentage content decimal
  output income_tax content money
The scope is the basic abstraction unit in Catala programs, and scopes can be composed. Since a function can call other functions, scopes can also call other scopes. We will see later how to do this, but first let us focus on the inputs and outputs of scopes.
The declaration of the scope is akin to a function signature: it contains a list of all the arguments along with their types. But in Catala, scope variables can be input, internal or output. input means that the variable is provided whenever the scope is called, and cannot be defined within the scope. internal means that the variable is defined within the scope and cannot be seen from outside the scope; it’s not part of the return value of the scope. output means that a caller can retrieve the computed value of the variable. Note that a variable can also be simultaneously an input and an output of the scope; in that case it should be annotated with input output.
Once the scope has been declared, we can use it to define our computation rules and finally code up Article 1!
Defining variables and formulas
Article 1 actually gives the formula to define the income_tax variable of scope IncomeTaxComputation.
Article 1
The income tax for an individual is defined as a fixed percentage of the individual’s income over a year.
scope IncomeTaxComputation:
  definition income_tax equals
    individual.income * fixed_percentage
Let us unpack the code above. Each definition of a variable (here, income_tax) is attached to a scope that declares it (here, IncomeTaxComputation). After equals, we have the actual expression for the variable: individual.income * fixed_percentage. The syntax for formulas uses the classic arithmetic operators. Here, “*” means multiplying an amount of money by a decimal, returning a new amount of money. The exact behavior of each operator depends on the types of the values it is applied to. For instance, here, because a value of the money type is always an integer number of cents, “*” rounds the result of the multiplication to the nearest cent to provide the final value of type money (see the FAQ for more information about rounding in Catala). As for individual.income, we see that the “.” notation lets us access the income field of individual, which is actually a structure of type Individual.
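To make the rounding step concrete, here is a small Python sketch (not Catala) of a money-by-decimal multiplication that rounds to the nearest cent. The helper name money_times_decimal, the sample income and the 20 % rate are illustrative assumptions, and so is the tie-breaking rule (ROUND_HALF_UP); the FAQ mentioned above is the place to check what Catala actually does on exact half-cents.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def money_times_decimal(amount: Decimal, rate: Decimal) -> Decimal:
    # Multiply exactly, then round the result to a whole number of cents.
    # ROUND_HALF_UP (ties away from zero) is an assumption of this sketch.
    return (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)


# Illustrative values only: an income of $12,345.67 taxed at a 20 % rate.
income = Decimal("12345.67")
rate = Decimal("0.20")

# The exact product is 2469.134, which is not a whole number of cents,
# so the result is rounded to 2469.13.
income_tax = money_times_decimal(income, rate)
```

The point is that the result is again a valid amount of money: the fractional tenths of a cent produced by the multiplication never leak into later computations.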
However, at this point we’re still missing the definition of fixed_percentage. This is a common pattern when coding the law: the definitions for various variables are scattered in different articles. Fortunately, the Catala compiler automatically collects all the definitions for each scope and puts them in the right order. Here, even if we define fixed_percentage after income_tax in our source code, the Catala compiler will switch the order of the definitions internally because fixed_percentage is used in the definition of income_tax. More generally, the order of top-level definitions and declarations in Catala source code files does not matter, and you can refactor code around freely without having to care about dependency order.
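The idea of reordering definitions by dependency can be illustrated with a topological sort. The Python sketch below is not how the Catala compiler is implemented; it only shows, under that caveat, that an evaluation order can be derived from "which variable reads which" regardless of the order in which definitions appear in the source. The dependencies dictionary is written by hand for this example.

```python
from graphlib import TopologicalSorter

# Each variable maps to the set of variables that its definition reads.
# "income_tax" is listed first on purpose, as in the source file.
dependencies = {
    "income_tax": {"individual", "fixed_percentage"},
    "fixed_percentage": set(),
    "individual": set(),
}

# static_order() yields every variable after all the variables it depends on.
evaluation_order = list(TopologicalSorter(dependencies).static_order())

# Position of each variable in the computed evaluation order.
position = {name: index for index, name in enumerate(evaluation_order)}
```

Whatever the order of the entries in the dictionary, income_tax always ends up after fixed_percentage and individual in evaluation_order.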
In this tutorial, we’ll suppose that our fictional CTTC specification defines the percentage in the next article. The Catala code below should not surprise you at this point.
Article 2
The fixed percentage mentioned at article 1 is equal to 20 %.
scope IncomeTaxComputation:
  # Writing 20% is just an alternative for the decimal “0.20”.
  definition fixed_percentage equals 20 %
Conditional definitions and exceptions
So far so good, but specifications coming from legal text do not always neatly line up articles with variable definitions. Sometimes, and this is a very common pattern, a later article redefines a variable already defined previously, but with a twist in a certain exceptional situation. For instance, Article 3 of CTTC:
Article 3
If the individual is in charge of 2 or more children, then the fixed percentage mentioned at article 1 is equal to 15 %.
This article actually gives another definition for the fixed percentage, which was already defined in article 2. However, article 3 defines the percentage conditionally to the individual having 2 or more children. How to redefine fixed_percentage? Catala precisely allows you to redefine a variable under a condition with the under condition ... consequence syntax:
scope IncomeTaxComputation:
  definition fixed_percentage under condition
    individual.number_of_children >= 2
  consequence equals 15 %
What does this mean? If the individual has two or more children, then fixed_percentage will be 15 %. Conditional definitions let you define your variables piecewise, one case at a time; the Catala compiler stitches everything together for execution. More precisely, at runtime, we look at the conditions of all piecewise definitions for the same variable, and pick the one that is valid.
But what happens if no conditional definition is valid at runtime? Or if multiple definitions are valid at the same time? In these cases, Catala will abort execution and return an error message like the one below:
┌─[ERROR]─
│
│ During evaluation: conflict between multiple valid consequences for assigning the same variable.
│
├─➤ tutorial_en.catala_en
│ │
│ │ definition fixed_percentage equals 20 %
│ │ ‾‾‾‾
├─ Article 2
│
├─➤ tutorial_en.catala_en
│ │
│ │ consequence equals 15 %
│ │ ‾‾‾‾
└─ Article 3
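The selection rule described above (exactly one valid definition, otherwise an error) can be mimicked in a few lines of Python. This is a simplified model for illustration, not Catala's actual evaluator: pick_definition, ConflictError and NoValueError are hypothetical names, and the individual is represented as a plain dictionary.

```python
from decimal import Decimal


class ConflictError(Exception):
    """Several definitions of the same variable are valid at once."""


class NoValueError(Exception):
    """No definition of the variable is valid."""


def pick_definition(definitions, individual):
    # "definitions" is a list of (condition, value) pairs for one variable.
    valid = [value for condition, value in definitions if condition(individual)]
    if len(valid) > 1:
        raise ConflictError("multiple valid definitions for the same variable")
    if not valid:
        raise NoValueError("no valid definition for the variable")
    return valid[0]


# Articles 2 and 3 written as two plain definitions, without any priority
# between them: the unconditional one is always valid.
fixed_percentage_definitions = [
    (lambda ind: True, Decimal("0.20")),                            # Article 2
    (lambda ind: ind["number_of_children"] >= 2, Decimal("0.15")),  # Article 3
]
```

With one child, only the definition of Article 2 is valid and pick_definition returns 0.20. With three children both definitions are valid, so the call raises ConflictError, which corresponds to the situation reported by the error message above.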
If the specification is correctly drafted, then these error situations should not happen, as one and only one conditional definition should be valid at all times. Here, however, our definition of fixed_percentage conflicts with the more general definition that we gave above. To correctly model situations like this, Catala allows us to define the precedence of one conditional definition over another. It is as simple as adding exception before the definition. For instance, here is a more correct version of the code for Article 3:
Article 3
If the individual is in charge of 2 or more children, then the fixed percentage mentioned at article 1 is equal to 15 %.
scope IncomeTaxComputation:
  exception definition fixed_percentage under condition
    individual.number_of_children >= 2
  consequence equals 15 %
With exception, the conditional definition at Article 3 will be picked over the base case at Article 2 when the individual has two children or more. This exception mechanism is modeled on the logic of legal drafting: it is the key mechanism that lets us split our variable definitions to match the structure of the specification. Without exception, it is not possible to use the literate programming style. This is precisely why writing and maintaining computer programs for taxes or social benefits is very difficult with mainstream programming languages. So, go ahead and use exception as much as possible, since it is a very idiomatic Catala concept.
Composing scopes and functions together
Catala is a functional language and encourages using functions to describe relationships between data. As part of our ongoing CTTC specification, we will now imagine an alternative tax system with two progressive brackets. This new example will illustrate how to write more complex Catala programs by composing abstractions together.
First, let us start with the data structure and new scope for our new two-brackets tax computation.
# This structure describes the parameters of a tax computation formula that
# has two tax brackets, each with their own tax rate.
declaration structure TwoBrackets:
  data breakpoint content money
  data rate1 content decimal
  data rate2 content decimal

declaration scope TwoBracketsTaxComputation:
  # This input variable contains the description of the
  # parameters of the tax formula.
  input brackets content TwoBrackets
  # But for declaring the tax_formula variable, we declare it as
  # a function: “content money depends on income content money” means a function
  # that returns money as output (the tax) and takes the “income” money
  # parameter as input.
  output tax_formula content money depends on income content money
The scope TwoBracketsTaxComputation is a generic scope that takes as input the parameters of a two-brackets tax computation, and returns a function that effectively computes the amount of tax under this given two-brackets system. Passing around functions as values is a powerful tool to gain in expressivity when programming, and reduce code duplication and boilerplate to instantiate the same concept multiple times. More importantly, functions as values allow us to sometimes stick closer to the text of the specification that might be very general. Imagine the following Article 4 of the CTTC:
Article 4
The tax amount for a two-brackets computation is equal to the amount of income in each bracket multiplied by the rate of each bracket.
scope TwoBracketsTaxComputation:
  # This is the formula for implementing a two-brackets tax system.
  definition tax_formula of income equals
    if income <= brackets.breakpoint then
      income * brackets.rate1
    else (
      brackets.breakpoint * brackets.rate1
      + (income - brackets.breakpoint) * brackets.rate2
    )
The above formula for the two-brackets tax system computation also introduces the if ... then ... else ... syntax, which is still available in Catala even if the language encourages the use of conditional definitions and exceptions. Here, we could have defined tax_formula with two conditional definitions, one for income <= brackets.breakpoint, and one for income > brackets.breakpoint. However, the legal specification here does not split the definition of tax_formula into two different articles, so it does not make sense to split the definition of tax_formula in the code.
More generally, a rule of thumb for deciding whether to split variable definitions into conditional definitions is to simply follow what the specification text does. If the specification text splits its definition into multiple paragraphs or sentences, then you can annotate each paragraph or sentence with the corresponding conditional definition. But if the specification introduces the definition in a single block of text, then there is no need to split the code. For instance, it is more compact to translate a table of values in a specification with if ... then ... else ... statements than with conditional definitions, so it may be better to proceed that way.
Now that we’ve defined our new helper scope for computing a two-brackets tax, we want to use it in our main tax computation scope. As mentioned before, Catala scopes can also be thought of as functions. And sometimes, the specification implicitly translates into a function call, like the article below.
Article 5
For individuals in charge of zero children, the income tax of Article 1 is defined as a two-brackets computation with rates 20% and 40%, with an income breakpoint of $100,000.
To translate Article 5 into Catala code, we need the scope IncomeTaxComputation to call the scope TwoBracketsTaxComputation. One way to write that is to declare TwoBracketsTaxComputation as a static sub-scope of IncomeTaxComputation. This is done by updating the declaration of IncomeTaxComputation and adding a line for the TwoBracketsTaxComputation sub-scope:
declaration scope IncomeTaxComputation:
  # This line says that we add the “two_brackets” as a scope variable.
  # However, the “scope” keyword tells that this item is not a piece of data
  # but rather a subscope that we can use to compute things.
  two_brackets scope TwoBracketsTaxComputation
  input individual content Individual
  # “fixed_percentage” is still needed by the definition of Article 1.
  internal fixed_percentage content decimal
  output income_tax content money
two_brackets is thus the name of the sub-scope call and we can provide its arguments to code up the two-brackets computation parameters set by Article 5:
scope IncomeTaxComputation:
  # Since the subscope “two_brackets” is like a function we can call,
  # we need to define its arguments. This is done below with the only
  # parameter “brackets” of sub-scope call “two_brackets”:
  definition two_brackets.brackets equals TwoBrackets {
    -- breakpoint: $100,000
    -- rate1: 20%
    -- rate2: 40%
  }
The sub-scope call two_brackets now has data flowing in to TwoBracketsTaxComputation, letting it compute its output tax_formula, which is the function that we will use to compute the income tax in the case of Article 5, that is when the individual has no children. As for Article 3, we will use an exceptional conditional definition for income_tax, that makes use of two_brackets.tax_formula:
scope IncomeTaxComputation:
  # The syntax of calling a function “f” with argument “x” is “f of x”.
  exception definition income_tax under condition
    individual.number_of_children = 0
  consequence equals two_brackets.tax_formula of individual.income
The snippet of code above exceptionally calls the function two_brackets.tax_formula when the individual has no children; but two_brackets.tax_formula is itself the output of the scope TwoBracketsTaxComputation called as a sub-scope within IncomeTaxComputation. This pattern of scopes returning functions adheres to the spirit of functional programming, where functions are passed around as values. We encourage you to use this pattern for encoding complex specifications, as it is quite expressive, and does not make use of shared mutable state in memory (which does not exist in Catala anyway).
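The "scope returning a function" pattern has a close analogue in closures. The Python sketch below is an illustration under assumptions, not a translation produced by Catala: two_brackets_tax_computation plays the role of the scope, the returned tax_formula plays the role of its output, and the to_money helper (with its ROUND_HALF_UP tie-breaking) is a hypothetical stand-in for Catala's rounding of money values. The parameters are those of Article 5.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_money(value: Decimal) -> Decimal:
    # Round to a whole number of cents (tie-breaking rule is an assumption).
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


def two_brackets_tax_computation(breakpoint_amount, rate1, rate2):
    # Plays the role of the scope: takes the bracket parameters as input
    # and returns the tax formula as a function of the income.
    def tax_formula(income: Decimal) -> Decimal:
        if income <= breakpoint_amount:
            return to_money(income * rate1)
        return to_money(breakpoint_amount * rate1) + to_money(
            (income - breakpoint_amount) * rate2
        )

    return tax_formula


# Parameters of Article 5: rates of 20 % and 40 %, breakpoint at $100,000.
tax_formula = two_brackets_tax_computation(
    Decimal("100000"), Decimal("0.20"), Decimal("0.40")
)

# $80,000 lies entirely in the first bracket: 80,000 * 20 %.
tax_low = tax_formula(Decimal("80000"))
# $150,000 spans both brackets: 100,000 * 20 % + 50,000 * 40 %.
tax_high = tax_formula(Decimal("150000"))
```

Here tax_low is $16,000 and tax_high is $40,000. The parameters are fixed once when the function is created, and no mutable state is shared between calls.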
Complex exceptions patterns
With our last code snippet, note that we have now written two exceptions: one for fixed_percentage (Article 3, if there are two or more children) and one for income_tax (Article 5, if there are zero children). So far, these exceptions have simply been declared with the exception keyword. That keyword alone suffices because, for each variable, there is only one base case that the exception can be referring to. However, sometimes the specification implicitly sets up more complex exception patterns:
Article 6
Individuals earning less than $10,000 are exempted of the income tax mentioned at article 1.
At first glance, Article 6 merely defines another exceptional conditional definition for variable income_tax of scope IncomeTaxComputation. But this new exception is likely to conflict with the exception of Article 5 when the individual earns less than $10,000 and has zero children! If such a conflict between exceptions were to happen, the Catala program would crash with an error message similar to the one we already saw when programming Article 3:
┌─[ERROR]─
│
│ During evaluation: conflict between multiple valid consequences for assigning the same variable.
│
├─➤ tutorial_en.catala_en
│ │
│ │ consequence equals two_brackets.tax_formula of individual.income
│ │ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
├─ Article 5
│
├─➤ tutorial_en.catala_en
│ │
│ │ consequence equals $0
│ │ ‾‾
└─ Article 6
In this situation, we need to prioritize the exceptions. This prioritization requires legal expertise and research, as it is not always obvious which exception should prevail in any given situation. Hence, Catala error messages indicating a conflict during evaluation are an invitation to call the lawyer in your team and have them interpret the specification, rather than fixing the conflict yourself.
Here, because Article 6 follows Article 5, and because it is more favorable to the taxpayer to pay $0 in tax rather than the result of the two-brackets computation, we can make the legal decision to prioritize the exception of Article 6 over the exception of Article 5. Now, let us see how to write that with Catala. Because Article 1 is the base case for the exception of Article 5, and Article 5 is the base case for the exception of Article 6, we need to give labels to the definitions of income_tax at Articles 1 and 5, so that the exception keywords in Articles 5 and 6 can refer to those labels:
Article 1
The income tax for an individual is defined as a fixed percentage of the individual’s income over a year.
scope IncomeTaxComputation:
  label article_1 definition income_tax equals
    individual.income * fixed_percentage
Article 5
For individuals in charge of zero children, the income tax of Article 1 is defined as a two-brackets computation with rates 20% and 40%, with an income breakpoint of $100,000.
scope IncomeTaxComputation:
  label article_5 exception article_1
  definition income_tax under condition
    individual.number_of_children = 0
  consequence equals two_brackets.tax_formula of individual.income
Article 6
Individuals earning less than $10,000 are exempted of the income tax mentioned at article 1.
scope IncomeTaxComputation:
  # Article 6 says “less than $10,000”, hence the strict comparison.
  exception article_5
  definition income_tax under condition
    individual.income < $10,000
  consequence equals $0
At runtime, here is how Catala will determine which of the three definitions to pick for income_tax: first, it will try the most exceptional exception (Article 6), and test whether the income is below $10,000; if not, then it will default to the exception level below (Article 5), and test whether there are no children; if not, it will default to the base case (Article 1).
This scenario defines an “exception chain”, but it can get more complex than that. Actually, Catala lets you define “exception trees” as big as you want, simply by providing label and exception tags that refer to each other for your conditional definitions. This expressive power will help you tame the complexity of legal specifications and keep your Catala code readable and maintainable.
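The runtime lookup through an exception chain or tree can be sketched as a small recursive function. The Python code below is a simplified model written for this tutorial, not Catala's implementation: the Definition class, the evaluate function and the dictionary-based inputs are hypothetical, rounding to the cent is left out to keep the example short, and Article 6 is modeled with a strict "less than $10,000" test as in its text.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable, List, Optional


class ConflictError(Exception):
    """Several exceptions to the same definition apply at once."""


@dataclass
class Definition:
    label: str
    condition: Callable[[dict], bool]
    value: Callable[[dict], Decimal]
    # Definitions declared as "exception <label>" of this one.
    exceptions: List["Definition"] = field(default_factory=list)


def evaluate(definition: Definition, inputs: dict) -> Optional[Decimal]:
    # Exceptions are tried first; the definition itself is only a fallback.
    found = [evaluate(exc, inputs) for exc in definition.exceptions]
    found = [result for result in found if result is not None]
    if len(found) > 1:
        raise ConflictError(f"conflicting exceptions to {definition.label}")
    if found:
        return found[0]
    if definition.condition(inputs):
        return definition.value(inputs)
    return None


def fixed_percentage(inputs: dict) -> Decimal:
    # Articles 2 and 3.
    if inputs["number_of_children"] >= 2:
        return Decimal("0.15")
    return Decimal("0.20")


def two_brackets_tax(income: Decimal) -> Decimal:
    # Articles 4 and 5: 20 % up to $100,000, then 40 % above.
    limit = Decimal("100000")
    if income <= limit:
        return income * Decimal("0.20")
    return limit * Decimal("0.20") + (income - limit) * Decimal("0.40")


article_6 = Definition(
    "article_6",
    lambda i: i["income"] < Decimal("10000"),
    lambda i: Decimal("0"),
)
article_5 = Definition(
    "article_5",
    lambda i: i["number_of_children"] == 0,
    lambda i: two_brackets_tax(i["income"]),
    exceptions=[article_6],
)
article_1 = Definition(
    "article_1",
    lambda i: True,
    lambda i: i["income"] * fixed_percentage(i),
    exceptions=[article_5],
)


def income_tax(income: str, number_of_children: int) -> Optional[Decimal]:
    inputs = {"income": Decimal(income), "number_of_children": number_of_children}
    return evaluate(article_1, inputs)
```

For example, income_tax("8000", 0) gives $0 through Article 6, income_tax("150000", 0) gives $40,000 through Article 5, and income_tax("50000", 3) gives $7,500 through the base case of Article 1 with the 15 % rate. Giving a definition several entries in its exceptions list turns the chain into a tree, and two sibling exceptions applying at once is what triggers the conflict error.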