Immutability in OCaml and Rust

A number of programming languages advertise immutable variables as one of their main features. Among them are the ML family (OCaml, F#, Standard ML) and Haskell, as well as the younger Clojure and Rust. If you're unfamiliar with these languages, you've probably wondered: how is this different from const in C and C++? Let's find out.

 

We will write examples in OCaml and Rust to demonstrate the similarities and differences in the implementation of this idea in different languages. You can run examples in OCaml online on the try.ocamlpro.com website, and examples in Rust on play.rust-lang.org.

A Brief History of Variables

In the earliest period of computing history, when people wrote machine code, the entire layout of the program's memory was the programmer's responsibility, and every address had to be specified by hand.

A little later, assemblers appeared that let you write symbolic labels instead of numeric addresses. Imagine a program in a notional assembly language that prints the string hello world in an infinite loop: the string sits in memory behind a label msg, the loop body begins at a label foo with push msg followed by a call to print, and ends with jmp foo.

Any modern assembler will work out for us where to place the hello world string and the machine instructions in memory, and in jmp foo it will replace the label foo with the actual address of the push msg instruction. The linker will then substitute the real address of the print function from the library for its name, but that's another story. This is the first level of abstraction above native machine code.

The first versions of Fortran and other early languages were closer to advanced macro assemblers than to compilers in the modern sense. Even C, when it first appeared, translated each language statement into only one or two PDP-11 machine instructions.

Assembly languages offer no memory safety: you can write any data to the address behind any label, and the consequences show up only at run time. Since then, languages have evolved towards greater abstraction and expressiveness: it became possible to express the meaning of variables and to restrict their possible values using types.

One thing remained unchanged: each variable name is bound to a specific memory area, or at least to the same piece of data. In imperative programming, assigning a new value always erases the old data in memory and replaces it with the new.

This causes the greatest difficulties when the compiler starts optimizing code that uses variables. If the contents of memory can change at any time, it is hard to judge whether a variable's value can be inlined.

Tasks like undo and redo become even more complicated. If you are writing a text or image editor with the ability to undo changes, a language like C leaves you only two options: store every version of the data in full, or explicitly store a list of the operations performed, such as DeleteLineRange(10, 11) and ApplyFilter(Blur, radius=2).

Even in simpler tasks, it may turn out that functions from the library modify existing data, and if the original data is still needed, it must be copied in its entirety.

The popularity of copy.copy() and copy.deepcopy() in Python code is a clear confirmation of this.

Constants

 

The constant mechanism in languages like C is the first small step towards immutable variables. If we write const int foo = 17, the compiler has a guarantee that the value bound to the name foo will never change at run time. This allows it to optimize the code safely: the name foo does not even have to be associated with an address in memory, because in every expression like bar = foo * 2 the word foo can simply be replaced with the value 17. For larger and more complex data such naive optimization no longer works, but the room for optimization is still greater than with mutable variables.

One major limitation remains: a constant name is bound to one particular value for the entire program or module. It is exactly this restriction that immutable variables lift.

Binding names to values, and scopes

Language features usually work not in isolation, but together. You can avoid binding names to values permanently if creating new scopes is simple and "cheap".

Syntax like let name = value (with variations) is commonly used to bind a name to a value. Each binding opens a new scope. Let's look at an example in OCaml.
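A minimal sketch of such an example (the variable name x and the concrete values are arbitrary):

(* Scope 0 *)
let x = 10

(* Scope 1 *)
let x = x * 2

let () = Printf.printf "%d\n" x   (* prints 20 *)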

Or a similar example in Rust.
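A matching sketch in Rust, again with arbitrary values:

fn main() {
    // Scope 0
    let x = 10;
    // Scope 1
    let x = x * 2;

    println!("{}", x);   // prints 20
}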

This is a very simple example that differs from const in C only in that we did not have to invent a new name for each new value. In both cases, the compiler understands that outside Scope 0 (after the second let) nobody uses the old value of x, so the memory allocated for it can safely be freed, or no memory needs to be allocated for it dynamically at all.

Much more interesting are the cases when names are reused while the old data remains alive in memory.

Closures

 

Scoping is called lexical if the value a variable refers to is determined once and for all by the place where the variable is defined and cannot change later. This kind of scoping makes closures possible: functions that capture variables from the context in which they were created.

You can demonstrate this by first defining a variable and using it in a function, and then redefining it and using it in another function. Here's an example in OCaml.
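A minimal version of that program might look like this, with the scopes marked in comments:

(* Scope 0 *)
let msg = "hello world"

(* Scope 1 *)
let greeting () = print_endline msg

(* Scope 2 *)
let msg = "bye world"

(* Scope 3 *)
let farewell () = print_endline msg

let () =
  greeting ();
  farewell ()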

The program will output hello world first, then bye world. The greeting and farewell functions refer to the same variable name msg. The key point is that the expression let msg = "bye world", which opens Scope 2, has no effect on the previous scopes.

For the greeting function, the msg variable is associated with the hello world value, and this value will remain in the program's memory, although it is no longer associated with any name.

The syntax let greeting () = ... is syntactic sugar for binding an anonymous function to a name; the "true form" of these expressions would be let greeting = fun () -> .... Since in OCaml and Haskell all functions are closures and capture values from their scope, there is no reason to use the longer form.

In Rust, the situation is different. Functions created with the fn keyword behave like C-style functions and do not capture their environment: referring to a variable created with let from inside such a function causes a compilation error (see rustc --explain E0434). To get a closure, you explicitly bind an anonymous function to a name.
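A small sketch of the difference (the commented-out fn would not compile):

fn main() {
    let msg = "hello world";

    // A nested `fn` item cannot capture `msg` from the enclosing
    // scope; uncommenting the next line triggers error E0434.
    // fn greeting() { println!("{}", msg); }

    // A closure bound to a name captures `msg` without trouble.
    let greeting = || println!("{}", msg);
    greeting();
}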

Here's an example of the syntax of anonymous functions in Rust.

|arguments| -> return_type { expression }
For functions that do not return a useful value, the unit type, which has a single value, is used. In OCaml this type is called unit, and its only value is written ().

In Haskell and Rust, the token () serves as the name of this type as well. In Rust you can also spell out a closure's return type explicitly, so an anonymous function that returns nothing useful can be written as || -> () { ... }.

Using this knowledge, we will write an analogue of our program in Rust.

fn main() {
    // Scope 0
    let msg = "hello world";
    let greeting = || -> () { println!("{}", msg) };

    // Scope 1
    let msg = "bye world";
    let farewell = || -> () { println!("{}", msg) };

    greeting();
    farewell();
}

If you run it, the result is the same as in the OCaml example.

Closures as a form of encapsulation

 

Combining closures with mutable variables makes it possible to implement encapsulation without resorting to OOP machinery and access control over an object's fields.

For the following example, we need to know that a mutable variable (a reference) in OCaml is created with let myref = ref value, its current value is read with the ! operator (as in !myref), and a new value is stored with the := operator.
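A tiny sketch of these operations (the name counter is arbitrary):

let () =
  let counter = ref 0 in          (* create a reference *)
  counter := !counter + 1;        (* store a new value *)
  Printf.printf "%d\n" !counter   (* read it: prints 1 *)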

We'll write a function that creates a counter: a function that increments its internal value on each call and returns it:

let make_counter initial_value =
  let counter = ref initial_value in
  fun () -> let () = counter := !counter + 1 in !counter


let my_counter = make_counter 0


let () =
  Printf.printf "%d\n" (my_counter ());
  Printf.printf "%d\n" (my_counter ());
  Printf.printf "%d\n" (my_counter ())

When the program runs, we will see 1, 2, 3 in the output. Every call to make_counter creates a fresh reference and a function that increments it and returns the new value. From outside that counter function, however, nothing can change the counter's value, or even read it without incrementing it.

The same effect can be achieved without mutable variables at all, if the counter returns not a single value but a pair of the new value and a new counter function, as sketched below.
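One possible sketch of this idea; the auxiliary variant type counter is introduced here only to give such a function a well-formed recursive type:

type counter = Counter of (unit -> int * counter)

let rec make_pure_counter value =
  Counter (fun () ->
    let next = value + 1 in
    (next, make_pure_counter next))

let () =
  let (Counter step) = make_pure_counter 0 in
  let (a, Counter step) = step () in
  let (b, Counter step) = step () in
  let (c, _) = step () in
  Printf.printf "%d %d %d\n" a b c   (* prints: 1 2 3 *)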

Data Structures

So far we have only worked with values of primitive types; now let's look at data structures. In OCaml, Haskell, and Rust they are built out of tuples and sum types (tagged unions), for example like this.
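A hypothetical OCaml sum type describing a value that is either missing, a number, or a pair of coordinates:

type value =
  | Nothing
  | Number of int
  | Point of float * float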

As an example, consider a singly linked list. Linked lists are so widely used for building immutable data structures that OCaml and Haskell provide built-in syntax for them. The general idea: a list is either empty or consists of a head and a tail, that is, a pair of a value and another (possibly empty) list. In a notional syntax, the definition would look like this.
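Roughly, a hand-written equivalent of the built-in type (the names my_list, Nil, and Cons are made up for illustration):

type 'a my_list =
  | Nil                        (* the empty list *)
  | Cons of 'a * 'a my_list    (* a head and a tail *)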

In OCaml, you can write let xs = [1; 2; 3], which is syntactic sugar for 1 :: (2 :: (3 :: [])).

Unlike lists in Python or Ruby, this is a singly linked list that does not allow arbitrary access to its elements, let alone modifying them. Memory safety is the first obvious benefit.

A less obvious advantage is that creating a new list from an existing list by adding a new item to its beginning is safe for old data. Let's look at an example.
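For instance, something along these lines:

let xs = [2; 3]       (* the original list *)
let ys = 1 :: xs      (* a new list that shares xs as its tail *)

let () =
  List.iter (Printf.printf "%d ") ys;   (* prints: 1 2 3 *)
  print_newline ()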

Since there is no way to change anything inside the original list xs = [2; 3], in the program's memory the new list ys consists of the value 1 and a pointer to the old list xs. The compiler knows this is safe, so it does not copy the old list: no memory beyond the single new cell is allocated.

If the list items themselves are large, the memory savings from this approach can be quite significant. At the same time, keeping previous versions of the list around causes no difficulties: it is enough to store them in separate variables, and in the program's memory they will again be just pointers into parts of the new, larger list.

The first obvious drawback is the higher algorithmic complexity of typical operations. For example, you can get the element at index n only by walking past the first n elements:

let rec nth n xs =
  match xs with
  | [] -> None
  | y :: ys ->
    if n = 0 then Some y
    else nth (n - 1) ys

The second drawback is that some algorithms implicitly copy the entire list of values to a new place in memory. For example, a list can only be reversed by building a new one from the old:
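A sketch of such a reversal with an accumulator (the helper name rev is mine; the standard library already provides List.rev):

let rec rev acc xs =
  match xs with
  | [] -> acc
  | y :: ys -> rev (y :: acc) ys

let () =
  List.iter (Printf.printf "%d ") (rev [] [1; 2; 3]);   (* prints: 3 2 1 *)
  print_newline ()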

How significant these drawbacks are depends on the task and on what we are comparing against. Since OCaml, Haskell, and Rust compile to native code, and the memory layout of these types is much more compact than that of objects in Python or Ruby, performance may well be higher despite the formally greater complexity of the algorithm. Besides, if the old list is still needed for other purposes, the memory would have to be copied anyway; here, at least, that does not require our explicit involvement.

However, with some creativity, you can improve the performance of immutable data structures and make certain operations O(1). For example, the lack of fast random access to elements can be compensated for with so-called zippers. A zipper is, roughly, a pair of the element we are currently focused on and its surrounding context in the data structure. Here's an example for a list.
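A possible sketch of a list zipper (the record fields and the function move_right are illustrative names):

(* Elements before the focus are stored in reverse order, so both
   neighbours of the focus sit at the heads of their lists. *)
type 'a zipper = {
  before : 'a list;
  focus  : 'a;
  after  : 'a list;
}

(* Shifting the focus one step to the right is O(1). *)
let move_right z =
  match z.after with
  | [] -> None
  | x :: rest ->
    Some { before = z.focus :: z.before; focus = x; after = rest }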

With the help of zippers, you can implement a program like a text editor or a window manager in a purely functional style without sacrificing much performance. When the user moves to another line, the zipper's "window" shifts, so access to the current line takes the same time no matter where it is.

Conclusion

We have covered only the very basics, but I hope this knowledge will help you in learning the now-popular Rust or the classic OCaml and Haskell.