Building complex apps on Tezos / LIGO using Lazily loaded code

Dani J.
16 min read · Mar 6, 2021

NOTE: This article is a product of its time, both Tezos and the LIGO language have changed significantly since. Today one might want to consider alternate approaches to building complex smart contracts.

For those new to the party, Tezos is a liquid Proof-of-Stake blockchain with a smart contract VM developed with security and formal verifiability in mind [unfortunately plagued by chronic mismanagement]. While Ethereum famously took the direction of choosing implementation simplicity above everything else, the guiding direction of Tezos has been, and is, one of security. If we gave chains descriptive nicknames like Pokemon, Tezos would be “the formal verification blockchain”.

My purpose with this article, and hopefully those following it, is to document some of the challenges we face with our team at [redacted] building real-world distributed applications on the Tezos platform, and the solutions we have discovered or brainstormed.

I hope this can help newcomers to the ecosystem bootstrap their projects with less trial and error.

(Note: Our shop made the decision to use the LIGO smart contract language, using its OCaml-inspired CameLIGO syntax. The decision was mainly based on personal tastes — we’re mostly FP people — but LIGO has proven itself to be a comfortable and extremely readable language to work in. All code snippets and examples are in CameLIGO.)

1. Introduction

1.1 Contract size — hard cap and cost

Tezos has a hard cap on transaction size, and thus smart contract code size — currently set at 16 kBytes.

While in general it’s good advice to keep smart contract size as small as possible, code that’s just plain more than 16k does happen in the real world. And even under that limit, the larger a contract or its state vector is, the more gas a caller must pay just for deserializing data from the chain and initializing the Michelson VM. Nothing has actually been run yet; this is just setting up the environment, and it is already costing gas.

The only exception to this is the lazily loaded big_map type. Given that Michelson (the Tezos VM’s low-level language) supports functional features like lambdas, a potential solution to this code size problem is to store code as data in a lazily loaded big_map. That is what this article is about.
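
To make the idea concrete, here is a minimal, hypothetical sketch (the names fee_rules and apply_rule, and the nat -> nat value type, are purely illustrative) of a big_map whose values are lambdas, so that a piece of code is only deserialized from the chain when it is actually looked up:

(* Code stored as data: a big_map whose values are lambdas.
   Entries are only deserialized when accessed. *)
type fee_rules = (string, (nat -> nat)) big_map

let apply_rule (name, amount, rules : string * nat * fee_rules) : nat =
  match Big_map.find_opt name rules with
  | Some rule -> rule amount
  | None -> ( failwith "No such rule" : nat )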

1.2 Our example repo

Halfway through the investigation into how to achieve this goal in a safe manner, I realized I was developing a framework, even if it’s a relatively unwieldy and unproven framework at the moment. It’s a repeatable, uniform pattern of doing things, invariant over the domain of whatever business logic one might wish to implement.

You are welcome to peruse our repository of examples. Just don’t forget your Pokemon flute. ;)

https://github.com/tzConnectBerlin/Snorlax

The repository contains two examples. A simpler, more elegant solution nicknamed “Munchlax”, and a more flexible one nicknamed “Snorlax”. The functional difference between the two is that Snorlax allows lambdas to lazily load other lambdas from storage.

The actual business logic implemented in the examples is intentionally kept extremely simple, with a single nat serving as business storage, to keep things readable and to the point.

2. Some personal notes from our journey

2.1 Why store code as data?!

When we asked the maintainers of the LIGO compiler about how to do lazily loaded lambdas stored in big maps, their answer was something along the lines of “the way to do that is not to do that”. Lazily loaded code in smart contracts tends to be viewed with scorn, for good reasons. It has security and auditability implications, it comes with overhead in gas cost, code size and code complexity, and it just doesn’t intuitively sound like a good idea at all. So why do it, then?

I don’t want to endorse this approach as the be-all and end-all of complex decentralized application development on Tezos, but it is a powerful tool with benefits, and its downsides can be mitigated.

So… why monolithic contracts?

  • Having a monolithic contract means having a single shared storage for all of one’s logic. For certain applications, this might be an extremely desirable property.
  • Having a monolithic contract means not having to worry about the intricacies of contract interoperation, which comes with its own security, debugging and auditing challenges.

And, having monolithic contracts, why lazily loaded code?

  • In codebases under 16 kBytes, lazily loading code heavily reduces the static cost of deserialization.
  • Codebases over 16 kBytes can only be monolithic if using some form of lazy loading.

While Tezos has global functions (i.e. libraries) on its roadmap, which would provide another approach to this problem, at the moment using lambdas to lazily load code is the only existing solution within these self-imposed constraints.

2.2 How we ran out of space while trying to save space

Our first, naive experiment implementing something like this ended in severe disappointment. Even though we stripped out all the actual logic from our contract, the bare skeleton of it ended up ballooning in size, dwarfing the original size we had wished to reduce. The reason for this is that Michelson does not have type aliases. All complex types need to be declared from the ground up, in terms of built-in types.

In Tezos, a smart contract entrypoint takes as its arguments the call argument and a portion of the contract state vector, and returns a list of operations (such as transferring XTZ or calling other contracts) and the new contract state vector. Given that large contracts tend to have large state vector types, this can really add up quickly… especially since storing an endpoint in a big_map means another layer of repeated type definitions. So clearly having a lambda type for each entrypoint, and having a big_map for each lambda type, wasn’t going to cut it.
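
For readers new to LIGO, a conventional (non-lazy) entrypoint in CameLIGO has roughly the following shape. The increment example and its types are hypothetical, and in a real contract the storage record would typically be far larger:

type storage = { counter : nat; owner : address }
type increment_params = nat

(* Call argument and storage in; operation list and new storage out *)
let increment (params, storage : increment_params * storage)
    : operation list * storage =
  ( ([] : operation list), { storage with counter = storage.counter + params } )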

2.3 Lazy typing in Michelson

It turned out however, that having separate types for each entrypoint wasn’t necessary at all.

While Michelson is strongly typed on principle, and this strong typing is one of its main advantages, it does leave us a small escape hatch: the PACK and UNPACK opcodes. These make it possible to take a strongly typed value on the stack and pack it into a bytes entity. Then, before consuming the value, one can unpack the bytes into a specified expected type. The result is an option: Some of the expected type if the unpacking and type check succeeded, or None if we were expecting something that wasn’t there.
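
In CameLIGO these opcodes surface as Bytes.pack and Bytes.unpack. A minimal round-trip sketch (the function name and error message are just for illustration):

let roundtrip (x : nat) : nat =
  (* PACK: any packable typed value becomes an opaque bytes value *)
  let packed : bytes = Bytes.pack x in
  (* UNPACK: naming the expected type yields an option *)
  match ( Bytes.unpack packed : nat option ) with
  | Some n -> n
  | None -> ( failwith "Lazy type error" : nat )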

In essence, this turns Michelson into a lazily typed language. The most immediately noticeable tradeoff is the loss of typechecking at compile time. The check will happen, but only at runtime, and while the language does force the programmer to think proactively about the check failing, we ultimately end up with the possibility of a runtime type error instead of a compile-time one. In our case, we are quite confident that the tradeoff is a good one.

2.4 There can be only one (endpoint type)

We were pointed to a LIGO codebase that used lazy loaded functions, making use of PACK and UNPACK for avoiding the very issues we have encountered. It gave us the necessary push in the right direction, even if it also gave us some food for thought about an antipattern I’ll mention later on.

The fundamental trick they did was to replace many lambda types with a single one. The only type difference between business endpoint calls is their call argument. Ultimately, they all operate on the same storage, and return the same operation list and storage pair. Which is good, because a big_map cannot be PACK’ed, and the business storage of any self-respecting contract will contain a big_map or two. On the other hand, call arguments will always be packable, since big maps cannot be sent as literals in a call.

This means we can turn…

type foo_lambda = (foo_params * storage) -> (operation list * storage)
type bar_lambda = (bar_params * storage) -> (operation list * storage)

into…

type endpoint_lambda = (bytes * storage) -> (operation list * storage)

This cuts down massively on code size in the main portion of the contract, with a small overhead in the lambda functions themselves. It’s not simply a good tradeoff, but an essential one, if we are to have any hope of this being feasible. Here’s what lazy unpacking in an implementation of the endpoint foo would look like:

let foo : endpoint_lambda =
  fun (params, storage : bytes * storage) ->
    let params = ( Bytes.unpack params : foo_params option ) in
    let params = match params with
      | None -> ( failwith "Lazy type error" : foo_params )
      | Some p -> p in
    do_foo_stuff ( params, storage )

The reason this function is defined in this roundabout manner is to enforce a type check on it at compile time — that it is indeed of type endpoint_lambda.

2.5 Should lazy code be able to call other lazy code?

This is where we came upon a crossroads of sorts. Should lazy code be given access to the big_map of lambdas, and allowed to call other lazy lambdas? This could, in the right hands, massively cut down on code size on-chain. It’s like dynamic linking versus static linking from the C/C++ world — do we optimize for size or simplicity?

Allowing this to happen does mean that the lambdas themselves need to be packed: a lambda that receives the map of lambdas as an argument would need a type that refers to itself, and Michelson, having no type aliases, also cannot express recursive types. So instead of a (string, endpoint_lambda) big_map, we need a (string, bytes) big_map. This means a further layer of lazy typing, code being stored on-chain in an opaque bytestream format, and auditing the contract becoming a much more involved process. It’s just not elegant.
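
As a rough illustration, fetching a lambda in that scheme means unpacking the stored code itself before it can be applied; packed_lambdas and fetch_lambda below are hypothetical names, and endpoint_lambda is the single lambda type from section 2.4:

type packed_lambdas = (string, bytes) big_map

let fetch_lambda (name, lambdas : string * packed_lambdas) : endpoint_lambda =
  let packed = match Big_map.find_opt name lambdas with
    | Some b -> b
    | None -> ( failwith "Unknown lambda" : bytes ) in
  (* The extra layer of lazy typing: the stored code itself is unpacked *)
  match ( Bytes.unpack packed : endpoint_lambda option ) with
  | Some f -> f
  | None -> ( failwith "Lazy type error" : endpoint_lambda )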

Still, we do provide an example of this in action, because the size benefits can potentially be massive. Whether you should consider this solution for your project is entirely up to your specific needs, and what you’re optimizing for. We’re going with the simpler solution.

2.6 Lazy loading and auditability

It’s a recurring joke in blockchain circles that a smart contract is neither smart, nor a contract. While they do tend to be quite dumb, I disagree with the second half of the statement. A smart contract, while not a legal contract, is in fact a contract between parties — between the decentralized entity represented by the code, and any users interacting with it. Auditability and verifiability are king — in Tezos land, even more so than normally. You know, it is “the formal verification blockchain” after all.

We have to consider how storing large portions of contract code in a non-iterable dictionary affects contract readability, auditability, and ultimately, security. One can no longer just get the Michelson code of the contract, analyze it, and know exactly what to expect from it. And this is where we first diverged massively from the LIGO example we were shown in the wild.

The example — a live example as far as we understand — defined the main entrypoint function somewhat like this:

// WARNING: INSECURE CODE!!!!
// ONLY INCLUDED AS AN EXAMPLE OF WHAT not TO DO!
type runEntrypointLambdaParameter = {
  lambdaName: string;
  lambdaParameter: bytes;
}
type action =
  | RunEntrypointLambda of runEntrypointLambdaParameter
let main (action, storage : action * storage)
    : operation list * storage =
  match action with
  | RunEntrypointLambda params ->
    runEntrypointLambda (params, storage)

While this is enticingly simple, let’s consider the implications. We have an endpoint where an arbitrary lambda in storage may be called by name from a big_map, a lazily loaded dictionary with no way of enumerating its content.

Without datamining the internal data representation of a Tezos node, or enumerating all the transactions ever executed on the contract right from its origination and checking what has been put in that big_map over the entire course of its history, there is simply no way of knowing exactly what’s in there. That’s far too much due diligence to expect from anyone, and for all a user knows, the function stealAllFundsFromContract could be sitting there silently, waiting to be called by someone in the know.

Besides this clear violation of security-by-design principles, this overly flexible implementation also carries the quality-of-life cost of not knowing how to interact with the contract just from reading it… Not to mention having to PACK the argument on the client side.

We are of the belief that the facade of a smart contract should be fully specified, with no lazy typing whatsoever. All callable endpoints and all call parameter types should be defined, in full view. That way, even if there were a malicious lambda sitting in the big_map, a user could be assured relatively easily that no execution path exists where it could ever be called.

This is the pattern for our main entrypoint that we have decided on:

type main_action =
  | Installer of installer_action
  | Foo of foo_params
  | Bar of bar_params
let main (action, storage : main_action * container_storage)
    : operation list * container_storage =
  match action with
  | Installer params -> installer (params, storage)
  | Foo params ->
    endpoint_dispatch ( "foo", Bytes.pack params, storage )
  | Bar params ->
    endpoint_dispatch ( "bar", Bytes.pack params, storage )

This way, the lambdas that can be called from the main dispatcher are fixed and static, and we retain the benefit of strong typing and well-defined entrypoints.

2.7 The installer

There are two reasons why you might want lazily loaded code. One reason is to make a large (but still deployable) smart contract cheaper to call. In this case one can just deploy the lazily loaded code as a big_map literal in the origination transaction. The other reason, oversized contracts, requires the ability to build the contract over multiple transactions, requiring an installer endpoint:

type lambda_repository =
  [@layout:comb]
  {
    creator : address;
    lambda_map : (string, endpoint_lambda) big_map;
  }
let install_lambda ( params, storage
    : install_params * lambda_repository ) : lambda_repository =
  let u = assert_installer_access_control storage in
  let updated_map = Big_map.update params.name
    ( Some ( params.code ) ) storage.lambda_map in
  { storage with lambda_map = updated_map }

But, having an installer is also a potential attack vector. A malicious contract owner, or a third party attacker might “upgrade” (replace) a lazily loaded function with a compromised version. An initial audit might show that there is no way to steal funds from the contract, but if the code is editable after the fact, that audit cannot be conclusive. In a smart contract, an upgrade path is essentially a backdoor.

Locking down this route of attack requires having a so-called burnable fuse — the ability to forever seal write access to the lazily loaded code once installation is finished. A contract with an unburned fuse must be viewed as insecure, and the first step of an audit has to be checking the state of the fuse.

The implementation itself can be relatively simple. Since we already store the contract creator’s address to control access to the installer, simply making it an option allows it to be cleared, forever denying access to everyone:

type lambda_repository =
  [@layout:comb]
  {
    creator : address option;
    lambda_map : (string, endpoint_lambda) big_map;
  }
let assert_installer_access_control
    ( storage : lambda_repository ) : unit =
  let creator = match storage.creator with
    | None -> ( failwith "Access denied: contract sealed" : address )
    | Some addr -> addr in
  if ( creator <> Tezos.sender ) then
    ( failwith "Access denied: unauthorized caller" : unit )
  else
    unit
let seal_contract ( storage : lambda_repository )
    : lambda_repository =
  let u = assert_installer_access_control storage in
  { storage with creator = ( None : address option ) }

2.8 Separating business and housekeeping

Besides correct access control, it’s important to define the contract in a way that reduces the chance of security-breaking bugs and makes auditing easier. Grabbing the low-hanging fruit, we explicitly separated the housekeeping part of the contract state — the big_map of lazily loaded functions and the state of the installer — from the business storage, the data that the contract was created to handle in the first place.

By only allowing lazily loaded endpoints to mutate the business state, and not the big_map of lambdas, even a surface-level audit can ensure that the code of the contract is static and immutable:

let endpoint_dispatch ( lambda_id, params, storage
    : string * bytes * container_storage ) : operation list * container_storage =
  let business_storage = storage.business_storage in
  let ( operations, new_storage ) = run_endpoint_lambda (
    lambda_id, params, business_storage ) in
  ( operations, { storage with business_storage = new_storage } )
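
The run_endpoint_lambda helper is not shown above. Here is a minimal sketch of what it could look like in the Munchlax-style layout, under two assumptions of ours: the lambda repository is handed over as an explicit first argument (the dispatcher above would then need to pass it along), and business_storage names the type of the business half of the state:

let run_endpoint_lambda ( repository, lambda_id, packed_params, business_storage
    : lambda_repository * string * bytes * business_storage )
    : operation list * business_storage =
  let lambda = match Big_map.find_opt lambda_id repository.lambda_map with
    | Some f -> f
    | None -> ( failwith "Unknown endpoint lambda" : endpoint_lambda ) in
  (* The lambda unpacks its own bytes argument, as shown in section 2.4 *)
  lambda ( packed_params, business_storage )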

As an aside, this separation proved to be quite important in our experiments with contract interop. Ultimately, the utility of a framework isn’t only defined by what it allows one to do, but what it expressly disallows. Expressly disallowing lazily loaded code to mutate housekeeping state turned out to be incredibly beneficial for everyone involved. ;)

3. Our example frameworks

3.1 Boilerplate, macros and m4

When you check out our code on Github, please don’t be too taken aback by the fact that we ditched the LIGO preprocessor in favor of m4, an — at the time of writing — almost 45-year-old macro language. Don’t ask why we specifically chose m4 — part of it clearly is that we’re hipsters.

M4 is immediately available on any Unix-flavored OS, and does everything we might want to ask of it, almost to the point of metaprogramming. The GNU C preprocessor was given a short consideration, and then promptly dropped as a candidate — while less intrusive than m4, it was far less powerful, and just too cumbersome to work with in conjunction with LIGO.

The reason we needed an alternative to the LIGO preprocessor was the lack of parametric macros. When looking at one lazy type, it doesn’t seem like much… But when working with a bunch of lazy types it really adds up, and boilerplate is never good to have in your codebase. As Jane Street OCaml guru Yaron Minsky said in a talk, writing the same code structure twice is one time too many.

Using m4 opens up a bunch of avenues to massively reducing boilerplate, and is fully transparent — the preprocessing and compilation steps are entirely distinct, with the preprocessed code output to a file, inspectable and even more importantly, (formally) verifiable. The only downside is that some LIGO syntax highlighters seemingly take offense at this intrusion, and stop working — that said, CameLIGO syntax highlighting wasn’t quite up to par anyway, so this was a downside we are willing to live with.

Note that we use prefixed m4, and redefined the m4 quotes to guillemets («, ») to avoid clashes with LIGO code. All the m4_dnl statements you see are just for reducing excessive whitespace in the output. To see how m4 needs to be invoked to actually work with our source, see build/preprocess.sh

3.3 Source tree structure

The repository contains two source trees, one (‘Snorlax’) allows lazy code to call other lazy code, while the other (‘Munchlax’) is a cut down version that does not allow it. Both are split into three main parts: container, lazy and common.

  • Container has the code that will be deployed as the smart contract in the origination transaction. The contract’s argument type and main function are defined in main.mligo
  • Lazy has the lazily loaded functions. To make the build process simpler, each lambda is defined as the function f in its own separate file under lazy/lazy_lambdas
    In the Snorlax example, since it allows multiple types for lambdas, each filename is prefixed with the lambda type.

  • Common has the types and helper functions used by both the container and lazy lambdas.

3.5 How to use our stuff to build your stuff

To write your own lazily loaded LIGO contract based on one of the provided foundations, you need to do the following:

  • Define your business storage and call parameters in the file common/business_interface.mligo.m4
  • Define your initial business state (at contract origination) in the file container/initial_business_storage.mligo.m4
  • Define your entrypoints in container/main.mligo.m4 and write dispatch calls for them in the main function.
  • Code your business logic as lazy lambdas under the folder lazy

When you’re adding new files, just copy this snippet to surround your code, and set the file name appropriately in the ifdef… It looks ugly, but it’s m4 and we did our best to tame it in the hour or so we wanted to spend on something so tangential to our actual goal.

m4_changequote m4_dnl
m4_changequote(«,») m4_dnl
m4_ifdef(«YOUR_FILENAME»,,«m4_define(«YOUR_FILENAME»,1) m4_dnl
m4_include(m4_helpers.m4) m4_dnl
// YOUR CODE HERE») m4_dnl

To include files, don’t use m4_include… We had to write our own, smarter file includer that actually understands the concept of a working directory, and behaves the way you’d expect an include statement to behave. This is m4_loadfile — the syntax is m4_loadfile({relative path},{filename}).

The two macros you get for this trouble are LAZY_TYPE and, in Snorlax, DECLARE_LAMBDA. Here’s how you can use them.

  • LAZY_TYPE({type name}) defines an unpack_{type name} function that takes a bytes and returns the type, or calls FAILWITH on a type mismatch. Besides using it to define an unpacker for the endpoint lambda type in Snorlax*, this should be used in the lazy tree to unpack the packed arguments in the endpoint lambdas. (A sketch of roughly what this expands to follows after this list.)
    * In Munchlax the lambdas aren’t packed themselves.
  • DECLARE_LAMBDA({lambda name},{argument type},{result type}) declares a lambda of the specified type, with the name {lambda name}_lambda and a helper function named run_{lambda name}_lambda, that can invoke a lambda of this type by name from the big_map.
    It also defines the argument and result types as {lambda name}_params and {lambda name}_return, and via the LAZY_TYPE macro, declares a function unpack_{lambda name}_lambda.
    Naturally, since there should be only one lambda type in the container contract, this is for defining lambdas called by other lambdas, under the lazy tree.
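
For orientation, the CameLIGO that a call like LAZY_TYPE(«foo_params») generates is roughly of the following shape. This is a hand-written approximation rather than the literal macro output, and the error message string is illustrative:

let unpack_foo_params ( packed : bytes ) : foo_params =
  match ( Bytes.unpack packed : foo_params option ) with
  | Some p -> p
  | None -> ( failwith "Lazy type error: foo_params" : foo_params )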

3.6 Real world optimizations

Our goal with Snorlax / Munchlax was mainly to hash out a proof of concept implementation, and there are a few things one might wish to consider when building real-world contracts, mainly around the installer logic.

If the total code size is under 16 kBytes, the entire installer should be stripped out, and the lazy lambdas written to storage as a big_map literal in the origination transaction. This cuts down the size of the main contract, while also limiting potential attack surfaces. There is absolutely no reason not to do this if you can.
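
A hypothetical sketch of what such an origination storage could look like, reusing the types from the earlier snippets (the container_storage field names lambda_repository and business_storage are our own illustration, and foo and bar are the endpoint lambdas from section 2.4):

let initial_storage : container_storage = {
  lambda_repository = {
    (* With no installer, the creator field could even be dropped;
       here it is simply left burned from the start *)
    creator = ( None : address option );
    lambda_map = Big_map.literal [ ( "foo", foo ); ( "bar", bar ) ]
  };
  business_storage = 0n
}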

In larger contracts, batch deployment can be cheaper and more efficient than doing a separate contract call for every single endpoint. Our goal was to demonstrate a principle in a simple and understandable manner. In a real world scenario, thought should be given to making the install endpoint capable of handling batches of lambdas, and to bring the contract up in as few transactions as possible.
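
As a rough illustration of what a batched variant could look like (install_lambdas and install_batch_params are hypothetical names, reusing the lambda_repository and access-control helper from section 2.7):

type install_batch_params = ( string * endpoint_lambda ) list

let install_lambdas ( params, storage
    : install_batch_params * lambda_repository ) : lambda_repository =
  let u = assert_installer_access_control storage in
  (* Fold the batch of (name, code) pairs into the lambda map *)
  let add = fun ( acc, entry
      : (string, endpoint_lambda) big_map * ( string * endpoint_lambda ) ) ->
    let ( name, code ) = entry in
    Big_map.update name ( Some code ) acc in
  { storage with lambda_map = List.fold add params storage.lambda_map }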

Final thoughts

There are good reasons why one shouldn’t mix smart contract code and data, but sometimes real world requirements — such as sheer contract size, or transaction cost — do override this type of common sense caution.

I believe that the principles presented above significantly reduce the risk associated with lazily loading code from storage. Auditing is still more than possible, since reading the code and storage of a contract via tezos-client is extremely simple.

Our hope is that this blogpost and our examples will be of use to those working in the Tezos ecosystem, and will contribute to safer, better designed smart contracts.

Feedback and thoughts are more than welcome. Especially if you managed to find a major potential vulnerability or other risk in the approach presented, please file a bug report in the Github repository.

Tips accepted with gratitude at:
XTZ: tz1VdmiG2hfF4XmZz3Jfm8tUqfSMri1Xtzkc
ETH: 0x7cd9379B19E19c6dA303dEc60A14091cC472F59f
