Data_encoding.Compact
This module provides specialized encoding combinators that are implemented to reduce the size of the serialization result.
The main trick this module relies on is the notion of “shared tags”. In Data_encoding
, the union
combinator uses (at least) one byte every time it is used, to “tag” the output and distinguish between various disjunction cases. As a consequence, if n
union
are composed together to define one encoding, (at least) n
bytes are being allocated. However, in practice, only few bits are used in each tags, which means the rest is “wasted.”
As an example, consider this type:
type t =
| T1 of { f1 : int option; f2 : (int, bool) Either.t }
| T2 of { f3: int }
A value of t
using the constructor T1
will be serialized into a binary array of this form:
┌────────┬─────────┬─────────────┬─────────┬─────────────┐ │ tag(t) │ tag(f1) │ payload(f1) │ tag(f2) │ payload(f2) │ └────────┴─────────┴─────────────┴─────────┴─────────────┘ 1 byte 1 byte N bytes 1 byte M bytes
Where tag(f)
is a value used by Data_encoding
to distinguish between several encoding alternatives for f
, and payload(f)
is the resulting binary array.
For both option
and Either.t
, the tag of the encoding only uses one bit in practice. Which means that for T1
, encoding the pair (f1, f2)
needs two bits, but the default approach of Data_encoding
uses two bytes instead. Similarly, to distinguish between T1
and T2
needs one bit, but the default approach is to use an additional tag (one additional byte).
This module provides an approach to tackle this issue, by allocating only one tag (i.e., one byte) that is used to store the useful bits to distinguish between the disjunction cases. We call this tag the “shared tag” of the encoding. The bits of the shared tag describes precisely the layout of the encoded data.
For instance, considering a compact encoding for t
, the third bit of the tag can be used to distinguish between T1
and T2
. In case the third bit is 0, the first bit of the tag determines the case of option
, and the second the case of Either.t
.
As a consequence the resulting binary array for the constructor T1
is, using
_
to represent meaningless bits,0
and 1
to represent actual bit values,e
to represent the bit used to distinguish the Either
case of f2
, ando
to represent the bit used to distinguish the Option
case of f1
:┌──────────┬─────────────┬─────────────┐ │ _____0eo │ payload(f1) │ payload(f2) │ └──────────┴─────────────┴─────────────┘ 1 byte N bytes M bytes
while the resulting binary array for the constructor T2
is
┌──────────┬─────────────┐ │ _____100 │ payload(f3) │ └──────────┴─────────────┘ 1 byte N bytes
The description of a compact encoding.
Turn a compact encoding into a regular Data_encoding.t
.
val tag_bit_count : 'a t -> int
tag_bit_count c
is the number of bits of tag that a compact encoding uses.
Similarly to Data_encoding
, we provide various combinators to compose compact encoding together.
A compact encoding used to denote an impossible case inside of conjunction operators such as union
.
Uses 0 bit of tag.
val refute : void -> 'a
refute x
can be used to refute a branch of a match
which exhibits a value of type void
.
val unit : unit t
A compact encoding of the singleton value unit
, which has zero memory footprint.
Uses zero (0) bits of tag.
In JSON it is represented as the empty object {}
.
val null : unit t
A compact encoding of the singleton value unit
, which has zero memory footprint.
Uses zero (0) bits of tag.
In JSON it is represented as null
.
val bool : bool t
Efficient encoding of boolean values. It uses one (1) bit in the shared tag, and zero bit in the payload.
payload encoding
unconditionally uses encoding
in the payload, and uses zero (0) bit in the shared tag.
conv ?json f g e
reuses the encoding e
for type b
to encode a type a
using the isomorphism (f, g)
. The optional argument allows to overwrite the encoding used for JSON, in place of the one computed by default.
tup1 e
wraps the underlying encoding of e
in a tup1
(from the parent module). This is only useful in that, provided you use make ~tag_size:`Uint0
to translate the returned compact encoding, it allows you to call merge_tups
on it.
Uses as many bits of tag as e
.
tup2 e1 e2
concatenates the shared tags and payloads of e1
and e2
.
Uses as many bits of tags as the sum of the tags used by its arguments.
tup3 e1 e2 e3
concatenates the shared tags and payloads of e1
, e2
, and e3
.
Uses as many bits of tags as the sum of the tags used by its arguments.
tup4 e1 e2 e3 e4
concatenates the shared tags and payloads of e1
, e2
, e3
and e4
.
Uses as many bits of tags as the sum of the tags used by its arguments.
req "f" compact
can be used in conjunction with objN
to create compact encoding with more readable JSON encoding, as an alternative of tupN
. The JSON output is a dictionary which contains the field f
with a value encoded using compact
.
Same as req
, but the field is optional.
An objN
compact encoding uses as many bits of tags as its number of opt
fields.
obj1
can be used in conjunction with req
or opt
to produce more readable JSON outputs.
Uses as many bits of tags as there are opt
fields in its arguments.
val int32 : int32 t
A compact encoding for int32
values. It uses 2 bits in the shared tag, to determine how many bytes are used in the payload:
00
: from 0 to 255, one byte.01
: from 256 to 65,535, two bytes.10
: from 65,536 to Int32.max_int
and for negative values, four bytes.Note that by itself, this compact encoding is not necessarily more economical in space. However, in combination with other compact encodings (say, when you have two bits of tag to spare anyway) or given a very skewed distribution of values (say, when the vast majority of your values are in the 0–255 interval), then it can help you save some space.
Uses two (2) bits of tag.
val int64 : int64 t
A compact encoding for int64
values. It uses 2 bits in the shared tag, to determine how many bytes are used in the payload:
00
: from 0 to 255, one byte.01
: from 256 to 65,535, two bytes.10
: from 65,536 to 4,294,967,295 four bytes.11
: from 4,294,967,295 and for negative values eight bytes.See int32
for usage recommendations.
Uses two (2) bits of tag.
list ~bits:n encoding
uses n
bits in the shared tag to encode the size of small lists.
For instance, list ~bits:2 encoding
,
00
: the payload is empty, because it is the empty list01
: the singleton list, whose element is encoded using encoding
.10
: a list of two elements encoded with encoding
11
: a list of more than two elements, prefixed with its encoded size (i.e., the number of bytes it takes to represent the whole value) (which uses 4 bytes)With ~bits:3
, lists of 0 to 6 items are encoded with tags 000
to 110
, and lists of 7 or more are encoded with tag 111
and the length.
Uses n
bits of tags.
val case :
title:string ->
?description:string ->
'b t ->
('a -> 'b option) ->
('b -> 'a) ->
'a case
Usage: case name encode decode encoding
Intended to be used inside a union
.
union cases
creates a new compact encoding to encompass a disjunction of cases.
The value uses some tag bits to distinguish the different cases of the union (see discussion of parameter union_tag_bits
) and some tag bits (potentially 0) to distinguish the values within a case (see discussion of parameter cases_tag_bits
).
E.g., Given type t = A of bool | B of int option
and the encoding
let c = union [ case "A" (function A b -> Some b | _ -> None) (fun b -> A b) bool; case "B" (function B i -> Some i | _ -> None) (fun i -> B b) (option (payload int)); in make ~tag_size:`Uint8 c
then a value can have either of the following 4 tags:
A
, false
A
, true
B
, Some
(a payload of 4 bytes follows)B
, None
In other words, the second bit of this tag is used to discriminate the cases of the union, whilst the first bit is used to discriminate within each case.
Note that the compact union can be combined with more compact encoding before being passed to make
in which case the two bits of tags will be combined with the tags of the other compact encodings. E.g., make ~tag_size:`Uint8 (tup2 c c)
.
val void_case : title:string -> 'a case
void_case ~title
is an impossible case. It is provided so you can add unused tags within a union. E.g., union [case _; void_case ~title:"reserved-for-v04-compatibility"; case _; case _]
uses two bits of tag for the discrimination of the union, but the tag 01
is unused (reserved for some version compatibility).
module Custom : sig ... end
This module can be used to write compact encoding for complex types without relying on the existing combinators.