Module Append_only_file.Make

Abstraction for irmin-pack's append only files (i.e. suffix and dict).

It is parameterized with Io, a file system abstraction (e.g. unix, mirage, eio_linux).

It comprises a persistent file, an append buffer and take care of automatically shifting offsets to deal with legacy file headers.

Parameters

module Io : Io.S
module Errs : Io_errors.S with module Io = Io

Signature

module Io = Io
module Errs = Errs
type t
type auto_flush_procedure = [
  1. | `Internal
  2. | `External of t -> unit
]

auto_flush_procedure defines behavior when the flush threshold is reached.

  • Use `Internal to have the buffer automatically flushed.
  • Use `External f to have f called when the flush threshold is reached. It is the responsibility of f to call flush, in addition to any other processing it does.
val create_rw : path:string -> overwrite:bool -> auto_flush_threshold:int -> auto_flush_procedure:auto_flush_procedure -> (t, [> Io.create_error ]) Stdlib.result

Create a rw instance of t by creating the file at path.

val open_rw : path:string -> end_poff:Optint.Int63.t -> dead_header_size:int -> auto_flush_threshold:int -> auto_flush_procedure:auto_flush_procedure -> (t, [> Io.open_error | `Closed | `Invalid_argument | `Read_out_of_bounds | `Inconsistent_store ]) Stdlib.result

Create a rw instance of t by opening an existing file at path.

End Offset

The file has an end offset at which new data will be saved. While this information could be computed by looking at the size of the file, we prefer storing that information elsewhere (i.e. in the control file). This is why open_rw and open_ro take an end_poff parameter, and also why refresh_end_poff exists. The abstractions above Append_only_file are responsible for reading/writing the offsets from/to the control file.

dead_header_size

Designates a small area at the beginning of the file that should be ignored. The offsets start after that area.

The actual persisted size of a file is end_poff + dead_header_size.

This concept exists in order to keep supporting `V1 and `V2 pack stores with `V3.

Auto Flushes

One of the goals of the Append_only_file abstraction is to provide buffered appends. auto_flush_threshold is the soft cap after which the buffer should be flushed. When a call to append_exn fills the buffer, either the buffer will be flushed automatically, if auto_flush_procedure = `Internal, or the supplied external function f will be called, if auto_flush_procedure = `External f.

val open_ro : path:string -> end_poff:Optint.Int63.t -> dead_header_size:int -> (t, [> Io.open_error | `Closed | `Inconsistent_store | `Invalid_argument | `Read_out_of_bounds ]) Stdlib.result

Create a ro instance of t by opening an existing file at path

val close : t -> (unit, [> Io.close_error | `Pending_flush ]) Stdlib.result

Close the underlying file.

The internal buffer is expected to be in a flushed state when close is called. Otherwise, an error is returned.

val end_poff : t -> Optint.Int63.t

end_poff t is the number of bytes of the file. That function doesn't perform IO.

RW mode

It also counts the bytes not flushed yet.

RO mode

This information originates from the latest reload of the control file. Calling refresh_end_poff t updates end_poff.

val read_to_string : t -> off:Optint.Int63.t -> len:int -> (string, [> Io.read_error ]) Stdlib.result
val read_exn : t -> off:Optint.Int63.t -> len:int -> bytes -> unit

read_exn t ~off ~len b puts the len bytes of t at off to b.

read_to_string should always be favored over read_exn, except when performences matter.

It is not possible to read from an offset further than end_poff t.

Raises Io.Read_error and Errors.Pack_error `Read_out_of_bounds.

RW mode

Attempting to read from the append buffer results in an `Read_out_of_bounds error. This feature could easily be implemented in the future if ever needed. It was not needed with io_legacy.

val append_exn : t -> string -> unit

append_exn t ~off b writes b to the end of t. Might trigger an auto flush.

Grows end_poff, but the parent abstraction is expected to persist this somewhere (e.g. in the control file).

Post-condition: end_poff t - end_poff (old t) = String.length b.

Raises Io.Write_error

RW mode

Always raises Errors.RO_not_allowed

val flush : t -> (unit, [> Io.write_error ]) Stdlib.result

Flush the append buffer. Does not call fsync.

RO mode

Always returns Error `Ro_not_allowed.

val fsync : t -> (unit, [> Io.write_error ]) Stdlib.result

Tell the os to fush its internal buffers. Does not call flush.

RO mode

Always returns Error `Ro_not_allowed.

val refresh_end_poff : t -> Optint.Int63.t -> (unit, [> `Rw_not_allowed ]) Stdlib.result

Ingest the new end offset of the file. Typically happens in RO mode when the control file has been re-read.

RW mode

Always returns Error `Rw_not_allowed.

val readonly : t -> bool
val auto_flush_threshold : t -> int option
val empty_buffer : t -> bool
val path : t -> string