Module Tezt_mavryk_tezt_performance_regression.Long_test

Long test registration and helpers.

Registering Tests

type timeout =
  | Seconds of int
  | Minutes of int
  | Hours of int
  | Days of int

Timeout specifications.

Unlike regular tests, long tests must have a timeout. Because long tests can run for days, there is no global timeout, so without a per-test timeout a test that got stuck and never finished would block the whole process forever.

For instance, Hours 10 means that your test should fail if it has been running for more than 10 hours.

Timeouts are only a safety measure to prevent a test from blocking forever. They are not meant to be used as a way to check that a test runs in a given amount of time. They should thus be significantly over-estimated. A good rule of thumb is to:

  • multiply by 10 for tests that take minutes to run;
  • multiply by 5 for tests that take hours to run;
  • multiply by 2 for tests that take days to run.

For instance, a test that usually takes between 10 and 20 hours to run could have a timeout of about 3 days.
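The rule of thumb can be written down as a small helper. This is only a sketch: suggested_timeout and ceil_div are hypothetical names that are not part of Long_test, the timeout type is re-declared locally so that the snippet compiles on its own, and treating every test shorter than one hour as a "minutes" test is an assumption.

```ocaml
(* Local copy of the timeout type documented above. *)
type timeout = Seconds of int | Minutes of int | Hours of int | Days of int

(* Integer division rounded up, for non-negative [a] and positive [b]. *)
let ceil_div a b = (a + b - 1) / b

(* Hypothetical helper: derive a timeout from the usual running time of a
   test, given in seconds, following the rule of thumb:
   x10 for minutes, x5 for hours, x2 for days. *)
let suggested_timeout ~typical_seconds =
  let minute = 60 and hour = 3600 and day = 86_400 in
  if typical_seconds < hour then
    Minutes (ceil_div (10 * typical_seconds) minute)
  else if typical_seconds < day then
    Hours (ceil_div (5 * typical_seconds) hour)
  else Days (ceil_div (2 * typical_seconds) day)
```

With this sketch, a test that usually runs for 15 hours gets Hours 75, which is consistent with the "about 3 days" figure given for a 10 to 20 hour test.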

You should still try to avoid waiting for events that may not happen, and if you have to, try to also wait for another event that indicates that the first event will never happen. In other words, try to detect failures early.

type executor

Machines on which to run a given test.

Tests that are very long (days) should run on their own executor, otherwise shorter tests will not be run very often.

Values of type executor are just tags that are added to tests. Executors will execute tests which have the tag that corresponds to themselves (e.g. the x86 executor 1 will execute tests tagged with x86_executor1). If you specify an empty list of executors, the test will not run on any executor.

val x86_executor1 : executor

AMD64 executor number 1.

val x86_executor2 : executor

AMD64 executor number 2.

val block_replay_executor : executor

Executor for the block-replay semantic regression tests.

val register : __FILE__:string -> title:string -> tags:string list -> ?uses:Tezt_wrapper.Uses.t list -> ?uses_node:bool -> ?uses_client:bool -> ?uses_admin_client:bool -> ?team:string -> executors:executor list -> timeout:timeout -> (unit -> unit Lwt.t) -> unit

Wrapper over Test.register to register a performance regression test.

Differences with Test.register are:

  • the timeout parameter;
  • if the test fails, an alert is emitted;
  • tag "long" is added;
  • team is added to tags and is used to decide where to send alerts;
  • executors specifies which machines shall run the test;
  • data points created with add_data_point and measure are pushed at the end of the test (whether it succeeds or not).

Because an alert is emitted in case of failure, such tests are meant to be run not on the CI (which already sends e-mails in case a job fails) but on custom architectures. It can, however, also make sense to emit alerts from the CI for a scheduled pipeline if nobody receives (or reads) its alerts.

Alerts are only sent if the relevant configuration is set (see alert). team specifies which Slack webhook to use to send alerts. If team is not specified, or if team doesn't have a corresponding entry in the configuration file, the default Slack webhook is used.

  • raises Invalid_arg

    if title contains a newline character.
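The tagging rules listed above can be modelled in a few lines. The following is a sketch, not the actual implementation: the representation of executor as a wrapped string, the function effective_tags, and the order of the resulting tags are all assumptions made for illustration.

```ocaml
(* Assumption: an executor is modelled as a wrapper around its tag. *)
type executor = Executor of string

let x86_executor1 = Executor "x86_executor1"
let x86_executor2 = Executor "x86_executor2"

(* Hypothetical model of the tags a long test ends up with:
   "long" is always added, [team] is added when given, and each executor
   contributes its own tag. The ordering is arbitrary. *)
let effective_tags ?team ~executors tags =
  let executor_tags = List.map (fun (Executor tag) -> tag) executors in
  let team_tags = match team with None -> [] | Some t -> [ t ] in
  ("long" :: team_tags) @ executor_tags @ tags
```

In this model, passing an empty list of executors yields no executor tag at all, which matches the statement that such a test will not run on any executor.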

Alerts

type category = string

Category of an alert message.

val alert : ?category:category -> ('a, unit, string, unit) Stdlib.format4 -> 'a

Emit an alert.

The alert is sent to a Slack channel. It is also written in logs with Log.error, prefixed with "Alert: ".

Be careful with alert fatigue: only send alerts that really need to be acted on. Each test can only send a maximum of 2 alerts, and the total number of alerts for all tests cannot exceed 100. Alerts can be sent outside of tests, in which case only the global limit applies.

If an alert for category was already sent less than rate_limit_per_category seconds ago, the alert will not be sent. See the Configuration section for documentation about rate_limit_per_category.

Default category is "".
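The three limits described above (per test, global, and per category) can be sketched as a single decision function. This is a simplified model under stated assumptions: the record types, should_send and fresh_state are hypothetical names, time is passed explicitly as a float number of seconds, and the default category "" is treated like any other category, which the text above does not specify.

```ocaml
type limits = {
  max_total : int; (* global limit, 100 in the text above *)
  max_by_test : int; (* per-test limit, 2 in the text above *)
  rate_limit_per_category : float; (* in seconds *)
}

type state = {
  mutable total : int; (* alerts sent so far, all tests included *)
  mutable in_test : int; (* alerts sent by the current test *)
  last_sent : (string, float) Hashtbl.t; (* category -> last send time *)
}

let fresh_state () = { total = 0; in_test = 0; last_sent = Hashtbl.create 8 }

(* Return [true] and record the alert if it may be sent at time [now],
   [false] if one of the limits forbids it. *)
let should_send limits state ~category ~now =
  let recently_sent =
    match Hashtbl.find_opt state.last_sent category with
    | Some t -> now -. t < limits.rate_limit_per_category
    | None -> false
  in
  if
    state.total >= limits.max_total
    || state.in_test >= limits.max_by_test
    || recently_sent
  then false
  else begin
    state.total <- state.total + 1;
    state.in_test <- state.in_test + 1;
    Hashtbl.replace state.last_sent category now;
    true
  end
```

A model of alerts sent outside of tests would simply skip the in_test check, since only the global limit applies there.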

val alert_exn : exn -> ('a, unit, string, unit) Stdlib.format4 -> 'a

Same as alert, but also log an exception.

The alert itself does not contain the exception, as it could contain sensitive data.

Emitting And Retrieving Data Points

val add_data_point : InfluxDB.data_point -> unit

Add a data point to be sent at the end of the current test.

If data points cannot be pushed at the end of the test, an alert is emitted.

  • raises Invalid_arg

    if no test is currently running, or if it was not registered with Long_test.register or Long_test.register_with_protocol.

val query : InfluxDB.select -> (InfluxDB.result_data_point list list -> 'a) -> 'a option Lwt.t

Wrapper over InfluxDB.query.

This wrapper takes care of:

  • providing the configuration;
  • emitting alerts in case something goes wrong;
  • adding a test = "test title" clause to the innermost SELECT of the query.

This wrapper passes the query result to a function. The intended behavior of this function is to extract the results, e.g. check that they contain the expected number of series and the expected columns. If this function raises an exception, an alert is emitted and query returns None. query also returns None if InfluxDB is not configured or if it failed to perform the query.

If log is true, log the results using Log.debug. Default is false.

module Stats : sig ... end

Statistics to retrieve with get_previous_stats.

val get_previous_stats : ?limit:int -> ?minimum_count:int -> ?tags:(InfluxDB.tag * string) list -> InfluxDB.measurement -> InfluxDB.field -> 'a Stats.t -> (int * 'a) option Lwt.t

Retrieve statistics about measurements made in previous test runs.

Usage: get_previous_stats measurement field stats

Example: get_previous_stats "rpc" "duration" Stats.(_3 mean median stddev)

This retrieves statistics specified by stats for the field named field of measurement named measurement. Only the limit most recent data points are considered in the statistics.

tags is a list of (tag, value) pairs. If tags is specified, only data points that are tagged with tag equal to value for all tags are returned.

This returns None if:

  • InfluxDB is not configured;
  • the data points cannot be retrieved (in which case an alert is also emitted);
  • fewer than minimum_count data points exist (default value is 3).

Otherwise it returns Some (count, result), where count is the number of data points that were used (with minimum_count <= count <= limit) and result holds the statistics requested with stats.

  • raises Invalid_arg

    if no test is currently running, or if it was not registered with Long_test.register or Long_test.register_with_protocol.
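To make the roles of limit and minimum_count concrete, here is a sketch that computes the three statistics of the example above locally, on a plain list of floats. It does not query InfluxDB. previous_stats and its helpers are hypothetical, values are assumed to be ordered most recent first, and the standard deviation is the population one, which may differ from what the database computes.

```ocaml
let mean xs = List.fold_left ( +. ) 0. xs /. float_of_int (List.length xs)

let median xs =
  let sorted = List.sort compare xs in
  let n = List.length sorted in
  if n mod 2 = 1 then List.nth sorted (n / 2)
  else (List.nth sorted ((n / 2) - 1) +. List.nth sorted (n / 2)) /. 2.

(* Population standard deviation (an assumption, see above). *)
let stddev xs =
  let m = mean xs in
  sqrt (mean (List.map (fun x -> (x -. m) ** 2.) xs))

(* Keep at most the first [n] elements of a list. *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Hypothetical local counterpart of get_previous_stats: only the [limit]
   most recent values are used, and None is returned when fewer than
   [minimum_count] values are available. *)
let previous_stats ?(minimum_count = 3) ~limit values_most_recent_first =
  let used = take limit values_most_recent_first in
  let count = List.length used in
  if count < minimum_count then None
  else Some (count, (mean used, median used, stddev used))
```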

val get_pending_data_points : ?tags:(InfluxDB.tag * string) list -> InfluxDB.measurement -> InfluxDB.data_point list

Get the list of data points that were added for a given measurement but not sent yet.

Those data points are pending: they have not been sent to InfluxDB yet. Those data points will be sent at the end of the current test. The order of the resulting list is unspecified.

tags is a list of (tag, value) pairs. If tags is specified, only data points that are tagged with tag equal to value for all tags are returned.

  • raises Invalid_arg

    if no test is currently running, or if it was not registered with Long_test.register or Long_test.register_with_protocol.
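The tags filter, which keeps only data points that carry every requested (tag, value) pair, can be illustrated on a simplified stand-in for data points. The data_point record below is an assumption made for this sketch and is not the InfluxDB.data_point type; has_tags and filter_pending are hypothetical names.

```ocaml
(* Simplified stand-in for a data point: a measurement name and tags. *)
type data_point = { measurement : string; tags : (string * string) list }

(* A point matches if, for every required (tag, value) pair, it carries
   that tag with exactly that value. *)
let has_tags required point =
  List.for_all
    (fun (tag, value) -> List.assoc_opt tag point.tags = Some value)
    required

(* Keep the pending points of [measurement] that match all [tags].
   Without [tags], every point of the measurement is kept. *)
let filter_pending ?(tags = []) measurement pending =
  List.filter
    (fun point -> point.measurement = measurement && has_tags tags point)
    pending
```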

Regression Testing

type check =
  | Mean
  | Median

Which statistics to compare with previous data points.

val check_regression : ?previous_count:int -> ?minimum_previous_count:int -> ?margin:float -> ?check:check -> ?stddev:bool -> ?data_points:InfluxDB.data_point list -> ?tags:(InfluxDB.tag * string) list -> InfluxDB.measurement -> InfluxDB.field -> unit Lwt.t

Compare data points from the current test with previous tests.

Usage: check_regression measurement field

This compares two sets of data points:

  • the set of current data points;
  • the set of previous data points.

The set of current data points is data_points, which defaults to the current pending data points. Only data points for measurement are used, and only if they have a field field with a float value.

tags is a list of (tag, value) pairs. If tags is specified, only data points that are tagged with tag equal to value for all tags are used.

If the set of data points to use is empty, the function returns immediately without doing anything.

The set of previous data points is obtained by querying InfluxDB. If InfluxDB is not configured, or if fewer than minimum_previous_count previous data points exist, the function returns without doing anything else. If more than minimum_previous_count previous data points are available, the last previous_count points are used.

The mean or median (depending on check) of those two sets is compared. If the value for the current data points is more than the value for the previous data points multiplied by 1. +. margin, an alert is emitted.

If stddev is true, this also logs the standard deviation of the previous data points.

Default values:

  • previous_count = 10
  • minimum_previous_count = 3
  • margin = 0.2
  • check = Mean
  • stddev = false

  • raises Invalid_arg

    if no test is currently running, or if it was not registered with Long_test.register or Long_test.register_with_protocol.
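The comparison itself is simple enough to sketch on plain lists of floats. The function below is a hypothetical local model, not the library code: it ignores InfluxDB, tags and previous_count, and only reproduces the documented decision, namely that an alert is warranted when the current statistic exceeds the previous one multiplied by 1. +. margin.

```ocaml
type check = Mean | Median

let mean xs = List.fold_left ( +. ) 0. xs /. float_of_int (List.length xs)

let median xs =
  let sorted = List.sort compare xs in
  let n = List.length sorted in
  if n mod 2 = 1 then List.nth sorted (n / 2)
  else (List.nth sorted ((n / 2) - 1) +. List.nth sorted (n / 2)) /. 2.

(* Hypothetical model: [true] means that an alert would be emitted.
   Defaults mirror the documented ones. *)
let is_regression ?(margin = 0.2) ?(check = Mean)
    ?(minimum_previous_count = 3) ~current ~previous () =
  if current = [] || List.length previous < minimum_previous_count then
    (* Nothing to compare: do nothing. *)
    false
  else
    let stat = match check with Mean -> mean | Median -> median in
    stat current > stat previous *. (1. +. margin)
```

For example, with the default margin of 0.2 and previous durations all equal to 1 second, a current mean of 1.3 seconds is flagged while 1.1 seconds is not.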

val time : ?previous_count:int -> ?minimum_previous_count:int -> ?margin:float -> ?check:check -> ?stddev:bool -> ?repeat:int -> ?tags:(string * string) list -> InfluxDB.measurement -> (unit -> unit) -> unit Lwt.t

Do something and measure how long it takes.

Usage: time measurement f

This executes f, measures the time it takes to run, adds a data point for measurement with field "duration" equal to time, and uses check_regression to compare with previous values. If f raises an exception, data points are not pushed and the exception is propagated.

If repeat is specified, call f repeat times to obtain as many data points.

See check_regression for documentation about other optional parameters.

  • raises Invalid_arg

    if no test is currently running, or if it was not registered with Long_test.register or Long_test.register_with_protocol.

val measure_and_check_regression : ?previous_count:int -> ?minimum_previous_count:int -> ?margin:float -> ?check:check -> ?stddev:bool -> ?repeat:int -> ?tags:(string * string) list -> InfluxDB.measurement -> (unit -> float) -> unit Lwt.t

Same as time, but instead of measuring the duration taken by f () execution, delegates this responsibility to f itself.

In this case, f represents a thunk that executes an expression or a program and evaluates to the duration taken by its execution.
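The relationship between the two functions can be pictured as follows: a function suitable for time becomes one suitable for measure_and_check_regression once it is wrapped so that it returns its own duration. This is a sketch only. timed and collect are hypothetical helpers, Sys.time (processor time) is used merely to stay within the standard library and is not claimed to be the clock used by Long_test, and the default of one repetition is an assumption.

```ocaml
(* Turn [f] into a thunk that evaluates to the time taken by [f ()]. *)
let timed (f : unit -> unit) : unit -> float =
 fun () ->
  let start = Sys.time () in
  f ();
  Sys.time () -. start

(* Call [measure] [repeat] times to obtain as many values, one per call. *)
let collect ?(repeat = 1) (measure : unit -> float) : float list =
  List.init repeat (fun _ -> measure ())
```

A thunk passed directly to collect can report any float it likes, for instance a duration printed by an external program, which is the use case delegated to f in measure_and_check_regression.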

val time_lwt : ?previous_count:int -> ?minimum_previous_count:int -> ?margin:float -> ?check:check -> ?stddev:bool -> ?repeat:int -> ?tags:(string * string) list -> InfluxDB.measurement -> (unit -> unit Lwt.t) -> unit Lwt.t

Same as time, but for functions that return promises.

Note that other concurrent promises may slow down the measured function and result in inaccurate measurements.

val measure_and_check_regression_lwt : ?previous_count:int -> ?minimum_previous_count:int -> ?margin:float -> ?check:check -> ?stddev:bool -> ?repeat:int -> ?tags:(string * string) list -> InfluxDB.measurement -> (unit -> float Lwt.t) -> unit Lwt.t

Same as time_lwt, but instead of measuring the duration taken by f () execution, delegates this responsibility to f itself.

In this case, f represents a thunk that executes an expression or a program and evaluates to the duration taken by its execution.

Graphs

val update_grafana_dashboard : Grafana.dashboard -> unit

Wrapper over Grafana.update_dashboard.

This wrapper takes care of:

  • providing the configuration;
  • prepending the measurement prefix configured for InfluxDB;
  • running Lwt_main.run for you.

Does nothing if Grafana or InfluxDB are not configured.

  • raises Invalid_arg

    if the dashboard UID is invalid.

Configuration

This module should be initialized with init to read the configuration file. If not, default values are used.

If you don't need to send alerts, send data points, or retrieve data points, you do not need to write a configuration file. If you do intend to run the tests yourself, though, you should store data points in your own database. Indeed, time measurements only make sense when they are compared with other measurements that have been run on the same hardware.

This module sends data points to an InfluxDB database, queries data points from this database, and sends alerts to a Slack webhook. To configure this integration, write a configuration file in one of the following locations:

More precisely, if environment variable TEZT_CONFIG is set to a non-empty value, the configuration is read from the file located at this value. If this file is not a valid configuration file, the program exits. If this variable is empty or not set, other configuration files are tried: the first one which exists is read. If the first one that exists is not a valid configuration file, the program exits. If no configuration file exists, default configuration values are used.
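The lookup order described above can be sketched as a small function. This is an illustration under assumptions: config_path is a hypothetical name, the list of fallback locations is left as a parameter because it is not given here, and validation of the file contents (which makes the program exit on failure) is left out.

```ocaml
(* Return the configuration file to read, or None when default values
   should be used. [getenv] and [exists] can be overridden, which makes
   the function easy to try out without touching the environment. *)
let config_path ?(getenv = Sys.getenv_opt) ?(exists = Sys.file_exists)
    candidates =
  match getenv "TEZT_CONFIG" with
  | Some path when path <> "" ->
      (* A non-empty TEZT_CONFIG always wins, even if the file is bad. *)
      Some path
  | Some _ | None ->
      (* Otherwise the first candidate that exists is used. *)
      List.find_opt exists candidates
```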

The contents of this file should look like:

    {
        "alerts": {
            "slack_webhook_urls": {
                "default": "https://hooks.slack.com/services/XXX/XXX/XXX",
                "p2p": "https://hooks.slack.com/services/XXX/XXX/XXX",
                "shell": "https://hooks.slack.com/services/XXX/XXX/XXX"
            },
            "max_total": 100,
            "max_by_test": 2,
            "gitlab_project_url": "https://gitlab.com/org/repo",
            "timeout": 20,
            "rate_limit_per_category": 84600,
            "last_alerts_filename": "last_alerts.json",
            "max_alert_size": 1000,
            "max_alert_lines": 20
        },
        "influxdb": {
            "url": "https://localhost:8086",
            "database": "db",
            "credentials": {
                "username": "user",
                "password": "password"
            },
            "measurement_prefix": "tezt_",
            "tags": { "commit_hash": "12345678" },
            "timeout": 20
        },
        "grafana": {
            "url": "https://localhost/api",
            "api_token": "123456789",
            "data_source": "InfluxDB",
            "timeout": 20
        },
        "test_data_path" : "/path/to/the/test_data_path"
    }

where:

Note that for Grafana dashboards to be updated, InfluxDB also needs to be configured, because measurements in Grafana queries are prefixed with the measurement prefix configured for InfluxDB.

val init : unit -> unit

Read configuration file.

Make sure this function is called first, before using any other function from Long_test, to avoid undesired behaviour regarding the loading of the configuration.

val test_data_path : unit -> string

Get the test_data_path field from the configuration.

Internal Functions for Debugging

val unsafe_query : InfluxDB.select -> (InfluxDB.result_data_point list list -> 'a) -> 'a option Lwt.t

Same as query, but without modifying the query.

Warning: contrary to query this doesn't automatically add the test = "test title" clause, so you may get results from other tests.

val log_unsafe_query : InfluxDB.select -> unit Lwt.t

Perform a query with unsafe_query and log the query and its result.