Since the very early days of Test Kitchen there has been a desire to use it to
test multi-server operations. Several attempts have been made over the years:
kitchen-nodes, manual orchestration of the Vagrant plugin, and most recently
kitchen-terraform. All of these attempts have yielded some success,
but generally come with major caveats or restrictions. A unified path to true,
generic multi-server testing has been talked about for years, but only recently
have we all settled on a design that seems workable over the long term.
Test Kitchen is not normally covered by the Chef RFC process, but given both
the scope and importance of the feature, an exception is being made.
As an operations developer, I want to test multi-server interactions, so that I have confidence in cluster-oriented CM code.
And a more specific use case that is also rolled in here:
As a Chef user, I want to test cookbook upgrades and idempotence, so that I have confidence in my Chef code.
To start with, why is this a difficult problem? This is written based on Test Kitchen as it stands today and may become out of date in the future.
[ed: I do not like the term "supersuite", please come up with something better and tell me. maybe "multisuite"?]
The core of the new system is a configuration section called "supersuites". Each supersuite contains a set of steps to be run in a particular order. For example:
```yaml
platforms:
- name: centos-7
suites:
- name: web1
  run_list:
  - role[web]
- name: web2
  run_list:
  - role[web]
supersuites:
- name: web
  steps:
  - converge web1
  - converge web2
  - verify web1
```
When you run kitchen test web-centos-7 on this configuration, it will run the
steps in the order listed, aborting if any step fails.
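As a rough illustration of the ordering and abort semantics, the sketch below runs "<action> <suite_name>" step strings one at a time and stops at the first failure. The action callables are hypothetical stand-ins for Test Kitchen's converge/verify/destroy machinery. They are not its real API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: each action is a callable that takes a suite name and returns
# True on success. Real Test Kitchen actions are not exposed like this.
Action = Callable[[str], bool]


def run_steps(steps: List[str], actions: Dict[str, Action]) -> Tuple[List[str], bool]:
    """Run steps in the listed order, aborting at the first failing step.

    Returns the steps that were attempted and whether all of them passed.
    """
    attempted: List[str] = []
    for step in steps:
        action, suite = step.split(maxsplit=1)
        attempted.append(step)
        if not actions[action](suite):
            return attempted, False  # later steps are never run
    return attempted, True


# Usage: pretend that converging web2 fails.
def fake_action(name: str) -> Action:
    def run(suite: str) -> bool:
        return not (name == "converge" and suite == "web2")
    return run


actions = {name: fake_action(name) for name in ("converge", "verify", "destroy")}
attempted, ok = run_steps(["converge web1", "converge web2", "verify web1"], actions)
print(attempted, ok)  # ['converge web1', 'converge web2'] False
```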
This is combined with a new lifecycle phase in the Test Kitchen state machine
for instances, run before create: gossip. This passes the driver plugin data
about which instances will participate in the supersuite so it can set any
instance state flags it wants to use later in the create phase. There will also
be an optional late_gossip phase, added after create but before converge, that
is also passed all the instance data for any post-creation fixups. Together
these phases ensure that instances will be able to see each other on the network
and will have access to any required shared resources.
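The sketch below shows one way the three phases could be ordered across a superinstance: every member is gossiped before any is created, and late_gossip runs only once all members exist. FakeDriver and its method names are hypothetical. They mirror the phase names in this proposal, not an existing Test Kitchen driver API, and the IP addresses are invented for illustration.

```python
from typing import Dict, List


class FakeDriver:
    """Hypothetical driver with the proposed gossip/late_gossip hooks."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.state: Dict[str, dict] = {}
        self._next_ip = 0

    def gossip(self, instance: str, members: List[str]) -> None:
        # Before create: note which peers this instance must be able to reach,
        # e.g. so create can attach them all to one private network.
        self.state[instance] = {"peers": [m for m in members if m != instance]}
        self.calls.append(f"gossip {instance}")

    def create(self, instance: str) -> None:
        # Stand-in for booting a machine; hand out sequential addresses.
        self._next_ip += 1
        self.state[instance]["address"] = f"10.0.0.{self._next_ip}"
        self.calls.append(f"create {instance}")

    def late_gossip(self, instance: str, all_state: Dict[str, dict]) -> None:
        # After every instance exists: post-creation fixups, such as telling
        # each instance its peers' addresses.
        peers = self.state[instance]["peers"]
        self.state[instance]["hosts"] = {p: all_state[p]["address"] for p in peers}
        self.calls.append(f"late_gossip {instance}")


def create_superinstance(driver: FakeDriver, members: List[str]) -> None:
    """Run each phase across all members before starting the next phase."""
    for name in members:
        driver.gossip(name, members)
    for name in members:
        driver.create(name)
    for name in members:
        driver.late_gossip(name, driver.state)


driver = FakeDriver()
create_superinstance(driver, ["web1-centos-7", "web2-centos-7"])
print(driver.state["web1-centos-7"]["hosts"])  # {'web2-centos-7': '10.0.0.2'}
```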
A supersuite is a configuration section [ed: better word?] which has a name and a list of steps to execute. Each supersuite includes one or more suites.
A superinstance is the combination of a supersuite and a platform, and represents the collective of all the instances which correspond to the suites in the supersuite.
If this new supersuites configuration section is present, Test Kitchen switches into
multi-server mode where the instance matrix is based on "platforms x
supersuites" (rather than platforms x suites as normal). Multiple supersuites
can be present, though a suite can only be a member of a single supersuite to
allow the existing instance state machine to remain mostly unchanged. A single
suite can also be referenced multiple times to allow for upgrade/downgrade
testing and idempotence testing.
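A minimal sketch of the platforms x supersuites matrix and the one-supersuite-per-suite rule follows. The '<supersuite>-<platform>' naming is an assumption chosen to match web-centos-7 in the earlier example. Real Test Kitchen instance names also normalize some characters, and this sketch ignores that.

```python
from itertools import product
from typing import Dict, List


def step_suite(step) -> str:
    """Return the suite named by a step given as a string or a {'step': ...} mapping."""
    text = step if isinstance(step, str) else step["step"]
    return text.split(maxsplit=1)[1]


def superinstance_names(platforms: List[dict], supersuites: List[dict]) -> List[str]:
    """platforms x supersuites, named '<supersuite>-<platform>' (assumed naming)."""
    return [f"{ss['name']}-{p['name']}" for ss, p in product(supersuites, platforms)]


def suite_owners(supersuites: List[dict]) -> Dict[str, str]:
    """Map each suite to its supersuite; a suite may repeat within one supersuite only."""
    owners: Dict[str, str] = {}
    for ss in supersuites:
        for step in ss["steps"]:
            suite = step_suite(step)
            owner = owners.setdefault(suite, ss["name"])
            if owner != ss["name"]:
                raise ValueError(f"suite {suite!r} is in both {owner!r} and {ss['name']!r}")
    return owners


platforms = [{"name": "centos-7"}, {"name": "debian-9"}]
supersuites = [{
    "name": "web",
    # web1 appears three times: allowed, for upgrade and idempotence testing.
    "steps": ["converge web1", "converge web2",
              {"step": "converge web1", "attributes": {"version": 2}}, "verify web1"],
}]
print(superinstance_names(platforms, supersuites))  # ['web-centos-7', 'web-debian-9']
print(suite_owners(supersuites))  # {'web1': 'web', 'web2': 'web'}
```

Reusing a suite inside one supersuite is accepted, while sharing it with a second supersuite is rejected. This keeps each underlying instance owned by exactly one superinstance.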
The step syntax is built to be similar to the kitchen command line utility, to
make it easy for experienced Test Kitchen users to get started. It consists of a
string matching "<action> <suite_name>". Actions use the familiar Test Kitchen
names converge, verify, and destroy. There is no create action: all instances
that are a part of the supersuite are created together during initialization,
so creation does not need to be explicitly stated. The destroy step action is
present to allow for testing cluster failover, but any existing instances will
be destroyed by a kitchen destroy or kitchen test command, matching the existing
Test Kitchen semantics.
Suite names and supersuite names cannot overlap; doing so is a configuration error.
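The step grammar and the naming rule can be checked at configuration load time. This is a sketch under those assumptions: it accepts only the three step actions, so "create" is rejected. The unknown-suite check is an extra assumption and is not stated in this proposal.

```python
from typing import List, Tuple

ACTIONS = ("converge", "verify", "destroy")


def parse_step(step) -> Tuple[str, str, dict]:
    """Split a step into (action, suite, extra options).

    Accepts the plain string form or the mapping form with a 'step' key.
    """
    if isinstance(step, str):
        text, options = step, {}
    else:
        text = step["step"]
        options = {k: v for k, v in step.items() if k != "step"}
    parts = text.split()
    if len(parts) != 2 or parts[0] not in ACTIONS:
        raise ValueError(f"expected '<action> <suite_name>', got {text!r}")
    return parts[0], parts[1], options


def validate(suites: List[dict], supersuites: List[dict]) -> None:
    """Reject suite/supersuite name overlap and steps naming unknown suites."""
    suite_names = {s["name"] for s in suites}
    overlap = suite_names & {ss["name"] for ss in supersuites}
    if overlap:
        raise ValueError(f"names used by both suites and supersuites: {sorted(overlap)}")
    for ss in supersuites:
        for step in ss["steps"]:
            _, suite, _ = parse_step(step)
            if suite not in suite_names:  # assumed extra check
                raise ValueError(f"{ss['name']}: unknown suite {suite!r}")


print(parse_step({"step": "verify web1", "name": "other"}))
# ('verify', 'web1', {'name': 'other'})
```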
The converge step runs a provisioner plugin on an instance. With no other
options this is equivalent to running kitchen converge <instance_name>.
Additional configuration options can be specified in the step and will be merged
into the instance configuration for only that converge action. For example:
```yaml
supersuites:
- name: web
  steps:
  - converge web1
  - step: converge web1
    attributes:
      version: 2
    run_list:
    - recipe[other]
  - step: converge web1
    provisioner_config:
      name: shell
      script: init.sh
  - verify web1
```
Driver configuration cannot be set like this, as create runs in its own phase
outside of the supersuite steps.
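"Merged into the instance configuration for only that converge action" suggests a merge that returns a fresh configuration and leaves the instance's own settings untouched. The sketch assumes nested mappings merge key by key and lists such as run_list are replaced outright. It also uses a hypothetical "driver" key to show driver settings being rejected. The proposal does not name that key.

```python
import copy
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict with override merged over base.

    Nested dicts are merged key by key; any other value (including lists such
    as run_list) replaces the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def converge_config(instance_config: Config, step_options: Config) -> Config:
    """Configuration used for one converge step; instance_config is untouched."""
    if "driver" in step_options:  # hypothetical key name for driver settings
        raise ValueError("driver settings cannot be set on a step")
    return deep_merge(instance_config, step_options)


base = {"run_list": ["role[web]"], "attributes": {"version": 1, "port": 80}}
step = {"attributes": {"version": 2}, "run_list": ["recipe[other]"]}
print(converge_config(base, step))
# {'run_list': ['recipe[other]'], 'attributes': {'version': 2, 'port': 80}}
```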
The verify step runs a verifier plugin on an instance. With no other options
this is equivalent to running kitchen verify <instance_name>. Additional
configuration options can be specified in the step in the same way as the
converge step. Unlike normal Test Kitchen usage, this will not execute a
converge phase if the instance is not already converged. The setup phase will
be run for any instance not already in that state.
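The verify step's rule (never converge, set up only if needed) can be sketched against an instance's last completed action. The action ordering below is an assumption of this sketch. As written, an instance that was only created would be set up and verified without converging. Whether that case should be an error is left open later in this proposal.

```python
from typing import List

# Assumed ordering of lifecycle actions for this sketch.
LIFECYCLE = ["create", "converge", "setup", "verify"]


def verify_step(last_action: str, calls: List[str]) -> str:
    """Sketch of the proposed verify step for one instance.

    It never converges. It runs setup only if the instance has not reached
    setup yet, then runs the verifier and returns the new last action.
    """
    if LIFECYCLE.index(last_action) < LIFECYCLE.index("setup"):
        calls.append("setup")
    calls.append("verify")
    return "verify"


calls: List[str] = []
verify_step("converge", calls)
print(calls)  # ['setup', 'verify']
```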
By default, the supersuite name will be used in place of the suite name for finding the test files, but this can be overridden with the name option on the step:
```yaml
supersuites:
- name: web
  steps:
  - converge web1
  - step: verify web1
    name: other
```
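A small sketch of the lookup rule follows. It uses the supersuite name unless the step carries a name option. The test/integration/<name> directory layout is assumed from common Test Kitchen verifier conventions and is not specified by this proposal.

```python
from pathlib import PurePosixPath
from typing import Union

Step = Union[str, dict]


def verifier_test_dir(supersuite: str, step: Step, base: str = "test/integration") -> str:
    """Directory the verifier would search for a verify step.

    Defaults to the supersuite name; a 'name' option on the step overrides it.
    The base directory layout is an assumption of this sketch.
    """
    options = step if isinstance(step, dict) else {}
    return str(PurePosixPath(base) / options.get("name", supersuite))


print(verifier_test_dir("web", "verify web1"))  # test/integration/web
print(verifier_test_dir("web", {"step": "verify web1", "name": "other"}))  # test/integration/other
```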
The destroy step runs the driver teardown on an instance. With no other options
this is equivalent to running kitchen destroy <instance_name>. As with verify,
additional configuration options can be added to the step if needed.
This step is intended to be used much less frequently than the other two and is for testing things like automated failover or cluster recovery.
[ed: do we also need a step for creating an instance mid-test?]
All steps will support an allow_failure configuration option to allow
continuing with the supersuite even if the step fails. This can be used to test
explicit repair of broken instances.
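The repair scenario reads naturally as a step runner that tolerates failures only on steps marked allow_failure. In this sketch, the run callable and the simulated "broken" set are hypothetical stand-ins for real instances: a tolerated failing verify is followed by a converge that repairs the instance and a verify that then passes.

```python
from typing import Callable, List, Tuple, Union

Step = Union[str, dict]
Result = Tuple[str, bool]


def run_steps(steps: List[Step], run: Callable[[str, str], bool]) -> Tuple[List[Result], bool]:
    """Run steps in order; a failure aborts the run unless the step allows it."""
    results: List[Result] = []
    for step in steps:
        if isinstance(step, str):
            text, allow_failure = step, False
        else:
            text, allow_failure = step["step"], bool(step.get("allow_failure", False))
        action, suite = text.split(maxsplit=1)
        passed = run(action, suite)
        results.append((text, passed))
        if not passed and not allow_failure:
            return results, False
    return results, True


# Simulated cluster: web1 starts out broken and a converge repairs it.
broken = {"web1"}


def run(action: str, suite: str) -> bool:
    if action == "converge":
        broken.discard(suite)
        return True
    return suite not in broken


steps = [{"step": "verify web1", "allow_failure": True}, "converge web1", "verify web1"]
results, ok = run_steps(steps, run)
print(results, ok)
# [('verify web1', False), ('converge web1', True), ('verify web1', True)] True
```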
The create action on a superinstance will run the gossip, create, and
late_gossip phases on all instances that are a part of the supersuite. If no
input is given, it will process all superinstances in the order specified in the
configuration. If an input string is given and does not match any
superinstances, it will be checked against normal instances, and any that match
will run a normal create action without the gossip phases.
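Every command below shares the same input handling: no input means all superinstances in configuration order, and otherwise superinstances are tried before falling back to normal instances. The sketch treats the input as a regular expression searched against names. That is an assumption here, and the proposal does not specify the matching rule.

```python
import re
from typing import List, Optional, Tuple


def resolve_targets(
    pattern: Optional[str], superinstances: List[str], instances: List[str]
) -> Tuple[str, List[str]]:
    """Pick what a command acts on.

    No input: every superinstance, in configuration order. Otherwise try the
    superinstances first and fall back to normal instances. Regex matching is
    an assumption of this sketch.
    """
    if pattern is None:
        return "superinstance", list(superinstances)
    regex = re.compile(pattern)
    matched = [name for name in superinstances if regex.search(name)]
    if matched:
        return "superinstance", matched
    return "instance", [name for name in instances if regex.search(name)]


supers = ["web-centos-7", "db-centos-7"]
insts = ["web1-centos-7", "web2-centos-7", "db1-centos-7"]
print(resolve_targets("web", supers, insts))   # ('superinstance', ['web-centos-7'])
print(resolve_targets("web2", supers, insts))  # ('instance', ['web2-centos-7'])
```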
The converge action on a superinstance will run all steps in the order
specified in the configuration. If no input is given, it will process all
superinstances in the order specified in the configuration. If an input string
is given and does not match any superinstances, it will be checked against
normal instances, and any that match will run a normal converge action.
The verify action on a superinstance will run all verify steps of the
supersuite in the order specified in the configuration. If an instance is not
in the converged state, it will not be converged first as it would be in normal
Test Kitchen operation [ed: should verify on an unconverged instance be an
error?]. If any destroy steps are present in the superinstance, a warning will
be shown that they are not being processed. If no input is given, it will
process all superinstances in the order specified in the configuration. If an
input string is given and does not match any superinstances, it will be checked
against normal instances, and any that match will run a normal verify action.
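Selecting the steps for the verify action is a filter over the configured steps. This sketch keeps verify steps in order and emits a Python warning when destroy steps are present. Converge steps are also dropped, since verify never converges. How the real CLI reports the warning is not specified.

```python
import warnings
from typing import List, Union

Step = Union[str, dict]


def steps_for_verify(steps: List[Step]) -> List[str]:
    """Steps the verify action runs: only verify steps, in configured order.

    Destroy steps are skipped with a warning; converge steps are skipped too,
    because the verify action never converges.
    """
    texts = [s if isinstance(s, str) else s["step"] for s in steps]
    actions = [t.split(maxsplit=1)[0] for t in texts]
    if "destroy" in actions:
        warnings.warn("destroy steps are not processed by the verify action")
    return [t for t, a in zip(texts, actions) if a == "verify"]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    selected = steps_for_verify(
        ["converge web1", "verify web1", "destroy web2", {"step": "verify web2"}]
    )
print(selected)  # ['verify web1', 'verify web2']
```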
The test action on a superinstance will run all supersuite steps in the order
specified in the configuration. As with normal Test Kitchen, this will include
a destroy action before the create action, and destroying all the instances at
the end based on the value of the --destroy command line option. The destroy at
the end is all-or-nothing: it will either destroy all instances in the
superinstance or leave them all. A new command-line option, --no-recreate, will
be added to suppress the destroy action before the create [ed: should this apply
to non-kitchen test too?]. The --no-recreate flag will allow for slightly faster
testing when using destroy step actions, as kitchen test is the only way to run
those. If no input is given, it will process all superinstances in the order
specified in the configuration. If an input string is given and does not match
any superinstances, it will be checked against normal instances, and any that
match will run a normal test action.
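The test action's order of operations can be written out as a plan. The destroy parameter mirrors the passing/always/never strategies of kitchen test's existing --destroy option, and no_recreate models the proposed --no-recreate flag. The run_step callable is a hypothetical stand-in for executing one step.

```python
from typing import Callable, List, Tuple

Plan = List[Tuple[str, object]]


def plan_test(
    members: List[str],
    steps: List[str],
    run_step: Callable[[str], bool],
    destroy: str = "passing",
    no_recreate: bool = False,
) -> Tuple[Plan, bool]:
    """Order of operations for the test action on one superinstance."""
    if destroy not in ("passing", "always", "never"):
        raise ValueError(f"unknown destroy strategy {destroy!r}")
    plan: Plan = []
    if not no_recreate:
        plan.append(("destroy", list(members)))
    plan.append(("create", list(members)))  # gossip, create, late_gossip
    ok = True
    for step in steps:
        plan.append(("step", step))
        if not run_step(step):
            ok = False
            break
    # All-or-nothing: every member is destroyed at the end, or none are.
    if destroy == "always" or (destroy == "passing" and ok):
        plan.append(("destroy", list(members)))
    return plan, ok


members = ["web1-centos-7", "web2-centos-7"]
plan, ok = plan_test(members, ["converge web1", "verify web1"], lambda s: True)
for entry in plan:
    print(entry)
```

With a failing step and the default passing strategy, the final destroy is skipped for every member, which leaves the whole cluster available for inspection.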
The login action is mostly unchanged from normal Test Kitchen. Running it on a
superinstance will display an error message explaining that it can only be used
on specific instances, and will show the names of the instances in the
superinstance.
Running kitchen list with no input when supersuites are present will display
only the superinstances. Running kitchen list --instances will display the
state of the underlying instances, using the same output as a normal kitchen
list. If an input string is given and does not match any superinstances, it
will be checked against normal instances, and any that match will be displayed.
The package action on a supersuite is undefined at this time.
Executing two or more supersuites concurrently is explicitly not going to be supported for the foreseeable future. Executing multiple steps within a single supersuite concurrently may be added, but not in the first version. A suggested syntax for this is:
```yaml
supersuites:
- name: web
  steps:
  - concurrent:
    - converge web1
    - converge web2
  - verify web1
```
Further exploration of the completed feature will be required to assess the impact of concurrent execution on the multi-server model, but it should be viable.
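For the suggested (not yet planned) concurrent syntax, one plausible semantics is shown below. Members of a concurrent group run in parallel, the group fails if any member fails, and the next step starts only after the whole group finishes. The thread pool and the run callable are assumptions of this sketch, and for simplicity it handles only string steps and concurrent groups.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Union

Step = Union[str, Dict[str, List[str]]]


def run_with_groups(steps: List[Step], run: Callable[[str], bool]) -> bool:
    """Run steps in order; a {'concurrent': [...]} entry runs its members in parallel.

    The group counts as failed if any member fails, and the run stops there.
    """
    for step in steps:
        if isinstance(step, dict) and "concurrent" in step:
            group = step["concurrent"]
            with ThreadPoolExecutor(max_workers=max(1, len(group))) as pool:
                results = list(pool.map(run, group))  # waits for every member
            if not all(results):
                return False
        elif not run(step):
            return False
    return True


log: List[str] = []
lock = threading.Lock()


def run(step: str) -> bool:
    with lock:  # steps may finish in any order inside a group
        log.append(step)
    return True


ok = run_with_groups(
    [{"concurrent": ["converge web1", "converge web2"]}, "verify web1"], run
)
print(ok, log[-1])  # True verify web1
```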
This proposal does not explicitly address making Chef search work between
instances; this feature should be added to the provisioners in Test Kitchen but
does not require any new structural elements outside of this design.
Including both single-server and multi-server tests in the same Test Kitchen
configuration is possible but not easy to use, as most commands will run in
"supersuite mode" if any supersuites are present in the configuration. This can
be addressed in a somewhat verbose manner by wrapping any single-server tests in
a supersuite that consists only of converge <suite> and verify <suite> steps.
All Test Kitchen drivers that want to support multi-server testing will need to be updated to include an implementation of the gossip phase, even if it might be a no-op in some cases.
As the new functionality is only activated if one or more supersuites exist in the configuration, existing Test Kitchen users should be unaffected unless they opt in to the new features.
This work is in the public domain. In jurisdictions that do not allow for this, this work is available under CC0. To the extent possible under law, the person who associated CC0 with this work has waived all copyright and related or neighboring rights to this work.