virtiofsd/doc/migration.md
Hanna Czenczek 964e615eab Add migration.md document
This new document describes how virtio-fs migration works, what its
fundamental limitations are, why users will need to make choices on the
command line, and what choices/settings we recommend, given common
circumstances.

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
2024-10-18 16:33:29 +02:00

265 lines
13 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Migration with virtio-fs
========================
virtiofsd supports migration through [vhost-users device state
interface](https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#migrating-back-end-state),
allowing it to place internal state into the vhost-user front-ends (e.g.
QEMUs) migration stream. This allows it to transfer information about files
and directories the guest has open to the destination instance. It is very
important to note however that virtiofsd never migrates any data, i.e. source
and destination are expected to export shared directories with matching
contents (e.g. by using the same directory on the same filesystem).
If you do not care about any of the details, feel free to skip ahead to the
[section explaining recommended configurations](#recommended-configurations).
Filesystem State Requirements
-----------------------------
As just mentioned, virtiofsd does not migrate any filesystem data, and provides
no facilities to do so. The user is responsible for ensuring that the shared
directories used by the source and destination instances of virtiofsd have the
same content. Specifically, they must have the same content during switch-over,
once execution is stopped on the source, until it is resumed on the destination.
One way to achieve this is to use the same directory on the same filesystem for
both instances, e.g. by using a shared network filesystem. If that is not
possible, the contents of the shared directory must be copied (outside of QEMU)
from the source to the destination during the switch-over phase. This may be
reasonably feasible for a read-only use case, where copying can take place long
in advance of the actual migration.
Snapshots
---------
Because virtiofsd embeds its state into the front-ends migration stream, it is
possible to store this stream somewhere to restore it later, i.e. in a
snapshot. From a technical perspective, this is perfectly fine, but it must
again be stressed that virtiofsds state includes absolutely no data;
therefore, some mechanism outside of virtio-fs/virtiofsd must be used to ensure
that when restoring such a snapshot, the shared directory is in exactly the
same state as it was when the snapshot was taken.
What Needs to Be Migrated Anyway?
---------------------------------
For every file or directory that is open in the guest, virtiofsd has a
corresponding file descriptor (FD) open in the shared directory. The
destination instance must restore these FDs, so the source instance must provide
instructions on how to do so.
The same applies to files and directories the guest does not really have open,
but still has their directory entries cached; through FUSE, the guest kernel can
reference all such cached entries by associated integer IDs. Therefore,
virtiofsd needs to have an internal map that can convert each ID into
something that strongly references its associated filesystem object;
specifically, either an `O_PATH` FD or a file handle, depending on the
`--inode-file-handles` setting. These too need to be transferred in some manner
to the destination.
Migration Modes
---------------
There are two general ways virtiofsds internal state can be serialized and
migrated, [by path](#by-path---migration-modefind-paths) or [as file
handles](#as-file-handles---migration-modefile-handles).
### By Path (`--migration-mode=find-paths`)
For every filesystem object that must be transferred to the destination,
virtiofsd tries to find its path inside of the shared directory, and transmits
that to the destination, which then opens it.
Because paths can change, this mode can be quite brittle. virtiofsd begins
collecting paths once migration starts (long before the switch-over phase), so
any changes to those paths afterwards can lead to various problems, especially
if those changes are done by third parties outside of the VM guest.
Some examples for such changes are:
#### Unlinking
Files can exist without paths, specifically when theyre opened but unlinked.
Consequently, such files (that may be open in the guest) cannot be migrated
using paths. When migrating anyway, the file contents will be lost once the
source instance is quit.
Note that for files for which virtiofsd cannot find a path, migration will
produce an error. The error response behavior is controlled via the destination
instances `--migration-on-error` switch; `abort` will abort migration (on the
destination) when any error occurs, allowing execution to be resumed on the
source side, with any FD still open. `guest-error` will continue migration,
marking any file that could not be migrated as faulty, returning errors for any
guest accesses.
#### Renaming / Moving
When files or directories are renamed or moved by the migrating guest, virtiofsd
is naturally aware of this, and so can update the paths it holds internally.
This is not the case when paths are changed outside of virtiofsd, by third
parties. In this case, virtiofsd will remain unaware and will send the outdated
path to the destination, which will not be able to resolve it (error behavior is
then controlled by the `--migration-on-error` switch, as described in the
[Unlinking](#unlinking) section).
In contrast to the *unlinking* case, it would at least theoretically be possible
to migrate these files using their new paths, if virtiofsd somehow could get
notified of the rename/move. The `--migration-confirm-paths` option has it
double-check each collected path at switch-over time, and so may be able to
detect such moves and renames in many cases (but does so on the source side, so
still has a non-empty TOCTTOU window).
#### Replacing
In the *renaming / moving* case, the worst thing that can happen is that a file
the guest has open is no longer accessible after migration. A much worse case
is when a file is replaced without it being noticed: In this case, the
destination will open the other file, but present it as the old one to the
guest, with no error indication at all. That can lead to data corruption.
The migration destination cannot detect this case without performing specific
checks, because opening the path it has received from the source will succeed
(but yield the wrong file). Such checks are:
* `--migration-verify-handles`: With this switch, source and destination
generate a file handle for each transferred path. A file handle is a piece
of data that uniquely identifies a filesystem object (like a file or
directory), and becomes invalid (“stale”) when that object is deleted; so we
can use it to verify a files identity between source and destination.
However, it only works when source and destination use the same shared
directory on the same filesystem (e.g. a network filesystem). Furthermore,
any mismatches that are detected cannot be recovered from (i.e. we still dont
know the involved files true paths, so `--migration-on-error` will decide how
to proceed).
* `--migration-confirm-paths`: This switch makes the source instance
double-check all paths during switch-over, i.e. when both the source and
destination instance are stopped. While this can theoretically allow error
recovery (by fetching an updated path from */proc/self/fd*), and does not
require source and destination to use the same filesystem, it still leaves a
small TOCTTOU window open (between checking and the destination instance
opening the paths), and it requires doing potentially quite a bit of I/O
(checking paths) during migration downtime, which is generally not desirable.
Both switches can also be used together, but they can only be used in
*find-paths* migration mode, not *file-handles* (because they simply are not
necessary in *file-handles* mode). Check the [dedicated section for more
information on recommended configurations](#recommended-configurations).
#### Implementation Detail: Collecting Paths
There are two ways paths can be collected, either by [looking up FDs in
*/proc/self/fd*](#querying-procselffd), or by [recursing through the shared
directory](#recursing-through-shared-directory). virtiofsd implements both of
these, but only uses the latter as a fall-back for when the former fails.
##### Querying /proc/self/fd
*/proc/self/fd* contains a symbolic link for each file descriptor opened by the
current process. These arent really symbolic links, though: Opening them does
not resolve their link target, but directly opens (basically duplicates) the
corresponding file descriptor.
Still, these links can have valid targets: The kernel tries to keep track
internally what paths the underlying filesystem objects have, and provides this
information there. Querying this is thus a much faster way to get a path for
our file descriptors than to recurse through the shared directory.
The downside is that there is no formal guarantee that this works. It is
unclear under what circumstances this can break down; if it does, virtiofsd will
fall back to [recursing through the shared
directory](#recursing-through-shared-directory).
For what its worth, the only case we have seen where a file has a valid path,
but */proc/self/fd* cannot provide it, is to use its file handle to open the
file, when it has not yet been opened through its path. For example:
1. Open file using path
2. Generate and store file handle
3. Unmount file system, then mount it again
4. Open stored file handle
Something like this can happen with virtiofsd only on the migration destination
instance after a *file-handles* migration; in other cases, virtiofsd will
generally open files by path first, giving the kernel a chance to make a note of
that path.
##### Recursing Through Shared Directory
We can also obtain files paths by recursing through the shared directory,
enumerating all paths therein, and associating them with the respective files
and directories. Naturally, this is quite slow, especially the more files there
are in the shared directory, which is why virtiofsd will only fall back to this
implementation if it fails to query a path from */proc/self/fd*.
### As File Handles (`--migration-mode=file-handles`)
Every filesystem object that must be transferred to the destination is converted
to a file handle (a piece of data that uniquely identifies this object on a
given filesystem, and can be used to open it), which is sent to the destination.
Because there is a unique and permanent relationship between such an object and
its file handle, this migration mode is not susceptible to the problems “by
path” migration has, for example, a file handle even stays valid when a file has
a link count of 0 (i.e. is deleted, has no path anymore) but some process still
has it open (i.e. holds an FD).
However, because file handles are just some data that allows access to
everything on a filesystem without checking e.g. access rights along a files
path, opening them requires the *DAC_READ_SEARCH* capability, which grants the
ability to read any file, regardless of its access mode. Generally, this
capability is only available to applications running as root.
Furthermore, because file handles are specific to a given filesystem instance,
when using them for virtio-fs migration, the source and destination instance
must use the same shared directory on the same filesystem, e.g. a shared network
filesystem.
Recommended Configurations
--------------------------
### General
Consider which **`--migration-on-error`** mode suits your needs:
* `abort`: When any error is encountered (e.g. destination cannot find a file
that is open in the guest), abort migration altogether. You can then
generally resume execution on the source; the source virtiofsd instance will
retain all open file descriptors until it is quit.
* `guest-error`: When encountering errors pertaining to a specific file or
directory, do not abort migration, but instead mark that file or directory as
invalid. Any guest accesses to it will then result in guest-visible errors.
### Shared Filesystems
When source and destination instance use the same shared directory on the same
filesystem, using **`--migration-mode=file-handles`** is recommended. This
requires the destination instance to have the `DAC_READ_SEARCH` capability.
If that capability cannot be provided, we recommend using
**`--migration-mode=find-paths`** together with
**`--migration-verify-handles`**. Using **`--migration-confirm-paths`**
additionally is optional; it can better recover from unexpected path changes
than `verify-handles` alone, but will prolong migration downtime.
### Different Filesystem
If the source and destination shared directory are not the exact same directory
on the same filesystem, users must ensure their contents are equal at migration
switch-over. For example, read-only configuration directories presented to the
guest via virtio-fs can just be copied over to the destination ahead of
migration.
For such cases, use **`--migration-mode=find-paths`**.
We also recommend the filesystem to be read-only, which can be reinforced with
virtiofsds **`--readonly`** switch. If that is not possible, *take special
care* to ensure source and destination directory contents match at the
switch-over point in time!
If, during migration, it is possible for the shared directory contents to be
modified by a party other than the migrating virtiofsd instance, we strongly
recommend using **`--migration-confirm-paths`**. Still, that is not a 100 %
safe solution. So above all, for the case where source and destination instance
do not use the same shared directory on the same (shared) filesystem, we
strongly advise not to allow the shared directory to be modified at all during
migration.