Special cases

The handle model covers most file access cleanly. But files have edges: operations that bypass the model, semantics that don't fit it, filesystems whose behaviour FACS cannot fully control. This page covers those edges.

Each of the special cases below is something that someone debugging FACS behaviour will eventually hit. Knowing the rules ahead of time keeps the gotchas from being surprises.

O_PATH — fds without a cached mask

O_PATH is a Linux flag for opening a file in "path-only" mode. The returned fd can refer to the file (as the dirfd argument to openat, with fstat or fstatat, or for other path manipulation) but cannot be used for data operations.

Under FACS, an O_PATH open does not run a full AccessCheck. The fd is not FACS-managed and has no cached granted mask. Operations that would normally consult the cache either work unconditionally (because they need no access), run a fresh live AccessCheck (because they do), or are refused outright:

Operation on O_PATH fd	Behaviour
`fstat`, `fstatat`	Works without access check. The fd is a path reference; getting stat info is allowed.
`openat` using the fd as dirfd	Normal — runs AccessCheck for the new open. The dirfd's O_PATH status doesn't affect the new fd.
`kacs_get_sd` with AT_EMPTY_PATH	Runs a live AccessCheck (the fd has no cached mask to consult).
`kacs_set_sd` with AT_EMPTY_PATH	Same — live AccessCheck.
`read`, `write`, `mmap`	Denied with `EBADF`. The fd cannot be used for data operations.
`fchmod`, `fchown`, `fgetxattr`, `fsetxattr`, `ioctl`	Denied with `EBADF`.

O_PATH is useful for path manipulation — keeping a reference to a directory while traversing the tree, or referring to a file by fd rather than by path. For these uses, the lack of a cached mask is irrelevant.

The catch: a process that wants to do anything with an O_PATH fd beyond path navigation needs to reopen it. The reopen is a fresh access check that decides what the resulting fd can actually do.
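On stock Linux, the O_PATH rows of the table can be observed directly. The sketch below is plain Linux code, not FACS-specific: it opens a directory path-only, checks that fstat works and read fails with EBADF, then reopens the directory through openat, which is the step where FACS would run a fresh AccessCheck. The function name reopen_via_opath is illustrative.

```c
#define _GNU_SOURCE             /* O_PATH is a GNU/Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open `dir` path-only, confirm the fd behaves like an O_PATH fd, then
 * reopen the directory for reading through openat(). Returns the new,
 * fully opened fd, or -1 on any failure or unexpected behaviour. */
int reopen_via_opath(const char *dir)
{
    assert(dir != NULL);

    int pfd = open(dir, O_PATH | O_DIRECTORY | O_CLOEXEC);
    if (pfd < 0)
        return -1;

    /* fstat needs no access: the fd is only a path reference. */
    struct stat st;
    int stat_ok = fstat(pfd, &st) == 0 && S_ISDIR(st.st_mode);

    /* Data operations on an O_PATH fd fail with EBADF. */
    char c;
    errno = 0;
    int read_refused = read(pfd, &c, 1) == -1 && errno == EBADF;

    /* Using the fd as a dirfd is fine; the new open gets its own check
     * (under FACS, a fresh AccessCheck decides what the new fd can do). */
    int fd = openat(pfd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    close(pfd);

    if (!stat_ok || !read_refused) {
        if (fd >= 0)
            close(fd);
        return -1;
    }
    return fd;
}
```

Reopening through "." only works when the O_PATH fd refers to a directory.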

The exec dual gate

Exec is the only operation in v0.20 that re-checks access at use time on an open fd instead of relying on the cached mask. The reasoning: exec changes the process's identity (via the new binary's PIP) and effectively replaces the process, so the decision made at open time is no longer the right one to use.

Specifically, execveat(AT_EMPTY_PATH) on an open fd:

Runs a fresh AccessCheck against the file's current DACL using the caller's current effective token.
Requires both the +x mode bit on the file and FILE_EXECUTE in the access that check grants the caller.

The two checks answer two different questions:

+x on the file means this file is intended to be run as a program. It is a property of the file itself, set by whoever owns it — the file's author saying "this is an executable, not data". A file without +x is data; trying to exec it fails regardless of who is asking, because exec on data is meaningless.
FILE_EXECUTE in the granted access means this caller is allowed to execute this program. It is an access decision against the DACL: whether this principal, with this token, is authorised to run this binary.

Both have to be true for exec to proceed. A data file (no +x) is not runnable by anyone; an executable file the caller is not authorised for is runnable in principle but not by this caller.

The dual gate is the v0.20 exception to the handle model. The reason for live evaluation here is that exec's consequences are large enough — the binary that runs is what the kernel will then trust at the verified PIP level — that running on a stale cache is the wrong default.

mmap(PROT_EXEC) is handled differently. It uses the cached mask (FILE_EXECUTE from the open) and additionally runs the LSV mitigation if enabled; the live dual gate applies to exec only.

Append-only handles

A handle opened with FILE_APPEND_DATA granted but not FILE_WRITE_DATA is an append-only handle. The kernel enforces this strictly:

write() at the current offset is allowed only if the current offset is at end-of-file. Otherwise denied.
pwrite() at any offset other than end-of-file is denied (RWF_APPEND-style).
Mapping the file shared writable (mmap(MAP_SHARED, PROT_WRITE)) is denied.
ftruncate is denied, whether it would extend the file or shrink it (a handle cannot rewind the file via truncate).
fallocate modes that would modify existing contents (FALLOC_FL_PUNCH_HOLE, FALLOC_FL_COLLAPSE_RANGE, FALLOC_FL_ZERO_RANGE) are denied.

Operations that only ever append, such as POSIX write with O_APPEND and RWF_APPEND writes, work normally. The kernel ensures the data lands at end-of-file.
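The offset rule for write and pwrite can be sketched as a check on the cached mask. This is a hypothetical model, not the kernel's implementation: write_allowed is an illustrative name, the bit values are the Windows values for these rights (assumed), and the mmap, ftruncate and fallocate checks are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Assumed Windows-style bit values; Peios's real constants may differ. */
#define FILE_WRITE_DATA  0x00000002u
#define FILE_APPEND_DATA 0x00000004u

/* Hypothetical model of the write check on a handle.
 * granted:   mask cached on the fd at open time
 * offset:    where an explicit-offset write (write/pwrite) would land
 * eof:       current file size
 * appending: true for O_APPEND or RWF_APPEND writes, which the kernel
 *            places at end-of-file itself */
bool write_allowed(uint32_t granted, off_t offset, off_t eof, bool appending)
{
    assert(offset >= 0 && eof >= 0);

    if (granted & FILE_WRITE_DATA)
        return true;            /* ordinary writable handle */
    if (!(granted & FILE_APPEND_DATA))
        return false;           /* no write right of either kind */

    /* Append-only handle: data may only land at end-of-file. */
    return appending || offset == eof;
}
```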

The use case: log files that should be append-only. A logger holds a handle with FILE_APPEND_DATA but not FILE_WRITE_DATA; it can write log lines but cannot rewrite earlier ones. Even if the logger is compromised, the existing log entries are safe from modification through this handle.

Append-only is the FACS expression of the "secure log" pattern. It exists at the access-mask level, not as a separate file attribute.

sticky bit — no effect under FACS

The traditional Unix sticky bit on a directory restricts file deletion within: only the owner of a file (or the owner of the directory) may unlink files in a sticky-bit directory. This is what makes /tmp work.

Under FACS, the sticky bit has no effect. Deletion is gated by FILE_DELETE_CHILD on the parent directory's SD, period. The sticky bit's semantics are not encoded.

The reason: FACS's access model is comprehensive enough that the sticky bit is unnecessary. A /tmp directory's DACL grants the directory's FILE_DELETE_CHILD to the appropriate principals (typically just the owner or the file owner, mediated through OWNER RIGHTS); this is the same effect the sticky bit had, expressed in the DACL.

For /tmp specifically, the Peios-default DACL provides equivalent semantics: each user can create files in /tmp (FILE_ADD_FILE granted to Authenticated Users), each file's owner can delete their own (FILE_DELETE_CHILD inherited per-file from OWNER RIGHTS ACEs), but a user cannot delete another user's files.

On a directory whose mode includes the sticky bit (shown by ls -l as a t), the bit is informational only. FACS derives the mode from the SD; the bit can be set but has no enforcement consequence.

POSIX ACLs — replaced by KACS

The Linux POSIX ACLs (set via setfacl and stored as system.posix_acl_access and system.posix_acl_default xattrs) are not honoured by FACS. They are replaced by KACS DACLs.

Specifically:

Writes to system.posix_acl_access and system.posix_acl_default xattrs are unconditionally denied.
Existing POSIX ACL xattrs on files (carried over from a pre-Peios system, say) are ignored. FACS uses the KACS DACL exclusively.

The kernel will not silently translate POSIX ACLs to KACS DACLs. Migrating from a Linux system with POSIX ACLs requires rewriting the SDs in KACS form. The migration tooling can do this conversion; the kernel does not do it on the fly.
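The write-side rule is a fixed name filter. The sketch below is a hypothetical model of it (posix_acl_xattr_write_allowed is an illustrative name); a true result only means this rule does not refuse the write, and the normal checks that would follow are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the FACS xattr-write filter for POSIX ACLs.
 * Returns false for the two POSIX ACL xattr names, which FACS refuses
 * to write unconditionally; other names fall through to the normal
 * checks (not modelled here). */
bool posix_acl_xattr_write_allowed(const char *name)
{
    assert(name != NULL);
    return strcmp(name, "system.posix_acl_access") != 0 &&
           strcmp(name, "system.posix_acl_default") != 0;
}
```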

fchown — gated on WRITE_OWNER

The legacy fchown syscall changes a file's owner UID/GID. Under FACS, the chown family is permitted but gated on SD rights:

fchown() succeeds only if the fd's granted mask includes WRITE_OWNER; otherwise -EACCES.
chown() and lchown() run a fresh access check requiring WRITE_OWNER on the file's SD.

The Linux uid/gid change does not alter the SD's owner SID. Changing the real owner goes through kacs_set_sd with OWNER_SECURITY_INFORMATION — the KACS semantics for ownership (covered in Managing file security) replace the legacy mode-and-owner duo.

For stat-style queries, the kernel reports the UID projected from the file's owner SID, so ls -l shows a UID. But that UID is derived from the owner SID; fchown is not the way to change it.

NFS — dual authority

NFS client mounts are a special case. The underlying filesystem is on a remote server; the server enforces its own access control independent of what the client thinks. FACS on the client side cannot reach into the server's authorisation; it can only express its local view.

The pattern Peios uses: NFS client mounts are configured with the synthesize_ephemeral mount policy. The client synthesises a local SD for each file accessed (per the mount template), runs AccessCheck locally, and either authorises the operation or denies it. If authorised, the operation is forwarded to the server, which may then deny it for its own reasons.

The consequence: a locally authorised open can still be followed by I/O errors when the server refuses. A caller may successfully open(O_RDONLY) an NFS file (FACS's local synthesis decided to grant) and then have read() fail with EACCES from the server.

This is "dual authority": the client and the server both have a say, and both have to permit. The client's denial is final on the client side; the server's denial is final on the server side. There is no single source of truth for what is allowed.

⚠ Warning

Don't trust FACS results on NFS for security purposes. The local FACS decision is "the client will let this go forward"; the server may still refuse. If you need a security guarantee, the server's access control is what matters.

Other implications:

Expect I/O errors from server-side denial. A successful open does not guarantee a successful read. Handle EACCES (or EIO) from the operation, not just from the open.
The synthesised SD is local. Changes to the file's actual SD on the server are not visible to FACS's local synthesis. Tools that want to inspect the real SD need to query the server.

NFS server mounts (where Peios is the server) work the other direction: FACS enforces the SD on the local files; the protocol exposes it. The remote client's view of what is allowed is whatever the protocol negotiates, which may be limited by NFS-protocol-level restrictions but is ultimately decided by FACS.

`/proc` and `/sys`

/proc and /sys are kernel-managed pseudo-filesystems. They are not FACS-managed in the standard sense — their mount policy is unmanaged, which means the FACS handle model does not apply.

What does happen:

File operations under /proc/<pid>/* route through the process's PSB / process SD checks (the two-check rule from PIP). Access is decided per-operation against the target process.
/sys/kernel/security/kacs/* files have explicit SDs on them (set up by the kernel) and are accessed via the kernel's own check logic.
General /proc files (/proc/uptime, /proc/cpuinfo) are world-readable by convention.
/sys writes are restricted to BUILTIN\Administrators and SYSTEM by hardcoded rule.

The result: operations on /proc and /sys work, but they don't go through the FACS handle model. The cached mask on the fd from opening one of these files is meaningless; the per-operation check is what gates access.

This is why cat /proc/self/status produces output without any apparent FACS involvement — the access is granted by the kernel's /proc-specific rules, and the kernel just makes data available.

The mount-policy classes are covered in Mount policies. unmanaged is the special class for kernel-managed filesystems; it cannot be set via the public ABI.

Whiteouts in renameat2

renameat2(RENAME_WHITEOUT) — the Linux-specific flag for creating a "whiteout" entry as part of a rename (used by overlay filesystems) — is supported on FACS-managed filesystems. The rename and the whiteout it leaves behind happen as one atomic operation, exactly as on stock Linux.

A whiteout is a chrdev(0,0) sentinel that overlay filesystems drop at the old name to mask a file in a lower layer. FACS authorises it the same way it authorises any new node: you need FILE_ADD_FILE on the source directory (where the whiteout lands), in addition to the usual rename rights. If you lack that right the whole rename is denied before anything is created, and the whiteout — like every freshly created node — inherits a security descriptor from its parent directory and is recorded in the audit trail.

This matters if you run overlayfs or union-style filesystems on FACS-managed storage: they work without the partial-support caveats earlier previews carried.

Summary table

The special cases at a glance:

Case	What is different
O_PATH	No cached mask; data operations denied; live check for SD ops
exec via execveat	Live AccessCheck plus Linux `+x` mode bit
Append-only handles	Write-at-end only; mmap-shared-writable and truncate denied
Sticky bit	No effect — the DACL is the gate
POSIX ACLs	Replaced by KACS; xattr writes denied
fchown	Gated on WRITE_OWNER; does not change the owner SID (use kacs_set_sd)
NFS client	Dual authority; local FACS plus remote enforcement
`/proc` and `/sys`	Unmanaged mount policy; per-operation check by kernel-specific rules
renameat2 RENAME_WHITEOUT	Supported; atomic, needs FILE_ADD_FILE on the source directory, whiteout inherits an SD

Each is a deliberate decision. Some reflect where FACS departs from Linux semantics (POSIX ACLs replaced); some reflect security policy (exec dual gate, append-only enforcement); some reflect the model's limits (NFS dual authority). Knowing them ahead of time means they don't arrive as surprises.

Where to go next

For the model these cases are edges of, read The handle model.

For how a mount's policy decides whether FACS applies at all, read Mount policies.

For the SD read/write syscalls referenced throughout, read Managing file security.


See also