KSM (Kernel Same-page Merging)

KSM scans memory for pages with byte-identical contents and merges them. After a merge, every virtual mapping that referenced a duplicate points at a single physical page marked copy-on-write, and the duplicates are freed. The savings can be substantial on hosts running similar workloads (virtualisation hosts running multiple Linux VMs, container hosts running near-identical services), where large fractions of memory content (kernel text, glibc, common libraries, zero pages) are duplicated across processes.

KSM also produces a cross-process timing side channel that has been used in real attacks against virtualisation tenants. The mechanism is intrinsic to deduplication-with-COW; it cannot be fixed without disabling KSM. This page covers what KSM does, the side channel, and Peios's default-dormant posture.

How KSM works

A kernel thread, ksmd, walks the eligible page set on a configurable cadence. For each candidate page, it computes a checksum to skip pages whose contents are still changing, searches its trees of previously seen pages (ordered by page contents) for a match, and byte-compares to confirm identity. If the bytes match, ksmd updates both virtual mappings to point at the same physical page, frees the duplicate, and marks the page write-protected (so subsequent writes trigger a copy-on-write break).

KSM is opt-in per region. A process must call madvise(addr, len, MADV_MERGEABLE) to mark a memory region as KSM-eligible. Pages outside any MADV_MERGEABLE region are never scanned. A process that has not opted in is not subject to merging at all.
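
A minimal sketch of the per-region opt-in from Python, assuming a Linux-style kernel and Python 3.8 or later, where `mmap.madvise` and the `MADV_MERGEABLE` constant are exposed. The helper name `make_mergeable_region` is hypothetical. The mapping is explicitly private and anonymous because Linux KSM only considers such pages, and the sketch treats a rejected or unavailable `madvise` as "not opted in" rather than as a fatal error.

```python
import mmap


def make_mergeable_region(n_pages):
    """Map private anonymous memory and mark it KSM-eligible.

    Returns (region, opted_in). opted_in is False if the kernel or the
    Python build does not support MADV_MERGEABLE.
    """
    region = mmap.mmap(
        -1,
        n_pages * mmap.PAGESIZE,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )
    try:
        # Same effect as madvise(addr, len, MADV_MERGEABLE) over the mapping.
        region.madvise(mmap.MADV_MERGEABLE)
        opted_in = True
    except (AttributeError, OSError):
        opted_in = False
    return region, opted_in


region, opted_in = make_mergeable_region(16)
print("opted in:", opted_in)
```

A successful call only makes the region eligible. Whether anything is merged depends on whether ksmd is running.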

A second opt-in path covers entire processes:

Call	Effect
`prctl(PR_SET_MEMORY_MERGE, 1)`	Mark every future allocation in the calling process as `MADV_MERGEABLE`. Saves the per-allocation marking work for processes that want full opt-in.

Both opt-ins apply only to the calling process's own memory. There is no way to mark another process's memory as mergeable.
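
The process-wide opt-in can be sketched through `ctypes`, since Python has no built-in `prctl` wrapper. The numeric constants below are the values from mainline Linux's `<linux/prctl.h>` (available from Linux 6.4) and are assumed to apply unchanged here. The helper names are hypothetical.

```python
import ctypes
import os

# Values from Linux's <linux/prctl.h>; assumed unchanged on the target kernel.
PR_SET_MEMORY_MERGE = 67
PR_GET_MEMORY_MERGE = 68

_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option, arg2=0):
    # prctl is variadic: pass each argument as unsigned long and keep the
    # unused ones zero, which the kernel requires for these options.
    rc = _libc.prctl(
        ctypes.c_int(option),
        ctypes.c_ulong(arg2),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if rc < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return rc


def set_memory_merge(enabled):
    """Opt the calling process in to (or out of) process-wide merging."""
    _prctl(PR_SET_MEMORY_MERGE, 1 if enabled else 0)


def get_memory_merge():
    """Return True if process-wide merging is enabled for this process."""
    return _prctl(PR_GET_MEMORY_MERGE) == 1


try:
    set_memory_merge(True)
    print("process-wide merge enabled:", get_memory_merge())
except OSError as exc:
    # Older kernels, or kernels built without KSM, reject the option.
    print("PR_SET_MEMORY_MERGE unavailable:", exc)
```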

MADV_UNMERGEABLE reverses the opt-in for a region; pages currently merged with others are forcibly unmerged (the region gets its own private physical copy of each such page back) and the region is removed from ksmd's scan set.

Configuration surface

/sys/kernel/mm/ksm/ exposes the daemon's controls:

Tunable	Effect
`run`	`0` = stop scanning, `1` = scan continuously, `2` = stop and unmerge everything.
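`pages_to_scan`	Number of pages ksmd scans in each batch before it sleeps. Higher = more CPU spent on KSM.
`sleep_millisecs`	Delay between scan batches. Lower = more aggressive scanning.
`pages_shared`	Statistic: number of merged physical pages currently in use.
`pages_sharing`	Statistic: number of additional mappings sharing those merged pages, which approximates the pages saved.
`pages_unshared`	Statistic: number of pages that are unique but still checked repeatedly for a match.
`pages_volatile`	Statistic: number of pages changing too quickly to be considered for merging.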

These are admin tunables, registry-driven via ksyncd like the rest of the VM sysctl surface.
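
As an illustration, the files under /sys/kernel/mm/ksm/ can be read directly to inspect the daemon's state. This is a sketch under the assumption that each file holds a single integer, as on mainline Linux. The function names are hypothetical, and the savings figure is only a rough estimate derived from `pages_sharing`.

```python
import mmap
from pathlib import Path

KSM_FIELDS = (
    "run", "pages_to_scan", "sleep_millisecs",
    "pages_shared", "pages_sharing", "pages_unshared", "pages_volatile",
)


def read_ksm_state(base="/sys/kernel/mm/ksm"):
    """Read the KSM control and statistics files as integers.

    A field is None if its file is missing or unreadable, for example
    when KSM is compiled out or access is denied.
    """
    state = {}
    for name in KSM_FIELDS:
        try:
            state[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            state[name] = None
    return state


def estimated_saved_bytes(state, page_size=mmap.PAGESIZE):
    """Rough saving: each mapping counted in pages_sharing avoids one page."""
    sharing = state.get("pages_sharing")
    return None if sharing is None else sharing * page_size


state = read_ksm_state()
print(state, estimated_saved_bytes(state))
```

On a default-dormant host, `run` should read `0` and the statistics should stay at zero.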

The timing side channel

When two processes share a merged page and one of them writes to it, the kernel must:

Detect the write (it traps because the page is write-protected).
Allocate a new physical page.
Copy the contents.
Update the writing process's page table to point at the new page.
Resume the write.

This break-on-write takes measurably longer than a write to a non-merged page. The latency difference is detectable from userspace by timing the write.
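
The cost of a copy-on-write break can be observed without KSM at all. The sketch below does not involve ksmd or any other process's memory: it uses `fork` to put a process's own private pages into a shared, write-protected state, then compares the first write to each page (which breaks COW) with a second write (which does not). The function names are hypothetical, and the numbers are noisy because interpreter overhead is included in every sample.

```python
import mmap
import os
import statistics
import time


def time_writes(region, n_pages):
    """Time one single-byte write per page; return per-page nanoseconds."""
    samples = []
    for i in range(n_pages):
        off = i * mmap.PAGESIZE
        t0 = time.perf_counter_ns()
        region[off] = 1
        samples.append(time.perf_counter_ns() - t0)
    return samples


def measure_cow_break(n_pages=256):
    """Return (median ns for COW-breaking writes, median ns for plain writes)."""
    region = mmap.mmap(-1, n_pages * mmap.PAGESIZE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    time_writes(region, n_pages)  # populate every page before forking

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep the shared pages alive until the parent closes the pipe.
        os.close(w)
        os.read(r, 1)
        os._exit(0)
    os.close(r)
    try:
        first = time_writes(region, n_pages)   # each write breaks COW
        second = time_writes(region, n_pages)  # pages are private again
    finally:
        os.close(w)          # EOF wakes the child so it can exit
        os.waitpid(pid, 0)
        region.close()
    return statistics.median(first), statistics.median(second)


cow_ns, plain_ns = measure_cow_break()
print(f"COW-breaking write: {cow_ns} ns, plain write: {plain_ns} ns")
```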

The attack:

Attacker process opts into MADV_MERGEABLE.
Attacker allocates a page and fills it with a guess about what the victim has in memory — for example, "I'm guessing the victim's TLS session key starts with these 16 bytes followed by zeros."
Attacker waits for ksmd to run a scan.
Attacker writes to the guessed page and times the write.
If the write was slow, the page got merged with one of the victim's mergeable pages, meaning the victim has a page byte-identical to the attacker's guess. The attacker has just confirmed the full contents (4096 bytes) of one page of the victim's memory. Merging requires the whole page to match, so a partially correct guess produces no merge and no signal.

The academic literature calls this class of attack a memory deduplication attack. It has been demonstrated against:

AES keys used by cryptographic libraries in a co-resident VM, recovered with a cache attack that relies on deduplicated pages (paper: "Wait a minute! A fast, Cross-VM Attack on AES," 2014).
Browser-rendered private content (page contents revealing what URLs a user is visiting).
ASLR offsets across VM tenants (paper: "CAIN: Silently Breaking ASLR in the Cloud," 2015).
Cryptographic key material in shared-host configurations.

The leak is fundamental to deduplication-with-COW. Mitigations attempted in the literature — making COW-break time-constant, scoping deduplication to within trust domains — either destroy the performance benefit or fail to close all the variants. There is no version of KSM that is both useful and side-channel-free.

The Peios policy

Peios's default posture is:

KSM is compiled into the kernel. The feature is available; building a fortress image that wants KSM doesn't require a custom kernel.
ksmd is dormant by default. The default-image kernel does not run the scanning daemon. MADV_MERGEABLE calls succeed but mark pages eligible for a daemon that isn't running, so no merging takes place.
Toggling the daemon on requires permission. Writing to the registry key controlling KSM (or directly to /sys/kernel/mm/ksm/run) is gated by the key's SD; the default DACL grants write to TCB-tier identities only. An admin enabling KSM is making a deliberate, audited decision.
The toggle emits an audit event. When ksmd is enabled or disabled, the audit subsystem records the change with the principal and timestamp. Security teams monitoring the audit stream see when a side-channel feature is activated.

The reasoning:

MADV_MERGEABLE and PR_SET_MEMORY_MERGE are unprivileged. A process voluntarily exposing its own memory to scanning is not a security boundary violation — it's the process's own choice. The cost falls on the process; there's no reason to gate self-opt-in.
Running ksmd is a system-wide policy choice. It affects every process that has opted in (so the host operator's choice has consequences for all participating workloads), and it activates a known side channel (so the choice carries security weight). Hence the gate on the toggle.
The savings are real, and some hosts want them. VM hypervisors and container hosts with high duplication can reclaim gigabytes. Foreclosing KSM entirely (compiling it out) is too aggressive.
The side channel is real. Enabling ksmd casually is a mistake. The default-dormant posture means exposure cannot happen by accident; you have to deliberately turn it on.

Operational patterns

A VM host that wants KSM:

Enables ksmd via the registry knob (TCB-only operation, audited).
Configures pages_to_scan and sleep_millisecs according to host CPU budget.
Operates the workloads with awareness that they're sharing a side channel with anything else marked MADV_MERGEABLE on the same host.
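
When choosing pages_to_scan and sleep_millisecs, it helps to estimate what scan rate a given pair implies. The sketch below assumes ksmd scans one batch of pages_to_scan pages and then sleeps for sleep_millisecs, and it ignores the time spent scanning, so the rate is an upper bound and the pass time a lower bound. The function names and the example values are illustrative, not Peios defaults.

```python
def scan_rate_pages_per_sec(pages_to_scan, sleep_millisecs):
    """Upper bound on ksmd's scan rate, ignoring the time spent scanning."""
    if pages_to_scan <= 0 or sleep_millisecs <= 0:
        raise ValueError("both tunables must be positive for this estimate")
    return pages_to_scan * 1000 / sleep_millisecs


def full_pass_seconds(mergeable_bytes, pages_to_scan, sleep_millisecs,
                      page_size=4096):
    """Lower bound on the time for ksmd to visit every mergeable page once."""
    pages = -(-mergeable_bytes // page_size)  # ceiling division
    return pages / scan_rate_pages_per_sec(pages_to_scan, sleep_millisecs)


# Illustrative values only.
print(scan_rate_pages_per_sec(pages_to_scan=100, sleep_millisecs=20))  # 5000.0
print(full_pass_seconds(8 * 2**30, 100, 20))                           # 419.4304
```

With these example values, 8 GiB of mergeable memory takes at least about seven minutes per full pass, which also bounds how quickly newly duplicated pages can be merged.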

A general-purpose system never enables KSM. The default-dormant configuration is correct.

A high-security image (fortress-mode) may go further and prevent the toggle from being flipped — by tightening the registry-key DACL to deny even TCB-tier writes, or by compiling KSM out entirely. This is a per-image policy choice.

Hugepages and KSM

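KSM does not merge hugepages as such. The unit of deduplication is the standard 4 KB page, and hugetlbfs-backed regions (MAP_HUGETLB) are not scanned, so those regions are implicitly excluded from the side channel. MADV_HUGEPAGE is a weaker guarantee: mainline Linux can split a transparent hugepage that sits inside a MADV_MERGEABLE region and merge its 4 KB subpages, so it should not be relied on as an exclusion by itself.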

If a workload wants both huge pages (for performance) and KSM (for memory savings), it must use small-page mappings for the regions it wants merged and accept the performance cost on that subset. In practice these goals conflict and most workloads pick one.
