Building Your OpenStack Security Baseline | VEXXHOST

OpenStack security is a set of decisions distributed across Keystone, Neutron, Nova, and Barbican. What to change, why it matters, and what breaks if you skip it.

Security in OpenStack is a set of decisions distributed across Keystone, Neutron, Nova, and Barbican, where a reasonable default in development becomes an unacceptable exposure in production. This guide covers the specific changes that matter, why they matter, and what happens if you skip them.

Keystone

Disable the Admin Token Before Anything Else

The Admin Token is a shared secret used to bootstrap Keystone. It carries no user context and no scope. It grants unrestricted access to your entire Keystone deployment to anyone who has it.

In production, it must not exist. Remove AdminTokenAuthMiddleware from your paste application pipelines in keystone-paste.ini. Every hour your deployment runs with the admin token active is an hour that token can be used from anywhere on your network without attribution to any user or project.
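A quick way to audit this is to scan the paste pipelines for the filter. The sketch below assumes the middleware is wired in under its conventional filter name, admin_token_auth; the function name and the sample file contents are illustrative and not part of Keystone.

```python
import configparser


def pipelines_with_admin_token(paste_ini_text, filter_name="admin_token_auth"):
    """Return the [pipeline:*] sections whose pipeline still lists the filter."""
    # interpolation=None: paste files may contain literal '%' characters.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(paste_ini_text)
    offenders = []
    for section in parser.sections():
        if not section.startswith("pipeline:"):
            continue
        stages = parser.get(section, "pipeline", fallback="").split()
        if filter_name in stages:
            offenders.append(section)
    return offenders


# Hypothetical excerpt of keystone-paste.ini, for illustration only.
SAMPLE = """
[filter:admin_token_auth]
use = egg:keystone#admin_token_auth

[pipeline:public_api]
pipeline = cors sizelimit admin_token_auth json_body public_service

[pipeline:api_v3]
pipeline = cors sizelimit json_body service_v3
"""

print(pipelines_with_admin_token(SAMPLE))  # ['pipeline:public_api']
```

In practice you would pass in the contents of /etc/keystone/keystone-paste.ini. An empty list means no pipeline references the filter.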

Fernet Key Rotation: The Multi-Node Race Condition

Fernet key rotation has a failure mode that is easy to miss and silently breaks authentication. In a multi-node Keystone deployment, rotation and distribution must stay in lock-step. If one node rotates again before its previous rotation has reached the other nodes, or if nodes rotate independently, tokens created with that node's new primary key fail validation on every node that does not yet hold the key.

The upstream Keystone docs state this directly: "If the rotation and distribution are not lock-step, a single keystone node in the deployment will create tokens with a primary key that no other node has as a staged key. This will cause tokens generated from one keystone node to fail validation on other keystone nodes."

The staged key (key 0) exists specifically to handle this window. It can decrypt tokens but is never used to create them. This means a node that has the staged key — but has not yet received the new primary — can still validate tokens created on another node, as long as distribution happens before the next rotation.

The correct sequence is:

1. Confirm all Keystone nodes have the same key repository.
2. Run keystone-manage fernet_rotate on one node.
3. Distribute the updated repository to all other nodes before the next rotation.
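The staged-key behavior is easier to see in a toy model. The sketch below is a simulation only: it uses no real cryptography, represents a token by the name of the key that encrypted it, and every function name is hypothetical. It assumes the repository layout described above (index 0 is the staged key, the highest index is the primary).

```python
from itertools import count

_key_ids = count(1)  # stand-in for random key material


def new_key():
    return f"key-{next(_key_ids)}"


def fernet_setup():
    """Initial repository: staged key at index 0, primary key at index 1."""
    return {0: new_key(), 1: new_key()}


def fernet_rotate(repo, max_active_keys=3):
    """Promote the staged key to primary and create a fresh staged key."""
    rotated = dict(repo)
    staged = rotated.pop(0)
    rotated[max(rotated) + 1] = staged       # staged key becomes the new primary
    rotated[0] = new_key()                   # fresh staged key
    while len(rotated) > max_active_keys:    # prune the oldest secondary key
        del rotated[min(i for i in rotated if i != 0)]
    return rotated


def issue_token(repo):
    """A token is represented only by the key that encrypted it (the primary)."""
    return repo[max(repo)]


def validates(repo, token):
    """Any key in the repository, the staged key included, can decrypt."""
    return token in repo.values()


node_a = fernet_setup()
node_b = dict(node_a)                              # step 1: repositories match

node_a = fernet_rotate(node_a)                     # step 2: rotate on one node
first = validates(node_b, issue_token(node_a))     # node_b holds this key as staged

node_a = fernet_rotate(node_a)                     # rotated again, distribution skipped
second = validates(node_b, issue_token(node_a))    # node_b has never seen this key

node_b = dict(node_a)                              # step 3, late: distribute
third = validates(node_b, issue_token(node_a))

print(first, second, third)  # True False True
```

The middle result is the failure mode: one missed distribution is absorbed by the staged key, a second rotation on top of it is not.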

The formula for setting max_active_keys is:

max_active_keys = (token_expiration_hours / rotation_frequency_hours) + 2

The two additional keys account for the staged key and a buffer. For example: 24-hour token validity with 6-hour rotation requires (24 / 6) + 2 = 6 active keys. Setting this too low means Keystone prunes secondary keys that are still needed to validate unexpired tokens, silently breaking authentication. Treat Fernet keys with the same care as SSL private keys. Any node joining the cluster must have the same key repository before it starts issuing or validating tokens.

If your deployment uses service token authentication (where services may need to validate expired tokens), adjust the formula to account for allow_expired_window: max_active_keys = ((token_expiration + allow_expired_window) / rotation_frequency) + 2.
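As a worked version of both formulas, here is a minimal helper. Rounding up when the lifetime is not an exact multiple of the rotation interval is an assumption of this sketch, as are the function name and the 48-hour window used in the second example.

```python
import math


def max_active_keys(token_expiration_hours, rotation_frequency_hours,
                    allow_expired_window_hours=0):
    """(lifetime / rotation frequency) + 2, covering the staged key and a buffer."""
    if rotation_frequency_hours <= 0:
        raise ValueError("rotation_frequency_hours must be positive")
    lifetime = token_expiration_hours + allow_expired_window_hours
    # Assumption: round up so a partial rotation interval still keeps its key.
    return math.ceil(lifetime / rotation_frequency_hours) + 2


print(max_active_keys(24, 6))       # 6, the example from the text
print(max_active_keys(24, 6, 48))   # 14, with an illustrative 48-hour window
```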

Credential Encryption

Since the Newton release, Keystone encrypts all credentials stored in the SQL backend using Fernet. This requires a separate key repository configured explicitly in keystone.conf:

[credential]
provider = fernet
key_repository = /etc/keystone/credential-keys/

This is a separate key repository from the token keys, with its own rotation lifecycle. If this section is absent from your configuration, confirm that keystone-manage credential_setup has been run and that keystone-manage credential_migrate has been completed after upgrades from older deployments. Do not infer plaintext storage merely from the absence of an explicit [credential] section in keystone.conf.

PCI-DSS Compliance via [security_compliance]

All PCI-DSS compliance controls in Keystone live in the [security_compliance] section of keystone.conf. The configurable parameters:

[security_compliance]
change_password_upon_first_use = true
disable_user_account_days_inactive = 90
lockout_duration = 1800
lockout_failure_attempts = 6
minimum_password_age = 1
password_expires_days = 90
password_regex = (?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*]).{8,}
password_regex_description = Must be 8+ chars with uppercase, lowercase, digit, and special character
unique_last_password_count = 5
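Before rolling out a password_regex, it is worth running candidate passwords through it. The sketch below applies the pattern above with Python's re.match, on the assumption that the pattern is matched from the start of the string; the sample passwords are made up.

```python
import re

# The password_regex value from the [security_compliance] block above.
PASSWORD_REGEX = r"(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*]).{8,}"


def password_ok(password):
    """True when the candidate satisfies every lookahead and the length rule."""
    return re.match(PASSWORD_REGEX, password) is not None


for candidate in ("Passw0rd!", "password", "Sh0rt!a", "NoSpecial123"):
    print(candidate, password_ok(candidate))
# Passw0rd! True
# password False
# Sh0rt!a False
# NoSpecial123 False
```

Keep password_regex_description in sync with whatever the pattern enforces, since that text is what users see when a password is rejected.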

The PCI-DSS mappings:

PCI-DSS 8.1.4 (disable inactive accounts within 90 days): disable_user_account_days_inactive = 90
PCI-DSS 8.1.6 (lock after no more than 6 failed attempts): lockout_failure_attempts = 6
PCI-DSS 8.1.7 (lockout minimum 30 minutes): lockout_duration = 1800 (seconds)
PCI-DSS 8.2.5 (no reuse of last 4 passwords): unique_last_password_count = 5
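The mappings above can be turned into a small configuration check. This is a sketch, not an audit tool: the thresholds are taken directly from the list above, the function name is hypothetical, and it only reads the [security_compliance] section from text you pass in.

```python
import configparser

# Thresholds mirror the mappings listed above.
PCI_CHECKS = {
    "disable_user_account_days_inactive": lambda v: 0 < v <= 90,
    "lockout_failure_attempts": lambda v: 0 < v <= 6,
    "lockout_duration": lambda v: v >= 1800,          # seconds
    "unique_last_password_count": lambda v: v >= 5,   # value used in the mapping
}


def failing_options(keystone_conf_text):
    """Return the options that are missing or outside the mapped thresholds."""
    # interpolation=None is needed: password_regex contains a literal '%'.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(keystone_conf_text)
    failures = []
    for option, is_ok in PCI_CHECKS.items():
        raw = parser.get("security_compliance", option, fallback=None)
        if raw is None or not is_ok(int(raw)):
            failures.append(option)
    return failures


SAMPLE = r"""
[security_compliance]
disable_user_account_days_inactive = 90
lockout_duration = 1800
lockout_failure_attempts = 6
password_regex = (?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*]).{8,}
unique_last_password_count = 5
"""

print(failing_options(SAMPLE))  # []
print(failing_options(SAMPLE.replace("lockout_duration = 1800",
                                     "lockout_duration = 600")))  # ['lockout_duration']
```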

Two caveats the upstream docs are explicit about. First, these controls apply only to Keystone's SQL identity backend. If you use LDAP, federated identity, or any non-SQL driver, PCI-DSS compliance for authentication is entirely the responsibility of that external system — OpenStack cannot enforce it. Second, in most HA deployments, TLS is terminated at the public endpoint, and traffic between the load balancer and backend Keystone nodes on the private network may be unencrypted. If your private network is considered at risk, the load balancer must be configured for TLS on the internal network. OpenStack does not manage this; your deployment tooling does.

Service Accounts Must Not Lock Out

Account lockout is correct policy for users. For service accounts it can take down services. If a service user gets locked out of Keystone, the corresponding service stops working.

Exclude service accounts via the CLI:

openstack user set --ignore-lockout-failure-attempts <service-user-id>

Or via the REST API:

curl -X PATCH \
-H "X-Auth-Token: $TOKEN" \
-H "Content-Type: application/json" \
-d '{"user": {"options": {"ignore_lockout_failure_attempts": true}}}' \
https://keystone.example.com/v3/users/<user_id>

Apply this to all service users before enabling lockout globally.
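For more than a handful of service users, the same PATCH can be generated in a loop. The sketch below only builds the requests and does not send them; the endpoint URL matches the curl example above, while the user IDs, the token value, and the function name are placeholders.

```python
import json
import urllib.request


def ignore_lockout_request(keystone_url, user_id, token):
    """Build (but do not send) the PATCH that exempts one user from lockout."""
    body = {"user": {"options": {"ignore_lockout_failure_attempts": True}}}
    return urllib.request.Request(
        url=f"{keystone_url.rstrip('/')}/v3/users/{user_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="PATCH",
    )


# Placeholder IDs: substitute the IDs of your nova, neutron, cinder... service users.
SERVICE_USER_IDS = ["abc123", "def456"]

requests_to_send = [
    ignore_lockout_request("https://keystone.example.com", uid, "PLACEHOLDER-TOKEN")
    for uid in SERVICE_USER_IDS
]
for req in requests_to_send:
    print(req.get_method(), req.full_url)
```

Sending each request with urllib.request.urlopen(req) is left to the caller, so the list can be reviewed before anything changes in Keystone.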

Policy Changes Take Effect Immediately

Changes to the policy file (policy.yaml on current releases, policy.json on older ones) do not require a service restart. They take effect the moment the file is saved. Test policy changes thoroughly in staging first.

Neutron

OVN vs. iptables: What Actually Changed

Historically, OpenStack used a Linux bridge between each instance and the OVS integration bridge br-int because OVS could not interact directly with iptables. Security group rules lived in iptables on that intermediate bridge. Every VM had its own bridge with its own iptables chain.

OVN replaces this entirely. It implements security group rules as OpenFlow flows evaluated in kernel space, eliminating the Linux bridge and iptables dependency. Beyond simplifying the architecture, OVN introduces Port Groups: instead of creating separate ACL flows for every port in a security group, OVN groups ports with identical security group membership and applies one set of rules to the group. A security group shared by 100 instances creates one ACL set instead of 100. The performance and scalability improvement is significant and increases as VM count grows.

conntrack: Tune Before You Need To

Neutron security groups are stateful. Inbound TCP on port 443 automatically allows the corresponding response traffic without a separate egress rule. The underlying mechanism is Linux connection tracking (conntrack), and it has limits.

The default conntrack table can be exhausted on high-throughput compute nodes handling large numbers of concurrent connections, causing new connections to fail. Check your current state:

sudo conntrack -C # current entry count
sudo sysctl net.netfilter.nf_conntrack_max # current limit
sudo sysctl -w net.netfilter.nf_conntrack_max=262144 # increase if needed

Set this in /etc/sysctl.d/ to persist across reboots, and monitor table utilization as part of your standard compute node metrics.
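For monitoring, the two numbers above reduce to a single utilization ratio. A minimal sketch follows, assuming the counters are exposed under /proc/sys/net/netfilter (which requires the nf_conntrack module to be loaded); the 80% alert threshold and the function names are arbitrary choices for illustration.

```python
from pathlib import Path

PROC_DIR = Path("/proc/sys/net/netfilter")


def conntrack_utilization(count, limit):
    """Fraction of the conntrack table in use, between 0.0 and 1.0."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return count / limit


def read_conntrack(proc_dir=PROC_DIR):
    """Read (current entries, table limit) from procfs on a compute node."""
    count = int((proc_dir / "nf_conntrack_count").read_text())
    limit = int((proc_dir / "nf_conntrack_max").read_text())
    return count, limit


def should_alert(count, limit, threshold=0.8):
    """True once utilization reaches the (assumed) alert threshold."""
    return conntrack_utilization(count, limit) >= threshold


print(conntrack_utilization(131072, 262144))  # 0.5
print(should_alert(230000, 262144))           # True
```

On a compute node, should_alert(*read_conntrack()) gives a yes/no answer that can feed an existing metrics exporter.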

Stateless Security Groups (OVN, requires OVN >= 21.06)

OVN deployments can use stateless security groups, which bypass connection tracking entirely. Support for the allow-stateless ACL action was added in OVN 21.06. On deployments running OVN older than 21.06, stateless security groups are not supported. The allow_stateless_action_supported configuration option that previously controlled this was removed in the 2025.1 (Epoxy) release — the 2023.1 (Antelope) release notes deprecated it for removal. On any supported release, stateless security groups work if your OVN version meets the minimum.

The tradeoff is explicit: stateless groups do not automatically allow return traffic. A rule allowing outbound TCP requires a corresponding rule explicitly allowing inbound replies. Stateless mode is also the only viable option when offloading OpenFlow actions to hardware.

For DPDK-based deployments, stateless NAT for floating IPs is available via [ovn] stateless_nat_enabled in ml2_conf.ini. It is disabled by default. Enabling it avoids conntrack OVN actions for floating IP traffic. The option lives in ml2_conf.ini because it is a configuration option for the ML2/OVN mechanism driver, not for Neutron's core service.

Egress Filtering: The Half That Gets Skipped

The default Neutron security group allows all egress traffic. Most operators tighten ingress rules carefully and leave egress entirely open. Egress filtering is what prevents a compromised instance from initiating unauthorized outbound connections. Default-deny egress with explicit rules for traffic you expect is the correct model, not the exception.

New RBAC Defaults Require Opt-In

Neutron's new secure RBAC defaults, including a service role for port policies, are not enabled automatically for Neutron 2023.1 and older deployments. To enable them:

[oslo_policy]
enforce_new_defaults = true

For newer OpenStack releases, check the release-specific Neutron policy defaults rather than assuming these options are opt-in (the behavior may differ from what is described here).

One important constraint: setting enforce_scope = true causes 403 Forbidden responses for any API call made with a system-scoped token, because all Neutron APIs are currently project-scoped. Do not enable scope enforcement in Neutron until you have confirmed that nothing in your deployment calls Neutron with a system-scoped token.

Nova

Live Migration TLS: QEMU-Native Is What You Actually Want

The live_migration_tunnelled option has two significant limitations the upstream Nova docs acknowledge directly: it cannot handle block migration (live migration with non-shared storage), and it has substantial performance overhead due to increased data copying on both source and destination hosts.

QEMU-native TLS solves both problems. It encrypts all migration streams (guest RAM, device state, and disk data over NBD for block migration) with significantly lower overhead. It requires libvirt 4.4.0 or later and QEMU 2.11 or later.

On every compute node, add to /etc/libvirt/qemu.conf:

default_tls_x509_cert_dir = "/etc/pki/qemu"
default_tls_x509_verify = 1

Setting both default_tls_x509_cert_dir and default_tls_x509_verify means there is no need to specify any of the other individual _tls config options. Then in nova.conf:

[libvirt]
live_migration_with_native_tls = true
live_migration_scheme = tls

Both lines are required. Omitting either one produces a silent failure: nothing indicates that migrations are running unencrypted.

Ensure TCP ports 16514 and 49152–49215 are open between compute nodes.
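Because a missing option fails silently, it is worth checking nova.conf and the port range explicitly. The sketch below is one way to do that under simple assumptions: the two option names and the port numbers come from the text above, while the function names and the two-second timeout are illustrative.

```python
import configparser
import socket

# Ports named above: 16514 plus 49152 through 49215 inclusive.
MIGRATION_PORTS = [16514, *range(49152, 49216)]


def native_tls_problems(nova_conf_text):
    """Return a list of reasons the [libvirt] section is not set for native TLS."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    problems = []
    if not parser.getboolean("libvirt", "live_migration_with_native_tls",
                             fallback=False):
        problems.append("live_migration_with_native_tls is not true")
    if parser.get("libvirt", "live_migration_scheme", fallback="") != "tls":
        problems.append("live_migration_scheme is not tls")
    return problems


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


GOOD = "[libvirt]\nlive_migration_with_native_tls = true\nlive_migration_scheme = tls\n"
PARTIAL = "[libvirt]\nlive_migration_with_native_tls = true\n"

print(native_tls_problems(GOOD))     # []
print(native_tls_problems(PARTIAL))  # ['live_migration_scheme is not tls']
```

Note that the ports in the upper range only accept connections while a migration is listening on them, so port_open is most useful against 16514 or during a test migration.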

Note on VNC consoles: VNC settings allow clients from any IP address to connect to instance consoles. When hardening compute hosts, restrict VNC access to trusted networks or protect it with firewalls independently.

MDS Vulnerabilities: Explicit CPU Flag Exposure

The MDS vulnerabilities (RIDL, Fallout, ZombieLoad, disclosed May 2019) affect Intel x86_64 CPUs and have a specific OpenStack mitigation path.

With cpu_mode=host-model (the default when virt_type=kvm or virt_type=qemu), the md-clear CPU flag passes through to guests automatically. The same applies to cpu_mode=host-passthrough.

With cpu_mode=custom, you must explicitly add md-clear along with flags for prior vulnerabilities. The Nova docs example:

[libvirt]
cpu_mode = custom
cpu_models = IvyBridge
cpu_model_extra_flags = spec-ctrl,ssbd,md-clear
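A configuration check for this is short. The sketch below treats the three flags from the example above as the required set and only inspects deployments that use cpu_mode = custom; the function name is hypothetical, and a flag written with a leading "-" (disabled) is deliberately reported as missing.

```python
import configparser

REQUIRED_FLAGS = {"spec-ctrl", "ssbd", "md-clear"}


def missing_cpu_flags(nova_conf_text):
    """Return required flags absent from cpu_model_extra_flags (custom mode only)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    if parser.get("libvirt", "cpu_mode", fallback="") != "custom":
        return set()  # host-model and host-passthrough are not checked here
    raw = parser.get("libvirt", "cpu_model_extra_flags", fallback="")
    # Accept an optional leading '+'; a '-' prefix stays and will not match.
    present = {f.strip().lower().lstrip("+") for f in raw.split(",") if f.strip()}
    return REQUIRED_FLAGS - present


COMPLETE = """
[libvirt]
cpu_mode = custom
cpu_models = IvyBridge
cpu_model_extra_flags = spec-ctrl,ssbd,md-clear
"""

print(missing_cpu_flags(COMPLETE))                          # set()
print(missing_cpu_flags(COMPLETE.replace(",md-clear", "")))  # {'md-clear'}
```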

After updating all vulnerable compute nodes, running guests must be fully powered down and cold-booted (an explicit stop followed by a start) to activate the new CPU model. A live migration is not sufficient. Validate the mitigation is active on the host:

cat /sys/devices/system/cpu/vulnerabilities/mds

If the output includes "SMT vulnerable", Hyper-Threading may still expose you, depending on workload. For multi-tenant deployments running untrusted workloads, review whether disabling Hyper-Threading is warranted.
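To check many hosts, the sysfs string can be interpreted programmatically. This is a minimal sketch that looks only for the substrings the kernel reports in that file; the function name and the returned keys are illustrative.

```python
def interpret_mds(status):
    """Summarize the contents of /sys/devices/system/cpu/vulnerabilities/mds."""
    status = status.strip()
    return {
        "not_affected": status == "Not affected",
        "mitigated": status.startswith("Mitigation:"),
        "smt_vulnerable": "SMT vulnerable" in status,
    }


print(interpret_mds("Mitigation: Clear CPU buffers; SMT vulnerable\n"))
# {'not_affected': False, 'mitigated': True, 'smt_vulnerable': True}
```

A host reporting mitigated and smt_vulnerable together is the case described above: the buffer-clearing mitigation is active, but sibling hyperthreads remain a consideration.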

Barbican

The Simple Crypto Plugin Is Not Production-Safe

The default Barbican configuration uses the simple crypto plugin, which encrypts all secrets with a single symmetric key stored in plaintext in barbican.conf. A single compromised key exposes every secret for every tenant in the database, and key rotation requires re-encrypting all stored secrets. The simple crypto plugin is appropriate for development only.

PKCS#11 and the MKEK Key Hierarchy

The production configuration for HSM-backed deployments uses a three-tier key hierarchy: a Master KEK (MKEK) stored in and never extracted from the HSM, per-project wrapped KEKs stored encrypted in the Barbican database, and per-secret encrypted blobs also stored in the database.

The MKEK never leaves the HSM. All wrapping and unwrapping operations for project KEKs happen within the HSM's memory. Different tenants have different per-project KEKs, so a compromised pKEK for one tenant does not expose others. When the MKEK is rotated, only the project KEKs need to be re-wrapped — not every stored secret.

The MKEK model also solves an HSM capacity problem. The naive approach creates one KEK per project directly on the HSM. HSMs have limited storage and in a multi-tenant deployment this will eventually fail to create KEKs for new projects. The MKEK model keeps a minimum number of keys on the HSM while still maintaining per-project encryption.

MKEK Rotation Procedure

Rotation uses barbican-manage. Perform steps in order:

1. Generate a new MKEK on the HSM:

barbican-manage hsm gen_mkek \
--library-path /path/to/pkcs11.so \
--passphrase <hsm-pin> \
--slot-id 1 \
--label <unique-mkek-label> \
--length 32

2. Generate a new HMAC key:

barbican-manage hsm gen_hmac \
--library-path /path/to/pkcs11.so \
--passphrase <hsm-pin> \
--slot-id 1 \
--label <unique-hmac-label> \
--length 32

3. Update /etc/barbican/barbican.conf with the new labels and restart Barbican.

4. Rewrap all project KEKs:

barbican-manage hsm rewrap_pkek

The upstream docs note that both the new MKEK and HMAC key must already be generated, their labels set in barbican.conf, and Barbican restarted before running rewrap_pkek. The --dry-run flag is available to preview the operation without committing changes.

Cinder LUKS Volume Encryption

Creating an encrypted volume type, as shown in current upstream Cinder documentation:

openstack volume type create \
--encryption-provider luks \
--encryption-cipher aes-xts-plain64 \
--encryption-key-size 256 \
--encryption-control-location front-end LUKS

The Cinder docs are explicit about access control: non-admin users need the creator role to store secrets in Barbican and to create encrypted volumes. Grant it:

openstack role add --project PROJECT --user USER creator

If migrating from the legacy ConfKeyManager (fixed key stored in configuration files), do not remove the fixed_key value from nova.conf and cinder.conf until you have verified no volumes still depend on it — volumes encrypted with the fixed key will become inaccessible if it is removed prematurely.

Atmosphere-Specific Considerations

Ingress Controller: Patched CVEs and Active EOL

Atmosphere upgraded the nginx ingress controller from 1.10.1 to 1.12.1 to address CVE-2025-1097, CVE-2025-1098, CVE-2025-1974, CVE-2025-24513, and CVE-2025-24514. These CVEs were patched in both v1.11.5 and v1.12.1. If you are running any version below 1.11.5, upgrading to a patched version is an immediate priority, but treat it as short-term risk reduction only.
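The two patched releases translate into a simple version check. The sketch below infers that 1.12.0 is unpatched from the fact that the fix landed in 1.12.1; that inference, the function names, and the assumption of plain numeric version strings (no pre-release suffixes) are all this example's own.

```python
def parse_version(text):
    """Turn '1.11.5' or 'v1.11.5' into a comparable tuple of integers."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def has_listed_cve_fixes(version):
    """True for 1.11.5 and later on the 1.11 line, or 1.12.1 and later."""
    v = parse_version(version)
    if v >= (1, 12, 1):
        return True
    return (1, 11, 5) <= v < (1, 12, 0)


for candidate in ("1.10.1", "v1.11.5", "1.12.0", "1.12.1"):
    print(candidate, has_listed_cve_fixes(candidate))
# 1.10.1 False
# v1.11.5 True
# 1.12.0 False
# 1.12.1 True
```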

The Kubernetes ingress-nginx controller reached end-of-life on March 24, 2026. The repository is now read-only. There will be no new features, no bug fixes, and no further CVE patches. The Ingress API itself (networking.k8s.io/v1) is not deprecated; only this controller implementation is. EOL software in the L7 data path is an automatic finding in SOC 2, PCI-DSS, ISO 27001, and HIPAA audits. Migration to a supported ingress controller is the required long-term remediation.

Container Image Provenance

All container images in Atmosphere now use external repositories with independent versioning per component. Images for OVN, Open vSwitch, libvirt, and all OpenStack services have dedicated repositories with independent release cycles. This removes image build infrastructure from Atmosphere itself and allows precise tracking of which version of each component is running.

Monitoring Ships by Default

Prometheus monitoring, Grafana dashboards, log aggregation, and vulnerability scanning are included in every Atmosphere deployment. Security observability is built in, not a separate configuration step.

Where to Start

For teams auditing an existing deployment, the priority order based on blast radius:

1. Confirm the Admin Token is disabled in keystone-paste.ini.
2. Verify Fernet key distribution happens before rotation on multi-node deployments.
3. Confirm service accounts have ignore_lockout_failure_attempts = True.
4. Verify Barbican is not using the simple crypto plugin for production secrets.
5. Check that users creating encrypted volumes have the creator role in Barbican.
6. Review default egress security group rules.
7. Check conntrack table utilization on high-throughput compute nodes.
8. If live migration is in use, verify QEMU-native TLS is configured.
9. For Intel-based deployments with custom CPU mode, confirm spec-ctrl,ssbd,md-clear are in cpu_model_extra_flags.

VEXXHOST has been contributing to the OpenStack community since 2011 and operates production clouds across the OpenStack, Kubernetes, and Ceph ecosystems. We build Atmosphere, an open-source private cloud platform. We are a Gold Member of the OpenInfra Foundation. If you are working through a security review of your OpenStack deployment, we are happy to discuss what we have seen and what has worked.
