Linux 7.0 Released

Linux v7.0 was released a few hours ago on Sunday, April 12th. Unfortunately, due to my personal schedule and some other challenges, I never had a chance to write up my usual summary of the LSM, SELinux, and audit highlights from the v7.0 merge window; thankfully LWN.net did their usual good job of summarizing both the first and second weeks of the merge window.

Below is a list of the LSM, SELinux, and audit highlights from the Linux v7.0 merge window through to the tagged release from Linus.

LSM

  • Unified the security_inode_listsecurity() calls in NFSv4. While looking at the security_inode_listsecurity() LSM hook with the goal of improving the API, we realized that the NFSv4 code was making multiple calls to the LSM hook that could be consolidated into one. While this change improves the quality of the NFSv4 code, it will also enable additional future work to improve the LSM API.

  • Moved from kmalloc() to kmalloc_obj() and kmalloc_flex() as part of a tree-wide conversion that is described in this article from LWN.net.

  • Resolved a number of Sparse warnings caused by the LSM static branch keys not being marked as static.

  • Added __rust_helper annotations to the LSM and credential Rust wrapper functions.

  • Removed the unused set_security_override_from_ctx() function.

  • Minor improvements to the LSM hook kdoc comment blocks.
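
The kmalloc_obj() and kmalloc_flex() conversion mentioned in the list above replaces open-coded sizeof arithmetic with helpers that derive the allocation size from the destination type. The following is a minimal userspace sketch of that idea only. The alloc_obj() and alloc_flex() macros, the flex_size() and flex_alloc() helpers, and both structures are hypothetical stand-ins built on calloc(); they are not the kernel helpers, and their argument order is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Size of a struct plus n trailing elements, or 0 if it would overflow. */
static size_t flex_size(size_t base, size_t elem, size_t n)
{
	if (elem != 0 && n > (SIZE_MAX - base) / elem)
		return 0;
	return base + elem * n;
}

/* Zeroed allocation; a size of 0 signals an overflow and yields NULL. */
static void *flex_alloc(size_t size)
{
	return size ? calloc(1, size) : NULL;
}

/*
 * Hypothetical userspace stand-ins, NOT the kernel's kmalloc_obj() and
 * kmalloc_flex(): the size comes from the pointer's own type, so the
 * caller never writes sizeof() by hand.
 */
#define alloc_obj(ptr) \
	((ptr) = calloc(1, sizeof(*(ptr))))
#define alloc_flex(ptr, member, n) \
	((ptr) = flex_alloc(flex_size(sizeof(*(ptr)), \
				      sizeof((ptr)->member[0]), (n))))

/* Hypothetical fixed-size object. */
struct hook_state {
	int enabled;
};

/* Hypothetical object ending in a flexible array member. */
struct label_set {
	size_t count;
	uint32_t sids[];
};

static struct hook_state *hook_state_new(void)
{
	struct hook_state *st;

	return alloc_obj(st);
}

static struct label_set *label_set_new(size_t n)
{
	struct label_set *set;

	if (!alloc_flex(set, sids, n))
		return NULL;
	set->count = n;
	return set;
}
```

Calling label_set_new(4) returns a zeroed structure with room for four sids entries, while an element count large enough to overflow the size calculation returns NULL instead of a short allocation.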

SELinux

  • Added support for applying SELinux policy to BPF tokens. This involves the addition of two new permissions to the bpf object class, map_create_as and prog_load_as, as well as a new policy capability, bpf_token_perms, to enable the new functionality. The patch author, Eric Suen, describes the change in his patch description:

    This patch adds SELinux support for controlling BPF token access. With this change, SELinux policies can now enforce constraints on BPF token usage based on both the delegating (privileged) process and the recipient (unprivileged) process.

    Supported operations currently include:

    • map_create
    • prog_load

    High-level workflow:

    1. An unprivileged process creates a VFS context via fsopen() and obtains a file descriptor.
    2. This descriptor is passed to a privileged process, which configures BPF token delegation options and mounts a BPF filesystem.
    3. SELinux records the creator_sid of the privileged process during mount setup.
    4. The unprivileged process then uses this BPF fs mount to create a token and attach it to subsequent BPF syscalls.
    5. During verification of map_create and prog_load, SELinux uses creator_sid and the current SID to check policy permissions via:
      avc_has_perm(creator_sid, current_sid, SECCLASS_BPF,
              BPF__MAP_CREATE, NULL);
      

    The implementation introduces two new permissions:

    • map_create_as
    • prog_load_as

    At token creation time, SELinux verifies that the current process has the appropriate *_as permission (depending on the allowed_cmds value in the bpf_token) to act on behalf of the creator_sid.

    Example SELinux policy:

    allow test_bpf_t self:bpf {
      map_create map_read map_write prog_load prog_run
      map_create_as prog_load_as
    };
    

    Additionally, a new policy capability bpf_token_perms is added to ensure backward compatibility. If disabled, previous behavior (checks based on current process SID) is preserved.

  • As described earlier in this post, converted a number of kmalloc() calls to kmalloc_obj() and kmalloc_flex() as part of a larger tree-wide conversion.

  • Removed a BUG() macro call that was no longer necessary as the error condition is now checked at kernel build time.
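
The two BPF token checks quoted in the first item of this list can be modeled with a toy policy table. This is a userspace sketch only: every type and function name below is hypothetical, the real checks go through avc_has_perm() inside the kernel, and the source/target order used for the *_as check and the behavior of token creation when bpf_token_perms is disabled are assumptions of this model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Permission bits modeled on the bpf object class names in the quote. */
enum bpf_perm {
	PERM_MAP_CREATE    = 1 << 0,
	PERM_PROG_LOAD     = 1 << 1,
	PERM_MAP_CREATE_AS = 1 << 2,
	PERM_PROG_LOAD_AS  = 1 << 3,
};

/* Hypothetical "allow ssid tsid:bpf perms" rule. */
struct allow_rule {
	uint32_t ssid;
	uint32_t tsid;
	uint32_t perms;
};

struct toy_policy {
	const struct allow_rule *rules;
	size_t nrules;
	bool bpf_token_perms;	/* models the policy capability */
};

struct toy_token {
	uint32_t creator_sid;	/* recorded at mount setup */
	uint32_t allowed_cmds;	/* PERM_MAP_CREATE and/or PERM_PROG_LOAD */
};

/* True if a single rule grants every bit in perm from ssid to tsid. */
static bool toy_has_perm(const struct toy_policy *p, uint32_t ssid,
			 uint32_t tsid, uint32_t perm)
{
	for (size_t i = 0; i < p->nrules; i++) {
		const struct allow_rule *r = &p->rules[i];

		if (r->ssid == ssid && r->tsid == tsid &&
		    (r->perms & perm) == perm)
			return true;
	}
	return false;
}

/* Token creation: current needs a *_as permission per delegated command. */
static bool toy_token_create(const struct toy_policy *p, uint32_t current_sid,
			     const struct toy_token *tok)
{
	uint32_t need = 0;

	if (!p->bpf_token_perms)
		return true;	/* assumption: *_as checks are skipped */
	if (tok->allowed_cmds & PERM_MAP_CREATE)
		need |= PERM_MAP_CREATE_AS;
	if (tok->allowed_cmds & PERM_PROG_LOAD)
		need |= PERM_PROG_LOAD_AS;
	if (!need)
		return true;
	return toy_has_perm(p, current_sid, tok->creator_sid, need);
}

/* Command time: check creator_sid against the current SID, as quoted. */
static bool toy_token_cmd(const struct toy_policy *p, uint32_t current_sid,
			  const struct toy_token *tok, uint32_t cmd)
{
	if (!(tok->allowed_cmds & cmd))
		return false;
	if (!p->bpf_token_perms)
		return toy_has_perm(p, current_sid, current_sid, cmd);
	return toy_has_perm(p, tok->creator_sid, current_sid, cmd);
}
```

With a rule granting map_create from SID 1 to SID 2 and map_create_as from SID 2 to SID 1, a token created by SID 1 that delegates only map_create passes both checks for SID 2, while a token that also delegates prog_load is refused at creation time.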

Audit

  • Add source and destination port information to the NETFILTER_PKT audit records while consolidating much of the netfilter packet audit code into a new function which can be easily disabled when audit is not enabled at kernel build time. These changes should not only improve the usefulness of the audit logs through network port information, but should also provide a minor performance boost for systems built without audit.

  • Update the audit syscall classifier code to include the listxattrat(), getxattrat(), and fchmodat2() syscalls.

  • As described earlier in this post, convert a number of kmalloc() calls to kmalloc_obj() and kmalloc_flex() as part of a larger tree-wide conversion.

  • A number of small, internal changes to how audit tracks and records pathnames brought about by some related work in the VFS subsystem. There should be no user visible changes.

  • Move a handful of declarations in the code to resolve a number of Sparse warnings.
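
The NETFILTER_PKT item at the top of this list relies on a common C pattern: put the logging in one function and provide an empty inline stub when the feature is compiled out, so callers need no conditionals of their own. The sketch below shows the pattern in userspace. DEMO_AUDIT is a hypothetical stand-in for the kernel build option, and the structure, function name, and record field names are illustrative assumptions rather than the kernel's actual code or record format.

```c
#include <stdint.h>
#include <stdio.h>

#ifndef DEMO_AUDIT
#define DEMO_AUDIT 1	/* hypothetical stand-in for the audit build option */
#endif

/* Hypothetical, simplified view of a packet. */
struct demo_pkt {
	uint8_t proto;
	uint16_t sport;
	uint16_t dport;
};

#if DEMO_AUDIT
/* Audit enabled: format the packet, including both ports, into buf. */
static int demo_audit_pkt(char *buf, size_t len, const struct demo_pkt *pkt)
{
	return snprintf(buf, len, "proto=%u sport=%u dport=%u",
			(unsigned int)pkt->proto,
			(unsigned int)pkt->sport,
			(unsigned int)pkt->dport);
}
#else
/* Audit disabled: an empty stub the compiler can optimize away. */
static inline int demo_audit_pkt(char *buf, size_t len,
				 const struct demo_pkt *pkt)
{
	(void)pkt;
	if (len)
		buf[0] = '\0';
	return 0;
}
#endif
```

Callers invoke demo_audit_pkt() unconditionally; building with -DDEMO_AUDIT=0 selects the stub and removes the formatting work from the packet path.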

Linux 6.19 Released

With Linux v6.19 being released on Sunday, February 8th, this post is much later than usual. However, there were only a few small LSM and SELinux changes beyond what was mentioned in my post about the merge window changes; the highlights are below.

LSM

  • The LSM initialization rework merged during the Linux v6.19 merge window introduced a regression causing the procfs file “/proc/sys/vm/mmap_min_addr” to disappear when CONFIG_SECURITY was not enabled at compile time. The final release of Linux v6.19 fixes this problem by ensuring that “/proc/sys/vm/mmap_min_addr” is present regardless of the CONFIG_SECURITY configuration.

  • There were a number of small changes made to securityfs as part of a much larger VFS effort. These changes were focused on implementation improvements and should not result in any user visible changes.

SELinux

  • Much like the securityfs changes mentioned above, there were a number of similar, small changes made to selinuxfs. Once again, none of these changes should be visible to users.

Linux 6.19 Merge Window

Linux v6.18 was released on Sunday, November 30th, with the Linux v6.19 merge window opening immediately afterwards. Below are the highlights of the LSM, SELinux, and audit pull requests which have been merged into Linus’ tree.

LSM

  • The LSM initialization code was heavily reworked to improve code quality, avoid unnecessary work related to LSMs that are disabled at boot time, and provide support for an LSM notification that indicates that all enabled LSMs have been fully initialized. The LSM_STARTED_ALL notification is currently unused, but work is in progress which makes use of this notification to measure the IPE boot policy once all of the LSMs have been fully initialized and started.

  • The device_cgroup code was updated to make better use of the seq_put*() helper functions. This is purely a code quality improvement; there should be no visible user impact.

SELinux

  • Traditionally memfd files were labeled as either tmpfs or hugetlbfs files depending on the system’s configuration. While this was simple, and aligned well with the memfd implementation, it made it difficult to differentiate between memfd files and other tmpfs/hugetlbfs files. In order to resolve this, a new policy capability was created, “memfd_class”, which, when enabled, adds a new object class for memfd files, memfd_file. The new object class enables policy developers to write policy specifically for memfd files without impacting other tmpfs or hugetlbfs files. As the patch developer, Thiébaud Weksteen, pointed out in the commit description, this is of particular interest when execution of memfds is attempted:

    The ability to limit fexecve on memfd has been of interest to avoid potential pitfalls where /proc/self/exe or similar would be executed (see ChromeOS Issue and memfd exec protections). Reuse the “execute_no_trans” and “entrypoint” access vectors, similarly to the file class. These access vectors may not make sense for the existing “anon_inode” class. Therefore, define and assign a new class “memfd_file” to support such access vectors.

  • A new build time configuration has been introduced, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS, which allows adjustment of the number of hash buckets in the SELinux Access Vector Cache (AVC). The default value is set to 9 bits, resulting in 512 hash buckets. Users with unusual workloads or non-typical SELinux policies may want to experiment with this value.

  • The SELinux Access Vector Cache (AVC) moved from a custom hash function to the MurmurHash3 hash, resulting in improvements in hash distribution and latency.
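
The two AVC items above fit together: a hash of the lookup key is reduced to a bucket index by keeping the low HASH_BITS bits. The sketch below applies the standard 32-bit MurmurHash3 mixing and finalization steps to a (source SID, target SID, class) triple. It is a userspace illustration, not the kernel's AVC hash function; the function names, the zero seed, and the order in which the three values are mixed are assumptions.

```c
#include <stdint.h>

#define DEMO_AVC_HASH_BITS 9	/* mirrors the default described above */
#define DEMO_AVC_SLOTS (1u << DEMO_AVC_HASH_BITS)

static uint32_t rotl32(uint32_t x, unsigned int r)
{
	return (x << r) | (x >> (32 - r));
}

/* MurmurHash3 (32-bit) step that folds one 32-bit word into the state. */
static uint32_t murmur3_mix(uint32_t h, uint32_t k)
{
	k *= 0xcc9e2d51u;
	k = rotl32(k, 15);
	k *= 0x1b873593u;
	h ^= k;
	h = rotl32(h, 13);
	return h * 5u + 0xe6546b64u;
}

/* MurmurHash3 (32-bit) finalizer: spreads entropy across all bits. */
static uint32_t murmur3_fmix(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Hypothetical bucket lookup: hash the triple, keep the low bits. */
static uint32_t demo_avc_bucket(uint32_t ssid, uint32_t tsid, uint16_t tclass)
{
	uint32_t h = 0;	/* seed of 0 is an assumption */

	h = murmur3_mix(h, ssid);
	h = murmur3_mix(h, tsid);
	h = murmur3_mix(h, tclass);
	h ^= 12;	/* input length in bytes: three 32-bit words */
	return murmur3_fmix(h) & (DEMO_AVC_SLOTS - 1);
}
```

Raising DEMO_AVC_HASH_BITS by one doubles the number of buckets, which shortens the average chain for a given number of cached entries at the cost of a larger table.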

Audit

  • The __audit_inode_child() function loops over the list of logged inodes twice, first to search for a parent inode, and then again to search for a potential match for the child inode. Linux v6.19 will consolidate these two loops into a single loop that searches for a matching parent and child inode at the same time, resulting in approximately a 50% reduction in audit overhead.
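
The loop consolidation described above can be illustrated with a simplified model. The records, the matching rule (type plus inode number), and all names below are hypothetical; the real __audit_inode_child() matches on more fields and walks a kernel list rather than an array. The sketch only shows how two sequential searches collapse into a single pass that returns the same results.

```c
#include <stddef.h>

enum rec_type { REC_PARENT, REC_CHILD };

/* Hypothetical, simplified stand-in for a logged inode record. */
struct name_rec {
	enum rec_type type;
	unsigned long ino;
};

struct match {
	const struct name_rec *parent;
	const struct name_rec *child;
};

/* Original shape: one full walk for the parent, a second for the child. */
static struct match find_two_pass(const struct name_rec *recs, size_t n,
				  unsigned long parent_ino,
				  unsigned long child_ino)
{
	struct match m = { NULL, NULL };

	for (size_t i = 0; i < n; i++) {
		if (recs[i].type == REC_PARENT && recs[i].ino == parent_ino) {
			m.parent = &recs[i];
			break;
		}
	}
	for (size_t i = 0; i < n; i++) {
		if (recs[i].type == REC_CHILD && recs[i].ino == child_ino) {
			m.child = &recs[i];
			break;
		}
	}
	return m;
}

/* Consolidated shape: one walk that stops once both have been found. */
static struct match find_one_pass(const struct name_rec *recs, size_t n,
				  unsigned long parent_ino,
				  unsigned long child_ino)
{
	struct match m = { NULL, NULL };

	for (size_t i = 0; i < n && !(m.parent && m.child); i++) {
		const struct name_rec *r = &recs[i];

		if (!m.parent && r->type == REC_PARENT && r->ino == parent_ino)
			m.parent = r;
		else if (!m.child && r->type == REC_CHILD && r->ino == child_ino)
			m.child = r;
	}
	return m;
}
```

Both functions return the first matching parent and the first matching child, so the single-pass version is a drop-in replacement in this model while touching each record at most once.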