Memory safety guideline
Motivation
Tempesta FW, like some of our clients’ other projects, is mission-critical software working at the Internet edge. Security and reliability are its key properties.
The CISA Product Security Bad Practices document calls for a memory safety roadmap. The Case for Memory Safe Roadmaps suggests particular steps and technologies to make C and C++ code safe(r). This document provides safe and secure coding guidelines for C and C++.
This guideline is mandatory for all projects that process untrusted user input. Tempesta FW and TLS are examples of such code; Tempesta DB and the user-space logger aren’t.
This page is incomplete and is expected to be extended and/or corrected in almost all sections.
General practices
This section describes programming practices common to both Linux kernel C and C++.
Use address sanitizers
Use KASAN for the Linux kernel or Clang AddressSanitizer for C or C++ user-space code. The Clang address sanitizer doesn’t reveal as many issues as valgrind.
These tools, especially valgrind, impose significant performance overhead, so they must be run with long-running test suites, not enabled in production builds.
Use static analyzers
Static analyzers must also be integrated into CI. We have good experience with Coverity Scan, which is free for open source projects, and cppcheck. In our experience, the Clang static analyzer misses too many code problems.
Code coverage
gcov(1) can be used for both the Linux kernel and user-space code to measure test coverage.
The Case for Memory Safe Roadmaps recommends 80% coverage, but the absolute value doesn’t mean much in practice. If the coverage is 80%, we should analyze the uncovered 20%: it may contain only trivial code (e.g. wrappers). On the other hand, even with coverage as high as 95%, the remaining 5% may contain a crucial piece of code.
Fuzzing
We aim for deterministic fuzzing, which mutates some data corpus, possibly indefinitely. The mutations are non-random and obey particular rules, so that the tested system doesn’t reject the inputs early, always at the same check. For example, if we generated HTTP requests randomly, the tested HTTP server would reject most of them at the first character of the method, since only a small set of strings is allowed as an HTTP method.
Also, if a deterministic fuzzer discovers a problem, it can be run again to reproduce the problem.
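A minimal sketch of such a rule-based, reproducible mutation step is shown below. It assumes one possible way to get determinism: a PRNG with a fixed seed, so that the same corpus entry and seed always yield the same mutated request. The function name mutate_request() and the mutation alphabet are hypothetical and are not part of any Tempesta tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <string_view>

// Hypothetical mutator: the rule is to keep the HTTP method token intact,
// so the request passes the first parser check, and to mutate only the
// bytes after it.
std::string
mutate_request(std::string_view corpus_entry, std::uint32_t seed)
{
    static constexpr std::string_view alphabet = "abc/%?=&. ";

    std::string req(corpus_entry);
    const std::size_t method_end = req.find(' ');
    if (method_end == std::string::npos || method_end + 1 >= req.size())
        return req; // nothing to mutate after the method

    // A fixed seed makes the whole mutation sequence reproducible.
    // Raw std::mt19937 output is fully specified by the standard, so the
    // values are reduced with % instead of a distribution object.
    std::mt19937 rng(seed);
    const std::size_t tail_len = req.size() - method_end - 1;
    const std::size_t n_mutations = 1 + rng() % 4;
    for (std::size_t i = 0; i < n_mutations; ++i) {
        const std::size_t pos = method_end + 1 + rng() % tail_len;
        req[pos] = alphabet[rng() % alphabet.size()];
    }
    return req;
}
```

Storing the seed together with a failing input is enough to replay the exact mutation later.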
Use assertions only if a disaster is inevitable
The Linux kernel patch verification script (checkpatch.pl) shows warnings for assertions (BUG_ON() and family in the kernel).
The problem with assertions is that they crash the program.
Assertions should be used only if a program crash or a similar disaster is inevitable.
A good example could be:
assert(p != NULL);
f(p->foo);
If p is NULL, then the program is going to crash anyway. But if we crash early, then we know exactly where the crash happens and we avoid possible security exploitation of the code flaw.
On the other hand, a bad example is:
assert(a > b);
f(a);
The relation between a and b doesn’t necessarily lead to a crash or data leakage. If it does in f(), then the assertion must be placed close to the code that crashes. Also, f() and the calling code may change in such a way that a <= b becomes a valid condition.
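The two cases can be contrasted in a short sketch. The names Item and checked_diff() are hypothetical and only illustrate the idea: the assertion sits right before the dereference that would crash anyway, while an unexpected relation between a and b is reported to the caller as an ordinary error.

```cpp
#include <cassert>
#include <optional>

struct Item {
    int foo;
};

// A null p crashes on the dereference anyway, so the assertion is placed
// immediately before the code that would crash.
int
f(const Item *p)
{
    assert(p != nullptr);
    return p->foo;
}

// Hypothetical helper: a <= b is treated as an invalid input, not as a
// disaster, so it is returned as an error instead of being asserted.
std::optional<long long>
checked_diff(int a, int b)
{
    if (a <= b)
        return std::nullopt;
    // Widen before subtracting to avoid signed overflow.
    return static_cast<long long>(a) - b;
}
```

The caller decides how to handle the empty result, e.g. by rejecting the request, and the program keeps running.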
C++
There are guidelines for safe C++ programming, and C++ provides a lot of tools for safer programming in comparison with plain C.
Use new standards
C++ keeps getting better in terms of safety and security: newer standards introduce safety features, which should be employed in our code.
Follow the C++ Guidelines
There are at least two notable C++ guidelines:
The guidelines provide many rules accompanied by examples of safe and unsafe code. If you do C++ edge server development, then you must read and follow the guidelines.
Some of the rules are listed briefly here and discussed in more detail in the following sections:
- Avoid bounds errors
- By default use const. In Rust all variables are immutable by default and the mut keyword is used to declare a variable as mutable. With this rule, we aim to achieve the same level of control over unwanted memory changes.
Avoid raw memory operations
Typically, data plane code (e.g. network packet processing) is performance-critical, so we do use custom memory allocators, which require raw memory operations.
For such cases Rust programs must use unsafe blocks, which are equivalent to the default C++ mode. The "default" C++ is fast, but unsafe (see Herb Sutter’s keynote).
Wherever performance isn’t crucial, at least in the control plane, such as configuration processing, safe, yet slower, C++ techniques must be used.
For example, this unsafe C-like code:
char buf[1024];
unsigned size = sizeof(buf) - SOME_CONSTANT;
buf[size] = '\0';
read_json_config(buf, size);
should be replaced with the safer:
constexpr auto size = 1024;
std::array<char, size> buf = { 0 };
read_json_config(buf, size - SOME_CONSTANT);
One problem with the original code is that it involves address arithmetic, which is easy to get wrong. Another problem is that it leaves areas of uninitialized memory: if the JSON document is shorter than size, then there could be uninitialized data between the end of the read string and the written \0.
Use hardened libc++ and compiler options
To enforce the avoidance of raw (and unsafe) pointers, use Clang++ Safe Buffers and hardened libc++. The hardened libc++ should be used in the fast mode for production builds, and there should be a CI job that builds with the debug mode.
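As an illustration, a build could report which libc++ hardening mode it was compiled with, so that a CI job can verify the intended configuration. The sketch below assumes Clang with a libc++ version that provides the _LIBCPP_HARDENING_MODE macros; the function hardening_mode() is hypothetical, and with another standard library it simply reports that the mode is unknown.

```cpp
#include <string_view>
#include <version> // pulls in the standard library configuration macros

// Build flags this sketch assumes (Clang with libc++):
//   production: -Wunsafe-buffer-usage
//               -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_FAST
//   CI debug:   -Wunsafe-buffer-usage
//               -D_LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_DEBUG
constexpr std::string_view
hardening_mode() noexcept
{
#if !defined(_LIBCPP_HARDENING_MODE)
    return "unknown (not libc++, or libc++ without hardening modes)";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_FAST
    return "fast";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_EXTENSIVE
    return "extensive";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
    return "debug";
#else
    return "none";
#endif
}
```

A startup log line or a unit test can compare the returned value with the mode expected for the given build type.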
Use std::unique_ptr instead of raw pointers
Wherever you use * for a raw pointer, make sure that you can’t use std::unique_ptr or a reference (&) instead. In general, for code that isn’t performance-critical and doesn’t need to work with raw memory, use std::unique_ptr or std::shared_ptr. E.g. instead of
tasks[i].client = new Client(foo);
use
tasks[i].client = std::make_unique<Client>(foo);
Also read C++ Core Guidelines: R.3: A raw pointer is non-owning.
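A fuller sketch of the same idea follows. The Client and Task types and the helper functions are hypothetical, since the document only shows the assignment itself. The task owns its client through std::unique_ptr, while code that only reads the client takes a non-owning reference.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Client {
    explicit Client(std::string name) : name(std::move(name)) {}
    std::string name;
};

struct Task {
    std::unique_ptr<Client> client; // the task owns its client
};

// Non-owning access: the reference never deletes the object.
const std::string &
client_name(const Client &c) noexcept
{
    return c.name;
}

std::vector<Task>
make_tasks(std::size_t n)
{
    std::vector<Task> tasks(n);
    for (std::size_t i = 0; i < n; ++i)
        tasks[i].client =
            std::make_unique<Client>("client-" + std::to_string(i));
    return tasks; // clients are freed automatically with the vector
}
```

No delete appears anywhere: destroying the vector releases every client, including on early returns and exceptions.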
Avoid C-style arrays
For example, instead of char buf[100] use std::string, std::array or std::vector. If you still need a C-style array, use std::span or std::string_view to safely work with its length. Consider an example serialization function (inspired by the blog post and C++ Core Guidelines: Catch run-time errors early):
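#include <cstddef>
#include <iostream>

void
serialize(const char *str, size_t len)
{
    std::cout << len << ": ";
    // A size_t index avoids a signed/unsigned comparison with len.
    for (size_t i = 0; i < len; ++i)
        std::cout << str[i] << " ";
    std::cout << std::endl;
}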
You can call the function as
char str[] = {'a', 'b', 'c'};
serialize(str, sizeof(str));
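If you define str as a C string, then you need to adjust the len computation to skip the terminating \0. Note that str must stay an array: for a char * pointer, sizeof returns the size of the pointer, not the string length:
const char str[] = "abc";
serialize(str, sizeof(str) - 1);
Next, if you change the element type to int (and the serialize() parameter accordingly), then you need yet another len computation: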
int str[] = {'a', 'b', 'c'};
serialize(str, sizeof(str) / sizeof(str[0]));
The point is that it’s easy to introduce a bug in the length computation. The C++ standard library provides std::span (C++20) and std::string_view (C++17) to safely pass C strings and arrays with the correct length computation:
#include <iostream>
#include <span>
#include <string_view>

void
serialize(std::span<char> array)
{
    std::cout << array.size() << ": ";
    for (const auto c: array)
        std::cout << c << " ";
    std::cout << std::endl;
}

void
print(std::string_view str)
{
    std::cout << str.size() << ": " << str << std::endl;
}

int
main()
{
    // The span picks up the array length automatically.
    char array[] = {'a', 'b', 'c'};
    serialize(array);
    // The string_view computes the C string length itself.
    const char *str = "abc";
    print(str);
    return 0;
}
Or, better, use std::array and std::string (note that serialize() and
print() aren’t changed and work just the same way):
std::array<char, 3> a{'a', 'b', 'c'};
std::string s("abc");
serialize(a);
print(s);
See also C++ Core Guidelines: Prefer using STL array or vector instead of a C array for this rule.
Access containers with bounds checking
Prefer access to containers with bounds checking, e.g. prefer std::vector::at() to std::vector::operator[].
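A short sketch of what the checked access gives you: std::vector::at() throws std::out_of_range for an invalid index, whereas operator[] has undefined behavior in that case. The wrapper get_checked() is a hypothetical helper that turns the exception into an empty result.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the element at idx, or an empty optional
// if idx is out of range.
std::optional<int>
get_checked(const std::vector<int> &v, std::size_t idx)
{
    try {
        return v.at(idx); // bounds-checked access
    } catch (const std::out_of_range &) {
        return std::nullopt;
    }
}
```

With operator[] the same out-of-range index would silently read memory beyond the vector.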
Restrict the code
Use const and noexcept specifiers wherever possible. This makes the code easier to review and faster, and it allows the compiler and static analyzers to do their work better.
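As a small illustration, the hypothetical Header class below marks everything that doesn’t modify state as const and everything that can’t throw as noexcept; the class and its members are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <string_view>

class Header {
public:
    constexpr Header(std::string_view name, std::string_view value) noexcept
        : name_(name), value_(value)
    {}

    // const: the accessors can be called on const objects and can't
    // modify the header; noexcept: callers need no exception handling.
    constexpr std::string_view name() const noexcept { return name_; }
    constexpr std::size_t value_len() const noexcept { return value_.size(); }

private:
    std::string_view name_;
    std::string_view value_;
};
```

A reviewer can tell from the signatures alone that reading a header neither changes it nor throws.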
References
- The Case for Memory Safe Roadmaps
- C++ Core Guidelines
- Delivering Safe C++ – Bjarne Stroustrup – CppCon 2023
- "C++ safety, in context", Herb Sutter
- Herb Sutter: Safety, Security, Safety and C / C++ – C++ Evolution, ACCU 2024
- Jim Radigan: -memory-safe C++, CppCon 2022