[RFC PATCH] all: RFC - add support for ebpf

162 views
Skip to first unread message

Tom Hromatka

unread,
Feb 7, 2018, 4:27:39 PM2/7/18
to libse...@googlegroups.com, tom.hr...@oracle.com, tom.hr...@gmail.com, dhaval...@oracle.com
This patch adds initial support for EBPF. The hooks are in place
for full EBPF support, but this patch only adds support for EBPF
hash maps.

For large, simple seccomp filters (i.e. filters that only filter on
syscalls - not arguments), a hash map can significantly speed up
seccomp processing time.

Signed-off-by: Tom Hromatka <tom.hr...@oracle.com>
---
include/seccomp.h.in | 31 ++++++++
src/Makefile.am | 2 +-
src/api.c | 36 ++++++++-
src/arch.c | 24 +++++-
src/arch.h | 3 +
src/db.c | 208 +++++++++++++++++++++++++++++++++++++-----------
src/db.h | 25 +++++-
src/gen_ebpf.c | 206 ++++++++++++++++++++++++++++++++++++++++++++++++
src/gen_ebpf.h | 38 +++++++++
src/gen_pfc.c | 62 +++++++++++++++
src/libbpf.h | 197 +++++++++++++++++++++++++++++++++++++++++++++
src/system.c | 219 ++++++++++++++++++++++++++++++++++++++++++++++++---
src/system.h | 2 +-
13 files changed, 988 insertions(+), 65 deletions(-)
create mode 100644 src/gen_ebpf.c
create mode 100644 src/gen_ebpf.h
create mode 100644 src/libbpf.h

diff --git a/include/seccomp.h.in b/include/seccomp.h.in
index 56ae73e..bf66765 100644
--- a/include/seccomp.h.in
+++ b/include/seccomp.h.in
@@ -65,6 +65,7 @@ enum scmp_filter_attr {
SCMP_FLTATR_CTL_TSYNC = 4, /**< sync threads on filter load */
SCMP_FLTATR_API_TSKIP = 5, /**< allow rules with a -1 syscall */
SCMP_FLTATR_CTL_LOG = 6, /**< log not-allowed actions */
+ SCMP_FLTATR_USE_EBPF = 7, /**< generate an EBPF filter */
_SCMP_FLTATR_MAX,
};

@@ -542,6 +543,20 @@ int seccomp_syscall_priority(scmp_filter_ctx ctx,
int seccomp_rule_add(scmp_filter_ctx ctx,
uint32_t action, int syscall, unsigned int arg_cnt, ...);

+/**
+ * Add a new rule to the EBPF map
+ * @param ctx the filter context
+ * @param action the filter action
+ * @param syscall the syscall number
+ *
+ * This function adds a syscall-action mapping to the EBPF map. Unlike
+ * seccomp_rule_add(), this rule does not support advanced arguments.
+ * For simple seccomp filters that don't use filter arguments, an EBPF map
+ * could be significantly faster than seccomp_rule_add().
+ *
+ */
+int seccomp_map_rule_add(scmp_filter_ctx ctx,
+ uint32_t action, int syscall);

/**
* Add a new rule to the filter
@@ -602,6 +617,22 @@ int seccomp_rule_add_exact_array(scmp_filter_ctx ctx,
const struct scmp_arg_cmp *arg_array);

/**
+ * Add an EBPF map to the filter
+ * @param ctx the filter context
+ * @param max_entries the maximum number of entries in the map
+ * @note this option can only be used with EBPF seccomp filters
+ *
+ * This function adds an EBPF map to the filter. The map will be created
+ * when seccomp_load() is called. When userspace calls a syscall, the
+ * seccomp filter will perform an EBPF map lookup first. If the syscall
+ * has a corresponding action in the map, that action will be invoked. If
+ * not, then seccomp will continue to process the filter until a match is
+ * found.
+ *
+ */
+int seccomp_add_map(scmp_filter_ctx ctx, int max_entries);
+
+/**
* Generate seccomp Pseudo Filter Code (PFC) and export it to a file
* @param ctx the filter context
* @param fd the destination fd
diff --git a/src/Makefile.am b/src/Makefile.am
index 2e7e38d..8290e73 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -27,7 +27,7 @@ endif

SOURCES_ALL = \
api.c system.h system.c helper.h helper.c \
- gen_pfc.h gen_pfc.c gen_bpf.h gen_bpf.c \
+ gen_pfc.h gen_pfc.c gen_bpf.h gen_bpf.c gen_ebpf.h gen_ebpf.c \
hash.h hash.c \
db.h db.c \
arch.c arch.h \
diff --git a/src/api.c b/src/api.c
index e1a5e8b..845787c 100644
--- a/src/api.c
+++ b/src/api.c
@@ -234,7 +234,11 @@ API int seccomp_arch_exist(const scmp_filter_ctx ctx,
if (arch_valid(arch_token))
return -EINVAL;

- return (db_col_arch_exist(col, arch_token) ? 0 : -EEXIST);
+ /* col->filters and col->map_filters should contain the same arch
+ * information. Therefore we only need to check one or the other
+ */
+ return (db_col_arch_exist(col, col->filters, col->filter_cnt,
+ arch_token) ? 0 : -EEXIST);
}

/* NOTE - function header comment in include/seccomp.h */
@@ -249,7 +253,10 @@ API int seccomp_arch_add(scmp_filter_ctx ctx, uint32_t arch_token)
arch = arch_def_lookup(arch_token);
if (arch == NULL)
return -EINVAL;
- if (db_col_arch_exist(col, arch_token))
+ /* col->filters and col->map_filters should contain the same arch
+ * information. Therefore we only need to check one or the other
+ */
+ if (db_col_arch_exist(col, col->filters, col->filter_cnt, arch_token))
return -EEXIST;

return db_col_db_new(col, arch);
@@ -265,7 +272,8 @@ API int seccomp_arch_remove(scmp_filter_ctx ctx, uint32_t arch_token)

if (arch_valid(arch_token))
return -EINVAL;
- if (db_col_arch_exist(col, arch_token) != -EEXIST)
+ if (db_col_arch_exist(col, col->filters, col->filter_cnt,
+ arch_token) != -EEXIST)
return -EEXIST;

return db_col_db_remove(col, arch_token);
@@ -442,6 +450,21 @@ API int seccomp_rule_add(scmp_filter_ctx ctx,
return rc;
}

+API int seccomp_map_rule_add(scmp_filter_ctx ctx,
+ uint32_t action, int syscall)
+{
+ int rc;
+ struct db_filter_col *col = (struct db_filter_col *)ctx;
+
+ rc = db_action_valid(action);
+ if (rc < 0)
+ return rc;
+ if (action == col->attr.act_default)
+ return -EPERM;
+
+ return db_col_map_rule_add(col, 0, action, syscall);
+}
+
/* NOTE - function header comment in include/seccomp.h */
API int seccomp_rule_add_exact_array(scmp_filter_ctx ctx,
uint32_t action, int syscall,
@@ -495,6 +518,13 @@ API int seccomp_rule_add_exact(scmp_filter_ctx ctx,
return rc;
}

+API int seccomp_add_map(scmp_filter_ctx ctx, int max_entries)
+{
+ struct db_filter_col *col = (struct db_filter_col *)ctx;
+
+ return db_col_map_add(col, max_entries);
+}
+
/* NOTE - function header comment in include/seccomp.h */
API int seccomp_export_pfc(const scmp_filter_ctx ctx, int fd)
{
diff --git a/src/arch.c b/src/arch.c
index 21829c7..d76f179 100644
--- a/src/arch.c
+++ b/src/arch.c
@@ -419,7 +419,8 @@ int arch_filter_rule_add(struct db_filter_col *col, struct db_filter *db,
return -ENOMEM;
rule->action = action;
rule->syscall = syscall;
- memcpy(rule->args, chain, sizeof(*chain) * ARG_COUNT_MAX);
+ if (chain != NULL)
+ memcpy(rule->args, chain, sizeof(*chain) * ARG_COUNT_MAX);
rule->prev = NULL;
rule->next = NULL;

@@ -459,5 +460,26 @@ rule_add_failure:
rule = rule->next;
free(rule_tail);
} while (rule);
+
+ return rc;
+}
+
+int arch_filter_map_add(struct db_filter_col *col, struct db_filter *db,
+ int max_entries)
+{
+ int rc = 0;
+ struct db_api_map_create *map;
+
+ if (db->map != NULL)
+ /* only one map is allowed per arch */
+ return -EINVAL;
+
+ map = malloc(sizeof(*map));
+ if (map == NULL)
+ return -ENOMEM;
+
+ map->max_entries = max_entries;
+ db->map = map;
+
return rc;
}
diff --git a/src/arch.h b/src/arch.h
index 63f0d7f..3e4344b 100644
--- a/src/arch.h
+++ b/src/arch.h
@@ -91,4 +91,7 @@ int arch_filter_rule_add(struct db_filter_col *col, struct db_filter *db,
bool strict, uint32_t action, int syscall,
struct db_api_arg *chain);

+int arch_filter_map_add(struct db_filter_col *col, struct db_filter *db,
+ int max_entries);
+
#endif
diff --git a/src/db.c b/src/db.c
index dcd5f65..0b60edc 100644
--- a/src/db.c
+++ b/src/db.c
@@ -544,8 +544,9 @@ struct db_api_rule_list *db_rule_dup(const struct db_api_rule_list *src)
*/
int db_col_reset(struct db_filter_col *col, uint32_t def_action)
{
+ int rc;
unsigned int iter;
- struct db_filter *db;
+ struct db_filter *db = NULL, *db_map = NULL;
struct db_filter_snap *snap;

if (col == NULL)
@@ -559,6 +560,14 @@ int db_col_reset(struct db_filter_col *col, uint32_t def_action)
free(col->filters);
col->filters = NULL;

+ /* free any map filters */
+ for (iter = 0; iter < col->map_filter_cnt; iter++)
+ _db_release(col->map_filters[iter]);
+ col->map_filter_cnt = 0;
+ if (col->map_filters)
+ free(col->map_filters);
+ col->map_filters = NULL;
+
/* set the endianess to undefined */
col->endian = 0;

@@ -574,11 +583,25 @@ int db_col_reset(struct db_filter_col *col, uint32_t def_action)

/* reset the initial db */
db = _db_init(arch_def_native);
- if (db == NULL)
- return -ENOMEM;
- if (db_col_db_add(col, db) < 0) {
- _db_release(db);
- return -ENOMEM;
+ if (db == NULL) {
+ rc = -ENOMEM;
+ goto error;
+ }
+ if (db_col_db_add(col, &col->filters, &col->filter_cnt, db) < 0) {
+ rc = -ENOMEM;
+ goto error;
+ }
+
+ /* reset the initial map db */
+ db_map = _db_init(arch_def_native);
+ if (db_map == NULL) {
+ rc = -ENOMEM;
+ goto error;
+ }
+ if (db_col_db_add(col, &col->map_filters,
+ &col->map_filter_cnt, db_map) < 0) {
+ rc = -ENOMEM;
+ goto error;
}

/* reset the transactions */
@@ -592,6 +615,13 @@ int db_col_reset(struct db_filter_col *col, uint32_t def_action)
}

return 0;
+error:
+ if (db)
+ _db_release(db);
+ if (db_map)
+ _db_release(db_map);
+
+ return rc;
}

/**
@@ -647,6 +677,14 @@ void db_col_release(struct db_filter_col *col)
free(col->filters);
col->filters = NULL;

+ /* free any map filters */
+ for (iter = 0; iter < col->map_filter_cnt; iter++)
+ _db_release(col->map_filters[iter]);
+ col->map_filter_cnt = 0;
+ if (col->map_filters)
+ free(col->map_filters);
+ col->map_filters = NULL;
+
/* free the collection */
free(col);
}
@@ -740,12 +778,13 @@ int db_col_merge(struct db_filter_col *col_dst, struct db_filter_col *col_src)
* zero if a matching filter does not exist.
*
*/
-int db_col_arch_exist(struct db_filter_col *col, uint32_t arch_token)
+int db_col_arch_exist(struct db_filter_col *col, struct db_filter **db,
+ unsigned int db_cnt, int32_t arch_token)
{
unsigned int iter;

- for (iter = 0; iter < col->filter_cnt; iter++)
- if (col->filters[iter]->arch->token == arch_token)
+ for (iter = 0; iter < db_cnt; iter++)
+ if (db[iter]->arch->token == arch_token)
return -EEXIST;

return 0;
@@ -846,6 +885,9 @@ int db_col_attr_set(struct db_filter_col *col,
rc = -EOPNOTSUPP;
}
break;
+ case SCMP_FLTATR_USE_EBPF:
+ col->attr.use_ebpf = (value ? 1 : 0);
+ break;
default:
rc = -EEXIST;
break;
@@ -854,6 +896,23 @@ int db_col_attr_set(struct db_filter_col *col,
return rc;
}

+static int _db_col_db_new(struct db_filter_col *col, struct db_filter ***db,
+ unsigned int *db_cnt, const struct arch_def *arch)
+{
+ int rc;
+ struct db_filter *db_new;
+
+ db_new = _db_init(arch);
+ if (db_new == NULL)
+ return -ENOMEM;
+
+ rc = db_col_db_add(col, db, db_cnt, db_new);
+ if (rc < 0)
+ _db_release(db_new);
+
+ return rc;
+}
+
/**
* Add a new architecture filter to a filter collection
* @param col the seccomp filter collection
@@ -867,14 +926,15 @@ int db_col_attr_set(struct db_filter_col *col,
int db_col_db_new(struct db_filter_col *col, const struct arch_def *arch)
{
int rc;
- struct db_filter *db;

- db = _db_init(arch);
- if (db == NULL)
- return -ENOMEM;
- rc = db_col_db_add(col, db);
+ rc = _db_col_db_new(col, &col->filters, &col->filter_cnt, arch);
if (rc < 0)
- _db_release(db);
+ return rc;
+
+ rc = _db_col_db_new(col, &col->map_filters, &col->map_filter_cnt,
+ arch);
+ if (rc < 0)
+ return rc;

return rc;
}
@@ -889,69 +949,65 @@ int db_col_db_new(struct db_filter_col *col, const struct arch_def *arch)
* same architecture. Returns zero on success, negative values on failure.
*
*/
-int db_col_db_add(struct db_filter_col *col, struct db_filter *db)
+int db_col_db_add(struct db_filter_col *col, struct db_filter ***db,
+ unsigned int *db_cnt, struct db_filter *db_arch)
{
struct db_filter **dbs;

- if (col->endian != 0 && col->endian != db->arch->endian)
+ if (col->endian != 0 && col->endian != db_arch->arch->endian)
return -EEXIST;

- if (db_col_arch_exist(col, db->arch->token))
+ if (db_col_arch_exist(col, *db, *db_cnt, db_arch->arch->token))
return -EEXIST;

- dbs = realloc(col->filters,
- sizeof(struct db_filter *) * (col->filter_cnt + 1));
+ dbs = realloc(*db, sizeof(struct db_filter *) * (*db_cnt + 1));
if (dbs == NULL)
return -ENOMEM;
- col->filters = dbs;
- col->filter_cnt++;
- col->filters[col->filter_cnt - 1] = db;
+
+ *db = dbs;
+ (*db_cnt)++;
+ (*db)[*db_cnt - 1] = db_arch;
+
if (col->endian == 0)
- col->endian = db->arch->endian;
+ col->endian = db_arch->arch->endian;

return 0;
}

-/**
- * Remove a filter DB from a filter collection
- * @param col the seccomp filter collection
- * @param arch_token the architecture token
- *
- * This function removes an existing seccomp filter DB from an existing seccomp
- * filter collection. Returns zero on success, negative values on failure.
- *
- */
-int db_col_db_remove(struct db_filter_col *col, uint32_t arch_token)
+static int _db_col_db_remove(struct db_filter_col *col, struct db_filter ***db,
+ unsigned int *db_cnt, uint32_t arch_token)
{
unsigned int iter;
unsigned int found;
struct db_filter **dbs;

- if ((col->filter_cnt <= 0) || (db_col_arch_exist(col, arch_token) == 0))
+ if ((*db_cnt <= 0) ||
+ (db_col_arch_exist(col, *db, *db_cnt,
+ arch_token) == 0))
return -EINVAL;

- for (found = 0, iter = 0; iter < col->filter_cnt; iter++) {
+ for (found = 0, iter = 0; iter < *db_cnt; iter++) {
if (found)
- col->filters[iter - 1] = col->filters[iter];
- else if (col->filters[iter]->arch->token == arch_token) {
- _db_release(col->filters[iter]);
+ (*db)[iter - 1] = (*db)[iter];
+ else if ((*db)[iter]->arch->token == arch_token) {
+ _db_release((*db)[iter]);
found = 1;
}
}
- col->filters[--col->filter_cnt] = NULL;
+ (*db)[--(*db_cnt)] = NULL;

- if (col->filter_cnt > 0) {
+ if (*db_cnt > 0) {
/* NOTE: if we can't do the realloc it isn't fatal, we just
* have some extra space allocated */
- dbs = realloc(col->filters,
- sizeof(struct db_filter *) * col->filter_cnt);
+ dbs = realloc(*db,
+ sizeof(struct db_filter *) * (*db_cnt));
if (dbs != NULL)
- col->filters = dbs;
+ *db = dbs;
} else {
/* this was the last filter so free all the associated memory
* and reset the endian token */
- free(col->filters);
- col->filters = NULL;
+ free(*db);
+ *db = NULL;
col->endian = 0;
}

@@ -959,6 +1015,32 @@ int db_col_db_remove(struct db_filter_col *col, uint32_t arch_token)
}

/**
+ * Remove a filter DB from a filter collection
+ * @param col the seccomp filter collection
+ * @param arch_token the architecture token
+ *
+ * This function removes an existing seccomp filter DB from an existing seccomp
+ * filter collection. Returns zero on success, negative values on failure.
+ *
+ */
+int db_col_db_remove(struct db_filter_col *col, uint32_t arch_token)
+{
+ int rc;
+
+ rc = _db_col_db_remove(col, &col->filters, &col->filter_cnt,
+ arch_token);
+ if (rc < 0)
+ return rc;
+
+ rc = _db_col_db_remove(col, &col->map_filters, &col->map_filter_cnt,
+ arch_token);
+ if (rc < 0)
+ return rc;
+
+ return rc;
+}
+
+/**
* Test if the argument filter can be skipped because it's a tautology
* @param arg argument filter
*
@@ -1639,6 +1721,22 @@ add_return:
return rc;
}

+int db_col_map_rule_add(struct db_filter_col *col,
+ bool strict, uint32_t action, int syscall)
+{
+ int rc = 0, rc_tmp;
+ unsigned int iter;
+
+ for (iter = 0; iter < col->map_filter_cnt; iter++) {
+ rc_tmp = arch_filter_rule_add(col, col->map_filters[iter],
+ strict, action, syscall, NULL);
+ if (rc == 0 && rc_tmp < 0)
+ rc = rc_tmp;
+ }
+
+ return rc;
+}
+
/**
* Start a new seccomp filter transaction
* @param col the filter collection
@@ -1764,3 +1862,21 @@ void db_col_transaction_commit(struct db_filter_col *col)
col->snapshots = snap->next;
_db_snap_release(snap);
}
+
+int db_col_map_add(struct db_filter_col *col, int max_entries)
+{
+ int rc = 0, rc_tmp;
+ unsigned int iter;
+
+ for (iter = 0; iter < col->filter_cnt; iter++) {
+ rc_tmp = arch_filter_map_add(col, col->map_filters[iter],
+ max_entries);
+ if (rc == 0 && rc_tmp < 0)
+ rc = rc_tmp;
+ }
+
+ if (rc == 0)
+ col->attr.use_ebpf = true;
+
+ return rc;
+}
diff --git a/src/db.h b/src/db.h
index b82491a..46450a6 100644
--- a/src/db.h
+++ b/src/db.h
@@ -48,6 +48,10 @@ struct db_api_rule_list {
struct db_api_rule_list *prev, *next;
};

+struct db_api_map_create {
+ int max_entries;
+};
+
struct db_arg_chain_tree {
/* argument number (a0 = 0, a1 = 1, etc.) */
unsigned int arg;
@@ -141,6 +145,8 @@ struct db_filter_attr {
uint32_t api_tskip;
/* SECCOMP_FILTER_FLAG_LOG related attributes */
uint32_t log_enable;
+ /* Should seccomp generate EBPF instructions? */
+ uint32_t use_ebpf;
};

struct db_filter {
@@ -152,6 +158,11 @@ struct db_filter {

/* list of rules used to build the filters, kept in order */
struct db_api_rule_list *rules;
+
+ /* EBPF map */
+ struct db_api_map_create *map;
+ /* file descriptor of the map - generated by the kernel */
+ int map_fd;
};

struct db_filter_snap {
@@ -174,6 +185,10 @@ struct db_filter_col {
struct db_filter **filters;
unsigned int filter_cnt;

+ /* map filters */
+ struct db_filter **map_filters;
+ unsigned int map_filter_cnt;
+
/* transaction snapshots */
struct db_filter_snap *snapshots;
};
@@ -202,7 +217,8 @@ int db_col_valid(struct db_filter_col *col);

int db_col_merge(struct db_filter_col *col_dst, struct db_filter_col *col_src);

-int db_col_arch_exist(struct db_filter_col *col, uint32_t arch_token);
+int db_col_arch_exist(struct db_filter_col *col, struct db_filter **db,
+ unsigned int db_cnt, int32_t arch_token);

int db_col_attr_get(const struct db_filter_col *col,
enum scmp_filter_attr attr, uint32_t *value);
@@ -210,12 +226,15 @@ int db_col_attr_set(struct db_filter_col *col,
enum scmp_filter_attr attr, uint32_t value);

int db_col_db_new(struct db_filter_col *col, const struct arch_def *arch);
-int db_col_db_add(struct db_filter_col *col, struct db_filter *db);
+int db_col_db_add(struct db_filter_col *col, struct db_filter ***db,
+ unsigned int *db_cnt, struct db_filter *db_arch);
int db_col_db_remove(struct db_filter_col *col, uint32_t arch_token);

int db_col_rule_add(struct db_filter_col *col,
bool strict, uint32_t action, int syscall,
unsigned int arg_cnt, const struct scmp_arg_cmp *arg_array);
+int db_col_map_rule_add(struct db_filter_col *col,
+ bool strict, uint32_t action, int syscall);

int db_col_syscall_priority(struct db_filter_col *col,
int syscall, uint8_t priority);
@@ -226,4 +245,6 @@ void db_col_transaction_commit(struct db_filter_col *col);

int db_rule_add(struct db_filter *db, const struct db_api_rule_list *rule);

+int db_col_map_add(struct db_filter_col *col, int max_entries);
+
#endif
diff --git a/src/gen_ebpf.c b/src/gen_ebpf.c
new file mode 100644
index 0000000..50674db
--- /dev/null
+++ b/src/gen_ebpf.c
@@ -0,0 +1,206 @@
+/**
+ * Seccomp EBPF Translator
+ *
+ * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
+ * Author: Tom Hromatka <tom.hr...@oracle.com>
+ */
+
+/*
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, see <http://www.gnu.org/licenses>.
+ */
+
+#include <errno.h>
+#include <inttypes.h>
+#include <linux/bpf.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#ifndef _BSD_SOURCE
+#define _BSD_SOURCE
+#endif
+#include <seccomp.h>
+
+#include "db.h"
+#include "gen_ebpf.h"
+#include "helper.h"
+#include "libbpf.h"
+
+/**
+ * Free the EBPF program
+ * @param prg the EBPF program
+ *
+ * Free the EBPF program. None of the associated EBPF state used to generate
+ * the EBPF program is released in this function.
+ *
+ */
+static void _program_free(struct ebpf_program *prgm)
+{
+ if (prgm == NULL)
+ return;
+
+ if (prgm->insns)
+ free(prgm->insns);
+
+ free(prgm);
+}
+
+static struct ebpf_program *_program_merge(struct ebpf_program *first,
+ struct ebpf_program *second)
+{
+ struct ebpf_program *new;
+ struct bpf_insn *insns;
+ int16_t insn_cnt;
+
+ if (first == NULL)
+ /* special case where the first program hasn't been populated
+ * yet. simply return the second
+ */
+ return second;
+
+ if (first->insn_cnt == 0 || second->insn_cnt == 0)
+ return NULL;
+
+ new = malloc(sizeof(*new));
+ if (new == NULL)
+ return NULL;
+
+ insn_cnt = first->insn_cnt + second->insn_cnt;
+ insns = malloc(insn_cnt * sizeof(struct bpf_insn));
+ if (insns == NULL) {
+ _program_free(new);
+ return NULL;
+ }
+
+ memcpy(insns, first->insns, first->insn_cnt);
+ memcpy(&insns[first->insn_cnt], second->insns, second->insn_cnt);
+
+ new->insns = insns;
+ new->insn_cnt = insn_cnt;
+
+ _program_free(first);
+ _program_free(second);
+
+ return new;
+}
+
+static struct ebpf_program *_gen_ebpf_map_arch(const struct db_filter *db)
+{
+ /// TODO - add an if(arch == ...) around these instructions
+
+ struct ebpf_program *prgm;
+ struct bpf_insn *insns2, insns[] = {
+ /* save off reg 1 (sd) in case we want it later */
+ BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+
+ /* arg 1 - map_fd */
+ BPF_LD_MAP_FD(BPF_REG_1, db->map_fd),
+
+ /* arg 2 - load the fp with sd->nr */
+ BPF_LDX_MEM(BPF_W, BPF_REG_7, BPF_REG_6,
+ offsetof(struct seccomp_data, nr)),
+ BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_7, -4),
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
+
+ /* lookup the syscall in the map */
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_map_lookup_elem),
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+ /* the syscall was found, read the action and return it */
+ BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 0),
+ BPF_EXIT_INSN(),
+ /* syscall wasn't found */
+ /// TODO - Remove the following and instead continue with
+ /// "normal" filtering
+ BPF_MOV64_IMM(BPF_REG_0, SCMP_ACT_KILL),
+ BPF_EXIT_INSN(),
+ };
+
+ /* copy the stack variable to something with permanence */
+ insns2 = malloc(sizeof(insns));
+ if (insns2 == NULL)
+ return NULL;
+ memcpy(insns2, insns, sizeof(insns));
+
+ prgm = malloc(sizeof(*prgm));
+ if (prgm == NULL)
+ return NULL;
+
+ prgm->insns = insns2;
+ prgm->insn_cnt = sizeof(insns) / sizeof(struct bpf_insn);
+
+ return prgm;
+}
+
+static struct ebpf_program *_gen_ebpf_build_map_prgms(
+ const struct db_filter_col *col)
+{
+ int iter;
+ struct ebpf_program *prgm = NULL, *new = NULL;
+
+ for (iter = 0; iter < col->map_filter_cnt; iter++) {
+ new = _gen_ebpf_map_arch(col->map_filters[iter]);
+ if (new == NULL)
+ goto error;
+
+ prgm = _program_merge(prgm, new);
+ if (prgm == NULL)
+ goto error;
+ }
+
+ return prgm;
+error:
+ _program_free(new);
+ _program_free(prgm);
+ return NULL;
+}
+
+/**
+ * Generate an EBPF representation of the filter DB
+ * @param col the seccomp filter collection
+ *
+ * This function generates an EBPF representation of the given filter collection.
+ * Returns a pointer to a valid ebpf_program on success, NULL on failure.
+ *
+ */
+struct ebpf_program *gen_ebpf_generate(const struct db_filter_col *col)
+{
+ struct ebpf_program *prgm;
+
+ prgm = zmalloc(sizeof(*(prgm)));
+ if (prgm == NULL)
+ goto error;
+
+ prgm = _gen_ebpf_build_map_prgms(col);
+
+ /// TODO - process the rest of the EBPF filter
+
+ return prgm;
+error:
+ _program_free(prgm);
+ return NULL;
+}
+
+/**
+ * Free memory associated with an EBPF representation
+ * @param program the EBPF representation
+ *
+ * Free the memory associated with an EBPF representation generated by the
+ * gen_ebpf_generate() function.
+ *
+ */
+void gen_ebpf_release(struct ebpf_program *program)
+{
+ _program_free(program);
+}
diff --git a/src/gen_ebpf.h b/src/gen_ebpf.h
new file mode 100644
index 0000000..4ede785
--- /dev/null
+++ b/src/gen_ebpf.h
@@ -0,0 +1,38 @@
+/**
+ * Seccomp EBPF Translator
+ *
+ * Copyright (c) 2018 Oracle and/or its affiliates. All rights reserved.
+ * Author: Tom Hromatka <tom.hr...@oracle.com>
+ */
+
+/*
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of version 2.1 of the GNU Lesser General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this library; if not, see <http://www.gnu.org/licenses>.
+ */
+
+#ifndef _TRANSLATOR_EBPF_H
+#define _TRANSLATOR_EBPF_H
+
+#include <inttypes.h>
+#include <linux/bpf.h>
+
+#include "db.h"
+
+struct ebpf_program {
+ struct bpf_insn *insns;
+ int16_t insn_cnt;
+};
+
+struct ebpf_program *gen_ebpf_generate(const struct db_filter_col *col);
+void gen_ebpf_release(struct ebpf_program *program);
+
+#endif
diff --git a/src/gen_pfc.c b/src/gen_pfc.c
index 2453d46..6d44336 100644
--- a/src/gen_pfc.c
+++ b/src/gen_pfc.c
@@ -324,6 +324,65 @@ arch_return:
}

/**
+ * Generate pseudo filter code for an architecture EBPF map
+ * @param col the seccomp filter collection
+ * @param db the single seccomp map filter
+ * @param fds the file stream to send the output
+ *
+ * This function generates a pseudo filter code representation of the given
+ * filter DB and writes it to the given output stream. Returns zero on
+ * success, negative values on failure.
+ *
+ */
+static int _gen_pfc_arch_map(const struct db_filter_col *col,
+ const struct db_filter *db, FILE *fds)
+{
+ int rc = 0;
+ struct db_sys_list *s_iter;
+ const char *sys_name;
+
+ if (db->map != NULL) {
+ fprintf(fds, "# create an EBPF hash map for arch %s\n",
+ _pfc_arch(db->arch));
+ fprintf(fds, "# NOTE - this is done at seccomp_load() time\n");
+ fprintf(fds, "#\tsyscall\t\t\taction\n");
+ fprintf(fds, "#\t---------------------------\n");
+ db_list_foreach(s_iter, db->syscalls) {
+ sys_name = arch_syscall_resolve_num(db->arch,
+ s_iter->num);
+
+ /* This printf intentionally omits the newline
+ * _pfc_action() will provide it
+ */
+ fprintf(fds, "#\t%s (%d)\t\t", sys_name, s_iter->num);
+ _pfc_action(fds, s_iter->action);
+ }
+
+ fprintf(fds, "syscall(bpf, BPF_MAP_CREATE, max_entries=%d)\n",
+ db->map->max_entries);
+ }
+
+ fprintf(fds, "# filter for arch %s (%u)\n",
+ _pfc_arch(db->arch), db->arch->token_bpf);
+ fprintf(fds, "if ($arch == %u)\n", db->arch->token_bpf);
+
+ if (db->map != NULL) {
+ _indent(fds, 1);
+ fprintf(fds, "# lookup syscall in EBPF hash map\n");
+ _indent(fds, 1);
+ fprintf(fds, "$action = ebpf_hash_map_lookup ($syscall)\n");
+ _indent(fds, 1);
+ fprintf(fds, "# filter for a valid action found in the hash map\n");
+ _indent(fds, 1);
+ fprintf(fds, "if ($action != NULL)\n");
+ _indent(fds, 2);
+ fprintf(fds, "action $action;\n");
+ }
+
+ return rc;
+}
+
+/**
* Generate a pseudo filter code string representation
* @param col the seccomp filter collection
* @param fd the fd to send the output
@@ -354,6 +413,9 @@ int gen_pfc_generate(const struct db_filter_col *col, int fd)
fprintf(fds, "# pseudo filter code start\n");
fprintf(fds, "#\n");

+ if (col->attr.use_ebpf)
+ for (iter = 0; iter < col->map_filter_cnt; iter++)
+ _gen_pfc_arch_map(col, col->map_filters[iter], fds);
for (iter = 0; iter < col->filter_cnt; iter++)
_gen_pfc_arch(col, col->filters[iter], fds);

diff --git a/src/libbpf.h b/src/libbpf.h
new file mode 100644
index 0000000..63db45b
--- /dev/null
+++ b/src/libbpf.h
@@ -0,0 +1,197 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* eBPF mini library */
+#ifndef __LIBBPF_H
+#define __LIBBPF_H
+
+struct bpf_insn;
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define BPF_ALU64_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU64_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_ALU32_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_MOV32_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV64_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */
+#define BPF_LD_IMM64(DST, IMM) \
+ BPF_LD_IMM64_RAW(DST, 0, IMM)
+
+#define BPF_LD_IMM64_RAW(DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_DW | BPF_IMM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = (__u32) (IMM) }), \
+ ((struct bpf_insn) { \
+ .code = 0, /* zero is reserved opcode */ \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = ((__u64) (IMM)) >> 32 })
+
+#ifndef BPF_PSEUDO_MAP_FD
+# define BPF_PSEUDO_MAP_FD 1
+#endif
+
+/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
+#define BPF_LD_MAP_FD(DST, MAP_FD) \
+ BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Atomic memory add, *(uint *)(dst_reg + off16) += src_reg */
+
+#define BPF_STX_XADD(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_XADD, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
+
+#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */
+
+#define BPF_JMP_REG(OP, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = CODE, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN() \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_EXIT, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = 0 })
+
+#endif
diff --git a/src/system.c b/src/system.c
index af46109..38b5bef 100644
--- a/src/system.c
+++ b/src/system.c
@@ -21,7 +21,10 @@

#include <stdlib.h>
#include <errno.h>
+#include <linux/bpf.h>
#include <sys/prctl.h>
+#include <stdio.h>
+#include <string.h>

#define _GNU_SOURCE
#include <unistd.h>
@@ -31,6 +34,8 @@
#include "arch.h"
#include "db.h"
#include "gen_bpf.h"
+#include "gen_ebpf.h"
+#include "helper.h"
#include "system.h"

/* NOTE: the seccomp syscall whitelist is currently disabled for testing
@@ -233,17 +238,104 @@ void sys_set_seccomp_flag(int flag, bool enable)
}
}

-/**
- * Loads the filter into the kernel
- * @param col the filter collection
- *
- * This function loads the given seccomp filter context into the kernel. If
- * the filter was loaded correctly, the kernel will be enforcing the filter
- * when this function returns. Returns zero on success, negative values on
- * error.
- *
- */
-int sys_filter_load(const struct db_filter_col *col)
+#undef EBPF_MAP_DEBUG
+#ifdef EBPF_MAP_DEBUG
+static int _sys_lookup_map_element(int map_fd, int key)
+{
+ int rc = 0;
+ union bpf_attr attr;
+ uint32_t value;
+
+ memset(&attr, '\0', sizeof(attr));
+
+ attr.map_fd = map_fd;
+ attr.key = (uint64_t)&key;
+ attr.value = (uint64_t)&value;
+
+ rc = syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
+ fprintf(stdout, "lookup: map_fd=%d key=%d value=0x%08x rc=%d\n",
+ map_fd, key, value, rc);
+
+ return rc;
+}
+#endif /* EBPF_MAP_DEBUG */
+
+static int _sys_populate_map(const struct db_filter *filter)
+{
+ int rc = 0;
+ struct db_api_rule_list *rule, *first_rule;
+ union bpf_attr attr;
+
+ rule = first_rule = filter->rules;
+ while (rule != NULL) {
+ memset(&attr, '\0', sizeof(attr));
+ attr.map_fd = filter->map_fd;
+ attr.key = (uint64_t)&rule->syscall;
+ attr.value = (uint64_t)&rule->action;
+ attr.flags = BPF_NOEXIST;
+
+ rc = syscall(__NR_bpf,
+ BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+ if (rc < 0)
+ break;
+
+#ifdef EBPF_MAP_DEBUG
+ _sys_lookup_map_element(filter->map_fd, rule->syscall);
+#endif
+
+ rule = rule->next;
+ if (rule == first_rule)
+ break;
+ }
+
+ return rc;
+}
+
+static int _sys_create_map(struct db_filter *filter)
+{
+ int rc = 0, map_fd;
+ union bpf_attr attr;
+
+ if (filter->map != NULL) {
+ memset(&attr, '\0', sizeof(attr));
+ attr.map_type = BPF_MAP_TYPE_HASH;
+ attr.key_size = sizeof(int);
+ attr.value_size = sizeof(uint32_t);
+ attr.max_entries = filter->map->max_entries;
+
+ map_fd = syscall(__NR_bpf,
+ BPF_MAP_CREATE, &attr, sizeof(attr));
+ if (map_fd < 0)
+ return map_fd;
+
+ filter->map_fd = map_fd;
+
+ rc = _sys_populate_map(filter);
+ if (rc < 0)
+ return rc;
+ }
+
+ return rc;
+}
+
+static int _sys_create_maps(struct db_filter_col *col)
+{
+ int rc, iter;
+
+ for (iter = 0; iter < col->map_filter_cnt; iter++) {
+ rc = _sys_create_map(col->map_filters[iter]);
+ if (rc < 0)
+ goto error;
+ }
+
+ return rc;
+
+error:
+ /// TODO - remove all maps upon error
+ return rc;
+}
+
+static int sys_bpf_filter_load(struct db_filter_col *col)
{
int rc;
struct bpf_program *prgm = NULL;
@@ -280,3 +372,108 @@ filter_load_out:
return -errno;
return 0;
}
+
+static char license[] = "GPL";
+static char prog_name[] = "seccomp";
+
+#undef EBPF_LOGGING_ENABLE
+#ifdef EBPF_LOGGING_ENABLE
+#define LOG_SZ 65000
+static char ebpf_log[LOG_SZ];
+#endif
+
+static int sys_ebpf_filter_load(struct db_filter_col *col)
+{
+ int rc;
+ struct ebpf_program *prgm = NULL;
+ union bpf_attr *attr = NULL;
+
+ rc = _sys_create_maps(col);
+ if (rc < 0)
+ return rc;
+
+ prgm = gen_ebpf_generate(col);
+ if (prgm == NULL)
+ return -ENOMEM;
+
+ attr = zmalloc(sizeof(*attr));
+ if (attr == NULL)
+ return -ENOMEM;
+
+ /// TODO - I need to create a SECCOMP program type in the kernel
+ //attr->prog_type = BPF_PROG_TYPE_SECCOMP;
+ attr->prog_type = BPF_PROG_TYPE_KPROBE;
+ attr->insn_cnt = prgm->insn_cnt;
+ attr->insns = (unsigned long)prgm->insns;
+ attr->license = (unsigned long)license;
+#ifdef EBPF_LOGGING_ENABLE
+ attr->log_level = 1;
+ attr->log_size = LOG_SZ;
+ attr->log_buf = (unsigned long)ebpf_log;
+#else
+ attr->log_level = 0;
+ attr->log_size = 0;
+ attr->log_buf = (unsigned long)NULL;
+#endif /* EBPF_LOGGING_ENABLE */
+
+ attr->prog_flags = 0;
+ memcpy(attr->prog_name, prog_name, sizeof(prog_name));
+
+ /* attempt to set NO_NEW_PRIVS */
+ if (col->attr.nnp_enable) {
+ rc = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+ if (rc < 0)
+ goto filter_load_out;
+ }
+
+ /* load the filter into the kernel */
+ if (sys_chk_seccomp_syscall() == 1) {
+ int flgs = 0;
+ if (col->attr.tsync_enable)
+ flgs |= SECCOMP_FILTER_FLAG_TSYNC;
+ if (col->attr.log_enable)
+ flgs |= SECCOMP_FILTER_FLAG_LOG;
+ rc = syscall(_nr_seccomp,
+ SECCOMP_SET_MODE_EBPF_FILTER, flgs, attr);
+ if (rc > 0 && col->attr.tsync_enable)
+ /* always return -ESRCH if we fail to sync threads */
+ errno = ESRCH;
+ } else
+ rc = prctl(PR_SET_SECCOMP, SECCOMP_MODE_EBPF_FILTER, attr);
+
+#ifdef EBPF_LOGGING_ENABLE
+ printf("ebpf_log =\n");
+ printf("%s\n", ebpf_log);
+#endif
+
+filter_load_out:
+ /* cleanup and return */
+ gen_ebpf_release(prgm);
+ if (attr)
+ free(attr);
+ if (rc < 0)
+ return -errno;
+ return 0;
+}
+
+/**
+ * Loads the filter into the kernel
+ * @param col the filter collection
+ *
+ * This function loads the given seccomp filter context into the kernel. If
+ * the filter was loaded correctly, the kernel will be enforcing the filter
+ * when this function returns. Returns zero on success, negative values on
+ * error.
+ *
+ */
+int sys_filter_load(struct db_filter_col *col)
+{
+ int rc;
+
+ if (col->attr.use_ebpf)
+ rc = sys_ebpf_filter_load(col);
+ else
+ rc = sys_bpf_filter_load(col);
+
+ return rc;
+}
diff --git a/src/system.h b/src/system.h
index b793687..7f3ca97 100644
--- a/src/system.h
+++ b/src/system.h
@@ -127,6 +127,6 @@ void sys_set_seccomp_action(uint32_t action, bool enable);
int sys_chk_seccomp_flag(int flag);
void sys_set_seccomp_flag(int flag, bool enable);

-int sys_filter_load(const struct db_filter_col *col);
+int sys_filter_load(struct db_filter_col *col);

#endif
--
2.7.4

Tom Hromatka

unread,
Feb 7, 2018, 4:27:39 PM2/7/18
to libse...@googlegroups.com, tom.hr...@oracle.com, tom.hr...@gmail.com, dhaval...@oracle.com
This patch adds support for EBPF hash maps to seccomp.

Several in-house customers have identified that their large
seccomp filters are slowing down their applications. Their
filters largely consist of simple allow/deny logic for many
syscalls (306 in one case) and for the most part don't utilize
argument filtering.

I put together a rough but fully working prototype that called
getppid() in a loop using one of my customer's seccomp filters.
I ran this loop one million times and recorded the min, max,
and mean times (in TSC ticks) to call getppid(). (I didn't
disable interrupts, so the max time was often large.) I chose
to report the minimum time because I feel it best represents the
actual time to traverse the syscall.

Test Case minimum TSC ticks to make syscall
----------------------------------------------------------------
seccomp disabled 620
getppid() at the front of 306-syscall seccomp filter 722
getppid() in middle of 306-syscall seccomp filter 1392
getppid() at the end of the 306-syscall filter 2452
seccomp using a 306-syscall-sized EBPF hash map 800

As shown in the table above, adding EBPF hash map support to
seccomp can significantly improve syscall performance in the
average and worst case for these customers.

TODOs:
* These changes collide with commit ce3dda9a - "massive src/db.c
rework". I will rebase them on top of that commit, but it
doesn't look trivial at the moment
* Add BPF_PROG_TYPE_SECCOMP and its associated functions to the
kernel
* Add doxygen comments
* Add libseccomp tests
* Address the handful of TODOs in the RFC

Next Steps:
Obviously the kernel changes need to be committed prior to adding
EBPF hash map support to the seccomp library, but I wanted to get
this community's buy-in before continuing much further. The
critical logic changes largely reside in this library, so I chose
to start here.

A few questions for this community:
* General thoughts on this patch, of course
* Going forward from here - some of my customers also need to
add a few more advanced rules which filter on syscall arguments
(in addition to their many basic allow/deny rules). I have been
waffling between supporting a hybrid cBPF and EBPF solution
initially vs. adding full EBPF support. A hybrid cBPF/EBPF
solution (where two filters are loaded into the kernel) could be
completed in a much quicker timeframe than full EBPF support.

Tom Hromatka (1):
all: RFC - add support for ebpf

include/seccomp.h.in | 31 ++++++++
src/Makefile.am | 2 +-
src/api.c | 36 ++++++++-
src/arch.c | 24 +++++-
src/arch.h | 3 +
src/db.c | 208 +++++++++++++++++++++++++++++++++++++-----------
src/db.h | 25 +++++-
src/gen_ebpf.c | 206 ++++++++++++++++++++++++++++++++++++++++++++++++
src/gen_ebpf.h | 38 +++++++++
src/gen_pfc.c | 62 +++++++++++++++
src/libbpf.h | 197 +++++++++++++++++++++++++++++++++++++++++++++
src/system.c | 219 ++++++++++++++++++++++++++++++++++++++++++++++++---
src/system.h | 2 +-
13 files changed, 988 insertions(+), 65 deletions(-)
create mode 100644 src/gen_ebpf.c
create mode 100644 src/gen_ebpf.h
create mode 100644 src/libbpf.h

--
2.7.4

Tom Hromatka

unread,
Feb 7, 2018, 4:31:00 PM2/7/18
to libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
On 02/07/2018 02:27 PM, Tom Hromatka wrote:

> This patch adds support for EBPF hash maps to seccomp.
>

The prototype seccomp/EBPF changes to the kernel are attached
below.  They are definitely a work in progress.

bpf_prog_seccomp_prepare() is loosely based upon bpf_prog_load(),
and seccomp_set_mode_ebpf_filter() is loosely based upon
seccomp_set_mode_filter().



commit bf655771e7c108ab76100f4e9af189f8f6b9bf67
Author: Tom Hromatka <tom.hr...@oracle.com>
Date:   Tue Jan 30 17:19:53 2018 -0700

    seccomp: add ebpf support

    This commit adds ebpf support to seccomp.  Prior to this commit
    seccomp only supported classic bpf.

    Signed-off-by: Tom Hromatka <tom.hr...@oracle.com>

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0b25cf8..508e83d 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -430,6 +430,7 @@ static inline int bpf_map_attr_numa_node(const union
bpf_attr *attr)
 }

 struct bpf_prog *bpf_prog_get_type_path(const char *name, enum
bpf_prog_type type);
+int bpf_prog_seccomp_prepare(union bpf_attr *attr, struct bpf_prog
**prog);

 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 2a0bd9d..c71053f 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -10,11 +10,13 @@
 #define SECCOMP_MODE_DISABLED  0 /* seccomp is not in use. */
 #define SECCOMP_MODE_STRICT    1 /* uses hard-coded filter. */
 #define SECCOMP_MODE_FILTER    2 /* uses user-supplied filter. */
+#define SECCOMP_MODE_EBPF_FILTER        3 /* uses user-supplied EBPF. */

 /* Valid operations for seccomp syscall. */
 #define SECCOMP_SET_MODE_STRICT                0
 #define SECCOMP_SET_MODE_FILTER                1
 #define SECCOMP_GET_ACTION_AVAIL       2
+#define SECCOMP_SET_MODE_EBPF_FILTER    3

 /* Valid flags for SECCOMP_SET_MODE_FILTER */
 #define SECCOMP_FILTER_FLAG_TSYNC      1
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5cb783f..e33344c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1229,6 +1229,89 @@ static int bpf_prog_load(union bpf_attr *attr)
        return err;
 }

+int bpf_prog_seccomp_prepare(union bpf_attr *attr, struct bpf_prog **prog)
+{
+       int err;
+
+       if (attr->prog_flags & ~BPF_F_STRICT_ALIGNMENT)
+               return -EINVAL;
+
+       if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS)
+               return -E2BIG;
+
+/// TODO
+#if 0
+       if (attr->_prog_type != BPF_PROG_TYPE_SECCOMP)
+               return -EINVAL;
+#endif
+
+       /* plain bpf_prog allocation */
+       *prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
+       if (!prog)
+               return -ENOMEM;
+
+       err = security_bpf_prog_alloc((*prog)->aux);
+       if (err)
+               goto free_prog_nouncharge;
+
+       err = bpf_prog_charge_memlock(*prog);
+       if (err)
+               goto free_prog_sec;
+
+       (*prog)->len = attr->insn_cnt;
+
+       err = -EFAULT;
+       if (copy_from_user((*prog)->insns, u64_to_user_ptr(attr->insns),
+                          bpf_prog_insn_size(*prog)) != 0)
+               goto free_prog;
+
+       (*prog)->orig_prog = NULL;
+       (*prog)->jited = 0;
+
+       atomic_set(&(*prog)->aux->refcnt, 1);
+       (*prog)->gpl_compatible = 1;
+
+       if (attr->prog_ifindex) {
+               err = bpf_prog_offload_init(*prog, attr);
+               if (err)
+                       goto free_prog;
+       }
+
+/// TODO
+       /* find program type */
+#if 0
+       err = find_prog_type(type, prog);
+       if (err < 0)
+               goto free_prog;
+#else
+       (*prog)->type = attr->prog_type;
+#endif
+
+       (*prog)->aux->load_time = ktime_get_boot_ns();
+
+       /* run eBPF verifier */
+       err = bpf_check(prog, attr);
+       if (err < 0)
+               goto free_used_maps;
+
+       /* eBPF program is ready to be JITed */
+       *prog = bpf_prog_select_runtime(*prog, &err);
+       if (err < 0)
+               goto free_used_maps;
+
+       return err;
+
+free_used_maps:
+       free_used_maps((*prog)->aux);
+free_prog:
+       bpf_prog_uncharge_memlock(*prog);
+free_prog_sec:
+       security_bpf_prog_free((*prog)->aux);
+free_prog_nouncharge:
+       bpf_prog_free(*prog);
+       return err;
+}
+
 #define BPF_OBJ_LAST_FIELD file_flags

 static int bpf_obj_pin(const union bpf_attr *attr)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 5f0dfb2ab..ab3895e 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -16,6 +16,7 @@

 #include <linux/refcount.h>
 #include <linux/audit.h>
+#include <linux/bpf.h>
 #include <linux/compat.h>
 #include <linux/coredump.h>
 #include <linux/kmemleak.h>
@@ -205,6 +206,7 @@ static u32 seccomp_run_filters(const struct
seccomp_data *sd,
         * All filters in the list are evaluated and the lowest BPF return
         * value always takes priority (ignoring the DATA).
         */
+       rcu_read_lock();
        for (; f; f = f->prev) {
                u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

@@ -213,6 +215,7 @@ static u32 seccomp_run_filters(const struct
seccomp_data *sd,
                        *match = f;
                }
        }
+       rcu_read_unlock();
        return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
@@ -786,6 +789,7 @@ int __secure_computing(const struct seccomp_data *sd)
                __secure_computing_strict(this_syscall);  /* may call
do_exit */
                return 0;
        case SECCOMP_MODE_FILTER:
+       case SECCOMP_MODE_EBPF_FILTER:
                return __seccomp_filter(this_syscall, sd, false);
        default:
                BUG();
@@ -893,6 +897,48 @@ static inline long seccomp_set_mode_filter(unsigned
int flags,
 }
 #endif

+static long seccomp_set_mode_ebpf_filter(unsigned int flags,
+                                        const char __user *uattr)
+{
+       union bpf_attr attr = {};
+       struct bpf_prog *prog;
+       struct seccomp_filter *sfilter = NULL;
+       const unsigned long seccomp_mode = SECCOMP_MODE_EBPF_FILTER;
+       int err;
+
+       if (copy_from_user(&attr, uattr, sizeof(attr)) != 0)
+               return -EFAULT;
+
+       err = bpf_prog_seccomp_prepare(&attr, &prog);
+       if (err)
+               return err;
+
+       sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
+       if (!sfilter)
+               goto free_prog;
+
+       sfilter->prog = prog;
+       refcount_set(&sfilter->usage, 1);
+
+       spin_lock_irq(&current->sighand->siglock);
+
+       err = seccomp_attach_filter(flags, sfilter);
+       if (err)
+               goto unlock;
+
+       seccomp_assign_mode(current, seccomp_mode);
+
+       spin_unlock_irq(&current->sighand->siglock);
+
+       return err;
+unlock:
+       spin_unlock_irq(&current->sighand->siglock);
+       kfree(sfilter);
+free_prog:
+       bpf_prog_free(prog);
+       return err;
+}
+
 static long seccomp_get_action_avail(const char __user *uaction)
 {
        u32 action;
@@ -927,6 +973,8 @@ static long do_seccomp(unsigned int op, unsigned int
flags,
                return seccomp_set_mode_strict();
        case SECCOMP_SET_MODE_FILTER:
                return seccomp_set_mode_filter(flags, uargs);
+       case SECCOMP_SET_MODE_EBPF_FILTER:
+               return seccomp_set_mode_ebpf_filter(flags, uargs);
        case SECCOMP_GET_ACTION_AVAIL:
                if (flags != 0)
                        return -EINVAL;
@@ -969,6 +1017,10 @@ long prctl_set_seccomp(unsigned long seccomp_mode,
char __user *filter)
                op = SECCOMP_SET_MODE_FILTER;
                uargs = filter;
                break;
+       case SECCOMP_MODE_EBPF_FILTER:
+               op = SECCOMP_SET_MODE_EBPF_FILTER;
+               uargs = filter;
+               break;
        default:
                return -EINVAL;
        }


Paul Moore

unread,
Feb 7, 2018, 5:17:10 PM2/7/18
to Tom Hromatka, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
Hi Tom,

Thanks for the patches, I obviously haven't had a chance to look at
them in detail, but I wanted to provide some quick
feedback/confirmation as well as ask a few questions; in no particular
order:

* You are correct, we need kernel support before we can consider
merging the libseccomp code into the master branch. However, if you
feel it would be helpful to have libseccomp support in order to
test/verify the kernel support, I would consider merging the code into
a development branch.

* My apologies on the collision with the "massive src/db.c rework".
That was a bit of a special case patch and I only did it that way
because I didn't believe anyone else had any pending work that
involved that code. If there is anything I can do to help here, let
me know.

* Testing is a pretty hard requirement for libseccomp, especially
these patches. I'm going to ask for +90% test coverage of this new
code (we're currently at ~92%); I know that's a lot, but this is
critical code.

* Improved performance is always good, but functional equivalency and
compatibility with the existing API/callers is another hard
requirement. I mention this because of your comments regarding
argument filtering and hybrid BPF filters, can you elaborate a bit
more on the possible solutions to ensure this works under EBPF
(including hybrid options)?

--
paul moore
www.paul-moore.com

Tom Hromatka

unread,
Feb 7, 2018, 6:17:16 PM2/7/18
to Paul Moore, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com


On 02/07/2018 03:17 PM, Paul Moore wrote:
> Hi Tom,
>
> Thanks for the patches, I obviously haven't had a chance to look at
> them in detail, but I wanted to provide some quick
> feedback/confirmation as well as ask a few questions; in no particular
> order:

No worries.  Thanks for the quick turnaround!

>
> * You are correct, we need kernel support before we can consider
> merging the libseccomp code into the master branch. However, if you
> feel it would be helpful to have libseccomp support in order to
> test/verify the kernel support, I would consider merging the code into
> a development branch.

Thanks for being flexible.  I don't think we will need to
commit the libseccomp changes prior to the kernel changes.
But I felt it was important to make the libseccomp changes
(or at least the general idea of the changes) available
prior to sending out some possibly cryptic kernel changes.
Hopefully this will make it easier for the kernel side to
understand the desired goal.

>
> * My apologies on the collision with the "massive src/db.c rework".
> That was a bit of a special case patch and I only did it that way
> because I didn't believe anyone else had any pending work that
> involved that code. If there is anything I can do to help here, let
> me know.

It's 100% my fault.  The seccomp code base has been so stable,
that I grew lax and didn't check for updates like I should
have.  It shouldn't be too hard to straighten, but I'll ping
you if I have any questions.  Thanks.

>
> * Testing is a pretty hard requirement for libseccomp, especially
> these patches. I'm going to ask for +90% test coverage of this new
> code (we're currently at ~92%); I know that's a lot, but this is
> critical code.

I 100% agree.  I hesitated to send the RFC out without tests,
but I didn't want to write tests for code that may change
significantly before going upstream.

>
> * Improved performance is always good, but functional equivalency and
> compatibility with the existing API/callers is another hard
> requirement. I mention this because of your comments regarding
> argument filtering and hybrid BPF filters, can you elaborate a bit
> more on the possible solutions to ensure this works under EBPF
> (including hybrid options)?
>

Agreed.  We cannot afford to break existing seccomp users.

So this is purely theoretical, but I think it should work.  We could
utilize seccomp's ability in the kernel to run multiple seccomp filters.
The first filter to run would be the EBPF filter.  If it returns an
action to run, we could bail out of the for() loop in seccomp_run_filters()
early.  If not, continue running the cBPF filter(s) as usual. Existing
seccomp users wouldn't even load an EBPF filter so their behavior would
remain unchanged.

On the libseccomp side, I would expect something like this:

int sys_filter_load(struct db_filter_col *col)
{
    int rc;

    /* load the classic BPF filter first */
    rc = sys_bpf_filter_load(col);
    if (rc < 0)
        return rc;

    /* process the EBPF filter.  note - it will only load a
     * filter if it has been instructed to do so by the user.
     * for classic seccomp filters this is effectively a nop
     */
    rc = sys_ebpf_filter_load(col);

    return rc;
}

And on the kernel side, I would propose a change like this:

static u32 seccomp_run_filters(const struct seccomp_data *sd,
                               struct seccomp_filter **match)
{
...
    for (; f; f = f->prev)
    {
        u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

+        if (DATA_ONLY(cur_ret) == SECCOMP_RET_DATA_BREAK)
+        {
+            ret = cur_ret & ~SECCOMP_RET_DATA_BREAK;
+            *match = f;
+            break;
+        }

        if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret))
        {
            ret = cur_ret;
            *match = f; }
        }
...

The EBPF code I added to src/gen_ebpf.c could set the
SECCOMP_RET_DATA_BREAK bit in the data section of the action
to force an early exit from the above for() loop.

the data porti

Kees Cook

unread,
Feb 7, 2018, 7:12:24 PM2/7/18
to Tom Hromatka, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
On Thu, Feb 8, 2018 at 8:27 AM, Tom Hromatka <tom.hr...@oracle.com> wrote:
> This patch adds support for EBPF hash maps to seccomp.
>
> Several in-house customers have identified that their large
> seccomp filters are slowing down their applications. Their
> filters largely consist of simple allow/deny logic for many
> syscalls (306 in one case) and for the most part don't utilize
> argument filtering.
>
> I put together a rough but fully working prototype that called
> getppid() in a loop using one of my customer's seccomp filters.
> I ran this loop one million times and recorded the min, max,
> and mean times (in TSC ticks) to call getppid(). (I didn't
> disable interrupts, so the max time was often large.) I chose
> to report the minimum time because I feel it best represents the
> actual time to traverse the syscall.
>
> Test Case minimum TSC ticks to make syscall
> ----------------------------------------------------------------
> seccomp disabled 620
> getppid() at the front of 306-syscall seccomp filter 722
> getppid() in middle of 306-syscall seccomp filter 1392
> getppid() at the end of the 306-syscall filter 2452
> seccomp using a 306-syscall-sized EBPF hash map 800

Are the filters generated by libseccomp sorted as a binary tree? Based
on these numbers, it would seem not. I think you could get a huge
performance gain out of large filters by having them set up that way.

(Also, 306-syscall-sized filter? There are only 332 syscalls in
x86_64, even in a worst-case, this should be 1 greater-than compare
and 26 equality checks, not 306 equality checks...)

> As shown in the table above, adding EBPF hash map support to
> seccomp can significantly improve syscall performance in the
> average and worst case for these customers.

I'm not against eventually adding eBPF hash map support, but every
time someone has talked about slow filters, it's because they're doing
something hugely inefficient in their filter. :)

-Kees

--
Kees Cook
Pixel Security

Tom Hromatka

unread,
Feb 7, 2018, 8:40:41 PM2/7/18
to libse...@googlegroups.com, kees...@chromium.org, tom.hr...@gmail.com, Dhaval Giani
This customer's current seccomp filter is painfully simple.

if (arch == x86_64)
  if ($syscall == 43)
    action ALLOW;
  ...
  (305 ifs later)
  ...
  if ($syscall == 56)
    action ALLOW;
  action ERRNO(5);

libseccomp today generates a classic BPF filter.  The kernel
then converts it to an EBPF filter using bpf_prog_create_from_user().
Does cBPF support any logic other than jump-true and jump-false
logic?  I'm far from an expert on cBPF, but I wasn't aware of it
supporting much beyond these basic instructions.


>
> (Also, 306-syscall-sized filter? There are only 332 syscalls in
> x86_64, even in a worst-case, this should be 1 greater-than compare
> and 26 equality checks, not 306 equality checks...)

This customer is very, very concerned about security.  Given the excitement
of the last couple months, I can't blame them.  There are only a handful of
syscalls that they wish to block - 26 or so as you pointed out.  But
they are
afraid of a new syscall being added to the kernel that would then be
allowed by
accident.  Thus they created a filter with a very large whitelist and a
default
action of ERRNO(5).  In their estimation, this is safer than a filter with a
26-syscall blacklist and a default action of ALLOW.


>> As shown in the table above, adding EBPF hash map support to
>> seccomp can significantly improve syscall performance in the
>> average and worst case for these customers.
> I'm not against eventually adding eBPF hash map support, but every
> time someone has talked about slow filters, it's because they're doing
> something hugely inefficient in their filter. :)

I agree their filter has a primary focus of security first and performance
second.  I am hoping to mitigate some of their performance woes without
hindering their security focus - misguided or not.

Thanks!

Tom

Kees Cook

unread,
Feb 7, 2018, 8:57:23 PM2/7/18
to Tom Hromatka, libse...@googlegroups.com, tom.hr...@gmail.com, Dhaval Giani
On Thu, Feb 8, 2018 at 12:40 PM, Tom Hromatka <tom.hr...@oracle.com> wrote:
> This customer's current seccomp filter is painfully simple.
>
> if (arch == x86_64)
> if ($syscall == 43)
> action ALLOW;
> ...
> (305 ifs later)
> ...
> if ($syscall == 56)
> action ALLOW;
> action ERRNO(5);
>
> libseccomp today generates a classic BPF filter. The kernel
> then converts it to an EBPF filter using bpf_prog_create_from_user().
> Does cBPF support any logic other than jump-true and jump-false
> logic? I'm far from an expert on cBPF, but I wasn't aware of it
> supporting much beyond these basic instructions.

It's jump true/false, yes, but there is much more than just equality
tests. As such, they could write the above as:

if arch != x86_64, jump kill
if syscall == bad1, jump kill
...
if syscall == bad26, jump kill
if syscall > 332, jump kill
action ALLOW
kill: action KILL

Or the binary search I mentioned:

if arch != x86_64, action KILL
if syscall > 150, jump label150
if syscall > 75, jump label75
if syscall > 38, jump label38
if syscall > 16, jump label16
if syscall > 8, jump label8
...
jump kill
label8:
if syscall == 4, jump kill
if syscall == 3, jump allow
...
kill:
action KILL
allow:
action ALLOW

And if they for some reason have to stick with their giant whitelist,
they should organize it by syscall frequency, so that their most
commonly executed syscalls are at the start of the filter.

> Thus they created a filter with a very large whitelist and a default
> action of ERRNO(5). In their estimation, this is safer than a filter with a
> 26-syscall blacklist and a default action of ALLOW.

Yeah, they can still do that kind of thing, but there's MUCH more than
can do to improve the execution time of their filter. :)

> I agree their filter has a primary focus of security first and performance
> second. I am hoping to mitigate some of their performance woes without
> hindering their security focus - misguided or not.

I think that should be entirely possible with a wider use of the
existing cBPF operators and reasonable attention to common search
algorithm runtime analysis.

Paul Moore

unread,
Feb 7, 2018, 10:19:10 PM2/7/18
to Kees Cook, Tom Hromatka, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
The libseccomp generated filters are not sorted that way. The
libseccomp generated filters are generated with the idea that it is
best to put the shortest rules at the front of the filter; with the
users having some ability to influence the exact ordering via priority
hints (see seccomp_syscall_priority(3)). The idea of sorting syscalls
via a balanced tree has come up in the past, I just haven't had the
time to play with it.

There are other optimizations, but they mostly come into play when
filtering on syscall arguments.

> (Also, 306-syscall-sized filter? There are only 332 syscalls in
> x86_64, even in a worst-case, this should be 1 greater-than compare
> and 26 equality checks, not 306 equality checks...)

... or honestly, just a blacklist. Without knowing more about the
application it's hard to recommend an approach, but most users have
been finding blacklists to be much more manageable, even if it doesn't
provide the same level of warm fuzzies.

>> As shown in the table above, adding EBPF hash map support to
>> seccomp can significantly improve syscall performance in the
>> average and worst case for these customers.
>
> I'm not against eventually adding eBPF hash map support, but every
> time someone has talked about slow filters, it's because they're doing
> something hugely inefficient in their filter. :)
>
> -Kees
>
> --
> Kees Cook
> Pixel Security
>
> --
> You received this message because you are subscribed to the Google Groups "libseccomp" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to libseccomp+...@googlegroups.com.
> To post to this group, send email to libse...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
paul moore
www.paul-moore.com

Tom Hromatka

unread,
Feb 7, 2018, 10:29:57 PM2/7/18
to Kees Cook, libse...@googlegroups.com, tom.hr...@gmail.com, Dhaval Giani
Thanks.  I had tunnel vision on libseccomp's user APIs and wasn't
thinking too clearly about cBPF itself.  I agree - cBPF can
definitely do what I want, but I'm uncertain how best to expose these
features to my customers in a user-friendly manner.  I guess I need
to think about the customer API side some more.  In the b-tree case,
perhaps an scmp_filter_attr - say SCMP_FLTATR_USE_BTREE.

Thanks for the help.

Tom

Paul Moore

unread,
Feb 7, 2018, 10:30:51 PM2/7/18
to Tom Hromatka, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
One of the gotchas with loading multiple filters is that the first,
and all non-final, filters need to allow the syscalls necessary to
load the additional filter (I'm not 100% on EBPF, but I believe it is
still just seccomp(2)). Generally this is user/caller problem, but if
we are hiding the fact that we are loading two filters instead of one
we are going to need to do some additional work to ensure we can load
the second filter and not allow any syscalls the caller didn't intend.

It might be a better use of time and energy to look into improving
libseccomp's filter generation, e.g. Kees' comments around
sorted-trees/qsort.

--
paul moore
www.paul-moore.com

Paul Moore

unread,
Feb 7, 2018, 10:39:33 PM2/7/18
to Tom Hromatka, Kees Cook, libse...@googlegroups.com, tom.hr...@gmail.com, Dhaval Giani
We should make optimizations like this transparent to the caller; they
should "just work" and be the default. The libseccomp attribute API
is there for toggling behavior (e.g. bad ABI actions), and while I see
no reason why we couldn't use it to tune filter generation, I don't
want to expose knobs if there isn't a serious need. I'm not sure I
want to completely abandon the highest-return-density-at-the-top idea,
but we might be able to do something where we sort syscalls within a
given priority range.

--
paul moore
www.paul-moore.com

Tom Hromatka

unread,
Feb 8, 2018, 9:26:15 AM2/8/18
to Paul Moore, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
You are definitely correct.  The cBPF filter would have to
(at a minimum) allow __NR_seccomp and __NR_bpf.  __NR_bpf is
used to create and populate the EBPF map.

If a user was especially security conscious, they would then
need to block those syscalls in the EBPF map.  This is
potentially error prone and slightly cumbersome.

Paul Moore

unread,
Feb 8, 2018, 9:34:34 AM2/8/18
to Tom Hromatka, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
Yes, exactly.

It's also not backwards compatible, you need a new(er) kernel for this
to work. Like I said earlier, it might be easier to look into
improving the existing syscall sorting code in libseccomp; I believe
all the relevant code lives in src/gen_bpf.c:_gen_bpf_arch() and
should be relatively easy to play with ... much easier than the stuff
in src/db.c ;)

>> It might be a better use of time and energy to look into improving
>> libseccomp's filter generation, e.g. Kees' comments around
>> sorted-trees/qsort.

--
paul moore
www.paul-moore.com

Tom Hromatka

unread,
Feb 8, 2018, 9:42:54 AM2/8/18
to Paul Moore, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
It can be made backward compatible if libseccomp only calls __NR_bpf and
__NR_seccomp iff the user has requested EBPF features.  But I digress.

Okay, I'll dig into src/gen_bpf.c and try to improve the syscall sorting
code.
I'm intrigued by the binary tree idea.  Let me know if there's anything you
would like to see added/changed.

Thanks.

Tom

Paul Moore

unread,
Feb 8, 2018, 4:06:33 PM2/8/18
to Tom Hromatka, libse...@googlegroups.com, tom.hr...@gmail.com, dhaval...@oracle.com
If you're speaking in general terms, we do use the GH issue tracker:

* https://github.com/seccomp/libseccomp/issues

... which is a good place to start, feel free to find something you
find interesting and let me know.

If you're speaking specific to BPF filter generation, I think we
already talked about my thinking; my initial desire being to find a
balance between the existing returns-near-the-top behavior and a fully
sorted approach. Feel free to come back to the list with any ideas
you may have and we talk it over if you like.
Reply all
Reply to author
Forward
0 new messages