Code Review - New Akaros profiler

Davide Libenzi

unread,

Oct 28, 2015, 6:06:15 PM10/28/15

to Akaros

Since the previous one has not been done yet, this is a cumulative one:

https://github.com/brho/akaros/compare/master...dlibenzi:profiler_dev_v2

The following changes since commit 6f3723cd8f883260a78fdf411911d7469464caa5:

Update file-posix.c utest (2015-10-15 12:07:00 -0400)

are available in the git repository at:

g...@github.com:dlibenzi/akaros profiler_dev_v2

for you to fetch changes up to 6e40b75927c808dd945cb3ad0ee6e52a5a2dc495:

Added simple netcat client for Akaros (2015-10-28 14:49:56 -0700)

----------------------------------------------------------------

Davide Libenzi (14):

Completely restructured profiler code cutting all the unused code

Removed unused old oprofile code

Do not race when multiple init happen at the same time

Added struct proc pointer to base of the executable args block

Added full binary path into the proc structure

Added APIs to access process startup information

Added API to retrieve the full path of a struct dentry

Extend hash enumeration API to accept an opaque pointer

Added API to retrieve the current set of processes on the system

Added API to enumerate the VM regions of a process

Implemented the new profiler

Added kprof to perf converter

Added patch to Linux perf to allow it to work with Akaros

Added simple netcat client for Akaros

Documentation/profiling.txt | 68 +-

Makefile | 12 +

kern/arch/x86/kdebug.c | 19 +

kern/drivers/dev/kprof.c | 420 ++++--------

kern/include/elf.h | 3 +

kern/include/env.h | 17 +-

kern/include/hashtable.h | 6 +-

kern/include/kdebug.h | 8 +-

kern/include/mm.h | 3 +

kern/include/oprofile.h | 167 -----

kern/include/process.h | 9 +

kern/include/profiler.h | 24 +

kern/include/ros/profiler_records.h | 45 ++

kern/include/umem.h | 2 +

kern/include/vfs.h | 10 +

kern/src/Kbuild | 2 +-

kern/src/elf.c | 36 +

kern/src/hashtable.c | 10 +-

kern/src/mm.c | 16 +

kern/src/monitor.c | 4 +-

kern/src/oprofile/COPYING | 356 ----------

kern/src/oprofile/Kbuild | 1 -

kern/src/oprofile/buffer_sync.c | 566 ---------------

kern/src/oprofile/buffer_sync.h | 22 -

kern/src/oprofile/cpu_buffer.c | 760 ---------------------

kern/src/oprofile/cpu_buffer.h | 76 ---

kern/src/oprofile/event_buffer.c | 195 ------

kern/src/oprofile/event_buffer.h | 30 -

kern/src/oprofile/nmi_timer_int.c | 173 -----

kern/src/oprofile/oprof.c | 283 --------

kern/src/oprofile/oprof.h | 36 -

kern/src/oprofile/oprofile_files.c | 201 ------

kern/src/oprofile/oprofile_perf.c | 266 --------

kern/src/oprofile/oprofile_stats.c | 84 ---

kern/src/oprofile/oprofile_stats.h | 34 -

kern/src/oprofile/oprofilefs.c | 280 --------

kern/src/oprofile/timer_int.c | 120 ----

kern/src/process.c | 63 +-

kern/src/profiler.c | 438 ++++++++++++

kern/src/schedule.c | 4 +-

kern/src/syscall.c | 61 +-

kern/src/umem.c | 31 +

kern/src/vfs.c | 34 +

tools/apps/snc/Makefile | 39 ++

tools/apps/snc/snc.c | 105 +++

tools/profile/kprof2perf/Makefile | 47 ++

tools/profile/kprof2perf/kprof2perf.c | 682 ++++++++++++++++++

tools/profile/kprof2perf/perf_format.h | 519 ++++++++++++++

.../kprof2perf/perf_patches/perf_patch.diff | 130 ++++

49 files changed, 2476 insertions(+), 4041 deletions(-)

delete mode 100644 kern/include/oprofile.h

create mode 100644 kern/include/profiler.h

create mode 100644 kern/include/ros/profiler_records.h

delete mode 100644 kern/src/oprofile/COPYING

delete mode 100644 kern/src/oprofile/Kbuild

delete mode 100644 kern/src/oprofile/buffer_sync.c

delete mode 100644 kern/src/oprofile/buffer_sync.h

delete mode 100644 kern/src/oprofile/cpu_buffer.c

delete mode 100644 kern/src/oprofile/cpu_buffer.h

delete mode 100644 kern/src/oprofile/event_buffer.c

delete mode 100644 kern/src/oprofile/event_buffer.h

delete mode 100644 kern/src/oprofile/nmi_timer_int.c

delete mode 100644 kern/src/oprofile/oprof.c

delete mode 100644 kern/src/oprofile/oprof.h

delete mode 100644 kern/src/oprofile/oprofile_files.c

delete mode 100644 kern/src/oprofile/oprofile_perf.c

delete mode 100644 kern/src/oprofile/oprofile_stats.c

delete mode 100644 kern/src/oprofile/oprofile_stats.h

delete mode 100644 kern/src/oprofile/oprofilefs.c

delete mode 100644 kern/src/oprofile/timer_int.c

create mode 100644 kern/src/profiler.c

create mode 100644 tools/apps/snc/Makefile

create mode 100644 tools/apps/snc/snc.c

create mode 100644 tools/profile/kprof2perf/Makefile

create mode 100644 tools/profile/kprof2perf/kprof2perf.c

create mode 100644 tools/profile/kprof2perf/perf_format.h

create mode 100644 tools/profile/kprof2perf/perf_patches/perf_patch.diff

Davide Libenzi

unread,

Nov 3, 2015, 7:51:14 AM11/3/15

to Akaros

All the changes I had planned to simplify the profiler interface are in this branch, which I consider it done until CR comments come.

Barret Rhoden

unread,

Nov 11, 2015, 11:43:33 AM11/11/15

to aka...@googlegroups.com

On 2015-10-28 at 15:06 "'Davide Libenzi' via Akaros"

<aka...@googlegroups.com> wrote:
> Since the previous one has not been done yet, this is a cumulative
> one:
>
> https://github.com/brho/akaros/compare/master...dlibenzi:profiler_dev_v2
>
>
> The following changes since commit
> 6f3723cd8f883260a78fdf411911d7469464caa5:
>
> Update file-posix.c utest (2015-10-15 12:07:00 -0400)
>
> are available in the git repository at:
>
> g...@github.com:dlibenzi/akaros profiler_dev_v2
>
> for you to fetch changes up to
> 6e40b75927c808dd945cb3ad0ee6e52a5a2dc495:
>
> Added simple netcat client for Akaros (2015-10-28 14:49:56 -0700)

Please run checkpatch on your patches before sending them in. A bunch
of these are false positives, but many are not.

In the meantime, I'll take a look at the actual code too.

Barret

my checkpatch output:

--------------------------------------------------------------------------
../patches/0001-Completely-restructured-profiler-code-cutting-all-th.patch
--------------------------------------------------------------------------
ERROR: space required after that ',' (ctx:VxV)
#86: FILE: kern/drivers/dev/kprof.c:406:
+ n = profiler_read(va,n);
^

ERROR: space required after that ',' (ctx:VxV)
#161: FILE: kern/include/profiler.h:15:
+int profiler_read(void *va,int);
^

ERROR: space required before the open parenthesis '('
#1233: FILE: kern/src/profiler.c:202:
+ for(core = 0; core < num_cores; core++) {

total: 3 errors, 0 warnings, 419 lines checked

../patches/0001-Completely-restructured-profiler-code-cutting-all-th.patch has style problems, please review.
------------------------------------------------------
../patches/0002-Removed-unused-old-oprofile-code.patch
------------------------------------------------------
total: 0 errors, 0 warnings, 0 lines checked

../patches/0002-Removed-unused-old-oprofile-code.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------------------
../patches/0003-Do-not-race-when-multiple-init-happen-at-the-same-ti.patch
--------------------------------------------------------------------------
ERROR: do not initialise statics to 0 or NULL
#41: FILE: kern/src/profiler.c:33:
+static int profiler_users = 0;

total: 1 errors, 0 warnings, 185 lines checked

../patches/0003-Do-not-race-when-multiple-init-happen-at-the-same-ti.patch has style problems, please review.
--------------------------------------------------------------------------
../patches/0004-Added-struct-proc-pointer-to-base-of-the-executable-.patch
--------------------------------------------------------------------------
total: 0 errors, 0 warnings, 21 lines checked

../patches/0004-Added-struct-proc-pointer-to-base-of-the-executable-.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------------
../patches/0005-Added-full-binary-path-into-the-proc-structure.patch
--------------------------------------------------------------------
total: 0 errors, 0 warnings, 233 lines checked

../patches/0005-Added-full-binary-path-into-the-proc-structure.patch has no obvious style problems and is ready for submission.
----------------------------------------------------------------------
../patches/0006-Added-APIs-to-access-process-startup-information.patch
----------------------------------------------------------------------
total: 0 errors, 0 warnings, 46 lines checked

../patches/0006-Added-APIs-to-access-process-startup-information.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------------------
../patches/0007-Added-API-to-retrieve-the-full-path-of-a-struct-dent.patch
--------------------------------------------------------------------------
total: 0 errors, 0 warnings, 72 lines checked

../patches/0007-Added-API-to-retrieve-the-full-path-of-a-struct-dent.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------------------
../patches/0008-Extend-hash-enumeration-API-to-accept-an-opaque-poin.patch
--------------------------------------------------------------------------
total: 0 errors, 0 warnings, 106 lines checked

../patches/0008-Extend-hash-enumeration-API-to-accept-an-opaque-poin.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------------------
../patches/0009-Added-API-to-retrieve-the-current-set-of-processes-o.patch
--------------------------------------------------------------------------
total: 0 errors, 0 warnings, 68 lines checked

../patches/0009-Added-API-to-retrieve-the-current-set-of-processes-o.patch has no obvious style problems and is ready for submission.
------------------------------------------------------------------------
../patches/0010-Added-API-to-enumerate-the-VM-regions-of-a-process.patch
------------------------------------------------------------------------
total: 0 errors, 0 warnings, 27 lines checked

../patches/0010-Added-API-to-enumerate-the-VM-regions-of-a-process.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------------------
../patches/0011-Added-generic-backtrace-functions-to-allow-backtrace.patch
--------------------------------------------------------------------------
total: 0 errors, 0 warnings, 100 lines checked

../patches/0011-Added-generic-backtrace-functions-to-allow-backtrace.patch has no obvious style problems and is ready for submission.
---------------------------------------------------------------------
../patches/0012-Added-API-to-append-data-into-a-circular-buffer.patch
---------------------------------------------------------------------
WARNING: kfree(NULL) is safe and this check is probably not required
#108: FILE: kern/lib/circular_buffer.c:33:
+ if (cb->mem)
+ kfree(cb->mem);

total: 0 errors, 1 warnings, 169 lines checked

../patches/0012-Added-API-to-append-data-into-a-circular-buffer.patch has style problems, please review.
--------------------------------------------------
../patches/0013-Implemented-the-new-profiler.patch
--------------------------------------------------
ERROR: space required after that ',' (ctx:VxV)
#320: FILE: kern/drivers/dev/kprof.c:56:
+ {".", {Kprofdirqid, 0, QTDIR},0, DMDIR|0550},
^

WARNING: externs should be avoided in .c files
#329: FILE: kern/drivers/dev/kprof.c:65:
+extern int booting;

WARNING: kfree(NULL) is safe and this check is probably not required
#413: FILE: kern/drivers/dev/kprof.c:132:
+ if (kprof.pdata) {
+ kfree(kprof.pdata);

WARNING: Missing a blank line after declarations
#546: FILE: kern/drivers/dev/kprof.c:211:
+ int i;
+ ERRSTACK(1);

ERROR: spaces required around that ':' (ctx:VxW)
#615: FILE: kern/drivers/dev/kprof.c:264:
+ return kprof.pdata != NULL ? kprof.psize: profiler_size();
^

ERROR: space required before the open brace '{'
#646: FILE: kern/drivers/dev/kprof.c:291:
+ if (c->qid.type & QTDIR){

ERROR: "foo* const * bar" should be "foo * const *bar"
#816: FILE: kern/drivers/dev/kprof.c:420:
+ const char* const * cmds = profiler_configure_cmds();

ERROR: Macros with complex values should be enclosed in parentheses
#1191: FILE: kern/include/kdebug.h:51:
+#define trace_printx(args...) if (printx_on) trace_printk(TRUE, args)

ERROR: "foo* const *bar" should be "foo * const *bar"
#1242: FILE: kern/include/profiler.h:17:
+const char* const *profiler_configure_cmds(void);

WARNING: __packed is preferred over __attribute__((packed))
#1283: FILE: kern/include/ros/profiler_records.h:17:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1293: FILE: kern/include/ros/profiler_records.h:27:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1304: FILE: kern/include/ros/profiler_records.h:38:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1312: FILE: kern/include/ros/profiler_records.h:46:
+} __attribute__((packed));

ERROR: "foo* bar" should be "foo *bar"
#1335: FILE: kern/include/string.h:27:
+void *memchr(const void* mem, int chr, int len);

ERROR: do not initialise statics to 0 or NULL
#1433: FILE: kern/src/profiler.c:40:
+static int tracing = 0;

ERROR: "foo* bar" should be "foo *bar"
#1442: FILE: kern/src/profiler.c:50:
+static inline char* vb_encode_uint64(char* data, uint64_t n)

ERROR: "foo* bar" should be "foo *bar"
#1442: FILE: kern/src/profiler.c:50:
+static inline char* vb_encode_uint64(char* data, uint64_t n)

WARNING: kfree(NULL) is safe and this check is probably not required
#1743: FILE: kern/src/profiler.c:278:
+ if (profiler_percpu_ctx) {
+ kfree(profiler_percpu_ctx);

ERROR: else should follow close brace '}'
#1812: FILE: kern/src/profiler.c:330:
+ }
+ else

ERROR: "foo* const *bar" should be "foo * const *bar"
#1819: FILE: kern/src/profiler.c:336:
+const char* const *profiler_configure_cmds(void)

ERROR: spaces required around that ':' (ctx:VxW)
#1887: FILE: kern/src/profiler.c:387:
+ cpu_buf->tracing = onoff ? 1: -1;
^

ERROR: spaces required around that ':' (ctx:VxW)
#2001: FILE: kern/src/profiler.c:448:
+ return profiler_queue ? qlen(profiler_queue): 0;
^

ERROR: spaces required around that ':' (ctx:VxW)
#2007: FILE: kern/src/profiler.c:453:
+ return profiler_queue ? qread(profiler_queue, va, n): 0;
^

ERROR: "foo* bar" should be "foo *bar"
#2036: FILE: kern/src/string.c:121:
+memchr(const void* mem, int chr, int len)

total: 16 errors, 8 warnings, 1975 lines checked

../patches/0013-Implemented-the-new-profiler.patch has style problems, please review.
--------------------------------------------------------------------------
../patches/0014-Enabled-prof-kptrace-collection-of-anything-which-go.patch
--------------------------------------------------------------------------
WARNING: externs should be avoided in .c files
#34: FILE: kern/drivers/dev/kprof.c:66:
+extern system_timing_t system_timing;

ERROR: space required after that ',' (ctx:VxV)
#64: FILE: kern/include/stdarg.h:13:
+#define va_copy(d,s) __builtin_va_copy(d,s)
^

ERROR: space required after that ',' (ctx:VxV)
#64: FILE: kern/include/stdarg.h:13:
+#define va_copy(d,s) __builtin_va_copy(d,s)
^

total: 2 errors, 1 warnings, 56 lines checked

../patches/0014-Enabled-prof-kptrace-collection-of-anything-which-go.patch has style problems, please review.
---------------------------------------------------
../patches/0015-Added-kprof-to-perf-converter.patch
---------------------------------------------------
WARNING: line over 80 characters
#159: FILE: tools/profile/kprof2perf/kprof2perf.c:45:
+ fprintf(stderr, "%s: %d: Assertion failed: " #c "\n", __FILE__, __LINE__); \

ERROR: do not initialise statics to 0 or NULL
#201: FILE: tools/profile/kprof2perf/kprof2perf.c:87:
+static int debug_level = 0;

ERROR: do not initialise statics to 0 or NULL
#204: FILE: tools/profile/kprof2perf/kprof2perf.c:90:
+static struct static_mmap64 *static_mmaps = NULL;

ERROR: "foo* bar" should be "foo *bar"
#211: FILE: tools/profile/kprof2perf/kprof2perf.c:97:
+static inline const char* vb_decode_uint64(const char* data, uint64_t *pval)

ERROR: "foo* bar" should be "foo *bar"
#211: FILE: tools/profile/kprof2perf/kprof2perf.c:97:
+static inline const char* vb_decode_uint64(const char* data, uint64_t *pval)

WARNING: line over 80 characters
#352: FILE: tools/profile/kprof2perf/kprof2perf.c:238:
+static char *mem_block_write(struct mem_block *mb, const void *data, size_t size)

ERROR: spaces required around that ':' (ctx:VxW)
#373: FILE: tools/profile/kprof2perf/kprof2perf.c:259:
+ return (flags & MBWR_SOLID) ? (space >= size): (space > 0);
^

WARNING: line over 80 characters
#560: FILE: tools/profile/kprof2perf/kprof2perf.c:446:
+ struct proftype_kern_trace64 *rec = (struct proftype_kern_trace64 *) pr->data;

WARNING: line over 80 characters
#583: FILE: tools/profile/kprof2perf/kprof2perf.c:469:
+ struct proftype_user_trace64 *rec = (struct proftype_user_trace64 *) pr->data;

ERROR: "foo* bar" should be "foo *bar"
#619: FILE: tools/profile/kprof2perf/kprof2perf.c:505:
+ const uint64_t* ids, size_t nids)

WARNING: please, no space before tabs
#890: FILE: tools/profile/kprof2perf/perf_format.h:85:
+^I * ^I{ u32^I^I^Ipid, tid; } && PERF_SAMPLE_TID$

WARNING: please, no space before tabs
#891: FILE: tools/profile/kprof2perf/perf_format.h:86:
+^I * ^I{ u64^I^I^Itime; } && PERF_SAMPLE_TIME$

WARNING: please, no space before tabs
#892: FILE: tools/profile/kprof2perf/perf_format.h:87:
+^I * ^I{ u64^I^I^Iid; } && PERF_SAMPLE_ID$

WARNING: please, no space before tabs
#893: FILE: tools/profile/kprof2perf/perf_format.h:88:
+^I * ^I{ u64^I^I^Istream_id;} && PERF_SAMPLE_STREAM_ID$

WARNING: please, no space before tabs
#894: FILE: tools/profile/kprof2perf/perf_format.h:89:
+^I * ^I{ u32^I^I^Icpu, res; } && PERF_SAMPLE_CPU$

WARNING: please, no space before tabs
#924: FILE: tools/profile/kprof2perf/perf_format.h:119:
+^I * ^Istruct sample_id^I^Isample_id;$

WARNING: please, no space before tabs
#935: FILE: tools/profile/kprof2perf/perf_format.h:130:
+^I * ^Istruct sample_id^I^Isample_id;$

WARNING: please, no space before tabs
#946: FILE: tools/profile/kprof2perf/perf_format.h:141:
+^I * ^Istruct sample_id^I^Isample_id;$

WARNING: please, no space before tabs
#957: FILE: tools/profile/kprof2perf/perf_format.h:152:
+^I * ^Istruct sample_id^I^Isample_id;$

WARNING: please, no space before tabs
#969: FILE: tools/profile/kprof2perf/perf_format.h:164:
+^I * ^Istruct sample_id^I^Isample_id;$

WARNING: please, no space before tabs
#980: FILE: tools/profile/kprof2perf/perf_format.h:175:
+^I * ^Istruct sample_id^I^Isample_id;$

WARNING: please, no space before tabs
#1027: FILE: tools/profile/kprof2perf/perf_format.h:222:
+^I * ^I{ u64^I^I^Iabi; # enum perf_sample_regs_abi$

WARNING: please, no space before tabs
#1028: FILE: tools/profile/kprof2perf/perf_format.h:223:
+^I * ^I u64^I^I^Iregs[weight(mask)]; } && PERF_SAMPLE_REGS_USER$

WARNING: please, no space before tabs
#1030: FILE: tools/profile/kprof2perf/perf_format.h:225:
+^I * ^I{ u64^I^I^Isize;$

WARNING: please, no space before tabs
#1031: FILE: tools/profile/kprof2perf/perf_format.h:226:
+^I * ^I char^I^I^Idata[size];$

WARNING: please, no space before tabs
#1032: FILE: tools/profile/kprof2perf/perf_format.h:227:
+^I * ^I u64^I^I^Idyn_size; } && PERF_SAMPLE_STACK_USER$

WARNING: please, no space before tabs
#1059: FILE: tools/profile/kprof2perf/perf_format.h:254:
+^I * ^Istruct sample_id^I^Isample_id;$

WARNING: __packed is preferred over __attribute__((packed))
#1223: FILE: tools/profile/kprof2perf/perf_format.h:418:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1249: FILE: tools/profile/kprof2perf/perf_format.h:444:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1254: FILE: tools/profile/kprof2perf/perf_format.h:449:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1295: FILE: tools/profile/kprof2perf/perf_format.h:490:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1307: FILE: tools/profile/kprof2perf/perf_format.h:502:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1316: FILE: tools/profile/kprof2perf/perf_format.h:511:
+} __attribute__((packed));

WARNING: __packed is preferred over __attribute__((packed))
#1333: FILE: tools/profile/kprof2perf/perf_format.h:528:
+} __attribute__((packed));

total: 6 errors, 28 warnings, 1291 lines checked

../patches/0015-Added-kprof-to-perf-converter.patch has style problems, please review.
--------------------------------------------------------------------------
../patches/0016-Added-patch-to-Linux-perf-to-allow-it-to-work-with-A.patch
--------------------------------------------------------------------------
ERROR: trailing whitespace
#33: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:14:
+ $

ERROR: trailing whitespace
#37: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:18:
+ $

ERROR: trailing whitespace
#45: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:26:
+ $

ERROR: trailing whitespace
#51: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:32:
+ $

ERROR: trailing whitespace
#98: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:79:
+ $

ERROR: trailing whitespace
#102: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:83:
+ $

ERROR: trailing whitespace
#104: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:85:
+ $

ERROR: trailing whitespace
#116: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:97:
+ $

ERROR: trailing whitespace
#127: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:108:
+ $

ERROR: trailing whitespace
#133: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:114:
+ $

ERROR: trailing whitespace
#144: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:125:
+ $

ERROR: trailing whitespace
#149: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:130:
+ $

ERROR: trailing whitespace
#165: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:146:
+ $

ERROR: trailing whitespace
#167: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:148:
+ $

ERROR: trailing whitespace
#173: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:154:
+ $

ERROR: trailing whitespace
#177: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:158:
+ $

ERROR: trailing whitespace
#183: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:164:
+ $

ERROR: trailing whitespace
#188: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:169:
+ $

ERROR: trailing whitespace
#192: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:173:
+ $

ERROR: trailing whitespace
#212: FILE: tools/profile/kprof2perf/perf_patches/perf_patch.diff:193:
+ $

total: 20 errors, 0 warnings, 201 lines checked

NOTE: Whitespace errors detected.
You may wish to use scripts/cleanpatch or scripts/cleanfile

../patches/0016-Added-patch-to-Linux-perf-to-allow-it-to-work-with-A.patch has style problems, please review.
-----------------------------------------------------------
../patches/0017-Added-simple-netcat-client-for-Akaros.patch
-----------------------------------------------------------
total: 0 errors, 0 warnings, 157 lines checked

../patches/0017-Added-simple-netcat-client-for-Akaros.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------
../patches/0018-Removed-lingering-error-string-variables.patch
--------------------------------------------------------------
total: 0 errors, 0 warnings, 49 lines checked

../patches/0018-Removed-lingering-error-string-variables.patch has no obvious style problems and is ready for submission.
--------------------------------------------------------------
../patches/0019-Added-test-case-for-circular-buffer-code.patch
--------------------------------------------------------------
WARNING: please write a paragraph that describes the config symbol fully
#22: FILE: kern/src/ktest/Kconfig.postboot:120:
+config TEST_circular_buffer

total: 0 errors, 1 warnings, 110 lines checked

../patches/0019-Added-test-case-for-circular-buffer-code.patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.

Kevin Klues

unread,

Nov 11, 2015, 11:49:14 AM11/11/15

to aka...@googlegroups.com

I have some git-hooks I wrote to perform checkpatch automatically on
every commit. They haven't made their way back to master yet, but they
are on my 'bootstrap' branch on github. Just copy these files into
your $AKAROS_ROOT/.git folder to apply them.

https://github.com/klueska/akaros/tree/bootstrap/scripts/git-hooks

Kevin

> --
> You received this message because you are subscribed to the Google Groups "Akaros" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to akaros+un...@googlegroups.com.
> To post to this group, send email to aka...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
~Kevin

Barret Rhoden

unread,

Nov 11, 2015, 11:55:58 AM11/11/15

to aka...@googlegroups.com

On 2015-11-11 at 08:48 Kevin Klues <klu...@gmail.com> wrote:
> I have some git-hooks I wrote to perform checkpatch automatically on
> every commit. They haven't made their way back to master yet, but they
> are on my 'bootstrap' branch on github. Just copy these files into
> your $AKAROS_ROOT/.git folder to apply them.
>
> https://github.com/klueska/akaros/tree/bootstrap/scripts/git-hooks

likewise, if you want to do the check right before submission, take
your FROM and TO (which you use for either request-pull or send-email):

$ git format-patch -o /tmp/SOMEDIR/ FROM..TO
$ ./scripts/checkpatch.pl /tmp/SOMEDIR/*.patch
(some time later)
$ rm /tmp/SOMEDIR/*.patch

in lieu of SOMEDIR, you can drop -o and just use your current directory
too.

barret

Barret Rhoden

unread,

Nov 11, 2015, 12:07:43 PM11/11/15

to aka...@googlegroups.com

On 2015-10-28 at 15:06 "'Davide Libenzi' via Akaros"
<aka...@googlegroups.com> wrote:

> Since the previous one has not been done yet, this is a cumulative
> one:
>
> https://github.com/brho/akaros/compare/master...dlibenzi:profiler_dev_v2
>
>
> The following changes since commit
> 6f3723cd8f883260a78fdf411911d7469464caa5:
>
> Update file-posix.c utest (2015-10-15 12:07:00 -0400)
>
> are available in the git repository at:
>
> g...@github.com:dlibenzi/akaros profiler_dev_v2
>
> for you to fetch changes up to
> 6e40b75927c808dd945cb3ad0ee6e52a5a2dc495:
>
> Added simple netcat client for Akaros (2015-10-28 14:49:56 -0700)

These patches don't apply cleanly due to the recent pragma_once and a
few other things (like device structure renames). If you get a chance,
can you fix that up? I'll still read through them in the meantime.

Thanks,

Barret

Davide Libenzi

unread,

Nov 11, 2015, 12:19:28 PM11/11/15

to Akaros

On Wed, Nov 11, 2015 at 8:43 AM, Barret Rhoden <br...@cs.berkeley.edu> wrote:

Please run checkpatch on your patches before sending them in. A bunch
of these are false positives, but many are not.

Let me take a look ...

my checkpatch output:

--------------------------------------------------------------------------
../patches/0001-Completely-restructured-profiler-code-cutting-all-th.patch
--------------------------------------------------------------------------
ERROR: space required after that ',' (ctx:VxV)
#86: FILE: kern/drivers/dev/kprof.c:406:
+ n = profiler_read(va,n);

Final code (tip of HEAD) does not have that anymore.

^

ERROR: space required after that ',' (ctx:VxV)
#161: FILE: kern/include/profiler.h:15:
+int profiler_read(void *va,int);

Ditto.

^

ERROR: space required before the open parenthesis '('
#1233: FILE: kern/src/profiler.c:202:
+ for(core = 0; core < num_cores; core++) {

Ditto.

ERROR: do not initialise statics to 0 or NULL
#41: FILE: kern/src/profiler.c:33:
+static int profiler_users = 0;

Fixing now.

WARNING: kfree(NULL) is safe and this check is probably not required
#108: FILE: kern/lib/circular_buffer.c:33:
+ if (cb->mem)
+ kfree(cb->mem);

Existing code, not mine, that now is gone, BTW.

#320: FILE: kern/drivers/dev/kprof.c:56:
+ {".", {Kprofdirqid, 0, QTDIR},0, DMDIR|0550},
^

Existing code, fixing now.

WARNING: externs should be avoided in .c files
#329: FILE: kern/drivers/dev/kprof.c:65:
+extern int booting;

Yes, this is an existing one, I can clean up, but please, do not hold this CRs any further for stuff like this.

I will post a follow up patches for things like these.

WARNING: kfree(NULL) is safe and this check is probably not required
#413: FILE: kern/drivers/dev/kprof.c:132:
+ if (kprof.pdata) {
+ kfree(kprof.pdata);

It does not only kfree, but anyway, fixing this as well.

WARNING: Missing a blank line after declarations
#546: FILE: kern/drivers/dev/kprof.c:211:
+ int i;
+ ERRSTACK(1);

False positive. BTW, empty like after var decls is great for me (my standard), but 99% of existing code place absolutely no spacing, not only between vars and code, but also between logical groups on instructions.

ERROR: spaces required around that ':' (ctx:VxW)
#615: FILE: kern/drivers/dev/kprof.c:264:
+ return kprof.pdata != NULL ? kprof.psize: profiler_size();

^

OK, fixing.

ERROR: space required before the open brace '{'
#646: FILE: kern/drivers/dev/kprof.c:291:
+ if (c->qid.type & QTDIR){

ERROR: "foo* const * bar" should be "foo * const *bar"
#816: FILE: kern/drivers/dev/kprof.c:420:
+ const char* const * cmds = profiler_configure_cmds();

Fixing.

ERROR: Macros with complex values should be enclosed in parentheses
#1191: FILE: kern/include/kdebug.h:51:
+#define trace_printx(args...) if (printx_on) trace_printk(TRUE, args)

Existing code, fixing now.

ERROR: "foo* const *bar" should be "foo * const *bar"
#1242: FILE: kern/include/profiler.h:17:
+const char* const *profiler_configure_cmds(void);

Fixing.

WARNING: __packed is preferred over __attribute__((packed))
#1283: FILE: kern/include/ros/profiler_records.h:17:
+} __attribute__((packed));

ENOENT

ERROR: "foo* bar" should be "foo *bar"
#1335: FILE: kern/include/string.h:27:
+void *memchr(const void* mem, int chr, int len);

Existing code (a bunch there use the same syntax), fixing that only.

ERROR: do not initialise statics to 0 or NULL
#1433: FILE: kern/src/profiler.c:40:
+static int tracing = 0;

Dupe, fixing.

ERROR: "foo* bar" should be "foo *bar"
#1442: FILE: kern/src/profiler.c:50:
+static inline char* vb_encode_uint64(char* data, uint64_t n)

Fixing.

ERROR: "foo* bar" should be "foo *bar"
#1442: FILE: kern/src/profiler.c:50:
+static inline char* vb_encode_uint64(char* data, uint64_t n)

Dupe.

WARNING: kfree(NULL) is safe and this check is probably not required
#1743: FILE: kern/src/profiler.c:278:
+ if (profiler_percpu_ctx) {
+ kfree(profiler_percpu_ctx);

Fixing...

ERROR: else should follow close brace '}'
#1812: FILE: kern/src/profiler.c:330:
+ }
+ else

Fixing.

ERROR: "foo* const *bar" should be "foo * const *bar"
#1819: FILE: kern/src/profiler.c:336:
+const char* const *profiler_configure_cmds(void)

Dupe.

ERROR: spaces required around that ':' (ctx:VxW)
#1887: FILE: kern/src/profiler.c:387:
+ cpu_buf->tracing = onoff ? 1: -1;

Fixing ...

^

ERROR: spaces required around that ':' (ctx:VxW)
#2001: FILE: kern/src/profiler.c:448:
+ return profiler_queue ? qlen(profiler_queue): 0;

Fixing ...

^

ERROR: spaces required around that ':' (ctx:VxW)
#2007: FILE: kern/src/profiler.c:453:
+ return profiler_queue ? qread(profiler_queue, va, n): 0;

Fixing ...

^

ERROR: "foo* bar" should be "foo *bar"
#2036: FILE: kern/src/string.c:121:
+memchr(const void* mem, int chr, int len)

Existing code, fixing ...

total: 16 errors, 8 warnings, 1975 lines checked

../patches/0013-Implemented-the-new-profiler.patch has style problems, please review.
--------------------------------------------------------------------------
../patches/0014-Enabled-prof-kptrace-collection-of-anything-which-go.patch
--------------------------------------------------------------------------
WARNING: externs should be avoided in .c files
#34: FILE: kern/drivers/dev/kprof.c:66:
+extern system_timing_t system_timing;

ERROR: space required after that ',' (ctx:VxV)
#64: FILE: kern/include/stdarg.h:13:
+#define va_copy(d,s) __builtin_va_copy(d,s)
^

ERROR: space required after that ',' (ctx:VxV)
#64: FILE: kern/include/stdarg.h:13:
+#define va_copy(d,s) __builtin_va_copy(d,s)

Following existing code syntax. Changed now.

^

total: 2 errors, 1 warnings, 56 lines checked

../patches/0014-Enabled-prof-kptrace-collection-of-anything-which-go.patch has style problems, please review.
---------------------------------------------------
../patches/0015-Added-kprof-to-perf-converter.patch
---------------------------------------------------
WARNING: line over 80 characters
#159: FILE: tools/profile/kprof2perf/kprof2perf.c:45:
+ fprintf(stderr, "%s: %d: Assertion failed: " #c "\n", __FILE__, __LINE__); \

Fixing ...

ERROR: do not initialise statics to 0 or NULL
#201: FILE: tools/profile/kprof2perf/kprof2perf.c:87:
+static int debug_level = 0;

Fixing ...

ERROR: do not initialise statics to 0 or NULL
#204: FILE: tools/profile/kprof2perf/kprof2perf.c:90:
+static struct static_mmap64 *static_mmaps = NULL;

Fixing ...

ERROR: "foo* bar" should be "foo *bar"
#211: FILE: tools/profile/kprof2perf/kprof2perf.c:97:
+static inline const char* vb_decode_uint64(const char* data, uint64_t *pval)

ERROR: "foo* bar" should be "foo *bar"
#211: FILE: tools/profile/kprof2perf/kprof2perf.c:97:
+static inline const char* vb_decode_uint64(const char* data, uint64_t *pval)

Buy one, get two. Fixing now ...

WARNING: line over 80 characters
#352: FILE: tools/profile/kprof2perf/kprof2perf.c:238:
+static char *mem_block_write(struct mem_block *mb, const void *data, size_t size)

Fixing ...

ERROR: spaces required around that ':' (ctx:VxW)
#373: FILE: tools/profile/kprof2perf/kprof2perf.c:259:
+ return (flags & MBWR_SOLID) ? (space >= size): (space > 0);

Fixing ...

^

WARNING: line over 80 characters
#560: FILE: tools/profile/kprof2perf/kprof2perf.c:446:
+ struct proftype_kern_trace64 *rec = (struct proftype_kern_trace64 *) pr->data;

WARNING: line over 80 characters
#583: FILE: tools/profile/kprof2perf/kprof2perf.c:469:
+ struct proftype_user_trace64 *rec = (struct proftype_user_trace64 *) pr->data;

Fixing both ...

ERROR: "foo* bar" should be "foo *bar"
#619: FILE: tools/profile/kprof2perf/kprof2perf.c:505:
+ const uint64_t* ids, size_t nids)

Fixing ...

This whole bunch coming from Linux. Figure that. Fixed now.

This is a patch file. It is going to fail.

WARNING: please write a paragraph that describes the config symbol fully
#22: FILE: kern/src/ktest/Kconfig.postboot:120:
+config TEST_circular_buffer

Does not apply.

I will fix those in a follow commit on that branch.

Davide Libenzi

unread,

Nov 11, 2015, 12:26:40 PM11/11/15

to Akaros

I actually slotted the changes in the same commit stream.

Davide Libenzi

unread,

Nov 11, 2015, 12:28:54 PM11/11/15

to Akaros

Thanks will do that.

What is the policy WRT the code the already violates the rules?

Kevin Klues

unread,

Nov 11, 2015, 12:34:29 PM11/11/15

to aka...@googlegroups.com

On Wed, Nov 11, 2015 at 9:28 AM, 'Davide Libenzi' via Akaros
<aka...@googlegroups.com> wrote:
> Thanks will do that.
> What is the policy WRT the code the already violates the rules?

My general rule has been: if checkpatch catches it, then fix it, even
if I didn't write it. It will only look at code that has changed, so
this applies to code you may not have written but are rearranging for
some reason.

Davide Libenzi

unread,

Nov 11, 2015, 12:39:20 PM11/11/15

to Akaros

Thanks Kevin. I already have a pre-commit there. Should I overwrite?

Kevin Klues

unread,

Nov 11, 2015, 12:45:18 PM11/11/15

to aka...@googlegroups.com

Unless you have custom stuff in there already, it is just an extension
of the default one, so it is ok to overwrite it.

On Wed, Nov 11, 2015 at 9:39 AM, 'Davide Libenzi' via Akaros
<aka...@googlegroups.com> wrote:
> Thanks Kevin. I already have a pre-commit there. Should I overwrite?
>

Davide Libenzi

unread,

Nov 11, 2015, 12:47:39 PM11/11/15

to Akaros

OK, will overwrite! Seems OK, according to the diff.

Davide Libenzi

unread,

Nov 11, 2015, 1:06:37 PM11/11/15

to Akaros

On Wed, Nov 11, 2015 at 9:07 AM, Barret Rhoden <br...@cs.berkeley.edu> wrote:

These patches don't apply cleanly due to the recent pragma_once and a
few other things (like device structure renames). If you get a chance,
can you fix that up? I'll still read through them in the meantime.

Apply where? Staging?

Davide Libenzi

unread,

Nov 11, 2015, 1:22:00 PM11/11/15

to Akaros

Should I rebase that branch against staging?

Kevin Klues

unread,

Nov 11, 2015, 1:23:52 PM11/11/15

to aka...@googlegroups.com

Yeah, I think that's what he's implying.

Davide Libenzi

unread,

Nov 11, 2015, 1:27:29 PM11/11/15

to Akaros

Done.

Davide Libenzi

unread,

Nov 11, 2015, 1:40:55 PM11/11/15

to Akaros

Or:

$ git request-pull -p A B C | ./scripts/checkpatch.pl -f -

Should work as well.

Barret Rhoden

unread,

Nov 12, 2015, 3:02:05 PM11/12/15

to aka...@googlegroups.com

Hi -

Overall, I'm a big fan of this patch set. But I do have a bunch of
concerns. Some minor, some bigger. See below.

Thanks,

Barret

On 2015-10-28 at 15:06 "'Davide Libenzi' via Akaros"

<aka...@googlegroups.com> wrote:
> Since the previous one has not been done yet, this is a cumulative
> one:
>
> https://github.com/brho/akaros/compare/master...dlibenzi:profiler_dev_v2
>
>
> The following changes since commit
> 6f3723cd8f883260a78fdf411911d7469464caa5:
>
> Update file-posix.c utest (2015-10-15 12:07:00 -0400)
>
> are available in the git repository at:
>
> g...@github.com:dlibenzi/akaros profiler_dev_v2
>
> for you to fetch changes up to
> 6e40b75927c808dd945cb3ad0ee6e52a5a2dc495:
>
> Added simple netcat client for Akaros (2015-10-28 14:49:56 -0700)

> From 2ba13e8925bd0524a13748beb71beb797e79831f Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Tue, 20 Oct 2015 15:53:08 -0700
> Subject: Completely restructured profiler code cutting all the unused code

> diff --git a/kern/include/profiler.h b/kern/include/profiler.h
> new file mode 100644
> index 000000000000..2ec9d4308632
> --- /dev/null
> +++ b/kern/include/profiler.h

New, non-trivial files need a copyright header.

> @@ -0,0 +1,18 @@
> +
> +#ifndef ROS_KERN_INC_PROFILER_H
> +#define ROS_KERN_INC_PROFILER_H

Can also do the pragma once now.

> diff --git a/kern/src/profiler.c b/kern/src/profiler.c
> new file mode 100644
> index 000000000000..ca7197bbc20b
> --- /dev/null
> +++ b/kern/src/profiler.c

Needs a copyright. It looks like this might be new code, but if there's
old stuff from oprofile in it, we need to put that. I see there are a
couple things like "op_entry" still around.

> +static inline int profiler_add_sample(struct profiler_cpu_context *cpu_buf,
> + uintptr_t pc, unsigned long event)

Do you want this to return a bool (TRUE for success) or an int (0 for
success)? See below:

> +{
> + ERRSTACK(1);
> + struct op_entry entry;
> + struct block *b;
> +
> + if (waserror()) {
> + poperror();
> + printk("%s: failed\n", __func__);
> + return 1;

error, value == 1

> + }
> +
> + b = profiler_cpu_buffer_write_reserve(cpu_buf, &entry, 0);
> + if (likely(b)) {
> + entry.sample->hdr = profiler_create_header(core_id(), 1);
> + entry.sample->event = (uint64_t) event;
> + profiler_cpu_buffer_add_data(&entry, &pc, 1);
> + }
> + poperror();
> +
> + return b == NULL;

success, value == 1.

> From 0a919619324782e873ddf5bbf9bd19e989f25162 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Tue, 20 Oct 2015 16:34:26 -0700
> Subject: Do not race when multiple init happen at the same time

> @@ -65,53 +64,63 @@ static inline size_t profiler_cpu_buffer_add_data(struct op_entry *entry,
> static void free_cpu_buffers(void)
> {
> kfree(profiler_percpu_ctx);
> + profiler_percpu_ctx = NULL;
> +
> + qclose(profiler_queue);
> + profiler_queue = NULL;
> }

Isn't this stuff possibly used from IRQ context? What's the plan for
cleaning up such that we are sure that there are no interrupts going off
while you are freeing things?

In:
> static int alloc_cpu_buffers(void)
> {
> - if (!profiler_queue) {
> - profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> - if (!profiler_queue)
> - goto fail;
> - }
> + int i;
> +
> + profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> + if (!profiler_queue)
> + return -ENOMEM;
>
> - /* we *really* don't want to block. Losing data is better. */
> qdropoverflow(profiler_queue, 1);

Then later:

> + qnonblock(profiler_queue, 1);

This is fragile. qnonblock means "error with EAGAIN if we're full".
Right now, you'll get by, since you still have qdropoverflow above. But
this could break with an error() thrown from IRQ context somewhere down
the line.

> void profiler_cleanup(void)
> {
> - free_cpu_buffers();
> + sem_down(&mtx);
> + profiler_users--;
> + if (profiler_users == 0)
> + free_cpu_buffers();
> + sem_up(&mtx);
> }

This might be a candidate for a kref, depending on how cleanup ends up
working. No need to change it now though.
> From fd096815dc200a550ea5e6b6d7f133df75e29ed9 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Thu, 22 Oct 2015 14:30:26 -0700
> Subject: Added full binary path into the proc structure
>

> @@ -791,6 +763,7 @@ static int sys_exec(struct proc *p, char *path, size_t path_l,
> t_path = copy_in_path(p, path, path_l);
> if (!t_path)
> return -1;
> + proc_replace_binary_path(p, t_path);

If we error out after this point, we'll have changed the path of the binary to
the new one even though the exec failed. I'm okay with that, just more of an
FYI.

> From 47bf34b1bc96b78a8091726cb3016b1f0d964847 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Thu, 22 Oct 2015 10:56:37 -0700
> Subject: Added APIs to access process startup information
>
> +char *get_startup_argv(struct proc *p, size_t idx, char *argp,
> + size_t max_size)
> +{
> + size_t stack_space = (const char *) USTACKTOP - (const char *) p->args_base;
> + const char *sptr = (const char *) p->args_base + sizeof(size_t) +
> + idx * sizeof(char *);
> + const char *argv = NULL;
> +
> + /* TODO,DL: Use copy_from_user() when available.
> + */

You've got these now, though memcpy_from_user will do it. (or will soon).
Either way, you don't need the TODO.

> + if (memcpy_from_user(p, &argv, sptr, sizeof(char *)))
> + return NULL;
> +
> + /* TODO,DL: Use strncpy_from_user() when available.
> + */
> + max_size = MIN(max_size, stack_space);
> + if (memcpy_from_user(p, argp, argv, max_size))
> + return NULL;
> + argp[max_size - 1] = 0;
> +
> + return argp;
> +}

How important is it that the command line data hasn't changed? We're trusting
the user to not have changed it since they were invoked.

Also, I guess we're going to want to parse the argv stuff later. Kevin put
together some code to pack and parse the args earlier. It might be easier and
to keep that blob around and parse it instead of this. Not a huge deal though.
> From 1aede03303debfe3075aa7055c6f1dd843f806bc Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Tue, 27 Oct 2015 15:29:10 -0700
> Subject: Added API to enumerate the VM regions of a process

> --- a/kern/include/mm.h
> +++ b/kern/include/mm.h
> @@ -51,6 +51,9 @@ void isolate_vmrs(struct proc *p, uintptr_t va, size_t len);
> void unmap_and_destroy_vmrs(struct proc *p);
> int duplicate_vmrs(struct proc *p, struct proc *new_p);
> void print_vmrs(struct proc *p);
> +void enumerate_vrms(struct proc *p,
^enumerate_vmrs

> diff --git a/kern/src/mm.c b/kern/src/mm.c
> index ca91e63620b6..06a349840404 100644
> --- a/kern/src/mm.c
> +++ b/kern/src/mm.c
> @@ -359,6 +359,18 @@ void print_vmrs(struct proc *p)
> vmr->vm_file, vmr->vm_foff);
> }
>
> +void enumerate_vrms(struct proc *p,
^enumerate_vmrs

> From 2df52e6bc39bf90a1c90e5382298fa2c2a8316f5 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Thu, 5 Nov 2015 17:30:24 -0800
> Subject: Added API to append data into a circular buffer
>
> Added API to append data into a circular buffer. The buffer data is
> allocated once at init time, and no more allocations happen after that.

> diff --git a/kern/include/circular_buffer.h b/kern/include/circular_buffer.h
> new file mode 100644
> index 000000000000..4ab8c42e2673
> --- /dev/null
> +++ b/kern/include/circular_buffer.h

Can you put in a little info about how to use this and what the rules are? For
instance, what happens to writers when the buffer is full (overwrite, block, or
drop?) or to readers when the buffer is empty? (looks like you just get 0). If
a reader uses an offset, does that just drop a chunk from the buffer? What is
the cbuf_size_t for?

> --- /dev/null
> +++ b/kern/lib/circular_buffer.c

> +void circular_buffer_clear(struct circular_buffer *cb)
> +{
> + if (cb->base) {

> + if (cb->mem)
> + kfree(cb->mem);

It's a little surprising that clear also frees the allocation, instead of just
resetting everything back to 0.

> + cb->rdptr = cb->wrptr = cb->base = cb->mem = NULL;
> + cb->size = cb->allocated = 0;
> + }
> +}
> +
> +static bool circular_buffer_is_overlap(const struct circular_buffer *cb,
> + const char *rptr, const char *wptr,
> + size_t size)
> +{
> + return (cb->size > 0) && (rptr >= wptr) && (rptr < (wptr + size));
> +}
> +
> +static void circular_buffer_write_skip(struct circular_buffer *cb, char *wrptr,
> + size_t size)

What are these helpers doing?

> +size_t circular_buffer_read(struct circular_buffer *cb, char *data, size_t size,
> + size_t off)
> +{
> + size_t asize = cb->size, rsize = 0;
> + const char *rdptr = cb->rdptr;
> +
> + while (asize > 0 && size > 0) {
> + size_t bsize = *(const cbuf_size_t *) rdptr;
> +
> + if (likely(bsize)) {
> + size_t esize = bsize - sizeof(cbuf_size_t);
> +
> + if (off >= esize) {
> + off -= esize;
> + } else {
> + size_t csize = MIN(esize - off, size);
> +
> + memcpy(data, rdptr + sizeof(cbuf_size_t) + off, csize);

So every block of data has the size first, then the data, all packed together?

> From c3c2bb0701f747e5c5c3f3d30cdf033137ae26e2 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Wed, 21 Oct 2015 16:39:04 -0700
> Subject: Implemented the new profiler
>
> Implemented the new profiler format and added simple userspace
> stack trace (waiting for copy_from_user()).

This is a huge commit that would have been easier to understand if it was broken
up a bit. For instance, I see there's a user backtrace list, syscall tracing
isn't using a qio queue anymore, the old Plan 9 profiler is gone (great!), and a
bunch of other things. It also looks like you squeezed in the change that the
start command also turns on the timers for all cores, which isn't something I
wanted.

> --- a/kern/arch/x86/kdebug.c
> +++ b/kern/arch/x86/kdebug.c
> @@ -365,6 +365,25 @@ size_t backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
> return nr_pcs;
> }
>
> +size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
> + size_t nr_slots)
> +{
> + size_t nr_pcs = 0;
> +
> + for (;;) {
> + if (unlikely(fp >= UMAPTOP) || unlikely(fp < BRK_END) ||

Why not is_user_raddr()? And once you use copy_from, those checks aren't needed
either. Also, I think it is possible for frame pointers to be below BRK_END,
which could happen if a user stack was malloc'd instead of mmapped.

> + unlikely(nr_pcs >= nr_slots))
> + break;
> +
> + /* For now straight memory access, waiting for copy_from_user(). */

copy_from is now available. =)

> + pcs[nr_pcs++] = pc;
> + pc = *(uintptr_t*)(fp + sizeof(uintptr_t));
> + fp = *(uintptr_t*)fp;
> + }
> +
> + return nr_pcs;
> +}
> +
> diff --git a/kern/drivers/dev/kprof.c b/kern/drivers/dev/kprof.c
> index da11b7e44d32..76e53d7f9d30 100644

> +static int oprof_timer_period = 1000;
^ rename prof_timer_period

> +static void kprof_start_profiler(void)
> +{
> + ERRSTACK(2);
>
> + sem_down(&kprof.lock);

If you're using a semaphore as a binary mutex (lock), please use a qlock. This
happens a bunch in this patch set.

> + if (waserror()) {
> + sem_up(&kprof.lock);
> + nexterror();
> + }
> + if (!kprof.profiling) {
> + profiler_init();
> + if (waserror()) {
> + profiler_cleanup();
> + nexterror();
> + }
> +
> + profiler_control_trace(1);
> +
> + for (int i = 0; i < num_cores; i++)
> + kprof_enable_timer(i, 1);

This looks like starting the profiler turns on the timer. I wanted to keep the
timer separate from the collection, such that the timer is one of potentially
many sources of data, as discussed in another email.

> +static void kprof_usage_fail(void)
> +{
> + static const char *ctlstring = "clear|start|stop|timer";

> + const char* const * cmds = profiler_configure_cmds();

> + size_t i, size, csize;
> + char msgbuf[128];
> +
> + strncpy(msgbuf, ctlstring, sizeof(msgbuf));
> + size = MIN(strlen(ctlstring), sizeof(msgbuf) - 1);
> + for (i = 0; cmds[i]; i++) {
> + csize = strlen(cmds[i]);
> + if ((csize + size + 2) > sizeof(msgbuf))
> + break;
> + msgbuf[size] = '|';
> + memcpy(msgbuf + size + 1, cmds[i], csize);
> + size += csize + 1;
> + }
> + msgbuf[size] = 0;
> +
> + error(EFAIL, msgbuf);
> +}

This is cool, and the pattern tends to come up a lot, where we'd like to
generate usage information. Perhaps there's a Plan 9 helper for this already?
(I don't know off the top of my head, maybe the Plan 9 guys do). We already
have stuff that parses command strings. If we don't have a helper, we can make
one, so that every device doesn't have to redo this logic.

Another part to it is that a device has several layers of arguments and commands
- it'd be neat if that was clearly written down somewhere and then we could
generate these switches and error messages more easily. This isn't something
that needs to be done now, but I like your kprof_usage_fail and it brought the
topic up. =)

> +static struct trace_printk_buffer *kprof_get_printk_buffer(void)
> +{
> + static struct trace_printk_buffer boot_tpb;
> + static struct trace_printk_buffer *cpu_tpbs;
> +
> + if (unlikely(booting))
> + return &boot_tpb;
> + if (unlikely(!cpu_tpbs)) {
> + /* Poor man per-CPU data structure. I really do no like littering global
> + * data structures with module specific data.
> + */
> + spin_lock_irqsave(&ktrace_lock);
> + if (!cpu_tpbs)
> + cpu_tpbs = kzmalloc(num_cores * sizeof(struct trace_printk_buffer),
> + 0);

If this alloc fails, (flags == 0), then we'll return 0 (+coreid) and PF later.

> + spin_unlock_irqsave(&ktrace_lock);
> }
> +
> + return cpu_tpbs + core_id();
> }

> --- a/kern/include/kdebug.h
> +++ b/kern/include/kdebug.h
> @@ -46,7 +48,7 @@ int printdump(char *buf, int buflen, uint8_t *data);
> extern bool printx_on;
> void set_printx(int mode);
> #define printx(args...) if (printx_on) printk(args)
> -#define trace_printx(args...) if (printx_on) trace_printk(args)

> +#define trace_printx(args...) if (printx_on) trace_printk(TRUE, args)

^FALSE?
We just want to print text, not a backtrace. That was also my intent for
trace_printk (it's just the print, not a backtrace).

> --- a/kern/src/mm.c
> +++ b/kern/src/mm.c
> @@ -24,6 +24,7 @@
> #include <kmalloc.h>
> #include <vfs.h>
> #include <smp.h>
> +#include <profiler.h>
>
> struct kmem_cache *vmr_kcache;
>
> @@ -692,6 +693,9 @@ void *do_mmap(struct proc *p, uintptr_t addr, size_t len, int prot, int flags,
> }
> }
> spin_unlock(&p->vmr_lock);
> +
> + profiler_notify_mmap(p, addr, len, prot, flags, file, offset);
> +

Do you need to know when something was unmapped?

> diff --git a/kern/src/process.c b/kern/src/process.c
> index bd26789e3ef1..2e4c69518e8a 100644
> --- a/kern/src/process.c
> +++ b/kern/src/process.c
> @@ -331,6 +331,8 @@ error_t proc_alloc(struct proc **pp, struct proc *parent, int flags)
> kmem_cache_free(proc_cache, p);
> return -ENOFREEPID;
> }
> + if (parent && parent->binary_path)
> + kstrdup(&p->binary_path, parent->binary_path);

Is this to fix a leak where paths weren't set for forked processes? If so, this
is an example of a fixup that could have been squashed into a previous commit.

> --- a/kern/src/profiler.c
> +++ b/kern/src/profiler.c

> +static inline char* vb_encode_uint64(char* data, uint64_t n)

> +{
> + for (; n >= 0x80; n >>= 7)
> + *data++ = (char) (n | 0x80);
> + *data++ = (char) n;
> +
> + return data;
> +}

This probably should be in a library or header (so we can reuse it) and with a
little explanation. It looks like we take 7 bits of n at a time and push it
into *data, with bit 0x80 signifying we're done or not.

> +static struct block *profiler_buffer_write(struct profiler_cpu_context *cpu_buf,
> + struct block *b)
> {
> - return (((uint64_t) 0xee01) << 48) | ((uint64_t) cpu << 16) |
> - (uint64_t) nbt;
> + if (b) {
> + qibwrite(profiler_queue, b);
> +
> + if (qlen(profiler_queue) > profiler_queue_limit) {
> + b = qget(profiler_queue);
> + if (likely(b)) {
> + cpu_buf->dropped_data_size += BLEN(b);
> + freeb(b);
> + }

This seems like a candidate for a feature added to qio, such as a "drop from the
front" mode.

> + }
> + }
> +
> + return iallocb(profiler_cpu_buffer_size);
> }

This function seems a little weird. If you're given a block, you write it.
That makes sense. But regardless, you alloc a new block?

Seems like we should not be too cute by having write return a fresh block, and
just do the allocation separately.

> +static void profiler_emit_current_system_status(void)
> {
> - struct block *b = cpu_buf->block;
> - size_t totalsize = sizeof(struct op_sample) +
> - size * sizeof(entry->sample->data[0]);
> -
> - if (unlikely((!b) || (b->lim - b->wp) < totalsize)) {
> - if (b)
> - qibwrite(profiler_queue, b);
> - /* For now. Later, we will grab a block off the
> - * emptyblock queue.
> - */
> - cpu_buf->block = b = iallocb(profiler_cpu_buffer_size);
> - if (unlikely(!b)) {
> - printk("%s: fail\n", __func__);
> - return NULL;
> + void enum_proc(struct vm_region *vmr, void *opaque)
> + {
> + struct proc *p = (struct proc *) opaque;
> +
> + profiler_notify_mmap(p, vmr->vm_base, vmr->vm_end - vmr->vm_base,
> + vmr->vm_prot, vmr->vm_flags, vmr->vm_file,
> + vmr->vm_foff);
> + }
> +
> + ERRSTACK(2);

I think you only need 1 here. It's based on the number of waserrors.

> + struct process_set pset;
> +
> + proc_get_set(&pset);
> + if (waserror()) {
> + proc_free_set(&pset);
> + nexterror();
> + }
> +
> + for (size_t i = 0; i < pset.num_processes; i++)
> + enumerate_vrms(pset.procs[i], enum_proc, pset.procs[i]);
^vmrs

> + poperror();
> + proc_free_set(&pset);
> +}
> +
> +static inline int profiler_is_tracing(struct profiler_cpu_context *cpu_buf)

Since this is named "is_tracing", it should be a bool, and return TRUE/False to
avoid any confusion.

> -static inline int profiler_add_sample(struct profiler_cpu_context *cpu_buf,
> - uintptr_t pc, unsigned long event)
> +static void alloc_cpu_buffers(void)
> {
> ERRSTACK(1);
> - struct op_entry entry;
> - struct block *b;
> + int i;
>
> + profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> + if (!profiler_queue)
> + error(-ENOMEM, NULL);
^just ENOMEM

> if (waserror()) {
> - poperror();
> - printk("%s: failed\n", __func__);
> - return 1;
> + free_cpu_buffers();
> + nexterror();
> }
>
> - b = profiler_cpu_buffer_write_reserve(cpu_buf, &entry, 0);
> - if (likely(b)) {
> - entry.sample->hdr = profiler_create_header(core_id(), 1);
> - entry.sample->event = (uint64_t) event;
> - profiler_cpu_buffer_add_data(&entry, &pc, 1);
> - }
> - poperror();
> + qdropoverflow(profiler_queue, 1);
> + qnonblock(profiler_queue, 1);

As mentioned in a previous patch, you might not want qnonblock.

Incidentally, qnonblock and qdropoverflow take a bool, not an int. So TRUE
instead of 1.

> + profiler_percpu_ctx =
> + kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores, KMALLOC_WAIT);
> + if (!profiler_percpu_ctx)
> + error(-ENOMEM, NULL);

Since you did a KMALLOC_WAIT, you will never have !profiler_percpu_ctx

> -static inline void profiler_begin_trace(struct profiler_cpu_context *cpu_buf)
> +int profiler_configure(struct cmdbuf *cb)
> {
> - cpu_buf->tracing = 1;
> + if (!strcmp(cb->f[0], "prof_qlimit")) {
> + if (cb->nf < 2)
> + error(EFAIL, "prof_qlimit KB");
> + profiler_queue_limit = atoi(cb->f[1]) * 1024;

Probably want a sanity check. Also, what happens when this changes after the
queue was already allocated?

> + } else if (!strcmp(cb->f[0], "prof_cpubufsz")) {
> + if (cb->nf < 2)
> + error(EFAIL, "prof_cpubufsz KB");
> + profiler_cpu_buffer_size = atoi(cb->f[1]) * 1024;

Is there any danger with the user setting this to be a very small value (like
0)? It looks like the assumption in profiler_cpu_buffer_write_reserve() is
that a fresh allocation (done by profiler_buffer_write()) is enough for size
bytes.

> + } else if (!strcmp(cb->f[0], "prof_btdepth")) {
> + if (cb->nf < 2)
> + error(EFAIL, "prof_btdepth DEPTH");
> + profiler_backtrace_depth = atoi(cb->f[1]);

It is dangerous to have the user control this. It's a stack allocation. Even
if it wasn't, we'd need a sanity check of some sort.

> +void profiler_cleanup(void)
> +{
> + sem_down(&profiler_mtx);
> + profiler_users--;
> + if (profiler_users == 0)
> + free_cpu_buffers();
> + sem_up(&profiler_mtx);
> }

I'm still concerned about this. If the only source of profiling data is from
the timer IRQs, then your current stuff is seems fine. (You disable the timers,
and thus the handlers) before freeing this stuff). But if we ever have any
other use of this, then we'll need to be careful.

But there's more to it than that. The assumption here is that once
profiler_users == 0, then there is no code, (RKM, pending IRQ, etc.) that has
access to profiler_percpu_ctx or profiler_queue.

Just as an example, say someone is calling profiler_notify_mmap(), which checks
profiler_percpu_ctx. They get past the if () check, then concurrently someone
calls profiler_cleanup and triggers free_cpu_buffers. Then the original thread
eventually calls profiler_push_pid_mmap, then qwrite, then page faults.

So a user could theoretically trigger a PF. This is part of the reason why I
was reluctant to have you try and free the buffers.

> @@ -572,6 +571,7 @@ static int sys_proc_create(struct proc *p, char *path, size_t path_l,
> user_memdup_free(p, kargenv);
> __proc_ready(new_p);
> pid = new_p->pid;
> + profiler_notify_new_process(new_p);
> proc_decref(new_p); /* give up the reference created in proc_create() */
> return pid;
> error_load_elf:
> @@ -728,6 +728,7 @@ static ssize_t sys_fork(env_t* e)
>
> printd("[PID %d] fork PID %d\n", e->pid, env->pid);
> ret = env->pid;
> + profiler_notify_new_process(env);
> proc_decref(env); /* give up the reference created in proc_alloc() */
> return ret;
> }

Do you need to update things when a process changes its binary path? (exec)

> From c0ab4ec0d729ad8e0555b3beef72340f90c23712 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Sat, 7 Nov 2015 18:47:16 -0800
> Subject: Enabled /prof/kptrace collection of anything which goes into cprintf
>
> Enabled /prof/kptrace collection of anything which goes into cprintf,
> printk, and its associates.
> The kptrace collector is a circular buffer whose default size is 128KB.

What happens when the ring is full? New stuff drops? My main concern here is
that trace_vprintk could somehow cause an issue and hold up the real printk,
which would be hard to debug.

> --- a/kern/drivers/dev/kprof.c
> +++ b/kern/drivers/dev/kprof.c
> @@ -19,6 +19,7 @@
> #include <error.h>
> #include <pmap.h>
> #include <smp.h>
> +#include <time.h>
> #include <circular_buffer.h>
> #include <umem.h>
> #include <profiler.h>
> @@ -62,7 +63,7 @@ struct dirtab kproftab[] = {
> {"mpstat-raw", {Kmpstatrawqid}, 0, 0600},
> };
>
> -extern int booting;
> +extern system_timing_t system_timing;

Should be able to get system_timing from a header.

> static struct kprof kprof;
> static bool ktrace_init_done = FALSE;
> static spinlock_t ktrace_lock = SPINLOCK_INITIALIZER_IRQSAVE;
> @@ -567,7 +568,7 @@ static struct trace_printk_buffer *kprof_get_printk_buffer(void)
> static struct trace_printk_buffer boot_tpb;
> static struct trace_printk_buffer *cpu_tpbs;
>
> - if (unlikely(booting))
> + if (unlikely(!num_cores))
> return &boot_tpb;

That seems a little odd. I'd think if we're still booting, we'd use the
boot_tpb. Was there some corner case that triggered this? I understand this
one:

> - if (likely(!booting))
> + if (likely(system_timing.tsc_freq))
> tsc2timespec(read_tsc(), &ts_now);

Barret Rhoden

unread,

Nov 12, 2015, 3:03:41 PM11/12/15

to aka...@googlegroups.com

Thanks for the rebase and the fixups. I avoided email for a little
while I worked through the content of the patches (see my other email).

Thanks,

Barret

On 2015-11-11 at 10:27 "'Davide Libenzi' via Akaros"

Davide Libenzi

unread,

Nov 12, 2015, 5:01:00 PM11/12/15

to Akaros

On Thu, Nov 12, 2015 at 12:02 PM, Barret Rhoden <br...@cs.berkeley.edu> wrote:

> diff --git a/kern/include/profiler.h b/kern/include/profiler.h
> new file mode 100644
> index 000000000000..2ec9d4308632
> --- /dev/null
> +++ b/kern/include/profiler.h

New, non-trivial files need a copyright header.

Already has, if you reviewed the tip of the branch.

> @@ -0,0 +1,18 @@
> +
> +#ifndef ROS_KERN_INC_PROFILER_H
> +#define ROS_KERN_INC_PROFILER_H

Can also do the pragma once now.

Already has.

> diff --git a/kern/src/profiler.c b/kern/src/profiler.c
> new file mode 100644
> index 000000000000..ca7197bbc20b
> --- /dev/null
> +++ b/kern/src/profiler.c

Needs a copyright. It looks like this might be new code, but if there's
old stuff from oprofile in it, we need to put that. I see there are a
couple things like "op_entry" still around.

Already has.

> +static inline int profiler_add_sample(struct profiler_cpu_context *cpu_buf,
> + uintptr_t pc, unsigned long event)

Do you want this to return a bool (TRUE for success) or an int (0 for
success)? See below:

This function is gone on tip.

Since this was v2, and v1 was never reviewed, you likely commented the v1 version.

> From 0a919619324782e873ddf5bbf9bd19e989f25162 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Tue, 20 Oct 2015 16:34:26 -0700
> Subject: Do not race when multiple init happen at the same time

> @@ -65,53 +64,63 @@ static inline size_t profiler_cpu_buffer_add_data(struct op_entry *entry,
> static void free_cpu_buffers(void)
> {
> kfree(profiler_percpu_ctx);
> + profiler_percpu_ctx = NULL;
> +
> + qclose(profiler_queue);
> + profiler_queue = NULL;
> }

Isn't this stuff possibly used from IRQ context? What's the plan for
cleaning up such that we are sure that there are no interrupts going off
while you are freeing things?

That function is only called when there are no more users.

Timers are stopped before calling profiler_cleanup().

Not sure how sync is removal of a timer.

In:
> static int alloc_cpu_buffers(void)
> {
> - if (!profiler_queue) {
> - profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> - if (!profiler_queue)
> - goto fail;
> - }
> + int i;
> +
> + profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> + if (!profiler_queue)
> + return -ENOMEM;
>
> - /* we *really* don't want to block. Losing data is better. */
> qdropoverflow(profiler_queue, 1);

Then later:

> + qnonblock(profiler_queue, 1);

This is fragile. qnonblock means "error with EAGAIN if we're full".
Right now, you'll get by, since you still have qdropoverflow above. But
this could break with an error() thrown from IRQ context somewhere down
the line.

Yes, there are paired. I am not sure which kind of bad condition can happen.

> void profiler_cleanup(void)
> {
> - free_cpu_buffers();
> + sem_down(&mtx);
> + profiler_users--;
> + if (profiler_users == 0)
> + free_cpu_buffers();
> + sem_up(&mtx);
> }

This might be a candidate for a kref, depending on how cleanup ends up
working. No need to change it now though.

kref don't block until the 0->1 transitioner is initializing, no?

> From fd096815dc200a550ea5e6b6d7f133df75e29ed9 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Thu, 22 Oct 2015 14:30:26 -0700
> Subject: Added full binary path into the proc structure
>

> @@ -791,6 +763,7 @@ static int sys_exec(struct proc *p, char *path, size_t path_l,
> t_path = copy_in_path(p, path, path_l);
> if (!t_path)
> return -1;
> + proc_replace_binary_path(p, t_path);

If we error out after this point, we'll have changed the path of the binary to
the new one even though the exec failed. I'm okay with that, just more of an
FYI.

I am fixing this.

> From 47bf34b1bc96b78a8091726cb3016b1f0d964847 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Thu, 22 Oct 2015 10:56:37 -0700
> Subject: Added APIs to access process startup information
>
> +char *get_startup_argv(struct proc *p, size_t idx, char *argp,
> + size_t max_size)
> +{
> + size_t stack_space = (const char *) USTACKTOP - (const char *) p->args_base;
> + const char *sptr = (const char *) p->args_base + sizeof(size_t) +
> + idx * sizeof(char *);
> + const char *argv = NULL;
> +
> + /* TODO,DL: Use copy_from_user() when available.
> + */

You've got these now, though memcpy_from_user will do it. (or will soon).
Either way, you don't need the TODO.

The strcpy is needed. Currently it copies up to max size, which is kind of a hack.

But I will not be fixing it here (in this CR, that is).

> + if (memcpy_from_user(p, &argv, sptr, sizeof(char *)))
> + return NULL;
> +
> + /* TODO,DL: Use strncpy_from_user() when available.
> + */
> + max_size = MIN(max_size, stack_space);
> + if (memcpy_from_user(p, argp, argv, max_size))
> + return NULL;
> + argp[max_size - 1] = 0;
> +
> + return argp;
> +}

How important is it that the command line data hasn't changed? We're trusting
the user to not have changed it since they were invoked.

Also, I guess we're going to want to parse the argv stuff later. Kevin put
together some code to pack and parse the args earlier. It might be easier and
to keep that blob around and parse it instead of this. Not a huge deal though.

We could keep the kernel buffer copy, but I'd prefer to eventually do that in a separate CL.

> From 1aede03303debfe3075aa7055c6f1dd843f806bc Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Tue, 27 Oct 2015 15:29:10 -0700
> Subject: Added API to enumerate the VM regions of a process

> --- a/kern/include/mm.h
> +++ b/kern/include/mm.h
> @@ -51,6 +51,9 @@ void isolate_vmrs(struct proc *p, uintptr_t va, size_t len);
> void unmap_and_destroy_vmrs(struct proc *p);
> int duplicate_vmrs(struct proc *p, struct proc *new_p);
> void print_vmrs(struct proc *p);
> +void enumerate_vrms(struct proc *p,
^enumerate_vmrs

Fixed. Took me a little to understand what was wrong ☺

> diff --git a/kern/src/mm.c b/kern/src/mm.c
> index ca91e63620b6..06a349840404 100644
> --- a/kern/src/mm.c
> +++ b/kern/src/mm.c
> @@ -359,6 +359,18 @@ void print_vmrs(struct proc *p)
> vmr->vm_file, vmr->vm_foff);
> }
>
> +void enumerate_vrms(struct proc *p,
^enumerate_vmrs

Ditto.

> From 2df52e6bc39bf90a1c90e5382298fa2c2a8316f5 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Thu, 5 Nov 2015 17:30:24 -0800
> Subject: Added API to append data into a circular buffer
>
> Added API to append data into a circular buffer. The buffer data is
> allocated once at init time, and no more allocations happen after that.

> diff --git a/kern/include/circular_buffer.h b/kern/include/circular_buffer.h
> new file mode 100644
> index 000000000000..4ab8c42e2673
> --- /dev/null
> +++ b/kern/include/circular_buffer.h

Can you put in a little info about how to use this and what the rules are? For
instance, what happens to writers when the buffer is full (overwrite, block, or
drop?) or to readers when the buffer is empty? (looks like you just get 0). If
a reader uses an offset, does that just drop a chunk from the buffer? What is
the cbuf_size_t for?

Locking is up to the reader.

In a circular buffer fashion, a write on full buffer discard oldest buffer, and write new one.

No, reading data does not remove data. The only way the read ptr is moved forward, is when the write ptr pushes it forward in order to fit new data.

Every block in the buffer, is pre-pended with its size. That is the type of the size.

> --- /dev/null
> +++ b/kern/lib/circular_buffer.c

> +void circular_buffer_clear(struct circular_buffer *cb)
> +{
> + if (cb->base) {
> + if (cb->mem)
> + kfree(cb->mem);

It's a little surprising that clear also frees the allocation, instead of just
resetting everything back to 0.

This clears all the resources of the buffer.

I used "clear" as there is an "init".

I would have used "free" if there was an "alloc".

> + cb->rdptr = cb->wrptr = cb->base = cb->mem = NULL;
> + cb->size = cb->allocated = 0;
> + }
> +}
> +
> +static bool circular_buffer_is_overlap(const struct circular_buffer *cb,
> + const char *rptr, const char *wptr,
> + size_t size)
> +{
> + return (cb->size > 0) && (rptr >= wptr) && (rptr < (wptr + size));
> +}
> +
> +static void circular_buffer_write_skip(struct circular_buffer *cb, char *wrptr,
> + size_t size)

What are these helpers doing?

The overlap check is a write operation is going to overwrite the block at which rdptr resides.

The skip, moves forward the rdptr until it no more overlaps with the incoming write.

> +size_t circular_buffer_read(struct circular_buffer *cb, char *data, size_t size,
> + size_t off)
> +{
> + size_t asize = cb->size, rsize = 0;
> + const char *rdptr = cb->rdptr;
> +
> + while (asize > 0 && size > 0) {
> + size_t bsize = *(const cbuf_size_t *) rdptr;
> +
> + if (likely(bsize)) {
> + size_t esize = bsize - sizeof(cbuf_size_t);
> +
> + if (off >= esize) {
> + off -= esize;
> + } else {
> + size_t csize = MIN(esize - off, size);
> +
> + memcpy(data, rdptr + sizeof(cbuf_size_t) + off, csize);

So every block of data has the size first, then the data, all packed together?

Yup.

> From c3c2bb0701f747e5c5c3f3d30cdf033137ae26e2 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Wed, 21 Oct 2015 16:39:04 -0700
> Subject: Implemented the new profiler
>
> Implemented the new profiler format and added simple userspace
> stack trace (waiting for copy_from_user()).

This is a huge commit that would have been easier to understand if it was broken
up a bit. For instance, I see there's a user backtrace list, syscall tracing
isn't using a qio queue anymore, the old Plan 9 profiler is gone (great!), and a
bunch of other things. It also looks like you squeezed in the change that the
start command also turns on the timers for all cores, which isn't something I
wanted.

Yes, it is a big change 😑

Braking it down would be a PITA at this point, since accumulated changes in the last 4 weeks that has been sitting there.

As for the timers, for Ron's case, you can still turn them on aside from the profiler.

For a profiler POV, I think the default behavior should be them on everywhere, otherwise you lose the full picture of what the system is doing.

We could add a selective CPU timer and profiler turn on, but IMO the default should be all on.

> --- a/kern/arch/x86/kdebug.c
> +++ b/kern/arch/x86/kdebug.c
> @@ -365,6 +365,25 @@ size_t backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
> return nr_pcs;
> }
>
> +size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t *pcs,
> + size_t nr_slots)
> +{
> + size_t nr_pcs = 0;
> +
> + for (;;) {
> + if (unlikely(fp >= UMAPTOP) || unlikely(fp < BRK_END) ||

Why not is_user_raddr()? And once you use copy_from, those checks aren't needed
either. Also, I think it is possible for frame pointers to be below BRK_END,
which could happen if a user stack was malloc'd instead of mmapped.

> + unlikely(nr_pcs >= nr_slots))
> + break;
> +
> + /* For now straight memory access, waiting for copy_from_user(). */

copy_from is now available. =)

I will do that is a separate CL.

> diff --git a/kern/drivers/dev/kprof.c b/kern/drivers/dev/kprof.c
> index da11b7e44d32..76e53d7f9d30 100644

> +static int oprof_timer_period = 1000;
^ rename prof_timer_period

Done.

> +static void kprof_start_profiler(void)
> +{
> + ERRSTACK(2);
>
> + sem_down(&kprof.lock);

If you're using a semaphore as a binary mutex (lock), please use a qlock. This
happens a bunch in this patch set.

Done.

> + if (waserror()) {
> + sem_up(&kprof.lock);
> + nexterror();
> + }
> + if (!kprof.profiling) {
> + profiler_init();
> + if (waserror()) {
> + profiler_cleanup();
> + nexterror();
> + }
> +
> + profiler_control_trace(1);
> +
> + for (int i = 0; i < num_cores; i++)
> + kprof_enable_timer(i, 1);

This looks like starting the profiler turns on the timer. I wanted to keep the
timer separate from the collection, such that the timer is one of potentially
many sources of data, as discussed in another email.

For future uses, yes. But today, with no timers, there is no data.

kptrace is a different data path, which is always on.

When we do, in the code I am working now, counter overflow based interrupt triggers, then we need to rework this file.

OK ☺

> +static struct trace_printk_buffer *kprof_get_printk_buffer(void)
> +{
> + static struct trace_printk_buffer boot_tpb;
> + static struct trace_printk_buffer *cpu_tpbs;
> +
> + if (unlikely(booting))
> + return &boot_tpb;
> + if (unlikely(!cpu_tpbs)) {
> + /* Poor man per-CPU data structure. I really do no like littering global
> + * data structures with module specific data.
> + */
> + spin_lock_irqsave(&ktrace_lock);
> + if (!cpu_tpbs)
> + cpu_tpbs = kzmalloc(num_cores * sizeof(struct trace_printk_buffer),
> + 0);

If this alloc fails, (flags == 0), then we'll return 0 (+coreid) and PF later.

We could panic, or return boot tpb.

Today a OOM crashes the box, so I am not sure what to do here.

> --- a/kern/include/kdebug.h
> +++ b/kern/include/kdebug.h
> @@ -46,7 +48,7 @@ int printdump(char *buf, int buflen, uint8_t *data);
> extern bool printx_on;
> void set_printx(int mode);
> #define printx(args...) if (printx_on) printk(args)
> -#define trace_printx(args...) if (printx_on) trace_printk(args)
> +#define trace_printx(args...) if (printx_on) trace_printk(TRUE, args)
^FALSE?
We just want to print text, not a backtrace. That was also my intent for
trace_printk (it's just the print, not a backtrace).

Seems pretty useful to have the option to get a BT, to understand from which path you are coming from.

A "grep" of the stirng simply tells where the print is.

> --- a/kern/src/mm.c
> +++ b/kern/src/mm.c
> @@ -24,6 +24,7 @@
> #include <kmalloc.h>
> #include <vfs.h>
> #include <smp.h>
> +#include <profiler.h>
>
> struct kmem_cache *vmr_kcache;
>
> @@ -692,6 +693,9 @@ void *do_mmap(struct proc *p, uintptr_t addr, size_t len, int prot, int flags,
> }
> }
> spin_unlock(&p->vmr_lock);
> +
> + profiler_notify_mmap(p, addr, len, prot, flags, file, offset);
> +

Do you need to know when something was unmapped?

Linux perf don't seem to care, as it does not have any command to report that.

> diff --git a/kern/src/process.c b/kern/src/process.c
> index bd26789e3ef1..2e4c69518e8a 100644
> --- a/kern/src/process.c
> +++ b/kern/src/process.c
> @@ -331,6 +331,8 @@ error_t proc_alloc(struct proc **pp, struct proc *parent, int flags)
> kmem_cache_free(proc_cache, p);
> return -ENOFREEPID;
> }
> + if (parent && parent->binary_path)
> + kstrdup(&p->binary_path, parent->binary_path);

Is this to fix a leak where paths weren't set for forked processes? If so, this
is an example of a fixup that could have been squashed into a previous commit.

I lost track for what it was 😀

But yes, could have been possibly squashed.

> --- a/kern/src/profiler.c
> +++ b/kern/src/profiler.c

> +static inline char* vb_encode_uint64(char* data, uint64_t n)
> +{
> + for (; n >= 0x80; n >>= 7)
> + *data++ = (char) (n | 0x80);
> + *data++ = (char) n;
> +
> + return data;
> +}

This probably should be in a library or header (so we can reuse it) and with a
little explanation. It looks like we take 7 bits of n at a time and push it
into *data, with bit 0x80 signifying we're done or not.

Yep, that's what t does.

I wasn't sure if variable byte encoding would be something a kernel would be doing elsewhere.

> +static struct block *profiler_buffer_write(struct profiler_cpu_context *cpu_buf,
> + struct block *b)
> {
> - return (((uint64_t) 0xee01) << 48) | ((uint64_t) cpu << 16) |
> - (uint64_t) nbt;
> + if (b) {
> + qibwrite(profiler_queue, b);
> +
> + if (qlen(profiler_queue) > profiler_queue_limit) {
> + b = qget(profiler_queue);
> + if (likely(b)) {
> + cpu_buf->dropped_data_size += BLEN(b);
> + freeb(b);
> + }

This seems like a candidate for a feature added to qio, such as a "drop from the
front" mode.

OK, but can we please do that in a separate CL? This branch has been sitting for some time already.

> + }
> + }
> +
> + return iallocb(profiler_cpu_buffer_size);
> }

This function seems a little weird. If you're given a block, you write it.
That makes sense. But regardless, you alloc a new block?

Seems like we should not be too cute by having write return a fresh block, and
just do the allocation separately.

I drop a block from the tail. I *could* be reusing it, but I have found no reinit API to re initialized a block.

Fixed.

> + struct process_set pset;
> +
> + proc_get_set(&pset);
> + if (waserror()) {
> + proc_free_set(&pset);
> + nexterror();
> + }
> +
> + for (size_t i = 0; i < pset.num_processes; i++)
> + enumerate_vrms(pset.procs[i], enum_proc, pset.procs[i]);
^vmrs

Done.

> + poperror();
> + proc_free_set(&pset);
> +}
> +
> +static inline int profiler_is_tracing(struct profiler_cpu_context *cpu_buf)

Since this is named "is_tracing", it should be a bool, and return TRUE/False to
avoid any confusion.

Done.

> -static inline int profiler_add_sample(struct profiler_cpu_context *cpu_buf,
> - uintptr_t pc, unsigned long event)
> +static void alloc_cpu_buffers(void)
> {
> ERRSTACK(1);
> - struct op_entry entry;
> - struct block *b;
> + int i;
>
> + profiler_queue = qopen(profiler_queue_limit, 0, NULL, NULL);
> + if (!profiler_queue)
> + error(-ENOMEM, NULL);
^just ENOMEM

Done.

> + qdropoverflow(profiler_queue, 1);
> + qnonblock(profiler_queue, 1);

As mentioned in a previous patch, you might not want qnonblock.

Not sure why not?

Incidentally, qnonblock and qdropoverflow take a bool, not an int. So TRUE
instead of 1.

Done.

> + profiler_percpu_ctx =
> + kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores, KMALLOC_WAIT);
> + if (!profiler_percpu_ctx)
> + error(-ENOMEM, NULL);

Since you did a KMALLOC_WAIT, you will never have !profiler_percpu_ctx

I am not sure waiting is a guarantee that you will get memory anyway.

> -static inline void profiler_begin_trace(struct profiler_cpu_context *cpu_buf)
> +int profiler_configure(struct cmdbuf *cb)
> {
> - cpu_buf->tracing = 1;
> + if (!strcmp(cb->f[0], "prof_qlimit")) {
> + if (cb->nf < 2)
> + error(EFAIL, "prof_qlimit KB");
> + profiler_queue_limit = atoi(cb->f[1]) * 1024;

Probably want a sanity check. Also, what happens when this changes after the
queue was already allocated?

The configuration must be done before the profiler is started.

> + } else if (!strcmp(cb->f[0], "prof_cpubufsz")) {
> + if (cb->nf < 2)
> + error(EFAIL, "prof_cpubufsz KB");
> + profiler_cpu_buffer_size = atoi(cb->f[1]) * 1024;

Is there any danger with the user setting this to be a very small value (like
0)? It looks like the assumption in profiler_cpu_buffer_write_reserve() is
that a fresh allocation (done by profiler_buffer_write()) is enough for size
bytes.

> + } else if (!strcmp(cb->f[0], "prof_btdepth")) {
> + if (cb->nf < 2)
> + error(EFAIL, "prof_btdepth DEPTH");
> + profiler_backtrace_depth = atoi(cb->f[1]);

It is dangerous to have the user control this. It's a stack allocation. Even
if it wasn't, we'd need a sanity check of some sort.

I will drop the BT depth config, and limit the others.

> +void profiler_cleanup(void)
> +{
> + sem_down(&profiler_mtx);
> + profiler_users--;
> + if (profiler_users == 0)
> + free_cpu_buffers();
> + sem_up(&profiler_mtx);
> }

I'm still concerned about this. If the only source of profiling data is from
the timer IRQs, then your current stuff is seems fine. (You disable the timers,
and thus the handlers) before freeing this stuff). But if we ever have any
other use of this, then we'll need to be careful.

As I said, when I come in with the overflow-triggered sampling, this will need some retuning.

But there's more to it than that. The assumption here is that once
profiler_users == 0, then there is no code, (RKM, pending IRQ, etc.) that has
access to profiler_percpu_ctx or profiler_queue.

Just as an example, say someone is calling profiler_notify_mmap(), which checks
profiler_percpu_ctx. They get past the if () check, then concurrently someone
calls profiler_cleanup and triggers free_cpu_buffers. Then the original thread
eventually calls profiler_push_pid_mmap, then qwrite, then page faults.

So a user could theoretically trigger a PF. This is part of the reason why I
was reluctant to have you try and free the buffers.

Well, we have no RCU ☺

In such case we should swap ptr with NULL, and RCU-free it.

> @@ -572,6 +571,7 @@ static int sys_proc_create(struct proc *p, char *path, size_t path_l,
> user_memdup_free(p, kargenv);
> __proc_ready(new_p);
> pid = new_p->pid;
> + profiler_notify_new_process(new_p);
> proc_decref(new_p); /* give up the reference created in proc_create() */
> return pid;
> error_load_elf:
> @@ -728,6 +728,7 @@ static ssize_t sys_fork(env_t* e)
>
> printd("[PID %d] fork PID %d\n", e->pid, env->pid);
> ret = env->pid;
> + profiler_notify_new_process(env);
> proc_decref(env); /* give up the reference created in proc_alloc() */
> return ret;
> }

Do you need to update things when a process changes its binary path? (exec)

That should lead to new mmaps.

> From c0ab4ec0d729ad8e0555b3beef72340f90c23712 Mon Sep 17 00:00:00 2001
> From: Davide Libenzi <dlib...@google.com>
> Date: Sat, 7 Nov 2015 18:47:16 -0800
> Subject: Enabled /prof/kptrace collection of anything which goes into cprintf
>
> Enabled /prof/kptrace collection of anything which goes into cprintf,
> printk, and its associates.
> The kptrace collector is a circular buffer whose default size is 128KB.

What happens when the ring is full? New stuff drops? My main concern here is
that trace_vprintk could somehow cause an issue and hold up the real printk,
which would be hard to debug.

Ring full drops old stuff. Never blocks.

> --- a/kern/drivers/dev/kprof.c
> +++ b/kern/drivers/dev/kprof.c
> @@ -19,6 +19,7 @@
> #include <error.h>
> #include <pmap.h>
> #include <smp.h>
> +#include <time.h>
> #include <circular_buffer.h>
> #include <umem.h>
> #include <profiler.h>
> @@ -62,7 +63,7 @@ struct dirtab kproftab[] = {
> {"mpstat-raw", {Kmpstatrawqid}, 0, 0600},
> };
>
> -extern int booting;
> +extern system_timing_t system_timing;

Should be able to get system_timing from a header.

Yup, arch/time.h, done.

> static struct kprof kprof;
> static bool ktrace_init_done = FALSE;
> static spinlock_t ktrace_lock = SPINLOCK_INITIALIZER_IRQSAVE;
> @@ -567,7 +568,7 @@ static struct trace_printk_buffer *kprof_get_printk_buffer(void)
> static struct trace_printk_buffer boot_tpb;
> static struct trace_printk_buffer *cpu_tpbs;
>
> - if (unlikely(booting))
> + if (unlikely(!num_cores))
> return &boot_tpb;

That seems a little odd. I'd think if we're still booting, we'd use the
boot_tpb. Was there some corner case that triggered this? I understand this
one:

> - if (likely(!booting))
> + if (likely(system_timing.tsc_freq))
> tsc2timespec(read_tsc(), &ts_now);

The booting flag is cleared really late, and I would not want stuff starting spinning on other cores, colliding on boot tbp.

Davide Libenzi

unread,

Nov 12, 2015, 5:06:31 PM11/12/15

to Akaros

Branch pushed.

Dan Cross

unread,

Nov 12, 2015, 10:22:43 PM11/12/15

to aka...@googlegroups.com

On Thu, Nov 12, 2015 at 3:02 PM, Barret Rhoden <br...@cs.berkeley.edu> wrote:

> +static void kprof_usage_fail(void)
> +{
> + static const char *ctlstring = "clear|start|stop|timer";
> + const char* const * cmds = profiler_configure_cmds();
> + size_t i, size, csize;
> + char msgbuf[128];
> +
> + strncpy(msgbuf, ctlstring, sizeof(msgbuf));

This is arguably incorrect. If one decreased the size of 'msgbuf' or made ctlstring larger so that sizeof(msgbuf) <= strlen(ctlstring), this this would omit the trailing NUL on 'msgbuf'. Even if that's not true, this is overwriting every byte of 'msgbuf' beyond the the copy of 'ctlstring' with 0s, seemingly unnecessarily.

> + size = MIN(strlen(ctlstring), sizeof(msgbuf) - 1);

Strlen(ctlstring) could then cause a crash in the previous example.

> + for (i = 0; cmds[i]; i++) {
> + csize = strlen(cmds[i]);
> + if ((csize + size + 2) > sizeof(msgbuf))
> + break;
> + msgbuf[size] = '|';
> + memcpy(msgbuf + size + 1, cmds[i], csize);
> + size += csize + 1;
> + }
> + msgbuf[size] = 0;

This loop could be replaced with,

strlcpy(msgbuf, ctlstring, sizeof(msgbuf));

for (int i = 0; i < nelem(cmds); i++) {

strlcat(msgbuf, "|", sizeof(msgbuf));

strlcat(msgbuf, cmds[i], sizeof(msgbuf));

}

Davide Libenzi

unread,

Nov 12, 2015, 10:33:15 PM11/12/15

to Akaros

On Thu, Nov 12, 2015 at 7:22 PM, Dan Cross <cro...@gmail.com> wrote:

On Thu, Nov 12, 2015 at 3:02 PM, Barret Rhoden <br...@cs.berkeley.edu> wrote:
> +static void kprof_usage_fail(void)
> +{
> + static const char *ctlstring = "clear|start|stop|timer";
> + const char* const * cmds = profiler_configure_cmds();
> + size_t i, size, csize;
> + char msgbuf[128];
> +
> + strncpy(msgbuf, ctlstring, sizeof(msgbuf));

This is arguably incorrect. If one decreased the size of 'msgbuf' or made ctlstring larger so that sizeof(msgbuf) <= strlen(ctlstring), this this would omit the trailing NUL on 'msgbuf'. Even if that's not true, this is overwriting every byte of 'msgbuf' beyond the the copy of 'ctlstring' with 0s, seemingly unnecessarily.

Right that could be strlcpy(). Will change that.

But, on one side you are worried about the performance (which is a function like this, arguably means nothing) of filling ~100 zeros, and ...

> + for (i = 0; cmds[i]; i++) {
> + csize = strlen(cmds[i]);
> + if ((csize + size + 2) > sizeof(msgbuf))
> + break;
> + msgbuf[size] = '|';
> + memcpy(msgbuf + size + 1, cmds[i], csize);
> + size += csize + 1;
> + }
> + msgbuf[size] = 0;

This loop could be replaced with,

strlcpy(msgbuf, ctlstring, sizeof(msgbuf));
for (int i = 0; i < nelem(cmds); i++) {
strlcat(msgbuf, "|", sizeof(msgbuf));
strlcat(msgbuf, cmds[i], sizeof(msgbuf));
}

... here, you are suggesting to walk the string from the first byte using strlcat() ☺

Dan Cross

unread,

Nov 12, 2015, 10:41:34 PM11/12/15

to aka...@googlegroups.com

I don't care about the performance; I just don't see the point of zeroing the rest of the buffer.

Davide Libenzi

unread,

Nov 13, 2015, 8:34:08 AM11/13/15

to Akaros

I argue that strcat() as interface is bad enough done once, but done in a loop, by continuously throwing away the buffer length at every call, is even evil.

Anyway, I chose evil in this case, as that function could be written in Cobol and still be performant enough, but that strcat loop should not be taught at school 😀

Dan Cross

unread,

Nov 13, 2015, 11:29:20 AM11/13/15

to aka...@googlegroups.com

On Fri, Nov 13, 2015 at 8:34 AM, 'Davide Libenzi' via Akaros <aka...@googlegroups.com> wrote:

I argue that strcat() as interface is bad enough done once, but done in a loop, by continuously throwing away the buffer length at every call, is even evil.
Anyway, I chose evil in this case, as that function could be written in Cobol and still be performant enough, but that strcat loop should not be taught at school 😀

Being evil is good. It means you can jump over buses on a bike while blasting slayer in the background: https://www.youtube.com/watch?v=WuHpP10YF7M&t=10

barret rhoden

unread,

Nov 13, 2015, 11:32:05 AM11/13/15

to aka...@googlegroups.com

On 2015-11-13 at 5:34 'Davide Libenzi' via Akaros wrote:
> I argue that strcat() as interface is bad enough done once, but done
> in a loop, by continuously throwing away the buffer length at every
> call, is even evil.

It might not be high performance, but from looking at the code it is
much more clear what it is doing and bugs are harder to hide:

> + for (i = 0; cmds[i]; i++) {
>> > + csize = strlen(cmds[i]);
>> > + if ((csize + size + 2) > sizeof(msgbuf))
>> > + break;
>> > + msgbuf[size] = '|';
>> > + memcpy(msgbuf + size + 1, cmds[i], csize);
>> > + size += csize + 1;
>> > + }
>> > + msgbuf[size] = 0;

this has a lot of "+2" and +1. hope we don't have an off-by-one
error! or forget the trailing msgbuf[size] = 0;

> This loop could be replaced with,
>
> strlcpy(msgbuf, ctlstring, sizeof(msgbuf));
> for (int i = 0; i < nelem(cmds); i++) {
> strlcat(msgbuf, "|", sizeof(msgbuf));
> strlcat(msgbuf, cmds[i], sizeof(msgbuf));
> }

not a big deal either way. but it's not like we're making an OS to
optimize the building of a message buffer. wait, maybe it's time to
grab www.msgbufos.org!

barret

Davide Libenzi

unread,

Nov 13, 2015, 12:03:25 PM11/13/15

to Akaros

On Fri, Nov 13, 2015 at 8:32 AM, barret rhoden <br...@cs.berkeley.edu> wrote:

> + for (i = 0; cmds[i]; i++) {
>> > + csize = strlen(cmds[i]);
>> > + if ((csize + size + 2) > sizeof(msgbuf))
>> > + break;
>> > + msgbuf[size] = '|';
>> > + memcpy(msgbuf + size + 1, cmds[i], csize);
>> > + size += csize + 1;
>> > + }
>> > + msgbuf[size] = 0;

this has a lot of "+2" and +1. hope we don't have an off-by-one
error! or forget the trailing msgbuf[size] = 0;

I'm confused. You say there is a bug in that code? Where is it?

barret rhoden

unread,

Nov 13, 2015, 12:19:34 PM11/13/15

to Akaros

On 2015-11-12 at 14:00 'Davide Libenzi' via Akaros wrote:
> This function is gone on tip.
> Since this was v2, and v1 was never reviewed, you likely commented
> the v1 version.

I went off of:

> The following changes since commit
> 6f3723cd8f883260a78fdf411911d7469464caa5:
>
> Update file-posix.c utest (2015-10-15 12:07:00 -0400)
>
> are available in the git repository at:
>
> g...@github.com:dlibenzi/akaros profiler_dev_v2
>
> for you to fetch changes up to
> 6e40b75927c808dd945cb3ad0ee6e52a5a2dc495:
>
> Added simple netcat client for Akaros (2015-10-28 14:49:56 -0700)

> > > @@ -65,53 +64,63 @@ static inline size_t
> > profiler_cpu_buffer_add_data(struct op_entry *entry,
> > > static void free_cpu_buffers(void)
> > > {
> > > kfree(profiler_percpu_ctx);
> > > + profiler_percpu_ctx = NULL;
> > > +
> > > + qclose(profiler_queue);
> > > + profiler_queue = NULL;
> > > }
> >
> > Isn't this stuff possibly used from IRQ context? What's the plan
> > for cleaning up such that we are sure that there are no interrupts
> > going off while you are freeing things?
> >
>
> That function is only called when there are no more users.
> Timers are stopped before calling profiler_cleanup().
> Not sure how sync is removal of a timer.

> > > */ qdropoverflow(profiler_queue, 1);
> >
> > Then later:
> >
> > > + qnonblock(profiler_queue, 1);
> >
> > This is fragile. qnonblock means "error with EAGAIN if we're full".
> > Right now, you'll get by, since you still have qdropoverflow
> > above. But this could break with an error() thrown from IRQ
> > context somewhere down the line.
> >
>
> Yes, there are paired. I am not sure which kind of bad condition can
> happen.

The bad condition is that you may unexpectedly get an error(). If you
have dropoverflow, nonblock does nothing, so do not set it. things may
change in qio in the future.

> > > void profiler_cleanup(void)
> > > {
> > > - free_cpu_buffers();
> > > + sem_down(&mtx);
> > > + profiler_users--;
> > > + if (profiler_users == 0)
> > > + free_cpu_buffers();
> > > + sem_up(&mtx);
> > > }
> >
> > This might be a candidate for a kref, depending on how cleanup ends
> > up working. No need to change it now though.
> >
>
> kref don't block until the 0->1 transitioner is initializing, no?

krefs never block, so i don't fully understand. my comment was that
cleanup seems dangerous here. this code is hand-rolling the kref
pattern, so in the end a kref might be the right tool here.

> > How important is it that the command line data hasn't changed?
> > We're trusting
> > the user to not have changed it since they were invoked.
>
>
> > Also, I guess we're going to want to parse the argv stuff later.
> > Kevin put together some code to pack and parse the args earlier.
> > It might be easier and
> > to keep that blob around and parse it instead of this. Not a huge
> > deal though.
> >
>
> We could keep the kernel buffer copy, but I'd prefer to eventually do
> that in a separate CL.

sounds fine.

> Locking is up to the reader.
> In a circular buffer fashion, a write on full buffer discard oldest
> buffer, and write new one.
> No, reading data does not remove data. The only way the read ptr is
> moved forward, is when the write ptr pushes it forward in order to
> fit new data. Every block in the buffer, is pre-pended with its size.
> That is the type of the size.

Great. Please put that in the source too.

> > It's a little surprising that clear also frees the allocation,
> > instead of just
> > resetting everything back to 0.
> >
>
> This clears all the resources of the buffer.
> I used "clear" as there is an "init".
> I would have used "free" if there was an "alloc".

This was originally unclear since it was not clear how the contents of
the ring buffer get discarded. we have another ring buffer in the
kernel that takes data, then stops taking *new* data until the old
contents were 'cleared.'

either way, 'clear' doesn't sound like destroy, free, deinit, or
shutdown. all of which have a more 'final' sound to it.

> > What are these helpers doing?
> >
>
> The overlap check is a write operation is going to overwrite the
> block at which rdptr resides.
> The skip, moves forward the rdptr until it no more overlaps with the
> incoming write.

Thanks. Please put that in a comment right above the functions.

> > This is a huge commit that would have been easier to understand if
> > it was broken
> > up a bit. For instance, I see there's a user backtrace list,
> > syscall tracing
> > isn't using a qio queue anymore, the old Plan 9 profiler is gone
> > (great!), and a
> > bunch of other things. It also looks like you squeezed in the
> > change that the
> > start command also turns on the timers for all cores, which isn't
> > something I
> > wanted.
> >
>

> Yes, it is a big change ____

> Braking it down would be a PITA at this point, since accumulated
> changes in the last 4 weeks that has been sitting there.

Agreed, this is more of a "please try to avoid this in the future
comment"

> As for the timers, for Ron's case, you can still turn them on aside
> from the profiler.
> For a profiler POV, I think the default behavior should be them on
> everywhere, otherwise you lose the full picture of what the system is
> doing. We could add a selective CPU timer and profiler turn on, but
> IMO the default should be all on.

One problem is I don't see how to turn on the profiler on a single
core. If you start collecting, you start the timer on all cores. I
don't see why it can't be the way it was before. "Pick the timers you
want, then turn on the accounting."

I think fix is to have multiple commands. One is the timer (as is),
one is to start just the collection, and one is the aggregate: all
timers and start the collection. This is a combination of all the
approaches. There's our existing style (timer, then collection) as
well as your desired style (just turn the whole thing on damnit).

> > > +size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t
> > > *pcs,
> > > + size_t nr_slots)
> > > +{
> > > + size_t nr_pcs = 0;
> > > +
> > > + for (;;) {
> > > + if (unlikely(fp >= UMAPTOP) || unlikely(fp <
> > > BRK_END) ||
> >
> > Why not is_user_raddr()? And once you use copy_from, those checks
> > aren't needed
> > either. Also, I think it is possible for frame pointers to be below
> > BRK_END,
> > which could happen if a user stack was malloc'd instead of mmapped.
> >
> > > + unlikely(nr_pcs >= nr_slots))
> > > + break;
> > > +
> > > + /* For now straight memory access, waiting for
> > copy_from_user(). */
> >
> > copy_from is now available. =)
> >
>
> I will do that is a separate CL.

The beauty of Git is that you can make that small change (just use
copy_from and remove the UMAPTOP and BRK_END checks) as a "WIP-commit",
then do a rebase -i and meld that fixup into this existing commit.

> > > + for (int i = 0; i < num_cores; i++)
> > > + kprof_enable_timer(i, 1);
> >
> > This looks like starting the profiler turns on the timer. I wanted
> > to keep the
> > timer separate from the collection, such that the timer is one of
> > potentially
> > many sources of data, as discussed in another email.
> >
>
> For future uses, yes. But today, with no timers, there is no data.
> kptrace is a different data path, which is always on.
> When we do, in the code I am working now, counter overflow based
> interrupt triggers, then we need to rework this file.

I think multiple commands should still do the trick. But if not, we
can talk about it. =)

> > > +static struct trace_printk_buffer *kprof_get_printk_buffer(void)
> > > +{
> > > + static struct trace_printk_buffer boot_tpb;
> > > + static struct trace_printk_buffer *cpu_tpbs;
> > > +
> > > + if (unlikely(booting))
> > > + return &boot_tpb;
> > > + if (unlikely(!cpu_tpbs)) {
> > > + /* Poor man per-CPU data structure. I really do no
> > > like
> > littering global
> > > + * data structures with module specific data.
> > > + */
> > > + spin_lock_irqsave(&ktrace_lock);
> > > + if (!cpu_tpbs)
> > > + cpu_tpbs = kzmalloc(num_cores *
> > > sizeof(struct
> > trace_printk_buffer),
> > > +
> > > 0);
> >
> > If this alloc fails, (flags == 0), then we'll return 0 (+coreid)
> > and PF later.
> >
>
> We could panic, or return boot tpb.
> Today a OOM crashes the box, so I am not sure what to do here.

Could do the alloc with KMALLOC_WAIT outside the lock. Then lock and
try to do the assign. If someone else already assigned cpu_tpbs, then
we free the one we just got.

> > > +#define trace_printx(args...) if (printx_on) trace_printk(TRUE,
> > > args)
> > ^FALSE?
> > We just want to print text, not a backtrace. That was also my
> > intent for trace_printk (it's just the print, not a backtrace).
> >
>
> Seems pretty useful to have the option to get a BT, to understand from
> which path you are coming from.
> A "grep" of the stirng simply tells where the print is.

I like the option for the backtrace. I'm just saying that i'd like a
simple trace_printk/printx that doesn't trigger the backtrace. we
already have TRACEME for the backtrace trace_printk.

Anyway, if you don't want to fix it, that's fine. If I get annoyed at
a long backtrace in the future when trace_printxing I'll change it.

> > > + profiler_notify_mmap(p, addr, len, prot, flags, file,
> > > offset);
> > > +
> >
> > Do you need to know when something was unmapped?
> >
>
> Linux perf don't seem to care, as it does not have any command to
> report that.

Sounds good, just checking. =)

> > This probably should be in a library or header (so we can reuse it)
> > and with a
> > little explanation. It looks like we take 7 bits of n at a time
> > and push it
> > into *data, with bit 0x80 signifying we're done or not.
> >
>
> Yep, that's what t does.

Cool, please note that in the code. I have a bad memory. =)

> I wasn't sure if variable byte encoding would be something a kernel
> would be doing elsewhere.

I wasn't sure either. It sounds pretty cool. We can keep it here for
now, or move it elsewhere if you'd like.

> > +static struct block *profiler_buffer_write(struct
> > profiler_cpu_context *cpu_buf,
> > > +
> > struct block *b)
> > > {
> > > - return (((uint64_t) 0xee01) << 48) | ((uint64_t) cpu << 16)
> > > |
> > > - (uint64_t) nbt;
> > > + if (b) {
> > > + qibwrite(profiler_queue, b);
> > > +
> > > + if (qlen(profiler_queue) > profiler_queue_limit) {
> > > + b = qget(profiler_queue);
> > > + if (likely(b)) {
> > > + cpu_buf->dropped_data_size +=
> > > BLEN(b);
> > > + freeb(b);
> > > + }
> >
> > This seems like a candidate for a feature added to qio, such as a
> > "drop from the
> > front" mode.
> >
>
> OK, but can we please do that in a separate CL? This branch has been
> sitting for some time already.

Ok. This is a minor thing, but I mentioned this because its an example
of a place where if we slowed down a little and saw the opportunity to
change qio, the whole kernel could be better.

As far as the age of this branch goes, it's a huge change that takes a
while to get through. Maybe I'm just slow, but I've spent 2 days on
this and counting.

> > > + return iallocb(profiler_cpu_buffer_size);
> > > }
> >
> > This function seems a little weird. If you're given a block, you
> > write it. That makes sense. But regardless, you alloc a new block?
> >
> > Seems like we should not be too cute by having write return a fresh
> > block, and
> > just do the allocation separately.
> >
>
> I drop a block from the tail. I *could* be reusing it, but I have
> found no reinit API to re initialized a block.

Ah, I see. In the future, you know you were given a block and you want
to return that block in a ready-to-go manner.

This is another case where we can make the function you need and add it
to the block code. That way you also don't need to worry about
checking the return value of this function (since iallocb can fail, but
if you return the block you got originally, that can't fail).

Not to harp on this too much, but I would rather have had that block
code done as a support patch before this commit, and then used here,
than doing it the way it is. That also would have been clearer to me,
would have improved the kernel, and would have saved some time. =)

But feel free to leave it as is now, and we can address that with a
future TODO.

> > + qdropoverflow(profiler_queue, 1);
> > > + qnonblock(profiler_queue, 1);
> >
> > As mentioned in a previous patch, you might not want qnonblock.
> >
>
> Not sure why not?

Because they do similar things in different ways.

> > > + profiler_percpu_ctx =
> > > + kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores,
> > KMALLOC_WAIT);
> > > + if (!profiler_percpu_ctx)
> > > + error(-ENOMEM, NULL);
> >
> > Since you did a KMALLOC_WAIT, you will never
> > have !profiler_percpu_ctx
>
> I am not sure waiting is a guarantee that you will get memory anyway.

I think it is in other OSs. In Akaros, the implementation isn't done
yet, but the design is that if you say KMALLOC_WAIT (or whatever we
call it later), that your thread will block until the memory allocation
succeeds.

> > -static inline void profiler_begin_trace(struct profiler_cpu_context
> > *cpu_buf)
> > > +int profiler_configure(struct cmdbuf *cb)
> > > {
> > > - cpu_buf->tracing = 1;
> > > + if (!strcmp(cb->f[0], "prof_qlimit")) {
> > > + if (cb->nf < 2)
> > > + error(EFAIL, "prof_qlimit KB");
> > > + profiler_queue_limit = atoi(cb->f[1]) * 1024;
> >
> > Probably want a sanity check. Also, what happens when this changes
> > after the
> > queue was already allocated?
> >
>
> The configuration must be done before the profiler is started.

Ok, then we need a check to return an error for this case. Otherwise a
buggy user could change that setting on the fly.

> > > + } else if (!strcmp(cb->f[0], "prof_cpubufsz")) {
> > > + if (cb->nf < 2)
> > > + error(EFAIL, "prof_cpubufsz KB");
> > > + profiler_cpu_buffer_size = atoi(cb->f[1]) * 1024;
> >
> > Is there any danger with the user setting this to be a very small
> > value (like
> > 0)? It looks like the assumption in
> > profiler_cpu_buffer_write_reserve() is
> > that a fresh allocation (done by profiler_buffer_write()) is enough
> > for size
> > bytes.
> >
> > > + } else if (!strcmp(cb->f[0], "prof_btdepth")) {
> > > + if (cb->nf < 2)
> > > + error(EFAIL, "prof_btdepth DEPTH");
> > > + profiler_backtrace_depth = atoi(cb->f[1]);
> >
> > It is dangerous to have the user control this. It's a stack
> > allocation. Even
> > if it wasn't, we'd need a sanity check of some sort.
> >
>
> I will drop the BT depth config, and limit the others.

Great.

> > > +void profiler_cleanup(void)
> > > +{
> > > + sem_down(&profiler_mtx);
> > > + profiler_users--;
> > > + if (profiler_users == 0)
> > > + free_cpu_buffers();
> > > + sem_up(&profiler_mtx);
> > > }
> >
> > I'm still concerned about this. If the only source of profiling
> > data is from
> > the timer IRQs, then your current stuff is seems fine. (You
> > disable the timers,
> > and thus the handlers) before freeing this stuff). But if we ever
> > have any other use of this, then we'll need to be careful.
> >
>
> As I said, when I come in with the overflow-triggered sampling, this
> will need some retuning.

Sounds good. TODO. =)

> > But there's more to it than that. The assumption here is that once
> > profiler_users == 0, then there is no code, (RKM, pending IRQ,
> > etc.) that has
> > access to profiler_percpu_ctx or profiler_queue.
> >
> > Just as an example, say someone is calling profiler_notify_mmap(),
> > which checks
> > profiler_percpu_ctx. They get past the if () check, then
> > concurrently someone
> > calls profiler_cleanup and triggers free_cpu_buffers. Then the
> > original thread
> > eventually calls profiler_push_pid_mmap, then qwrite, then page
> > faults.
> >
> > So a user could theoretically trigger a PF. This is part of the
> > reason why I
> > was reluctant to have you try and free the buffers.
> >
>

> Well, we have no RCU ___

> In such case we should swap ptr with NULL, and RCU-free it.

RCU wouldn't help as much here either, since the profiler_notify_mmap()
thread could have blocked at some point. And sleeping RCU sounds like
a pain.

And I imagine there are other places than this one example where it is
dangerous to free the buffers.

So I'm not convinced that this assumption is true:

> > The assumption here is that once profiler_users == 0, then there
> > is no code, (RKM, pending IRQ, etc.) that has access to
> > profiler_percpu_ctx or profiler_queue.

One simple "fix" would be to not free the buffers, and put a TODO on
it. that would alleviate the risk of the page fault (possibly caused
by a malicious user) at the expense of a small amount of memory.

> > > @@ -728,6 +728,7 @@ static ssize_t sys_fork(env_t* e)
> > >
> > > printd("[PID %d] fork PID %d\n", e->pid, env->pid);
> > > ret = env->pid;
> > > + profiler_notify_new_process(env);
> > > proc_decref(env); /* give up the reference created
> > > in
> > proc_alloc() */
> > > return ret;
> > > }
> >
> > Do you need to update things when a process changes its binary path?
> > (exec)
> >
>
> That should lead to new mmaps.

I guess the broader question is when does the profiler need to know
things like the binary path and it's mmaps. But if you're happy with
it, then I'm ok. =)

> > > - if (unlikely(booting))
> > > + if (unlikely(!num_cores))
> > > return &boot_tpb;
> >
> > That seems a little odd. I'd think if we're still booting, we'd
> > use the boot_tpb. Was there some corner case that triggered this?
> > I understand this
> > one:
> >
> > > - if (likely(!booting))
> > > + if (likely(system_timing.tsc_freq))
> > > tsc2timespec(read_tsc(), &ts_now);
> >
>
> The booting flag is cleared really late, and I would not want stuff
> starting spinning on other cores, colliding on boot tbp.

Ah, ok, makes sense.

Barret

Kevin Klues

unread,

Nov 13, 2015, 12:20:22 PM11/13/15

to aka...@googlegroups.com

> I'm confused. You say there is a bug in that code? Where is it?

I don't think he's saying there's a bug. He's just saying that it's
harder at first glance to see that it's correct. With the strcat it's
easier to verify.

barret rhoden

unread,

Nov 13, 2015, 12:21:26 PM11/13/15

to aka...@googlegroups.com

No, I'm saying that it is harder to write and be sure that code is
correct compared to the strlcat code. It's more of a defense of using
the simpler but slower code for the sake of maintainability/auditing.

But no need to change your existing code, and I didn't spot anything
wrong with it when I looked at it the first time.

Barret

Davide Libenzi

unread,

Nov 13, 2015, 12:22:30 PM11/13/15

to Akaros

I already changed it to use Dan's suggested strlcat() loop.

Barret Rhoden

unread,

Nov 13, 2015, 2:00:41 PM11/13/15

to aka...@googlegroups.com

On 2015-11-13 at 09:22 "'Davide Libenzi' via Akaros"

<aka...@googlegroups.com> wrote:
> I already changed it to use Dan's suggested strlcat() loop.

thanks.

Davide Libenzi

unread,

Nov 13, 2015, 10:54:29 PM11/13/15

to Akaros

On Fri, Nov 13, 2015 at 9:19 AM, barret rhoden <br...@cs.berkeley.edu> wrote:

> > > */ qdropoverflow(profiler_queue, 1);
> >
> > Then later:
> >
> > > + qnonblock(profiler_queue, 1);
> >
> > This is fragile. qnonblock means "error with EAGAIN if we're full".
> > Right now, you'll get by, since you still have qdropoverflow
> > above. But this could break with an error() thrown from IRQ
> > context somewhere down the line.
> >
>
> Yes, there are paired. I am not sure which kind of bad condition can
> happen.

The bad condition is that you may unexpectedly get an error(). If you
have dropoverflow, nonblock does nothing, so do not set it. things may
change in qio in the future.

OK, removing nonblock.

Coming to error(), should I always waserror() in my code, even if I have nothing to be undone?

Because, that kinda defeats the exception-look-alike error model.

> > > void profiler_cleanup(void)

> > > {
> > > - free_cpu_buffers();
> > > + sem_down(&mtx);
> > > + profiler_users--;
> > > + if (profiler_users == 0)
> > > + free_cpu_buffers();
> > > + sem_up(&mtx);
> > > }
> >
> > This might be a candidate for a kref, depending on how cleanup ends
> > up working. No need to change it now though.
> >
>
> kref don't block until the 0->1 transitioner is initializing, no?

krefs never block, so i don't fully understand. my comment was that
cleanup seems dangerous here. this code is hand-rolling the kref
pattern, so in the end a kref might be the right tool here.

Is not hand rolling, because kref will not make the immediately following thread going to up-it, wait for the first one to do the proper initialization of the resources.

But I think a blocking lock for init, coupled with a kref, might solve the buffers free issue. See down below ...

> Locking is up to the reader.
> In a circular buffer fashion, a write on full buffer discard oldest
> buffer, and write new one.
> No, reading data does not remove data. The only way the read ptr is
> moved forward, is when the write ptr pushes it forward in order to
> fit new data. Every block in the buffer, is pre-pended with its size.
> That is the type of the size.

Great. Please put that in the source too.

circular_buffer_write already has the comments in its body, which describes the operation.
Adding some more to the skip API.

> > It's a little surprising that clear also frees the allocation,
> > instead of just
> > resetting everything back to 0.
> >
>
> This clears all the resources of the buffer.
> I used "clear" as there is an "init".
> I would have used "free" if there was an "alloc".

This was originally unclear since it was not clear how the contents of
the ring buffer get discarded. we have another ring buffer in the
kernel that takes data, then stops taking *new* data until the old
contents were 'cleared.'

either way, 'clear' doesn't sound like destroy, free, deinit, or
shutdown. all of which have a more 'final' sound to it.

Renamed clear->destroy, and had clear do the simple clear but not free the resources.

> > What are these helpers doing?
> >
>
> The overlap check is a write operation is going to overwrite the
> block at which rdptr resides.
> The skip, moves forward the rdptr until it no more overlaps with the
> incoming write.

Thanks. Please put that in a comment right above the functions.

Done.

Agreed, this is more of a "please try to avoid this in the future
comment"

This code was composed by three categories.

First, totally new code files, for which, makes little sense to add single functions in single commits, especially when you, along the way, totally refactor older ones because the new APIs requires it (been told to squash commits touching the same code, to avoid commenting on code which is no more there).

Second, totally changed files, which really, had nothing more in common with previous ones.

Third, removed a lot of files which were no more used.

Bottom line, you were looking at a multi thousand lines of code CR, whose nature, and size, little would have helped some sugar coating in terms of micro commits 😐

> As for the timers, for Ron's case, you can still turn them on aside
> from the profiler.
> For a profiler POV, I think the default behavior should be them on
> everywhere, otherwise you lose the full picture of what the system is
> doing. We could add a selective CPU timer and profiler turn on, but
> IMO the default should be all on.

One problem is I don't see how to turn on the profiler on a single
core. If you start collecting, you start the timer on all cores. I
don't see why it can't be the way it was before. "Pick the timers you
want, then turn on the accounting."

I think fix is to have multiple commands. One is the timer (as is),
one is to start just the collection, and one is the aggregate: all
timers and start the collection. This is a combination of all the
approaches. There's our existing style (timer, then collection) as
well as your desired style (just turn the whole thing on damnit).

Let's please wait how this will be shaped, once the new overflow IRQ stuff comes in.

If you can please review the MSR stuff, I am almost done with the new profiler tool which uses libpfm4 to list CPU event counters, and resolve them by name, allowing to start/stop/read/write the counters.

So the sequence is going to be:

1) this CR (code in place, CR under review)

2) MSR (code in place, CR posted)

3) New prof tool with libpfm4 (code at 75%, no CR yet)

4) New overflow based sampling (no code yet)

> > > +size_t user_backtrace_list(uintptr_t pc, uintptr_t fp, uintptr_t
> > > *pcs,
> > > + size_t nr_slots)
> > > +{
> > > + size_t nr_pcs = 0;
> > > +
> > > + for (;;) {
> > > + if (unlikely(fp >= UMAPTOP) || unlikely(fp <
> > > BRK_END) ||
> >
> > Why not is_user_raddr()? And once you use copy_from, those checks
> > aren't needed
> > either. Also, I think it is possible for frame pointers to be below
> > BRK_END,
> > which could happen if a user stack was malloc'd instead of mmapped.
> >
> > > + unlikely(nr_pcs >= nr_slots))
> > > + break;
> > > +
> > > + /* For now straight memory access, waiting for
> > copy_from_user(). */
> >
> > copy_from is now available. =)
> >
>
> I will do that is a separate CL.

The beauty of Git is that you can make that small change (just use
copy_from and remove the UMAPTOP and BRK_END checks) as a "WIP-commit",
then do a rebase -i and meld that fixup into this existing commit.

Yes, but this branch became older than Matusalem, this is why I wanted to defer.

Defer for me, is not a month ahead, is like 2 days (or even before, if this gets merged).

Anyway, doing it in this CR for this.

I think it is better, because this might be called from printk(), from places where we can't sleep, not to KMALLOC_WAIT.

Considering also the fact that this is called *very* early in the boot process (due to linkage to printk), so if we do not have memory at that point, we can pretty much panic() there.

> > > +#define trace_printx(args...) if (printx_on) trace_printk(TRUE,
> > > args)
> > ^FALSE?
> > We just want to print text, not a backtrace. That was also my
> > intent for trace_printk (it's just the print, not a backtrace).
> >
>
> Seems pretty useful to have the option to get a BT, to understand from
> which path you are coming from.
> A "grep" of the stirng simply tells where the print is.

I like the option for the backtrace. I'm just saying that i'd like a
simple trace_printk/printx that doesn't trigger the backtrace. we
already have TRACEME for the backtrace trace_printk.

Anyway, if you don't want to fix it, that's fine. If I get annoyed at
a long backtrace in the future when trace_printxing I'll change it.

trace_printx() like the name suggests, traces BT.

printx() call printk(), which does not trace.

> > This probably should be in a library or header (so we can reuse it)
> > and with a
> > little explanation. It looks like we take 7 bits of n at a time
> > and push it
> > into *data, with bit 0x80 signifying we're done or not.
> >
>
> Yep, that's what t does.

Cool, please note that in the code. I have a bad memory. =)

Done.

> I wasn't sure if variable byte encoding would be something a kernel
> would be doing elsewhere.

I wasn't sure either. It sounds pretty cool. We can keep it here for
now, or move it elsewhere if you'd like.

Yeah, when we have a search engine in the kernel, we can pull it out for the posting lists 😀

As far as the age of this branch goes, it's a huge change that takes a
while to get through. Maybe I'm just slow, but I've spent 2 days on
this and counting.

Well, they say on AVG it takes 1/8 of the time taken to write the code, to review it.

So do your math ☺

Ah, I see. In the future, you know you were given a block and you want
to return that block in a ready-to-go manner.

This is another case where we can make the function you need and add it
to the block code. That way you also don't need to worry about
checking the return value of this function (since iallocb can fail, but
if you return the block you got originally, that can't fail).

Not to harp on this too much, but I would rather have had that block
code done as a support patch before this commit, and then used here,
than doing it the way it is. That also would have been clearer to me,
would have improved the kernel, and would have saved some time. =)

But feel free to leave it as is now, and we can address that with a
future TODO.

IMO it is a simple performance optimization, which can be done as follow up (together with the other drop tail if overflow change).

Blocks are written at the rate of an handful per second.

> > + qdropoverflow(profiler_queue, 1);
> > > + qnonblock(profiler_queue, 1);
> >
> > As mentioned in a previous patch, you might not want qnonblock.
> >
>
> Not sure why not?

Because they do similar things in different ways.

Removed nonblock.

> > > + profiler_percpu_ctx =
> > > + kzmalloc(sizeof(*profiler_percpu_ctx) * num_cores,
> > KMALLOC_WAIT);
> > > + if (!profiler_percpu_ctx)
> > > + error(-ENOMEM, NULL);
> >
> > Since you did a KMALLOC_WAIT, you will never
> > have !profiler_percpu_ctx
>
> I am not sure waiting is a guarantee that you will get memory anyway.

I think it is in other OSs. In Akaros, the implementation isn't done
yet, but the design is that if you say KMALLOC_WAIT (or whatever we
call it later), that your thread will block until the memory allocation
succeeds.

Removed the test for NULL.

IMO though, it is a bad idea to remove checks, as now you are bound to a design.

> > -static inline void profiler_begin_trace(struct profiler_cpu_context
> > *cpu_buf)
> > > +int profiler_configure(struct cmdbuf *cb)
> > > {
> > > - cpu_buf->tracing = 1;
> > > + if (!strcmp(cb->f[0], "prof_qlimit")) {
> > > + if (cb->nf < 2)
> > > + error(EFAIL, "prof_qlimit KB");
> > > + profiler_queue_limit = atoi(cb->f[1]) * 1024;
> >
> > Probably want a sanity check. Also, what happens when this changes
> > after the
> > queue was already allocated?
> >
>
> The configuration must be done before the profiler is started.

Ok, then we need a check to return an error for this case. Otherwise a
buggy user could change that setting on the fly.

OK. I fail on queue limit, if running, while the percpu buffer size can be changed on the fly.

> > But there's more to it than that. The assumption here is that once
> > profiler_users == 0, then there is no code, (RKM, pending IRQ,
> > etc.) that has
> > access to profiler_percpu_ctx or profiler_queue.
> >
> > Just as an example, say someone is calling profiler_notify_mmap(),
> > which checks
> > profiler_percpu_ctx. They get past the if () check, then
> > concurrently someone
> > calls profiler_cleanup and triggers free_cpu_buffers. Then the
> > original thread
> > eventually calls profiler_push_pid_mmap, then qwrite, then page
> > faults.
> >
> > So a user could theoretically trigger a PF. This is part of the
> > reason why I
> > was reluctant to have you try and free the buffers.
> >
>
> Well, we have no RCU ___
> In such case we should swap ptr with NULL, and RCU-free it.

RCU wouldn't help as much here either, since the profiler_notify_mmap()
thread could have blocked at some point. And sleeping RCU sounds like
a pain.

And I imagine there are other places than this one example where it is
dangerous to free the buffers.

So I'm not convinced that this assumption is true:

Sleeping RCU is not a problem. We did it in userspace (arguably more difficult than kernel) in Cassini, and the whole thing was less 1000 lines of code 😉

> > The assumption here is that once profiler_users == 0, then there
> > is no code, (RKM, pending IRQ, etc.) that has access to
> > profiler_percpu_ctx or profiler_queue.

One simple "fix" would be to not free the buffers, and put a TODO on
it. that would alleviate the risk of the page fault (possibly caused
by a malicious user) at the expense of a small amount of memory.

Here we have two paths.

The one used to write normal traces, that do not block (better be, being called from timer IRQ).

The one coming from mmap and process announcements, which ATM call qwrite(), which might block

I changed those to qiwrite(), as well the dropping the KMALLOC_WAIT in the functions that post mmap and process announcements.

This better be, as enum vmrs runs with a spinlock held.

Using kref coupled with qlock, we should be able to get the necessary semantics.

I added two commits (WIP:PROFILER and WIP:KPROF) at the end of the stream, which did not squash in the proper place yet.

Take a look. I really do not like the idea of having resources left allocated when the functionalities are not used.

Davide Libenzi

unread,

Nov 14, 2015, 9:06:47 AM11/14/15

to Akaros

Here a link to the top of the branch where the two WIP are:

https://github.com/dlibenzi/akaros/commits/profiler_dev_v2

Davide Libenzi

unread,

Nov 14, 2015, 9:32:19 AM11/14/15

to Akaros

Also, I am planning the new perf tool which is is the devarch_msr_cont branch, to also have options to activate and deactivate the profiler, hiding the /prof/ VFS internals.

Like:

$ prof --start-profiler ..other_options..

$ prof --stop-profiler ..other_options..

...

Eventually I will move the vast part of the kprof2perf file format converter tool, in a library, that both the new prof tool, and kprof2perf can use.

Barret Rhoden

unread,

Nov 17, 2015, 6:20:13 PM11/17/15

to Akaros

Hi -

On 2015-11-13 at 19:54 "'Davide Libenzi' via Akaros"

<aka...@googlegroups.com> wrote:
> Coming to error(), should I always waserror() in my code, even if I
> have nothing to be undone?
> Because, that kinda defeats the exception-look-alike error model.

I think no. You only need a waserror if you need to cleanup if a
function you call throws. If you have nothing to cleanup or if you
know that you don't call something that might throw, then you don't
need a waserror block.

> circular_buffer_write already has the comments in its body, which
> describes the operation.
> Adding some more to the skip API.

What I was getting at is that if someone looks at the code for
circular_buffer.h or circular_buffer.c, they won't know what it does
without looking into the code. This makes it more difficult to use.

Compare that to kern/include/trace.h, which has a bit of an explanation
and tells people how to use it.

> I added two commits (WIP:PROFILER and WIP:KPROF) at the end of the
> stream, which did not squash in the proper place yet.
> Take a look. I really do not like the idea of having resources left
> allocated when the functionalities are not used.

It seems okay. I understand the desire to free unused resources, but
the tradeoff is complexity and possible bugs. I don't know that a
small amount of RAM is worth that. But we'll see how this works out. =)

Anyway, once you squash those last two commits or do any other stuff,
let me know and I'll merge your latest.

Barret

Davide Libenzi

unread,

Nov 17, 2015, 7:53:46 PM11/17/15

to Akaros

Commits squashed.

Barret

Davide Libenzi

unread,

Nov 18, 2015, 7:57:06 AM11/18/15

to Akaros

Also added comment section to the head of the circular buffer header file.

Barret Rhoden

unread,

Nov 18, 2015, 1:28:27 PM11/18/15

to aka...@googlegroups.com

Thanks, I'll merge this all as is, to master at df0c3f4..bb3e85a (from,
to]

You can see the entire diff with 'git diff' or at
https://github.com/brho/akaros/compare/df0c3f4...bb3e85a

Can you fix up a couple more things in later commits? The main problem
I'm having is with the support tools, snc and kprof2perf.

- there is no output from the build or install process from these.
they silently build and (for snc) install to KFS.
- I can't cd into their directories and run make. For things like
tools/apps/busybox, you can cd into the directory and build it
manually.

Also, snc and kprof2perf leave blobs around that are tracked by git.
I fixed this myself with the following patch:

commit f622088961857a3153b8a3485d27e77dd3a77c4a
Author: Barret Rhoden <br...@cs.berkeley.edu>
Date: Wed Nov 18 10:17:42 2015 -0800

Ignore build output for snc and kprof2perf in git

Any byproducts of building must be ignored by git.

Signed-off-by: Barret Rhoden <br...@cs.berkeley.edu>

diff --git a/tools/apps/snc/.gitignore b/tools/apps/snc/.gitignore
new file mode 100644
index 0000000..29cf51c
--- /dev/null
+++ b/tools/apps/snc/.gitignore
@@ -0,0 +1 @@
+snc
diff --git a/tools/profile/kprof2perf/.gitignore b/tools/profile/kprof2perf/.gitignore
new file mode 100644
index 0000000..eaa7b7f
--- /dev/null
+++ b/tools/profile/kprof2perf/.gitignore
@@ -0,0 +1 @@
+kprof2perf-*

which is now merged to master as well.

On 2015-11-18 at 04:57 "'Davide Libenzi' via Akaros"

Reply all

Reply to author

Forward