DTrace for FreeBSD - Status Update

John Birrell

May 25, 2006, 2:55:10 AM

It's nearly 8 weeks since I started porting DTrace to FreeBSD and I
thought I would post a status update including today's significant
emotional event. 8-)

For those who don't know what DTrace is or which company designed it,
here are a few links:

The BigAdmin: <http://www.sun.com/bigadmin/content/dtrace/>
A Blurb: <http://www.sun.com/2004-0518/feature/index.html>
The Guide: <http://docs.sun.com/app/docs/doc/817-6223>
My FreeBSD Project Page: <http://people.freebsd.org/~jb/dtrace/index.html>

Much of the basic DTrace infrastructure is in place now. Of the 1039
DTrace tests that Sun runs on Solaris, 793 now pass on FreeBSD.

We've got the following providers:

- dtrace
- profile
- syscall
- sdt
- fbt

As of today, loading those providers on a GENERIC kernel gives 32,519 probes.

Today's significant emotional event added over 30,000 of those, thanks
to the Function Boundary Tracing (fbt) provider. It instruments the
entry to and return from every (non-leaf) function in the kernel and in
(non-DTrace provider) modules.

Here is an example of what fbt can do. The following script creates
a probe on the entry to the kernel malloc() function. It casts the
second argument to a pointer to the malloc_type structure, then quantizes
the sizes of the mallocs being made, keyed by malloc type name.

The script:

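fbt:kernel:malloc:entry
{
	/* arg1 is malloc()'s second argument: the malloc type descriptor. */
	mt = (struct malloc_type *) arg1;
	/* Key by the type's short name; bucket the request size (arg0). */
	@[stringof(mt->ks_shortdesc)] = quantize(arg0);
}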


The output:


vmem
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 56
8 | 0

ufs_dirhash
value ------------- Distribution ------------- count
4 | 0
8 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6
16 | 0
32 | 0
64 | 0
128 | 0
256 |@@@@@@@@@@@@@ 3
512 | 0

UMAHash
value ------------- Distribution ------------- count
512 | 0
1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
2048 | 0

vnodemarker
value ------------- Distribution ------------- count
128 | 0
256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 6
512 | 0

Unitno
value ------------- Distribution ------------- count
8 | 0
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 130
32 | 0

sysctl
value ------------- Distribution ------------- count
4 | 0
8 |@@@@@@@@@@@@@@@@@@ 77
16 |@@@@@@@@@@@@@@@@@@@@@@ 95
32 | 0

DEVFS3
value ------------- Distribution ------------- count
32 | 0
64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 56
128 | 0

plimit
value ------------- Distribution ------------- count
64 | 0
128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 28
256 | 0

proc-args
value ------------- Distribution ------------- count
16 | 0
32 |@@@@@@@@@@@@@@@@@@@@@@ 48
64 |@@@@@@@@@@@@@@@@@@ 38
128 | 0

zombie
value ------------- Distribution ------------- count
32 | 0
64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 86
128 | 0

kmem
value ------------- Distribution ------------- count
16 | 0
32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 24
64 | 0
128 | 0
256 | 0
512 |@@@@@@@@@@@@ 10
1024 | 0

sysctltmp
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@ 28
8 |@@@@@@@@@@ 56
16 |@@@@@@@@@@ 56
32 |@@@@@@@@@@ 56
64 | 0
128 |@@@@@ 28
256 | 0

filedesc
value ------------- Distribution ------------- count
64 | 0
128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 86
256 | 0

nfsclient_req
value ------------- Distribution ------------- count
32 | 0
64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 213
128 | 0

DEVFS1
value ------------- Distribution ------------- count
64 | 0
128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 112
256 | 0

ioctlops
value ------------- Distribution ------------- count
2 | 0
4 |@@@@@@@@@@@@@@@@@@@@ 573
8 |@ 30
16 |@@ 60
32 |@@@@@@@@@ 264
64 |@@ 60
128 | 0
256 |@@@@@@ 175
512 | 0

soname
value ------------- Distribution ------------- count
8 | 0
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 8991
32 | 0

subproc
value ------------- Distribution ------------- count
1024 | 0
2048 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 86
4096 | 0

cred
value ------------- Distribution ------------- count
32 | 0
64 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10403
128 | 0

nfsserver_srvdesc
value ------------- Distribution ------------- count
4 | 0
8 |@@@@@@@@@@@@@@@@@@@@ 8991
16 | 0
32 | 0
64 | 0
128 |@@@@@@@@@@@@@@@@@@@@ 8991
256 | 0

temp
value ------------- Distribution ------------- count
4 | 0
8 |@@@@@@@@@@@@@ 935
16 |@@ 151
32 |@@@ 184
64 |@ 66
128 |@ 97
256 | 30
512 | 22
1024 | 13
2048 | 4
4096 | 28
8192 |@@@@@@@@@@@@@@@@@@@ 1359
16384 | 0

dtrace
value ------------- Distribution ------------- count
0 | 0
1 |@ 23
2 | 19
4 |@@@ 118
8 |@@@@@ 182
16 |@@@@@ 211
32 |@@@@@@@@@@@@@@@@@ 689
64 |@ 31
128 |@ 29
256 |@@ 99
512 |@ 24
1024 |@@@ 135
2048 | 5
4096 | 0
8192 | 0
16384 | 0
32768 | 0
65536 | 0
131072 | 0
262144 | 0
524288 | 0
1048576 | 10
2097152 | 0
4194304 |@ 20
8388608 | 0
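
For readers new to quantize(): each row of the output above is a power-of-two bucket, and the "value" column is the bucket's lower bound, so the "64" row counts requests of 64 to 127 bytes. The following is only a minimal sketch of that bucketing rule for non-negative sizes. The helper name quantize_bucket is made up for illustration and is not part of DTrace.

```c
#include <stdint.h>

/*
 * Return the lower bound of the power-of-two bucket that holds `value`:
 * 0 for 0, otherwise the largest power of two that is <= value.
 * Hypothetical helper for illustration only; not DTrace code.
 */
uint64_t
quantize_bucket(uint64_t value)
{
	uint64_t bucket = 1;

	if (value == 0)
		return (0);
	/* Keep doubling while the doubled bound would still be <= value. */
	while (bucket <= value / 2)
		bucket *= 2;
	return (bucket);
}
```

With this rule a 100-byte malloc() lands in the "64" row and an 8192-byte one in the "8192" row.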

There is still a lot of work to do, and while that goes on the code has
to remain on the FreeBSD Perforce server. It isn't ready to be merged
into CVS-current yet.

I have asked the perforce-admins to mirror the project out to CVS (via
cvsup10.freebsd.org), but I'm not sure what the hold-up there is.

I had hoped that one or two of the Google SoC students would contribute
to this, but I only received one proposal and that wasn't for anything
that would help get DTrace/FreeBSD completed.

There are things people can do to help. Some of them are build related;
some are build tool related; some are user-land DTrace specific; and the
rest are kernel related. Speak up if you are interested in working on
this!

--
John Birrell
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Peter Jeremy

May 25, 2006, 4:26:33 AM

On Thu, 2006-May-25 06:55:10 +0000, John Birrell wrote:
>My FreeBSD Project Page: <http://people.freebsd.org/~jb/dtrace/index.html>

Your "What processes fork during a buildworld and how many times?" output
doesn't look right: make should do lots of forking.

>There are things people can do to help. Some of them are build related;
>some are build tool related; some are user-land DTrace specific; and the
>rest are kernel related. Speak up if you are interested in working on
>this!

I'd like to help but not sure if I have the time. Do you have more
detail on what is needed?

BTW, how much of DTrace is MD and what CPU architectures are supported?

--
Peter Jeremy

Andrew Gallatin

May 25, 2006, 1:31:02 PM

John Birrell writes:
> We've got the following providers:
>
> - dtrace
> - profile
> - syscall
> - sdt
> - fbt
>
> As of today, loading those providers on a GENERIC kernel gives 32,519 probes.

Awesome! As somebody who does a lot of driver development on
Solaris, I feel naked without dtrace. I'm very glad it is coming
to FreeBSD.

Do you plan to also port lockstat? That could be very useful in
the ongoing SMPng'ification of the kernel.

Drew



Julian Elischer

May 25, 2006, 2:33:31 PM

Peter Jeremy wrote:

>On Thu, 2006-May-25 06:55:10 +0000, John Birrell wrote:
>
>
>>My FreeBSD Project Page: <http://people.freebsd.org/~jb/dtrace/index.html>
>>
>>
>
>Your "What processes fork during a buildworld and how many times?" output
>doesn't look right: make should do lots of forking.
>
>

make probably does 'vfork()'

>
>
>>There are things people can do to help. Some of them are build related;
>>some are build tool related; some are user-land DTrace specific; and the
>>rest are kernel related. Speak up if you are interested in working on
>>this!
>>
>>
>
>I'd like to help but not sure if I have the time. Do you have more
>detail on what is needed?
>
>BTW, how much of DTrace is MD and what CPU architectures are supported?
>
>
>

Kip Macy

May 25, 2006, 3:01:33 PM

On 5/25/06, Julian Elischer <jul...@elischer.org> wrote:
> Peter Jeremy wrote:
>
> >On Thu, 2006-May-25 06:55:10 +0000, John Birrell wrote:
> >
> >
> >>My FreeBSD Project Page: <http://people.freebsd.org/~jb/dtrace/index.html>
> >>
> >>
> >
> >Your "What processes fork during a buildworld and how many times?" output
> >doesn't look right: make should do lots of forking.
> >
> >
>
> make probably does 'vfork()'

Correct. Make uses vfork().

-Kip

John Birrell

May 25, 2006, 3:29:42 PM

On Thu, May 25, 2006 at 01:31:02PM -0400, Andrew Gallatin wrote:
> Do you plan to also port lockstat? That could be very useful in
> the ongoing SMPng'ification of the kernel.

Yes.

I'm doing things in the order advised by Sun. lockstat is one of the next
things on their list.

--
John Birrell

John Birrell

May 25, 2006, 3:53:46 PM

On Thu, May 25, 2006 at 06:26:33PM +1000, Peter Jeremy wrote:
> I'd like to help but not sure if I have the time. Do you have more
> detail on what is needed?

Most of the stuff is listed in my to-do list.

On the build side, the CTF tools are threaded and I am building them as
bootstrap tools because they are required to add the CTF data to objects
and programs during a buildworld. Sun's code requires a couple of extra
non-standard functions in libpthread. I've added those, but doing so
creates an upgrade issue which could be solved either by building
libpthread in the bootstrap phase of buildworld (yuk) or by creating stubs
with weak symbols in the CTF library to resolve the missing functions
during the upgrade.
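
The weak-symbol option could look roughly like the sketch below. The function name and signature are invented for illustration, since the post does not name the extra libpthread functions. The idea is that the CTF library carries a weak fallback, which a strong definition in a newer libpthread overrides at link time.

```c
/*
 * Hypothetical stand-in for one of the extra threading functions the CTF
 * tools expect. The name and signature are assumptions, not Sun's API.
 * Because the definition is weak, a strong definition of the same symbol
 * elsewhere (for example in an updated libpthread) takes precedence.
 */
__attribute__((weak)) int
ctf_hypothetical_thread_hint(int nthreads)
{
	(void)nthreads;	/* The stub ignores the request. */
	return (0);	/* Report success so callers carry on. */
}
```

During an upgrade from a system whose libpthread lacks the function, the tools would link against the stub and still run, just without whatever the real function provides.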

The CTF tools themselves complain about some things in buildworld which
result in things not being added to the CTF data. This is pretty simple
to debug, but it is time consuming.

> BTW, how much of DTrace is MD and what CPU architectures are supported?

Most of DTrace is machine independent. Sun's code supports Sparc, i386
and amd64, so it's 64-bit clean and supports both endians.

Internally DTrace uses what is known as DTrace Intermediate Format (DIF)
which is an interpreted instruction set based on RISC-like instructions.
All that code is machine independent and thus easy to get working on
other architectures.
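
To give a feel for why an interpreted, RISC-like instruction set ports easily, here is a toy register machine in plain C. It is only a sketch: the opcodes, the instruction layout and the names (toy_insn, toy_run, toy_demo) are all made up and do not reflect the real DIF encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy opcodes; these are NOT the real DIF instruction set. */
enum toy_op { TOY_SETI, TOY_ADD, TOY_MUL, TOY_RET };

struct toy_insn {
	enum toy_op op;
	uint8_t rd;	/* destination register (or the register to return) */
	uint8_t r1;	/* first source register */
	uint8_t r2;	/* second source register */
	uint64_t imm;	/* immediate value, used by TOY_SETI only */
};

#define TOY_NREGS 8

/*
 * Interpret `n` instructions. Returns the register named by TOY_RET, or 0
 * if the program is malformed or ends without a TOY_RET.
 */
uint64_t
toy_run(const struct toy_insn *prog, size_t n)
{
	uint64_t regs[TOY_NREGS] = { 0 };
	size_t pc;

	for (pc = 0; pc < n; pc++) {
		const struct toy_insn *i = &prog[pc];

		/* Reject out-of-range registers instead of trusting the program. */
		if (i->rd >= TOY_NREGS || i->r1 >= TOY_NREGS ||
		    i->r2 >= TOY_NREGS)
			return (0);

		switch (i->op) {
		case TOY_SETI:
			regs[i->rd] = i->imm;
			break;
		case TOY_ADD:
			regs[i->rd] = regs[i->r1] + regs[i->r2];
			break;
		case TOY_MUL:
			regs[i->rd] = regs[i->r1] * regs[i->r2];
			break;
		case TOY_RET:
			return (regs[i->rd]);
		default:
			return (0);
		}
	}
	return (0);
}

/* Build and run a four-instruction program that computes 6 * 7. */
uint64_t
toy_demo(void)
{
	static const struct toy_insn prog[] = {
		{ TOY_SETI, 1, 0, 0, 6 },	/* r1 = 6 */
		{ TOY_SETI, 2, 0, 0, 7 },	/* r2 = 7 */
		{ TOY_MUL,  0, 1, 2, 0 },	/* r0 = r1 * r2 */
		{ TOY_RET,  0, 0, 0, 0 },	/* return r0 */
	};

	return (toy_run(prog, sizeof(prog) / sizeof(prog[0])));
}
```

Nothing in the loop depends on the host CPU, which is the property that makes this style of interpreter straightforward to bring up on a new architecture.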

The place where DTrace is really, really machine dependent is in the trap
handling code. DTrace has what it calls 'safe' loads where it goes to
read from a memory address with a flag set to stop a panic if a trap
occurs during the memory access. Also, the Function Boundary Tracing
(fbt) provider hooks itself into the code at runtime by replacing certain
instructions with an invalid opcode so that it deliberately causes a
trap which it then catches to do its magic. At least that's how it works
on i386/amd64. On Sparc it's done differently -- I haven't looked into
that yet because I don't have access to a Sparc machine (and to port
code that affects trap handling, I really need the machine next to me
so that I can crash it frequently 8->).

--
John Birrell

g...@freebsd.org

May 26, 2006, 4:54:43 AM

Excellent!

But, I have a naive question. Should this be integrated in some way
with the hwpmc work of Joseph Koshy, since I think it could be useful
for DTrace to get information from the CPU as well?

Is there a quick "How to use DTrace" anywhere? I'd like to use it in
my networking stuff when I get to the point of caring more about
performance than correctness.

Thanks,
George

Joseph Koshy

May 26, 2006, 11:18:21 AM

> But, I have a naive question. Should this be integrated in
> some way with the hwpmc work of Joseph Koshy since I think it
> could be useful for DTrace to get information from the CPU as
> well.

Disclaimer: I've only just started reading about DTrace.

There appear to be two ways to integrate hwpmc and DTrace:

- Augment the D virtual machine with a primitive that can
  read PMC values (e.g., using RDPMC or RDMSR instructions
  on x86 CPUs). Make this primitive available
  to scripts for allocating and reading from PMCs (say
  a "pmcread()" builtin function).

  This approach would work well with counting mode PMCs
  (both process and system-mode counting PMCs) and would
  allow PMCs to be read at arbitrary points of time.

  We'll need a way of allocating system-wide & process-mode
  PMCs; this could be done in userland (in dtrace(8)).

- hwpmc(4) can be augmented to be a 'DTrace provider'
  allowing D scripts to be run, say when a PC sample is
  recorded.

--
FreeBSD Developer, http://people.freebsd.org/~jkoshy

Ivan Voras

May 26, 2006, 11:23:07 AM

John Birrell wrote:
> It's nearly 8 weeks since I started porting DTrace to FreeBSD and I
> thought I would post a status update including today's significant
> emotional event. 8-)

Extremely nice, thank you!

It would be a nice early demo to do "Quantize the size of reads during a
buildworld and beyond" on MySQL & supersmack :)

Joseph Koshy

May 26, 2006, 11:31:12 AM

> The place where DTrace is really, really machine dependent is
> in the trap handling code. DTrace has what it calls 'safe'
> loads where it goes to read from a memory address with a
> flag set to stop a panic if a trap occurs during the
> memory access.

Is there any way we can do some code refactoring when
DTrace is brought in?

For example, DTrace has a 'stack()' primitive that walks
the kernel stack and a 'ustack()' primitive that walks
userland stacks.

Both of these are useful for hwpmc, and are useful in
other contexts (e.g., recording stack traces for userland
processes that dump core).

Similarly, alq(9), ktrace(2) and hwpmc(4) all implement
kernel->userland logging in some form or the other.
DTrace's logging requirements are probably a superset of
all of these so having a common logging layer could help
reduce code bloat in the kernel.

--
FreeBSD Developer, http://people.freebsd.org/~jkoshy

Gavin Atkinson

May 26, 2006, 11:38:04 AM

On Fri, 2006-05-26 at 17:54 +0900, g...@freebsd.org wrote:
> Excellent!
>
> But, I have a naive question. Should this be integrated in some way
> with the hwpmc work of Joseph Koshy since I think it could be useful
> for DTrace to get information from the CPU as well.
>
> Is there a quick "How to use DTrace" anywhere? I'd like to use it in
> my networking stuff when I get to the point of caring more about
> performance than correctness.

This is the best resource I have found for using DTrace under Solaris:

http://users.tpg.com.au/adsln4yb/dtrace.html

It is great to see DTrace coming to FreeBSD. I've recently found myself
developing code on Solaris that I have no intention of actually using
under Solaris, just because DTrace is available.

Gavin

John Birrell

May 26, 2006, 3:50:18 PM

On Fri, May 26, 2006 at 09:01:12PM +0530, Joseph Koshy wrote:
> >The place where DTrace is really, really machine dependent is
> >in the trap handling code. DTrace has what it calls 'safe'
> >loads where it goes to read from a memory address with a
> >flag set to stop a panic if a trap occurs during the
> >memory access.
>
> Is there any way we can do some code refactoring when
> DTrace is brought in?
>
> For example, DTrace has a 'stack()' primitive that walks
> the kernel stack and a 'ustack()' primitive that walks
> userland stacks.
>
> Both of these are useful for hwpmc, and are useful in
> other contexts (e.g., recording stack traces for userland
> processes that dump core).
>
> Similarly, alq(9), ktrace(2) and hwpmc(4) all implement
> kernel->userland logging in some form or the other.
> DTrace's logging requirements are probably a superset of
> all of these so having a common logging layer could help
> reduce code bloat in the kernel.

The problem with doing this is that DTrace is licensed under Sun's
CDDL. There is a software re-distribution requirement of the
CDDL, but it isn't viral and it only affects the files that Sun
provides.

There are actually a handful of files in OpenSolaris that
originated in FreeBSD. One of those is sys/i386/i386/exception.s
which is where the FBT/SDT invalid opcode hooks are.

I've added a KDTRACE kernel option which compiles in the hooks that
DTrace uses and a bit of extra exception handling code. This
is only minimal bloat.

Apart from those hooks, the DTrace kernel functionality is in the
'dtrace' device which is also the 'dtrace' provider. The other
providers register themselves with the 'dtrace' device.

If the DTrace device modules aren't loaded, there is very little
code in the kernel to share.

--
John Birrell

John Birrell

May 26, 2006, 4:03:22 PM

On Fri, May 26, 2006 at 08:48:21PM +0530, Joseph Koshy wrote:
> >But, I have a naive question. Should this be integrated in
> >some way with the hwpmc work of Joseph Koshy since I think it
> >could be useful for DTrace to get information from the CPU as
> >well.
>
> Disclaimer: I've only just started reading about DTrace.
>
> There appear to be two ways to integrate hwpmc and DTrace:
>
> - Augment the D virtual machine with a primitive that can
> read PMC values (e.g., using RDPMC or RDMSR instructions
> on x86 CPUs). Make this primitive available
> to scripts for allocating and reading from PMCs (say
> a "pmcread()" builtin function).
>
> This approach would work well with counting mode PMCs
> (both process and system-mode counting PMCs) and would
> allow PMCs to be read at arbitrary points of time.
>
> We'll need a way of allocating system-wide & process-mode
> PMCs; this could be done in userland (in dtrace(8)).
>
> - hwpmc(4) can be augmented to be a 'DTrace provider'
> allowing D scripts to be run, say when a PC sample is
> recorded.

Before modifying the virtual machine in FreeBSD's DTrace port, it would
be best to discuss this in the "DTrace Community"
<http://www.opensolaris.org/jive/forum.jspa?forumID=7>

The Sun guys will guide us on which way they think is appropriate.

--
John Birrell

John Birrell

May 26, 2006, 4:09:27 PM

On Fri, May 26, 2006 at 05:54:43PM +0900, g...@FreeBSD.org wrote:
> Is there a quick "How to use DTrace" anywhere? I'd like to use it in
> my networking stuff when I get to the point of caring more about
> performance than correctness.

The DTrace guide <http://docs.sun.com/app/docs/doc/817-6223> is a fairly
large document. The first 10 chapters are important reading.

Other than that there are plenty of examples like:
<http://www.solarisinternals.com/si/dtrace/index.php>

--
John Birrell

Daniel O'Connor

May 27, 2006, 6:18:32 AM

On Saturday 27 May 2006 05:20, John Birrell wrote:
> I've added a KDTRACE kernel option which compiles in the hooks that
> DTrace uses and a bit of extra exception handling code. This
> is only minimal bloat.
>
> Apart from those hooks, the DTrace kernel functionality is in the
> 'dtrace' device which is also the 'dtrace' provider. The other
> providers register themselves with the 'dtrace' device.
>
> If the DTrace device modules aren't loaded, there is very little
> code in the kernel to share.

Factoring out the common code into a separate module that dtrace/ktrace/etc
can depend on would be a good approach IMO.

Having just a single source copy and compiling it N times would be even
better, though (modulo licensing concerns - perhaps the hwpmc version could be used?)

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

Robert Watson

May 30, 2006, 4:52:22 AM

On Thu, 25 May 2006, John Birrell wrote:

> I had hoped that one or two of the Google SoC students would contribute to
> this, but I only received one proposal and that wasn't for anything that
> would help get DTrace/FreeBSD completed.
>
> There are things people can do to help. Some of them are build related; some
> are build tool related; some are user-land DTrace specific; and the rest are
> kernel related. Speak up if you are interested in working on this!

So, I sync'd up the dtrace branch on my new test box, and pretty rapidly ran
into problems:

cc -o gethost
-L/usr/obj/usr/home/robert/p4/projects/dtrace/src/tmp/legacy/usr/lib -O2
-fno-strict-aliasing -pipe -I.
-I/usr/home/robert/p4/projects/dtrace/src/bin/csh
-I/usr/home/robert/p4/projects/dtrace/src/bin/csh/../../contrib/tcsh
-D_PATH_TCSHELL='"/bin/csh"' -DHAVE_ICONV -g
-I/usr/obj/usr/home/robert/p4/projects/dtrace/src/tmp/legacy/usr/include
/usr/home/robert/p4/projects/dtrace/src/bin/csh/../../contrib/tcsh/gethost.c
===> bin/sh (obj,build-tools)
cc -O2 -fno-strict-aliasing -pipe -DSHELL -I.
-I/usr/home/robert/p4/projects/dtrace/src/bin/sh -g
-I/usr/obj/usr/home/robert/p4/projects/dtrace/src/tmp/legacy/usr/include -c
/usr/home/robert/p4/projects/dtrace/src/bin/sh/mkinit.c
ctfconvert -L VERSION mkinit.o
ctfconvert:No such file or directory
*** Error code 1

Stop in /usr/home/robert/p4/projects/dtrace/src/bin/sh.
*** Error code 1

Stop in /usr/home/robert/p4/projects/dtrace/src.
*** Error code 1

Stop in /usr/home/robert/p4/projects/dtrace/src.
*** Error code 1

Stop in /usr/home/robert/p4/projects/dtrace/src.

Sounds like ctfconvert needs to become a build tool?

Robert N M Watson



Peter Jeremy

Jun 18, 2006, 5:58:15 AM

On Thu, 2006-May-25 19:53:46 +0000, John Birrell wrote:
>On the build side, the CTF tools are threaded and I am building them as
>bootstrap tools because they are required to add the CTF data to objects
>and programs during a buildworld.

ctfmerge uses an undefined function getpagesizes(). This appears to
be hidden if you don't have CFLAGS=-O because the function calling
getpagesizes() - bigheap() - is itself static and only referenced via
"#pragma init(bigheap)" which doesn't appear to be supported by gcc.

bigheap()'s only purpose appears to be to align the heap to the largest
page size. I'm not sure why this is being done, or what the side-effect
of failing to do so would be.
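
For reference, the usual GCC spelling of a run-before-main() initializer is the constructor function attribute. The sketch below only shows that mechanism. The body is a stand-in that records the call (the real bigheap() adjusts the heap), and bigheap_was_called() is a made-up helper for checking it. Whether this is the right fix for ctfmerge is a separate question.

```c
static int bigheap_ran;	/* set once the initializer has executed */

/*
 * GCC runs functions marked with the constructor attribute before main().
 * The attribute also keeps this otherwise unreferenced static function
 * from being discarded.
 */
static void bigheap(void) __attribute__((constructor));

static void
bigheap(void)
{
	/* Stand-in body: the real function aligns the heap. */
	bigheap_ran = 1;
}

/* Hypothetical helper: report whether the initializer ran. */
int
bigheap_was_called(void)
{
	return (bigheap_ran);
}
```

Calling bigheap_was_called() from main() should return 1, because the initializer has already run by then.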

--
Peter Jeremy
