
contiguous memory allocation problem


Hans Petter Selasky

Jun 30, 2006, 2:31:46 PM
Hi,

I sometimes see that the USB driver is unable to allocate contiguous memory
for itself. For example, I noticed that FreeBSD was unable to allocate
350 kbytes of contiguous memory after I had run "konqueror", the KDE web
browser, and various other memory-consuming applications for a while.

I am thinking about pre-allocating some memory for USB, but isn't that the job
of bus-dma, which the USB system uses for memory allocation?

The machine in question is running FreeBSD 7-current from April.

Any comments?

Thanks,
--HPS
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hacke...@freebsd.org"

Kip Macy

Jun 30, 2006, 3:17:18 PM
FreeBSD's strategy for doing page coloring makes contiguous memory
allocation much past boot quite difficult. This will change when
generalized superpage support is brought in (I hope in the near
future).

-Kip

Peter Jeremy

Jun 30, 2006, 4:37:20 PM


On Fri, 2006-Jun-30 20:29:28 +0200, Hans Petter Selasky wrote:
>I sometimes see that the USB driver is unable to allocate contiguous memory
>for itself. For example I noticed that FreeBSD was unable to allocate
>350kbytes of contiguous memory after that I had run "konqueror", the KDE web
>browser and various other memory consuming applications for a while.

I reported this about 18 months ago and I'm fairly certain you even
contributed to the thread at the time.

>Any comments?

The latest consensus seems to be that the USB system should make use of
the scatter-gather facilities in the hardware to avoid the need to
allocate large contiguous memory chunks. iedowse@ had mostly finished
implementing this in mid May.

-- 
Peter Jeremy

Hans Petter Selasky

Jul 1, 2006, 5:14:34 AM
On Friday 30 June 2006 22:32, Peter Jeremy wrote:
> On Fri, 2006-Jun-30 20:29:28 +0200, Hans Petter Selasky wrote:
> >I sometimes see that the USB driver is unable to allocate contiguous
> > memory for itself. For example I noticed that FreeBSD was unable to
> > allocate 350kbytes of contiguous memory after that I had run "konqueror",
> > the KDE web browser and various other memory consuming applications for a
> > while.
>
> I reported this about 18 months ago and I'm fairly certain you even
> contributed to the thread at the time.
>
> >Any comments?
>
> The latest consensus seems to be that the USB system should make use of
> the scatter-gather facilities in the hardware to avoid the need to
> allocate large contiguous memory chunks. iedowse@ had mostly finished
> implementing this in mid May.

Yes, but scatter and gather will add extra complexity to the driver, and maybe
an extra memory copy in most cases. Is the idea to allocate no more than a
page of memory at a time, and thereby avoid the problem?

The most important thing is to keep memory allocations of constant size. For
example under my USB system, all memory is allocated at attach. There is no
longer allocation and freeing of memory during usage, with a few exceptions.
I was thinking I could pre-allocate 2-4MB for the USB system, then make a
list of freed memory blocks, and then search this list first, before
allocating new memory.
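
A minimal sketch of the pre-allocation idea described above, assuming one up-front allocation that is carved into constant-size blocks and recycled through a free list. All names here (usb_pool, usb_pool_alloc and so on) are hypothetical and are not part of the FreeBSD USB code; plain malloc() stands in for whatever contiguous allocator the kernel would use. Unlike the proposal, this sketch does not fall back to a fresh allocation when the list is empty; it simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types; not the FreeBSD USB API. */
struct pool_block {
    struct pool_block *next;       /* link used while the block is free */
};

struct usb_pool {
    unsigned char     *base;       /* single allocation made up front */
    size_t             block_size; /* constant size of every block */
    struct pool_block *free_list;  /* blocks currently available */
};

/* Carve one up-front allocation into equal blocks. Returns 0 on success. */
static int
usb_pool_init(struct usb_pool *p, size_t block_size, size_t nblocks)
{
    assert(nblocks > 0 && block_size >= sizeof(struct pool_block));
    assert(block_size % sizeof(struct pool_block) == 0); /* keeps links aligned */

    p->base = malloc(block_size * nblocks);
    if (p->base == NULL)
        return -1;
    p->block_size = block_size;
    p->free_list = NULL;
    for (size_t i = 0; i < nblocks; i++) {
        struct pool_block *b = (struct pool_block *)(p->base + i * block_size);
        b->next = p->free_list;
        p->free_list = b;
    }
    return 0;
}

/* Search the free list first; NULL means the pool is exhausted. */
static void *
usb_pool_alloc(struct usb_pool *p)
{
    struct pool_block *b = p->free_list;

    if (b != NULL)
        p->free_list = b->next;
    return b;
}

/* Return a block to the free list instead of releasing the memory. */
static void
usb_pool_free(struct usb_pool *p, void *ptr)
{
    struct pool_block *b = ptr;

    b->next = p->free_list;
    p->free_list = b;
}

/* Release the whole pool, e.g. at detach. */
static void
usb_pool_destroy(struct usb_pool *p)
{
    free(p->base);
    p->base = NULL;
    p->free_list = NULL;
}
```

Because every block has the same size, the pool itself never fragments, but the single large allocation still has to succeed once, which is why doing it early matters.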

Depending on how many different kinds of equipment one plugs in, this should
work fine.

What do you think?

Hans Petter Selasky

Jul 1, 2006, 9:33:14 AM
On Saturday 01 July 2006 14:34, Peter Jeremy wrote:

> On Sat, 2006-Jul-01 10:44:54 +0200, Hans Petter Selasky wrote:
> >Yes, but scatter and gather will add extra complexity to the driver, and
> > maybe an extra memory copy in most cases. The idea is to allocate less
> > than or equal to a page of memory, and then avoid the problem?
>
> The idea is to allocate multiple disjoint pages of physical memory.
> The USB hardware is capable of supporting this and page translation
> hides the lack of contiguousness from userland. I don't see why
> additional memory copies would be required.

Because most USB device drivers depend on a contiguous buffer. In your case one
would have to implement something similar to "uiomove()" for copying, that
will do the scattering and gathering?
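
To illustrate what such a copy helper might look like, here is a userland sketch that copies from a flat buffer into a list of disjoint segments, in the spirit of uiomove(). The names sg_seg and sg_copy_in are made up for this example and the real uiomove() interface (which works on a struct uio) is different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical segment descriptor: one piece of a scattered buffer. */
struct sg_seg {
    void  *base;
    size_t len;
};

/*
 * Copy up to len bytes from the flat buffer src into the segment list,
 * starting at logical byte position offset within the scattered buffer.
 * Returns the number of bytes actually copied, which is less than len
 * if the segment list ends first.
 */
static size_t
sg_copy_in(const struct sg_seg *segs, size_t nsegs, size_t offset,
    const void *src, size_t len)
{
    const unsigned char *s = src;
    size_t copied = 0;

    assert(segs != NULL || nsegs == 0);
    for (size_t i = 0; i < nsegs && copied < len; i++) {
        if (offset >= segs[i].len) {
            /* The start position lies beyond this segment: skip it. */
            offset -= segs[i].len;
            continue;
        }
        size_t n = segs[i].len - offset;      /* room left in this segment */
        if (n > len - copied)
            n = len - copied;
        memcpy((unsigned char *)segs[i].base + offset, s + copied, n);
        copied += n;
        offset = 0;                           /* later segments start at 0 */
    }
    return copied;
}
```

A driver that only ever touches its data through a helper like this does not need the buffer to be contiguous; one that hands out a raw pointer to the whole buffer does.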

> >The most important thing is to keep memory allocations of constant size.
> > For example under my USB system, all memory is allocated at attach. There
> > is no longer allocation and freeing of memory during usage, with a few
> > exceptions.
>

> Are you talking about USB device attach - which could happen at any
> time whilst the system is running - or USB bus attach?

I am talking about the USB device attach.

>
> >I was thinking I could pre-allocate 2-4MB for the USB system, then make a
> >list of freed memory blocks, and then search this list first, before
> >allocating new memory.
>

> That strikes me as wasteful. Currently, umass needs something like
> 64K (or maybe 128K) and ulpt needs a few KB (they are the only USB
> devices I normally use). I would be surprised if the peak USB RAM
> requirement was more than 256K for most people.

Yes, but don't forget high-speed USB transfers. They require larger buffers.
For example 1024 bytes for ULPT is too little. The interrupt rate will be so
high, that it is unrealistic to transfer 20MB/s using 1024 byte interrupts.
My rewritten ULPT now uses "2*(1<<17)" buffers.
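
The interrupt-rate argument can be checked with a little arithmetic. The sketch below assumes one completion interrupt per buffer, that "20MB/s" means 20,000,000 bytes per second, and that each of the two buffers is 1<<17 bytes; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of buffer completions (and, under the stated assumption,
 * interrupts) per second needed to sustain a given throughput,
 * rounded up.
 */
static uint64_t
completions_per_second(uint64_t bytes_per_second, uint64_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    return (bytes_per_second + buffer_bytes - 1) / buffer_bytes;
}
```

With these assumptions, 1024-byte buffers need 19532 completions per second, while 128 KiB buffers need only 153.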

> "vmstat -m" on my
> systems shows that the current biggest consumer is devbuf - with
> 3-4MB. Most other consumers are orders of magnitude smaller than this
> (though my video capture card grabs about 4MB RAM when it's in use).

Peter Jeremy

Jul 1, 2006, 9:38:06 AM


On Sat, 2006-Jul-01 10:44:54 +0200, Hans Petter Selasky wrote:
>Yes, but scatter and gather will add extra complexity to the driver, and maybe
>an extra memory copy in most cases. The idea is to allocate less than or
>equal to a page of memory, and then avoid the problem?

The idea is to allocate multiple disjoint pages of physical memory.
The USB hardware is capable of supporting this and page translation
hides the lack of contiguousness from userland. I don't see why
additional memory copies would be required.

>The most important thing is to keep memory allocations of constant size. For
>example under my USB system, all memory is allocated at attach. There is no
>longer allocation and freeing of memory during usage, with a few exceptions.

Are you talking about USB device attach - which could happen at any
time whilst the system is running - or USB bus attach?

>I was thinking I could pre-allocate 2-4MB for the USB system, then make a
>list of freed memory blocks, and then search this list first, before
>allocating new memory.

That strikes me as wasteful. Currently, umass needs something like
64K (or maybe 128K) and ulpt needs a few KB (they are the only USB
devices I normally use). I would be surprised if the peak USB RAM
requirement was more than 256K for most people. "vmstat -m" on my
systems shows that the current biggest consumer is devbuf - with
3-4MB. Most other consumers are orders of magnitude smaller than this
(though my video capture card grabs about 4MB RAM when it's in use).

-- 
Peter Jeremy

David Malone

Jul 1, 2006, 8:43:04 PM
On Sat, Jul 01, 2006 at 10:44:54AM +0200, Hans Petter Selasky wrote:
> > The latest consensus seems to be that the USB system should make use of
> > the scatter-gather facilities in the hardware to avoid the need to
> > allocate large contiguous memory chunks. iedowse@ had mostly finished
> > implementing this in mid May.
>
> Yes, but scatter and gather will add extra complexity to the driver, and maybe
> an extra memory copy in most cases. The idea is to allocate less than or
> equal to a page of memory, and then avoid the problem?

I believe the USB drivers in -current grew scatter/gather support
about a month ago. See the commit message below. Is this likely to
help?

David.

Use the limited scatter-gather capabilities of ehci, ohci and uhci
host controllers to avoid the need to allocate any multi-page
physically contiguous memory blocks. This makes it possible to use
USB devices reliably on low-memory systems or when memory is too
fragmented for contiguous allocations to succeed.

The USB subsystem now uses bus_dmamap_load() directly on the buffers
supplied by USB peripheral drivers, so this also avoids having to
copy data back and forth before and after transfers. The ehci and
ohci controllers support scatter/gather as long as the buffer is
contiguous in the virtual address space. For uhci the hardware
cannot handle a physical address discontinuity within a USB packet,
so it is necessary to copy small memory fragments at times.
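
As a rough illustration of why a virtually contiguous buffer is enough, the sketch below splits such a buffer into pieces that never cross a page boundary, so each piece lies within a single physical page. This only models the splitting step; it works on plain addresses, does no virtual-to-physical translation, and is not the bus_dmamap_load() implementation. The names vseg and split_by_page are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment: a run of bytes that stays inside one page. */
struct vseg {
    uintptr_t addr;
    size_t    len;
};

/*
 * Split [vaddr, vaddr + len) at page boundaries. Returns the number of
 * segments written to out, or 0 if more than max_out would be needed
 * (or if len is 0).
 */
static size_t
split_by_page(uintptr_t vaddr, size_t len, size_t page_size,
    struct vseg *out, size_t max_out)
{
    size_t n = 0;

    assert(page_size > 0);
    while (len > 0) {
        size_t room = page_size - (size_t)(vaddr % page_size);
        size_t chunk = len < room ? len : room;

        if (n == max_out)
            return 0;              /* caller's segment table is too small */
        out[n].addr = vaddr;
        out[n].len = chunk;
        n++;
        vaddr += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, a 0x300-byte buffer starting at 0x1F00 with 0x1000-byte pages yields two segments: 0x100 bytes at 0x1F00 and 0x200 bytes at 0x2000. A USB packet that straddles such a split is the case where, per the commit message, uhci needs a copy.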

Hans Petter Selasky

Jul 2, 2006, 5:40:50 AM
On Saturday 01 July 2006 21:44, David Malone wrote:
> On Sat, Jul 01, 2006 at 10:44:54AM +0200, Hans Petter Selasky wrote:
> > > The latest consensus seems to be that the USB system should make use of
> > > the scatter-gather facilities in the hardware to avoid the need to
> > > allocate large contiguous memory chunks. iedowse@ had mostly finished
> > > implementing this in mid May.
> >
> > Yes, but scatter and gather will add extra complexity to the driver, and
> > maybe an extra memory copy in most cases. The idea is to allocate less
> > than or equal to a page of memory, and then avoid the problem?
>
> I believe the USB drivers in -current grew scatter/gather support
> about a month ago. See the commit message below. Is this likely to
> help?

I see.

But there is one problem that has been overlooked: high-speed isochronous
transfers, which are not supported by the existing USB system. I don't think
that the EHCI specification was designed for scatter and gather, when you
consider this:

8 transfers of 0xC00 bytes have to fit on 7 pages. If this is going to work,
and I am right, one page has to contain two transfers. (See page 43 of
ehci-r10.pdf.)

Currently the PAGE_SIZE is 0x1000, right? As you can see, "(0xC00 * 2) >
0x1000".

So is it possible to change the PAGE_SIZE?

--HPS

Ian Dowse

Jul 2, 2006, 8:09:39 AM
In message <200607021138....@c2i.net>, Hans Petter Selasky writes:
>But there is one problem, that has been overlooked, and that is High speed
>isochronous transfers, which are not supported by the existing USB system. I
>don't think that the EHCI specification was designed for scatter and gather,
>when you consider this:
>
>8 transfers of 0xC00 bytes has to fit on 7 pages. If this is going to work,
>and I am right, one page has to contain two transfers. (see page 43 of
>ehci-r10.pdf)

I haven't looked into the details, but the text in section 3.3.3
seems to suggest that EHCI is designed to not require physically
contiguous allocations here either, so the same approach of using
bus_dmamap_load() should work:

This data structure requires the associated data buffer to be
contiguous (relative to virtual memory), but allows the physical
memory pages to be non-contiguous. Seven page pointers are provided
to support the expression of 8 isochronous transfers. The seven
pointers allow for 3 (transactions) * 1024 (maximum packet size)
* 8 (transaction records) (24576 bytes) to be moved with this
data structure, regardless of the alignment offset of the first
page.

Ian

Hans Petter Selasky

Jul 2, 2006, 8:35:53 AM
On Sunday 02 July 2006 14:05, Ian Dowse wrote:
> In message <200607021138....@c2i.net>, Hans Petter Selasky writes:
> >But there is one problem, that has been overlooked, and that is High speed
> >isochronous transfers, which are not supported by the existing USB system.
> > I don't think that the EHCI specification was designed for scatter and
> > gather, when you consider this:
> >
> >8 transfers of 0xC00 bytes has to fit on 7 pages. If this is going to
> > work, and I am right, one page has to contain two transfers. (see page 43
> > of ehci-r10.pdf)
>
> I haven't looked into the details, but the text in section 3.3.3
> seems to suggest that EHCI is designed to not require physically
> contiguous allocations here either, so the same approach of using
> bus_dmamap_load() should work:
>
> This data structure requires the associated data buffer to be
> contiguous (relative to virtual memory), but allows the physical
> memory pages to be non-contiguous. Seven page pointers are provided
> to support the expression of 8 isochronous transfers. The seven
> pointers allow for 3 (transactions) * 1024 (maximum packet size)
> * 8 (transaction records) (24576 bytes) to be moved with this
> data structure, regardless of the alignment offset of the first
> page.

3 * 1024 bytes = 0xC00 bytes

8 * 0xC00 = 0x6000 bytes maximum

According to this you need "6" "EHCI pages", because "6 * 0x1000 = 0x6000".
The seventh "EHCI page" is just there to allow one to start at any page
offset. There is no eighth "EHCI page".

The only solution I see is to have a double-layer iTD. The first layer has
the first 4 transfers activated, and the second layer has the last 4
transfers activated.

A little more complicated, but not impossible.

--HPS

Ian Dowse

Jul 2, 2006, 9:27:11 AM
In message <200607021433....@c2i.net>, Hans Petter Selasky writes:
>On Sunday 02 July 2006 14:05, Ian Dowse wrote:
>> This data structure requires the associated data buffer to be
>> contiguous (relative to virtual memory), but allows the physical
>> memory pages to be non-contiguous. Seven page pointers are provided
>> to support the expression of 8 isochronous transfers. The seven
>> pointers allow for 3 (transactions) * 1024 (maximum packet size)
>> * 8 (transaction records) (24576 bytes) to be moved with this
>> data structure, regardless of the alignment offset of the first
>> page.
>
>3 * 1024 bytes = 0xC00 bytes
>
>8 * 0xC00 = 0x6000 bytes maximum
>
>According to this you need "6" "EHCI pages", because "6 * 0x1000 = 0x6000".
>The seventh "EHCI page" is just there to allow one to start at any page
>offset. There is no eight "EHCI page".
>
>The only solution I see, is to have a double layer ITD. The first layer have
>the 4 first transfers activated, and the second layer have the 4 last
>transfers activated.
>
>A little more complicated, but not impossible.

The trick is that if the 0x6000 bytes are contiguous in virtual
memory then they never span more than 6 pages so one iTD is enough.

i.e. you can just do malloc(0x6000) and you don't need multi-page
physically contiguous buffers or extra memory-memory copies regardless
of how the virtual buffer maps to physical pages. This seems to be
the general extent of scatter-gather support offered by the various
USB host controllers (modulo various caveats such as assuming pages
are >= 4k, handling physical addresses > 4GB on non-IOMMU hardware
and UHCI's lack of support for mid-packet non-contiguous page
boundaries).

Ian

Ian Dowse

Jul 2, 2006, 11:22:48 AM
In message <2006070214...@nowhere.iedowse.com>, Ian Dowse writes:
>The trick is that if the 0x6000 bytes are contiguous in virtual
>memory then they never span more than 6 pages so one iTD is enough.

Sorry, I meant of course 6 page boundaries, which means no more
than 7 pages. This is why the 7 physical address slots in the iTD
are always enough for 8 x 3k transaction records if the 24k buffer
is contiguous in virtual memory.
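
The page-count claim can be checked by brute force. The sketch below assumes 0x1000-byte pages and a 0x6000-byte buffer that is contiguous in virtual memory, and tries every possible starting offset within a page; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of pages touched by len bytes starting at a given offset. */
static size_t
pages_spanned(size_t offset, size_t len, size_t page_size)
{
    assert(page_size > 0 && len > 0);
    return (offset % page_size + len + page_size - 1) / page_size;
}

/* Worst case over every possible starting offset within a page. */
static size_t
max_pages_spanned(size_t len, size_t page_size)
{
    size_t worst = 0;

    for (size_t off = 0; off < page_size; off++) {
        size_t n = pages_spanned(off, len, page_size);

        if (n > worst)
            worst = n;
    }
    return worst;
}
```

A page-aligned 0x6000-byte buffer touches 6 pages, any other alignment touches 7, and no alignment touches 8, which matches the seven page pointers in the iTD.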

Randall Hyde

Jul 2, 2006, 12:10:21 PM

----- Original Message -----
From: "Peter Jeremy" <peter...@optushome.com.au>
To: "Randall Hyde" <rand...@earthlink.net>
Sent: Saturday, July 01, 2006 11:10 PM
Subject: Re: FLEX, was Re: Return value of malloc(0)

The following compiles without error:
%{
typedef int YYSTYPE;
typedef int YYLTYPE;

/*
** Allow for a recursive version of Bison parser.
*/
#undef YY_DECL
#define YY_DECL int yylex( YYSTYPE *yylval, YYLTYPE *yylloc)
%}

%%
. ECHO;
%%

I'll accept that you are having a problem getting HLA to build. No-one
else is reporting problems. If you want assistance from other people
then you are going to need to help by either providing a test case to
reproduce the failure you are seeing or you are going to need to provide
the pre-processed context where the error occurs.

<<<<<<<<<<<<<<<<<<<<<<<

Uh, is the above *not* the test case you are asking for?
Does this particular code snippet compile for you? If so, then I've
definitely got some configuration problems with GCC on my machine.

BTW, if I haven't made myself clear, the problem with the code above is a
GCC error, not a FLEX error. That is, FLEX happily produces the lex.yy.c
output file, but GCC under FreeBSD rejects the source code it produces. Yet that
*same exact* source code compiles just fine under Linux with GCC, Windows
with Borland C++, and Windows with VC++.

Creating a recursive lexer is a documented feature of FLEX. Indeed, I used
the example present in the FLEX documentation to pull this off. And I've
been using this code for about eight years on Windows and about four years
on Linux. Perhaps FreeBSD types have never tried to create a recursive
parser/lexer and haven't had to deal with this issue, but as you can see
from the code snippet above, this really has nothing to do with HLA, per se,
other than the fact that it uses a recursive lexer/parser.

And the fact remains, the code compiles just fine under Windows and Linux.
The compilation uses a documented feature of FLEX. There *is* a problem with
the implementation of GCC under FreeBSD (or header files under FreeBSD).
It's not specific to HLA. And I can't imagine how to give you a better test
case than the code above.
Cheers,
Randy Hyde

Steve Kargl

Jul 2, 2006, 1:24:30 PM
On Sun, Jul 02, 2006 at 09:00:07AM -0700, Randall Hyde wrote:
>
> ----- Original Message -----
> From: "Peter Jeremy" <peter...@optushome.com.au>
> To: "Randall Hyde" <rand...@earthlink.net>
> Sent: Saturday, July 01, 2006 11:10 PM
> Subject: Re: FLEX, was Re: Return value of malloc(0)
>
> The following compiles without error:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> %{
> typedef int YYSTYPE;
> typedef int YYLTYPE;
>
> /*
> ** Allow for a recursive version of Bison parser.
> */
> #undef YY_DECL
> #define YY_DECL int yylex( YYSTYPE *yylval, YYLTYPE *yylloc)
> %}
>
> %%
> . ECHO;
> %%
>
> I'll accept that you are having a problem getting HLA to build. No-one
> else is reporting problems. If you want assistance from other people
> then you are going to need to help by either providing a test case to
> reproduce the failure you are seeing or you are going to need to provide
> the pre-processed context where the error occurs.
>
> <<<<<<<<<<<<<<<<<<<<<<<
>
> Uh, is the above *not* the test case you are asking for?

Carefully read what Peter wrote. The code above compiles
on FreeBSD. As do the 37 *.y files under /usr/src
(excluding the 30 *.y under /usr/src/contrib which may
or may not be used during compilation).

> Does this particular code snippet compile for you?

Yes, it does.

> If so, then I've definitely got some configuration problems
> with GCC on my machine.

What version of FreeBSD, and did you upgrade from a previous
version? Perhaps you have some stale header files. Did
you install a different version of gcc under /usr/local
(or other directory) that appears early in your path?

--
Steve

Hans Petter Selasky

Jul 2, 2006, 3:08:53 PM
On Sunday 02 July 2006 17:20, Ian Dowse wrote:
> In message <2006070214...@nowhere.iedowse.com>, Ian Dowse writes:
> >The trick is that if the 0x6000 bytes are contiguous in virtual
> >memory then they never span more than 6 pages so one iTD is enough.
>
> Sorry, I meant of course 6 page boundaries, which means no more
> than 7 pages. This is why the 7 physical address slots in the iTD
> is always enough for 8 x 3k transaction records if the 24k buffer
> is contiguous in virtual memory.
>

Ok. So the solution to my problem is to use scatter and gather. I will see
about updating my USB system to do it like that.

But there is one thing I do not understand yet. When you load a page that
physically resides above 4GB, because a computer has more than 4GB of memory,
how does "bus_dmamap_load()" move that page down below 4GB, so that the
32-bit USB host controllers can reach it?

--HPS

Ian Dowse

Jul 2, 2006, 5:36:25 PM
In message <200607022106....@c2i.net>, Hans Petter Selasky writes:
>Ok. So the solution to my problem is to use scatter and gather. I will see
>about updating my USB system to do it like that.
>
>But there is one thing I do not understand yet. When you load a page that
>physically resides above 4GB, because a computer has more than 4GB of memory,
>how does "bus_dmamap_load()" move that page down below 4GB, so that the
>32-bit USB host controllers can reach it?

What should happen is that bus_dma allocates a bounce buffer and
performs copies as required from within the bus_dmamap_sync() calls.
This is something I haven't been able to verify yet with the USB
code though, so there could easily be bugs there.
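
As a simplified illustration of the decision involved, the check below reports whether a physical segment lies entirely within the range a device can address; when it does not, a bounce buffer below the limit would be needed. This is only a sketch of the idea with a made-up function name; the real bus_dma code makes this decision from the DMA tag's address restrictions and is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if any byte of [paddr, paddr + len) lies above max_addr, the
 * highest physical address the device can reach (0xFFFFFFFF for a
 * 32-bit host controller).
 */
static bool
needs_bounce(uint64_t paddr, uint64_t len, uint64_t max_addr)
{
    assert(len > 0);
    /* Compare without computing paddr + len, which could overflow. */
    return paddr > max_addr || len - 1 > max_addr - paddr;
}
```

For a 32-bit controller, a page at 0x100000000 needs bouncing, while a 0x1000-byte page at 0xFFFFF000 does not.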

BTW, as far as I know bus_dma is also missing support for multi-segment
allocations, so for example if you ask it to allocate 16k in at
most 4 segments below the 4GB mark, it will actually attempt a
physically contiguous allocation. If this was fixed it could be
used by usbd_alloc_buffer() to give directly usable buffers without
contiguous allocations.

Ian
