I sometimes see that the USB driver is unable to allocate contiguous memory
for itself. For example, I noticed that FreeBSD was unable to allocate
350 kbytes of contiguous memory after I had run "konqueror" (the KDE web
browser) and various other memory-consuming applications for a while.
I am thinking about pre-allocating some memory for USB, but isn't that the job
of bus_dma, which the USB system uses for memory allocation?
The machine in question is running FreeBSD 7-current from April.
Any comments?
Thanks,
--HPS
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
-Kip
On Fri, 2006-Jun-30 20:29:28 +0200, Hans Petter Selasky wrote:
>I sometimes see that the USB driver is unable to allocate contiguous memory
>for itself. For example, I noticed that FreeBSD was unable to allocate
>350 kbytes of contiguous memory after I had run "konqueror" (the KDE web
>browser) and various other memory-consuming applications for a while.
I reported this about 18 months ago and I'm fairly certain you even
contributed to the thread at the time.
>Any comments?
The latest consensus seems to be that the USB system should use the
scatter-gather facilities in the hardware to avoid the need to
allocate large contiguous memory chunks. iedowse@ had mostly finished
implementing this in mid-May.
--
Peter Jeremy
Yes, but scatter and gather will add extra complexity to the driver, and maybe
an extra memory copy in most cases. The idea is to allocate less than or
equal to a page of memory, and then avoid the problem?
The most important thing is to keep memory allocations of constant size. For
example under my USB system, all memory is allocated at attach. There is no
longer allocation and freeing of memory during usage, with a few exceptions.
I was thinking I could pre-allocate 2-4MB for the USB system, then make a
list of freed memory blocks, and then search this list first, before
allocating new memory.
Depending on how many different kinds of equipment one plugs in, this should
work fine.
What do you think?
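The proposal above (constant-size blocks, with a list of freed blocks searched before any new allocation) can be sketched as a small free-list cache. This is only a userland illustration under stated assumptions, not the actual USB code: `usb_block_alloc`, `usb_block_free` and `USB_BLOCK_SIZE` are hypothetical names, and `malloc()` merely stands in for whatever bus_dma allocation a real driver would make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical constant block size; every request gets a block this big. */
#define USB_BLOCK_SIZE 4096

/* A freed block is reused as its own list node. */
struct usb_block {
    struct usb_block *next;
};

static struct usb_block *usb_free_list = NULL;
static size_t usb_fresh_allocs = 0;   /* times we had to ask the allocator */

/* Return a block, preferring one that was freed earlier. */
static void *usb_block_alloc(void)
{
    struct usb_block *blk = usb_free_list;

    if (blk != NULL) {
        usb_free_list = blk->next;    /* reuse: no new allocation needed */
        return blk;
    }
    usb_fresh_allocs++;
    return malloc(USB_BLOCK_SIZE);    /* stand-in for a DMA-capable allocation */
}

/* Keep the block on the free list instead of returning it to the system. */
static void usb_block_free(void *ptr)
{
    struct usb_block *blk = ptr;

    assert(blk != NULL);
    blk->next = usb_free_list;
    usb_free_list = blk;
}
```

Because every block has the same size, any freed block satisfies any later request, so the cache never fragments internally. The cost is that memory parked on the free list is unavailable to the rest of the system.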
Because most USB device drivers depend on a contiguous buffer. In your case one
would have to implement something similar to "uiomove()" for copying, which
would do the scatter and gather?
> >The most important thing is to keep memory allocations of constant size.
> > For example under my USB system, all memory is allocated at attach. There
> > is no longer allocation and freeing of memory during usage, with a few
> > exceptions.
>
> Are you talking about USB device attach - which could happen at any
> time whilst the system is running - or USB bus attach?
I am talking about the USB device attach.
>
> >I was thinking I could pre-allocate 2-4MB for the USB system, then make a
> >list of freed memory blocks, and then search this list first, before
> >allocating new memory.
>
> That strikes me as wasteful. Currently, umass needs something like
> 64K (or maybe 128K) and ulpt needs a few KB (they are the only USB
> devices I normally use). I would be surprised if the peak USB RAM
> requirement was more than 256K for most people.
Yes, but don't forget high-speed USB transfers. They require larger buffers.
For example, 1024 bytes for ULPT is too little. With one interrupt per 1024
bytes, the interrupt rate is so high that it is unrealistic to transfer 20MB/s.
My rewritten ULPT now uses "2*(1<<17)" buffers.
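The buffer-size argument comes down to simple division. The sketch below makes two assumptions that are not stated in the thread: that each completed buffer raises exactly one interrupt, and that 20MB/s means 20,000,000 bytes per second. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Interrupts per second needed to sustain bytes_per_second when one
 * interrupt is raised per completed buffer of buf_bytes. Rounds up.
 */
static uint64_t interrupts_per_second(uint64_t bytes_per_second,
    uint64_t buf_bytes)
{
    assert(buf_bytes > 0);
    return (bytes_per_second + buf_bytes - 1) / buf_bytes;
}
```

Under those assumptions, `interrupts_per_second(20000000, 1024)` gives 19532, while a buffer of `1 << 17` bytes brings that down to 153.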
> "vmstat -m" on my
> systems shows that the current biggest consumer is devbuf - with
> 3-4MB. Most other consumers are orders of magnitude smaller than this
> (though my video capture card grabs about 4MB RAM when it's in use).
On Sat, 2006-Jul-01 10:44:54 +0200, Hans Petter Selasky wrote:
>Yes, but scatter and gather will add extra complexity to the driver, and maybe
>an extra memory copy in most cases. The idea is to allocate less than or
>equal to a page of memory, and then avoid the problem?
The idea is to allocate multiple disjoint pages of physical memory.
The USB hardware is capable of supporting this and page translation
hides the lack of contiguousness from userland. I don't see why
additional memory copies would be required.
>The most important thing is to keep memory allocations of constant size. For
>example under my USB system, all memory is allocated at attach. There is no
>longer allocation and freeing of memory during usage, with a few exceptions.
Are you talking about USB device attach - which could happen at any
time whilst the system is running - or USB bus attach?
>I was thinking I could pre-allocate 2-4MB for the USB system, then make a
>list of freed memory blocks, and then search this list first, before
>allocating new memory.
That strikes me as wasteful. Currently, umass needs something like
64K (or maybe 128K) and ulpt needs a few KB (they are the only USB
devices I normally use). I would be surprised if the peak USB RAM
requirement was more than 256K for most people. "vmstat -m" on my
systems shows that the current biggest consumer is devbuf - with
3-4MB. Most other consumers are orders of magnitude smaller than this
(though my video capture card grabs about 4MB RAM when it's in use).
--
Peter Jeremy
I believe the USB drivers in -current grew scatter/gather support
about a month ago. See the commit message below. Is this likely to
help?
David.
Use the limited scatter-gather capabilities of ehci, ohci and uhci
host controllers to avoid the need to allocate any multi-page
physically contiguous memory blocks. This makes it possible to use
USB devices reliably on low-memory systems or when memory is too
fragmented for contiguous allocations to succeed.
The USB subsystem now uses bus_dmamap_load() directly on the buffers
supplied by USB peripheral drivers, so this also avoids having to
copy data back and forth before and after transfers. The ehci and
ohci controllers support scatter/gather as long as the buffer is
contiguous in the virtual address space. For uhci the hardware
cannot handle a physical address discontinuity within a USB packet,
so it is necessary to copy small memory fragments at times.
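To make the commit message concrete, here is a rough userland sketch of what a scatter/gather list for a virtually contiguous buffer looks like: the buffer is cut at page boundaries and each piece is translated separately. It is not the bus_dma implementation. `sg_build`, `sg_segment` and the translation hook are hypothetical names, the 4k page size is an assumption, and `sg_demo_translate` is a made-up mapping that deliberately scatters pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SIZE 4096u   /* assumed page size */

struct sg_segment {
    uint64_t paddr;  /* physical start address of this piece */
    size_t   len;    /* bytes in this piece */
};

/* Hypothetical hook mapping a virtual address to a physical address. */
typedef uint64_t (*sg_translate_fn)(uintptr_t vaddr);

/* Made-up mapping: consecutive virtual pages land 3 physical pages apart. */
static uint64_t sg_demo_translate(uintptr_t vaddr)
{
    uint64_t vpage = vaddr / SG_PAGE_SIZE;
    uint64_t offset = vaddr % SG_PAGE_SIZE;

    return (vpage * 3 + 7) * SG_PAGE_SIZE + offset;
}

/*
 * Split [vaddr, vaddr + len) at page boundaries and translate each piece.
 * Returns the number of segments, or -1 if more than max_segs are needed.
 */
static int sg_build(uintptr_t vaddr, size_t len, sg_translate_fn translate,
    struct sg_segment *segs, int max_segs)
{
    int nsegs = 0;

    assert(segs != NULL && max_segs > 0);
    while (len > 0) {
        size_t in_page = SG_PAGE_SIZE - (vaddr % SG_PAGE_SIZE);
        size_t chunk = len < in_page ? len : in_page;

        if (nsegs == max_segs)
            return -1;
        segs[nsegs].paddr = translate(vaddr);
        segs[nsegs].len = chunk;
        nsegs++;
        vaddr += chunk;
        len -= chunk;
    }
    return nsegs;
}
```

The sketch never merges pieces that happen to be physically adjacent, and it does not model the UHCI restriction that a single USB packet must not cross a physical discontinuity.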
I see.
But one problem has been overlooked: high-speed isochronous transfers, which
are not supported by the existing USB system. I don't think that the EHCI
specification was designed for scatter and gather, when you consider this:
8 transfers of 0xC00 bytes have to fit in 7 pages. If this is going to work,
and I am right, one page has to contain two transfers. (see page 43 of
ehci-r10.pdf)
Currently the PAGE_SIZE is 0x1000, right? As you can see, "(0xC00 * 2) >
0x1000".
So is it possible to change the PAGE_SIZE?
--HPS
I haven't looked into the details, but the text in section 3.3.3
seems to suggest that EHCI is designed to not require physically
contiguous allocations here either, so the same approach of using
bus_dmamap_load() should work:
This data structure requires the associated data buffer to be
contiguous (relative to virtual memory), but allows the physical
memory pages to be non-contiguous. Seven page pointers are provided
to support the expression of 8 isochronous transfers. The seven
pointers allow for 3 (transactions) * 1024 (maximum packet size)
* 8 (transaction records) (24576 bytes) to be moved with this
data structure, regardless of the alignment offset of the first
page.
Ian
3 * 1024 bytes = 0xC00 bytes
8 * 0xC00 = 0x6000 bytes maximum
According to this you need 6 "EHCI pages", because "6 * 0x1000 = 0x6000".
The seventh "EHCI page" is just there to allow one to start at any page
offset. There is no eighth "EHCI page".
The only solution I see is to have a double-layer iTD. The first layer has
the first 4 transfers activated, and the second layer has the last 4
transfers activated.
A little more complicated, but not impossible.
--HPS
The trick is that if the 0x6000 bytes are contiguous in virtual
memory then they never span more than 6 pages so one iTD is enough.
i.e. you can just do malloc(0x6000) and you don't need multi-page
physically contiguous buffers or extra memory-memory copies regardless
of how the virtual buffer maps to physical pages. This seems to be
the general extent of scatter-gather support offered by the various
USB host controllers (modulo various caveats such as assuming pages
are >= 4k, handling physical addresses > 4GB on non-IOMMU hardware
and UHCI's lack of support for mid-packet non-contiguous page
boundaries).
Ian
Sorry, I meant of course 6 page boundaries, which means no more
than 7 pages. This is why the 7 physical address slots in the iTD
are always enough for 8 x 3k transaction records if the 24k buffer
is contiguous in virtual memory.
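The page-count claim can be checked by brute force over every starting offset. The sketch below assumes 4k pages, as the thread does; the constant and function names are hypothetical and only mirror the numbers quoted from the EHCI text.

```c
#include <assert.h>
#include <stdint.h>

#define EHCI_PAGE_SIZE   0x1000u  /* 4k pages, as assumed in the thread */
#define ITD_BUFFER_BYTES 0x6000u  /* 8 records * 3 * 1024 bytes */
#define ITD_PAGE_SLOTS   7u       /* page pointers available in one iTD */

/* Pages touched by len bytes starting at offset within the first page. */
static uint32_t pages_spanned(uint32_t offset, uint32_t len)
{
    assert(offset < EHCI_PAGE_SIZE && len > 0);
    return (offset + len + EHCI_PAGE_SIZE - 1) / EHCI_PAGE_SIZE;
}

/* Largest page count over every possible starting offset. */
static uint32_t worst_case_pages(uint32_t len)
{
    uint32_t worst = 0;

    for (uint32_t off = 0; off < EHCI_PAGE_SIZE; off++) {
        uint32_t n = pages_spanned(off, len);

        if (n > worst)
            worst = n;
    }
    return worst;
}
```

A page-aligned 0x6000-byte buffer touches 6 pages, and any other alignment touches 7, so `worst_case_pages(ITD_BUFFER_BYTES)` never exceeds `ITD_PAGE_SLOTS`. This relies on an individual 0xC00-byte record being allowed to straddle a page boundary, which is how the quoted text ("regardless of the alignment offset of the first page") reads.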
----- Original Message -----
From: "Peter Jeremy" <peter...@optushome.com.au>
To: "Randall Hyde" <rand...@earthlink.net>
Sent: Saturday, July 01, 2006 11:10 PM
Subject: Re: FLEX, was Re: Return value of malloc(0)
The following compiles without error:
%{
typedef int YYSTYPE;
typedef int YYLTYPE;
/*
** Allow for a recursive version of Bison parser.
*/
#undef YY_DECL
#define YY_DECL int yylex( YYSTYPE *yylval, YYLTYPE *yylloc)
%}
%%
. ECHO;
%%
I'll accept that you are having a problem getting HLA to build. No-one
else is reporting problems. If you want assistance from other people
then you are going to need to help by either providing a test case to
reproduce the failure you are seeing or you are going to need to provide
the pre-processed context where the error occurs.
<<<<<<<<<<<<<<<<<<<<<<<
Uh, is the above *not* the test case you are asking for?
Does this particular code snippet compile for you? If so, then I've
definitely got some configuration problems with GCC on my machine.
BTW, if I haven't made myself clear, the problem with the code above is a
GCC error, not a FLEX error. That is, FLEX happily produces the lex.yy.c
output file, but GCC under FreeBSD rejects the source code it produces. Yet
that *same exact* source code compiles just fine under Linux with GCC, Windows
with Borland C++, and Windows with VC++.
Creating a recursive lexer is a documented feature of FLEX. Indeed, I used
the example present in the FLEX documentation to pull this off. And I've
been using this code for about eight years on Windows and about four years
on Linux. Perhaps FreeBSD types have never tried to create a recursive
parser/lexer and haven't had to deal with this issue, but as you can see
from the code snippet above, this really has nothing to do with HLA, per se,
other than the fact that it uses a recursive lexer/parser.
And the fact remains, the code compiles just fine under Windows and Linux.
The compilation uses a documented feature of FLEX. There *is* a problem with
the implementation of GCC under FreeBSD (or header files under FreeBSD).
It's not specific to HLA. And I can't imagine how to give you a better test
case than the code above.
Cheers,
Randy Hyde
> %{
> typedef int YYSTYPE;
> typedef int YYLTYPE;
>
> /*
> ** Allow for a recursive version of Bison parser.
> */
> #undef YY_DECL
> #define YY_DECL int yylex( YYSTYPE *yylval, YYLTYPE *yylloc)
> %}
>
> %%
> . ECHO;
> %%
>
> I'll accept that you are having a problem getting HLA to build. No-one
> else is reporting problems. If you want assistance from other people
> then you are going to need to help by either providing a test case to
> reproduce the failure you are seeing or you are going to need to provide
> the pre-processed context where the error occurs.
>
> <<<<<<<<<<<<<<<<<<<<<<<
>
> Uh, is the above *not* the test case you are asking for?
Carefully read what Peter wrote. The code above compiles
on FreeBSD, as do the 37 *.y files under /usr/src
(excluding the 30 *.y files under /usr/src/contrib, which may
or may not be used during compilation).
> Does this particular code snippet compile for you?
Yes, it does.
> If so, then I've definitely got some configuration problems
> with GCC on my machine.
What version of FreeBSD are you running, and did you upgrade from a
previous version? Perhaps you have some stale header files. Did
you install a different version of gcc under /usr/local
(or another directory) that appears early in your path?
--
Steve
Ok. So the solution to my problem is to use scatter and gather. I will see
about updating my USB system to do it like that.
But there is one thing I do not understand yet. When you load a page that
physically resides above 4GB, because a computer has more than 4GB of memory,
how does "bus_dmamap_load()" move that page down below 4GB, so that the
32-bit USB host controllers can reach it?
--HPS
What should happen is that bus_dma allocates a bounce buffer and
performs copies as required from within the bus_dmamap_sync() calls.
This is something I haven't been able to verify yet with the USB
code though, so there could easily be bugs there.
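As a rough illustration of the bounce-buffer idea (not the bus_dma code itself), the sketch below decides whether a physical range is reachable by a 32-bit controller and shows the two copies that a sync step would perform. All names here are hypothetical; the two sync helpers only loosely correspond to what bus_dmamap_sync() does for its BUS_DMASYNC_PREWRITE and BUS_DMASYNC_POSTREAD operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Highest physical address a 32-bit host controller can reach. */
#define DMA_LIMIT_32BIT 0xFFFFFFFFull

/* True if any byte of [paddr, paddr + len) lies above limit. */
static int needs_bounce(uint64_t paddr, size_t len, uint64_t limit)
{
    assert(len > 0);
    return paddr > limit || (uint64_t)(len - 1) > limit - paddr;
}

/* Hypothetical mapping state: the driver's buffer plus a reachable copy. */
struct bounce_map {
    void  *client;   /* buffer handed in by the driver (may be above 4GB) */
    void  *bounce;   /* buffer the device can actually address */
    size_t len;
};

/* Before the device reads memory: copy driver data into the bounce buffer. */
static void bounce_sync_prewrite(struct bounce_map *m)
{
    memcpy(m->bounce, m->client, m->len);
}

/* After the device has written memory: copy the result back to the driver. */
static void bounce_sync_postread(struct bounce_map *m)
{
    memcpy(m->client, m->bounce, m->len);
}
```

The extra copy only happens for buffers that fail the reachability check, which is why a driver that skips the sync calls may appear to work on machines with less than 4GB of memory and then fail on larger ones.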
BTW, as far as I know bus_dma is also missing support for multi-segment
allocations, so for example if you ask it to allocate 16k in at
most 4 segments below the 4GB mark, it will actually attempt a
physically contiguous allocation. If this was fixed it could be
used by usbd_alloc_buffer() to give directly usable buffers without
contiguous allocations.
Ian