Any versions of memset() for words instead of bytes?

MikeC

Mar 30, 2000, 3:00:00 AM

Is there any version of memset() which would work on words instead of
bytes? For Visual C++ on Intel processors.

Charles Wood <c dot r dot wood @ worldnet.att.net.null>

Mar 30, 2000, 3:00:00 AM

Never heard of one, but you could always make one real easy.

Just be careful of starting on odd bytes. (Slows my comp down real fast).
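
A minimal sketch of what "making one" could look like, assuming a 16-bit word type. The name memset16 is hypothetical and is not part of any library. Because the pointer is typed as a 16-bit element pointer, a well-formed caller already passes an even address, which sidesteps the odd-byte concern:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a library function): writes `value` into each of
// the `count` 16-bit elements starting at `dst`.
// Note that `count` is a number of elements, not a number of bytes.
void memset16(std::uint16_t* dst, std::uint16_t value, std::size_t count)
{
    assert(dst != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = value;
    }
}
```

A call such as memset16(buffer, 0x1234, n) then sets n words rather than n bytes.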

--
Charles Wood
REMOVEME...@worldnet.att.net

MikeC wrote in message <38e3b933...@news.cig.mot.com>...

Michael Kochetkov

Mar 31, 2000, 3:00:00 AM

Try stepping through the MS memset in a debugger to see how it works. You
should get an answer.

With regards,
Michael Kochetkov.

MikeC <user...@remove.hotmail.com> wrote in message
news:38e3b933...@news.cig.mot.com...

Charles Wood <c dot r dot wood @ worldnet.att.net.null>

Mar 31, 2000, 3:00:00 AM

I was recently told that:

start:
; do whatever here
loop start

was slower than:

start:
; do whatever here
dec cx
cmp cx,0
jg start

whereas my compiler generates:

repne movsd

for Pentium-class code.

Odd.

--
Charles Wood
REMOVEME...@worldnet.att.net

Nightcap wrote in message
<6bSE4.1677$is2.1...@bgtnsc05-news.ops.worldnet.att.net>...
>X-No-Archive: Yes

>MikeC wrote:
>> Is there any version of memset() which would work on words instead of
>> bytes? For Visual C++ on Intel processors.

>I doubt that, as such a function couldn't be generic. You however can make
>one yourself, and not only a word-wide but a two-word one. Well, depending
>on your processor, of course, but with i86s you can plop a double-word at a
>time, so depending on what exactly you need and what your data are you
>might speed up this operation significantly. Also, keep in mind, if you're
>on a Pentium type of processor (what else is there these days?) "rep" is
>not faster and _is_ slower than a simple loop.

Stephen Howe

Mar 31, 2000, 3:00:00 AM

Charles Wood <c dot r dot wood @ worldnet.att.net.null> wrote in message
news:iiSE4.1521$TM.1...@bgtnsc06-news.ops.worldnet.att.net...

> I was recently told that:
>
> start:
> ; do whatever here
> loop start
>
> was slower than:
>
> start:
> ; do whatever here
> dec cx
> cmp cx,0
> jg start
>
> whereas my compiler generates:
>
> repne movsd
>
> for Pentium-class code.
>
> Odd.

I know what compiler you use.

I know this is now going off-topic (so last response ;-) for both newsgroups
but I could not resist.

The advice is too generic. It does not take into account Level 1 or Level 2
caches, nor the difference between 586-class and 686-class Pentiums.
Sometimes repne movsd is faster than an unrolled loop and sometimes it is
not, depending on the situation. Then again, if the source or destination is
video memory, either will do, as access to video memory will be the limiting
factor.

What _does_ speed things up is to use 8-byte moves. So using MOVQ if MMX is
available, or FILD if a coprocessor is available, is faster.
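
The 8-byte idea can be approximated in portable C++ without MMX or FPU instructions. The following is only a sketch under that assumption: fill_words_wide is a hypothetical name, and whether the compiler turns each 8-byte memcpy into a single store depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fills `count` 16-bit elements with `value`,
// writing 8 bytes (four elements) per step where possible.
void fill_words_wide(std::uint16_t* dst, std::uint16_t value, std::size_t count)
{
    assert(dst != nullptr || count == 0);

    // Replicate the 16-bit value into all four 16-bit lanes of a 64-bit word.
    // Every lane is identical, so the result does not depend on byte order.
    const std::uint64_t pattern = 0x0001000100010001ULL * value;

    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        // memcpy avoids aliasing and alignment problems; optimizing compilers
        // typically lower a fixed-size 8-byte memcpy to one store.
        std::memcpy(dst + i, &pattern, sizeof pattern);
    }
    for (; i < count; ++i) {   // remaining 0 to 3 elements
        dst[i] = value;
    }
}
```

Whether this beats a plain loop has to be measured on the target machine, for the reasons given above.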

Stephen Howe

Paul Black

Mar 31, 2000, 3:00:00 AM

"Charles Wood" <c dot r dot wood @ worldnet.att.net.null> wrote:
>
> Never heard of one, but you could always make one real easy.
>
> Just be careful of starting on odd bytes. (Slows my comp down real fast).

memset should be as efficient as you can make it (assuming a good
quality compiler). If you start on an unsuitable boundary it should
clear memory until it gets to a suitable boundary then use a faster
(e.g. word/quadlet based) method.
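
The head/body/tail strategy described here might look roughly like the sketch below. It is not how any particular library implements memset; fill_bytes_aligned is a hypothetical name and the 4-byte boundary is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical byte fill that aligns first, then writes 4 bytes at a time.
void fill_bytes_aligned(void* dst, unsigned char value, std::size_t n)
{
    assert(dst != nullptr || n == 0);
    unsigned char* p = static_cast<unsigned char*>(dst);

    // Head: single bytes until p sits on a 4-byte boundary (or we run out).
    while (n > 0 &&
           reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uint32_t) != 0) {
        *p++ = value;
        --n;
    }

    // Body: the byte replicated into all four lanes, stored 4 bytes at a time.
    const std::uint32_t pattern = 0x01010101u * value;
    while (n >= sizeof pattern) {
        std::memcpy(p, &pattern, sizeof pattern);
        p += sizeof pattern;
        n -= sizeof pattern;
    }

    // Tail: whatever is left (0 to 3 bytes).
    while (n > 0) {
        *p++ = value;
        --n;
    }
}
```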

Paul

Tom

Mar 31, 2000, 3:00:00 AM

On Thu, 30 Mar 2000 20:31:47 GMT, user...@remove.hotmail.com (MikeC)
wrote:

>
>Is there any version of memset() which would work on words instead of
>bytes? For Visual C++ on Intel processors.
>

A standard way would be:

#include <algorithm>

int main()
{
    const int size = 512;
    // whatever your WORD type is:
    unsigned short* ptr = new unsigned short[size];
    // the "memset" line:
    std::fill(ptr, ptr + size, 0);
    delete[] ptr;
    return 0;
}

I have no idea how fast this would be - I suppose you could specialize
std::fill for various types like unsigned short, uint, etc, writing it
with the relevant memset. This might actually degrade performance
however.

Something like:

typedef unsigned short ushort;

template<> inline
void fill<ushort*, ushort>(ushort* _F, ushort* _L, ushort _X)
{
    memset(static_cast<void*>(_F), _X, sizeof(ushort) * (_L - _F));
}
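
One caveat with routing a 16-bit fill through memset is that memset writes a single byte value everywhere, so the shortcut only gives the right answer when both bytes of the fill value are equal (0x0000, 0x0101, 0xFFFF and so on). A hedged sketch of a helper that checks for that case; fill_ushort is a hypothetical free function, not a std::fill specialization, and it assumes 8-bit bytes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

typedef unsigned short ushort;

// Hypothetical helper: uses memset only when it is guaranteed to produce
// the same result as an element-wise fill, and falls back to std::fill
// otherwise.
void fill_ushort(ushort* first, ushort* last, ushort value)
{
    assert(first <= last);
    const unsigned char low  = static_cast<unsigned char>(value & 0xFF);
    const unsigned char high = static_cast<unsigned char>((value >> 8) & 0xFF);

    if (sizeof(ushort) == 2 && low == high) {
        // Both bytes of the value are identical, so a byte fill is equivalent.
        std::memset(first, low,
                    sizeof(ushort) * static_cast<std::size_t>(last - first));
    } else {
        std::fill(first, last, value);
    }
}
```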

Tom

James Curran

Mar 31, 2000, 3:00:00 AM

That won't work.

What everyone (except perhaps the original poster) seems to be forgetting
is that memset can be used for things *besides* zeroing out memory. You can
actually use a real value in it, e.g.:

char a[100];
int b[100];

memset(a,0,sizeof(a));
memset(b,0,sizeof(b));

have roughly the same effect on both a & b. However,

memset(a,1,sizeof(a)); // fills every element of a with 0x01
memset(b,1,sizeof(b)); // fills every element of b with 0x01010101

Now, I assume the original question was not how to use x86 assembler to
speed this up, but what command to use to fill every element of b with
0x00000001, and the answer to that is:

#include <algorithm>
std::fill(b, b + sizeof(b) / sizeof(b[0]), 1); // or
std::fill_n(b, sizeof(b) / sizeof(b[0]), 1);
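
Since sizeof on an array yields a byte count rather than an element count, one way to avoid mixing the two up is to let the compiler deduce the array length. This is only a sketch; fill_array and check_fill_vs_memset are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: fills a built-in array, deducing the element count N
// from the array type so a byte count is never mistaken for a length.
template <typename T, std::size_t N>
void fill_array(T (&arr)[N], const T& value)
{
    std::fill(arr, arr + N, value);
}

// Demonstrates the difference between a byte fill and an element fill.
void check_fill_vs_memset()
{
    int b[100];

    std::memset(b, 1, sizeof(b));  // every *byte* becomes 0x01
    assert(b[0] != 1);             // 0x01010101 when int is 32 bits wide

    fill_array(b, 1);              // every *element* becomes 1
    assert(b[0] == 1 && b[99] == 1);
}
```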
--
Truth,
James Curran
http://www.NJTheater.com
http://www.NJTheater.com/JamesCurran

"Tom" <rhino...@hotmail.com> wrote in message
news:38e4b649...@news.demon.co.uk...

Thomas_...@tecmar.com

Mar 31, 2000, 3:00:00 AM

In article <iiSE4.1521$TM.1...@bgtnsc06-news.ops.worldnet.att.net>,

"Charles Wood" <see@message> wrote:
> I was recently told that:
>
> start:
> ; do whatever here
> loop start
>
> was slower than:
>
> start:
> ; do whatever here
> dec cx
> cmp cx,0
> jg start
>
> whereas my compiler generates:
>
> repne movsd
>
> for Pentium-class code.
>
> Odd.
>

> --
> Charles Wood
> REMOVEME...@worldnet.att.net
>
> Nightcap wrote in message
> <6bSE4.1677$is2.1...@bgtnsc05-news.ops.worldnet.att.net>...
> >X-No-Archive: Yes

> >MikeC wrote:
> >> Is there any version of memset() which would work on words instead of
> >> bytes? For Visual C++ on Intel processors.

> >I doubt that, as such a function couldn't be generic. You however can
> >make one yourself, and not only a word-wide but a two-word one. Well,
> >depending on your processor, of course, but with i86s you can plop a
> >double-word at a time, so depending on what exactly you need and what
> >your data are you might speed up this operation significantly. Also,
> >keep in mind, if you're on a Pentium type of processor (what else is
> >there these days?) "rep" is not faster and _is_ slower than a simple
> >loop.

I have calculated that putting many copies inside the loop is faster:
; Assume R1 is source, R0 destination, R2 counter, R3 data transfer:
loop:
ldrh r3, [r1], #2
strh r3, [r0], #2
ldrh r3, [r1], #2
strh r3, [r0], #2
ldrh r3, [r1], #2
strh r3, [r0], #2
ldrh r3, [r1], #2
strh r3, [r0], #2
subs r2, r2, #4
bgt loop

{ARM7 Thumb platform}
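
For comparison, the same four-way unrolling can be sketched in C++. This is an illustration only: copy_halfwords_unrolled is a hypothetical name, and many compilers unroll a plain loop by themselves, so any gain would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies `count` 16-bit elements from src to dst,
// four per loop iteration, mirroring the unrolled assembly loop.
void copy_halfwords_unrolled(const std::uint16_t* src, std::uint16_t* dst,
                             std::size_t count)
{
    assert((src != nullptr && dst != nullptr) || count == 0);

    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }
    for (; i < count; ++i) {   // leftover 0 to 3 elements
        dst[i] = src[i];
    }
}
```

Unlike the assembly version, the cleanup loop lets the count be any value rather than a multiple of four.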


Guillaume Landru

Apr 3, 2000, 3:00:00 AM

Sorry to arrive a bit late in this thread but did anybody try the following?

#pragma intrinsic(memset)

Depending on the count argument, the compiler chooses the better
implementation.

It also works for other functions, such as memcpy, str*, ...
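
A small sketch of how the pragma might be placed in a source file. The _MSC_VER guard is an addition so that the file still compiles with other compilers, and clear_buffer is a hypothetical function used only to give the pragma something to act on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Ask Visual C++ to treat memset as an intrinsic. Other compilers skip this
// block because of the guard.
#ifdef _MSC_VER
#pragma intrinsic(memset)
#endif

// Hypothetical helper: zeroes the first n bytes of buf.
void clear_buffer(unsigned char* buf, std::size_t n)
{
    assert(buf != nullptr || n == 0);
    std::memset(buf, 0, n);
}
```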

Cheers,
Guillaume.

MikeC wrote in message <38e3b933...@news.cig.mot.com>...
>