Does anyone have feedback, suggestions, or reasons for which this would
not be an unambiguous win? I think the current code is just horribly
crufty, to be honest. Testing a flag shouldn't require different code
for the first 32 flags than for the second 32.
-s
--
Copyright 2001, all wrongs reversed. Peter Seebach / se...@plethora.net
C/Unix wizard, Pro-commerce radical, Spam fighter. Boycott Spamazon!
Consulting & Computers: http://www.plethora.net/
> Now that 2.9.2 is out, I'm going to have another look at replacing
> flag1..flag6 with flags[6] and test/set macros smart enough to look for
> flags in the appropriate word.
What do you want to achieve by this? How is an array of 6 bitfields better
than 6 separate bitfields? You will only get a headache when rewriting the
declarations in init1.c.
Do it the NH way at least: pour all the flags into one giant bitfield (or
appropriate Angband equivalent) and let the compiler worry where to find
them.
> Does anyone have feedback, suggestions, or reasons for which this would
> not be an unambiguous win? I think the current code is just horribly
> crufty, to be honest. Testing a flag shouldn't require different code
> for the first 32 flags than for the second 32.
How come? It doesn't, unless you call referencing another structure member
"different code".
GSN
Testing flags in parallel is a good thing about the current system.
Look at the monster AI, which masks off the "useful" spells of a
particular type, for example.
That part of the game is designed for speed and portability, over
prettiness.
Steven
>What do you want to achieve by this?
Eliminating any hint of caring which word is being set. Imagine,
for a moment, the implications of
if (flag3 & RF4_FOO)
if RF4 hints at something being in flag 4.
>How is an array of 6 bitfields better
>than 6 separate bitfields?
You don't need to remember which one is which. You don't have all
this crufty code copying all six flags into dummy variables, testing
things against them...
Also, we can drop the "RF1" vs "RF3" thing. All flags can just be "RF" or
whatever.
>You will only get a headache when rewriting the
>declarations in init1.c.
I don't see why. That code looks easy.
>Do it the NH way at least: pour all the flags into one giant bitfield (or
>appropriate Angband equivalent) and let the compiler worry where to find
>them.
Well, that depends. For compatibility, it seems as though the same 96
or 192 or however many bits should be in use, but that's no big deal.
>> Does anyone have feedback, suggestions, or reasons for which this would
>> not be an unambiguous win? I think the current code is just horribly
>> crufty, to be honest. Testing a flag shouldn't require different code
>> for the first 32 flags than for the second 32.
>How come? It doesn't, unless you call referencing another structure member
>"different code".
Well, it's a point of possible failure. It allows for the *possibility*
of doing it wrong, and there's no *reason* for it.
Basically, I am unable to find anything in the code which *benefits* from
these being separately declared objects.
Note also:
object1.c: if (f2 & (TR2_RES_NEXUS))
wizard1.c: if (flags3 & (RF3_RES_NEXU)) vp[vn++] = "nexus";
Why should we need to remember that the TR RES_NEXUS flag is in TR2, but
the RF RES_NEXUS flag is in RF3? Why not just
if (flag(o_ptr, TR_RES_NEXUS))
or
if (flag(r_ptr, RF_RES_NEXUS))
or something similar?
These are not six separate and unrelated things; this is a single set of
flags. It should be spelled that way.
I'm not sure I understand this explanation. What's it doing that it
couldn't do with a generic flag bit?
I do see some potential for things that you can currently do by or'ing
flags... but those are all annoyingly dependent on ordering and grouping,
and it'd be nice to solve the problem more generically.
>That part of the game is designed for speed and portability, over
>prettyness.
I see no portability issues here. This is easy to do with a macro that
will work everywhere. :)
I've seen this happen. Unlike most bugs - this one is fairly easy to
spot, and even easier to fix.
Steven
You can do things in parallel for speed. This is done quite a lot in the
more heavily optimised parts of the code.
>
> I do see some potential for things that you can currently do by or'ing
> flags... but those are all annoyingly dependent on ordering and grouping,
> and it'd be nice to solve the problem more generically.
Annoyingly dependent on ordering and grouping? Since when do you need
to change the position of a flag?
>
> >That part of the game is designed for speed and portability, over
> >prettiness.
>
> I see no portability issues here. This is easy to do with a macro that
> will work everywhere. :)
Bitflags tend to have portability problems due to endian differences.
So you do see the speed issue then? A lot of the things in the code are
optimised for speed and lack of memory usage. The current flags code
is no exception.
I'd agree with you if the "ugliness" of the code was a big problem...
but it simply isn't one.
Steven
I wouldn't worry about that, you can parallelize array ops.
>> I do see some potential for things that you can currently do by or'ing
>> flags... but those are all annoyingly dependent on ordering and grouping,
>> and it'd be nice to solve the problem more generically.
>Annoyingly dependent on ordering and grouping? Since when do you need
>to change the position of a flag?
You rarely do, but why should you need to know?
>Bitflags tend to have portability problems due to endian differences.
Not at all, or at least, none they don't have right now.
>So you do see the speed issue then? A lot of the things in the code are
>optimised for speed and lack of memory usage. The current flags code
>is no exception.
I don't think there's a measurable speed issue on any modern compiler.
>I'd agree with you if the "ugliness" of the code was a big problem...
>but it simply isn't one.
I dunno. It looks ugly to me.
Really? Can you tell me how so I can speed up some other code?
>
> >> I do see some potential for things that you can currently do by or'ing
> >> flags... but those are all annoyingly dependent on ordering and grouping,
> >> and it'd be nice to solve the problem more generically.
>
> >Annoyingly dependent on ordering and grouping? Since when do you need
> >to change the position of a flag?
>
> You rarely do, but why should you need to know?
Why not know? You can use the extra information to your advantage...
> >So you do see the speed issue then? A lot of the things in the code are
> >optimised for speed and lack of memory usage. The current flags code
> >is no exception.
>
> I don't think there's a measurable speed issue on any modern compiler.
Have a look at the AI code. It uses masking all over the place. The AI
is a major bottleneck of the game.
If you think you can make this faster - tell me how. It is very easy to
make it slower.
> >I'd agree with you if the "ugliness" of the code was a big problem...
> >but it simply isn't one.
>
> I dunno. It looks ugly to me.
Something to think about: I've seen a total of one instance where the
flags got mixed up, in some rather active development of the game. This
bug was fairly easy to find. Even though it may look bad to you - it
doesn't affect development in any real way. That is all that counts
IMHO.
Steven
> Eliminating any hint of caring which word is being set. Imagine,
> for a moment, the implications of
> if (flag3 & RF4_FOO)
> if RF4 hints at something being in flag 4.
This is certainly a problem; however, any experienced maintainer makes the
association unconsciously :)
> >How is an array of 6 bitfields better
> >than 6 separate bitfields?
>
> You don't need to remember which one is which. You don't have all
> this crufty code copying all six flags into dummy variables, testing
> things against them...
The flags are never copied, nor do they need to be.
After your change you'll get r_ptr->flags[1] & RF1_FOO. You have gained
nothing and added an extra pair of brackets.
> Also, we can drop the "RF1" vs "RF3" thing. All flags can just be "RF" or
> whatever.
No way, how will you remember which one goes into which bitfield?
> >You will only get a headache when rewriting the
> >declarations in init1.c.
>
> I don't see why. That code looks easy.
Oh, you'll have to join the six separate arrays into one, won't you?
>
> Why should we need to remember that the TR RES_NEXUS flag is in TR2, but
> the RF RES_NEXUS flag is in RF3? Why not just
> if (flag(o_ptr, TR_RES_NEXUS))
> or
> if (flag(r_ptr, RF_RES_NEXUS))
> or something similar?
Looks good in theory, but how about the details?
GSN
Parallelism isn't lost if you change the six variables to an array of
six variables.
You can still test them in parallel if you want. And you can use indexed
access, e.g. for iterating over all six fields if you need. Flexibility is
the gain you get.
Nevertheless I'd suggest a bitfield, like Gwidon did in the other branch
of the thread, this is the only proper solution.
> Steven
c.u.
Hajo
If the current system exists because of speed, why not just do:
#define P_RES_NETHER (flag2 & RF2_RES_NETHER)
Thus gaining the best of both worlds. (Easier to read code, and fast code.)
Ah well.
Thanks
Luc "Simplistic" French
> If the current system exists because of speed, why not just do:
> #define P_RES_NETHER (flag2 & RF2_RES_NETHER)
Of course you know that flag2 must be initialized prior to the expansion
of this macro, and there may be bugs galore if you forget about it. All in
all, it's how you should NOT write macros.
You could amend this by defining RES_NETHER(x) where x is the monster's
structure, but that is rather self-defeating.
GSN
True.
>> >How is an array of 6 bitfields better
>> >than 6 separate bitfields?
>>
>> You don't need to remember which one is which. You don't have all
>> this crufty code copying all six flags into dummy variables, testing
>> things against them...
>
>The flags are never copied, nor do they need to be.
object_flags()
Of course, object_flags() may be somewhat awkward to re-do in this
manner.
>After your change you'll get r_ptr->flags[1] & RF1_FOO. You have gained
>nothing and added an extra pair of brackets.
He's talking macros that hide the precise location of the flag from you.
>> Also, we can drop the "RF1" vs "RF3" thing. All flags can just be "RF" or
>> whatever.
>
>No way, how will you remember which one goes into which bitfield?
You don't care; the idea is that the compiler knows.
--
Julian Lighton jl...@fragment.com
"ISO 9001 certification is a license to kill!" -- MST3K
>> I wouldn't worry about that, you can parallelize array ops.
>Really? Can you tell me how so I can speed up some other code?
The compiler does it. Explicitly parallelizing code is just plain silly
on modern compilers.
>> You rarely do, but why should you need to know?
>Why not know? You can use the extra information to your advantage...
No, you can't. You can do things which, on a PDP-11 compiler, might
have produced a marginal benefit. It's not *useful* now.
>Have a look at the AI code. It uses masking all over the place. The AI
>is a major bottleneck of the game.
Okay. That's a good thing to look at. To what extent is it depending on
certain groups of bits all being in the same word?
>Something to think about: I've seen a total of one instance where the
>flags got mixed up, in some rather active development of the game. This
>bug was fairly easy to find. Even though it may look bad to you - it
>doesn't affect development in any real way. That is all that counts
>IMHO.
It affects development all the time; witness debates about whether a flag
bit is "free", or why something is in flags3 or flags4. All of that is
an artifact of treating a single set of 192 flags as if it were six totally
unrelated objects.
Now, to be fair, there *are* groupings... but they aren't precisely on the
word boundaries, and a whole word is rarely a meaningful grouping.
That said, there's a good case to be made for an organization such that,
in general, sets of related flags are all in one place. I'm not sure how
much it matters.
If we're going to argue performance, we should probably be profiling the code
and finding out how much time we currently spend masking things, in different
ways. My guess is that there are a lot of other things that would go first;
for instance, if I wanted Angband to be fast, I'd calculate chances
"correctly" up front, rather than re-rolling until I got a result I liked
every time. That's an operation which has no upper bound on how long it
takes. :)
>The flags are never copied, nor do they need to be.
I had misread the object_flags stuff, but there are places in wizard where
flags are copied into variables named "flags1".
>> Also, we can drop the "RF1" vs "RF3" thing. All flags can just be "RF" or
>> whatever.
>No way, how will you remember which one goes into which bitfield?
I don't understand the question. My proposal just has flags be numbered,
e.g., 0 through 191, and they get set and masked in the corresponding
words of an array.
>> I don't see why. That code looks easy.
>Oh, you'll have to join the six separate arrays into one, won't you?
If my initial skimming is correct, yeah, but I'd probably prefer to make
it a generated thing so the enum and the array of strings line up.
>> Why should we need to remember that the TR RES_NEXUS flag is in TR2, but
>> the RF RES_NEXUS flag is in RF3? Why not just
>> if (flag(o_ptr, TR_RES_NEXUS))
>> or
>> if (flag(r_ptr, RF_RES_NEXUS))
>> or something similar?
>Looks good in theory, but how about the details?
I am *SO* glad I just wrote the code for the spellbook hack rather than
asking about it. It's obvious to a casual observer that you can't do this
to spellbooks, either.
It'll be marginally more expensive when you actually need all the flags,
and somewhat cheaper in the most common cases, where you only use a few
of them.
>>After your change you'll get r_ptr->flags[1] & RF1_FOO. You have gained
>>nothing and added an extra pair of brackets.
>He's talking macros that hide the precise location of the flag from you.
Yes. The code would be
flag(r_ptr, RF_FOO);
and no "RF1" or "RF2".
#define flag(p, n) ((p)->flags[(n)/32] & (1 << ((n) % 32)))
For extra credit, expand it to return 0 for n out of range. (Since range
differs, you might want "rflag" and "oflag".)
Now, I know *someone* will jump on me for the expensive math, but remember,
RF_FOO is a *CONSTANT*. I doubt Angband targets a single compiler that
won't fold this all up at compile time.
>You don't care; the idea is that the compiler knows.
Exactly! And that, my friends, is the perceived "advantage".
Thinking about this further, there's a number of flag-related ops that
might want special treatment... but I think it'd be a win to centralize
that treatment. object_flags{,_aux} is useful, but it's doing more work
than it probably needs to in a lot of cases.
Honestly, the most expensive part of this is still the distinction between
flags a thing has and flags the player knows about, but that's pretty easy
to handle. Indeed, we can do exactly what we do now, except that instead
of declaring three objects to hold flag values, we declare one.
"The compiler does it." doesn't help me. Which particular code
fragments does the compiler optimise to execute in that way? In my
experience, a human can quite often out-optimise a compiler by using
different data structures. This particular instance comes under that
case.
>
> >> You rarely do, but why should you need to know?
>
> >Why not know? You can use the extra information to your advantage...
>
> No, you can't. You can do things which, on a PDP-11 compiler, might
> have produced a marginal benefit. It's not *useful* now.
Really?
I'd like to see a compiler convert
for (i = 0; i < 96; i++)
{
    if (mask[i] && flag_set(flag, i))
    {
        flag_on(flag, i);
    }
    else
    {
        flag_off(flag, i);
    }
}
into
flag1 &= mask1;
flag2 &= mask2;
flag3 &= mask3;
In my experience, compilers aren't that smart.
>
> >Have a look at the AI code. It uses masking all over the place. The AI
> >is a major bottleneck of the game.
>
> Okay. That's a good thing to look at. To what extent is it depending on
> certain groups of bits all being in the same word?
It doesn't depend on that.
It depends on being fast.
>
> >Something to think about: I've seen a total of one instance where the
> >flags got mixed up, in some rather active development of the game. This
> >bug was fairly easy to find. Even though it may look bad to you - it
> >doesn't affect development in any real way. That is all that counts
> >IMHO.
>
> It affects development all the time; witness debates about whether a flag
> bit is "free", or why something is in flags3 or flags4.
Don't be silly. The "free flag" debates still need to look in defines.h
in any case. In most cases there is no reason "why" something is in a
particular flag - so there is no point arguing about that either.
> All of that is
> an artifact of treating a single set of 192 flags as if it were six totally
> unrelated objects.
Nope - it is an artifact of someone not understanding how the code
works.
>
> Now, to be fair, there *are* groupings... but they aren't precisely on the
> word boundaries, and a whole word is rarely a meaningful grouping.
>
> That said, there's a good case to be made for an organization such that,
> in general, sets of related flags are all in one place. I'm not sure how
> much it matters.
>
> If we're going to argue performance, we should probably be profiling the code
> and finding out how much time we currently spend masking things, in different
> ways.
I already have done this. There was a significant speedup when I
converted the code from something that uses masks, to something that
uses masks in parallel. You want to convert it to something that will
only be serial... that will be noticeably slow.
> My guess is that there are a lot of other things that would go first;
> for instance, if I wanted Angband to be fast, I'd calculate chances
> "correctly" up front, rather than re-rolling until I got a result I liked
> every time. That's an operation which has no upper bound on how long it
> takes. :)
Are you talking about the autoroller? You might want to look at the
formula used to work out the acceptance or rejection of a roll.
Calculating the probability isn't trivial.
Steven
Though object_flags is hardly an efficiency bottleneck.
>>>After your change you'll get r_ptr->flags[1] & RF1_FOO. You have gained
>>>nothing and added an extra pair of brackets.
>
>>He's talking macros that hide the precise location of the flag from you.
>
>Yes. The code would be
> flag(r_ptr, RF_FOO);
>and no "RF1" or "RF2".
> #define flag(p, n) ((p)->flags[(n)/32] & (1 << ((n) % 32)))
>
>For extra credit, expand it to return 0 for n out of range.
I think an error would be preferred, if possible. (Though an error
that would work even if it's used with run-time-determined flags,
while not complicating things the rest of the time, would be beyond
me.)
>Now, I know *someone* will jump on me for the expensive math, but remember,
>RF_FOO is a *CONSTANT*. I doubt Angband targets a single compiler that
>won't fold this all up at compile time.
In the simple case, I suspect it won't be any performance difference.
(When you're checking multiple flags in the same word, I'd expect it
to depend on how smart the compiler is.)
The place I see it possibly hurting is choose_attack_spell(). (melee2.c)
While it's not the efficiency bottleneck of the view/lite code, or of
find_safety and find_hiding, it's called a lot, while the player is
waiting for a turn, and does huge amounts of bit-masking. It might
become noticeable on slow machines (like this Mac LC here) under
semi-degenerate conditions. (Lots of spellcasters.)
(Still, one could always get behind the protective macros, and do
old-fashioned bit-bashing on the array itself. I'm sure one could use
macros to create a constant array of bit-masks based on the RF_
values. If not, it could be initialized at game start-up.)
I have to wonder how easy it would be to do macro up more complicated
flag-mangling, though it's not like I've thought too hard about it.
Having actually gone through the source to add additional flag
variables, I can say that it's a royal pain.
(And, if we're at it anyway, why not split the monster flags into
flags and spell flags?)
--
Julian Lighton jl...@fragment.com
"Sleep my friend and you will see / That dream is my reality
They keep me locked up in this cage
Can't they see it's why my brain says rage?" -- Metallica
> #define flag(p, n) ((p)->flags[(n)/32] & (1 << ((n) % 32)))
Replace (1 << foo) with ((1U & 0xffffffffU) << foo).
'U' suffix to make it unsigned, &0xffffffffU to make the type big enough
for 32-bit math without using unnecessary 'long' (and slower) types.
> For extra credit, expand it to return 0 for n out of range. (Since range
> differs, you might want "rflag" and "oflag".)
Or cruftify the new flag macros so that simple constants give
compile-time errors, and the flags used outside your macros do too (in
case the old bit-mask flags are used as arguments to the new macro, or
the new flags in code which expects bit-masks). E.g.:
#define clear_flag(var,flag) ((void) FLAG_OP(var, &=~ , flag))
#define FLAG_OP(var,op,flag) (var[FLAG_NUM flag / 32] op \
((1U & 0xffffffffU) << (FLAG_NUM flag % 32)))
#define FLAG_NUM(num,dummy) num
#define RF_UNIQUE (0, Unique Monster)
...
Now 'flag & RF_UNIQUE' is definitely not going to compile...
--
Hallvard
>> #define flag(p, n) ((p)->flags[(n)/32] & (1 << ((n) % 32)))
>Replace (1 << foo) with ((1U & 0xffffffffU) << foo).
>'U' suffix to make it unsigned, &0xffffffffU to make the type big enough
>for 32-bit math without using unnecessary 'long' (and slower) types.
>> For extra credit, expand it to return 0 for n out of range. (Since range
>> differs, you might want "rflag" and "oflag".)
>Or cruftify the new flag macros so that simple constants give
>compile-time errors, and the flags used outside your macros do too (in
>case the old bit-mask flags are used as arguments to the new macro, or
>the new flags in code which expects bit-masks). E.g.:
>#define clear_flag(var,flag) ((void) FLAG_OP(var, &=~ , flag))
>#define FLAG_OP(var,op,flag) (var[FLAG_NUM flag / 32] op \
> ((1U & 0xffffffffU) << (FLAG_NUM flag % 32)))
Yeah, but that's heavily pessimized code there. Somebody's got to
write code to let you mask the flag array, but there's no reason why
it can't be the person who wrote the rest of the flag-handling macros.
(I suppose that if you wanted to do something weird, you'd need to
crack open the internal data structures, but I don't think
bulk-masking is that weird.)
>> >Have a look at the AI code. It uses masking all over the place. The AI
>> >is a major bottleneck of the game.
>>
>> Okay. That's a good thing to look at. To what extent is it depending on
>> certain groups of bits all being in the same word?
>
>It doesn't depend on that.
>
>It depends on being fast.
How fast does the spell-choosing code need to be? I know I optimized
it when I was working on the AI stuff, but that was partly just to
make it cleaner. I don't remember it ranking that highly when I
profiled the old version. (But that was quite a while back, so I could
be forgetting.)
--
Julian Lighton jl...@fragment.com
"Don't try to reach me, 'cause I'd tear up your mind" -- Black Sabbath
> >It depends on being fast.
>
> How fast does the spell-choosing code need to be? I know I optimized
> it when I was working on the AI stuff, but that was partly just to
> make it cleaner. I don't remember it ranking that highly when I
> profiled the old version. (But that was quite a while back, so I could
> be forgetting.)
[Z] has pits/nests where all the monsters are of the same type. Try
walking next to a ring mimic pit (for example - a ring mimic is a
'typical' monster with lots of spells), with and without that speedup.
The old code was so slow that the game used to freeze in such situations
on my 486 75. The current code is much faster, so that other areas
become the bottleneck.
> --
> Julian Lighton jl...@fragment.com
> "Don't try to reach me, 'cause I'd tear up your mind" -- Black Sabbath
Steven
I know. Z pits (That's Zephyr hound, not Zangband) were what caused
me to optimize find_hiding().
>Try
>walking next to a ring mimic pit (for example - a ring mimic is a
>'typical' monster with lots of spells), with and without that speedup.
>The old code was so slow that the game used to freeze in such situations
>on my 486 75.
I don't want to think about it on this 68020 here, then.
> The current code is much faster, so that other areas
>become the bottleneck.
I think the main speedup comes from removing all the function calls
Keldon's code used. You've still got a loop through 96 spells. The
code you posted would add another one, true, but I don't know how much
it would hurt.
(Btw, having just looked at the code again for the first time in ages,
you can combine all three of those loops at the end into one. There's
no reason I can think of that the flags4 spells have to go in before
the flags5, etc. Don't know how I missed that the first time around.)
--
Julian Lighton jl...@fragment.com
"He got feet down below his knee
Hold you in his armchair you can feel his disease"
-- The Beatles
>>For extra credit, expand it to return 0 for n out of range.
>I think an error would be preferred, if possible. (Though an error
>that would work even if it's used with run-time-determined flags,
>while not complicating things the rest of the time, would be beyond
>me.)
Can't Be Done. Actually, it sorta can, but not portably. ;) It's
certainly possible to generate an error in some cases...
>The place I see it possibly hurting is choose_attack_spell(). (melee2.c)
Hmm.
>While it's not the efficiency bottleneck of the view/lite code, or of
>find_safety and find_hiding, it's called a lot, while the player is
>waiting for a turn, and does huge amounts of bit-masking. It might
>become noticeable on slow machines (like this Mac LC here) under
>semi-degenerate conditions. (Lots of spellcasters.)
True... although that might be easily solved by a little caching.
>(And, if we're at it anyway, why not split the monster flags into
>flags and spell flags?)
This would actually solve the problem *better*, because one of the
most common masking things would no longer be a problem.
If you find yourself doing a lot of masks involving bunches of bits, that's
a sign that you may be storing two distinct things in one variable, and that's
never a good sign. :)
My thinking is:
Phase 1: Just turn everything into arrays.
Phase 2: Start using test/set macros instead of bit twiddling.
Phase 3: Remove the hints as to which word a flag is in, because
they're irrelevant in most cases.
Phase 4: Split words up if certain subsets need to be masked *often*.
>Replace (1 << foo) with ((1U & 0xffffffffU) << foo).
Why not just 1UL? :)
>'U' suffix to make it unsigned, &0xffffffffU to make the type big enough
>for 32-bit math without using unnecessary 'long' (and slower) types.
That's a little crufty, but I guess it's probably correct.
>> For extra credit, expand it to return 0 for n out of range. (Since range
>> differs, you might want "rflag" and "oflag".)
>Or cruftify the new flag macros so that simple constants give
>compile-time errors, and the flags used outside your macros do too (in
>case the old bit-mask flags are used as arguments to the new macro, or
>the new flags in code which expects bit-masks). E.g.:
>
>#define clear_flag(var,flag) ((void) FLAG_OP(var, &=~ , flag))
>#define FLAG_OP(var,op,flag) (var[FLAG_NUM flag / 32] op \
> ((1U & 0xffffffffU) << (FLAG_NUM flag % 32)))
>
>#define FLAG_NUM(num,dummy) num
>
>#define RF_UNIQUE (0, Unique Monster)
>...
>Now 'flag & RF_UNIQUE' is definitely not going to compile...
Hmm. That's outright Evil. Sorta clever, though.
>"The compiler does it." doesn't help me.
In practice, it certainly does.
You can't outperform a modern compiler *on multiple platforms*. Write the
code as legibly as possible, and don't try to tweak it unless you can show
that it's a bottleneck.
>Which particular code
>fragments does the compiler optimise to execute in that way?
That will depend on the compiler and the platform... but it does *now*,
and the *current* code is not necessarily run in parallel on any given
platform.
>In my
>experience, a human can quite often out-optimise a compiler by using
>different data structures. This particular instance comes under that
>case.
It may for a specific, single, processor/compiler combo. This is portable
code.
>I'd like to see a compiler convert
>for(i = 0; i < 96; i++)
>{
> if (mask[i] && flag_set(flag, i))
> {
> flag_on(flag, i);
> }
> else
> {
> flag_off(flag, i);
> }
>}
As long as the flag things are macros, this is pretty reasonable.
>In my experience, compilers aren't that smart.
They may not all be that smart... but keep in mind, the *flag* code is
certainly welcome to know about this kind of magic... It's just the
normal code which should be staying clear.
In other words, inside the flag code, if I want to do
flags_and(int *a, int *b, int MAX_FLAG)
I can do it, and it'll parallelize well, and modern compilers will
even inline it.
>> Okay. That's a good thing to look at. To what extent is it depending on
>> certain groups of bits all being in the same word?
>It doesn't depend on that.
>It depends on being fast.
I don't think discussion on this point is useful without profiling. The
lore is full of examples of very careful optimization effort which was
totally misplaced.
>Don't be silly. The "free flag" debates still need to look in defines.h
>in any case. In most cases there is no reason "why" something is in a
>particular flag - so there is no point arguing about that either.
There's good reasons to batch them, though, because you could potentially
replace a check of multiple separate words with a single op.
>I already have done this. There was a significant speedup when I
>converted the code from something that uses masks, to something that
>uses masks in parallel. You want to convert it to something that will
>only be serial...
Not really. If there's a regular need for masking one thing with another,
great, let's add a flag_and function to AND sets of flags.
>> My guess is that there are a lot of other things that would go first;
>> for instance, if I wanted Angband to be fast, I'd calculate chances
>> "correctly" up front, rather than re-rolling until I got a result I liked
>> every time. That's an operation which has no upper bound on how long it
>> takes. :)
>Are you talking about the autoroller?
And object generation, and monster generation, and level generation...
the game is *FULL* of things which just try things at random, with equal
probability, and then reject results to create higher or lower probabilities.
>You might want to look at the
>formula used to work out the acceptance or rejection of a roll.
>Calculating the probability isn't trivial.
Not at all... but if we did that for a bunch of the common ops (item
generation, monster generation), we could probably improve performance
noticeably. It's doable.
Situations where it would help are not really common enough to be worth
the effort.
>>(And, if we're at it anyway, why not split the monster flags into
>>flags and spell flags?)
>
>This would actually solve the problem *better*, because one of the
>most common masking things would no longer be a problem.
No, it wouldn't help. Monster spells are in RF[4-6], while other
flags are in RF[1-3]. I was just thinking that, if we're doing the
work anyway, it'd be nice to have r_ptr->flags and r_ptr->spells,
instead of just r_ptr->flags.
The bit-masking is being done to remove various sets of spells from
consideration, and to determine if the monster has any spells in the
sets.
--
Julian Lighton jl...@fragment.com
"In the dark I have heard a calling / Broken wings that beg to fly
Is there time or is it too late / Is the cut flower no longer alive"
-- Savatage
Still, it is possible for a human to do better, portable, optimization
than a compiler can. This is usually by throwing out bad
algorithms/data structures, and replacing them with better ones. (The
AI code of the past is a good example of this.)
The spell choice code would be slower with the code below instead of
the current stuff. How much slower, I can't say. (You'd probably
have to loop over the spell list twice, instead of once, but the
amount of work done within each iteration is fairly small.)
>>I'd like to see a compiler convert
>>for(i = 0; i < 96; i++)
>>{
>> if (mask[i] && flag_set(flag, i))
>> {
>> flag_on(flag, i);
>> }
>> else
>> {
>> flag_off(flag, i);
>> }
>>}
>
>As long as the flag things are macros, this is pretty reasonable.
Though it has to be done for seven different classes of spells. (Still
probably fairly negligible.)
>>In my experience, compilers aren't that smart.
>
>They may not all be that smart... but keep in mind, the *flag* code is
>certainly welcome to know about this kind of magic... It's just the
>normal code which should be staying clear.
Since the flag code can do the magic behind the scenes, the code will
probably end up doing about the same thing.
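As a sketch of that behind-the-scenes magic, the macros under discussion could map a flat flag index onto the right word and bit (these names are illustrative, not the actual proposal):

```c
#include <stdint.h>

/* Map a flat flag index to the right word and bit, so callers
   never care which underlying word a given flag lives in. */
#define FLAG_WORD(i)   ((i) / 32)
#define FLAG_BIT(i)    (1u << ((i) % 32))

#define flag_isset(f, i)  ((((f)[FLAG_WORD(i)]) & FLAG_BIT(i)) != 0)
#define flag_on(f, i)     ((f)[FLAG_WORD(i)] |= FLAG_BIT(i))
#define flag_off(f, i)    ((f)[FLAG_WORD(i)] &= ~FLAG_BIT(i))
```

Since the division and modulus are by a constant power of two, any reasonable compiler reduces them to a shift and a mask, so there's no per-access cost over the current hand-written word references.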
>>> Okay. That's a good thing to look at. To what extent is it depending on
>>> certain groups of bits all being in the same word?
>
>>It doesn't depend on that.
>
>>It depends on being fast.
>
>I don't think discussion on this point is useful without profiling. The
>lore is full of examples of very careful optimization effort which was
>totally misplaced.
The code in question is, I think, known to be a possible bottleneck,
even if only when you have a buttload of spellcasters in the vicinity.
>>> My guess is that there are a lot of other things that would go first;
>>> for instance, if I wanted Angband to be fast, I'd calculate chances
>>> "correctly" up front, rather than re-rolling until I got a result I liked
>>> every time. That's an operation which has no upper bound on how long it
>>> takes. :)
>
>>Are you talking about the autoroller?
>
>And object generation, and monster generation, and level generation...
>the game is *FULL* of things which just try things at random, with equal
>probability, and then reject results to create higher or lower probabilities.
Those are things that it's likely a waste of time optimizing.
Compared to the monster behavior code, they're never called at all.
(Of course, the light and view code is the real bottleneck, but Ben
seems to have done a pretty good optimization job there.)
--
Julian Lighton jl...@fragment.com
"I try never to get involved in my own life - too much trouble."
-- Babylon 5
Wrong. "The best optimiser is between your ears" is a good piece of
advice. You can outperform any compiler by changing to a different
algorithm.
>
> >Which particular code
> >fragments does the compiler optimise to execute in that way?
>
> That will depend on the compiler and the platform... but it does *now*,
> and the *current* code is not necessarily run in parallel on any given
> platform.
Of course not. However, I can guarantee that it will be faster than the
other method if you use any normal sort of processor. You just need to
know what operations are usually fast, and which ones usually are not.
> >In my
> >experience, a human can quite often out-optimise a compiler by using
> >different data structures. This particular instance comes under that
> >case.
>
> It may for a specific, single, processor/compiler combo. This is portable
> code.
Nope, I think you've missed the point. Changing the algorithm to
something more efficient will beat the compiler hands down. You know
what the code *does*, rather than what it *is*. The compiler doesn't
know this. You can use the extra information to do things the compiler
cannot.
>
> >I'd like to see a compiler convert
> >for(i = 0; i < 96; i++)
> >{
> > if (mask[i] && flag_set(flag, i))
> > {
> > flag_on(flag, i);
> > }
> > else
> > {
> > flag_off(flag, i);
> > }
> >}
>
> As long as the flag things are macros, this is pretty reasonable.
Is a factor of ten in speed reasonable?
> >In my experience, compilers aren't that smart.
>
> They may not all be that smart... but keep in mind, the *flag* code is
> certainly welcome to know about this kind of magic... It's just the
> normal code which should be staying clear.
Of course it is... but then you've abstracted all the "magic" one step
away - something that is much more likely to create bugs. It really
looks like this "fix" actually makes the code more problematic than
before. Wasn't that something you were hoping to change?
>
> In other words, inside the flag code, if I want to do
> flags_and(int *a, int *b, int MAX_FLAG)
> I can do it, and it'll parallelize well, and modern compilers will
> even inline it.
Yep - the problem is in defining what *b is... things aren't quite as
easy as you think. I can't see a compiler optimising an array of bools
into a flag of bits anytime soon.
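For what it's worth, the representation question can be sidestepped by packing the bits yourself: the quoted 96-iteration per-bit loop and a three-iteration word-wise AND compute the same result, so nothing needs the compiler to turn an array of bools into bits. A sketch with hypothetical helpers:

```c
#include <stdint.h>

static int  bit_test(const uint32_t *f, int i)  { return (f[i / 32] >> (i % 32)) & 1u; }
static void bit_clear(uint32_t *f, int i)       { f[i / 32] &= ~(1u << (i % 32)); }

/* Per-bit masking, 96 iterations -- the style of the quoted loop. */
static void mask_per_bit(uint32_t *flag, const uint32_t *mask)
{
    for (int i = 0; i < 96; i++)
        if (!bit_test(mask, i)) bit_clear(flag, i);
}

/* Word-wise masking, 3 iterations -- same result, far fewer ops. */
static void mask_per_word(uint32_t *flag, const uint32_t *mask)
{
    for (int w = 0; w < 3; w++)
        flag[w] &= mask[w];
}
```

Whether the gap is really a factor of ten depends on the processor, but the per-bit version clearly does more work per flag.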