On Wednesday, October 31, 2018 at 9:18:26 AM UTC, Ivan Godard wrote:
> On 10/31/2018 1:16 AM, lk...@lkcl.net wrote:
> > On Wednesday, October 31, 2018 at 7:43:51 AM UTC, Ivan Godard wrote:
> >
> >>> stepping *outside* of the specification is, i feel, beyond the scope
> >>> of discussion.
> >>
> >> Then you have also stepped out of the domain of architecture, which must
> >> concern itself with the reality of physics and users, including failing
> >> and malicious ones. This is an architecture board, BTW
> >
> > ok, so allow me to clarify what i care about, so as to illustrate
> > precisely and exactly what i mean by "stepping outside of the
> > scope / specification"
> >
> > * there is a specification (finalised and implemented in the case
> > of RV-Base; proposed in the case of Multi-LR/SC)
> > * if there exists a design or other flaw that allows an arbitrary
> > user to execute malicious code or adversely affect anything other
> > than their own program, i care.
> > * if there exists a design or implementation flaw in the compiler
> > that allows the same, i care.
> > * if the USER does NOT CONFORM TO THE SPECIFICATION, stepping OUTSIDE
> > of that specification, and does NOT cause malicious damage, but
> > solely and exclusively causes their OWN (one) program to behave
> > in unexpected and unanticipated ways, i genuinely do not give a
> > rat's posterior.
>
> You show your naivete.
as someone who reverse-engineered NT Domains network traffic,
and was a member of ISS X-Force Security Research in 1999,
it is more likely that i display a lack of clarity in my ability
to communicate.
> Those of us who practice architecture know that
> defining what it does when it works is easy; the hard part is handling
> when it doesn't work, both ensuring that system stability is maintained
> and precluding deliberate exploit.
indeed: i apologise for assuming it would go without saying that
such mission-critical prerequisites are paramount requirements.
> > so yes of course (and i apologise for assuming that it would go
> > without saying), if the user manages to corrupt other peoples' data,
> > that's a problem. but if the sole outcome of failing to comply
> > with the specification is that their own program crashes? Not Our Problem.
>
> While not absolutely essential (as so often demonstrated) there is also
> an aesthetic aspect: even if it can be used when used correctly, your
> design should not invite error, nor cause unnecessary grief among those
> who must support it.
indeed. this is one of the fascinating aspects of RISC, namely that
RISC systems provide close to the absolute bare minimum of primitives,
which are an absolute pain in the ass to work with unless a compiler
(and associated libc6 support) has them "baked in" on the users'
behalf as intrinsics and so on.
where a normal CISC system would have a single instruction, easy,
call it, done, the RISC paradigm minimises and dramatically simplifies
the hardware, and expects the compiler writers, library writers and
assembly-level programmers to obey the contract or suffer the
consequences.
does a multi-LR/SC go even further than that, and impose on the
user even beyond the usual contract that RISC-based systems expect?
honestly i have no idea.
> >>>> Consider what happens if you enter your loop with already existing
> >>>> dangling LRs from some unrelated code that branched out in the middle?
> >>>
> >>> that would be a bug in the program (or, much less likely, the compiler).
> >>
> >> And your specification excludes bugs :-)
> >
> > nooo, i'm *asking* if there are any bugs in it. and also if there are
> > any real-world algorithms that would benefit from a multi-LR/SC
> > instruction.
> >
>
> There are no bugs in it, because you have defined them away.
great! i can get implementing, straight away! [i'm kidding...
simulations first... actually, corroborative constructive
input and review first... ]
> And
> essentially all programs with any parallelism at all, which could use a
> concurrent atomic update. So far you have not shown that multi-LR/SC (as
> you have described it) achieves concurrent atomic update.
*it* does not. a loop - a *sequence* of instructions - *uses* LR/SC to
test *whether* concurrent atomic update[s] have [retrospectively]
*been* achieved. this is completely different.
i've mentioned this subtle distinction about three times, now, and
each time you have not indicated that you understand this distinction,
and, unfortunately, continue to use the same [ambiguous, unclear]
language that insists, most unfortunately, that "i am saying that
multi-LR/SC *is* an atomic update", when it most certainly is not.
i have *never* said that "it" is.
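the distinction can be sketched with a toy python model (not RISC-V
code: `Hart`, `lr`, `sc`, `invalidate` and `add_with_retry` are names
invented purely for illustration). the LR/SC pair only reports whether
the reservation survived; it is the loop around it that retries until
a pass completes undisturbed. interference from another hart is
simulated, as an assumption, by a list of values written between the
LR and the SC.

```python
class Hart:
    """Toy model of one hardware thread holding a single reservation."""

    def __init__(self, memory):
        self.memory = memory      # shared dict: address -> value
        self.reserved = None      # address currently reserved, if any

    def lr(self, addr):
        """Load and place a reservation on addr."""
        self.reserved = addr
        return self.memory[addr]

    def sc(self, addr, value):
        """Store only if the reservation is still intact; report success."""
        ok = self.reserved == addr
        self.reserved = None      # the reservation is consumed either way
        if ok:
            self.memory[addr] = value
        return ok

    def invalidate(self, addr):
        """Called when something else touches addr."""
        if self.reserved == addr:
            self.reserved = None


def add_with_retry(hart, addr, delta, interference=()):
    """Retry loop: LR/SC detects the violation, the loop reacts to it."""
    pending = list(interference)
    attempts = 0
    while True:
        attempts += 1
        old = hart.lr(addr)
        if pending:
            # simulated write by another hart between the LR and the SC
            hart.memory[addr] = pending.pop(0)
            hart.invalidate(addr)
        if hart.sc(addr, old + delta):
            return attempts


memory = {0x100: 5}
attempts = add_with_retry(Hart(memory), 0x100, 1, interference=[10])
print(attempts, memory[0x100])
```

in this model the first pass is detected as violated and discarded,
and the second pass starts again from the value the other hart wrote.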
> Oh, everyone would like multi-atomic operations. And they are easy to
> do: just set a Whopping Big Lock (TM), do everything, and reset it.
indeed: in a cross-over post, where bruce kindly clarified about the
RISC-V spec, he mentioned that it's possible for implementors to
put the reservation on a cache line, on a TLB, and i mentioned that
hypothetically the reservation could be on the entirety of memory
[the load-reservation equivalent of a WBL (TM)]
> But
> you asked whether your
... not mine. i typically do not use personal pronouns to discuss
computing concepts. the use of personal pronouns typically indicates
efforts to mentally "claim ownership" (or, worse, to _assign_ ownership),
leading to difficulties in discussions, division and conflict, and, in a
lot of cases, leads to accidental "personal insult" where constructive
criticism of the *idea* leads to misinterpretation as "criticism and
attack of the person".
after 20 years of interacting in free software forums i've generally
found that dropping personal pronouns entirely ("your project",
"your idea") and using "the project" or "the idea" instead is
much less likely to be misinterpreted.
> multi-LL/SC is a way to do multi-atomic,
again: i did not say that, at all.
> and so far the answer is no.
... because you believe that i said "multi-LL/SC is a multi-atomic operation"
when i said nothing of the sort.
> > so the question is: can a multi-LR/SC with multiple memory reservations
> > just like mitch's 66000 ISA work as well?
>
> Not without a transaction notion.
>
> Single LL/SC works because the hardware atomicity provides a
> transactional notion for single loads and stores, although you need to
> experience unaligned memory accesses to appreciate how much a pain that
> is.
i believe i may have a way to help you to understand multi-LR/SC,
based as it is on single-LR/SC.
(1) let us assume that there are multiple LRs which happen to
fall on the same cache line
(2) let us assume that the hardware implementation reserves
memory addresses on a cache-line level of granularity
i believe it is quite trivially shown through this example that
even if there are N bytes in the cache line and there are N
single-byte separate and distinct LR instructions, the loop
around multi-LR/SC can trivially be shown to be absolutely
identical to the single-LR/SC case.
now let us expand that to a TLB page. again, the fact that
all of the LRs fall into the same TLB page shows that the
multi-LR/SC would be successful, i.e. directly equivalent
to a single-LR/SC.
now we move to there being two TLB pages. now four. now
50. now the entire TLB. now the entirety of memory
(the WBL (TM)).
in each case, the multi-LR/SC primitives may be shown
to be... "moot".
the next phase from there is to go back and say, "ok,
let's get one of the LRs to reserve cache line 1,
while the second reserves cache line 2". and if another
hardware thread issues an overlapping LR **ON EITHER
CACHE LINE 1 ***OR*** CACHE LINE 2**, then **BOTH
ARE INVALIDATED**.
is it _really_ hard to see that the extension of
single-LR/SC to multi-LR/SC semantics is directly
equivalent, where one places a global reservation on
a single area and the other *also* places a global
reservation... just on multiple cache lines (or other
suitable granular areas)?
context, for emphasis: LR/SC are *not* atomicity instructions.
they're primitives that are used to *DETECT* if a memory-based
load/store sequence was violated by another hardware thread / SMP core.
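the granularity argument above can be sketched as a toy python model
of the reservation bookkeeping only. everything here is an assumption
made for illustration: the 64-byte line size, and the names
`ReservationSet`, `lr`, `conflicting_access` and `sc_allowed`. it does
not model the stores themselves, only which reservations are held and
when they are dropped.

```python
LINE_BYTES = 64  # assumed cache-line size, for illustration only


class ReservationSet:
    """Multiple reservations, treated conceptually as one."""

    def __init__(self, granule=LINE_BYTES):
        self.granule = granule    # bytes covered by one reservation
        self.granules = set()     # indices of the reserved granules

    def lr(self, addr):
        """Reserve the whole granule that contains addr."""
        self.granules.add(addr // self.granule)

    def conflicting_access(self, addr):
        """Another hart touched addr: a hit on ANY granule drops ALL."""
        if addr // self.granule in self.granules:
            self.granules.clear()

    def sc_allowed(self, addr):
        """Would an SC to addr find its reservation intact?"""
        return addr // self.granule in self.granules


# N single-byte LRs on one cache line collapse into one reservation
same_line = ReservationSet()
for offset in range(LINE_BYTES):
    same_line.lr(0x1000 + offset)
print(len(same_line.granules))

# two cache lines: a conflict on either one invalidates both
two_lines = ReservationSet()
two_lines.lr(0x1000)
two_lines.lr(0x2000)
two_lines.conflicting_access(0x2008)
print(two_lines.sc_allowed(0x1000), two_lines.sc_allowed(0x2000))
```

raising `granule` to a page size, or to the size of all of memory,
gives the TLB-page and WBL cases of the same argument: every LR lands
in the same granule and the set never holds more than one entry.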
> >> Would the OS be able to use the
> >> proposal to implement context switch?
> >
> > very good question. in the question on the isa-dev mailing list
> > that led to this thread, it was hinted (without detail being
> > provided), that yes, multi-LR/SC can indeed be used to implement
> > a context-switch.
> >
> > another algorithm that was mentioned was this:
> >
> > "One example I know myself that would benefit quite a
> > lot from this is the popular Boehm-Demers-Weiser conservative
> > garbage collector"
> >
> > i do not have the expertise to evaluate these things, hence
> > the reason why i am asking for help.
>
> More circularity then:
not really.
> if a context switch invalidates the primitive,
> how can it implement the switch without invalidating itself?
this is invalid reasoning. once in the different context
(supervisor / machine mode), LR/SC usage in the higher context
level has absolutely nothing to do with the userspace context
from which it just switched.
under no circumstances would LR/SC be used or expected to be used
*over* i.e. *between* the userspace-supervisor or
supervisor-userspace boundary.
whilst in the supervisor space, LR/SC used there would work perfectly
(as long as no traps or context-switches occurred)
whilst in the user space, LR/SC used there would work perfectly
(as long as no traps or context-switches occurred)
> The problem here is that you are thinking too abstractly.
i'm just asking questions, and using this as an opportunity
to clarify.
> Use your own example:
> LL LL LL LL
> SC SC SC
> <interrupt, or some other invalidator>
the 4 load-reservations would be invalidated
> SC
this SC would fail (returning an indication to the program that
failure had occurred), the internal state reset to "zero reservations",
and the loop that goes round the 4-LLs plus 4-SCs would be
re-executed.
on re-execution, the 4 LLs would reserve (anew) the same 4 memory
addresses.
at *some* point, there would be a repetition of the loop that
did not involve an interrupt (or other invalidator).
it might take a while, and that's fine.
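the sequence described above can be traced with a small python sketch.
it tracks only reservation state and SC return codes, under these
assumptions: a successful SC consumes its own reservation, a failed SC
resets the state to "zero reservations", and an interrupt clears
everything. `run_pass` and `run_loop` are hypothetical names, and the
data written by the SCs is deliberately not modelled.

```python
def run_pass(addresses, interrupt_before_sc=None):
    """One pass of the loop: all the LLs, then all the SCs.

    interrupt_before_sc is the index of the SC just before which an
    interrupt (or other invalidator) fires, or None for no interrupt.
    Returns the success flag of each SC, in order.
    """
    reserved = set(addresses)          # the LLs reserve every address
    results = []
    for index, addr in enumerate(addresses):
        if index == interrupt_before_sc:
            reserved.clear()           # ALL reservations invalidated
        ok = addr in reserved
        if ok:
            reserved.discard(addr)     # this SC used its reservation
        else:
            reserved.clear()           # failure: back to zero reservations
        results.append(ok)
    return results


def run_loop(addresses, schedule):
    """Repeat passes until one completes with every SC succeeding.

    schedule holds one interrupt_before_sc value per pass; returns the
    number of passes needed, or None if the schedule ran out first.
    """
    for passes, interrupt in enumerate(schedule, start=1):
        if all(run_pass(addresses, interrupt)):
            return passes
    return None


addrs = [0x100, 0x200, 0x300, 0x400]
print(run_pass(addrs, interrupt_before_sc=3))   # LL LL LL LL SC SC SC <int> SC
print(run_loop(addrs, [3, None]))               # interrupted pass, then a clean one
```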
> How do you ensure all-or-none semantics on the SCs, as required by the
> others who are using the data structure concurrently with you?
not with me: i make no personal claim of the data structures,
nor mentally associate myself with them. see above as to why.
the hardware guarantees the invalidation of all four reservations,
just as the hardware guarantees the invalidation of one (in
the single-LR/SC case).
at *no time* would there *ever* be an instance where *only*
one of the memory addresses would be invalidated whilst other
(reserved) addresses were not.
this is required by the hardware. any hardware implementation
that allowed even a single other LR/SC instruction (on *any* hardware
thread / SMP core) to be executed in between the time when the
"invalidating" conditions are detected and the time when the
invalidation of the reservations is completed would be a clear
violation of the specification (i.e. of the load-reservation
contract).
it's really quite simple, ivan - a lot simpler than i believe
you are imagining it is. the multiple reservations are
conceptually treated as a single reservation. and, by
logical inference, if single-LR/SC works, then multi-LR/SC
should as well... AS LONG AS the multiple reservations ARE
indeed treated conceptually the same as a single one.
> I think I'm done; perhaps over-done. Have fun :-)
appreciated your time: you've helped enormously to clarify the
context of the question, so that others do not have to do the
same.
l.