On 10/14/2023 8:48 PM, MitchAlsup wrote:
> On Saturday, October 14, 2023 at 1:17:44 PM UTC-5, BGB wrote:
>> On 10/14/2023 11:33 AM, Michael S wrote:
>>> On Friday, October 13, 2023 at 9:13:04 PM UTC+3, John Levine wrote:
>>>> According to John Dallman <j...@cix.co.uk>:
>>>>> Overall, Oracle's vertical integration plan for making their database
>>>>> work better on their own hardware was not a success. It turned out that
>>>>> Oracle DB and SPARC Solaris were already pretty well-tuned for each other,
>>>>> and there were no easy gains there.
>>>> What could they do that would make it better than an ARM or RISC V chip
>>>> running at the same speed? Transactional memory?
>>>>
>>>
>>> It seems, by the time of the acquisition (late 2009), it was already known
>>> internally that Rock (the SPARC processor with TM support) was doomed,
>>> although it was not announced publicly until the next year.
>>>
>> I also personally have difficulty imagining what exactly one could do in
>> a CPU that would give it much advantage for database tasks that would
>> not otherwise make sense with a general purpose CPU.
> <
> Bigger caches and a slower clock rate help databases but do not help general
> purpose. More, slower CPUs and a thinner cache hierarchy help. Less prediction
> helps too.
> <
> So, instead of 64KB L1s and 1MB L2s and 8MB L3s:: do a 256KB L1 and
> 8MB L2 with no L3s.
> <
> It does not matter how fast the clock rate is if you are waiting for memory to
> respond.
OK.
In my own use cases, 16/32/64K seemed near optimal, but I guess if the
working sets are larger, this could favor bigger L1s.
It had seemed like, at 32K or 64K, one tends toward a 98 or 99% hit rate.
In the past, my attempts to increase MHz tended to come at the cost of a
significant reduction in L1 cache size (mostly, the "magic" being to
make the caches small enough to fit effectively into LUTRAMs).
The disadvantage is that this offsets any gains from more MHz with more
cycles spent on cache misses.
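FWIW, the sort of back-of-the-envelope math I tend to use for this
trade-off looks something like the following (the hit rates, cycle
counts, and miss penalty here are all made-up illustrative numbers, not
measurements; which side wins depends entirely on what one plugs in):

/*
 * Effective time per memory access: smaller L1 at a higher clock vs.
 * a bigger L1 at a lower clock, assuming a fixed miss-service time in
 * ns (the DRAM does not speed up with the core clock).
 */
#include <stdio.h>

static double ns_per_access(double mhz, double hit_rate,
                            double hit_cycles, double miss_ns)
{
    double cyc_ns = 1000.0 / mhz;   /* ns per clock cycle */
    return hit_rate * (hit_cycles * cyc_ns) +
           (1.0 - hit_rate) * (hit_cycles * cyc_ns + miss_ns);
}

int main(void)
{
    /* hypothetical: tiny LUTRAM-friendly L1 @ 75MHz, ~95% hit rate;
       32K L1 @ 50MHz, ~99% hit rate; ~100ns to service a miss. */
    printf("small L1 @ 75MHz: %5.1f ns/access\n",
           ns_per_access(75.0, 0.95, 2.0, 100.0));
    printf("big   L1 @ 50MHz: %5.1f ns/access\n",
           ns_per_access(50.0, 0.99, 2.0, 100.0));
    return 0;
}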
I now have it "mostly working" (with the bigger caches), but still need
to determine whether "more MHz" is enough to offset the significant
increase in clock cycles spent on interlock / register-RAW stalls. It
still falls short of consistently passing timing, though (and also
currently lacks the low-precision FP-SIMD unit).
As-is:

Losses (for now):
  Disabled dedicated FP-SIMD unit (*2);
  Many more ops have higher latency;
  Dropped RISC-V support and 96-bit address space support for now;
  No more LDTEX instruction for now;
  Back to no FDIV or FSQRT ops;
  ...

Partial:
  Compare-Branch reduced to compare-with-zero only;
  For now, have disabled 128-bit ALU operations.
Running the ringbus and L2 cache at higher clock-speeds does seem to
have increased memory bandwidth (including for accessing external RAM).
*2: Pulling off single-precision operations with a 3-cycle latency at
75MHz still seemed to be asking a bit too much.
But, on the other side, dropping MHz to allow bigger L1s could also make
sense, if the code has a naturally higher L1 miss rate.
In my own use cases, 33 or 25 MHz wouldn't allow much gain over 50MHz,
apart from possibly allowing fully pipelined double-precision FPU
operations or similar.
>>
>> I would expect database workloads to be primarily dominated by memory
>> and IO.
> <
> More so memory and less so I/O once they got main memories in the TB region.
> Those TBs of main memory are used like the older index partitions on disks.
Hmm...
I still have 48GB; and (on average) the most RAM-intensive thing I tend
to do is running Firefox (well, except in rare cases where I try to
recompile LLVM and it brings my computer to its knees, *).
Seemingly, Firefox is a gas that expands to fill all available RAM
(until one periodically terminates the whole process tree and reloads
it).
*: Part of the reason I invested a lot more effort in BGBCC, which I can
rebuild in a few seconds without totally owning my PC. Never mind that I
frequently end up spending hours or days on long-running Verilator
simulations...
AFAIK, a lot of the servers were using 10K RPM SAS drives and similar.
>>
>> So, it seems like one would mostly just want a CPU with significant
>> memory bandwidth and similar. Maybe some helper operations for things
>> like string compare and similar.
>>
> Mostly irrelevant. You are waiting for the comparison values to arrive a lot
> more often than you spend performing the comparisons themselves. The
> whole B-Tree stuff used to be performed on the disks themselves, leaving
> the CPU(s) to do other stuff while the Tree was being searched.
?...
I hadn't heard of anything like this; none of the disk interfaces I am
aware of have presented much of an abstraction beyond that of 512B
sectors and linear block addresses.
RAID was typically multiple HDDs (with hardware error correction, ...),
but still presenting a "linear array of sectors" abstraction.
I am aware that older HDDs typically used C/H/S addressing. Early on,
one needed to set up the drive's geometry values exactly as printed on
the drive, or it wouldn't work.
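(As an aside, once drives presented the flat "array of sectors" view,
the mapping between the old C/H/S geometry and a linear block address
is just a bit of arithmetic over the drive geometry; a quick sketch,
with made-up geometry numbers:)

#include <stdio.h>

#define HEADS_PER_CYL    16u   /* example geometry, not from a real drive */
#define SECTORS_PER_TRK  63u   /* sectors are 1-based in CHS */

static unsigned chs_to_lba(unsigned c, unsigned h, unsigned s)
{
    return (c * HEADS_PER_CYL + h) * SECTORS_PER_TRK + (s - 1);
}

static void lba_to_chs(unsigned lba, unsigned *c, unsigned *h, unsigned *s)
{
    *c = lba / (HEADS_PER_CYL * SECTORS_PER_TRK);
    *h = (lba / SECTORS_PER_TRK) % HEADS_PER_CYL;
    *s = (lba % SECTORS_PER_TRK) + 1;
}

int main(void)
{
    unsigned c, h, s;
    unsigned lba = chs_to_lba(2, 5, 10);  /* -> 2340 with this geometry */
    lba_to_chs(lba, &c, &h, &s);
    printf("LBA=%u -> C=%u H=%u S=%u\n", lba, c, h, s);
    return 0;
}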
I guess, older still, apparently one had to keep the HDD and controller
matched, as a drive could not be read with a different type of
controller than the one that had initialized it (with different
controllers using different recording strategies, etc).
Apparently this was fixed with IDE drives, but AFAIK still existed with
floppy drives (but, with more standardization in terms of the recording
strategies used on floppy disks, since changing any of this would render
the floppies incompatible with those made on a different computer).
Though, admittedly, by the time I was using computers, they were mostly
already using IDE drives; but we still used floppies for a while. There
was also the wonkiness that early CD-ROM drives seemed to use
non-standardized cables and typically plugged into the soundcard rather
than using IDE.
It wasn't until after I was an adult that things moved over to SATA.
When I saw videos of early HDDs, I guess the difference was that they
were a lot bigger (roughly the same footprint as a CD-ROM drive but
twice as tall), and also connected with a pair of ribbon cables (in
addition to the power cable).
Well, and I guess there was some sort of older HDD tech where the HDD
was the size of a washing machine and had a lid that could be opened to
access a stack of platters.
Also saw a video where someone was trying to interface an 8-inch floppy
drive with a newer PC, but then had to deal with the issue that
apparently these drives ran on 110VAC / 60Hz and used 24V data signals
rather than 5V or 3.3V, ...
I guess someone could maybe get creative and mount one sideways in a
full-tower PC case?...
Dude: "Hey, check out my rig!", *kerchunk*, proceeds to pull out an 8
inch floppy... Then maybe rig up a power-bypass, and voltage leveling
stuff to try to plug it into a USB based floppy controller, maybe so
they could store data on it.
Or, maybe build one using some slightly newer tech, but still use 8-inch
floppies, to make "absurdly large" floppies (if one could achieve
similar recording density to a 3.5" floppy, they could fit significantly
more data on an 8" disk).
Then again, practically, this makes about as much sense as trying to
stick digital data onto a vinyl record...
Then again, it seemed like a lot of the Gen Z people gained an interest
in vinyl records; which for me seemed a little odd as this was old tech
even by my standards...
Waves cane, "Back in my day, we had CDs and MP3s!".
>>
>> Though, assuming the B-Trees were structured in an appropriate way,
>> things like "packed search" / "packed compare" instructions could be useful.
> <
> Why do you think this is a job for a CPU ??
I hadn't heard of there being anything else to run the B-Tree walks...
It seems like one would at least need a mechanism for moderately
efficient "strncmp()" and similar, though...