> > Even in 1978, when the 8086 was launched, segmentation was recognized
> > as a poor choice for memory management.
> Well I don't recognize that even in 2021. What am I missing?
Another reason that we have clear evidence for: Intel's previous chip, the 8080, supported a 16-bit, 64KiB memory space. The dominant operating system for it was CP/M. Intel and Microsoft both made a serious effort to make the new environment as source-compatible with CP/M code as possible.
I will give examples from MS-DOS 1.0, because that was the most important and best-documented OS that took advantage of this feature. Back when the ISA was being developed, IBM had not yet chosen the 8088 for its Personal Computer Model 5150, there was a large library of 8-bit CP/M software, and memory was even more expensive; all the considerations I am about to mention were even more crucial then.
The segmentation scheme allowed an OS for the 8088/8086 to emulate an 8080 running CP/M with minimal hardware resources. Every MS-DOS program was initialized with a Program Segment Prefix, which, just as it says on the tin, was loaded at the start of the program segment. This was designed to emulate the Zero Page of CP/M. In particular, the 8080 instruction to make a system call in CP/M was CALL 5. If you use that instruction in an MS-DOS program, it will still work. The Program Segment Prefix will be loaded into CS, and CS:0005h contains a jump to the system-call handler of MS-DOS.
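To make the mechanics concrete, here is a small Python sketch of the address arithmetic only (nothing MS-DOS itself ran), showing how a real-mode segment:offset pair such as CS:0005h resolves to a 20-bit physical address; the PSP segment value is made up for illustration:

```python
def physical(seg, off):
    """8086 real-mode address translation: (segment << 4) + offset,
    truncated to the 20 address lines of the 8086/8088."""
    return ((seg << 4) + off) & 0xFFFFF

# Suppose MS-DOS loaded a program's PSP at segment 1234h (a made-up value).
# CALL 5 then targets CS:0005h, i.e. offset 5 into that segment:
assert physical(0x1234, 0x0005) == 0x12345
```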
Segmentation effectively gave every legacy program its own 16-bit memory space, allowing it to use 16-bit pointers as it always had. This both saved memory and saved 8-bit code from needing to be extensively rewritten. (Recall that most software today runs on 64-bit CPUs but ships as 32-bit programs because smaller pointers are more efficient; this was even more crucial on an original IBM PC Model 5150.)

A .COM program was based on the executable format of CP/M, and by default got a single segment for code, data and stack, so a program ported from CP/M could treat the 8086 as a weird 8080 where it had the whole 64KiB of memory to itself and the registers had different names. MS-DOS 2.0 let programs start with separate segments for their code, stack, data and extra data, still using 16-bit pointers and offsets.

Of course, a program aware of the full 20-bit address space could request more memory from the OS. In MS-DOS, it could request memory in 16-byte "paragraphs" (bigger than a word, smaller than a page) and would get it in a new relocatable segment, whose exact value it did not need to care about or waste a precious general-purpose register to store. (Unlike shared libraries for x86 and x86_64 today!)
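As a sketch of why a relocatable segment costs the program nothing, here is the same address arithmetic in Python (the segment values and block size are invented for illustration): a block of paragraphs handed out at any segment looks identical to the program, which addresses it with 16-bit offsets starting from zero.

```python
PARA = 16  # one MS-DOS "paragraph" is 16 bytes

def physical(seg, off):
    """8086 real-mode address: 20-bit (segment << 4) + offset."""
    return ((seg << 4) + off) & 0xFFFFF

def block_span(seg, paragraphs):
    """Physical address range covered by a block of the given size at seg."""
    start = physical(seg, 0)
    return (start, start + paragraphs * PARA - 1)

# The same 4KiB (100h-paragraph) block at two hypothetical segments:
# the program uses offsets 0000h..0FFFh either way, and never needs to
# know, or burn a register storing, where the block physically sits.
assert block_span(0x0800, 0x100) == (0x08000, 0x08FFF)
assert block_span(0x1A2B, 0x100) == (0x1A2B0, 0x1B2AF)
```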
But why not shift segment registers 16 bits to the left, instead of four, for even less complexity, and let programs use only the lower-order 8 or 16 bits of a 32-bit address space as offsets within the current segment? The advantage of the four-bit shift is that it allowed segments to start on any 16-byte boundary, instead of a 65,536-byte boundary. Almost any program back then needed far less than 64KiB of memory, and exceptionally few computers even shipped with the 256KiB that the CS, DS, ES and SS registers could address. The OS, the program in the foreground and every Terminate-and-Stay-Resident program could not all have gotten their own 16-bit address spaces, much less separate ones for their code, data and stacks, if every segment had needed to start on a 64KiB boundary. But with the memory model Intel used, programs could use 16-bit pointers with much smaller memory blocks.
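A quick back-of-the-envelope comparison of the two shift choices in Python; the numbers follow directly from the arithmetic, and the 256KiB figure is just the well-equipped early PC mentioned above:

```python
def segment_starts(shift, memory_bytes):
    """How many distinct addresses a segment could start at, given the shift."""
    return memory_bytes >> shift

RAM = 256 * 1024  # a generously equipped early PC

# Intel's 4-bit shift: a segment may begin on any 16-byte paragraph...
assert segment_starts(4, RAM) == 16384
# ...while a 16-bit shift would allow only four non-overlapping 64KiB
# slots in the same 256KiB: no finer packing of the OS, the foreground
# program, and resident programs would be possible.
assert segment_starts(16, RAM) == 4
```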