...
Thanks Harvey, but another iteration of Torvalds whining like a baby
with a loaded diaper doesn't pass as 'information'. Just for the
record, Linus has a bad habit of going off the rails when reality
doesn't meet his expectations. You might want to review his
negative attitude toward TRIM commands on SSDs - it's as tho' he
failed to understand that the drive controller requires this
information to perform garbage collection.
For some better sources of information I'll suggest ...
https://meltdownattack.com/
(read the two linked papers for a deep understanding).
Also, this is an excellent paper (a 2013 preprint, presented in
2014) that shows the basis of all three attacks:
https://eprint.iacr.org/2013/448.pdf
--
In rough terms the basic attack mechanism requires a system where a
user process can reliably flush or evict cache entries, and has
access to a high-resolution clock that can distinguish DRAM from
cache accesses. On my new laptop the TSC (CPU tick) clock runs at
2 GHz (0.5 ns resolution); cache accesses take <60 ticks, while
DDR4-2400 accesses reliably take more than 700 ticks. For
reasons we can discuss later, the measurement of these access times
is not 100% reliable - there is a fraction-of-1% error rate, as
sometimes extra delays or cache changes take place outside the
control of the user process.
So a user process can clear cache lines, then later measure whether
a particular address, within its legitimate access range, is in
cache or not - so what?
--
The first exploit using this FLUSH+RELOAD (or EVICT+RELOAD)
technique involved shared libraries - specifically OpenSSL. While
one process was busy repeatedly encrypting data with a key, the
evil process was flushing the instruction cache, then testing after
a context switch to see which code path in the shared memory had
been used. This took on the order of one hour, but the evil
process was able to accurately reconstruct the GPG key just by
observing the code paths the other process took within shared
libraries. This was considered a relatively low-urgency threat,
since the amount of time, the knowledge of the specific code, and
the requirement for a local binary made an exploit unlikely.
The three new threats (two Spectre variants and one Meltdown) are
all based on speculative execution. All advanced CPUs [incl. Intel
& AMD x86, and also some higher-end ARMs]
pre-fetch/decode/conditionally-execute instructions. This in effect
makes the processor more parallel, but sometimes (due to a
conditional branch, or an exception) the
fetched/decoded/conditionally-executed instructions are not used and
all results must be discarded. It's a bit like having a 'ghost
processor' running instructions in advance of the committed
instruction point. If the ghost has taken the right path, then we
make it 'undead' - we deposit its results into registers and memory
and actually change the observable machine state to match the
good ghost. If instead a branch or exception means the ghost has
executed instructions on a 'wrong path', then we make the ghost
disappear and it has no impact on the observable machine state.
Sounds great - this tentative pre-execution of instructions speeds
up thread execution and improves the utilization of the hardware.
So long as the 'bad ghosts' (unretired instructions) leave no change
in the system state, it's perfect.
So here is the rub - no advanced CPU design executes its internal
operations in the sort of strict order you might naively expect.
Meltdown is caused by Intel (& ARMs and some earlier AMDs) not
comparing process privilege against page-table control bits before
speculatively pre-fetching operands. Spectre 1 applies to
Intel & AMD & ARM, where they do not perform bounds-checking
before speculative execution fetches (a different set of permission
violations). The second, computed-branch exploit (Spectre 2) is a
bit more complex, but also relies on doing things in an order that
differs from expectations.
Here is some pseudo-code that explains how Meltdown and the
bounds-checking exploits can work ...
#define CACHE_LINE_SIZE 64
char mydata[256][CACHE_LINE_SIZE]; // one cache line per possible byte value
volatile int dud = 0;
char *naughty_address; // some address in kernel space or beyond an array bound
clear_or_evict(mydata); // empty the cache entries for the array
if( 1 / dud ) mydata[(unsigned char)*naughty_address][0]; // exception when dud = 0
....
There are more reliable ways to do this, but the divide by zero
causes an exception, so the fetch from mydata[*naughty_address][0]
is never retired (it's a bad ghost) BUT that data is (usually)
loaded into cache.
So after handling the exception we can examine all 256 mydata cache
lines, and we expect that exactly one of them will be in cache and
the other 255 will not. If mydata[42] is the one in cache, it means
that the value at *naughty_address must have been 42! The bad
ghost left a very indirect trace of its existence in cache, and
that is very, very bad.
The Google Project Zero paper shows they were able to read ~50k
bytes per second from pages where they lacked permission.
The fundamental flaw is that user processes can determine
cache-occupancy state, and the machine-state models used for
security didn't take that into account.
-------
IMO these were unforeseeable design problems. No one expected that
info could be leaked this way before 2014/15. Even then it looked
like a minor exploit mechanism until this recent work. Spectre is
in many ways the worst: since JavaScript implementations have been
produced, any website running JavaScript on your system could
potentially see what is happening in other browser tabs (like
entering passwords or decoding your banking transactions).
The Meltdown issue is described in the Google paper as ....
    We also tried to reproduce the Meltdown bug on several ARM and
    AMD CPUs. However, we did not manage to successfully leak
    kernel memory with the attack described in Section 5, neither
    on ARM nor on AMD. The reasons for this can be manifold. [....]
    However, for both ARM and AMD, the toy example as described in
    Section 3 works reliably, indicating that out-of-order
    execution generally occurs and instructions past illegal memory
    accesses are also performed.
So presuming AMD&ARM aren't affected at all by Meltdown is
wishful thinking.
Of the three exploits Meltdown has been the most costly to address.
In Linux, Windows and the BSD kernels I used to develop for, the
kernel and the user process share one set of page-table entries
(with different permissions per page). This makes system calls
faster. The KPTI fix creates separate maps for most of the
kernel vs the user process. The exception is that the upper-half
interrupt handlers must still share a map with the user process,
for performance. The 'further work' likely involves removing
critical data from legacy interrupt handlers. In CPU-intensive
benchmarks this fix has no impact; in system-call-intensive
benchmarks (like the apache test suite or sql tests) the figure
might be around 20% [see Phoronix for actual results].
The branch exploit is being fixed in microcode by both Intel &
AMD - from reports, this appears to serialize instructions around
the branch to avoid the out-of-order mechanism. This is said to
cost 2.5-3% performance, tho' it's probably quite load-dependent
too. I doubt such a fix is available for ARM (unsure), tho' it's
probably the most vulnerable to attack wrt phones.
The worst part of the bounds-violation exploit is being addressed
in software. The first swag was to remove high-resolution timers
from JavaScript. This is a very, very incomplete and sh*tty sort
of fix.
Take Away:
1/ Linus is a whiner. He complains about the fix that requires
kernel changes and ignores that there are equal or worse security
flaws.
2/ Update your kernels! The KPTI kernel fix is not 100% - I'd have
concerns about interrupt handlers for legacy serial drivers &
such. Still, it makes the chances of a non-targeted exploit quite
low.
3/ Use the latest microcode. When it's sorted out it will
completely prevent the branch exploit.
4/ Update your browser. That is not even vaguely close to a 100%
fix, but it's something.
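For items 2 and 3, patched Linux kernels (4.15+) report the mitigation status of all three issues in sysfs, which is a quick way to check where a box stands:

```shell
#!/bin/sh
# Kernels >= 4.15 expose one file per issue (meltdown, spectre_v1,
# spectre_v2); each reads "Vulnerable" or "Mitigation: ...".
d=/sys/devices/system/cpu/vulnerabilities
if [ -d "$d" ]; then
    grep -H . "$d"/* 2>/dev/null
else
    echo "no $d - kernel predates the reporting (or is unpatched)"
fi
```

A line like "Mitigation: PTI" under meltdown means KPTI is active; "Vulnerable" under spectre_v2 usually means the microcode/retpoline side hasn't landed yet.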
The guys still developing the secure BSD I used to develop for must
have their hair on fire. This form of exploit allows co-operating
processes to communicate surreptitiously across partitions in a way
that violates the NIST standards they intend to comply with. Even
detecting access to the interrupt handlers (by cache timing) ruins
the security they intend.
-Steve