You seem to be mixing up a range of different things here. (That is not
surprising - ARM has /seriously/ confusing terminology here, with
similar names for different things, different names for similar things,
different numbers for instruction sets, architectures, and implementations.)
The answers you got from Richard are all correct - I am just wording
things a little differently, to see if that helps your understanding.
In the beginning, ARM used a 32-bit fixed-size instruction set. It was
fast, but not very compact. Most instructions take three register
operands (so you have "Rx = Ry + Rz"), and with 16 registers each
operand needs 4 bits - that adds up to 12 bits of the instruction just
for the registers.
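To make that arithmetic concrete, here is a rough sketch in C of how the
three register numbers pack into a classic 32-bit ARM data-processing
instruction. The field positions follow the usual ADD (register) layout,
but treat the exact constants as illustrative rather than as a reference:

    #include <stdint.h>

    /* Sketch only: pack "Rd = Rn + Rm" into a 32-bit ARM instruction.
       Each register number is 4 bits, so the three operands alone take
       up 12 of the 32 bits. */
    static uint32_t arm_add(uint32_t rd, uint32_t rn, uint32_t rm)
    {
        uint32_t insn = 0xE0800000u;   /* ADD, "always" condition     */
        insn |= (rn & 0xFu) << 16;     /* first source register       */
        insn |= (rd & 0xFu) << 12;     /* destination register        */
        insn |= (rm & 0xFu) << 0;      /* second source register      */
        return insn;                   /* 3 x 4 = 12 register bits    */
    }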
When they started getting serious in the microcontroller market, this
extra space was a cost - it meant code was bigger than for other
microcontrollers, and you needed a bigger and more expensive flash. So
they invented the "Thumb" mode. Here, instructions are 16-bit only and
so much more compact. These were a subset of the ARM instructions -
instructions only cover two registers, and for many operations these
could only come from the first 8 registers (r0-r7), so the register
operands needed only 6 bits of the instruction. The CPU decoder expanded
the Thumb instructions into 32-bit ARM instructions before executing them.
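For comparison, the same kind of sketch for a 16-bit Thumb-1 instruction -
two register fields, each limited to r0-r7, so 3 bits per register. The
base opcode below is the Thumb "ANDS Rd, Rm" form as I remember it, but
again take it as an illustration of the shape, not as a reference:

    #include <stdint.h>

    /* Sketch only: a 16-bit Thumb-1 ALU instruction ("ANDS Rd, Rm").
       Only two register fields, each restricted to r0-r7, so the
       operands need just 2 x 3 = 6 bits of the 16-bit instruction. */
    static uint16_t thumb_ands(uint16_t rd, uint16_t rm)
    {
        uint16_t insn = 0x4000u;       /* ALU-operation group, op = AND */
        insn |= (rm & 0x7u) << 3;      /* source register (r0-r7)       */
        insn |= (rd & 0x7u) << 0;      /* destination register (r0-r7)  */
        return insn;
    }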
Thumb gave more compact code, but slower code, and there were some
things you simply could not do in Thumb mode. So the microcontrollers
had to support both instruction sets, and you would switch sets in code
(interrupts, for example, would be in ARM mode for speed). It was all a
bit of a mess.
So ARM invented "Thumb-2". This was a set of 32-bit instructions that
can be mixed along with the slightly re-named "Thumb-1" 16-bit
instructions. Sometimes it is now all called "Thumb2", or just "Thumb"
(since for the vast majority of ARM programmers, 16-bit-only Thumb was
ancient history from before they started using the devices). 16-bit and
32-bit instructions are called the "narrow" and "wide" forms. These new
32-bit additions to Thumb meant you could get mixed code that was
compact, fast, and covered all needs (including full access to all 16
registers).
So for the Cortex-M devices, Thumb2 is the only set needed - they don't
support "old-style" 32-bit ARM instructions. You no longer need to
worry about changing states or modes; you are always in "Thumb2" mode.
Different Cortex-M processors support different subsets and extensions
of this, so it is important that you inform your compiler of exactly
which device you are using. Do that, and you don't really need to care
about the details for the most part - the compiler does the work. But
it can be interesting to know what is going on.
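As a rough example of what "inform your compiler" looks like in practice,
assuming GCC for bare-metal ARM (the core name here is just an example -
pick the one matching your device):

    /* Example build command:

           arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -O2 -c example.c

       With -mcpu set, the compiler knows exactly which Thumb-2 subset
       and extensions it is allowed to use, and it mixes narrow and wide
       encodings as it sees fit. */
    int scale(int x)
    {
        return 5 * x + 3;
    }

The same idea applies to other toolchains - the point is that the device
selection drives the instruction-set choice, not you.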
There are also Cortex-A devices for applications processors - basically,
running Linux (including Android) or, occasionally, Windows. And there
are Cortex-R devices for safety-critical systems. But I expect you are
talking about Cortex-M devices here.