The problem with a noisy channel is that data loss can occur, mostly in short bursts, but often in deep fades. Any cipher system has to operate in this environment.
In packet radio, we normally sent 64 octet packets over HF. If a packet didn't pass the checksum at the other end, it was tossed in the bit-bucket. The receiver sent a negative acknowledgement (NAK) back to the sender, and the sender sent that packet again. The sender might try a few times, and if it had no success, it would close down and try again later.
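As a rough sketch of that retry loop (not the actual AX.25 state machine): the retry limit, the function names, and the simulated channel below are all made up for illustration, and `binascii.crc_hqx` is only standing in for whatever checksum the real link uses.

```python
import binascii

MAX_TRIES = 4  # assumed limit; the text only says "a few times"

def make_packet(payload: bytes) -> bytes:
    """Append a 2-octet checksum (stand-in CRC, not the real AX.25 FCS)."""
    crc = binascii.crc_hqx(payload, 0xFFFF)
    return payload + crc.to_bytes(2, "big")

def check_packet(packet: bytes) -> bool:
    """Recompute the checksum at the receiving end."""
    payload, crc = packet[:-2], int.from_bytes(packet[-2:], "big")
    return binascii.crc_hqx(payload, 0xFFFF) == crc

def send_with_retries(payload: bytes, channel, max_tries: int = MAX_TRIES):
    """Return the attempt number that got through, or None if we give up."""
    packet = make_packet(payload)
    for attempt in range(1, max_tries + 1):
        received = channel(packet, attempt)
        if check_packet(received):
            return attempt          # receiver accepts the packet
        # checksum failed: packet goes in the bit-bucket, NAK, send again
    return None                     # close down and try again later

def fading_channel(packet: bytes, attempt: int) -> bytes:
    """Hypothetical channel: a fade damages the first two attempts."""
    if attempt <= 2:
        damaged = bytearray(packet)
        damaged[5] ^= 0xFF          # an 8-bit burst error
        return bytes(damaged)
    return packet

def dead_channel(packet: bytes, attempt: int) -> bytes:
    """Hypothetical channel that damages every attempt."""
    damaged = bytearray(packet)
    damaged[0] ^= 0x01
    return bytes(damaged)

if __name__ == "__main__":
    print(send_with_retries(b"x" * 48, fading_channel))
    print(send_with_retries(b"x" * 48, dead_channel))
```

The point for the cipher discussion is that a damaged packet is simply thrown away and sent again, so whatever encryption is used has to survive whole packets disappearing and being repeated.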
The rule in cipher security is to not encrypt header information. There are a few reasons for this, but the main one is that the header is repeated. That hands an eavesdropper known plaintext, so after a short time they can figure out the key.
In AX.25 packet, the header may be up to 70 bytes long, given a list of repeaters. Of course on HF you didn't use the digipeater function, as it lengthened the packet. Thus, only 14 bytes (a source and a destination callsign) were used. Added to that were two bytes of checksum, leaving us with 64 - 14 - 2 = 48 octets of information that could be sent per packet (25% overhead).
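The arithmetic, written out as a small sketch. The 7-octet address field and the limit of eight digipeaters are my assumptions; they are what make the 14 and 70 byte figures come out.

```python
PACKET_OCTETS = 64     # packet size used on HF
ADDRESS_OCTETS = 7     # assumed: one callsign field (callsign plus SSID octet)
MAX_DIGIPEATERS = 8    # assumed: the longest repeater list
CHECKSUM_OCTETS = 2

def header_octets(digipeaters: int = 0) -> int:
    """Source + destination + any digipeater addresses."""
    return ADDRESS_OCTETS * (2 + digipeaters)

def payload_octets(digipeaters: int = 0) -> int:
    """Octets left for (encrypted) information in one packet."""
    return PACKET_OCTETS - header_octets(digipeaters) - CHECKSUM_OCTETS

def overhead(digipeaters: int = 0) -> float:
    """Fraction of the packet that is not payload."""
    return 1 - payload_octets(digipeaters) / PACKET_OCTETS

if __name__ == "__main__":
    print(header_octets(MAX_DIGIPEATERS))  # 70
    print(header_octets())                 # 14
    print(payload_octets())                # 48
    print(overhead())                      # 0.25
```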
It is these (and only these) 48 octets that are encrypted.
Now the problem is, what kind of encryption? There are two choices: 1) stream, or 2) block. Now obviously a plain stream cipher is not going to be a secure choice. Even if you used RC4 with a 256 octet key, it is going to output the same bits every time it starts with that key. For example, if the cipher has a key of 0x01 then the stream might be:
6d 1c 28 8c 03 ff a0 33 
If the key is 0x02 then the stream might be:
8f 33 53 bd fb 90 eb 83
These octets are exclusive-ORed (XOR) with the plain-text data, producing cipher-text data. The next eight octets will be something else, and the next eight, something else again. But! If you drop your Push-to-Talk (PTT), and then key up again for the next transmission, the XOR stream is identical. So, continuing with the example above, your stream repeats itself every 48 octets. This is not secure. It's like pseudo-random numbers with the same seed: you get the same numbers every time you start the generator.
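Here is a small sketch of why the restart is fatal. It uses a textbook RC4 written out in plain Python; the key and the two messages are invented for the example. Because both transmissions start the generator from the same key, XORing the two cipher-texts cancels the stream completely and leaves the XOR of the two plain-texts, no key required.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Textbook RC4: key schedule, then n octets of output."""
    s = list(range(256))
    j = 0
    for i in range(256):                      # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):                        # output generation
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def transmit(key: bytes, plain: bytes) -> bytes:
    """One PTT: the generator restarts from the key every time."""
    return xor(plain, rc4_keystream(key, len(plain)))

if __name__ == "__main__":
    key = b"\x01"                             # example key only
    p1 = b"ATTACK AT DAWN!!"
    p2 = b"RETREAT AT NOON!"
    c1 = transmit(key, p1)
    c2 = transmit(key, p2)
    # Same stream both times, so it drops out of c1 XOR c2.
    print(rc4_keystream(key, 8) == rc4_keystream(key, 8))  # True
    print(xor(c1, c2) == xor(p1, p2))                      # True
```

An eavesdropper who can guess part of one message gets the matching part of the other for free.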
So, in radio systems, a block cipher in counter (CTR) mode is used. You can make a block cipher into a stream cipher without modifying the internals, but you have to modify the engine if you want blocks out of a stream cipher.
So, we key the microphone. We want a different starting point every time. To do this, a counter is used. For example, take a 24-bit counter and set it to a random value. Now every 10 voice frames, you increment the counter and get a different XOR sequence. So this counter changes the stream that the key produces. Of course if the counter is always the same number, then you'll always get the same XOR sequence. The counter has to increment.
The neat thing about this counter is that it doesn't have to be secret. The key is secret; the counter only stirs the pot.
So in radio, what we have to do is send this counter to the receiver. If the counter is incremented every 10 voice frames, then you have to send the receiver the counter every 10 voice frames! Then, before the next PTT, you load a new random value into the counter.
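A minimal sketch of the counter scheme, with loud caveats. Python's standard library has no block cipher, so SHA-256 over key, counter, and block index stands in for "encrypt the counter block with the key"; a real system would use a proper block cipher there. The frame size and all function names are hypothetical. The counter travels in the clear alongside each group of 10 frames.

```python
import hashlib
import secrets

FRAMES_PER_COUNTER = 10   # from the text: bump the counter every 10 frames
FRAME_OCTETS = 8          # hypothetical vocoder frame size
COUNTER_MODULUS = 1 << 24 # 24-bit counter

def keystream(key: bytes, counter: int, n: int) -> bytes:
    """Stand-in for a block cipher in CTR mode (NOT a real block cipher)."""
    out = bytearray()
    block = 0
    while len(out) < n:
        material = key + counter.to_bytes(3, "big") + block.to_bytes(2, "big")
        out += hashlib.sha256(material).digest()
        block += 1
    return bytes(out[:n])

def crypt(key: bytes, counter: int, data: bytes) -> bytes:
    """XOR with the stream; the same call encrypts and decrypts."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, counter, len(data))))

def transmit(key: bytes, frames, start_counter=None):
    """One PTT: random starting counter, then +1 for every group of frames."""
    counter = secrets.randbelow(COUNTER_MODULUS) if start_counter is None else start_counter
    groups = []
    for i in range(0, len(frames), FRAMES_PER_COUNTER):
        group = b"".join(frames[i:i + FRAMES_PER_COUNTER])
        groups.append((counter, crypt(key, counter, group)))  # counter sent in clear
        counter = (counter + 1) % COUNTER_MODULUS
    return groups

def receive(key: bytes, groups) -> bytes:
    """The receiver only needs the shared key plus the counters it was sent."""
    return b"".join(crypt(key, counter, data) for counter, data in groups)

if __name__ == "__main__":
    key = b"shared secret key"
    frames = [bytes([n]) * FRAME_OCTETS for n in range(25)]
    groups = transmit(key, frames)
    print(len(groups))                               # 3
    print(receive(key, groups) == b"".join(frames))  # True
```

Since each group carries its own counter, losing one group does not stop the receiver from decrypting the next one.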
Yes, this takes bandwidth. The data rate you need is the voice bits plus the counter bits. If the voice is already consuming 100% of your data rate, then there is no room for sending the counter. Obviously you need to create space.
One way of creating space in voice encryption (without increasing the bit-rate) is to add a Voice Activity Detector (VAD). This algorithm examines the vocoder output and determines when there is silence. If the silence is longer than the number of bits required for the counter, then presto, we set the VAD bit and send the counter. The receiver then has to maintain sync until it receives the next counter. If a noise burst destroys the data, the output is garbage until the next counter is received. If that is five seconds later, then the receiver gets five seconds of comfort noise.
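A toy model of that behaviour, just to make the timing visible. Nothing here is from a real vocoder: the frame size, the event names, and the idea of collapsing a long-enough silent run into a single "counter" event are all assumptions of the sketch.

```python
import math

COUNTER_BITS = 24

def plan_events(speech_flags, frame_bits: int):
    """Sender side: turn per-frame VAD flags into transmit events.

    A silent run long enough to hold the counter is replaced by one
    'counter' event; shorter silences are sent as plain 'silence'.
    """
    needed = math.ceil(COUNTER_BITS / frame_bits)  # silent frames per counter
    events, run = [], 0
    for speech in speech_flags:
        if speech:
            run = 0
            events.append("voice")
            continue
        run += 1
        events.append("silence")
        if run == needed:
            events[-needed:] = ["counter"]         # room found: send counter
            run = 0
    return events

def receive(events, lost):
    """Receiver side: 'lost' holds the indexes of events hit by noise."""
    in_sync = True          # assume the counter at the start of the PTT arrived
    heard = []
    for i, event in enumerate(events):
        if i in lost:
            in_sync = False                        # burst knocks us out of sync
            heard.append("comfort noise")
        elif event == "counter":
            in_sync = True                         # resynchronised
            heard.append("comfort noise")          # it was silence anyway
        elif event == "voice" and in_sync:
            heard.append("voice")
        else:
            heard.append("comfort noise")          # silence, or voice we can't decrypt
    return heard

if __name__ == "__main__":
    flags = [True, True, False, True, False, False, False, True]
    events = plan_events(flags, frame_bits=16)     # hypothetical 16-bit frames
    print(events)
    print(receive(events, lost={1}))
```

In this run the burst at event 1 costs the receiver the voice at event 3 as well, because the next counter does not arrive until event 4.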
If someone made a VAD system for codec2, it would be a great boon to encryption over low bit-rate channels. For example, MELPE has a VAD that is used by PairPhone over a 1200 bit/s channel. Of course we don't want to use MELPE.
That's all I wanted to convey here: a secure voice system uses a block cipher in counter mode. You can't use a stream cipher unless you modify the internals, which is something I did with RC4. I created a counter mode out of a stream cipher, but it wasn't an add-on; it was a new engine (using the same S-box design).
Now, on the subject of bit-scrambling: yes, a stream cipher is adequate there. We actually want it to duplicate the stream, and we even want it to duplicate the stream fast. The normal way is to use a Linear Feedback Shift Register (LFSR), even one with a small number of bits, say 13. If the data is corrupted by the ether, then when the stream starts over, you fall back in sync. These are called self-synchronizing stream ciphers. They use XOR just like cryptographically secure stream ciphers, but are of course not secure.
Since the first 13 bits output from the LFSR are just the LFSR starting value, the first 13 bits sent are effectively unscrambled. Thus, a system using an LFSR needs a header that gives the LFSR time to run. Generally, a 16-bit header is sufficient.
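To show the fall-back-into-sync behaviour, here is a sketch of one common form of this, a self-synchronizing (multiplicative) scrambler built on a 13-bit register. The tap positions are an assumption picked for illustration, not taken from any particular modem. The receiver starts with the wrong register contents, so the first 13 bits may come out wrong; a 16-bit header absorbs that and the payload comes out clean.

```python
REG_BITS = 13
TAPS = (13, 4, 3, 1)          # assumed tap positions, for illustration only
MASK = (1 << REG_BITS) - 1

def feedback(state: int) -> int:
    """XOR of the tapped register bits (bit 0 is the most recent)."""
    fb = 0
    for tap in TAPS:
        fb ^= (state >> (tap - 1)) & 1
    return fb

def scramble(bits, state: int = 0):
    """Transmitter: the register is fed with the scrambled output."""
    out = []
    for bit in bits:
        sent = bit ^ feedback(state)
        out.append(sent)
        state = ((state << 1) | sent) & MASK
    return out

def descramble(bits, state: int = 0):
    """Receiver: the register is fed with the received bits, so after
    13 good bits it holds exactly what the transmitter's register held."""
    out = []
    for received in bits:
        out.append(received ^ feedback(state))
        state = ((state << 1) | received) & MASK
    return out

if __name__ == "__main__":
    header = [1, 0] * 8                               # 16-bit header
    payload = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0] * 4
    on_air = scramble(header + payload, state=0x0ABC) # sender's state unknown to rx
    recovered = descramble(on_air, state=0)           # receiver starts cold
    print(recovered[len(header):] == payload)         # True
```

The same property is what limits damage from a noise hit: a corrupted bit only disturbs the output while it is still inside the 13-bit register.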
Is this too long? Am I being an egghead?
Happy Saturday! Time to go mow the lawn (weeds)...