Fast 12 bit to 16 bit sample conversion?

Peter Heitzer

Jul 20, 2015, 9:10:22 AM

I am currently writing a Python script to extract samples from old Roland 12-bit sample
disks and save them as 16-bit WAV files.

The samples are laid out as follows:

0 [S0 bit 11..4] [S0 bit 3..0|S1 bit 3..0] [S1 bit 11..4]
3 [S2 bit 11..4] [S2 bit 3..0|S3 bit 3..0] [S3 bit 11..4]

In other words
sample0=(data[0]<<4)|(data[1]>>4)
sample1=(data[2]<<4)|(data[1] & 0x0f)
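A minimal sketch of those two formulas in pure Python 3, where indexing a bytes object yields ints. The names decode_pair and decode_buffer are hypothetical. Shifting each 12-bit result left by 4 to fill a 16-bit word is an assumption that matches the intent of masking with 0xfff0; the values are kept as raw 16-bit patterns because the thread does not say how the Roland format encodes sign.

```python
from array import array

def decode_pair(b0, b1, b2):
    """Decode one 3-byte group into two samples left-aligned in 16 bits."""
    s0 = (b0 << 4) | (b1 >> 4)     # sample0 = (data[0]<<4)|(data[1]>>4)
    s1 = (b2 << 4) | (b1 & 0x0F)   # sample1 = (data[2]<<4)|(data[1] & 0x0f)
    # Move each 12-bit value into the top 12 bits of a 16-bit word.
    return s0 << 4, s1 << 4

def decode_buffer(buf):
    """Decode every complete 3-byte group in buf into an array('H')."""
    words = array('H')
    for i in range(0, len(buf) - 2, 3):   # an incomplete trailing group is skipped
        words.extend(decode_pair(buf[i], buf[i + 1], buf[i + 2]))
    return words

print([hex(w) for w in decode_buffer(bytes([0xAB, 0xCF, 0xDE]))])
# ['0xabc0', '0xdef0']
```

This is the slow but obvious reference version; any faster variant should produce the same output for the same bytes.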

I use this code for the conversion (using the struct module)

import struct
from array import array

def getWaveData(diskBuffer):
    offset=0
    words=array('H')
    for i in range(len(diskBuffer)//3):
        h0=struct.unpack_from('>h',diskBuffer,offset)
        h1=struct.unpack_from('<h',diskBuffer,offset+1)
        words.append(h0[0] & 0xfff0)
        words.append(h1[0] & 0xfff0)
        offset+=3
    return words

I unpack the samples into an array of unsigned shorts so that I can later use the byteswap() method
if the code is running on a big-endian machine.
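A small sketch of that byte-order step, assuming the goal is the little-endian 16-bit data that WAV PCM uses. The names to_le_bytes and to_le_bytes_struct are hypothetical. The second function is an alternative that packs explicitly as little-endian with struct, so no host check is needed. Writing the WAV header itself is not shown.

```python
import struct
import sys
from array import array

def to_le_bytes(words):
    """Return 16-bit samples as little-endian bytes, the order WAV PCM uses."""
    data = array('H', words)      # copy, leaving the caller's array untouched
    if sys.byteorder == 'big':    # only big-endian hosts need the swap
        data.byteswap()
    return data.tobytes()

def to_le_bytes_struct(words):
    """Same result, using an explicit '<' format instead of a host check."""
    return struct.pack('<%dH' % len(words), *words)

samples = array('H', [0xABC0, 0xDEF0])
print(to_le_bytes(samples))   # b'\xc0\xab\xf0\xde'
```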

What options do I have in pure Python to make the conversion faster?
I thought of unpacking more bytes at once, e.g. using the format '>hxhxhxhx' for 4 even samples
and '<xhxhxhxh' for 4 odd samples, respectively.
Can I map the '& 0xfff0' to the whole array?
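Both ideas can be combined in one sketch: a single struct call over the whole buffer, then the mask applied to every element in one comprehension. Decoding an odd sample needs the low nibble of the middle byte, so this sketch reads each group as '>HB' (my own choice, not the '>hxhxhxhx'/'<xhxhxhxh' pair above): the 'H' is b0 << 8 | b1 and the 'B' is b2. The name decode_batched is hypothetical.

```python
import struct
from array import array

def decode_batched(buf):
    """Decode all complete 3-byte groups with a single struct call."""
    n = len(buf) // 3
    raw = struct.unpack_from('>' + 'HB' * n, buf)   # 3 bytes per 'HB' pair
    heads = raw[0::2]             # b0 << 8 | b1 for every group
    tails = raw[1::2]             # b2 for every group
    out = [0] * (2 * n)
    # The '& 0xfff0' applied to the whole sequence of even samples at once.
    out[0::2] = [h & 0xFFF0 for h in heads]
    # Odd samples: b2 is the high byte, the low nibble of b1 goes to bits 7..4.
    out[1::2] = [(t << 8) | ((h & 0x0F) << 4) for h, t in zip(heads, tails)]
    return array('H', out)
```

Whether this is actually faster than the per-group loop depends on the buffer size and Python version, so it is worth timing on real disk images.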

MRAB

Jul 20, 2015, 10:43:38 AM

to pytho...@python.org

On 2015-07-20 14:10, Peter Heitzer wrote:
> I am currently writing a Python script to extract samples from old Roland 12-bit sample
> disks and save them as 16-bit WAV files.
>
> The samples are laid out as follows:
>
> 0 [S0 bit 11..4] [S0 bit 3..0|S1 bit 3..0] [S1 bit 11..4]
> 3 [S2 bit 11..4] [S2 bit 3..0|S3 bit 3..0] [S3 bit 11..4]
>
> In other words
> sample0=(data[0]<<4)|(data[1]>>4)
> sample1=(data[2]<<4)|(data[1] & 0x0f)
>
> I use this code for the conversion (using the struct module)
>
> import struct
> from array import array
>
> def getWaveData(diskBuffer):
>     offset=0
>     words=array('H')
>     for i in range(len(diskBuffer)//3):

If the 2 12-bit values are [0xABC, 0xDEF], the bytes will be [0xAB,
0xCF, 0xDE].

>         h0=struct.unpack_from('>h',diskBuffer,offset)

This gives 0xABCF, which is ANDed to give 0xABC0. Good.

>         h1=struct.unpack_from('<h',diskBuffer,offset+1)

This gives 0xDECF, which is ANDed to give 0xDEC0. Not what you want.

>         words.append(h0[0] & 0xfff0)
>         words.append(h1[0] & 0xfff0)
>         offset+=3
>     return words
>
> I unpack the samples into an array of unsigned shorts so that I can later use the byteswap() method
> if the code is running on a big-endian machine.
>
> What options do I have in pure Python to make the conversion faster?
> I thought of unpacking more bytes at once, e.g. using the format '>hxhxhxhx' for 4 even samples
> and '<xhxhxhxh' for 4 odd samples, respectively.

You could try using lookup tables to decode even-numbered and
odd-numbered pairs of bytes.
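One way the lookup-table idea could look, as a sketch under my own assumptions: each sample is read as a big-endian 16-bit word from the two bytes it occupies. For even samples (bytes 0-1) a mask is already enough, so only the odd samples (bytes 1-2, word b1 << 8 | b2) get a 65536-entry table. ODD_TABLE and decode_with_table are hypothetical names, and whether a list lookup beats the equivalent bit operations has to be measured.

```python
import struct
from array import array

# Odd-sample table indexed by the big-endian word b1 << 8 | b2:
# b2 becomes the high byte, the low nibble of b1 lands in bits 7..4.
ODD_TABLE = [((v & 0xFF) << 8) | ((v >> 4) & 0xF0) for v in range(0x10000)]

def decode_with_table(buf):
    words = array('H')
    append = words.append           # local names avoid repeated attribute lookups
    unpack = struct.unpack_from
    for offset in range(0, len(buf) - 2, 3):
        even, = unpack('>H', buf, offset)       # b0 << 8 | b1
        odd, = unpack('>H', buf, offset + 1)    # b1 << 8 | b2
        append(even & 0xFFF0)                   # even samples only need the mask
        append(ODD_TABLE[odd])                  # odd samples: one list lookup
    return words
```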

> Can I map the '& 0xfff0' to the whole array?
>

That's something numpy could do.

edmondo.g...@gmail.com

Jul 20, 2015, 11:15:16 AM

I'd try reading the binary data with numpy.fromfile, reshaping the array into an [n, 3] matrix, and then operating on the columns to get what you want.
:-)
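The same column-wise idea can also be sketched without numpy. This swaps numpy for plain extended slicing of a bytes object (my substitution, to stay within the standard library): after trimming to whole groups, buf[0::3], buf[1::3] and buf[2::3] are the three columns of the [n, 3] layout. decode_columns is a hypothetical name.

```python
from array import array

def decode_columns(buf):
    n = len(buf) // 3
    buf = buf[:3 * n]                                    # drop an incomplete group
    col0, col1, col2 = buf[0::3], buf[1::3], buf[2::3]   # the three "columns"
    out = [0] * (2 * n)
    # Even samples: b0 is bits 11..4, the high nibble of b1 is bits 3..0.
    out[0::2] = [(a << 8) | (m & 0xF0) for a, m in zip(col0, col1)]
    # Odd samples: b2 is bits 11..4, the low nibble of b1 is bits 3..0.
    out[1::2] = [(c << 8) | ((m & 0x0F) << 4) for m, c in zip(col1, col2)]
    return array('H', out)
```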

Peter Heitzer

Jul 20, 2015, 11:23:41 AM

MRAB <pyt...@mrabarnett.plus.com> wrote:
>On 2015-07-20 14:10, Peter Heitzer wrote:
>> I am currently writing a Python script to extract samples from old Roland 12-bit sample
>> disks and save them as 16-bit WAV files.
>>
>> The samples are laid out as follows:
>>
>> 0 [S0 bit 11..4] [S0 bit 3..0|S1 bit 3..0] [S1 bit 11..4]
>> 3 [S2 bit 11..4] [S2 bit 3..0|S3 bit 3..0] [S3 bit 11..4]
>>
>> In other words
>> sample0=(data[0]<<4)|(data[1]>>4)
>> sample1=(data[2]<<4)|(data[1] & 0x0f)
>>
>> I use this code for the conversion (using the struct module)
>>
>> import struct
>> from array import array
>>
>> def getWaveData(diskBuffer):
>>     offset=0
>>     words=array('H')
>>     for i in range(len(diskBuffer)//3):

>If the 2 12-bit values are [0xABC, 0xDEF], the bytes will be [0xAB,
>0xCF, 0xDE].

>>         h0=struct.unpack_from('>h',diskBuffer,offset)

>This gives 0xABCF, which is ANDed to give 0xABC0. Good.

>>         h1=struct.unpack_from('<h',diskBuffer,offset+1)

>This gives 0xDECF, which is ANDed to give 0xDEC0. Not what you want.

You are right! It looked to me as if it was little endian, but only for the MSB.
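MRAB's worked example is easy to reproduce. This snippet feeds his three bytes through the two struct calls from the original code; the last line is one possible repair (my own, not from the thread) that keeps the '<h' read but takes only its high byte, then moves the low nibble of the middle byte into bits 7..4.

```python
import struct

group = bytes([0xAB, 0xCF, 0xDE])          # MRAB's example: samples 0xABC, 0xDEF

h0, = struct.unpack_from('>h', group, 0)   # bytes AB CF read big-endian
h1, = struct.unpack_from('<h', group, 1)   # bytes CF DE read little-endian
print(hex(h0 & 0xFFF0))   # 0xabc0, the even sample is right
print(hex(h1 & 0xFFF0))   # 0xdec0, but the odd sample should be 0xdef0

# Keep the MSB (0xDE) and shift the low nibble of 0xCF up into bits 7..4.
fixed = (h1 & 0xFF00) | ((h1 & 0x0F) << 4)
print(hex(fixed))         # 0xdef0
```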

Mark Lawrence

Jul 20, 2015, 10:46:30 PM

to pytho...@python.org

By "pure python" I'm assuming you mean part of the stdlib.

Referring to https://wiki.python.org/moin/PythonSpeed/PerformanceTips
you could end up with something like this (untested).

import struct
from array import array

def getWaveData(diskBuffer):
    offset = 0
    words = array('H')
    wx = words.extend  # saves two lookups and a function call
    su = struct.unpack_from  # saves two lookups
    # 'i' not used in the loop so throw it away
    for _ in range(len(diskBuffer) // 3):  # use xrange on Python 2
        h0 = su('>h', diskBuffer, offset)
        h1 = su('<h', diskBuffer, offset + 1)
        # MRAB pointed out a problem with the masking in the second section???
        wx((h0[0] & 0xfff0, h1[0] & 0xfff0))  # extend() takes one iterable
        offset += 3
    return words

> I thought of unpacking more bytes at once, e.g. using the format '>hxhxhxhx' for 4 even samples
> and '<xhxhxhxh' for 4 odd samples, respectively.

If that reduces the number of times around the loop, why not? Combine it
with MRAB's suggestion of lookup tables and I'd guess you'd get a speedup, but
knowing Python I'm probably way out on that. There's only one way to
find out.
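A minimal timeit harness for that comparison, as a sketch: both decoders and the synthetic buffer are hypothetical stand-ins, and both read each 3-byte group as '>HB' (my assumption, which decodes the odd samples correctly) so the timings compare like with like. Results depend on the machine and Python version, so no numbers are claimed here.

```python
import struct
import timeit
from array import array

def decode_loop(buf):
    """Baseline: one struct call per 3-byte group."""
    words = array('H')
    for off in range(0, len(buf) - 2, 3):
        h, t = struct.unpack_from('>HB', buf, off)
        words.append(h & 0xFFF0)
        words.append((t << 8) | ((h & 0x0F) << 4))
    return words

def decode_one_call(buf):
    """One struct call for the whole buffer, then a plain loop."""
    n = len(buf) // 3
    raw = struct.unpack_from('>' + 'HB' * n, buf)
    words = array('H')
    for h, t in zip(raw[0::2], raw[1::2]):
        words.append(h & 0xFFF0)
        words.append((t << 8) | ((h & 0x0F) << 4))
    return words

# Synthetic data: 20000 groups of arbitrary bytes.
buf = bytes(i % 256 for i in range(3 * 20000))
assert decode_loop(buf) == decode_one_call(buf)   # same output before timing

timings = {}
for f in (decode_loop, decode_one_call):
    timings[f.__name__] = min(timeit.repeat(lambda: f(buf), number=5, repeat=3))
    print(f.__name__, '%.4f s' % timings[f.__name__])
```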

I'm also thinking that you could use one of the itertools functions or
recipes to grab the data and hence simplify the loop even more, but it's
now 3:45 BST, so I can't think straight, hence bed.
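Along those lines, a sketch built on struct.iter_unpack (Python 3.4 and later, so newer than the Python 2 context of this thread) plus itertools.chain.from_iterable. The name decode_iter is hypothetical, and the '>HB' per-group format is my assumption, chosen so that the odd samples decode correctly.

```python
import struct
from array import array
from itertools import chain

def decode_iter(buf):
    """Decode complete 3-byte groups without an explicit offset variable."""
    usable = len(buf) - len(buf) % 3          # iter_unpack needs a multiple of 3
    groups = struct.iter_unpack('>HB', buf[:usable])
    # Each group yields (b0 << 8 | b1, b2); build both samples per group and
    # let chain.from_iterable flatten the pairs into one stream for array().
    samples = chain.from_iterable(
        (h & 0xFFF0, (t << 8) | ((h & 0x0F) << 4)) for h, t in groups)
    return array('H', samples)
```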

> Can I map the '& 0xfff0' to the whole array?

If it works :)

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence