downsamp .wav file 48kHz --> 16kHz?


Jeremy Gray

May 20, 2013, 11:37:36 AM
to pyo-d...@googlegroups.com
Hi all,

I am experiencing some strangeness with downsamp. Most conversions I've tried work fine, but 48000 -> 16000 gives me an essentially empty file (44 bytes, i.e. just the WAV header). This is with pyo 0.66 r1104, compiled today from source on Mac OS X 10.8.3. I've only tried .wav files.

The problem doesn't appear to be tied to the 48000 start rate, the 16000 end rate, or the ratio of 3 by itself. These all work fine:
48000 -> 8000 (ratio 6)
32000 -> 16000 (ratio 2)
24000 -> 8000 (ratio 3)

"Working fine" means there's a reasonable file size, and google-speech can tell me the word I said. google speech requires 8000 or 16000 Hz files as input, so I could use 8000 as a last resort but would rather opt for higher quality recordings as input (presumably higher accuracy on the speech recognition). I need to record at 48000 in order to be able to insert a high-frequency marker tone (19000 Hz) which I later detect using FFT.

Any ideas? Thanks,

--Jeremy

PS: btw, making a symlink /Developer/SDKs/MacOSX10.5.sdk that points to the 10.6 SDK solved all my compiling problems on 10.8.

Olivier Bélanger

May 20, 2013, 12:14:12 PM
to pyo-discuss
Hi Jeremy,

I'm at work now, but I'll take a deeper look this evening. At first glance, it shouldn't be a problem to downsample from 48k to 16k...

Olivier


2013/5/20 Jeremy Gray <jrg...@gmail.com>


Bryan Smart

May 20, 2013, 4:13:11 PM
to pyo-d...@googlegroups.com
Olivier,

Thanks for continuing to update Pyo. It is an amazing system with some incredibly powerful features.

For some projects where I'm using Pyo, I'd like some additional features. I realize that everyone has wishes, and your time is limited. I've donated in the past, but would be glad to donate toward specific new features, if that was an option.

1. My first group of projects uses Pyo to render virtual sound environments for different purposes. I use the HeadSpace object for this a lot! Since HeadSpace isn't a native object, though, it isn't able to take parametric input from Pyo objects. For virtual sound environments, parametric input would allow HeadSpace parameters to automatically scale as parameters to other objects are adjusted. I also haven't worked out a clean way to smooth movement of HeadSpace objects. Perhaps cross-fading impulses somehow would get rid of the clicking when position repeatedly changes? Anyway, I'm sure that you must know how to do this the right way. A native HeadSpace object would certainly require a bit less processing power.

2. My second group of projects is focused on musical ideas. Pyo is generally good for this, as long as sounds are completely synthesized. I'd like to be able to use multi-sample instruments for some situations, though. It would be very good if we could use sampled instruments in a popular format that carries info about key/velocity mappings like SoundFont or SFZ. At one time, you wrote about connecting to FluidSynth as a possible solution. I'm not particularly attached to either SoundFonts or SFZ as a format, nor FluidSynth as a solution, but it would be great to have some sort of solution for easily incorporating multi-sample instruments.

#1 is most important to me. The first practical use of this tech is my effort to add positional sound as an additional information channel to the NVDA open source screen reading program for the blind (nvaccess.org). Currently, all information about the computer's user interface is presented in a narrative style by a speech synthesizer. Attempting to narrate everything that happens in a UI is a challenge, partly because speech doesn't intuitively carry any positional information, but also because speech, even at high speed, is a slow channel for conveying complex information. While most people can only focus on a single conversation at a time, our brains are able to process non-spoken audio cues in parallel. My goal is to use audio cues to move as much information away from the speech channel as possible, thereby dramatically increasing the speed at which blind people are able to operate a computer through a screen reading program (currently quite slow). Positional audio will also naturally express spatial relationships that can only be roughly and slowly expressed through speech. The NVDA screen reader is almost entirely written in Python, and easily integrates with Pyo. My efforts will be freely available as part of the larger NVDA project.

Would you, or someone else on the list, have time to work on either of these? My C++ skills are poor and rusty. I would probably be able to help more by providing a little financial incentive.

Bryan

Olivier Bélanger

May 23, 2013, 8:05:39 AM
to pyo-discuss
Hi Bryan,

I'll look at converting HeadSpace into a builtin PyoObject; that should allow for continuous parameter changes. I'm not sure interpolating tables will remove the clicks, but I'll try...

Soundfont support is a good idea too; I'll try to find some time this summer to work on it. I'll post on the list when I have news...

Olivier


2013/5/20 Bryan Smart <bryan...@bryansmart.com>


Bryan Smart

May 23, 2013, 1:14:44 PM
to pyo-d...@googlegroups.com
Olivier, thank you so much for working on these!

Bryan

Jeremy Gray

May 24, 2013, 12:44:37 PM
to pyo-d...@googlegroups.com
Just giving this thread a nudge -- any ideas? Here's a little more detail:

>>> from pyo import *
>>> getVersion()
(0, 6, 6)
>>> downsamp('rec48.wav', 'rec16.wav', 3)  # fails
>>> open('rec16.wav', 'r').read()
'RIFF$\x00\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00\x80>\x00\x00\x00\xfa\x00\x00\x04\x00\x10\x00data\x00\x00\x00\x00'

>>> downsamp('rec48.wav', 'rec24.wav', 2)  # fine
>>> downsamp('rec48.wav', 'rec8.wav', 6)   # fine
>>> downsamp('rec32.wav', 'rec16.wav', 2)  # fine

>>> upsamp('rec48.wav', 'rec96.wav', 2)    # fine
>>> downsamp('rec96.wav', 'rec16', 6)      # fails
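Those 44 bytes in the failing case are exactly a canonical PCM WAV header with a zero-length data chunk, which a quick struct parse of the bytes shown above confirms (a minimal Python 3 sketch, not part of pyo):

```python
import struct

# The exact 44 bytes read back from the failing rec16.wav:
header = (b'RIFF$\x00\x00\x00WAVEfmt \x10\x00\x00\x00\x01\x00\x02\x00'
          b'\x80>\x00\x00\x00\xfa\x00\x00\x04\x00\x10\x00data\x00\x00\x00\x00')

# RIFF chunk: 4-byte id, little-endian chunk size, 'WAVE' form type
riff_id, riff_size, wave = struct.unpack('<4sI4s', header[:12])
# fmt chunk: format tag, channels, sample rate, byte rate, block align, bit depth
(fmt_id, fmt_size, fmt_tag, channels, rate,
 byte_rate, block_align, bits) = struct.unpack('<4sIHHIIHH', header[12:36])
data_id, data_size = struct.unpack('<4sI', header[36:44])

print(channels, rate, bits, data_size)  # → 2 16000 16 0
```

So the header is well-formed (stereo, 16000 Hz, 16-bit PCM), but the data chunk holds zero samples.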

Thanks,

--Jeremy

Jeremy Gray

May 28, 2013, 2:32:05 PM
to pyo-d...@googlegroups.com
Hi Olivier,

Digging into this further, I wrote a test script that generates .wav files of known sample counts, runs downsamp() on them, and reports the number of samples in the resulting file. It looks like downsamp() is fine with single-channel data, but hit-or-miss with two channels, in a way that depends on the precise number of samples. Running the test script (below) prints 4 columns: channels, # original samples, down (the parameter for downsamp), and # down-sampled samples. Here's the output for 1007 samples, first with 1-channel data and then with 2-channel data:
...
1 1007 1 1007
1 1007 2  503
1 1007 3  335
1 1007 4  251
1 1007 5  201
1 1007 6  167
1 1007 7  143
...
2 1007 1 1007
2 1007 2    0  # 0 == no sound data in the file
2 1007 3    0
2 1007 4    0
2 1007 5  201
2 1007 6    0
2 1007 7    0
...

The effect of channel count is extremely robust: I never see a problem with 1-channel data, and always see some problems with 2-channel data. The same effect occurs with realistic sample counts (e.g., 100000 samples).

Best,

--Jeremy

code:

import pyo
import numpy as np
from scipy.io import wavfile
import os

# not relevant to the bug, but feel free to vary:
SR = 48000
order = 256

print 'channels #samples down #ds-samples'
for CHANNELS in (1,2):
    for SAMPLES in xrange(1000, 1010):
        # Generate data of a given size and number of channels:
        origdata = np.zeros((SAMPLES, CHANNELS), dtype=np.int16)
        filename = 'file' + str(SAMPLES) + '.wav'
        wavfile.write(filename, SR, origdata)
        
        # pyo.downsamp() it, get size of data in new file:
        for down in range(1,8):
            dsfilename = 'ds' + str(down) + filename
            pyo.downsamp(filename, dsfilename, down, order)
            sr, dsdata = wavfile.read(dsfilename)
            print CHANNELS, len(origdata), down, "%4d" % len(dsdata)
            os.unlink(dsfilename)
        os.unlink(filename)

Olivier Bélanger

May 28, 2013, 2:46:26 PM
to pyo-discuss
Thanks Jeremy, I know where to start looking now!

Olivier


2013/5/28 Jeremy Gray <jrg...@gmail.com>

Olivier Bélanger

Jun 11, 2013, 12:30:22 PM
to pyo-discuss
Hi Jeremy,

It's fixed now in the sources: bad integer rounding when allocating samples for the generated sound...
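For the curious, the general shape of this kind of bug is that a truncating division when sizing the output buffer disagrees with the ceiling division the resampler actually needs. The following is a generic illustration of floor vs. ceiling division on Jeremy's test numbers, not pyo's actual allocation code:

```python
frames, channels, down = 1007, 2, 2  # from the failing 2-channel test case

# Sizing from the interleaved sample count with truncation at each step
# can come out one frame short of the per-frame ceiling computation:
naive = (frames * channels) // down // channels   # 2014 // 2 // 2 → 503
ceil_frames = -(-frames // down)                  # ceil(1007 / 2) → 504

print(naive, ceil_frames)  # → 503 504
```

A frame count like 1007 that isn't divisible by the ratio is a natural trigger for off-by-one (or worse) allocation mistakes; the actual arithmetic inside pyo may have failed differently.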

Olivier


2013/5/28 Olivier Bélanger <bela...@gmail.com>

Jeremy Gray

Jun 11, 2013, 12:42:39 PM
to pyo-d...@googlegroups.com
Thanks, Olivier! Much appreciated.

--Jeremy