Process some audio

Once an audio context has been created without errors, audio can be processed using either process() or processOffline().

These functions take an input and an output float32 array as parameters and populate the output array in place. This approach allows for more efficient buffering and memory management within the SDK.

The key difference is that processOffline() is intended for processing a single large buffer of audio, such as the contents of a file, while process() is intended for processing many small, sequential segments of an audio stream.
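For example, a whole file can be processed in a single call. The sketch below assumes that processOffline() mirrors the process() signature shown later, that loadAudioFile() is a hypothetical helper returning the file's samples as float32 data at the context's sample rate, and that the audio is mono so the frame count equals the sample count.

// Process the entire contents of a file in one call
std::vector<float> fileSamples = loadAudioFile("input.wav"); // hypothetical helper
std::vector<float> processed(fileSamples.size(), 0.0f);

Iris::processOffline(config.contextId, fileSamples.data(), processed.data(), fileSamples.size());
// processed now contains the output, aligned with the input on a best-effort basis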

Note

The processed audio may be subject to an initial frame offset, meaning that output samples may be delayed relative to the original input samples. Several factors can contribute to this offset, including the buffer size, the resampling ratio (if resampling is required), and the models used in the processing chain.

processOffline() will account for this delay and make a best effort to align the output audio with the input audio.

process() is suitable for use in a streaming scenario, and may return leading zeros.

When streaming, the most efficient approach is to create an input and an output buffer up front, then repeatedly populate the input buffer, call process(), and read from the output buffer, as sketched below. If the audio data is already in a NumPy array then it can be used directly, assuming the size is correct.
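A minimal streaming loop might look like the following sketch, where streamIsActive(), readFromStream(), and writeToSink() are hypothetical stand-ins for whatever source and sink the application uses.

// Streaming: reuse one input and one output buffer for every call
std::vector<float> input(config.bufferLength, 0.0f);
std::vector<float> output(config.bufferLength, 0.0f);

while (streamIsActive()) {
    readFromStream(input.data(), config.bufferLength);  // fill the input buffer
    Iris::process(config.contextId, input.data(), output.data(), config.bufferLength);
    writeToSink(output.data(), config.bufferLength);    // consume the processed block
}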

The best buffer size to use is dependent on the processing chain set up in the audio context request.

#include <vector>

// Process a single block
std::vector<float> input(config.bufferLength, 1.0f);  // input filled with a placeholder signal of 1s
std::vector<float> output(config.bufferLength, 0.0f); // output zeroed; process() populates it in place

Iris::process(config.contextId, input.data(), output.data(), config.bufferLength);
// output now contains the processed audio block

The last parameter to process() is the size in frames of the audio data to process. A frame contains one sample per channel, so the total number of samples in a buffer is the frame count multiplied by the number of channels. For multi-channel audio the expected data format is interleaved, e.g. [L0, R0, L1, R1, L2, R2, …].
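For stereo audio, for example, each frame holds one left and one right sample. The sketch below interleaves two hypothetical per-channel buffers, left and right, each holding frames samples.

// Interleave separate channel buffers into [L0, R0, L1, R1, ...]
std::vector<float> interleaved(frames * 2);
for (size_t i = 0; i < frames; ++i) {
    interleaved[2 * i]     = left[i];
    interleaved[2 * i + 1] = right[i];
}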

If the SDK client is able to provide a consistent buffer size on each call to process(), then the fixed frame count attribute should be set to true in the audio context request; this is the default value. When receiving audio from some audio interfaces or hardware drivers, the amount of audio delivered may fluctuate slightly. If this is the case, fixed frame count should be set to false.

If the context is given less data than it requires to run, the data will be buffered internally.
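For instance, with fixed frame count set to false, slightly varying chunk sizes from a driver can be passed straight through; the SDK buffers internally until it has enough frames to run. In the sketch below, maxChunkSize, nextChunkSize(), and fillFromDriver() are hypothetical stand-ins for the driver callback.

// Allocate once at the maximum expected chunk size
std::vector<float> input(maxChunkSize, 0.0f);   // hypothetical upper bound
std::vector<float> output(maxChunkSize, 0.0f);

size_t frames = nextChunkSize();       // hypothetical: varies per callback
fillFromDriver(input.data(), frames);  // hypothetical audio source
Iris::process(config.contextId, input.data(), output.data(), frames);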

Note

process() is not intended to be used for large buffers of audio. Instead, it should be called in a loop as each block of audio becomes available from the stream. When processing a file, break it into blocks corresponding to the buffer length in the audio context config; see the example code for more on this. A typical buffer size is around 480 samples for 48 kHz audio, or 256 samples for 16 kHz audio. Passing much larger buffers will not increase processing speed but will incur a larger memory overhead.

Some processors incur a transport delay when processing data. This means that samples in the output may be offset from the corresponding samples in the input. The frame offset attribute in the audio context config describes how many frames the output will be delayed by. This value is constant for a given context and may be zero or higher.
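When exact alignment with the input is needed, the first frame offset frames of the overall output stream can simply be discarded. The sketch below assumes config.frameOffset holds the frame offset attribute, allOutput is the concatenation of every processed block in order, and the audio is mono so one frame is one sample.

// Drop the transport delay so aligned[0] corresponds to input sample 0
std::vector<float> aligned(allOutput.begin() + config.frameOffset, allOutput.end());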

In all cases the sizes of the input and output buffers should be equal.

Warning

Don’t cross the streams!!!

Each context is valid only for one contiguous stream of audio input. When processing multiple input sources at once, make sure to create an audio processing context for each input source and pass the appropriate context id to process().

Similarly, do not create a new processing context for each block of audio; mixing audio blocks from different sources within the same context will significantly reduce output quality. The correct lifecycle, sketched below, is:

  1. create a context

  2. process all the audio in blocks (until the end of the file or stream)

  3. discard the context
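Putting this together for two simultaneous sources, the sketch below uses createContext() and releaseContext() as hypothetical names for the SDK's context creation and teardown calls; only Iris::process() is taken from the examples above.

// One context per input source; never share a context between streams
auto micContextId  = Iris::createContext(micRequest);   // hypothetical API
auto fileContextId = Iris::createContext(fileRequest);  // hypothetical API

while (running) {
    Iris::process(micContextId,  micIn.data(),  micOut.data(),  bufferLength);
    Iris::process(fileContextId, fileIn.data(), fileOut.data(), bufferLength);
}

Iris::releaseContext(micContextId);   // hypothetical API
Iris::releaseContext(fileContextId);  // hypothetical API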