
Time-Shifting based Primary-Ambient Extraction for Spatial Audio Reproduction

 

Publications:

[1] J. He, E. L. Tan, and W. S. Gan, “Time-shifted principal component analysis based cue extraction for stereo audio signals,” in Proc. ICASSP, Vancouver, Canada, 2013, pp. 266-270.

[2] J. He, E. L. Tan, and W. S. Gan, “Time-shifting based primary-ambient extraction for spatial audio reproduction,” IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 10, pp. 1576-1588, Oct. 2015.

In spatial audio analysis and reproduction, one of the key issues is to decompose a signal into primary and ambient components based on their different spatial features, where primary components are directional and ambient components are diffuse. Principal component analysis (PCA) has been widely employed to extract primary and ambient components from stereo signals, which are often modeled as a linear mixture of primary and ambient components. However, the performance of PCA-based primary-ambient extraction (PAE) is limited to the ideal case, where all assumptions of the signal model are satisfied, and its performance has not been well studied in non-ideal cases. In this paper, we investigate the performance degradation of PCA-based PAE in the primary-complex case, one of the most frequently encountered non-ideal cases in practice, where the primary component is only partially correlated at zero lag. To alleviate the degradation in this case, we propose time-shifted PCA (SPCA) based PAE. This approach time-shifts the input signal according to the estimated inter-channel time difference (ICTD) of the primary component before the PCA decomposition in PAE. In our experiments, the proposed SPCA-based PAE is found to be superior to conventional PCA-based PAE in terms of extraction accuracy and spatial accuracy.
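The time-shifting idea described above can be illustrated with a minimal sketch. It is not the implementation from [1] or [2]: it assumes the ICTD is an integer number of samples estimated from the peak of the inter-channel cross-correlation, uses a circular shift on the whole signal instead of frame-based processing, and runs on synthetic noise signals. All function names (estimate_ictd, circular_shift, pca_primary, spca_primary, demo) are hypothetical.

```python
import math
import random


def estimate_ictd(x0, x1, max_lag):
    """Lag d (in samples) that maximizes sum_n x0[n] * x1[n + d].

    Assumption: the ICTD is taken as the cross-correlation peak.
    """
    n = len(x0)
    best_lag, best_val = 0, -math.inf
    for d in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -d), min(n, n - d)
        val = sum(x0[i] * x1[i + d] for i in range(lo, hi))
        if val > best_val:
            best_lag, best_val = d, val
    return best_lag


def circular_shift(x, d):
    """Advance x by d samples with wrap-around: y[n] = x[(n + d) mod N]."""
    n = len(x)
    return [x[(i + d) % n] for i in range(n)]


def pca_primary(x0, x1):
    """Project the stereo pair onto the principal eigenvector of its
    2x2 zero-lag correlation matrix and return the primary estimate
    for each channel."""
    r00 = sum(a * a for a in x0)
    r11 = sum(b * b for b in x1)
    r01 = sum(a * b for a, b in zip(x0, x1))
    # Orientation of the major axis of [[r00, r01], [r01, r11]].
    theta = 0.5 * math.atan2(2.0 * r01, r00 - r11)
    u0, u1 = math.cos(theta), math.sin(theta)
    proj = [u0 * a + u1 * b for a, b in zip(x0, x1)]
    return [u0 * v for v in proj], [u1 * v for v in proj]


def spca_primary(x0, x1, max_lag):
    """Time-shifted PCA: align channel 1 to channel 0 by the estimated
    ICTD, run PCA, then shift the channel-1 estimate back."""
    d = estimate_ictd(x0, x1, max_lag)
    p0, p1_aligned = pca_primary(x0, circular_shift(x1, d))
    return p0, circular_shift(p1_aligned, -d), d


def demo(n=2000, delay=5, gain=2.0, seed=0):
    """Synthetic test: a noise primary source delayed in channel 1,
    plus uncorrelated ambient noise in both channels."""
    rng = random.Random(seed)
    p0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p1 = [gain * p0[(i - delay) % n] for i in range(n)]
    x0 = [s + rng.gauss(0.0, 0.3) for s in p0]
    x1 = [s + rng.gauss(0.0, 0.3) for s in p1]

    def error(e0, e1):
        # Mean squared error of the primary estimates over both channels.
        se = sum((a - b) ** 2 for a, b in zip(e0, p0))
        se += sum((a - b) ** 2 for a, b in zip(e1, p1))
        return se / n

    err_pca = error(*pca_primary(x0, x1))
    s0, s1, d = spca_primary(x0, x1, max_lag=20)
    return d, err_pca, error(s0, s1)


if __name__ == "__main__":
    d, err_pca, err_spca = demo()
    print("estimated ICTD (samples):", d)
    print("primary extraction error, PCA:", err_pca)
    print("primary extraction error, SPCA:", err_spca)
```

In this toy setup the primary component is nearly uncorrelated between the channels at zero lag, so plain PCA assigns almost all primary energy to the louder channel, while aligning the channels first lets PCA recover the primary component in both.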
 
Below are some test tracks for comparing conventional PCA with the proposed SPCA.

Input Signal

Type 1 (Speech + Wave sound, 8 s)

PCA

SPCA

Primary

Ambient

Observation

Speech: clearer, with more accurate localization;

Wave sound: cleaner.

Type 2 (MatchBox + Wave sound, 10 s)

Primary

The localization of the shaking matchbox sound is more accurate.
