A generalized approach for model-based speaker-dependent single channel speech separation



In this paper, we present a new technique for separating two speech signals received from one microphone or one communication channel. In this special case, the separation problem is too ill-conditioned to be handled with common blind source separation techniques. The proposed technique generalizes model-based speaker-dependent single channel speech separation techniques, in which a priori knowledge of the underlying speakers is used to separate the speech signals. It not only preserves the advantage of model-based speaker-dependent single channel speech separation algorithms (i.e. high separability), but can also separate the speech signals of an unlimited number of speakers, given the speakers' models (i.e. generality). The algorithm consists of three stages: classification, identification, and separation. The identities of the speakers whose speech signals form the mixed signal are first determined in the classification and identification stages. The identified speakers' models are then used to separate the underlying signals using a novel approach that combines Gaussian mixture modeling, maximum likelihood estimation, and Wiener filtering. Evaluations conducted on a database of 100 mixed speech signals with target-to-interference ratios (TIR) ranging from -9 dB to +9 dB show significant performance improvements over techniques that use a single model for separation.
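As a rough illustration of the final Wiener filtering step only, the following minimal sketch assumes that per-speaker power spectrum estimates for one frame are already available (for example, from the identified speakers' models). The function name `wiener_separate`, its parameters, and the toy values are hypothetical and are not taken from the paper; the Gaussian mixture modeling and maximum likelihood estimation steps are not shown.

```python
import numpy as np


def wiener_separate(mixture_spec, power_a, power_b, eps=1e-12):
    """Split one frame of a mixture spectrum into two speaker estimates.

    mixture_spec : complex spectrum of the mixed signal (one frame)
    power_a, power_b : assumed per-speaker power spectrum estimates
    eps : small constant that avoids division by zero
    """
    # Wiener gain for speaker A: its share of the total estimated power.
    gain_a = power_a / (power_a + power_b + eps)
    # The complementary gain goes to speaker B, so the estimates sum to the mixture.
    gain_b = 1.0 - gain_a
    return gain_a * mixture_spec, gain_b * mixture_spec


# Toy frame with three frequency bins (illustrative values only).
mixture = np.array([1.0 + 0.0j, 2.0 + 0.0j, 4.0 + 0.0j])
power_a = np.array([9.0, 1.0, 0.0])
power_b = np.array([1.0, 1.0, 4.0])

est_a, est_b = wiener_separate(mixture, power_a, power_b)
print(np.round(est_a.real, 3), np.round(est_b.real, 3))
```

In this sketch each frequency bin of the mixture is shared between the two speakers in proportion to their estimated power, so a bin dominated by one speaker is assigned almost entirely to that speaker.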