arXiv:2512.15532v1 Announce Type: cross 
Abstract: In this paper we propose a conditioned UNet for Music Source Separation (MSS). MSS is generally performed by multi-output neural networks, typically UNets, with each output representing a particular stem from a predefined instrument vocabulary. In contrast, conditioned MSS networks accept an audio query related to the stem of interest alongside the signal from which that stem is to be extracted. A strict vocabulary is therefore not required, which enables more realistic MSS tasks. The potential of conditioned approaches for such tasks has been somewhat hidden by a lack of suitable data, an issue recently addressed by the MoisesDb dataset. A recent method, Banquet, uses this dataset and shows promising results on larger vocabularies. Banquet uses a Bandsplit RNN rather than a UNet, and its authors state that UNets are not suitable for conditioned MSS. We counter this argument and propose QSCNet, a novel conditioned UNet for MSS that integrates network conditioning elements into the Sparse Compressed Network for MSS. We find that QSCNet outperforms Banquet by over 1 dB SNR on a couple of MSS tasks while using fewer than half as many parameters.
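The abstract does not say how QSCNet injects the audio query into the Sparse Compressed Network. As a minimal sketch of what "conditioning" can mean, assuming FiLM-style (feature-wise linear modulation) conditioning, one widely used mechanism, the NumPy code below turns a query spectrogram into an embedding and uses it to scale and shift each channel of a stand-in UNet feature map. It also computes SNR under the assumed standard definition 10·log10(‖s‖² / ‖s − ŝ‖²). The names `query_embedding`, `film` and `snr_db`, the mean-pooling query encoder, and all shapes are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def query_embedding(query_spec: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical query encoder (assumption, not QSCNet's): average the
    query's magnitude spectrogram over time, then project it linearly.

    query_spec: (freq_bins, frames); proj: (embed_dim, freq_bins).
    """
    pooled = query_spec.mean(axis=1)       # (freq_bins,)
    return proj @ pooled                   # (embed_dim,)


def film(features: np.ndarray, emb: np.ndarray,
         w_gamma: np.ndarray, w_beta: np.ndarray) -> np.ndarray:
    """Assumed FiLM conditioning: per-channel scale and shift derived from
    the query embedding, so the same network can target different stems.

    features: (channels, freq, time); w_gamma, w_beta: (channels, embed_dim).
    """
    gamma = w_gamma @ emb                  # (channels,) scale
    beta = w_beta @ emb                    # (channels,) shift
    return gamma[:, None, None] * features + beta[:, None, None]


def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """SNR of an estimated stem in dB (assumed plain SNR, not SI-SNR)."""
    noise = reference - estimate
    return float(10 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2)))


# Usage with random stand-in data (all shapes are illustrative).
rng = np.random.default_rng(0)
query = np.abs(rng.standard_normal((64, 100)))          # query spectrogram
emb = query_embedding(query, rng.standard_normal((16, 64)))
feats = rng.standard_normal((8, 32, 50))                 # stand-in feature map
out = film(feats, emb, rng.standard_normal((8, 16)), rng.standard_normal((8, 16)))
print(out.shape)                                         # (8, 32, 50)
print(round(snr_db(np.ones(4), np.array([1, 1, 1, 0.9])), 2))  # 26.02
```

On the reported margin: for a fixed reference, a 1 dB SNR gain means the residual error energy shrinks by a factor of 10^(-0.1) ≈ 0.79, i.e. roughly 21% less error energy.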

A novel conditioned UNet architecture has been proposed for Music Source Separation (MSS), allowing for the extraction of specific audio stems based on an audio query, thus eliminating the need for a strict instrument vocabulary. This approach leverages the recently developed MoisesDb dataset to enhance the realism of MSS tasks.

A Conditioned UNet for Music Source Separation