Since the late 2010s, sound source separation, and music source separation (MSS) in particular, has been a hot topic in AI-based audio processing. In 2019 the first machine learning algorithms dedicated to this task were published (e.g. Spleeter from Deezer Research), which were able to properly infer up to three different music sources (vocals, drums and bass). Peter Jackson's documentary The Beatles: Get Back, released in 2021, brought this technology to a wide audience, as it was used extensively in its post-production. Since then, hundreds of algorithms have been developed across a dozen architectures. Most require training on large volumes of data.
A selection of more than 220 sound source separation algorithms across different architectures was used:
To separate voices from instruments and other sounds, a two-stream neural network for hybrid spectrogram/waveform analysis was mostly used. For the separation of brasses/horns/winds, for separating voices from one another, and for discriminating ambient noise (such as a live audience), reverb and echo, a magnitude spectrogram analysis neural network was used. To separate drums (and their elements), bass, guitars and piano, different implementations of a four-fold neural network for hybrid spectrogram/waveform analysis were mostly used. For the separation of strings, a multi-scale dense network with four frequency bands was used. To separate synthesizers or guitars of different types, a twelve-layer U-net was used. For the separation of exotic instruments, methodologies based on shallow training were used.
All the algorithms were optimized and partially re-trained by Wilki Amieva at Hg Prods (Buenos Aires) using their own database, with more than 250 GB of excerpts from high definition mixes and isolated tracks.
The virtual mixer has a main body, where the input channels are grouped with their controls and displays. To its right are two needle level meters, one for each output channel. Below them are the clock and playback controls. To use all of the mixer's features, open it on a computer (not on a mobile device).
Each input channel has three buttons and a knob at the top, a fader that also shows its level, and its name at the bottom.
The 'M' (mute) button mutes the track; the 'S' (solo) button mutes all the other tracks (unless their own 'S' button is also pressed); and the 'PFL' (pre-fader level) button makes the level display ignore the position (or movement) of the fader. The buttons light up in sky blue when in the 'on' position. The mixer loads with all buttons in the 'off' position.
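The mute/solo behaviour described above can be sketched as follows. This is only an illustration, not the mixer's actual code: the `Channel` class and the `audible` function are hypothetical names, and the rule that mute takes priority when both 'M' and 'S' are pressed on the same channel is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    mute: bool = False  # state of the 'M' button
    solo: bool = False  # state of the 'S' button


def audible(channels):
    """Return the names of the channels that reach the output."""
    any_solo = any(ch.solo for ch in channels)
    result = []
    for ch in channels:
        if ch.mute:
            continue  # assumption: mute wins over solo
        if any_solo and not ch.solo:
            continue  # another channel is soloed, so this one is silenced
        result.append(ch.name)
    return result


channels = [
    Channel("Vx"),
    Channel("Gtr", solo=True),
    Channel("Bs", solo=True),
    Channel("Drms", mute=True),
]
print(audible(channels))  # ['Gtr', 'Bs']
```

With no 'S' button pressed, every unmuted channel is heard; as soon as one or more are soloed, only those remain.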
The knob controls the pan position of the channel, from full left (L) to full right (R). The mixer loads with all knobs in their center position.
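As an illustration of what a pan knob typically does to a channel, here is a minimal sketch of a constant-power pan law. The text does not say which pan law the mixer uses, so the law itself, the `pan_gains` name and the -1.0 to 1.0 range for the knob are all assumptions.

```python
import math


def pan_gains(pan):
    """Return (left_gain, right_gain) for a knob position.

    pan: -1.0 is full left (L), 0.0 is center, 1.0 is full right (R).
    Assumed constant-power law: left^2 + right^2 is always 1.
    """
    pan = max(-1.0, min(1.0, pan))       # keep the knob within its travel
    angle = (pan + 1.0) * math.pi / 4.0  # map -1..1 to 0..pi/2
    return math.cos(angle), math.sin(angle)


for position in (-1.0, 0.0, 1.0):
    left, right = pan_gains(position)
    print(f"pan={position:+.1f}  L={left:.3f}  R={right:.3f}")
```

Under this law a centered knob sends the channel to both outputs at about 0.707 of its level, so the perceived loudness stays roughly constant as the knob is turned.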
The faders are used to regulate the level of each channel individually, from the minimum (silence) at the bottom to the maximum at the top. The mixer loads with all the faders at their middle position.
Next to the fader you can see the level of the corresponding channel, in sky blue (peaks are shown in yellow). This level takes into account the position of the fader, unless the 'PFL' button described above is pressed.
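The relation between the fader, the 'PFL' button and the level display can be sketched like this. The function names are hypothetical, and the linear mapping from fader position to gain is an assumption made to keep the example short; the real fader curve is not specified in the text.

```python
def fader_gain(position):
    """Map a fader position to a gain.

    position: 0.0 is the bottom (silence), 1.0 is the top (maximum).
    Assumption: the mapping is linear.
    """
    return max(0.0, min(1.0, position))


def meter_level(signal_peak, fader_position, pfl=False):
    """Level shown next to the fader for a given input peak."""
    if pfl:
        return signal_peak  # pre-fader: the fader is ignored
    return signal_peak * fader_gain(fader_position)  # post-fader


print(meter_level(0.8, 0.5))            # fader at the middle: 0.4
print(meter_level(0.8, 0.5, pfl=True))  # 'PFL' pressed: 0.8
print(meter_level(0.8, 0.0))            # fader at the bottom: 0.0
```

In other words, 'PFL' only changes what the display reads, which makes it possible to check a channel's content while its fader is down.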
Tracks are named with abbreviations or acronyms commonly used in music production jargon. Some examples:
Main vocals: Vx, Voz
Backing vocals: BVs, Coros
Brasses/Horns/Winds: Brass, Horn, Wind, Caños
Piano: Pno
Keyboards/Synthesizers: Keys, KB, Synths, Tecla, Sinte
Guitars: Gtr, Viola (acoustic guitar: AcGtr, electric guitar: EGtr)
Bass: Bs, Bajo
Drums: Drms, Bata
Effects (reverb, echo): FX
In case the rhythm section is separated into parts:
Cymbals: Platos, Platillos
Hi-hat: HH
Snare: Snr, Redo, Tacho
Handclaps: Claps
Toms/Timbales: Toms
Kick drum: Kick, KD, BD, Bombo
Percussion: Perc
The volume unit (VU) meters show the output level for the left (L) and right (R) channels, top and bottom respectively.
The clock shows the playing time, to within 1/100 of a second.
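A time display with 1/100 of a second resolution could be formatted as in the sketch below. The `format_clock` name and the `MM:SS.hh` layout are assumptions; the text only states the resolution.

```python
def format_clock(seconds):
    """Format a playing time in seconds as MM:SS.hh (hundredths)."""
    centis = int(round(seconds * 100))        # total hundredths of a second
    minutes, rest = divmod(centis, 6000)      # 6000 hundredths per minute
    secs, hundredths = divmod(rest, 100)
    return f"{minutes:02d}:{secs:02d}.{hundredths:02d}"


print(format_clock(0))     # 00:00.00
print(format_clock(75.5))  # 01:15.50
```

Working in whole hundredths before splitting into minutes and seconds avoids displays such as 00:59.100 caused by rounding each field separately.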
The playback controls are used to start/pause playback, rewind or fast forward (in 10-second intervals), or restart playback.
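The 10-second rewind and fast-forward steps can be sketched as a simple clamped seek. The `seek` and `restart` functions are hypothetical, and clamping at the start and end of the track is an assumption about how the edges are handled.

```python
SEEK_STEP = 10.0  # seconds per rewind / fast-forward press


def seek(position, duration, direction):
    """Return the new playback position in seconds.

    direction: +1 for fast forward, -1 for rewind.
    Assumption: the position is clamped to the range 0..duration.
    """
    new_position = position + direction * SEEK_STEP
    return max(0.0, min(duration, new_position))


def restart():
    """Return the position used when playback is restarted."""
    return 0.0


print(seek(60.0, 180.0, +1))   # 70.0
print(seek(5.0, 180.0, -1))    # 0.0 (clamped at the start)
print(seek(175.0, 180.0, +1))  # 180.0 (clamped at the end)
```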