Demo page for listening to rendered violin samples.
Neural synthesis for musical instruments has the potential to revolutionize current practices based on concatenative synthesis with sample libraries. However, most research focuses on piano synthesis and expressive performance generation. Little work has been done on continuously articulated instruments such as the violin, let alone on rendering them with control over playing techniques and dynamics. We present VIOLET, a latent-diffusion framework for controllable violin synthesis. It uses a Diffusion Transformer (DiT) with rectified flow to synthesize high-fidelity audio from MIDI notes, playing techniques, and continuous dynamics. To train VIOLET, we use several existing datasets and also curate a new dataset, CSV-TD, which contains 39 hours of 48 kHz synthetic audio with time-aligned annotations of MIDI notes, note-level techniques, and continuous dynamics curves. Objective and subjective evaluations show that VIOLET synthesizes violin performances with high technique adherence, accurate pitch and timing alignment, and good dynamics control. It outperforms the current state-of-the-art neural violin synthesis system and approaches a top commercial virtual instrument in technique clarity, naturalness, and dynamics following.
Three excerpts from the CSV-TD test set, each rendered by five systems: a commercial Virtual Instrument (VI), VIOLET (Full), VIOLET (Synth), VIOLET (w/o Cond), and ViolinDiff. Each excerpt mixes multiple playing techniques within a single MIDI file. The technique active for each note is shown in the strip below the piano roll and is highlighted in sync with playback. The dynamics curve is shown at the top.
MID_FiLD_0108
MID_FiLD_1547
MID_FiLD_4356
We present both single- and multi-technique samples drawn from our evaluation set of collected violin etudes. The single-technique portion includes two excerpts for each of the six evaluated playing techniques, and the multi-technique portion includes three excerpts.
Each file uses a single playing technique throughout, except for the trill excerpts.
Sample 1
Sample 2
Sample 1
Sample 2
Sample 1
Sample 2
Sample 1
Sample 2
Sample 1
Sample 2
Sample 1
Sample 2
Each excerpt mixes multiple playing techniques, and the score is shown for reference. Three renderings are provided: a commercial Virtual Instrument (VI), VIOLET, and ViolinDiff. The ViolinDiff system is not conditioned on playing technique or dynamics. Note that in the MIDI files, harmonic notes are written one octave below the actual sounding pitch.
Concerto No. 1 in A Minor
Gavotte from “Mignon”
La Cinquantaine