Hypercinema Blog

Week 3 - Synthetic Media

The example I have chosen is not fully synthetic, in the sense that we need to feed material in for the AI to output something, but I guess if we're being pedantic we can argue that behind every piece of artificially generated media sits a huge set of input data from the outside world, made and collected by humans. The AI in and of itself has no idea what a cat is, but through constant exposure to an enormous collection of cat images, and after a set of training procedures, it can recognize an image of a cat or generate one.

This service is an AI tool that captures the pitch and volume of an arbitrary audio sample and transforms it into a piece performed by a particular instrument. It reads the pitch and volume of the sample and imitates what it would sound like if you played those same pitches on that instrument, with the instrument's timbre and nuances built in. The results are mesmerizing in that most of the time they sound like an instrument, but not quite?! There are fine details in a bird chirping or a person speaking that would be impossible to imitate on an instrument with rather limited capabilities, but this tool lets us hear what it would sound like if we tried.
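To make that first step concrete, here is a minimal sketch of how pitch and loudness can be pulled out of a recording. This is my own illustration, not the tool's actual code: the librosa calls are real, but the file name and parameter choices are placeholders I picked for the example.

```python
import librosa
import numpy as np

# Load an audio sample (the file name here is a placeholder)
y, sr = librosa.load("bird_chirp.wav", sr=16000)

# Estimate the fundamental frequency (pitch) frame by frame
f0, voiced_flag, voiced_probs = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)

# Estimate loudness as per-frame RMS energy, converted to decibels.
# Both pyin and rms use librosa's default hop length, so the two
# control curves line up frame for frame.
rms = librosa.feature.rms(y=y)[0]
loudness_db = librosa.amplitude_to_db(rms, ref=np.max)
```

These two curves, pitch and loudness over time, are all the tool needs from the input; everything else about the output sound comes from the instrument model.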

The team behind this project used a technique called DDSP, short for Differentiable Digital Signal Processing. Typical DSP deals with processing audio signals, and in theory we can build any complex sound we hear out of basic waveforms; synthesizers have been trying to do this for many years. But because the structure of a natural sound can be so complicated, designing a realistic sound on a synthesizer with three oscillators, two filters, and a basic ADSR envelope is not really possible. So they introduced DDSP, DSP on steroids: they fed a 10-minute sample of a musical instrument being played into a model, and after iterations of training, the model was able to produce rather convincing novel pieces, including pitches not used in the original recording. The model uses different DSP modules to generate the intricacies of an instrument, like the reverb of the room the original sample was recorded in, or the details of hand or bow noises. This technology opens up a whole new range of possibilities that were never achievable before and is especially useful for composers and producers seeking a new and extraordinary sound to add to their palette of textures.
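The core DSP module in DDSP is a harmonic oscillator: a bank of sine waves at integer multiples of the fundamental frequency, driven by the pitch and loudness curves the model predicts. The sketch below shows that idea in plain NumPy; it is deliberately simplified and not differentiable, unlike the real ddsp library, which builds the same oscillator in TensorFlow so gradients can flow back into the network during training. The 1/k rolloff is an arbitrary choice for illustration; in DDSP, the per-harmonic amplitudes are predicted by the network.

```python
import numpy as np

def harmonic_synth(f0, amplitude, n_harmonics=16, sr=16000):
    """Additive synthesis: sum sine waves at integer multiples of f0.

    f0 and amplitude are per-sample control signals, standing in for
    the curves a DDSP-style model would predict.
    """
    # Instantaneous phase is the running integral of frequency
    phase = 2 * np.pi * np.cumsum(f0) / sr
    audio = np.zeros_like(f0)
    for k in range(1, n_harmonics + 1):
        # Silence harmonics above the Nyquist frequency to avoid aliasing
        mask = (k * f0) < (sr / 2)
        audio += mask * np.sin(k * phase) / k  # simple 1/k spectral rolloff
    return amplitude * audio

# Example: a one-second glide from A3 (220 Hz) to A4 (440 Hz)
sr = 16000
f0 = np.linspace(220.0, 440.0, sr)
amp = np.hanning(sr)  # fade in and out
audio = harmonic_synth(f0, amp, sr=sr)
```

Even this toy version hints at why the approach works: the synthesizer structure guarantees the output is pitched and harmonic, so the model only has to learn the fine details that make it sound like a specific instrument.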

Personally, I can't see any serious ethical ramifications in this model. Maybe if the model gets forked and trained to replicate the nuances of human speech, we could face situations where it is misused. As it stands now, it is quite impressive and can open new horizons in the world of sound design and synthetic media.

Nima Niazi