
Soundcube

Concept

The project has two main objectives. The first is to use convolutional neural networks to classify audio samples according to a predefined set of categories, with a focus on environmental sounds. The second is to build an immersive 3D virtual environment where sounds are placed according to their similarity, allowing users to explore and play them as they "walk" through the space.

The first part of the project could be useful for sound designers: they could build their own dataset and obtain an automatic classification of new sounds they want to use. It could also be used to listen to an environment and detect notable events in the "soundscape" surrounding the user.

The second part of the project, on the other hand, can be especially useful for exhibitions or music performances. There are many ways to explore the 3D environment: for example, a composer could define a logic for walking through the space in a semi-automated way, or devices like a Kinect or a Leap Motion could be used to move the focus to a particular zone of the environment.

Realization

We based our implementation on a paper by Karol J. Piczak, "Environmental Sound Classification with Convolutional Neural Networks" (MLSP 2015), which describes an approach well suited to what we wanted to do.

Architecture

Figure: the architecture of the approach proposed in the paper.
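
As a rough illustration of the paper's input pipeline, here is a minimal sketch, not the repository's actual code: short log-scaled mel spectrogram segments, plus a delta channel, are fed to the convnet. The librosa usage, segment width and band count below are assumptions based on the paper's description.

```python
import numpy as np
import librosa

def logmel_segments(path, sr=22050, n_mels=60, frames=41, hop=512):
    """Load a sound, compute a log-scaled mel spectrogram, and slice it
    into fixed-width two-channel segments (spectrogram + delta), in the
    spirit of Piczak (2015). All parameter values are illustrative."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels, hop_length=hop)
    logmel = librosa.power_to_db(mel)
    delta = librosa.feature.delta(logmel)  # first-order differences over time
    segments = []
    # 50% overlap between consecutive segments
    for start in range(0, logmel.shape[1] - frames + 1, frames // 2):
        seg = np.stack([logmel[:, start:start + frames],
                        delta[:, start:start + frames]])  # shape: (2, n_mels, frames)
        segments.append(seg)
    return np.array(segments)
```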

A dataset of approximately 4000 sounds divided into four categories (wood, water, fans and voices) was built as a training set for the machine learning step. The idea is to use the activations of the convnet's last layer as a feature vector for each sound, and then use t-SNE to reduce the feature vector to 3 dimensions, so that each audio sample can be plotted in our 3D environment.
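
A hedged sketch of the dimensionality-reduction step using scikit-learn: the `features` array below is a placeholder for the real last-layer activations, and the shapes and t-SNE settings are illustrative, not the project's actual values.

```python
import numpy as np
from sklearn.manifold import TSNE

# One feature vector per sound, taken from the convnet's last layer.
# Placeholder data; in practice these come from the trained network.
features = np.random.rand(4000, 256)

# Project to 3 dimensions so each sound gets an (X, Y, Z) position.
coords = TSNE(n_components=3, perplexity=30, init="pca",
              random_state=0).fit_transform(features)

# `coords` is an (n_sounds, 3) array: similar sounds land close together,
# so each row can place one sphere in the 3D environment.
```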

Results

While the sound classification is still a work in progress, we managed to build the 3D environment using openFrameworks, applying t-SNE directly to a set of spectral features describing each audio chunk. The current ways to explore it are either hovering over the spheres with the mouse to listen to the associated sound, or sending a set of X, Y and Z coordinates via OSC to position the listener in the space: the 6 nearest sounds are then heard according to their position, spatialized with ambisonics.
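
A minimal sketch of driving the listener position over OSC from Python, assuming the openFrameworks app listens on localhost port 12345 and accepts a "/position" message; the address, port and message path are assumptions, not the app's documented interface. Uses the python-osc package.

```python
from pythonosc.udp_client import SimpleUDPClient

# Host and port of the openFrameworks app (illustrative values).
client = SimpleUDPClient("127.0.0.1", 12345)

def move_listener(x, y, z):
    """Send X, Y, Z coordinates; the app then plays the 6 nearest sounds,
    spatialized with ambisonics relative to this position."""
    client.send_message("/position", [x, y, z])

move_listener(0.0, 1.5, -2.0)
```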

Team
