This repository provides a VGGish model implemented in Keras with a TensorFlow backend (since tf.slim is deprecated, I think an up-to-date interface is worthwhile). It is based on the model released for AudioSet. For more details, please see the slim version.
```
pip install vggish-keras
```
Weights are downloaded automatically the first time they are requested. You can also download them ahead of time by running `python -m vggish_keras.download_helpers.download_weights`.
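If you'd rather trigger the download from Python (e.g., in a setup script), one option is to run that module programmatically. This is a sketch, assuming the module runs its download logic when executed as a script:

```python
# Sketch: pre-fetch the weights from Python. Assumes the download logic in
# vggish_keras.download_helpers.download_weights runs when the module is
# executed as a script (which is what `python -m ...` does).
import runpy
runpy.run_module('vggish_keras.download_helpers.download_weights', run_name='__main__')
```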
Basic - simple & efficient method:
```python
import librosa
import numpy as np
import vggish_keras as vgk

# loads the model once and provides a simple function that takes in `filename` or `y, sr`
compute = vgk.get_embedding_function(hop_duration=0.25)
# model, pump, and sampler are available as attributes
compute.model.summary()  # take a peek at the model

# compute from filename
Z, ts = compute(librosa.util.example_audio_file())

# compute from pcm
y, sr = librosa.load(librosa.util.example_audio_file())
Z, ts = compute(y=y, sr=sr)
```
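Continuing from the example above, `Z` holds one 512-dimensional embedding per frame and `ts` gives the corresponding timestamps. For clip-level tasks, one simple option is to pool over time; mean-pooling here is purely illustrative, not part of this library's API:

```python
# Z: (n_frames, 512) frame embeddings; ts: (n_frames,) timestamps in seconds.
# Mean-pooling to a single clip vector is an illustrative choice, not part
# of this library's API.
clip_embedding = Z.mean(axis=0)
print(clip_embedding.shape)  # (512,)
print(ts[:3])                # first few frame timestamps
```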
Alternatives - using the under-the-hood helper functions:
```python
# get the embeddings - WARNING: this instantiates a new model each time
Z, ts = vgk.get_embeddings(librosa.util.example_audio_file(), hop_duration=0.25)

# create the model, pump, and sampler once and pass them to vgk.get_embeddings
# - more typing :'(
model, pump, sampler = vgk.get_embedding_model(hop_duration=0.25)
Z, ts = vgk.get_embeddings(
    librosa.util.example_audio_file(),
    model=model, pump=pump, sampler=sampler)
```
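Reusing the same model, pump, and sampler pays off when embedding many files. A quick sketch, where `audio_files` is a hypothetical list of paths:

```python
# Sketch: embed a batch of files with one shared model/pump/sampler.
# `audio_files` is a hypothetical list of audio file paths.
audio_files = ['clip1.wav', 'clip2.wav']
embeddings = {
    f: vgk.get_embeddings(f, model=model, pump=pump, sampler=sampler)
    for f in audio_files
}  # maps filename -> (Z, ts)
```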
Manually, using the Keras model and pump directly:
```python
import librosa
import numpy as np
import vggish_keras as vgk

# define the model
pump = vgk.get_pump()
model = vgk.VGGish(pump)
sampler = vgk.get_sampler(pump)

# transform audio into VGGish embeddings
filename = librosa.util.example_audio_file()
X = np.concatenate([
    x[vgk.params.PUMP_INPUT]
    for x in sampler(pump(filename))])
Z = model.predict(X)

# calculate timestamps
ts = vgk.get_timesteps(Z, pump, sampler)
# the example file yields 13 frames of 512-dimensional embeddings
assert Z.shape == (13, 512)
```
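From here the embeddings are plain NumPy arrays, so they can be persisted alongside their timestamps for downstream use. A small follow-up sketch (the file name is arbitrary):

```python
# Save frame embeddings and timestamps for downstream use (sketch only).
np.savez('vggish_embeddings.npz', Z=Z, ts=ts)

# Later:
data = np.load('vggish_embeddings.npz')
Z, ts = data['Z'], data['ts']
```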
- Gemmeke, J. F., et al., "Audio Set: An ontology and human-labeled dataset for audio events," ICASSP 2017.
- Hershey, S., et al., "CNN Architectures for Large-Scale Audio Classification," ICASSP 2017.
I include a weight conversion script in download_helpers/convert_ckpt.py, which shows how I converted the weights from .ckpt to .h5, for those who are interested.
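For a rough idea of what such a conversion involves, here is a generic sketch (not the repo's actual convert_ckpt.py; the checkpoint path, variable handling, and output path are assumptions):

```python
# Generic sketch of the .ckpt -> .h5 idea: read variables from a TF
# checkpoint, then assign them to the matching Keras layers.
# The checkpoint and output paths are assumptions.
import tensorflow as tf

reader = tf.train.load_checkpoint('vggish_model.ckpt')
weights = {
    name: reader.get_tensor(name)
    for name in reader.get_variable_to_shape_map()
}
# ...map each checkpoint tensor onto the corresponding Keras layer
# (e.g., via model.get_layer(...).set_weights([...])), then:
# model.save_weights('vggish_audioset_weights.h5')
```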
- Currently, parameters (sample rate, hop size, etc.) can be changed globally via vgk.params (see the sketch below).
- I'd like to allow parameter overrides to be passed to vgk.VGGish.
- Currently, this relies on bmcfee/pumpp#123. Once that is merged, the custom GitHub install location can be removed.
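As a sketch of the current global-override pattern (the attribute names below are hypothetical; check vgk.params for the real ones):

```python
import vggish_keras as vgk

# Hypothetical parameter names, shown only to illustrate mutating
# vgk.params before building the model; consult vgk.params for the
# actual attributes.
vgk.params.SAMPLE_RATE = 16000  # assumed name
compute = vgk.get_embedding_function(hop_duration=0.25)
```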