Hello,
I am trying to identify a short audio clip inside a longer one. I use librosa to extract chroma STFT features and compute a cross-similarity matrix.
When the audio matches, I get the "diagonal" line in the plot, as the documentation shows.
Now I need to compute a score from it so that the code can decide whether there is a match or not. How can I calculate one after calling cross_similarity?
This is probably not a question directly related to librosa, but I've spent a lot of time trying to do it, as I am learning AI/ML, starting with librosa and numpy. Maybe I am doing it wrong.
Thank you for any help or pointers in the right direction.
code:
import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load the ad and the recording (librosa resamples both to 22050 Hz by default)
y_ad, sr_ad = librosa.load('ad.mp3')
y_rec, sr_rec = librosa.load('rec.mp3')
# Frame slices for the 5-second excerpts that should match (times in seconds)
ad_offset = (slice(None), slice(*librosa.time_to_frames([20, 25])))
rec_offset = (slice(None), slice(*librosa.time_to_frames([35, 40])))
# Frame slice for an excerpt of the recording that should NOT match
rec_offset_fail = (slice(None), slice(*librosa.time_to_frames([50, 55])))
# Chroma stft
ad_chroma = librosa.feature.chroma_stft(y=y_ad)
rec_chroma = librosa.feature.chroma_stft(y=y_rec)
fig, ax = plt.subplots(nrows=2)
fig.set_size_inches(20, 5)
fig.set_facecolor("white")
ax[0].set_title("ad")
ax[1].set_title("rec")
librosa.display.specshow(ad_chroma[ad_offset], y_axis="chroma", x_axis="time", ax=ax[0])
librosa.display.specshow(rec_chroma[rec_offset], y_axis="chroma", x_axis="time", ax=ax[1])
plt.show()
# Cross similarity
sim = librosa.segment.cross_similarity(ad_chroma[ad_offset], rec_chroma[rec_offset], k=5)
sim_fail = librosa.segment.cross_similarity(ad_chroma[ad_offset], rec_chroma[rec_offset_fail], k=5)
fig, ax = plt.subplots(ncols=2)
fig.set_facecolor("white")
fig.set_size_inches(10, 5)
ax[0].set_title("Match OK")
librosa.display.specshow(sim, ax=ax[0])
ax[1].set_title("Match FAIL")
librosa.display.specshow(sim_fail, ax=ax[1])
plt.show()
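One possible starting point for the score is to measure how "filled" the best diagonal of the matrix is. This is only a minimal sketch and not something librosa provides. The function name `diagonal_score`, the `min_len` parameter, and the 0.5 threshold in the usage note are hypothetical choices for illustration. It assumes the matrix is a dense array, which is what cross_similarity returns with its default sparse=False.

```python
import numpy as np

def diagonal_score(sim, min_len=10):
    """Best mean value along any diagonal of `sim` that has at least `min_len` cells.

    Hypothetical helper, not part of librosa. For a binary similarity matrix
    the result is the fraction of cells set on the most filled diagonal (0..1).
    """
    sim = np.asarray(sim, dtype=float)
    n_rows, n_cols = sim.shape
    best = 0.0
    # Offsets run from the bottom-left corner to the top-right corner
    for offset in range(-(n_rows - 1), n_cols):
        diag = np.diagonal(sim, offset=offset)
        if diag.size >= min_len:  # ignore very short corner diagonals
            best = max(best, float(diag.mean()))
    return best

# Synthetic check: a clean diagonal scores 1.0, an empty matrix scores 0.0
print(diagonal_score(np.eye(20)))
print(diagonal_score(np.zeros((20, 20))))
```

With the matrices above this would be called as diagonal_score(sim) and diagonal_score(sim_fail), and a match declared when the score exceeds some threshold (for example 0.5). That threshold is an assumption and would have to be tuned on known matching and non-matching pairs.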