Problem about speed with acoustid-index


chili...@gmail.com

Sep 25, 2015, 4:18:19 AM
to AcoustID

Hello AcoustID,


I am using acoustid-index to find candidate tracks in a collection of 3M stored tracks. Queries are very slow, taking 20-50 seconds each. Does anyone have a solution to this problem?


Thanks,


LinhLC,

Lukáš Lalinský

Sep 25, 2015, 4:51:42 AM
to Acoustid
On Fri, Sep 25, 2015 at 10:18 AM, <chili...@gmail.com> wrote:

I am using acoustid-index to find candidate tracks in a collection of 3M stored tracks. Queries are very slow, taking 20-50 seconds each. Does anyone have a solution to this problem?

I can't really help without knowing what's in the index, how you are searching it, what kind of hardware you are running it on, etc.

Based on your previous post here, I assume that you are not using the rest of AcoustID, so I'd really need to know how you are indexing the fingerprints and how you are searching them.

Lukas

chili...@gmail.com

Sep 25, 2015, 7:05:30 AM
to AcoustID
Hi Lukas,

I have a collection of 3 million songs, and my goal is to identify duplicate songs. The system uses:
- chromaprint-fpcalc
- acoustid-index (indexing the full raw integer arrays from fpcalc)
- matching (using the match_fingerprints2 function in acoustid_compare.c)

Result: with a query of ~60 seconds of a song (an array of ~450 integers), the step where the acoustid-index server finds candidate songs before the matching step is very slow, taking 20-50 seconds per query.

My server: HP G8, 32 GB RAM, 24 cores, 2x300 SAS 10k in RAID 1.

P.S.: I have read the topic "https://groups.google.com/forum/#!topic/acoustid/C3EHIkZVpZI", where you say you can search 7M fingerprints in about 1 ms. That's really fast.

Thanks,

LinhLC,

On Friday, September 25, 2015 at 15:51:42 UTC+7, Lukáš Lalinský wrote:

Lukáš Lalinský

Sep 25, 2015, 7:33:30 AM
to Acoustid
On acoustid.org I use the index a little differently, which I think is a big reason why it's so slow for you. I index only a small part (15 seconds) of each song. The part of the song to be indexed is selected by the acoustid_extract_query function in acoustid_compare.c. I don't even index the full hashes; I strip some bits from them. Here is a more detailed explanation of that: https://oxygene.sk/2011/12/inside-the-acoustid-server/
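
The idea described above (index only a short window of each fingerprint, with some bits stripped from every hash) can be sketched roughly as follows. This is a minimal Python illustration, not the acoustid_extract_query code itself. The function name `extract_query`, the window position (80 items in, 120 items long, which corresponds to about 15 seconds only under the assumption of roughly 8 fingerprint items per second) and the choice to keep the low 28 bits are all assumptions made for the example.

```python
def extract_query(fingerprint, start=80, length=120, keep_bits=28):
    """Return a short, bit-stripped window of a raw fingerprint.

    fingerprint: list of 32-bit integers as produced by fpcalc in raw mode.
    start, length, keep_bits: illustrative defaults (assumptions), not
    values confirmed by the AcoustID source.
    """
    mask = (1 << keep_bits) - 1            # keep only the low `keep_bits` bits
    window = fingerprint[start:start + length]
    return [h & mask for h in window]


# Hypothetical usage: a 450-item fingerprint shrinks to 120 hashes.
raw = list(range(450))
query = extract_query(raw)
print(len(query))
```

Whatever window and mask are chosen, the same extraction has to be applied both when indexing a song and when building the search query, otherwise the stripped hashes in the index will not line up with the hashes being looked up.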

The current index at acoustid.org has 30M fingerprints in it and takes 16G on disk. I store the index on SSD disks (currently RAID1 with two SSDs), but I also make sure that the entire index is cached in RAM. Having the whole index in RAM is critical. If you don't have enough RAM dedicated just for disk caches, it will be super slow. Just as an example, if I need to restart the index, I absolutely can't put it directly into production because it would explode. I need to warm up the disk caches using a script like this:


You should see the fpi-index process using as much virtual memory as it takes to store the index files on disk.
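
One generic way to warm the operating system's disk cache is simply to read every index file from start to end once and throw the data away. The sketch below shows that technique in Python; it is not the script used on acoustid.org, and the function name `warm_page_cache` and the example directory path are hypothetical.

```python
import os


def warm_page_cache(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir sequentially and discard the data.

    The reads pull the file contents into the kernel page cache, provided
    there is enough free RAM to hold them. Returns the number of bytes read.
    """
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


# Hypothetical usage (the path is an assumption):
# warm_page_cache("/var/lib/acoustid-index")
```

Comparing the returned byte count with the amount of RAM available for caching gives a quick check of whether the whole index can stay cached at all.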

I don't know the current timing for just the index lookups on acoustid.org, but the entire search request, including matching, selecting metadata from PostgreSQL and preparing the JSON/XML response takes 50ms on average and most of that time is spent in PostgreSQL getting MusicBrainz data and then in Python formatting them. So I expect acoustid-index lookups are still below 10ms.

I suggest you have a look at some system statistics. If you see it using too much CPU, you most likely have it compiled in debug mode. If you see it doing too many disk reads, the index is not cached. I'm attaching some statistics from my monitoring, so that you have something to compare against.

Lukas




cpu-month.png
memory-month.png
md1-month.png
acoustid_lookup_time-month.png

chili...@gmail.com

Sep 27, 2015, 9:07:08 AM
to AcoustID
Hi Lukas,

I am very grateful for your fast, long and detailed answer.

I will try indexing a 15-second sub-fingerprint (the part between 0:10 and 0:25).

Thanks,

LinhLC.

On Friday, September 25, 2015 at 18:33:30 UTC+7, Lukáš Lalinský wrote: