ICAT performance

Steve Fisher

Dec 18, 2012, 12:20:27 PM

to icat-de...@googlegroups.com

Hi,

I have just discovered that the JPA-generated indices are not quite as they should be. Specifically, a datafile has an index on the DB columns "NAME", "LOCATION", "DATASET_ID" because of the uniqueness constraint on these fields. This is useful for searches on name, on name and location, or on all three, but it is useless for searching on dataset_id alone, because dataset_id is the last column of the index and a composite index only helps when the search supplies its leading columns. Searching on dataset_id is exactly what is required when finding the set of datafiles in a dataset. Consequently the database has to scan all entries, which takes us about 4 seconds for 2 million entries.

In the short term I have manually indexed this column (a non-unique index, of course), and for the next release I will ensure that the uniqueness constraints are ordered to produce the most useful indices. In addition, EclipseLink provides an @Index annotation, which I will use to provide any extra indices that appear to be useful. In the case above, had I specified the ordering "DATASET_ID", "NAME", "LOCATION", all would have been well. I have also already proposed removing location from this particular key.

Fortunately the orderings for dataset parameters and datafile parameters are fine. The existing orderings are not good for finding the datasets in an investigation, but this is less of a problem as the number of datasets is lower than that of datafiles, so the one index I suggest above should be the main thing that is needed.
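To illustrate the column-ordering point, here is a minimal sketch using an in-memory SQLite database as a stand-in. It is not the ICAT schema or the production database: the table layouts, the index name datafile_dataset_idx and the table datafile_reordered are made up for the demonstration, and other databases may plan these queries differently. It only shows that a lookup on dataset_id cannot seek into an index that starts with name, and that either an extra index or a reordered uniqueness constraint fixes it.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical table mirroring the current constraint order:
# the implicit unique index starts with NAME, so DATASET_ID comes last.
conn.execute("""
    CREATE TABLE datafile (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT,
        dataset_id INTEGER NOT NULL,
        UNIQUE (name, location, dataset_id)
    )""")

# Hypothetical table with the proposed order: DATASET_ID leads the index.
conn.execute("""
    CREATE TABLE datafile_reordered (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT,
        dataset_id INTEGER NOT NULL,
        UNIQUE (dataset_id, name, location)
    )""")

query = "SELECT name FROM {} WHERE dataset_id = 42"

# 1. Original ordering, no extra index: every entry has to be scanned.
before = plan(conn, query.format("datafile"))

# 2. Short-term fix: a manual non-unique index on the single column.
conn.execute("CREATE INDEX datafile_dataset_idx ON datafile (dataset_id)")
after = plan(conn, query.format("datafile"))

# 3. Longer-term fix: the reordered uniqueness constraint alone suffices.
reordered = plan(conn, query.format("datafile_reordered"))

print("original ordering: ", before)
print("with manual index: ", after)
print("reordered unique:  ", reordered)
```

In SQLite's plan output a line beginning with SCAN means every entry is visited, while SEARCH means an index is used to seek directly to the matching rows. The exact wording of the plan varies between SQLite versions.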