It's me again, back with another join question, where Slick appears to be generating slow queries (or, more likely, I'm using Slick incorrectly).
This time on Postgresql 9.6.3.
Here's my code:
val query = releases
.join(releaseBarcodes.filter(barcodeFilter)).on(_.sourceUri === _.sourceUri)
.joinLeft(releaseCats).on(_._1.sourceUri === _.sourceUri)
.joinLeft(releaseGenres).on(_._1._1.sourceUri === _.sourceUri)
.joinLeft(releaseTypes).on(_._1._1._1.sourceUri === _.sourceUri)
.joinLeft(releaseVotedTags).on(_._1._1._1._1.sourceUri === _.sourceUri)
.joinLeft(releaseImages).on(_._1._1._1._1._1.sourceUri === _.sourceUri)
.joinLeft(media).on(_._1._1._1._1._1._1.sourceUri === _.sourceUri)
.joinLeft(tracks).on{ case(soFar, trackTable) => soFar._2.map(_.id === trackTable.medium)}
.joinLeft(trackArtists).on{ case(soFar, trackArtistTable) => soFar._2.map(_.id === trackArtistTable.track)}
.joinLeft(trackComposers).on{ case(soFar, trackComposerTable) => soFar._1._2.map(_.id === trackComposerTable.track)}
.joinLeft(releaseUrls).on(_._1._1._1._1._1._1._1._1._1._1.sourceUri === _.sourceUri)
.joinLeft(releaseLabels).on(_._1._1._1._1._1._1._1._1._1._1._1.sourceUri === _.sourceUri)
/* linked cat no */
.joinLeft(releaseCats).on{ case(soFar, catNoTable) => soFar._1._1._1._1._1._1._1._1._1._1._2.map(_.cat === catNoTable.cat)}
.joinLeft(releases).on{ case(soFar, relTable) => soFar._2.map(_.sourceUri === relTable.sourceUri)}
val q = for {
((((((((((((((r, rb), rcat), rg), rt), rvt), rimg), rmed), rtrck), trckArtist), trckComp), rUrl), rLbl), rLinkedCats), rLinkedRelsCat) <- query
} yield (r.sourceUri, r.title, r.subtitle, r.artistName, r.country, r.releaseDate, r.asin, rg.map(_.genre), rt.map(_.`type`), rvt.map(vt => (vt.votedTag, vt.votes)), rimg, rmed, rtrck, trckArtist, trckComp, rUrl.map(_.url), rLbl.flatMap(_.name))//, rLinkedRelsCat.flatMap(_.sourceUri))
I attached the query that is created, because it's so large I don't want to interrupt the flow of the thread. Sorry, it's not formatted (anyone know a decent formatter? the online ones don't cope with these nested queries)...
Here's the explain output from Postgres:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Nested Loop Left Join (cost=15291.63..367830181335340285952.00 rows=8062002970089247211520 width=676) |
| Join Filter: CASE WHEN (x415."SOURCE_URI" IS NOT NULL) THEN ((x415."SOURCE_URI")::text = (x47."SOURCE_URI")::text) ELSE NULL::boolean END |
| -> Nested Loop Left Join (cost=15291.63..53031043002887.09 rows=1767529209075750 width=900) |
| Join Filter: CASE WHEN (CASE WHEN (x414."CAT" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN ((x414."CAT")::text = (x415."CAT")::text) ELSE NULL::boolean END |
| -> Nested Loop Left Join (cost=15291.63..5166730086.91 rows=172147962900 width=846) | | Join Filter: CASE WHEN (CASE WHEN (x297."ID" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN (x297."ID" = x258."TRACK") ELSE NULL::boolean END |
| -> Nested Loop Left Join (cost=15291.63..2291101.66 rows=71430690 width=816) |
| Join Filter: CASE WHEN (CASE WHEN (x297."ID" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN (x297."ID" = x213."TRACK") ELSE NULL::boolean END |
| -> Nested Loop Left Join (cost=15291.63..148094.28 rows=33270 width=786) |
| Join Filter: CASE WHEN (CASE WHEN (x330."ID" IS NULL) THEN NULL::integer ELSE 1 END IS NOT NULL) THEN (x330."ID" = x297."MEDIUM") ELSE NULL::boolean END |
| -> Nested Loop Left Join (cost=15291.63..146700.24 rows=4 width=724) |
| -> Nested Loop Left Join (cost=15291.22..146681.05 rows=4 width=654) |
| -> Nested Loop Left Join (cost=15290.80..146667.32 rows=4 width=542) |
| -> Nested Loop Left Join (cost=15290.25..146643.82 rows=4 width=496) |
| -> Nested Loop Left Join (cost=15289.69..146621.36 rows=4 width=448) |
| -> Hash Right Join (cost=15289.27..146602.54 rows=4 width=438) |
| Hash Cond: ((x355."SOURCE_URI")::text = (x410."SOURCE_URI")::text) |
| -> Seq Scan on "RELEASE_IMAGE" x355 (cost=0.00..111558.45 rows=5267945 width=120) |
| -> Hash (cost=15289.22..15289.22 rows=4 width=318) |
| -> Hash Right Join (cost=13996.67..15289.22 rows=4 width=318) |
| Hash Cond: ((x376."SOURCE_URI")::text = (x410."SOURCE_URI")::text) |
| -> Seq Scan on "RELEASE_VOTED_TAG" x376 (cost=0.00..1118.94 rows=46294 width=82)
|
| -> Hash (cost=13996.62..13996.62 rows=4 width=236) |
| -> Hash Right Join (cost=13934.61..13996.62 rows=4 width=236) |
| Hash Cond: ((x330."SOURCE_URI")::text = (x410."SOURCE_URI")::text) |
| -> Seq Scan on "MEDIUM" x330 (cost=0.00..53.00 rows=2400 width=83) |
| -> Hash (cost=13934.56..13934.56 rows=4 width=153) |
| -> Nested Loop (cost=0.56..13934.56 rows=4 width=153) |
| -> Seq Scan on "RELEASE_BARCODE" (cost=0.00..13900.21 rows=4 width=40) |
| Filter: (("BARCODE")::text = ANY ('{75992731324,075992731324,0075992731324}'::text[])) |
| -> Index Scan using "RELEASE_pkey" on "RELEASE" x410 (cost=0.56..8.58 rows=1 width=153) |
| Index Cond: (("SOURCE_URI")::text = ("RELEASE_BARCODE"."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_CAT_PK" on "RELEASE_CAT_NO" x414 (cost=0.41..4.70 rows=1 width=74) |
| Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_GENRE_PK" on "RELEASE_GENRE" x409 (cost=0.56..5.61 rows=1 width=48) |
| Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_TYPE_PK" on "RELEASE_TYPE" x394 (cost=0.56..5.83 rows=4 width=46) |
| Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_URL_PK" on "RELEASE_URL" x165 (cost=0.41..3.41 rows=2 width=112) |
| Index Cond: ("SOURCE_URI" = (x410."SOURCE_URI")::text) |
| -> Index Scan using release_label_source_uri on "RELEASE_LABEL" x111 (cost=0.41..4.79 rows=1 width=134) |
| Index Cond: ((x410."SOURCE_URI")::text = ("SOURCE_URI")::text) |
| -> Materialize (cost=0.00..437.53 rows=16635 width=62) |
| -> Seq Scan on "TRACK" x297 (cost=0.00..354.35 rows=16635 width=62) |
| -> Materialize (cost=0.00..97.41 rows=4294 width=30) |
| -> Seq Scan on "TRACK_COMPOSER" x213 (cost=0.00..75.94 rows=4294 width=30) |
| -> Materialize (cost=0.00..110.30 rows=4820 width=30)
| -> Seq Scan on "TRACK_ARTIST" x258 (cost=0.00..86.20 rows=4820 width=30) |
| -> Materialize (cost=0.00..579.02 rows=20535 width=74) |
| -> Seq Scan on "RELEASE_CAT_NO" x415 (cost=0.00..476.35 rows=20535 width=74) |
| -> Materialize (cost=0.00..366235.13 rows=9122342 width=40) |
| -> Seq Scan on "RELEASE" x47 (cost=0.00..249354.42 rows=9122342 width=40) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
There are a lot of sequential scans going on there.
Rather than all the nested queries etc, here's how I'd rewrite by hand:
SELECT r."SOURCE_URI", r."TITLE", r."SUBTITLE", r."ARTIST_NAME", r."COUNTRY", r."RELEASE_DATE", r."ASIN", rgen."GENRE", rtype."TYPE", rtag."VOTED_TAG", rtag."VOTES", rimg."URI", rimg."WIDTH", rimg."HEIGHT", rimg."TYPE", rimg."SCORE", med."POSITION", med."TITLE", med."FORMAT", trck."POSITION", trck."TITLE", trck."DURATION", trckart."ARTIST", trckcomp."COMPOSER", rurl."URL", rlbl."NAME"
FROM "RELEASE" r
JOIN "RELEASE_BARCODE" rb on rb."SOURCE_URI"=r."SOURCE_URI" and rb."BARCODE" IN ('75992731324','075992731324','0075992731324')
LEFT JOIN "RELEASE_CAT_NO" rcat on rcat."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "RELEASE_GENRE" rgen on rgen."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "RELEASE_TYPE" rtype on rtype."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "RELEASE_VOTED_TAG" rtag on rtag."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "RELEASE_IMAGE" rimg on rimg."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "MEDIUM" med on med."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "TRACK" trck on trck."MEDIUM"=med."ID"
LEFT JOIN "TRACK_ARTIST" trckart on trckart."TRACK"=trck."ID"
LEFT JOIN "TRACK_COMPOSER" trckcomp on trckcomp."TRACK"=trck."ID"
LEFT JOIN "RELEASE_URL" rurl on rurl."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "RELEASE_LABEL" rlbl on rlbl."SOURCE_URI"=r."SOURCE_URI"
LEFT JOIN "RELEASE_CAT_NO" rcatLinked on rcatLinked."CAT"=rcat."CAT"
LEFT JOIN "RELEASE" rLinkedCat on rLinkedCat."SOURCE_URI"=rcatLinked."SOURCE_URI"
The execution plan is far, far faster:
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| QUERY PLAN |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Hash Right Join (cost=763.84..868.43 rows=106 width=500) |
| Hash Cond: (trckart."TRACK" = trck."ID") |
| -> Seq Scan on "TRACK_ARTIST" trckart (cost=0.00..86.20 rows=4820 width=30) |
| -> Hash (cost=762.52..762.52 rows=106 width=502) |
| -> Hash Right Join (cost=670.21..762.52 rows=106 width=502) |
| Hash Cond: (trckcomp."TRACK" = trck."ID") |
| -> Seq Scan on "TRACK_COMPOSER" trckcomp (cost=0.00..75.94 rows=4294 width=30) |
| -> Hash (cost=668.88..668.88 rows=106 width=488) |
| -> Nested Loop Left Join (cost=70.44..668.88 rows=106 width=488) |
| -> Nested Loop Left Join (cost=67.90..414.70 rows=15 width=458) |
| -> Nested Loop Left Join (cost=67.05..364.05 rows=4 width=418) |
| -> Nested Loop Left Join (cost=66.64..344.85 rows=4 width=352) |
| -> Nested Loop Left Join (cost=66.23..331.12 rows=4 width=308) |
| -> Nested Loop Left Join (cost=65.67..245.09 rows=4 width=232) |
| -> Nested Loop Left Join (cost=65.25..190.51 rows=4 width=218) |
| -> Nested Loop Left Join (cost=64.70..167.02 rows=4 width=212) |
| -> Nested Loop Left Join (cost=64.14..144.56 rows=4 width=204) |
| -> Hash Right Join (cost=63.72..125.73 rows=4 width=194) |
| Hash Cond: ((med."SOURCE_URI")::text = (r."SOURCE_URI")::text) |
| -> Seq Scan on "MEDIUM" med (cost=0.00..53.00 rows=2400 width=83) |
| -> Hash (cost=63.67..63.67 rows=4 width=153) |
| -> Nested Loop (cost=0.99..63.67 rows=4 width=153)
|
| -> Index Scan using "RELEASE_BARCODE_BARCODE_idx" on "RELEASE_BARCODE" rb (cost=0.42..29.32 rows=4 width=40) |
| Index Cond: (("BARCODE")::text = ANY ('{75992731324,075992731324,0075992731324}'::text[])) |
| -> Index Scan using "RELEASE_pkey" on "RELEASE" r (cost=0.56..8.58 rows=1 width=153) |
| Index Cond: (("SOURCE_URI")::text = (rb."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_CAT_PK" on "RELEASE_CAT_NO" rcat (cost=0.41..4.70 rows=1 width=74) |
| Index Cond: ("SOURCE_URI" = (r."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_GENRE_PK" on "RELEASE_GENRE" rgen (cost=0.56..5.61 rows=1 width=48) |
| Index Cond: ("SOURCE_URI" = (r."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_TYPE_PK" on "RELEASE_TYPE" rtype (cost=0.56..5.83 rows=4 width=46) |
| Index Cond: ("SOURCE_URI" = (r."SOURCE_URI")::text) |
| -> Index Scan using "RELEASE_VOTED_TAG_SOURCE_URI_idx" on "RELEASE_VOTED_TAG" rtag (cost=0.41..13.59 rows=5 width=82) |
| Index Cond: (("SOURCE_URI")::text = (r."SOURCE_URI")::text) |
| -> Index Scan using "RELEASE_IMAGE_SOURCE_URI_idx1" on "RELEASE_IMAGE" rimg (cost=0.56..21.42 rows=9 width=116) |
| Index Cond: (("SOURCE_URI")::text = (r."SOURCE_URI")::text) |
| -> Index Only Scan using "RELEASE_URL_PK" on "RELEASE_URL" rurl (cost=0.41..3.41 rows=2 width=112) |
| Index Cond: ("SOURCE_URI" = (r."SOURCE_URI")::text) |
| -> Index Scan using release_label_source_uri on "RELEASE_LABEL" rlbl (cost=0.41..4.79 rows=1 width=130) |
| Index Cond: (("SOURCE_URI")::text = (r."SOURCE_URI")::text) |
| -> Nested Loop Left Join (cost=0.85..12.65 rows=1 width=50) |
| -> Index Scan using release_cat_cat_no on "RELEASE_CAT_NO" rcatlinked (cost=0.29..4.66 rows=1 width=74) |
| Index Cond: (("CAT")::text = (rcat."CAT")::text) |
| -> Index Only Scan using "RELEASE_pkey" on "RELEASE" rlinkedcat (cost=0.56..7.99 rows=1 width=40) |
| Index Cond: ("SOURCE_URI" = (rcatlinked."SOURCE_URI")::text) |
| -> Bitmap Heap Scan on "TRACK" trck (cost=2.54..16.88 rows=7 width=62) |
| Recheck Cond: ("MEDIUM" = med."ID")
|
| -> Bitmap Index Scan on track_medium (cost=0.00..2.54 rows=7 width=0) |
| Index Cond: ("MEDIUM" = med."ID") |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
So am I writing my joinLeft's incorrectly?
Dan