seems unnecessary INNER JOIN on tsearch

Leonid Morozov

unread,

Sep 13, 2016, 12:54:24 PM9/13/16

to Case Commons Development

Hello,

I have a question about unnecessary INNER JOIN in query

```

EXPLAIN

SELECT "resources_search_view".*

FROM "resources_search_view"

INNER JOIN (

SELECT "resources_search_view"."id" AS pg_search_id, (ts_rank(("resources_search_view"."tsv_text"), (to_tsquery(?, ? || ? || ?)), ?)) AS rank

FROM "resources_search_view"

WHERE ((("resources_search_view"."tsv_text") @@ (to_tsquery(?, ? || ? || ?))))) AS pg_search_32fe59e9a0bdc99fb16185

ON "resources_search_view"."id" = pg_search_32fe59e9a0bdc99fb16185.pg_search_id

WHERE xxx

ORDER BY pg_search_32fe59e9a0bdc99fb16185.rank DESC

LIMIT 25 OFFSET 0

;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------

Limit (cost=11050.19..11050.21 rows=25 width=2047)

-> Sort (cost=11050.19..11050.44 rows=491 width=2047)

Sort Key: (ts_rank(resources_search_view_1.tsv_text, '''cnncnntitl'''::tsquery, 1)) DESC

-> Nested Loop (cost=11.19..11047.42 rows=491 width=2047)

-> Bitmap Heap Scan on resources_search_view resources_search_view_1 (cost=11.10..3783.18 rows=2000 width=240)

Recheck Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)

-> Bitmap Index Scan on idx_tsv_text_view_search (cost=0.00..11.00 rows=2000 width=0)

Index Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)

-> Index Scan using idx_view_id on resources_search_view (cost=0.09..3.63 rows=1 width=1811)

Index Cond: (id = resources_search_view_1.id)

Filter: (embeddable AND ((resource_type)::text <> 'Question'::text) AND (license = 0))

(11 rows)

```

VS

```

EXPLAIN

SELECT "resources_search_view".*,

(ts_rank(("resources_search_view"."tsv_text"), (to_tsquery(?, ? || ? || ?)), ?)) AS rank

FROM "resources_search_view"

WHERE xxx AND ((("resources_search_view"."tsv_text") @@ (to_tsquery(?, ? || ? || ?))))

ORDER BY rank DESC,

LIMIT 25 OFFSET 0

;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Limit (cost=3797.46..3797.48 rows=25 width=2047)

-> Sort (cost=3797.46..3798.46 rows=2000 width=2047)

Sort Key: (ts_rank(tsv_text, '''cnncnntitl'''::tsquery, 1)) DESC, (CASE WHEN (array_length(common_standard_ids_confirmed, 1) > 0) THEN 1 ELSE 0 END) DESC, calc_avg_effectiveness DESC NULLS LAST, featured DESC, license, calc_avg_rating DESC NULLS LAST, id

-> Bitmap Heap Scan on resources_search_view (cost=11.10..3786.18 rows=2000 width=2047)

Recheck Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)

-> Bitmap Index Scan on idx_tsv_text_view_search (cost=0.00..11.00 rows=2000 width=0)

Index Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)

(7 rows)

```

So, what's the point to have unnecessary INNER JOIN in query?

Is there any way to have queries without INNER JOIN?

There's the scope we're using:

```

pg_search_scope :search_descriptive, lambda { |params|

{

:against => [:title, :description],

:using => {

:tsearch => {

:dictionary => "english",

:tsvector_column => ["tsv_text"],

:any_word => params[:any_word],

}

},

:query => params[:descriptive]

}

```

Thank you.

- Leonid

Grant Hutchins

unread,

Sep 13, 2016, 12:59:26 PM9/13/16

to Case Commons Development

I noticed that the first query you show returns 11 results and the second returns 7 results. I'd make sure that you're not changing what the query actually returns.

One reason for the INNER JOIN is so that we can figure out the rank for each record without having to add it to the SELECT clause. It's possible to generate queries where adding that to the SELECT would break things.

I'd be interested in anything that you figure out. I'm also open to pull requests that change the underlying query structure while keeping all the tests passing. We have pretty good coverage of some edge cases, so that might help make the motivations behind the current implementation more clear.

Grant

--

---
You received this message because you are subscribed to the Google Groups "Case Commons Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to casecommons-d...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Grant Hutchins

Principal Software Developer

Pivotal Labs, Austin

ghut...@pivotal.io

512-769-7909

leo...@opened.io

unread,

Sep 13, 2016, 10:07:28 PM9/13/16

to Case Commons Development

Grant,

11 vs 7 rows is number of rows in EXPLAIN QUERY PLAN in both cases:

11 rows -- with INNER JOIN
7 rows -- without INNER JOIN (rank in SELECT clause)

Also you can see that the total cost in case with INNER JOIN is ~11050, without INNER JOIN is ~3797

Looks like 3 times faster.

I got the idea about to make a PR.
Could you please provide any details about WHY rank in SELECT clause may break the things?
We may configure the rank name in scope options to make it unique (resolve conflicts).
Is there any other potential issues?