seems unnecessary INNER JOIN on tsearch

30 views
Skip to first unread message

Leonid Morozov

unread,
Sep 13, 2016, 12:54:24 PM9/13/16
to Case Commons Development
Hello,

I have a question about unnecessary INNER JOIN in query

```
EXPLAIN
SELECT "resources_search_view".*
FROM "resources_search_view"
INNER JOIN (
  SELECT "resources_search_view"."id" AS pg_search_id, (ts_rank(("resources_search_view"."tsv_text"), (to_tsquery(?, ? || ? || ?)), ?)) AS rank
  FROM "resources_search_view"
  WHERE ((("resources_search_view"."tsv_text") @@ (to_tsquery(?, ? || ? || ?))))) AS pg_search_32fe59e9a0bdc99fb16185
ON "resources_search_view"."id" = pg_search_32fe59e9a0bdc99fb16185.pg_search_id
WHERE xxx
ORDER BY pg_search_32fe59e9a0bdc99fb16185.rank DESC
LIMIT 25 OFFSET 0
;


                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11050.19..11050.21 rows=25 width=2047)
   ->  Sort  (cost=11050.19..11050.44 rows=491 width=2047)
         Sort Key: (ts_rank(resources_search_view_1.tsv_text, '''cnncnntitl'''::tsquery, 1)) DESC
         ->  Nested Loop  (cost=11.19..11047.42 rows=491 width=2047)
               ->  Bitmap Heap Scan on resources_search_view resources_search_view_1  (cost=11.10..3783.18 rows=2000 width=240)
                     Recheck Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)
                     ->  Bitmap Index Scan on idx_tsv_text_view_search  (cost=0.00..11.00 rows=2000 width=0)
                           Index Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)
               ->  Index Scan using idx_view_id on resources_search_view  (cost=0.09..3.63 rows=1 width=1811)
                     Index Cond: (id = resources_search_view_1.id)
                     Filter: (embeddable AND ((resource_type)::text <> 'Question'::text) AND (license = 0))
(11 rows)
```

VS 

```
EXPLAIN
SELECT "resources_search_view".*,
(ts_rank(("resources_search_view"."tsv_text"), (to_tsquery(?, ? || ? || ?)), ?)) AS rank
FROM "resources_search_view"
WHERE xxx AND ((("resources_search_view"."tsv_text") @@ (to_tsquery(?, ? || ? || ?))))
ORDER BY rank DESC,
LIMIT 25 OFFSET 0
;


                                                                                                                               QUERY PLAN                                                                                                                               
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3797.46..3797.48 rows=25 width=2047)
   ->  Sort  (cost=3797.46..3798.46 rows=2000 width=2047)
         Sort Key: (ts_rank(tsv_text, '''cnncnntitl'''::tsquery, 1)) DESC, (CASE WHEN (array_length(common_standard_ids_confirmed, 1) > 0) THEN 1 ELSE 0 END) DESC, calc_avg_effectiveness DESC NULLS LAST, featured DESC, license, calc_avg_rating DESC NULLS LAST, id
         ->  Bitmap Heap Scan on resources_search_view  (cost=11.10..3786.18 rows=2000 width=2047)
               Recheck Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)
               ->  Bitmap Index Scan on idx_tsv_text_view_search  (cost=0.00..11.00 rows=2000 width=0)
                     Index Cond: (tsv_text @@ '''cnncnntitl'''::tsquery)
(7 rows)
```

So, what's the point to have unnecessary INNER JOIN in query?
Is there any way to have queries without INNER JOIN?

There's the scope we're using:

```
    pg_search_scope :search_descriptive, lambda { |params|
      {
        :against => [:title, :description],
        :using => {
          :tsearch => {
            :dictionary => "english",
            :tsvector_column => ["tsv_text"],
            :any_word => params[:any_word],
          }
        },
        :query => params[:descriptive]
      }
    }
```

Thank you.

- Leonid

Grant Hutchins

unread,
Sep 13, 2016, 12:59:26 PM9/13/16
to Case Commons Development
I noticed that the first query you show returns 11 results and the second returns 7 results. I'd make sure that you're not changing what the query actually returns.

One reason for the INNER JOIN is so that we can figure out the rank for each record without having to add it to the SELECT clause. It's possible to generate queries where adding that to the SELECT would break things.

I'd be interested in anything that you figure out. I'm also open to pull requests that change the underlying query structure while keeping all the tests passing. We have pretty good coverage of some edge cases, so that might help make the motivations behind the current implementation more clear.

Grant

--

---
You received this message because you are subscribed to the Google Groups "Case Commons Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to casecommons-d...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Grant Hutchins
Principal Software Developer
Pivotal Labs, Austin

leo...@opened.io

unread,
Sep 13, 2016, 10:07:28 PM9/13/16
to Case Commons Development
Grant,

11 vs 7 rows is number of rows in EXPLAIN QUERY PLAN in both cases:
11 rows -- with INNER JOIN
7 rows -- without INNER JOIN (rank in SELECT clause)

Also you can see that the total cost in case with INNER JOIN is ~11050, without INNER JOIN is ~3797

Looks like 3 times faster.

I got the idea about to make a PR.
Could you please provide any details about WHY rank in SELECT clause may break the things?
We may configure the rank name in scope options to make it unique (resolve conflicts).
Is there any other potential issues?

Thank you.

- Leonid
Reply all
Reply to author
Forward
0 new messages