Jira (PDB-5253) Investigate if we can we add indexes on nested facts

10 views
Skip to first unread message

Bogdan Irimie (Jira)

unread,
Aug 25, 2021, 4:06:03 AM8/25/21
to puppe...@googlegroups.com
Bogdan Irimie created an issue
 
PuppetDB / Task PDB-5253
Investigate if we can we add indexes on nested facts
Issue Type: Task Task
Assignee: Unassigned
Created: 2021/08/25 1:05 AM
Priority: Normal Normal
Reporter: Bogdan Irimie
Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)
Atlassian logo

Bogdan Irimie (Jira)

unread,
Aug 25, 2021, 4:07:02 AM8/25/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Aug 25, 2021, 4:07:03 AM8/25/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Aug 26, 2021, 3:16:03 AM8/26/21
to puppe...@googlegroups.com

Andrei Filipovici (Jira)

unread,
Aug 26, 2021, 6:37:03 AM8/26/21
to puppe...@googlegroups.com
Andrei Filipovici commented on Task PDB-5253
 
Re: Investigate if we can we add indexes on nested facts

Used this query with explain analyze for the nested fact os.lsb.distid:

EXPLAIN analyse select (stable||volatile->'os'->'lsb'->>'distid') from factsets where ((stable||volatile)->'os'->'lsb'->>'distid') ~ '.*MND';

The result shows it performed a sequencial scan:

Seq Scan on factsets  (cost=0.00..50.22 rows=72 width=32) (actual time=1.850..34.532 rows=1 loops=1)
  Filter: (((((stable || volatile) -> 'os'::text) -> 'lsb'::text) ->> 'distid'::text) ~ '.*MND'::text)
  Rows Removed by Filter: 999
Planning time: 3.502 ms
Execution time: 34.757 ms

After that I created an index for that fact with:

CREATE INDEX dist_id_values_idx ON factsets USING GIN (((stable||volatile)->'os'->'lsb'->>'distid') gin_trgm_ops);

Running the same explain analyse query resulted:

Bitmap Heap Scan on factsets  (cost=8.56..37.90 rows=72 width=32) (actual time=0.474..0.476 rows=1 loops=1)
  Recheck Cond: (((((stable || volatile) -> 'os'::text) -> 'lsb'::text) ->> 'distid'::text) ~ '.*MND'::text)
  Heap Blocks: exact=1
  ->  Bitmap Index Scan on dist_id_values_idx  (cost=0.00..8.54 rows=72 width=0) (actual time=0.355..0.355 rows=1 loops=1)
        Index Cond: (((((stable || volatile) -> 'os'::text) -> 'lsb'::text) ->> 'distid'::text) ~ '.*MND'::text)
Planning time: 27.009 ms
Execution time: 0.764 ms

It shows that the index is used.

Andrei Filipovici (Jira)

unread,
Aug 26, 2021, 7:08:03 AM8/26/21
to puppe...@googlegroups.com

We can use the nested index from PDB if we use the dotted projections on the inventory endpoint. An example would be:

["~", "facts.os.lsb.distid", "MND.*"]

The query plan for the above query is in the attachement.

If we use the fact-contents endpoint with a query like this:

["and", ["=", "path", ["os", "lsb", "distid"]], ["~", "value", "MND.*"]]

the index will not be used.

Andrei Filipovici (Jira)

unread,
Aug 26, 2021, 7:09:03 AM8/26/21
to puppe...@googlegroups.com
Andrei Filipovici updated an issue
 
Change By: Andrei Filipovici
Attachment: query_plan_local.json
Reply all
Reply to author
Forward
0 new messages