Jira (PDB-5218) Investigate query performance for "ESTATE - nodes with specific operating system"

12 views
Skip to first unread message

Bogdan Irimie (Jira)

unread,
Jul 29, 2021, 9:39:03 AM7/29/21
to puppe...@googlegroups.com
Bogdan Irimie created an issue
 
PuppetDB / Task PDB-5218
Investigate query performance for "ESTATE - nodes with specific operating system"
Issue Type: Task Task
Assignee: Unassigned
Created: 2021/07/29 6:38 AM
Priority: Normal Normal
Reporter: Bogdan Irimie

From the performance test we ran in https://docs.google.com/spreadsheets/d/1eaOZhSoTyZ3k_0905_3BSDZpViz3WbTTJGQzGdoTsnI/edit?usp=sharing

curl --location -g --request GET 'http://localhost:8080/pdb/query/v4/nodes?query=["extract", [["function", "count"]], ["and",["~", ["fact", "operatingsystem"], "Cent.*"]]]'

is an outlier and we should investigate if this query can be rewritten in a more efficient way or if we can make changes to the query engine in order to improve it's performance

Add Comment Add Comment
 
This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)
Atlassian logo

Bogdan Irimie (Jira)

unread,
Jul 29, 2021, 9:49:02 AM7/29/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Aug 11, 2021, 4:14:03 AM8/11/21
to puppe...@googlegroups.com

Bogdan Irimie (Jira)

unread,
Aug 11, 2021, 4:19:06 AM8/11/21
to puppe...@googlegroups.com

Andrei Filipovici (Jira)

unread,
Aug 16, 2021, 7:39:02 AM8/16/21
to puppe...@googlegroups.com

Andrei Filipovici (Jira)

unread,
Aug 23, 2021, 7:28:03 AM8/23/21
to puppe...@googlegroups.com
Andrei Filipovici commented on Task PDB-5218
 
Re: Investigate query performance for "ESTATE - nodes with specific operating system"

After talking to Lynsay Wright from Estate Reporting, we found that this query is not hardcoded, it based on user input. So the user may do a query based on other fact from the few hundreds available.
So improving just on query out of potential hundreds might not be such a good idea, because the indexes might take a long time to create and require additional space.
Plus the user might use custom facts that we are not aware of.

Anyway, we did some testing on the 100k nodes sandbox with 20 users, but on the n1 server and the query took on average 17707 ms.
Then we added an index on the operatingsystem fact and on average, the request took 7123 ms. That's an improvement of approximately 60%.

The sql command we used to create the index was:

CREATE INDEX operating_system_values_idx ON factsets USING GIN (((stable||volatile)->>'operatingsystem') gin_trgm_ops);

Reply all
Reply to author
Forward
0 new messages