Jira (PDB-5643) PQL parsing of OR clauses can result in OOM errors


Joshua Partlow (Jira)

May 3, 2023, 4:27:02 PM
to puppe...@googlegroups.com
Joshua Partlow created an issue
 
PuppetDB / Bug PDB-5643
PQL parsing of OR clauses can result in OOM errors
Issue Type: Bug
Affects Versions: PDB 7.12.1
Assignee: Unassigned
Created: 2023/05/03 1:26 PM
Priority: Normal
Reporter: Joshua Partlow

There seems to be enough inefficiency in instaparse such that large(ish) sets of OR clauses can eat up available memory.

On a 2021.7.2 e2-custom-4-10240 GCP test instance with 1G allocated to puppetdb's heap, this query was enough to OOM in ~3m:

reports {
  latest_report? = true and (
    certname = "foo22" or
    certname = "foo21" or
    certname = "foo20" or
    certname = "foo19" or
    certname = "foo18" or
    certname = "foo17" or
    certname = "foo16" or
    certname = "foo15" or
    certname = "foo14" or
    certname = "foo13" or
    certname = "foo12" or
    certname = "foo11" or
    certname = "foo10" or
    certname = "foo9" or
    certname = "foo8" or
    certname = "foo7" or
    certname = "foo6" or
    certname = "foo5" or
    certname = "foo4" or
    certname = "foo3" or
    certname = "foo2" or
    certname = "foo1"
  )
}

The effect is exponential. With 21 OR clauses the query succeeded in 31s, with 20 in 7s, with 19 in 4s, and so on back down to <1s once you reach 16 clauses.
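To reproduce the timings at different sizes, the query above can be generated rather than typed by hand. This is only a minimal sketch: the helper name `or_query` is hypothetical, and it counts `n` as the number of `certname` conditions, which may or may not match how the clauses were counted in the numbers above.

```python
def or_query(n):
    """Return the reports query from the description with n certname conditions joined by OR."""
    # Count down from foo<n> to foo1, matching the order used in the description.
    clauses = " or\n    ".join(f'certname = "foo{i}"' for i in range(n, 0, -1))
    return (
        "reports {\n"
        "  latest_report? = true and (\n"
        f"    {clauses}\n"
        "  )\n"
        "}"
    )

# One query per size, from 16 up to the 22 conditions shown in the description.
queries = {n: or_query(n) for n in range(16, 23)}
print(queries[16])
```

Each generated string can then be submitted to PuppetDB in whatever way the test instance is normally queried, timing each size separately.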

We'll need to see if our grammar can be tweaked again to avoid this, similar to what Rob did in PDB-5260, or if we're looking at a parser re-write to fix this.

This message was sent by Atlassian Jira (v8.20.11#820011-sha1:0629dd8)

Cas Donoghue (Jira)

May 4, 2023, 2:43:03 PM
to puppe...@googlegroups.com
Cas Donoghue updated an issue
Change By: Cas Donoghue
There seems to be enough inefficiency in [instaparse|https://github.com/Engelberg/instaparse] such that large(ish) sets of OR clauses can eat up available memory.


On a 2021.7.2 e2-custom-4-10240 GCP test instance with 1G allocated to puppetdb's heap, this query was enough to OOM in ~3m:

{code}
{code}

The effect is exponential. With 21 OR clauses the query succeeded in 31s, with 20 in 7s, with 19 in 4s, and so on back down to <1s once you reach 16 clauses.

We'll need to see if our grammar can be tweaked again to avoid this, similar to what Rob did in PDB-5260, or if we're looking at a parser re-write to fix this.


 

ACTION:

Look through and see if there is an improvement using instaparse. Do not necessarily re-write the parser just for this bug. 

Cas Donoghue (Jira)

May 4, 2023, 2:44:03 PM
to puppe...@googlegroups.com

Joshua Partlow (Jira)

May 4, 2023, 2:58:03 PM
to puppe...@googlegroups.com
Joshua Partlow updated an issue
Change By: Joshua Partlow
WORKAROUNDS:

Use IN instead of a set of ORs, or wrap the ORs in parentheses. In general an IN will also be faster on the postgres side, once the query has parsed and postgres is actually running it.
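As an illustration of the IN workaround, the sketch below builds the same reports query with a single `certname in [...]` clause instead of chained ORs. The helper name `latest_reports_query` is hypothetical, and using `json.dumps` to produce the double-quoted array literal is an assumption about what the PQL parser accepts for plain certnames, not something confirmed in this ticket.

```python
import json

def latest_reports_query(certnames):
    """Build a PQL reports query that matches certnames with one IN clause."""
    # json.dumps renders a double-quoted, comma-separated array literal such as
    # ["foo2", "foo1"]; assumed here to be valid PQL array syntax.
    names = json.dumps(list(certnames))
    return f"reports {{ latest_report? = true and certname in {names} }}"

# The same 22 certnames as in the description, foo22 down to foo1.
query = latest_reports_query(f"foo{i}" for i in range(22, 0, -1))
print(query)
```

The resulting query contains no OR chain at all, so it should sidestep the parsing path that this issue describes.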

ACTION:

Look through and see if there is an improvement using instaparse. Do not necessarily re-write the parser just for this bug. 