Jira (PUP-10115) "glob" matches are not sorted in alphanumerical order as documented, but in directory order instead

Henrik Lindberg (JIRA)

Oct 28, 2019, 5:47:02 PM
to puppe...@googlegroups.com
Henrik Lindberg moved an issue
 
Puppet / Bug PUP-10115
"glob" matches are not sorted in alphanumerical order as documented, but in directory order instead
Change By: Henrik Lindberg
Key: HI-614 → PUP-10115
Project: Hiera → Puppet
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Henrik Lindberg (JIRA)

Oct 28, 2019, 5:48:02 PM
to puppe...@googlegroups.com

Henrik Lindberg (JIRA)

Oct 28, 2019, 5:48:03 PM
to puppe...@googlegroups.com
Henrik Lindberg commented on Bug PUP-10115
 
Re: "glob" matches are not sorted in alphanumerical order as documented, but in directory order instead

Moved ticket to PUP project since HI project is for deprecated Hiera 3 only.

Rob Braden (JIRA)

Nov 4, 2019, 12:48:03 PM
to puppe...@googlegroups.com

otheus (Jira)

Apr 28, 2021, 12:58:04 PM
to puppe...@googlegroups.com
otheus commented on Bug PUP-10115
 
Re: "glob" matches are not sorted in alphanumerical order as documented, but in directory order instead

Confirming and expounding. After intense testing, I found that:

  1. Directory entries are read in directory order, not "alphanumeric" order (which is probably an incorrect way of describing standard UNIX shell behavior).
  2. The problem exists in 6.22.1 as well.
  3. This matters because it affects Hiera merge strategies and can produce changing, unpredictable behavior in any system that relies on it. For instance, imagine you have `a.yaml` and `c.yaml`. Normally, the merge would treat `a.yaml` with higher priority. But after renaming `a.yaml` to `b.yaml`, depending on the filesystem, `c.yaml` might now come first in directory order and thus have higher priority. As such, **this functionality becomes next to useless**. (It's not completely useless, but one now basically needs a syntax checker to ensure there are no conflicts or unpredictable merge results every time something changes.)
  4. The problem also extends to the `**/` glob mechanism: sub-directories are also processed in directory order (by depth).
  5. The documentation should be updated to explain that directory depth is a relevant factor in determining Hiera priority. At least in this regard, the behavior is close to what is expected (i.e., `dir-a/*.yaml` is processed before any entries in `dir-a/dir-b/*.yaml` when `**/*.yaml` is the glob pattern).
  6. Glob patterns (plural) specified in an array are actually processed in the order specified. I would argue this is desirable behavior, but on the surface it is at variance with the documentation, which implies that all glob array entries will be processed, files collated, and then sorted. Thus:
    - name: "xxx"
      globs:
        - "dirA/dirX/**/*.yaml"
        - "dirB/*.yaml"
        - "dirC/*.yaml"

Files matched by the first entry (dirA/dirX/*...) will always have higher priority than files matched by subsequent entries. Again, I find this desirable behavior, as it makes the system more predictable, especially for deeply nested directories such as Hostgroup-specific parsers.
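The ordering described above can be sketched in Ruby. This is only an illustration, not Puppet's implementation: `expand_globs_in_order` is a hypothetical helper, and the directory layout is made up to mirror the example entry. Patterns are expanded in the order listed, and matches are sorted only within each pattern.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (not Puppet's code): expand each pattern in the order
# given and sort matches within a single pattern only, so files from an
# earlier pattern always precede files from a later one.
def expand_globs_in_order(base, patterns)
  patterns.flat_map { |pattern| Dir.glob(pattern, base: base).sort }
end

files = nil
Dir.mktmpdir do |dir|
  # Made-up layout mirroring the example entry.
  %w[dirC/c.yaml dirB/b.yaml dirA/dirX/sub/z.yaml dirA/dirX/a.yaml].each do |rel|
    path = File.join(dir, rel)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, "---\n")
  end
  globs = ["dirA/dirX/**/*.yaml", "dirB/*.yaml", "dirC/*.yaml"]
  files = expand_globs_in_order(dir, globs)
end

puts files
```

Under these assumptions, everything under dirA/dirX is listed before dirB, and dirB before dirC, regardless of the order in which the files were created.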

Since the code doesn't work (and apparently never has), fixing the bug could easily mean producing reasonable behavior and updating the documentation accordingly.

  7. On UNIX, "hidden" files and directories (those beginning with ".") are excluded from glob matches. This should be documented.
  8. It should also be documented what sorting in "alphanumeric" order actually means. For instance, what is the sort order for the following:

2_0.yaml
2.0.yaml
2+0.yaml
2@0.yaml

Or for the following?

a.yaml
0.yaml
00.yaml
aa.yaml
z.yaml
_z.yaml
z_.yaml

In UNIX, 00 comes first, then 0, then aa, then a, then z, then _z, then z_, then zz. I don't understand it either. The POSIX definition was not really helpful here.

Users on a Windows server might find the sort order at odds with what they expect, so this should probably be a system- and locale-specific feature that lets users at least expect some kind of normalcy. The point is to produce an ordering that users will not find confusing.
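For comparison, here is what a plain bytewise sort does with the two lists above. This is a sketch of one possible definition of the order, not a statement of what Puppet does or should do: Ruby's `Array#sort` on strings compares byte values and ignores the locale, so the result is the same on every platform.

```ruby
# The two file lists from the comment above.
punctuated = %w[2_0.yaml 2.0.yaml 2+0.yaml 2@0.yaml]
mixed      = %w[a.yaml 0.yaml 00.yaml aa.yaml z.yaml _z.yaml z_.yaml]

# String comparison in Ruby is bytewise: "+" (0x2B) < "." (0x2E) < digits
# < "@" (0x40) < "_" (0x5F) < lowercase letters.
sorted_punctuated = punctuated.sort
sorted_mixed      = mixed.sort

puts sorted_punctuated
puts sorted_mixed
```

With this definition the first list sorts as 2+0.yaml, 2.0.yaml, 2@0.yaml, 2_0.yaml, and the second as 0.yaml, 00.yaml, _z.yaml, a.yaml, aa.yaml, z.yaml, z_.yaml. That differs from the locale-aware shell ordering quoted above, which is why the documentation would need to say which rule applies.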

  9. It should also be specified (1) what happens when multiple glob patterns match the same file multiple times: are repeats ignored? Processed twice? If twice, how does that affect the merge priority? And (2) how infinite recursion is prevented, i.e., soft links that result in an endless loop. Both problems are addressed in UNIX by referring to the entry's inode and keeping a list of those seen per glob entry (across all patterns in one entry).
  10. Henrik Lindberg's comment appears to be a meta-comment on the status of this problem as reported in a different issue tracker. Henrik, please update the comment accordingly – at first glance it sounds as if this is a deprecated feature when it certainly is not.
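One possible answer to the repeated-match question is to keep only the first occurrence of each file. The Ruby sketch below assumes that policy purely for illustration; `unique_matches` is a hypothetical helper, and it uses `File.realpath` as the identity key (in the spirit of the inode list described above, since a symlinked alias resolves to the same real path).

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: expand patterns in order, then drop any later match
# that resolves to a file already seen. Array#uniq keeps the first occurrence,
# so the earliest (highest-priority) position of a file wins.
def unique_matches(base, patterns)
  matches = patterns.flat_map { |pattern| Dir.glob(pattern, base: base).sort }
  matches.uniq { |rel| File.realpath(rel, base) }
end

result = nil
Dir.mktmpdir do |dir|
  %w[common/a.yaml nodes/b.yaml].each do |rel|
    path = File.join(dir, rel)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, "---\n")
  end
  # common/a.yaml is matched by both patterns.
  result = unique_matches(dir, ["common/*.yaml", "**/*.yaml"])
end

p result
```

Here common/a.yaml appears once, at the position given by the first pattern that matched it. Whether repeats should instead be processed twice is exactly the open question raised above.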
This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)

otheus (Jira)

Apr 28, 2021, 3:08:04 PM
to puppe...@googlegroups.com

Josh Cooper (Jira)

Apr 28, 2021, 3:42:03 PM
to puppe...@googlegroups.com
Josh Cooper commented on Bug PUP-10115
 
Re: "glob" matches are not sorted in alphanumerical order as documented, but in directory order instead

Thanks for writing this up, otheus! FWIW, the Dir.glob ordering behavior comes from Ruby, and the default behavior changed in Ruby 3. Regardless, we should enforce a stable sorting order for older Ruby versions, though we'll have to be careful about compatibility.
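A minimal sketch of what a version-independent ordering could look like, assuming nothing about how Puppet would actually implement it: `sorted_glob` is a hypothetical helper that sorts explicitly instead of relying on the `sort:` keyword, which `Dir.glob` only accepts from Ruby 3.0 onward. The demo replays the rename scenario from the earlier comment.

```ruby
require "tmpdir"

# Hypothetical helper: sort explicitly so the order does not depend on the
# Ruby version or on the filesystem's directory order.
def sorted_glob(pattern, base)
  Dir.glob(pattern, base: base).sort
end

before = after = nil
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "c.yaml"), "---\n")
  File.write(File.join(dir, "a.yaml"), "---\n")
  before = sorted_glob("*.yaml", dir)

  # Renaming a.yaml to b.yaml may change directory order on some filesystems,
  # but not the sorted result.
  File.rename(File.join(dir, "a.yaml"), File.join(dir, "b.yaml"))
  after = sorted_glob("*.yaml", dir)
end

p before
p after
```

In both cases c.yaml stays last, so the renamed file keeps its higher priority.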

David McTavish (Jira)

Jan 14, 2022, 11:16:02 AM
to puppe...@googlegroups.com
David McTavish updated an issue
 
Change By: David McTavish
Labels: final_triage
This message was sent by Atlassian Jira (v8.20.2#820002-sha1:829506d)

Molly Waggett (Jira)

Feb 22, 2022, 1:56:01 PM
to puppe...@googlegroups.com
