Confirming and expanding on this. After extensive testing, I found that:
- Directory entries are read in directory order, not "alphanumeric" order (which is probably an incorrect way of describing standard unix shell behavior).
- Problem exists in 6.22.1 as well.
- The reason this is so important is that it affects Hiera merge strategies and can produce shifting, unpredictable behavior in any system that relies on it. For instance, imagine you have `a.yaml` and `c.yaml`. Normally, the merge would treat `a.yaml` with higher priority. But after renaming `a.yaml` to `b.yaml`, depending on the filesystem, `c.yaml` might now come first in the directory order and thus have higher priority. As such, **this functionality becomes next to useless**. (It's not completely useless, but one now basically needs a syntax checker to ensure that no conflicts or unpredictable merge results appear every time there is a change.)
- The problem extends also to the glob */ mechanism: sub-directories are also processed in directory-order (by depth).
- The documentation should be updated to explain that directory depth is a relevant factor in determining Hiera priority. At least in this regard, the behavior is close to what is expected (i.e., `dir-a/*.yaml` is processed before any entries in `dir-a/dir-b/*.yaml` when `**/*.yaml` is the glob pattern).
- Glob patterns (plural) that are specified in an array are actually processed in the order specified. I would argue this is desirable behavior, but on the surface it is at variance with the documentation, which implies that all glob array entries will be processed, the files collated, and then sorted. Thus:

      - name: "xxx"
        globs:
          - "dirA/dirX/**/*.yaml"
          - "dirB/*.yaml"
          - "dirC/*.yaml"

  Files matched by the first entry (dirA/dirX/*...) will always have higher priority than files matched by subsequent entries. Again, I find this desirable behavior, as it makes the system more predictable, especially for deeply nested directories such as Hostgroup-specific parsers. Since the code doesn't work (and apparently never has), fixing the bug could easily mean producing reasonable behavior and updating the documentation accordingly.
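To make the behavior argued for above concrete, here is a minimal sketch in Python (used purely for illustration; it is not Hiera's implementation). It processes the patterns in the order given and sorts the matches within each pattern, so the result no longer depends on directory order. The function names `ordered_matches` and `demo` are hypothetical.

```python
import glob
import os
import tempfile


def ordered_matches(base_dir, patterns):
    """Return matches pattern by pattern; within one pattern, sort by path."""
    result = []
    for pattern in patterns:  # patterns keep the order given in the array
        hits = glob.glob(os.path.join(base_dir, pattern), recursive=True)
        # Sort explicitly: glob.glob gives no ordering guarantee of its own.
        result.extend(sorted(hits))
    return result


def demo():
    """Build a throwaway tree and list the matches as relative paths."""
    with tempfile.TemporaryDirectory() as base:
        for rel in ("dirB/c.yaml", "dirB/a.yaml", "dirA/dirX/sub/z.yaml"):
            path = os.path.join(base, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        found = ordered_matches(base, ["dirA/dirX/**/*.yaml", "dirB/*.yaml"])
        return [os.path.relpath(p, base).replace(os.sep, "/") for p in found]


print(demo())
```

With this approach, everything under `dirA/dirX` is listed before anything in `dirB`, and renaming a file can only move it within its own pattern's group.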
- On UNIX, "hidden" files and directories (those beginning with ".") are excluded from glob matching. This should be documented.
- It should also be documented what sorting by "alphanumeric" order actually means. For instance, what happens to the sort order for the following:

      2_0.yaml
      2.0.yaml
      2+0.yaml
      2@0.yaml

Or for the following?

      a.yaml
      0.yaml
      00.yaml
      aa.yaml
      z.yaml
      _z.yaml
      z_.yaml

  In UNIX, `00` comes first, then `0`, then `aa`, then `a`, then `z`, then `_z`, then `z_`, then `zz`. I don't understand it either. The POSIX definition was not really helpful here. Users on a Windows server might find that the sort order is at odds with what they expect, so this should probably be a system- and locale-specific feature that allows users to at least expect some kind of normalcy. The point is to produce an ordering that users will not find confusing.
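As one possible definition of "alphanumeric" that does not vary by platform or locale, the names can be compared by code point, which is what Python's built-in `sorted` does for strings. The sketch below applies that to the two sample lists above. It is only an illustration of one candidate rule, not a statement of what Hiera does or should do.

```python
# Sample names taken from the two lists in this comment.
punctuation_names = ["2_0.yaml", "2.0.yaml", "2+0.yaml", "2@0.yaml"]
mixed_names = ["a.yaml", "0.yaml", "00.yaml", "aa.yaml",
               "z.yaml", "_z.yaml", "z_.yaml"]

# sorted() on str compares code points, independent of the current locale.
punctuation_sorted = sorted(punctuation_names)
mixed_sorted = sorted(mixed_names)

print(punctuation_sorted)  # ['2+0.yaml', '2.0.yaml', '2@0.yaml', '2_0.yaml']
print(mixed_sorted)
# ['0.yaml', '00.yaml', '_z.yaml', 'a.yaml', 'aa.yaml', 'z.yaml', 'z_.yaml']
```

A locale-aware ordering could instead use `locale.strxfrm` as the sort key, but the result would then depend on the locale configured on each machine, which is the kind of surprise described above.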
- It should also be specified (1) what happens when multiple glob patterns match the same file multiple times: are repeats ignored? Processed twice? If twice, how does that affect the merge priority? And (2) how infinite recursion is prevented, i.e., soft links that result in an endless loop. Both problems are addressed in UNIX by referring to the entry's inode and keeping a list of those seen per glob entry (across all patterns in one entry).
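The inode technique mentioned above can be sketched as follows, assuming a POSIX system where symlinks can be created. Each path is resolved with `os.stat`, and the `(st_dev, st_ino)` pair is remembered, so a file reached twice or a link pointing back at an ancestor directory is visited only once. The names `walk_once` and `demo` are hypothetical, and error handling (for example, dangling links) is omitted.

```python
import os
import tempfile


def walk_once(root):
    """Yield each file under root once, following symlinks without looping."""
    seen = set()  # (st_dev, st_ino) pairs already visited

    def visit(path):
        st = os.stat(path)  # follows symlinks: a link shares its target's key
        key = (st.st_dev, st.st_ino)
        if key in seen:
            return  # duplicate match or symlink loop: skip it
        seen.add(key)
        if os.path.isdir(path):
            for name in sorted(os.listdir(path)):
                yield from visit(os.path.join(path, name))
        else:
            yield path

    yield from visit(root)


def demo():
    """Build a tree with a duplicate link and a loop, then walk it."""
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "data.yaml"), "w").close()
        os.symlink("data.yaml", os.path.join(root, "link.yaml"))  # same inode
        os.symlink(root, os.path.join(root, "loop"))  # points back at root
        return [os.path.basename(p) for p in walk_once(root)]


print(demo())
```

Here `link.yaml` is skipped because it resolves to the inode already seen for `data.yaml`, and `loop` is skipped because it resolves to the root directory itself.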
- Henrik Lindberg's comment appears to be a meta-comment on the status of this problem as reported in a different issue tracker. Henrik, please update the comment accordingly. At first glance it sounds as if this is a deprecated feature, when it certainly is not.