Hello,
On my private INN leaf system, I am running INN 2.7.1 with groupbaseexpiry=true,
ovmethod=ovdb, hismethod=hisv6, and tradspool.
I am having an issue where the news.daily job:
* For SOME articles, correctly expires them from all places
* For others, even in the same group, raises the low water mark in active
past them (and presumably also drops them from overview) but leaves them on
disk (and sometimes in history).
I noticed this when doing a routine examination of what was using space on this
system, which is getting a full text-only feed.
I picked the group at the top of the storage space list under alt. (Please do
not derail; this group was picked solely for its disk space usage, and is not one
I read anyhow. This is a purely technical question.)
news:~/articles/alt/atheism$ ls | wc -l
118089
news:~/articles/alt/atheism$ grep ^alt.atheism\ /var/lib/news/active
alt.atheism 0000289856 0000288979 y
According to active, that range holds at most 878 articles (289856 - 288979 + 1).
But according to ls, there are over 118,000 files in the directory.
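For anyone checking the arithmetic, here is a minimal Python sketch of how the active line translates into an article range. It assumes the usual active layout of group name, high water mark, low water mark, and flag; the function name active_range is made up for illustration and is not part of INN.

```python
def active_range(line):
    """Parse one active file line into (group, low, high, span)."""
    # Assumed field order: group name, high water mark, low water mark, flag.
    name, high, low, _flag = line.split()
    high, low = int(high), int(low)
    # The range is inclusive, so it can hold at most high - low + 1 articles.
    span = max(high - low + 1, 0)
    return name, low, high, span


print(active_range("alt.atheism 0000289856 0000288979 y"))
```

The span is an upper bound only, since numbers inside the range may be missing (cancelled or never received articles).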
I added a specific line to expire.ctl for this hierarchy for testing:
alt.atheism.*:A:5:5:5
alt.atheism:A:5:5:5
My expire.log shows:
expireover start Tue Jan 16 04:18:25 CST 2024: ( -z/var/log/news/expire.rm -Z/var/log/news/expire.lowmark)
Article lines processed 9050290
Articles dropped 16771
Overview index dropped 17663
expireover end Tue Jan 16 04:37:41 CST 2024
lowmarkrenumber begin Tue Jan 16 04:37:41 CST 2024: (/var/log/news/expire.lowmark)
lowmarkrenumber end Tue Jan 16 04:37:41 CST 2024
expirerm start Tue Jan 16 04:37:41 CST 2024
expirerm end Tue Jan 16 04:38:02 CST 2024
expire begin Tue Jan 16 04:38:32 CST 2024: (-v1)
Article lines processed 8051376
Articles retained 5934390
Entries expired 2116986
expire end Tue Jan 16 04:52:01 CST 2024
all done Tue Jan 16 04:52:01 CST 2024
Well that's weird. My expire.list file does have some alt/atheism/* files in
it, and THOSE files are gone. But:
news:~/articles/alt/atheism$ ls -ltr | head
total 584041
-rw-rw-r-- 4 news news 5967 Aug 30 2021 1
-rw-rw-r-- 4 news news 2612 Aug 30 2021 2
-rw-rw-r-- 4 news news 2609 Aug 30 2021 3
-rw-rw-r-- 4 news news 3596 Aug 30 2021 16
-rw-rw-r-- 4 news news 3511 Aug 30 2021 24
-rw-rw-r-- 4 news news 4217 Aug 30 2021 25
-rw-rw-r-- 4 news news 3303 Aug 30 2021 27
-rw-rw-r-- 4 news news 2362 Aug 30 2021 28
-rw-rw-r-- 4 news news 3994 Aug 30 2021 32
Clearly something isn't right here.
Looking at the message-IDs from these, for some of them (for instance, the very
oldest in the list) grephistory doesn't show any entries. For others -- such as
this one from February 2023, nearly a year ago:
news:~/articles/alt/atheism$ grephistory -l 'c0eeuhhbc3ebr3t8v...@4ax.com'
[43098AA196CF303E586EDE4477A98426] 1676098879~-~1676097578 @0500000002F00000000000030D4000000000@
And those dates match. But this is clearly outside what is listed in active.
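To show how I read the dates out of that history line, here is a small Python sketch. It assumes the second history field is arrival~expires~posted in epoch seconds, with "-" meaning the article had no Expires header; history_dates is a made-up helper name, not an INN tool.

```python
from datetime import datetime, timezone


def history_dates(field):
    """Split 'arrived~expires~posted' and convert epoch seconds to UTC."""
    out = {}
    for key, value in zip(("arrived", "expires", "posted"), field.split("~")):
        # "-" is taken to mean that no Expires header was present.
        if value == "-":
            out[key] = None
        else:
            out[key] = datetime.fromtimestamp(int(value), tz=timezone.utc)
    return out


dates = history_dates("1676098879~-~1676097578")
print(dates["arrived"].isoformat(), dates["posted"].isoformat())
```

Both timestamps land on 2023-02-11 UTC, which is what I mean by the dates matching.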
ovdb seems to match active:
news:~/articles/alt/atheism$ ovdb_stat -c alt.atheism
alt.atheism: counted: low: 288979, high: 289856, count: 878
news:~/articles/alt/atheism$ ovdb_stat -g alt.atheism
alt.atheism: groupstats: low: 288979, high: 289856, count: 878, flag: y
news:~/articles/alt/atheism$ ovdb_stat -i alt.atheism
alt.atheism: flags: none
alt.atheism: gid: 752; Stored in: ov00027
alt.atheism: last expired: 2024-01-16 04:20:27 CST
alt.atheism: by process id: 138688
Even expire.list is weird:
alt/atheism/288826
alt/atheism/288827
alt/atheism/288828
alt/atheism/288829
alt/atheism/288830
alt/atheism/288833
alt/atheism/288834
alt/atheism/288835
alt/atheism/288836
alt/atheism/288840
alt/atheism/288842
I don't know how to explain those gaps; 288831 and 288832 do exist on disk and
should be expired, for instance.
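The gaps can be listed mechanically. The following Python sketch is only an illustration: missing_numbers is a hypothetical helper, and the sample is the expire.list excerpt above. A gap by itself proves nothing, since a number may simply never have existed; it only becomes suspicious when the file is still on disk.

```python
def missing_numbers(paths, prefix="alt/atheism/"):
    """Return article numbers absent between the lowest and highest listed."""
    nums = set()
    for path in paths:
        tail = path[len(prefix):] if path.startswith(prefix) else ""
        if tail.isdecimal():
            nums.add(int(tail))
    if not nums:
        return []
    return [n for n in range(min(nums), max(nums) + 1) if n not in nums]


sample = [
    "alt/atheism/288826", "alt/atheism/288827", "alt/atheism/288828",
    "alt/atheism/288829", "alt/atheism/288830", "alt/atheism/288833",
    "alt/atheism/288834", "alt/atheism/288835", "alt/atheism/288836",
    "alt/atheism/288840", "alt/atheism/288842",
]
print(missing_numbers(sample))
```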
I believe this is happening in a number of other groups as well. That is, it's
not specific to this one. This is just the biggest example.
So my questions are:
1) Why is this happening?
2) Once #1 is fixed, how can I clean up the leftover articles? I could brute
force a 'find . -mtime +x -delete' but that might not leave the history in a
consistent state.
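To size the problem without deleting anything, a dry-run listing along these lines might help. It is only a sketch: orphan_numbers is a hypothetical helper, the path and low water mark are the ones from the transcripts above, and it deliberately does not touch history, overview, or the files themselves.

```python
import os


def orphan_numbers(names, low):
    """Return numeric file names that fall below the low water mark."""
    # Non-numeric names are ignored; nothing is removed.
    return sorted(int(n) for n in names if n.isdecimal() and int(n) < low)


spool_dir = os.path.expanduser("~/articles/alt/atheism")  # path from the transcripts
if os.path.isdir(spool_dir):
    found = orphan_numbers(os.listdir(spool_dir), 288979)
    print(len(found), "files below the low water mark")
```

Since the files show a link count of 4, each one is presumably crossposted, so any real cleanup would also have to account for the links in the other group directories.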
I am using the default cron job, which calls news.daily expireover lowmark delayrm.
Thanks,
John