force expiration by path?

Dave McGuire

Dec 10, 2023, 8:12:06 PM

Hi folks. Can anyone tell me if there's a way to tell INN to expire
a set of articles, as a one-time operation, based on their path?

I'm sure it's obvious that my goal is to get rid of all the Google
spam from the spool. I just filtered them in my cleanfeed configuration
but would like to purge the articles that are already there, as my
server is set up with a long expiration period.

A perusal of the docs for expire and such has turned up nothing, so
I'd appreciate some advice on whether or not there's a way to do this.

Thanks,
-Dave

--
Dave McGuire, President/Curator
Large Scale Systems Museum
New Kensington, PA

Grant Taylor

Dec 11, 2023, 12:14:05 AM
On 12/10/23 19:12, Dave McGuire wrote:
>   Hi folks.  Can anyone tell me if there's a way to tell INN to expire
> a set of articles, as a one-time operation, based on their path?

Maybe and it depends. (More below.)

>   I'm sure it's obvious that my goal is to get rid of all the Google
> spam from the spool.  I just filtered them in my cleanfeed configuration
> but would like to purge the articles that are already there, as my
> server is set up with a long expiration period.

I was doing that very thing as we type this thread. I just checked,
and a long-running command has finished.

time says that my command ran for:

84021.02s user 19364.71s system 29% cpu 98:13:31.85 total

This is a tradspool on a four (spinning rust) drive ZFS pool.

Seeing as how I'm using tradspool, I'm able to delete files from the
spool directory.

I suspect that this isn't proper, much less pure, from an INN sense. I
bet I should have extracted the article number and passed a cancel
message to INN, likely via ctlinnd. But I did a hack, and I'll deal
with it if / when it becomes a problem.
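For what it's worth, the more INN-friendly route mentioned above might look roughly like the sketch below. It assumes tradspool article files with a single-line Message-ID header, and it relies on `ctlinnd cancel` taking a Message-ID as its argument. The helper names `msgid_of` and `cancel_cmd_for` are made up for illustration. It only prints the commands (a dry run) and does not cope with Message-IDs that contain a single quote.

```shell
#!/bin/sh
# Sketch only: msgid_of and cancel_cmd_for are hypothetical helper names.

# Print the Message-ID (with angle brackets) found in the headers of an
# article file. Reading stops at the first blank line, so the body is
# never examined.
msgid_of() {
    sed -n -e '/^$/q' \
        -e 's/^[Mm]essage-[Ii][Dd]:[[:space:]]*\(<[^>]*>\).*/\1/p' "$1" |
        head -n 1
}

# Print (not run) the ctlinnd command that would cancel the article.
cancel_cmd_for() {
    id=$(msgid_of "$1")
    [ -n "$id" ] && printf "ctlinnd cancel '%s'\n" "$id"
}

# Example (dry run over one article file):
#   cancel_cmd_for /var/spool/news/articles/some/group/1234
```

Reviewing the printed commands before piping them to a shell keeps the step deliberate, which seems prudent for a one-time purge.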

That being said, I did a find across /var/spool/news/articles and had it
exec a script per article that looked for Message-IDs that ended with
@googlegroups.com.

This is actually the second time I've done this. The first time I did
it the process removed nearly seven million articles. Then I found out
that the Message-ID had a different pattern, likely as fields grew over
time. So I re-ran the process with a more forgiving format.

export LC_ALL=C
egrep -lm1 "^Message-ID: <[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+@googlegroups.com>$" "${1}" > /dev/null 2>&1
if [ ${?} -eq 0 ]; then
    echo -n "X"
    rm "${1}"
fi

I'm sure there are other ways to do this. But it worked for me. I was
able to let it run in the background in a window.

time (clear; find $(pwd) -type d | while read DIR; do
    echo -n "${TS}${${DIR/\/var\/spool\/news\/articles\//}//\//.}${FS}"
    find ${DIR} -maxdepth 1 -type f -exec /root/remove-google-groups-news-posting-if-its-spam.sh {} \;
done; echo)

The echo / ${TS} / ${FS} isn't important, much less required. It's
there because I wanted to update the window title to be the newsgroup
that was being worked on.

I'm sure there are better ways to do this. But this has worked for me
to do exactly what you're asking to do.
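As one possible variation (a sketch, not something tested against a real spool), the per-article script and the find loop above could be folded into a single function that lets find hand files to grep in batches, which avoids starting one process per article. The function name `list_gg_articles` is invented here. The pattern is the one from the script above with the dot in the domain escaped, and `-m1` assumes a grep that supports it, as the original does. It only lists matching files; removal is left as a separate step.

```shell
#!/bin/sh
# Sketch only: list_gg_articles is a hypothetical name.
LC_ALL=C
export LC_ALL

# List article files under a directory whose Message-ID matches the
# five-group Google Groups pattern. Nothing is deleted here.
list_gg_articles() {
    find "$1" -type f -exec grep -lE -m1 \
        '^Message-ID: <[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+@googlegroups\.com>$' \
        {} +
}

# Example (review list.txt before removing anything; assumes spool paths
# without whitespace, as is usual for tradspool):
#   list_gg_articles /var/spool/news/articles > list.txt
#   xargs rm -- < list.txt
```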

>   A perusal of the docs for expire and such have turned up nothing, so
> I'd appreciate some advice on whether or not there's a way to do this.

I'm not aware of anything built in to INN that will do this. But this
is one way that you can do this outside of INN.

N.B. what I did is possibly very specific to the tradspool method. I
have no idea about other methods. It may be possible, but would likely
require using ctlinnd to cancel the articles.



--
Grant. . . .

Julien ÉLIE

Dec 11, 2023, 3:38:05 PM
Hi Dave,

> Can anyone tell me if there's a way to tell INN to expire
> a set of articles, as a one-time operation, based on their path?

Grant's method naturally works on tradspool and you can use it.

In a more general case, you can parse the history file (in <pathdb> as
set in inn.conf), retrieve the headers of each article (sm -H) and run
the regexps you wish on these headers.
As you're asking for a search based on the Path header field, the
following command will write to a googlegroups.tokens file the storage
tokens of articles sent from Google Groups:

perl -ne 'chomp; our ($hash, $timestamps, $_) = split " ";
    print "$_\n" if $_ and qx/sm -q -H "$_" | grep Path/ =~
    /!google-groups\.googlegroups\.com!not-for-mail$/' history > googlegroups.tokens

The command will take a bit of time to run, as INN retrieves every article.

Then, to delete these articles from your history file, just run "sm -d"
on them. Something like:

xargs sm -d < googlegroups.tokens


Before doing that, check that your regexp worked, by retrieving a few
storage tokens and verifying they're coming from Google Groups. You can
see the contents of an article with:

sm -R '@...token...@'

(-R in uppercase)
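A rough way to do that spot check over several tokens at once is sketched below. It relies only on the `sm -q -H` invocation already used above. The function name `show_paths` and the default sample size of 5 are illustrative choices, not anything INN provides.

```shell
#!/bin/sh
# Sketch only: show_paths is a hypothetical helper name.

# For the first N storage tokens in a file (default 5), print the token
# followed by the Path header line(s) that sm returns for that article.
show_paths() {
    file=$1
    count=${2:-5}
    head -n "$count" "$file" | while read -r token; do
        [ -n "$token" ] || continue
        printf '%s\n' "$token"
        # stdin is redirected so sm cannot consume the token list
        sm -q -H "$token" < /dev/null | grep '^Path:'
    done
}

# Example:
#   show_paths googlegroups.tokens 10
```

If any of the printed Path lines does not end with the Google Groups entry, the regexp probably needs another look before anything is passed to "sm -d".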


> A perusal of the docs for expire and such have turned up nothing, so
> I'd appreciate some advice on whether or not there's a way to do this.

The next run of news.daily will properly clean the overview, etc.

--
Julien ÉLIE

« Qui habet aures audiendi, audiat. » (Évangiles)

Dave McGuire

Dec 13, 2023, 9:42:35 PM
Hi Grant, thank you, I'll give this a shot. The window title thing
is a nice touch. :)

Grant Taylor

Dec 13, 2023, 10:54:53 PM
On 12/13/23 20:42, Dave McGuire wrote:
>   Hi Grant, thank you, I'll give this a shot.  The window title thing
> is a nice touch. :)

Hi Dave,

You're welcome.

Please let me know if it works or if you have questions.

Tom Furie

Dec 14, 2023, 1:32:32 AM
Grant Taylor <gta...@tnetconsulting.net> writes:

> export LC_ALL=C
> egrep -lm1 "^Message-ID:
> <[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+@googlegroups.com>$"

Just an aside in case you need to do something like this again...

A pattern like

"^Message-ID: <([[:alnum:]]+-)+[[:alnum:]]+[[:alpha:]]"

before the "@" should allow the hyphenated grouping to expand
arbitrarily without intervention required to modify the pattern. I'm
pretty certain that's all hex (probably a hash of something) until the
"n@google...", and I don't think I've ever noticed anything other than
"n" immediately preceding the "@", so I guess the pattern could be
something like

"^Message-ID: <([0-9a-f]+-)+[0-9a-f]+n@google..."

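For anyone wanting to try the open-ended pattern before pointing it at a spool, a small check along these lines might help. It joins the first pattern above (the part meant to go before the "@") with the @googlegroups.com suffix from Grant's script earlier in the thread, with the dot escaped. The function name `is_gg_msgid` is made up, and the sample Message-ID in the example is invented for illustration, not taken from a real article.

```shell
#!/bin/sh
# Sketch only: is_gg_msgid is a hypothetical name.
LC_ALL=C
export LC_ALL

# Succeed if the given header line matches the open-ended hyphenated
# pattern followed by the @googlegroups.com suffix.
is_gg_msgid() {
    printf '%s\n' "$1" | grep -Eq \
        '^Message-ID: <([[:alnum:]]+-)+[[:alnum:]]+[[:alpha:]]@googlegroups\.com>$'
}

# Example (invented ID):
#   is_gg_msgid 'Message-ID: <0a1b-2c3d-4e5fn@googlegroups.com>' && echo match
```

Because the hyphenated group repeats, the same check accepts IDs with any number of hyphen-separated fields, but it still rejects an ID with no hyphen at all.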
Dave McGuire

Jan 7, 2024, 1:22:26 PM
On 12/13/23 22:54, Grant Taylor wrote:
>>    Hi Grant, thank you, I'll give this a shot.  The window title thing
>> is a nice touch. :)
>
> Hi Dave,
>
> You're welcome.
>
> Please let me know if it works or if you have questions.

Hi Grant, yes it did indeed work. Thank you for your advice.

Grant Taylor

Jan 9, 2024, 11:15:00 PM
On 1/7/24 12:22, Dave McGuire wrote:
> Hi Grant, yes it did indeed work.  Thank you for your advice.

Hi Dave,

Thank you for the follow up. I'm glad that it worked for you. :-)