No, no - mine is the One True Report Format (TM) ...
(over in the corner I can hear joey having a fit
of giggles 'cause he knows his is best...)
|This script has not been tested with B News. In theory it should work
|OK, since those fields of the log file which are looked at are
|supposedly the same in the C News log file.
Uhhh... you've forgotten, haven't you 8^)...
BTW, I have the feeling that "i" and "s" records
are not really in the "accepted" category. As I
recall, "i" records show the message-id of an
article which is sent to a remote site in response
to having accepted its "ihave" message; they're
really an "i-want" record. "s" records show the
message-id of an article sent to a remote site as
a result of having accepted its "sendme" message -
more or less a "send-to". Anyhow this may all be
different when the new ihave/sendme stuff arrives
on our doorsteps...
You might want to have the patches by Larry Blair
which give "c" & type for control messages, and "f"
for failed cancels, etc. They make for a richer log
format and include the path for duplicates. "Don't
leave /usr/home without it"...
Cheers,
--
,u, Bruce Becker Toronto, Ontario
a /i/ Internet: b...@becker.gts.org, br...@gpu.utcs.toronto.edu
`\o\-e UUCP: ...!lsuc!becker!bdb
_< /_ "Ceci n'est pas un \"" - Rene "Day" Taxi # 12 & 35
This is my day to be off-by-one!
Index: newsstats.sh
6c6
< #ident "@(#)cnews.Local:newsstats.sh 1.2 92/02/24 20:58:06 (woods)"
---
> #ident "@(#)cnews.Local:newsstats.sh 1.3 92/02/25 00:43:54 (woods)"
30c30
< if [ $# -gt 1 ] ; then
---
> if [ $# -ge 1 ] ; then
--
Greg A. Woods
wo...@robohack.UUCP wo...@Elegant.COM VE3-TCP UniForum Canada & Elegant Comm.
(416) 443-1734 [home] (416) 595-5425 [work] Toronto, Ontario; CANADA
"Want" for " i " records, "Sent" for " s ".
Neither represents an incoming article in the
sense of "accepted" in your awk script.
It's not clear that you need "Want" anyhow, it's
just a count of how many articles your system
asked to be sent in response to ihave's from
other systems (no guarantee any of them will
actually be sent). On the other hand the "Sent"
field is actually more interesting than the
"Xmit" field for ihave/sendme, since for those
sites most (but not necessarily all) entries
are offered in ihave messages, but may not
all be requested for transmission by the
remote system.
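
To make the distinction concrete, here is a minimal sketch in Python that
keeps "Want" and "Sent" out of the accepted count. It assumes the site is
field 4, the flag is field 5, and the target site of an " i " or " s "
record is field 7, possibly carrying a "-send-ids" or "-real" suffix (the
same suffixes the news.repz.awk script later in this thread strips). The
function names and the sample lines are hypothetical.

```python
from collections import Counter


def strip_suffix(name, suffix):
    """Return name without a trailing suffix, if it has one."""
    return name[:-len(suffix)] if name.endswith(suffix) else name


def tally(lines):
    """Count accepted, want and sent records separately, keyed by site."""
    accepted, want, sent = Counter(), Counter(), Counter()
    for line in lines:
        f = line.split()
        if len(f) < 6:
            continue  # not a complete log record
        site, flag = f[3], f[4]
        if flag == "+":
            accepted[site] += 1  # a genuinely incoming article
        elif flag == "i" and len(f) >= 7:
            # "Want": we asked the remote site for this message-id
            want[strip_suffix(f[6], "-send-ids")] += 1
        elif flag == "s" and len(f) >= 7:
            # "Sent": we sent the article in answer to a sendme
            sent[strip_suffix(f[6], "-real")] += 1
    return accepted, want, sent


# Made-up sample records, for illustration only.
sample = [
    "Feb 25 00:43:54.123 lsuc + <1@a.example> gpu",
    "Feb 25 00:43:55.001 becker i <2@b.example> lsuc-send-ids",
    "Feb 25 00:43:56.002 becker s <3@c.example> lsuc-real",
]
accepted, want, sent = tally(sample)
print(dict(accepted), dict(want), dict(sent))
```

Keeping three separate counters means a report can show "Want" and "Sent"
as their own columns without inflating the incoming totals.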
|> You might want to have the patches by Larry Blair
|> which give "c" & type for control messages, and "f"
|> for failed cancels, etc. They make for a richer log
|> format and include the path for duplicates. "Don't
|> leave /usr/home without it"...
|
|No thanks.... I don't think that much can be gained by slowing down
|relaynews to put this stuff in the log. Regardless, I'll leave log-file
|format and content to Henry and Geoff.
Slowing down rela.... hmmm, Greg, this
can hardly be a realistic response. If you
wanted to speed things up a lot (at least
for ihave/sendme), note that the " i " and " s "
records are written for every article requested
or sent, as well as to the batch logs. Better to
just make up control articles directly and log
message-id pointers to them...
Anyhow many folks do use the patches, they're
quite minimal and unobtrusive, etc. You might
allow for the differences in your script for
those who do use them...
|As has also been pointed out to me, and, with my memory sufficiently
|jogged, I recall that I once knew this too, the "local" entries
|with "remote" message-id's are "fake" articles used to hold place in
|the history file for the articles whose cancels precede them (thus
|causing the "canceled" article to be rejected as a "duplicate").
|
|This is rather grotty too, since it's nearly impossible to tell in a
|portable way whether the message-id's hostname part matches $4, and thus
|whether it's a "faked" article or not. It would be much simpler if the record
|contained a different log flag ($5), such as 'c', to indicate the
|special status of this record.
Please do not use " c " - the well-distributed
patch mentioned above uses that for describing
control messages. There's an " f " flag in the
patch that documents failed cancels and failed
supersedes...
I think I'll wait for the "spring cleanup" release of C News before I
try ihave/sendme and figure out what the log entries really mean.
> Slowing down rela.... hmmm, Greg, this
> can hardly be a realistic response. If you
> wanted to speed things up a lot (at least
> for ihave/sendme) the " i " and " s " records
> are written for every article requested or
> sent, as well as to the batch logs. Better to
> just make up control articles directly and log
> message-id pointers to them...
Well again, since I don't use ihave/sendme yet, and since this is some
of the stuff that's supposed to be much more efficient in the new
release, I'm not concerned with it either way.
Besides, the stats reporting should work for the lowest common
denominator of C News sites....
Anyway, I'm planning to take any suggestions I receive and incorporate
them into a new version and eventually post it too. I've already
added rudimentary support for 'c' and 'f' records. (hint, hint, and I
really need more info about how nntp might impact the normal log file.)
I am indeed rather annoyed that the fakehist() routine in relaynews
isn't smart enough to differentiate its log entries from regular
ones. The coding change required is *extremely* simple and would not
affect performance in any way. Hopefully this will also be fixed in
the upcoming fixes.
Another frustrating "feature" of relaynews logging (which may indeed
be fixed in newer versions than the one I'm running), is the failure
to ensure that the message-id fields written out to the log file
(especially those by fakehist()) are "legal". One of the upcoming
"fixes" to stats.awk is a little check for this (and it costs only a 5%
increase in cpu time!).
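
A check of this kind can be very small. The sketch below, in Python, is
only a loose shape test (angle brackets, exactly one "@", no whitespace),
not the full message-id grammar, and it is not the check that will appear
in stats.awk; the pattern and the function name are assumptions made for
illustration.

```python
import re

# Loose shape check: "<" local-part "@" host-part ">", with no spaces,
# no nested angle brackets and exactly one "@". This is an assumption
# about what "legal" should mean here, not a standards-complete grammar.
MSGID_RE = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")


def looks_legal(msgid):
    """Return True if msgid has the basic <local@host> shape."""
    return MSGID_RE.fullmatch(msgid) is not None


for candidate in ["<1989Jul1@vicom.example>", "<no-at-sign>", "bare@id"]:
    print(candidate, looks_legal(candidate))
```

Records that fail the check can be echoed and skipped instead of counted,
so one malformed entry does not create a bogus row in the report.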
Well, I don't know if you need to change them; it's a matter of perspective
really. Either look at it as an "i-have received" or a "sendme sent", or
even both.
>As has also been pointed out to me, and, with my memory sufficiently
>jogged, I recall that I once knew this too, the "local" entries
>with "remote" message-id's are "fake" articles used to hold place in
>the history file for the articles whose cancels precede them (thus
>causing the "canceled" article to be rejected as a "duplicate").
On Contact, where this script was first written, local postings were
recorded to a different log file, so anything marked as being from contact
in the normal log file would've been one of these fake articles.
Another problem with this script is that it's possible for an article to
be both accepted and junked, screwing up the totals.
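
One way to keep such an article from being counted twice is to key on the
message-id rather than on the log line. A minimal sketch in Python,
assuming that a junked article produces both a "+" record and a "j"
record carrying the same message-id in field 6; the function name and the
sample lines are hypothetical.

```python
def count_articles(lines):
    """Count each article once: junked ones are removed from 'accepted'."""
    accepted, junked = set(), set()
    for line in lines:
        f = line.split()
        if len(f) < 6:
            continue  # not a complete log record
        flag, msgid = f[4], f[5]
        if flag == "+":
            accepted.add(msgid)
        elif flag == "j":
            junked.add(msgid)
    # An article seen with both flags is reported as junked only.
    return {"accepted": len(accepted - junked), "junked": len(junked)}


# Made-up sample records, for illustration only.
sample = [
    "Feb 25 00:43:54.123 lsuc + <1@a.example> gpu",
    "Feb 25 00:43:54.200 lsuc + <2@a.example>",
    "Feb 25 00:43:54.201 lsuc j <2@a.example> junked due to groups `alt.x'",
]
print(count_articles(sample))
```

Holding every message-id in memory costs more than plain counters, so for
large logs a per-site subtraction of the junked count may be preferable.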
>X# newsstats.awk -- by Ross Ridge (ro...@contact.uucp) Public Domain
Could you change my address here, or at least put a note that the address
doesn't exist anymore?
>X#
>X# this loop should merge duplicates before counting....
>X#
I don't think so; if a site is mentioned more than once after the article id,
the article will get sent to that site more than once.
Ross Ridge
--
\\ //
[OO] -|-=============================+<>+============================-|- [OO]
\()\ | Ross Ridge ro...@zooid.guild.org The Great HTMU | /()/
\\ -|-=============================+<>+============================-|- //
True enough.....
> On Contact, where this script was first written, local postings were
> recorded to a different log file, so anything marked as being from contact
> in the normal log file would've been one of these fake articles.
I do this too, but it's still rather yucky in terms of reporting....
I did patch my relaynews to use the 'f' flag for the faked ones, and
my current stats report the total #. I'd like to get a percentage of
the total # of cancels, but I'm not sure I want to add 'c' records to
my log just yet....
> Another problem with this script is that it's possible for an article to
> be both accepted and junked, screwing up the totals.
Hmm... yes.
> >X# newsstats.awk -- by Ross Ridge (ro...@contact.uucp) Public Domain
>
> Could you change my address here, or at least put a note that the address
> doesn't exist anymore?
Done in my current copy already.... <ro...@zooid.guild.org>
> >X#
> >X# this loop should merge duplicates before counting....
> >X#
>
> I don't think so; if a site is mentioned more than once after the article id,
> the article will get sent to that site more than once.
Not if you filter the togo file through uniq like all sane versions of
sendbatches! :-)
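
If duplicates are dropped before the batch is built, the stats script can
mirror that by counting each site at most once per log line. A sketch in
Python, assuming the site list starts at field 7 as in vanilla C News
(counted from 1, as awk does); `sites_fed` is a hypothetical helper. Note
that uniq itself only collapses adjacent identical lines, whereas this
removes repeats anywhere in the list.

```python
def sites_fed(line, sent_field=7):
    """Return the sites named on a log line, each listed once, in order."""
    fields = line.split()
    # sent_field is 1-based to match awk's $7; Python lists are 0-based.
    sites = fields[sent_field - 1:]
    # dict.fromkeys keeps the first occurrence of each name and its order.
    return list(dict.fromkeys(sites))


# Made-up sample record, for illustration only.
line = "Feb 25 00:43:54.123 lsuc + <1@a.example> gpu becker gpu"
print(sites_fed(line))
```

With the logging patches applied the newsgroups occupy field 7, so the
call would pass sent_field=8 instead.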
OK, here's what I use, and I'll even throw in an
update of Larry Blair's patches for more detailed
log file format. The output is fairly detailed &
is shown on a per-site basis, plus a list and
count of junked newsgroups is included...
--------- 8< --------- 8< --------- 8< --------- 8< --------- 8< ---------
[ news.repz ]
#!/bin/sh
#
# Shell script for chewing up Cnews log files and spitting out
# summary news stats. Reads from standard input or give it a news log
# file as an argument. Bug fixes and enhancements welcome.
#
#
# Many changes & enhancements - Bruce Becker 26/Feb/92
#
# John A. Palkovic 1/31/91
#
# Keith Cantrell (kcan...@digi.lonestar.org) added the ability to
# print the summary of files sent out. 1/30/90
#
. ${NEWSCONFIG-/usr/lib/news/bin/config}
ME=`uname -n`
FILE=$1; if [ -z "$FILE" ]; then FILE="-"; fi
awk -f $NEWSBIN/maint/news.repz.awk ME=$ME $FILE
--------- 8< --------- 8< --------- 8< --------- 8< --------- 8< ---------
[ news.repz.awk ]
# awk script for chewing up Cnews log files and spitting out
# summary news stats. Bug fixes and enhancements welcome.
#
#
# Many changes & enhancements - Bruce Becker 26/Feb/92
#
# If you don't run with the relaynews daemon patches, change
# "SENT_FIELD = 8" just below to 'SENT_FIELD = 7'. Everything
# except the "control" and "failed" stuff should work the same.
#
# John A. Palkovic 1/31/91
#
# Keith Cantrell (kcan...@digi.lonestar.org) added the ability to
# print the summary of files sent out. 1/30/90
#
BEGIN {
SENT_FIELD = 8 # change this to 7 for vanilla C news
posted[""] = 0; psent[""] = 0
control[""] = 0; cancel[""] = 0; ihave[""] = 0; sendme[""] = 0
chkgrp[""] = 0; rmgroup[""] = 0; newgrp[""] = 0; sendsys[""] = 0
version[""] = 0; uunames[""] = 0; cother[""] = 0; csent[""] = 0
junked[""] = 0
reject[""] = 0; dupl[""] = 0; nosub[""] = 0; unapp[""] = 0
daterr[""] = 0; hdrerr[""] = 0; rother[""] = 0
iwant[""] = 0; sendto[""] = 0
failed[""] = 0; other[""] = 0
}
{
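# echo and skip any non-empty line whose site name or message-id looks
# malformed; a site name may contain only letters, digits, "-" and "."
if (NF < 6 || $4 !~ /^[-.0-9A-Za-z]+$/ || $6 !~ /^<.+@.+>$/)
{ if ($0 !~ /^$/) print; next }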
l = length($4)
if ($4 ~ /\.uucp$/) nm = substr($4, 1, l-5)
else if ($4 ~ /\.UUCP$/) nm = substr($4, 1, l-5)
else if ($4 == ME) nm = "<local>"
else if ($4 == "local") nm = "<local>"
else if ($4 == "maps") nm = "<maps>"
else nm = $4
sys[nm]++; count++
if ($5 == "+") {
posted[nm]++
for (i=SENT_FIELD; i <= NF; i++) { psent[$i]++; sys[$i]++ }
}
else if ($5 == "c") {
control[nm]++
if ($8 == "cancel") { cancel[nm]++; S = 10
if ($9 !~ /^<.+@.+>$/) { print; S = 0 } }
else if ($8 == "ihave") { ihave[nm]++; S = 10 }
else if ($8 == "sendme") { sendme[nm]++; S = 10 }
else if ($8 == "checkgroups") { chkgrp[nm]++; S = 9 }
else if ($8 == "rmgroup") { print; rmgroup[nm]++; S = 10 }
else if ($8 == "newgroup") { print; newgrp[nm]++; S = 10
if ($10 == "moderated") S++ }
else if ($8 == "sendsys") { print; sendsys[nm]++; S = 9 }
else if ($8 == "version") { print; version[nm]++; S = 9 }
else if ($8 == "senduuname") { print; uunames[nm]++; S = 9 }
else { print; cother[nm]++; S = 0 }
if (S > 0) for (i=S; i <= NF; i++) {
l = length($i)
if ($i ~ /\-ctl$/) n = substr($i, 1, l-4)
else n = $i
csent[n]++; sys[n]++
}
}
else if ($5 == "j") {
junked[nm]++
l = length($NF); if (substr($NF, l, 1) == "'") l--
s = 1; if (substr($NF, 1, 1) == "`") { s++; l-- }
g = substr($NF, s, l); n = split(g, ng, ",")
for (i=1; i <= n; i++) { g = ng[i]; if (g != "") jng[g]++ }
}
else if ($5 == "-") {
reject[nm]++
if ($7 == "duplicate") dupl[nm]++
else if ($0 ~ /no subscribed/) nosub[nm]++
else if ($0 ~ /unapproved/) unapp[nm]++
else if ($0 ~ /ancient|too far in the future|unparsable Date/)
daterr[nm]++
else if ($0 ~ / (no|empty) .* header|contains non-|Message-ID|space in/)
{ print; hdrerr[nm]++ }
else { print; rother[nm]++ }
}
else if ($5 == "i") {
l = length($7)
if ($7 ~ /\-send\-ids$/) n = substr($7, 1, l-9)
else n = $7
iwant[n]++; sys[n]++
}
else if ($5 == "s") {
l = length($7)
if ($7 ~ /\-real$/) n = substr($7, 1, l-5)
else n = $7
sendto[n]++; sys[n]++
}
else if ($5 == "f") failed[nm]++
else { print; other[nm]++ }
}
END {
printf "\n\n\t\t\t Articles Incoming\t\t | Articles Outgoing\n"
printf "Hosts Posted Contrl Junked Duplic Reject Total | Queued Contrl Total\n"
printf "------------- ------ ------ ------ ------ ------ ------ | ------ ------ ------\n"
if (count > 0) for (host in sys) {
rp = posted[host]; rc = control[host]
rj = junked[host]; rd = dupl[host]; rr = reject[host]
rec = rp + rc + rr
sp = psent[host]; sc = csent[host]
snt = sp + sc
if (rec <= 0 && snt <= 0) continue
rp -= rj; rr -= rd
printf "%-14.14s %6d %6d %6d %6d %6d %6d %6d %6d %6d\n", \
host, rp, rc, rj, rd, rr, rec, sp, sc, snt
totalpost += rp; totalcontrol += rc; totaljunked += rj
totaldupl += rd; totalreject += rr; totalrec += rec
totalpsent += sp; totalcsent += sc; totalsent += snt
}
printf "------------- ------ ------ ------ ------ ------ ------ ------ ------ ------\n"
printf "%-14.14s %6d %6d %6d %6d %6d %6d %6d %6d %6d\n", \
"Totals", totalpost, totalcontrol, totaljunked, \
totaldupl, totalreject, totalrec, \
totalpsent, totalcsent, totalsent
printf "\n"
printf "\n\t\t\t\tControl Messages\n"
printf "Hosts Cancl Ihave Sndme Ckgrp Rmgrp Nwgrp Sdsys Versn Uunme Other\n"
printf "------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----\n"
if (totalcontrol > 0) for (host in control)
if (host != "" && control[host] > 0) {
ca = cancel[host]; ih = ihave[host]; sm = sendme[host]
ck = chkgrp[host]; rm = rmgroup[host]; nw = newgrp[host]
ss = sendsys[host]; vr = version[host]; uu = uunames[host]
co = cother[host]
printf "%-14.14s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n", \
host, ca, ih, sm, ck, rm, nw, ss, vr, uu, co
totalca += ca; totalih += ih; totalsm += sm
totalck += ck; totalrm += rm; totalnw += nw
totalss += ss; totalvr += vr; totaluu += uu
totalco += co
}
printf "------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----\n"
printf "%-14.14s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n", \
"Totals", totalca, totalih, totalsm, totalck, \
totalrm, totalnw, totalss, totalvr, totaluu, totalco
printf "\n"
printf "\n\t\t\t\tRejects and Miscellany\n"
printf "Hosts Dupli Nosub Unapp DatNG HdrNG Rmisc Iwant Sndto Cfail Other\n"
printf "------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----\n"
if (count > 0) for (host in sys) {
iw = iwant[host]; sn = sendto[host]
fa = failed[host]; ot = other[host]
if (reject[host] > 0 || \
iw > 0 || sn > 0 || fa > 0 || ot > 0) {
du = dupl[host]; ns = nosub[host]; un = unapp[host]
dt = daterr[host]; hd = hdrerr[host]; ro = rother[host]
printf "%-14.14s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n", \
host, du, ns, un, dt, hd, ro, iw, sn, fa, ot
totaldu += du; totalns += ns; totalun += un
totaldt += dt; totalhd += hd; totalro += ro
totaliw += iw; totalsn += sn
totalfa += fa; totalot += ot
}
}
printf "------------- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----\n"
printf "%-14.14s %5d %5d %5d %5d %5d %5d %5d %5d %5d %5d\n", \
"Totals", totaldu, totalns, totalun, \
totaldt, totalhd, totalro, \
totaliw, totalsn, totalfa, totalot
printf "\n"
if (totaljunked > 0) {
print "\nCount of Junked Newsgroups"
for (g in jng) printf "%5d\t %s\n", jng[g], g
printf "\n"
}
}
--------- 8< --------- 8< --------- 8< --------- 8< --------- 8< ---------
Here's part of Larry's original posting, with revised
patches relative to the 22-Dec-1991 version of C news:
From: l...@vicom.com (Larry Blair)
Newsgroups: news.software.b,news.sysadmin,comp.sources.d
Subject: Patch to C News for adequate logging for statistics
Message-ID: <1989Jul1.2...@vicom.com>
Date: 1 Jul 89 20:51:26 GMT
Organization: VICOM Systems Inc., San Jose, CA
[...]
The patches are to two files. The first patch, to history.c, adds the
following:
The line for all accepted articles now contains the newsgroups to which
the article was posted.
Accepted control messages are indicated with a "c" instead of a "+",
and the type of the control message appears on the line.
The dummy history entries for cancel messages that arrive before the
actual article are now indicated with an "f" instead of a "+".
This patch adds very little to the size of the log, but provides all the
information necessary to produce the site statistics.
The second patch, to procart.c, places the path by which a rejected
duplicate arrived in the log. Since the duplicates usually arrive by
some long path, this patch will significantly increase the size of the
log if you receive a lot of duplicates. On the other hand, without this
information it is nearly impossible for sites receiving a lot of dups to
tune their incoming feeds so as to reduce them.
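
As an illustration of the resulting record layout, the following Python
sketch pulls the extra fields out of a patched log line. The field
positions are inferred from the printf formats in the patches below and
assume a three-field timestamp ahead of the site name; `parse_patched`
and the sample lines are hypothetical.

```python
def parse_patched(line):
    """Split a patched C News log line into its labelled parts."""
    f = line.split()
    if len(f) < 6:
        return None  # not a complete log record
    rec = {"site": f[3], "flag": f[4], "msgid": f[5]}
    flag = f[4]
    if flag in ("+", "c") and len(f) >= 7:
        # accepted articles and control messages carry the newsgroups
        rec["newsgroups"] = f[6].split(",")
    if flag == "c" and len(f) >= 8:
        rec["control"] = f[7]  # e.g. cancel, newgroup, sendsys
    if flag == "-" and f[6:8] == ["duplicate", "-"] and len(f) >= 9:
        rec["dup_path"] = f[8]  # path by which the duplicate arrived
    return rec


# Made-up sample records, for illustration only.
samples = [
    "Feb 25 00:43:54.123 lsuc c <9@a.example> news.test cancel <1@b.example> gpu",
    "Feb 25 00:43:55.000 lsuc - <1@a.example> duplicate - gpu!lsuc!becker!bdb",
    "Feb 25 00:43:56.000 lsuc f <1@b.example>",
]
for s in samples:
    print(parse_patched(s))
```

Grouping the "dup_path" values by their first hop is one way to see which
feed delivers the most duplicates.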
[updated patches follow]
*** history.c.orig Fri Nov 8 14:17:54 1991
--- history.c Mon Feb 17 15:17:30 1992
***************
*** 193,200 ****
if (startlog) {
timestamp(stdout, &now);
! if (printf(" %s + %s", sendersite(nullify(art->h.h_path)),
! msgid) == EOF)
fulldisk(art, "stdout");
} else
now = time(&now);
--- 193,208 ----
if (startlog) {
timestamp(stdout, &now);
! if(art->h.h_ngs == NULL) {
! if (printf(" %s f %s",
! sendersite(nullify(art->h.h_path)), msgid) == EOF)
! fulldisk(art, "stdout");
! } else if (printf(" %s %c %s %s",
! sendersite(nullify(art->h.h_path)),
! (art->h.h_ctlcmd) ? 'c' : '+', msgid,
! art->h.h_ngs) == EOF)
! fulldisk(art, "stdout");
! if ( art->h.h_ctlcmd && printf(" %s", art->h.h_ctlcmd) == EOF)
fulldisk(art, "stdout");
} else
now = time(&now);
*** procart.c.orig Fri Nov 8 14:25:22 1991
--- procart.c Mon Feb 17 15:27:57 1992
***************
*** 393,399 ****
(void) printf("no subscribed groups in `%s'\n", ngs);
} else if (alreadyseen(hdrs->h_msgid)) {
prefuse(art);
! (void) fputs("duplicate\n", stdout);
} else
return; /* art was accepted */
decline(art);
--- 393,399 ----
(void) printf("no subscribed groups in `%s'\n", ngs);
} else if (alreadyseen(hdrs->h_msgid)) {
prefuse(art);
! (void) printf("duplicate - %s\n", hdrs->h_path);
} else
return; /* art was accepted */
decline(art);
--------- 8< --------- 8< --------- 8< --------- 8< --------- 8< ---------