
Kaspersky wins yet another comparison


Jari Lehtonen

Feb 27, 2004, 7:02:39 AM
Tested by the AV-Comparatives organization, Kaspersky Antivirus gets
the best on-demand results with 99.85% of malware detected; McAfee is
second with 95.41%.

The comparison looks quite professionally made.
http://www.av-comparatives.org/seiten/ergebnisse_2004_02.php

Jari

nu...@zilch.com

Feb 27, 2004, 7:58:27 AM
On Fri, 27 Feb 2004 14:02:39 +0200, Jari Lehtonen
<spamR...@hukassa.com> wrote:

>Tested by the AV-Comparatives organization, Kaspersky Antivirus gets
>the best on-demand results with 99.85% of malware detected; McAfee is
>second with 95.41%.

Looks like McAfee is second with 99.24%, not 95.41%.

Finally! I've been waiting for a "crud" detection category, and this
test has it (they call it "unwanted files"). I see McAfee is the super
crud detector here, "winning" by 75% to 68% over KAV :)

>The comparison looks quite professionally made.
>http://www.av-comparatives.org/seiten/ergebnisse_2004_02.php


Art
http://www.epix.net/~artnpeg

Roy Coorne

Feb 27, 2004, 8:00:51 AM
Jari Lehtonen wrote:

Many comparisons look quite professional - but many only include
detection rates.

For me, as a user who builds his rigs at home, several qualities of an
AV scanner are important:
- detection rate
- frequency of updating
- scanning of several POP3 accounts
- handling comfort (e.g. it must be easy to activate/deactivate the
background on-access scanner)
- easy registration (no activation)

And remember: Safe Hex is at least as important as AV scanning;-)

<my 2 cents> Roy <using NAV and Avast>

--
This posting is provided "As Is" with no warranties and confers no rights.

Anonymous Sender

Feb 27, 2004, 9:24:50 AM
"Jari Lehtonen" <spamR...@hukassa.com> escribio en mensaje
news:74cu30129g2o43m94...@4ax.com...

> Tested by the AV-Comparatives organization, Kaspersky Antivirus gets
> the best on-demand results with 99.85% of malware detected; McAfee is
> second with 95.41%.
>
> The comparison looks quite professionally made.
> http://www.av-comparatives.org/seiten/ergebnisse_2004_02.php

"AV-Comparatives Organization" is one guy Andreas Clementi.

Every VX collector wants to be an anti-virus tester these days. :-)

He has made a much better attempt than the AV-Test.org and VirusP
crap, but his test is still not of "professional" quality. Kaspersky will not
detect 1531 more genuine Windows viruses than Trend Micro, for
example. Such a ridiculous figure is not possible. Trend Micro's huge
failing would be blasted by 5 million unhappy users, if it were true.

I wish him Good Luck for future testing. He seems to be trying very
hard to succeed.

nu...@zilch.com

Feb 27, 2004, 12:23:16 PM

Well, how do you explain this? I noticed in the Uni Hamburg VTC test
of April 2001, Executive Summary, Note 3, that Trend will no
longer be tested there because the results would not be favorable to them.
They claimed that their scanner is skewed toward on-access and ITW
malware. Since the subject test is strictly on-demand and not just
ITW, the test might just be showing up what Trend itself claims.


Art
http://www.epix.net/~artnpeg

kurt wismer

Feb 27, 2004, 12:22:12 PM
Jari Lehtonen wrote:

yes, so professional they neglected to provide any information on the
testing methodology used...

you can't really judge the quality of a comparative by how pretty the
table looks or how many significant digits they represent their
percentages in...

also, i'm quite suspicious of some of the numbers used... 217,000 dos
viruses? there aren't that many viruses...

--
"we're the first ones to starve, we're the first ones to die
the first ones in line for that pie in the sky
and we're always the last when the cream is shared out
for the worker is working when the fat cat's about"

nu...@zilch.com

Feb 27, 2004, 3:03:18 PM
On Fri, 27 Feb 2004 12:22:12 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>Jari Lehtonen wrote:
>
>> Tested by the AV-Comparatives organization, Kaspersky Antivirus gets
>> the best on-demand results with 99.85% of malware detected; McAfee is
>> second with 95.41%.
>>
>> The comparison looks quite professionally made.
>> http://www.av-comparatives.org/seiten/ergebnisse_2004_02.php
>
>yes, so professional they neglected to provide any information on the
>testing methodology used...
>
>you can't really judge the quality of a comparative by how pretty the
>table looks or how many significant digits they represent their
>percentages in...
>
>also, i'm quite suspicious of some of the numbers used... 217,000 dos
>viruses? there aren't that many viruses...

Depends on your criteria. Does five bytes of code difference
constitute a different virus? I'm not surprised at numbers exceeding
250,000 for all viruses (just extrapolating from past data and the
rapid increases seen) but it does seem strange that there would be
that many in the DOS category.


Art
http://www.epix.net/~artnpeg

Willie Nickels

Feb 27, 2004, 3:14:54 PM
Anonymous Sender <anon...@remailer.metacolo.com> wrote in message news:<b6346cabf9051891...@remailer.metacolo.com>...

I was surprised by Trend's showing, too. It's inconsistent with
previous tests I've seen and with the latest VB test results against
ITW viruses. I will send the URL to Trend's customer support and
request a comment.

kurt wismer

Feb 27, 2004, 3:45:37 PM
nu...@zilch.com wrote:
[snip]

>>also, i'm quite suspicious of some of the numbers used... 217,000 dos
>>viruses? there aren't that many viruses...
>
>
> Depends on your criteria.

the exact number may, but 217,000 is still outside of the ballpark...

> Does five bytes of code difference
> constitute a different virus? I'm not surprised at numbers exceeding
> 250,000 for all viruses

i am... that's nearly 3 times as many as i would expect to hear about...

> (just extrapolating from past data and the
> rapid increases seen)

rapid increases in the rate of virus writing?...

last year (or was it the year before) there was supposedly somewhere
around 80,000 total... 250,000 means that there were about 170,000
written in one year... that math works out to about one new virus every
3 minutes...

on top of that, however, 250,000 viruses would take an incredibly long
time to verify... at a modest 15 minutes per sample it would take more
than 7 years... to get it done in one year you'd have to cut that time
down to 2 minutes per sample on average...

all of this seems very unlikely to me...
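
For reference, the arithmetic above can be checked with a few lines of
Python (the 80,000 and 250,000 totals and the 15-minutes-per-sample
estimate are the figures from the posts, taken at face value rather than
measured data):

MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600 minutes in a year

new_viruses = 250000 - 80000              # ~170,000 supposedly written in one year
print(MINUTES_PER_YEAR / new_viruses)     # ~3.1 minutes per new virus

samples = 250000
print(samples * 15 / MINUTES_PER_YEAR)    # ~7.1 years to verify them all at 15 min each
print(MINUTES_PER_YEAR / samples)         # ~2.1 minutes per sample to finish in a year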

nu...@zilch.com

Feb 27, 2004, 3:25:50 PM
On 27 Feb 2004 12:14:54 -0800, BlindNic...@hotmail.com (Willie
Nickels) wrote:

If you look at the home page of the test, you'll see that only
scanners that scored 100% ITW are included.


Art
http://www.epix.net/~artnpeg

nu...@zilch.com

Feb 27, 2004, 4:34:36 PM
On Fri, 27 Feb 2004 15:45:37 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>nu...@zilch.com wrote:

>last year (or was it the year before) there was supposedly somewhere
>around 80,000 total... 250,000 means that there were about 170,000
>written in one year... that math works out to about one new virus every
>3 minutes...

I used different data ... from here:

http://www.cknow.com/vtutor/vtnumber.htm

And I extrapolated the exponential increase using 50% per year from
the year 2000. So I got:

2000 50,000
2001 75,000
2002 112,500
2003 168,750
2004 253,125

An exponential increase is not only consistent with past history but
also the number of people (both users and vxers) involved. And who
knows? Maybe the number of vxers is growing at a larger exponential
rate than that of PCs and users.
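
The table above is just compound growth at the assumed 50% a year; a
couple of lines of Python reproduce it exactly:

count = 50000.0                   # starting total for the year 2000
for year in range(2000, 2005):
    print(year, round(count))     # 2000 50000 ... 2004 253125
    count *= 1.5                  # the rough 50%-per-year assumption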

>on top of that, however, 250,000 viruses would take an incredibly long
>time to verify... at a modest 15 minutes per sample it would take more
>than 7 years... get it done in one year you'd have to cut that time
>down to 2 minutes per sample on average...

He's a busy guy, all right :) Faster than the speed of light!

>all of this seems very unlikely to me...

I'm just talking about the total number of viruses ... and I wouldn't
be surprised at 250,000 this year. But I dunno.


Art
http://www.epix.net/~artnpeg

IBK

Feb 27, 2004, 4:39:23 PM
> "AV-Comparatives Organization" is one guy Andreas Clementi.
>
> Every VX collector wants to be an anti-virus tester these days. :-)
>
> He has made a much better attempt than the AV-Test.org and VirusP
> crap, but his test is still not of "professional" quality. Kaspersky will not
> detect 1531 more genuine Windows viruses than Trend Micro, for
> example. Such a ridiculous figure is not possible. Trend Micro's huge
> failing would be blasted by 5 million unhappy users, if it were true.
>
> I wish him Good Luck for future testing. He seems to be trying very
> hard to succeed.

One guy who had academic support in the past and who now wanted the
public to be able to read the results too (in order to make this
possible NOW [and not in some years] and to be really independent, I had
to do it without the support of the university). I am in no way related
to illegal activities like VX; such a statement is really offensive. We
(not just one guy) have been in contact with AV companies for years and
they agreed to be tested by us.

The testing does still need improvements, that's right; I will be at the
EICAR conference in May, as I have to talk with some AV people personally
about how to improve the tests further. Trend Micro received the missed
samples; if any are wrongly listed as not detected, the results will be
corrected. Anyway, please note that the testbeds contain uncommonly
appearing zoo samples that are usually only found in big collections
such as AV companies and other organizations can have. As I said, Trend
Micro received the missed samples, has in the last days already added
about 160 of the missed Windows samples, and is still adding the misses.

The results just show you the state as of 6 February. If I tested the
scanners in two months against the databases of 4 February, probably
all would have scores of at least over 95%. As detection rates change
really fast, you can be sure that you are protected well enough against
the viruses/malware that can really reach you (ITW). Anyway, I want to
remind everyone that the tested products are all really good scanners
with very high detection rates; even if our databases are quite large,
they are still just a "little" part of the virus/malware samples that
exist.

If someone does not like the test or does not appreciate our hard work
(which we do for free and which costs us much time & money), he is free
not to look at it or pay attention to it ;-)

Thanks for your wishes! Please understand that due to lack of time, the
data provided to the public is not much; most of the data we provide to
the AV companies if possible, so that they can improve their scanners
and give even more protection to users.

P.S.: I am almost never on newsgroups, so I cannot always answer.
P.P.S.: Sorry for my non-perfect English.

IBK

Feb 27, 2004, 5:01:27 PM
Anonymous Sender <anon...@remailer.metacolo.com> wrote in message news:<b6346cabf9051891...@remailer.metacolo.com>...
> "Jari Lehtonen" <spamR...@hukassa.com> escribio en mensaje
> news:74cu30129g2o43m94...@4ax.com...
>
> He has made a much better attempt than the AV-Test.org and VirusP
> crap, but his test is still not of "professional" quality.

I forgot one note: AV-Test.org makes really good and professional
tests; AV-Test.org has testbeds of excellent quality, as every sample
is replicated (I suppose that VX collectors like VirusP do not do
that). Anyway, in every big testbed there are some bad samples; our
testbeds still need much improvement, and during the next months we will
recheck the testbeds again, so that next time (in August) the
testbeds will be of better quality than now. Improvements are always
necessary. As we know that our tests are still far from
perfection, we do everything for free; and if something has to be paid, I
have to pay it from my own money (as I am still a student with limited
time/money, you can understand that I would like to see my
efforts appreciated and not pushed down by users on some
forums). Well, I believe in the good part of what I do, and I hope
some others will appreciate this too.

Jari Lehtonen

Feb 27, 2004, 5:14:20 PM
On 27 Feb 2004 13:39:23 -0800, vsa...@web.de (IBK) wrote:

>Thanks for your wishes! Please understand that due to lack of time, the
>data provided to the public is not much; most of the data we provide to
>the AV companies if possible, so that they can improve their scanners
>and give even more protection to users.
>

Thank you very much for replying here and explaining a little more. As
I said before, the test seems very professionally made and the
results are in line with other (good) tests too.

Can you please let us know here when you publish new tests on
your site? We are certainly interested in following the development of
AV products.

Jari Lehtonen

Blevins

Feb 27, 2004, 6:23:18 PM

"Jari Lehtonen" <spamR...@hukassa.com> wrote in message
news:74cu30129g2o43m94...@4ax.com...


I'm sure that for most users, both McAfee and KAV are equally adequate
regardless of the slight variations of results on any test. Both put a huge
amount of work into their products and both use quality scanning engines.


kurt wismer

Feb 27, 2004, 6:36:57 PM
IBK wrote:
[snip]

> that). Anyway, in every big testbed there are some bad samples;

not necessarily...

> our
> testbeds still need much improvement, and during the next months we will
> recheck the testbeds again, so that next time (in August) the
> testbeds will be of better quality than now.

well, that should prove an interesting challenge... rechecking nearly
300,000 samples in 5 months gives you about 45 seconds per sample...
better get cracking...
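
(the "about 45 seconds" figure is easy to verify, assuming five 30-day
months and the ~300,000-sample count mentioned in the thread:

seconds = 5 * 30 * 24 * 60 * 60   # ~12.96 million seconds in five months
print(seconds / 300000)           # ~43 seconds per sample
)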

> Improvements are always
> necessary. As we know that our tests are still far from
> perfection,

it seems they're not just far from perfect, they're far from the best
you expect to be able to do as well... which makes me wonder why you did
it... why perform the test when you know you aren't going to do the
best possible job you can at it?

> we do everything for free;

which is no excuse for conducting a test before being prepared to do it
properly...

i'm not trying to be insulting, but you've just told us that the
integrity of your test bed is not certain - that means your results are
suspect and shouldn't be trusted... i understand testing takes time and
effort and even money, but i also know that if you cut corners you wind
up with compromised data... it's bad science...

George Orwell

Feb 27, 2004, 8:02:03 PM
<nu...@zilch.com> wrote in message news:nquu30tcfb75rdo3i...@4ax.com...

> Well, how do you explain this? I noticed in the Uni Hamburg VTC test
> of April 2001, Executive Summary, Note 3, that Trend will no
> longer be tested there because the results would not be favorable to them.
> They claimed that their scanner is skewed toward on-access and ITW
> malware. Since the subject test is strictly on-demand and not just
> ITW, the test might just be showing up what Trend itself claims.

It is a very sad state of affairs, when anti-virus companies must add detection
of crap simply to "pass" some test. :-(

Anonymous via the Cypherpunks Tonga Remailer

Feb 27, 2004, 8:53:33 PM
"IBK" <vsa...@web.de> escribio en mensaje news:3f9b3fb.04022...@posting.google.com...

>
> I forgot one note: AV-Test.org makes really good and professional
> tests

A test is only as good as the testbed and the methodology and the competence
of the tester. AV-Test.org lacks in all three areas.

> AV-Test.org has testbeds of excellent quality, as every sample is replicated

I am sorry to say that is not true; Andreas Marx himself has personally admitted
as much.

> (I suppose that VX collectors like VirusP do not do that).

But have YOU replicated every sample in YOUR collection? :-)

> Anyway, in every big testbed there are some bad samples;

This should not be so. It can lead only to flawed results. The Virus Bulletin testbed has
NO bad samples. This I know as a fact, from many years of experience.

> our testbeds still need much improvement, and during the next months we will
> recheck the testbeds again,

That is good. I know your credentials, but not you personally. I am sure you strive
for betterment.

> (as I am still a student with limited time/money,

I know, it is hard for you. You cannot look to anti-virus companies for financial
support without tainting your independent status. This is the Devil of academia:
"Do I take a grant from a pharmaceutical company and accept future suspicion of
bias in my research, or do I keep my integrity and live in poverty for another year?"
:-)

> you can understand that I would like to see my efforts appreciated and
> not pushed down by users on some forums).

I appreciate your efforts, and I am not in any way trying to push you down. Sorry for
the misunderstanding. My bad. :-(

You are greatly ahead of all the other present-day "independent" anti-virus testers
already, with just one published test, and I feel sure you will improve, and stay ahead.

Good Luck!

George Orwell

Feb 27, 2004, 9:50:12 PM
"IBK" <vsa...@web.de> escribio en mensaje news:3f9b3fb.0402...@posting.google.com...

> I am in no way related to illegal activities like VX; such a statement is
> really offensive.

Sorry Andreas, my bad use of English.

I did not mean to be offensive, or to suggest YOU are VX like VirusP. (Although, I
have heard, he is trying hard to become a White Hat.) :-)

With "VX", I referred to your collection, not you.



> Anyway, please note that the testbeds contain uncommonly appearing
> zoo samples that are usually only found in big collections such as AV
> companies

Do you have access to AV company collections?

> and other organizations can have.

"other organizations" is the weak link. :-)

> even if our databases are quite large, they are still just a "little" part of
> virus/malware samples that exist.

I may be wrong, but I doubt that your collection contains any genuine
virus that Virus Bulletin does not already own. But perhaps many samples in
your collection are not genuine viruses?

I know your credentials, you are a White Hat already, and I know you are trying
your very best. But like all university testers, your testbed is obtained from
sources outside the legitimate anti-virus circle, like VX collections. Until every
sample is replicated and fully verified, your tests cannot possibly be accurate.

Looking back in a few months, you might think it would have been better to
have delayed your public debut as an anti-virus tester until you had cleaned up
your testbed. :-)

I bear you no malice. Quite the opposite. I simply make the point that, at the
present time, your test is good but not fully of "professional" quality. But,
I am sure, you will improve. You have the desire to succeed, and you do not
antagonize professional anti-virus people by trying to lie your way out of your
mistakes.

IBK

Feb 28, 2004, 2:41:39 AM
I am not going to recheck 300,000 samples in 5 months :-P
This time I will concentrate more on the missed samples, as they are the
ones most likely to need removing from the collections. Anyway,
the results would really not change that much, but of course I want
the best possible quality for the testbeds too.

IBK

Feb 28, 2004, 4:30:59 AM
Yes, I have access to AV collections (but not all). And even those
sometimes contain some samples that need to be removed. I have also
received collections in the past from forum members or from sites that
provide virus analysis; I must say that such collections are
really of bad quality and it takes quite a lot of time to filter them out.
I guess that people like underground virus collectors simply
collect everything without checking the files further, and parts of
those collections are then spread/sent around to various AV companies or
other persons who would like to have some samples.
VB has (as far as I know) access to many more collections than me; the
difference is probably that they wait a bit longer before including
them in the test set, so that all AV companies have months of
time to add detection for them (I just suppose ;-).

Clive

Feb 28, 2004, 5:57:01 AM

Simple answer that many probably agree with......

Let's say you have an AV program that gains 100% success in every test - BUT
slows your system down a lot! (Symantec, McAfee, KAV...)

AND you have an AV that scores 90% success, but with limited effect on
system resources......

I know which I would go for... :-)

Clive


Jari Lehtonen

Feb 28, 2004, 7:28:54 AM
On Fri, 27 Feb 2004 14:02:39 +0200, Jari Lehtonen
<spamR...@hukassa.com> wrote:

>The comparison looks quite professionally made.
>http://www.av-comparatives.org/seiten/ergebnisse_2004_02.php
>

I have been reading people's reactions to this test, and they seem to
be quite critical and to have the attitude that "they know better
how to test AV products".

I would really like those experts to tell me where I can find
an unbiased, professionally made, scientific and reliable test of
antivirus programs. So far I haven't found one. I personally think Virus
Bulletin's 100% test is crap. NOD32 always gets 100% and so does Norton. And
from my own experience and what people have written here, Norton is far
from a perfect scanner.

Jari

nu...@zilch.com

Feb 28, 2004, 7:51:46 AM
On Sat, 28 Feb 2004 10:57:01 GMT, "Clive" <som...@NOTHANKS.com>
wrote:

But there's nothing to stop you from using a top notch scanner
on-demand as a backup. In fact, if you have a clue and practice safe
hex, on-demand scanning is all you need. You can (almost) do without a
resident scanner completely in that case, since KAV, RAV and Dr Web
(at least) have single file upload av scan sites. I say "almost" since
there are restrictions on file size you can upload for scanning. And
it's _always_ a good idea to get more than one "opinion" on suspect
files.

The realtime scanner market is aimed at the clueless and those
concerned with PCs not directly/completely under their control. In any
case realtime scanners are aimed at the clueless end-user problem.
They've become bloated monstrosities. And no single realtime av scanner
offers the kind of real protection safe hex affords.

I know which _ones_ I go for :)


Art
http://www.epix.net/~artnpeg

Tweakie

Feb 28, 2004, 8:07:40 AM

On Sat, 28 Feb 2004, Clive wrote:

> Simple answer that many probably agree with......
>
> Let's say you have an AV program that gains 100% success in every test - BUT
> slows your system down a lot! (Symantec, McAfee, KAV...)
>
> AND you have an AV that scores 90% success, but with limited effect on
> system resources......
>
> I know which I would go for... :-)
>

The big problem is: can you show us an independent and rigorous test
showing the impact of different AVs on system performance?

I know two recent tests that give some hints for a few AV products:

http://www.pcmag.com/image_popup/0,3003,s=400&iid=53574,00.asp
(impact of AVs on Winstone benchmark - October 03)

http://www.rokop-security.de/main/article.php?sid=693&mode=thread&order=0
(number of processes, RAM used and scan speed - January & February 04)

On-access scan speed, which has an impact on system performance, is also
mentioned here:

http://www.f-secure.de/tests/ctvergleichstest0304.pdf

The fastest is eTrust using the VET engine. NOD, Sophos and McAfee
are less than 25% slower than it. The slowest are KAV and the AVs using
several scanning engines: F-Secure and AVK.

--
Tweakie

IBK

Feb 28, 2004, 8:25:28 AM
As promised, I fixed the results and the documents again. The fixed results
are now online; anyway, as I predicted, the results do not change that
much even when the remaining garbage is removed. I will not
change the results again, but I will notify everyone if some more
issues are found. I would rather admit errors than provide bad data
(even if this puts me in a bad light).

@AV companies: please update the documents from my webftp and get the
data regarding the bad samples.

nu...@zilch.com

Feb 28, 2004, 11:13:00 AM
On 28 Feb 2004 05:25:28 -0800, NO-...@av-comparatives.org (IBK)
wrote:

>As promised, I fixed the results and the documents again. The fixed results
>are now online; anyway, as I predicted, the results do not change that
>much even when the remaining garbage is removed. I will not
>change the results again, but I will notify everyone if some more
>issues are found. I would rather admit errors than provide bad data
>(even if this puts me in a bad light).

Why did you eliminate the "crud" detection results? Was there
pressure from one or more av vendors to do this?

Also, how do you account for the huge number of DOS viruses you used?
As has been pointed out, the number seems to exceed estimates of the
total number of viruses of all kinds. And were they all tested for
viability?


Art
http://www.epix.net/~artnpeg

Nick FitzGerald

Feb 28, 2004, 12:18:14 PM
<nu...@zilch.com> to Kurt:

> >last year (or was it the year before) there was supposedly somewhere
> >around 80,000 total... 250,000 means that there were about 170,000
> >written in one year... that math works out to about one new virus every
> >3 minutes...
>
> I used different data ... from here:
>
> http://www.cknow.com/vtutor/vtnumber.htm

And what precisely is Tom's source for his numbers?

Don't get me wrong, it's a good page, but certainly not a reliable source
for the kinds of numbers (and other information) you need to make the
guesstimates you go on to make below...

> And I extrapolated the exponential increase using 50% per year from
> the year 2000. So I got:

Hmmmmm -- "exponential" maybe, but 50% per annum is way too high.

> 2000 50,000
> 2001 75,000
> 2002 112,500
> 2003 168,750
> 2004 253,125
>
> An exponential increase is not only consistent with past history but
> also the number of people (both users and vxers) involved. And who
> knows? Maybe the number of vxers is growing at a larger exponential
> rate than that of PCs and users.

Wrong.

For one, Tom's numbers leading up to that 2000 number of 50,000 are not
well bounded. Also, you appear to have failed to allow for the "some
schmuck wasted great gobs of his time generating approximately 15,000
trivial DOS viruses with a kit" factor a few years back. Some scanner
developers did not add 15,000 to their detection counts as they already
detected all (or almost all) of these "new" viruses because they had
good generic detection of that kit's output, but eventually all (?)
developers have added that 15,000 to their count to keep it (roughly)
in line with all the other developers. This happened over the course
of two or three calendar years and as Tom's source is unclear, it is
equally unclear whether his 2000 figure is exaggerated by this factor or
not. Of course, extrapolating from data that does or does not contain
such a massively distorting one-off "oddball" event is bound to be
fraught with problems (even if you otherwise get the right curve and
growth factor...).

<<snip>>


> I'm just talking about the total number of viruses ... and I wouldn't
> be surprised at 250,000 this year. But I dunno.

I would be -- that number seems to me, as it clearly does to Kurt too,
to be out by a factor of approximately 2.5 to 3.

It seems that when Andreas says "X viruses" he means "X samples of some
unknown number of viruses". There is a _very_ important distinction
here that the editors of Virus Bulletin previous to me took great pains
to point out and ensure that the VB tests did not fudge...

Of course, properly classifying which proven viral files are samples of
the same virus, and which are samples of different viruses, is a major research
undertaking and for very many viruses it will be a much more time-
consuming effort than actually proving that a sample is viral in some
real-world environment.


--
Nick FitzGerald


kurt wismer

Feb 28, 2004, 12:05:11 PM
Jari Lehtonen wrote:
> On Fri, 27 Feb 2004 14:02:39 +0200, Jari Lehtonen
> <spamR...@hukassa.com> wrote:
>
>
>>The comparison looks quite professionally made.
>>http://www.av-comparatives.org/seiten/ergebnisse_2004_02.php
>>
>
> I have been reading people's reactions to this test, and they seem to
> be quite critical and to have the attitude that "they know better
> how to test AV products".
>
> I would really like those experts to tell me where I can find
> an unbiased, professionally made, scientific and reliable test of
> antivirus programs.

personally, i think the virus test centre at uni-hamburg
(http://agn-www.informatik.uni-hamburg.de/vtc/naveng.htm) does a really
good job of documenting their methodology (absolutely necessary for the
review of a scientific test)...

> So far I haven't found one. I personally think Virus
> Bulletin's 100% test is crap.

it's not a test, it's a byproduct of their *real* comparative... an
example can be found in
http://www.virusbtn.com/magazine/archives/pdf/2003/200308.pdf ... i'm
not quite as thrilled with the availability of documentation for the
methodology of these tests though - there may be more documentation
than what i've found, but it should be easier to find...

> NOD32 always gets 100% and so does Norton. And
> from my own experience and what people have written here, Norton is far
> from a perfect scanner.

there is no test that can show how well an anti-virus will deal with
new viruses in the future, nor is there any test that can show how
usable or efficient an anti-virus is, nor how well it deals with
non-virus issues... comparative reviews are of only limited utility
when judging an anti-virus....

kurt wismer

Feb 28, 2004, 12:19:01 PM
IBK wrote:

oh well, so much for any hope of scientifically valid results... you
are obviously not willing to sacrifice your *big* testbed for the sake
of scientific rigor by only using the samples that you've verified...
it's a shame you value size over quality...

nu...@zilch.com

Feb 28, 2004, 12:48:09 PM
On Sun, 29 Feb 2004 06:18:14 +1300, "Nick FitzGerald"
<ni...@virus-l.demon.co.uk> wrote:

><nu...@zilch.com> to Kurt:
>
>> >last year (or was it the year before) there was supposedly somewhere
>> >around 80,000 total... 250,000 means that there were about 170,000
>> >written in one year... that math works out to about one new virus every
>> >3 minutes...
>>
>> I used different data ... from here:
>>
>> http://www.cknow.com/vtutor/vtnumber.htm
>
>And what precisely is Tom's source for his numbers?

I have no idea, but the exponential growth info and the number 50,000
by the year 2000 didn't seem out of whack to me.

>Don't get me wrong, it's a good page, but certainly not a reliable source
>for the kinds of numbers (and other information) you need to make the
>guesstimates you go on to make below...
>
>> And I extrapolated the exponential increase using 50% per year from
>> the year 2000. So I got:
>
>Hmmmmm -- "exponential" maybe, but 50% per annum is way too high.
>
>> 2000 50,000
>> 2001 75,000
>> 2002 112,500
>> 2003 168,750
>> 2004 253,125

Not based on the historical info he gave where in some years there was
100% or more increase. I picked 50% as a kind of rough mean of the
historical info.

>> An exponential increase is not only consistent with past history but
>> also the number of people (both users and vxers) involved. And who
>> knows? Maybe the number of vxers is growing at a larger exponential
>> rate than that of PCs and users.
>
>Wrong.
>
>For one, Tom's numbers leading up to that 2000 number of 50,000 are not
>well bounded. Also, you appear to have failed to allow for the "some
>schmuck wasted great gobs of his time generating approximately 15,000
>trivial DOS viruses with a kit" factor a few years back.

Wrong! :) That's precisely what I did and do think, and what I brought up with
Kurt (not in so many words) when I asked if a five-byte difference
counts as a different virus.

>Some scanner
>developers did not add 15,000 to their detection counts as they already
>detected all (or almost all) of these "new" viruses because they had
>good generic detection of that kit's output, but eventually all (?)
>developers have added that 15,000 to their count to keep it (roughly)
>in line with all the other developers. This happened over the course
>of two or three calendar years and as Tom's source is unclear, it is
>equally unclear whether his 2000 figure is exaggerated by this factor or
>not. Of course, extrapolating from data that does or does not contain
>such a massively distorting one-off "oddball" event is bound to be
>fraught with problems (even if you otherwise get the right curve and
>growth factor...).
>
><<snip>>
>> I'm just talking about the total number of viruses ... and I wouldn't
>> be surprised at 250,000 this year. But I dunno.
>
>I would be -- that number seems to me, as it clearly does to Kurt too,
>to be out by a factor of approximately 2.5 to 3.
>
>It seems that when Andreas says "X viruses" he means "X samples of some
>unknown number of viruses". There is a _very_ important distinction
>here that the editors of Virus Bulletin previous to me took great pains
>to point out and ensure that the VB tests did not fudge...

I know that, Nick. It obviously all depends on exactly what you're
counting.

>Of course, properly classifying which proven viral files are samples of
>the same, and which samples of different viruses, is a major research
>undertaking and for very many viruses it will be a much more time-
>consuming effort than actually proving that a sample is viral in some
>real-world environment.

Hey, I realized much of this years ago. And if it was far from a one
man effort years ago, it's far worse now.


Art
http://www.epix.net/~artnpeg

Nick FitzGerald

Feb 28, 2004, 12:50:33 PM
"IBK" <vsa...@web.de> wrote:

> Yes, I have access to AV collections (but not all). And even those
> sometimes contain some samples that need to be removed. ...

Where "some" could easily be as high as 10%...

Some of these collections contain not only "known good" but also "known
bad" samples. The latter may seem odd at first glance, but it is usually
because it was easier to add detection of them than to explain to some
dumb-arse luser that detecting a non-virus in some utterly screwed and
corrupted file they are complaining about is not only not necessary but a
waste of your and their time. Such samples are normally well labelled
(being in separate folders or sample trees and the like).

Some of these collections, though, have a nasty habit of including all
files created by the associated malware, even if they have no malicious
use or purpose per se.

> ... I have also
> received collections in the past from forum members or from sites that
> provide virus analysis; I must say that such collections are
> really of bad quality and it takes quite a lot of time to filter them out.
> I guess that people like underground virus collectors simply
> collect everything without checking the files further, ...

Of course.

"Underground VX" is a huge dick-waving club. Have you ever heard the
expression "never mind the quality -- feel the width"? That applies in
spades to the underground VX scene.

> ... and parts of
> those collections are then spread/sent around to various AV companies or
> other persons who would like to have some samples.

Yep, and even the respectable AV'ers are sometimes "forced" to add
detection of such crap because some of their competitors have, and thus
incompetent testers say "ScamScan detects it and your scanner doesn't,
therefore your product is worse". If such files keep popping up in the
test-sets used in "influential" tests (say a regular test in a large,
high-volume though perhaps low-quality popular monthly computer
magazine), even the principled developers like Frisk have to bend and
add detection of such crap (much like some of the macro garbage you
sent FSI a while back...).

> VB has (as far as I know) access to many more collections than me; the
> difference is probably that they wait a bit longer before including
> them in the test set, so that all AV companies have months of
> time to add detection for them (I just suppose ;-).

Your suggestion that VB may add samples taken straight from vendor-
supplied collections to its test-sets is quite offensive. Although I
cannot speak with certainty for the current situation, I seriously hope
things have not changed for the worse as you imply. When I was there
_no_ samples went into the test-set that either I or other VB technical
staff had not verified were viral. Further, the standard approach for
determining virality was:

1. Take original sample from whatever source
2. Replicate it
3. Take replicants from 2 and test for replicability for two more
generations
4. Throw out any samples from 2 that did not pass test 3
5. Randomly select one or more (this depended on all manner of issues
that don't matter here) samples from 2 for inclusion in the final
test-set
6. Randomly select one or two (or a suitable number of "representative
samples") from the remaining samples from 2 for inclusion in the
"missed samples" set. This was the set that developers were sent
samples from if they missed a sample of a virus and provided a good
mechanism for providing a "confirming sample" without ever having
to disclose a sample from the test-set itself -- arguably you
should remove any sample from the test-set ever sent to a developer
to remove the possibility of them cheating on a future test by
simply adding detection of your samples they missed (and yes,
there are known and documented cases of this happening!).

Boot infectors were not necessarily treated quite this way though...
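
For readers who think in code, here is a minimal Python sketch of that
workflow; replicate_fn and replicates_for_fn are hypothetical stand-ins
for the actual lab tooling, which the post does not describe:

import random

def build_test_sets(originals, replicate_fn, replicates_for_fn, generations=2):
    # Sketch of steps 1-6 above; not VB's actual code.
    test_set, missed_set = [], []
    for sample in originals:                              # step 1: original samples
        replicants = replicate_fn(sample)                 # step 2: replicate
        viable = [r for r in replicants                   # steps 3-4: keep only replicants
                  if replicates_for_fn(r, generations)]   # that replicate for two more
        if not viable:                                    # generations
            continue                                      # never proved viral, excluded
        chosen = random.choice(viable)                    # step 5: one goes to the test-set
        test_set.append(chosen)
        spares = [r for r in viable if r is not chosen]
        if spares:                                        # step 6: a different replicant is
            missed_set.append(random.choice(spares))      # reserved as a "missed sample"
    return test_set, missed_set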


--
Nick FitzGerald


Jari Lehtonen

Feb 28, 2004, 12:53:10 PM
On Sat, 28 Feb 2004 12:05:11 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>


>personally, i think the virus test centre at uni-hamburg
>(http://agn-www.informatik.uni-hamburg.de/vtc/naveng.htm) does a really
>good job of documenting their methodology (absolutely necessary for the
>review of a scientific test)...
>

But it does not help much if your newest test will soon be a year old. The
virus (and antivirus) world is quite different now than it was in April
2003.

So do we come to the conclusion that AV products are quite impossible to
test or compare?

Jari

kurt wismer

Feb 28, 2004, 1:23:25 PM
Jari Lehtonen wrote:

> On Sat, 28 Feb 2004 12:05:11 -0500, kurt wismer <ku...@sympatico.ca>
> wrote:
>
>
>>personally, i think the virus test centre at uni-hamburg
>>(http://agn-www.informatik.uni-hamburg.de/vtc/naveng.htm) does a really
>>good job of documenting their methodology (absolutely necessary for the
>>review of a scientific test)...
>>
>
> But it does not help much if your newest test will soon be a year old. The
> virus (and antivirus) world is quite different now than it was in April
> 2003.

i was under the impression you wanted examples of good tests so as to
learn what they looked like and therefore be better able to judge the
quality of arbitrary tests you came across... the uni-hamburg tests
satisfy that criterion in at least one important way...

you did not ask for a *current* test... if you had i would have been
much more suspicious of what you ultimately hoped to do with it... the
main benefit of a current test over an older one is that you would be
able to compare the data points - but you can't really judge the
quality of a test by looking at the data points, you have to look at
how the test was performed...

> So do we come to the conclusion that AV products are quite impossible to
> test or compare?

i fail to see how you've jumped to that conclusion...

kurt wismer

Feb 28, 2004, 1:32:57 PM
nu...@zilch.com wrote:
[snip]

> Also, how do you account for the huge number of DOS viruses you used?
> As has been pointed out, the number seems to exceed estimates of the
> total number of viruses of all kinds. And were they all tested for
> viability?

he seems to have made it quite clear that his test bed has not been
cleared of extraneous samples...

nu...@zilch.com

Feb 28, 2004, 2:09:11 PM
On Sat, 28 Feb 2004 13:23:25 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>> But it does not help much if your newest test will soon be a year old. The
>> virus (and antivirus) world is quite different now than it was in April
>> 2003.
>
>i was under the impression you wanted examples of good tests so as to
>learn what they looked like and therefore be better able to judge the
>quality of arbitrary tests you came across... the uni-hamburg tests
>satisfy that criterion in at least one important way...
>
>you did not ask for a *current* test... if you had i would have been
>much more suspicious of what you ultimately hoped to do with it... the
>main benefit of a current test over an older one is that you would be
>able to compare the data points - but you can't really judge the
>quality of a test by looking at the data points, you have to look at
>how the test was performed...

But you're not addressing Jari's question. Where does one look for
quality _and_ recent detection tests of a broad (zoo, Trojans, etc.)
nature? Or don't you consider "recent" (the last 3 months or so) to be
important?

It's my impression that some scanners are changing rather rapidly.
Some are now adding Trojan detection, for example, where in the past
they ignored the category. Some may be moving away from broad
detection and focusing on ITW. Some may be dropping detection of old
and outdated (DOS) malware.

Lacking quality and recent broad based tests, how can we determine the
current trends?

Actually, the so called "unscientific" tests do at least indicate such
trends. And in this way they do have value. Also, imagine what would
happen in the improbable scenario that product X turned honest and
quit alerting on "crud". That would certainly show up in the
"unscientific" tests :)


Art
http://www.epix.net/~artnpeg

kurt wismer

Feb 28, 2004, 3:31:21 PM
nu...@zilch.com wrote:

> On Sat, 28 Feb 2004 13:23:25 -0500, kurt wismer <ku...@sympatico.ca>
> wrote:
>
>
>>>But it does not help much if your newest test will soon be a year old. The
>>>virus (and antivirus) world is quite different now than it was in April
>>>2003.
>>
>>i was under the impression you wanted examples of good tests so as to
>>learn what they looked like and therefore be better able to judge the
>>quality of arbitrary tests you came across... the uni-hamburg tests
>>satisfy that criterion in at least one important way...
>>
>>you did not ask for a *current* test... if you had i would have been
>>much more suspicious of what you ultimately hoped to do with it... the
>>main benefit of a current test over an older one is that you would be
>>able to compare the data points - but you can't really judge the
>>quality of a test by looking at the data points, you have to look at
>>how the test was performed...
>
>
> But you're not addressing Jari's question. Where does one look for
> quality _and_ recent detection tests of a broad (zoo, Trojans,etc.)
> nature? Or don't you consider "recent" (the last 3 months or so) to be
> important?

i consider it important, but i accept that there can't always be recent
data available...

> It's my impression that some scanners are changing rather rapidly.
> Some are now adding Trojan detection, for example, where in the past
> they ignored the category. Some may be moving away from broad
> detection and focusing on ITW. Some may be dropping detection of old
> and outdated (DOS) malware.
>
> Lacking quality and recent broad based tests, how can we determine the
> current trends?

wait until another good quality test is performed...

> Actually, the so called "unscientific" tests do at least indicate such
> trends.

no they don't...

> And in this way they do have value. Also, imagine what would
> happen in the improbable scenario that product X turned honest and
> quit alerting on "crud". That would certainly show up in the
> "unscientific" tests :)

you clearly have the mistaken impression that 'unscientific' tests are
still well defined enough to derive some valid conclusions from them...

let's perform a thought experiment, shall we? let's consider a test that
has 3 samples... the first sample is virus A and the second 2 samples
are virus B... the test, being unscientific, counts all 3 as separate
viruses.... scanner X misses virus A, scanner Y misses virus B - the
results *should* be 50% for both but because of the improper
methodology it turns out to be 66% for scanner X and 33% for scanner Y...

this illustrates just one of the reasons why unscientific tests cannot
be trusted on *any* level... there is no way to be sure any
interpretation of the results is correct because of the lack of proper
methodology... if the conclusions one reaches happen to be accurate, it
is purely coincidental...
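
The numbers in that thought experiment are easy to reproduce in a few
lines of Python:

testbed = ["A", "B", "B"]          # three files, but only two distinct viruses

def naive_score(missed):
    # counts every file as a separate virus, as the unscientific test would
    return 100 * sum(1 for s in testbed if s not in missed) / len(testbed)

print(naive_score({"A"}))          # scanner X misses virus A -> reported 66.7%
print(naive_score({"B"}))          # scanner Y misses virus B -> reported 33.3%
# with the duplicate removed, each scanner detects 1 of 2 viruses, i.e. 50%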

kurt wismer

Feb 28, 2004, 4:25:41 PM
Albugo Candida wrote:

> On Sat, 28 Feb 2004 13:32:57 -0500, kurt wismer <ku...@sympatico.ca>
> wrote:
>
>>he seems to have made it quite clear that his test bed has not been
>>cleared of extraneous samples...
>

> Yet mine has. In my scientific tests, KAV outperformed every
> competitor.

if it's such a good test why don't you publish?

nu...@zilch.com

Feb 28, 2004, 4:45:19 PM
On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

That's your opinion. It's not mine.

>> Actually, the so called "unscientific" tests do at least indicate such
>> trends.
>
>no they don't...
>
>> And in this way they do have value. Also, imagine what would
>> happen in the improbable scenario that product X turned honest and
>> quit alerting on "crud". That would certainly show up in the
>> "unscientific" tests :)
>
>you clearly have the mistaken impression that 'unscientific' tests are
>still well defined enough to derive some valid conclusions from them...

Mistaken impression???

>lets perform a thought experiment, shall we? lets consider a test that
>has 3 samples... the first sample is virus A and the second 2 samples
>are virus B... the test, being unscientific, counts all 3 as separate
>viruses.... scanner X misses virus A, scanner Y misses virus B - the
>results *should* be 50% for both but because of the improper
>methodology it turns out to be 66% for scanner X and 33% for scanner Y...

You're off on a tangent that I wasn't even talking about.

>this illustrates just one of the reasons why unscientific tests cannot
>be trusted on *any* level... there is no way to be sure any
>interpretation of the results is correct because of the lack of proper
>methodology... if the conclusions one reaches happen to be accurate, it
>is purely coincidental...

Bull. If I have a large test bed which includes many Trojans and
product X failed to alert on 95% of them and a year later product X
alerts on 70% of them I'm justified in drawing the conclusion that
product X has been addressing their lack of Trojan detection, no?
It doesn't even matter for this purpose that 10% of my samples aren't
viable. I'm just looking for a major _change_ in the detection
characteristics of a product.

Similarly, if product Y alerted on 95% of the old DOS viruses in my
collection last year and today it alerts on 10% of them, I'm justified
in concluding that a major change in direction has been made by the
producers of product Y.

And so on. These are the kinds of things an unscientific test bed can
be useful for. There are others. I can use my unscientific test bed to
evaluate scanner unpacking capabilities (as a matter of convenience)
since vxer collections are full of unusual packers and the malware is
sometimes multiply packed and packed with more than one packer to
confuse scanners. It soon becomes clear to me which scanner(s) I want
to use for on-demand scanning (it was KAV for DOS, for anyone who cares :))


Art
http://www.epix.net/~artnpeg

kurt wismer

Feb 28, 2004, 7:11:53 PM
nu...@zilch.com wrote:
> On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca>
>>nu...@zilch.com wrote:
[snip]

>>>Lacking quality and recent broad based tests, how can we determine the
>>>current trends?
>>
>>wait until another good quality test is performed...
>
> That's your opinion. It's not mine.

it's not an opinion, it's an instruction... you asked how, i told you
how...

>>>Actually, the so called "unscientific" tests do at least indicate such
>>>trends.
>>
>>no they don't...
>>
>>
>>>And in this way they do have value. Also, imagine what would
>>>happen in the improbable scenario that product X turned honest and
>>>quit alerting on "crud". That would certainly show up in the
>>>"unscientific" tests :)
>>
>>you clearly have the mistaken impression that 'unscientific' tests are
>>still well defined enough to derive some valid conclusions from them...
>
> Mistaken impression???
>
>>lets perform a thought experiment, shall we? lets consider a test that
>>has 3 samples... the first sample is virus A and the second 2 samples
>>are virus B... the test, being unscientific, counts all 3 as separate
>>viruses.... scanner X misses virus A, scanner Y misses virus B - the
>>results *should* be 50% for both but because of the improper
>>methodology it turns out to be 66% for scanner X and 33% for scanner Y...
>
> You're off on a tangent that I wasn't even talking about.

correction, it may not be what you meant, but it most definitely is
what you were talking about... you said only "unscientific tests", you
didn't specify what kind of unscientific test... clearly my example is
an unscientific test...

>>this illustrates just one of the reasons why unscientific tests cannot
>>be trusted on *any* level... there is no way to be sure any
>>interpretation of the results is correct because of the lack of proper
>>methodology... if the conclusions one reaches happen to be accurate, it
>>is purely coincidental...
>
>
> Bull. If I have a large test bed which includes many Trojans and
> product X failed to alert on 95% of them and a year later product X
> alerts on 70% of them I'm justified in drawing the conclusion that
> product X has been addressing their lack of Trojan detection, no?

no... i've just presented an example which shows how extreme
differences in results can be caused by the detection or non-detection
of a single piece of malware that has multiple instances in the testbed...

> It doesn't even matter for this purpose that 10% of my samples aren't
> viable. I'm just looking for a major _change_ in the detection
> characteristics of a product.

viability of the samples is not the only concern when it comes to
testbed integrity... uniqueness is also a concern, a big concern, and
it's even harder to address than viability...

> Similarly, if product Y alerted on 95% of the old DOS viruses in my
> collection last year and today it alerts on 10% of them, I'm justified
> in concluding that a major change in direction has been made by the
> producers of product Y.

if you think so then you are clearly operating under unstated
assumptions about the nature of that collection...

as such i would point you to
http://www.infidels.org/news/atheism/logic.html#alterapars

we can't get very far in a discussion if we don't start out on the same
page... this latest real life test has underlined for me the importance
of being aware of the assumptions we make about test beds so that we
can judge whether or not they're appropriate... uniqueness is something
that is easily overlooked... if one hasn't taken the time to weed out
the non-viable samples you can bet the duplicates are there too...

Roy Coorne

Feb 28, 2004, 7:43:33 PM
Robert de Brus wrote:

...
>
> 99%? So basically what this means is that one can still get infected?
>
> Obviously it's rubbish!
>

Nobody & nothing is perfect.

Statistically, even an event with probability zero _may_ happen, and
at once.

AV scanners are necessarily not up-to-the-very-hour. (Look at their
updating frequency.)

Don't forget Safe Hex.

So what is rubbish?

Roy


--
This posting is provided "As Is" with no warranties and confers no rights.

nu...@zilch.com

Feb 28, 2004, 8:28:51 PM
On Sat, 28 Feb 2004 19:11:53 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>nu...@zilch.com wrote:


>> On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca>
>>>nu...@zilch.com wrote:
>[snip]
>>>>Lacking quality and recent broad based tests, how can we determine the
>>>>current trends?
>>>
>>>wait until another good quality test is performed...
>>
>> That's your opinion. It's not mine.
>
>it's not an opinion, it's an instruction... you asked how, i told you
>how...

Wrong. It's an opinion and a wrong one. There's no need to wait for
the next quality test to detect trends. And I've explained why. End of
subject.


Art
http://www.epix.net/~artnpeg

FromTheRafters

Feb 28, 2004, 9:03:32 PM

<nu...@zilch.com> wrote in message news:5e0240dsla0vf3gum...@4ax.com...

> On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca> wrote:

[snip]

> >lets perform a thought experiment, shall we? lets consider a test that
> >has 3 samples... the first sample is virus A and the second 2 samples
> >are virus B... the test, being unscientific, counts all 3 as separate
> >viruses.... scanner X misses virus A, scanner Y misses virus B - the
> >results *should* be 50% for both but because of the improper
> >methodology it turns out to be 66% for scanner X and 33% for scanner Y...
>
> You're off on a tangent that I wasn't even talking about.

I, too, see apples and oranges here. Kurt is talking about the overall
"test" methodology I think, and not just the tester's data set maintenance
methodology.

I think that you are right in assuming that a flawed data set can
still be good for a time-difference comparison between each
vendor's' versions. The *same* flawed data set and the *same*
flawed test methodology can indeed be called a "comparison"
test between versions.

A nearly ideal test would have a data set more representative of what
is likely to be seen in the field. Once you force restrictions on the test
set it starts to become skewed. It is not practical for a tester to have
a representative cross section of all existing programs so concessions
must be made. After putting all of the necessary restrictions on the data
set and the test method, you are making the AV strive to attain a less
than optimum goal. They will strive to be the "best at test" rather than
to be the best at their real world function.

The *good* tests are merely the least harmful ones.


FromTheRafters

Feb 28, 2004, 9:11:25 PM

"Robert de Brus" <de_brus@h o t m a i l.com> wrote in message news:2y90c.2502$KS1....@nasal.pacific.net.au...
> X-No-Archive: Yes
>
> In news:74cu30129g2o43m94...@4ax.com,
> Jari Lehtonen <spamR...@hukassa.com> typed

> || Tested by the AV-Comparatives organization, Kaspersky Antivirus gets
> || the best on-demand results with 99.85% of malware detected; McAfee is
> || second with 95.41%.
>
> 99%? So basically what this means is that one can still get infected?
>
> Obviously it's rubbish!

Yeah, there ought to be a law against anything less than 100% effective
telling you that you are protected. ;o)

--
Somewhat less than 100% certain that this is a virus free post.


nu...@zilch.com

Feb 29, 2004, 8:44:47 AM
On Sat, 28 Feb 2004 21:03:32 -0500, "FromTheRafters"
<!00...@nomad.fake> wrote:

>
><nu...@zilch.com> wrote in message news:5e0240dsla0vf3gum...@4ax.com...
>
>> On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca> wrote:
>
>[snip]
>
>> >lets perform a thought experiment, shall we? lets consider a test that
>> >has 3 samples... the first sample is virus A and the second 2 samples
>> >are virus B... the test, being unscientific, counts all 3 as separate
>> >viruses.... scanner X misses virus A, scanner Y misses virus B - the
>> >results *should* be 50% for both but because of the improper
>> >methodology it turns out to be 66% for scanner X and 33% for scanner Y...
>>
>> You're off on a tangent that I wasn't even talking about.
>
>I, too, see apples and oranges here. Kurt is talking about the overall
>"test" methodology I think, and not just the tester's data set maintenance
>methodology.
>
>I think that you are right in assuming that a flawed data set can
>still be good for a time-difference comparison between each
>vendor's' versions. The *same* flawed data set and the *same*
>flawed test methodology can indeed be called a "comparison"
>test between versions.

Yes, essentially that's the idea. Simply doing comparisons over time
of what scanner X reports on file Y involves no assumptions other than
file Y in my collection hasn't changed :) And also I see minimal
problems for this purpose in using what the scanners report to
categorize the files. If several good scanners identify file Z as the
POOP Trojan (or an alias name), then file Z goes into my Trojan test
bed ... providing there are no other files in that bed that the same
scanners identified as the POOP Trojan. You strive for zero duplicates
of course in order to have a real variety in each category of
interest.

Having created the categorized test beds, and having found that
scanner A only alerted on 10% of the Trojans in the past but it now
alerts on 70% of them, I see it as obvious that the scanner A vendor
has been doing some work in this area. And that's the only kind of
trend I'm talking about here ... not trends of which scanners score
the highest in various categories of detection.
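
A minimal Python sketch of the selection-and-comparison idea described
above; every name in it (scan_results, MIN_AGREEMENT, detection_rate) is
invented for illustration, and alias handling is left out:

# Build a Trojan testbed from raw samples using scanner consensus.
# scan_results maps filename -> {scanner: reported identification or None}.

MIN_AGREEMENT = 3            # "several good scanners" must agree

def consensus_name(reports):
    counts = {}
    for name in reports.values():
        if name:                              # ignore "no detection"
            counts[name] = counts.get(name, 0) + 1
    for name, n in counts.items():
        if n >= MIN_AGREEMENT:
            return name
    return None

def build_testbed(scan_results):
    testbed = {}                              # identification -> one file
    for filename, reports in scan_results.items():
        name = consensus_name(reports)
        if name and name not in testbed:      # strive for zero duplicates
            testbed[name] = filename
    return testbed

def detection_rate(alerted_files, testbed):
    hits = sum(1 for f in testbed.values() if f in alerted_files)
    return 100.0 * hits / len(testbed)

# Running detection_rate() against the same frozen testbed at two points
# in time (say 10% then 70%) is the kind of trend being described here.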


>A nearly ideal test would have a data set more representative of what
>is likely to be seen in the field. Once you force restrictions on the test
>set it starts to become skewed. It is not practical for a tester to have
>a representative cross section of all existing programs so concessions
>must be made. After putting all of the necessary restrictions on the data
>set and the test method, you are making the AV strive to attain a less
>than optimum goal. They will strive to be the "best at test" rather than
>to be the best at their real world function.
>
>The *good* tests are merely the least harmful ones.

Which specific problem(s) do you have in mind here?


Art
http://www.epix.net/~artnpeg

IBK

Feb 29, 2004, 5:06:40 PM
Hello Nick,

thx for your long reply.

I only know that VB (or better: a person at VB) has access to the
monthly collections of various AVs; what that person or VB does with
the collections I cannot know. I just _supposed_ that the received
samples (the good ones) are used for VB tests. It was not my intention
to offend. I am sorry; maybe something was misunderstood (because of
my bad English); I did not say that VB includes samples without
verifying that they are viral. The standards of VB are really of a high
quality/level.
Thanks again for your statement and hints. I will think about that.

regards,
andreas

kurt wismer

Feb 29, 2004, 6:15:37 PM
nu...@zilch.com wrote:

> On Sat, 28 Feb 2004 19:11:53 -0500, kurt wismer <ku...@sympatico.ca>

>>nu...@zilch.com wrote:
>>>On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca>
>>>
>>>>nu...@zilch.com wrote:
>>
>>[snip]
>>
>>>>>Lacking quality and recent broad based tests, how can we determine the
>>>>>current trends?
>>>>
>>>>wait until another good quality test is performed...
>>>
>>>That's your opinion. It's not mine.
>>
>>it's not an opinion, it's an instruction... you asked how, i told you
>>how...
>
> Wrong. It's an opinion and a wrong one.

not in the dialect of english i happen to speak, it ain't... up where
i'm from it's an instruction, just like "hold your horses" or "go fly a
kite"... maybe where you're from it's an opinion, but frankly, that's
just weird...

> There's no need to wait for
> the next quality test to detect trends. And I've explained why. End of
> subject.

what you've explained is your justification for accepting anecdotal
evidence... what i've tried to show you is that without controls on the
quality of the testbed, non-uniqueness of samples can invalidate any
conclusion you hope to draw from such a test...

kurt wismer

Feb 29, 2004, 6:23:36 PM
FromTheRafters wrote:
> <nu...@zilch.com> wrote in message news:5e0240dsla0vf3gum...@4ax.com...
>>On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca> wrote:
> [snip]
>>>lets perform a thought experiment, shall we? lets consider a test that
>>>has 3 samples... the first sample is virus A and the second 2 samples
>>>are virus B... the test, being unscientific, counts all 3 as separate
>>>viruses.... scanner X misses virus A, scanner Y misses virus B - the
>>>results *should* be 50% for both but because of the improper
>>>methodology it turns out to be 66% for scanner X and 33% for scanner Y...
>>
>>You're off on a tangent that I wasn't even talking about.
>
> I, too, see apples and oranges here. Kurt is talking about the overall
> "test" methodology I think, and not just the tester's data set maintenance
> methodology.

in fact, i did not go in depth about test methodology at all... i only
dealt with the issue of testbed integrity...

> I think that you are right in assuming that a flawed data set can
> still be good for a time-difference comparison between each
> vendor's' versions. The *same* flawed data set and the *same*
> flawed test methodology can indeed be called a "comparison"
> test between versions.

except that the presence of duplicates magnifies the appearance of what
might otherwise be insignificant changes...

kurt wismer

Feb 29, 2004, 6:26:38 PM
nu...@zilch.com wrote:
[snip]

> Yes, essentially that's the idea. Simply doing comparisons over time
> of what scanner X reports on file Y involves no assumptions other than
> file Y in my collection hasn't changed :) And also I see minimal
> problems for this purpose in using what the scanners report to
> categorize the files. If several good scanners identify file Z as the
> POOP Trojan (or an alias name), then file Z goes into my Trojan test
> bed ... providing there are no other files in that bed that the same
> scanners identified as the POOP Trojan. You strive for zero duplicates
> of course in order to have a real variety in each category of
> interest.

and so it comes out, there *were* unstated assumptions about the nature
of the testbed used in your hypothetical 'unscientific tests'....

kurt wismer

Feb 29, 2004, 6:28:20 PM
Roy Coorne wrote:
> Statistically, even an event with probability Zero _may_ happen, and at
> once.

ummm no it can't... if someone says event X has a zero probability and
event X happens, then that someone was wrong and the probability wasn't
actually zero...

FromTheRafters

Mar 1, 2004, 11:34:12 AM

<nu...@zilch.com> wrote in message news:9rk340hp5nbkm3vo0...@4ax.com...

Nothing really specific Art, just that people want to have
comparison tests to reference when deciding on which AV
they wish to use. When a popular test organization has the
AVs jumping through hoops that have less than real world
significance, it causes the AVs to change their program so
that they can look better in the comparison tests.


FromTheRafters

Mar 1, 2004, 11:51:43 AM

"kurt wismer" <ku...@sympatico.ca> wrote in message news:Xvu0c.279$JZ6....@news20.bellglobal.com...

> FromTheRafters wrote:
> > <nu...@zilch.com> wrote in message news:5e0240dsla0vf3gum...@4ax.com...
> >>On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca> wrote:
> > [snip]
> >>>lets perform a thought experiment, shall we? lets consider a test that
> >>>has 3 samples... the first sample is virus A and the second 2 samples
> >>>are virus B... the test, being unscientific, counts all 3 as separate
> >>>viruses.... scanner X misses virus A, scanner Y misses virus B - the
> >>>results *should* be 50% for both but because of the improper
> >>>methodology it turns out to be 66% for scanner X and 33% for scanner Y...
> >>
> >>You're off on a tangent that I wasn't even talking about.
> >
> > I, too, see apples and oranges here. Kurt is talking about the overall
> > "test" methodology I think, and not just the tester's data set maintenance
> > methodology.
>
> in fact, i did not go in depth about test methodology at all... i only
> dealt with the issue of testbed integrity...

Your statement about the "test" counting two instances of
virus B as two viruses made me think that the test method
was in question. Are you saying that the count is done by
the dataset maintenance's method and not by the test? I
would think that a test would want to have many instances
of virus B (polymorphic?) and count misses as misses. That
is to say that the AV being tested missed a virus B not all
virus Bs.

> > I think that you are right in assuming that a flawed data set can
> > still be good for a time-difference comparison between each
> > vendor's' versions. The *same* flawed data set and the *same*
> > flawed test methodology can indeed be called a "comparison"
> > test between versions.
>
> except that the presence of duplicates magnifies the appearance of what
> might otherwise be insignificant changes...

True, but I didn't assume that Art was talking about quantitative
measurements - only trends. You are right though, a lot would
depend on how mucked up the testbed and test was to begin
with.


kurt wismer

Mar 1, 2004, 11:47:21 AM
FromTheRafters wrote:
[snip]

> they wish to use. When a popular test organization has the
> AVs jumping through hoops that have less than real world
> significance, it causes the AVs to change their program so
> that they can look better in the comparison tests.

and which hoops would those be, precisely? as far as i know the only
constraint placed on the scanners is that they detect what they're
supposed to detect and that they are able to save their output to a log
file...

nu...@zilch.com

Mar 1, 2004, 12:21:19 PM
On Mon, 1 Mar 2004 11:34:12 -0500, "FromTheRafters" <!00...@nomad.fake>
wrote:

>> >They will strive to be the "best at test" rather than
>> >to be the best at their real world function.
>> >
>> >The *good* tests are merely the least harmful ones.
>>
>> Which specific problem(s) do you have in mind here?
>
>Nothing really specific Art, just that people want to have
>comparison tests to reference when deciding on which AV
>they wish to use. When a popular test organization has the
>AVs jumping through hoops that have less than real world
>significance, it causes the AVs to change their program so
>that they can look better in the comparison tests.

Well, it seems to me that "real world significance" is like beauty.
It's in the eyes of the beholder :)


Art
http://www.epix.net/~artnpeg

kurt wismer

Mar 1, 2004, 1:17:06 PM
FromTheRafters wrote:
> "kurt wismer" <ku...@sympatico.ca> wrote in message news:Xvu0c.279$JZ6....@news20.bellglobal.com...
>>FromTheRafters wrote:
>>><nu...@zilch.com> wrote in message news:5e0240dsla0vf3gum...@4ax.com...
>>>>On Sat, 28 Feb 2004 15:31:21 -0500, kurt wismer <ku...@sympatico.ca> wrote:
>>>
>>>[snip]
>>>
>>>>>lets perform a thought experiment, shall we? lets consider a test that
>>>>>has 3 samples... the first sample is virus A and the second 2 samples
>>>>>are virus B... the test, being unscientific, counts all 3 as separate
>>>>>viruses.... scanner X misses virus A, scanner Y misses virus B - the
>>>>>results *should* be 50% for both but because of the improper
>>>>>methodology it turns out to be 66% for scanner X and 33% for scanner Y...
>>>>
>>>>You're off on a tangent that I wasn't even talking about.
>>>
>>>I, too, see apples and oranges here. Kurt is talking about the overall
>>>"test" methodology I think, and not just the tester's data set maintenance
>>>methodology.
>>
>>in fact, i did not go in depth about test methodology at all... i only
>>dealt with the issue of testbed integrity...
>
>
> Your statement about the "test" counting two instances of
> virus B as two viruses made me think that the test method
> was in question. Are you saying that the count is done by
> the dataset maintenance's method and not by the test?

i'm saying that in the absence of any controls on the testbed's
integrity, multiple instances of the same piece of malware will a) be
present, and b) be counted as separate things... you cannot avoid
counting them as separate things if you don't know they are duplicates
and if you did know they were duplicates you wouldn't allow them to be
there in the first place...

also, while viability is something that can be tested for generically,
uniqueness is not... art's suggested method of letting the scanners do
the classification is a kludge and assumes that all the samples you're
using are detected and *identified* by at least one scanner (heuristic
detection may seem like a reasonable kludge for classification when
your concern is viability, but obviously does not help to establish
uniqueness)...

of course the sampling bias generated by this method means that if a
product significantly improves its detection for samples that none of
the products could originally detect, you won't be able to see it...
that means such a test can only detect improvements and cannot detect
the lack of improvement... the conclusions one can draw from such a
test are quite limited...

> I
> would think that a test would want to have many instances
> of virus B (polymorphic?) and count misses as misses. That
> is to say that the AV being tested missed a virus B not all
> virus Bs.

true, but detection in that case should be all or nothing... all
instances of polymorphic virus A should count as 1 and if you don't
detect them all you score a 0...
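
a rough python sketch of that all-or-nothing rule (the data structures
here are invented for illustration only):

# Each polymorphic virus counts as one unit; it scores 1 only if every
# replicated instance of it is detected, otherwise 0.
# samples_by_virus: virus name -> list of sample files
# alerted: set of files the scanner flagged

def score(samples_by_virus, alerted):
    hit = 0
    for virus, instances in samples_by_virus.items():
        if all(f in alerted for f in instances):
            hit += 1
    return hit, len(samples_by_virus)

# So two samples of virus B count as one virus, and a scanner that
# misses either of them gets 0 for B rather than 50%.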

>>>I think that you are right in assuming that a flawed data set can
>>>still be good for a time-difference comparison between each
>>>vendor's' versions. The *same* flawed data set and the *same*
>>>flawed test methodology can indeed be called a "comparison"
>>>test between versions.
>>
>>except that the presence of duplicates magnifies the appearance of what
>>might otherwise be insignificant changes...
>
> True, but I didn't assume that Art was talking about quantative
> measurements - only trends.

even for trends you'd be looking for 'significant' improvements - but
without quality control on the testbed, such determinations of
significance are specious...

nu...@zilch.com

Mar 1, 2004, 3:48:35 PM
On Mon, 01 Mar 2004 13:17:06 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>> Your statement about the "test" counting two instances of


>> virus B as two viruses made me think that the test method
>> was in question. Are you saying that the count is done by
>> the dataset maintenance's method and not by the test?
>
>i'm saying that in the absence of any controls on the testbed's
>integrity,

A test bed that isn't "scientific" isn't necessarily uncontrolled.
When I use the term "scientific" in this context I'm using it as I
think knowledgeable people here use it. As a bare minimum, all samples
in a scientific collection have been tested for viability. Not meeting
that bare minimum requirement makes a collection "unscientific" right
off the bat. There would be quite a number of other factors as well,
of course.

>multiple instances of the same piece of malware will a) be
>present,

Several good scanners identify a sample as the POOP Trojan and no
other samples are allowed in the Trojan category bed identified as
POOP or its alias names. What do we have here? There's the remote
possibility that several good scanners have all misidentified POOP.
But we're not interested in using just one sample. We're interested in
using at least several hundred ... say 1,000 all chosen in the same
way. Now, you have to assign some unknown but reasonable probability
figure that several scanners will all misidentify ... and then compute
from this unknown figure a probable number of duplicates. Further, you
would have to be concerned that that number is significant when using
the test bed to look for increases in detection of Trojans from 100 to
700 (10% to 70%). I say you're calculating "smoke" as we used to say
when some engineer was worried about some minute and insignificant
effect. And you're talking about "smoke".
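
To put rough numbers on the "smoke", here is a back-of-the-envelope
Python sketch; the misidentification probability is assumed, not
measured, and independence between the scanners is itself an assumption:

# Expected number of misclassified samples slipping into a 1,000-sample
# bed when k scanners must all agree on the (wrong) identification.

def expected_bad_samples(n_samples=1000, p_misid=0.05, k_scanners=4):
    p_all_wrong = p_misid ** k_scanners     # all k agree on a wrong name
    return n_samples * p_all_wrong

print(expected_bad_samples())               # about 0.006 samples in 1,000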

>and b) be counted as separate things... you cannot avoid
>counting them as separate things if you don't know they are duplicates
>and if you did know they were duplicates you wouldn't allow them to be
>there in the first place...
>
>also, while viability is something that can be tested for generically,
>uniqueness is not... art's suggested method of letting the scanners do
>the classification is a kludge and assumes that all the samples you're
>using are detected and *identified* by at least one scanner

Wrong. Read what I wrote. I require that _several_ scanners all agree
before a sample is included.

>(heuristic
>detection may seem like a reasonable kludge for classification when
>your concern is viability, but obviously does not help to establish
>uniqueness)...

So turn off the scanner heuristics then. That's what I'd do.

>of course the sampling bias generated by this method means that if a
>product significantly improves it's detection for samples that none of
>the products could originally detect, you won't be able to see it...

Not interested in categories that several good scanners aren't already
quite proficient in handling. In fact, it's pure nonsense to even bring
it up.

>that means such a test can only detect improvements and cannot detect
>the lack of improvement... the conclusions one can draw from such a
>test are quite limited...

It means nothing at all. You're inventing straw arguments again.


Art
http://www.epix.net/~artnpeg

kurt wismer

Mar 1, 2004, 6:28:18 PM
nu...@zilch.com wrote:
> On Mon, 01 Mar 2004 13:17:06 -0500, kurt wismer <ku...@sympatico.ca>
>
>>>Your statement about the "test" counting two instances of
>>>virus B as two viruses made me think that the test method
>>>was in question. Are you saying that the count is done by
>>>the dataset maintenance's method and not by the test?
>>
>>i'm saying that in the absence of any controls on the testbed's
>>integrity,
>
> A test bed that isn't "scientific" isn't necessarily uncontrolled.

agreed... however you did not initially give any additional
specifications on what you meant beyond "unscientific test" and i can't
read your mind... my thought experiment used the uncontrolled type of
testbed...

> When I use the term "scientific" in this context I'm using it as I
> think knowledgeable people here use it. As a bare minimum, all samples
> in a scientific collection have been tested for viability.

great, but you were talking about unscientific tests - constraining
what you mean by 'scientific test' still leaves 'unscientific test's
fairly wide open...

i now know that you are referring to a test that uses a testbed where
non-viable and duplicate samples are weeded out by a sort of 'majority
vote' by a set of scanners you trust... i only know this because
FromTheRafters managed to coax these details out of you, however...

> Not meeting
> that bare minimum requirement makes a collection "unscientific" right
> off the bat. There would be quite a number of other factors as well,
> of course.

of course...

>>multiple instances of the same piece of malware will a) be
>>present,
>
> Several good scanners identify a sample as the POOP Trojan and no
> other samples are allowed in the Trojan category bed identified as
> POOP or its alias names. What do we have here? There's the remote
> possibility that several good scanners have all misidentified POOP.
> But we're not interested in using just one sample. We're interested in
> using at least several hundred ... say 1,000 all chosen in the same
> way. Now, you have to assign some unknown but reasonable probability
> figure that several scanners will all misidentify ... and then compute
> from this unknown figure a probable number of duplicates. Further, you
> would have to be concerned that that number is significant when using
> the test bed to look for increases in detection of Trojans from 100 to
> 700 (10% to 70%). I say you're calculating "smoke" as we used to say
> when some engineer was worried about some minute and insignificant
> effect. And you're talking about "smoke".

not so... i was talking about a situation where there is no quality
control on the testbed (since you originally made no specifications on
what, if any, kinds of controls would be present)... that's very
different from the situation where the quality control fails...

>>and b) be counted as separate things... you cannot avoid
>>counting them as separate things if you don't know they are duplicates
>>and if you did know they were duplicates you wouldn't allow them to be
>>there in the first place...
>>
>>also, while viability is something that can be tested for generically,
>>uniqueness is not... art's suggested method of letting the scanners do
>>the classification is a kludge and assumes that all the samples you're
>>using are detected and *identified* by at least one scanner
>
> Wrong. Read what I wrote. I require that _several_ scanners all agree
> before a sample is included.

art, "several scanners" happens to satisfy the "at least one scanner"
constraint...

on rereading the quote i think i may have misspoken in the previous
article... 'implies' rather than 'assumes'... it implies that all the
samples you're using in the test are detected and identified...

>>(heuristic
>>detection may seem like a reasonable kludge for classification when
>>your concern is viability, but obviously does not help to establish
>>uniqueness)...
>
> So turn off the scanner heuristics then. That's what I'd do.

?? perhaps you want to re-read that section - i don't need to turn off
the heuristics, i just can't use those particular types of results...
it's not a problem, it's just the reason why scanner based
classification requires identification rather than just detection...

>>of course the sampling bias generated by this method means that if a
>>product significantly improves it's detection for samples that none of
>>the products could originally detect, you won't be able to see it...
>
>
> Not interested in categories that several good scanners aren't already
> quite proficient in handling. In fact, it's pure nonsese to even bring
> it up.

who said anything about categories? why can't i be talking about
specimens that belong in categories that several good scanners *do*
handle but for whatever reason are not themselves handled yet?

and since you require agreement between several good scanners for
inclusion in your hypothetical unscientific test you're actually
increasing the potential size of the set of malware where improvements
will go unnoticed... imagine if you required agreement between all
scanners, then there'd be no room for improvement...

>>that means such a test can only detect improvements and cannot detect
>>the lack of improvement... the conclusions one can draw from such a
>>test are quite limited...
>
> It means nothing at all. You're inventing straw agruments again.

you mean a 'straw man'... perhaps i am, but really, it would be much
easier to avoid misrepresenting your position if you'd fully specify
your position in the first place, or further specify it when it becomes
clear that you've been too general...

so now i know we're talking about a testbed that's been classified by
several scanners in order to weed out duplicates and probable
non-viable samples... so we've hopefully eliminated the possibility of
unpredictable 'improvement' scaling factors but we've introduced the
problem of omitted population segments discussed previously... the
improvement trends you hope to discover may get missed due to the
self-selected sample bias...
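
one way to see that bias concretely - a toy python sketch with invented
numbers, not a claim about any real product:

# The testbed is frozen from whatever the consensus scanners identified
# at build time, so detections a product later adds for the samples that
# were *not* identified never move its score.

testbed = set(range(0, 7_000))        # samples the consensus scanners named
# samples 7000-9999 are real malware too, but nothing identified them,
# so they never made it into the bed

detected_y1 = set(range(0, 5_000))                    # product, year 1
detected_y2 = detected_y1 | set(range(7_000, 9_000))  # year 2: big gains, but
                                                      # all on excluded samples

def rate(detected):
    return 100.0 * len(detected & testbed) / len(testbed)

print(rate(detected_y1), rate(detected_y2))   # both ~71.4 -> no visible trend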

nu...@zilch.com

Mar 1, 2004, 8:20:12 PM
On Mon, 01 Mar 2004 18:28:18 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

<snip to area of confusion>

>> Not interested in categories that several good scanners aren't already
>> quite proficient in handling. In fact, it's pure nonsese to even bring
>> it up.
>
>who said anything about categories?

??? I did. I was talking about looking at categories of malware that
several good scanners test well in (according to quality tests) that
products to be tested by my method do not do well in. Or, conversely,
I also included that you could also see when a vendor suddenly decided
to drop detection in one of those same categories. It would be obvious
using my method over time when a vendor dropped detection of old DOS
viruses, for example.

>why can't i be talking about
>specimens that belong in categories that several good scanners *do*
>handle but for whatever reason are not themselves handled yet?

I don't understand that sentence. But in order for me to defend my
method which you attacked as being worthless, I would hope that you
would stick to that topic and not wander off onto something else.

>and since you require agreement between several good scanners for
>inclusion in your hypothetical unscientific test you're actually
>increasing the potential size of the set of malware where improvements
>will go unnoticed... imagine if you required agreement between all
>scanners, then there'd be no room for improvement...

In the case of checking on a scanner with weak Trojan detection, for
example, that scanner is not used in building up the test bed. I see
no problem. And a scanner used in building up the bed of old DOS
viruses can be tested later for a significant drop in detection in
this category.

>>>that means such a test can only detect improvements and cannot detect
>>>the lack of improvement... the conclusions one can draw from such a
>>>test are quite limited...
>>
>> It means nothing at all. You're inventing straw agruments again.
>
>you mean a 'straw man'... perhaps i am, but really, it would be much
>easier to avoid misrepresenting your position if you'd fully specify
>your position in the first place, or further specify it when it becomes
>clear that you've been too general...

It would be better if you requested clarification before you rejected
my idea outright. You turned off any interest I had in further
discussion or clarification by pontificating and "instructing" me and
insulting me by referring me to a treatise on logic. That pissed me
off.

>so now i know we're talking about a testbed thats been classified by
>several scanners in order to weed out duplicates and probable
>non-viable samples... so we've hopefully eliminated the possibility of
>unpredictable 'improvement' scaling factors but we've introduced the
>problem of omitted population segments discussed previously... the
>improvement trends you hope to discover may get missed due to the
>self-selected sample bias...

Omitted population segments?? Improvement trends get missed? What in
the hell are you talking about?


Art
http://www.epix.net/~artnpeg

kurt wismer

Mar 1, 2004, 11:31:25 PM
nu...@zilch.com wrote:

> On Mon, 01 Mar 2004 18:28:18 -0500, kurt wismer <ku...@sympatico.ca>
> wrote:
>
> <snip to area of confusion>
>
>>>Not interested in categories that several good scanners aren't already
>>>quite proficient in handling. In fact, it's pure nonsese to even bring
>>>it up.
>>
>>who said anything about categories?
>
>
> ??? I did.

excuse me, apparently my rhetorical question was not clear... i was
explaining to FRT how using scanners to decide what goes in a test will
leave out samples that should otherwise be in a test and how that can
corrupt the results... *you* then brought up categories, you're correct
about that, but it was a red-herring... the samples that the
scanner-filter method leaves out aren't going to magically all belong
to some uninteresting category...

> I was talking about looking at categories of malware that
> several good scanners test well in (according to quality tests) that
> products to be tested by my method do not do well in. Or, conversely,
> I also included that you could also see when a vendor suddenly decided
> to drop detection in one of those same categories. It would be obvious
> using my method over time when a vendor dropped detection of old DOS
> viruses, for example.

that is the ideal scenario, however you cannot blindly hope that
reality will turn out ideally... you have to enumerate the ways in
which things can go wrong - something i tend to be good at...

>>why can't i be talking about
>>specimens that belong in categories that several good scanners *do*
>>handle but for whatever reason are not themselves handled yet?
>
> I don't understand that sentence.

ok, i'll try again - why can't i be talking about samples that are from
all categories...

> But in order for me to defend my
> method which you attacked as being worthless, I would hope that you
> would stick to that topic and not wander off onto something else.

i am still talking about your methodology, don't worry... i'm just
talking about one of the problems it has...

>>and since you require agreement between several good scanners for
>>inclusion in your hypothetical unscientific test you're actually
>>increasing the potential size of the set of malware where improvements
>>will go unnoticed... imagine if you required agreement between all
>>scanners, then there'd be no room for improvement...
>
> In the case of checking on a scanner with weak Trojan detection, for
> example, that scanner is not used in building up the test bed.

yes, i would assume you don't actually require agreement between all
the scanners - that's why i said "imagine"...

> I see
> no problem. And a scanner used in building up the bed of old DOS
> viruses can be tested later for a significant drop in detection in
> this category.

i would steer clear of testing for such drops... significant reductions
*could* be a drop in detection of real viruses, or it could be a drop
in detection of crud... without a better means of determining viability
of samples it's impossible to be sure...

>>>>that means such a test can only detect improvements and cannot detect
>>>>the lack of improvement... the conclusions one can draw from such a
>>>>test are quite limited...
>>>
>>>It means nothing at all. You're inventing straw agruments again.
>>
>>you mean a 'straw man'... perhaps i am, but really, it would be much
>>easier to avoid misrepresenting your position if you'd fully specify
>>your position in the first place, or further specify it when it becomes
>>clear that you've been too general...
>
> It would be better if you requested clarification before you rejected
> my idea outright.

i didn't say you were unclear, you were quite clear... there's a
difference between being unclear and being over general... had you been
unclear then i would have been confused and i would have said to myself
'i think there's something wrong here'... instead i found you making
what i thought was a far reaching general statement and since i can't
read your mind i have no way to know when you intend to make a general
statement and when you don't...

> You turned off any interest I had in further
> discussion or clarification by pontificating and "instriucting" me and
> insulting me by referring me to a treatise on logic. That pissed me
> off.

i'm sorry you feel that way... personally i find that reference (and a
similar one i also have bookmarked) to be quite helpful in getting a
deeper understanding of what can go wrong in a logical argument (both
my own and other people's)...

>>so now i know we're talking about a testbed thats been classified by
>>several scanners in order to weed out duplicates and probable
>>non-viable samples... so we've hopefully eliminated the possibility of
>>unpredictable 'improvement' scaling factors but we've introduced the
>>problem of omitted population segments discussed previously... the
>>improvement trends you hope to discover may get missed due to the
>>self-selected sample bias...
>
>
> Omitted population segments??

segments of the population of malware... your methodology will omit a
bunch of viruses, a bunch of worms, a bunch of trojans, etc. from the
final testbed... i'm sorry if statistical jargon terms like
'population' caught you off guard...

> Improvement trends get missed?

your stated position is that you can use 'unscientific' tests to
discover trends - trends that presumably indicate the improvement or
deprecation of a scanner over time... trends that are less likely to
reveal themselves when you use scanners to select the samples that you
later test scanners on...

> What in
> the hell are you talking about?

things that can go wrong with what i currently understand of your
hypothetical unscientific test methodology...

nu...@zilch.com

Mar 2, 2004, 8:04:19 AM
On Mon, 01 Mar 2004 23:31:25 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>nu...@zilch.com wrote:
>
>> On Mon, 01 Mar 2004 18:28:18 -0500, kurt wismer <ku...@sympatico.ca>
>> wrote:
>>
>> <snip to area of confusion>
>>
>>>>Not interested in categories that several good scanners aren't already
>>>>quite proficient in handling. In fact, it's pure nonsese to even bring
>>>>it up.
>>>
>>>who said anything about categories?
>>
>>
>> ??? I did.
>
>excuse me, apparently my rhetorical question was not clear... i was
>explaining to FRT how using scanners to decide what goes in a test will
>leave out samples that should otherwise be in a test and how that can
>corrupt the results... *you* then brought up categories, you're correct
>about that, but it was a red-herring... the samples that the
>scanner-filter method leaves out aren't going to magically all belong
>to some uninteresting category...

Ok.

>> I was talking about looking at categories of malware that
>> several good scanners test well in (according to quality tests) that
>> products to be tested by my method do not do well in. Or, conversely,
>> I also included that you could also see when a vendor suddenly decided
>> to drop detection in one of those same categories. It would be obvious
>> using my method over time when a vendor dropped detection of old DOS
>> viruses, for example.
>
>that is the ideal scenario, however you cannot blindly hope that
>reality will turn out ideally... you have to enumerate the ways in
>which things can go wrong - something i tend to be good at...

Me too. As an engineer, looking at worst case scenarios occupied a
good deal of my time over many decades.

>>>why can't i be talking about
>>>specimens that belong in categories that several good scanners *do*
>>>handle but for whatever reason are not themselves handled yet?
>>
>> I don't understand that sentence.
>
>ok, i'll try again - why can't i be talking about samples that are from
>all categories...

Because I'm talking about samples from specific categories. I've only
mentioned two very broad ones that I've chosen. I'm not talking about
all categories of malware. I haven't considered others and I'm not
interested in others right now.

>> But in order for me to defend my
>> method which you attacked as being worthless, I would hope that you
>> would stick to that topic and not wander off onto something else.
>
>i am still talking about your methodology, don't worry... i'm just
>talking about one of the problems it has...

>>>and since you require agreement between several good scanners for
>>>inclusion in your hypothetical unscientific test you're actually
>>>increasing the potential size of the set of malware where improvements
>>>will go unnoticed... imagine if you required agreement between all
>>>scanners, then there'd be no room for improvement...
>>
>> In the case of checking on a scanner with weak Trojan detection, for
>> example, that scanner is not used in building up the test bed.
>
>yes, i would assume you don't actually require agreement between all
>the scanners - that's why i said "imagine"...
>
>> I see
>> no problem. And a scanner used in building up the bed of old DOS
>> viruses can be tested later for a significant drop in detection in
>> this category.
>
>i would steer clear of testing for such drops... significant reductions
>*could* be a drop in detection of real viruses, or it could be a drop
>in detection of crud... without a better means of determining viability
>of samples it's impossible to be sure...

I agree that strictly speaking, all you could say is that one or the
other has occurred. I disagree with "staying clear" for a couple of
reasons. First, it's not my purpose in this to draw peer review
quality conclusions. My purpose is to use the far more easily formed
tests beds to look for major trends or shifts in emphasis. It's
informal. It's a screening test. The idea is to be alerted by
relatively large changes. Second, it seems far more likely to me that
some vendor _might_ in the near future drop detection of old DOS
viruses than it is that they would suddenly fix their engines so as to
not detect crud :) I don't believe that crud detection is entirely on
purpose for the sake of playing the testing game. At the current
state of the art, detection of crud is unavoidable to some extent. If
it was avoidable, we could use scanners to tell us that a sample is
viable :) So I think a reasonably good conclusion would be that the
vendor has dropped detection of old DOS viruses and not crud. Good
enough to openly pursue the question with the vendor and raise
questions on the virus newsgroups.

>> You turned off any interest I had in further
>> discussion or clarification by pontificating and "instriucting" me and
>> insulting me by referring me to a treatise on logic. That pissed me
>> off.
>
>i'm sorry you feel that way... personally i find that reference (and a
>similar one i also have bookmarked) to be quite helpful in getting a
>deeper understanding of what can go wrong in a logical argument (both
>my own and other people's)...

No harm done. I got over it :) Such is life in newsgroups.


Art
http://www.epix.net/~artnpeg

kurt wismer

Mar 2, 2004, 10:31:55 AM
nu...@zilch.com wrote:
> On Mon, 01 Mar 2004 23:31:25 -0500, kurt wismer <ku...@sympatico.ca>
>>nu...@zilch.com wrote:
>>>On Mon, 01 Mar 2004 18:28:18 -0500, kurt wismer <ku...@sympatico.ca>
[snip]

>>>>why can't i be talking about
>>>>specimens that belong in categories that several good scanners *do*
>>>>handle but for whatever reason are not themselves handled yet?
>>>
>>>I don't understand that sentence.
>>
>>ok, i'll try again - why can't i be talking about samples that are from
>>all categories...
>
>
> Because I'm talking about samples from specific categories. I've only
> mentioned two very broad ones that I've chosen. I'm not talking about
> all categories of malware. I haven't considered others and I'm not
> interested in others right now.

and i'm talking about the samples that will get excluded from the test
because of the method of sample selection... some will belong in
categories you're not interested in, but not all... and since they
won't be included, any improvement or problem detecting those
particular samples will go unnoticed...

[snip]


>>>I see
>>>no problem. And a scanner used in building up the bed of old DOS
>>>viruses can be tested later for a significant drop in detection in
>>>this category.
>>
>>i would steer clear of testing for such drops... significant reductions
>>*could* be a drop in detection of real viruses, or it could be a drop
>>in detection of crud... without a better means of determining viability
>>of samples it's impossible to be sure...
>
>
> I agree that strictly speaking, all you could say is that one or the
> other has occured. I disagree with "staying clear" for a couple of
> reasons.

well, it's just a statement of what i would do... the concern being
drawing conclusions that don't follow from the premises...

> First, it's not my purpose in this to draw peer review
> quality conclusions. My purpose is to use the far more easily formed
> tests beds to look for major trends or shifts in emphasis. It's
> informal. It's a screening test. The idea is to be alerted by
> relatively large changes.

ok, and you can do that, but you can't necessarily conclude what kinds
of changes those are... if a vendor rewrites their scanning engine with
the express purpose of performing more exact identification and thereby
cutting down on false alarms i would expect their crud detection to
change significantly...

> Second, it seems far more likely to me that
> some vendor _might_ in the near future drop detection of old DOS
> viruses than it is that they would suddenly fix their engines so as to
> not detect crud :) I don't believe that crud detection is entirely on
> purpose for the sake of playing the testing game. At the current
> state of the art, detection of crud is unavoidable to some extent. If
> it was avoidable, we could use scanners to tell us that a sample is
> viable :) So I think a reasonably good conclusion would be that the
> vendor has dropped detection of old DOS viruses and not crud. Good
> enough to openly pursue the question with the vendor and raise
> questions on the virus newgroups.

this is exactly why i would have steered clear of testing for drops in
detection rates - it's too easy to jump to conclusions about what kind
of changes are actually going on... technically all the test would
really tell us is that the detection of *something* changed
significantly, be that something that was supposed to be detected or
something that wasn't...

nu...@zilch.com

Mar 2, 2004, 12:15:29 PM
On Tue, 02 Mar 2004 10:31:55 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

>nu...@zilch.com wrote:


>> On Mon, 01 Mar 2004 23:31:25 -0500, kurt wismer <ku...@sympatico.ca>
>>>nu...@zilch.com wrote:
>>>>On Mon, 01 Mar 2004 18:28:18 -0500, kurt wismer <ku...@sympatico.ca>
>[snip]
>>>>>why can't i be talking about
>>>>>specimens that belong in categories that several good scanners *do*
>>>>>handle but for whatever reason are not themselves handled yet?
>>>>
>>>>I don't understand that sentence.
>>>
>>>ok, i'll try again - why can't i be talking about samples that are from
>>>all categories...
>>
>>
>> Because I'm talking about samples from specific categories. I've only
>> mentioned two very broad ones that I've chosen. I'm not talking about
>> all categories of malware. I haven't considered others and I'm not
>> interested in others right now.
>
>and i'm talking about the samples that will get excluded from the test
>because of the method of sample selection... some will belong in
>categories you're not interested in, but not all... and since they
>won't be included, any improvement or problem detecting those
>particular samples will go unnoticed...

True but hardly significant for my purposes.

>[snip]
>>>>I see
>>>>no problem. And a scanner used in building up the bed of old DOS
>>>>viruses can be tested later for a significant drop in detection in
>>>>this category.
>>>
>>>i would steer clear of testing for such drops... significant reductions
>>>*could* be a drop in detection of real viruses, or it could be a drop
>>>in detection of crud... without a better means of determining viability
>>>of samples it's impossible to be sure...
>>
>>
>> I agree that strictly speaking, all you could say is that one or the
>> other has occured. I disagree with "staying clear" for a couple of
>> reasons.
>
>well, it's just a statement of what i would do... the concern being
>drawing conclusions that don't follow from the premises...
>
>> First, it's not my purpose in this to draw peer review
>> quality conclusions. My purpose is to use the far more easily formed
>> tests beds to look for major trends or shifts in emphasis. It's
>> informal. It's a screening test. The idea is to be alerted by
>> relatively large changes.
>
>ok, and you can do that, but you can't necessarily conclude what kinds
>of changes those are... if a vendor rewrites their scanning engine with
>the express purpose of performing more exact identification and thereby
>cutting down on false alarms i would expect their crud detection to
>change significantly...

Another factor that I didn't mention is an assumption that crud only
accounts for maybe 10% (to use Nick's number) of the samples. I dunno
if he meant raw from vxer sites without any culling at all or not, but
the implication in the context was that he meant raw. It's fairly easy
to cut that raw percentage drastically using F-Prot /collect and TBAV
(for old DOS viruses) as I've actually done and as someone at Virus
Bulletin wrote a paper on that I saw not long ago. Right off the bat,
a significant pile of crud files can be eliminated with ease. Now, I
have no measure, of course, of the percentage of crud I wound up with
but there is a "hidden assumption" in my mind that it's fairly small
... on the order of maybe just 1% to 3 %. Anyway, this is another
reason why I don't believe a sudden drop in crud detection would
affect my conclusions.
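
The arithmetic behind that, as a quick Python sketch (the 1% to 3% crud
share is the assumption stated above, not a measurement, and the 60-point
drop is just an example figure):

# Even if *every* crud file in the bed were suddenly dropped, the score
# could only move by the crud share itself.

crud_share      = 0.03                   # assumed upper bound after culling
max_crud_effect = 100.0 * crud_share     # at most ~3 percentage points

observed_change = 60.0                   # e.g. detection falling from 90% to 30%
print(observed_change > max_crud_effect) # True -> crud alone can't explain it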



>> Second, it seems far more likely to me that
>> some vendor _might_ in the near future drop detection of old DOS
>> viruses than it is that they would suddenly fix their engines so as to
>> not detect crud :) I don't believe that crud detection is entirely on
>> purpose for the sake of playing the testing game. At the current
>> state of the art, detection of crud is unavoidable to some extent. If
>> it was avoidable, we could use scanners to tell us that a sample is
>> viable :) So I think a reasonably good conclusion would be that the
>> vendor has dropped detection of old DOS viruses and not crud. Good
>> enough to openly pursue the question with the vendor and raise
>> questions on the virus newgroups.
>
>this is exactly why i would have steered clear of testing for drops in
>detection rates - it's too easy to jump to conclusions about what kind
>of changes are actually going on... technically all the test would
>really tell us is that the detection of *something* changed
>significantly, be that something that was supposed to be detected or
>something that wasn't...

You're just too damn much into the picky picky to see the forest for
the trees and the significances for the insignificances :)


Art
http://www.epix.net/~artnpeg

kurt wismer

Mar 2, 2004, 7:23:15 PM
nu...@zilch.com wrote:
> On Tue, 02 Mar 2004 10:31:55 -0500, kurt wismer <ku...@sympatico.ca>
>>nu...@zilch.com wrote:
[snip]

>>>Because I'm talking about samples from specific categories. I've only
>>>mentioned two very broad ones that I've chosen. I'm not talking about
>>>all categories of malware. I haven't considered others and I'm not
>>>interested in others right now.
>>
>>and i'm talking about the samples that will get excluded from the test
>>because of the method of sample selection... some will belong in
>>categories you're not interested in, but not all... and since they
>>won't be included, any improvement or problem detecting those
>>particular samples will go unnoticed...
>
> True but hardly significant for my purposes.

unfortunately we really only have your say so, your intuition that
they'd be insignificant...

[snip]


> Another factor that I didn't mention is an assumption that crud only
> accounts for maybe 10% (to use Nick's number) of the samples. I dunno
> if he meant raw from vxer sites without any culling at all or not, but
> the implication in the context was that he meant raw. It's fairly easy
> to cut that raw percentage drastically using F-Prot /collect and TBAV
> (for old DOS viruses) as I've actually done and as someone at Virus
> Bulletin wrote a paper on that I saw not long ago. Right off the bat,
> a significant pile of crud files can be elminated with ease. Now, I
> have no measure, of course, of the percentage of crud I wound up with
> but there is a "hidden assumption" in my mind that it's fairly small

> .... on the order of maybe just 1% to 3 %. Anyway, this is another


> reason why I don't believe a sudden drop in crud detection would
> affect my conclusions.

i don't know that the context in which that 10% figure applies would
necessarily be generalizable to your scenario... it seems as though
this system is a breeding ground for uncertainty... i would only feel
comfortable with the significant change conclusion if it was *very*
significant...

[snip]


>>this is exactly why i would have steered clear of testing for drops in
>>detection rates - it's too easy to jump to conclusions about what kind
>>of changes are actually going on... technically all the test would
>>really tell us is that the detection of *something* changed
>>significantly, be that something that was supposed to be detected or
>>something that wasn't...
>
> You're just too damn much into the picky picky to see the forest for
> the trees and the significances for the insignificances :)

when it comes to statistical exercises (which is essentially what
detection tests are) i tend to think that's a good thing...

Nomen Nescio

Mar 2, 2004, 10:20:03 PM
FromTheRafters wrote:
>
> "Nomen Nescio" <nob...@dizum.com> wrote in message
news:b7afd5a74ea6649d...@dizum.com...
>
> > The five scanners were wrong, and a properly conducted test would
> > have penalized them for producing a false alarm.
>
> Would a properly conducted test have had that sample included
> in the test bed of virus samples?

The ruckus on DSLReports was about 5 scanners. Someone posted
saying in fact 14 scanners reported the file as infected by Magistr. If I
was an AV tester I would have included that file in my "crud" testbed
as an object lesson to those scanners.

> ...or was this a crud test?

It appeared to me to be just another attempt to discredit NOD32.

> If so, what sort of possible methods could be used to make sure
> that the included crud is non-biased. Ideally, I think that crud has
> to be all non-viral programs or files - so to make a testbed for
> these more manageable what could be used to whittle this bunch
> down to a manageable level. To ascertain that something is
> viral you make certain they are both progeny and grandparent.
> This is not that case for crud. Crud is that which can be mistaken
> for virus and yet not be a viable virus sample.

Without scanning the billions of files in existence with every scanner
in existence we have no way of determining which scanner will false
alarm on which file, but files that are known to produce false alarms
are a good starting point for anyone who wishes to test scanners for
"crud" detection.

> > NOD32 was right, but it was unfairly penalized because the testers
> > followed your flawed "several scanners all agree before a sample is
> > included" methodology.
>
> Yes - this points out the danger of flawed methods very well.

I used this NOD32 case as a recent example. There have been other
instances. As Nick FitzGerald pointed out many times, Kaspersky is
the king of "crud" detection, but I have seen Kaspersky "failed" on
files reported as viral by other scanners that were harmless.

> > No sample should be included in a testbed unless it has been tested
> > and replicated, preferably for 3 generations.
>
> Yes, a viable virus will prove itself to be viral when you allow it
> to replicate recursively - but how would one prove the criteria
> for "crud" has been reached? Are the samples that are indeed
> offspring from your original collection sample - yet failed to
> produce viable offspring themselves, considered crud just due
> to the failure to meet that criterion?

If the offspring file is not viral, it is "crud", and should be discarded.

> > I know you dislike the
> > Virus Bulletin tests, but VB is the only tester who publicly declares
> > it has tested and replicated _every_ sample its in testbed. For me,
> > VB produces the _only_ credible anti-virus test results.
>
> I doubt that they are the only ones to do so.

The "Include any file reported as viral by 3 scanners in the testbed"
methodology used by most testers is unacceptable. I have followed
the arguments of the "militant AV test pedants" Nick FitzGerald and
Rod Fewster for years. They may be pedantic, but they are right.
"As close to perfect as possible" is the _only_ testbed standard that
will produce credible results, and as far as I am aware Virus Bulletin
is the _only_ tester who religiously maintains that standard.

> ...and do you believe it was warranted to penalize f-prot
> for not considering container files to be a threat.

Most users are brainwashed into believing all sorts of rubbish, and
most testers follow behind like sheep. Scanning inside archives is
unnecessary, but it has been promoted into an "essential feature"
by AV marketing departments. Some anti-virus programs have even
been denounced by stupid testers for not scanning outward e-mail,
for God's sake.

If major AV company spin doctors successfully promote scanning of
JPG files as an "essential feature" to guard against the possibility
of some imaginary steganographic virus and Frisk does not go along
with this stupidity, some fool of a tester will certainly penalize
F-Prot for not scanning them.

> Do you think that on access scanning should be tested, or just
> on demand (where which files to scan can be left up to the
> user to decide)?

I think on access scanning is vital, but if an anti-virus program uses
the same database for on access and on demand scanning (I think
most programs do) why test both?

nu...@zilch.com

Mar 3, 2004, 9:55:56 AM
On Wed, 3 Mar 2004 04:20:03 +0100 (CET), Nomen Nescio
<nob...@dizum.com> wrote:

<snip>

>I used this NOD32 case as a recent example. There have been other
>instances. As Nick FitzGerald pointed out many times, Kaspersky is
>the king of "crud" detection, but I have seen Kaspersky "failed" on
>files reported as viral by other scanners that were harmless.

I've never seen any quality "crud" detection tests that support any
claim :) Actually, depending on the nature of the crud, I think McAfee
is at least as bad if not worse. They're all crud detectors to some
extent or another. And I have no problem at all using the super crud
detectors. They've never produced false alarms on my PCs over many
years of using them.

>I think on access scanning is vital, but if an anti-virus program uses
>the same database for on access and on demand scanning (I think
>most programs do) why test both?

I've seen instances where product X has realtime detection for virus Y
but doesn't detect its dropper on-demand. Is it fair to those of us
who practice "safe hex" and who never use realtime scanning to drop
on-demand scanner tests? Is it wise to thus encourage dependence on
false realtime "protection" rather than "safe hex"? I don't think so.

Good on-demand scanners handle many run time packers ... and instances
of multiple packing with more than one packer. Too many products are
ignoring this sort of thing and they fail to alert on-demand, though
they have detection for the malware so packed. Similarly, good
on-demand scanners will "take apart" SFX compressed files and alert on
malware found in files archived "within". Lousy on-demand scanners
just go "duh" and report the .EXE file as clean.

I'll take the KAV on-demand scan engine any day over any of the
others. By far.


Art
http://www.epix.net/~artnpeg

nu...@zilch.com

Mar 5, 2004, 12:11:33 PM
On Tue, 02 Mar 2004 19:23:15 -0500, kurt wismer <ku...@sympatico.ca>
wrote:

<snip>

>> You're just too damn much into the picky picky to see the forest for
>> the trees and the significances for the insignificances :)
>
>when it comes to statistical exercises (which is essentially what
>detection tests are) i tend to think that's a good thing...

Just to further annoy you (not really ;)) I happened to think of
another good use I actually found for my "unscientific" test bed. This
goes back to the days of the 16 bit AVPLITE for DOS. KAV introduced
AVPDOS32, and I noticed that it was alerting on some script viruses (I
think it was) that AVPLITE was not. It was peculiar since AVPLITE did
alert on some but failed to alert on a majority in the category that
the new AVPDOS32 did alert on.

When I queried KAV, I did get a brief response to the effect that I
should be using the new 32 bit version (which I already knew) since
there would be this sort of problem with the 16 bit version. They
never did announce publicly that the old 16 bit version was
discontinued or was no longer as fully effective as the new 32 bit
version. I think they continued to have it available for download from
the Russian site for quite a long time after this point in time.
Meanwhile, people were still actively using it, judging by many posts
on acv. I did mention what I had seen and heard but I doubt that had
much effect on the AVPLITE for DOS enthusiasts. After all, it was free
and AVPDOS32 was not.

BTW, AVPDOS32 is still available from the Swiss site, and I had noticed
a detection flaw with it some time ago, compared to a later build
called KAVDOS32 build 135. I don't recall off hand what it is, nor
have I mentioned it anywhere until now.

Sometimes you're on your own, as it were. If it hadn't been for my
informal test bed, I wouldn't have had a clue. Certainly, I rely
primarily on quality independent tests. But I wouldn't do without my
useful informal collection either.


Art
http://www.epix.net/~artnpeg

Frederic Bonroy

unread,
Mar 5, 2004, 12:39:40 PM3/5/04
to
nu...@zilch.com wrote:

> BTW, AVPDOS32 is still available from the Swiss site, and I had noticed
> a detection flaw in it some time ago, compared to a later build
> called KAVDOS32 build 135. I don't recall offhand what it is, nor
> have I mentioned it anywhere until now.

There was a Sobig sample that wasn't detected by build 133. I think
build 134 was able to detect it.

FromTheRafters

unread,
Mar 5, 2004, 3:45:34 PM3/5/04
to

"kurt wismer" <ku...@sympatico.ca> wrote in message news:sOJ0c.5834$qA2.3...@news20.bellglobal.com...

> FromTheRafters wrote:
> [snip]
> > they wish to use. When a popular test organization has the
> > AVs jumping through hoops that have less than real world
> > significance, it causes the AVs to change their program so
> > that they can look better in the comparison tests.
>
> and which hoops would those be, precisely? as far as i know the only
> constraint placed on the scanners is that they detect what they're
> supposed to detect and that they are able to save their output to a log
> file...

That seems entirely reasonable, as long as what they're *supposed*
to detect isn't malware nested six layers deep in archives or within
container files that are several steps away from becoming a threat.
Users should be able to take the malware up to the point where it
becomes a threat and scan it then.
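
Put another way, the check can happen at the moment the file is about
to be used instead of six layers down. A rough Python illustration, not
modelled on any real product's hooks (launch_with_scan() and the
scanner callback are invented for the example):

    import subprocess

    def launch_with_scan(path, scanner):
        # Scan the object at the moment it is about to become a threat --
        # i.e. right before execution -- instead of digging through nested
        # containers it may never be extracted from.
        with open(path, "rb") as fh:
            data = fh.read()
        if scanner(data):
            raise RuntimeError(path + " blocked by scan")
        subprocess.run([path], check=False)  # only reached when the scan is clean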


FromTheRafters

unread,
Mar 5, 2004, 4:01:24 PM3/5/04
to

"kurt wismer" <ku...@sympatico.ca> wrote in message news:B6L0c.563$JZ6.1...@news20.bellglobal.com...
> FromTheRafters wrote:

> > Your statement about the "test" counting two instances of
> > virus B as two viruses made me think that the test method
> > was in question. Are you saying that the count is done by
> > the dataset maintenance's method and not by the test?
>
> i'm saying that in the absence of any controls on the testbed's
> integrity, multiple instances of the same piece of malware will a) be
> present, and b) be counted as separate things... you cannot avoid
> counting them as separate things if you don't know they are duplicates
> and if you did know they were duplicates you wouldn't allow them to be
> there in the first place...

I had the impression that the normal method was to take the original
sample from the 'collection', infect various files with it, cull out
from this population the ones that fail to become grandparents, and
use the remaining samples (some number of them) in the testbed.
Children, parents, and great-grandparents are not used - but the
population of grandparents (or a subset thereof) is, and it is likely
to contain a 'duplicate' virus (although probably not an exact copy).
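
In code terms, the culling I have in mind would look something like the
Python sketch below, where replicate() is a made-up stand-in for
controlled replication of a live sample onto goat files:

    def build_testbed(original_sample, replicate, wanted=10):
        # Generate a first generation from the original, then keep only
        # the samples that go on to become grandparents -- i.e. whose
        # children themselves produce working offspring.
        first_generation = replicate(original_sample)
        grandparents = []
        for sample in first_generation:
            children = replicate(sample)
            # any() is true here if at least one child yields a non-empty
            # brood of its own, proving the sample is viably infectious.
            if any(replicate(child) for child in children):
                grandparents.append(sample)
        # The testbed uses the grandparents (or a subset of them), not
        # the original, the children or the deeper generations.
        return grandparents[:wanted]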

[snip]


kurt wismer

unread,
Mar 5, 2004, 4:14:15 PM3/5/04
to

so then my question would be which "popular test organization" makes
this a requirement in its core testing? i could see testing that if
one were evaluating value-added features, but it's not a part of virus
detection per se...

kurt wismer

unread,
Mar 5, 2004, 4:45:37 PM3/5/04
to
nu...@zilch.com wrote:
[snip]

> Just to further annoy you (not really ;)) I happened to think of
> another good use I actually found for my "unscientific" test bed.

there are all sorts of uses one can come up with if one can think
outside the box...

[snip]


> Sometimes you're on your own, as it were. If it hadn't been for my
> informal test bed, I would't have had a clue.

so it played the role of the canary in the coal mine, more or less...

kurt wismer

unread,
Mar 5, 2004, 4:56:59 PM3/5/04
to

first of all, that process represents a type of control (specifically
it's a generic viability control) on the testbed's integrity...

second, if the virus was the type that was in any way polymorphic you
would probably want multiple copies to ensure complete detection of
that variant, but that's an exception to the rule... generally
duplicates make calculating the results more difficult because you have
to make sure you don't count them as separate distinct viruses...
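
a rough python sketch of the bookkeeping i mean -- the (variant, bytes)
sample tuples and the detected_hashes set are invented for
illustration, not taken from any real test harness:

    import hashlib
    from collections import defaultdict

    def detection_rate(samples, detected_hashes):
        # samples: iterable of (variant_name, raw_bytes) pairs.
        # detected_hashes: set of sha256 digests the scanner alerted on.
        by_variant = defaultdict(set)
        for variant, data in samples:
            digest = hashlib.sha256(data).hexdigest()
            by_variant[variant].add(digest)  # byte-identical duplicates collapse here

        detected = 0
        for variant, digests in by_variant.items():
            # a polymorphic variant may deliberately keep several distinct
            # replicants; credit the variant only if every kept replicant
            # was flagged, so partial detection isn't over-counted.
            if digests <= detected_hashes:
                detected += 1
        return detected / len(by_variant) if by_variant else 0.0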
