[DuraSpace JIRA] (DS-4588) Incorrect spider detector test


Alan Orth (LYRASIS JIRA)

Jul 22, 2021, 2:35:01 PM
to dspace-...@googlegroups.com
Alan Orth created an issue
 
DSpace / Bug DS-4588
Incorrect spider detector test
Issue Type: Bug
Affects Versions: 6.3
Assignee: Unassigned
Components: DSpace API
Created: 22/Jul/21 1:34 PM
Labels: statistics tests
Priority: Minor
Reporter: Alan Orth

DSpace's SpiderDetectorServiceImplTest.testCaseInsensitiveMatching test incorrectly uses "FirefOx" as a test string to see whether case-insensitive matching of spider user agents will detect this seemingly valid user agent.

The problem is that the Firefox browser has never used such a user agent, so it would actually be indicative of a non-human user if a request with that user agent actually came in. Indeed, the COUNTER-Robots list of non-human user agents has the following pattern:

^firefox$

In the case-insensitive test the "FirefOx" string matches and the test fails. A better test string would be a lower case version of an actual Firefox user agent, for example:

mozilla/5.0 (x11; linux x86_64; rv:91.0) gecko/20100101 firefox/91.0

Now the test would be correct: if a spider user agent pattern matched this valid Firefox user agent then the pattern should cause the test to fail.
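
To illustrate the difference between the two test strings, here is a minimal sketch. It assumes the pattern is compiled with java.util.regex using Pattern.CASE_INSENSITIVE and applied with find(); the class and method names are hypothetical and this is not DSpace's actual SpiderDetectorService code, only the single COUNTER-Robots pattern quoted above.

```java
import java.util.regex.Pattern;

public class SpiderPatternDemo {
    // The one pattern quoted above from the COUNTER-Robots list,
    // compiled case-insensitively (assumption: this mirrors the
    // case-insensitive mode exercised by the test).
    private static final Pattern FIREFOX_ONLY =
            Pattern.compile("^firefox$", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern is found in the user agent.
    public static boolean isSpider(String userAgent) {
        return FIREFOX_ONLY.matcher(userAgent).find();
    }

    public static void main(String[] args) {
        // The bare string used by the current test: the anchored pattern
        // matches it once case is ignored, so it is flagged as a spider.
        System.out.println(isSpider("FirefOx")); // true

        // A lower-cased real Firefox user agent: the anchors prevent a
        // match, so it is not flagged.
        System.out.println(isSpider(
                "mozilla/5.0 (x11; linux x86_64; rv:91.0) gecko/20100101 firefox/91.0")); // false
    }
}
```

With the realistic user agent, a test asserting "not a spider" only fails if some pattern in the list is genuinely too broad, which is the behaviour the test is meant to check.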

This same issue is currently present in DSpace 7 (main) and DSpace 5 as well.

This message was sent by Atlassian Jira (v8.13.2#813002-sha1:c495a97)

Alan Orth (LYRASIS JIRA)

Jul 22, 2021, 2:43:01 PM
to dspace-...@googlegroups.com
Alan Orth updated an issue
Change By: Alan Orth
dspace-api's {{SpiderDetectorServiceImplTest.testCaseInsensitiveMatching}} test incorrectly uses "FirefOx" as a test string to see whether case-insensitive matching of spider user agents will detect this _seemingly valid_ user agent.

The problem is that the Firefox browser has _never_ used such a user agent, so it would actually be indicative of a non-human user if a request with that user agent actually came in. Indeed, the [COUNTER-Robots|https://github.com/atmire/COUNTER-Robots] list of non-human user agents has the following pattern:
{code:java}
^firefox$
{code}

In the case-insensitive test the "FirefOx" string matches and the test fails. A better test string would be a lower case version of an actual Firefox user agent, for example:
{code:java}
mozilla/5.0 (x11; linux x86_64; rv:91.0) gecko/20100101 firefox/91.0{code}

Now the test would be correct: if a spider user agent pattern matched this valid Firefox user agent then the pattern should cause the test to fail.

This same issue is currently present in DSpace 7 (main) and DSpace 5 as well.

Alan Orth (LYRASIS JIRA)

Jul 22, 2021, 3:02:02 PM
to dspace-...@googlegroups.com
Alan Orth updated an issue
Change By: Alan Orth
Affects Version/s: 7.0
Affects Version/s: 5.10

Anonymous (LYRASIS JIRA)

Jul 22, 2021, 3:41:01 PM
to dspace-...@googlegroups.com
Issue was automatically transitioned when Alan Orth created pull request #3336 in GitHub
Change By: Alan Orth
Status: Code Review Needed (was: Received)

Alan Orth (LYRASIS JIRA)

Jul 24, 2021, 3:05:01 AM
to dspace-...@googlegroups.com
Alan Orth updated an issue
Change By: Alan Orth
Affects Version/s: 5.10

Alan Orth (LYRASIS JIRA)

Jul 24, 2021, 3:05:02 AM
to dspace-...@googlegroups.com
Alan Orth updated an issue
dspace-api's {{SpiderDetectorServiceImplTest.testCaseInsensitiveMatching}} test incorrectly uses "FirefOx" as a test string to see whether case-insensitive matching of spider user agents will detect this _seemingly valid_ user agent.

The problem is that the Firefox browser has _never_ used such a user agent, so it would actually be indicative of a non-human user if a request with that user agent actually came in. Indeed, the [COUNTER-Robots|https://github.com/atmire/COUNTER-Robots] list of non-human user agents has the following pattern:
{code:java}
^firefox$
{code}
In the case-insensitive test the "FirefOx" string matches and the test fails. A better test string would be a lower case version of an actual Firefox user agent, for example:
{code:java}
mozilla/5.0 (x11; linux x86_64; rv:91.0) gecko/20100101 firefox/91.0{code}
Now the test would be correct: if a spider user agent pattern matched this valid Firefox user agent then the pattern should cause the test to fail.

This same issue appears to be present in DSpace 7 (main) and DSpace 5 as well.