Spider Missing Several URLs

Charles Williams

Apr 20, 2022, 12:13:10 PM

to OWASP ZAP User Group

Hi ZAP Team,

I've been testing ZAP with my web application, and I'm noticing a rather large discrepancy between the Sites Tree that my spider creates and the Sites Tree that I create when I explore the application manually. Note that I'm running the spider as a stage in my Automation Framework plan, and I've confirmed that the spider is authenticated. I've tried the traditional spider with every extra feature selected, but it still misses the large majority of my links. I've also tried running the Ajax Spider with headless Chrome, but it always reports 0 links found. There are many href links in my HTML that don't appear in the Sites Tree (and clicking on these links triggers several other GET requests that the scan also misses).

In addition, repeated spider scans find more links than the initial scan. For instance, my first spider scan may find 7 links, and then starting another spider scan from that same location finds 50 more. I understand that a larger Sites Tree gives the spider more places to start from, but if I don't limit the time or scope of the crawl, why do I need multiple scans to find these URLs?

I suppose out of this I have two questions:

1) Are there ways I could make better use of the traditional/Ajax spiders to pull all the desired links in a single pass?

2) If there isn't any way to get these spiders to mimic my manual exploration exactly, how can I import the Sites Tree from my manual exploration into my Automation Framework plan?

Thank you!

Charles Williams

Apr 20, 2022, 12:16:57 PM

I should note that running the Ajax Spider directly on the Context in the Sites Tree does produce more results, but running it via the AF always gives me 0 links found.

Charles Williams

Apr 20, 2022, 12:36:06 PM

Disregard the Ajax Spider portion of this post; I figured out that I had it configured incorrectly in the automation plan.

However, I still don't know why certain links aren't picked up by the spider. One theory is that some href links point to one location, but clicking on them makes GET requests to different URLs. In theory the Ajax Spider should catch this, but I'm not sure why it doesn't.

Simon Bennetts

Apr 20, 2022, 12:39:49 PM

Can you share HTML and JavaScript snippets for those links?

If we can reproduce the problem then we can fix it :)

If we can't then we can't :(

Many thanks,

Simon

Charles Williams

Apr 20, 2022, 12:57:04 PM

Here is an example link from the HTML. It is buried deep in the body of the page, but it is a plain link that doesn't go through any JavaScript:

<a href="/organizations/61609B00-0E0D-4571-8084-F08E507B0581/patients/0308bb81-a927-413a-a28c-d849e867d2bd"><strong>Doe, John</strong> <span class="bocbadge">Edited today</span><br><div style="color: rgb(138, 143, 153); font-size: 12px;"><div>DOB: 01/25/2001<br><div>MRN: </div></div></div></a>

When I navigate to this link, the URL in the address bar matches the href, but no GET request is actually sent to that URL to fetch the data. Instead, clicking the link sends GET requests to (and receives 200 responses from):
- /api/patients/0308bb81-a927-413a-a28c-d849e867d2bd/?org_id=61609B00-0E0D-4571-8084-F08E507B0581&_=1650472912963

- /v2/patients/0308bb81-a927-413a-a28c-d849e867d2bd/modules/?_=1650472912964

Most (if not all) of my missing links come from the /api/ and /v2/ prefixes, and they are predominantly accessed in this fashion - could that be what causes the spider to fail?

I also noticed two interesting things while going back to test this:

1) The full /organizations/61609B00-0E0D-4571-8084-F08E507B0581/patients/0308bb81-a927-413a-a28c-d849e867d2bd link never appears in the Sites Tree after spidering; the closest it gets is /organizations/61609B00-0E0D-4571-8084-F08E507B0581/patients/

2) On a related note, the Ajax Spider gets caught in an infinite loop, because some of my URLs contain timestamps such as ?_=1650472912964. Is there some way to ignore these in the Ajax Spider, like the traditional spider's 'Consider only parameter's name' option for query parameter handling?
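The `_=<timestamp>` parameter looks like the cache-buster jQuery appends when `cache: false` is set, so its value changes on every request. A minimal Python sketch of the "ignore the value" idea follows; it is an illustration only, not ZAP's implementation and not an Ajax Spider setting. The names `VOLATILE_PARAMS`, `normalize_url` and `name_only_key` are hypothetical, and treating `_` as the only volatile parameter is an assumption. Two requests that differ only in the timestamp collapse to the same key, which is what a loop-prevention check needs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters assumed to be cache-busters whose values should be ignored.
VOLATILE_PARAMS = {"_"}


def normalize_url(url: str, volatile: set = VOLATILE_PARAMS) -> str:
    """Drop volatile query parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in volatile
    ]
    query.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


def name_only_key(url: str) -> str:
    """Key a URL by its path plus its parameter names only, ignoring every value."""
    parts = urlsplit(url)
    names = sorted({name for name, _value in parse_qsl(parts.query, keep_blank_values=True)})
    return parts.path + ("?" + "&".join(names) if names else "")


if __name__ == "__main__":
    seen = set()
    for url in [
        "/v2/patients/0308bb81-a927-413a-a28c-d849e867d2bd/modules/?_=1650472912964",
        "/v2/patients/0308bb81-a927-413a-a28c-d849e867d2bd/modules/?_=1650472999999",
    ]:
        key = normalize_url(url)
        # The first URL is new; the second maps to the same key and is skipped.
        print("new" if key not in seen else "already visited", key)
        seen.add(key)
```

`normalize_url` keeps the values of other parameters (such as `org_id`), while `name_only_key` is closer to the traditional spider's "consider only parameter's name" behaviour and ignores all values.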

Charles Williams

Apr 20, 2022, 12:57:58 PM

I should mention that this is all test-generated data, so the names and DOB in the link are fine to share.

Charles Williams

Apr 20, 2022, 1:15:02 PM

I think the TL;DR of the problem is that my pages have elements that, when clicked, change the layout of the page (and pull in data via GET requests to the API) without changing the URL in the address bar. Maybe that means ZAP doesn't recognize them?
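ZAP's Ajax Spider is built on Crawljax, and crawlers of that kind generally tell application states apart by comparing the DOM rather than relying on the URL alone, so a click that repopulates the page without changing the address bar can still register as a new state. The Python below is a generic sketch of that idea (hashing a whitespace-normalized DOM snapshot); it is not the Ajax Spider's actual comparison logic, and `state_fingerprint` is a hypothetical name.

```python
import hashlib
import re


def state_fingerprint(dom_html: str) -> str:
    """Hash a DOM snapshot after collapsing whitespace, so states can be told apart without a URL change."""
    normalized = re.sub(r"\s+", " ", dom_html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Same URL in the address bar, two different DOM snapshots (before and after the API calls).
    before = '<div id="content"></div>'
    after = '<div id="content"><a href="/organizations/X/patients/Y">Doe, John</a></div>'
    print(state_fingerprint(before) != state_fingerprint(after))  # True: treated as distinct states
```

A fingerprint this naive would also treat any changing timestamp embedded in the DOM as a brand-new state, which is one way a DOM-comparing crawler can end up looping.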

Charles Williams

Apr 20, 2022, 2:28:01 PM

I did a little more digging. Looking at the responses to my GET requests in the ZAP console, I noticed that some of the HTML is missing. When I inspect the HTML elements locally, I see a content div containing all of my data (all of my missing links come from this section). However, it appears as an empty div in the ZAP response:

<div id="content"></div>

I'll keep looking to see what would cause this, but this is definitely the root of the problem.

Charles Williams

Apr 20, 2022, 2:37:43 PM

I stand corrected: it's the same in my local session :( It appears that the GET request for my page returns a rather bare-bones page, which then automatically makes several API calls to populate itself. I wonder whether ZAP is unable to replicate this API behavior, or whether I'm just not sure how to configure it to do so.
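The difference between the bare-bones response and the rendered page can be sketched with a static HTML parser, which is roughly what a crawler that doesn't execute JavaScript has to work with. In this Python sketch (the `AnchorCollector` and `extract_hrefs` names are hypothetical), the example patient href is found when it is present in the markup, and nothing is found in an empty `<div id="content"></div>`, because those links only exist after JavaScript makes the API calls and renders the results.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags; no JavaScript is executed."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html: str) -> list[str]:
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    rendered = (
        '<div id="content"><a href="/organizations/61609B00-0E0D-4571-8084-F08E507B0581'
        '/patients/0308bb81-a927-413a-a28c-d849e867d2bd"><strong>Doe, John</strong></a></div>'
    )
    bare = '<div id="content"></div>'
    print(extract_hrefs(rendered))  # the patient link is found
    print(extract_hrefs(bare))      # [] : nothing to follow until JavaScript fills the div
```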

Simon Bennetts

Apr 21, 2022, 5:06:25 AM

A link like the example you gave should be easy for the standard spider to handle.

However, there are some options that affect how ZAP handles links.

The full set of options are described on https://www.zaproxy.org/docs/desktop/ui/dialogs/options/spider/

The ones that I think might be relevant in this case:

Maximum Depth to Crawl: by default this is 5
Maximum Children to Crawl: by default this is unlimited, so less likely to be relevant

I would definitely try increasing the Max Depth to Crawl setting.
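Why a depth limit can make repeated scans keep finding new URLs can be sketched with a toy breadth-first crawl in Python. This is an illustration under assumptions, not ZAP's spider: the `crawl` function and the chain-shaped link graph are hypothetical, and the start URL counts as depth 0 here, which may not match exactly how ZAP counts depth. A crawl capped at depth 5 stops at /p5; a second crawl started from /p5 gets another five levels, which mirrors the repeated-scan behaviour described in the first post.

```python
from collections import deque


def crawl(graph: dict[str, list[str]], start: str, max_depth: int) -> set[str]:
    """Breadth-first crawl that does not follow links from pages at max_depth (start is depth 0)."""
    found = {start}
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # page is recorded, but its links are not followed
        for link in graph.get(url, []):
            if link not in found:
                found.add(link)
                queue.append((link, depth + 1))
    return found


if __name__ == "__main__":
    # Hypothetical site where each page links only to the next: /p0 -> /p1 -> ... -> /p8
    site = {f"/p{i}": [f"/p{i + 1}"] for i in range(8)}
    first = crawl(site, "/p0", max_depth=5)
    second = crawl(site, "/p5", max_depth=5)
    print(sorted(first))            # /p0 to /p5 only
    print(sorted(second - first))   # /p6, /p7, /p8 appear only on the second run
```

Raising the depth limit has the same effect as the second run, but in a single pass.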

The Ajax Spider getting caught in a loop is obviously a problem.

Unfortunately, the Ajax Spider cannot inspect links before "clicking" on them, as we rely on the browser to handle the JavaScript.

There are features in the spider designed to prevent loops, but they don't appear to be working in this case.

This one will prove difficult to investigate unless you can supply us with a test case :/