Here is an example link from the HTML. It is buried deep in the body of the page, but it is a plain anchor, and no JavaScript is needed to reach it directly:
<a href="/organizations/61609B00-0E0D-4571-8084-F08E507B0581/patients/0308bb81-a927-413a-a28c-d849e867d2bd"><strong>Doe, John</strong> <span class="bocbadge">Edited today</span><br><div style="color: rgb(138, 143, 153); font-size: 12px;"><div>DOB: 01/25/2001<br><div>MRN: </div></div></div></a>
When I navigate to this link, the URL in the address bar matches the href, but no GET request is actually sent to that URL to fetch the data. Instead, clicking the link sends GET requests to the following endpoints (both return 200 responses):
- /api/patients/0308bb81-a927-413a-a28c-d849e867d2bd/?org_id=61609B00-0E0D-4571-8084-F08E507B0581&_=1650472912963
- /v2/patients/0308bb81-a927-413a-a28c-d849e867d2bd/modules/?_=1650472912964
Most (if not all) of my missing URLs are under the /api/ and /v2/ prefixes, and they are predominantly requested in this fashion. Could that be what causes the spider to miss them?
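To make the relationship concrete, here is a rough Python sketch of how the href above seems to map onto the two requests I observed. The function name `api_paths_for` and the regex are my own, not anything from the application, and the mapping is inferred from this single example, so treat it as an assumption rather than the app's actual routing. The `_` timestamp parameter is left out on purpose.

```python
import re

# Hypothetical pattern, inferred from the one example href in this post.
HREF_PATTERN = re.compile(
    r"^/organizations/(?P<org_id>[0-9A-Fa-f-]{36})"
    r"/patients/(?P<patient_id>[0-9A-Fa-f-]{36})/?$"
)


def api_paths_for(href):
    """Return the API paths the page appears to request for a patient href.

    Returns an empty list if the href does not look like a patient link.
    """
    match = HREF_PATTERN.match(href)
    if match is None:
        return []
    org_id = match.group("org_id")
    patient_id = match.group("patient_id")
    return [
        f"/api/patients/{patient_id}/?org_id={org_id}",
        f"/v2/patients/{patient_id}/modules/",
    ]


href = (
    "/organizations/61609B00-0E0D-4571-8084-F08E507B0581"
    "/patients/0308bb81-a927-413a-a28c-d849e867d2bd"
)
for path in api_paths_for(href):
    print(path)
```

If the mapping holds for other patients, a list built this way could in principle be used to seed a scan with the URLs the spider does not find on its own.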
Two other things I noticed when I went back and tested this:
1) The full /organizations/61609B00-0E0D-4571-8084-F08E507B0581/patients/0308bb81-a927-413a-a28c-d849e867d2bd link never appears in the Sites tree after spidering. The closest it gets is /organizations/61609B00-0E0D-4571-8084-F08E507B0581/patients/
2) On a related note, the AJAX Spider gets caught in an infinite loop, because some of my URLs contain timestamps like ?_=1650472912964 and so every request looks like a new URL. Is there some way to ignore the parameter's value in the AJAX Spider, the way the traditional spider can 'consider only parameter's name' for parameter handling?
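To show the kind of normalization I have in mind, here is a minimal Python sketch that drops the `_` parameter (which looks like a cache-busting timestamp) so that two requests differing only in that value compare as the same URL. The helper name `strip_cache_buster` is hypothetical and this is not a ZAP feature or API, only an illustration of the behavior I am hoping for.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_cache_buster(url, param="_"):
    """Return the URL with every occurrence of `param` removed from the query."""
    parts = urlsplit(url)
    # Keep all query pairs except the cache-busting one; preserve blank values.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


first = strip_cache_buster(
    "/v2/patients/0308bb81-a927-413a-a28c-d849e867d2bd/modules/?_=1650472912964"
)
second = strip_cache_buster(
    "/v2/patients/0308bb81-a927-413a-a28c-d849e867d2bd/modules/?_=1650472999999"
)
print(first == second)  # the two URLs differ only in the timestamp
```

With something like this applied before the "already visited" check, the two requests above would count as one URL instead of two.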