Hi everyone,
I am trying to understand how client-side detection (CSD) works with its two models, the content model and the visual model. I built Chromium and tested it on a locally hosted phishing page impersonating Bank of America.
The content model produces a final score and a verdict, while the visual model outputs one score per visual category, each with its own threshold. In my tests, the content model correctly flagged the page with a score above the threshold, but the visual scores always stayed below their thresholds, even with the Bank of America logo and layout fully present.
I repeated this with other phishing pages impersonating chase.com and wellsfargo.com and observed the same behavior: the content model works as expected, but the visual model never triggers.
Why might this be the case? Also, how exactly does Chrome use the visual scores and thresholds? From what I read in some academic papers, if any visual score crosses its threshold, the page is flagged as phishing. Is that correct?
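To make sure I am asking the right question, here is how I currently picture the check, as a rough C++ sketch (the type and function names are mine, not Chromium's): the visual model yields one score per category, and my reading of the papers is that the page is treated as phishing as soon as any category's score reaches that category's threshold.

#include <map>
#include <string>

// Stand-in types: category label -> score / threshold.
using VisualScores = std::map<std::string, double>;
using VisualThresholds = std::map<std::string, double>;

// My reading of the papers: flag the page if ANY category's score
// reaches that category's threshold.
bool AnyVisualScoreAboveThreshold(const VisualScores& scores,
                                  const VisualThresholds& thresholds) {
  for (const auto& [category, score] : scores) {
    auto it = thresholds.find(category);
    if (it != thresholds.end() && score >= it->second)
      return true;
  }
  return false;
}

If that per-category rule is not how Chrome actually combines the scores, that might explain what I am seeing.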
If anyone could provide some clarification, I would greatly appreciate it.
Thank you, Daniel, for your response.
Regarding the visual model, my apologies for the earlier confusion. The model did run correctly, and I did see the visual scores (screenshot attached). I also built a getter to extract the threshold for each score; a simplified sketch of what it does is included after my questions below. My questions are:
For the content model, I observed its final score crossing the threshold. However, none of the visual scores have ever crossed theirs, even when closely replicating the target page. Could you clarify what these visual scores represent, and why they might never exceed their thresholds despite such clear impersonation?
You mentioned that Safe Browsing produces a verdict if a visual score crosses the threshold. What about the content model - can it alone trigger a phishing verdict, or does it also require further checks from Safe Browsing? More broadly, do the content and visual models operate independently and report to Safe Browsing whenever either crosses its threshold, or is there some coordination between them?
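For reference, the getter I mentioned is conceptually just the following (a simplified sketch with stand-in types, not the real Chromium protos): it collects each visual category's threshold into a label-to-threshold map.

#include <map>
#include <string>
#include <vector>

struct CategoryThreshold {     // stand-in for one per-category entry
  std::string label;
  double threshold;
};

struct VisualModelMetadata {   // stand-in for the visual model's metadata
  std::vector<CategoryThreshold> thresholds;
};

// Collect each visual category's threshold into a label -> threshold map.
std::map<std::string, double> GetVisualThresholds(
    const VisualModelMetadata& metadata) {
  std::map<std::string, double> out;
  for (const auto& entry : metadata.thresholds)
    out[entry.label] = entry.threshold;
  return out;
}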
I look forward to your clarification and greatly appreciate your help.

Thanks again for your response.
Regarding the visual categories: do you know whom I could ask, or where I could look, to learn more about what these categories represent? Any pointers would be greatly appreciated.
More importantly, I want to make sure I fully understand how the client-side models (content and visual) work together with server-side classification. From your explanation, my current understanding is:
Neither the content model nor the visual model alone produces a phishing verdict.
If either model's score crosses its threshold, Chrome pings the server-side classifier, which then makes the final determination (sketched in code below).
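In code terms, this is the flow I have in mind (pseudo-C++ with made-up names, not the actual Chromium control flow): the client-side models only decide whether to ping, and the server's response is what produces the verdict.

// Client side only decides whether to ping Safe Browsing.
bool ShouldPingServerSideClassifier(double content_score,
                                    double content_threshold,
                                    bool any_visual_threshold_crossed) {
  return content_score >= content_threshold || any_visual_threshold_crossed;
}

// The final phishing verdict then comes from the server's response, e.g.:
//   if (ShouldPingServerSideClassifier(...)) {
//     verdict = QuerySafeBrowsingServer(page_features);  // hypothetical call
//   }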
Could you clarify whether this is indeed the only circumstance under which Chrome pings the server-side classifier, or whether there are other scenarios where a server-side check is triggered?