How does the visual model in client-side detection work?

Hoang Nguyen Dai

Sep 11, 2025, 5:37:06 PM
to Chromium-dev

Hi everyone,

I am trying to understand how client-side detection (CSD) works with its two models: content and visual. I built Chromium and tested it on a locally hosted phishing page (e.g., one imitating Bank of America).

The content model produces a final score and verdict, while the visual model outputs a list of scores for different visual features. In my tests, the content model correctly flagged the page with a score above the threshold, but the visual scores always remained below their thresholds, even with the Bank of America logo and layout fully present.

I repeated this with other phishing pages (mimicking chase.com and wellsfargo.com) and observed the same: the content model works as expected, but the visual model never triggers.

Why might this be the case? Also, how exactly does Chrome use the visual scores and thresholds? From what I read in some academic papers, if any visual score crosses its threshold, the page is flagged as phishing. Is that correct?

If anyone could provide some clarification, I would greatly appreciate it.

Daniel Rubery

Sep 12, 2025, 1:00:33 PM
to Chromium-dev, Hoang Nguyen Dai
> ... the visual model never triggers. Why might this be the case?

To determine why the visual model isn't running, you'll want to trace through the execution of `phishing_classifier.cc`. Here is the place where we start classification. Does that get called? Does the scorer have a non-null `visual_tflite_model_`?
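To make the suggested check concrete, here is a minimal, self-contained sketch of the kind of gating to look for when tracing: if the scorer's visual model was never loaded, visual classification is skipped entirely. The types and function names below (`TfLiteModel`, `Scorer`, `ShouldRunVisualClassification`) are simplified stand-ins, not the real Chromium API; only the member name `visual_tflite_model_` comes from the thread.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the loaded visual TFLite model (hypothetical type).
struct TfLiteModel {};

// Simplified stand-in for the scorer object discussed in the thread.
struct Scorer {
  std::unique_ptr<TfLiteModel> visual_tflite_model_;
};

// Visual classification can only run if the model was actually loaded;
// a null visual_tflite_model_ would explain never seeing visual scores.
bool ShouldRunVisualClassification(const Scorer& scorer) {
  return scorer.visual_tflite_model_ != nullptr;
}
```

Setting a breakpoint (or adding a log line) at the equivalent null-check in the real code is a quick way to confirm whether the model is present when classification starts.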

> From what I read in some academic papers, if any visual score crosses its threshold, the page is flagged as phishing. Is that correct?

That is not correct. We divide classification between a client-side model and server-side model. If a visual score crosses the threshold, we send some data about the page to Safe Browsing for a more computationally-intensive classification. We only show a warning if Safe Browsing indicates the page is phishing.
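The two-stage flow described above can be sketched as follows. This is an illustrative, self-contained simplification under my own naming (`VisualScore`, `ShouldSendToSafeBrowsing`, `ShouldShowWarning`, `ServerVerdict` are all made up for the example): a client-side score crossing its threshold only escalates to Safe Browsing, and a warning is shown only if the server-side classification says phishing.

```cpp
#include <string>
#include <vector>

// Hypothetical per-feature output of the client-side visual model.
struct VisualScore {
  std::string feature;
  double score;
  double threshold;
};

// Client side: any score crossing its threshold triggers an escalation
// to Safe Browsing -- it does NOT directly flag the page as phishing.
bool ShouldSendToSafeBrowsing(const std::vector<VisualScore>& scores) {
  for (const auto& s : scores) {
    if (s.score >= s.threshold) {
      return true;
    }
  }
  return false;
}

// Mocked server-side result of the more expensive classification.
enum class ServerVerdict { kSafe, kPhishing };

// A warning is shown only when the page was escalated AND Safe Browsing
// confirms it is phishing.
bool ShouldShowWarning(bool escalated, ServerVerdict verdict) {
  return escalated && verdict == ServerVerdict::kPhishing;
}
```

The design point is that the client-side model acts as a cheap pre-filter; the authoritative verdict comes from the server.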

Hoang Nguyen Dai

Sep 12, 2025, 5:20:23 PM
to Chromium-dev, Daniel Rubery, Hoang Nguyen Dai

Thank you, Daniel, for your response.

Regarding the visual model, my apologies for the earlier confusion. The model was run correctly, and I did see the visual scores (screenshot attached). I also built a getter to extract the thresholds for each score. My questions are:

  1. For the content model, I observed its final score crossing the threshold. However, none of the visual scores have ever crossed theirs, even when replicating the page. Could you clarify what these visual scores represent, and why they might never exceed the threshold despite clear impersonation of the target page?

  2. You mentioned that Safe Browsing produces a verdict if a visual score crosses the threshold. What about the content model - can it alone trigger a phishing verdict, or does it also require further checks from Safe Browsing? More broadly, do the content and visual models operate independently and report to Safe Browsing whenever either crosses its threshold, or is there some coordination between them?

I look forward to your clarification and greatly appreciate your help.


Hoang Nguyen Dai

Sep 12, 2025, 5:22:00 PM
to Chromium-dev, Daniel Rubery, Hoang Nguyen Dai
Posting the screenshot.
screenshot.png