Strategies for running ZAP in production with GitLab and the Automation Framework?


Thomas Ieong

Feb 11, 2024, 9:13:35 AM
to ZAP User Group
Hey,

I've been tasked with implementing ZAP in our pipeline. I wanted to share our current implementation and get some feedback on it. Do not hesitate to share your implementation as well.


Thomas Ieong

Feb 11, 2024, 9:56:09 AM
to ZAP User Group

We currently have this in our .gitlab-ci.yml:

zap-vulnerability-test:
  stage: test
  when: manual
  image: big-business-image-containing-zap
  tags:
    - php
  only:
    refs:
      - '/^feature\/.*$/'
  variables:
    ZAP_PLAN_AUTHORITY: "http://10.X.X.X"
    ZAP_PLAN_AUTHORITY_REPORT: "10.X.X.X"
  script:
    - cp config/ci/tests/zap/plan.yml plan.yml
    - cp config/ci/tests/zap/config-zap config-zap
    - mkdir reports
    - owasp-zap -cmd -notel -configfile config-zap
    - owasp-zap -cmd -notel -configfile config-zap -addonupdate
    - owasp-zap -cmd -silent -configfile config-zap -autorun plan.yml
  artifacts:
    when: always
    expire_in: 1 week
    paths:
      - reports

Currently this job is only run manually, and you have to provide the URL of the site you want to test. You'll notice that we don't run ZAP in a single command. Why is that?

Well I found that if I just run:

owasp-zap -cmd -notel -configfile config-zap -addonupdate

without the command before it, it fails to fetch the updates because ZAP doesn't use the proxy I provided in the config file. That's why I split the invocation into multiple steps.

The config file is there so that ZAP can fetch updates through our corporate proxy:

network.connection.defaultUserAgent=Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/118.0
network.connection.httpProxy.enabled=true
network.connection.httpProxy.host=proxy.bigbusiness
network.connection.httpProxy.port=3128

network.globalExclusions.exclusions.exclusion(26).name=switchuser
network.globalExclusions.exclusions.exclusion(26).value=.*?_switch_user.*
network.globalExclusions.exclusions.exclusion(26).enabled=true

The global exclusion is there because we don't want ZAP to follow a switch-user link in our app during the test. An exclude in the context should have been enough, but I added it there just to be sure.
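
For reference, the context-level exclude could look like the sketch below. It assumes the same regex as the global exclusion and the "target" context of the plan.yml shown further down; the other context keys are left out here.

```yaml
env:
  contexts:
    - name: "target"
      urls:
        - "${ZAP_PLAN_AUTHORITY}"
      excludePaths:
        # Same regex as the global exclusion, so matching URLs are left out of the context
        - ".*?_switch_user.*"
```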

By the way is there a syntax to append an element to an array? I'm thinking something like this:

network.globalExclusions.exclusions.exclusion(-1).name=foo

I didn't see anything like this in the FAQ. From what I recall, I had to count the entries in the GUI manually.

And finally here is our plan.yml:

---
env:
  contexts:
  - name: "target"
    urls:
      - "${ZAP_PLAN_AUTHORITY}"
    includePaths:
      - "${ZAP_PLAN_AUTHORITY}/.*"
    excludePaths:
      - "${ZAP_PLAN_AUTHORITY}/random-stuff/.*"
      # Plenty of internal url
      - "${ZAP_PLAN_AUTHORITY}/logout"
    authentication:
      method: "form"
      parameters:
        loginPageUrl: "${ZAP_PLAN_AUTHORITY}/login"
        loginRequestUrl: "${ZAP_PLAN_AUTHORITY}/login"
        loginRequestBody: "username={%username%}&password={%password%}&_csrf_token={%value%}"
      verification:
        method: "response"
        loggedInRegex: "\\Qid=\"link\"\\E"
        loggedOutRegex: "\\Qid=\"login-link\"\\E"
    sessionManagement:
      method: "cookie"
      parameters: {}
    users:
    - name: "test"
      credentials:
        username: "${ZAP_PLAN_USERNAME}"
        password: "${ZAP_PLAN_PASSWORD}"
    # Can't use an include in the 2.14.0 version
    technology:
      exclude:
      - "C"
      - "Windows"
      - "MacOS"
      - "ASP"
      - "IBM DB2"
      - "SQLite"
      - "CouchDB"
      - "Microsoft SQL Server"
      - "Oracle"
      - "JSP/Servlet"
      - "MongoDB"
      - "Firebird"
      - "HypersonicSQL"
      - "MySQL"
      - "SAP MaxDB"
      - "Ruby"
      - "SCM"
      - "IIS"
      - "Microsoft Access"
      - "Java"
      - "Tomcat"
      - "Sybase"
      - "Python"
  parameters:
    failOnError: true
    failOnWarning: true
    progressToStdout: true

jobs:
  - name: "passiveScan-config"
    type: "passiveScan-config"
    parameters:
      scanOnlyInScope: true
      enableTags: false
      disableAllRules: false

  # - name: "delay"
  #   type: "delay"
  #   parameters:
  #     time: 01:00:00

  - name: "requestor"
    type: "requestor"
    parameters:
      user: "test"
    requests:
      - url: "${ZAP_PLAN_AUTHORITY}/"
        name: ""
        method: ""
        httpVersion: ""
        headers: []
        data: ""
    tests:
      - type: "stats"
        statistic: "stats.auth.success"
        site: "${ZAP_PLAN_AUTHORITY}"
        operator: ">="
        value: 1
        onFail: "error"
      - type: "stats"
        statistic: "stats.auth.state.loggedin"
        site: "${ZAP_PLAN_AUTHORITY}"
        operator: ">="
        value: 1
        onFail: "error"

  # - name: "importUrls"
  #   type: import
  #   parameters:
  #     type: url
  #     fileName: /home/

  - name: "spider"
    type: "spider"
    parameters:
      user: test
      maxDuration: 10 # 0 is unlimited
      maxDepth: 5
      handleParameters: IGNORE_COMPLETELY
      parseRobotsTxt: false
      parseSitemapXml: false
      threadCount: 8
    tests:
      - onFail: "INFO"
        statistic: "automation.spider.urls.added"
        site: ""
        operator: ">="
        value: 100
        name: "At least 100 URLs found"
        type: "stats"

  # - name: "spiderAjax"
  #   type: "spiderAjax"
  #   parameters:
  #     browserId: firefox-headless
  #     user: test
  #     # maxDuration: 0
  #     # maxCrawlDepth: 0
  #     numberOfBrowsers: 1 # default 1
  #     runOnlyIfModern: false
  #     inScopeOnly: true
  #     eventWait: 2000 # Default 1000 milliseconds, wait after a client-side event is fired
  #     reloadWait: 2000 # Default 1000 milliseconds, wait after the URL is loaded
  #     excludedElements:
  #       - description: "Logout Button"
  #         element: "button"
  #         attributeName: "aria-label"
  #         attributeValue: "Logout"
  #   tests:
  #     - onFail: "INFO"
  #       statistic: "spiderAjax.urls.added"
  #       site: ""
  #       operator: ">="
  #       value: 100
  #       name: "At least 100 URLs found"
  #       type: "stats"

  - name: "passiveScan-wait"
    type: "passiveScan-wait"
    parameters: {}

  # - name: "activeScan"
  #   type: "activeScan"
  #   parameters:
  #     user: test
  #     # maxRuleDurationInMins: 0 # Unlimited
  #     # maxScanDurationInMins: 0
  #     # delayInMs: 0 # delay between requests
  #     handleAntiCSRFTokens: true # affects performance according to an old blog post
  #     threadPerHost: 2 # max number of threads per host
  #     maxAlertsPerRule: 0
  #   policyDefinition:
  #     rules: []

  - name: "report"
    type: "report"
    parameters:
      template: "risk-confidence-html"
      reportDir: "reports"
      reportFile: "/xfer/${ZAP_PLAN_AUTHORITY_REPORT}"
      reportTitle: "ZAP Scanning Report"
    risks:
      - high
      - medium
      - low
      - info
    confidences:
      - high
      - medium
      - low
      - falsepositive

Nothing fancy here. There is a (commented-out) delay job because we plan to proxy our Selenium tests through ZAP in a future implementation.

The AJAX spider job is commented out as well because, while doing some tests, I found that it was getting stuck on some pages. I heard that it should work better with the upcoming client extension, but I haven't had time to test it yet (we're using ZAP 2.14.0 btw).

There is also an import job (commented out). The idea would be to provide a static list of URLs to ZAP so that it can run passive/active scans on them. However, I'm not sure ZAP actually visits the URLs; I don't remember it providing passive scan results for them.
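
For context, the import job I have in mind would look roughly like this once uncommented. This is only a sketch: urls.txt and its location are hypothetical, and I'm assuming a plain-text file with one URL per line.

```yaml
jobs:
  - name: "importUrls"
    type: "import"
    parameters:
      type: "url"                               # import a file of URLs
      fileName: "config/ci/tests/zap/urls.txt"  # hypothetical path to the static URL list
```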

I followed the advice that was given in the ZAP Chat videos: I used the GUI to set up the authentication, then exported the plan.
example-gitlab.yml
example-plan.yml
example-config

Thomas Ieong

Feb 11, 2024, 11:07:30 AM
to ZAP User Group
The current implementation is simple and works, but we'd like it to do more, such as running an active scan on a dedicated environment and reusing our Selenium tests so that ZAP can learn the structure of our application.

A first implementation in gitlab could look like this:

stages:
  - acceptance
  - deploy

deploy-prototype:
  stage: deploy
  environment:
    name: $CI_COMMIT_REF_SLUG
    url: http://app-prototype-${CI_COMMIT_REF_SLUG}.${APP_DOMAIN}:${APP_ACCESS_PORT}
    on_stop: stop-prototype
    auto_stop_in: 1 hour
  script:
    - make deploy-docker-prototype

stop-prototype:
  stage: deploy
  when: manual
  environment:
    name: $CI_COMMIT_REF_SLUG
    action: stop
  script:
    - make stop-docker-prototype

test-selenium-zap:
  stage: acceptance
  needs: ["deploy-prototype"]
  parallel: 5
  services:
    - docker:24.0.6-dind
    - name: ghcr.io/zaproxy/zaproxy:2.14.0
      alias: zap
      command: [
          "zap.sh",
          "-daemon",
          "-silent",
          "-host",
          "0.0.0.0",
          "-port",
          "8080",
          "-config",
          "api.addrs.addr.name=.*",
          "-config",
          "api.addrs.addr.regex=true",
          "-config",
          "api.disablekey=true"
      ]
  script:
    # Bunch of commands ....
    - >
      ./vendor/bin/codecept run -g paracept_$CI_NODE_INDEX
      --xml
      "report_selenium_firefox_testing-$CI_NODE_INDEX.xml"
      --html
      "report_selenium_firefox_testing-$CI_NODE_INDEX.html"
  artifacts:
    when: always
    reports:
      junit: tests/_output/report_selenium_firefox_testing-$CI_NODE_INDEX.xml
    expire_in: 3 day
    paths:
      - tests/_output

We self-host our GitLab instance with Docker, and we're using the Docker executor btw.

The deploy-prototype contains custom code that will spin up the containers needed for our app (database, message queue...).

The test-selenium-zap job is where we actually run the Selenium tests and proxy them through ZAP. There is a "needs" keyword so that we wait for our prototype environment to be up and running.

The command that launches ZAP uses a plan.yml that has the delay job enabled, so that it waits for the Selenium tests to finish.
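
As a sketch, the delay job in that plan.yml would sit between the requestor and the spider. The one-hour value is taken from the commented-out job in the plan above and acts as an upper bound; I'm assuming the job gets ended early through the API once the tests are done.

```yaml
jobs:
  # ... passiveScan-config and requestor jobs as before ...

  - name: "delay"
    type: "delay"
    parameters:
      time: "01:00:00"   # hh:mm:ss, maximum time to wait while the Selenium traffic is proxied

  # ... spider, passiveScan-wait and report jobs as before ...
```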

And this is where I began to have problems. The first one: if you're using the Automation Framework, you're supposed to have a command line that looks like this:

zap.sh -cmd -autorun path/to/plan.yml -host 0.0.0.0 -port 8080

However, since we're running ZAP as a service, it doesn't have access to our plan.yml. I imagine one workaround would be to build another Docker image that contains all the files needed and run that one instead of ghcr.io/zaproxy/zaproxy:2.14.0.

Also, how do you specify the environment variables needed in the plan.yml, such as ZAP_PLAN_USERNAME, ZAP_PLAN_PASSWORD, or the URL of the site to scan? I guess that in this situation you could modify the "command" keyword of the ZAP service to be something like this:

command: /bin/sh -c "export ZAP_PLAN_USERNAME=$ZAP_PLAN_USERNAME && owasp command ...."

but that is quite ugly, I guess another solution could be building a zap container earlier in the pipeline that contains the plan.yml and has the env variables needed?

There is an open issue on gitlab for variables expansion in services https://gitlab.com/gitlab-org/gitlab-runner/-/issues/3808

Okay, so let's pretend that we've solved these problems. What we want now is to signal ZAP, once the Selenium tests are over, to actually start running the spider, active scan...

And there's another problem: we run 300+ Selenium tests in parallel, split into 5 groups (that is why the "parallel" keyword is set to 5).

How do you make sure that all of the tests are actually over?

The "after_script" keyword runs at the end of each of the 5 jobs, but we want to run a curl command once all of the tests are over, not at the end of each job, so that ZAP can do its magic.

No idea here. 

Finally let's pretend we also solved that problem, how do we actually fetch the report produced by the service?

I guess the easiest solution here would be to modify the plan.yml and add a script job at the end that sends the reports via mail, GitLab issue, Matrix instance...

Took inspiration from these links:

Thomas Ieong

Feb 11, 2024, 11:36:04 AM
to ZAP User Group
A second implementation could be to not use the "services" keyword and instead have ZAP running in its own environment.

By that I mean having a job named deploy-zap. You'd have fewer problems with the plan.yml and the env vars, but I feel it's a bit too much; ZAP in "services" should suffice, I think.

Thoughts?

Simon Bennetts

Feb 12, 2024, 9:30:56 AM
to ZAP User Group
Wow, there's a lot here to go through!

I can't give any feedback on the GitLab side as I've never used it.

Thanks for the feedback about the proxy - we do plan to add direct support for that to the Automation Framework.

To add an element to an array via the command line I think you have to know which the last element will be and use the one after.
This is obviously pretty nasty and one reason why we are focussing on the Automation Framework as a much cleaner option.

I did notice that you have lots of technology "excludes".
We now support "includes" which may support your use case better: https://www.zaproxy.org/docs/desktop/addons/automation-framework/environment/
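
A rough sketch of what that could look like in the plan's context. The technology names here are only placeholders (I don't know your actual stack), and this assumes a version of the Automation Framework add-on that supports "include":

```yaml
env:
  contexts:
    - name: "target"
      urls:
        - "${ZAP_PLAN_AUTHORITY}"
      technology:
        include:
          # Placeholder values, replace with the technologies the app really uses
          - "PHP"
          - "PostgreSQL"
          - "Linux"
```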

Did you try browser based auth (and authentication detection, for that matter)?
If you want to enable the AJAX Spider then browser based auth is probably the way to go - injecting session state into browsers is very hard.

Re the plan - the expected way to run ZAP when using Docker and other files is to mount a directory containing those files - see https://www.zaproxy.org/docs/docker/about/#mounting-the-current-directory
You can also specify environment variables to docker via the command line.

ZAP has its own delay job that is ideal if you need to proxy tests through ZAP: https://www.zaproxy.org/docs/desktop/addons/automation-framework/job-delay/
As you'll see it has a variety of options for detecting when the delay should end :)

Cheers,

Simon

Thomas Ieong

Feb 15, 2024, 4:18:31 AM
to ZAP User Group

> Did you try browser based auth (and authentication detection, for that matter)?
> If you want to enable the AJAX Spider then browser based auth is probably the way to go - injecting session state into browsers is very hard.

I did try the auto-detect auth and the browser based auth, but from what I remember it didn't just work, so that's why I settled for form-based auth. I'll give it another try later.


> Re the plan - the expected way to run ZAP when using Docker and other files is to mount a directory containing those files - see https://www.zaproxy.org/docs/docker/about/#mounting-the-current-directory
> You can also specify environment variables to docker via the command line.

As of GitLab 16.x, you cannot mount volumes into a service; this is a planned feature for 17.0.

See https://gitlab.com/gitlab-org/gitlab-runner/-/issues/28121#note_1675743672

For now you'll have to build a custom docker image that has all the files needed.

For the environment variables, it was my mistake: you can just use a "variables" keyword and that does the trick.
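
Something along these lines, as a sketch. The URL is a placeholder, and how the credentials are supplied is deliberately left out:

```yaml
test-selenium-zap:
  stage: acceptance
  variables:
    # Job-level variables, which is what reached the zap service in my tests
    ZAP_PLAN_AUTHORITY: "http://app-prototype.example.internal"   # placeholder URL
    # ZAP_PLAN_USERNAME and ZAP_PLAN_PASSWORD still need to be provided; not shown here
  services:
    - name: ghcr.io/zaproxy/zaproxy:2.14.0
      alias: zap
  script:
    - echo "run the Selenium tests here"
```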


> ZAP has its own delay job that is ideal if you need to proxy tests through ZAP: https://www.zaproxy.org/docs/desktop/addons/automation-framework/job-delay/

I did see the delay job but once again the complexity is on the gitlab side.

With this job:


test-selenium-zap:
  stage: acceptance
  needs: ["deploy-prototype"]
  services:

    - name: ghcr.io/zaproxy/zaproxy:2.14.0
      alias: zap
      command: [
          "zap.sh",
          "-silent",
          "-host",
          "0.0.0.0",
          "-port",
          "8080",
          "-config",
          "api.addrs.addr.name=.*",
          "-config",
          "api.addrs.addr.regex=true",
          "-config",
          "api.disablekey=true",
          "-autorun",
          "plan.yml"
      ]
  script:
    # Bunch of commands ....
    # Below will run selenium test
    - php vendor/bin/codecept run tests/acceptance
    # Some curl command to tell zap to stop the delay job and carry on with the scan
    - curl http://zap:8080/JSON/automation/action/endDelayJob
    # Then for the report, need to add a script job in the plan.yml that sends it via mail etc
    # You could also enable the file transfer api, then this job could fetch it from the zap service and you could store the reports as a job artifact https://docs.gitlab.com/ee/ci/jobs/job_artifacts.html

Everything works as expected, except that we have around 300 tests running, so it takes a huge amount of time (20 min for the Selenium tests, plus another 20 min for ZAP's passive/active scan).

On a side note, is this issue https://github.com/zaproxy/zaproxy/issues/2157 still a thing? Can anyone who uses this in CI share their experience?

One solution to reduce the time our tests take is to parallelize them. In GitLab you do so with the "parallel" keyword; here we split our tests into 5 groups, so it will be "parallel: 5".


test-selenium-zap:
  stage: acceptance
  needs: ["deploy-prototype"]
  parallel: 5
  services:
    - name: ghcr.io/zaproxy/zaproxy:2.14.0
      alias: zap
      command: [
          "zap.sh",
          "-silent",
          "-host",
          "0.0.0.0",
          "-port",
          "8080",
          "-config",
          "api.addrs.addr.name=.*",
          "-config",
          "api.addrs.addr.regex=true",
          "-config",
          "api.disablekey=true",
          "-autorun",
          "plan.yml"
      ]
  script:

    - >
      ./vendor/bin/codecept run -g paracept_$CI_NODE_INDEX
      --xml
      "report_selenium_firefox_testing-$CI_NODE_INDEX.xml"
      --html
      "report_selenium_firefox_testing-$CI_NODE_INDEX.html"
    - curl http://zap:8080/JSON/automation/action/endDelayJob

What will happen is that this job will be executed 5 times at the same time, so each job will launch its own group of tests, and the first one that finishes will run the curl command that tells ZAP to continue with the scan.

And that's the problem with that parallel keyword, how can you make sure that ZAP only starts when all of the selenium tests are finished?

The only solution I can come up with is to add an "after_script" block with a custom script that checks whether the other jobs have finished their Selenium tests, and only then tells ZAP to continue.

However it feels very hacky and I should not have to do that, there must be a better solution but I haven't found it yet.

Right now I guess we could drop the parallel keyword, accept that this job will take 40+ min, and run it every night at 3 AM or so.

Or just run it on a subset of our tests.
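
A sketch of that fallback, assuming a pipeline schedule is configured in the GitLab UI (for example nightly at 3 AM), that plan.yml is already available inside the ZAP image, and that a Codeception group named "zap" exists for the subset. The job name, the schedule and the group are all hypothetical:

```yaml
test-selenium-zap-nightly:
  stage: acceptance
  needs: ["deploy-prototype"]
  rules:
    # Only run when the pipeline was started by a schedule
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  services:
    - name: ghcr.io/zaproxy/zaproxy:2.14.0
      alias: zap
      command: [
          "zap.sh", "-silent",
          "-host", "0.0.0.0", "-port", "8080",
          "-config", "api.addrs.addr.name=.*",
          "-config", "api.addrs.addr.regex=true",
          "-config", "api.disablekey=true",
          "-autorun", "plan.yml"
      ]
  script:
    # No "parallel" keyword here, so there is a single job and a single ZAP service to signal
    - php vendor/bin/codecept run tests/acceptance -g zap
    - curl http://zap:8080/JSON/automation/action/endDelayJob
```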

Thoughts?

Simon Bennetts

Feb 15, 2024, 5:23:55 AM
to ZAP User Group
ZAP can take a while to perform security tests I'm afraid, it often has a lot to do.

We have a lot of guidance as to how you can speed up ZAP scans here: https://www.zaproxy.org/faq/how-can-you-speed-up-scans/

Re https://github.com/zaproxy/zaproxy/issues/2157 I have no idea. No one has posted to that issue in a while, and it's got no votes, so it's not on my radar to look at.

Re triggering ZAP to continue after unit tests have completed - as mentioned before, that's exactly what the delay job is for.
How you trigger it to continue may well be messy, and it sounds like the problem is on the gitlab side?

Cheers,

Simon