SLSA first launched in 2021 in response to calls for a framework to secure software supply chains. SLSA provides a checklist of standards and controls to prevent tampering, improve integrity, and secure packages and infrastructure. The goal is for software developers to be able to follow best practices that guarantee the integrity of every artifact: specifically, that the source code users rely on is the code they are actually running, and that the build machine producing the artifacts was secure.
GitLab enables users to generate artifact metadata following the SLSA format for any artifacts that are built on the platform. Because the process happens within the GitLab Runner, without any third-party software, there is no opportunity for the attestation itself to be tampered with or corrupted.
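A minimal sketch of enabling attestation for a job, assuming a sufficiently recent GitLab Runner that supports the `RUNNER_GENERATE_ARTIFACTS_METADATA` variable (the job name and build command are placeholders):

```yaml
build:
  stage: build
  variables:
    # Ask the runner to emit SLSA provenance metadata alongside the artifacts
    RUNNER_GENERATE_ARTIFACTS_METADATA: "true"
  script:
    - make build   # placeholder build step
  artifacts:
    paths:
      - dist/
```

The runner then uploads an attestation file next to the artifact archive, so the provenance never leaves the build environment unsigned.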
There are three things you can watch forever: fire burning, water falling, and your build passing after the next commit. Nobody wants to wait long for CI to finish, so it's worth setting up a few tweaks to shorten the gap between a commit and its build status. Cache and artifacts to the rescue! They can drastically reduce the time it takes to run a pipeline.
People are often confused when they have to choose between cache and artifacts. GitLab's documentation is thorough, but the Node.js caching example and the pipeline template for Node.js contradict each other.
Each job starts with a clean slate and doesn't know the results of the previous one. If you don't use cache and artifacts, the runner will have to go to the internet or local registry and download the necessary packages when installing project dependencies.
We install the npm dependencies and use the cache described in the hidden dependencies_cache job. Then we specify how to update the cache via a pull-push policy. A short lifetime (1 hour) helps save artifact storage space: there is no need to keep the node_modules artifact on the GitLab server for long.
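A sketch of how this setup might look in `.gitlab-ci.yml` (the job names and cache key are illustrative):

```yaml
# Hidden job: shared cache definition, keyed on the lockfile
.dependencies_cache:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules
    policy: pull          # by default, jobs only read the cache

install_dependencies:
  extends: .dependencies_cache
  stage: prepare
  cache:
    policy: pull-push     # only this job is allowed to update the cache
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules
    expire_in: 1 hour     # short lifetime: later jobs in this pipeline are enough
```

Downstream jobs receive node_modules as an artifact, while the cache survives across pipelines to speed up the next install.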
We learned the difference between cache and artifacts and built a reproducible pipeline that works predictably and uses resources efficiently. This article showed some common mistakes and how to avoid them when setting up CI in GitLab.
I wish you green builds and fast pipelines. Would appreciate your feedback in the comments!
Instead of caching node_modules, consider caching npm's cache directory instead.
The difference is caching downloaded tar.gz files instead of thousands of small files. Despite GitLab's efforts, their caching mechanism sucks big time for a large amount of small files.
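A sketch of this approach, assuming npm's `npm_config_cache` environment variable is used to redirect the cache into the workspace (the cache key is illustrative):

```yaml
install_dependencies:
  variables:
    # npm honors npm_config_cache; keep the cache inside the project dir
    # so the runner can archive it
    npm_config_cache: "$CI_PROJECT_DIR/.npm"
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/        # a few hundred tarballs instead of thousands of tiny files
  script:
    - npm ci --prefer-offline   # reuse cached tarballs when possible
```

Zipping and unzipping a directory of tarballs is far cheaper than archiving a full node_modules tree.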
node_modules can be huge in the real world, making it unsuitable for artifacts, which are limited in size. Worth knowing: artifacts are also uploaded to the central GitLab instance, which can become a bottleneck for a large GitLab installation with lots of runners uploading to it.
Yes, this! My project's node_modules is 2GB and is too big for artifacts. What is the recommended solution to deal with that? I've had to include npm ci on every step to get my pipeline to work at all.
Notice on this one the build job comes before semantic-release. That means the older artifacts are replaced. What happens if the artifact in semantic-release and build conflict by path? Is this defined?
If we've specified artifacts or caches in our CI files, the job carries out two more tasks: pulling and/or pushing some files as caches or artifacts. The files are stored either in an object storage service like S3 or MinIO or in the container filesystem itself.
Caching is one of the most useful techniques for speeding up GitLab CI jobs. The GitLab documentation devotes an entire page to caching, with a good amount of detail about all the caching features and their use cases.
In the previous gitlab-ci.yaml file we can see that we need to install the yarn dependencies (node_modules) each time we execute a yarn command. This operation is redundant and costly in time, so it should be done at most once per pipeline execution.
By default, each time a job with a cache runs, it pulls the cache specified in its definition in gitlab-ci.yaml, executes the commands in script, and finally pushes any changes to the files under cache:paths back to the cache storage server. We can change this behavior via the cache policy.
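For a job that only reads dependencies and never modifies them, the push step can be skipped entirely. A sketch (job name and key are placeholders):

```yaml
test:
  cache:
    key: node-deps
    paths:
      - node_modules
    policy: pull    # download the cache, but skip the upload at the end
  script:
    - yarn test
```

With `policy: pull`, the job saves the time it would otherwise spend re-zipping and re-uploading an unchanged cache.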
Before being pushed or pulled, caches (and artifacts) are compressed using the zip algorithm. Starting from GitLab 13.6, we can use fastzip to zip/unzip our caches and artifacts. Even better, we can choose from five compression levels according to the speed/compression ratio we want to achieve (slowest, slow, default, fast, and fastest).
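These are controlled through runner variables; a sketch, assuming the `FF_USE_FASTZIP` feature flag and the compression-level variables are supported by your runner version:

```yaml
variables:
  FF_USE_FASTZIP: "true"              # feature flag: use the fastzip archiver
  CACHE_COMPRESSION_LEVEL: "fastest"  # caches are throwaway, favor speed
  ARTIFACT_COMPRESSION_LEVEL: "fast"  # artifacts get uploaded, balance size/speed
```

Setting these at the top level applies them to every job in the pipeline; they can also be set per job.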
So far, with gitlab-ci.yaml committed into a GitLab repository, GitLab CI/CD creates a pipeline on every change pushed to the repository. Depending on the project's requirements and organization, this is often not the desired behavior, since it creates many unwanted pipelines.
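One way to restrict when pipelines are created is a top-level `workflow:rules` block; a sketch (the chosen conditions are an example, not a recommendation):

```yaml
workflow:
  rules:
    # Run pipelines for merge requests
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
    # Run pipelines for pushes to the default branch
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
    # Everything else: no pipeline
```

Rules are evaluated top to bottom; if none matches, no pipeline is created for that event.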
Currently this has to be executed manually and it will allow you to migrate the existing artifacts to the object storage, but all new artifacts will still be stored on the local disk. In the future you will be given an option to define a default artifact storage for all new files.
Artifacts allow you to persist data after a job has completed, and share that data with another job in the same workflow. An artifact is a file or collection of files produced during a workflow run. For example, you can use artifacts to save your build and test output after a workflow run has ended. All actions and workflows called within a run have write access to that run's artifacts.
By default, GitHub stores build logs and artifacts for 90 days, and this retention period can be customized. For more information, see "Usage limits, billing, and administration." The retention period for a pull request restarts each time someone pushes a new commit to the pull request.
Storing artifacts uses storage space on GitHub. GitHub Actions usage is free for standard GitHub-hosted runners in public repositories, and for self-hosted runners. For private repositories, each GitHub account receives a certain amount of free minutes and storage for use with GitHub-hosted runners, depending on the account's plan. Any usage beyond the included amounts is controlled by spending limits. For more information, see "Managing billing for GitHub Actions."
You can use the upload-artifact action to upload artifacts. When uploading an artifact, you can specify a single file or directory, or multiple files or directories. You can also exclude certain files or directories, and use wildcard patterns. We recommend that you provide a name for an artifact, but if no name is provided then artifact will be used as the default name. For more information on syntax, see the actions/upload-artifact action.
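A sketch of an upload step showing multiple paths, a wildcard, and an exclusion pattern (the paths and artifact name are placeholders):

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: build-output          # explicit name instead of the default "artifact"
    path: |
      dist/
      reports/**/*.xml
      !reports/**/*.tmp         # "!" excludes files matched by earlier patterns
```

Paths are matched relative to the workspace, and the exclusion line removes matches from the preceding patterns.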
You can define a custom retention period for individual artifacts created by a workflow. When using a workflow to create a new artifact, you can use retention-days with the upload-artifact action. This example demonstrates how to set a custom retention period of 5 days for the artifact named my-artifact:
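The example referred to above might look like the following (the action version and file path are assumptions):

```yaml
- uses: actions/upload-artifact@v4
  with:
    name: my-artifact
    path: dist/output.log
    retention-days: 5   # override the repository's default retention period
```

The value must be between 1 day and the repository's configured maximum retention.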
After a workflow run has been completed, you can download or delete artifacts on GitHub or using the REST API. For more information, see "Downloading workflow artifacts," "Removing workflow artifacts," and the Artifacts REST API.
Jobs that depend on a previous job's artifacts must wait for that job to complete successfully. This workflow uses the needs keyword to ensure that job_1, job_2, and job_3 run sequentially. For example, job_2 requires job_1 using the needs: job_1 syntax.
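A sketch of such a chain, passing a file between jobs via artifacts (job contents are placeholders):

```yaml
jobs:
  job_1:
    runs-on: ubuntu-latest
    steps:
      - run: echo hello > output.txt
      - uses: actions/upload-artifact@v4
        with:
          name: output
          path: output.txt
  job_2:
    needs: job_1              # waits for job_1 to succeed
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: output
      - run: cat output.txt   # the file produced by job_1
  job_3:
    needs: job_2              # waits for job_2, making the chain sequential
    runs-on: ubuntu-latest
    steps:
      - run: echo done
```

Without `needs`, all three jobs would start in parallel and the downloads would fail because the artifacts would not exist yet.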
In the .gitlab-ci.yml file, you can define both the cache and artifacts keyword in your jobs. Although the official GitLab docs do explain the differences between caching and artifacts here, the following table might help you clarify a few things:
It is possible to combine the advantages of caches and artifacts by declaring both keywords in the same job. For instance, you can have a cache-warming job that makes use of the fact that caches persist across multiple pipeline runs. That very same job also declares an artifact (for the same paths as the cache) to reliably transport the files to other jobs (which only need to read these files anyway). Keep in mind, though, that the GitLab runner creates two zip files, one for the cache and one for the artifact, even though they have the exact same content in this scenario. Thus, the reliability of the artifact incurs extra costs: storage and (compression) time.
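A sketch of such a cache-warming job (names and stages are illustrative):

```yaml
warm_dependencies:
  stage: prepare
  cache:
    key:
      files:
        - yarn.lock
    paths:
      - node_modules
    policy: pull-push        # persists across pipelines to skip future installs
  script:
    - yarn install --frozen-lockfile
  artifacts:
    paths:
      - node_modules          # same paths as the cache, but delivery is guaranteed
    expire_in: 1 hour         # downstream jobs read it within this pipeline
```

Downstream jobs rely on the artifact, which is guaranteed to arrive, while the cache merely makes the next pipeline's install faster if it happens to be available.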
In order for the quality gate to fail on the GitLab side when it fails on the SonarQube side, the scanner needs to wait for the SonarQube quality gate status. To enable this, set the sonar.qualitygate.wait=true parameter in the .gitlab-ci.yml file.
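A sketch of such a scanner job (image tag and job name are assumptions; `sonar.qualitygate.wait` is the documented parameter):

```yaml
sonarqube-check:
  image: sonarsource/sonar-scanner-cli:latest
  variables:
    GIT_DEPTH: "0"   # full clone so the scanner can see all history
  script:
    # Fail this job if the SonarQube quality gate fails
    - sonar-scanner -Dsonar.qualitygate.wait=true
  allow_failure: false
```

With `wait=true` the scanner polls SonarQube for the gate result and exits non-zero on failure, which in turn fails the GitLab job.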
So all the steps are pretty much done, but the script only handles one environment, which is not enough in most cases. All the deployment targets in the gitlab-ci.yml file are taken from variables that were set in the GitLab UI. That's good, because those variables support scoping to different environments. But how to define that? It's just a matter of adding an environment: section to a job definition. Seems easy? Good, because it is.
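A sketch of a deploy job with an `environment:` section (the script and variable name `DEPLOY_TARGET` are placeholders for variables scoped per environment in the GitLab UI):

```yaml
deploy_staging:
  stage: deploy
  script:
    # DEPLOY_TARGET resolves to the value scoped to "staging" in the UI
    - ./deploy.sh "$DEPLOY_TARGET"
  environment:
    name: staging
```

Because the job declares `environment: name: staging`, GitLab injects the variable values whose scope matches that environment.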
This looks okay...ish: the deployment can now be done to more than one environment. But the gitlab-ci.yml file has gotten really long, because the deployment steps are repeated for each environment. The readability of the file has dropped enormously, not to mention how difficult it will be to maintain in the future: with more environments, it will only get worse.
More documentation about the feature is available at Use extends to reuse configuration sections (there's even a possibility to split the configuration into several files!). Let's apply this to the gitlab-ci.yaml file:
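A sketch of the deduplicated file, assuming a hidden `.deploy_template` job holds the shared steps (names and the `DEPLOY_TARGET` variable are illustrative):

```yaml
# Hidden job: shared deployment logic, never run directly
.deploy_template:
  stage: deploy
  script:
    - ./deploy.sh "$DEPLOY_TARGET"

deploy_staging:
  extends: .deploy_template
  environment:
    name: staging

deploy_production:
  extends: .deploy_template
  environment:
    name: production
  when: manual      # require a human to trigger production deploys
```

Each concrete job only states what differs (the environment and trigger mode); everything else comes from the template.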