# Scanning a docker image

## Xygeni docker image scan

Xygeni can identify **vulnerabilities in Docker images**.

{% hint style="info" %}
Not all scan commands are available for image scanning. See [scan commands available for image scanning](https://docs.xygeni.io/xygeni-scanner-cli/xygeni-cli-overview/xygeni-cli-operation-modes/single-scan/..#scan-commands-available-for-image-scanning) for the list.
{% endhint %}

<figure><img src="https://4096647782-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FUTz59rJLkJBjiRWAMknU%2Fuploads%2FcxbGvKP5qpuNv2Khi6uT%2Fimage.png?alt=media&#x26;token=dee0421b-516f-4ce7-90f7-e2175d079a0b" alt=""><figcaption></figcaption></figure>

To scan a container image:

```bash
xygeni [command] --image my_image:latest
```

### Docker image

Use `--image` to pass the image name, including the registry host, the namespace and the repository, plus either the tag or the image SHA digest.

```
Container image options:
      --image=<image>        The container image, in registry/repository/image:tag format.
                             Examples: debian, alpine:latest, cgr.dev/chainguard/go,
                             gcr.io/google-containers/python@sha256:fe...4b
      --image-platform=<platform>
                             The image platform in the form os/arch, if image is multi-platform.
      --image-sources=sources
                             The image source(s) to use, comma-separated in order.
                             Defaults to docker, containerd, podman, remote.
      --image-scope=<scope>  How layers are analyzed. One of merged, mergedExceptBase, byLayer,
                             byLayerExceptBase. Default: merged.
```

{% hint style="info" %}
The **image name** follows the `[HOST[:PORT_NUMBER]/][NAMESPACE/]REPOSITORY[:TAG|@DIGEST]` convention. See the [docker convention](https://docs.docker.com/reference/cli/docker/image/tag/) for more details.
{% endhint %}
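
As a rough illustration of the naming convention above, the following sketch extracts the registry host from an image name. It relies on Docker's rule that the first path component is treated as a host only when it contains a `.` or a `:`, or is `localhost`; otherwise Docker Hub (`docker.io`) is assumed. The function name `registry_host` is hypothetical, and this is not necessarily how the scanner itself parses image names.

```bash
# Hypothetical helper: print the registry host implied by an image name.
registry_host() {
  image=$1
  case "$image" in
    */*) first=${image%%/*} ;;   # text before the first slash
    *)   first='' ;;             # no slash: bare repository such as "debian"
  esac
  case "$first" in
    *.*|*:*|localhost) printf '%s\n' "$first" ;;     # looks like HOST[:PORT]
    *)                 printf '%s\n' 'docker.io' ;;  # default registry
  esac
}

registry_host alpine:latest            # docker.io
registry_host cgr.dev/chainguard/go    # cgr.dev
registry_host localhost:5000/app:1.0   # localhost:5000
```

The resulting host is the value that a `hostname` entry in the registry configuration would need to match.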

For multi-platform images, pass `--image-platform OS/ARCH` to select the platform. When it is not given, the platform where the scanner runs is used.
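
The sketch below shows one way to make the platform explicit instead of relying on the default. The `host_platform` helper is hypothetical: it maps `uname` output to an `os/arch` string, which may not match the scanner's own detection exactly. It also assumes that `secrets` is one of the commands available for image scanning.

```bash
# Hypothetical helper: map `uname` output to an os/arch string.
host_platform() {
  os=$(uname -s | tr '[:upper:]' '[:lower:]')
  arch=$(uname -m)
  case "$arch" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
  esac
  printf '%s/%s\n' "$os" "$arch"
}

cmd=secrets   # assumption: replace with any command that supports image scanning

# Print the commands instead of running them; remove `echo` to run the scan.
echo xygeni "$cmd" --image alpine:latest --image-platform "$(host_platform)"
echo xygeni "$cmd" --image alpine:latest --image-platform linux/arm64
```

Pinning the platform is mostly useful when the scanner runs on one architecture but the deployed image targets another.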

### Docker image Sources

{% hint style="info" %}
Use `--image-sources` to specify the comma-separated list of sources from which the image may be pulled.
{% endhint %}

The following are the supported **sources**:

* `docker`: the local [docker engine](https://docs.docker.com/engine/) will be used when available.
* `containerd`: the containerd daemon, via [nerdctl](https://github.com/containerd/nerdctl), will be used when available.
* `podman`: the [podman](https://podman.io/) cli will be used when available.
* `remote`: pull image directly from a remote OCI registry, using the [OCI distribution api](https://github.com/opencontainers/distribution-spec/blob/v1.0.1/spec.md).
* `tarball:<path>`: when the image contents are available locally, the path should point to the directory or tar file with the image contents. The image ("tarball") is expected to be in the OCI format.

By default, the scanner tries `docker` first, then `containerd`, then `podman`, and finally `remote`.

{% hint style="info" %}
Scans may run faster when the image has already been pulled locally, so a local runtime is usually the better choice in that case.
{% endhint %}

The `remote` source needs credentials for remote registries. See [container registry configuration](#container-registry-configuration) for details. The other image sources require an authenticated session in the underlying runtime: for example, `docker login` for the docker engine, `nerdctl login` for containerd, or `podman login` for podman.
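
As an illustration of how a source list can be assembled, the sketch below keeps the default order but includes only the local runtimes whose command-line tool is on the `PATH`. This is a rough proxy: a tool being installed does not guarantee that its daemon is running. The `available_sources` helper is hypothetical, and `secrets` is assumed to be a command available for image scanning.

```bash
# Hypothetical helper: list the sources usable on this machine, in the default order.
available_sources() {
  sources=''
  for pair in docker:docker containerd:nerdctl podman:podman; do
    name=${pair%%:*}   # source name understood by --image-sources
    tool=${pair#*:}    # command-line tool used as an availability check
    if command -v "$tool" >/dev/null 2>&1; then
      sources="${sources}${name},"
    fi
  done
  printf '%s\n' "${sources}remote"   # remote needs no local tool
}

# Print the command instead of running it; remove `echo` to run the scan.
echo xygeni secrets --image alpine:latest --image-sources "$(available_sources)"
```

Passing `--image-sources remote` alone skips the local runtimes entirely, which can be convenient on CI runners that have no container engine installed.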

### Docker image Layers

As container images are made of **layers**, for some scans, such as secrets, it is convenient to scan the contents of each layer separately.

{% hint style="info" %}
The `--image-scope` option controls which layers to consider and how the scan proceeds: either layer by layer, or on the merged filesystem that combines all layers.
{% endhint %}

| --image-scope value | Mode of operation   | Layers to process     |
| ------------------- | ------------------- | --------------------- |
| merged              | combined filesystem | all                   |
| mergedExceptBase    | combined filesystem | all except base image |
| byLayer             | layer by layer      | all                   |
| byLayerExceptBase   | layer by layer      | all except base image |
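
A minimal sketch of choosing a scope from the table above follows. The `valid_scope` helper is hypothetical and only guards against typos in the value; `secrets` is assumed to be a command available for image scanning.

```bash
# Hypothetical helper: accept only the documented --image-scope values.
valid_scope() {
  case "$1" in
    merged|mergedExceptBase|byLayer|byLayerExceptBase) return 0 ;;
    *) return 1 ;;
  esac
}

scope=byLayerExceptBase   # layer by layer, skipping the base image layers

if valid_scope "$scope"; then
  # Print the command instead of running it; remove `echo` to run the scan.
  echo xygeni secrets --image my_image:latest --image-scope "$scope"
else
  echo "unknown scope: $scope" >&2
fi
```

The values are case-sensitive in this sketch, mirroring how they are written in the option help.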

### Container Registry configuration

The configuration file `xygeni.yml` contains sections where each external system is configured.

Because the scanner usually runs without user interaction, for example in a CI/CD pipeline, authentication typically relies on ephemeral access tokens generated by an authentication workflow (SAML, OIDC and JSON Web Tokens are common in CI/CD systems).

Once a valid access token is available, it is often stored in a configuration file, environment variable, or in a secret vault managed by the CI/CD system. Configuration for an external system uses a *token source* that fetches the token from a list of environment variables or files.
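
The lookup order of a token source can be pictured with the sketch below, which tries each `env:NAME` or `file:PATH` entry in turn and stops at the first non-empty value. The `resolve_token` function is hypothetical and only mirrors the behaviour described above; it is not the scanner's implementation, and it does not cover tokens encoded directly in the configuration.

```bash
# Hypothetical helper: print the first token found among the given sources.
# Each source is either env:NAME or file:PATH, tried in the order given.
resolve_token() {
  for src in "$@"; do
    value=''
    case "$src" in
      env:*)
        name=${src#env:}
        case "$name" in
          ''|[0-9]*|*[!A-Za-z0-9_]*) ;;    # skip names that are not plain identifiers
          *) eval "value=\${$name:-}" ;;   # read the variable whose name is in $name
        esac
        ;;
      file:*)
        path=${src#file:}
        if [ -r "$path" ]; then value=$(cat "$path"); fi
        ;;
    esac
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1   # no source produced a token
}

# The environment variable wins over the file when both are present.
if token=$(resolve_token env:DOCKER_TOKEN "file:$HOME/.docker.token"); then
  echo "token found"
else
  echo "no token found"
fi
```

Listing the environment variable first suits CI/CD systems that inject secrets as variables, while the file entry serves as a fallback on developer machines.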

For pulling images from container registries, or storing attestations in OCI registries, the `containerRegistry` section configures the location and sources of access tokens for each registry. For example, for Docker Hub:

```yaml
# Container (OCI) Registries
containerRegistry:
  -
    # Docker Hub
    # The hostname to match in the image name. This is the default when no hostname provided.
    hostname: docker.io
    # Docker registry official URL
    url: 'https://registry-1.docker.io'
    # Which projects use this registry? A hostname is often given in the image name.
    # Use a regex pattern, like 'project1|project2|project3' or 'prefix_.*'
    # Leave empty for matching by hostname.
    usedBy: ''
    # The username to connect to the registry api.
    user: null
    # How the access token should be fetched:
    # From environment/system property (env:),
    # From file (use ${scanned.dir} for scanned directory, ${XYGENI_DIR} for scanner directory, ${user.home} for $HOME),
    # or encode directly (use encryption to protect the token against casual readers)
    tokenSources:
      - env:DOCKER_TOKEN
      - file:${user.home}/.docker.token
  -
    # AWS ECR Public
    # The hostname to match in the image name.
    # For private registry, copy this using <aws_account>.dkr.ecr.<region>.amazonaws.com as hostname,
    # or use wildcards like *.dkr.ecr.*.amazonaws.com
    hostname: public.ecr.aws
    # ECR Public registry URL
    url: 'https://public.ecr.aws'
    # Which projects use this registry? A hostname is often given in the image name.
    # Use a regex pattern, like 'project1|project2|project3' or 'prefix_.*'
    # Leave empty for matching by hostname.
    usedBy: ''
    # The username to connect to the registry api. ECR uses a fixed name.
    user: null
    # How the access token should be fetched:
    # From environment/system property (env:),
    # From file (use ${scanned.dir} for scanned directory, ${XYGENI_DIR} for scanner directory, ${user.home} for $HOME),
    # or encode directly (use encryption to protect the token against casual readers)
    tokenSources:
      - env:AWS_ECR_TOKEN
      - file:${user.home}/.aws_ecr.token
  -
    # Azure CR
    # The hostname for public Microsoft Container Registry.
    # For Azure CR, replace with your own <org-registry>.azurecr.io, or use the wildcard *.azurecr.io
    hostname: mcr.microsoft.com
    # public CR. Leave blank to reuse private hostname
    url: 'https://mcr.microsoft.com'
    # Which projects use this registry? A hostname is often given in the image name.
    # Use a regex pattern, like 'project1|project2|project3' or 'prefix_.*'
    # Leave empty for matching by hostname.
    usedBy: ''
    # The username to connect to the registry api.
    user: null
    # How the access token should be fetched:
    # From environment/system property (env:),
    # From file (use ${scanned.dir} for scanned directory, ${XYGENI_DIR} for scanner directory, ${user.home} for $HOME),
    # or encode directly (use encryption to protect the token against casual readers)
    tokenSources:
      - env:AZURE_CR_TOKEN
      - file:${user.home}/.azure_cr.token
  -
    # Google CR
    # Transitioning to pkg.dev. You may need to change for the hostname of your private repository
    # google-containers and distroless are popular public repositories
    hostname: gcr.io
    url: 'https://gcr.io'
    usedBy: ''
    user: null
    tokenSources:
      - env:GCR_TOKEN
      - file:${user.home}/.gcr.token
  -
    # GitHub CR
    # ghcr.io is the hostname for GitHub container registry
    hostname: ghcr.io
    url: 'https://ghcr.io'
    usedBy: ''
    user: null
    tokenSources:
      - env:GITHUB_TOKEN
      - env:GITHUB_PAT
      - file:${user.home}/.github.token
  -
    # GitLab CR
    # registry.gitlab.com is the hostname for GitLab CR
    hostname: registry.gitlab.com
    url: ''
    usedBy: ''
    user: null
    tokenSources:
      # Perhaps a restricted token with read_registry permissions could be used here
      - env:GITLAB_TOKEN
      - env:GITLAB_PAT
      - file:${user.home}/.gitlab.token
  -
    # JFrog Artifactory. You may similarly configure Sonatype Nexus, etc.
    # The hostname for registries for on-cloud Artifactory, to match your own
    # You may write your own if needed, like docker.artifactory.your_domain, etc.
    hostname: '*.jfrog.io'
    url: ''
    usedBy: ''
    user: null
    tokenSources:
      # Perhaps a restricted token with read_registry permissions could be used here
      - env:JFROG_TOKEN
      - file:${user.home}/.jfrog.token
```
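
Several `hostname` entries above use wildcards, such as `*.jfrog.io`. The sketch below assumes these behave like shell-style globs, which is an assumption rather than documented scanner behaviour, and uses a hypothetical `host_matches` helper to check which registry hosts a pattern would cover.

```bash
# Hypothetical helper: succeed when HOST matches a configured hostname PATTERN.
host_matches() {   # usage: host_matches PATTERN HOST
  case "$2" in
    $1) return 0 ;;   # left unquoted so that * acts as a wildcard
    *)  return 1 ;;
  esac
}

for host in acme.jfrog.io ghcr.io; do
  if host_matches '*.jfrog.io' "$host"; then
    echo "$host matches *.jfrog.io"
  else
    echo "$host does not match *.jfrog.io"
  fi
done
```

A pattern without wildcards, such as `ghcr.io`, matches only that exact host.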
