Error on data_setting.yml

이지원

Jan 25, 2022, 2:29:32 AM
to cpax_forum
Hi all,

I want to use the C-PAC command-line interface (CLI) with the ABIDE dataset.

My data_setting.yml file is ready to use.

$ cpac utils data_config build ~/abide_data_setting.yml

After running this command, I got the error shown below.

p1.png

How can I generate the data_config file without this error?

Thank you.
abide_data_setting.yml

Jon Clucas, MIS

Jan 25, 2022, 11:04:53 AM
to cpax_forum
Hi,

Thanks for reporting this issue! Your data settings file specifies

# Directory where CPAC should place data configuration files.
outputSubjectListLocation: /media/12T/ABIDE 

but cpac isn't automatically binding that path to the Docker container (I'll open an issue so a future version will).

I believe you can just add a custom binding (like cpac -B /media/12T/ABIDE:/media/12T/ABIDE utils data_config build ~/abide_data_setting.yml) and cpac utils data_config build should function. Alternatively, if you run cpac utils data_config build ~/abide_data_setting.yml from inside /media/12T/ABIDE or a parent directory, that directory should bind automatically and cpac utils data_config build should function.

Please let us know how it goes.

이지원

Jan 28, 2022, 12:40:57 AM
to cpax_...@googlegroups.com
Hi Jon Clucas,

Thank you very much for your reply.

I succeeded in getting the data_config file!

I have some questions about C-PAC.

1. cpac -B 
/media/12T/ABIDE:/media/12T/ABIDE 
utils data_config build ~/abide_data_setting.yml
I'm having a hard time understanding this command, and I can't find a description of its format, so please explain it (especially the second line).

2. As you said, if I run cpac utils data_config build ~/abide_data_setting.yml from inside /media/12T/ABIDE, the directory binds automatically.
When I move to the /media/12T/ABIDE directory and then run cpac, how can it bind automatically? I'd like to know why running from the home directory differs from running from the /media/12T/ABIDE directory.

2-1. After I got the data_config file, I ran C-PAC from my home directory but got an error.
~$ cpac run /media/12T/ABIDE/any_dir /media/12T/ABIDE/case1_s3_default_output participant --data_config_file /media/12T/ABIDE/data_config_abide.yml
So I moved to the /media/12T/ABIDE directory (cd /media/12T/ABIDE/) and ran the same command again.
/media/12T/ABIDE$ cpac run /media/12T/ABIDE/any_dir /media/12T/ABIDE/case1_s3_default_output participant --data_config_file /media/12T/ABIDE/data_config_abide.yml
This time the run succeeded.
As I mentioned above, I'd like to know why the results differ between running from the home directory and running from the /media/12T/ABIDE directory.

3. 
Finally, to run C-PAC with a specific data configuration file (instead of providing a BIDS data directory):
cpac run /Users/You/any_directory /Users/You/some_folder_for_outputs participant --data_config_file /Users/You/Documents/data_config.yml
In this command line, what is any_directory's role?
After I run it, a new directory called  'any_di'  has been created separate from /media/12T/AB/Users/You/any_directory.

I look forward to your detailed explanation.

Thanks for your kind reply,
LEE

success.jpg
fail.jpg

Jon Clucas, MIS

Jan 28, 2022, 3:14:10 PM
to cpax_forum
I'm glad you had success, and thanks for your clarifying questions!

1. cpac -B 
/media/12T/ABIDE:/media/12T/ABIDE
utils data_config build ~/abide_data_setting.yml
I'm having a hard time understanding this command, and I can't find a description of its format, so please explain it (especially the second line).

Some excerpts from  cpac --help:

usage: cpac [-h] [--version] [-o OPT] [-B CUSTOM_BINDING]
            [--platform {docker,singularity}] [--image IMAGE] [--tag TAG]
            [--working_dir PATH] [-v] [-vv]
            {run,group,utils,pull,upgrade,crash} ...

cpac: a Python package that simplifies using C-PAC <http://fcp-indi.github.io> containerized images.

This commandline interface package is designed to minimize repetition.
As such, nearly all arguments are optional.

When launching a container, this package will try to bind any paths mentioned in
 • the command
 • the data configuration

[…]
optional arguments:
[…]
  -B CUSTOM_BINDING, --custom_binding CUSTOM_BINDING
                        directories to bind with a different path in
                        the container than the real path of the directory.
                        One or more pairs in the format:
                                real_path:container_path
                        (eg, /home/C-PAC/run5/outputs:/outputs).
                        Use absolute paths for both paths.
                       
                        This flag can take multiple arguments so cannot be
                        the final argument before the command argument (i.e.,
                        run or any other command that does not start with - or --)


The talk about bind paths is containerization lingo; C-PAC runs in a container (Docker or Singularity; from your logs I can see you're running in Docker) which is a separate operating environment from your local machine. Bind paths are essentially tunnels between your local environment and your container.

The log in the screenshot attached to the first message in this thread shows these bindings for cpac utils data_config build ~/abide_data_setting.yml:

local              Docker     mode
-----------------  ---------  ------
/home/djk          /home/djk  rw
/home/djk          /tmp       rw
/home/djk/outputs  /output    rw
/home/djk/log      /crash     rw

You can visualize your bind paths like

first try.png

We should clarify that cpac also binds the current working directory to the container at the same path by default. If no working directory is specified, the current working directory is assumed to be the desired working directory (hence the binding to /tmp).

The custom binding flag (-B /media/12T/ABIDE:/media/12T/ABIDE in the format "-B real_path:container_path") tells Docker to include the local path /media/12T/ABIDE (the left of the :) in the container at the path /media/12T/ABIDE (the right of the :). Assuming /media is a separate device from /home, that binding adds a tunnel to your Docker container like

-B ∕media∕12T∕ABIDE˸∕media∕12T∕ABIDE.png

(if your /media is on the same drive as /home, you'd just have all four tunnels pointing to paths on the top-left device)
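To make the "real_path:container_path" format concrete, here is a minimal Python sketch that splits one such pair and checks that both sides are absolute, as the help text asks. The function name parse_binding is hypothetical and only for illustration; it is not cpac's own parser.

```python
from pathlib import PurePosixPath


def parse_binding(spec: str) -> tuple[str, str]:
    """Split 'real_path:container_path' into (local, container).

    Hypothetical helper for illustration only; not part of cpac.
    """
    # Everything left of the first ':' is the local path,
    # everything right of it is the path inside the container.
    local, sep, container = spec.partition(":")
    if not sep:
        raise ValueError(f"expected real_path:container_path, got {spec!r}")
    for path in (local, container):
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"use absolute paths for both sides: {path!r}")
    return local, container


local, container = parse_binding("/media/12T/ABIDE:/media/12T/ABIDE")
print(f"local {local} -> container {container}")
```

The two sides do not have to match: with the example from the help text, /home/C-PAC/run5/outputs:/outputs, the local directory would appear in the container as /outputs.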

2. As you said, if I run cpac utils data_config build ~/abide_data_setting.yml from inside /media/12T/ABIDE, the directory binds automatically.
When I move to the /media/12T/ABIDE directory and then run cpac, how can it bind automatically? I'd like to know why running from the home directory differs from running from the /media/12T/ABIDE directory.

Something I think isn't sufficiently clear in our documentation is that cpac assumes your present working directory is the working directory you want to use. Since your data settings file specifies that you want to save outputs in /media/12T/ABIDE, a container launched from within that directory would have bindings like

after cd.png

and be able to write to that path, but that path won't exist in the container if the container is launched from a working directory outside of /media.

3. Finally, to run C-PAC with a specific data configuration file (instead of providing a BIDS data directory):
cpac run /Users/You/any_directory /Users/You/some_folder_for_outputs participant --data_config_file /Users/You/Documents/data_config.yml
In this command line, what is any_directory's role?
After I run it, a new directory called  'any_di'  has been created separate from /media/12T/AB/Users/You/any_directory.

Some excerpts from cpac run --help:

usage: run.py [-h] [--pipeline_file PIPELINE_FILE] [--group_file GROUP_FILE]
              [--data_config_file DATA_CONFIG_FILE] [--preconfig PRECONFIG]
              [--aws_input_creds AWS_INPUT_CREDS]
              [--aws_output_creds AWS_OUTPUT_CREDS] [--n_cpus N_CPUS]
              [--mem_mb MEM_MB] [--mem_gb MEM_GB]
              [--num_ants_threads NUM_ANTS_THREADS]
              [--save_working_dir [SAVE_WORKING_DIR]] [--disable_file_logging]
              [--participant_label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]]
              [--participant_ndx PARTICIPANT_NDX] [--T1w_label T1W_LABEL]
              [--bold_label BOLD_LABEL [BOLD_LABEL ...]] [-v]
              [--bids_validator_config BIDS_VALIDATOR_CONFIG]
              [--skip_bids_validator] [--anat_only] [--tracking_opt-out]
              [--monitoring]
              bids_dir output_dir {participant,group,test_config,cli}

[…]
positional arguments:
  bids_dir              The directory with the input dataset formatted
                        according to the BIDS standard. Use the format
                        s3://bucket/path/to/bidsdir to read data directly from
                        an S3 bucket. This may require AWS S3 credentials
                        specified via the --aws_input_creds option.
  output_dir            The directory where the output files should be stored.
                        If you are running group level analysis this folder
                        should be prepopulated with the results of the
                        participant level analysis. Use the format
                        s3://bucket/path/to/bidsdir to write data directly to
                        an S3 bucket. This may require AWS S3 credentials
                        specified via the --aws_output_creds option.
  {participant,group,test_config,cli}
                        Level of the analysis that will be performed. Multiple
                        participant level analyses can be run independently
                        (in parallel) using the same output_dir. test_config
                        will run through the entire configuration process but
                        will not execute the pipeline.

C-PAC is a BIDS app, which requires 3 positional arguments:

"〉 bids_dir—(positional argument #1) the directory with the input dataset formatted according to the BIDS standard. This directory is read only.

〉 output_dir—(positional_argument #2) the directory where the output files should be stored. This is the only directory the pipeline should write to. Can be used to store intermediate files, but they should be removed after the pipeline finishes. This directory is shared across all of the participant level jobs—it’s up to the script to create subfolders for each subject.

〉  “participant”—(positional_argument #3) indicates that this is a participant level analysis."

So even though the data configuration file supersedes the positional bids_dir, that argument is still required.


"Note: we are still providing the positionally-required bids_dir input parameter. However C-PAC will not look for data in this directory when you provide a data configuration YAML with the --data_config_file flag. Providing . or $PWD will simply pass the present working directory. In addition, if the dataset in your data configuration file is not in BIDS format, just make sure to add the --skip_bids_validator flag at the end of your command to bypass the BIDS validation process."

2-1. After I got the data_config file, I ran C-PAC from my home directory but got an error.
~$ cpac run /media/12T/ABIDE/any_dir /media/12T/ABIDE/case1_s3_default_output participant --data_config_file /media/12T/ABIDE/data_config_abide.yml
So I moved to the /media/12T/ABIDE directory (cd /media/12T/ABIDE/) and ran the same command again.
/media/12T/ABIDE$ cpac run /media/12T/ABIDE/any_dir /media/12T/ABIDE/case1_s3_default_output participant --data_config_file /media/12T/ABIDE/data_config_abide.yml
This time the run succeeded.
As I mentioned above, I'd like to know why the results differ between running from the home directory and running from the /media/12T/ABIDE directory.

I switched the order of these last two questions because this one is partially answered in the previous answer:

"〉 bids_dir—(positional argument #1) the directory with the input dataset formatted according to the BIDS standard. This directory is read only." [emphasis added]

Running from your home directory with /media/12T/ABIDE/any_dir as the first positional argument performs this sequence:
  1. Bind /media/12T/ABIDE/any_dir to a read-only directory in your container at /media/12T/ABIDE/any_dir. Because none of that path exists yet in the container, the whole tree (/media/12T/ABIDE, /media/12T, and /media) is read-only.
  2. Bind /media/12T/ABIDE/case1_s3_default_output to a read-write directory in your container at /media/12T/ABIDE/case1_s3_default_output. Because /media/12T/ABIDE in the container is read-only, this bind fails.
  3. Because Docker failed and the platform wasn't specified, cpac tries to do the same thing with Singularity instead of Docker ("By default, cpac (the wrapper) will try Docker first and fall back to Singularity if Docker fails. If both fail, an exception is raised."). Because this crash is not platform-specific, Singularity behaves the same as Docker (binds the read-only directory, fails to bind the read-write subdirectory).
Running from /media/12T/ABIDE, however, performs this sequence:
  1. Bind /media/12T/ABIDE to /media/12T/ABIDE and to /tmp inside the container (both paths are read-write).
  2. Bind /media/12T/ABIDE/any_dir to a read-only directory in your container at /media/12T/ABIDE/any_dir. Because the parent directory already exists, the parent tree keeps its existing permissions.
  3. Bind /media/12T/ABIDE/case1_s3_default_output to a read-write directory in your container at /media/12T/ABIDE/case1_s3_default_output. Because /media/12T/ABIDE in the container is read-write, this bind succeeds.
  4. [Continue processing]
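
The two sequences above can be replayed with a toy Python model. This is only a sketch of the explanation in this thread, not of how Docker, Singularity, or cpac implement mounts. It assumes two simplified rules: a bind creates any missing parent directories with its own mode, and a bind fails if its nearest existing parent is read-only. The function simulate and the exact bind order are assumptions for illustration.

```python
def simulate(binds):
    """Apply (container_path, mode) binds in order and report the outcome.

    Toy rules (assumptions, not real container behaviour):
      * missing parent directories inherit the mode of the bind that creates them
      * a new bind under a read-only parent fails
    """
    modes = {}  # container path -> "ro" or "rw"
    for path, mode in binds:
        parts = path.strip("/").split("/")
        # Ancestors ordered from shallowest to deepest, excluding the path itself.
        ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
        existing = [a for a in ancestors if a in modes]
        if existing and modes[existing[-1]] == "ro" and path not in modes:
            return False, f"cannot bind {path}: {existing[-1]} is read-only"
        for ancestor in ancestors:
            modes.setdefault(ancestor, mode)
        modes[path] = mode
    return True, "all binds succeeded"


bids_dir = ("/media/12T/ABIDE/any_dir", "ro")
output_dir = ("/media/12T/ABIDE/case1_s3_default_output", "rw")

# Launched from the home directory: the read-only bids_dir creates /media/... first.
from_home = simulate([("/home/djk", "rw"), ("/tmp", "rw"), bids_dir, output_dir])

# Launched from /media/12T/ABIDE: the working directory is bound read-write first.
from_abide = simulate([("/media/12T/ABIDE", "rw"), ("/tmp", "rw"), bids_dir, output_dir])

print(from_home)
print(from_abide)
```

Under these assumed rules, the only difference between the two runs is which bind creates /media/12T/ABIDE in the container first.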

If you want more control over the bindings, you can use a C-PAC Docker image without using the cpac wrapper. The positional arguments are still the same and still required, but you have to specify every binding yourself; none will bind automatically.

Please let us know if you need further clarification!

이지원

Jan 29, 2022, 6:26:28 AM
to cpax_...@googlegroups.com
Hi Jon Clucas,
Thank you for your kind reply.

After reading your reply, I have some more questions.

1. When I run a command such as cpac run or cpac utils, is the container created in the current working directory? And is the current working directory bound to the container at the same path?
If I run from /media/12T/ABIDE,

sf.png
is the container created in /media/12T/ABIDE?
Is /media/12T/ABIDE bound to /media/12T/ABIDE in my container?

2. I didn't fully understand this part of your answer to question 1:
if no working directory is specified, the current working directory is assumed to be the desired working directory (hence binding to /tmp)
Is /tmp the working directory inside the container?

3. output_dir (positional argument #2, the directory where the output files should be stored):
is output_dir a read-write directory?

4. When the container is created, are all bound directories (like /media/12T/ABIDE, /tmp, /output, /log) read-write?

5. I understood your answer to question 2-1 as follows. Is that right?
 Running from the home directory, the binding failed because /media/12T/ABIDE in the container is read-only.
 Running from /media/12T/ABIDE, the binding succeeded because /media/12T/ABIDE in the container is read-write.

I look forward to your detailed explanation again.

Thank you,
LEE





이지원

Jan 29, 2022, 6:48:55 AM
to cpax_forum
6. When I run cpac utils data_config build ~/abide_data_setting.yml from inside /media/12T/ABIDE, can the container see /media, /media/12T, /media/12T/ABIDE, /media/12T/ABIDE/subdirectory, and so on?
I understand that the container can't see anything outside of /media, but how does it work in the environment described above?

Thank you,
LEE


Jon Clucas, MIS

Feb 1, 2022, 12:11:45 PM
to cpax_forum

1. “Containers isolate software from its environment.” The containers are created somewhere not intended for human use. That location depends on your container platform configuration, and you typically shouldn’t need to know or care where the container is created (one exception: if the container location is on a partition with inadequate space).

The root tree in the current latest C-PAC container image looks like

.
├── ants_template
├── bin
├── boot
├── code
├── cpac_resources
├── cpac_templates
├── dev
├── etc
├── home
├── lib
├── lib64
├── media
├── mnt
├── ndmg_atlases
├── opt
├── proc
├── root
├── run
├── sbin
├── srv
├── sys
├── tmp
├── @update.afni.binaries
├── usr
└── var

and /media is empty. cpac (the command line interface) tries to bind all the local paths you’ll need in your container, mostly at matching names (the path names don’t have to match). When a local path is bound to a path in a container, any files or subdirectories in that local path are also accessible at that path in the container and vice versa. Any changes locally are reflected in the container, and any changes in the container are reflected locally but only at bound paths. When you run cpac, your terminal will print a little table at the top that tells you which paths locally are bound to which paths in the container and what the access mode is for the binding.

For any unspecified paths that cpac thinks C-PAC might need, cpac assumes the current working directory, so in most cases the current working directory is bound at runtime by cpac in a C-PAC container to a matching path and to C-PAC's working directory.

2. The in-container working directory is defined in the pipeline configuration under pipeline_setup.working_directory.path. For the default configuration, this path is set to /tmp (and in most cases this path shouldn’t matter from a user's perspective).

3. According to the BIDS-app specification, output_dir (positional argument #2) “is the only directory the pipeline should write to,” so this directory has to be writable.

4. By default, bids_dir (positional argument #1) is read-only, output_dir (positional argument #2) and the logging and working directories are read-write, and any custom bindings that aren’t specified read-only are defaulted to read-write.

5. Yes.

Running from the home directory with /media/12T/ABIDE/any_dir as the read-only bids_dir makes /media/12T and its descendant directories read-only, preventing the output directory, a child of /media/12T, from being created. If the output directory (/media/12T/ABIDE/case1_s3_default_output) already existed, that run may have succeeded. I’m not sure if a child of a local directory bound read-only can be bound read-write ― the failure you reported was the attempt to create a read-write child in a read-only directory.

When running from /media/12T/ABIDE, that directory is bound read-write to paths in the container that aren’t otherwise specified (itself as the current working directory and /tmp as the C-PAC working directory), so cpac is able to create a child directory within that path.

We (the development team) could resequence these bindings so that read-only directories are bound last, but I'm concerned that would make accidentally writing to a directory that should be read-only too easy.

6. cpac utils data_config build binds your current working directory to a matching path and to /output inside the container.

n3iBUmThAQAJ-6.png

(The table that prints to terminal will say

Loading 🐳 Docker
Loading 🐳 fcpindi/c-pac:latest with these directory bindings:
  local             Docker            mode
  ----------------  ----------------  ------
  /media/12T/ABIDE  /media/12T/ABIDE  rw
  /media/12T/ABIDE  /output           rw
Logging messages will refer to the Docker paths.

)

In your container your root file tree will include

/
├── media
│   └── 12T
│       └── ABIDE
│           └── « files and directories in local /media/12T/ABIDE »
└── output
    └── « files and directories in local /media/12T/ABIDE »

And any changes in your local /media/12T/ABIDE or your in-container /media/12T/ABIDE or your in-container /output will be reflected in all three locations. Any other paths in your local /media/12T or local /media will be unknown in your container environment.
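
Because the log messages refer to the Docker paths, it can help to translate a container path back to the local path behind it. The following Python sketch does that for the two bindings in the table above. The helper to_local is hypothetical and only for illustration; the file name is the data configuration file mentioned earlier in this thread.

```python
from pathlib import PurePosixPath

# (local, container) pairs copied from the binding table above.
BINDINGS = [
    ("/media/12T/ABIDE", "/media/12T/ABIDE"),
    ("/media/12T/ABIDE", "/output"),
]


def to_local(container_path, bindings=BINDINGS):
    """Return the local path behind a container path, or None if it is not bound.

    Hypothetical helper for illustration only; not part of cpac.
    """
    target = PurePosixPath(container_path)
    for local, container in bindings:
        try:
            # relative_to raises ValueError when target is not under container.
            relative = target.relative_to(container)
        except ValueError:
            continue
        return str(PurePosixPath(local) / relative)
    return None


print(to_local("/output/data_config_abide.yml"))
print(to_local("/media/12T/ABIDE/data_config_abide.yml"))
print(to_local("/media/12T/other_dataset"))
```

The first two calls resolve to the same local file, and the third returns None because nothing outside /media/12T/ABIDE is bound.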


Hopefully these answers help. Please ask more questions if you need more clarification.

Thanks,

Jon

이지원

Feb 8, 2022, 3:11:58 AM
to cpax_forum
Hi Jon Clucas,

Now I understand everything.
Thank you for your friendly reply.

Thanks,

LEE