Multi-site experiments fail to deploy with "stitcher" error

Edward Tremel

Aug 7, 2024, 2:35:03 PM
to cloudlab-users
Hi CloudLab admins,

I would like to create an experiment that allows nodes from two different sites to communicate with each other (without using the "control" network that is connected to the public Internet), and I'm starting by trying to deploy the built-in "multisite-lan" experiment profile. However, every time I try to start this experiment, using the profile's default settings (2 nodes in each site), it fails with the error message "FABRIC stitcher failure."

Here is a link to my most recent failed experiment status page: https://www.cloudlab.us/status.php?uuid=d85e1eda-54e7-11ef-a601-e4434b2381fc

As you can see in the portal log file linked from that page (https://www.cloudlab.us/spewlogfile.php?logfile=120f3bf254bb858640cccf642b5e616254c9db5c), the source of the FABRIC stitcher failure seems to be the Python exception "Invalid Key." Does this mean a CloudLab server somewhere is missing the SSH key it needs to deploy this experiment? Or did I somehow do something wrong in configuring and setting it up? I'd appreciate any help you can provide.

Thanks,

Edward

Leigh Stoller

Aug 7, 2024, 2:47:40 PM
to cloudlab-users

> I would like to create an experiment that allows nodes from two different sites to communicate with each other (without using the "control" network that is connected to the public Internet), and I'm starting by trying to deploy the built-in "multisite-lan" experiment profile. However, every time I try to start this experiment, using the profile's default settings (2 nodes in each site), it fails with the error message "FABRIC stitcher failure."
>
> Here is a link to my most recent failed experiment status page: https://www.cloudlab.us/status.php?uuid=d85e1eda-54e7-11ef-a601-e4434b2381fc.

Hi. I am looking at this now. Please do not create any more experiments;
they will continue to fail. FABRIC just released a new version of their
software, so our code might have to change.

You can leave your current experiment as is while I continue to look.

Thanks
Leigh


Leigh Stoller

Aug 7, 2024, 3:10:45 PM
to cloudlab-users

>> I would like to create an experiment that allows nodes from two different sites to communicate with each other (without using the "control" network that is connected to the public Internet), and I'm starting by trying to deploy the built-in "multisite-lan" experiment profile. However, every time I try to start this experiment, using the profile's default settings (2 nodes in each site), it fails with the error message "FABRIC stitcher failure."
>>
>> Here is a link to my most recent failed experiment status page: https://www.cloudlab.us/status.php?uuid=d85e1eda-54e7-11ef-a601-e4434b2381fc.
>
> Hi. I am looking at this now, please do not create any more experiments,
> they will continue to fail. FABRIC just released a new version of their
> software, our code might have to change.

Just a follow-up: I posted a message on the FABRIC forums, and I will
let you know when things are working again.

Leigh

Leigh Stoller

Aug 7, 2024, 4:55:33 PM
to cloudlab-users

> Hi. I am looking at this now, please do not create any more experiments,
> they will continue to fail. FABRIC just released a new version of their
> software, our code might have to change.

OK, FABRIC stitching is operational again!

Leigh


Edward Tremel

Aug 8, 2024, 12:08:08 PM
to cloudlab-users
Hi Leigh,

Thanks for responding and fixing this so promptly! I am now able to launch multi-site experiments successfully.

Best regards,

Edward

Jinghan Sun

Sep 11, 2024, 10:51:53 PM
to cloudlab-users
Hi Leigh & CloudLab admins,

I may have run into a similar stitcher problem when launching my multi-site experiments. The error message is: "RunStitcher error: Provision failed on urn:publicid:IDN+emulab.net+authority+cm: No slice here that matches the provided credentials". I would appreciate any help!
 

Leigh Stoller

Sep 12, 2024, 9:54:51 AM
to cloudlab-users

> Hi Leigh & CloudLab admins,
>
> I may run into a similar stitcher problem when I launch my multisite experiments. The error message is: "RunStitcher error: Provision failed on urn:publicid:IDN+emulab.net+authority+cm: No slice here that matches the provided credentials". I would appreciate any help!

Hi. Thanks for letting us know, we will look at this today.

Leigh


Leigh Stoller

Sep 12, 2024, 5:57:25 PM
to cloudlab-users

>> Hi Leigh & CloudLab admins,
>>
>> I may run into a similar stitcher problem when I launch my multisite experiments. The error message is: "RunStitcher error: Provision failed on urn:publicid:IDN+emulab.net+authority+cm: No slice here that matches the provided credentials". I would appreciate any help!
>
> Hi. Thanks for letting us know, we will look at this today.

Hi. This should be working again.

Thanks
Leigh

Jinghan Sun

Sep 12, 2024, 6:17:15 PM
to cloudlab-users
Hi Leigh,

Thanks very much for your help! I can successfully resume my experiments.

Best,
Jinghan
