firesim launchrunfarm not working


Varun Gandhi

Nov 9, 2021, 6:06:09 PM
to chip...@googlegroups.com

Hi,

I’m trying to launch an f1 instance using launchrunfarm, but I keep getting the error pasted below: insufficient capacity in every availability zone. I have no other f1 instances running, and I was able to create an f1 instance from the EC2 console, so this is most likely not caused by my account hitting its instance limit. I also tried creating a manager instance in us-west-2 (in case the issue was limited to us-east-1), but I see the same error there.

I was able to create f1 instances from my manager instance until last Saturday. Has there been an update to the codebase since then that could have caused this? I also reported this to the FireSim team on GitHub, but haven’t heard back yet.

Here’s a copy of the error log:

—————————Error Log ———————————————————

2021-11-08 20:08:13,036 [main ] [INFO ] FireSim Manager. Docs: http://docs.fires.im
Running: launchrunfarm

2021-11-08 20:08:13,037 [init ] [DEBUG] {'hwconf_dict': {'firesim-boom-fast-small-quadcore-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d60863b0>,
'firesim-boom-large-quadcore-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086488>,
'firesim-boom-medium-singlecore-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086320>,
'firesim-boom-quadcore-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d60863f8>,
'firesim-boom-singlecore-large2-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086248>,
'firesim-boom-singlecore-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086200>,
'firesim-boom-singlecore-no-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086170>,
'firesim-boom-small-quadcore-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d60861b8>,
'firesim-rocket-quadcore-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086368>,
'firesim-rocket-quadcore-no-nic-l2-llc4mb-ddr3': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086440>,
'firesim-rocket-singlecore-no-nic-l2-llc4mb-ddr3-half-freq-uncore': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d60862d8>,
'firesim-supernode-rocket-singlecore-nic-l2-lbp': <runtools.runtime_config.RuntimeHWConfig instance at 0x7f70d6086290>}}
2021-11-08 20:08:13,048 [aws_resource] [DEBUG] i-0cec34b3b2eb6886e
2021-11-08 20:08:13,393 [aws_resource] [DEBUG] {'Name': 'ime-3'}
2021-11-08 20:08:13,394 [init ] [DEBUG] {'autocounter_readrate': 0,
'defaulthwconfig': 'firesim-boom-quadcore-nic-l2-llc4mb-ddr3',
'disable_asserts': False,
'f1_16xlarges_requested': 0,
'f1_2xlarges_requested': 0,
'f1_4xlarges_requested': 1,
'linklatency': 6405,
'm4_16xlarges_requested': 0,
'netbandwidth': 200,
'no_net_num_nodes': 2,
'print_cycle_prefix': True,
'print_end': '-1',
'print_start': '0',
'profileinterval': -1,
'run_instance_market': 'ondemand',
'runfarmtag': 'mainrunfarm',
'spot_interruption_behavior': 'terminate',
'spot_max_price': 'ondemand',
'suffixtag': '',
'switchinglatency': 10,
'terminateoncompletion': False,
'topology': 'example_2config',
'trace_enable': False,
'trace_end': '-1',
'trace_output_format': '0',
'trace_select': '1',
'trace_start': '0',
'workload_name': 'nfshield.json',
'zerooutdram': False}
2021-11-08 20:08:13,401 [get_deploytr] [DEBUG] Setting deploytriplet by querying the AGFI's description.
2021-11-08 20:08:13,401 [get_afi_for_] [DEBUG] agfi-00c595c4ae0d05459
2021-11-08 20:08:13,401 [get_afi_for_] [DEBUG] None
2021-11-08 20:08:17,616 [get_afi_for_] [DEBUG] {u'FpgaImages': [{u'UpdateTime': datetime.datetime(2021, 10, 1, 4, 21, 42, tzinfo=tzlocal()), u'Name': 'firesim-boom-quadcore-nic-l2-llc4mb-ddr3', u'Tags': [], u'PciId': {u'SubsystemVendorId': '0xfedd', u'VendorId': '0x1d0f', u'DeviceId': '0xf000', u'SubsystemId': '0x1d51'}, u'FpgaImageGlobalId': 'agfi-00c595c4ae0d05459', u'Public': False, u'State': {u'Code': 'available'}, u'ShellVersion': '0x04261818', u'OwnerId': '734394535448', u'FpgaImageId': 'afi-0f77fb25ada9a16dc', u'CreateTime': datetime.datetime(2021, 10, 1, 3, 6, 37, tzinfo=tzlocal()), u'Description': 'firesim-buildtriplet:FireSim-WithNIC_DDR3FRFCFSLLC4MB_WithDefaultFireSimBridges_WithFireSimConfigTweaks_chipyard.QuadMediumBoomConfig-F65MHz_BaseF1Config,firesim-deploytriplet:FireSim-WithNIC_DDR3FRFCFSLLC4MB_WithDefaultFireSimBridges_WithFireSimConfigTweaks_chipyard.QuadMediumBoomConfig-F65MHz_BaseF1Config,firesim-commit:72a52523e18443d9ef9a1e2664cbc71a45fc0c57-dirty'}], 'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': '72d34184-4e35-489a-bb9f-29a03c9027ad', 'HTTPHeaders': {'x-amzn-requestid': '72d34184-4e35-489a-bb9f-29a03c9027ad', 'transfer-encoding': 'chunked', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'vary': 'accept-encoding', 'server': 'AmazonEC2', 'cache-control': 'no-cache, no-store', 'date': 'Mon, 08 Nov 2021 20:08:16 GMT', 'content-type': 'text/xml;charset=UTF-8'}}}
2021-11-08 20:08:17,894 [aws_resource] [DEBUG] i-0cec34b3b2eb6886e
2021-11-08 20:08:17,955 [aws_resource] [DEBUG] {'Name': 'ime-3'}
2021-11-08 20:08:21,868 [aws_resource] [DEBUG] i-0cec34b3b2eb6886e
2021-11-08 20:08:21,934 [aws_resource] [DEBUG] {'Name': 'ime-3'}
2021-11-08 20:08:31,054 [launch_insta] [INFO ] An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 4): We currently do not have sufficient f1.4xlarge capacity in the Availability Zone you requested (us-east-1e). Our system will be working on provisioning additional capacity. You can currently get f1.4xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1a, us-east-1b, us-east-1c, us-east-1d.
2021-11-08 20:08:31,054 [launch_insta] [INFO ] This probably means there was no more capacity in this availability zone. Try the next one.
2021-11-08 20:08:41,275 [launch_insta] [INFO ] An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 4): We currently do not have sufficient f1.4xlarge capacity in the Availability Zone you requested (us-east-1c). Our system will be working on provisioning additional capacity. You can currently get f1.4xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1a, us-east-1b, us-east-1d, us-east-1e.
2021-11-08 20:08:41,276 [launch_insta] [INFO ] This probably means there was no more capacity in this availability zone. Try the next one.
2021-11-08 20:08:51,745 [launch_insta] [INFO ] An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 4): We currently do not have sufficient f1.4xlarge capacity in the Availability Zone you requested (us-east-1a). Our system will be working on provisioning additional capacity. You can currently get f1.4xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1b, us-east-1c, us-east-1d, us-east-1e.
2021-11-08 20:08:51,745 [launch_insta] [INFO ] This probably means there was no more capacity in this availability zone. Try the next one.
2021-11-08 20:08:52,529 [launch_insta] [INFO ] An error occurred (Unsupported) when calling the RunInstances operation: Your requested instance type (f1.4xlarge) is not supported in your requested Availability Zone (us-east-1f). Please retry your request by not specifying an Availability Zone or choosing us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1e.
2021-11-08 20:08:52,529 [launch_insta] [INFO ] This probably means there was no more capacity in this availability zone. Try the next one.
2021-11-08 20:08:59,444 [launch_insta] [INFO ] An error occurred (InsufficientInstanceCapacity) when calling the RunInstances operation (reached max retries: 4): We currently do not have sufficient f1.4xlarge capacity in the Availability Zone you requested (us-east-1d). Our system will be working on provisioning additional capacity. You can currently get f1.4xlarge capacity by not specifying an Availability Zone in your request or choosing us-east-1a, us-east-1b, us-east-1c, us-east-1e.
2021-11-08 20:08:59,444 [launch_insta] [INFO ] This probably means there was no more capacity in this availability zone. Try the next one.
2021-11-08 20:09:13,154 [launch_insta] [INFO ] This probably means there was no more capacity in this availability zone. Try the next one.
2021-11-08 20:09:13,154 [launch_insta] [CRITI] we tried all subnets, but there was insufficient capacity to launch your instances
2021-11-08 20:09:13,154 [launch_insta] [CRITI] only the following 0 instances were launched
2021-11-08 20:09:13,154 [launch_insta] [CRITI] []
2021-11-08 20:09:13,163 [aws_resource] [DEBUG] i-0cec34b3b2eb6886e
2021-11-08 20:09:13,241 [aws_resource] [DEBUG] {'Name': 'ime-3'}
2021-11-08 20:09:14,447 [aws_resource] [DEBUG] i-0cec34b3b2eb6886e
2021-11-08 20:09:14,521 [aws_resource] [DEBUG] {'Name': 'ime-3'}
2021-11-08 20:09:15,570 [wait_on_inst] [INFO ] Waiting for instance boots: 0 f1.16xlarges
2021-11-08 20:09:15,571 [ ] [ERROR] Fatal error.
Traceback (most recent call last):
  File "/home/centos/chipyard/sims/firesim/deploy/firesim", line 334, in <module>
    main(args)
  File "/home/centos/chipyard/sims/firesim/deploy/firesim", line 282, in main
    globals()[args.task]()
  File "/home/centos/chipyard/sims/firesim/deploy/firesim", line 166, in launchrunfarm
    runtime_conf.runfarm.launch_run_farm()
  File "/home/centos/chipyard/sims/firesim/deploy/runtools/run_farm.py", line 272, in launch_run_farm
    wait_on_instance_launches(f1_4s, 'f1.4xlarges')
  File "/home/centos/chipyard/sims/firesim/deploy/awstools/awstools.py", line 282, in wait_on_instance_launches
    rootLogger.info("Waiting for instance boots: " + str(len(instances)) + " " + message)
TypeError: object of type 'NoneType' has no len()
————————————————————————————————————————————————
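
A note on the final TypeError: judging from the traceback, it looks like a side effect of the capacity failure, not a separate problem. wait_on_instance_launches is called with f1_4s, and len() fails because that value is None, which suggests nothing was returned once every availability zone had been rejected. Below is a minimal sketch of that pattern and one way to guard against it. The names launch_in_any_zone, try_launch, describe_wait and InsufficientCapacity are hypothetical stand-ins for illustration, not FireSim's actual code, and no AWS calls are made.

```python
from typing import Callable, List, Optional


class InsufficientCapacity(Exception):
    """Hypothetical stand-in for the InsufficientInstanceCapacity error in the log."""


def launch_in_any_zone(zones: List[str], count: int,
                       try_launch: Callable[[str, int], List[str]]) -> List[str]:
    """Try each zone in turn; always return a list (possibly empty), never None."""
    for zone in zones:
        try:
            return try_launch(zone, count)
        except InsufficientCapacity:
            continue  # no capacity in this zone, try the next one
    return []  # every zone was rejected


def describe_wait(instances: Optional[List[str]], label: str) -> str:
    """Build the 'waiting' message without calling len() on None."""
    launched = instances or []
    return "Waiting for instance boots: " + str(len(launched)) + " " + label


def no_capacity(zone: str, count: int) -> List[str]:
    """Simulated launcher that fails in every zone, as in the log above."""
    raise InsufficientCapacity(zone)


zones = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1e"]
launched = launch_in_any_zone(zones, 1, no_capacity)
print(describe_wait(launched, "f1.4xlarges"))
print(describe_wait(None, "f1.4xlarges"))
```

With a guard like this the run would still end with zero instances, but the capacity message would be the last error reported instead of the TypeError.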