Jira (BOLT-1468) Add 'check-node-connections' endpoint in bolt-server to support wait_until_available

Sean McDonald (JIRA)

Aug 27, 2019, 12:34:02 PM
to puppe...@googlegroups.com
Sean McDonald updated an issue
 
Puppet Task Runner / Task BOLT-1468
Add 'check-node-connections' endpoint in bolt-server to support wait_until_available
Change By: Sean McDonald
Summary: changed from "implement wait_until_available" to "Add 'check-node-connections' endpoint in bolt-server to support wait_until_available"
 
This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Sean McDonald (JIRA)

Aug 27, 2019, 12:34:03 PM
to puppe...@googlegroups.com
Sean McDonald updated an issue
*Background*

When told to run a task on some nodes, a PE master typically contacts the nodes over PCP (the Puppet Communications Protocol). It sends a formatted PCP request to the pxp-agent service running on the nodes, directing them to run the task locally. If the nodes don't have a pxp-agent service running, the PE master must contact them over SSH or WinRM instead. For these cases, the PE master runs a "pe-bolt-server" service, a Sinatra application that listens for POST requests to /ssh/run_task or /winrm/run_task and then runs the task through an instance of the bolt executor, just as bolt does when you run a task from the CLI. In short, it's a thin REST API wrapper around normal bolt "task run" operations that the PE master can use when there's no pxp-agent to talk to over PCP.
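The split between transports can be pictured with a small routing sketch: the URL path selects the transport (SSH or WinRM) while the action stays the same. It is written in plain Ruby rather than Sinatra so that it runs without gems, and the `ROUTE` pattern and `dispatch` helper are hypothetical, not bolt-server's actual code.

```ruby
# Hypothetical sketch: split an incoming POST path such as /ssh/run_task into
# the transport to use and the bolt action to run.
ROUTE = %r{\A/(?<transport>ssh|winrm)/(?<action>run_task)\z}

def dispatch(method, path)
  return nil unless method == 'POST' # only POST requests are routed
  match = ROUTE.match(path)
  return nil unless match # unknown path: no handler
  { transport: match[:transport], action: match[:action] }
end

dispatch('POST', '/ssh/run_task')   # transport "ssh", action "run_task"
dispatch('POST', '/winrm/run_task') # transport "winrm", action "run_task"
dispatch('GET', '/ssh/run_task')    # nil
```

In the real service, each route hands the request to the same bolt executor with the matching transport, which is what makes the wrapper "thin".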

In PE Kearney, "task run" is the only action supported over PCP, so the only endpoints in the bolt-server REST API are for "task run". Other typical bolt actions ("command run", "file upload", "script run") are handled over PCP by wrapping them in an ephemeral task so that they can use the "task run" endpoints.

PCP-868 and ORCH-2321 describe the process of enabling "wait until available" over PCP, on the agent and server, without a task wrapper. This ticket describes the new bolt-server endpoint that will support "wait until available" when no PCP transport is available.

**NOTE**: the implementation of connection checks for wait_until_available differs from other bolt actions! Because of the nature of wait_until_available, the bolt-server endpoint will *_not_* perform the same operation as an actual "wait_until_available" call. The differences are as follows:
# The orchestrator performs the actual 'waiting' for nodes to become connected, so the bolt-server endpoint *_should not wait_*. It should check the status of all node connections and then return immediately.
# The orchestrator checks nodes in batches, not one target at a time. The schema and implementation for checking connections in bolt-server should therefore expect an array of nodes to check.
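A minimal sketch of those two constraints in plain Ruby, assuming a node counts as "connected" when it accepts a TCP connection on its transport port (22 for SSH, 5985 for WinRM over HTTP). The `check_node_connections` method, its `probe:` parameter, `DEFAULT_PROBE` and the `status`/`result` response keys are hypothetical; a real endpoint would more likely reuse Bolt's own transport connection code. Each target is probed exactly once, the batch is probed concurrently, and the method returns as soon as every probe has finished, leaving any retry loop to the orchestrator.

```ruby
require 'socket'

# Default probe (assumption): a target counts as connected if it accepts a TCP
# connection on its port within 5 seconds. One attempt only, no retries.
DEFAULT_PROBE = lambda do |target|
  begin
    Socket.tcp(target[:hostname], target[:port], connect_timeout: 5) { true }
  rescue StandardError
    false
  end
end

# Hypothetical batch check: probe every target once, concurrently, and return
# as soon as all probes finish. Waiting and retrying are left to the orchestrator.
def check_node_connections(targets, probe: DEFAULT_PROBE)
  threads = targets.map do |target|
    Thread.new { { target: target[:hostname], connected: probe.call(target) } }
  end
  results = threads.map(&:value) # Thread#value joins; order matches the input
  status = results.all? { |r| r[:connected] } ? 'success' : 'failure'
  { status: status, result: results }
end

# Usage with real sockets:
# check_node_connections([{ hostname: 'web1.example.com', port: 22 },
#                         { hostname: 'win1.example.com', port: 5985 }])
```

Passing a stub probe, for example `->(t) { t[:hostname] == 'web1' }`, makes the method easy to exercise in RSpec without real nodes.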

*Requirements*

Changes will be made to the [bolt server app|https://github.com/puppetlabs/bolt/tree/master/lib/bolt_server]:

* The app follows the [JSON Schema specification|https://json-schema.org/specification.html]. Add a JSON schema describing the data passed to the new endpoints [here|https://github.com/puppetlabs/bolt/tree/master/lib/bolt_server/schemas]. This schema should match the one defined for the "check connected nodes" action in PCP-868.

* Add new POST endpoints to the transport Sinatra app [here|https://github.com/puppetlabs/bolt/blob/master/lib/bolt_server/transport_app.rb] to support checking node connections over SSH and WinRM, using the new JSON schema.
* Document the new endpoints in the [developer-docs|https://github.com/puppetlabs/bolt/blob/master/developer-docs/].
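One possible shape for the request body is sketched below as a Ruby hash in JSON Schema draft-04 style, together with a hand-rolled check of a subset of its rules. Everything here is an assumption: the real schema has to match the "check connected nodes" schema in PCP-868, which this ticket does not reproduce, and a real implementation would more likely validate against the schema with a JSON Schema library than with a helper like the hypothetical `validate_check_request`.

```ruby
require 'json'

# Hypothetical request schema: a batch of targets, each needing a hostname.
CHECK_NODE_CONNECTIONS_SCHEMA = {
  '$schema' => 'http://json-schema.org/draft-04/schema#',
  'type' => 'object',
  'properties' => {
    'targets' => {
      'type' => 'array',
      'items' => {
        'type' => 'object',
        'properties' => {
          'hostname' => { 'type' => 'string' },
          'user' => { 'type' => 'string' },
          'port' => { 'type' => 'integer' }
        },
        'required' => ['hostname']
      }
    }
  },
  'required' => ['targets'],
  'additionalProperties' => false
}.freeze

# Checks a subset of the schema's rules and returns a list of error messages
# (empty when the body is acceptable).
def validate_check_request(body)
  data = JSON.parse(body)
  return ['body must be a JSON object'] unless data.is_a?(Hash)

  errors = []
  errors << 'unexpected keys' unless (data.keys - ['targets']).empty?
  targets = data['targets']
  return errors + ['targets must be an array'] unless targets.is_a?(Array)

  targets.each_with_index do |t, i|
    errors << "targets[#{i}] needs a string hostname" unless t.is_a?(Hash) && t['hostname'].is_a?(String)
    if t.is_a?(Hash) && t.key?('port') && !t['port'].is_a?(Integer)
      errors << "targets[#{i}].port must be an integer"
    end
  end
  errors
rescue JSON::ParserError
  ['body is not valid JSON']
end
```

Because the body carries an array of targets, one request covers a whole orchestrator batch, which is the second difference from a plain wait_until_available call.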

*Testing*

* Write RSpec tests for the changes to bolt [here|https://github.com/puppetlabs/bolt/tree/master/spec/bolt_server].
* Acceptance/integration tests for this code exist on the orchestrator side [here|https://github.com/puppetlabs/orchestrator/blob/lovejoy/test/puppetlabs/orchestrator/integration/bolt_server.clj], but updating them is *out of scope* for this ticket. *Do* run a quick manual test on a running PE master to confirm that the API endpoints work as expected; changes to orchestrator that make use of the endpoints should be done as part of an ORCH ticket.
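For the manual test on a PE master, a request could be assembled with Ruby's standard `net/http`. The endpoint path (`/ssh/check_node_connections` or `/winrm/check_node_connections`, assumed to follow the underscore style of `run_task`), the port 62658, the body shape and the `build_check_request` helper are all assumptions; adjust them to whatever the implementation settles on.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper for a manual smoke test. The path, port and body shape
# are assumptions, not the implemented API.
def build_check_request(base_url, transport, targets)
  uri = URI.join(base_url, "/#{transport}/check_node_connections")
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate({ targets: targets })
  [uri, request]
end

uri, request = build_check_request('https://pe-master.example.com:62658', 'ssh',
                                   [{ hostname: 'web1.example.com', user: 'root' }])

# Sending requires TLS with a certificate the PE master trusts, e.g.:
# Net::HTTP.start(uri.host, uri.port, use_ssl: true,
#                 cert: client_cert, key: client_key, ca_file: ca_path) do |http|
#   puts http.request(request).body
# end
```

A response that comes back promptly even when some targets are unreachable is a quick sign that the endpoint checks once and does not wait.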

Sean McDonald (JIRA)

Aug 27, 2019, 12:36:03 PM
to puppe...@googlegroups.com
Sean McDonald updated an issue
*Background*

When told to run a task on some nodes, a PE master typically contacts the nodes over PCP (the Puppet Communications Protocol). It sends a formatted PCP request to the pxp-agent service running on the nodes, directing them to run the task locally. If the nodes don't have a pxp-agent service running, the PE master must contact them over SSH or WinRM instead. For these cases, the PE master runs a "pe-bolt-server" service, a Sinatra application that listens for POST requests to /ssh/run_task or /winrm/run_task and then runs the task through an instance of the bolt executor, just as bolt does when you run a task from the CLI. In short, it's a thin REST API wrapper around normal bolt "task run" operations that the PE master can use when there's no pxp-agent to talk to over PCP.

In PE Kearney, "task run" is the only action supported over PCP, so the only endpoints in the bolt-server REST API are for "task run". Other typical bolt actions ("command run", "file upload", "script run") are handled over PCP by wrapping them in an ephemeral task so that they can use the "task run" endpoints.

ORCH-2321 describes the process of enabling "wait until available" over PCP, on the agent and server, without a task wrapper. This ticket describes the new bolt-server endpoint that will support "wait until available" when no PCP transport is available.


**NOTE**: the implementation of connection checks for wait_until_available differs from other bolt actions! Because of the nature of wait_until_available, the bolt-server endpoint will *_not_* perform the same operation as an actual "wait_until_available" call. The differences are as follows:
# The orchestrator performs the actual 'waiting' for nodes to become connected, so the bolt-server endpoint *_should not wait_*. It should check the status of all node connections and then return immediately.
# The orchestrator checks nodes in batches, not one target at a time. The schema and implementation for checking connections in bolt-server should therefore expect an array of nodes to check.

*Requirements*

Changes will be made to the [bolt server app|https://github.com/puppetlabs/bolt/tree/master/lib/bolt_server]:

* The app follows the [JSON Schema specification|https://json-schema.org/specification.html]. Add a JSON schema describing the data passed to the new endpoints [here|https://github.com/puppetlabs/bolt/tree/master/lib/bolt_server/schemas]. This schema should match the one defined for the "check connected nodes" action in PCP-868.

* Add new POST endpoints to the transport Sinatra app [here|https://github.com/puppetlabs/bolt/blob/master/lib/bolt_server/transport_app.rb] to support checking node connections over SSH and WinRM, using the new JSON schema.
* Document the new endpoints in the [developer-docs|https://github.com/puppetlabs/bolt/blob/master/developer-docs/].

*Testing*

* Write RSpec tests for the changes to bolt [here|https://github.com/puppetlabs/bolt/tree/master/spec/bolt_server].
* Acceptance/integration tests for this code exist on the orchestrator side [here|https://github.com/puppetlabs/orchestrator/blob/lovejoy/test/puppetlabs/orchestrator/integration/bolt_server.clj], but updating them is *out of scope* for this ticket. *Do* run a quick manual test on a running PE master to confirm that the API endpoints work as expected; changes to orchestrator that make use of the endpoints should be done as part of an ORCH ticket.