Stuck with a Globus compute component as part of a flow


Anthony Weaver

Jan 22, 2026, 7:38:28 PM
to Discuss
I have some Python code (tested and working) that I call as part of a flow (also working).
The code itself produces a metadata ingest document and I am trying to programmatically perform an ingest.  I can't use the flow ingest action because my code returns too much data for that.  I'm using the following as my guides


Under settings, developer, I created a client ID, but every time my code gets to the ingest step, it fails with errors that suggest it's trying to do CLI authorization. Below is the last part of the error that leads me to believe this is what is going on:
**********
  File "/opt/globus-compute-env/lib/python3.12/site-packages/globus_sdk/login_flows/command_line_login_flow_manager.py", line 140, in prompt_for_code
    with self._handle_input_errors():
  File "/usr/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/opt/globus-compute-env/lib/python3.12/site-packages/globus_sdk/login_flows/command_line_login_flow_manager.py", line 152, in _handle_input_errors
    raise CommandLineLoginFlowEOFError(msg) from e
globus_sdk.login_flows.command_line_login_flow_manager.CommandLineLoginFlowEOFError: An EOF was read when an authorization code was expected. (Are you running this in an interactive terminal?)
******************

My relevant code is:

CLIENT_ID = "XXXXXXXXX"
APP = globus_sdk.UserApp("ingest-app", client_id=CLIENT_ID)
SEARCH_CLIENT = globus_sdk.SearchClient(
                    app=APP,
                    app_scopes=[globus_sdk.Scope(globus_sdk.SearchClient.scopes.all)])
INDEX_ID = "XXXXXXXXXX"
    
res = SEARCH_CLIENT.ingest(INDEX_ID, to_ingest)

Is this the right way to go about this?  Any idea where I'm going wrong?  Thank you for your help

Anthony Weaver

Jan 23, 2026, 9:45:25 AM
to Discuss, Anthony Weaver
I believe I have figured out the issue and have it working, although I'm not sure this is the best or most secure way to go about it.
An overview of what it took to get this working:

1. I changed the APP from a UserApp to a ClientApp
2. I have the CLIENT_ID and CLIENT_SECRET set as environment variables in my endpoint and use os.environ.get() to retrieve those values
3. Make sure the CLIENT_ID being used has write permissions to the search index
4. Monitor the ingest task and report success, failure, or a timeout depending on the outcome

So now the relevant ingest portion of my compute function (which is run as part of a flow) looks like

    def prettyprint_json(obj, fp=None):
        if fp:
            return json.dump(obj, fp, indent=2, separators=(",", ": "), ensure_ascii=False)

        return json.dumps(obj, indent=2, separators=(",", ": "), ensure_ascii=False)

    # Write the ingest data just for backup
    with open(ingest_file, "w") as fp:
        prettyprint_json(
            {"ingest_type": "GMetaList", "ingest_data": {"gmeta": entries}}, fp)

    # Ingest the data
    tmp_ingest = prettyprint_json(
                {"ingest_type": "GMetaList", "ingest_data": {"gmeta": entries}})
    ingest_data = json.loads(tmp_ingest)

    CLIENT_ID = os.environ.get("GLOBUS_COMPUTE_CLIENT_ID")
    SECRET = os.environ.get("GLOBUS_COMPUTE_CLIENT_SECRET")
    INDEX_ID = "XXXXXXXXXXXXXXXXXXXXXXXXX"
    APP = globus_sdk.ClientApp("ingest-app", client_id=CLIENT_ID, client_secret=SECRET)

    SEARCH_CLIENT = globus_sdk.SearchClient(
                    app=APP,
                    app_scopes=[globus_sdk.Scope(globus_sdk.SearchClient.scopes.all)])
   
    ingest_res = SEARCH_CLIENT.ingest(INDEX_ID, ingest_data)
    task_id = ingest_res["task_id"]
    waited = 0
    max_wait = 1200 # 20 minutes
    while True:
        res = SEARCH_CLIENT.get_task(task_id)
        if res["state"] in ("SUCCESS", "FAILED"):
            return res["state"]
       
        # wait 1s and check for timeout
        waited += 1
        if waited >= max_wait:
            return "Ingest timed out"
        time.sleep(1)
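
The polling loop above counts sleeps, so time spent inside each get_task call is not counted against the 20 minutes. A minimal sketch of the same idea with a wall-clock deadline follows. `wait_for_ingest` is a hypothetical helper, not part of globus_sdk; it assumes only that `get_task(task_id)` returns something indexable by "state", as in the code above. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.

```python
import time


def wait_for_ingest(get_task, task_id, timeout=1200.0, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_task(task_id) until the task finishes or the deadline passes.

    Returns "SUCCESS" or "FAILED" as reported by the task, or "TIMEOUT"
    if neither state is seen before `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    while True:
        state = get_task(task_id)["state"]
        if state in ("SUCCESS", "FAILED"):
            return state
        # Check elapsed wall-clock time, including time spent in get_task.
        if clock() >= deadline:
            return "TIMEOUT"
        sleep(interval)
```

Inside the compute function this would be called as `wait_for_ingest(SEARCH_CLIENT.get_task, task_id)`.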

Stephen Rosen

Jan 23, 2026, 11:55:37 AM
to dis...@globus.org
Hi Anthony,

I want to confirm for you -- and everyone else on the list -- that this approach is a good one overall.

Under a flow, Globus Compute can be used as glue, to bring in arbitrary functionality.
This is lighter weight than other mechanisms and easier to understand; generally I would recommend it when the available Action Providers in Flows can't meet your needs.
And that understandability improves your security posture, since it's much easier to secure things when you have a good grasp of their behavior.

When you use a ClientApp with client credentials, that's a "Client Identity" or "Service Account".
(The SDK doc is newer and more comprehensive. I recommend starting there.)

Putting use of a Service Account inside of a Compute function means that anyone with access to that function can run as the automated user in that limited context. Just make sure to restrict who can run the function and you effectively control that surface area.
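
On the credentials side, a minimal sketch of reading the Service Account's ID and secret inside the function, assuming the environment variable names from Anthony's code. `load_client_credentials` is a hypothetical helper, not part of globus_sdk. It fails early with a clear error when a variable is missing on the endpoint, instead of passing None along to ClientApp, and it never puts the secret itself into the error message.

```python
import os

ID_VAR = "GLOBUS_COMPUTE_CLIENT_ID"
SECRET_VAR = "GLOBUS_COMPUTE_CLIENT_SECRET"


def load_client_credentials(env=None):
    """Return (client_id, client_secret) from the environment.

    Raises RuntimeError naming any variable that is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in (ID_VAR, SECRET_VAR) if not env.get(name)]
    if missing:
        # Report only the variable names, never their values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[ID_VAR], env[SECRET_VAR]
```

The returned pair would then be passed as `client_id` and `client_secret` when constructing the ClientApp.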


You will need to adjust your usage of the SDK as you upgrade to version 4.
In particular, I noticed this:

    app_scopes=[globus_sdk.Scope(globus_sdk.SearchClient.scopes.all)]

This is correct in globus-sdk v3, but in v4 we finally fixed things so that the scopes accessor gives back Scope objects rather than strings. When you upgrade, simply write:

    app_scopes=[globus_sdk.SearchClient.scopes.all]
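
If the same function has to run against either SDK version while the upgrade is in progress, one option is to wrap the value only when the accessor hands back a plain string. This is a sketch under assumptions: `as_scope_list` is a hypothetical helper, not an SDK function, and it assumes v4 Scope objects are not str subclasses. The Scope class is passed in as a parameter so the helper itself does not depend on which SDK version is installed.

```python
def as_scope_list(scope_value, scope_cls):
    """Return a one-element list suitable for app_scopes.

    A plain string (the v3 accessor behavior described above) is wrapped
    in scope_cls; anything else (a v4 Scope object) is passed through.
    """
    if isinstance(scope_value, str):
        return [scope_cls(scope_value)]
    return [scope_value]


# Intended call site inside the compute function (requires globus_sdk):
#   app_scopes=as_scope_list(globus_sdk.SearchClient.scopes.all, globus_sdk.Scope)
```

Once every endpoint is on v4, the helper can be dropped in favor of the plain v4 form.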


You linked the searchable-files-demo as a source. That's written using SDK v3. We should update it to use v4, to show the latest usage patterns. (v4 was released in October, so either not that long ago or very long ago, depending on who you ask. ;-)


Anyway, I hope this helps and gives you more confidence moving forward on this path!
Cheers,
-Stephen



-- 
Written by a human.