Thanks Deanna,
I actually figured out what the problem was.
The function factored_joint_mvn in tensorflow_probability/python/sts/internal/util.py builds a LinearOperatorBlockDiag without passing a name argument.
When no name is provided, LinearOperatorBlockDiag falls back to concatenating the names of all its input operators:
```
if name is None:
  # Using ds to mean direct sum.
  name = "_ds_".join(
      operator.name for operator in operators)
```
Because we have a rather large state space model, this produced enormous names in the graph (literally tens of thousands of characters each). Repeated across a large graph, those names made the serialized protobuf messages huge.
This was the only thing sending the protobuf message lengths sky high. Switching to our own internal equivalent of the function and passing an explicit name brought the protobuf message sizes back to normal levels and cut the running process's memory usage by several GB.
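For anyone who wants to see the scale of the effect without building a graph, here is a minimal pure-Python sketch of the default naming rule quoted above. It does not import TensorFlow: FakeOperator and block_diag_name are hypothetical stand-ins, and the 500 operators with 40-character names are made-up numbers for illustration, not our actual model.

```python
# Stand-in for an operator that only carries a name (hypothetical class,
# not part of TensorFlow).
class FakeOperator:
    def __init__(self, name):
        self.name = name


def block_diag_name(operators, name=None):
    """Mimic the default rule: join all input operator names with '_ds_'."""
    if name is None:
        name = "_ds_".join(operator.name for operator in operators)
    return name


# Assumed workload: 500 operators, each with a 40-character name.
operators = [FakeOperator("component_%03d_" % i + "x" * 26) for i in range(500)]

# Default behaviour: the name grows with the number of input operators.
default_name = block_diag_name(operators)

# Workaround: pass a short explicit name so nothing is concatenated.
explicit_name = block_diag_name(operators, name="joint_mvn_block_diag")

print(len(default_name), len(explicit_name))
```

The default name grows linearly with the number of operators, and every op created under that name carries it in the serialized graph, which is why a short explicit name helps so much on a big model.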
Hope that's helpful to someone else who hits the same problem one day!