Thank you, I should definitely cite that. Is the agent in your paper the one that I compared to in section 4.4.6?
It is difficult to say for sure, since you don't introduce the notion of "compassion" formally, but I think that the claim "Consider a machine that can delude an agent into believing that everyone else feels great while in reality making everyone else feel miserable. Agent-2 would choose to utilize this horrible machine whereas Agent-1 would choose to avoid utilizing this machine" might not be true for our agent. Our agent tries to recover the "true" states of the environment and to learn to assign values to them. If it knows that the machine produces a delusion, it will not assign any positive value to it, and it will not choose to be deluded, since that would prevent it from optimizing those values.
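Just to make the intuition concrete, here is a tiny toy sketch (all names, states, and numbers are hypothetical illustrations, not code or values from either paper): the agent evaluates actions by the value of the inferred true state of the others, not by the observation it would receive, so the delusion machine is never preferred even though it produces pleasant observations.

```python
# Purely illustrative sketch (hypothetical names and values, not from either paper).
# Each action maps to (true_state_of_others, observed_state_of_others).
ACTIONS = {
    "do_nothing":       ("neutral",   "neutral"),
    "help_others":      ("happy",     "happy"),
    "use_delusion_box": ("miserable", "happy"),  # observation is faked, true state is bad
}

# Value function learned over recovered *true* states (hypothetical numbers).
VALUE_OF_TRUE_STATE = {"happy": 1.0, "neutral": 0.0, "miserable": -1.0}


def choose_action(actions=ACTIONS):
    """Pick the action whose inferred true outcome has the highest value."""
    def value(action):
        true_state, _observed = actions[action]
        # Value is assigned to the recovered true state, not to the observation.
        return VALUE_OF_TRUE_STATE[true_state]
    return max(actions, key=value)


if __name__ == "__main__":
    # Prints "help_others": the delusion box scores lowest despite the
    # pleasant observation it would produce.
    print(choose_action())
```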