I have some partial answers to some of your questions. This is a very long reply!
1. Q: Is creating a tag an expensive operation?
1. A: Yes, but only the first one to a given PLC. Once you have one tag open, any additional tags are much cheaper.
2. Q: What if I always have a tag created (and never destroyed) on a specific
connection_group_id, would creating another tag on the same connection
group still be an expensive operation?
2. A: No, as noted above. Just do not close them all. That will close the whole connection to the PLC.
3. Q: Does libplctag ever open multiple connections with the PLC?
3. A: Not unless you tell it to by using the connection group IDs. Those are supported to give you control over how many connections you have to a PLC. I am considering opening a separate connection for reading tag metadata, as that takes a lot of buffer space if there are many tags and can starve normal tag reads and writes.
4. Q: Tradeoffs of creating multiple connections?
4. A: Yes, there are tradeoffs. Exactly what they are is less clear. Long, long rambling discussion below.
Most PLCs have very limited network CPU capacity. It is not so much a problem of the bandwidth. CIP-based PLCs need all kinds of internal resources and buffers and those are quite limited. Older ones, and CompactLogix in particular, have fairly hard and low limits on the total number of connections. I think the low end is around 32 connections. Newer ones are closer to 128? Maybe?
The primary limits seem to be:
a) the CPU load. With a pathetic ENBT, it is really easy to hit 100% CPU and that really hurts everything else. I have had to add network modules just for the HMI so that the control network flow was not impacted. This is common.
b) the connection resources. Each EtherNet/IP (CIP) connection uses some amount of internal buffer space and other memory. This is both small and limited. For Rockwell PLCs, the largest buffer is apparently 64k on a Micro8x0 series. But those cannot do packed requests! A ControlLogix with the latest firmware and network module will negotiate a 4k buffer per connection to the PLC. Otherwise you get about 500 bytes. Some OMRON PLCs will negotiate an 8k buffer per connection. Apart from the buffers, the total number of connections is quite limited compared to any PC or even most embedded boards these days. Think about the mismatch. A PLC with 32 connections and a ~500 byte buffer size will have a _total_ of about 16kB of buffer. A standard desktop PC or laptop these days, that you can get for $500, has 8GB. Suppose you only allocated 8MB in the PC. That is still a ratio of 500 to 1.
c) network bandwidth does not really seem to be a problem, but if you have a very limited network, perhaps it will be a limit. Long distance wireless or cell networks might be an issue. That said, an independent network module in a ControlLogix has to use the backplane to transmit requests and responses to the CPU module. As far as I know that backplane has limited bandwidth. I do not know the limit, but it is low enough that Rockwell could not make a network module that supported a 1Gbps connection and actually used that network bandwidth. I was told this by someone familiar with the issue but only under anonymity. The L80 CPUs with the built-in port have the network hardware in the same module as the CPU and it does not use the backplane. This is all based on rumor and guessing, so take all these limits with a large bag of salt.
d) latency. Latency kills performance. A completely unloaded 1756-L80, using the built-in 1Gbps network port, still takes just under a millisecond to respond to a request. Packed requests absolutely are the best for performance here. The CIP protocol for explicit messaging is not a streaming protocol. Everything is request/response. Implicit messaging as used in produced/consumed tags and IO is a different story and you can get some high bandwidth usage with those. The library does not support those yet.
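To put rough numbers on points b) and d), here is a back-of-the-envelope sketch in Python. It only uses the figures quoted above (32 connections, ~500 byte buffers, 8MB on the PC side, roughly 1 ms per round trip). The function names are made up for illustration, 1 ms is a rounding of "just under a millisecond", and the one-request-in-flight-per-connection model is an assumption, not a measured property of any PLC.

```python
def total_buffer_bytes(connections, buffer_bytes):
    """Total buffer space if every connection gets the same buffer size."""
    return connections * buffer_bytes


def max_requests_per_second(round_trip_ms, connections=1):
    """Upper bound on request rate for a strict request/response protocol.

    Assumption: each connection has at most one request in flight, so it
    can complete at most one request per round trip.
    """
    return connections * 1000.0 / round_trip_ms


# Point b): 32 connections with ~500 byte buffers.
plc_total = total_buffer_bytes(32, 500)
pc_allocation = 8_000_000  # the "only 8MB" PC allocation from the text
ratio = pc_allocation / plc_total

# Point d): ~1 ms per round trip on one connection.
single = max_requests_per_second(1.0)
# Hypothetical: four connections, if the PLC CPU could keep up.
four = max_requests_per_second(1.0, connections=4)

print(f"PLC total buffer: {plc_total} bytes")
print(f"PC/PLC buffer ratio: {ratio:.0f}")
print(f"Request ceiling, 1 connection: {single:.0f}/s")
print(f"Request ceiling, 4 connections: {four:.0f}/s")
```

The ceiling is in requests, not tags, which is why packed requests matter so much: every extra tag that fits into one request is read without paying another round trip.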
So, is it worth having multiple connections? There are some situations where it might make sense.
If you need to sling a fair amount of data around, then multiple connections will give you more buffer space. Watch the CPU load though. Hundreds of small tags hit the CPU harder than a few much larger tags. As far as I can tell, the CIP protocol seems to be relatively inefficient/expensive to decode/encode.
If you have a small number of tags and need to hit each one very, very often (i.e. less than 10ms RPI) then it might make sense to have multiple connections, but again, watch the CPU load!
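As a minimal sketch of what spreading tags over several connections could look like, the Python below builds tag attribute strings that differ only in `connection_group_id` and assigns tags to groups round-robin. Only `connection_group_id` comes from the discussion above. The other attribute keys (`protocol`, `gateway`, `path`, `plc`, `name`) follow the usual libplctag attribute-string layout as best recalled and should be checked against the library documentation. The gateway address, path, tag names, and both helper functions are hypothetical.

```python
def assign_groups(tag_names, num_connections):
    """Round-robin tags over connection groups 0..num_connections-1."""
    return {name: i % num_connections for i, name in enumerate(tag_names)}


def tag_attributes(name, group_id, gateway="192.168.1.10", path="1,0"):
    """Build an attribute string for one tag.

    Assumption: the key names mirror the usual libplctag attribute string;
    the gateway and path defaults are placeholders, not real values.
    """
    return (
        "protocol=ab_eip"
        f"&gateway={gateway}"
        f"&path={path}"
        "&plc=ControlLogix"
        f"&connection_group_id={group_id}"
        f"&name={name}"
    )


# Hypothetical tag names, spread over two connections.
tags = ["Speed", "Torque", "Temp", "Pressure", "Level"]
groups = assign_groups(tags, 2)
attribute_strings = [tag_attributes(name, groups[name]) for name in tags]

for line in attribute_strings:
    print(line)
```

Each string would then be passed to the tag creation call. Tags that share a group ID share a connection, so the number of distinct IDs is the number of connections opened to the PLC.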
I am working on some tests using a local PLC simulator (it just simulates tags) to make sure that libplctag can handle at least 100k tags without significant performance degradation due to the library itself (the PLC will be hammered in a real scenario).
As always, "Your Mileage May Vary." Test different scenarios that are similar to your application to see if it helps. If all your tags are small and you have fewer than 100 or 200, then a single connection may be the fastest and use the least resources.
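One way to run that kind of comparison is a small timing harness like the sketch below. It knows nothing about libplctag: each scenario is just a callable that performs one full pass over your tags, and the two `fake_*` functions are stand-ins that only sleep. Every name here is hypothetical; swap in functions that do real reads against your own PLC.

```python
import time


def time_scenario(read_all, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        read_all()
        best = min(best, time.perf_counter() - start)
    return best


def compare(scenarios, repeats=5):
    """Time each named scenario and return (name, seconds), fastest first."""
    results = {name: time_scenario(fn, repeats) for name, fn in scenarios.items()}
    return sorted(results.items(), key=lambda item: item[1])


# Stand-ins for real read loops; replace with code that reads your tags.
def fake_single_connection():
    time.sleep(0.002)


def fake_two_connections():
    time.sleep(0.001)


ranking = compare({
    "one connection": fake_single_connection,
    "two connections": fake_two_connections,
})
for name, seconds in ranking:
    print(f"{name}: {seconds * 1000:.2f} ms per pass")
```

Taking the best of several runs filters out one-off network hiccups. Keep an eye on the PLC CPU load while this runs, since a scenario that wins on time may still be hammering the controller.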
Sorry for the wall of text response, but there are so many variables that it is not possible to give a simple yes/no answer.
Best,
Kyle