FED3 Bandit Task Questions


Taaseen Rahman

Apr 24, 2025, 12:25:31 AM
to FEDforum
Hi,

I am interested in running some experiments using the 2-armed bandit task with our FEDs and have a few questions.

Training phase:
  • Ideally, how long should we do these experiments? Previously when I worked with some programs that I could run long term (e.g. Closed Economy), I would leave the FEDs in the cages, check daily and then collect the data after approximately a week or two. Would it be fairly similar with the bandit task? 
  • If I were to keep the FEDs in the cages for a long period (e.g. 1-2 weeks), and it would be their only source of food, would that be alright? That's how I did it for Closed Economy.
  • Do the mice need to be 'pre-trained' for the bandit task in any way?
Possible customizations:
  • Has anyone tried using alternative criteria for changing the probability, e.g. 60 pellets or more instead of 30 (or even fewer than 30)? Wondering which might be a good number to use depending on the length of my experiments.
  • Speaking of pellet numbers, is there a reason why the probabilities change based on pellet counts? I think 30 pellets is low enough for it to be fine, but how would timer-based probability switches work, i.e. the probability switching after every set time period?
  • I may be misunderstanding the code, but from what I gathered, "fed3.allowBlockRepeat = false" means that the probabilities aren't repeated between two blocks in a row. I've previously had problems with a few of my FEDs restarting spontaneously for no reason I could identify (I saw one happen right in front of me), and that has caused some minor problems in my left-right sequence experiments. That makes me a bit worried, because if I were to use the bandit task code and my FED were to restart, it might coincidentally give a mouse identical probabilities for two consecutive blocks.
  • I also noticed the line of code regarding the timeout mechanics, which resets the timeout if the mouse pokes again during it. I think that's a good design choice to help identify perseverative tendencies in mice, but could it be a problem if a mouse keeps poking so frequently that the timeout keeps resetting for too long? Would a longer timeout with no resets work better by any chance (in which case, it may actually be better to log pokes during this phase even if they aren't 'valid' pokes)?
Thanks again for the library update. Looking forward to your reply.

Sincerely,

Taaseen

Bridget Matikainen-Ankney

Apr 24, 2025, 9:35:36 AM
to FEDforum
Hi Taaseen! We've run the Bandit task successfully for 96 hours straight, checking daily that mouse weights are stable and that the devices are functioning, like you mentioned. We've run the task with a criterion of 30 successful trials to switch, all the way up to 80, and the mice perform well across that range. I found pre-training to be critical: at a bare minimum we train them with dual-poke FR acquisition for 3 days. If you have time, you can also train them on a 100/0 deterministic reversal task for 4 days, which they learn quickly, before introducing them to the grittier probabilistic bandit task. The deterministic training isn't strictly necessary, but it makes them perform much better on the Bandit. Good luck!!

zane andrews

Apr 27, 2025, 7:50:15 PM
to FEDforum
Thanks Bridget. 
In terms of pellets to switch, do you think a shorter criterion is better? If doing experiments for 4 days, have you seen that mice adapt to the bandit task better with a short criterion?

Cheers

Zane

Alex Legaria

May 2, 2025, 1:01:01 PM
to FEDforum
Hello! 

I'm a postdoc in Lex's lab and have been working closely on the development of this FED3 implementation of the bandit task. I hope I can help answer some of your questions. I am also writing documentation for this task, which may address some of them:


Disclosure: The documentation is still in development. Some parts of it may be outdated or incomplete.

Regarding your questions, Taaseen:

  • Ideally, how long should we do these experiments? Previously when I worked with some programs that I could run long term (e.g. Closed Economy), I would leave the FEDs in the cages, check daily and then collect the data after approximately a week or two. Would it be fairly similar with the bandit task? 
    • The length of the experiments really depends on your scientific question, I guess. But you can certainly leave the FEDs in the cage and just check daily that the mice are getting enough food, along with other metrics (battery levels, that the hopper still has enough pellets for the next day, and that it isn't jammed). We have run the bandit task in a closed economy setting continuously for several months with no issues.

  • If I were to keep the FEDs in the cages for a long period (e.g. 1-2 weeks), and it would be their only source of food, would that be alright? That's how I did it for Closed Economy.
    • Short answer: yes! But see above :)
  • Do the mice need to be 'pre-trained' for the bandit task in any way?
    • Not really! If I plan to do a more sophisticated version of the task (like more than two probabilities), then I train them on deterministic reversal first, which really is a special case of the bandit task, where the only probability options are 100 and 0. This seems to help them learn how to reverse, so they know that when one side of the pokes "stops working" they can test the other side. I usually train them on this for 1-2 weeks (max).
  • Has anyone tried using alternative criteria for changing the probability, e.g. 60 pellets or more instead of 30 (or even fewer than 30)? Wondering which might be a good number to use depending on the length of my experiments.
    • We have tested different numbers of pellets for the probability-change criterion, and we've found that 20 or 30 pellets works best. 20 seems to work totally fine, since mice appear not to care about what happened, say, 25 pellets ago. The issue with using more pellets is that you get fewer probability changes, and we have usually observed the strongest phenotypes around the probability switches. If the number is really large (say 100), mice might also go through extinction, thinking that the poke has stopped working completely. That said, we have not rigorously quantified this; it is more anecdotal.
  • Speaking of pellet numbers, is there a reason why the probabilities change based on pellet counts? I think 30 pellets is low enough for it to be fine, but how would timer-based probability switches work, i.e. the probability switching after every set time period?
    • Great question! There is no strong rationale, really; it just seems to work well. I believe some groups are using a slightly different FED implementation in which the probabilities drift slightly after every poke (a restless bandit). The documentation also includes a brief explanation of how one would go about changing the criterion for probability changes. The bandit code is written to, hopefully, make these kinds of customizations easy (there is a rough sketch of the overall logic at the end of this list).
  • I may be misunderstanding the code, but from what I gathered, "fed3.allowBlockRepeat = false" means that the probabilities aren't repeated between two blocks in a row. I've previously had problems with a few of my FEDs restarting spontaneously for no reason I could identify (I saw one happen right in front of me), and that has caused some minor problems in my left-right sequence experiments. That makes me a bit worried, because if I were to use the bandit task code and my FED were to restart, it might coincidentally give a mouse identical probabilities for two consecutive blocks.
    • I see! That could definitely happen, although I only rarely see FEDs restarting spontaneously. Particularly if you run them for long periods of time, a single restart wouldn't be that terrible, because the mice would still get many blocks of different probabilities overall. If the FEDs are restarting very often, then there is probably something wrong with the device, and that's a different story. But in general, I have had no issues with this.
  • I also noticed the line of code regarding the timeout mechanics, which resets the timeout if the mouse pokes again during it. I think that's a good design choice to help identify perseverative tendencies in mice, but could it be a problem if a mouse keeps poking so frequently that the timeout keeps resetting for too long? Would a longer timeout with no resets work better by any chance (in which case, it may actually be better to log pokes during this phase even if they aren't 'valid' pokes)?
    • Great point! I have seen it be an issue, especially when mice sneak bedding or nestlet material into the poke, such that the timeout keeps resetting over and over. Regarding mouse behavior itself, some mice do persist quite a bit, but after a few minutes (at the longest) they stop. It is actually cool to see that over training, the number of pokes during the timeout (which are logged in the csv) goes down; I've considered that a measure of learning, in a way. I've found that 10 seconds with resetting activated and with white noise tends to work best. We've tried other variations, such as 30 seconds without resetting, or 5 seconds after every poke regardless of whether they get a reward, and I've seen no evidence that they improve behavior or learning; if anything they can be slightly worse. But I haven't done a rigorous analysis of how different parameter settings affect learning/performance, so this is more anecdotal. One more thing: you can set the "reset" parameter to true or false, so you can decide what works best for you.
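
In case it helps to see how these pieces fit together, here is a rough Arduino-style sketch of the block-switching and timeout logic. To be clear, this is not the actual Bandit example from the library, just an illustration built on the basic FED3 calls (begin/run/Left/Right/logLeftPoke/logRightPoke/Feed, plus ConditionedStimulus and a one-argument Timeout, which I'm assuming behave like in the fixed-ratio examples). The parameter names (pelletsToSwitch, probLeft, timeoutSecs, blockPellets) are placeholders, not the library's own; only fed3.allowBlockRepeat is quoted from the real code, and in the real code the timeout reset and white noise are handled internally. Swapping the pellet-count check for a millis()-based check is all a timer-based switch would take.

#include <FED3.h>                        // FED3 library
String sketch = "BanditSketch";          // written to the SD card filename
FED3 fed3(sketch);

// Placeholder parameters (illustrative names, not the library's own)
int pelletsToSwitch = 20;                // block length in pellets; 20-30 works well per this thread
int probLeft        = 80;                // current % chance that a left poke is rewarded
int timeoutSecs     = 10;                // timeout after unrewarded pokes; 10 s worked best for us
int blockPellets    = 0;                 // pellets earned in the current block

void setup() {
  fed3.begin();
  fed3.allowBlockRepeat = false;         // real library flag; in this toy sketch the alternation below plays that role
}

void loop() {
  fed3.run();                            // housekeeping: display, SD logging, sleep

  // Pellet-based block switch. A timer-based variant would instead check
  // elapsed time, e.g. (millis() - blockStart) > blockLengthMs.
  if (blockPellets >= pelletsToSwitch) {
    probLeft = (probLeft == 80) ? 20 : 80;  // with only two blocks, "no repeat" just means alternate
    blockPellets = 0;
  }

  if (fed3.Left) {                       // left poke detected
    fed3.logLeftPoke();                  // logged to the csv
    if (random(100) < probLeft) {        // pay out with the current left probability
      fed3.ConditionedStimulus();        // cue
      fed3.Feed();
      blockPellets++;
    }
    else {
      fed3.Timeout(timeoutSecs);         // unrewarded poke: timeout (reset/white-noise behavior lives in the library)
    }
  }

  if (fed3.Right) {                      // mirrored logic for the right poke
    fed3.logRightPoke();
    if (random(100) < (100 - probLeft)) {
      fed3.ConditionedStimulus();
      fed3.Feed();
      blockPellets++;
    }
    else {
      fed3.Timeout(timeoutSecs);
    }
  }
}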

Regarding your question, Zane:

  • In terms of pellets to switch, do you think a shorter criterion is better? If doing experiments for 4 days, have you seen that mice adapt to the bandit task better with a short criterion?
    • I think if I were to do my experiments again, I'd use 20 pellets instead of 30. In general, I think there is no difference between the two, but with 20 you get more probability changes, which is usually where the learning phenotype is strongest. It is possible that even fewer pellets could work, but I'd be concerned that mice would just start poking randomly, assuming they will eventually get a pellet. Although the timeout helps to avoid this, I've still seen many mice just "decide" that they don't want to learn the task and start poking randomly. There are some large-scale (in terms of the number of mice) experiments going on right now that use 20 pellets as the criterion, and they seem to be going really well.

I hope this helps. Feel free to reach out if you have any further questions!

Best,

Alex Legaria

Taaseen Rahman

May 3, 2025, 3:57:31 AM
to Alex Legaria, FEDforum
Hi Alex,

Thank you so much for the detailed reply. We'll definitely take the insights you've offered into account when designing the experiment. For now we'll probably use the 20-pellets-to-criterion setup, give the mice 3-4 days of deterministic training (either by setting the bandit probabilities to 100 and 0, or by running FR1 and FR1Rev on alternate days, whichever seems more convenient), and then run the bandit task for maybe two weeks. We might even tweak the code to focus on perseverative tendencies if that seems interesting. Fingers crossed the FEDs don't restart, but yes, I've had some restart on me spontaneously during long-term studies (e.g. over a week with the FEDs in the cage the entire time).

Sincerely,

Taaseen

Taaseen Rahman

May 5, 2025, 1:10:46 AM
to FEDforum
Hi,

I was just wondering about the deterministic pre-training. Is it better to run FR1->FR1Reverse->FR1->FR1Reverse for 4 days, or do you think I can just adjust the bandit code from "80,20" probabilities to "100,0" and leave it in for 4 days? I was curious because these two pre-training modes are slightly different: the former involves the active poke changing every 24 h, while the latter changes every time the mice reach the pellet criterion.

Sincerely,

Taaseen

Lex

May 5, 2025, 4:39:53 PM
to Taaseen Rahman, FEDforum
Hi Taaseen,
I wouldn't do the FR1 and FR1Rev steps. We train them on the 100%-0% deterministic Bandit from the start and get great results. Here's an example of a naive mouse being trained to very nice performance on the 100-0 Bandit task in 97 hours (4 days). The gray line is the probability of a pellet on the Left poke, and the red line is a smoothed representation of the mouse's choices over the 97 hours.

[Attachment: image.png, the plot described above]
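
To make that concrete: reusing the placeholder names from the rough sketch Alex posted above (again, this is just an illustration, not the library's actual Bandit example), the 100-0 version is the same block logic with a different probability pair.

#include <FED3.h>
String sketch = "Bandit100_0";          // placeholder sketch name, logged to the SD filename
FED3 fed3(sketch);

// Same placeholder parameters as the 80/20 sketch above; only the probabilities change.
int pelletsToSwitch = 20;               // the rewarded side still flips every block
int probLeft        = 100;              // {100, 0} instead of {80, 20}: deterministic "reversal" training
int blockPellets    = 0;                // pellets earned in the current block

void setup() {
  fed3.begin();
  fed3.allowBlockRepeat = false;        // so the 100 and 0 sides alternate between blocks
}

void loop() {
  fed3.run();
  if (blockPellets >= pelletsToSwitch) {       // same pellet-count block switch
    probLeft = (probLeft == 100) ? 0 : 100;
    blockPellets = 0;
  }
  // Poke handling is identical to the 80/20 sketch: random(100) < probLeft rewards
  // the left poke, random(100) < (100 - probLeft) rewards the right poke, otherwise timeout.
}

That way the mice see the same block structure and timeout contingencies as the probabilistic task, just with a deterministic payout.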

Have fun! -Lex

Taaseen Rahman

May 5, 2025, 7:14:37 PM
to Lex, FEDforum
Hi Lex,

That makes more sense. Thanks again for letting me know. I’ll do it this way.

Sincerely,

Taaseen