How to read the .nal example files: from top to bottom, a .nal file can be considered as a stream of stimulus events.

A belief event "stimulus": stimulus. :|:
A goal event, something NARS wants to happen / something that yields reward for the Reinforcement Learner: stimulus! :|:
A belief event indicating that a certain action was taken: ^action. :|:
Comments: //comment
Disabling attempts of actions that are not expected to be the best (this also disables curiosity for ONA); without this, only many repeats of the experiment would be representative: *motorbabbling=0.0
Defining the actions used in the example: *setopname 1 ^eat and *setopname 2 ^leave
Suppressing the display of reasoning results to keep the output clean (not necessary for the Q-Learner, obviously): *volume=0

Additionally, the 2 numbers in curly braces:
For NARS, they are the two-valued truth and desire value {frequency, confidence}, where the extreme frequency values 1.0/0.0 mean "true/false" for belief events and "desired/undesired" for goal events. Consistently, frequency is positive evidence over total evidence, w+/w, and confidence is total evidence mapped to a value between 0 and 1, w/(w+1) (see the Non-Axiomatic Logic book to learn more about this).
For the Q-Learner, they are the reward value: {reward, 1.0} (the second number is ignored).

Additionally: how much the Q-Learner prefers long-term rewards over near-term ones is controlled via the parameter Gamma (a higher Gamma makes the agent more patient), and of course depends on the magnitude of the reward values, plus the eligibility trace decay Lambda (a higher value means less decay): https://github.com/opennars/OpenNARS-for-Applications/blob/QLearner/src/QLearner.c#L6
For ONA, it's mostly the desire values and truth values of the beliefs it forms (*volume=100 to show them), plus the temporal projection decay parameter: https://github.com/opennars/OpenNARS-for-Applications/blob/master/src/Config.h#L126
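The frequency and confidence definitions above (frequency = w+/w, confidence = w/(w+1)) can be sketched directly; here w_plus and w name the positive and total evidence counts, as in the Non-Axiomatic Logic book:

```python
def truth_from_evidence(w_plus, w):
    """Map evidence counts to a NARS {frequency, confidence} pair."""
    # frequency: positive evidence over total evidence, w+/w
    frequency = w_plus / w
    # confidence: total evidence mapped into (0, 1), w/(w+1)
    confidence = w / (w + 1.0)
    return frequency, confidence

# e.g. 4 positive observations out of 5 total:
f, c = truth_from_evidence(4.0, 5.0)  # f = 0.8, c = 5/6
```

Note how confidence approaches but never reaches 1.0: no finite amount of evidence makes a belief certain.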
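The linked QLearner.c implements the learner in C; purely as an illustration of how Gamma and Lambda interact, here is a generic tabular Q(lambda) update sketch with accumulating traces. The table layout and parameter values are assumptions for the example, not ONA's actual code:

```python
def q_lambda_update(Q, E, s, a, reward, s_next, alpha=0.1, gamma=0.9, lam=0.9):
    """One accumulating-trace Q(lambda) backup over value table Q and eligibility table E."""
    # TD error: how much better or worse the outcome was than predicted;
    # gamma weights the best estimated future value (higher gamma = more patient)
    delta = reward + gamma * max(Q[s_next].values()) - Q[s][a]
    # mark the just-taken state/action pair as eligible for credit
    E[s][a] += 1.0
    for state in Q:
        for action in Q[state]:
            # credit every pair in proportion to its eligibility ...
            Q[state][action] += alpha * delta * E[state][action]
            # ... then decay the trace: higher lambda means less decay
            E[state][action] *= gamma * lam

# two hypothetical states, with the two actions from the example above:
actions = ["^eat", "^leave"]
Q = {s: {a: 0.0 for a in actions} for s in (0, 1)}
E = {s: {a: 0.0 for a in actions} for s in (0, 1)}
q_lambda_update(Q, E, s=0, a="^eat", reward=1.0, s_next=1)
```

With eligibility traces, a reward propagates credit back along the recent state/action history in one sweep, rather than only to the last pair.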
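The temporal projection decay mentioned for ONA can be illustrated roughly: when an event belief is projected to another moment in time, its confidence is assumed to shrink with temporal distance while its frequency stays fixed. The exponential form and the decay constant below are assumptions for illustration only; the real parameter lives in the Config.h line linked above:

```python
def project(frequency, confidence, dt, decay=0.8):
    """Project a {frequency, confidence} event truth across dt time steps.

    decay is a hypothetical per-step factor, NOT ONA's actual constant.
    """
    # frequency is unchanged; confidence decays with temporal distance
    return frequency, confidence * decay ** abs(dt)

f, c = project(1.0, 0.9, dt=3)  # same frequency, weaker confidence
```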