Figure 1. Behavioral task. (a) State transition structure of the task. Each trial starts in a randomly selected first-stage state. Given the transition structure, each first-stage choice leads deterministically to a second-stage state. Each second-stage choice is associated with a drifting scalar reward. (b) Top: stake manipulation. At the beginning of each trial, a high-stake or low-stake cue is presented at random; the cue indicates the factor by which the score shown in the feedback is multiplied to give the trial's actual payoff. Bottom: transition manipulation. The transition structure remained fixed in stable-transition blocks and changed at irregular intervals in variable-transition blocks.
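The trial structure in Figure 1 can be sketched as a small Python environment. This is an illustrative sketch, not the authors' implementation: two first-stage and two second-stage states, one reward per second-stage state (a simplification of per-choice rewards), a Gaussian random-walk drift with reflecting bounds, a high-stake multiplier of 5, and a per-trial remapping probability in variable-transition blocks are all assumptions, because the caption reports none of these values.

```python
import random

# Hypothetical parameters: the caption does not report the actual values.
N_SECOND_STAGE = 2           # number of second-stage states
REWARD_BOUNDS = (0.0, 9.0)   # range of the drifting reward
DRIFT_SD = 2.0               # SD of each Gaussian random-walk step
HIGH_STAKE_MULTIPLIER = 5    # assumed payoff multiplier on high-stake trials
SWITCH_PROB = 0.1            # assumed per-trial chance of remapping in variable blocks


def reflect(x, lo, hi):
    """Fold x back into [lo, hi] by reflecting it at the bounds."""
    while x < lo or x > hi:
        x = 2 * lo - x if x < lo else 2 * hi - x
    return x


class TwoStageTask:
    """Hypothetical two-stage task with deterministic transitions and drifting rewards."""

    def __init__(self, variable_transitions, seed=0):
        self.rng = random.Random(seed)
        self.variable = variable_transitions
        # mapping[first_stage_state][action] -> second-stage state (deterministic).
        self.mapping = [[0, 1], [0, 1]]
        self.rewards = [self.rng.uniform(*REWARD_BOUNDS) for _ in range(N_SECOND_STAGE)]

    def start_trial(self):
        """Draw a random first-stage state and a random stake cue."""
        self.state = self.rng.randrange(2)
        self.high_stake = self.rng.random() < 0.5
        return self.state, self.high_stake

    def step(self, action):
        """Apply a first-stage action; return (second-stage state, shown score, payoff)."""
        second = self.mapping[self.state][action]
        score = self.rewards[second]
        payoff = score * (HIGH_STAKE_MULTIPLIER if self.high_stake else 1)
        self._drift()
        if self.variable and self.rng.random() < SWITCH_PROB:
            # Irregular structural change: swap the action-to-state mapping.
            self.mapping = [m[::-1] for m in self.mapping]
        return second, score, payoff

    def _drift(self):
        """Move every reward by one bounded Gaussian random-walk step."""
        lo, hi = REWARD_BOUNDS
        self.rewards = [reflect(r + self.rng.gauss(0, DRIFT_SD), lo, hi)
                        for r in self.rewards]
```

For example, `TwoStageTask(variable_transitions=True, seed=1)` followed by alternating calls to `start_trial()` and `step(action)` yields the second-stage state, the displayed score, and the stake-weighted payoff of each trial. Swapping the whole action-to-state mapping is only one simple way to realize an irregular structural change; other schemes are equally plausible.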
Figure 3. (a, d) Time course of the average pupil diameter within low-stake and high-stake trials. Solid lines indicate the mean baseline-corrected pupil diameter, and shaded areas indicate its standard errors (SEs). Red lines at the top mark time points with a reliable positive effect (p < 0.05) for younger and older adults. (b, e) Stage 1 reaction times and (c, f) stage 2 reaction times. Error bars represent the standard error of the mean. * indicates p < 0.05, ** indicates p < 0.01, and *** indicates p < 0.001.
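The pupil traces in Figure 3 are baseline-corrected and shown with SE bands. A common way to obtain such traces is subtractive baseline correction followed by a per-time-point mean and standard error across trials. The sketch below assumes that scheme and a fixed pre-cue baseline window of `n_baseline` samples; the caption specifies neither the correction method nor the window, and `baseline_correct` and `mean_and_se` are hypothetical helpers for illustration.

```python
import math


def baseline_correct(trace, n_baseline):
    """Subtractive baseline correction: subtract the mean of the first
    n_baseline samples (assumed pre-cue window) from every sample."""
    if not 0 < n_baseline <= len(trace):
        raise ValueError("baseline window must lie within the trace")
    baseline = sum(trace[:n_baseline]) / n_baseline
    return [x - baseline for x in trace]


def mean_and_se(traces):
    """Per-time-point mean and standard error of the mean across trials.
    All traces must have the same length; at least two are required."""
    n = len(traces)
    if n < 2:
        raise ValueError("need at least two traces to estimate an SE")
    means, ses = [], []
    for samples in zip(*traces):
        m = sum(samples) / n
        var = sum((x - m) ** 2 for x in samples) / (n - 1)  # sample variance
        means.append(m)
        ses.append(math.sqrt(var / n))
    return means, ses


# Hypothetical pupil traces (arbitrary units): 3 baseline + 3 post-cue samples.
trials = [[3.0, 3.0, 3.0, 3.2, 3.4, 3.3],
          [2.8, 2.8, 2.8, 3.0, 3.3, 3.1]]
corrected = [baseline_correct(t, n_baseline=3) for t in trials]
means, ses = mean_and_se(corrected)
```

Subtractive correction is shown because it keeps the trace in the original units; divisive (percent-change) correction is a common alternative.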
Figure 5. (a, d) Time course of the average pupil diameter within stable-transition and variable-transition trials. Solid lines indicate the mean pupil diameter, and shaded areas indicate SEs of the baseline-corrected pupil diameter. Red lines at the top mark time points with a reliable positive effect (p < 0.05) for younger and older adults. (b, e) Stage 1 reaction times and (c, f) stage 2 reaction times. Error bars represent the standard error of the mean. *** indicates p < 0.001; NS, not significant.
Figure 6. Metacontrol effects after grouping. Logistic regression weights show the effects of stake, transition condition, and their interaction on model-based weights in older adults. Older adults were grouped by their self-reported difficulty in updating the task structure. Vertical lines represent 95% confidence intervals, and dots represent means.