Replies: 8 comments 10 replies
-
Summary Commentary

CAVEAT: To be crystal clear re: my use of the term “negative valence”, I want to point out the distinction between the mere omission of a positive US in the context of APPETITIVE processing (excluded) — VERSUS everything else negative that is relevant for behavior (my definition of “negative valence”). I feel this bears pointing out because the literature in RL and behavioral economics, and even much of neuroscience, is sloppy about this distinction and generally uses reward omission as the paradigmatic case of negative outcomes — which IMO it should NOT be. Instead, reward omission should be considered EITHER just a part of the world’s being probabilistic, OR a potential signal for me to pay more attention to what I’m doing, or to consider that my world’s contingencies have changed.

The 3-way taxonomy @randy outlines makes a lot of sense to me. At the highest level is the distinction between the negative outcomes that serve to define processing in an AVERSIVE CONTEXT versus negative events/experiences/considerations that accompany processing in what is otherwise an APPETITIVE CONTEXT. The former might be defined as processing where the dominating consideration is the escape/avoidance/mitigation of the occurrence of seriously bad events (e.g., @randy’s “existential threats”, but also other really bad stuff like substantial loss of property, as a human/economic example), while the latter might be defined as less serious negative experiences that occur within the context of what is otherwise an appetitive-dominant situation. One way to define a distinction between these two might be in terms of the magnitude of the stakes involved for the positive vs. negative experiences: an appetitive context means there’s a lot to gain while the negative stuff is less consequential, and vice versa.
Within the appetitive context, it also seems important to distinguish between negative stuff like the unpleasantness of physical and/or mental effort that comes with pursuing appetitive goals — VERSUS negative stuff that happens “out there in the world” as a result of my appetitive behavior, either along the way to goal-realization, or that ALSO occurs in conjunction with the goal realization; the former are kinda hurdles to be overcome, the latter “side effects” of the primary (desired) outcome. A possible dimension for distinguishing between these two kinds of negative events might be the sensory (or cognitive?) source of the information: for the former the source seems to (always?) be the subject’s own body (or brain?), i.e., it is interoceptive; while for the latter the source seems to be events that happen in the world (exteroceptive). This distinction could be useful/relevant because these processing modes tend to involve separate neural structures and could thus serve to motivate model architecture/organization.

Addressing each of the three taxonomical branches in turn:

APPETITIVE VS. AVERSIVE CONTEXT PROCESSING AS LEFT- VS. RIGHT-DOMINANT LATERALIZED PROCESSING: I believe that the long-standing distinction made between appetitive-as-left-hemisphere vs. aversive-as-right still holds up empirically (BIS/BAS and all that), although it tends to be largely underemphasized in much of the cognitive neuroscience literature. Note that such a framing does not need to be to the exclusion of other lateralization schemes commonly proposed: left-as-on-task vs. right-as-monitoring; left-as-primary-task vs. right-as-secondary, etc.
OBLIGATORY COST/INTEROCEPTIVE(?) PROCESSING AS VMPFC (mostly excludes OFC): Generally, the VMPFC constitutes a large chunk of neural real estate that is not yet very well characterized, especially in primates. A lot of VMPFC seems to process interoceptive and/or affectively-charged information that is central to motivation, etc. By hypothesis, my mental model for VMPFC has several subareas, each specialized for processing various dimensions of “cost” associated with various categories and anticipated instantiations of specific actions, partially hierarchically organized so as to feed into a kind of integrated utility area (ACCutil?) where the expected outcome value of contemplated actions and their accompanying costs combine to form expected utility reps.
Regarding how to combine multiple cost dimensions, including how to weigh them relative to one another: Prospect Theory has generally used different exponent parameters for discounting positive (benefits) vs. negative (costs) outcomes as a function of their respective magnitude. I am not aware, however, of whether or how it handles multiple cost dimensions. There is a body of work in behavioral economics having to do with multi-featural decision making, typically involving consumers’ purchasing behavior, that could be relevant here as well. One approach might be for each subject to have a vector of such exponent-parameters, one for each cost dimension, along with scaling parameters that reflect the relative strength of each dimension for that subject, and to combine them linearly.
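As a rough sketch of the linear-combination idea just described — every name here (`CombineCosts`, the slice parameters) is hypothetical, not from any existing codebase:

```go
package main

import (
	"fmt"
	"math"
)

// CombineCosts is a hypothetical sketch: each cost dimension i gets its own
// prospect-theory-style exponent (diminishing sensitivity to magnitude) and a
// subject-specific scaling weight reflecting that dimension's relative
// strength; the discounted dimensions are then combined linearly.
func CombineCosts(raw, exponent, weight []float64) float64 {
	total := 0.0
	for i := range raw {
		total += weight[i] * math.Pow(raw[i], exponent[i])
	}
	return total
}

func main() {
	// e.g., two cost dimensions: effort and pain, with per-subject parameters
	fmt.Println(CombineCosts([]float64{4, 1}, []float64{0.5, 0.8}, []float64{1, 2}))
}
```

A per-subject parameter vector like this would make individual differences in cost sensitivity directly fittable from behavior.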
Also, I have only recently come to realize just how intimately integrated the VMPFC, etc. are with the HC/MTL and associated subcortical structures (anterior thalamus, nuc. reuniens, ventral striatum-defined BG pathways, etc.). I’ve come to think of this system (also including OFC) as a kind of “rodent brain” (because that seems to be all they have cognitively) that serves as a kind of core system “around which” the additional cognitive processing substrates are layered on in the primate (i.e., subsumption).
Finally, Josh Brown recently published an update to his PRO model of ACC function, which also seems relevant here. Although the original PRO had ACC doing only outcomes, and predictions thereof, he may have since added the cost side. In any case, it’s worth looking at his latest.
NEGATIVE EXTERNAL EVENTS IN APPETITIVE CONTEXT HANDLED BY OFC: To a first approximation the OFC seems to serve as a substrate for a kind of forward-looking working memory goal-state representational space that serves to motivate behavior towards affectively-charged desired states of the world, e.g., searching/working for food when hungry, etc. While there was some early thought that positive vs. negative valence was represented along a medial-to-lateral gradient in OFC (refs in FrankClaus06?), more recent thinking is that appetitive (food) vs. aversive (electric shock) USs are represented in the OFC in a mosaic-like pattern, and that the medial-lateral gradient may contribute to the OFC’s serving as a kind of “task-representational-space” perhaps extending into adjacent VLPFC in primates (my hypothesis).
ADDITIONAL THOUGHTS: RIGHT- VS. LEFT-LATERALIZATION IN OFC(?): We know from the motor system that right vs. left lateralization exhibits a ubiquitous pattern whereby contralateral action is dominant but ipsilateral action is always also represented in each hemisphere, and that there is extensive interhemispheric communication. Speculatively, a potentially useful hypothesis re: OFC functional organization might be that while both appetitive and aversive US outcomes are represented in each hemisphere, the left OFC is appetitive-dominant while the right is aversive-dominant (analogous to contra- vs. ipsi- in the motor system). Such an arrangement would allow relatively low-stakes negative outcomes, like occasional thorn pricks, to be represented in the left hemisphere alongside a currently dominant appetitive goal-state, like foraging on a berry bush. That is, while I stay in persistent appetitive-processing-mode, I am nonetheless able to “keep in mind” the need to keep the thorn pricking to a minimum.
PREDATOR-PREY INTERACTIONS: For what seem like obvious reasons, predator-prey interactions have exerted an especially powerful influence on evolution. The prey’s perspective can be particularly informative, as it serves to highlight the importance of distinguishing between processing under aversive vs. appetitive context. And the aversive context here can be especially enlightening. Consider a prey animal that encounters a predator in its immediate environment. As a first priority it must avoid capture, which entails first of all maintaining its attention on the predator and its location at all times, and then continually working to keep an optimal practical distance between itself and the predator. Think of the ubiquitous case of one person chasing another with a table between them. Something to notice here is how the predator serves as a kind of anti-goal for the prey — I’ve got to maintain focus on the predator in order to keep my distance; that is, I CANNOT afford to treat the predator as something unpleasant and so put it out of my mind. From an OFC-as-working-memory-for-goals perspective, the maintenance of the predator rep and its location has to be my main focus. This is of course all happening in an aversive context. On the other hand, going through life in perpetual fear/avoidance mode is a terrible way to live. Thus, if I can identify a state-of-the-world for myself that precludes my predator’s access to me, then I don’t have to focus on the predator for the foreseeable future. In the behavioral literature, this is studied as so-called safety-signal or security learning, and such learned safe states acquire strong positive valence — think about being inside a shelter all warm and dry while a cold rain pours outside.
An important thing to note here is that once such safe states are learned/identified, their pursuit can serve to convert an aversive-context situation into an appetitive one — instead of perpetually having to avoid the predator, I can instead pursue the quasi-appetitive goal of finding somewhere that I know they can’t reach me for whatever reason, and then I can forget about them. Problem solved!
THE DUAL HIGH-STAKES EXPERIMENTAL PARADIGM: There is a really nice paradigm, I think from the behavioral ecology literature, in which rodents are required to venture out from a safe area at one end of the experimental environment in order to forage for food placed at various points in an open field that has a predator residing at the opposite end of the environment. The predator’s range is limited somehow, such that its speed relative to the subject means there is a danger gradient that goes from safe foraging (near the subject’s safe haven) towards the predator’s lair. This allows for quantitative analysis of individual subjects’ behavior in order to explore causal factors affecting risk-taking behavior, individual differences, etc.
REQUEST: @randy and anyone: Please indicate which of the above-mentioned lit search questions you’d like me to prioritize; otherwise, I will follow my own muse. Also, let me know if I missed any issues brought up in the post above, or if there are additional, specific questions you’d like me to address.
-
Specific Inline Comments
-
Basic USneg, BLAneg, OFCneg, ACCcost etc. connectivity and logic in BOA

Here's some notes on basic computational logic for organizing layers within the first two "approach goal compatible" forms of negative US outcomes.

Graded USs
Gate OFCneg along with OFCus as part of the goal rep
Action costs: Effort = 1st pool in USneg

Summary of discussion from today:
Existing Effort logic provides a good template for how to generalize to all USneg:
This same logic works well for pain (e.g., bumping into walls in eboa), social negative outcomes, etc.

CS -> BLANeg
VsMatrix
OFCneg -> ACCcost

Model OFCneg, ACCcost on OFCus, OFCval layers -- same basic logic, connectivity with corresponding BLAneg etc.
-
Bidirectional (positive / negative) net value as a function of total reward (pos US) and total cost (neg US):

Goals:
// [i] below is each individual US -- aggregate across all then normalize!
pvNeg = 1 - (1 / (1 + sum(weight[i] * negUSRaw[i]))) // raw = unbounded linear sum; negUS is normalized value across all
pvPos = 1 - (1 / (1 + sum(weight[i] * drive[i] * posUSRaw[i]))) // same
var pvNet float32 // pvNet is overall primary value -- + / -
if pvPos > threshold * pvNeg { // threshold [default 1] = factor for relationship between pos / neg
    // todo: threshold can be adaptive -- depending on longer-term rate of positive reward etc.
    pvNet = pvPos * (1 - pvNeg) // negative discounts against posUS
} else {
    pvNet = -pvNeg * (1 - pvPos) // converse
}
da = pvNet // da = dopamine; + = burst above baseline tonic; - = dip below baseline tonic
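For reference, here is a self-contained, runnable Go version of the pseudocode above. The split into separate posWeight / negWeight slices is my assumption, since the pseudocode indexes a single weight[i] over each US list:

```go
package main

import "fmt"

// PVNet is a runnable sketch of the net primary-value pseudocode:
// unbounded weighted sums of positive and negative USs are squashed into
// [0, 1), then whichever side dominates (relative to threshold) is
// discounted by the other.
func PVNet(posUSRaw, drive, posWeight, negUSRaw, negWeight []float32, threshold float32) float32 {
	var sumPos, sumNeg float32
	for i := range negUSRaw {
		sumNeg += negWeight[i] * negUSRaw[i]
	}
	for i := range posUSRaw {
		sumPos += posWeight[i] * drive[i] * posUSRaw[i]
	}
	pvNeg := 1 - 1/(1+sumNeg) // squash unbounded sum into [0, 1)
	pvPos := 1 - 1/(1+sumPos)
	if pvPos > threshold*pvNeg {
		return pvPos * (1 - pvNeg) // negative discounts positive
	}
	return -pvNeg * (1 - pvPos) // positive discounts negative
}

func main() {
	// one positive US at full drive, no negative USs: pvPos = 0.5, pvNeg = 0
	fmt.Println(PVNet([]float32{1}, []float32{1}, []float32{1}, nil, nil, 1))
}
```

The returned value would then drive the dopamine signal (da), as in the pseudocode.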
-
Rationale for why we need to use globals in a GPU-friendly way: basically, we need to read / write lots of values to / from different layers on a cycle-by-cycle basis. There are a few vars that don't need to be visible but more that are -- so on balance, it is better to just have them all in one place. Also, we run 10 cycles of updating entirely on the GPU without any copying back / forth to the CPU, so all of the computation has to also run on the GPU.
-
When a negative US becomes its own outcome: ACh salience

Key point: ACh gates (inhibits / disinhibits) activity in the VS (ventral striatum), so that goal gating only happens on "salient" events, such as positive US onset, or CSs. The new LHb code has this logic. Thus, the ACh response determines whether a negative US is sufficient to drive VS gating on its own:
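A minimal sketch of this gating logic, under the stated assumptions; the function and parameter names here are hypothetical, not the actual model code:

```go
package main

import "fmt"

// AChGate sketches the key point above: phasic ACh, driven by salient
// events (CS onset, strong/sudden US onset), disinhibits VS so that goal
// gating can proceed; below the salience threshold, gating is suppressed.
func AChGate(ach, thr, vsGatingDrive float32) float32 {
	if ach < thr {
		return 0 // ACh-mediated inhibition: non-salient event, no gating
	}
	return vsGatingDrive // disinhibited: gating drive passes through
}

func main() {
	// weak vs. strong ACh responses to the same underlying gating drive
	fmt.Println(AChGate(0.2, 0.5, 1.0), AChGate(0.8, 0.5, 1.0))
}
```

On this view, whether a negative US gates VS on its own reduces to whether its ACh response clears the salience threshold.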
-
Effort vs Urgency

Now that Effort is a "proper" negative US, it always activates BLANegAcq, like any other neg US. This then competes with novelty activity at the start in BOA, preventing the existing default model from engaging the curiosity drive. We might want to make some kind of exception here, but it also suggests an important clarification between urgency and effort: they should be mutually exclusive "bins" that accumulate the same quantity ("effort") depending on whether the goal is engaged or not:
This will fix the weird situation where Urgency could interfere with an already-engaged state (@garymm pointed out this problem) and will make Go firing much more robust, as originally intended.
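The mutually exclusive bins idea could be sketched like this (hypothetical names, not the actual model code):

```go
package main

import "fmt"

// AccumEffortUrgency sketches the mutually exclusive bins: the same
// per-step quantity ("effort") accumulates into the Effort bin while a goal
// is engaged, and into the Urgency bin while no goal is engaged.
func AccumEffortUrgency(effort, urgency, delta float32, goalEngaged bool) (float32, float32) {
	if goalEngaged {
		return effort + delta, urgency
	}
	return effort, urgency + delta
}

func main() {
	var effort, urgency float32
	for step := 0; step < 5; step++ {
		engaged := step >= 2 // e.g., a goal gates in at step 2
		effort, urgency = AccumEffortUrgency(effort, urgency, 1, engaged)
	}
	fmt.Println(effort, urgency)
}
```

Because each step's increment goes to exactly one bin, Urgency can no longer build up against an already-engaged goal.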
-
The logic of Giving Up

The following negative valence signals drive giving up on an engaged goal:
It is possible to integrate the first two neg-US based values into a single threshold, but the ACh salience mechanism for strong sudden-onset USs suggests that they might engage distinct mechanisms, and it is simpler to set thresholds on single US values rather than deal with the complexities of normalized values.

The VSPatch mechanism is distinct because of the way that we integrate negative and positive PV values: first there is a comparison between PVpos and PVneg, and if pos > neg, then neg discounts pos (and vice-versa), and then VSPatch only operates in relation to the PVpos values. So it isn't on the same terms as neg directly, and functionally, it is more about evaluating the positive side of the equation (whether it will show up or not) vs. the negative side, so it makes sense to treat it separately. However, we can discount PVpos as a function of the accumulated VSPatch activity, in comparison with PVneg, to enable both factors to accumulate and work together to drive giving up. So, VSPatch reduces PVpos and thus accelerates the chance of PVneg driving a switch.

Stochasticity is also critical for this process: previously we had some randomness in the effort max values -- it is much better to just compute a probability of giving up and roll the dice accordingly. We need a gain-tuned logistic function around the threshold value, to parameterize how noisy vs. deterministic this is.

Finally, the full "bicameral BOA" framework is likely necessary to enable integration and projection of the value of alternative options in light of accumulating costs on the dominant engaged one, but we can implement simpler heuristics.

Implementation
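The gain-tuned logistic give-up probability described above might look like this; a sketch with hypothetical names, not the actual implementation:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// GiveUpProb: the probability of giving up rises smoothly as the integrated
// negative value x exceeds the threshold thr; gain parameterizes how noisy
// (low gain) vs. deterministic (high gain) the decision is.
func GiveUpProb(x, thr, gain float32) float32 {
	return float32(1 / (1 + math.Exp(-float64(gain)*float64(x-thr))))
}

// GiveUp rolls the dice against that probability at each decision point.
func GiveUp(x, thr, gain float32, rnd *rand.Rand) bool {
	return rnd.Float32() < GiveUpProb(x, thr, gain)
}

func main() {
	rnd := rand.New(rand.NewSource(1))
	fmt.Println(GiveUpProb(0.5, 0.5, 8)) // exactly at threshold: p = 0.5
	fmt.Println(GiveUp(1.0, 0.5, 8, rnd))
}
```

The VSPatch discount of PVpos would feed into x here, so accumulating negative evidence steadily pushes the probability toward 1.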
-
Here's the discussion thread for trying to figure out how to integrate negative valence outcomes (pain, bad taste / smell, predators, negative social outcomes, etc) into the existing BOA model in a better way.
So far, BOA has been built around positive-valence (appetitive) outcomes that drive goal-engaged approach behavior, within an overall consummatory (e.g., foraging) framework free from existential threats ("ordinary" life). This BOA framework naturally suggests two different categories of negative valence signals, which align with the central ACC (action cost) vs. OFC (outcome value) distinction, and logically affect different aspects of decision making. However, existential-level decision making is likely qualitatively different in important ways, as discussed subsequently.
Action costs: effort (time), pain (physical impact, etc), frustration, etc (what else?)
Negative outcomes: rotten / sour / bitter taste, negative social outcomes, etc (what else?)
Existential threat: at some point, pain transitions into mortal threat, at which point it should probably be more than a discount factor. This is a discrete outcome, not a continuous cost. It should drive active avoidance behavior directly instead of just representing a cost.
Biological data
The amygdala is the core system for all of the "valenced" signals, and most research has traditionally focused on negative valence instead of positive. Having all of these signals go through the amygdala (BLA) allows for a competitive "attention" dynamic so that the most important ones dominate: if there are existential issues at hand, those will presumably out-compete the others.
To accommodate the above factorization, we likely need to organize BLA into existential vs. ordinary categories, and route these into different pathways in other brain systems.