A Theory of Visual Attention
Søren Kyllingsbæk & Thomas Habekost
Introduction
In this paper we describe A Theory of Visual Attention (TVA) first presented by Bundesen (1990). We have tried to make the theory accessible through a number of different examples ranging from single item identification to partial report.
TVA is a combined theory of recognition and selection. Whereas many theories of visual attention separate the two processes both in time and representation, TVA instantiates the two processes in a unified mechanism implemented as a race model of both selection and recognition. In other words, when an object in the visual field is recognized, it is also selected at the same time and vice versa.
By unifying selection and recognition, TVA tries to resolve the long-standing debate between early and late selection. The first position claims that selection occurs prior to recognition (e.g., Broadbent, 1958), the second that recognition is the precursor of selection (e.g., Deutsch & Deutsch, 1963).
According to TVA, elements in the visual field are processed in parallel. Visual processing is a two-stage process consisting of (a) an initial match of the visual impression with visual long-term memory (VLTM) representations followed by (b) a selection/recognition race for representation in visual short-term memory (VSTM). Note that the initial match to VLTM does not imply recognition. Thus, after the match, both the strength of evidence that a given letter is a C and the strength of evidence that the letter is a G may be positive. Only after the recognition/selection race is the letter recognized, that is, categorized as either a C or a G.
In the following paragraphs we will describe the mathematical implementation of TVA, followed by applications of the model to different experimental paradigms.
Stages
The First Stage - Computation of Eta values
According to TVA, visual processing starts with a massively parallel comparison (matching) between objects in the visual field and representations in VLTM. This process is capacity unlimited in the sense that the time it takes is independent of the number of objects in the visual field. The end result of the matching process is the computation of eta-values ("evidence values"), $\eta(x,i)$, each measuring the degree of match between a given object x and a long-term memory representation (category) i.
Eta-values are affected by the visibility (e.g., contrast) of the visual stimuli as well as the degrees of match between the stimuli and the VLTM representations. The latter are affected by learning; for instance, one may learn to read one’s own name faster than other first names (see Bundesen, Kyllingsbæk, Houmann, & Jensen, 1997). Altogether, eta-values are affected by "objective" properties of the visual environment and the VLTM of the subject, but not by "subjective" properties such as selection criterion or categorization bias.
The Second Stage - The Race
Different categorizations of the objects in the visual field compete for entrance into VSTM in a stochastic race process. The capacity of VSTM is limited to K elements. K typically has a value around 4 items. Categorizations of the first K visual objects to finish processing are stored in VSTM (the first K winners of the race). Categorizations from other elements are lost, but may give rise to priming effects and other "subliminal" phenomena.
Note that categorizations from elements already represented by other categorizations may freely enter VSTM even though it is full. Thus VSTM is mainly limited with respect to the number of elements of which categorizations may be stored, not with respect to the number of categorizations of the elements represented in the store.
As stated above, visual objects are processed in parallel in TVA. Furthermore, substantial independence is found between visual categorizations of different objects and between different types of visual categorizations of the same object. For simplicity, assume that only two objects are present in the visual field, and the two objects shall be judged with respect to color and shape. Given that VSTM capacity is larger than two, VSTM is not a limiting factor. If attentional parameters are kept constant, then the probability that the first object is correctly categorized with respect to color is independent of whether the object is correctly categorized with respect to shape and independent of whether the other object is correctly categorized with respect to color or shape (cf. Bundesen, Kyllingsbæk, & Larsen, 2003).
To determine the rate of processing of each categorization of an element, the eta-values are combined with two types of "subjective" values, pertinence and bias. As suggested by Broadbent (1971), two different attentional mechanisms are necessary for adequate behavior: one for filtering (based on pertinence) and one for pigeonholing (based on bias). For example, if subjects are to report the identity of black target letters amongst white distractor letters, the white distractors must be filtered out, and the targets must be categorized with respect to letter identity. In TVA terms, pertinence should be high for black and low for white stimuli and bias should be high for letter identities and low for other categories.
The rate of processing v(x,i) of a categorization is given by two equations. By Equation 1,
$$v(x,i) = \eta(x,i)\,\beta_i\,\frac{w_x}{\sum_{z \in S} w_z} \tag{1}$$
where $\eta(x,i)$ is the strength of the sensory evidence that element x belongs to category i, $\beta_i$ is the perceptual bias associated with i, S is the set of elements in the visual field, and $w_x$ and $w_z$ are attentional weights for elements x and z. Thus the rate of processing is determined as the strength of the sensory evidence that object x is of category i weighted by the bias towards making categorizations of type i and by the relative attentional weight of object x (given by the ratio of $w_x$ over the sum of the attentional weights of all objects in the visual field).
The attentional weights are in turn given by
$$w_x = \sum_{j \in R} \eta(x,j)\,\pi_j \tag{2}$$
where R is the set of perceptual categories, $\eta(x,j)$ is the strength of the sensory evidence that element x belongs to category j, and $\pi_j$ is the pertinence value associated with category j. The distribution of pertinence values defines the selection criteria at any given point in time. By Equation 2, the attentional weight of object x is a weighted sum of pertinence values, where each pertinence value is weighted by the degree of evidence that object x actually is a member of category j.
In most experimental setups, eta-, beta-, and pi-values can be assumed to be constant during the stimulus presentation, which implies that the v values are also constant. The v values are rate parameters of the encoding process. When the v values are constant, it can be shown that the processing times are exponentially distributed (see Equations 3 and 4).
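The two rate equations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout, and all eta-, beta-, and pi-values below are hypothetical and chosen for a display with one black target letter and one white distractor letter.

```python
def attentional_weights(eta, pertinence):
    """Equation 2: w_x is the sum over categories j of eta(x, j) * pi_j."""
    return {
        x: sum(evidence.get(j, 0.0) * pi_j for j, pi_j in pertinence.items())
        for x, evidence in eta.items()
    }


def processing_rates(eta, bias, pertinence):
    """Equation 1: v(x, i) = eta(x, i) * beta_i * w_x / (sum of all weights)."""
    weights = attentional_weights(eta, pertinence)
    total_weight = sum(weights.values())
    return {
        (x, i): evidence.get(i, 0.0) * beta_i * weights[x] / total_weight
        for x, evidence in eta.items()
        for i, beta_i in bias.items()
    }


# Hypothetical eta-values: obj1 is a black "A", obj2 is a white "B".
eta = {
    "obj1": {"black": 1.0, "white": 0.0, "A": 20.0, "B": 2.0},
    "obj2": {"black": 0.0, "white": 1.0, "A": 2.0, "B": 20.0},
}
pertinence = {"black": 1.0, "white": 0.1}  # filtering: favor black items
bias = {"A": 1.0, "B": 1.0}                # pigeonholing: letter identities

weights = attentional_weights(eta, pertinence)
rates = processing_rates(eta, bias, pertinence)
for (x, i), v in sorted(rates.items()):
    print(f"v({x}, {i}) = {v:.2f} per second")
```

With these made-up values the white letter receives a tenth of the attentional weight of the black letter, so its categorizations race at a correspondingly lower rate even though the sensory evidence is equally strong.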
The Example Paradigm Used
In the following, the mechanisms of the TVA model are described step by step through experimental designs of increasing complexity.
The core task used is short presentations of black target letters either alone or amongst white distractor letters. Presentations are brief (below 200 ms) to prevent the influence of eye movements. Presentation of the stimulus letters may be followed by presentation of stimulus masks to terminate further processing. In other words, the task of subjects is to view briefly presented displays of black (and white) letters and report the identity of as many of the black letters as possible.
Figure 1: Processing of a singly presented stimulus with different v values.
Single Target Letter Presentation
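The simplest case of the exemplary paradigm is the presentation of a single black target letter followed by a mask. The rate of processing v(x,i) depends on both the visual element (the letter) x and the categorization (feature) i. With only one element in the visual field, the relative attentional weight in Equation 1 equals 1, so by insertion into Equation 1,
$$v(x,i) = \eta(x,i)\,\beta_i\,\frac{w_x}{w_x} = \eta(x,i)\,\beta_i$$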
Further, TVA states that if the rate of processing is constant, the processing time of a stimulus is exponentially distributed with rate parameter v(x,i):
$$f(t) = v(x,i)\,e^{-v(x,i)\,t} \tag{3}$$
$$F(t) = 1 - e^{-v(x,i)\,t} \tag{4}$$
where f(t) is the probability density function and F(t) is the cumulative distribution function (the probability of encoding an item as a function of exposure time). Figure 1 shows F(t) for a single stimulus with the rate parameter v(x,i) at a value of either 5 elements/s, 10 elements/s, 15 elements/s, or 20 elements/s.
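As a minimal sketch of the exponential encoding functions, the following Python snippet evaluates the density and the encoding probability for the four rates shown in Figure 1. The function names and the exposure duration of 100 ms are illustrative assumptions, not values taken from the figure.

```python
import math


def encoding_density(t, v):
    """Exponential density f(t) = v * exp(-v * t) for t >= 0."""
    return v * math.exp(-v * t) if t >= 0 else 0.0


def encoding_probability(t, v):
    """Exponential distribution function F(t) = 1 - exp(-v * t) for t >= 0."""
    return 1.0 - math.exp(-v * t) if t >= 0 else 0.0


exposure = 0.1  # seconds (hypothetical exposure duration)
for v in (5, 10, 15, 20):  # elements per second, as in Figure 1
    print(f"v = {v:2d}/s: F({exposure}) = {encoding_probability(exposure, v):.3f}")
```

Doubling the rate does not double the encoding probability, because F(t) saturates towards 1 as the product of rate and exposure duration grows.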
Figure 2: Two stimuli with different thresholds for conscious perception.
Smallest Effective Exposure Duration
Until now we have assumed that processing begins immediately after the stimulus letters have been exposed. However, there is evidence of a threshold below which no processing takes place (e.g., Bundesen & Harms, 1999). Thus we introduce a new parameter called t0 to represent the threshold. Parameter t0 typically takes on values less than 20 ms in normal healthy subjects. If presentations are shorter than this value, no processing takes place (v(x,i) = 0). The effective exposure duration $\tau$ is defined as $\tau = t - t_0$.
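Equations 3 and 4 then become:
$$f(t) = \begin{cases} v(x,i)\,e^{-v(x,i)\,(t - t_0)} & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$
$$F(t) = \begin{cases} 1 - e^{-v(x,i)\,(t - t_0)} & \text{if } t > t_0 \\ 0 & \text{otherwise} \end{cases}$$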
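The effect of the threshold can be sketched as follows, assuming that processing at a constant rate simply starts t0 after stimulus onset. The function name and the parameter values (t0 = 20 ms, v = 10 elements/s) are hypothetical.

```python
import math


def encoding_probability(t, v, t0=0.0):
    """Probability of encoding by exposure duration t with threshold t0."""
    tau = t - t0  # effective exposure duration
    if tau <= 0:
        return 0.0  # exposures at or below the threshold yield no processing
    return 1.0 - math.exp(-v * tau)


for exposure_ms in (10, 20, 50, 120):
    p = encoding_probability(exposure_ms / 1000.0, v=10.0, t0=0.020)
    print(f"exposure {exposure_ms:3d} ms: P(encoded) = {p:.3f}")
```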
Whole Report - Multiple Targets
Now we complicate matters by presenting two black letters rather than only one. The subjects’ task is to report the identity of both letters presented. For simplicity we assume that eta- and beta-values are the same for both letters and the same as when presenting only a single letter. Further, assume that the subject attends equally to the two letters, that is, the attentional weights are equal, say 1 for both elements. Now the rate of processing is half of what it was for the single letter presentation:
$$v(x,i) = \eta(x,i)\,\beta_i\,\frac{1}{1 + 1} = \tfrac{1}{2}\,\eta(x,i)\,\beta_i$$
Again, the probability that processing of each of the two letters has finished may be found using Equation 4. Further, we can compute the joint probability of reporting none, one, or both letters. Because the two letters are processed independently, the joint probability is simply the product of the marginal probabilities of the two letters finishing processing (F(t)) or not (1-F(t)):
$$P(\text{none}) = [1 - F(t)]^2$$
$$P(\text{one}) = 2\,F(t)\,[1 - F(t)]$$
$$P(\text{both}) = F(t)^2$$
where F(t) is defined in Equation 4 and evaluated at the halved rate of processing.
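A small numerical sketch of the two-letter case, assuming equal attentional weights of 1 and the same eta-beta product for both letters. The function name and the parameter values are hypothetical.

```python
import math


def two_letter_report(t, eta_beta, t0=0.0):
    """Probabilities of reporting 0, 1, or 2 of two equally attended letters."""
    v = eta_beta * 1.0 / (1.0 + 1.0)  # each letter gets half of the rate
    tau = max(t - t0, 0.0)            # effective exposure duration
    F = 1.0 - math.exp(-v * tau)      # probability that one letter finishes
    return {0: (1.0 - F) ** 2, 1: 2.0 * F * (1.0 - F), 2: F ** 2}


probabilities = two_letter_report(t=0.1, eta_beta=20.0)
for score, p in probabilities.items():
    print(f"P({score} letters reported) = {p:.3f}")
```

The factor 2 for a score of one reflects that either the first or the second letter may be the one that finishes.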
Differences in Attentional Weights
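Now assume that we ask subjects to pay more attention to one of the letters, say that the letters are presented to the right and left of fixation and that subjects are told to pay more attention to the right letter. Assume that the attentional weight of the right letter is 2 and that of the left letter is 1. The two v values are then:
Right letter:
$$v(x,i) = \eta(x,i)\,\beta_i\,\frac{2}{2 + 1} = \tfrac{2}{3}\,\eta(x,i)\,\beta_i$$
Left letter:
$$v(x,i) = \eta(x,i)\,\beta_i\,\frac{1}{2 + 1} = \tfrac{1}{3}\,\eta(x,i)\,\beta_i$$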
The presentation of a white distractor letter together with a single black target letter may be viewed in a similar way. The distractor has a smaller attentional weight because the pertinence of white is low compared to the pertinence of black, so by Equation 2 white objects receive smaller attentional weights than black objects.
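The unequal-weights case can be sketched in the same way. The snippet below assumes attentional weights of 2 (right) and 1 (left), a shared eta-beta product, and independent exponential processing of the two letters. The function names and the value 30 for the eta-beta product are hypothetical.

```python
import math


def rates_from_weights(eta_beta, weights):
    """Split the rate among objects in proportion to their attentional weights."""
    total = sum(weights.values())
    return {name: eta_beta * w / total for name, w in weights.items()}


def joint_report(t, rates):
    """Joint report probabilities for independently processed left/right letters."""
    F_left = 1.0 - math.exp(-rates["left"] * t)
    F_right = 1.0 - math.exp(-rates["right"] * t)
    return {
        "none": (1.0 - F_left) * (1.0 - F_right),
        "left only": F_left * (1.0 - F_right),
        "right only": (1.0 - F_left) * F_right,
        "both": F_left * F_right,
    }


rates = rates_from_weights(eta_beta=30.0, weights={"left": 1.0, "right": 2.0})
report = joint_report(t=0.1, rates=rates)
print(rates)
for outcome, p in report.items():
    print(f"P({outcome}) = {p:.3f}")
```

The same function covers a target-distractor pair: replacing the weights by, say, 1 for the black target and a small value for the white distractor shifts nearly all of the rate to the target.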
Figure 3: Processing of multiple targets in whole report. t0 is the threshold for conscious perception, C is the total limited processing capacity, K is the VSTM memory capacity, and T is the number of targets presented.
VSTM Limitations in Whole Report
When only few stimuli are present in the visual field, probabilities of categorizing the stimuli may be computed using the above equations. However, when the number of stimuli presented in the visual field exceeds the capacity K of VSTM, matters are more complicated.
What complicates matters are the derivations of the equations for the probabilities of reporting none, one, two, etc. letters correctly. Let the score s be the number of targets that enter VSTM and, therefore, can be reported (without guessing). Two cases must be considered. In Case 1, (a) s is less than the VSTM capacity K, or (b) s equals the total number of targets in the display, T, and T equals K. In this case performance is unaffected by K (the limitation of the storage capacity of VSTM) so that s equals the number of letters that finish processing. In Case 2, s equals K, but the number of letters in the display, T, is greater than K. In this case performance is often affected by K so that only a subset of those targets that finish processing can be reported (i.e., s is often smaller than the number of targets that finish processing).
In Case 1 we get a formula for the probability of reporting s items correctly that is a generalization of the above equations for whole report of two letters:
$$P(s) = \binom{T}{s}\,F(t)^{s}\,[1 - F(t)]^{T-s} \tag{5}$$
where T is the number of letters presented and $\binom{T}{s}$ is the number of different ways in which s items may be drawn from a total of T items without replacement when order is of no importance (i.e., the number of combinations of s letters out of a total of T letters). This number is used because the score s may be the result of a number of different reports; for example, if {A,B,C,D} is presented and s is equal to 2, we may have reported {A,B}, {A,C}, {A,D}, {B,C}, {B,D}, or {C,D}, that is, $\binom{4}{2} = 6$ different combinations.
Case 2 (s = K < T) is more complicated. We must consider what happens when VSTM is filled up with letters. One way of formulating this mathematically is to partition the exposure duration into small intervals. For each of these intervals we compute the joint probability that (a) one letter (viz., the last one to enter VSTM) finishes processing in this interval, (b) K-1 letters finish processing before the beginning of the interval, and (c) the remaining T-K letters finish processing after the end of the interval. Finally, we compute the sum of all these joint probabilities. When we let the size of the time intervals tend towards zero, the sum of the joint probabilities tends towards the true probability of reporting K letters when K is a limiting factor. Formally the summation is replaced by integration:
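$$P(K) = \int_0^{t} K \binom{T}{K}\,F(t')^{K-1}\,f(t')\,[1 - F(t')]^{T-K}\,dt' \tag{6}$$
where t' is the point in time at which the last letter enters VSTM, and the factor $K\binom{T}{K}$ counts the ways of choosing that letter together with the K-1 letters that finished before it.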
Because F(t) is the distribution function of an exponential distribution, the integral can be solved analytically. Thus, Equations 5 and 6 can be implemented as fast computer algorithms for fitting the TVA model to empirical data.
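The whole-report score distribution can be sketched numerically as follows. This is not the fast analytic algorithm mentioned above: for transparency the sketch evaluates the Case 2 integral with a simple midpoint rule. It assumes that all T letters share the same rate v and that t is the effective exposure duration; the function name and all parameter values are hypothetical.

```python
import math


def whole_report_distribution(t, v, T, K, steps=5000):
    """Return {s: P(s)} for whole report of T equally weighted letters.

    t: effective exposure duration (s), v: rate per letter (1/s),
    K: VSTM capacity (K >= 1), steps: midpoint-rule resolution.
    """
    if K < 1:
        raise ValueError("K must be at least 1")

    def F(u):  # probability that one letter has finished by time u
        return 1.0 - math.exp(-v * u)

    def f(u):  # density of the finishing time of one letter
        return v * math.exp(-v * u)

    probs = {}
    for s in range(min(T, K) + 1):
        if s < K or T == K:
            # Case 1: VSTM is not limiting, plain binomial probability.
            probs[s] = math.comb(T, s) * F(t) ** s * (1.0 - F(t)) ** (T - s)
        else:
            # Case 2 (s = K < T): integrate over the time u at which the
            # K-th letter enters VSTM, using the midpoint rule.
            h = t / steps
            total = 0.0
            for n in range(steps):
                u = (n + 0.5) * h
                total += F(u) ** (K - 1) * f(u) * (1.0 - F(u)) ** (T - K)
            probs[s] = K * math.comb(T, K) * total * h
    return probs


distribution = whole_report_distribution(t=0.1, v=10.0, T=6, K=4)
for s, p in distribution.items():
    print(f"P(score = {s}) = {p:.4f}")
```

A useful sanity check is that the probabilities sum to 1, and that the Case 2 probability equals the probability that at least K of the T letters finish processing within the exposure.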
Figure 4: Partial report. t0 is the threshold for conscious perception, C is the total limited processing capacity, K is the VSTM memory capacity, alpha is the ratio of the attentional weight of a distractor to the attentional weight of a target, T is the number of targets presented, and D is the number of distractors in the display.
Partial Report - Multiple Targets and Multiple Distractors
The above considerations may be extended to displays containing both targets and distractors by assuming that distractors have smaller attentional weights than targets.
Instead of two different cases, we now consider three. In Case 1, the number of letters (targets + distractors) entering VSTM is less than K. In Case 2, the number of letters entering VSTM equals K and the last letter to enter VSTM is a target. In Case 3, the number of letters entering VSTM equals K and the last letter to enter VSTM is a distractor.
A formula for the first case follows as a generalization of Equation 5:
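$$P_1(s, m) = \binom{T}{s}\,F(t)^{s}\,[1 - F(t)]^{T-s}\,\binom{D}{m}\,G(t)^{m}\,[1 - G(t)]^{D-m}, \qquad s + m < K \tag{7}$$
where T is the number of targets presented, D is the number of distractors, s is the number of target letters reported, and m is the number of distractors entering VSTM. G(t) is the distribution function of the processing time of a distractor, just as F(t) is the distribution function of the processing time of a target. The only difference between G(t) and F(t) is the value of the v parameter. The probability of a score of s in this case is obtained by summing over all values of m for which s + m is less than K.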
The second case, when the last letter entering VSTM is a target, is a generalization of Equation 6:
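$$P_2(s) = \int_0^{t} s \binom{T}{s}\,F(t')^{s-1}\,f(t')\,[1 - F(t')]^{T-s}\,\binom{D}{m}\,G(t')^{m}\,[1 - G(t')]^{D-m}\,dt' \tag{8}$$
where f(t) is the probability density function of the last, target letter entering VSTM, and m = K - s is the number of distractors entering VSTM.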
Finally for the case when the last letter to enter is a distractor:
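$$P_3(s) = \int_0^{t} \binom{T}{s}\,F(t')^{s}\,[1 - F(t')]^{T-s}\,m \binom{D}{m}\,G(t')^{m-1}\,g(t')\,[1 - G(t')]^{D-m}\,dt' \tag{9}$$
where g(t) is the probability density function of the last, distractor letter entering VSTM, and m = K - s is the total number of distractors entering VSTM.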
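The three cases can be combined into a score distribution as in the following sketch. It assumes exponential processing times with one rate for all targets and another for all distractors, treats t as the effective exposure duration, and evaluates the integrals for Cases 2 and 3 with a midpoint rule rather than analytically. Function names and parameter values are hypothetical.

```python
import math


def integrate(func, upper, steps):
    """Midpoint-rule approximation of the integral of func from 0 to upper."""
    h = upper / steps
    return h * sum(func((n + 0.5) * h) for n in range(steps))


def binomial(n, k, p):
    """Probability that exactly k of n independent items have finished."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def partial_report_distribution(t, v_target, v_distractor, T, D, K, steps=5000):
    """Return {s: P(s)}, where s is the number of targets stored in VSTM."""
    if K < 1:
        raise ValueError("K must be at least 1")

    def F(u):
        return 1.0 - math.exp(-v_target * u)

    def f(u):
        return v_target * math.exp(-v_target * u)

    def G(u):
        return 1.0 - math.exp(-v_distractor * u)

    def g(u):
        return v_distractor * math.exp(-v_distractor * u)

    vstm_can_fill = T + D > K
    probs = {}
    for s in range(min(T, K) + 1):
        total = 0.0
        for m in range(min(D, K - s) + 1):
            if s + m < K or not vstm_can_fill:
                # Case 1: VSTM capacity is not limiting.
                total += binomial(T, s, F(t)) * binomial(D, m, G(t))
                continue
            if s >= 1:
                # Case 2: the last letter to enter VSTM is a target.
                total += integrate(
                    lambda u: s * math.comb(T, s) * F(u) ** (s - 1) * f(u)
                    * (1.0 - F(u)) ** (T - s) * binomial(D, m, G(u)),
                    t, steps)
            if m >= 1:
                # Case 3: the last letter to enter VSTM is a distractor.
                total += integrate(
                    lambda u: binomial(T, s, F(u)) * m * math.comb(D, m)
                    * G(u) ** (m - 1) * g(u) * (1.0 - G(u)) ** (D - m),
                    t, steps)
        probs[s] = total
    return probs


scores = partial_report_distribution(
    t=0.1, v_target=15.0, v_distractor=5.0, T=3, D=3, K=4)
for s, p in scores.items():
    print(f"P(score = {s}) = {p:.4f}")
```

With D = 0 the function reduces to the whole-report case, which gives a simple way to check the implementation.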
Alpha Values
Let us assume that (1) the stimulus material used is homogeneous (i.e., eta values for different stimuli are equally great), (2) biases are also equally great, and (3) the attentional weight of a target equals $w_{target}$ while the attentional weight of a distractor equals $w_{distractor}$. Define $\alpha$ as the ratio $w_{distractor} / w_{target}$, that is, the relative attentional weight of a distractor compared to the weight of a target. Thus, $\alpha$ represents the efficiency of selection between distractors and targets. If $\alpha$ is close to 0, filtering is close to perfect, but if $\alpha$ equals 1, processing is nonselective.
By the above assumptions, the eta-beta product in Equation 1 may be treated as a constant, C. Because $w_{distractor} = \alpha\,w_{target}$, we get
$$v_{distractor} = C\,\frac{\alpha\,w_{target}}{T\,w_{target} + D\,\alpha\,w_{target}} = \frac{\alpha\,C}{T + \alpha D} \tag{10}$$
where T is the number of targets and D is the number of distractors.
Similarly for a target
$$v_{target} = C\,\frac{w_{target}}{T\,w_{target} + D\,\alpha\,w_{target}} = \frac{C}{T + \alpha D} \tag{11}$$
We see that the rate of processing of a distractor equals the rate of processing of a target weighted by the value of $\alpha$.
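The two rates follow directly from C, alpha, T, and D, as the short sketch below illustrates. It assumes homogeneous stimuli and equal biases as stated above; the function name and the parameter values are hypothetical.

```python
def partial_report_rates(C, alpha, T, D):
    """Return (v_target, v_distractor) for T targets and D distractors."""
    denominator = T + alpha * D      # sum of weights in units of w_target
    v_target = C / denominator
    v_distractor = alpha * v_target  # a distractor is slowed by the factor alpha
    return v_target, v_distractor


for alpha in (0.0, 0.5, 1.0):
    v_target, v_distractor = partial_report_rates(C=40.0, alpha=alpha, T=2, D=4)
    print(f"alpha = {alpha}: v_target = {v_target:.2f}, "
          f"v_distractor = {v_distractor:.2f}")
```

With alpha equal to 0 the targets share the whole capacity between them, and with alpha equal to 1 every letter in the display gets the same rate. In all cases the rates of all letters in the display sum to C.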
Unmasked Displays
Until now we have been looking at whole and partial report when the stimulus display is terminated by masks, thus stopping further processing. In this case we have considered the rate of processing v(x,i) to be constant across time within the effective exposure duration. When a stimulus display is unmasked, we assume that the v values decay exponentially after stimulus offset with a fixed time constant, $\mu$. In formal terms:
$$v_t(x,i) = \begin{cases} v(x,i) & \text{if } t \le ED \\ v(x,i)\,e^{-(t - ED)/\mu} & \text{if } t > ED \end{cases}$$
where ED is the exposure duration. Parameter $\mu$ varies between individuals and also depends on the experimental design (e.g., contrast and intensity of the display). It is typically estimated at a few hundred milliseconds.
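A rough sketch of what exponential decay implies for encoding from an unmasked display. It rests on two assumptions that are not spelled out in the text: the rate stays at v until stimulus offset and then decays as v times exp(-(t - ED)/time constant), and the probability of encoding by time t is 1 minus the exponential of the negative accumulated rate. The function name and parameter values are hypothetical, and the exposure duration is assumed to exceed t0.

```python
import math


def unmasked_encoding_probability(t, v, exposure, decay_constant, t0=0.0):
    """Probability that an item is encoded by time t in an unmasked display."""
    if t <= t0:
        return 0.0
    # Time spent processing at the full rate (between t0 and stimulus offset).
    constant_part = max(min(t, exposure) - t0, 0.0)
    # Integral of exp(-(u - exposure) / decay_constant) from offset to t.
    if t > exposure:
        decay_part = decay_constant * (
            1.0 - math.exp(-(t - exposure) / decay_constant))
    else:
        decay_part = 0.0
    return 1.0 - math.exp(-v * (constant_part + decay_part))


for t in (0.05, 0.1, 0.3, 1.0):
    p = unmasked_encoding_probability(
        t, v=10.0, exposure=0.1, decay_constant=0.2, t0=0.02)
    print(f"t = {t:.2f} s: P(encoded) = {p:.3f}")
```

Under these assumptions, long after stimulus offset the unmasked display behaves like a masked display whose effective exposure duration has been extended by the time constant.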
Modeling Patient Data
In neuropsychological studies, both sensory and attentional parameters may vary dramatically across the visual field. Thus, v values may be noticeably different for every stimulus item in the display. Recently we have applied the TVA model to data from patients with attentional deficits following brain lesions (e.g. Duncan, Bundesen et al., 1999; Duncan et al., 2003; Habekost & Bundesen, 2003). This has been possible through a generalization of Equations 5 - 9 using set theory. Instead of fitting the TVA model to the score distribution we fitted the observed probabilities of report at each individual display location. This new approach has demonstrated that TVA can be used to gain new insights into the mechanisms underlying neuropsychological deficits in visual attention such as visual neglect and simultanagnosia.
A Neural Theory of Visual Attention
We have recently published a neural implementation of the TVA model, NTVA (see Bundesen, Habekost, & Kyllingsbæk, 2005). NTVA is a neurocomputational theory of visual attention which corresponds mathematically to the original TVA model. This work has been supported by the Danish Research Council for the Humanities and the Carlsberg Foundation.
References
Broadbent, D. E. (1958). Perception and Communication. Oxford University Press.
Broadbent, D. E. (1971). Decision and Stress. London: Academic Press.
Bundesen, C. (1990). A theory of visual attention. Psychological Review, 97, 523-547.
Bundesen, C., Habekost, T., & Kyllingsbæk, S. (2005). A neural theory of visual attention: Bridging cognition and neurophysiology. Psychological Review, 112, 291-328.
Bundesen, C., Kyllingsbæk, S., Houmann, K. J., & Jensen, R. M. (1997). Is visual attention automatically attracted by one's own name? Perception & Psychophysics, 59, 714-720.
Bundesen, C., Kyllingsbæk, S., & Larsen, A. (2003). Independent encoding of colors and shapes from two stimuli. Psychonomic Bulletin & Review, 10, 474-479.
Deutsch, J. A., & Deutsch, D. (1963). Attention: Some theoretical considerations. Psychological Review, 70, 80-90.
Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Ward, R., Kyllingsbæk, S., van Raamsdonk, M., Rorden, C., & Chavda, S. (2003). Attentional functions in dorsal and ventral simultanagnosia. Cognitive Neuropsychology, 20, 675-701.
Duncan, J., Bundesen, C., Olson, A., Humphreys, G., Chavda, S., & Shibuya, H. (1999). Systematic analysis of deficits in visual attention. Journal of Experimental Psychology: General, 128, 450-478.
Duncan, J., & Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review, 96, 433-458.
Habekost, T., & Bundesen, C. (2003). Patient assessment based on a theory of visual attention (TVA): Subtle deficits after a right frontal-subcortical lesion. Neuropsychologia, 41, 1171-1188.




