Learning the influence of spatio-temporal variations in local image structure on visual
saliency.
Kienzle, W., F. A. Wichmann, B. Schölkopf and M. O. Franz
Proc. 10th Tübingen Perception Conference (TWK 2007), 63. (Eds.) H. H. Bülthoff, A. Chatziastros, H. A. Mallot, R. Ulrich. Knirsch, Kirchentellinsfurt (2007)
Computational models
for bottom-up visual attention traditionally consist of a bank of Gabor-like or
Difference-of-Gaussians filters and a nonlinear combination scheme which combines the filter
responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine
learning algorithm can be used to derive a saliency model from human eye movement data with a very
small number of additional assumptions. The learned model is much simpler than previous models, but
nevertheless has state-of-the-art prediction performance [2]. A central result from this study is
that DoG-like center-surround filters emerge as the unique solution to optimizing the predictivity
of the model. Here we extend the learning method to the temporal domain. While the previous model
[2] predicts visual saliency based on local pixel intensities in a static image, our model also
takes into account temporal intensity variations. We find that the learned model responds strongly
to temporal intensity changes occurring 200–250 ms before a saccade is initiated. This delay coincides
with the typical saccadic latencies, indicating that the learning algorithm has extracted a
meaningful statistic from the training data. In addition, we show that the model correctly predicts
a significant proportion of human eye movements on previously unseen test data.
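As a rough illustration of the traditional filter-bank approach described above (not the learned model of this work), the sketch below computes a Difference-of-Gaussians center-surround response on a grayscale image using separable Gaussian blurs; the filter scales `sigma_c` and `sigma_s` are illustrative choices, not values from the paper.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D normalized Gaussian kernel of width 2*radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def dog_response(image, sigma_c=1.0, sigma_s=3.0):
    """Center-surround (DoG) response: narrow blur minus wide blur.

    Gaussian filtering is separable, so each blur is done as a
    row-wise then column-wise 1-D convolution.
    """
    radius = int(3 * sigma_s)  # kernel support: ~3 standard deviations

    def blur(img, sigma):
        k = gaussian_kernel(sigma, radius)
        out = np.apply_along_axis(
            lambda row: np.convolve(row, k, mode="same"), 1, img)
        return np.apply_along_axis(
            lambda col: np.convolve(col, k, mode="same"), 0, out)

    return blur(image, sigma_c) - blur(image, sigma_s)
```

A bright spot on a dark background yields a positive DoG response at its center, which is the local-contrast signal that models like [1] combine nonlinearly into a saliency map.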
[1] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE PAMI 20(11), 1254–1259, 1998.
[2] W. Kienzle, F. A. Wichmann, B. Schölkopf, and M. O. Franz, "A nonparametric approach to bottom-up visual saliency," Advances in Neural Information Processing Systems (NIPS 2006), 1–8, 2006.