Learning the influence of spatio-temporal variations in local image structure on visual saliency.

Kienzle, W., F. A. Wichmann, B. Schölkopf and M. O. Franz
Proc. 10th Tübingen Perception Conference (TWK 2007), 63. (Eds.) H. H. Bülthoff, A. Chatziastros, H. A. Mallot, R. Ulrich, Knirsch, Kirchentellinsfurt (2007)


Computational models of bottom-up visual attention traditionally consist of a bank of Gabor-like or Difference-of-Gaussians (DoG) filters and a nonlinear combination scheme that combines the filter responses into a real-valued saliency measure [1]. Recently it was shown that a standard machine learning algorithm can derive a saliency model from human eye movement data with very few additional assumptions. The learned model is much simpler than previous models, yet achieves state-of-the-art prediction performance [2]. A central result of that study is that DoG-like center-surround filters emerge as the unique solution when the predictivity of the model is optimized. Here we extend the learning method to the temporal domain. While the previous model [2] predicts visual saliency from local pixel intensities in a static image, our model also takes temporal intensity variations into account. We find that the learned model responds strongly to temporal intensity changes occurring 200-250 ms before a saccade is initiated. This delay coincides with typical saccadic latencies, indicating that the learning algorithm has extracted a meaningful statistic from the training data. In addition, we show that the model correctly predicts a significant proportion of human eye movements on previously unseen test data.
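To make the center-surround notion concrete, the following sketch builds a DoG kernel of the kind described above. This is an illustrative toy, not the learned model from [2]; the kernel size and the sigma values are hypothetical choices, and the cited work learns its filters from fixation data rather than specifying them by hand.

```python
import numpy as np

def dog_filter(size, sigma_c=1.0, sigma_s=2.0):
    """Difference-of-Gaussians (center-surround) kernel.

    Illustrative sketch only: a narrow excitatory center Gaussian
    minus a broader inhibitory surround Gaussian, both normalized
    so the kernel is approximately zero-sum (no DC response).
    """
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    center = np.exp(-r2 / (2 * sigma_c ** 2)) / (2 * np.pi * sigma_c ** 2)
    surround = np.exp(-r2 / (2 * sigma_s ** 2)) / (2 * np.pi * sigma_s ** 2)
    return center - surround

# The kernel is positive at the center, negative in the surround,
# and sums to approximately zero, so it responds to local contrast
# rather than uniform intensity.
k = dog_filter(21)
```

Convolving an image (or, in the temporal extension described above, a space-time intensity volume) with such kernels and combining the rectified responses nonlinearly yields a scalar saliency value per location.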

[1] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence 20(11), 1254-1259, 1998.

[2] W. Kienzle, F. A. Wichmann, B. Schölkopf, and M. O. Franz, "A nonparametric approach to bottom-up visual saliency," Advances in Neural Information Processing Systems (NIPS 2006), 1-8, 2006.