University of Birmingham > Talks@bham > Artificial Intelligence and Natural Computation seminars > Bandits on an unknown manifold

Bandits on an unknown manifold

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact Hector Basevi.

Host: Dr Ata Kaban

Speaker’s website: http://www.cs.man.ac.uk/~reeveh/

Abstract: The multi-armed bandit is a simple model which exemplifies the exploitation-exploration trade-off in reinforcement learning. An agent is given access to a collection of arms, each of which generates a stochastic reward. The agent sequentially selects arms to pull, so as to maximise their cumulative reward. At each time step, the agent must choose between “exploiting” arms which have previously performed well and “exploring” arms which have been pulled less often. This simple model has numerous practical applications from sequential clinical trials through to web-page ad placement.

Multi-armed bandits with covariates extends this model by assuming that the reward distributions depend upon observable side information. For example, in sequential clinical trials the agent might have access to a patient’s MRI scan or genome sequence; in web-page ad placement side-information might include a particular user’s preferences and purchasing history. Multi-armed bandits with covariates require the agent to model the reward distributions as a function of the covariate space, combining the bias-variance trade-off from supervised learning with the exploitation-exploration trade-off from reinforcement learning.

We shall consider a setting in which the covariates lie on a low-dimensional sub-manifold of unknown dimension, within a high-dimensional feature space. We introduce a simple algorithm – the k-Nearest Neighbour UCB algorithm which adapts Kpotufe’s procedure for selecting the number of neighbours in the supervised regression setting. Our algorithm achieves a minimax-optimal regret bound, up to logarithmic factors, exploiting the manifold’s low-dimensional structure, without the need to supply the dimension of the manifold as an input.

This is a joint work with Joseph Mellor and Gavin Brown.

This talk is part of the Artificial Intelligence and Natural Computation seminars series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.

 

Talks@bham, University of Birmingham. Contact Us | Help and Documentation | Privacy and Publicity.
talks@bham is based on talks.cam from the University of Cambridge.