### Question

In many machine learning problems (e.g., Bayesian learning for linear regression) one seeks to estimate the evidence of a model family, $\Pr(D \mid \theta) = f(\theta)$, for some set of model family (hyper)parameters $\theta$. For instance, in the case of linear regression and Bayesian learning those were the parameters $\alpha$ and $\beta$. We then seek to maximize the evidence by varying those parameters and find the model that has good generalization performance. However, it is sometimes the case that the evidence function $f$ we seek to optimize cannot be computed analytically. More critically, it is often challenging to optimize this function. Instead, we approach the problem by taking samples $\{\theta_k\}_{k=1}^{K}$ of the parameters of this function, evaluating its value on those samples, $\{f(\theta_k)\}_{k=1}^{K}$, and selecting the sample that results in the highest probability of evidence, $\theta^* = \arg\max_k f(\theta_k)$.

1. Suppose you approach the above parameter optimization problem using a regular grid of points $G_R = \{\theta_k\}_{k=1}^{K}$, where $\theta \in \mathbb{R}$. Assume that you can evaluate the function $f(\theta)$ for any argument $\theta$ in some fixed time $\tau$, which could be large. What are the potential issues in the strategy outlined in the paragraph above under these settings?
2. Now consider a different evaluation strategy. Instead of first constructing a regular grid of points $G_R$, you will be given two initial values of $\theta$, call them $G_2 = \{\theta_k\}_{k=1}^{2}$. Your goal is to select the next "best" point $\theta_{K+1}$ where this function is to be evaluated. The "tool" you have at your disposal is Bayesian linear regression with radial basis functions placed at the points in $G_2$. In other words, you can model $f$ from the samples $G_2$ and the corresponding $F_2 = \{f(\theta_k)\}$. Propose a strategy to select the next best point $\theta_3$ where to evaluate $f$. Justify clearly the mathematical model underpinning this strategy and the set of steps (i.e., algorithm) that you will use to select that next point. Be precise. Generalize this to selecting $\theta_{K+1}$ given previous evaluations $G_K$, $F_K$.
3. Demonstrate this strategy on the problem of finding the maximum of the function $f(x) = e^{-x^2} - e^{-(x-5)^2}$ on the interval $I = [-3, 8]$. Choose two starting points, $x_1 = -0.8$ and $x_2 = 5.7$. Show the sequence of ten subsequent choices of $x_k$ and, in each iteration, plot the estimated mean value of your function $f$ and the uncertainty over the given evaluation interval $I$, together with the location of the next chosen evaluation point $x_{k+1}$.
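
One possible way to set up parts 2 and 3 is sketched below. It is only a sketch under stated assumptions, not the expected solution. It fits Bayesian linear regression with Gaussian radial basis functions centered at the points evaluated so far, then picks the next point by maximizing an upper confidence bound (predictive mean plus a multiple of the predictive standard deviation) over a dense grid on $I$. The acquisition rule, the function names (`rbf_features`, `posterior`, `predict`, `select_next`, `optimize`), and the values of `alpha`, `beta`, `width`, and `kappa` are all assumptions made for illustration; none of them comes from the problem statement.

```python
import numpy as np

def f(x):
    """Objective from part 3 of the problem statement."""
    return np.exp(-x**2) - np.exp(-(x - 5.0)**2)

def rbf_features(x, centers, width):
    """Design matrix with one Gaussian RBF per center (no bias column)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.exp(-(x[:, None] - centers[None, :])**2 / (2.0 * width**2))

def posterior(X, y, alpha, beta, width):
    """Posterior N(m, S) over weights, with prior N(0, I/alpha), noise precision beta."""
    Phi = rbf_features(X, X, width)
    S = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
    m = beta * S @ Phi.T @ y
    return m, S

def predict(xs, X, m, S, beta, width):
    """Predictive mean and standard deviation at the query points xs."""
    Phi = rbf_features(xs, X, width)
    mean = Phi @ m
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", Phi, S, Phi)
    return mean, np.sqrt(var)

def select_next(X, y, grid, alpha=1.0, beta=100.0, width=1.0, kappa=2.0):
    """Assumed acquisition rule: maximize mean + kappa * std over the grid."""
    m, S = posterior(X, y, alpha, beta, width)
    mean, std = predict(grid, X, m, S, beta, width)
    x_next = grid[np.argmax(mean + kappa * std)]
    return x_next, mean, std

def optimize(x_init, n_iter=10, lo=-3.0, hi=8.0, n_grid=1101):
    """Run n_iter selection steps starting from the given initial points."""
    X = np.array(x_init, dtype=float)
    y = f(X)
    grid = np.linspace(lo, hi, n_grid)
    for _ in range(n_iter):
        x_next, mean, std = select_next(X, y, grid)
        # mean and std are what part 3 asks to plot at each iteration.
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    return X, y

if __name__ == "__main__":
    X, y = optimize([-0.8, 5.7])
    print("evaluated points:", np.round(X, 3))
    print("best point so far:", X[np.argmax(y)], "value:", y.max())
```

Note that with purely local basis functions and no bias term, the predictive variance shrinks toward the noise level $1/\beta$ far from the evaluated points, so this rule explores less than a Gaussian process based rule would. Whether that is acceptable, and which acquisition rule to justify in part 2, is left to the solution.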

### CS536, Machine Learning

