Part A
All of part A should be contained inside one script.
1. We have taken measurements of the activity of 2 genes for 25 patients. To store this data, create a 2x25 component matrix. Each row of this matrix will correspond to the measurements of the activity of a single gene. Each column of this matrix will correspond to a different patient. Populate this matrix with the data for GeneA in the first row and the data for GeneB in the second row:
GeneA = [1.1 0.5 27 1.4 1.9 2.6 3.2 8.1 4.1 0.4 9.4 13 14.1 9.1 18.2 2.1 5.3 16.4 7 21.1 0.3 7.5 17.1 5.2 13.1];
GeneB = [0.2 0.6 11 0.9 1.2 0.9 1.8 4.4 2.5 0.3 5.9 7.9 5.9 6.1 7.2 1.8 1.6 6.1 5 10.1 0.2 4.0 8.8 1.5 4.2];
(5 marks)
2. We need to centre this dataset on the origin. Calculate the mean of each row of gene measurements and create a new 2x25 component matrix. Copy across the data from the first matrix, but subtract from every entry in a row the mean gene measurement for that row. Plot the centred data and set the range from -10 to 30 on each axis [hint: again use the scatter command]. (10 marks)
3. In the next step, we want to identify the axis along which the data shows the greatest variation. To do this, we create an axis, calculate the coordinate of each data point on this axis and then calculate the variance of these coordinates. We then change the orientation of the axis and repeat this process, evaluating the variance for a range of different axis orientations, before ultimately selecting the axis with the greatest variance. To achieve this we will start by considering an axis through the origin that is flat (angle 0), then one at an angle of 0.01 radians, then 0.02, then 0.03 and so on until we reach an angle of pi/2 (i.e. 90 degrees). To keep track of the variance for each axis orientation, we require a 158 component vector (because there are 158 angles between 0 and pi/2 when stepping by 0.01, starting at 0), so first set up a 158 component vector to contain the variances.
a. Set up a for loop in which the loop variable runs from 0 to pi/2 in steps of 0.01. This for loop and the loop variable (usually i or j in class) is going to parameterize the steepness of the axis. (5 marks)
b. Inside this for loop, create a two component vector in which the first component is 1 and the second component is tan(LoopVariable). Normalise this vector, just as you would if you were making it an orthonormal basis vector, i.e. if the vector is (a, b)^T then create a second vector (1/sqrt(a^2 + b^2)) (a, b)^T. This is the normalised basis vector for the axis. (5 marks)
c. Also inside this loop, using the normalised basis vector, calculate the coordinates of all 25 data points on this axis, i.e. take the dot product of each of the 25 data points with the basis vector. (5 marks)
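The steps of Part A described so far can be sketched in MATLAB roughly as follows. This is only one possible layout, not a model answer. The variable names (Data, Centred, Variances, theta, u, coords) are assumptions chosen for illustration. The axis limits of -10 to 30 are also an assumption, made because the centred values are negative for many patients. The variance is computed with MATLAB's default var, which divides by N-1.

```matlab
% Gene activity measurements copied from the question (25 patients each).
GeneA = [1.1 0.5 27 1.4 1.9 2.6 3.2 8.1 4.1 0.4 9.4 13 14.1 9.1 18.2 2.1 5.3 16.4 7 21.1 0.3 7.5 17.1 5.2 13.1];
GeneB = [0.2 0.6 11 0.9 1.2 0.9 1.8 4.4 2.5 0.3 5.9 7.9 5.9 6.1 7.2 1.8 1.6 6.1 5 10.1 0.2 4.0 8.8 1.5 4.2];

% Step 1: 2x25 matrix, one row per gene, one column per patient.
Data = zeros(2, 25);
Data(1, :) = GeneA;
Data(2, :) = GeneB;

% Step 2: subtract each row's mean so the data are centred on the origin.
Centred = zeros(2, 25);
for r = 1:2
    Centred(r, :) = Data(r, :) - mean(Data(r, :));
end

scatter(Centred(1, :), Centred(2, :));
axis([-10 30 -10 30]);            % assumed plotting range on both axes
xlabel('GeneA (centred)');
ylabel('GeneB (centred)');

% Step 3: sweep the axis angle from 0 to pi/2 in steps of 0.01 radians.
angles = 0:0.01:pi/2;             % 158 angles
Variances = zeros(1, 158);        % one variance per axis orientation
k = 0;                            % index into Variances
for theta = 0:0.01:pi/2
    k = k + 1;
    v = [1; tan(theta)];                  % direction of the axis
    u = v / sqrt(v(1)^2 + v(2)^2);        % normalised basis vector
    coords = u' * Centred;                % dot product with all 25 points (1x25)
    Variances(k) = var(coords);           % default var divides by N-1
end

% Pick the orientation with the greatest variance.
[maxVar, idx] = max(Variances);
bestAngle = angles(idx);
```

Writing u' * Centred computes all 25 dot products in a single matrix product. An inner loop over the columns of Centred would give the same coordinates.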
Part B
All of part B should be contained inside one script. We will now consider 25 patients where we have measured the expression levels of 5 genes.
1. Construct a 5x25 component matrix where each row corresponds to a different gene and each patient's measurements correspond to a different column. Populate this matrix with the following data:
GeneA = [1.1 0.5 27 1.4 1.9 2.6 3.2 8.1 4.1 0.4 9.4 13 14.1 9.1 18.2 2.1 5.3 16.4 7 21.1 0.3 7.5 17.1 5.2 13.1];
GeneB = [11.4 2.7 3.2 4.4 1.4 6.8 2.2 23.6 21.9 10.2 14.7 9.1 3.3 6.1 10.2 4.8 6.6 2.8 8 5.7 0.2 8.0 8.8 11.5 4.2];
GeneC = [0.7 0.1 13.4 0.9 1.4 1.1 1.5 3.1 1.4 0.4 3.7 8.5 9.1 2.9 7.8 0.9 3.6 6.4 3.1 9.4 0.01 3.2 10.1 3.2 5.4];
GeneD = [21.5 14.3 1.1 9 13.6 12.3 12.8 11.1 2.6 2.1 2.9 4.6 7.1 2.4 3.6 11.7 18.3 25.3 15.2 2.6 6.2 6.1 3.3 15.0 7.1];
GeneE = [0.7 0.1 12.7 0.7 1.1 1.1 1.2 6.2 3.0 0.2 4.1 7.3 7.2 5.1 9.0 0.8 2.2 6.9 3.1 10.9 0.2 5.4 9.1 3.0 6.0];
(5 marks)
2. Our first step is to do the higher-dimensional equivalent of centring our data on the origin. Calculate the mean measurement for each gene and create a new 5x25 component matrix containing the measurements from the matrix above after subtracting, on each row, the mean measurement for that gene. (5 marks)
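As a rough illustration of these two steps, the 5x25 matrix can be built by stacking the five row vectors, and the row means can then be subtracted in one operation. The names Data5, RowMeans and Centred5 are assumptions made for this sketch. repmat is used so the subtraction does not rely on implicit expansion, which older MATLAB releases do not support.

```matlab
% Expression levels copied from the question (25 patients per gene).
GeneA = [1.1 0.5 27 1.4 1.9 2.6 3.2 8.1 4.1 0.4 9.4 13 14.1 9.1 18.2 2.1 5.3 16.4 7 21.1 0.3 7.5 17.1 5.2 13.1];
GeneB = [11.4 2.7 3.2 4.4 1.4 6.8 2.2 23.6 21.9 10.2 14.7 9.1 3.3 6.1 10.2 4.8 6.6 2.8 8 5.7 0.2 8.0 8.8 11.5 4.2];
GeneC = [0.7 0.1 13.4 0.9 1.4 1.1 1.5 3.1 1.4 0.4 3.7 8.5 9.1 2.9 7.8 0.9 3.6 6.4 3.1 9.4 0.01 3.2 10.1 3.2 5.4];
GeneD = [21.5 14.3 1.1 9 13.6 12.3 12.8 11.1 2.6 2.1 2.9 4.6 7.1 2.4 3.6 11.7 18.3 25.3 15.2 2.6 6.2 6.1 3.3 15.0 7.1];
GeneE = [0.7 0.1 12.7 0.7 1.1 1.1 1.2 6.2 3.0 0.2 4.1 7.3 7.2 5.1 9.0 0.8 2.2 6.9 3.1 10.9 0.2 5.4 9.1 3.0 6.0];

% Step 1: stack the row vectors so each row is a gene, each column a patient.
Data5 = [GeneA; GeneB; GeneC; GeneD; GeneE];      % 5x25

% Step 2: mean of each row (dimension 2 averages across the 25 patients).
RowMeans = mean(Data5, 2);                        % 5x1 column of gene means

% Subtract each gene's mean from every entry in its row.
Centred5 = Data5 - repmat(RowMeans, 1, 25);       % 5x25, each row has mean ~0
```

After this step every row of Centred5 should average to zero up to rounding error, which can be checked with mean(Centred5, 2).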