Statistical Analysis of Writing Style: Problem Set

Problem 1. set-up: you are interested in studying the writing style of a popular Time Magazine contributor, FZ. you collect a simple random sample of his articles and count how many times he uses the word "however" in each of the articles in your sample, (x1, ..., xn). In this set-up, xi is the number of times the word "however" appeared in the i-th article.

Question 1.1. (10 points) define the population of interest, the population quantity of interest, and the sampling units.

Question 1.2. (10 points) what are potentially useful estimands for studying writing style? (hint: you are interested in comparing FZ's writing style to that of other contributors.)

Question 1.3. (10 points) model: let Xi denote the quantity that captures the number of times the word "however" appears in the i-th article. let’s assume that the quantities X1, ..., Xn are independent and identically distributed (IID) according to a Poisson distribution with unknown parameter λ, p(Xi = xi | λ) = Poisson(xi | λ) for i = 1, ..., n.

using the 2-by-2 table of what’s variable/constant versus what’s observed/unknown, declare the technical nature (random variable, latent variable, known constant or unknown constant) of each quantity involved in the set-up/model above: X1, ..., Xn, x1, ..., xn, λ and n.
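For reference, the notation Poisson(xi | λ) in Question 1.3 refers to the standard Poisson probability mass function, written here in LaTeX with the same symbols as the model statement:

```latex
p(X_i = x_i \mid \lambda) \;=\; \mathrm{Poisson}(x_i \mid \lambda)
  \;=\; \frac{\lambda^{x_i}\, e^{-\lambda}}{x_i!},
  \qquad x_i \in \{0, 1, 2, \dots\}, \quad \lambda > 0 .
```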

Question 1.4. (10 points) write the data generating process for the model above.

Question 1.5. (10 points) define the likelihood L(λ) = p(· | ·) for this model and set-up at the highest level of abstraction.

Question 1.6. (10 points) write the likelihood L(λ) for a generic sample of n articles, (x1, ..., xn).

Question 1.7. (10 points) write the log-likelihood ℓ(λ) for a generic sample of n articles, (x1, ..., xn).

Question 1.8. (10 points) write the log-likelihood ℓ(λ) for the following specific sample of 7 articles (12, 4, 5, 3, 7, 5, 6).
Stat-3503/Stat-8109 Airoldi/Fall-21

Question 1.9. (10 points) plot the log-likelihood ℓ(λ) in R for the same specific sample of 7 articles (12, 4, 5, 3, 7, 5, 6). At what value of λ (approximately) does the log-likelihood reach its maximum?
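One possible way to approach the plot in Question 1.9 is sketched below in R. It is a minimal sketch, not a reference solution: the helper name loglik, the grid range 1 to 12, and the grid step are all assumptions made for illustration. It evaluates the Poisson log-likelihood with dpois(..., log = TRUE) rather than a hand-derived formula.

```r
# Observed counts of "however" in the 7 sampled articles (from Question 1.8).
x <- c(12, 4, 5, 3, 7, 5, 6)

# Poisson log-likelihood: sum of log p(x_i | lambda) over the sample.
# 'loglik' is a hypothetical helper name; sapply lets it accept a vector of rates.
loglik <- function(lambda) {
  sapply(lambda, function(l) sum(dpois(x, lambda = l, log = TRUE)))
}

# Grid of candidate rates; the range and step are arbitrary illustrative choices.
lambda_grid <- seq(1, 12, by = 0.01)
ll <- loglik(lambda_grid)

plot(lambda_grid, ll, type = "l",
     xlab = expression(lambda), ylab = "log-likelihood")

# Grid point where the curve peaks; for an IID Poisson sample the
# maximiser is the sample mean, so this should sit at mean(x).
lambda_hat <- lambda_grid[which.max(ll)]
abline(v = lambda_hat, lty = 2)
print(c(grid_maximiser = lambda_hat, sample_mean = mean(x)))
```

Reading the peak off the grid and comparing it with mean(x) is a quick sanity check on the plot.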

Question 1.10. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

edo says. mmmh ... something is amiss. the articles FZ writes have different lengths. if we model the word occurrences in each article as IID Poisson random variables with rate λ, we are implicitly assuming that the articles have the same length. why? (10 points; extra credit) and if that is true, what is the implied common length? (10 points; extra credit)

Problem 2. set-up: you collect another random sample of articles penned by FZ and count how many times he uses the word "however" in each of the articles in your sample, (x1, ..., xn). you also count the length of each article in your sample, (y1, ..., yn). In this set-up, xi is the number of times the word "however" appeared in the i-th article, as before, and yi is the total number of words in the i-th article.

Question 2.1. (10 points) model: let Xi denote the quantity that captures the number of times the word "however" appears in the i-th article. let’s assume that the quantities X1, ..., Xn are independent, with Xi distributed according to a Poisson distribution with parameter ν · yi/1000, where ν is unknown: p(Xi = xi | yi, ν, 1000) = Poisson(xi | ν · yi/1000) for i = 1, ..., n. (because the article lengths yi differ, the Xi are independent but no longer identically distributed.)

using the 2-by-2 table of what’s variable/constant versus what’s observed/unknown, declare the technical nature (random variable, latent variable, known constant or unknown constant) of each quantity involved in the set-up/model above: X1, ..., Xn, x1, ..., xn, y1, ..., yn, ν and n.

Question 2.2. (10 points) what is the interpretation of yi/1000 in this model? explain.

Question 2.3. (10 points) what is the interpretation of ν in this model? explain.

Question 2.4. (10 points) write the data generating process for the model above.

Question 2.5. (10 points) define the likelihood L(ν) = p(· | ·) for this model and set-up at the highest level of abstraction.

Question 2.6. (10 points) write the likelihood L(ν) for a generic sample of n articles, (x1, ..., xn), and n article lengths, (y1, ..., yn).

Question 2.7. (10 points) write the log-likelihood ℓ(ν) for a generic sample of n articles, (x1, ..., xn), and n article lengths, (y1, ..., yn).

Question 2.8. (10 points) Simulate the number of occurrences of the word "however" for 5 articles using the data generating process. Assume ν = 10 and corresponding article lengths y = (1730, 947, 1830, 1210, 1100). Record the number of occurrences of "however" in each article.
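The simulation in Question 2.8 could be sketched in R roughly as follows. This is an illustration under stated assumptions, not a prescribed solution: the seed value and the variable names mu and x_sim are arbitrary choices, and a different seed will produce different counts.

```r
set.seed(1)  # arbitrary seed, used only so the draw is reproducible

nu <- 10                              # rate per 1000 words (given in the question)
y  <- c(1730, 947, 1830, 1210, 1100)  # article lengths in words (given)

# Each article gets its own Poisson mean, nu * y_i / 1000.
mu <- nu * y / 1000

# One draw per article; rpois is vectorised over the vector of means.
x_sim <- rpois(length(y), lambda = mu)

# Record the simulated counts next to the lengths and means that produced them.
print(data.frame(length = y, mean = mu, however = x_sim))
```

Drawing every count in one vectorised rpois call keeps each simulated count paired with its own article length.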

Question 2.9. (10 points) write the log-likelihood ℓ(ν) for the specific sample of occurrences you generated in the previous question and the corresponding 5 article lengths (1730, 947, 1830, 1210, 1100).

Question 2.10. (10 points) Plot the log-likelihood from the previous question in R. Does the maximum occur near ν = 10?
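A minimal R sketch of the plot in Question 2.10 is given below. It is self-contained, so it simulates its own counts with ν = 10 instead of reusing an earlier sample. The seed, the helper name loglik_nu, and the grid range 1 to 20 are assumptions made for illustration. The closed-form line uses the fact that setting the derivative of the Poisson log-likelihood in ν to zero gives the total count divided by the total length in thousands of words.

```r
set.seed(1)  # arbitrary seed; a different seed gives a different sample

y <- c(1730, 947, 1830, 1210, 1100)            # article lengths (from Question 2.8)
x <- rpois(length(y), lambda = 10 * y / 1000)  # simulated counts with nu = 10

# Log-likelihood in nu: each article has Poisson mean nu * y_i / 1000.
# 'loglik_nu' is a hypothetical helper name.
loglik_nu <- function(nu) {
  sapply(nu, function(v) sum(dpois(x, lambda = v * y / 1000, log = TRUE)))
}

nu_grid <- seq(1, 20, by = 0.01)  # arbitrary illustrative range and step
ll <- loglik_nu(nu_grid)

plot(nu_grid, ll, type = "l", xlab = expression(nu), ylab = "log-likelihood")
abline(v = 10, lty = 3)           # value of nu used to simulate the data

# Maximiser read off the grid, and the closed-form maximiser for comparison.
nu_hat_grid <- nu_grid[which.max(ll)]
nu_hat <- sum(x) / sum(y / 1000)
abline(v = nu_hat_grid, lty = 2)
print(c(grid = nu_hat_grid, closed_form = nu_hat))
```

With only 5 articles the maximiser will typically land near 10 but not exactly on it, since it depends on the particular simulated counts.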

Question 2.11. (10 points) draw a graphical representation of this model, which explicitly shows the random quantities and the unknown constants only.

edo says. that was a more reasonable model. but FZ writes about different topics. our model is not capturing that. is FZ more prone to offering his own opinions when he writes about politics than when he writes about other topics? let’s investigate.
