Concepts
1. If a researcher finds that the data points in a factor analysis are enclosed by a three-dimensional ellipsoid that is elongated along the x and y axes but compressed along the z axis, what does this imply about the contributions of the variables mapped to the x, y, and z dimensions?
2. How do factor analysis and cluster analysis differ in the way they aggregate data?
3. What are the advantages and drawbacks associated with hierarchical versus nonhierarchical clustering methods?
Exercises
1. Explain and interpret the following rotated factor loading table.
2. Perform a hierarchical cluster analysis using the following data ranking prospective employees. Comment on your findings.
3. How many significant factors are there in the following analysis? How many variables were there in the original analysis?
4. A marketing group collects the following data about the neighborhoods it serves. Use factor analysis to summarize it. How many factors are sufficient to describe the data (that is, how many have eigenvalues greater than 1)? What characterizes the factors?
5. Using the data from Question 4, generate the factor scores and identify which neighborhoods the marketing firm might target when advertising an expensive English-language cable package.
6. Use the Milwaukee Sales dataset to carry out a k-means cluster analysis on the z-score standardized lot size, age, air conditioning, and number of bedrooms associated with a property.
7. Using the Singapore 2010 Census dataset, carry out factor analysis on the unemployment rate, illiteracy rate, percentage of the population with no school, percentage of the population who rent, the number of English speakers, and the percentage of the population with a university degree. Describe the rotated factor loadings, and identify the most important variables that comprise each factor.
8. Using the SPSS Housing Dataset, carry out a factor analysis on the property’s region, price, number of bedrooms, date of construction, and floor area. Describe the rotated factor loadings, and identify the most important variables that comprise each factor.
9. Using the SPSS Housing Dataset, carry out a k-means cluster analysis on the z-score standardized house price, number of bedrooms, date of construction, and floor area. Use k = 3.
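Exercises 6 and 9 both ask for k-means on z-score standardized variables. The procedure can be sketched as follows. This is a minimal illustration using only the Python standard library, not the routine a statistics package runs. The `houses` rows are made-up values (price, bedrooms, year built, floor area) and are not taken from either dataset. The farthest-first starting centroids are an assumption chosen so that the run is reproducible; packages generally use their own initialization.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample standard deviation 1."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]  # n - 1 denominator
    return [
        [(value - m) / s for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


def farthest_first(points, k):
    """Deterministic start: first point, then repeatedly the point
    farthest from its nearest already-chosen centroid (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [statistics.mean(dim) for dim in zip(*members)]
    return labels, centroids


# Hypothetical rows: price (thousands), bedrooms, year built, floor area.
houses = [
    [120, 2, 1950, 70], [130, 2, 1955, 75], [125, 3, 1948, 80],
    [400, 5, 1985, 220], [420, 5, 1990, 240], [390, 4, 1988, 210],
    [210, 3, 2010, 95], [220, 3, 2012, 100], [205, 2, 2015, 90],
]

standardized = zscore_columns(houses)
labels, centroids = kmeans(standardized, k=3)
print(labels)
```

Standardizing first matters because price and year of construction are on very different scales; without it, the variable with the largest raw variance would dominate the Euclidean distances. The centroids are reported in z-score units, so a positive value means the cluster sits above the overall mean on that variable.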
Results
1. The first factor loads highly on being older than 65 and on home ownership. The second factor covers university degrees and English speakers, with less emphasis on Malay ethnicity.
2. Many of the applicants are quite similar, with the whole group tending to fall into two or arguably three groups. If three groups are selected, there are a few obvious outliers.
3. Arguably 3 to 4 significant factors. There were 8 variables in the original analysis.
4. Two factors are sufficient to describe the data. Factor 1 is characterized by the average income and the percentage of English speakers, while Factor 2 is characterized by the percentage of females.
5. Neighborhoods 3 and 11 score highly for Factor 1, which is correlated with income and English-speaking.
6. The first cluster is characterized by the presence of air conditioning, while the second is marked by high age scores and, to a slightly lesser degree, high numbers of bedrooms. The third cluster is defined by large lot sizes.
7. The first component has all but the percentage of renters loading highly, while the percentage of renters loads on the second component. The illiteracy rate and percentage of the population with no school dominate the first component, so that it might be termed the education factor, while the second factor is exclusively the renter factor.
8. The rotated component matrix shows Factor 1 with strong loadings for price, number of bedrooms, and date of construction, while Factor 2 is loaded with the region and floor area. These two factors have eigenvalues above one and cumulatively explain 69.3% of the variance.
9. The three clusters are each quite large, with the smallest cluster containing 103 records. Each of the variables is significant, and the final clusters are characterized as follows. The first cluster tends to contain relatively older houses, which also tend to be cheaper. The second cluster contains relatively expensive, large houses with many bedrooms. The third cluster consists of newer houses, which are slightly smaller than the average.
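Results 3, 4, and 8 all rest on the rule of retaining factors with eigenvalues greater than 1 and on the cumulative share of variance explained. The arithmetic can be sketched as below. The eigenvalues are hypothetical and do not come from any of the exercise tables. The sketch assumes the analysis was run on a correlation matrix, in which case the eigenvalues sum to the number of variables.

```python
from itertools import accumulate

# Hypothetical eigenvalues of a correlation matrix, largest first.
eigenvalues = [2.9, 1.6, 1.1, 0.9, 0.6, 0.4, 0.3, 0.2]


def kaiser_summary(eigenvalues):
    """Return (number of eigenvalues above 1, cumulative variance shares)."""
    total = sum(eigenvalues)  # equals the variable count for a correlation matrix
    retained = [ev for ev in eigenvalues if ev > 1.0]
    cumulative = [running / total for running in accumulate(eigenvalues)]
    return len(retained), cumulative


n_factors, cumulative = kaiser_summary(eigenvalues)
n_variables = len(eigenvalues)  # one eigenvalue per original variable

print(f"variables: {n_variables}, factors retained: {n_factors}")
for i, share in enumerate(cumulative, start=1):
    print(f"first {i} factor(s) explain {share:.1%} of the variance")
```

A borderline eigenvalue just below 1, like the fourth value in this made-up list, is the kind of case in which the number of significant factors becomes a judgment call rather than a fixed count.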