Data Collection: Pitfalls & Benefits of Statistical Qualifications.



Pitfalls to Avoid When Collecting Data

1. Failing to involve the proper Statistical Qualifications

2. Overusing anecdotal sampling

3. Not leveraging interval estimation

4. Maintaining a rigid “census-taking mentality”

5. Failing to obtain a representative sample

6. Not learning about measurement error

7. Withholding data acquisition pending unreasonable proof that it will add value

8. Denying the nonresponse problem

9. Not taking advantage of more advanced designs that will lower costs and add capabilities

10. Failing to statistically validate results from observational data

The economic benefits and the avoidance of pitfalls justify delegating data collection to quant specialists, typically expert leaders with more specialized training in DoS or DoE. They can deduce the most efficient designs, run smarter simulations, and find the minimum sample sizes, which are as likely to be smaller than expected as larger.
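As a rough illustration of finding a minimum sample size, the sketch below uses the standard textbook formula for estimating a mean within a margin of error E under simple random sampling, n0 = (z·σ/E)², with an optional finite population correction. It assumes σ is known or guessed from a pilot study and relies on the large-sample normal approximation; the function name and parameters are hypothetical, and a quant specialist would refine this for the actual design.

```python
import math
from statistics import NormalDist
from typing import Optional

def min_sample_size(sigma: float, margin: float, confidence: float = 0.95,
                    population: Optional[int] = None) -> int:
    """Smallest n so a simple random sample estimates a mean within +/- margin.

    Assumes sigma (the population standard deviation) is known or estimated
    from a pilot study, and uses the large-sample normal approximation.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z * sigma / margin) ** 2
    if population is not None:
        # Finite population correction: a small population needs fewer units.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(min_sample_size(sigma=50, margin=5))                   # infinite population
print(min_sample_size(sigma=50, margin=5, population=2000))  # 2,000 units in total
```

With σ = 50 and a margin of ±5, about 385 units suffice at 95% confidence, and only 323 when the whole population has just 2,000 units, which is far short of a census.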

DoS Versus DoE: Both apply randomization to collect representative data, which can then be leveraged for estimation. At the advanced end of both techniques, the work requires involving and then delegating to an expert leader or a quant specialist (a sub-subject matter expert) with that particular training. At that level, executing DoS or DoE requires specialized training within the already specialized field of statistics.

The difference between these two data collection techniques is that DoE lets us influence the information collected. Sampling techniques measure a representative subset of observations in order to estimate characteristics of the whole population. Experimentation measures a set of observations under what-if scenarios created by manipulating some explanatory variables, called factors. By observing the effects of these manipulations, we can infer cause-and-effect relationships.
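A minimal sketch of the sampling side: estimate a population mean from a simple random sample rather than a census. The population is simulated and the sizes are arbitrary assumptions chosen only for illustration; real designs (stratified, clustered, and so on) are more involved.

```python
import random
import statistics

def srs_estimate(population, n, seed=None):
    """Estimate the population mean from a simple random sample of size n."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # every subset of size n is equally likely
    return statistics.fmean(sample)

# Hypothetical population of 100,000 order values (simulated, not real data).
rng = random.Random(1)
population = [rng.gauss(100, 20) for _ in range(100_000)]

true_mean = statistics.fmean(population)  # what a full census would give
estimate = srs_estimate(population, 400, seed=2)
print(f"census mean {true_mean:.2f}, sample estimate {estimate:.2f}")
```

With 400 of 100,000 units measured, the standard error here is about 20/√400 = 1, so the estimate typically lands within a couple of units of the census value.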

DoS randomly selects sampling units (SUs) to measure a multidimensional landscape as it naturally appears. DoE is a Rubik's cube twist beyond observation alone. The twist is that we view possible landscapes created by intervention: we assign treatments to experimental units (EUs). Each treatment level can have a different effect on the response, and each articulates one possible landscape. DoS is like DoE with only a control group.
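The experimental side can be sketched in the same spirit: randomly assign treatment levels to experimental units, then compare mean responses across levels. The store names, treatment levels, and simulated +5 effect are all hypothetical, and shuffling followed by round-robin assignment is just one simple randomization scheme among many.

```python
import random
import statistics

def assign_treatments(units, levels, seed=None):
    """Randomly assign treatment levels to experimental units in balanced groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units round-robin so each level gets (nearly) equal counts.
    return {unit: levels[i % len(levels)] for i, unit in enumerate(shuffled)}

# Hypothetical experiment: 200 stores, a control and a new-display treatment.
levels = ["control", "new_display"]
stores = [f"store_{i}" for i in range(200)]
plan = assign_treatments(stores, levels, seed=42)

# Simulated responses: assume the new display adds 5 units on average.
rng = random.Random(7)
response = {s: rng.gauss(50, 10) + (5 if plan[s] == "new_display" else 0) for s in stores}

means = {lvl: statistics.fmean(response[s] for s in stores if plan[s] == lvl)
         for lvl in levels}
effect = means["new_display"] - means["control"]
print(f"estimated treatment effect: {effect:.2f}")
```

Because assignment is random, the difference in group means estimates the causal effect of the display; keeping only the "control" level would reduce this to an observational sample.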

Interval estimates offer several advantages over point estimates:

1. There is some expectation that an estimate should be “close” to the true value, which we might later observe. In the fullness of time, we may lose credibility when point estimates repeatedly “fail” to be close enough. There is also a risk of misunderstanding: the most likely value, which a point estimate might represent, is not necessarily likely.

2. There is greater finality with an interval estimate. A point estimate might appease some business need without solving it. We often return for another point estimate because the first one failed to keep the problem solved.

3. Multiple decisions: suppose we want to choose among competing options or to track choices over time. Interval estimates allow fuller comparisons.

4. The width of the interval is excellent for comparing estimates and visualizing the information.

5. For advanced business problems, we will combine estimators into equations or base them conditionally upon one another. For those applications, confidence intervals are extremely helpful for simplifying the results, though not the calculations.
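To make the contrast concrete, the sketch below reports a point estimate together with a confidence interval for a mean. It is a simplification under stated assumptions: it uses the normal approximation (z about 1.96 at 95%), which suits large samples; for small samples like these a t quantile (for example from scipy.stats.t.ppf) would be more appropriate. The option data are made up.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(data, confidence=0.95):
    """Return (point estimate, (lower, upper)) for the mean of data."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)          # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return mean, (mean - z * se, mean + z * se)

# Hypothetical weekly sales for two competing options.
option_a = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2]
option_b = [13.0, 12.7, 13.4, 12.9, 13.1, 12.8, 13.3, 13.0]

for name, data in [("A", option_a), ("B", option_b)]:
    est, (lo, hi) = mean_ci(data)
    print(f"option {name}: {est:.2f}  95% CI [{lo:.2f}, {hi:.2f}]  width {hi - lo:.2f}")
```

Here the two intervals do not overlap, which is strong evidence that the options differ. Overlapping intervals would not by themselves rule out a difference; an interval for the difference itself is the more direct tool.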

Data are measurements on observations. As such, they are the product of measurement devices, such as a yardstick or a thermometer. The accuracy of the measurements depends on the measurement device and the data storage. Hence, continuous data are already rounded to the maximum precision that the measuring device and data storage can support. This is not a problem as long as we can meet our accuracy needs.
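A small simulation can show what device precision does to continuous data. The sketch assumes a hypothetical thermometer with 0.5-degree resolution and simulated true temperatures: each recorded value is off by at most half the resolution, and whether that matters depends on the accuracy the business problem needs.

```python
import random
import statistics

def quantize(value, resolution):
    """Round a reading to the nearest multiple of the device's resolution."""
    return round(value / resolution) * resolution

# Hypothetical thermometer that reads to 0.5 degrees (assumed resolution).
resolution = 0.5
rng = random.Random(3)
true_temps = [rng.uniform(18.0, 24.0) for _ in range(1000)]
recorded = [quantize(t, resolution) for t in true_temps]

# Worst single-reading error versus the effect on an aggregate (the mean).
max_error = max(abs(r - t) for r, t in zip(recorded, true_temps))
mean_shift = statistics.fmean(recorded) - statistics.fmean(true_temps)
print(f"max rounding error {max_error:.3f}, shift in mean {mean_shift:.4f}")
```

Individual readings can be off by up to 0.25 degrees, while the mean of many readings barely moves; a need for per-reading accuracy finer than 0.25 degrees would call for a better device.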

There are several types of measurement devices, including questionnaires, direct observation, yardsticks, thermometers, machines monitoring an assembly line, and so on. Sometimes people are a functional part of the measurement device, and other times data entry software plays this role.

Every functional part of measurement merits scrutiny. In one large lawsuit between two major corporations, the case went badly for the plaintiff because the data entry software had restricted the precision of the information; as a result, the plaintiff’s data could not connect the defendant's product to particular failures. In understanding the measurement device, we must reflect upon the business context, and in practice we pursue that contextual information aggressively.

Sometimes missing values represent a failure to measure; other times they really are the measurements. For example, if the amount of loan loss is missing, it could be because the loss amount was never entered into the computer or because the loan never defaulted. In the latter case, the value can be thought of as either zero or “does not apply,” whichever better serves the business objective.
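One way to encode the loan-loss example is to use another field to decide what a missing amount means. The sketch assumes a hypothetical `defaulted` flag is stored alongside the loss amount; the field names and the choice between 0.0 and "does not apply" are illustrative, not a prescribed rule.

```python
from typing import Optional, Union

DOES_NOT_APPLY = "does not apply"

def interpret_loss(defaulted: bool, loss_amount: Optional[float],
                   as_zero: bool = True) -> Union[float, str, None]:
    """Interpret a loan-loss amount using a (hypothetical) default flag."""
    if defaulted:
        # A defaulted loan should have a loss; None here is a failure to measure.
        return loss_amount
    # The loan never defaulted, so the loss is zero or "does not apply".
    return 0.0 if as_zero else DOES_NOT_APPLY

# Hypothetical records; field names are assumptions for illustration.
loans = [
    {"id": 1, "defaulted": True, "loss": 1200.0},
    {"id": 2, "defaulted": True, "loss": None},
    {"id": 3, "defaulted": False, "loss": None},
]
interpreted = {loan["id"]: interpret_loss(loan["defaulted"], loan["loss"]) for loan in loans}
needs_followup = [loan["id"] for loan in loans if loan["defaulted"] and loan["loss"] is None]
print(interpreted, needs_followup)
```

Only loan 2 is a genuine data gap worth chasing; loan 3's blank is itself the measurement.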
