In Spring 2013, I had the privilege of taking a course with a professor who earned his PhD in Biostatistics at UC Berkeley and who'd been developing his own textbook. I had the opportunity to learn from the material he'd been preparing for that textbook last year, so I thought it'd be worthwhile to share what I learned in a short Statistics series.
Statistics refers to the use of data analysis to solve management problems, so that decisions are supported by conclusions drawn from empirical evidence. In Statistics, we collect and organize the data, summarize and analyze the results, and then interpret and present our findings. In this lesson, I'll cover the collection and organization of the data.
The data may be collected from one of two groups: the population or a sample. Collecting from the entire population usually isn't feasible because of constraints on time and resources, so the data is usually collected from a sample of the population instead. The data can be derived from empirical evidence or anecdotal evidence; however, empirical evidence is the more valuable of the two since it's more quantifiable. Each variable in the data is then either Qualitative or Quantitative: a Qualitative Variable is a characteristic that is nonnumerical, for example gender, affiliation or brand; a Quantitative Variable is information that is reported numerically, for example number of people, miles per gallon or amount of money. Once we gather this data, we may then organize it in preparation for analyzing our results.
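To make the population versus sample idea concrete, here is a minimal sketch in Python. The population of miles-per-gallon figures is made up for illustration (generated at random, not real measurements), and the sizes of 1,000 and 50 are my own assumptions. It draws a simple random sample and compares the sample mean with the population mean.

```python
import random

# Hypothetical population: miles-per-gallon figures for 1,000 cars.
# These values are generated for illustration; they are not real data.
rng = random.Random(2013)  # fixed seed so the sketch is repeatable
population = [round(rng.uniform(15.0, 45.0), 1) for _ in range(1000)]

# Simple random sample of 50 cars, drawn without replacement.
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population size: {len(population)}, mean mpg: {population_mean:.2f}")
print(f"sample size:     {len(sample)}, mean mpg: {sample_mean:.2f}")
```

The two means will typically be close but not identical, which is the trade-off we accept when we measure a sample instead of the whole population.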
There are four levels of measurement of data with increasing degrees of statistical sophistication: Nominal, Ordinal, Interval, and Ratio. The level of measurement of the data determines which calculations and statistical tests may be applied. Nominal level data are observations of qualitative data that are classified into categories and may be counted. There is no particular order to the categories; examples include the Qualitative variables of gender, affiliation or brand. Ordinal level data are observations that are classified into categories and may be counted, but the categories are represented by labels that have relative values and may be ranked or arranged in some order; examples include rank in class and student or instructor evaluations. Interval level data are classifications that are ordered according to the amount of the characteristic they possess, and the differences between values are meaningful. There is no natural zero point: zero is just another value on the scale rather than the absence of the characteristic, so the ratio between numbers is meaningless; examples include dress sizes, Fahrenheit temperature, etc. Ratio level data are Quantitative variables with a natural zero starting point, where zero is the absence of the characteristic. The ratio between numbers is valid and meaningful; examples include height, weight, or number of students, etc.
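The sketch below illustrates how the level of measurement limits which calculations make sense. All of the observations are hypothetical values I made up, and the rank mapping for the ordinal labels is my own assumption. The last part shows why a ratio is meaningless for Fahrenheit temperature but meaningful for weight: changing units changes the first ratio but not the second.

```python
from statistics import mean, mode

# Hypothetical observations, one list per level of measurement.
brands = ["Acme", "Zenith", "Acme", "Orbit", "Acme"]       # nominal
ratings = ["poor", "good", "fair", "good", "excellent"]    # ordinal
temps_f = [40.0, 60.0, 80.0]                                # interval
weights_lb = [120.0, 150.0, 180.0]                          # ratio

# Nominal: categories can only be counted, so the mode is the usual summary.
most_common_brand = mode(brands)

# Ordinal: categories can be ranked, so a median (middle value) makes sense.
# The rank mapping is an assumption made for this example.
rank = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}
ordered = sorted(ratings, key=rank.get)
median_rating = ordered[len(ordered) // 2]

# Interval: differences are meaningful, so a mean can be computed.
mean_temp_f = mean(temps_f)


def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Interval: the "ratio" depends on the unit, so it carries no meaning.
ratio_in_f = 80.0 / 40.0
ratio_in_c = f_to_c(80.0) / f_to_c(40.0)

# Ratio: the ratio is the same in pounds and in kilograms.
LB_TO_KG = 0.45359237
ratio_in_lb = weights_lb[2] / weights_lb[0]
ratio_in_kg = (weights_lb[2] * LB_TO_KG) / (weights_lb[0] * LB_TO_KG)

print("most common brand:", most_common_brand)
print("median rating:", median_rating)
print("mean temperature (F):", mean_temp_f)
print("80F/40F in Fahrenheit vs Celsius:", ratio_in_f, round(ratio_in_c, 2))
print("180lb/120lb in pounds vs kilograms:", ratio_in_lb, round(ratio_in_kg, 2))
```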
Once we understand the significance of our collected data, we'd then organize our data into tables. Frequency tables are among the most common; a frequency table groups qualitative data into mutually exclusive classes and shows the number of observations in each class. This kind of table is perfect for Nominal level data, where the statistician may tally the number of females and males in a sample. Ordinal level data may also be organized into their given categories and counted, where the statistician may tally how many students fall into each category. Interval and Ratio level data, however, may be better represented by charts such as scatter diagrams and by techniques such as regression analysis, which I'll cover in the next lesson on summarizing and analyzing the data.
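As a small illustration of the frequency table described above, here is a sketch that tallies a nominal variable with Python's collections.Counter. The ten responses are made up for the example, and the relative frequency column is an extra I added alongside the raw counts.

```python
from collections import Counter

# Hypothetical nominal-level sample: gender recorded for 10 respondents.
sample = ["F", "M", "F", "F", "M", "F", "M", "M", "F", "F"]

# Each observation falls into exactly one class, so the classes are
# mutually exclusive and the counts add up to the sample size.
counts = Counter(sample)
total = sum(counts.values())

print(f"{'Class':<8}{'Frequency':>10}{'Relative':>10}")
for label in sorted(counts):
    freq = counts[label]
    print(f"{label:<8}{freq:>10}{freq / total:>10.2f}")
```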
Image:http://www.skylinetradeshowtips.com/wp-content/uploads/2011/03/Statistics-made-up-for-trade-shows.jpg
