Statistics is the use of data analysis to solve management problems, supporting effective decision making with conclusions drawn from empirical evidence. In statistics, we collect and organize data, summarize and analyze the results, and then interpret and present the numerical findings. In the last lesson, I covered the collection and organization of data. In this lesson, I’ll cover summarizing and analyzing the results.
I’m sure that at some point in your public education you studied the arithmetic mean, median and mode of data. In statistics, we take that summarized data further by analyzing its statistical significance. For those who don’t remember, let me recap: the Mean is the traditional average of the data, the Median is the middle value once the data is sorted, and the Mode is the value (or values) that occurs most frequently. Together these are measures of Central Location or Central Tendency, which describe the center or middle of the data. Data can also be summarized with a weighted formula; for example, the weighted mean multiplies each number by its corresponding weight, sums the results, and divides by the total of the weights.
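To make the recap concrete, here is a minimal Python sketch using the standard library's `statistics` module. The exam scores, assignment grades and weights are made-up numbers for illustration only.

```python
import statistics

# Hypothetical exam scores (illustrative data, not from a real class)
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # sum of the values divided by the count
median_score = statistics.median(scores)  # middle value of the sorted list
modes = statistics.multimode(scores)      # every value tied for most frequent

# Weighted mean: multiply each grade by its weight, sum, divide by total weight.
# Hypothetical course components: homework, midterm, final.
grades = [80, 90, 70]
weights = [0.2, 0.3, 0.5]
weighted_mean = sum(g * w for g, w in zip(grades, weights)) / sum(weights)

print(mean_score, median_score, modes, weighted_mean)
```

Note that `multimode` returns a list, because a data set can have more than one mode.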
When the Mean, Median and Mode are compared, skewness may appear, in which the data favor the left or right side of the frequency chart. In a right-skewed distribution, a long tail of high values typically pulls the mean above the median; in a left-skewed distribution the reverse happens. Skewness is a different property from Variance, which describes the dispersion or spread of the data around the mean. A good way to visualize variance is as the width of the frequency chart: over the same range, widely dispersed data produce a low, wide chart, while condensed data produce a tall, narrow one. Sometimes, though, the data are symmetric and follow a normal distribution, which appears as a bell-shaped curve as shown below.
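A rough sketch of how spread and skewness can be computed separately is below. The data set is invented, with one large value to create a right tail, and the skewness formula used is the common moment-based coefficient (the average cubed z-score), which is one of several definitions in use.

```python
import statistics

# Hypothetical data with a single large value that creates a right tail
data = [2, 3, 3, 4, 4, 4, 5, 20]

mean = statistics.mean(data)
median = statistics.median(data)

# Population variance and standard deviation measure spread around the mean
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Moment-based skewness: average of the cubed z-scores.
# Positive means a longer right tail, negative a longer left tail.
skewness = sum(((x - mean) / std_dev) ** 3 for x in data) / len(data)

print(mean, median, variance, std_dev, skewness)
```

Here the mean lands above the median and the skewness comes out positive, which matches the right-skewed shape of the data.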
A very popular way to analyze variation in data is the bell-shaped curve. This method is commonly mentioned when grading students in a course, where each student’s grade is scaled in relation to the performance of the others. The data are spread symmetrically around the mean, so the left half of the chart holds 50% of the values and the right half holds the other 50%. Each +/- sd mark on the chart is one standard deviation, the square root of the variance, which measures the amount of dispersion from the average or arithmetic mean. In a normal distribution, roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two. For example, in a teacher’s grade book the A students would be farthest to the right.
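As an illustration of reading a bell curve in standard deviations, the sketch below converts a hypothetical grade book into z-scores (how many standard deviations each grade sits from the mean) and uses `statistics.NormalDist` to check the share of a normal distribution within one and two standard deviations. The grades are invented, and real classes rarely follow a perfect normal curve.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical grade book (illustrative data)
grades = [55, 65, 70, 75, 80, 85, 95]

mu = mean(grades)
sigma = pstdev(grades)

# z-score: distance from the mean measured in standard deviations.
# The strongest students have the largest positive z-scores (far right).
z_scores = [(g - mu) / sigma for g in grades]

# Share of a standard normal distribution within 1 and 2 standard deviations
standard_normal = NormalDist(0, 1)
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(z_scores)
print(round(within_1_sd, 4), round(within_2_sd, 4))
```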
Another common way to analyze data is correlation, often explored visually through Scatter Diagrams. Correlation Analysis is the study of the relationship between variables, using techniques that measure the numerical association between two or more variables. A relevant example for student grades would be whether there is a correlation between study time and GPA. In that case, study time would be the independent variable on the X-axis and GPA would be the dependent variable on the Y-axis. A more rigorous way to measure the relationship is Pearson’s Coefficient of Correlation, which measures the strength and direction of the linear relationship between two variables. The coefficient ranges from negative one to positive one: zero represents no linear correlation, negative one a perfect negative correlation, and positive one a perfect positive correlation, with values in between indicating weaker relationships.
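Pearson’s coefficient can be sketched in a few lines of Python. The helper name `pearson_r` and the study-time and GPA figures below are my own illustrative assumptions, not real student data.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: weekly study hours (X) and GPA (Y)
study_hours = [2, 4, 6, 8, 10]
gpa = [2.0, 2.5, 3.1, 3.4, 3.9]

r = pearson_r(study_hours, gpa)
print(round(r, 3))  # close to +1: a strong positive linear relationship
```

A value this close to positive one only says the two variables move together in this sample; it does not by itself show that more study time causes a higher GPA.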
A more advanced method of analyzing statistical data is Regression Analysis, which produces an equation expressing the linear relationship between two variables so that one variable can be estimated on the basis of the other. A good example would be estimating housing prices: square footage could be the independent variable on the X-axis and the house price would be the dependent variable on the Y-axis. The slope of the regression line then estimates how much the price changes for each additional square foot. In an example such as this, however, it is important to gather as much data as possible before making the estimate. The resulting regression analysis would look similar to the graph below, and the trend line would be the estimated prediction.
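The housing example can be sketched with an ordinary least-squares fit, which is the usual way to obtain a regression line. The function name `fit_line` and all of the square footage and price figures are hypothetical, chosen only to show the mechanics.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of products of deviations / sum of squared x deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical listings: square footage (X) and sale price in dollars (Y)
square_feet = [1000, 1500, 2000, 2500, 3000]
prices = [200_000, 270_000, 330_000, 410_000, 470_000]

slope, intercept = fit_line(square_feet, prices)

# Estimate the price of a 2,200 square foot house from the fitted line
predicted_price = slope * 2200 + intercept
print(slope, intercept, predicted_price)
```

With these made-up numbers the slope works out to about 136 dollars per additional square foot. Estimates far outside the range of the collected data (for example, a 10,000 square foot house here) should be treated with caution.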
In the next and final lesson of this Statistics series, I’ll cover interpretation and presentation of numerical data.
Image:http://www.bio.miami.edu/ecosummer/eco2012/pix/statistics.jpg


