Stern School of Business
Statistics and Data Analysis
 

Professor: William Greene, Departments of Economics and IOMS

BS Ohio State University, 1972 (Operations Research); MA Wisconsin, 1974 (Economics); PhD Wisconsin, 1976 (Econometrics). History: Cornell, 1976-1982; Real world, 1982-1983; Return to ivory tower at Stern (then GBA) NYU, 1983-2008; Toyota Motor Corp Professor, 2007-. Publications: Articles – see vita on home page; Books: Modeling Ordered Choices (2010), Econometric Analysis, 7th Ed (2011), Applied Choice Analysis (2006). Software: NLOGIT (www.nlogit.com). Editor in Chief, Foundations and Trends in Econometrics; Editor in Chief, Journal of Productivity Analysis; Associate Editor, Journal of Economic Education, Journal of Choice Modeling. Research interests: econometric methodology, discrete choice modeling, efficiency and productivity analysis, health economics, transportation, nonlinear estimation, entertainment and media.
Office:  MEC 7-90, Ph. 998-0876, Fax. 995-4218
e-mail: wgreene@stern.nyu.edu

Home Page: http://www.stern.nyu.edu/~wgreene

Abstract

This course has two broad objectives: (1) This course will provide students with an understanding of fundamental notions of data presentation and analysis. We will develop tools to enable students to use statistical thinking in the context of business problems. The course deals with modern methods of data exploration (partly to reveal unusual or problematic aspects of data sets), the uses and abuses of the basic techniques of statistical inference, and the use of linear regression as a tool for management and financial analysis. (2) There is randomness everywhere in life and in the environment. We will develop models of probability and random variables that help to understand the randomness of everyday life.

Prerequisites

I will assume that students are familiar with routine algebra, exponents and logarithms as well as graphical tools such as the slope and intercept of a straight line. Algebra will be used freely throughout the course. We may have a rare occasion to use a derivative, but the use of calculus will be sparing at most.

 

Course Requirements and Course Grades

 

Final grades for the course will be determined on the basis of the following components and weights:

 

*   Mid-term exam: 30%.

(Sample midterm questions) (2007 Midterm Exam with Solutions) (2008 Midterm Exam with Solutions)

*   Final exam: 40%.

(Sample problems for study for the final) (Notes for sample problems) (2007 Final Exam with Solutions) (2008 Final Exam with Solutions)

*   In-class short (10-minute) quizzes: (4 @ 2.5%) 10% (in aggregate).

*   Model development project (details below): 5%. Students may work in groups of up to 5 and submit a single report for the group.

*   Homework assignments (details below): (6 @ 2.5%) 15% (in aggregate). Students may work in groups of up to 3 and submit a single report for the group.
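As an illustration of how the weights above combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale; the function name and the example scores are hypothetical, and the mapping from the resulting number to a letter grade is not specified here.

```python
# Component weights as listed above (fractions of the final grade).
WEIGHTS = {
    "midterm": 0.30,
    "final": 0.40,
    "quizzes": 0.10,
    "project": 0.05,
    "homework": 0.15,
}


def course_score(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"midterm": 80, "final": 90, "quizzes": 70, "project": 100, "homework": 95}
print(round(course_score(example), 2))
```

The weights sum to 100%, so the result stays on the same 0-100 scale as the inputs.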

 

Official policy at Stern mandates that grades in core classes follow a distribution in which no more than 35% of students receive A or A-.

Homework assignments are mandatory. Late submissions will be accepted only with a persuasive justification, but not after the solutions have been posted.  All examinations are open book, open notes, closed telephone, closed PDA, closed iPhone, closed iPad, closed Droid, closed Crackberry, closed laptop. Do bring a hand calculator to both exams. Links to copies of past exams appear below.

 

Honor Code:  Of course.

 

Course Materials

 

*    The recommended text for this course is Basic Statistical Ideas for Managers, 2nd edition by David Hildebrand, Lyman Ott and J. Brian Gray, Thomson Learning, 2005. This book is available from the Professional Bookstore on LaGuardia Place and from online booksellers such as Amazon. Note that the text is not required. We will not “follow” the text during the semester, and “reading assignments” in the outline below are recommended for background only. You may find the text useful for reinforcing the material we cover in class and as a source of many examples and applications.

 

*    Some useful notes on several subjects. These include the bare bones theoretical results and lots of examples. These are the “Notes…” indicated below.

 

Essential statistical results for finance

Essential regression related results for finance

Displaying data

Summary statistics

Basic probability

Combinations and permutations

Discrete random variables

The normal distribution

The basic linear regression model

Building the regression model

Specification analysis of the regression model

Variable selection

Forming confidence intervals

Testing hypotheses

 

*    The software for the course will be Minitab, release 16. Version 16 is available from the Professional Bookstore for about $100, but you can “rent” a copy for about $30 for the semester from the website http://www.e-academy.com; at the site, click on e-Store. (Introduction to Minitab) (How to use Minitab on the Stern Citrix Server)

Other Stuff

*    Please remember to turn off your cell phone before you come to class.

*    Please try to arrive early. Late entrances are disruptive.

*    As a general rule, laptops are an annoyance during class, particularly when you are checking your email, playing with Facebook, tweeting or watching YouTube videos while others are studying statistics. If you absolutely must use your laptop to take notes, please be respectful of the interests of your colleagues.

Course Outline and Schedule

Materials: (Introductory notes for the course – Notes 0: Introduction – Right click to download)

Session 1: Introduction to statistics; data description and presentation; types of data; Minitab.
Reading: Text, Chapter 1 and Sections 2.1-2.6, Notes on displaying data.
Materials: (Slides for this session – Notes 1: Data Presentation.)      
Minitab Project Files:  (Basic Statistics)

Session 2: Sampling, Descriptive statistics (mean, median, mode, standard deviation), Covariance and correlation.
Reading: Text, Sections 2.2-2.6, Notes on summary statistics.
Materials: (Slides for this session – Notes 2: Descriptive Statistics.) (An article about nonrandom sampling and biased statistical analysis)
Minitab Project Files: (US Gasoline Market) (Basic Statistics)

Session 3: Probability, conditional and unconditional probability, independence, joint probability, Bayes' Theorem.
Reading: Text, Sections 3.1-3.3, Notes on basic probability. Probabilities and the Gulf oil spill.
Materials: (Slides for this session – Notes 3: Probability.)   (Sample Probability Exercises)   (Some famous and fun problems in probability)

Session 4: Probability and expected value, applications of expected value. 
Reading: Text, Sections 3.3, 4.3, Notes on basic probability.
Materials: (Slides for this session – Notes 4: Expected Value.) (Notes about credit default swaps)

Session 5: Random Variables.
Reading: Text, Sections 4.1, 4.3, Notes on discrete random variables.
Materials: (Slides for this session – Notes 5: Random Variables.)

Session 6: Random variables, covariance and correlation.
Reading: Text, Sections 4.4 – 4.5, Notes on discrete random variables.
Materials: (Slides for this session – Notes 6: Bivariate Random Variables.)

Session 7: Discrete distributions, Bernoulli, binomial.
Reading: Text, Sections 5.1, 5.2, Notes on discrete random variables, Notes on combinations and permutations.
Materials: (Slides for this session – Notes 7: Binomial and Poisson Distributions)

Session 8: Discrete distributions, binomial, hypergeometric, Poisson.
Reading: Text, Sections 5.2-5.3, Notes on discrete random variables, Notes on combinations and permutations.
Materials: (Slides for this session – Notes 8: Binomial and Hypergeometric Distributions) (A science experiment to produce Poisson outcomes)

Session 9: The normal distribution.
Reading: Text, Section 5.4, Notes on the normal distribution.
Materials: (Slides for this session – Notes 9: The Normal Distribution.) (Sample Problems) (What is the margin of error?)

 

Session 10: Samples and sampling distributions, normal distribution, large samples, law of large numbers, central limit theorem.
Reading: Text, Sections 5.5, 6.1 – 6.3, Notes on the normal distribution.
Materials: (Slides for this session – Notes 10: The Central Limit Theorem and the Law of Large Numbers.) (Random Walk Models for Stock Prices)
Minitab Project Files: (Cleared Calls) (WHO Data) (Basic Statistics)

Session 11: Central Limit Theorem, normal approximations, lognormality, random walk.
Reading: Text, Sections 6.1 – 6.3, Notes on the normal distribution.
Materials: (Slides for this session – Notes 11: Normal Approximation and Random Walks.)   (Seminar: That U.S. 37th Ranking by WHO) (Notes for seminar) (Lognormal Random Walks for Stock Prices)
Minitab Project Files: (WHO Data)

Session 12: Linear regression.
Reading: Text, Sections 11.1, 11.2, 11.4, Notes on the basic linear regression model.
Materials: (Slides for this session – Notes 12: Linear Regression.)    (A controversial regression study)    (Slides for an application of modeling)    (Handout for application) (Regression Analysis by WHO)
Minitab Project Files: (WHO Data) (Drug Couriers) (Trends in Frequent Flyers) (US Gasoline Market)

Session 13: Linear regression model, sample and population.
Reading: Text Section 11.2, Notes on the basic linear regression model.
Materials: (Slides for this session – Notes 13: Regressions and Residuals.)
Minitab Project Files: (Movie Success) (US Gasoline Market)

Session 14: Least squares linear regression, residual analysis, analysis of variance.
Reading: Text, Sections 11.4, 11.5, Notes on the basic linear regression model.
Materials: (Slides for this session – Notes 14: Regression Analysis.)
Minitab Project Files: (Movie Success) (WHO Data)

Session 15: Prediction, elasticity, functional form, correlation and covariation.
Reading: Text, Section 11.5, Notes on the basic linear regression model.
Materials: (Slides for this session – Notes 15: Regression and Correlation.)
Minitab Project Files: (Sale Prices for Monet Paintings) (Trends in Frequent Flyers) (Cost Data for U.S. Utilities) (Salaries) (US Gasoline Market)

Session 16:  Prediction, regression to the mean, measurement error, truncation, selection.  
Reading: Notes on the basic linear regression model.
Materials: (Slides for this session – Notes 16: Specifying the Regression Model.)
Minitab Project Files: (US Gasoline Market) (Sale Prices for Monet Paintings)

Session 17: Multiple regression.
Reading: Text, Chapter 12, Notes on building the regression model.
Materials: (Slides for this session – Notes 17: Multiple Regression Part 1)   (Multiple Regression with Minitab)
Minitab Project Files: (US Gasoline Market) (Sale Prices for Monet Paintings)

Session 18: Multiple regression
Reading: Text, Chapter 12, Notes on building the regression model.
Materials: (Slides for this session – Notes 18: Multiple Regression Part 2)
Minitab Project Files: (US Gasoline Market) (Sale Prices for Monet Paintings) (WHO Data) (Computer Prices) (UK Electronics Stores) (House Prices)

Session 19: Multiple regression.
Reading: Text, Chapters 12, 13, Notes on specification analysis of the regression model.
Materials: (Slides for this session – Notes 19: Multiple Regression Part 3)
Minitab Project Files: (US Gasoline Market) (WHO Data) (Profits and R&D Data) (Movie Madness)

Session 20: Multiple regression
Reading: Text, Chapters 12, 13, Notes on specification analysis of the regression model.
Materials: (Slides for this session – Notes 20: Multiple Regression Part 4)
Minitab Project Files: (WHO Data) (Sale Prices for Monet Paintings) (Longley Data)

Session 21: Statistical inference, confidence intervals
Reading: Text, Chapters 7, 8, 9, Notes on forming confidence intervals.
Materials: (Slides for this session – Notes 21: Statistical Inference.)  
Minitab Project Files: (Credit Application data)

Session 22: Statistical tests
Reading: Text, Chapters 8, 9, Notes on testing hypotheses.
Materials: (Slides for this session – Notes 22: Testing Hypotheses Part 1) 
Minitab Project Files: (Credit Application data)

Session 23: Statistical tests
Reading: Text, Chapters 8, 9, Notes on testing hypotheses.
Materials: (Slides for this session – Notes 23: Testing Hypotheses Part 2.)
Minitab Project Files: (Credit Application data) (Econometrics Midterm Grades data)

Session 24: Hypothesis tests.
Reading: Text, Chapters 8, 9, 11, Notes on testing hypotheses.
Materials: (Slides for this session – Notes 24: Hypothesis Testing) (Notes on the Power of a Test) (Tests about Variances)
Minitab Project Files: (Credit Application data) (US Gasoline Market) (Sale Prices for Monet Paintings) (German Health Survey Data) (Sydney/Melbourne Travel Choice)

Session 25: Analyzing Qualitative Data 
Reading: None
Materials:  (Slides for this session – Notes 25: Analyzing Qualitative Data.) (A Study about Obesity) (The Netflix Prize)   
Minitab Project Files: (Credit Application data) (German Health Survey Data) (Sydney/Melbourne Travel Choice)

Individual Problem Sets and Assignments

Students may work with one of their colleagues on these homework assignments and submit the assignment as a group. (Groups may have no more than two students.) All data sets for the assignments are linked below. You can left click to open them in Minitab, or right click to download them to your own computer.

Assignment 1.  Data Description and basic probability.  Problem set 1  Problem set 1 solutions
Data sets: (HOG-Ex0201.mpj) (HOG-Ex0202.mpj) (HOG-Ex0218.mpj) (HOG-Ex0222.mpj) (97employ.mpj) (WHO-HealthStudy.mpj)

Assignment 2. Probability. Problem set 2  Problem set 2 solutions

Assignment 3.  Probability and Random Variables, Expected Value, Poisson and Hypergeometric Distributions, Normal Distribution. Problem set 3  Problem set 3 solutions
Data sets: (WHO-HealthStudy.mpj) (Easton.mpj) (salary.mpj) (Movies9OCT2003.mpj)  

Assignment 4.  Basic Regression. Problem set 4   Problem set 4 solutions  
Data sets: (WHO-HealthStudy.mpj) (EconGrades.mpj) (heating.mpj) (KansasCtyPopn.mpj)

Assignment 5.  Multiple Regression. Problem set 5  Problem set 5 solutions
Data sets: (UKElectronics.mpj) (GermanHealth.mpj) (MoreMoviemadness data.mpj) (Credit Application data)

Assignment 6.  Statistical Inference. Problem set 6  Problem set 6 solutions
Data sets: (German Health Survey Data) (Sale Prices for Monet Paintings)

Model Development Project

Notes on the model development project Data for Model Development (Minitab) (Excel)