Regression and Multivariate Data Analysis
STAT-GB 2301 / STAT-UB 17
Jeffrey S. Simonoff
Office: KMC 8-54
Phone: (212) 998-0452
FAX: (212) 995-4003
IMPORTANT NOTE: This web page refers to the Regression and Multivariate Data Analysis class to be taught during the Fall 2021 semester.
This is a data-driven, applied statistics course focusing on the analysis of data using regression models. It emphasizes applications to the analysis of business and other data and makes extensive use of computer statistical packages. Topics include simple and multiple linear regression, residual analysis and other regression diagnostics, multicollinearity and model selection, autoregression, heteroscedasticity, regression models using categorical predictors, and logistic regression. All topics are illustrated on real data sets obtained from financial markets, market research studies, and other scientific inquiries. The goal of the class is for students to begin developing the skills needed to collect, organize, analyze, and interpret regression data.
If you are a non-Stern NYU student, there are certain procedures that you must follow in order to register for the course. Please click here for details if you are a graduate student, and click here for details if you are an undergraduate student.
Samprit Chatterjee and Jeffrey S. Simonoff, Handbook of Regression Analysis With Applications in R, Second Edition, John Wiley and Sons (2020). [Highly recommended, but not required; you can do all of the work required for the class without it. In any event, I believe that it is a useful applied guide to have.]
The course grade will be based on homeworks/projects only. Grades will be determined based on a class-wide curve (that is, there will not be separate curves for undergraduates and graduate students). The course will be very heavily computer oriented; if you have not used a statistical package before, you may be in for some rough going. The "official" package for the course is Minitab, which is available online through the Stern network apps@Stern and for rental through the website http://www.onthehub.com (I highly recommend that you rent the package). You may use any package you wish, on any machine that you wish, as long as it performs the necessary calculations; any deficiencies on the part of the package are the responsibility of the student. I can provide additional support for Minitab and R, but not for SAS, SPSS, Stata, Systat, or STATISTICA (although these packages are able to perform all of the necessary modeling methods for this class). Excel will not be an acceptable tool for analyses in this class. I have put R code for the different handouts up on the web (click here) and on Brightspace.
Class will meet according to the Stern graduate calendar, not the standard university calendar. See the Stern website (http://www.stern.nyu.edu/portal-partners/registrar/academic-calendars) for details on the differences between the two calendars. Note, in particular, that the first day of class is Thursday, September 9, there is no class on Thursday, September 16, there is no class on Tuesday, October 12, and a class session will be held on Wednesday, December 15. There will also be a class session during Finals week. It is crucially important that all students review basic regression material before the first class. Please see the material under Required work before first class session below.
You will be responsible for obtaining your own data for the assignments. Do not merely take data from a textbook; obtain your data from original data sources. You will be required to provide complete source information for your data (a URL if the data come from the World Wide Web, or a photocopy of the appropriate page(s) if the data come from a printed source). Generally speaking, you will have roughly two weeks to complete each assignment. Assignments must be submitted through Brightspace; as part of the submission process they will be evaluated by the Turnitin Plagiarism Service.
At Stern we take pride in our well-rounded education and approach our academics with honesty and integrity. The Stern Code of Conduct states that you commit to "Exercise integrity in all aspects of our academic work including, but not limited to, the preparation and completion of exams, papers and all other course requirements by not engaging in any method or means that provides an unfair advantage." Further, you commit to "Refrain from behaving in ways that knowingly support, assist, or in any way attempt to enable another person to engage in any violation of the Code of Conduct. Our support also includes reporting any observed violations of this Code of Conduct or other School and University policies that are deemed to have an adverse effect on the NYU Stern community." This applies to this class in the following specific ways (in addition to general prohibitions on cheating, plagiarism, and so on):
Further details will be given in class. Violation of these conditions can lead to loss of all credit for the assignment involved at a minimum, with more severe sanctions possible after consultation with the Dean's Office.
A friendly piece of advice: don't hand in the assignments late! That is the quickest way to get in trouble in a course like this. An assignment is considered late if it is submitted after the specified due date and time. Penalties grow progressively larger the later an assignment is (1.5 points out of 10 up to one class late, 3 points up to two classes late, 6 points up to three classes late). No assignments will be accepted for credit more than three classes late. Work responsibilities will not be accepted as an excuse for lateness of an assignment; it is your responsibility to submit the assignment on time. If you have any questions about the grade you have received on a homework, you must raise them with me within one week of when the graded homework was made available to the class; no grading adjustments will be considered after that time. Don't wait until the last minute to do an assignment, as you might find that access to computing facilities is difficult or impossible (the network might be down, or your laptop's hard drive might crash); such lack of access will not be accepted as an excuse for lateness.
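To make the arithmetic of the late penalties concrete, here is a minimal sketch in Python. It is an illustration only, not an official grading tool. The function name adjusted_score, the counting of lateness in whole class sessions (0 meaning on time), and the floor of zero on the resulting score are all assumptions made for the example; only the penalty amounts and the three-class limit come from the paragraph above.

```python
# Penalty schedule taken from the policy above: points deducted (out of 10)
# for an assignment that is up to 1, 2, or 3 classes late.
LATE_PENALTIES = {0: 0.0, 1: 1.5, 2: 3.0, 3: 6.0}


def adjusted_score(raw_score: float, classes_late: int) -> float:
    """Return the score after the late penalty (hypothetical helper).

    classes_late is assumed to be a whole number of class sessions,
    with 0 meaning the assignment was submitted on time.
    """
    if classes_late < 0:
        raise ValueError("classes_late cannot be negative")
    if classes_late > 3:
        # More than three classes late: not accepted for credit.
        return 0.0
    # Assumption: a score is never pushed below zero by the penalty.
    return max(0.0, raw_score - LATE_PENALTIES[classes_late])


if __name__ == "__main__":
    for late in range(5):
        print(late, adjusted_score(9.0, late))
```

Under these assumptions, a homework that would have earned 9 out of 10 is worth 7.5 if it is one class late, and 3 if it is three classes late.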
I have had complaints from students in the past regarding distractions caused by students using laptops in class. If you want to use a laptop in class for note taking, or to follow along with the discussion or statistical analyses done in class, I ask that you sit in the back of the classroom. Of course, surfing the web, answering e-mails, instant messaging, etc., are not appropriate uses of a laptop (or any electronic device) under any circumstances.
If you have a qualified disability and will require academic accommodation during this course, please contact the Moses Center for Student Accessibility (212-998-4980, firstname.lastname@example.org) and arrange for me to receive a letter from them verifying your registration and outlining the accommodations they recommend.
The final grade for the course will be based on the grades on the assigned homeworks only; there will be no opportunities for makeup or extra credit work, and an incomplete grade for the course will not be considered simply to make up assignments that were not done. Under no circumstances will you be able to resubmit, for regrading, a homework that you have corrected based on my comments. Thus, assignments for which you receive no credit will have a strong detrimental effect on your grade, and as few as two such assignments could result in a failing grade in the course. The actual curve used in the course will depend on the performance of the class, but in the past the cutoff for A grades (A and A-) has been roughly 8.5 (out of 10), while the cutoff for B grades (B+, B, and B-) has been roughly 7.5 (there is no guarantee that these cutoffs will apply this semester, however).
Most importantly - THIS COURSE IS LIKELY TO BE TIME-CONSUMING! If you're taking a particularly heavy course load this semester, or are going to be doing a lot of traveling (work-related, for example), this is probably not the course for you! Note in particular that because of the nature of the course, the assignments will come closer together in the second half of the semester.
I will be making slides and handouts available on Brightspace for the class sessions during the semester. You should make every effort not to miss classes, however, since the material covered in class will be far more relevant to you than is material in the textbook.
Prerequisite: Introductory statistics core course. More generally, the prerequisite is an introductory statistics class that includes discussion of descriptive statistics and univariate statistical inference (confidence intervals, prediction intervals, and hypothesis testing), and exposure to simple regression methods.
Required work before first class session: I will assume a basic understanding of the simple regression model from the beginning of the class. You should review this material from your introductory statistics course before the first class session. You should download, print out, and read the following handouts: Regression - the basics and Purchasing power parity - is it true?. You are responsible for all of the material in those handouts, although we will briefly discuss them in class. You should also download Homework 1 and answer all of the questions. The answers to these questions can be found here.
Chapters refer to the Chatterjee and Simonoff book. Corresponding class sessions given are only approximate.
1. Review of basic regression concepts - Chapter 1
2. Multiple regression - Chapter 1
4. Checking assumptions of regression - Chapters 1, 2, 3
5. Addressing violation of assumptions: choosing the correct predictors (model selection), autocorrelation - Chapters 2, 4, 5
6. Analysis of variance and covariance and nonconstant variance - Chapters 6, 7
7. Modeling group membership: logistic regression - Chapters 8, 9