Text
Supported software
Anonymous quote of the day
"If I had one more day to live, I would live it in my statistics class --- it would seem so much longer."
The goals of this course
The following conversation is one that I have had many times when meeting someone for the first time:
The most important purpose of this course is to try to make sure you never say something like that!
For many people, the word "statistics" elicits one (or more) of three impressions: sports statistics (interesting for the fan, but ultimately not of much real-world value), lots of dry numbers that are probably useful to the specialist (but thank goodness I'll never have to worry about them!), and carefully chosen and manipulated figures used by politicians to fool the electorate into voting for them (or against the other side). As is true of most generalizations, there is an element of truth to this point of view, but it unfortunately masks a much more important truth. Quantitative reasoning (sometimes called numeracy, in analogy with the word literacy) has become crucial for everyday living, and essential for the practice of business. Statistical reasoning and methodology provide the tools to become numerate.
The underlying principle of this course is that the world is full of randomness, and the only way to understand that randomness is to examine it systematically. We will talk about various statistical methodologies, and of course I hope that you will know how to use these methodologies when the course is over. Even more importantly, however, I hope that you will know how to think about randomness, and about data. This is a very applied course: we will talk about applications of statistics in many different fields, both business-related and non-business-related. You will see many analyses of real data, and you will spend lots of time doing your own statistical analyses of real data using the computer and learning to interpret the results of those analyses. You will spend relatively little time learning (or memorizing) formulas. We will talk about practical issues in how to think about randomness, and will discuss some basic probability (the language of randomness). We will talk about practical issues in how to think about data, and will discuss graphical and methodological ways to highlight what is going on in data. Finally, we will discuss ways to summarize relationships in data using statistical models, and show how doing so can highlight structure in the data.
The idea that is ultimately most important to grasp in a class like this is not any specific methodology, but rather the principle of statistical inference. What is statistical inference? One definition is that it is "probabilistic generalization from data." This simple phrase summarizes all of the reasons behind the structure of the course. First, "probabilistic" refers to recognizing and expressing the uncertainty that is part of any inference. This can only be done by knowing something about probability, which is the language of randomness, so we will spend some time talking about probability. Second, "generalization" refers to the point that we are usually interested in claims that go beyond the data that we have; we want to know about what we don't have yet, not what we already have. This depends on knowing how to apply statistical methods that are tuned to the questions at hand, but calculation is less important than understanding, since often statistical software can do the calculation work for us. Prediction, as opposed to description, is usually a big part of useful generalizations. Third, "from data" refers to the fact that we must be explicit about the evidence that is actually available to us. In statistics, context always matters; this is in direct contrast to mathematics, which is the study of objects independent of context (the rules and strategies of geometric or logical proofs are the same, no matter the context). Mathematics is about moving from the general to the specific (deduction), while statistics is about moving from the specific to the general (induction), and that is impossible without understanding the natures of both.
Administrative structure of class
The grade in this class will be based on a total of 260 points that can be earned, in the following ways:
The use of laptop/notebook computers will not be permitted during class. They are very distracting to other students, and the fact is that they are usually not being used for purposes related to class. If you wish to use a laptop computer to take notes during class, you must inform me in writing of your intention to do this (e-mail is fine), and then you must send me (via e-mail) a copy of the notes that you took in class that day within one hour of the end of each class.
The Stern Code of Conduct states that you commit to "Exercise integrity in all aspects of our academic work including, but not limited to, the preparation and completion of exams, papers and all other course requirements by not engaging in any method or means that provides an unfair advantage." Further, you commit to "Refrain from behaving in ways that knowingly support, assist, or in any way attempt to enable another person to engage in any violation of the Code of Conduct. Our support also includes reporting any observed violations of this Code of Conduct or other School and University policies that are deemed to have an adverse effect on the NYU Stern community." This applies to this class in the following specific ways (in addition to general prohibitions on cheating, plagiarism, and so on):
I strongly urge you to bring me examples of statistics, probabilistic reasoning, and so on, that you see in newspapers or magazines, whether or not they are directly relevant to material being discussed in class. Such material might very well then be incorporated into class discussion.
Syllabus
I. Applied Probability
II. Statistical Inference
III. Regression Analysis