Some Recent Papers at Social Science Research Network
Econometric Analysis, Prentice Hall, 7th Edition, 2008
The standard reference in economics, sociology, political science, medical research, transport research, and environmental economics, to name just a few fields, the seventh edition of Econometric Analysis provides a comprehensive survey of econometrics, with significant pedagogical material, and will continue to serve as a modern, up-to-date text and reference for practitioners.
Prentice Hall catalog
LIMDEP and NLOGIT - econometric software
LIMDEP has long been a leader in the field of econometric analysis. Recognized for years as the standard for the estimation of limited and qualitative dependent variable models, LIMDEP 10.0 is unsurpassed in the breadth and variety of its estimation models. No other program offers a wider range of regression, panel data, survival, frontier, discrete and count data, and single and multiple equation linear and nonlinear models: over 200 fully automated estimation models. With NLOGIT, LIMDEP is the only integrated econometrics package to include a program for FIML estimation of nested logit and discrete choice models as an optional feature. Major enhancements in NLOGIT 5.0 include random parameters logit and alternatives to the multinomial logit model, such as multinomial probit, heteroscedastic extreme value, and covariance heterogeneity models. New models and features are continuously added, enhancing the power of LIMDEP's analysis tools and keeping LIMDEP a true state-of-the-art program. No wonder LIMDEP is now used for teaching and research at thousands of sites in universities, government and private research institutions in the U.S. and throughout the world.
Modeling Ordered Choices: A Primer and Recent Developments
We survey the literature on models for ordered choices, including ordered logit and probit specifications. The contemporary form of the model is presented and analyzed in detail, and its historical development is traced as well. We detail a number of generalizations that have appeared in the recent literature. Finally, we propose a new form of the model that accommodates, in a natural and internally consistent way, functional form flexibility and individual heterogeneity. Much of this study is pedagogical. However, the last few sections propose new model formulations and illustrate them with an application to self-reported health satisfaction.
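As a minimal illustration of the basic model surveyed here (a sketch of my own in Python; the function name and numpy implementation are illustrative, not taken from the paper), the ordered probit cell probabilities are differences of normal CDFs evaluated at the threshold parameters:

```python
import numpy as np
from scipy import stats

def ordered_probit_probs(xb, cutpoints):
    """Cell probabilities of the ordered probit model:
    P(y = j | x) = Phi(mu_j - x'b) - Phi(mu_{j-1} - x'b),
    with mu_0 = -inf, mu_J = +inf, and increasing cutpoints."""
    mu = np.concatenate(([-np.inf], cutpoints, [np.inf]))
    return np.diff(stats.norm.cdf(mu - xb))
```

The cutpoints must be increasing; the J+1 probabilities then sum to one by construction.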
The Econometric Approach to Efficiency Analysis
This chapter presents an overview of techniques for econometric analysis of technical (production) and economic (cost) efficiency. The stochastic frontier model of Aigner, Lovell, and Schmidt (1977) is now the standard econometric platform for this type of analysis. I survey the underlying models and econometric techniques that have been used in studying technical inefficiency in the stochastic frontier framework and present some of the recent developments in econometric methodology. Applications that illustrate some of the computations are presented in the final section.
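The composed-error density of the Aigner, Lovell, and Schmidt model referenced above can be sketched in a few lines (a Python illustration of my own; the parameterization is the standard normal-half normal form):

```python
import numpy as np
from scipy import stats

def als_frontier_density(eps, sigma_u, sigma_v):
    """Density of the composed error eps = v - u in the ALS (1977)
    normal-half normal stochastic frontier:
    f(eps) = (2/sigma) * phi(eps/sigma) * Phi(-eps*lam/sigma),
    with sigma = sqrt(sigma_u^2 + sigma_v^2) and lam = sigma_u/sigma_v."""
    sigma = np.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return 2.0 / sigma * stats.norm.pdf(eps / sigma) * stats.norm.cdf(-eps * lam / sigma)
```

The density is skewed to the left for a production frontier, reflecting the one-sided inefficiency term.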
Discrete Choice Modeling
We detail the basic theory for models of discrete choice, encompassing methods of estimation and analysis of models with discrete dependent variables. Entry-level theory is presented for the practitioner. We then describe a few of the recent, frontier developments in theory and practice.
Censored Data and Truncated Distributions
We detail the basic theory for regression models in which dependent variables are censored or underlying distributions are truncated. The model is extended to models for counts, sample selection models, and hazard models for duration data. Entry-level theory is presented for the practitioner. We then describe a few of the recent, frontier developments in theory and practice.
Functional Form and Heterogeneity in Models for Count Data
This study presents several extensions of the most familiar models for count data, the Poisson and negative binomial models. We develop an encompassing model for two well-known variants of the negative binomial model (the NB1 and NB2 forms). We then analyze some alternative approaches to the standard log gamma model for introducing heterogeneity into the loglinear conditional means for these models. The lognormal model provides a versatile alternative specification that is more flexible (and more natural) than the log gamma form, and provides a platform for several “two part” extensions, including zero inflation, hurdle, and sample selection models. (We briefly present some alternative approaches to modeling heterogeneity.) We also resolve some features in Hausman, Hall and Griliches's (1984, Economic models for count data with an application to the patents–R&D relationship, Econometrica 52, 909–938) widely used panel data treatments for the Poisson and negative binomial models that appear to conflict with more familiar models of fixed and random effects. Finally, we consider a bivariate Poisson model that is also based on the lognormal heterogeneity model. Two recent applications have used this model. We suggest that the correlation estimated in their model frameworks is an ambiguous measure of the correlation of the variables of interest, and may substantially overstate it. We conclude with a detailed application of the proposed methods using the data employed in one of the two aforementioned bivariate Poisson studies.
Functional Forms for the Negative Binomial Model for Count Data
This note develops an encompassing model for two well known variants of the negative binomial model (the NB1 and NB2 forms). We conclude with an application of the proposed model using the data employed in a recent health care study.
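The encompassing form (often called the NB-P model) can be summarized by its conditional variance function; a small numeric sketch of my own, with illustrative parameter values:

```python
import numpy as np

def nbp_variance(lam, alpha, P):
    """Conditional variance of the encompassing NB-P model:
    Var[y|x] = lam + alpha * lam**P.
    P = 1 gives the NB1 form, P = 2 the NB2 form."""
    lam = np.asarray(lam, dtype=float)
    return lam + alpha * lam**P
```

P = 1 reproduces the NB1 variance lam*(1 + alpha), linear in the mean; P = 2 reproduces the NB2 variance lam + alpha*lam**2, quadratic in the mean.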
A Bivariate Latent Class Correlated Generalized Ordered Probit Model with an Application to Modeling Observed Obesity Levels
Obesity is a major risk factor for several diseases including diabetes, heart disease and stroke. Increasing rates of obesity internationally are set to cost health systems increasing resources. In the US a conservative estimate puts resources already spent on obesity at $120 billion annually. Given scarce health care resources it is important that categorisation of the overweight and obese is accurate, such that health promotion and public health targeting can be as effective as possible. To test the accuracy of current categorisation within the overweight and obese we extend the discrete data latent class literature by explicitly defining a latent variable for class membership as a function of both observables and unobservables, thereby allowing the equations defining class membership and observed outcomes to be correlated. The procedure is then applied to modeling observed obesity outcomes, based upon an underlying ordered probit equation. We find the standard boundaries for converting body mass index into categories may be inappropriate for individuals at the margin, which is then allowed for in estimation.
Interpreting Estimated Parameters and Measuring Individual Heterogeneity in Random Coefficient Models
Recent studies in econometrics and statistics include many applications of random parameter models. The underlying structural parameters in these models are often not directly informative about the statistical relationship of interest. As a result, standard significance tests of structural parameters in random parameter models do not necessarily indicate the presence or absence of a ‘significant’ relationship among the model variables. This note offers a suggestion on how to examine the results of estimation of a general form of random parameter model. We also extend results on computing individual level parameters in a random parameters setting and show how simulation based estimates of parameters in conditional distributions can be used to examine the influence of model covariates (marginal effects) at an individual level.
Fixed Effects and Bias Due to the Incidental Parameters Problem in the Tobit Model
The maximum likelihood estimator in nonlinear panel data models with fixed effects is widely understood (with a few exceptions) to be biased and inconsistent when T, the length of the panel, is small and fixed. However, there is surprisingly little theoretical or empirical evidence on the behavior of the estimator on which to base this conclusion. The received studies have focused almost exclusively on coefficient estimation in two binary choice models, the probit and logit models. In this note, we use Monte Carlo methods to examine the behavior of the MLE of the fixed effects tobit model. We find that the estimator’s behavior is quite unlike that of the estimators of the binary choice models. Among our findings are that the location coefficients in the tobit model, unlike those in the probit and logit models, are unaffected by the ‘incidental parameters problem.’ But a surprising result related to the disturbance variance estimator emerges instead: the finite sample bias appears there rather than in the slopes. This has implications for estimation of marginal effects and asymptotic standard errors, which are also examined in this paper. The effects are also examined for the probit and truncated regression models, extending the range of received results in the first of these beyond the widely cited biases in the coefficient estimators.
Distinguishing Between Heterogeneity and Inefficiency: Stochastic Frontier Analysis of the World Health Organization’s Panel Data on National Health Care Systems
The most commonly used approaches to parametric (stochastic frontier) analysis of efficiency in panel data, notably the fixed and random effects models, fail to distinguish between cross individual heterogeneity and inefficiency. This blending of effects is particularly problematic in the World Health Organization’s (WHO) panel data set on health care delivery, which is a 191 country, five year panel. The wide variation in cultural and economic characteristics of the worldwide sample of countries produces a large amount of unmeasured heterogeneity in the data. Familiar approaches to inefficiency estimation mistakenly measure that heterogeneity as inefficiency. This study will examine a large number of recently developed alternative approaches to stochastic frontier analysis with panel data, and apply some of them to the WHO data. A more general, flexible model and several measured indicators of cross country heterogeneity are added to the analysis done by previous researchers. Results suggest that in these data, there is considerable evidence of heterogeneity that, in other studies using the same data, has masqueraded as inefficiency. Our results differ substantially from those obtained by several earlier researchers.
Fixed and Random Effects in Stochastic Frontier Models
This paper examines extensions of the panel data stochastic frontier model that circumvent two important shortcomings of the existing fixed and random effects approaches. The conventional panel data stochastic frontier estimators both assume that technical or cost inefficiency is time invariant. In a lengthy panel, this is likely to be a particularly strong assumption. Second, as conventionally formulated, the fixed and random effects estimators force any time invariant cross unit heterogeneity into the same term that is being used to capture the inefficiency. Thus, measures of inefficiency in these models may be picking up heterogeneity in addition to or even instead of technical or cost inefficiency. In this paper, a true fixed effects model is extended to the stochastic frontier model using results that specifically employ the nonlinear specification. We find in passing as part of this analysis that accepted results related to the incidental parameters problem do not extend to the frontier model, and are actually misleading. The random effects model is then reformulated as a special case of the random parameters model that retains the fundamental structure of the stochastic frontier model. An alternative, semiparametric approach is also considered using a finite mixtures approach. The techniques are illustrated through two applications, a large panel from the U.S. banking industry and a cross country comparison of the efficiency of health care delivery. We find in the banking application that while the familiar fixed and random effects approaches give similar answers to each other, as do our newly suggested methods, the two sets of methods give quite different results overall, raising (without answering) a number of questions as to what these effects models are actually measuring in the stochastic frontier context.
Alternative Panel Data Estimators for Stochastic Frontier Models
Received analyses based on stochastic frontier modeling with panel data have relied primarily on results from traditional linear fixed and random effects models. This paper examines several extensions of these models that employ nonlinear techniques. The fixed effects model is extended to the stochastic frontier model using results that specifically employ the nonlinear specification. Based on Monte Carlo results, we find that in spite of the well documented incidental parameters problem, the fixed effects estimator appears to be no less effective than traditional approaches in a correctly specified model. We then consider two additional approaches, the random parameters (or ‘multilevel’ or ‘hierarchical’) model and the latent class model. Both of these forms allow generalizations of the model beyond the familiar normal distribution framework.
A Latent Class Model for Discrete Choice
The multinomial logit model (MNL) has for many years provided the fundamental platform for the analysis of discrete choice. The basic model’s several shortcomings, most notably its inherent assumption of independence from irrelevant alternatives (IIA), have motivated researchers to develop a variety of alternative formulations. The mixed logit model stands as one of the most significant of these extensions. This paper proposes a semi-parametric extension of the MNL, based on the latent class formulation, which resembles the mixed logit model but which relaxes its requirement that the analyst make specific assumptions about the distributions of parameters across individuals. An application of the model to the choice of long distance travel by car on three road types (2-lane, 4-lane without a median and 4-lane with a median) in New Zealand is used to compare the MNL latent class model with mixed logit.
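The latent class formulation can be sketched in a few lines (a minimal numpy illustration of my own, not the paper's code): the unconditional choice probability is a finite mixture of class-specific MNL probabilities.

```python
import numpy as np

def latent_class_mnl_probs(X, betas, class_shares):
    """Unconditional choice probabilities of a latent-class MNL.

    X: (J, K) attributes of the J alternatives; betas: (Q, K) class
    specific coefficients; class_shares: (Q,) mixing probabilities.
    The unconditional probability averages the class-specific MNL
    probabilities over the discrete class distribution."""
    probs = np.zeros(X.shape[0])
    for pi_q, b in zip(class_shares, betas):
        v = X @ b
        v -= v.max()                       # numerical stability
        p = np.exp(v) / np.exp(v).sum()    # class-q MNL probabilities
        probs += pi_q * p
    return probs
```

In estimation the class shares themselves are parameterized (e.g. as a logit in individual characteristics) and estimated jointly with the class-specific coefficients.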
Behavior of the Fixed Effects Estimator in Nonlinear Models
The nonlinear fixed effects model in econometrics has often been avoided for two reasons, one practical, one methodological. The practical obstacle relates to the difficulty of estimating nonlinear models with possibly thousands of coefficients. In fact, in a large number of models of interest to practitioners, estimation of the fixed effects model is feasible even in panels with very large numbers of groups. The more difficult, methodological question centers on the incidental parameters problem, which raises questions about the statistical properties of the estimator. There is very little empirical evidence on the behavior of the fixed effects estimator. In this note, we use Monte Carlo methods to examine the small sample bias in the binary probit and logit models, the ordered probit model, the tobit model, the Poisson regression model for count data and the exponential regression model for a nonnegative random variable. We find three results of note. First, a widely accepted result that suggests that the probit estimator is actually relatively well behaved appears to be incorrect. Second, perhaps to some surprise, the tobit model, unlike the others, appears largely to be unaffected by the incidental parameters problem, save for a surprising result related to the disturbance variance estimator. Third, and apparently unexamined previously, the estimated asymptotic standard errors for the fixed effects estimators appear uniformly to be biased downward.
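For the binary logit case, the incidental parameters bias can be reproduced in a few lines. The classical result (Andersen; Hsiao) is that with T = 2 the unconditional fixed effects MLE converges to 2*beta rather than beta. The Monte Carlo sketch below is my own construction, not the paper's code; it exploits the fact that for T = 2 the fixed effects can be concentrated out of the likelihood in closed form.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fe_logit_mle_T2(x, y):
    """Unconditional MLE of the fixed-effects logit with T = 2.

    For T = 2 the score for the individual effect has the closed-form
    solution alpha_i = -(x_i1 + x_i2)*beta/2, so the profile
    log-likelihood over beta is one-dimensional.  Groups with
    y_i1 == y_i2 carry no information about beta and drop out."""
    keep = y.sum(axis=1) == 1
    d = x[keep, 0] - x[keep, 1]            # within-group regressor change
    s = np.where(y[keep, 0] == 1, 1.0, -1.0)
    # each informative group contributes 2 * log Lambda(s*d*beta/2)
    negll = lambda b: 2.0 * np.sum(np.log1p(np.exp(-s * d * b / 2.0)))
    return minimize_scalar(negll, bounds=(-10, 10), method="bounded").x

# Monte Carlo illustrating the incidental parameters bias:
rng = np.random.default_rng(42)
n, beta = 4000, 1.0
alpha = rng.standard_normal(n)             # fixed effects
x = rng.standard_normal((n, 2))
p = 1.0 / (1.0 + np.exp(-(alpha[:, None] + beta * x)))
y = (rng.random((n, 2)) < p).astype(float)
b_hat = fe_logit_mle_T2(x, y)              # tends toward 2*beta, not beta
```

With a large number of groups the estimate clusters around 2.0 even though the true coefficient is 1.0, exactly the bias pattern discussed in the note.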
Convenient Estimators for the Panel Probit Model: Further Results
Bertschek and Lechner (1998) propose several variants of a GMM estimator based on the period specific regression functions for the panel probit model. The analysis is motivated by the complexity of maximum likelihood estimation and the possibly excessive amount of time involved in maximum simulated likelihood estimation. But, for applications of the size considered in their study, full likelihood estimation is actually straightforward, and resort to GMM estimation for convenience is unnecessary. In this note, we reconsider maximum likelihood based estimation of their panel probit model then examine some extensions which can exploit the heterogeneity contained in their panel data set. Empirical results are obtained using the data set employed in the earlier study.
Specification and Estimation of Models for Panel Data
This paper surveys recently developed approaches to analyzing panel data with nonlinear models. We summarize a number of results on estimation of fixed and random effects models in nonlinear modeling frameworks such as discrete choice, count data, duration, censored data, sample selection, stochastic frontier and, generally, models that are nonlinear both in parameters and variables. We show that notwithstanding their methodological shortcomings, fixed effects are much more practical than heretofore reflected in the literature. For random effects models, we develop an extension of a random parameters model that has been used extensively, but only in the discrete choice literature. This model subsumes the random effects model, but is far more flexible and general, and overcomes some of the familiar shortcomings of the simple additive random effects model as usually formulated. Once again, the range of applications is extended beyond the familiar discrete choice setting. Finally, we draw together several strands of applications of a model that has taken a semiparametric approach to individual heterogeneity in panel data, the latent class model. A fairly straightforward extension is suggested that should make this more widely useable by practitioners. Many of the underlying results already appear in the literature, but, once again, the range of applications is smaller than it could be.
Estimating Econometric Models with Fixed Effects
The application of nonlinear fixed effects models in econometrics has often been avoided for two reasons, one methodological, one practical. The methodological question centers on an incidental parameters problem that raises questions about the statistical properties of the estimator. The practical one relates to the difficulty of estimating nonlinear models with possibly thousands of coefficients. This note will demonstrate that the second is, in fact, a nonissue, and that in a very large number of models of interest to practitioners, estimation of the fixed effects model is quite feasible even in panels with huge numbers of groups. The models are fully parametric, and all parameters of interest are estimable.
The Mixed Logit Model, State of Practice (With David Hensher)
The mixed logit model is considered to be the most promising state-of-the-art discrete choice model currently available. Increasingly, researchers and practitioners are estimating mixed logit models of various degrees of sophistication with mixtures of revealed preference and stated preference data. It is timely to review progress in model estimation, since the learning curve is steep and the unwary are likely to fall into a chasm if not careful. These chasms are very deep indeed, given the complexity of the mixed logit model. Although the theory is relatively clear, estimation and data issues are far from settled. Indeed, there is a great deal of potential mis-inference consequent on trying to extract increased behavioural realism from data that are often not able to comply with the demands of mixed logit models. Possibly for the first time, we now have an estimation method that requires extremely high quality data if the analyst wishes to take advantage of the extended behavioural capabilities of such models. This paper focuses on the new opportunities offered by mixed logit models and some issues to be aware of to avoid misuse of such advanced discrete choice methods by the practitioner.
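At the core of mixed logit estimation is the simulated choice probability: the conditional MNL probability averaged over draws from the assumed coefficient distribution. A minimal sketch of my own (independent normal coefficients are an assumption for the illustration):

```python
import numpy as np

def mixed_logit_prob(X, chosen, beta_mean, beta_sd, R=1000, seed=0):
    """Simulated choice probability for a mixed (random parameters) logit.

    Coefficients are drawn beta_r ~ N(beta_mean, diag(beta_sd**2)); the
    simulated probability averages the conditional MNL probability of
    the chosen alternative over the R draws."""
    rng = np.random.default_rng(seed)
    draws = beta_mean + beta_sd * rng.standard_normal((R, len(beta_mean)))
    v = draws @ X.T                        # (R, J) systematic utilities
    v -= v.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)
    return p[:, chosen].mean()
```

In practice quasi-random (e.g. Halton) draws are commonly substituted for the pseudo-random draws used here, which is one of the estimation issues the paper reviews.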
FIML Estimation of Selection Models for Count Data
This paper presents an estimator for a model of sample selection for count data. The model is an extension of the standard sample selectivity treatment for the linear regression model. To develop the model, we first review some received results on unobserved heterogeneity in the Poisson regression model for count data. The model is then extended to encompass an endogenous sample selection mechanism. Previous papers have developed sequential, single equation, limited information estimation techniques. This paper presents a full information maximum likelihood (FIML) estimator for the model. Two techniques for computation of the sort of log-likelihood we analyze are described, simulation and numerical quadrature. An application to a problem in credit scoring is presented to illustrate the techniques.
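For the heterogeneity step without the selection mechanism, the quadrature idea can be sketched directly (a minimal Gauss-Hermite illustration of my own for a Poisson model with lognormal heterogeneity, not the paper's code):

```python
import numpy as np
from scipy import stats

def poisson_lognormal_prob(y, xb, sigma, nodes=32):
    """P(y | x) for a Poisson model with lognormal heterogeneity:
    lambda = exp(x'b + sigma*v), v ~ N(0, 1).  The normal mixing
    integral is evaluated by Gauss-Hermite quadrature."""
    a, w = np.polynomial.hermite.hermgauss(nodes)
    lam = np.exp(xb + sigma * np.sqrt(2.0) * a)     # lambda at each node
    return (w * stats.poisson.pmf(y, lam)).sum() / np.sqrt(np.pi)
```

Simulation replaces the fixed nodes and weights with random draws of v; the two approaches are the techniques compared in the paper.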
Marginal Effects in the Censored Regression Model
We find that a well known result for marginal effects in the censored regression model with normally distributed disturbances applies more generally to any censored regression model in which the disturbances have a continuous distribution. The result suggests a comparison of the coefficients and marginal effects in alternative models, e.g., normal vs. logistic, that is qualitatively different from the familiar counterpart in the probit and logit models for binary choice.
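The result can be stated in one line: the marginal effect on E[y|x] is the coefficient scaled by the probability of not being censored, whatever the continuous disturbance distribution. A small sketch of my own (function and argument names are illustrative):

```python
import numpy as np
from scipy import stats

def censored_me(beta_k, xb, sigma, dist=stats.norm):
    """Marginal effect on E[y|x] in a regression censored at zero:
    dE[y|x]/dx_k = beta_k * F(x'b / sigma),
    i.e. the coefficient scaled by the probability of being
    uncensored, for any continuous disturbance distribution F."""
    return beta_k * dist.cdf(xb / sigma)
```

Swapping stats.logistic for stats.norm gives the logistic counterpart, which is the kind of cross-model comparison the paper discusses.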
Specification and Estimation of Nested Logit Models
The nested logit model is currently the preferred extension to the simple multinomial logit discrete choice model. The appeal of the nested logit model is its ability to accommodate differential degrees of interdependence (i.e. similarity) between subsets of alternatives in a choice set. The received literature displays a frequent lack of attention to the very precise form that a nested logit model must take to ensure that the resulting model is invariant to normalisation of scale and is consistent with utility maximisation. Some recent papers by Koppelman and Wen (1998a, 1998b) and Hunt (1998) have addressed some aspects of this issue, but some important points remain somewhat ambiguous. When utility function parameters have different implicit scales, imposing equality restrictions on common attributes associated with different alternatives (i.e. making them generic) can distort these differences in scale. Model scale parameters are then 'forced' to take up the real differences that should be handled via the utility function parameters. With many variations in model specification appearing in the literature, comparisons become difficult, if not impossible, without clear statements of the precise form of the nested logit model. There are a number of approaches to achieving this, with some or all of them available as options in commercially available software packages. This note seeks to clarify the issue, and to establish the points of similarity and dissimilarity of the different formulations that appear in the literature.
Gender Economics Courses
Burnett (1997) proposes a model of the joint determination of two binary choice variables, presence or absence of a gender economics course and presence or absence of a women's studies program. The econometric techniques used in estimation of her model are not consistent with the model, and will not produce consistent estimates of the parameters of the model. This note reestimates her bivariate probit model using maximum likelihood procedures. We also present some related results on specification and estimation of a model in which two binary variables are jointly (simultaneously) determined and on computation of marginal effects in a bivariate probit model.
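The bivariate probit log-likelihood used in such a reestimation takes a standard form; a minimal sketch of my own (scalar loop for clarity), in which each observation contributes log Phi2(q1*w1, q2*w2, q1*q2*rho) with q = 2y - 1 flipping signs for zero outcomes:

```python
import numpy as np
from scipy.stats import multivariate_normal

def bivariate_probit_loglik(w1, w2, y1, y2, rho):
    """Log-likelihood of a bivariate probit given the two indices
    w1 = x1'b1 and w2 = x2'b2 and the disturbance correlation rho."""
    q1, q2 = 2 * y1 - 1, 2 * y2 - 1
    ll = 0.0
    for a, b, r in zip(q1 * w1, q2 * w2, q1 * q2 * rho):
        cov = [[1.0, r], [r, 1.0]]
        ll += np.log(multivariate_normal.cdf([a, b], mean=[0, 0], cov=cov))
    return ll
```

With rho = 0 the bivariate normal CDF factors into the product of two univariate probits, which provides a simple check of the implementation.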
Simulated ML Estimation of a Gamma Frontier Model
The normal-gamma stochastic frontier model was proposed in Greene (1990) and Beckers and Hammond (1987) as an extension of the normal-exponential model proposed in the original derivations of the stochastic frontier by Aigner, Lovell, and Schmidt (1977). The normal-gamma model has the virtue of providing a richer and more flexible parameterization of the inefficiency distribution in the stochastic frontier model than either of the canonical forms, normal-half normal and normal-exponential. However, several attempts to operationalize the normal-gamma model have met with very limited success, as the log likelihood is possessed of a significant degree of complexity. This note proposes an alternative approach to estimation of this model based on the method of simulated maximum likelihood, as opposed to the received attempts, which have approached the problem by direct maximization.
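A naive version of the simulation idea can be sketched as follows (my own illustration; the estimator in the paper uses a more refined simulator than this crude frequency simulator, but the principle of replacing the intractable integral by an average over draws is the same):

```python
import numpy as np
from scipy import stats

def normal_gamma_simulated_loglik(eps, sigma_v, P, theta, R=50_000, seed=1):
    """Simulated log-likelihood for the composed error eps = v - u,
    v ~ N(0, sigma_v^2), u ~ Gamma(P, scale=1/theta).

    The density f(eps) = E_u[ phi((eps + u)/sigma_v)/sigma_v ] is
    approximated by averaging over R gamma draws, with common random
    numbers across observations."""
    rng = np.random.default_rng(seed)
    u = rng.gamma(shape=P, scale=1.0 / theta, size=R)   # inefficiency draws
    dens = stats.norm.pdf((eps[:, None] + u[None, :]) / sigma_v).mean(axis=1) / sigma_v
    return np.log(dens).sum()
```

Setting P = 1 reduces the gamma to the exponential distribution, for which the closed-form normal-exponential density is available as a check on the simulator.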