Panel Data Econometrics

Panel Data Sets

Professor W. Greene
Department of Economics
Office: MEC 7-90
Ph: +1-212-998-0876
E-mail: wgreene@stern.nyu.edu
Home Page: https://pages.stern.nyu.edu/wgreene

Return to course home page.

Notes: The following list points to a series of data sets.  We will use some of these in our class discussions.  A number of others are provided for students to analyze as part of their study of the topic.  The Penn World Tables are a major online cross-country database that provides a wealth of interesting data.  They can be accessed directly: The Penn World Tables

There are many other sources of data on the web.  One that is particularly rich is the archives of the Journal of Applied Econometrics: (Click here to visit)

Data below are provided in two formats: (1) The 'csv format' is a plain ASCII text file containing the variable names at the top of the file followed by the data values, one row per observation, separated by commas. This is a portable file that is readable by any econometrics package. It will also import directly into Excel just by double-clicking its name. (2) If you are using LIMDEP or NLOGIT, the .lpj project file can be imported directly into the program, as is. (The .csv file can as well.)
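
As a minimal sketch of reading the csv format described above (variable names on the first line, comma-separated values after it), Python's standard csv module is sufficient. The sample text and the variable names ID, YEAR, Y and X are made up for illustration; a real file would be opened with open() in place of io.StringIO.

```python
import csv
import io

# Illustrative stand-in for a downloaded .csv file: variable names on the
# first line, then one comma-separated row per observation.
sample = io.StringIO(
    "ID,YEAR,Y,X\n"
    "1,1985,2.5,10\n"
    "1,1986,2.7,11\n"
    "2,1985,1.9,8\n"
)

def read_panel(handle):
    """Return the rows as a list of dicts keyed by variable name."""
    reader = csv.DictReader(handle)
    rows = []
    for record in reader:
        # Strip stray spaces, since values may be padded for alignment.
        rows.append({name.strip(): value.strip() for name, value in record.items()})
    return rows

rows = read_panel(sample)
print(len(rows), "observations; variables:", list(rows[0]))
```

All values come back as text, so numeric variables still need to be converted (for example with float) before estimation.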

RAILROAD = Company ID reduced to remove gap at ID=21, values 1 to 49.
YEAR = Year (1985 to 1997).
NI = Number of year observations for firm, repeated in each observation.

STOPS = Number of stations on the network.

NETWORK = Length of railway network (m).

LABOREXP = Labor expenses in 1000 CHF.

STAFF = Number of employees.

ELECEXP = Electricity expenses in 1000 CHF.

KWH = Total consumed electricity (in 1000 kWh).

TOTCOST = Total cost (in 1000 CHF).

NARROW_T = Dummy for the networks with narrow track (1 m wide).

RACK = Dummy for the networks with rack rail (cremaillere) in at least some part (used to maintain a slow movement of the train on steep slopes).

TUNNEL = Dummy for networks that have tunnels with an average length of more than 300 meters.

VIRAGE = Dummy for the networks whose minimum radius of curvature is 100 meters or less.

CT = Total costs adjusted for inflation (1000 CHF).

Q1 = Total output in train kilometers.

Q2 = Total passenger output in passenger kilometers.

Q3 = Total goods output in ton kilometers.

PL = Labor price adjusted for inflation (in CHF per person per year).

PK = Capital price using the total number of seats as a proxy for capital stock (CHF per seat).

PE = Price of electricity (CHF per kWh).

LABOR = Quantity of labor.

ELEC = Quantity of energy.

CAPITAL = Quantity of capital.

Capital costs = TOTCOST - (LABOREXP + ELECEXP)

Inflation adjustment is done with respect to 1997 prices.

Logs of costs and prices (lnCT, lnPK, lnPL) are normalized by PE.

LNCT, LNPK, LNPL, LNQ1, LNQ2, LNQ3, LNSTOP, LNCAP, LNNET = logs of variables
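
The capital cost identity and the price normalization described above can be sketched as follows. The numbers are invented for illustration and are not taken from the data file, and reading "normalized by PE" as the log of the ratio to PE is an assumption.

```python
import math

# One made-up observation (not from the data file); costs in 1000 CHF.
obs = {"TOTCOST": 5000.0, "LABOREXP": 3000.0, "ELECEXP": 400.0,
       "CT": 5200.0, "PL": 80000.0, "PK": 1500.0, "PE": 0.15}

def capital_cost(o):
    # Capital costs = TOTCOST - (LABOREXP + ELECEXP), as defined in the list.
    return o["TOTCOST"] - (o["LABOREXP"] + o["ELECEXP"])

def normalized_log(o, name):
    # Assumed reading of "normalized by PE": log of the ratio to PE.
    return math.log(o[name] / o["PE"])

cap = capital_cost(obs)
lnct = normalized_log(obs, "CT")
lnpl = normalized_log(obs, "PL")
lnpk = normalized_log(obs, "PK")
print(cap, round(lnct, 4), round(lnpl, 4), round(lnpk, 4))
```

Dividing by one input price before taking logs is the usual way to impose linear homogeneity in prices on a cost function, which is presumably why PE does not appear among the logged prices.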

 

Variables in the file are
id     = person identification

BRAND    = 1,2,3,4
CHOICE    = alternative chosen (0=no, 1=yes);
FASH    = dummy coded fashion (0=no, 1=yes);
QUAL    = dummy coded quality (0=low, 1=high);
PRICE    = price, level coded, .04, .08, .12, .16, .20;
PRICESQ    = price squared;
ASC4    = dummy variable for 4th choice (no brand);
MALE    = dummy variable for male (0=no, 1=yes);
AGE25    = age less than 25 (0=no, 1=yes);
AGE39    = age 25 to 39 (0=no, 1=yes);
AGE40    = age greater than 40 (0=no, 1=yes).

 

The variables in the file are

ID = respondent ID number

CHOICE = binary indicator of the preferred alternative.

NTASK = number of tasks ranging from 8 to 12, repeated on each of the 4*NTASK records

PRICE = fixed price in cents per kilowatt hour

CNTLNGTH = contract length

LOCAL = dummy variable for local utility

KNOWN = dummy variable for well known company

TOD = dummy variable for time of day rates, 11 cents in day, 5 cents at night

SEAS = dummy variable for seasonal rates (10 cents summer, 8 cents winter, 6 cents spring/fall)

NUMREC = 4*NTASK
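
The record layout implied by NTASK and NUMREC can be checked with a short sketch like the one below. It assumes that each consecutive block of 4 rows for a respondent is one choice task and that exactly one alternative is marked as chosen per task; neither assumption is stated in the variable list, and the rows shown are made up.

```python
from collections import defaultdict

# Made-up records: respondent 1 with NTASK = 2, so 4 * 2 = 8 rows.
# Each consecutive block of 4 rows is assumed to be one choice task.
rows = [{"ID": 1, "NTASK": 2, "CHOICE": c} for c in (0, 1, 0, 0, 0, 0, 0, 1)]

def check_respondent_blocks(rows, n_alts=4):
    """Return a list of problems; an empty list means the layout is consistent."""
    by_id = defaultdict(list)
    for r in rows:
        by_id[r["ID"]].append(r)
    problems = []
    for pid, recs in by_id.items():
        expected = n_alts * recs[0]["NTASK"]  # NUMREC = 4*NTASK
        if len(recs) != expected:
            problems.append((pid, "row count", len(recs), expected))
            continue
        for start in range(0, len(recs), n_alts):
            chosen = sum(r["CHOICE"] for r in recs[start:start + n_alts])
            if chosen != 1:  # assumed: one preferred alternative per task
                problems.append((pid, "choices in task", start // n_alts, chosen))
    return problems

problems = check_respondent_blocks(rows)
print(problems)
```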

 

The data are supplied in two parts. The first file contains the panel of 17,919 observations on the person ID and 4 time-varying variables. The second file contains the time-invariant variables for each of the 2,178 individuals or households. The data were downloaded from the Journal of Applied Econometrics archive website. The two data sets are merged in the .csv and .lpj files noted.

Variables in the file are
Time Varying

PERSONID = Person id (ranging from 1 to 2,178),

EDUC = Education,

LOGWAGE = Log of hourly wage,

POTEXPER = Potential experience,

TIMETRND = Time trend.
Time Invariant

ABILITY = Ability,

MOTHERED = Mother's education,

FATHERED = Father's education,

BRKNHOME = Dummy variable for residence in a broken home,

SIBLINGS = Number of siblings.
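
For readers who start from the two separate source files, the merge on PERSONID might look like the sketch below. The rows are invented stand-ins for the two files, and the helper name merge_on_person is hypothetical.

```python
# Made-up rows standing in for the two source files.
time_varying = [
    {"PERSONID": 1, "EDUC": 12, "LOGWAGE": 1.91, "POTEXPER": 3, "TIMETRND": 0},
    {"PERSONID": 1, "EDUC": 12, "LOGWAGE": 2.02, "POTEXPER": 4, "TIMETRND": 1},
    {"PERSONID": 2, "EDUC": 16, "LOGWAGE": 2.40, "POTEXPER": 1, "TIMETRND": 0},
]
time_invariant = [
    {"PERSONID": 1, "ABILITY": 0.3, "MOTHERED": 12, "FATHERED": 12,
     "BRKNHOME": 0, "SIBLINGS": 2},
    {"PERSONID": 2, "ABILITY": 1.1, "MOTHERED": 16, "FATHERED": 14,
     "BRKNHOME": 0, "SIBLINGS": 1},
]

def merge_on_person(panel_rows, person_rows):
    """Repeat each person's time-invariant values on every panel row."""
    lookup = {p["PERSONID"]: p for p in person_rows}
    merged = []
    for row in panel_rows:
        person = lookup[row["PERSONID"]]  # KeyError if a person is missing
        merged.append({**row, **person})
    return merged

merged = merge_on_person(time_varying, time_invariant)
print(len(merged), merged[1]["ABILITY"])
```

The merged result keeps one row per person-period, so the row count equals that of the time-varying file.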

 

 

The Variables in the file are:

Firm = 1 to 729

Year = 1981 - 2001

Pat_any = 1 if firm had any patents in that year

LGSPILLT = lagged log of stock of tec weighted R&D (Jaffe distance)

LGSPILLS = lagged log of stock of sic weighted R&D (Jaffe distance)

LGMALSPI = lagged log of stock of tec weighted R&D (Mahalanobis distance)

LGMALSPT = lagged log of stock of sic weighted R&D (Mahalanobis distance)

LGRD1 = lagged log stock of R&D expenditures (coded -1 for missing)

LSALES1 = lagged log sales

LGRD1_DU = dummy variable indicates missing value of LGRD1

TI = Number of observations for firm i

T = year - 1981. 1 to 21

LGSTBAR = Firm mean of LGSPILLT

LGSPBAR = Firm mean of LGSPILLS

LGMSIBAR = Firm mean of LGMALSPI

LGMSTBAR = Firm mean of LGMALSPT

LSALEBAR = Firm mean of LSALES1

LGRD1BAR = Firm mean of LGRD1
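
The firm means and the observation count TI listed above are group-level quantities repeated on every row of the firm. A minimal sketch of that construction, using made-up values for two hypothetical firms and a hypothetical helper name:

```python
from collections import defaultdict

# Made-up values for two hypothetical firms; not taken from the data file.
rows = [
    {"Firm": 1, "Year": 1981, "LSALES1": 4.0},
    {"Firm": 1, "Year": 1982, "LSALES1": 5.0},
    {"Firm": 2, "Year": 1981, "LSALES1": 2.0},
]

def add_firm_mean(rows, var, out, firm="Firm"):
    """Attach the within-firm mean of `var` and the firm's row count (TI)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[firm]].append(r[var])
    for r in rows:
        values = groups[r[firm]]
        r[out] = sum(values) / len(values)
        r["TI"] = len(values)
    return rows

add_firm_mean(rows, "LSALES1", "LSALEBAR")
print([(r["Firm"], r["LSALEBAR"], r["TI"]) for r in rows])
```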

 

Balanced panel, 90 counties and 7 years, 630 rows of data in total.

The last eight (constructed) variables are missing for 1981. Missing values are denoted by a "." in the text file.
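
The balanced-panel claim (90 counties observed in each of 7 years, hence 630 rows) can be verified from the identifier columns alone. The sketch below uses synthetic identifiers; the county codes in the actual file need not run from 1 to 90.

```python
from collections import Counter

def is_balanced(pairs, n_units, n_periods):
    """pairs: iterable of (unit, period) identifiers, one per data row."""
    pairs = list(pairs)
    per_unit = Counter(unit for unit, _ in pairs)
    periods = {period for _, period in pairs}
    return (
        len(pairs) == n_units * n_periods   # e.g. 90 * 7 = 630 rows
        and len(set(pairs)) == len(pairs)   # no duplicated unit-period
        and len(per_unit) == n_units
        and len(periods) == n_periods
    )

# Synthetic identifiers only: 90 units observed in each of years 81 to 87.
pairs = [(county, year) for county in range(1, 91) for year in range(81, 88)]
print(len(pairs), is_balanced(pairs, 90, 7))
```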

The 59 Variables in the file are:

1. county = county identifier

2. year = 81 to 87

3. crmrte = crimes committed per person

4. prbarr = 'probability' of arrest

5. prbconv = 'probability' of conviction

6. prbpris = 'probability' of prison sentence

7. avgsen = avg. sentence, days

8. polpc = police per capita

9. density = people per sq. mile

10. taxpc = tax revenue per capita

11. west = 1 if in western N.C.

12. central = 1 if in central N.C.

13. urban = 1 if in SMSA

14. pctmin80 = perc. minority, 1980

15. wcon = weekly wage, construction

16. wtuc = wkly wge, trns, util, commun

17. wtrd = wkly wge, whlesle, retail trade

18. wfir = wkly wge, fin, ins, real est

19. wser = wkly wge, service industry

20. wmfg = wkly wge, manufacturing

21. wfed = wkly wge, fed employees

22. wsta = wkly wge, state employees

23. wloc = wkly wge, local gov emps

24. mix = offense mix: face-to-face/other

25. pctymle = percent young male

26. d82 = 1 if year = 82

27. d83 = 1 if year = 83

28. d84 = 1 if year = 84

29. d85 = 1 if year = 85

30. d86 = 1 if year = 86

31. d87 = 1 if year = 87

32. lcrmrte = log(crmrte)

33. lprbarr = log(prbarr)

34. lprbconv = log(prbconv)

35. lprbpris = log(prbpris)

36. lavgsen = log(avgsen)

37. lpolpc = log(polpc)

38. ldensity = log(density)

39. ltaxpc = log(taxpc)

40. lwcon = log(wcon)

41. lwtuc = log(wtuc)

42. lwtrd = log(wtrd)

43. lwfir = log(wfir)

44. lwser = log(wser)

45. lwmfg = log(wmfg)

46. lwfed = log(wfed)

47. lwsta = log(wsta)

48. lwloc = log(wloc)

49. lmix = log(mix)

50. lpctymle = log(pctymle)

51. lpctmin = log(pctmin80)

52. clcrmrte = lcrmrte - lcrmrte[t-1]

53. clprbarr = lprbarr - lprbarr[t-1]

54. clprbcon = lprbconv - lprbconv[t-1]

55. clprbpri = lprbpris - lprbpris[t-1]

56. clavgsen = lavgsen - lavgsen[t-1]

57. clpolpc = lpolpc - lpolpc[t-1]

58. cltaxpc = ltaxpc - ltaxpc[t-1]

59. clmix = lmix - lmix[t-1]
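
The constructed variables 52 to 59 are first differences within county, which is why they are missing for the first year, 1981. A minimal sketch of that construction, including the "." missing-value code, is given below. The values are made up and the helper names are hypothetical.

```python
def parse_value(text):
    # "." marks a missing value in the text file.
    return None if text.strip() == "." else float(text)

def first_difference(rows, var, out, unit="county", time="year"):
    """Set r[out] = r[var] - previous year's r[var] within the same unit."""
    lookup = {(r[unit], r[time]): r[var] for r in rows}
    for r in rows:
        lagged = lookup.get((r[unit], r[time] - 1))  # None if no prior year
        current = r[var]
        r[out] = None if lagged is None or current is None else current - lagged
    return rows

# Made-up values for two hypothetical counties.
rows = [
    {"county": 1, "year": 81, "lcrmrte": parse_value("-3.2")},
    {"county": 1, "year": 82, "lcrmrte": parse_value("-3.0")},
    {"county": 3, "year": 81, "lcrmrte": parse_value("-4.0")},
    {"county": 3, "year": 82, "lcrmrte": parse_value(".")},
]
first_difference(rows, "lcrmrte", "clcrmrte")
print([r["clcrmrte"] for r in rows])
```

Looking up the lag by (county, year - 1) rather than by the previous row keeps a difference from being taken across two different counties.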

 

 

The variables in the file are

IND = industry code

YEAR = year, 1977 to 1984

EMP = firm employment

WAGE = wage

CAP = capital

INDOUTPT = industry output

N = log EMP

W = log WAGE

K = log CAP

YS = log INDOUTPT

REC = line number 1-1031

YEARM1 = lag of YEAR.

ID = firm id number

NL1 = lag 1 of N

NL2 = lag 2 of N

WL1 = lag 1 of W

WL2 = lag 2 of W

KL1 = lag 1 of K

KL2 = lag 2 of K

YSL1 = lag 1 of YS

YSL2 = lag 2 of YS

YR1976 to YR1984 are year dummy variables

 

 
