I've put together a small dataset of trades and quotes for select US equities. This is intended as a sample dataset for microstructure students. The programs and data are in my ftp directory.
The data are from the New York Stock Exchange's TAQ database. See my TAQAnalysis notes and the documentation on the NYSE's web site at http://www.nyxdata.com/Data-Products/Daily-TAQ.
The data cover twenty US stocks in the last quarter of 2010. The stocks were not randomly chosen. In terms of average daily share volume, they all lie in the next-to-lowest decile. The dataset was constructed on WRDS, using the program crspSelect.sas. (Although the final dataset is mostly TAQ, the selection process starts with CRSP.) The source code for this program contains embedded documentation of the other selection criteria.
The data are available as a SAS dataset (ctqall.sas7bdat) and as a csv file (ctqAll.csv). Most of the sample programs on this website are in SAS (see my usage notes on SAS and WRDS at SASonWRDS), but the csv file can be imported into many other analysis programs. If you're working in SAS on Stern's rnd node, you shouldn't have to copy these datasets. The sample programs (described below) already have library pointers set to access them directly.
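For anyone working outside SAS, here is a minimal Python sketch of pulling one symbol out of the csv file. It assumes the csv header uses the same column names as the sample listing on this page (symbol, date, price, and so on); check the first line of ctqAll.csv before relying on it. The function name read_symbol and the symbol XXXX are made up for illustration.

```python
import csv
import io

def read_symbol(csv_file, symbol):
    """Return the rows for one ticker symbol as a list of dicts."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["symbol"] == symbol]

# Tiny in-memory stand-in for ctqAll.csv. The header names are assumed,
# and the XXXX row is invented purely to show the filtering.
sample = io.StringIO(
    "symbol,permno,date,seqno,time,BBid,BOfr,price,size\n"
    "AMWD,10501,20100901,87,9:34:06,15.64,15.87,.,.\n"
    "AMWD,10501,20100901,88,9:34:07,15.64,15.84,15.6400,467\n"
    "XXXX,0,20100901,1,9:30:00,3.00,3.10,.,.\n"
)

rows = read_symbol(sample, "AMWD")
print(len(rows), "rows for AMWD")
```

With the real file, the same function could be called as `with open("ctqAll.csv", newline="") as f: rows = read_symbol(f, "AMWD")`.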
The Stern Center for Research Computing website has a lot of useful Stern-specific information. If you're completely new to this, start with Connecting to SCRC Computers.
Here is a sample of the data (AMWD on Sep 1, 2010 at 9:34):
symbol  permno  date      seqno  time      BBid   BOfr    price  size  ex  cond  corr  Flag  g127
AMWD    10501   20100901     83  9:34:01  15.63  15.86        .     .               .     .     .
AMWD    10501   20100901     84  9:34:02  15.63  15.91        .     .               .     .     .
AMWD    10501   20100901     85  9:34:03  15.63  15.86        .     .               .     .     .
AMWD    10501   20100901     86  9:34:04  15.63  15.83        .     .               .     .     .
AMWD    10501   20100901     87  9:34:06  15.64  15.87        .     .               .     .     .
AMWD    10501   20100901     88  9:34:07  15.64  15.84  15.6400   467  D            0     0     0
AMWD    10501   20100901     89  9:34:24  15.65  15.91        .     .               .     .     .
AMWD    10501   20100901     90  9:34:25  15.65  15.85        .     .               .     .     .
AMWD    10501   20100901     91  9:34:27  15.66  15.79        .     .               .     .     .
AMWD    10501   20100901     92  9:34:28  15.66  15.79  15.6600   100  D            0     0     0
AMWD    10501   20100901     93  9:34:28  15.66  15.79  15.6600   300  D            0     0     0
AMWD    10501   20100901     94  9:34:30  15.66  15.79  15.6600   100  D            0     0     0
AMWD    10501   20100901     95  9:34:35  15.66  15.79  15.6600   100  Z            0     0     0
AMWD    10501   20100901     96  9:34:54  15.67  15.79        .     .               .     .     .
AMWD    10501   20100901     97  9:34:55  15.67  15.86        .     .               .     .     .
AMWD    10501   20100901     98  9:34:56  15.68  15.88        .     .               .     .     .
AMWD    10501   20100901     99  9:34:58  15.68  15.80        .     .               .     .     .
AMWD    10501   20100901    100  9:34:59  15.68  15.80  15.6801   100  D            0     0     0
BBid and BOfr are the NBBO (National Best Bid and Offer) computed across all quoting venues as of the end of the indicated second. There is a record in the file every time the NBBO changes. The fields to the right refer to trades (if any).
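As a rough illustration of how these records can be read, the Python sketch below takes a few rows from the AMWD listing above, computes the quoted spread (BOfr minus BBid) and the quote midpoint, and flags which records carry a trade. Treating "." or a blank as a missing value is an assumption about how the fields appear once exported; the helper name to_float is made up.

```python
def to_float(field):
    """Convert a text field to float; '.' or blank (assumed missing) becomes None."""
    field = field.strip()
    return None if field in ("", ".") else float(field)

# (seqno, BBid, BOfr, price) copied from the AMWD sample listing.
records = [
    (87, "15.64", "15.87", "."),
    (88, "15.64", "15.84", "15.6400"),
    (91, "15.66", "15.79", "."),
    (92, "15.66", "15.79", "15.6600"),
]

summary = []
for seqno, bid, ofr, price in records:
    bid, ofr, price = to_float(bid), to_float(ofr), to_float(price)
    spread = round(ofr - bid, 4)          # quoted spread
    midpoint = round((bid + ofr) / 2, 4)  # quote midpoint
    is_trade = price is not None          # trade fields present on this record?
    summary.append((seqno, spread, midpoint, is_trade))

for row in summary:
    print(row)
```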
The record counts for the twenty stocks are:
                            Cumulative  Cumulative
symbol  Frequency  Percent   Frequency     Percent
--------------------------------------------------
ABL          3880     0.40        3880        0.40
ADEP        31839     3.30       35719        3.70
AMWD       168109    17.41      203828       21.11
ANGN         6310     0.65      210138       21.77
ARBX        44291     4.59      254429       26.36
BBGI        33184     3.44      287613       29.79
BITS        38829     4.02      326442       33.82
BNHNA       64032     6.63      390474       40.45
BSTC        74168     7.68      464642       48.13
BTUI        65149     6.75      529791       54.88
CHNR        61284     6.35      591075       61.23
COHN        20250     2.10      611325       63.33
CTEK          983     0.10      612308       63.43
CVV         65462     6.78      677770       70.21
DAIO        31255     3.24      709025       73.45
DFR         40703     4.22      749728       77.66
DHIL       132718    13.75      882446       91.41
DHRM        39912     4.13      922358       95.55
EBTX        37973     3.93      960331       99.48
ESSX         5016     0.52      965347      100.00
Everyone will analyze one security. (Your ticker symbol will arrive via email.) The project will be cumulative over the course, and you should assemble your results in a "lab notebook".
Subset the full dataset to obtain the data for your symbol only. Compute and plot the daily closing prices. Estimate descriptive statistics and autocorrelations.
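If you want to cross-check the SAS output, the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the closing price is taken to be the last trade price of each day, returns are log differences of those closes, and the (date, price) records are hypothetical numbers, not values from the dataset. Plotting is left out to keep the sketch free of extra libraries.

```python
import math
import statistics

def daily_closes(trades):
    """Map each date to the price of its last trade (input must be in time order)."""
    closes = {}
    for date, price in trades:
        closes[date] = price  # later trades on the same date overwrite earlier ones
    return closes

def autocorr(x, lag=1):
    """Sample autocorrelation of the sequence x at the given lag."""
    mean = statistics.fmean(x)
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[i] - mean) * (x[i - lag] - mean) for i in range(lag, len(x)))
    return numer / denom

# Hypothetical (date, price) trade records, already sorted by time.
trades = [
    (20101001, 10.00), (20101001, 10.10),
    (20101004, 10.05), (20101004, 10.20),
    (20101005, 10.15),
    (20101006, 10.30),
    (20101007, 10.25),
]

closes = daily_closes(trades)
dates = sorted(closes)
returns = [math.log(closes[b] / closes[a]) for a, b in zip(dates, dates[1:])]

print("closes:", [closes[d] for d in dates])
print("mean return:", statistics.fmean(returns))
print("std dev of returns:", statistics.stdev(returns))
print("lag-1 autocorrelation:", autocorr(returns))
```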
The first assignment is based on the shell program firstLook.sas (in my ftp directory on rnd), which uses the ticker symbol ESSX.
You should first copy this into your directory. Log into rnd and execute the following cp (copy) command:
cp /homedir/fin/fac/jhasbrou/public_html/ftp/phd2011Fall/firstLook.sas .
Then modify the program to work for your ticker symbol. (Either edit the program using an rnd editor like pico, or download the program and edit it locally.) Run the program:
sas firstLook
SAS should create three files: firstLook.log (a log file summarizing the run and any error messages), firstLook.lst (the listing file, which contains the useful output), and firstLook.rtf. The last file contains the high-resolution plot. To view it, download it and open it in Microsoft Word.
See the shell program analyzeTrades.sas (in the ftp directory). Modify it to use your ticker symbol and run it.
See the shell program dpRegression.sas (in the ftp directory). Run the analysis for your ticker symbol. Extend the program to estimate a generalized Roll model that includes a signed volume term, i.e., where the efficient price increment is given by [equation missing]. The price change is then given by [equation missing]. If your estimates are wild, try experimenting with capping the volume at 1,000 or 10,000 shares.
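One way to write such a model is sketched below. The notation is an assumption and may differ from what dpRegression.sas or the course notes use: $m_t$ is the efficient price, $p_t$ the trade price, $q_t$ the trade direction ($+1$ for a buy, $-1$ for a sell), $V_t$ the signed volume (trade size times $q_t$, possibly capped), $u_t$ a disturbance unrelated to trades, and $c$, $\lambda_0$, $\lambda_1$ the coefficients to be estimated.

```latex
\begin{align*}
m_t &= m_{t-1} + w_t, & w_t &= \lambda_0 q_t + \lambda_1 V_t + u_t, \\
p_t &= m_t + c\, q_t, & \Delta p_t &= c\,(q_t - q_{t-1}) + \lambda_0 q_t + \lambda_1 V_t + u_t .
\end{align*}
```

Under this form, a regression of $\Delta p_t$ on $q_t - q_{t-1}$, $q_t$ and $V_t$ gives estimates of $c$, $\lambda_0$ and $\lambda_1$. Capping the volume at, say, 1,000 shares would replace $V_t$ with $q_t \min(\text{size}_t, 1000)$.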