Market Data Analysis

I've put together a small dataset of trades and quotes for select US equities. This is intended as a sample dataset for microstructure students. The programs and data are in my ftp directory.

The data are from the New York Stock Exchange's TAQ database. See my TAQAnalysis notes and the documentation on the NYSE's web site at http://www.nyxdata.com/Data-Products/Daily-TAQ.

The data cover twenty US stocks in the last quarter of 2010. The stocks were not randomly chosen. In terms of average daily share volume, they all lie in the next-to-lowest decile. The dataset was constructed on WRDS, using the program crspSelect.sas. (Although the final dataset is mostly TAQ, the selection process starts with CRSP.) The source code for this program contains embedded documentation of the other selection criteria.

The data are available as a SAS dataset (ctqall.sas7bdat) and as a CSV file (ctqAll.csv). Most of the sample programs on this website are in SAS (see my usage notes on SAS and WRDS at SASonWRDS), but the CSV file can be imported into many other analysis programs. If you're working in SAS on Stern's rnd node, you shouldn't have to copy these datasets. The sample programs (described below) already have library pointers set to access them directly.
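
If you prefer to work outside SAS, the CSV file can be read with Python's standard library alone. The sketch below is a minimal example and not one of the course programs. It assumes that the CSV header uses the same column names as the sample listing further down this page, and that missing values appear either as a SAS-style period or as a blank; check both against the actual file. The two-row SAMPLE_CSV string is a stand-in so that the example runs on its own.

```python
import csv
import io

# Stand-in for ctqAll.csv. The header names are ASSUMED to match the
# sample listing on this page; verify against the real file.
SAMPLE_CSV = """symbol,permno,date,seqno,time,BBid,BOfr,price,size,ex,cond,corr,Flag,g127
AMWD,10501,20100901,87,9:34:06,15.64,15.87,.,.,,,.,.,.
AMWD,10501,20100901,88,9:34:07,15.64,15.84,15.6400,467,D,,0,0,0
"""

NUMERIC_FIELDS = ("BBid", "BOfr", "price", "size")


def to_float(text):
    """Convert a CSV field to float; '.' and blanks (assumed missing) become None."""
    text = text.strip()
    return None if text in ("", ".") else float(text)


def read_ctq(handle):
    """Read records from an open CSV handle, converting the numeric fields."""
    records = []
    for row in csv.DictReader(handle):
        for name in NUMERIC_FIELDS:
            row[name] = to_float(row[name])
        records.append(row)
    return records


records = read_ctq(io.StringIO(SAMPLE_CSV))
# A record carries a trade only when the price field is present.
trades = [r for r in records if r["price"] is not None]
print(len(records), "records,", len(trades), "with a trade")
```

To run this on the real file, replace the io.StringIO(...) argument with a handle from open("ctqAll.csv", newline="").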

The Stern Center for Research Computing website has a lot of useful Stern-specific information. If you're completely new to this, start with Connecting to SCRC Computers.

Here is a sample of the data (AMWD on Sep 1, 2010 at 9:34):

symbol   permno       date   seqno    time      BBid    BOfr    price    size   ex   cond   corr   Flag   g127

 AMWD     10501   20100901     83    9:34:01   15.63   15.86     .          .                 .      .      . 
 AMWD     10501   20100901     84    9:34:02   15.63   15.91     .          .                 .      .      . 
 AMWD     10501   20100901     85    9:34:03   15.63   15.86     .          .                 .      .      . 
 AMWD     10501   20100901     86    9:34:04   15.63   15.83     .          .                 .      .      . 
 AMWD     10501   20100901     87    9:34:06   15.64   15.87     .          .                 .      .      . 
 AMWD     10501   20100901     88    9:34:07   15.64   15.84   15.6400    467   D             0      0      0 
 AMWD     10501   20100901     89    9:34:24   15.65   15.91     .          .                 .      .      . 
 AMWD     10501   20100901     90    9:34:25   15.65   15.85     .          .                 .      .      . 
 AMWD     10501   20100901     91    9:34:27   15.66   15.79     .          .                 .      .      . 
 AMWD     10501   20100901     92    9:34:28   15.66   15.79   15.6600    100   D             0      0      0 
 AMWD     10501   20100901     93    9:34:28   15.66   15.79   15.6600    300   D             0      0      0 
 AMWD     10501   20100901     94    9:34:30   15.66   15.79   15.6600    100   D             0      0      0 
 AMWD     10501   20100901     95    9:34:35   15.66   15.79   15.6600    100   Z             0      0      0 
 AMWD     10501   20100901     96    9:34:54   15.67   15.79     .          .                 .      .      . 
 AMWD     10501   20100901     97    9:34:55   15.67   15.86     .          .                 .      .      . 
 AMWD     10501   20100901     98    9:34:56   15.68   15.88     .          .                 .      .      . 
 AMWD     10501   20100901     99    9:34:58   15.68   15.80     .          .                 .      .      . 
 AMWD     10501   20100901    100    9:34:59   15.68   15.80   15.6801    100   D             0      0      0 

BBid and BOfr are the NBBO (National Best Bid and Offer) computed across all quoting venues as of the end of the indicated second. There is a record in the file every time the NBBO changes. The fields to the right refer to trades (if any).
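
As an illustration of how the quote and trade fields fit together, the following sketch computes the NBBO midpoint and spread for three records from the listing above and, where a trade is present, signs it by comparing the price to the midpoint. Signing at the midpoint (often called the quote rule) is an assumption made for this example; the course programs may sign trades differently.

```python
# (BBid, BOfr, price) for seqno 87, 88 and 100 in the listing above.
# price is None when the record reports an NBBO change without a trade.
rows = [
    (15.64, 15.87, None),
    (15.64, 15.84, 15.64),
    (15.68, 15.80, 15.6801),
]


def describe(bid, ofr, price):
    """Return (midpoint, spread, sign) for one record.

    sign is +1 for a trade above the midpoint (buyer-initiated under the
    assumed quote rule), -1 below it, 0 at it, and None without a trade.
    """
    mid = (bid + ofr) / 2
    spread = ofr - bid
    if price is None:
        sign = None
    elif price > mid:
        sign = 1
    elif price < mid:
        sign = -1
    else:
        sign = 0
    return mid, spread, sign


results = [describe(*row) for row in rows]
for mid, spread, sign in results:
    print(f"mid={mid:.4f} spread={spread:.2f} sign={sign}")
```

Both trades in this excerpt print below the midpoint, so the quote rule classifies them as seller-initiated.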

The record counts for the twenty stocks are:

                                   Cumulative    Cumulative
symbol    Frequency     Percent     Frequency      Percent
-----------------------------------------------------------
ABL           3880        0.40          3880         0.40  
ADEP         31839        3.30         35719         3.70  
AMWD        168109       17.41        203828        21.11  
ANGN          6310        0.65        210138        21.77  
ARBX         44291        4.59        254429        26.36  
BBGI         33184        3.44        287613        29.79  
BITS         38829        4.02        326442        33.82  
BNHNA        64032        6.63        390474        40.45  
BSTC         74168        7.68        464642        48.13  
BTUI         65149        6.75        529791        54.88  
CHNR         61284        6.35        591075        61.23  
COHN         20250        2.10        611325        63.33  
CTEK           983        0.10        612308        63.43  
CVV          65462        6.78        677770        70.21  
DAIO         31255        3.24        709025        73.45  
DFR          40703        4.22        749728        77.66  
DHIL        132718       13.75        882446        91.41  
DHRM         39912        4.13        922358        95.55  
EBTX         37973        3.93        960331        99.48  
ESSX          5016        0.52        965347       100.00  
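
The percent and cumulative columns follow directly from the frequencies. As a quick check on how to read the table, this sketch recomputes them for the first three symbols, taking the total from the last cumulative frequency.

```python
from itertools import accumulate

# Frequencies for the first three symbols, copied from the table above.
freq = {"ABL": 3880, "ADEP": 31839, "AMWD": 168109}
TOTAL = 965347  # final cumulative frequency in the table (all twenty symbols)

# Running totals in table order.
cumulative = list(accumulate(freq.values()))

for (symbol, n), cum in zip(freq.items(), cumulative):
    pct = 100 * n / TOTAL
    cum_pct = 100 * cum / TOTAL
    print(f"{symbol:<6}{n:>10}{pct:>8.2f}{cum:>10}{cum_pct:>8.2f}")
```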

The plan

Everyone will analyze one security. (Your ticker symbol will arrive via email.) The project will be cumulative over the course, and you should assemble your results in a "lab notebook".

Part 1 Preliminaries (due in class on Tuesday, September 15)

Subset the full dataset to obtain the data for your symbol only. Compute and plot the daily closing prices. Estimate descriptive statistics and autocorrelations.
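
The steps above can be sketched in Python as follows. This is an illustration of the calculations, not a translation of firstLook.sas. It assumes that the daily close is the price on the last trade record of each day (the course program may define the close differently), and the trade prices below are invented purely so that the example runs.

```python
import math
from statistics import mean


def daily_closes(trades):
    """Map each date to the price of its last trade record.

    trades: iterable of (date, seqno, price) tuples in any order.
    Records are sorted by (date, seqno), so later records overwrite earlier ones.
    """
    closes = {}
    for date, _seqno, price in sorted(trades):
        closes[date] = price
    return closes


def log_returns(prices):
    """Log price changes between consecutive closes."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def autocorr(x, lag):
    """Sample autocorrelation at the given lag (full-sample mean and variance)."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, len(x)))
    return num / denom


# Invented trade records for one symbol: (date, seqno, price).
trades = [
    (20101001, 5, 10.00), (20101001, 9, 10.10),
    (20101004, 3, 10.05), (20101004, 8, 9.95),
    (20101005, 2, 10.20),
    (20101006, 4, 10.00), (20101006, 7, 10.15),
]

closes = daily_closes(trades)
returns = log_returns([closes[d] for d in sorted(closes)])
print("closes:", closes)
print("mean return:", mean(returns))
print("lag-1 autocorrelation:", autocorr(returns, 1))
```

With a full quarter of data you would typically look at several lags rather than only the first.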

The first assignment is based on the shell program firstLook.sas (in my ftp directory on rnd), which uses the ticker symbol ESSX.

You should first copy this into your directory. Log into rnd and execute the following cp (copy) command:

cp /homedir/fin/fac/jhasbrou/public_html/ftp/phd2011Fall/firstLook.sas .

Then modify the program to work for your ticker symbol. (Either edit the program using an rnd editor like pico, or download the program and edit it locally.) Run the program:

sas firstLook

SAS should create three files: firstLook.log (a log file summarizing the run and any error messages), firstLook.lst (the listing file, which contains the useful output), and firstLook.rtf. The last file contains the high-resolution plot. To view it, download it and open it in Microsoft Word.

Part 2 Analysis of trades (due in class on Thursday, September 15)

See the shell program analyzeTrades.sas (in the ftp directory). Modify it to use your ticker symbol and run it.

Part 3 Analysis of trade/price dynamics (due in class on Tuesday, October 4)

See the shell program dpRegression.sas (in the ftp directory). Run the analysis for your ticker symbol. Extend the program to estimate a generalized Roll model that includes a signed volume term: the efficient price increment depends on signed volume, and the price change equation inherits that term. If your estimates are wild, try experimenting with capping the volume at 1,000 or 10,000 shares.
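
For reference, one common way to write a generalized Roll model with a signed volume term is sketched below. This is not necessarily the exact parameterization intended for dpRegression.sas, and all of the notation is assumed for this example: $m_t$ is the efficient price, $p_t$ the trade price, $q_t = \pm 1$ the trade direction, $V_t$ the signed volume (direction times size), $c$ the half-spread cost, $\lambda_0$ and $\lambda_1$ the price-impact coefficients, $u_t$ the non-trade innovation, and $\bar V$ the volume cap.

```latex
\begin{align*}
m_t &= m_{t-1} + w_t, & w_t &= \lambda_0 q_t + \lambda_1 V_t + u_t, \\
p_t &= m_t + c\, q_t, \\
\Delta p_t &= c\,(q_t - q_{t-1}) + \lambda_0 q_t + \lambda_1 V_t + u_t, \\
V_t^{\text{cap}} &= q_t \min\bigl(|V_t|,\ \bar V\bigr), \qquad \bar V \in \{1{,}000,\ 10{,}000\}.
\end{align*}
```

The third line follows from differencing the second and substituting the first. Under this form the model can be estimated by regressing $\Delta p_t$ on $q_t$, $q_{t-1}$, and $V_t$ (or $V_t^{\text{cap}}$ when capping).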