Monday, September 9, 2013

Organize/Sort large data Part 1 - Using Pivot Table in Excel

While I was doing my Masters in Financial Engineering, one of the things I have to do is to deal with large data containing dates, trading symbols, prices and volume etc.. One of the task is to only extract information related to a particular symbol. For example, this file contains historical data of a few trading symbols and I only want to extract data pertaining to a few trading symbols. For a reasonably small data sets (~500k), you can use the Pivot Table function in Microsoft Excel.

Step 1.
Insert a Pivot Table by going to Insert -> Pivot Table.


Select the range of data in the spreadsheet you would like to organize/sort and choose if you would like to create this table in a new worksheet or the existing one. Normally I would choose to create a new worksheet to make things neater. Of course if you keep creating pivot table, your workbook will become messy so do take note of how you want to create your pivot table. In this case, I chose to create a new worksheet with the data from "Sheet1!$A$1:$G$493468". Upon pressing OK, you will get an empty pivot table which you have to customize to your needs.

Step 2.

Noticed that in Excel, the first row is automatically used as the label. So right now you just need to know what information you need and pull them to the correct area. When you check the fields in the "PivotTable Field List" you need you will notice these fields are added to the area below it. Excel does the adding automatically but it may not be what you want. So do make sure the correct label goes to the correct row/column labels.

In this case, I would want the following setting:
  1.  "dates" to be at the row
  2. Trading symbols or in this case it is also called "Ticker" at the column
Next you need to pull in the data you need from the various Tickers e.g. in this case I use Price or "PRC" pull it into the "Value" field. Notice that by default, the values are taken using count under the value fields.you need to change it by left clicking on it ("Count of PRC") and change "Value Field Setting" to "Product".



After that you would have obtain all the unique trading symbols and arranged in dates. You can then subsequently choose to exclude symbols which you do not need by unchecking the trading symbols under "Column Label" in the worksheet, cell B3 in this case.

For those who did not managed to download the completed product, it can be downloaded here.

No comments:

Post a Comment