Saturday, December 28, 2013

Natural Cubic Spline interpolation (Application)

It has been a long journey from Theory to Derivation and now we are onto application. With all the theories and equations we obtained, we will now apply them and conduct interpolation.
For (n+1) points we now have 3 generic equations
$\frac{3(y_{i+2}-y_{i+1})}{(x_{i+2}-x_{i+1})^2} +\frac{3(y_{i+1}-y_i)}{(x_{i+1}-x_i)^2} = \frac{k_i}{x_{i+1}-x_i }+k_{i+1}[\frac{2}{x_{i+1}-x_i} +\frac{2}{x_{i+2}-x_{i+1}}]+\frac{k_{i+2}}{x_{i+2}-x_{i+1}}$           —(1)
$3\frac{y_{n+1}-y_n}{(x_{n+1}-x_n)^2}= \frac{2k_{n+1}}{x_{n+1}-x_n}+ \frac{k_n}{(x_{n+1}-x_n)}$ —(2)
$3\frac{y_1-y_0}{(x_1-x_0)^2} = \frac{2k_0}{x_1-x_0}+\frac{k_1}{x_1-x_0}$  —(3)
For i=[0,n-2]

The unknow here is k which is $f_i'(x_i)=c_i$, we have also seen that the rest of $a_i,b_i, d_i$ are all known as follows:
$a_i=\frac{1}{3}[\frac{(k_{i+1}-k_i)}{(x_{i+1}-x_i)^2}-\frac{2b_i}{(x_{i+1}-x_i )}]$ ---(4)
$b_i= \frac{3(y_{i+1}-y_i )}{(x_{i+1}-x_i)^2} - \frac{k_{i+1}+2k_i}{(x_{i+1}-x_i)}$    ---(5)
$c_i=k_i$    ---(6)
$d_i = y_i$    ---(7)

What is left now is to find k and it can be found by applying equation (1) - (3) and put them into matrix form and solve k using simple linear algebra.

$Wk = F$
$k_{(n+1)x1}=[k_0,k_1,...,k_n]^T$
$F_{(n+1)x1} =[ f_0,f_1,...,f_n]^T$
Where $f_i$ are the Left Hand sides of (1)-(3) and W is the (n+1)x(n+1) matrix formed by the Right Hand side of (1)-(3).

By obtaining the inverse of W,

$k = W^{-1} F$

we can solve for k.

To do interpolation for a value x, follow the following algorithm:

Step 1:
Find the interval $[x_i,x_{i+1}]$ by which x belongs.

Step 2:
Apply (4)-(7) to obtain the cubic equation coefficient.

Step 3:
Plug in the coefficients into the equation:
$y(x) = a_i(x-x_i)^3 + b_i(x-x_i)^2+c_i(x-x_i)+d_i$

Done! Now you got an estimated value of y(x)!

Tuesday, December 24, 2013

Natural Cubic Spline interpolation (Derivation)

Recap that in our previous post, we introduce how cubic spline interpolation can be applied to a n+1 points to produce a smooth curve. There will be a total of n curves each described by:
$f_i (x) = a_i(x-x_i)^3 + b_i(x-x_i)^2 + c_i (x-x_i) + d_i$

The conditions for a natural cubic spline is given by the following:

$f_i (x_i) = a_i(x-x_i)^3 + b_i(x-x_i)^2 + c_i (x-x_i)+d_i = y_i$    (1)
$f_i (x_i) = f_{i+1}(x_i)$                                                                         (2)
$f_i '(x_i) = f_{i+1}'(x_i)=c_i$                                                                (3)
$f_i ''(x_{i+1}) = f_{i+1}''(x_{i+1})=2b_i$                                                      (4)
$f_0''(x_0) = 0$                                                                                   (5)
$f_{n-1}''(x_{n-1}) = 0$                                                                            (6)

From here we will derive the solution to conduct cubic spline interpolation. The end results will be as specified in Wikipedia. The idea is the same as when you try to solve simultaneous equations and place them into systems of linear equations and solve them using simple linear algebra.

From (3) we are going to express everything in $f_i '(x_i) = f_{i+1} '(x_i) = k_{i+1}$. 
First we obtain $a_i$ in terms of $b_i$ and $k_i$

$f_{i+1}'(x_i)=c_i$     
$c_{i+1}-c_i= 3a_i (x_(i+1)-x_i )^2+2b_i (x_(i+1)-x_i )$ 
$a_i=\frac{1}{3}[\frac{(k_{i+1}-k_i)}{(x_{i+1}-x_i)^2}-\frac{2b_i}{(x_{i+1}-x_i )}]$ (7)

From (4) we will get,
$f_{i+1}''(x_{i+1})=2b_{i+1}=6a_i(x_{i+1}-x_i)+2b_i$
$b_{i+1}-b_i= 3a_i (x_{i+1}-x_i )$                                                                             (8)

Next we will substitute (7) into (8) 
$b_{i+1}-b_i= 3*\frac{1}{3}[ \frac{k_(i+1)-k_i}{(x_{i+1}-x_i)^2} - \frac{2b_i}{(x_{i+1}-x_i )}]*(x_{i+1}-x_i ) $
$b_{i+1}-b_i= \frac{(k_{i+1}-k_i)}{(x_{i+1}-x_i )}-2b_i $
$b_{i+1}+b_i= \frac{(k_{i+1}-k_i)}{(x_{i+1}-x_i )}$                                                                    — (9)

Equation (9) will be used in one of the final substitution so let's just hang on to equation (9) first and we will come back to this later. Now we try to substitute (1) and (7) into (2) to express $b_i$ in terms of $k_i$:

$y_{i+1}= a_i (x_{i+1}-x_i )^3+b_i (x_{i+1}-x_i)^2+c_i (x_{i+1}-x_i )+d_i$

$y_{i+1}-y_i= \frac{1}{3}[ \frac{(k_{i+1}-k_i)}{(x_{i+1}-x_i)^2} -\frac{2b_i}{(x_{i+1}-x_i )}](x_{i+1}-x_i )^3+b_i (x_{i+1}-x_i)^2+k_i (x_{i+1}-x_i )$

$\frac{(y_{i+1}-y_i)}{(x_{i+1}-x_i)^2} = \frac{1}{3}[\frac{(k_{i+1}-k_i)}{(x_{i+1}-x_i}-2b_i ]+b_i+\frac{k_i}{(x_{i+1}-x_i )}$

$\frac{3(y_{i+1}-y_i )}{(x_{i+1}-x_i)^2} = [\frac{(k_{i+1}-k_i)}{(x_{i+1}-x_i)}-2b_i ]+3b_i+\frac{3k_i}{x_{i+1}-x_i } $

$b_i= \frac{3(y_{i+1}-y_i )}{(x_{i+1}-x_i)^2} -  \frac{(k_{i+1}-k_i)}{x_{i+1}-x_i}-\frac{3k_i}{(x_{i+1}-x_i )}$
$b_i= \frac{3(y_{i+1}-y_i )}{(x_{i+1}-x_i)^2} - \frac{k_{i+1}+2k_i}{(x_{i+1}-x_i)}$  —(10)

Finally substitute (10) into (9) to get a equation with only ks, ys and xs.
$\frac{3(y_{i+2}-y_{i+1} )}{(x_{i+2}-x_{i+1})^2} -  \frac{(k_{i+2}+2k_{i+1})}{(x_{i+2}-x_{i+1})}+\frac{3(y_{i+1}-y_i )}{(x_{i+1}-x_i)^2} - \frac{(k_{i+1}+2k_i)}{(x_{i+1}-x_i)}= \frac{k_{i+1}-k_i}{x_{i+1}-x_i}$

$\frac{3(y_{i+2}-y_{i+1})}{(x_{i+2}-x_{i+1})^2} +\frac{3(y_{i+1}-y_i )}{(x_{i+1}-x_i)^2} = \frac{k_{i+1}-k_i}{x_{i+1}-x_i}+ \frac{k_{i+2}+2k_{i+1}}{x_{i+2}-x_{i+1}}+ \frac{k_{i+1}+2k_i}{x_{i+1}-x_i}$

$\frac{3(y_{i+2}-y_{i+1})}{(x_{i+2}-x_{i+1})^2} +\frac{3(y_{i+1}-y_i}{(x_{i+1}-x_i)^2} = \frac{k_i}{x_{i+1}-x_i }+k_{i+1}[\frac{2}{x_{i+1}-x_i} +\frac{2}{x_{i+2}-x_{i+1}}]+\frac{k_{i+2}}{x_{i+2}-x_{i+1}}$ —(11)

With this equation we have exactly (n-1) such equations for i=[0,n-2] and we have (n+1) unknown. By using the 2 other equation (5) and (6), we will obtain the other 2 equations.

Therefore, subsitute i=n and (10 )into (4) 
$0= 6a_n (x_{n+1}-x_n )+2b_n$
$0= 2[\frac{(k_{n+1}-k_n)}{(x_{n+1}-x_n)^2} -\frac{2b_n}{x_{n+1}-x_n }](x_{n+1}-x_n )+2b_n$
$b_n   =\frac{k_{n+1}-k_n}{x_{n+1}-x_n }$
$3\frac{y_{n+1}-y_n }{(x_{n+1}-x_n)^2} = \frac{k_{n+1}-k_n}{x_{n+1}-x_n }+
\frac{k_{n+1}+2k_n}{x_{n+1}-x_n}$
$3\frac{y_{n+1}-y_n}{(x_{n+1}-x_n)^2}= \frac{2k_{n+1}}{x_{n+1}-x_n}+ \frac{k_n}{(x_{n+1}-x_n)}$ —(12)

Substitute i=0 into (10),
$b_0= \frac{3(y_1-y_0 )}{(x_1-x_0)^2} - \frac{k_1+2k_0}{(x_i-x_0)}=0$
$3\frac{y_1-y_0}{(x_1-x_0)^2} = \frac{k_1+2k_0}{x_1-x_0}$
$3\frac{y_1-y_0}{(x_1-x_0)^2} = \frac{2k_0}{x_1-x_0}+\frac{k_1}{x_1-x_0}$  —(13)


Equations 11-13 is the exact equations you will see in Wikipedia. Now the obvious question is "How are we going to use all these equations we derived to do Cubic Spline Interpolation?". This shall be reviewed in the upcoming post.

Friday, December 20, 2013

Natural Cubic Spline interpolation (Theory)

I believe many would be very familiar with linear interpolation where we fix 2 points and we draw a line in between these 2 point. So the points in between are defined by a linear equation. Not many would come across and use cubic interpolation. Here is a quick guide on how it works.

In cubic spline, the idea is the same, we have 2 points and we want to find a cubic equation to describe the points in between such that we can use it to estimate the intermediate values. Now let us consider 4 points as shown below.

In this case, we have 4 points and we will need to model 3 equations; if we have n+1 points we will need n such equations  Each of these equation will look like this:

$f_i (x) = a_i (x-x_i)^3 + b_i (x-x_i)^2 + c_i (x-x_i) + d_i$

Thus there are 4n unknown we need to find. Here are the 4n equations that will help us determine these unknowns:

We know that at each equation has to fix the points hence we have the following:

$f_i (x_i) = a_i (x-x_i)_i^3 + b_i(x-x_i)_i^2 + c_i (x-x_i)_i + d_i$
where i = [0,...,n-1]

This implies that we have, 

$d_i = y_i$           ---(1a)
$f_0 (x_0) = y_0$    ---(1b)

This will give us n+1 equations.

To ensure that the equations are continuous/connecting, we need to ensure the end of the one equation will match the start of the next equation. Hence,

$f_i (x_i) = f_{i+1}(x_i)$ ; ---(2)
This will give us n-1 equations.

We also would want the slope, first derivative,  at the connecting points to be continuous or the same hence,

$f_i '(x_i) = f_{i+1}'(x_i)=c_i$ ---(3)
where i = [1,...,n-1]

This will give us n-1 equations.

We also want the second derivative  to be continuous or the same at connecting points, hence,

$f_i''(x_i) = 2b_i$
$f_i ''(x_i) = f_{i+1}''(x_i)=2b_i$ -- (4)

This will give us another n-1 equations.

Hence, now we have in total 4n-2 equations and we still need 2 more.

There are a few types of cubic spline and one of the most commonly seen is the natural cubic spline. For this we specify the final 2 equations:

$f_0''(x_0) = 0$ -- (5)
$f_{n-1}''(x_{n-1}) = 0$ -- (6)

Now we have 4n equations and we can go ahead to solve those variables. With those variables,we can now do interpolation with the equations:

$f_i (x) = a_i (x-x_i)^3 + b_i(x-x_i)^2 + c_i (x-x_i) + d_i$

Next, hang on to this theory first while I place all these into a system of linear equations.
And solve them using Matrix Algebra.

Sunday, November 24, 2013

Reduce screen brightness in Ubuntu LTS 12.04.3

One of the things that kept bugging me was how the Ubuntu login screen would be so much brighter than I needed it to be. Coming directly from a black / purple background for the terminal and boot sequence, it would explode into colour and momentarily daze me.

It's almost part of my muscle memory now to tap the Fn-Brightness Down key combination until it reaches a more comfortable level at every startup. So I decided to take a different approach - find a command that allows me to issue the brightness level myself.

And it turns out that in Linux, there is a simple text file that controls the brightness of your backlight. For each laptop, it will differ, but I found mine using:

# find / -name brightness

That means, when logged in as the root user (the prompt typically shows a #), find files recursively starting from the root of the filesystem / with the -name brightness. This yielded the following files:

/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/backlight/acpi_video0/brightness /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-LVDS-1/intel_backlight/brightness /sys/devices/pci0000:00/0000:00:02.0/backlight/acpi_video1/brightness /sys/devices/pci0000:00/0000:00:1c.1/0000:03:00.0/leds/ath9k-phy0/brightness /sys/devices/pci0000:00/0000:00:1c.5/0000:07:00.0/leds/mmc0::/brightness

There we can immediately rule out the last two entries for leds since we are looking for entries to change screen brightness and not LEDs. Since the first three also involve backlight we can safely say we are on the right track. Then I changed to the pci0000:00/ directory to save a little typing*:

# cd /sys/devices/pci0000:00/ # cat 0000:00:01.0/0000:01:00.0/backlight/acpi_video0/brightness # cat 0000:00:02.0/drm/card0/card0-LVDS-1/intel_backlight/brightness # cat 0000:00:02.0/backlight/acpi_video0/brightness

Initially, I got 10 for both the acpi_video files, and 976 for the Intel one. This suggested that the Intel file offered finer control over the brightness settings than the other two, which seemed to be supported by the max_brightness files I found poking about in the same directories, which had values 10, 4882, 10 respectively.

So, while logged in as root, I tried:

# echo 0 > 0000:00:01.0/0000:01:00.0/backlight/acpi_video0/brightness

and that dimmed the screen somewhat. And trying the same for the Intel file turned off the screen completely, so I had to type** the last in the dark:

# echo 0 > 0000:00:02.0/drm/card0/card0-LVDS-1/intel_backlight/brightness # echo 200 > 0000:00:02.0/drm/card0/card0-LVDS-1/intel_backlight/brightness

And then I played around with the values for a bit to see what brightness values I would be comfortable with. Once I'd had enough, I decided to set up a shortcut so that I could adjust it without logging in as root. First, I made it writeable by all, then I set up a link to it in my home directory:

# chmod 666 0000:00:02.0/drm/card0/card0-LVDS-1/intel_backlight/brightness # ln -s /sys/devices/pci0000:00/0000:00:02.0/drm/card0/card0-LVDS-1/intel_backlight/brightness ~/brightness

Now I can adjust it to whatever value I want by issuing the following command within my home directory:

$ echo MY_PREFERRED_VALUE > brightness

Naturally, you might prefer to place the command in your startup scripts to use a preferred value automatically. That also means you don't have to change the permissions of that file, since root is active at startup. But that would need a bit more explaining, so I will leave it out of this post.

  1. * if tab completion works, it saves a lot of typing.
  2. ** here, press Up to retrieve the previous command, Home or Ctrl-A to go to the beginning, then replace 0 with 200 manually.

Sunday, October 20, 2013

Selectively sum substring value in a cell/row given condition on substring values in cell/row

Consider this problem, you have a row of strings that you would like to sum given that the string is in a certain format. For example in cell B5 to AF5 contains the following rows of data,

0.5AL,,,,1.0BL,1.0BL,1.0BL,1.0BL,1.0BL,1.0BL,1.0BL,1.0BL,1.0BL,1.0BL

and we would like to sum up all the ALs and BLs. The limitation here is to do the sum in 1 line i.e. to have the sum ready in just one cell and not to create additional rows. We will not be able to do it with Excel provided functions like sumif" and  "countif".  Typically, I would provide the explanation before the solutions. But in this case, the solutions will be given and explained in greater detail as to why Excel provided methods are not useful in this case.

SUM(IF(RIGHT(B5:AF5,2)="BL",1,0)*(VALUE(IF(LEFT(B5:AF5,3)="",0,(LEFT(B5:AF5,3))))))

Note that there are a few issues that complicates the problem:

  1. There are empty cells in this row.We could do a VALUE(LEFT(B5:AF5,3)) to get the numerical value. But we will get an error if the cell is empty. 
  2. The cells contain strings and not numbers. The typical "sumif" applies conditions and sum on the same range. This is not what we want because we want to condition on the substring instead of the whole string.
To solve issue #1, we use the "IF" statement to assign a 0 value if the cell is empty.
VALUE(IF(LEFT(B5:AF5,3)="",0,(LEFT(B5:AF5,3))))

This will give us the numerical values. To solve issue #2, we need to apply logical conditions on summing and this can be done by using
IF(RIGHT(B5:AF5,2)="BL",1,0)

This will give us the logical condition of 1 if the condition is met and 0 otherwise.

By multiplying the two we will have a range of values conditioned on the existance of the substring "AL" or "BL" and we can get the sum by adding them up. This made me interested on including this feature in functions.

I was asking around for alternative solutions here are some alternative solutions:

1.     An improved version can be done with the following formula:
 SUM(IF(RIGHT(B5:AF5,2)="BL","BL",VALUE(IF(LEFT(B5:AF5,3))),0))

2.     As pointed out by one of my friends, he suggested using SQL to do this. The pseudocode is as follows:

select substr(column, 3,2), sum(column, 0,3) from table group by substr(column, 3,2);







Sunday, September 29, 2013

Free Web Hosting using Google Doc/Drive

I was trying to publish some API and I realized the API format I used in my C++ codes are not compatible with the one used by GIT Wiki(Markdown). I did not really went on to see how I can do that on GIT( because I can sense how painful it will be to change all my format in my code). Hence, I went to find solutions to accommodate my current situations.

I found that Google Drive not only allows you to store information on the web, it also allows you to publish materials on the web for free! If you tried to visit the helps sites here and here,you will realize that there are many information that will confuse you as it had confused me. Here is what you need to do to publish your site.

Step 1.
I believe most would know how to do this since you are reading this.
Create a public folder accessible to the Public and upload all your web files (HTML, ...) to the folder you just created.

Step 2.
Find out the folder ID by going to https://drive.google.com/#folders.
Navigate to the public folder you created in Step 1.
Noticed that the address now looks something like this:

https://drive.google.com/#folders/0BzarOaYG8nb6QlJCRTl5MjJp (this link is invalid)

Copy "0BzarOaYG8nb6QlJCRTl5MjJp" and that is your Folder ID.

Step 3.

Now you need to identify your website address. Mine looks like this:

https://googledrive.com/host/0BzarOaYG8nb6UzBrT1NzUzJuSEU/

Have you already figured it out already?
Your address to your website is https://googledrive.com/host/<folder id>, hence in this case it will be:

https://googledrive.com/host/0BzarOaYG8nb6QlJCRTl5MjJp




Some other useful sites you can visit:
http://www.labnol.org/internet/host-website-on-google-drive/28178/
http://www.youtube.com/watch?v=fwyZqyGEPNk


Thursday, September 12, 2013

Python On Android

For Python Lovers out there, this might interest you. Many people have frowned over having to read through all the documentations for App development on Android. This will provide you a fast way to get what you want to run on your Android phone. Although it does not have full functionality of the development pack it will allow you to do simple automation that you might want to do.

Step 1.
You will need to install both Python for Android and SLA4 onto your Android phone. However, they are considered "unknown application" so you have to allow unknown application to install on your phone by checking the option at setting->security.

I shall not go through the installation as it is rather straight forward.

Step 2.
Next you are ready to code in Python!! Just start SLA4A application and you will see a list of sample scripts there.


The API for Python on Android can be found here and you can find some tutorial here. Also do note that similar to Python on PC, you will also be able to install packages and use it when it is available.

A few interesting things you will be able to do with this...

  1. Take photo/video
  2. Location discovery
  3. Bluetooth connectivity
  4. SMS
  5. Whatsapp (I have not tried this myself)

A sample example I did here is to use location discovery to get the coordinate of the current location. Using this coordinates, I launch Google Map and it will display your location.

Organize/Sort large data Part 2 - Extracting Data at Specific day of the Week.

In the previous post , I explained how large data can be organized by using Pivot Table. In this post, I will extend this example by describing to you how you can extract data that is on particular day of the week, e.g. Wednesday. If that day is a holiday we want it to take the value before or after that.

Step 1.

Choose the symbol of your liking, in this case "FRI" is used.
Copy and paste this filtered data obtained,  into a new worksheet, in this case, "Sheet3".

Step 2.

Next, noticed that the format of the date represented here is YYYYMMDD. In this format, Excel is not able to make out the date. Hence, you will need to translate the date into Excel readable date format. This steps would vary depending on the format used and you might need some creativity to translate it to excel readable format. In this example, we do truncation using "LEFT" and "RIGHT" function of Excel. Which obtain part of a string starting from the right or left. The following formula will enable us to do the translation:

DATE(LEFT(A2,4),LEFT(RIGHT(A2,4),2),RIGHT(A2,2))

Step 3.

Now Excel is able to read the date properly however, you need to extract the weekdays information and this can be done using:

 WEEKDAY(B4,1)

The first parameter correspond to the Date cell we converted in Step 2 and the second parameter of  allows you to choose different representation of weekdays. In this case, we chose 1, which means the start of the week as Sunday represented by 1 and the end of the week as Saturday represented by 7; Wednesday is represented by 4.


Explanation of what I will be doing, you may choose to skip if you just want to get to the results:
In the case where we are certain that we have data on every Wednesday, we will be able to use a simple method by using FILTERS or a simple IF statement "IF(C5=3,D5,)".

But the problem here is we could have an holiday on Wednesday and we will not be able to get any data for that week. I need to get the closest data to Wednesday. What I am doing here is to create a convention where I can identify the closest day to Wednesday in the event of a Holiday. By making Wednesday the start of the week, it will have the smallest digit which can be identified by comparing with the previous day.

You might like to notice that none of the option in the WEEKDAY() second parameter allows us to let us start the week on Wednesday. Hence, we need to use another method to remap the weekdays.


Step 4.

We need to remap the weekday as per my previous explanation. To do this I created a Lookup Table  in "Sheet2!$C$3:$D$7" and it looks like this:


This table translate the week such that it will start on Wednesday.

By using this table and the formula:

VLOOKUP(C2,Sheet2!$C$3:$D$7,2)

Create a new column using this formula and you would have recreated the new Weekday field by starting the week from Wednesday which is represented by the smallest number of the week, 1.

Step 5.

Noticed that the start of the week is the smallest number. From this Wednesday to the coming Tuesday, the new Weekday number will be in increasing order and it is only when there is a transition to the next week, then new weekday number will suddenly be smaller. By using this pattern, use a IF statement in a new column to filter out the starting day of the week:

IF(D6<D7,E6,"")

Hence, when there is a holiday on Wednesday, it will automatically take the next working day of the week as the start of the week. Now you just need to filter out the BLANKS.

VIOLA! You got the data you need!

The finished product can be obtained here.

Tuesday, September 10, 2013

.dll? That's .so Microsoft. (C)

I was thinking about DLLs from the previous blog by Ken and decided to look a little at how shared objects are made and used in Linux. Based on this answer to a StackOverflow question, I came up with this example in C.

Preparing the shared object

Say you want to write a function sumSquare(int, int) which takes two integers and returns the sum of their squares. Then you could code it simply as follows:

    int sumSquare(int first, int second) {
        return ((first * first) + (second * second));
    }

Save this file as, say, sumSquare.c. Then you could build the shared object with gcc like this:

    $ gcc -fPIC -Wall -shared -o sumSquare.so sumSquare.c

where -fPIC instructs gcc to produce position-independent code which is necessary for code that can be in different locations, -shared compiles the code into a shared object instead of an executable, and -o filename specifies the preferred output filename. If this last argument is not given, then gcc prefers to use a.out as the output filename. -Wall is just my personal preference to show all Warnings.

Now you have the shared object which you can include in your code. Yeah... how?

Including the shared object

Now I write a tiny piece of code that takes the sum of squares of two numbers - say, 3 and 4 - and prints them on screen.

    #include 
    #include 

    int main(int argc, char** argv) {

        void *handle;
        int (*ss)(int, int);
        char *error;

        handle = dlopen("./sumSquare.so", RTLD_LAZY);
        if (!handle) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }
        dlerror();

        ss = dlsym(handle, "sumSquare");
        if ((error = dlerror()) != NULL) {
            fprintf(stderr, "%s\n", error);
            return 2;
        }

        printf("3^2 + 4^2 = %d\n", (*ss)(3, 4));

        dlclose(handle);
        return 0;
    }

OK, that wasn't so tiny after all. But there is good reason for it. Let's look at it segment by segment: first, the includes. <stdio.h> is responsible for input and output, and you might have seen it before. Then there is <dlfcn.h>. What does this library do? It handles the communication with the shared object, providing calls like dlsym(), dlopen() and dlerror().

Next, the variables are declared. A void *handle, seemingly a pointer to nothing, is followed by a strange int (*ss) which takes two arguments and a char* error. The last is easy enough to understand, it's a 'string', a sequence of characters.

The next block provides a, yes, handle on the shared object. ./sumSquare.so means that the shared object should be in the same directory as the code. The if-block prints an error to STDERR and returns 1 if nothing is in handle. The final dlerror() silently drops any stray errors. Not really the best approach, as you might guess... but I'll leave it to you to improve on :)

Then ss is assigned the symbol "sumSquare", which as we saw before, is the name of the function we want. Here again is an if-block in case an error pops up. Here, dlerror() will be NULL unless something goes wrong, so we check for that.

All that was just making sure the function could be accessed. Finally, the call to ss is made with the requisite (here hard-coded) arguments as part of a printf statement.

At the end of the function, the handle for sumSquare.so is closed.

Here is the call to the compiler:

    $ gcc -Wall -o trivial -ldl code.c

-ldl is necessary for the library in <dlfcn.h> to be linked in. This makes all the dynamic loading possible.

That's it! You can take a look at sumSquare.c and code.c if you wish. I would recommend reading about the dlopen() calls though.

Monday, September 9, 2013

Organize/Sort large data Part 1 - Using Pivot Table in Excel

While I was doing my Masters in Financial Engineering, one of the things I have to do is to deal with large data containing dates, trading symbols, prices and volume etc.. One of the task is to only extract information related to a particular symbol. For example, this file contains historical data of a few trading symbols and I only want to extract data pertaining to a few trading symbols. For a reasonably small data sets (~500k), you can use the Pivot Table function in Microsoft Excel.

Step 1.
Insert a Pivot Table by going to Insert -> Pivot Table.


Select the range of data in the spreadsheet you would like to organize/sort and choose if you would like to create this table in a new worksheet or the existing one. Normally I would choose to create a new worksheet to make things neater. Of course if you keep creating pivot table, your workbook will become messy so do take note of how you want to create your pivot table. In this case, I chose to create a new worksheet with the data from "Sheet1!$A$1:$G$493468". Upon pressing OK, you will get an empty pivot table which you have to customize to your needs.

Step 2.

Noticed that in Excel, the first row is automatically used as the label. So right now you just need to know what information you need and pull them to the correct area. When you check the fields in the "PivotTable Field List" you need you will notice these fields are added to the area below it. Excel does the adding automatically but it may not be what you want. So do make sure the correct label goes to the correct row/column labels.

In this case, I would want the following setting:
  1.  "dates" to be at the row
  2. Trading symbols or in this case it is also called "Ticker" at the column
Next you need to pull in the data you need from the various Tickers e.g. in this case I use Price or "PRC" pull it into the "Value" field. Notice that by default, the values are taken using count under the value fields.you need to change it by left clicking on it ("Count of PRC") and change "Value Field Setting" to "Product".



After that you would have obtain all the unique trading symbols and arranged in dates. You can then subsequently choose to exclude symbols which you do not need by unchecking the trading symbols under "Column Label" in the worksheet, cell B3 in this case.

For those who did not managed to download the completed product, it can be downloaded here.

LibreOffice addons in C++... OK, OK, Basic

There are times when the built-in functions you have in your spreadsheet program just won't do. You feel you need to write your own special function for convenience or to stroke your ego. This entails adding functionality to an existing program.... how would you go about doing that?

This was really inspired by Ken's post about developing a DLL for Excel with C++. I thought I'd cover the open-source side somewhat less competently by sharing how to put together the same thing for OpenOffice/LibreOffice Calc. Just so you could compare the different approaches and I get to learn something about C++ and LibreOffice suites.

Step 1: Download Excel...

OK, here goes.

Step 1: Look up the C++ examples in the LibreOffice API. Pick a simple example, like Counter.

Step 2: Recoil in horror at the length of a "very simple" example. Even if you take into account the padding that the BSD licence gives, there are still 120-odd lines in counter.cxx and 50-ish in countermain.cxx. Then there's the .idl file ...

Step 3: Realise that I am doing a silly thing, trying to use a Windows convention outside Windows. In LibreOffice, it does not seem possible to compile shared objects (what Windows knows as DLLs) and bundle them for use with VBA. What I should have been doing is to instead define a function in LibreOffice Basic.

So here's the real step 1: Open the Basic-IDE using the menu sequence below.

Step 2Drill down to Standard.Module1 in My Macros and click New:

Step 3: Write the following function into the editor:

Function addition(a, b)
    addition = a + b
End Function

Be sure to remove the two existing procedures, Sub Main and Sub Macro1; writing functions inside procedures is not acceptable in StarBasic. Then Save the code and close the editor.

Step 4: Use the function as you would an inbuilt function: For instance, write =addition(A1, B1) in cell C1. You can check by changing the values of A1 and B1 that you have a new function you can use with any new spreadsheet on your computer.

Step 5: Optionally, if you want to share the file with others, it should be located in ~/.config/libreoffice/3/user/basic/Standard/Module1.xba. Copy that XML file as, say, Addition.xba and update script.xlb with the following line:

<library:element library:name="Addition"/>

LibreOffice should detect it upon loading.

As with Excel, changes to the code outside of LibreOffice (even deleting the script) will not be detected until you restart.

Yes, yes, it's a bit of a cop-out. I will look more into the C++ analogue or talk about writing a macro in Python soon.

Using C++ and VBA in Excel

One of the most powerful features in Microsoft Excel is the ability to use VBA to code modules and subroutines which can be used together with the spreadsheet. However, if you are someone who is more comfortable with C/C++, you might find VBA cumbersome. This post will describe to you how you can use C++ to create DLL which can be used with Excel with minimum VBA coding. Note that we are not using any class, we are just using the primitive ways to create a C++ function.

First create you need to decide the environment you wish to develop your C++ code. In this case, I start off by using Microsoft Visual Studio 2012.

Step 1.
Create a empty Win32 Visual C++ project and you will get to the Wizard page.








Remember to check the "DLL" radio button and "Empty Project".



Step 2.
Create a source file under the folder "Source" where you will write all your C++ codes.

For example, I created the addition function as follows:

/* Author: Ang Kian Liong

*/

double __stdcall addition(double & x, double & y){
return x+y;
}

Note that this function uses "__stdcall " which essentially tells the compiler to compile the codes in a certain way such that VBA can understand. For the more advance reader you can go to stackexchange or Wikipedia for more information on how it works technically. 

Step 3.
Now what you have done is define and describe a function in C++ that can be compiled in . However, in order for an external party to use the DLL, it needs to know what functions are there in the library and how it can access it. Therefore, you need to create a definition file as follows:

LIBRARY "ADD"
EXPORTS
addition

Save as a ".def" file in the same folder as your C++ source files. Access your linker properties, Project-><<Project name>> properties. Add the definition file under "Module Definition File".


Resolve any syntax error on your code and compile your code.

Step 4.
Next, we will explore how this can be invoked in Excel VBA. Open Excel , create a new Macro Enabled Excel Spreadsheet and press "ALT + F11" to get into VBA development environment. Insert a new module, in this case I name it "Module 1". To use the DLL we created earlier we need to declare in VBA the function that is we want to use from the DLL. We also need to specify the location of this DLL. For example,

Declare Function addition _
Lib "C:\Users\Ken\Documents\Visual Studio 2012\Projects\Win32Project2\Debug\Win32Project2.dll" (ByRef x As Double, ByRef y As Double) As Double

Now try to use the "addition" function in Excel and you will find that you can use it just like any other functions that come with Excel.

Enjoy! 

Here is the compiled DLL with Excel.

Note: 
1) If you are using the DLL in Excel, you will not be able to change and re-compile your code in C++ because ur Excel has already open and is using the DLL. You will get an error saying you cannot open the DLL file for overwriting. You will need to close your excel and compile the code before opening Excel again.

2)  Remember to enable macro when you open up the Excel Spreadsheet.

3) For those who tried to code C++ function without using passing by reference, you will notice that it will not work. I believe it is because Excel is using passing by reference by default and thus require all DLL to be written in the same fashion.