Wednesday, 26 December 2012

Deep Neural Networks and Restricted Boltzmann Machines

The new version of the Accord.NET brings a nice addition for those working with machine learning and pattern recognition: Deep Neural Networks and Restricted Boltzmann Machines.









Class diagram for Deep Neural Networks in the Accord.Neuro namespace.




Deep neural networks have been listed as a recent breakthrough in signal and image processing applications, such as in speech recognition and visual object detection. However, is not the neural networks which are the new things here; but rather, the learning algorithms. Neural Networks have existed for decades, but previous learning algorithms were unsuitable to learn networks with more than one or two hidden layers.


But why more layers?



The Universal Approximation Theorem (Cybenko 1989; Hornik 1991) states that a standard multi-layer activation neural network with a single hidden layer is already capable of approximating any arbitrary real function with arbitrary precision. Why then create networks with more than one layer?





To reduce complexity. Networks with a single hidden layer may arbitrarily approximate any function, but they may require an exponential number of neurons to do so. We can borrow a more tactile example from the electronics field. Any boolean function can be expressed using only a single layer of AND, OR and NOT gates (or even only NAND gates). However, one would hardly use only this to fully design, let's say, a computer processor. Rather, specific behaviors would be modeled in logic blocks, and those blocks would then be combined to form more complex blocks until we create a all-compassing block implementing the entire CPU.





The use of several hidden layers is no different. By allowing more layers we allow the network to model more complex behavior with less activation neurons; futhermore the first layers of the network may specialize on detecting more specific structures to help in the later classification. Dimensionality reduction and feature extraction could have been performed directly inside the network on its first layers rather than using specific separate algorithms. 



Do computers dream of electric sheep?



The key insight in learning deep networks was to apply a pre-training algorithm which could be used to tune individual hidden layers separately. Each layer is learned separately without supervision. This means the layers are able to learn features without knowing their corresponding output label. This is known as a pre-training algorithm because, after all layers have been learned unsupervised, a final supervised algorithm is used to fine-tune the network to perform the specific classification task at hand.







As shown in the class diagram on top of this post, Deep Networks are simply cascades of Restricted Boltzmann Machines (RBMs). Each layer of the final network is created by connecting the hidden layers of each RBM as if they were hidden layers of a single activation neural network.



Now, the most interesting part about this approach will given now. It is about one specific detail on how the RBMs are learned, which in turn allows a very interesting use of the final networks. As each layer is a RBM learned using an unsupervised algorithm, they can be seen as standard generative models. And if they are generative, they can be used to reconstruct what they have learned. And by sequentially alternating computation and reconstruction steps initialized with a random observation vector, the networks may produce patterns which have been created using solely they inner knowledge about the concepts it has learned. This may be seen fantastically close to the concept of a dream.



--



At this point I would also like to invite you to watch the video linked above. And if you like what you see, I also invite you to download the latest version of the Accord.NET Framework and experiment with those newly added features.



The new release also includes k-dimensional trees, also known as kd-trees, which can be use to speed up nearest neighbor lookups in algorithms which need it. They are particularly useful in algorithms such as the mean shift algorithm for data clustering, which has been included as well; and in instance classification algorithms such as the k-nearest neighbors.

Tuesday, 2 October 2012

Accord.NET and .NET 3.5

Quite some time ago, Accord.NET had been upgraded to .NET 4.0. This upgrade was made to advantage of some of the unique platform features. However, this was a drawback for those users who were still bound to .NET 3.5 and would like to use (or continue using) the framework.



So, some weeks ago I did finish creating a compatibility layer providing .NET3.5 support for the newest version of the framework. While some features had to be left aside, most of the framework is currently functional. A few features, however, had to be removed. Those include the Accord.Audio modules,  support for CancellationTokens on processor-intensive learning algorithms, some parallel processing capabilities and some of the Expression Tree manipulation in some numeric optimization algorithms and in Decision Trees.



With time, some of those could be re-implemented to work correctly under 3.5 as needed. You can leave a request, if you would like. Stay tuned!

Friday, 21 September 2012

The Mistery of Failing Tests - Or how to implement IDisposable correctly

Some days ago my Visual Studio tests started failing randomly. Curiously, the same tests which failed in a previous run would succeed if ran again. There seemed to be nothing wrong with the tests, except for the message "The agent process was stopped while the test was running" appearing the Test Results window.









It turns out this message tends to appear whenever an object throws an exception inside its destructor.



Luckly, only about a dozen classes in Accord.NET do need to implement finalizers. After a few searches, it turned down the symptoms were being caused by a NullReferenceException being thrown in one of those classes.



The solution is to always implement the Disposable pattern correctly.




Disposing managed code correctly


To do so, first, implement IDisposable. Consider, for example, the Metronome class, which makes use of timers:







    /// <summary>
/// Virtual Metronome.
/// </summary>
///
/// <remarks>
/// Objects from this class acts as virtual metronomes. If connected
/// to a beat detector, it can be used to determine the tempo (in
/// beats per minute) of a signal. It can also be used in manual mode
/// by calling <see cref="Tap"/> method. For more details, see the
/// Beat detection sample application which comes together with the
/// framework.
/// </remarks>
///
public class Metronome : IDisposable
{
// ...

private Timer timeUp;
private Timer metronome;

// ...
}



Then, implement the required Dispose method, but do not write the dispose code right away. Instead, write:






        /// <summary>
/// Performs application-defined tasks associated with
/// freeing, releasing, or resetting unmanaged resources.
/// </summary>
///
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}

/// <summary>
/// Releases unmanaged and - optionally - managed resources
/// </summary>
///
/// <param name="disposing"><c>true</c> to release both managed
/// and unmanaged resources; <c>false</c> to release only unmanaged
/// resources.</param>
///
protected virtual void Dispose(bool disposing)
{
if (disposing)
{
// free managed resources
if (timeUp != null)
{
timeUp.Dispose();
timeUp = null;
}
if (metronome != null)
{
metronome.Dispose();
metronome = null;
}
}
}



Always check if a member is not null before disposing it. Alternatively, after disposing, always set it to null so you can't accidentaly dispose it twice. The GC.SuppressFinalize instructs the Garbage Collector that the object has already been disposed, so it won't be required to call its finalizer. The GC will be trusting you have already cleaned the room; so don't fail on it!



Finally, write the finalizer as:







        /// <summary>
/// Releases unmanaged resources and performs other cleanup operations before the
/// <see cref="Metronome"/> is reclaimed by garbage collection.
/// </summary>
///
~Metronome()
{
Dispose(false);
}






Disposing unmanaged code



The disposal of unmanaged code follows exactly the same pattern as above, but some care is required when disposing unmanaged resources. Consider the Signal class, which represents an audio signal in PCM format:








    /// <summary>
/// Represents a PCM sound discrete signal (measured in time).
/// A real discrete-time signal is defined as any real-valued
/// function of the integers.
/// </summary>
///
/// <remarks>
/// <para>
/// In signal processing, sampling is the reduction of a continuous
/// signal to a discrete signal. A common example is the conversion
/// of a sound wave (a continuous-time signal) to a sequence of samples
/// (a discrete-time signal).</para>
///
/// <para>
/// A sample refers to a value or set of values at a point in time
/// and/or space.</para>
///
/// </remarks>
///
public class Signal : IDisposable
{
// ...

private byte[] rawData;
private IntPtr ptrData;
private GCHandle handle;

private SampleFormat format;

private int channels;
private int sampleRate;
private int length;

// ...
}





Note that this class uses an unmanaged memory block to store the audio signal. To dispose it properly, the Disposable pattern has to be implemented as it follows:







        /// <summary>
/// Performs application-defined tasks associated with freeing,
/// releasing, or resetting unmanaged resources.
/// </summary>
///
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}

/// <summary>
/// Releases unmanaged resources and performs other cleanup operations
/// before the <see cref="Signal"/> is reclaimed by garbage collection.
/// </summary>
///
~Signal()
{
Dispose(false);
}

/// <summary>
/// Releases unmanaged and - optionally - managed resources
/// </summary>
///
/// <param name="disposing"><c>true</c> to release both managed
/// and unmanaged resources; <c>false</c> to release only unmanaged
/// resources.</param>
///
protected virtual void Dispose(bool disposing)
{
if (disposing)
{
// free managed resources
}

// free native resources
if (handle.IsAllocated)
{
handle.Free();
ptrData = IntPtr.Zero;
}
}





Here we have almost the same pattern. However, note that unmanaged code should always be freed, even if Dispose has not been called explictly. So the code for releasing unmanaged resources is executed unconditionally in the Dispose(bool) method.

Sunday, 9 September 2012

Haar Feature Face Detection in C#

A new article has been published in CodeProject! This article details the Viola-Jones face detection algorithm available in the Accord.NET Framework. The article page also provides a standalone package for face detection which can be reused without instantiating the entire framework.




 










The article is also participating in CodeProject's monthly competition under the categories Best C# Article and Best Overall Article of the month! Feel free to cast your vote if you liked and otherwise found them useful!  





This month we have other great articles participating in the competition: Marcelo de Oliveira's Social News excels in the Web category; Roy's Inline MSIL in C# also presents a very interesting reading in the C# category. The later may eventually be extremely useful to leverage performance in managed applications. 





For those who don't know, CodeProject is a amazingly useful site which publishes user-created articles and news. Every month, the best articles among all submissions are selected to win prizes and gain popularity as well!







Monday, 9 July 2012

Always test your hypothesis...


... but also always make sure to interpret results correctly! This post presents a quick intro on how
to perform statistical hypothesis testing and power analysis with the Accord.NET Framework in C#.










Contents



  1. Hypothesis testing


    1. Statistical hypothesis testing

    2. My test turned insignificant.
      Should I accept the null hypothesis?

    3. Further criticism


  2. The Accord.NET Framework


    1. Available tests

    2. Example problems


  3. Suggestions

  4. References















Hypothesis testing



What does hypothesis testing means in statistics (and should also mean everywhere, for that matter)?
You may recall from Karl Popper's theory of falsiability that good theories can rarely be accurately
proven, but you may gain a considerable confidence on them by constantly challenging and failing to
refute them.



This comes from the fact that it is often easier to falsify something than to prove it. Consider
for instance the white-swan/black-swan example: Let's say a theory states that all swans are white.
This is a very strong claim; it does not applies to one or a few particular observations of a swan,
but all of them. It would be rather difficult to verify if all of the swans in Earth
are indeed white. It is thus almost impossible to prove this theory directly.



However, the catch is that it will take only a single contrary example to refute it. If
we find a single swan that is black, the entire theory should be rejected, so alternate theories could
be raised. It should be fairly easy to attempt to prove a theory wrong. If a theory continuously survives
those attempts to be proven wrong, it becomes stronger. This does not necessarily means it is correct,
only that it is very unlikely to be wrong.



This is pretty much how the science method works; it also provides a solution to the demarcation problem
originally proposed by Kant: the problem of separating the sciences from the pseudo-sciences (i.e. astronomy
from astrology). A "good" theory should be easy to attack so we can try to refute it; and by constantly
challenging it and failing to prove it wrong, we gain further confidence that this theory may,
indeed, be right. In short:



Often the most interesting theories can't be proven right, they can only be proven wrong. By continuously
refuting alternatives, a theory becomes stronger (but most likely never reaching the 'truth').



Answering the question in the first phrase of this section, hypothesis testing means verifying if a
theory holds even when confronted with alternative theories. In statistical hypothesis testing, this
often means checking if a hypothesis holds even when confronted with the fact that it may have just
happened to be true
by pure chance or plain luck.






Statistical hypothesis testing



Fisher (1925) also noted that we can't
always prove a theory but we can attempt to refute it. Therefore, statistical hypothesis testing includes
stating a hypothesis, which is the hypothesis we are trying to invalidade; and check if we can confidently
reject it by confronting it with data from a test or experiment. This hypothesis to be pickpocketed
is often called the null hypothesis (commonly denoted H0). It receives this
name as it is usually the hypothesis of no change: there is no difference, nothing changed after the
experiment, there is no effect.



The hypotheses verified by statistical hypothesis tests are often theories about whether or not a random
 sample from a population comes from a given probability distribution. This seems weird, but several
problems can be cast in this way. Suppose, for example, we would like to determine if students from
a classroom have significantly different grades than students from another room. Any difference could
possibly be attributed to chance, as some students may just perform better on a exam because of luck.



An exam was applied to both classrooms. The exam results (by each student) are written below:



















Classroom A

Classroom B

8.12, 8.34, 7.54,

8.98, 8.24, 7.15,

6.60, 7.84, 8.68,

9.44, 8.83, 8.21,

8.83, 10.0, 7.94,

9.58, 9.44, 8.36,

8.48, 8.47, 8.02,

8.20, 10.0, 8.66,

8.48, 9.17, 6.54,

7.50

7.50, 6.70, 8.55,

7.84, 9.23, 6.10,

8.45, 8.27, 7.01,

7.18, 9.05, 8.18,

7.70, 7.93, 8.20,

8.19, 7.65, 9.25,

8.71, 8.34, 7.47,

7.47, 8.24, 7.10,

7.87, 10.0, 8.26,

6.82, 7.53

Students: 28

Mean: 8.416

Students: 29

Mean: 7.958



We have two hypothesis:



  • Results for classroom A are not significantly different from the results from classroom B. Any difference
    in means could have been explained due to chance alone.

  • Results are indeed different. The apparent differences are very unlikely to have occurred by chance.



Since we have less than 30 samples, we will be using a
Two-Sample T-Test
to test the hypothesis that the population's mean values of the two samples
are not equal. Besides, we will not be assuming equal variances. So let's we create our test object:



using Accord.Statistics;

using Accord.Statistics.Testing;

using Accord.Statistics.Testing.Power;




TwoSampleTTest
test = new TwoSampleTTest(A, B,

          hypothesis: TwoSampleHypothesis.ValuesAreNotEqual);


And now we can query it:



Console.WriteLine("Significant: "
+ test.Significant); // true; the test is significant


Which reveals the test is indeed significant. And now we have at least two problems to address...




Problems




Problem 1: Statistical significance does not imply practical significance



So the test was significant. But would this mean the difference itself is significant?
Would this mean there any serious problem with the school teaching method?



No - but it doesn't mean the contrary either. It is impossible to tell just by looking at the p-level.



The test only said there was a difference, but it can not tell the importance of this difference.
Besides the two classes having performed so differently they could trigger statistical significance,
we don't know if this difference has any practical significance. A statistical test being significant
is not a proof; it is just an evidence to be balanced together with other pieces of information in order
to drawn a conclusion.



Perhaps one of best examples illustrating this problem
is given by Martha K. Smith
:



Suppose a large clinical trial is carried out to compare a new medical treatment with a standard one.
The statistical analysis shows a statistically significant difference in lifespan when using the new
treatment compared to the old one. But the increase in lifespan is at most three days, with average
increase less than 24 hours, and with poor quality of life during the period of extended life. Most
people would not consider the improvement practically significant.


In our classroom example, the difference in means is about 0.46 points. If principals believe a difference
of less than 0.50 in a scale from 0.00 to 10.00 is not that critical, there may be no need to force
students from the room with lower grades to start taking extra lessons after school. In other words,
statistical hypothesis testing does not lead to automatic decision making. A statistically significant
test is just another evidence which should be balanced with other clues in order to take a decision or
draw a conclusion
.



Problem 2: Powerless tests can be misleading



The p-level reported by the significance test is the probability of the extreme data we found
be occurring given the null hypothesis is correct. Often, this is not the case. We
must also know the probability that the test will reject the null hypothesis when the null hypothesis
is false
. To do so, we must compute the power of our test, and, better yet,
we should have used this information to conclude how many samples we would need to achieve a more informative
test before we conducted our experiment. The power is then the probabability of detecting
a difference if this difference indeed exists (Smith,
2011
). So let's see:



// Check the actual power of the test...

Console.WriteLine("Power: "
+ test.Analysis.Power); // gives about 0.50


The test we performed had astonishingly small power; so if the null hypothesis is false (and there is
actually a difference between the classrooms) we have only about 50% chance of correctly rejecting it.
Therefore, this would also lead to a 50% chance of producing a false negative - incorrectly saying there
is no difference when actually there is.  The table below exemplifies the different errors we can
get by rejecting or not the null hypothesis.




















Null hypothesis is true

Null hypothesis is false

Fail to reject


the null hypothesis

Correct

True negative

Type II error (beta)

False negative

Reject the


null hypothesis

Type I error (alpha)

False positive

Correct outcome

True positive


Tests with little statistical power are
often inconsistent in the literature
. Suppose, for example, that the score from the first student
from classroom B had earned a  7.52 instead of 7.50. Due to the low power of the test, this little
change would already be sufficient to render the test nonsignificant, and we will not be able to reject
the null hypothesis that the two population means aren't different anymore. Due to the low power
of the test, we can't distinguish between a correct true negative and a type II error. This is why
powerless tests can be misleading and should never be relied upon for decision making (Smith,
2011b
).



The power of a test increases with the sample size. To obtain a power of at least 80%, let's see
how many samples should have been collected:



// Create a post-hoc analysis to determine sample size

var
analysis = new TwoSampleTTestPowerAnalysis(test);

analysis.Power = 0.80;

analysis.ComputeSamples();




Console.WriteLine("Samples in each group:
"oup: "
+ Math.Ceiling(analysis.Samples1));
// gives 57


So, we would actually need 57 students in each classroom to confidently affirm whether there
was an difference or not. Pretty disappointing, since in the real world we wouldn't be able to enroll
more students and wait years until we could perform another exam just to adjust the sample size. On
those situations, the power for the test can be increased by increasing the significance threshold ( href="http://otg.downstate.edu/downloads/2007/Spring07/thomas.pdf">Thomas, Juanes, 1996), although
clearly sacrificing our ability to detect false positives (Type I errors) in the process.






My test turned insignificant. Should I accept the null hypothesis?



The short answer is 'only if you have enough power'. Otherwise, definitively no.



If you have reason to believe the test you performed had enough power to detect a difference within
the given Type-II error rate, and it didn't, then accepting the null would most likely be acceptable.
The acceptance should also be accompanied of an analysis of confidence intervals or effect sizes. Consider,
for example, that some actual scientific discoveries were
indeed made by accepting the null hypothesis rather than by contradicting it
; one notable example
being the discovery of the X-Ray (Yu,
2012
).






Further criticism



Much of the criticism associated with statistical hypothesis testing is often related not to the use
of statistical hypothesis testing per se, but on how the significance outcomes from such tests are interpreted.
Often it boils down to incorrectly believing a p-value is the probability of a null hypothesis being
true, when, in fact, it is only the probability of obtaining a test statistic as extreme as the one
calculated from the data, always within the assumptions of the test under question.



Moreover, another problem may arise when we chose a null hypothesis which is obviously false. There
is no point on hypothesis testing when the null hypothesis couldn't possibly be true. For instance,
it is very difficult to believe a parameter of a continuous distribution is exactly equal to
an hypothesized value, such as zero. Given enough samples, it will always be possible to find a difference,
as small as it gets, and the test will invariably turn significant. Under those specific circunstances,
statistical testing can be useless as it has relationship to practical significance. That is why analyzing
the effect size is important in order to determine the practical significance of an hypothesis test.
Useful hypothesis would also need to be probable, plausible and falsifiable (Beaulieu-Prévost,
2005
).



The following links also summarize much of the criticism in statistical hypothesis testing. The last
one includes very interesting (if not enlightening) comments (in the comment section) on common criticisms
of the hypothesis testing method.









The Accord.NET Framework



Now that we presented the statistical hypothesis testing framework, and now that the reader is aware
of its drawbacks, we can start talking about performing those tests with the aid of a computer. The
Accord.NET Framework is a framework for scientific computing with wide support for statistical and power
analysis tests, without entering the merit if they are valid or no. In short, it provides some scissors;
feel free to run with them
.






Tests available in the framework (at the time of this writing)



As it may already have been noticed, the sample code included in the previous section was C# code using
the framework. In the aforementioned example, we created a T-Test for comparing the population means
of two samples drawn from Normal distributions. The framework, nevertheless, includes
many other tests
, some with support for
power analysis
. Those include:












Parametric tests

Nonparametric tests
















Tests marked with a * are available in versions for one and two samples. Tests on the second row can
be used to test hypothesis about
contingency tables
. Just remembering, the framework is
open source and all code is available on Google Code
.



A class diagram for the hypothesis testing module is shown in the picture below. Click for a larger version.






Class diagram for the
Accord.Statistics.Testing namespace
.



Framework usage should be rather simple. In order to illustrate it, the next section brings some example
problems and their solution using the hypothesis testing module.








Example problems and solutions 



Problem 1. Clairvoyant card game.



This is the second example from Wikipedia's page on hypothesis testing. In this example, a person
is tested for clairvoyance (ability of gaining information about something through extra sensory perception;
detecting something without using the known human senses.




// A person is shown the reverse of a playing card 25 times and is
// asked which of the four suits the card belongs to. Every time
// the person correctly guesses the suit of the card, we count this
// result as a correct answer. Let's suppose the person obtained 13
// correctly answers out of the 25 cards.

// Since each suit appears 1/4 of the time in the card deck, we
// would assume the probability of producing a correct answer by
// chance alone would be of 1/4.

// And finally, we must consider we are interested in which the
// subject performs better than what would expected by chance.
// In other words, that the person's probability of predicting
// a card is higher than the chance hypothesized value of 1/4.

BinomialTest test = new BinomialTest(
successes: 13, trials: 25,
hypothesizedProbability: 1.0 / 4.0,
alternate: OneSampleHypothesis.ValueIsGreaterThanHypothesis);

Console.WriteLine("Test p-Value: " + test.PValue); // ~ 0.003
Console.WriteLine("Significant? " + test.Significant); // True.



Problem 2. Worried insurance company



This is a common example with variations given by many sources. Some of them can be found
here
and here.




// An insurance company is reviewing its current policy rates. When the
// company initially set its rates, they believed the average claim amount
// was about $1,800. Now that the company is already up and running, the
// executive directors want to know whether the mean is greater than $1,800.

double hypothesizedMean = 1800;

// Now we have two hypothesis. The null hypothesis (H0) is that there is no
// difference between the initial set value of $1,800 and the average claim
// amount of the population. The alternate hypothesis is that the average
// is greater than $1,800.

// H0 : population mean ≤ $1,800
// H1 : population mean > $1,800

OneSampleHypothesis alternate = OneSampleHypothesis.ValueIsGreaterThanHypothesis;

// To verify those hypothesis, we draw 40 random claims and obtain a
// sample mean of $1,950. The standard deviation of claims is assumed
// to be around $500.

double sampleMean = 1950;
double standardDev = 500;
int sampleSize = 40;

// Let's create our test and check the results
ZTest test = new ZTest(sampleMean, standardDev,
sampleSize, hypothesizedMean, alternate);

Console.WriteLine("Test p-Value: " + test.PValue); // ~0.03
Console.WriteLine("Significant? " + test.Significant); // True.

// In case we would like more information about what was calculated:
Console.WriteLine("z statistic: " + test.Statistic); // ~1.89736
Console.WriteLine("std. error: " + test.StandardError); // 79.05694
Console.WriteLine("test tail: " + test.Tail); // one Upper (right)

Console.WriteLine("alpha level: " + test.Size); // 0.05


Problem 3. Differences among multiple groups (ANOVA)



This example comes from Wikipedia's page on the F-test. Suppose we would like to study the effect of
three different levels of a factor ona response (such as, for example, three levels of a fertilizer
on plant growth. We have made 6 observations for each of the three levels a1, a2 and a3,
and have written the results as in the table below.




double[][] outcomes = new double[,]
{
// a1 a2 a3
{ 6, 8, 13 },
{ 8, 12, 9 },
{ 4, 9, 11 },
{ 5, 11, 8 },
{ 3, 6, 7 },
{ 4, 8, 12 },
}
.Transpose().ToArray();

// Now we can create an ANOVA for the outcomes
OneWayAnova anova = new OneWayAnova(outcomes);

// Retrieve the F-test
FTest test = anova.FTest;

Console.WriteLine("Test p-value: " + test.PValue); // ~0.002
Console.WriteLine("Significant? " + test.Significant); // true

// Show the ANOVA table
DataGridBox.Show(anova.Table);


The last line in the example shows the ANOVA table using the framework's DataGridBox object. The DataGridBox
is a convenience class for displaying DataGridViews just as one would display a message using MessageBox.
The table is shown below:


width="600px" />


 Problem 4. Biased bees



This example comes from the stats page of the College of Saint Benedict and Saint John's University
(Kirkman, 1996). It is a very interesting example
as it shows a case in which a t-test fails to see a difference between the samples because of the non-normality
of of the sample's distributions. The Kolmogorov-Smirnov nonparametric test, on the other hand,
succeeds.



The example deals with the preference of bees between two nearby blooming trees in an empty field. The
experimenter has colelcted data measurinbg how much time does a bee spents near a particular tree. The
time starts to be measured when a bee first touches the tree, and is stopped when the bee moves more
than 1 meter far from it. The samples below represents the measured time, in seconds, of the observed
bees for each of the trees.




double[] redwell =
{
23.4, 30.9, 18.8, 23.0, 21.4, 1, 24.6, 23.8, 24.1, 18.7, 16.3, 20.3,
14.9, 35.4, 21.6, 21.2, 21.0, 15.0, 15.6, 24.0, 34.6, 40.9, 30.7,
24.5, 16.6, 1, 21.7, 1, 23.6, 1, 25.7, 19.3, 46.9, 23.3, 21.8, 33.3,
24.9, 24.4, 1, 19.8, 17.2, 21.5, 25.5, 23.3, 18.6, 22.0, 29.8, 33.3,
1, 21.3, 18.6, 26.8, 19.4, 21.1, 21.2, 20.5, 19.8, 26.3, 39.3, 21.4,
22.6, 1, 35.3, 7.0, 19.3, 21.3, 10.1, 20.2, 1, 36.2, 16.7, 21.1, 39.1,
19.9, 32.1, 23.1, 21.8, 30.4, 19.62, 15.5
};

double[] whitney =
{
16.5, 1, 22.6, 25.3, 23.7, 1, 23.3, 23.9, 16.2, 23.0, 21.6, 10.8, 12.2,
23.6, 10.1, 24.4, 16.4, 11.7, 17.7, 34.3, 24.3, 18.7, 27.5, 25.8, 22.5,
14.2, 21.7, 1, 31.2, 13.8, 29.7, 23.1, 26.1, 25.1, 23.4, 21.7, 24.4, 13.2,
22.1, 26.7, 22.7, 1, 18.2, 28.7, 29.1, 27.4, 22.3, 13.2, 22.5, 25.0, 1,
6.6, 23.7, 23.5, 17.3, 24.6, 27.8, 29.7, 25.3, 19.9, 18.2, 26.2, 20.4,
23.3, 26.7, 26.0, 1, 25.1, 33.1, 35.0, 25.3, 23.6, 23.2, 20.2, 24.7, 22.6,
39.1, 26.5, 22.7
};

// Create a t-test as a first attempt.
var t = new TwoSampleTTest(redwell, whitney);

Console.WriteLine("T-Test");
Console.WriteLine("Test p-value: " + t.PValue); // ~0.837
Console.WriteLine("Significant? " + t.Significant); // false

// Create a non-parametric Kolmogovor Smirnov test
var ks = new TwoSampleKolmogorovSmirnovTest(redwell, whitney);

Console.WriteLine("KS-Test");
Console.WriteLine("Test p-value: " + ks.PValue); // ~0.038
Console.WriteLine("Significant? " + ks.Significant); // true


Problem 5. Comparing classifier performances



The last example comes from (E.
Ientilucci, 2006
) and deals with comparing the performance of two different raters (classifiers)
to see if their performance are significantly different.



 Suppose an experimenter has two classification systems, both trained to classify observations
into one of 4 mutually exclusive categories. In order to measure the performance of each classifier,
the experimenter confronted their classification labels with the ground truth for a testing dataset,
writing the respective results in the form of contingency tables.



The hypothesis to be tested is that the performance of the two classifiers are the same.



             
// Create the confusion matrix for the first sytem.
var a = new GeneralConfusionMatrix(new class="kwrd">int[,]]]
{
{ 317, 23, 0, 0 },
{ 61, 120, 0, 0 },
{ 2, 4, 60, 0 },
{ 35, 29, 0, 8 },
});

// Create the confusion matrix for the second system.
var b = new GeneralConfusionMatrix(new class="kwrd">int[,]
{
{ 377, 79, 0, 0 },
{ 2, 72, 0, 0 },
{ 33, 5, 60, 0 },
{ 3, 20, 0, 8 },
});


var test = new TwoMatrixKappaTest(a, b);

Console.WriteLine("Test p-value: " + test.PValue); // ~0.628
Console.WriteLine("Significant? " + test.Significant); // false


In this case,
the test didn't show enough evidence to confidently reject the null hypothesis. Therefore, one
should restrain from affirming anything about differences between the two systems, unless the power
for the test is known.



Unfortunately I could not find a clear indication in the literature about the power of a two matrix Kappa test. However,
since the test statistic is asymptotically normal, one would try checking the power for this test by
analysis the power of the underlying Z-test. If there is enough power, one could possibly accept the
null hypothesis that there are no large differences between the two systems.







Suggestions



As always, I expect the above discussion and examples could be useful for interested readers and
users. However, if you believe you have found a flaw or would like to discuss any portion of this
post, please feel free to do so by posting on the comments section.




PS: The classroom example uses a T-test to test for differences in populations means. The T-Test assumes a normal distribution. The data, however, it not exactly normal, since it is crippled between 0 and 10. Suggestions for a better example would also be appreciated!








References



R. A. Fisher, 1925. Statistical Methods for Research Workers. Available online from:
http://psychclassics.yorku.ca/Fisher/Methods/



M. K. Smith, 2011. Common mistakes in using statistics: Spotting and Avoiding Them - Power of a
Statistical Procedure. Available online from:
http://www.ma.utexas.edu/users/mks/statmistakes/power.html



M. K. Smith, 2011b. Common mistakes in using statistics: Spotting and Avoiding Them - Detrimental
Effects of Underpowered or Overpowered Studies. Available online from:
http://www.ma.utexas.edu/users/mks/statmistakes/UnderOverPower.html



L. Thomas, F. Juanes, 1996. The importance of statistical power analysis: an example from animal behaviour,
Animal Behaviour, Volume 52, Issue 4, October., Pages 856-859. Available online from: http://otg.downstate.edu/downloads/2007/Spring07/thomas.pdf



C. H. (Alex) Yu, 2012. Don't believe in the null hypothesis? Available online from:
http://www.creative-wisdom.com/computer/sas/hypothesis.html



D. Beaulieu-Prévost, 2005. Statistical decision and falsification in science : going beyond the null
hypothesis. Séminaires CIC. Université de Montréal.



T.W. Kirkman, 1996. Statistics to Use. Acessed July 2012. Avalilable online from
http://www.physics.csbsju.edu/stats/



E. Ientilucci, 2006. "On Using and Computing the Kappa Statistic". Available online from http://www.cis.rit.edu/~ejipci/Reports/On_Using_and_Computing_the_Kappa_Statistic.pdf

Monday, 28 May 2012

Accord.NET has a new home!

As some might have noticed, Origo is going away until the end of the month. Origo maintainers were kind enough to support previous project owners with a tarball containing the entire contents from the hosted projects upon request; this means all Accord.NET content previously hosted on Origo can be gradually restored at Google Code one the next months.




The Accord.NET Framework is now at http://code.google.com/p/accord/



With the hosting change there also comes a few updates. Version 2.7 is about to be released soon; this version now includes the highly requested Bag of Visual Words (BoW) based on the SURF feature point detector. This feature can be used to extract fixed-length feature vectors from images and train standard methods such as Support Vector Machines to perform image classification. Initial experiments with the χ² (chi-square) kernel have shown promising results.






The Accord.NET Framework project page, now hosted at Google Code.



Other hopefully useful features include new learning algorithms for hidden Markov models, such as the Viterbi training and the Maximum Likelihood Estimation training; and the introduction of the ISampleable interface for specifying distributions which can be sampled efficiently (i.e. now it is possible to drawn random samples from some distributions - including hidden Markov models). Other nice addition is the inclusion of the ScatterplotBox class for displaying data points in a scatterplot in a similar way to MessageBox.Show. It comes at hand to quickly inspect some spacial data in the middle of other calculations or even in live demonstrations.



As others might also have noticed, this site's design also has changed, hopefully for the better.



As usual, I hope someone will find the new release useful :-)

Thursday, 5 April 2012

Quadratic Programming in C#


I have manually translated and adapted the QuadProg solver for quadratic programming problems made by Berwin A. Turlach. His code was originally published under the GNU Library License, which has now been superseded by the GNU Lesser License. This adapted version honors the original work and is thus distributed under the same license.




Introduction



Despite the name, the terms linear or quadratic programming have little resemblance to the set of activities most people now know as programming. Those terms usually usually refers to a specific set of function optimization methods, i.e. methods which can be used to determine the maximum or minimum points of special kinds of functions under a given number of solution constraints. For example, suppose we would like to determine the minimum value of the function:



f(x, y) = 2x + y + 4



Under the constraints that x and y must be non-negative (i.e. either positive or zero). This may seem fairly simple and trivial, but remember that practical linear programming problems may have hundreds or even thousands of variables and possibly million constraints.



When the problem to be solved involves a quadratic function instead of a linear function, but still presents linear constraints, this problem can be cast as a quadratic programming problem. Quadratic functions are polynomial functions in each each term may have at most a total degree of 2. For example, consider the function



f(x, y, z) = 2x² + 5xy + y² - z² + x – 5.



Now let's check the sum of the degrees for each variable on the polynomial terms. We start by writing the missing terms of the polynomial



f(x, y, z) = 2 x2y0z0 + 5 x1y1z0 + 2 x0y2z0 - x0y0z2 + x1y0z0 - 5 x0y0z0



and then proceed to check the sum of the degrees at each term. In the first term, 2+0+0 = 2. For the second, 1+1+0 = 2, and so on. Those functions have a nice property that they can be expressed in a matrix form



f(x) = 1/2 xT Q x + cTx.



Here, x and c are vectors. The matrix Q is a symmetric matrix specifying how the variables combine in the quadratic terms of the function. If this matrix is positive definite, then the function is convex, and the optimization has a single, unique optimum (we say it has a global optimum). The coefficients c specify the linear terms of our function.



Source code



The available source code is based on a translation of the Fortran code written by Berwin A. Turlach. However, some modifications have been made. Fortran uses column-major ordering for matrices, meaning that matrices are stored in memory in sequential order of column elements. Almost all other languages use row-major ordering, including C, C++, Java and C#. In order to improve data locality, I have modified the code to use the transpose of the original matrices D and A. I have also modified the QP formulation adopted in the Goldfarb and Idnani paper to reflect the form presented in the introduction.



This code is part of the Accord.NET Framework. However, the version available in this blog post will most likely not be the most recently, updated, fixed and enhanced version of the code. For the latest version, be sure to download the latest version of the framework on the project site or through a NuGet package.



Using the code



The first step in solving a quadratic programming problem is, well, specifying the problem. To specify a quadratic programming problem, one would need two components: a matrix D describing the relationship between the quadratic terms, and a vector d describing the linear terms. Perhaps this would work better with an example.



Suppose we are trying to solve a minimization problem. Given a function, the goal in such problems is to find the correct set of function arguments which would result in the minimum possible value for the function. An example of a quadratic minimization problem is given below:











min f(x, y) = 2x² - xy + 4y² - 5x – 6y



subject to the constraints:



x - y = 5


x = 10




wolfram
(generated with Wolfram Alpha)


However, note that this problem involves a set of constraints. The required solution for this minimization problem is required to lie in the interval specified by the constraints. More specifically, any x and y pair candidate for being a minimal of the function must respect the relations x - y = 5 and x >= 10. Thus, instead of lying in the unconstrained minimum of the function surface shown above, the solution lies slightly off the center of the surface. This is an obvious easy problem to solve manually, but it will fit for this demonstration.




wolfram2



As it can be seen (and also live demonstrated by asking Wolfram Alpha) the solution lies on the point (10,5), and the constrained minimum of the function is given by 170. So, now that we know what a quadratic programming problem looks like, how can we actually solve it?



Specifying the objective function



The first step in solving a quadratic programming problem is to specify the objective function. Using this code, there are three ways to specify it. Each of them has their own advantages and disadvantages.



1. Manually specifying the QP matrix.



This is the most common approach for numerical software, and probably the most cumbersome for the user. The problem matrix has to be specified manually. This matrix is sometimes denoted Q, D or H as it actually denotes the Hessian matrix for the problem.



The matrix Q is used to describe the quadratic terms of our problem. It is a n x n matrix, in which n corresponds to the number of variables in our problem, covering all possible combinations of variables. Recall our example given on the start of this section. We have 2 variables, x and y. Thus, our matrix Q is 2 x 2. The possible combinations for x and y are expressed in the table below.





















 

x

y

x

x*x

x*y

y

y*x

y*y


To form our matrix Q, we can take all coefficients associated with each pair mentioned on the table above. The diagonal elements should also be multiplied by two (this is actually because the matrix is the Hessian matrix of the problem: it is the matrix of all second-order derivatives for the function. Since we have only at most quadratic terms, the elementary power rule of derivation “drops” the ² from the x² and y² terms – I think a mathematician would hit me with a stick for explaining it like this, but it serves well for a quick, non-technical explanation).



Remember our quadratic terms were 2x²  - 1xy + 4y². Writing the terms on their proper position and differentiating, we have:



 


As it can be seen, the matrix is also symmetric (and often, but not always, positive definite). The next step, more trivial, is to write a vector d containing the linear terms. The linear terms are –5x –6y, and thus our vector d can be given by:





Therefore our C# code can be created like this:



    double[,] Q =
{
{ +4, -1 },
{ -1, +8 },
};

double[] d =
{
-5,
-6
};




2. Using lambda expressions



This approach is a bit more intuitive and less error prone. However, it involves lambdas functions and some people find it hard to follow them. Another disadvantage is that we will lose the edit & continue debugging ability of visual studio. The advantage is that the compiler may catch some obvious syntax errors automatically.




    double x = 0, y = 0;

QuadraticObjectiveFunction f = // 2x² - xy + 4y² - 5x - 6y
new QuadraticObjectiveFunction(() => 2*(x*x) - (x*y) + 4*(y*y) - 5*x - 6*y);



Note that the x and y variables could have been initialized to any value. They are only used as symbols, and not used in any computations.



3. Using text strings



This approach is more intuitive but a bit more error prone. The function can be specified using strings, as in a standard mathematical formula. However, since all we have are strings, there is no way to enforce static, compile time checking.




    QuadraticObjectiveFunction f = 
new QuadraticObjectiveFunction("2x² - xy + 4y² - 5x - 6y");



Couldn’t be easier.



Specifying the constraints



The next step in specifying a quadratic programming problem is to specify the constraints. The constraints can be specified in almost the same way as the objective function.



1. Manually specifying the constraints matrix



The first option is to manually specify the constraints matrix A and vector b. The constraint matrix expresses the way the variables should be combined when compared to corresponding value on vector b. It is possible to specify either equality constraints or inequality constraints. The formulation used in this code is slightly different from the one used in Turlach’s original implementation. The constraints are supposed to be in the form:



A1 x = b1


A2 x = b2



This means that each line of matrix A expresses a possible combination of variables x which should be compared to the corresponding line of b. An integer variable m can be specified to denote how many of the first rows of matrix A should be treated as equalities rather than inequalities. Recall that in our example the constraints are given by 1x -1y = 5 and 1x = 10. Lets write this down in a tabular form:



























#

x

y

?

b

q1

1

-1

=

5

q2

1

0

=

10


Thus our matrix A and vector b can be specified as:

















And not forgetting that m = 1, because the first constraint is actually an equality.



2. Using classes and objects



A more natural way to specify constraints is using the classes and objects of the Accord.NET Framework. The LinearConstraint class allows one to specify a single constraint using an object-oriented approach. It doesn’t have the most intuitive usage on earth, but has much more expressiveness. It can also be read aloud, it that adds anything! :-)




    List<LinearConstraint> list = new List<LinearConstraint>();

list.Add(new LinearConstraint(numberOfVariables: 1)
{
VariablesAtIndices = new[] { 0 }, // index 0 (x)
ShouldBe = ConstraintType.GreaterThanOrEqualTo,
Value = 10
});

list.Add(new LinearConstraint(numberOfVariables: 2)
{
VariablesAtIndices = new int[] { 0, 1 }, // index 0 (x) and index 1 (y)
CombinedAs = new double[] { 1, -1 }, // when combined as 1x -1y
ShouldBe = ConstraintType.EqualTo,
Value = 5
});



The specification is centered around the notion that variables are numbered and have an associated index. For example, x is the zero-th variable of the problem. Thus x has an index of 0 and y has an index of 1. So for example, reading aloud the last constraint, it is possible to express how the variables at indices 0 and 1, when combined as 1x and –1y, should be equal to value 5.



2. Using lambda expressions



A more intuitive way to express constraints is again using lambda expressions. And again the problems are the same: some people find it hard to follow and we lose edit & continue.




    var constraints = new List<LinearConstraint>();
constraints.Add(new LinearConstraint(f, () => x - y == 5));
constraints.Add(new LinearConstraint(f, () => x >= 10));



3. Using text strings



Same as above, but with strings.




    var constraints = new List<LinearConstraint>();
constraints.Add(new LinearConstraint(f, "x - y = 5"));
constraints.Add(new LinearConstraint(f, "x >= 10"));



Finally, creating and solving the problem



Once we have specified what do we want, we can now ask the code for a solution. In case we have opted for manually specifying the matrix A, vector b and integer m, we can use:




    // Create the optimization problem
var solver = new GoldfarbIdnaniQuadraticSolver(numberOfVariables:2, A, b, m);



In case we have opted for creating a list of constraints instead, we can use:




    // Create our optimization problem
var solver = new GoldfarbIdnaniQuadraticSolver(numberOfVariables: 2, constraints: list);



After the solver object has been created, we can call Minimize() to solve the problem. In case we have opted for manually specifying Q and d, we can use:


    // Attempt to solve the problem
double minimumValue = solver.Minimize(Q, d);



And in case we have opted for creating a QuadraticObjectiveFunction object, we can use:




    // Attempt to solve the problem
double minimumValue = target.Minimize(f);



In either case, the solution will be available in the Solution property of the solver object, and will be given by:




    double value = solver.Value;    // f(x,y) = 170
double x = solver.Solution[0]; // x = 10
double y = solver.Solution[1]; // y = 5



Sample application



The Accord.NET Framework now includes a sample application demonstrating the use of the Goldfarb-Idnani Quadratic Programming Solver. It can be downloaded at the Accord.NET Framework site, and also comes together with recent versions of the framework (> 2.6).






Solver



Solver sample application included in the Accord.NET Framework.



Remarks



Because the code has been translated by hand (in contrast of using automatic translators such as f2c) there could be potential bugs in the code. I have tested the code behavior against R’s quadprog package and still didn’t find errors. But this does not mean the code is bug-free. As always, as is the case of everything else in this blog, this code is published in good faith, but I can not guarantee the correctness of everything. Please read the disclaimer for more information.



References