I have written a library (HaarCascadeClassifier.dll) that makes object detection possible in .NET without any other library requirements. It is an open source project that contains an implementation of the Viola-Jones object detection algorithm. The library uses Haar cascades generated by OpenCV (XML files) to detect particular objects such as faces. In this post, I share HaarCascadeClassifier.dll and its usage.
Introduction
The most popular and the fastest implementation of the Viola-Jones object detection algorithm is undoubtedly the one in OpenCV. However, OpenCV requires wrapper classes to be usable from .NET languages, and .NET Bitmap objects have to be converted to the IplImage format before they can be used with OpenCV. Moreover, programs that use OpenCV depend on all of the OpenCV libraries and their wrappers. None of this is a problem when the full functionality of OpenCV is needed. But if we only need to detect objects on a Bitmap, it isn't worth making our programs dependent on all of the OpenCV libraries and wrappers...
I have written a library (HaarCascadeClassifier.dll) that makes object detection possible in .NET without any other library requirements. It is an open source project that contains an implementation of the Viola-Jones object detection algorithm. The library uses Haar cascades generated by OpenCV (XML files) to detect particular objects such as faces.
It can be used for object detection, or simply to understand the algorithm and how its parameters affect the results and the speed.
Background
In fact, my purpose here is to share HaarCascadeClassifier.dll and its usage, so I will only summarize the algorithm.
The Viola-Jones algorithm doesn't use pixels directly to detect objects. It uses rectangular features that are called "haar-like features". These features can be represented using 2, 3, or 4 rectangles.
Haar-like Features
The value of a haar-like feature is calculated as the difference between the average of the pixels under the rectangle that covers the whole feature and the average of the pixels under the black rectangles. Each rectangle is represented by 5 values: x, y, width, height, and weight. x and y are the coordinates where the rectangle is placed on the searching window, width and height are the size of the rectangle, and weight is multiplied by the sum of all the pixels inside the rectangle. (Therefore, we can say that the weight values of the black rectangles are negative.)
Cascades and Stages
A cascade of weak classifiers is used to determine whether a subwindow of an image contains the searched object. The structure of a cascade can be represented programmatically as follows:
Public Structure FeatureRect
    Public Rectangle As Rectangle   ' Position and size of the rectangle inside the searching window
    Public Weight As Single         ' Weight applied to the sum of the pixels inside the rectangle
End Structure

Public Structure Node
    Public FeatureRects As List(Of FeatureRect) ' Rectangles that make up the feature of this node
    Public Threshold As Single                  ' Node threshold the feature value is compared against
    Public LeftVal As Single                    ' Added to the stage value when this is a leaf and the feature value is below the threshold
    Public RightVal As Single                   ' Added to the stage value when this is a leaf and the feature value is not below the threshold
    Public HasLNode As Boolean                  ' True if a left child node exists
    Public LeftNode As Integer                  ' Index of the left child node
    Public HasRNode As Boolean                  ' True if a right child node exists
    Public RightNode As Integer                 ' Index of the right child node
End Structure

Public Structure Tree
    Public Nodes As List(Of Node)               ' Nodes of the tree, traversed from the root
End Structure

Public Structure Stage
    Public Trees As List(Of Tree)               ' Weak classifiers of the stage
    Public Threshold As Single                  ' Stage threshold the accumulated stage value is compared against
End Structure

Public Stages As List(Of Stage)                 ' All stages of the cascade
Features are applied to a subimage until the subimage is rejected by one of the stages or passes all of them.
The value of a node is calculated as the sum of all the feature values inside the node, and it is used to determine either the next node to visit or the value that will be added to the stage value. If the current node is a leaf (that is, it has no child nodes), then the left value (LeftVal) is added to the stage value when the node value is less than the node threshold, and the right value (RightVal) otherwise. If the current node is not a leaf, the next node (LeftNode if the feature value is smaller than the threshold, RightNode otherwise) is selected according to the value of the current node, and its features are applied to the same subimage. After this is done for each tree in the current stage, the calculated stage value is compared with the stage threshold. If it is less, the subimage doesn't contain the searched object and it is eliminated immediately, without applying the remaining stages.
Integral Image
The integral image is an approach that speeds up the algorithm. The integral image at a location is the sum of all pixels above and to the left of that location (CalculateCumSums32bpp(), CalculateCumSums24bpp(), CalculateCumSums8bpp()). It is used to calculate the sum of the pixels within a rectangle very rapidly.
For example, the sum of the pixels under the hatched rectangle can be calculated as IntegralImage(D) - IntegralImage(B) - IntegralImage(C) + IntegralImage(A), where A, B, C, and D are the corners of the rectangle.
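A minimal C# sketch of the idea (not the library's implementation): the integral image is built once, and afterwards any rectangle sum needs only four lookups.

// Minimal sketch (not the library's actual code).
// integral[y, x] holds the sum of all pixels above and to the left of (x, y).
long[,] BuildIntegralImage(byte[,] gray, int width, int height)
{
    long[,] integral = new long[height + 1, width + 1];
    for (int y = 1; y <= height; y++)
        for (int x = 1; x <= width; x++)
            integral[y, x] = gray[y - 1, x - 1]
                           + integral[y - 1, x]      // sum of the rows above
                           + integral[y, x - 1]      // sum of the columns to the left
                           - integral[y - 1, x - 1]; // subtract the doubly counted region
    return integral;
}

// Sum of the pixels inside a rectangle with top-left corner (x, y) and size w x h:
// I(D) - I(B) - I(C) + I(A), where A, B, C, D are its corners.
long RectSum(long[,] integral, int x, int y, int w, int h)
{
    return integral[y + h, x + w] - integral[y, x + w] - integral[y + h, x] + integral[y, x];
}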
Variance Normalization
During the training process, the variance of each training image is normalized to reduce the effect of different lighting conditions. Therefore, the same normalization must be done for every subwindow during detection. Alternatively, the thresholds can be normalized instead of the image variance, which is faster: the thresholds are simply multiplied by the standard deviation of the pixels within the current subwindow.
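For instance, the standard deviation of a subwindow can be obtained cheaply from two integral images, one of the pixels and one of the squared pixels (a C# sketch, not the library's code):

// Sketch (not the library's code): standard deviation of a subwindow, derived from
// the sum of its pixels (sum) and the sum of its squared pixels (sqSum), both of
// which can be read from integral images with four lookups each.
double WindowStdDev(long sum, long sqSum, int pixelCount)
{
    double mean = sum / (double)pixelCount;
    double variance = sqSum / (double)pixelCount - mean * mean;
    return variance > 0 ? Math.Sqrt(variance) : 1.0;  // guard against negative values caused by rounding
}
// Node thresholds are then multiplied by this value instead of normalizing the pixels themselves.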
Scanning Process
The scanning process is an important part of the algorithm. Almost all parameters required by the Detect() function of the library are used to optimize this process.
During the scanning process, classifiers are applied to all subimages using a scaled searching window. (Features are also scaled.)
While Scaler < MaxScale
    ....
    Scaler = Scaler * ScaleMult
End While
The step size of the searching window is calculated by multiplying SlidingRatio by the current searching window width. It means each step of the searching window moves it by StepSize pixels.
For i = 0 To ImageWidth - SearchingWinWidth - 1 Step StepSize
    For j = 0 To ImageHeight - SearchingWinHeight - 1 Step StepSize
        ...
    Next
Next
Average of Nested Rectangles
Objects can be detected more than once, at different scales. This causes the rectangles that represent the locations of objects to be nested.
The average of these rectangles must be calculated, and all the nested rectangles must be eliminated from the results that will be returned.
If the distances between the edges of two rectangles are short enough, they are considered to be nested. The distance threshold is obtained by multiplying sizeMultForNesRectCon by the size of one of these rectangles (AreTheyNested()).
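The check can be sketched roughly like this (a hypothetical C# illustration of the idea behind AreTheyNested(), not its actual code):

// Hypothetical sketch: two rectangles are treated as nested (the same object) when
// each pair of corresponding edges is closer than a size-dependent distance threshold.
bool AreNested(Rectangle a, Rectangle b, float sizeMultForNesRectCon)
{
    float maxDist = a.Width * sizeMultForNesRectCon;   // distance threshold derived from the rectangle size
    return Math.Abs(a.Left - b.Left) < maxDist &&
           Math.Abs(a.Top - b.Top) < maxDist &&
           Math.Abs(a.Right - b.Right) < maxDist &&
           Math.Abs(a.Bottom - b.Bottom) < maxDist;
}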
Using the Library
The library is easy to use in all .NET languages. An example is given below in C#. More examples can be downloaded with compiled executables from here.
Parameters
| Parameter | Description |
| --- | --- |
| MaxDetCount | The detector stops searching when the number of detected objects reaches this value. |
| MinNRectCount | The minimum number of neighbouring detections that have to pass the detector to verify the existence of the searched object. |
| FirstScale | The first scaler of the searching window. |
| MaxScale | The maximum scaler of the searching window. |
| ScaleMult | The current scaler is multiplied by ScaleMult (scale multiplier) to obtain the next scaler. |
| SizeMultForNesRectCon | SizeMultForNesRectCon (size multiplier for nested object control) is multiplied by the size of each rectangle separately to obtain the maximum acceptable horizontal and vertical distances between the current rectangle and the others. These maximum distances are used to check whether rectangles are nested or not. |
| SlidingRatio | The ratio of the step size to the scaled searching window width. (CurrentStepSize = ScaledWindowWidth * SlidingRatio) |
| Pen | A Pen object used to draw rectangles on the given bitmap. If it is null, nothing is drawn. |
An Example
Preparing the required parameters:
int maxDetCount = Int32.MaxValue;
int minNRectCount = 0;
float firstScale = detector.Size2Scale(100);
float maxScale = detector.Size2Scale(400);
float scaleMult = 1.1f;
float sizeMultForNesRectCon = 0.3f;
float slidingRatio = 0.2f;
Pen pen = new Pen(Brushes.Red, 4);
HaarCascadeClassifer.HaarDetector.DetectionParams detectorParameters;
detectorParameters =
new HaarCascadeClassifer.HaarDetector.DetectionParams(maxDetCount, minNRectCount,
firstScale, maxScale, scaleMult, sizeMultForNesRectCon, slidingRatio, pen);
Loading an embedded XML haar cascade:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(HaarCascadeClassifer.My.Resources.Resources.haarcascade_frontalface_alt);
HaarDetector detector = new HaarDetector(xmlDoc);
Loading a haar cascade from an external XML file:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("haarcascade_frontalface_alt.xml");
HaarDetector detector = new HaarDetector(xmlDoc);
Detection:
Bitmap bmp = new Bitmap(@"C:\bitmap.bmp");
HaarCascadeClassifer.HaarDetector.DResults results =
detector.Detect(ref bmp, detectorParameters);
"results
" contains the following:
results.NOfObjects
results.DetectedOLocs
results.SearchedSubRegionCount
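For example, assuming DetectedOLocs holds the rectangles of the detected objects, the results could be consumed like this (a hedged sketch; the exact member types may differ):

// Sketch: report how many objects were found and where.
// Assumes DetectedOLocs is a collection of System.Drawing.Rectangle values.
Console.WriteLine("Detected objects: " + results.NOfObjects);
foreach (Rectangle loc in results.DetectedOLocs)
{
    Console.WriteLine("Object at (" + loc.X + ", " + loc.Y + "), size " +
                      loc.Width + "x" + loc.Height);
}
// Since a Pen was supplied in the parameters, the rectangles have already been
// drawn on "bmp", so the bitmap can simply be saved or displayed.
bmp.Save(@"C:\detected.bmp");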
History
- 14th August, 2012: Initial version