- categories: Data Science, Definition
Definition:
The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity or complexity of a hypothesis class in statistical learning theory. Specifically, it is the largest number of points that can be shattered by the hypothesis class .
A set of points is shattered by if, for every possible binary labeling of those points, there exists a hypothesis that correctly classifies them.
Formal Definition:
The VC dimension of a hypothesis class , denoted , is the size of the largest finite set such that can be shattered by . If can shatter arbitrarily large sets, .
Key Concepts:
-
Shattering:
A hypothesis class shatters a set if for every possible labeling of (i.e., labelings), there exists a hypothesis that perfectly separates the points according to that labeling. -
Examples:
- A hypothesis class of linear classifiers in can shatter any 3 non-collinear points but cannot shatter 4 points in general position. Thus, .
- A hypothesis class of all functions can shatter any finite set, so .
Properties:
-
Relation to Learning:
The VC dimension quantifies the capacity of .- A higher VC dimension implies a more expressive hypothesis class, which can fit more complex data but is more prone to overfitting.
- A lower VC dimension implies a less expressive hypothesis class, which may underfit.
-
Generalization Bound:
If , then with high probability, the generalization error is bounded as:
where is the number of training samples, and is the confidence level. -
Finite VC Dimension:
If has finite VC dimension, it satisfies the PAC (Probably Approximately Correct) learning framework.
Applications:
-
Model Selection:
Helps compare the complexity of hypothesis classes (e.g., linear classifiers, decision trees). -
Generalization Analysis:
Used to derive theoretical guarantees on a model’s ability to generalize beyond training data. -
Support Vector Machines (SVMs):
The VC dimension is related to the margin of separation, impacting the capacity of SVMs.
Examples:
- Linear Classifiers in :
- VC dimension = 3 (3 points in general position can be shattered).
- Axis-Aligned Rectangles in :
- VC dimension = 4 (4 points in a specific arrangement can be shattered).
- Polynomial Classifiers of Degree :
- VC dimension = in (related to the number of parameters).