The exact sample complexity of PAC-learning problems with unit VC dimension

Paul W. Goldberg

    Research output: Working paper

    Abstract

    The Vapnik-Chervonenkis (VC) dimension is a combinatorial measure of a certain class of machine learning problems, which may be used to obtain upper and lower bounds on the number of training examples needed to learn to prescribed levels of accuracy. Most of the known bounds apply to the Probably Approximately Correct (PAC) framework, the framework used in this paper. For a learning problem with some known VC dimension, much is known about the order of growth of the problem's sample-size requirement as a function of the PAC parameters. The exact value of the sample-size requirement is, however, less well known, and depends heavily on the particular learning algorithm being used. This is a major obstacle to the practical application of the VC dimension. Hence it is important to know exactly how the sample-size requirement depends on the VC dimension, and with that in mind, we describe a general algorithm for learning problems having VC dimension 1. Its sample-size requirement is minimal (as a function of the PAC parameters) and turns out to be the same for all non-trivial learning problems having VC dimension 1. While the method cannot be naively generalised to higher VC dimension, it suggests that optimal algorithm-dependent bounds may improve substantially on current upper bounds.
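
    For orientation: the abstract does not quote the bounds it refers to, so the following is a sketch of the classical general-purpose PAC sample-complexity bounds from the literature (an assumption on our part, not the exact result derived in this working paper). For a concept class of VC dimension d, accuracy epsilon and confidence 1 - delta:

    % A sketch of the classical PAC bounds (assumption: standard results from the
    % literature, not the exact sample sizes obtained in this working paper).
    \[
      m(\epsilon,\delta) = O\!\left(\frac{1}{\epsilon}\Bigl(d \log\frac{1}{\epsilon} + \log\frac{1}{\delta}\Bigr)\right)
      \qquad\text{and}\qquad
      m(\epsilon,\delta) = \Omega\!\left(\frac{d + \log\frac{1}{\delta}}{\epsilon}\right),
    \]
    % so for d = 1 the two bounds differ by a factor of log(1/epsilon); an exact,
    % algorithm-specific analysis of the kind described here aims to close that gap.
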
    Original language: English
    Place of publication: Aston Triangle, Birmingham B4 7ET, UK
    Publisher: Aston University
    Number of pages: 12
    Publication status: Unpublished - Dec 1996

    Keywords

    • Vapnik-Chervonenkis (VC) dimension
    • machine learning
    • Probably Approximately Correct (PAC) framework
    • optimal algorithm-dependent bounds

    Cite this

    Goldberg, P. W. (1996). The exact sample complexity of PAC-learning problems with unit VC dimension. Aston Triangle, Birmingham B4 7ET, UK: Aston University.