Browse Definitions :
Definition

validation set

Contributor(s): Matthew Haughn

A validation set is a set of data used to train artificial intelligence (AI) with the goal of finding and optimizing the best model to solve a given problem. Validation sets are also known as dev sets.

supervised AI is trained on a corpus of training data. Training, tuning, model selection and testing are performed with three different datasets: the training set, the validation set and the testing set. Validation sets are used to select and tune the final AI model.

Training sets make up the majority of the total data, averaging 60 percent. In testing, the models are fit to parameters in a process that is known as adjusting weights.

The validation set makes up about 20 percent of the bulk of data used. The validation set contrasts with training and test sets in that it is an intermediate phase used for choosing the best model and optimizing it. Validation is sometimes considered a part of the training phase. It is in this phase that parameter tuning occurs for optimizing the selected model. Overfitting is checked and avoided in the validation set to eliminate errors that can be caused for future predictions and observations if an analysis corresponds too precisely to a specific dataset.

Testing sets make up 20 percent of the bulk of the data. These sets are ideal data and results with which to verify correct operation of an AI. The test set is ensured to be the input data grouped together with verified correct outputs, generally by human verification. This ideal set is used to test results and assess the performance of the final model.

It is generally considered unwise to attempt further adjustment past the testing phase. Attempting to add further optimization outside the validation phase will likely to increase overfitting.

This was last updated in April 2018

Continue Reading About validation set

Start the conversation

Send me notifications when other members comment.

Please create a username to comment.

-ADS BY GOOGLE

File Extensions and File Formats

SearchCompliance

  • Whistleblower Protection Act

    The Whistleblower Protection Act of 1989 is a law that protects federal government employees in the United States from ...

  • smart contract

    A smart contract, also known as a cryptocontract, is a computer program that directly controls the transfer of digital currencies...

  • risk map (risk heat map)

    A risk map, also known as a risk heat map, is a data visualization tool for communicating specific risks an organization faces. A...

SearchSecurity

  • certificate authority (CA)

    A certificate authority (CA) is a trusted entity that issues digital certificates, which are data files used to cryptographically...

  • hacktivism

    Hacktivism is the act of hacking, or breaking into a computer system, for a politically or socially motivated purpose.

  • advanced persistent threat (APT)

    An advanced persistent threat (APT) is a prolonged and targeted cyberattack in which an intruder gains access to a network and ...

SearchHealthIT

  • Cerner Corp.

    Cerner Corp. is a public company in North Kansas City, Mo., that provides various health information technologies, ranging from ...

  • clinical decision support system (CDSS)

    A clinical decision support system (CDSS) is an application that analyzes data to help healthcare providers make decisions and ...

  • Health IT (health information technology)

    Health IT (health information technology) is the area of IT involving the design, development, creation, use and maintenance of ...

SearchDisasterRecovery

  • tabletop exercise (TTX)

    A tabletop exercise (TTX) is a disaster preparedness activity that takes participants through the process of dealing with a ...

  • risk mitigation

    Risk mitigation is a strategy to prepare for and lessen the effects of threats faced by a data center.

  • ransomware recovery

    Ransomware recovery is the process of resuming options following a cyberattack that demands payment in exchange for unlocking ...

SearchStorage

  • file system

    In a computer, a file system -- sometimes written filesystem -- is the way in which files are named and where they are placed ...

  • storage virtualization

    Storage virtualization is the pooling of physical storage from multiple storage devices into what appears to be a single storage ...

  • cache (computing)

    A cache -- pronounced CASH -- is hardware or software that is used to store something, usually data, temporarily in a computing ...

Close