What is the difference between training data, validation data, and test data?

Question

Accepted Answer

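Training data is used to fit the model's parameters. Validation data is used during development to tune hyperparameters and to choose between candidate models or architectures. Test data is held out completely until the end, then used once to give an unbiased estimate of how the final model performs on unseen data. The three sets must not overlap: if validation or test examples leak into training, or if the test set is used to make tuning decisions, the reported performance is optimistically biased and overstates how well the model will generalize.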
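A minimal sketch of such a split, using only the Python standard library. The function name `three_way_split`, the 70/15/15 proportions, and the fixed seed are illustrative assumptions, not part of the answer above; in practice a library helper is often used instead.

```python
import random


def three_way_split(items, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")

    shuffled = list(items)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)

    test = shuffled[:n_test]                 # touched once, at the very end
    val = shuffled[n_test:n_test + n_val]    # used for tuning and model selection
    train = shuffled[n_test + n_val:]        # used to fit model parameters
    return train, val, test


# Hypothetical dataset: 100 example IDs.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

Because each example lands in exactly one slice of the shuffled list, no example can appear in more than one set. Random shuffling assumes the examples are independent; time-ordered or grouped data usually needs a split that respects that structure.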

Related Interview Questions

Can you explain the difference between supervised and unsupervised learning?

What is Artificial Intelligence and how does it function?

How do you handle missing or inconsistent data in a dataset?