


Validation accuracy vs Testing accuracy







I am trying to get my terminology straight, because it appears confusing. I know there are three 'splits' of data used in machine learning models:

  1. Training data - train the model

  2. Validation data - cross-validation for model selection

  3. Testing data - test the generalisation error

Now, as far as I am aware, a separate validation set is not always used, since one can use k-fold cross-validation instead, which avoids having to reduce one's dataset further. The results of this are known as the validation accuracy. Then, once the best model is selected, the model is tested on a 33% split from the initial data set (which has not been used to train). The results of this would be the testing accuracy?

Is this the right way around, or is it vice versa? I am finding conflicting terminology used online! I am trying to find some explanations for why my validation error is larger than my testing error, but before I find a solution, I would like to get my terminology correct.

Thanks.







machine-learning






asked Apr 7 at 18:26









BillyJo_rambler











  • Please also take a look at my answer on a similar post, which explains the key differences, especially the last part on the validation set.
    – Esmailian
    yesterday




























2 Answers
There is no standard terminology in this context (and I have seen long discussions and debates on this topic), so I completely understand your confusion. You should get used to different terminology, and assume that it might not be consistent, or that it might change, across sources.

I would like to point out a few things:

  • I have never seen people use the expression "validation accuracy" (or dataset) to refer to the test accuracy (or dataset), but I have seen people use the term "test accuracy" (or dataset) to refer to the validation accuracy (or dataset). In other words, the test (or testing) accuracy often refers to the validation accuracy, that is, the accuracy you calculate on the data you do not use for training, but that you use (during the training process) for validating (or "testing") the generalisation ability of your model or for "early stopping".

  • In k-fold cross-validation, people usually only mention two datasets: training and testing (or validation).

  • k-fold cross-validation is just a way of validating the model on different subsets of the data. This can be done for several reasons. For example, if you have a small amount of data, a single validation dataset would be quite small, so you can get a better understanding of the model's generalisation ability by validating it on several subsets of the whole dataset.

  • You should likely have a dataset for testing that is separate from the validation dataset, because the validation dataset can be used for early stopping, so, in a certain way, it is dependent on the training process.
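To make the k-fold point concrete, here is a minimal sketch in plain Python of how the folds are formed and how the per-fold validation accuracies are averaged. The function names (`k_fold_indices`, `cross_val_accuracy`) and the toy nearest-class-mean model are made up for illustration; in practice you would probably rely on a library routine instead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k folds of (almost) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_val_accuracy(fit, predict, X, y, k=5):
    """Average validation accuracy of a model over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)


# Hypothetical toy model: classify a number by the nearest class mean.
def fit_means(X, y):
    means = {}
    for label in set(y):
        values = [x for x, t in zip(X, y) if t == label]
        means[label] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    return min(means, key=lambda label: abs(means[label] - x))


X = [float(i) for i in range(20)]
y = [0 if x < 10 else 1 for x in X]
acc = cross_val_accuracy(fit_means, predict_nearest_mean, X, y, k=5)
print(f"mean validation accuracy over 5 folds: {acc:.2f}")
```

Every sample is used for validation exactly once and for training in the other k - 1 rounds, which is why no separate validation split is needed. A held-out test set would still be kept aside before calling anything like this.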


I would suggest using the following terminology:

  • Training dataset: the data used to fit the model.

  • Validation dataset: the data used to validate the generalisation ability of the model or for early stopping, during the training process.

  • Testing dataset: the data used for purposes other than training and validating.

Note that some of these datasets might overlap, but this is almost never a good thing (if you have enough data).
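As a rough illustration of keeping the three datasets separate, the sketch below shuffles the sample indices once and cuts them into disjoint pieces. The helper name `three_way_split` and the 60/20/20 proportions are assumptions chosen for the example, not a recommendation.

```python
import random


def three_way_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Return disjoint (train, validation, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(100)

# The three sets must not share any sample, otherwise the validation or
# test accuracy is computed on data the model has already seen.
assert set(train_idx).isdisjoint(val_idx)
assert set(train_idx).isdisjoint(test_idx)
assert set(val_idx).isdisjoint(test_idx)
print(len(train_idx), len(val_idx), len(test_idx))
```

Because the pieces are slices of one shuffled list, overlap is impossible by construction; the assertions are only there as a cheap sanity check.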















    edited Apr 7 at 23:53

























    answered Apr 7 at 18:52









    nbro











  • If the testing dataset overlaps with either of the others, it is definitely not a good thing. The test accuracy must measure performance on unseen data. If any part of training saw the data, then it isn't test data, and representing it as such is dishonest. Allowing the validation set to overlap with the training set isn't dishonest, but it probably won't accomplish its task as well. (E.g., if you're doing early stopping, and your validation and training sets overlap, overfitting may occur and not be detected.)
    – Ray
    Apr 7 at 23:44

  • @Ray I didn't say it is a good thing. Indeed, see my point "You should likely have a separate (from the validation dataset) dataset for testing...".
    – nbro
    Apr 7 at 23:46

  • You said "If that's a 'good' thing or not, it's another question." I suspected from the rest that you understood the problems that that overlap would cause, but the problems with that should be made very clear, since contaminating your test data with training samples completely ruins its value.
    – Ray
    Apr 7 at 23:48

  • @Ray I wanted more to refer to the overlap between the training and validation datasets. Anyway, I think it's good that you wanted to clarify or emphasise this point. I edited my answer to emphasise this point.
    – nbro
    Apr 7 at 23:51





























    $begingroup$

    @nbro's answer is complete; I am just adding a couple of explanations to supplement it. More traditional textbooks often partition the data into two sets: training and test. In recent years, with more complex models and an increasing need for model selection, development (validation) sets are also used. The development/validation set should have no overlap with the test set, or the reported accuracy/error evaluation is not valid. In the modern setting, the model is trained on the training set and evaluated on the validation set to see whether it is a good fit; the model may then be tweaked, retrained, and revalidated several times. Once the final model is selected, the test set is used to compute the reported accuracy and error. The important thing is that the test set is only touched once.
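
    The workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the synthetic 1-D data, the 60/20/20 split, the small k-nearest-neighbour classifier, and the candidate values of `k` are all made up for the example and use nothing beyond the Python standard library.

```python
import random

random.seed(0)

def make_point():
    # Hypothetical data: class 1 tends to have larger x, with overlapping noise.
    y = random.randint(0, 1)
    x = random.gauss(1.0 if y else -1.0, 1.0)
    return x, y

data = [make_point() for _ in range(300)]
random.shuffle(data)

# Three disjoint subsets: 60% training, 20% validation, 20% test.
train, val, test = data[:180], data[180:240], data[240:]

def knn_predict(train_set, x, k):
    # Majority vote among the k training points closest to x (k is odd, so no ties).
    neighbours = sorted(train_set, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return 1 if 2 * votes > k else 0

def accuracy(train_set, eval_set, k):
    hits = sum(knn_predict(train_set, x, k) == y for x, y in eval_set)
    return hits / len(eval_set)

# Model selection: "tweak and validate again" using the validation set only.
candidates = [1, 3, 5, 9, 15]
val_scores = {k: accuracy(train, val, k) for k in candidates}
best_k = max(val_scores, key=val_scores.get)

# The test set is touched exactly once, after the final model has been chosen.
test_acc = accuracy(train, test, best_k)
print(f"best k = {best_k}, validation = {val_scores[best_k]:.2f}, test = {test_acc:.2f}")
```

    The validation accuracy of the selected `k` is usually a slightly optimistic estimate, because `k` was chosen to maximise it; the single test evaluation is the number to report.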




















    $endgroup$


























        answered Apr 7 at 22:22









        user3089485


































