%% Cell type:markdown id: tags:
Elias Ervelä <br>
Student number 518434 <br>
emerve@utu.fi <br>
Feb 1, 2021 <br>
%% Cell type:markdown id: tags:
# Exercise 1 | TKO_2096 Application of Data Analysis 2021
%% Cell type:markdown id: tags:
#### Nested cross-validation for k-nearest neighbors <br>
- Use Python 3 to program a nested cross-validation for the k-nearest neighbors (kNN) method so that the number of neighbors k is automatically selected from the range 1 to 10. In other words, the base learning algorithm is kNN, but the actual learning algorithm, whose prediction performance will be evaluated with nested CV, is kNN with automatic CV-based model selection (see the lectures and the pseudocode presented in them for more on this interpretation).
- As a kNN implementation, you can use sklearn: http://scikit-learn.org/stable/modules/neighbors.html but your own kNN implementation can also be used if you want to keep more control over what happens in the learning process. The CV implementation should be easily modifiable, since the forthcoming exercises involve different problem-dependent CV variations.
- Use the nested CV implementation on the iris data and report the resulting classification accuracy. Hint: you can use the nested CV example in the sklearn documentation (https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) as a starting point and compare your nested CV implementation against it (a condensed version is sketched right after this list), but do NOT use the ready-made CV implementations of sklearn: the point of the exercise is to learn to split the data on your own. The other exercises need more sophisticated data splitting that is not necessarily available in libraries.
- Return your solution for each exercise BOTH as a Jupyter Notebook file and as a PDF file generated from it.
- Return the report to the course page no later than **Monday 1st of February**.
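%% Cell type:markdown id: tags:
For reference, the ready-made nested CV from the linked sklearn example can be condensed to a few lines. This is a sanity check for comparison only, not a solution to the exercise (which requires manual splitting): `GridSearchCV` performs the inner model selection and `cross_val_score` the outer evaluation.
%% Cell type:code id: tags:
``` python
# Sanity-check sketch using sklearn's ready-made nested CV
# (NOT the exercise solution, which must split the data manually).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
param_grid = {"n_neighbors": list(range(1, 11))}  # candidate values of k
# Inner CV: GridSearchCV selects k; outer CV: cross_val_score evaluates it.
inner_model = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
scores = cross_val_score(inner_model, iris.data, iris.target, cv=5)
print("sklearn reference nested CV accuracy:", scores.mean())
```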
%% Cell type:markdown id: tags:
## Import libraries
%% Cell type:code id: tags:
``` python
# In this cell, import all the libraries you need. For example:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris # Iris dataset
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold
```
%% Cell type:markdown id: tags:
## Results of the nested cross-validation
%% Cell type:code id: tags:
``` python
# Load the Iris dataset into feature and target arrays
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
```
%% Cell type:code id: tags:
``` python
# In this cell, run the script for nested CV and print the result.
test_splits = 5  # Number of folds in the outer CV
val_splits = 5   # Number of folds in the inner CV

# I wasn't sure whether StratifiedKFold was allowed here, but I used it.
skf_test = StratifiedKFold(n_splits=test_splits, shuffle=True)
skf_val = StratifiedKFold(n_splits=val_splits, shuffle=True)

test_score_sum = 0   # Running sum of the outer test-set scores
outer_iteration = 0  # Tracks the outer iteration

# Estimate the performance of kNN with automatic CV-based model selection.
for train_val_index, test_index in skf_test.split(X_iris, y_iris):
    outer_iteration += 1
    print("Outer CV iteration: ", outer_iteration)
    # Split the data into test and train/validation parts
    X_train_val, X_test = X_iris[train_val_index], X_iris[test_index]
    y_train_val, y_test = y_iris[train_val_index], y_iris[test_index]
    best_performance = -1  # Best validation performance in this iteration
    best_k = 0             # Value of k that achieved it
    # Go through k = 1, ..., 10 and pick the best-performing value.
    for k in range(1, 11):
        val_score_sum = 0  # Running sum of the validation scores
        # Evaluate the performance of the given k with the inner CV.
        # The inner indices refer to positions within X_train_val,
        # so they must index X_train_val/y_train_val, not X_iris/y_iris.
        for train_index, val_index in skf_val.split(X_train_val, y_train_val):
            X_train, X_val = X_train_val[train_index], X_train_val[val_index]
            y_train, y_val = y_train_val[train_index], y_train_val[val_index]
            neigh = KNeighborsClassifier(n_neighbors=k)  # Set up kNN
            neigh.fit(X_train, y_train)                  # Train the model
            val_score_sum += neigh.score(X_val, y_val)   # Score on the validation fold
        avg_performance = val_score_sum / val_splits     # Average validation performance
        print("  Avg val score with k=", k, ": ", avg_performance)
        if avg_performance > best_performance:
            best_performance = avg_performance
            best_k = k
    print("  Best k: ", best_k)
    # Retrain on the whole X_train_val with the best k and test it
    neigh = KNeighborsClassifier(n_neighbors=best_k)
    neigh.fit(X_train_val, y_train_val)
    test_score_sum += neigh.score(X_test, y_test)
    print("  Test score with k=", best_k, ": ", neigh.score(X_test, y_test))

avg_test_score = test_score_sum / test_splits
print("Avg test score:", avg_test_score)
```
%% Output
Outer CV iteration: 1
Avg val score with k= 1 : 0.9916666666666668
Avg val score with k= 2 : 0.975
Avg val score with k= 3 : 0.9666666666666668
Avg val score with k= 4 : 0.9666666666666668
Avg val score with k= 5 : 0.9583333333333334
Avg val score with k= 6 : 0.9666666666666666
Avg val score with k= 7 : 0.9583333333333334
Avg val score with k= 8 : 0.975
Avg val score with k= 9 : 0.975
Avg val score with k= 10 : 0.975
Best k: 1
Test score with k= 1 : 0.9666666666666667
Outer CV iteration: 2
Avg val score with k= 1 : 0.9833333333333334
Avg val score with k= 2 : 0.975
Avg val score with k= 3 : 0.9666666666666668
Avg val score with k= 4 : 0.9666666666666668
Avg val score with k= 5 : 0.975
Avg val score with k= 6 : 0.9833333333333334
Avg val score with k= 7 : 0.9833333333333334
Avg val score with k= 8 : 0.9833333333333334
Avg val score with k= 9 : 0.9833333333333334
Avg val score with k= 10 : 0.9833333333333334
Best k: 1
Test score with k= 1 : 0.9333333333333333
Outer CV iteration: 3
Avg val score with k= 1 : 1.0
Avg val score with k= 2 : 0.9833333333333334
Avg val score with k= 3 : 0.9583333333333334
Avg val score with k= 4 : 0.9666666666666666
Avg val score with k= 5 : 0.9583333333333334
Avg val score with k= 6 : 0.975
Avg val score with k= 7 : 0.9833333333333334
Avg val score with k= 8 : 0.9833333333333334
Avg val score with k= 9 : 0.9833333333333334
Avg val score with k= 10 : 0.9833333333333334
Best k: 1
Test score with k= 1 : 0.9666666666666667
Outer CV iteration: 4
Avg val score with k= 1 : 0.9833333333333334
Avg val score with k= 2 : 0.975
Avg val score with k= 3 : 0.9583333333333334
Avg val score with k= 4 : 0.9583333333333334
Avg val score with k= 5 : 0.9583333333333333
Avg val score with k= 6 : 0.9583333333333334
Avg val score with k= 7 : 0.9583333333333334
Avg val score with k= 8 : 0.9666666666666668
Avg val score with k= 9 : 0.9666666666666666
Avg val score with k= 10 : 0.975
Best k: 1
Test score with k= 1 : 0.9333333333333333
Outer CV iteration: 5
Avg val score with k= 1 : 1.0
Avg val score with k= 2 : 0.9833333333333334
Avg val score with k= 3 : 0.9583333333333334
Avg val score with k= 4 : 0.9666666666666666
Avg val score with k= 5 : 0.9583333333333334
Avg val score with k= 6 : 0.9666666666666666
Avg val score with k= 7 : 0.975
Avg val score with k= 8 : 0.975
Avg val score with k= 9 : 0.9666666666666666
Avg val score with k= 10 : 0.9833333333333334
Best k: 1
Test score with k= 1 : 1.0
Avg test score: 0.96
%% Cell type:markdown id: tags:
These results are surprisingly good: every outer fold selected k = 1, and the average test accuracy is 0.96.
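One way to put the average of 0.96 into perspective is to look at the spread of the five outer-fold test scores. A minimal sketch, with the scores copied by hand from the printout above (the variable `fold_scores` is illustrative, not part of the script):
%% Cell type:code id: tags:
``` python
import numpy as np

# Outer-fold test scores copied from the printout above.
fold_scores = [0.9666666666666667, 0.9333333333333333, 0.9666666666666667,
               0.9333333333333333, 1.0]
print("mean:", np.mean(fold_scores))  # matches the reported 0.96
print("std: ", np.std(fold_scores))   # fold-to-fold variation
```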
%% Cell type:code id: tags:
``` python
# Written after returning the exercise: the same nested CV wrapped in a function.
def kNN_nestedCrossValidation(X, y, test_splits, val_splits, k_range, print_steps):
    from sklearn.model_selection import StratifiedKFold, LeaveOneOut
    from sklearn.neighbors import KNeighborsClassifier

    # "loo" selects leave-one-out; otherwise stratified k-fold is used.
    if test_splits == "loo":
        splits_test = LeaveOneOut()
        test_splits = len(X)  # LOO over X yields len(X) outer folds
    else:
        splits_test = StratifiedKFold(n_splits=test_splits, shuffle=True)
    if val_splits == "loo":
        splits_val = LeaveOneOut()
    else:
        splits_val = StratifiedKFold(n_splits=val_splits, shuffle=True)

    test_score_sum = 0   # Running sum of the outer test-set scores
    outer_iteration = 0  # Tracks the outer iteration

    # Estimate the performance of kNN with automatic CV-based model selection.
    for train_val_index, test_index in splits_test.split(X, y):
        outer_iteration += 1
        if print_steps:
            print("Outer CV iteration: ", outer_iteration)
        # Split the data into test and train/validation parts
        X_train_val, X_test = X[train_val_index], X[test_index]
        y_train_val, y_test = y[train_val_index], y[test_index]
        best_performance = -1  # Best validation performance in this iteration
        best_k = 0             # Value of k that achieved it
        # Go through each k in k_range and pick the best-performing value.
        for k in k_range:
            val_score_sum = 0  # Running sum of the validation scores
            n_val_folds = 0    # Count the inner folds (needed when LOO is used)
            # Evaluate the performance of the given k with the inner CV.
            # The inner indices refer to positions within X_train_val.
            for train_index, val_index in splits_val.split(X_train_val, y_train_val):
                X_train, X_val = X_train_val[train_index], X_train_val[val_index]
                y_train, y_val = y_train_val[train_index], y_train_val[val_index]
                neigh = KNeighborsClassifier(n_neighbors=k)  # Set up kNN
                neigh.fit(X_train, y_train)                  # Train the model
                val_score_sum += neigh.score(X_val, y_val)   # Score on the validation fold
                n_val_folds += 1
            avg_performance = val_score_sum / n_val_folds    # Average validation performance
            if print_steps:
                print("  Avg val score with k=", k, ": ", avg_performance)
            if avg_performance > best_performance:
                best_performance = avg_performance
                best_k = k
        if print_steps:
            print("  Best k: ", best_k)
        # Retrain on the whole X_train_val with the best k and test it
        neigh = KNeighborsClassifier(n_neighbors=best_k)
        neigh.fit(X_train_val, y_train_val)
        test_score_sum += neigh.score(X_test, y_test)
        if print_steps:
            print("  Test score with k=", best_k, ": ", neigh.score(X_test, y_test))

    avg_test_score = test_score_sum / test_splits
    print("Avg test score:", avg_test_score)
```
%% Cell type:code id: tags:
``` python
from sklearn.datasets import load_iris # Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
kNN_nestedCrossValidation(X, y, test_splits=10, val_splits=2, k_range=range(1,3), print_steps=False)
```
%% Output
Avg test score: 0.9600000000000002
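%% Cell type:markdown id: tags:
The function also accepts the string "loo" for either split argument. A usage sketch with leave-one-out in the outer loop (one outer iteration per sample, so noticeably slower than k-fold):
%% Cell type:code id: tags:
``` python
# Same X and y as above; LOO outer CV with a 5-fold inner CV.
kNN_nestedCrossValidation(X, y, test_splits="loo", val_splits=5,
                          k_range=range(1, 11), print_steps=False)
```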