Machine Learning - MRI Analysis Project (Part II)
*This is a second part of an introductory post on machine learning. It is based on a practical project I worked on at ETH. Here I give you an explanation of two crucial steps in machine learning pipeline – Model selection and Validation. I then wrap up with some useful takeaways from the project, that you can eventually use in your own work and concise summary. If you missed the first part of the post, or just feel you could use a refresher, you can find it here.* Model Selection Although we have discussed quite few steps so far, they were only preparatory (albeit necessary and crucial) and it is only now, in model selection phase, that we actually get our hands on machine learning algorithms. We can further divide this phase into two steps – training and validation. Let’s look at training first. In the scope of our project we needed to deal with both regression and classification, here I will describe the latter one as it is perhaps more intuitive and again, I will illustrate it on an example from our project. Recalling the task at hand – to disambiguate between mentally healthy and sick patients from an MRI scan, we can imagine a simplified scenario where each brain scan is described by single two-dimensional vector of features. Further, assume that they form two point-clouds that are perfectly linearly separable as show in the figure: Fig. 5: Idealized example of classification task Fig. 6: More realistic example of a classification task, notice that data don't form any recognizable sub-groups We may than write down an equation of line that divides the two-dimensional space into two half-planes. Whenever we make a new observation (take an MRI of a new patient), his brain, described by two feature vector, will fall to one of those two regions and we will label it accordingly as ‘healthy’ or ‘sick’. ...