Machine Learning Model Training

Posted: January 24th, 2023

Assignment 2: Machine Learning Model Training 1

• Two multi-part, multiple-choice questions.


• AI in Healthcare with Phase 2 data set (HTML file)

• Details of the Q1 and Q2 multiple-choice questions are given in the attached question files.

• Lecture notes on Machine Learning in Healthcare for your reference

Phase II Model Training Question Sheet

Open the attached HTML file to read the scenario. After reading through the case, review the two questions in this assignment.

Keep the HTML file open so that you can easily look up the information each question asks about. Check all correct answers and explain each choice in 2 or 3 sentences.

Q 1 Part 1.

The team split the data into two partitions: the training set and the test set. It is considered best practice to add a third partition, the validation set. What added utility does a validation set provide? Check all that apply.

The validation set can be used for tuning hyperparameters

The validation set can be used for early stopping

The test set should be used for final evaluation only

The validation set can be used for updating the model directly
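As an illustration, here is a minimal Python sketch of a three-way split. The function name `three_way_split` and the 70/15/15 fractions are assumptions made for the example, not values from the scenario. Shuffling individual samples like this is only appropriate when each sample is independent, for example one exam per patient.

```python
import random

def three_way_split(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle items and split them into train, validation, and test partitions.

    Hypothetical helper: whatever is left after the train and validation
    shares becomes the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]                 # used to fit the model's weights
    val = items[n_train:n_train + n_val]    # used for hyperparameter tuning and early stopping
    test = items[n_train + n_val:]          # held out for the final evaluation only
    return train, val, test

train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The model's weights are fit on `train`. Hyperparameter choices and the early-stopping epoch are picked by looking at performance on `val`. `test` is scored once, at the end.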

Part 2.

The team split the data randomly, without accounting for the patient to whom each exam belongs. Why would this be a problem? Recall: “The COVID dataset consists of 30,000 exams across 21,000 patients (some patients may be associated with multiple exams)”

Patient overlap between the training and test sets may lead to problems with model convergence due to exposure to the test set

Patient overlap between the training and test sets may lead to problems with model bias because of the underrepresentation of certain patient demographics in the training set

Patient overlap between the training and test sets may lead to the leakage of PHI or other sensitive data

Patient overlap between the training and test sets may lead to inflated model performance due to unrealistic evaluation conditions

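One way to avoid patient overlap is to shuffle and assign patients rather than exams. The sketch below assumes each exam is a hypothetical `(patient_id, exam_id)` pair. `split_by_patient` is an illustrative helper and is not part of the scenario.

```python
import random

def split_by_patient(exams, test_frac=0.2, seed=0):
    """Split (patient_id, exam_id) pairs so that no patient appears in both partitions.

    Patients, not exams, are shuffled and assigned, so every exam from a given
    patient lands in the same partition.
    """
    patients = sorted({patient_id for patient_id, _ in exams})  # sort before shuffling for determinism
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_frac)
    test_patients = set(patients[:n_test])
    train = [e for e in exams if e[0] not in test_patients]
    test = [e for e in exams if e[0] in test_patients]
    return train, test

# Hypothetical toy data: some patients have several exams.
exams = [("p1", "e1"), ("p1", "e2"), ("p2", "e3"), ("p3", "e4"),
         ("p3", "e5"), ("p4", "e6"), ("p5", "e7")]
train, test = split_by_patient(exams, test_frac=0.4)
print({p for p, _ in train} & {p for p, _ in test})  # set()
```

scikit-learn's `GroupShuffleSplit` and `GroupKFold` in `sklearn.model_selection` implement the same idea when patient IDs are passed as the `groups` argument.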
Part 3.

The team downsampled the images to 224 by 224 pixels. Why might this lead to worse model performance?

The discriminative features in the image may be too small to identify without a higher resolution

Many publicly available models use 224 by 224 pixel images

Memory constraints may limit the model’s ability to process high-resolution images

224 by 224 pixel chest X-rays are easier to classify than 3000 by 3000 pixel chest X-rays

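To see why a small finding can fade, consider downsampling by averaging non-overlapping blocks of pixels (area resampling). This is an assumption: real pipelines may use bilinear or other interpolation, but the averaging effect is similar. Going from roughly 3000 to 224 pixels per side is a reduction of about 13 times. The toy image below uses a 10-fold reduction so the arithmetic stays exact.

```python
def downsample_mean(image, factor):
    """Shrink a 2-D list of pixel values by averaging non-overlapping factor x factor blocks.

    Assumes both image dimensions are divisible by factor.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [image[i][j] for i in range(r, r + factor) for j in range(c, c + factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out

# Hypothetical 40x40 image: dark background with a 3x3 bright spot standing in for a small finding.
size, factor = 40, 10
image = [[0.0] * size for _ in range(size)]
for i in range(12, 15):
    for j in range(12, 15):
        image[i][j] = 1.0

small = downsample_mean(image, factor)
print(len(small), len(small[0]))        # 4 4
print(max(max(row) for row in small))   # 0.09
```

A 3×3 spot at full intensity becomes a single pixel at 9% intensity, which is much harder to distinguish from noise.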
Part 4.

Why are convolutional neural networks (CNNs) particularly well suited for image classification tasks? Check all that apply.

CNN architectures take advantage of feature locality through the use of filters

CNN architectures leverage multiple decision trees in order to make their predictions more robust

CNN architectures are parameter-efficient because they use the same set of weights on each region of the image

CNN architectures can condition on previous timesteps, which they take as input in addition to the images themselves

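Locality and weight sharing, the CNN properties named in the options, can both be seen in a plain-Python 2-D convolution. This is a single-channel, stride-1, no-padding sketch. Like most deep-learning libraries, it computes cross-correlation, meaning the kernel is not flipped. The dense-layer comparison assumes a hypothetical fully connected layer that maps a 224 × 224 input to an output of the same size.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D image (no padding, stride 1) and return the feature map.

    The same kernel weights are reused at every position (weight sharing), and each
    output value depends only on a small local patch of the input (locality).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 3x3 vertical-edge filter has 9 weights no matter how large the image is.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
image = [[0, 0, 1, 1] for _ in range(4)]  # hypothetical 4x4 image with a vertical edge
print(conv2d_valid(image, kernel))         # [[3, 3], [3, 3]]

# Parameter counts for a 224x224 single-channel input (biases ignored):
conv_params = 3 * 3                         # one shared 3x3 filter
dense_params = (224 * 224) * (224 * 224)    # hypothetical dense layer: every input pixel to every output pixel
print(conv_params, dense_params)            # 9 2517630976
```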
Part 5.

What learning phenomenon is the team observing?

Convergence

Overfitting

Underfitting

Generalization

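The scenario's actual training curves are in the HTML file and are not reproduced here, so the loss values below are hypothetical. The sketch applies a common heuristic. It finds the epoch with the lowest validation loss. If validation loss then stays worse than that minimum for several epochs while training loss keeps falling, it treats this as a sign of overfitting. The helper name and the `patience` value are assumptions.

```python
def diagnose_curves(train_losses, val_losses, patience=3):
    """Return (best_epoch, overfitting) for per-epoch training and validation losses.

    best_epoch is the epoch with the lowest validation loss. overfitting is True when
    at least `patience` epochs have passed since that minimum and the training loss
    is still lower than it was at best_epoch (a heuristic, not a formal test).
    """
    best_epoch = min(range(len(val_losses)), key=lambda e: val_losses[e])
    epochs_since_best = len(val_losses) - 1 - best_epoch
    train_still_falling = train_losses[-1] < train_losses[best_epoch]
    return best_epoch, epochs_since_best >= patience and train_still_falling

# Hypothetical per-epoch losses, not taken from the scenario.
train_losses = [1.0, 0.8, 0.6, 0.45, 0.35, 0.25, 0.18, 0.12]
val_losses = [1.05, 0.85, 0.7, 0.65, 0.68, 0.74, 0.81, 0.9]
print(diagnose_curves(train_losses, val_losses))  # (3, True)
```

The epoch returned as `best_epoch` is also where early stopping would keep the model's weights.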
Part 6.

(i)

A colleague approaches you and suggests that it would be better if you created a model that relied only on observable features and exam metadata (patient age, gender, ethnicity, etc.). What trade-offs must be considered when using lab values as features?

Answer in 3 to 5 sentences.

(ii)

Before using the new public COVID dataset, you want to verify that there is no PHI in the data. What are some privacy issues that could come into play with imaging data?

Answer in 3 to 5 sentences.

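Much of the PHI in medical images is stored in the DICOM header. The sketch below drops a hand-picked, hypothetical subset of header attributes from a plain dictionary. A real de-identification profile, such as the attribute confidentiality profiles in DICOM PS3.15 Annex E, covers far more attributes. Header scrubbing also does nothing about text burned into the pixel data or identifiable anatomy in the image itself.

```python
# Hypothetical subset of DICOM header keywords that commonly carry PHI; a real
# de-identification profile covers many more attributes.
PHI_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "OtherPatientIDs", "InstitutionName", "ReferringPhysicianName",
    "AccessionNumber", "StudyDate",
}

def scrub_header(header):
    """Return a copy of a header dict without PHI-bearing keys, plus the sorted removed keys."""
    clean = {k: v for k, v in header.items() if k not in PHI_KEYWORDS}
    removed = sorted(set(header) - set(clean))
    return clean, removed

# Hypothetical header already parsed into a dict (not real patient data).
header = {"PatientName": "DOE^JANE", "PatientID": "12345",
          "Modality": "DX", "BodyPartExamined": "CHEST"}
clean, removed = scrub_header(header)
print(clean)    # {'Modality': 'DX', 'BodyPartExamined': 'CHEST'}
print(removed)  # ['PatientID', 'PatientName']
```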
Q 2 Part 1.

The D-DIMER values are highly concentrated below 1,000, but many samples lie several orders of magnitude away from the rest. What is the most likely explanation for this? (Hint: look at the data samples, particularly the exam metadata.)

There is a large disparity in D-DIMER lab values across patient gender

There is a large disparity in D-DIMER lab values across patient age

The data collected from one of the clinics may use different units

The data was collected from two cohorts from two different time periods

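A quick way to test the unit hypothesis is to compare each clinic's median value with the typical median across clinics. A clinic reporting in a different unit tends to sit a power of ten away from the rest. The clinic names, values, and 100-fold threshold below are hypothetical, and the check assumes most clinics share the same unit.

```python
from statistics import median

def flag_unit_mismatch(values_by_site, ratio_threshold=100.0):
    """Flag sites whose median lab value differs from the median of all site medians
    by more than ratio_threshold-fold in either direction."""
    site_medians = {site: median(vals) for site, vals in values_by_site.items()}
    reference = median(site_medians.values())
    return sorted(
        site for site, m in site_medians.items()
        if m > reference * ratio_threshold or m < reference / ratio_threshold
    )

# Hypothetical D-dimer values grouped by clinic; clinic_c's numbers are ~1,000x larger.
values = {
    "clinic_a": [250, 400, 520, 310],
    "clinic_b": [300, 450, 280, 610],
    "clinic_c": [310000, 480000, 270000, 550000],
}
print(flag_unit_mismatch(values))  # ['clinic_c']
```

Once the flagged clinic's unit is confirmed from the metadata, its values can be rescaled by the appropriate power of ten before training.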
Part 2.

Which of the following strategies can be used to accommodate the missing values in the EHR dataset? Check all that apply.

A logistic regression model can be trained after the missing values are synthetically generated, using a process known as imputation

A tree-based model, such as random forest, can be trained directly on the data with missing values

A tree-based model, such as random forest, can be trained after the missing values are synthetically generated, using a process known as imputation

A logistic regression model can be trained directly on the data with missing values

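Mean imputation is the simplest form of imputation. The sketch below fits per-column means on the training rows only, so no information leaks from the test set, and represents missing lab values as `None`. The helper names and the 0.0 fallback for an all-missing column are assumptions for illustration. Some tree-based libraries can also split on missing values directly, but whether a given random-forest implementation supports this depends on the library and its version.

```python
def fit_mean_imputer(rows):
    """Compute per-column means from training rows, ignoring missing values (None)."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        # Assumed fallback: use 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def apply_imputer(rows, means):
    """Replace each None with the training-set mean of its column."""
    return [[means[c] if v is None else v for c, v in enumerate(row)] for row in rows]

# Hypothetical two-column lab panel with gaps.
train_rows = [[1.0, None], [3.0, 10.0], [None, 20.0]]
test_rows = [[None, None]]
means = fit_mean_imputer(train_rows)        # fit on training data only
print(apply_imputer(test_rows, means))      # [[2.0, 15.0]]
```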
Part 3.

Which of the following is FALSE regarding logistic regression models?

Logistic regression uses the sigmoid activation function

Logistic regression can take unstructured inputs, such as images or text

Logistic regression produces values between 0 and 1, regardless of the scale of the features

Logistic regression is commonly used for classification problems
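The sketch below shows why logistic regression outputs stay between 0 and 1. The weighted sum of features can be any real number, but the sigmoid maps it into that range. It also shows that the model consumes a fixed-length vector of numbers, so raw images or text must first be turned into such features. The coefficients are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function; output is always in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Logistic regression: a weighted sum of numeric features passed through the sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

weights, bias = [0.8, -0.05], 0.1  # hypothetical learned coefficients
print(predict_proba(weights, bias, [1.0, 2.0]))
print(predict_proba(weights, bias, [1e6, 0.0]))  # huge feature value, output still within [0, 1]
```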

Part 4.

Which of the following is FALSE regarding random forest models?

Random forest models are a type of decision tree algorithm

Random forest models are highly interpretable

Random forest models learn multiple decision trees that each learn on a subset of the available features

Random forest models require feature normalization (i.e., scaling the features such that they are between 0 and 1) in order to work effectively

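Decision-tree splits depend only on the ordering of a feature's values, which is why tree ensembles generally do not need normalization. The sketch below finds the best Gini-impurity threshold on one feature and checks that min-max scaling leaves the chosen partition unchanged. It covers a single split, not a full random forest, and the lab values and labels are hypothetical.

```python
def best_threshold_split(xs, ys):
    """Find the threshold on one feature that minimizes weighted Gini impurity for binary labels.

    Returns the set of sample indices sent to the left child (x <= threshold).
    """
    def gini(labels):
        if not labels:
            return 0.0
        p = sum(labels) / len(labels)
        return 2 * p * (1 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes

    best_left, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # every distinct value except the largest is a candidate threshold
        left = [i for i, x in enumerate(xs) if x <= t]
        right = [i for i, x in enumerate(xs) if x > t]
        score = (len(left) * gini([ys[i] for i in left]) +
                 len(right) * gini([ys[i] for i in right])) / len(xs)
        if score < best_score:
            best_left, best_score = set(left), score
    return best_left

def min_max_scale(xs):
    """Rescale values to [0, 1]; a strictly increasing map, so ordering is preserved."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

# Hypothetical lab values spanning several orders of magnitude, with binary outcomes.
xs = [120, 340, 560, 900, 15000, 42000]
ys = [0, 0, 0, 1, 1, 1]
print(best_threshold_split(xs, ys) == best_threshold_split(min_max_scale(xs), ys))  # True
```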
SOLUTION

Machine learning model training is the process of using a set of labeled data, called a training dataset, to learn the parameters of a model that can make predictions on new, unseen data. The process typically involves feeding the training data into the model, adjusting the model’s parameters to minimize an error metric (the loss), and repeating this process until the loss stops decreasing or performance on a held-out validation set stops improving. The trained model is then evaluated once on the test set before it is used on new data.
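The training loop described above can be made concrete with a minimal gradient-descent sketch for a one-feature linear model with a mean-squared-error loss. The learning rate, epoch count, and noise-free toy data are assumptions chosen so that the example converges. Real training would use a validation set to decide when to stop.

```python
def train_linear_model(xs, ys, lr=0.01, epochs=5000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w  # step against the gradient to reduce the loss
        b -= lr * grad_b
    return w, b

xs = list(range(10))
ys = [2 * x + 1 for x in xs]  # hypothetical noise-free data from y = 2x + 1
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```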
