Posted: January 24th, 2023
Assignment 4: Machine Learning Model Evaluation
• Three multi-part, multiple-choice questions.
• AI in Healthcare with Phase 2 data set (HTML file)
• Details of the Q1 & Q2 m/c questions are shown in the attached question files.
• Lecture notes on Machine Learning in Healthcare for your reference
Phase IV Model Evaluation and Deployment Question Sheet
Open the attached HTML file and read the scenario. After reading through the case, review the three questions in this assignment.
Keep the HTML file open so that it is easier to locate the information each question asks about. Select the correct answer and explain your choice in 2 to 3 sentences.
Q1.
Part 1.
Which of the following definitions best describes the AUROC metric?
How many of the samples predicted positive are actually positive
How well the model is able to segment objects
How many positive samples are correctly predicted as positive
How well separated the predicted probabilities of negatives are from positives
Part 2.
Why might model B be more useful than model A in worklist prioritization?
Model B is not more useful than model A
Model B places more abnormal exams early in the worklist than model A
Model B places more normal exams early in the worklist than model A
Model B places fewer abnormal exams at the end of the worklist than model A
Part 3.
How might both models be leveraged to immediately produce better overall performance?
Each model can be deployed at separate clinics
For new exams, the predictions of both models can be averaged together
The models can be re-trained using the same hyperparameters
The trained models can be used to train other models
Part 4.
When models are evaluated, they are often trained multiple times. Why might this be the case?
Q2.
Part 1.
For automated triage, which of the following is the most important metric to satisfy when choosing an operating point?
Specificity, as it measures how many negative samples are correctly predicted as negative
PPV / Precision, as it measures how many of the samples predicted positive are actually positive
Sensitivity / Recall, as it measures how many positive samples are correctly predicted as positive
Intersection over Union (IoU), as it measures how well the model is able to segment the lesions
Part 2.
Who is the beneficiary of the EHR-based invasive mechanical ventilation predictor?
1/ The provider – the output helps the clinician manage their patients
2/ The patient – the output will help the patient make informed decisions
3/ The hospital – the output will identify the need for additional ICU rooms
4/ None of the above
Part 3.
Hypoxemia is an important feature in the EHR-based invasive mechanical ventilation predictor model that was validated in the independent sample. This can be used as a component of the:
Valid clinical association
Analytical validation
Clinical validation
None of the above
Part 4.
The model has identified that patient X is at high risk of invasive mechanical ventilation within the next 48 hours, and this information was sent to the clinical team. What action can be taken based on the prediction, and what mitigation strategy applies?
Reserve a ventilator and an ICU bed to ensure the patient has the resources they need when they begin to clinically decline
Administer an experimental drug currently under investigation for COVID deterioration
Notify the insurance company to ensure payment of the anticipated extended length of stay
None of the above
Q3.
This question is no longer tied to the HTML file, as it applies more broadly to many deployment settings.
A multi-part, multiple-choice question dealing with issues and hypothetical scenarios related to model deployment in the clinical setting.
Part 1.
Before deploying the model, you want to test its fairness based on anti-classification. Which variables would you use to test fairness?
Gender
Race
County of residence
Gender and Race
All of the above
Part 2.
You provide the model output to the clinical team. What would be the risk category?
Category I
Category II
Category III
Category IV
Part 3.
Given the novelty of COVID, what do you think could be used for a valid clinical association argument?
Performance of a clinical trial based on your AI solution
Literature Searches
Examples of how your model can generate new evidence
All of the above
Part 4.
Your model uses past symptoms as a predictor for invasive mechanical ventilation; however, 40% of your population is on public insurance and likely does not have the same access to care as those on private insurance. How would this bias your results?
Under-reporting of symptoms in publicly insured patients
Patients on public insurance are likely to have more symptoms than patients on private insurance because they are known to have more comorbidities
This will not bias the outcome (need for invasive mechanical ventilation) because symptoms were not a top predictor of the outcome
None of the above
Part 5.
In real time, it takes 24 hours to obtain and process the images needed for the CXR-based COVID detector. Would this time lag affect the clinical utility of the AI solution? Why or why not?
SOLUTION
Machine learning model evaluation is the process of assessing a model's performance on a given dataset to determine its accuracy, reliability, and overall quality. Common evaluation metrics include accuracy, precision, recall, and the F1 score.
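For reference, here is a minimal sketch of computing these metrics with scikit-learn; the y_true and y_pred arrays are hypothetical stand-ins for real labels and model predictions.

```python
# Minimal sketch: common evaluation metrics with scikit-learn.
# y_true and y_pred are hypothetical stand-ins for real labels and predictions.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

print("Accuracy: ", accuracy_score(y_true, y_pred))   # fraction of correct predictions
print("Precision:", precision_score(y_true, y_pred))  # predicted positives that are truly positive
print("Recall:   ", recall_score(y_true, y_pred))     # actual positives correctly predicted
print("F1 score: ", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```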
Once a model has been evaluated and its performance deemed satisfactory, it can be deployed in a production environment. Deployment makes the model available for use by others, for example through an API or as a standalone application, and typically involves additional steps such as performance optimization and integration with other systems.
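As one illustration of the API route, a minimal Flask service is sketched below; the model.pkl artifact and the /predict endpoint are hypothetical, assuming a pickled scikit-learn model rather than any particular deployment stack.

```python
# Minimal sketch: serving a trained model behind an HTTP API with Flask.
# "model.pkl" is a hypothetical artifact assumed to hold a pickled
# scikit-learn model; the /predict route name is likewise illustrative.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[0.1, 2.3, 0.7]]}
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=8000)
```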