Meeting Notes June 23 2018

Where we are today (6/23):

R2 96.2 and explained variance 96.9
RMSE 5.36 (everything is now in terms of IC50; before, it was on a normalized scale)
Mean Absolute Error 2.8
Median Absolute Error 2.59

For most predictions, the error is only about 2.5 up or down.
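For reference, the metrics above can be reproduced from paired true and predicted IC50 values. The following is a minimal, dependency-free sketch; the function name `regression_report` and the sample numbers are made up for illustration and are not the project's code or data. It also shows why explained variance can come out slightly higher than R2: explained variance ignores a constant bias in the predictions, while R2 penalizes it.

```python
import math
import statistics


def regression_report(y_true, y_pred):
    """Return R2, explained variance, RMSE, mean and median absolute error."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_err = [abs(r) for r in residuals]
    mean_true = statistics.fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(r ** 2 for r in residuals)             # residual sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        # Uses the variance of the residuals, so a constant offset is not penalized.
        "explained_variance": 1
        - statistics.pvariance(residuals) / statistics.pvariance(y_true),
        "rmse": math.sqrt(ss_res / len(residuals)),
        "mean_abs_error": statistics.fmean(abs_err),
        "median_abs_error": statistics.median(abs_err),
    }


# Made-up values for illustration only (not project data).
y_true = [1.0, 4.0, 9.0, 16.0, 25.0]
y_pred = [2.0, 5.0, 8.0, 18.0, 24.0]
report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

A median absolute error below the mean absolute error, as in the numbers above, suggests that a few larger misses pull the mean up.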

Model Selections :

  * Linear SVM : Found feature sets for which only this model captures the right relationship.
  * No other model (ensemble or not) came close to that performance.
  * LASSO and Ridge : Both did poorly initially. However, more testing is still needed with reduced feature sets.
  * Adaptive Boosting (AdaBoost) : Did well on MSE
  * Multi-layer Perceptron : Did well on MSE

Next Steps:

Split into two groups: Group 1 starts looking into inference, and Group 2 continues with modeling.

Group 1 : Inference | Kate, Blake

Extract and analyze feature importance. Incorporate other methods, such as clustering and similarity analysis, focusing on the best IC50 (OSM-S-169).
Work side by side with the modeling group. Provide insights and reasoning behind the recommendations.
Build the inference pipeline to take the well-performing feature sets as input and provide the output. Feature sets with 80 R2 or higher should count as good performers.
Making sure the test predictions look realistic is important. Determine which features we need to narrow down for the testing/experiment.
Call out the risks of testing our recommendations, given the possible prediction error.
Consider building something like a prediction interval so we can call out a wider window of possibility and the risks involved.
Review Blake's Streamline : 9 models in one function. Pass a string to pick the model, similar to how we do grid search. This function runs grid search on all models.
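One simple way to approach the prediction interval idea is to size the window from absolute errors on held-out compounds, in the spirit of split conformal prediction. The sketch below is only an illustration under stated assumptions: the helper names `residual_interval` and `predict_with_interval` are hypothetical, the numbers are made up, and it assumes the error size is roughly the same across compounds, which would need to be checked against the real data.

```python
import math


def residual_interval(y_true, y_pred, coverage=0.9):
    """Half-width q such that roughly `coverage` of held-out |errors| are <= q."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    abs_err = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    if not abs_err:
        raise ValueError("need at least one held-out prediction")
    # Conservative rank: ceil((n + 1) * coverage), capped at n.
    rank = min(len(abs_err), math.ceil((len(abs_err) + 1) * coverage))
    return abs_err[rank - 1]


def predict_with_interval(prediction, half_width):
    """Return (low, high) around a point prediction; IC50 cannot be negative."""
    return (max(0.0, prediction - half_width), prediction + half_width)


# Made-up held-out values for illustration only (not project data).
held_out_true = [10.0] * 9
held_out_pred = [10.5, 9.0, 11.5, 8.0, 12.5, 7.0, 13.5, 6.0, 20.0]

half_width = residual_interval(held_out_true, held_out_pred, coverage=0.8)
low, high = predict_with_interval(3.0, half_width)
print(f"predicted IC50 3.0, window [{low}, {high}]")
```

A recommendation whose whole window stays in the potent range carries less risk than one whose window only overlaps it, which is the kind of risk the notes ask to call out.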

Group 2 : Building The Model | Roopa, Chris, Amy

Continue working on new models along individual paths; find new approaches that work and provide recommendations for predictors.
Multiple sets of features are performing well. Tried with 80 features; try with 300.
Divide and conquer : Run the step-method algorithm with different models and starting features, moving from a local maximum to a newer peak. Aim to make the step methods more robust.
Combine efforts on the best model. Use it as a starting point and run the step algorithm.
Consider least-angle regression (LARS) : Instead of fitting traditional ordinary least squares in one shot, it adds predictors step by step, guided by the angles (correlations) between the predictors and the current residual.
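The step method with different starting features can be sketched as a greedy search over feature sets that is restarted from several starting points. This is an assumed reading of the notes, not the team's actual algorithm: `step_search`, `multi_start`, and the toy scoring function are hypothetical names, and in practice `score` would be something like a cross-validated R2 for a chosen model.

```python
def step_search(all_features, start, score, max_steps=100):
    """Greedy stepwise search over feature sets.

    At each step, apply the single add-or-drop move that improves
    score(feature_set) the most; stop at a local maximum.
    """
    current = frozenset(start)
    best = score(current)
    for _ in range(max_steps):
        # Neighbors differ from the current set by exactly one feature.
        neighbors = [current ^ {f} for f in all_features]
        neighbors = [n for n in neighbors if n]  # never try the empty set
        if not neighbors:
            break
        candidate = max(neighbors, key=score)
        candidate_score = score(candidate)
        if candidate_score <= best:
            break  # no single move improves the score: local maximum
        current, best = candidate, candidate_score
    return current, best


def multi_start(all_features, starts, score):
    """Run step_search from several starting sets and keep the best result."""
    results = [step_search(all_features, s, score) for s in starts]
    return max(results, key=lambda r: r[1])


# Toy score for illustration only: reward "useful" features, penalize the rest.
# A real score would fit a model and would be worth caching, since it is expensive.
USEFUL = {"a", "b"}


def toy_score(features):
    return len(features & USEFUL) - 0.1 * len(features - USEFUL)


features = ["a", "b", "c", "d"]
best_set, best_score = step_search(features, {"c"}, toy_score)
restart_set, restart_score = multi_start(features, [{"c"}, {"d"}], toy_score)
print(sorted(best_set), best_score)
```

Restarting from different feature sets (and, by swapping `score`, different models) is one way to move off a local maximum toward a better peak.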

Presentation Topics - Outline | Blake

Github Repo
Folder Structure For Reuse. How well it's working. The why behind it.
Where we are and our Inferences So Far.
- Accuracy of the model, test results. Best predictors.
- Demo ?
- Series 3 as a representation of the Selleck DB [ Series 3 : Compounds that they thought might be potent and for which they determined the IC50 values. Selleck compounds : No IC50. Compounds that they have purchased and intend to run experiments on, based on our recommendations ]
What we are working on now.

Weekly Discussion Points
