
Meeting Notes June 23 2018

srinivasannambi edited this page Jun 24, 2018 · 1 revision

Where we are today (6/23):

  • R² of 96.2% and explained variance of 96.9%
  • RMSE of 5.36 (all metrics are now in IC50 terms; previously they were on a normalized scale)
  • Mean absolute error of 2.8
  • Median absolute error of 2.59
  • For most compounds, the prediction error is only about ±2.5
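The summary metrics above can be reproduced with scikit-learn's regression metrics. The sketch below is a hypothetical helper (`regression_report` is not from the project). It assumes the notes report R² and explained variance as percentages, since scikit-learn returns them as fractions (0.962 corresponds to 96.2%). RMSE is taken as the square root of MSE so it works across scikit-learn versions.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)


def regression_report(y_true, y_pred):
    """Hypothetical helper: the summary metrics tracked in these notes.

    R2 and explained variance are returned as fractions (multiply by 100
    for percent); the error metrics are in the same units as y (IC50 here).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return {
        "r2": r2_score(y_true, y_pred),
        "explained_variance": explained_variance_score(y_true, y_pred),
        # RMSE = sqrt(MSE), computed manually for version independence
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "mae": mean_absolute_error(y_true, y_pred),
        "median_ae": median_absolute_error(y_true, y_pred),
    }


if __name__ == "__main__":
    # Toy values for illustration only, not project data
    print(regression_report([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0]))
```

A gap between MAE and RMSE (as in 2.8 vs 5.36 above) usually indicates a few large errors, since RMSE weights large residuals more heavily.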

    Model Selections:

      * Linear SVM: found feature relationships that only this model could capture.
      * No other model (ensemble or not) came close to its performance.
      * LASSO and Ridge: both performed poorly initially; more testing is still needed with reduced feature sets.
      * Adaptive Boosting (AdaBoost): did well on MSE.
      * Multi-layer Perceptron: did well on MSE.
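The model families listed above could be compared on equal footing with cross-validated grid search. This is a minimal sketch, assuming scikit-learn's `LinearSVR`, `Lasso`, `Ridge`, `AdaBoostRegressor` and `MLPRegressor` as stand-ins for the models the team used. The registry `MODELS`, the helper `grid_search` and every parameter grid are hypothetical placeholders, not tuned project values. A model is chosen by passing its name as a string, or `"all"` to search every family.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Hypothetical registry: name -> (estimator, illustrative parameter grid).
# make_pipeline names each step after its lowercased class name.
MODELS = {
    "linear_svm": (LinearSVR(max_iter=10000), {"linearsvr__C": [0.1, 1.0, 10.0]}),
    "lasso": (Lasso(max_iter=10000), {"lasso__alpha": [0.01, 0.1, 1.0]}),
    "ridge": (Ridge(), {"ridge__alpha": [0.1, 1.0, 10.0]}),
    "adaboost": (AdaBoostRegressor(random_state=0),
                 {"adaboostregressor__n_estimators": [50, 100]}),
    "mlp": (MLPRegressor(max_iter=2000, random_state=0),
            {"mlpregressor__hidden_layer_sizes": [(32,), (64, 32)]}),
}


def grid_search(name, X, y, cv=3):
    """Grid-search one named model, or every model when name == "all".

    Returns {model_name: (best_score, best_params)}. Scores are negative
    MSE, so values closer to zero are better.
    """
    names = list(MODELS) if name == "all" else [name]
    results = {}
    for model_name in names:
        estimator, grid = MODELS[model_name]
        # Scale features first: SVMs, LASSO/Ridge and MLPs are scale sensitive
        pipe = make_pipeline(StandardScaler(), estimator)
        search = GridSearchCV(pipe, grid, scoring="neg_mean_squared_error", cv=cv)
        search.fit(X, y)
        results[model_name] = (search.best_score_, search.best_params_)
    return results


if __name__ == "__main__":
    # Synthetic data stands in for the real feature matrix and IC50 targets
    X, y = make_regression(n_samples=120, n_features=10, noise=5.0, random_state=0)
    for model_name, (score, params) in grid_search("all", X, y).items():
        print(f"{model_name}: CV MSE={-score:.2f} params={params}")
```

Keeping the scaler inside the pipeline means it is refit on each training fold, which avoids leaking validation statistics into the comparison.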
    

Next Steps:

Split into two groups: Group 1 starts looking into inference, and Group 2 continues on modeling.

Group 1 : Inference | Kate, Blake

  • Extract and analyze feature importance. Incorporate other methods, such as clustering and similarity analysis, focusing on the compound with the best IC50 (OSM-S-169).
  • Work side by side with the modeling group, providing insights and reasoning behind its recommendations.
  • Build the inference pipeline to take the well-performing feature sets as input and produce predictions. Feature sets reaching an R² of 80% or higher can be considered well performing.
  • It is important to make sure the test predictions look realistic. Determine which features we would need to narrow down to for the testing/experiment.
  • Call out the risks of testing our recommendations given the possible prediction error.
  • Consider building something like a prediction interval so we can call out a wider range of possible values and the risks involved.
  • Review Blake's streamlined function: 9 models are wrapped into one function and selected by passing a string, similar to how we do grid search. The function can also run grid search on all models.
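One model-agnostic way to build the prediction interval mentioned above is split conformal prediction. This is a sketch of that swapped-in technique, not something the team has settled on. The helper `split_conformal_interval` is hypothetical, and `Ridge` on synthetic data stands in for the project's model and features. The interval half-width is a quantile of absolute residuals on a held-out calibration set, which gives roughly `1 - alpha` coverage if the calibration and new compounds come from the same distribution.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split


def split_conformal_interval(model, X_train, y_train, X_calib, y_calib, X_new, alpha=0.1):
    """Hypothetical helper: fit `model`, return (pred, lower, upper) for X_new."""
    model.fit(X_train, y_train)
    # Absolute residuals on data the model did not see during fitting
    residuals = np.abs(np.asarray(y_calib) - model.predict(X_calib))
    n = len(residuals)
    # Finite-sample corrected rank; capped at n (the largest residual)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    half_width = np.sort(residuals)[min(k, n) - 1]
    pred = model.predict(X_new)
    return pred, pred - half_width, pred + half_width


if __name__ == "__main__":
    # Synthetic stand-in data; real use would pass project features and IC50 values
    X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
    X_calib, X_test, y_calib, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
    pred, lo, hi = split_conformal_interval(Ridge(), X_train, y_train, X_calib, y_calib, X_test)
    print(f"empirical coverage: {np.mean((y_test >= lo) & (y_test <= hi)):.2f}")
```

The interval has the same width for every compound; a wider `alpha` trade-off or a quantile-regression model would be needed if the risk should vary per compound.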

Group 2 : Building The Model | Roopa, Chris, Amy

  • Continue working on new models along individual paths, find new approaches that work, and provide recommendations for predictors.
  • Multiple feature sets are performing well. Tried with 80 features; next, try with 300.
  • Divide & Conquer: run the step-method algorithm with different models and starting features, moving from a local maximum to a new peak. Make the step methods more robust.
  • Combine efforts on the best model: use it as a starting point and run the step algorithm.
  • Consider least angle regression (LARS): instead of fitting all predictors at once with ordinary least squares, it adds predictors one at a time based on the angle (correlation) between each predictor vector and the current residual.
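A quick way to try least angle regression is scikit-learn's `lars_path` and `Lars`. LARS starts with all coefficients at zero, adds the predictor most correlated with the current residual, and moves the coefficients along the direction equiangular to all active predictors until another predictor becomes equally correlated. The order in which predictors enter could serve as a ranking for the step method. This is a sketch on synthetic data; `lars_entry_order` is a hypothetical helper, and stopping at 3 predictors is an arbitrary illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars, lars_path


def lars_entry_order(X, y):
    """Hypothetical helper: feature indices in the order LARS activates them."""
    # lars_path fits no intercept, so center the data first
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # method="lar" is plain least angle regression ("lasso" gives the LASSO path)
    _, active, _ = lars_path(Xc, yc, method="lar")
    return list(active)


if __name__ == "__main__":
    # Synthetic data: 10 candidate predictors, only 3 of them informative
    X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                           noise=1.0, random_state=0)
    print("entry order:", lars_entry_order(X, y))
    # Stop the path once 3 predictors have entered the model
    model = Lars(n_nonzero_coefs=3).fit(X, y)
    print("selected features:", np.flatnonzero(model.coef_))
```

Because LARS produces the whole coefficient path in roughly the cost of one least-squares fit, it is a cheap way to generate several candidate starting feature sets.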

Presentation Topics - Outline | Blake

  • GitHub repo
  • Folder structure for reuse: how well it's working and the reasoning behind it.
  • Where we are and our inferences so far.
    • Model accuracy, test results, and best predictors.
    • Demo?
    • Series 3 as a representation of the Selleck DB [Series 3: compounds they thought might be potent and for which they determined IC50 values. Selleck compounds: no IC50 values; compounds they purchased and intend to run experiments on based on our recommendations.]
  • What we are working on now.