
add suggestion bundle support #3041

Open
granthamtaylor wants to merge 23 commits into master
Conversation

@granthamtaylor (Contributor) commented Jan 7, 2025

Why are the changes needed?

I recently created the flytekitplugins.optuna plugin.

This PR adds support for bundling all of the suggestions into any number of dictionaries.

This is for cases in which one wants to tune dozens or more hyperparameters. Declaring each of them as a separate typed task input is pretty exhausting, so with this change we can instead collect all of these suggestions into any number of dictionaries at runtime.

What changes were proposed in this pull request?

I added support for bundled suggestions to flytekitplugins.optuna. Users may provide any keyword arguments of type dict[str, Suggestion | Any], as shown in the setup process below.
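
For contrast, a rough sketch of the pre-existing per-argument pattern that this replaces (illustrative only; it assumes the image, objective shape, and optimizer from the setup process below):

@fl.task(container_image=image)
async def objective(x: float, y: int, z: int, power: int) -> float:
    # every hyperparameter is its own typed task input...
    return math.log(((x - 5) ** 2) + (y + 4) ** 4 + (3 * z - 3) ** 2) ** power

# ...and its own keyword argument in the optimizer call, inside the eager workflow
await optimizer(
    x=suggest.float(low=-10, high=10),
    y=suggest.integer(low=-10, high=10),
    z=suggest.category([-5, 0, 3, 6, 9]),
    power=2,
)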

How was this patch tested?

I have added an example and unit tests for the new bundling functionality.

Setup process

import math
from typing import Union

import flytekit as fl
from flytekitplugins.optuna import Optimizer, suggest

# assumes a container image spec; see the fuller examples below for a pinned version
image = fl.ImageSpec(builder="union", packages=["flytekit", "optuna"])

@fl.task(container_image=image)
async def objective(suggestions: dict[str, Union[int, float]], power: int) -> float:

    # building out a large set of typed inputs is exhausting, so we can just use a dict

    x, y, z = suggestions["x"], suggestions["y"], suggestions["z"]

    return math.log((((x - 5) ** 2) + (y + 4) ** 4 + (3 * z - 3) ** 2)) ** power


@fl.eager(container_image=image)
async def train(concurrency: int, n_trials: int) -> float:
    optimizer = Optimizer(objective, concurrency, n_trials)

    suggestions = {
        "x": suggest.float(low=-10, high=10),
        "y": suggest.integer(low=-10, high=10),
        "z": suggest.category([-5, 0, 3, 6, 9]),
    }

    await optimizer(suggestions=suggestions, power=2)

    return optimizer.study.best_value
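
For local iteration, the eager workflow can be driven directly with asyncio, mirroring the fuller example further down; the argument values here are illustrative:

if __name__ == "__main__":
    import asyncio

    # run the optimization locally and print the best objective value found
    best = asyncio.run(train(concurrency=1, n_trials=10))
    print(best)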

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Summary by Bito

Enhanced flytekit-optuna plugin with bundled hyperparameter suggestions and improved type safety. Implemented dictionary-based parameter passing and callback mechanism for better management. Modified Optimizer class with instance-based suggest method and parameter organization using parent-child relationships. Added proper handling of optimizer's best value and streamlined multiple hyperparameter handling through runtime dictionary collection.

Unit tests added: True

Estimated effort to review (1-5, lower is better): 2

flyte-bot commented Jan 7, 2025

Code Review Agent Run #411aad

Actionable Suggestions - 3
  • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py - 2
    • Consider revising validation logic for parameters · Line 65-68
    • Consider consolidating duplicate suggestion logic · Line 130-145
  • plugins/flytekit-optuna/tests/test_optimizer.py - 1
    • Wrong objective function used in test · Line 47-47
Review Details
  • Files reviewed - 2 · Commit Range: 4298d66..f1d291f
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 1
    • plugins/flytekit-optuna/README.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


flyte-bot commented Jan 7, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change: Feature Improvement - Enhanced Optuna Parameter Bundling Support

Files Impacted:
  • optimizer.py - Added support for bundled parameter suggestions and callback mechanism
  • setup.py - Added typing-extensions dependency
  • test_optimizer.py - Added tests for bundled suggestions and callback functionality

flyte-bot commented Jan 7, 2025

Code Review Agent Run #c016c8

Actionable Suggestions - 3
  • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py - 1
    • Consider making suggestions a keyword argument · Line 128-130
  • plugins/flytekit-optuna/tests/test_optimizer.py - 2
    • Consider adding dictionary key validation · Line 37-37
    • Consider using TypedDict for suggestions parameter · Line 37-37
Additional Suggestions - 1
  • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: f1d291f..f481da6
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful



codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.79%. Comparing base (f634d53) to head (749ac03).
Report is 5 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3041   +/-   ##
=======================================
  Coverage   82.79%   82.79%           
=======================================
  Files           3        3           
  Lines         186      186           
=======================================
  Hits          154      154           
  Misses         32       32           


flyte-bot commented Jan 8, 2025

Code Review Agent Run #b3b166

Actionable Suggestions - 0
Additional Suggestions - 1
  • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: f481da6..0554472
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


flyte-bot commented Jan 8, 2025

Code Review Agent Run #af2f75

Actionable Suggestions - 1
  • plugins/flytekit-optuna/tests/test_optimizer.py - 1
    • Consider adding error handling for optimizer · Line 24-29
Additional Suggestions - 1
  • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: 0554472..f57dc7a
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


@granthamtaylor enabled auto-merge (squash) January 8, 2025 02:20
flyte-bot commented Jan 8, 2025

Code Review Agent Run #5bfe4b

Actionable Suggestions - 1
  • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py - 1
Review Details
  • Files reviewed - 2 · Commit Range: f57dc7a..b76e948
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


flyte-bot commented Jan 8, 2025

Code Review Agent Run #b54cd9

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: b76e948..8b9af79
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


flyte-bot commented Jan 8, 2025

Code Review Agent Run #7205fa

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 8b9af79..d8a5bf4
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


@granthamtaylor (Contributor, Author) commented:

Another example:

import asyncio

import flytekit as fl
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split
import xgboost as xgb
import numpy as np
import optuna
import plotly

from flytekitplugins.optuna import Optimizer, suggest

image = fl.ImageSpec(
    builder="union",
    packages=[
        "flytekit==1.15.0b0",
        "optuna>=4.0.0",
        "scikit-learn>=1.6.0",
        "xgboost>=2.1.3",
        "plotly>=5.24.1",
    ],
)


@fl.task(container_image=image)
async def objective(params: dict[str, int | float | str]) -> float:

    (data, target) = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = xgb.DMatrix(train_x, label=train_y)
    dvalid = xgb.DMatrix(valid_x, label=valid_y)

    bst = xgb.train(params, dtrain)
    valid_yhat = bst.predict(dvalid)
    accuracy = sklearn.metrics.accuracy_score(valid_y, np.rint(valid_yhat))

    return accuracy


@fl.eager(container_image=image)
async def train(concurrency: int, n_trials: int):
    
    study = optuna.create_study(direction="maximize")
    
    optimizer = Optimizer(objective, concurrency, n_trials, study)

    params = {
        "lambda": suggest.float(1e-8, 1.0, log=True),
        "alpha": suggest.float(1e-8, 1.0, log=True),
        "subsample": suggest.float(0.2, 1.0),
        "colsample_bytree": suggest.float(0.2, 1.0),
        "max_depth": suggest.integer(3, 9, step=2),
        "min_child_weight": suggest.integer(2, 10),
        "eta": suggest.float(1e-8, 1.0, log=True),
        "gamma": suggest.float(1e-8, 1.0, log=True),
        "grow_policy": suggest.category(["depthwise", "lossguide"]),
        "sample_type": suggest.category(["uniform", "weighted"]),
        "normalize_type": suggest.category(["tree", "forest"]),
        "rate_drop": suggest.float(1e-8, 1.0, log=True),
        "skip_drop": suggest.float(1e-8, 1.0, log=True),
        "objective": "binary:logistic",
        "tree_method": "exact",
        "booster": "dart",
    }

    await optimizer(params=params)

@granthamtaylor (Contributor, Author) commented:

In some extreme cases, it is useful to be able to define suggestions programmatically. However, this cannot happen within an objective task, because we need to tell the Study what each task is doing. Therefore, users should be able to define a callback function in which they may express their programmatic suggestion logic.

This is complicated but I think it is the best solution. It takes advantage of Flyte's static typing and enables caching too.

import asyncio
from typing import Any

import flytekit as fl
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split
import xgboost as xgb
import numpy as np
import optuna
import plotly

from optimizer import Optimizer

image = fl.ImageSpec(
    builder="union",
    packages=[
        "flytekit==1.15.0b0",
        "optuna>=4.0.0",
        "scikit-learn>=1.6.0",
        "xgboost>=2.1.3",
        "plotly>=5.24.1",
    ],
)


@fl.task(container_image=image)
async def objective(params: dict[str, int | float | str]) -> float:
    (data, target) = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = xgb.DMatrix(train_x, label=train_y)
    dvalid = xgb.DMatrix(valid_x, label=valid_y)

    bst = xgb.train(params, dtrain)
    valid_yhat = bst.predict(dvalid)
    accuracy = sklearn.metrics.accuracy_score(valid_y, np.rint(valid_yhat))
    
    print(accuracy)

    return accuracy


@fl.eager(container_image=image)
async def train(concurrency: int, n_trials: int):

    study = optuna.create_study(direction="maximize")
    
    def callback(trial: optuna.Trial, inputs: dict[str, Any]) -> dict[str, Any]:

        params = {
            "verbosity": 0,
            "objective": "binary:logistic",
            # use exact for small dataset.
            "tree_method": "exact",
            # defines booster, gblinear for linear functions.
            "booster": trial.suggest_categorical("booster", ["gbtree", "gblinear", "dart"]),
            # L2 regularization weight.
            "lambda": trial.suggest_float("lambda", 1e-8, 1.0, log=True),
            # L1 regularization weight.
            "alpha": trial.suggest_float("alpha", 1e-8, 1.0, log=True),
            # sampling ratio for training data.
            "subsample": trial.suggest_float("subsample", 0.2, 1.0),
            # sampling according to each tree.
            "colsample_bytree": trial.suggest_float("colsample_bytree", 0.2, 1.0),
        }

        if params["booster"] in ["gbtree", "dart"]:
            # maximum depth of the tree, signifies complexity of the tree.
            params["max_depth"] = trial.suggest_int("max_depth", 3, 9, step=2)
            # minimum child weight, larger the term more conservative the tree.
            params["min_child_weight"] = trial.suggest_int("min_child_weight", 2, 10)
            params["eta"] = trial.suggest_float("eta", 1e-8, 1.0, log=True)
            # defines how selective algorithm is.
            params["gamma"] = trial.suggest_float("gamma", 1e-8, 1.0, log=True)
            params["grow_policy"] = trial.suggest_categorical("grow_policy", ["depthwise", "lossguide"])

        if params["booster"] == "dart":
            params["sample_type"] = trial.suggest_categorical("sample_type", ["uniform", "weighted"])
            params["normalize_type"] = trial.suggest_categorical("normalize_type", ["tree", "forest"])
            params["rate_drop"] = trial.suggest_float("rate_drop", 1e-8, 1.0, log=True)
            params["skip_drop"] = trial.suggest_float("skip_drop", 1e-8, 1.0, log=True)

        return dict(params=params)

    optimizer = Optimizer(objective, concurrency, n_trials, study, callback)

    await optimizer()

    show("History", optuna.visualization.plot_optimization_history(optimizer.study))
    show("Timeline", optuna.visualization.plot_timeline(optimizer.study))
    show("Coordinates", optuna.visualization.plot_parallel_coordinate(optimizer.study))
    show("Importance", optuna.visualization.plot_param_importances(optimizer.study))


def show(name: str, fig: plotly.graph_objects.Figure):
    html: str = plotly.io.to_html(
        fig=fig,
        config={
            "scrollZoom": False,
            "displayModeBar": False,
            "doubleClick": "reset",
        },
    )

    fl.Deck(name, html)


if __name__ == "__main__":
    # run locally; this train variant renders decks rather than returning a value
    asyncio.run(train(concurrency=1, n_trials=10))

flyte-bot commented Jan 9, 2025

Code Review Agent Run #a3ee63

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: d8a5bf4..a488fa0
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


flyte-bot commented Jan 9, 2025

Code Review Agent Run #74ccb6

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: a488fa0..8bc3adb
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


flyte-bot commented Jan 10, 2025

Code Review Agent Run #bf90ad

Actionable Suggestions - 5
  • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py - 4
    • Consider extracting trial execution logic · Line 137-140
    • Consider extracting trial execution logic · Line 137-140
    • Consider keeping objective as required field · Line 59-59
    • Use public interface instead of private · Line 83-83
  • plugins/flytekit-optuna/tests/test_optimizer.py - 1
    • Consider passing objective to Optimizer directly · Line 106-106
Review Details
  • Files reviewed - 3 · Commit Range: 8bc3adb..749ac03
    • plugins/flytekit-optuna/flytekitplugins/optuna/optimizer.py
    • plugins/flytekit-optuna/setup.py
    • plugins/flytekit-optuna/tests/test_optimizer.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


Comment on lines +137 to +140
if self.callback is not None:
    promise = self.callback(trial=trial, **inputs)
else:
    promise = self.objective(**process(trial, inputs))

Consider extracting trial execution logic

Consider extracting the trial execution logic into a separate method to improve code organization and reusability. The conditional logic for handling callback vs direct objective execution could be cleaner in its own method.

Code suggestion
Check the AI-generated fix before applying
Suggested change
- if self.callback is not None:
-     promise = self.callback(trial=trial, **inputs)
- else:
-     promise = self.objective(**process(trial, inputs))
+ promise = await self._execute_trial(trial, inputs)
+
+ async def _execute_trial(self, trial: optuna.Trial, inputs: dict[str, Any]) -> Awaitable[Union[float, tuple[float, ...]]]:
+     if self.callback is not None:
+         return await self.callback(trial=trial, **inputs)
+     return await self.objective(**process(trial, inputs))

Code Review Run #bf90ad




study = optuna.create_study(direction="maximize")

optimizer = Optimizer(concurrency=concurrency, n_trials=n_trials, study=study, callback=callback)

Consider passing objective to Optimizer directly

Consider passing the objective function directly to the Optimizer constructor. The current implementation relies on the callback to invoke the objective function which may not be the intended pattern.

Code suggestion
Check the AI-generated fix before applying
Suggested change
- optimizer = Optimizer(concurrency=concurrency, n_trials=n_trials, study=study, callback=callback)
+ optimizer = Optimizer(objective=objective, concurrency=concurrency, n_trials=n_trials, study=study, callback=callback)

Code Review Run #bf90ad



concurrency: int
n_trials: int
study: Optional[optuna.Study] = None
objective: Optional[Union[PythonFunctionTask, PythonFunctionWorkflow]] = None

Consider keeping objective as required field

The removal of the required objective field from the class definition may cause issues since it's used in the __post_init__ method. Consider keeping it as a required field or updating the initialization logic.

Code suggestion
Check the AI-generated fix before applying
 @@ -55,6 +55,7 @@
 class Optimizer:
     concurrency: int
     n_trials: int
 +    objective: Union[PythonFunctionTask, PythonFunctionWorkflow]
     study: Optional[optuna.Study] = None
     callback: Optional[Callable[Concatenate[optuna.Trial, P], Awaitable[Union[float, tuple[float, ...]]]]] = None

Code Review Run #bf90ad



if isinstance(self.objective, PythonFunctionTask):
    func = self.objective.task_function
elif isinstance(self.objective, PythonFunctionWorkflow):
    func = self.objective._workflow_function

Use public interface instead of private

Consider using public interface instead of accessing private member '_workflow_function'

Code suggestion
Check the AI-generated fix before applying
Suggested change
- func = self.objective._workflow_function
+ func = self.objective.workflow_function

Code Review Run #bf90ad


