Merge branch 'main' into refactor-pattern-logic-catalog-cli
ElenaKhaustova committed Aug 12, 2024
2 parents 20d2b91 + d5916a1 commit 13e6da9
Showing 8 changed files with 30 additions and 81 deletions.
1 change: 1 addition & 0 deletions RELEASE.md
@@ -1,6 +1,7 @@
# Upcoming Release 0.19.8

## Major features and improvements
* Made default run entrypoint in `__main__.py` work in interactive environments such as IPython and Databricks.

## Bug fixes and other changes
* Moved `_find_run_command()` and `_find_run_command_in_plugins()` from `__main__.py` in the project template to the framework itself.
86 changes: 14 additions & 72 deletions docs/source/deployment/databricks/databricks_deployment_workflow.md
@@ -36,9 +36,8 @@ The sequence of steps described in this section is as follows:
2. [Install Kedro and the Databricks CLI in a new virtual environment](#install-kedro-and-the-databricks-cli-in-a-new-virtual-environment)
3. [Authenticate the Databricks CLI](#authenticate-the-databricks-cli)
4. [Create a new Kedro project](#create-a-new-kedro-project)
5. [Create an entry point for Databricks](#create-an-entry-point-for-databricks)
6. [Package your project](#package-your-project)
7. [Upload project data and configuration to DBFS](#upload-project-data-and-configuration-to-dbfs)
5. [Package your project](#package-your-project)
6. [Upload project data and configuration to DBFS](#upload-project-data-and-configuration-to-dbfs)

### Note your Databricks username and host

@@ -99,64 +98,6 @@ This command creates a new Kedro project using the `databricks-iris` starter template.
If you are not using the `databricks-iris` starter to create a Kedro project, **and** you are working with a version of Kedro **earlier than 0.19.0**, then you should [disable file-based logging](https://docs.kedro.org/en/0.18.14/logging/logging.html#disable-file-based-logging) to prevent Kedro from attempting to write to the read-only file system.
```

### Create an entry point for Databricks

The default entry point of a Kedro project uses a Click command line interface (CLI), which is not compatible with Databricks. To run your project as a Databricks job, you must define a new entry point specifically for use on Databricks.

The `databricks-iris` starter has this entry point pre-built, so there is no extra work to do here, but generally you must **create an entry point manually for your own projects using the following steps**:

1. **Create an entry point script**: Create a new file in `<project_root>/src/iris_databricks` named `databricks_run.py`. Copy the following code to this file:

```python
import argparse
import logging

from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession


def main():
parser = argparse.ArgumentParser()
parser.add_argument("--env", dest="env", type=str)
parser.add_argument("--conf-source", dest="conf_source", type=str)
parser.add_argument("--package-name", dest="package_name", type=str)

args = parser.parse_args()
env = args.env
conf_source = args.conf_source
package_name = args.package_name

# https://kb.databricks.com/notebooks/cmd-c-on-object-id-p0.html
logging.getLogger("py4j.java_gateway").setLevel(logging.ERROR)
logging.getLogger("py4j.py4j.clientserver").setLevel(logging.ERROR)

configure_project(package_name)
with KedroSession.create(env=env, conf_source=conf_source) as session:
session.run()


if __name__ == "__main__":
main()
```

2. **Define a new entry point**: Open `<project_root>/pyproject.toml` in a text editor or IDE and add a new line in the `[project.scripts]` section, so that it becomes:

```toml
[project.scripts]
databricks_run = "<package_name>.databricks_run:main"
```

Remember to replace `<package_name>` with the correct package name for your project.

This process adds an entry point to your project which can be used to run it on Databricks.

```{note}
Because you are no longer using the default entry point for Kedro, you will not be able to run your project with the options it usually provides. Instead, the `databricks_run` entry point in the above code and in the `databricks-iris` starter contains a simple implementation of three options:
- `--package-name` (required): the package name (defined in `pyproject.toml`) of your packaged project.
- `--env`: specifies a [Kedro configuration environment](../../configuration/configuration_basics.md#configuration-environments) to load for your run.
- `--conf-source`: specifies the location of the `conf/` directory to use with your Kedro project.
```
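
For illustration, a hypothetical local invocation of this entry point with those options might look like the following; the package name and paths are placeholders, not prescribed values:

```bash
# Illustrative only: substitute your own package name and conf location.
databricks_run --package-name iris_databricks --conf-source /dbfs/FileStore/iris-databricks/conf
```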

### Package your project

To package your Kedro project for deployment on Databricks, you must create a Wheel (`.whl`) file, which is a binary distribution of your project. In the root directory of your Kedro project, run the following command:
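
A minimal sketch, assuming the standard Kedro CLI packaging command:

```bash
# Builds a .whl distribution of the project into the dist/ directory.
kedro package
```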
@@ -182,14 +123,14 @@ There are several ways to upload data to DBFS: you can use the [DBFS API](https:
- **Upload your project's data and config**: at the command line in your local environment, use the following Databricks CLI commands to upload your project's locally stored data and configuration to DBFS:

```bash
databricks fs cp --recursive <project_root>/data/ dbfs:/FileStore/iris-databricks/data
databricks fs cp --recursive <project_root>/conf/ dbfs:/FileStore/iris-databricks/conf
databricks fs cp --recursive <project_root>/data/ dbfs:/FileStore/iris_databricks/data
databricks fs cp --recursive <project_root>/conf/ dbfs:/FileStore/iris_databricks/conf
```

The `--recursive` flag ensures that the entire folder and its contents are uploaded. You can list the contents of the destination folder in DBFS using the following command:

```bash
databricks fs ls dbfs:/FileStore/iris-databricks/data
databricks fs ls dbfs:/FileStore/iris_databricks/data
```

You should see the contents of the project's `data/` directory printed to your terminal:
@@ -205,6 +146,10 @@ You should see the contents of the project's `data/` directory printed to your terminal:
08_reporting
```

```{note}
If you are not using the `databricks-iris` starter to create a Kedro project, make sure your catalog entries point to DBFS storage (see the example below).
```
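
A hypothetical catalog entry pointing at DBFS might look like this; the dataset name, type and path are illustrative, not prescribed:

```yaml
# Illustrative sketch: adjust the dataset name, type and filepath to your project.
example_iris_data:
  type: pandas.CSVDataset
  filepath: /dbfs/FileStore/iris_databricks/data/01_raw/iris.csv
```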

## Deploy and run your Kedro project using the workspace UI

To run your packaged project on Databricks, login to your Databricks account and perform the following steps in the workspace:
@@ -235,9 +180,6 @@ Configure the job cluster with the following settings:
- In the `name` field enter `kedro_deployment_demo`.
- Select the radio button for `Single node`.
- Select the runtime `13.3 LTS` in the `Databricks runtime version` field.
- In the `Advanced options` section, under the `Spark` tab, locate the `Environment variables` field. Add the following line:
`KEDRO_LOGGING_CONFIG="/dbfs/FileStore/iris-databricks/conf/logging.yml"`
Here, ensure you specify the correct path to your custom logging configuration. This step is crucial because the default Kedro logging configuration incorporates the rich library, which is incompatible with Databricks jobs. In the `databricks-iris` Kedro starter, the `rich` handler in `logging.yml` is altered to a `console` handler for compatibility. For additional information about logging configurations, refer to the [Kedro Logging Manual](https://docs.kedro.org/en/stable/logging/index.html).
- Leave all other settings with their default values in place.

The final configuration for the job cluster should look the same as the following:
@@ -250,14 +192,14 @@ Configure the job with the following settings:

- Enter `iris-databricks` in the `Name` field.
- In the dropdown menu for the `Type` field, select `Python wheel`.
- In the `Package name` field, enter `iris_databricks`. This is the name of your package as defined in your project's `src/setup.py` file.
- In the `Entry Point` field, enter `databricks_run`. This is the name of the [entry point](#create-an-entry-point-for-databricks) to run your package from.
- In the `Package name` field, enter `iris_databricks`. This is the name of your package as defined in your project's `pyproject.toml` file.
- In the `Entry Point` field, enter `iris-databricks`. This is the name of the entry point, defined in your project's `pyproject.toml` file, used to run your package.
- Ensure the job cluster you created in step two is selected in the dropdown menu for the `Cluster` field.
- In the `Dependent libraries` field, click `Add` and upload [your project's `.whl` file](#package-your-project), making sure that the radio buttons for `Upload` and `Python Whl` are selected for the `Library Source` and `Library Type` fields.
- In the `Parameters` field, enter the following list of runtime options:
- In the `Parameters` field, enter the following runtime option:

```bash
["--conf-source", "/dbfs/FileStore/iris-databricks/conf", "--package-name", "iris_databricks"]
["--conf-source", "/dbfs/FileStore/iris_databricks/conf"]
```
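
These parameters are handed to the project's entry point; assuming it wraps the standard `kedro run` command, a roughly equivalent local invocation (illustrative only) would be:

```bash
# Illustrative equivalent of the job parameters above.
kedro run --conf-source /dbfs/FileStore/iris_databricks/conf
```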

The final configuration for your job should look the same as the following:
Expand All @@ -278,7 +220,7 @@ The following things happen when you run your job:

- The job cluster is provisioned and started (job status: `Pending`).
- The packaged Kedro project and all its dependencies are installed (job status: `Pending`).
- The packaged Kedro project is run from the specified `databricks_run` entry point (job status: `In Progress`).
- The packaged Kedro project is run from the specified `iris-databricks` entry point (job status: `In Progress`).
- The packaged code finishes executing and the job cluster is stopped (job status: `Succeeded`).

A run will take roughly six to seven minutes.
Binary file modified docs/source/meta/images/databricks_configure_job_cluster.png
Binary file modified docs/source/meta/images/databricks_configure_new_job.png
@@ -1,6 +1,8 @@
"""{{ cookiecutter.project_name }} file for ensuring the package is executable
as `{{ cookiecutter.repo_name }}` and `python -m {{ cookiecutter.python_package }}`
"""

import sys
from pathlib import Path

from kedro.framework.cli.utils import find_run_command
Expand All @@ -10,6 +12,10 @@
def main(*args, **kwargs):
package_name = Path(__file__).parent.name
configure_project(package_name)

interactive = hasattr(sys, 'ps1')
kwargs["standalone_mode"] = not interactive

run = find_run_command(package_name)
run(*args, **kwargs)
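
A brief sketch of why this helps, assuming Click's default behaviour: with `standalone_mode=True`, Click calls `sys.exit()` once the command finishes, which would terminate an interactive interpreter such as IPython or a Databricks notebook, so the entry point disables it when an interactive prompt is detected.

```python
# Minimal illustrative sketch: sys.ps1 exists only while an interactive prompt is
# active, so it can be used to decide whether Click should be allowed to call
# sys.exit() after the run command completes.
import sys


def should_use_standalone_mode() -> bool:
    interactive = hasattr(sys, "ps1")  # True inside IPython / Databricks notebooks
    return not interactive


print(should_use_standalone_mode())  # True when run as a script, False in IPython
```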

9 changes: 2 additions & 7 deletions kedro/framework/cli/cli.py
@@ -5,7 +5,6 @@
from __future__ import annotations

import importlib
import logging
import sys
import traceback
from collections import defaultdict
@@ -39,9 +38,6 @@
v{version}
"""

logger = logging.getLogger(__name__)
logger.addHandler(logging.StreamHandler(sys.stderr))


@click.group(context_settings=CONTEXT_SETTINGS, name="Kedro")
@click.version_option(version, "--version", "-V", help="Show version and exit")
@@ -208,13 +204,12 @@ def main(
click.echo(message)
click.echo(hint)
sys.exit(exc.code)
except Exception as error:
logger.error(f"An error has occurred: {error}")
except Exception:
self._cli_hook_manager.hook.after_command_run(
project_metadata=self._metadata, command_args=args, exit_code=1
)
hook_called = True
sys.exit(1)
raise
finally:
if not hook_called:
self._cli_hook_manager.hook.after_command_run(
@@ -1,6 +1,7 @@
"""{{ cookiecutter.project_name }} file for ensuring the package is executable
as `{{ cookiecutter.repo_name }}` and `python -m {{ cookiecutter.python_package }}`
"""
import sys
from pathlib import Path

from kedro.framework.cli.utils import find_run_command
Expand All @@ -10,6 +11,10 @@
def main(*args, **kwargs):
package_name = Path(__file__).parent.name
configure_project(package_name)

interactive = hasattr(sys, 'ps1')
kwargs["standalone_mode"] = not interactive

run = find_run_command(package_name)
run(*args, **kwargs)

4 changes: 2 additions & 2 deletions tests/framework/cli/test_cli.py
@@ -520,7 +520,7 @@ def test_main_hook_exception_handling(self, fake_metadata):
project_metadata=kedro_cli._metadata, command_args=[], exit_code=1
)

assert "An error has occurred: Test Exception" in result.output
assert result.exit_code == 1

@patch("sys.exit")
def test_main_hook_finally_block(self, fake_metadata):
@@ -535,7 +535,7 @@ def test_main_hook_finally_block(self, fake_metadata):
project_metadata=kedro_cli._metadata, command_args=[], exit_code=0
)

assert "An error has occurred:" not in result.output
assert result.exit_code == 0


@mark.usefixtures("chdir_to_dummy_project")
