"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "84f0f73a0f76"
+ },
+ "source": [
+ "| | |\n",
+ "|-|-|\n",
+ "| Author(s) |[Ivan Nardini](https://github.com/inardini) |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tvgnzT1CKxrO"
+ },
+ "source": [
+ "## Overview\n",
+ "\n",
+ "This notebook shows how to use the Vertex AI Python SDK for the Gen AI Evaluation Service to evaluate a multimodal task with a locally defined `CustomMetric`, using your own autorater model to compute a model-based metric."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7fRop-TJ5iZW"
+ },
+ "source": [
+ "⚠️ This notebook shows an experimental approach to evaluating a multimodal task with the Vertex AI Python SDK for the Gen AI Evaluation Service. The evaluation results depend on how well the autorater handles multimodal inputs against the evaluation criteria. ⚠️"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3XZf_4VEOvFo"
+ },
+ "source": [
+ "## Getting Started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kE20na1OOvFo"
+ },
+ "source": [
+ "### Install Vertex AI SDK for Gen AI Evaluation Service"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "abLuRgBzOvFp"
+ },
+ "outputs": [],
+ "source": [
+ "%pip install -U -q google-cloud-aiplatform[evaluation]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "R5Xep4W9lq-Z"
+ },
+ "source": [
+ "### Restart runtime\n",
+ "\n",
+ "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
+ "\n",
+ "The restart might take a minute or longer. After it's restarted, continue to the next step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "XRvKdaPDTznN"
+ },
+ "outputs": [],
+ "source": [
+ "import IPython\n",
+ "\n",
+ "app = IPython.Application.instance()\n",
+ "app.kernel.do_shutdown(True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SbmM4z7FOBpM"
+ },
+ "source": [
+ "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dmWOrTJ3gx13"
+ },
+ "source": [
+ "### Authenticate your notebook environment (Colab only)\n",
+ "\n",
+ "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "NyKGtVQjgx13"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ "    from google.colab import auth\n",
+ "\n",
+ "    auth.authenticate_user()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DF4l8DTdWgPY"
+ },
+ "source": [
+ "### Set Google Cloud project information and initialize Vertex AI SDK\n",
+ "\n",
+ "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
+ "\n",
+ "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "GTL_YzF9OvFq"
+ },
+ "outputs": [],
+ "source": [
+ "# Use the environment variable if the user doesn't provide Project ID.\n",
+ "import os\n",
+ "\n",
+ "import vertexai\n",
+ "\n",
+ "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
+ "\n",
+ "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
+ "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
+ "\n",
+ "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
+ "\n",
+ "vertexai.init(project=PROJECT_ID, location=LOCATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EdvJRUWRNGHE"
+ },
+ "source": [
+ "## Evaluate your multimodal use case with custom metric\n",
+ "\n",
+ "Imagine you have a customer insurance app that provides several services, including LLM-powered live incident support. Given a conversation about a car accident and an image of the damaged car, you want to evaluate the coherence of the generated responses.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5303c05f7aa6"
+ },
+ "source": [
+ "### Import libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "6fc324893334"
+ },
+ "outputs": [],
+ "source": [
+ "# General\n",
+ "from IPython.display import HTML, Markdown, display\n",
+ "from vertexai.evaluation import CustomMetric, EvalTask\n",
+ "from vertexai.generative_models import (\n",
+ " GenerationConfig,\n",
+ " GenerativeModel,\n",
+ " HarmBlockThreshold,\n",
+ " HarmCategory,\n",
+ " Part,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tfQ7sPtOjZOw"
+ },
+ "source": [
+ "### Library settings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "RjWUgU1TjZOw"
+ },
+ "outputs": [],
+ "source": [
+ "# @title\n",
+ "import json\n",
+ "import logging\n",
+ "import warnings\n",
+ "\n",
+ "import pandas as pd\n",
+ "\n",
+ "logging.getLogger(\"urllib3.connectionpool\").setLevel(logging.ERROR)\n",
+ "warnings.filterwarnings(\"ignore\")\n",
+ "\n",
+ "# pd.set_option('display.max_colwidth', None)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QfaFPmm7gC2V"
+ },
+ "source": [
+ "### Helpers"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "cellView": "form",
+ "id": "2kyy7mGhgEj9"
+ },
+ "outputs": [],
+ "source": [
+ "def display_eval_result(\n",
+ "    eval_result: dict | object,\n",
+ "    title: str | None = None,\n",
+ "    metrics: list[str] | None = None,\n",
+ ") -> None:\n",
+ "    \"\"\"Display the evaluation results.\"\"\"\n",
+ "    summary_metrics, metrics_table = (\n",
+ "        eval_result.summary_metrics,\n",
+ "        eval_result.metrics_table,\n",
+ "    )\n",
+ "\n",
+ "    metrics_df = pd.DataFrame.from_dict(summary_metrics, orient=\"index\").T\n",
+ "    if metrics:\n",
+ "        metrics_df = metrics_df.filter(\n",
+ "            [\n",
+ "                metric\n",
+ "                for metric in metrics_df.columns\n",
+ "                if any(selected_metric in metric for selected_metric in metrics)\n",
+ "            ]\n",
+ "        )\n",
+ "        metrics_table = metrics_table.filter(\n",
+ "            [\n",
+ "                metric\n",
+ "                for metric in metrics_table.columns\n",
+ "                if any(selected_metric in metric for selected_metric in metrics)\n",
+ "            ]\n",
+ "        )\n",
+ "\n",
+ "    if title:\n",
+ "        # Display the title with Markdown for emphasis\n",
+ "        display(Markdown(f\"## {title}\"))\n",
+ "    # Display the summary metrics DataFrame\n",
+ "    display(Markdown(\"### Summary Metrics\"))\n",
+ "    display(metrics_df)\n",
+ "    # Display the metrics table DataFrame\n",
+ "    display(Markdown(\"### Row-based Metrics\"))\n",
+ "    display(metrics_table)\n",
+ "\n",
+ "\n",
+ "def display_explanations(\n",
+ "    eval_result: dict | object, metrics: list[str] | None = None, n: int = 1\n",
+ ") -> None:\n",
+ "    \"\"\"Display the explanations.\"\"\"\n",
+ "    style = \"white-space: pre-wrap; width: 1500px; overflow-x: auto;\"\n",
+ "    metrics_table = eval_result.metrics_table\n",
+ "    df = metrics_table.sample(n=n)\n",
+ "\n",
+ "    if metrics:\n",
+ "        df = df.filter(\n",
+ "            [\"response\", \"baseline_model_response\"]\n",
+ "            + [\n",
+ "                metric\n",
+ "                for metric in df.columns\n",
+ "                if any(selected_metric in metric for selected_metric in metrics)\n",
+ "            ]\n",
+ "        )\n",
+ "    for _, row in df.iterrows():\n",
+ "        for col in df.columns:\n",
+ "            display(HTML(f\"<h2>{col}:</h2> <div style='{style}'>{row[col]}</div>\"))\n",
+ "        display(HTML(\"<hr>\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "17XhFaeuQu31"
+ },
+ "source": [
+ "### Load your evaluation dataset\n",
+ "\n",
+ "Collect an evaluation dataset that matches your task and the metrics you plan to calculate.\n",
+ "\n",
+ "In this scenario, each row has a `context` column that contains both the conversation and the Cloud Storage URI of the image for a specific car accident.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "_o2YjZvbQ1dH"
+ },
+ "outputs": [],
+ "source": [
+ "instruction = \"\"\"\n",
+ "You are an insurance agent specializing in car accident assessments.\n",
+ "You will be provided with a conversation about a car accident and an image of the damaged car.\n",
+ "Your task is to analyze the image and identify the primary type of damage visible.\n",
+ "Use the conversation for context, but prioritize the visual evidence from the image.\n",
+ "\n",
+ "Categorize the primary damage as one of the following:\n",
+ "\n",
+ "* bumper\n",
+ "* engine_compartment\n",
+ "* hood\n",
+ "* lateral\n",
+ "* windshield\n",
+ "\n",
+ "If the image is unclear or the damage is not visible, respond with \"Unable to determine damage type from the provided image.\"\n",
+ "If the primary damage is something other than the listed categories, respond with \"Damage type not listed in available categories.\"\n",
+ "\n",
+ "Conversation:\n",
+ "{conversation}\n",
+ "\n",
+ "Image:\n",
+ "{image_of_car_accident}\n",
+ "\n",
+ "Provide your assessment of the primary damage type based on the image.\n",
+ "\"\"\"\n",
+ "\n",
+ "context = [\n",
+ " {\n",
+ " \"conversation\": '''AI insurance app: \"Hello, I'm the AI assistant for your car insurance. It looks like you've been in an accident. Could you please tell me what happened?\" App user: \"Yeah, I was just rear-ended while waiting at a red light.\" AI insurance app: \"I'm sorry to hear that. Could you take some pictures of the damage to your vehicle, including the license plate of the other car if possible?\" App user: (uploads the provided image) \"Here's the damage to my bumper.\" AI insurance app: \"Thank you. Can you describe any injuries to yourself or any passengers?\" App user: \"No, thankfully everyone is okay. Just a bit shaken up.\" AI insurance app: \"That's good to hear. I've created a claim based on the information and photo you provided. A representative will be in touch with you shortly to gather further details and discuss the next steps.\" App user: \"Okay, thank you.\" AI insurance app: \"You're welcome. Please don't hesitate to contact us if you have any questions.\"''',\n",
+ " \"image_of_car_accident\": \"gs://cloud-samples-data/generative-ai/evaluation/use_cases/car_assessment/bumper.jpg\",\n",
+ " },\n",
+ " {\n",
+ " \"conversation\": '''AI insurance app: \"Hi there! I'm Amelia, your AI assistant for [Insurance company name]. I see you've been in an accident. I'm so sorry to hear that. Are you okay?\" Driver: \"I'm a little shaken up, but I'm okay. My car isn't so lucky, though.\" AI insurance app: \"Oh no, I'm so sorry to hear that. Can you tell me what happened?\" Driver: \"I was stopped at a red light when I was rear-ended by another car. The damage to my car is pretty bad.\" AI insurance app: \"I understand. Can you take some pictures of the damage to your car, including the license plate of the other car if possible?\" Driver: \"Sure, here you go.\" (uploads the provided image) AI insurance app: \"Thank you. Is there anyone else involved in the accident?\" Driver: \"No, just me and the driver of the other car.\" AI insurance app: \"Okay. Do you need medical attention?\" Driver: \"No, I'm fine.\" AI insurance app: \"I'm glad to hear that. I've created a claim based on the information and photo you provided. A representative will be in touch with you shortly to gather further details and discuss the next steps.\" Driver: \"Okay, thank you.\" AI insurance app: \"You're welcome. Please don't hesitate to contact us if you have any questions.\"''',\n",
+ " \"image_of_car_accident\": \"gs://cloud-samples-data/generative-ai/evaluation/use_cases/car_assessment/engine_compartment.jpg\",\n",
+ " },\n",
+ " {\n",
+ " \"conversation\": '''**AI insurance app:** \"Hello, it appears you've been in an accident. Are you alright?\" **App user:** \"Yes, I'm okay. Just a bit shaken up.\" **AI insurance app:** \"I'm glad to hear you're physically unharmed. Could you please describe what happened?\" **App user:** \"Someone ran a red light and hit the front of my car.\" **AI insurance app:** \"I understand. To help assess the damage, could you please take some photos of your vehicle, especially the impacted areas? If possible, include a photo of the other vehicle's license plate.\" **App user:** (uploads the provided image) \"Here's the damage to my car.\" **AI insurance app:** \"Thank you for providing that. Were there any other vehicles involved, or was it just the two cars?\" **App user:** \"No, it was just us.\" **AI insurance app:** \"Okay. And to confirm, you don't require any medical assistance at this time?\" **App user:** \"No, I don't think so. Thankfully.\" **AI insurance app:** \"Alright. I've created an accident claim with the information and photos you've provided. One of our representatives will contact you soon to gather more details and guide you through the next steps.\" **App user:** \"Thank you, I appreciate the help.\" **AI insurance app:** \"You're very welcome. Please don't hesitate to reach out through the app if you have any further questions.\"''',\n",
+ " \"image_of_car_accident\": \"gs://cloud-samples-data/generative-ai/evaluation/use_cases/car_assessment/hood.jpg\",\n",
+ " },\n",
+ " {\n",
+ " \"conversation\": '''AI insurance app: \"Hi there! I'm Amelia, your AI assistant for [Insurance company name]. I see you've been in an accident. I'm so sorry to hear that. Are you okay?\" Driver: \"I'm a little shaken up, but I'm okay. My car isn't so lucky, though.\" AI insurance app: \"Oh no, I'm so sorry to hear that. Can you tell me what happened?\" Driver: \"I was stopped at a red light when I was rear-ended by another car. The damage to my car is pretty bad.\" AI insurance app: \"I understand. Can you take some pictures of the damage to your car, including the license plate of the other car if possible?\" Driver: \"Sure, here you go.\" (uploads the provided image) AI insurance app: \"Thank you. Is there anyone else involved in the accident?\" Driver: \"No, just me and the driver of the other car.\" AI insurance app: \"Okay. Do you need medical attention?\" Driver: \"No, I'm fine.\" AI insurance app: \"I'm glad to hear that. I've created a claim based on the information and photo you provided. A representative will be in touch with you shortly to gather further details and discuss the next steps.\" Driver: \"Okay, thank you.\" AI insurance app: \"You're welcome. Please don't hesitate to contact us if you have any questions.\"''',\n",
+ " \"image_of_car_accident\": \"gs://cloud-samples-data/generative-ai/evaluation/use_cases/car_assessment/lateral.jpg\",\n",
+ " },\n",
+ " {\n",
+ " \"conversation\": '''AI insurance app: \"Hello. I've received an alert that you may have been involved in an accident. Can you confirm and tell me if you're okay?\" App user: \"Yes, I was just in an accident. I'm okay, just a little shaken.\" AI insurance app: \"I'm relieved to hear you're not hurt. Can you tell me what happened?\" App user: \"A rock flew up from a truck in front of me and cracked my windshield.\" AI insurance app: \"I understand. To assess the damage, could you please take a photo of the damage?\" App user: (uploads the provided image) \"Here's a photo of the crack.\" AI insurance app: \"Thank you for providing that. Were there any other vehicles involved?\" App user: \"No, just my car.\" AI insurance app: \"Okay. And you didn't sustain any injuries?\" App user: \"No, thankfully not.\" AI insurance app: \"That's good to hear. I've created a claim for you based on the information and photo you provided. A representative will be in touch shortly to gather more details and guide you through the next steps.\" App user: \"Okay, thank you.\" AI insurance app: \"You're welcome. Please don't hesitate to contact us if you have any questions.\"''',\n",
+ " \"image_of_car_accident\": \"gs://cloud-samples-data/generative-ai/evaluation/use_cases/car_assessment/windshield.jpg\",\n",
+ " },\n",
+ "]\n",
+ "\n",
+ "generated_response = [\n",
+ " \"bumper\",\n",
+ " \"engine_compartment\",\n",
+ " \"lateral\",\n",
+ " \"lateral\",\n",
+ " \"windshield\",\n",
+ "]\n",
+ "\n",
+ "reference = [\"bumper\", \"engine_compartment\", \"hood\", \"lateral\", \"windshield\"]\n",
+ "\n",
+ "eval_dataset = pd.DataFrame(\n",
+ " {\n",
+ " \"instruction\": instruction,\n",
+ " \"context\": context,\n",
+ " \"response\": generated_response,\n",
+ " \"reference\": reference,\n",
+ " }\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "icfOEQopmpky"
+ },
+ "outputs": [],
+ "source": [
+ "eval_dataset.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EFUFtKRIUpAe"
+ },
+ "source": [
+ "### Bring your own autorater\n",
+ "\n",
+ "The Gen AI Evaluation Service lets you bring any autorater you prefer. In this case, you use `gemini-1.5-pro` with structured JSON output (a `score` and an `explanation`) to evaluate your task."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "oV6ZI1E5Uu1c"
+ },
+ "outputs": [],
+ "source": [
+ "def get_autorater_response(metric_prompt: list) -> dict:\n",
+ "    \"\"\"Call the autorater and return its parsed JSON verdict, or {} if no text comes back.\"\"\"\n",
+ "    metric_response_schema = {\n",
+ "        \"type\": \"OBJECT\",\n",
+ "        \"properties\": {\n",
+ "            \"score\": {\"type\": \"NUMBER\"},\n",
+ "            \"explanation\": {\"type\": \"STRING\"},\n",
+ "        },\n",
+ "        \"required\": [\"score\", \"explanation\"],\n",
+ "    }\n",
+ "\n",
+ "    autorater = GenerativeModel(\n",
+ "        \"gemini-1.5-pro\",\n",
+ "        generation_config=GenerationConfig(\n",
+ "            response_mime_type=\"application/json\",\n",
+ "            response_schema=metric_response_schema,\n",
+ "        ),\n",
+ "        safety_settings={\n",
+ "            HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE,\n",
+ "            HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,\n",
+ "            HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,\n",
+ "            HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,\n",
+ "            HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,\n",
+ "        },\n",
+ "    )\n",
+ "\n",
+ "    response = autorater.generate_content(metric_prompt)\n",
+ "\n",
+ "    response_json = {}\n",
+ "\n",
+ "    # Parse only the first text part of the first candidate.\n",
+ "    if response.candidates and len(response.candidates) > 0:\n",
+ "        candidate = response.candidates[0]\n",
+ "        if (\n",
+ "            candidate.content\n",
+ "            and candidate.content.parts\n",
+ "            and len(candidate.content.parts) > 0\n",
+ "        ):\n",
+ "            part = candidate.content.parts[0]\n",
+ "            if part.text:\n",
+ "                response_json = json.loads(part.text)\n",
+ "\n",
+ "    return response_json"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "88_KslQFVKme"
+ },
+ "source": [
+ "### Build your custom metric\n",
+ "\n",
+ "To evaluate a multimodal task, you define a custom metric that takes multimodal content (image, PDF, video, and more) and uses the autorater to return a score and an explanation based on the criteria, rating rubric, and evaluation steps you specify.\n",
+ "\n",
+ "In this scenario, you define a coherence metric for the car accident assessment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ngpoc0KjVNF0"
+ },
+ "outputs": [],
+ "source": [
+ "def custom_coherence_fn(instance):\n",
+ "\n",
+ " conversation = instance[\"context\"][\"conversation\"]\n",
+ " image_of_car_accident = instance[\"context\"][\"image_of_car_accident\"]\n",
+ " response = instance[\"response\"]\n",
+ "\n",
+ " eval_instruction_template = \"\"\"\n",
+ "\n",
+ " # Instruction\n",
+ " You are an insurance agent specializing in evaluating car accident assessments.\n",
+ " You will be provided with a conversation about a car accident and an image of the damaged car.\n",
+ " You should first read the conversation and look at the image carefully, and then evaluate the coherence of the generated responses based on the Criteria provided in the Evaluation section below.\n",
+ " You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n",
+ "\n",
+ " # Evaluation\n",
+ " ## Metric Definition\n",
+ " You will be assessing coherence, which measures the ability to provide a coherent response based on the conversation and car accident image.\n",
+ "\n",
+ " ## Criteria\n",
+ " Coherence: It is the quality of being logical and consistent.\n",
+ " In the context of conversation, it refers to the way that ideas and information are presented in a way that is easy to understand and follow.\n",
+ " A coherent conversation will have a clear flow and will not jump around from topic to topic.\n",
+ " The user will also use language that is appropriate for the audience and will avoid making claims that are not supported by evidence.\n",
+ "\n",
+ " ## Rating Rubric\n",
+ " 5: (Perfectly Aligned) The image precisely matches the damage described in the conversation, and the response accurately reflects the damaged car part.\n",
+ " 4: (Highly Aligned) The image generally supports the conversation's description of the damage, and the response is a suitable representation of the affected area.\n",
+ " 3: (Moderately Aligned) The image shows damage that is plausibly related to the accident described, but there might be minor inconsistencies, and the response is broadly relevant but not entirely specific.\n",
+ " 2: (Poorly Aligned) The image and/or the response have significant inconsistencies with the described accident in the conversation, raising doubts about the claim's validity.\n",
+ " 1: (Misaligned) The image, response, and conversation have major contradictions or are completely unrelated, making the claim appear illogical or fraudulent.\n",
+ "\n",
+ " ## Evaluation Steps\n",
+ " STEP 1: Assess Claim Consistency: Carefully read the conversation to understand the user's description of the accident and the claimed damage.\n",
+ " STEP 2: Analyze Image Relevance: Examine the image to determine if the depicted damage aligns with the user's account. Pay attention to the location and type of damage.\n",
+ " STEP 3: Evaluate Label Accuracy: Check if the generated label correctly identifies the damaged car part as described in the conversation and shown in the image.\n",
+ " STEP 4: Identify Inconsistencies: Look for any discrepancies between the conversation, image, and label. For example, does the image show damage not mentioned in the conversation, or is the label incorrect for the damaged part?\n",
+ " STEP 5: Determine Overall Coherence: Based on the previous steps, assign a coherence score using the 1-5 rubric. Consider the severity of any inconsistencies and their potential impact on the claim's validity.\n",
+ " \"\"\"\n",
+ "\n",
+ " # read image from uri\n",
+ " image_file = Part.from_uri(image_of_car_accident, \"image/jpeg\")\n",
+ "\n",
+ " # generate the eval\n",
+ " evaluation_prompt = [\n",
+ " eval_instruction_template,\n",
+ " \"CONVERSATION: \",\n",
+ " conversation,\n",
+ " \"IMAGE: \",\n",
+ " image_file,\n",
+ " \"GENERATED RESPONSE: \",\n",
+ " response,\n",
+ " ]\n",
+ "\n",
+ " evaluation_response = get_autorater_response(evaluation_prompt)\n",
+ " return {\n",
+ " \"custom_coherence\": evaluation_response.get(\"score\", \"\"),\n",
+ " \"explanation\": evaluation_response.get(\"explanation\", \"\"),\n",
+ " }"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "_7Vhz16neast"
+ },
+ "outputs": [],
+ "source": [
+ "custom_coherence_metric = CustomMetric(\n",
+ " name=\"custom_coherence\",\n",
+ " metric_function=custom_coherence_fn,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A9-m7tmlb9mt"
+ },
+ "source": [
+ "### Run Evaluation\n",
+ "\n",
+ "With the autorater and the custom metric defined, you can run the evaluation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "nkiqjPHhcWo6"
+ },
+ "outputs": [],
+ "source": [
+ "metrics = [\"exact_match\", custom_coherence_metric]\n",
+ "\n",
+ "experiment_name = \"eval-multimodal-metric\"\n",
+ "\n",
+ "eval_task = EvalTask(\n",
+ " dataset=eval_dataset,\n",
+ " metrics=metrics,\n",
+ " experiment=experiment_name,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Sp-TlIORdyFb"
+ },
+ "outputs": [],
+ "source": [
+ "eval_result = eval_task.evaluate()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vKFOB6y_yh3w"
+ },
+ "source": [
+ "### Visualize and validate your evaluation results\n",
+ "\n",
+ "Using the helper functions defined earlier, you can display both the evaluation scores and the autorater's explanations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "oHEblJYvgSpU"
+ },
+ "outputs": [],
+ "source": [
+ "display_eval_result(eval_result, title=\"Evaluation Results\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "kdTt7ETMgfhR"
+ },
+ "outputs": [],
+ "source": [
+ "display_explanations(eval_result, metrics=[\"custom_coherence\"])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2a4e033321ad"
+ },
+ "source": [
+ "## Cleaning up"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "35VcF-W5peq5"
+ },
+ "outputs": [],
+ "source": [
+ "delete_experiment = True\n",
+ "\n",
+ "if delete_experiment:\n",
+ "\n",
+ "    from google.cloud import aiplatform\n",
+ "\n",
+ "    aiplatform.init(project=PROJECT_ID, location=LOCATION)\n",
+ "    experiment = aiplatform.Experiment(experiment_name)\n",
+ "    experiment.delete()"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "evaluate_multimodal_task_image.ipynb",
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/gemini/evaluation/evaluating_prompts_at_scale_with_gemini_batch_prediction_api.ipynb b/gemini/evaluation/evaluating_prompts_at_scale_with_gemini_batch_prediction_api.ipynb
index 3af4e1ec5c..387acfe5ef 100644
--- a/gemini/evaluation/evaluating_prompts_at_scale_with_gemini_batch_prediction_api.ipynb
+++ b/gemini/evaluation/evaluating_prompts_at_scale_with_gemini_batch_prediction_api.ipynb
@@ -119,7 +119,7 @@
},
"outputs": [],
"source": [
- "%pip install google-cloud-aiplatform --upgrade -q"
+ "%pip install --upgrade -q google-cloud-aiplatform google-cloud-bigquery bigframes pandas pandas-gbq"
]
},
{
@@ -182,7 +182,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 4,
"metadata": {
"id": "HDVVYzTISvGv"
},
@@ -208,6 +208,7 @@
"# File containing ground truth data in GCS.\n",
"BUCKET_NAME = \"github-repo\"\n",
"FOLDER = \"generative-ai/gemini/evaluation/sports_files\"\n",
+ "GCS_PREFIX = f\"gs://{BUCKET_NAME}/{FOLDER}\"\n",
"SPORTS_FILE = \"sports_files.csv\""
]
},
@@ -222,24 +223,32 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 5,
"metadata": {
"id": "mUxxhafP4KM9"
},
"outputs": [],
"source": [
- "import base64\n",
"import datetime\n",
"import json\n",
"import time\n",
- "from typing import Any\n",
"\n",
"import bigframes.pandas as bpd\n",
"from google.cloud import bigquery, storage\n",
+ "import pandas as pd\n",
"import vertexai\n",
- "from vertexai.batch_prediction._batch_prediction import BatchPredictionJob\n",
- "from vertexai.generative_models import GenerativeModel, Part\n",
- "\n",
+ "from vertexai.generative_models import GenerationConfig, GenerativeModel, Part\n",
+ "from vertexai.preview.batch_prediction import BatchPredictionJob"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "fb983d347169"
+ },
+ "outputs": [],
+ "source": [
"# BigQuery client.\n",
"bq_client = bigquery.Client(project=PROJECT_ID)\n",
"\n",
@@ -310,9 +319,7 @@
"\n",
"\n",
"create_dataset(dataset_id=BQ_DATASET, location=LOCATION)\n",
- "load_files_table_from_uri(\n",
- " files_table=FILES_TABLE, uri=f\"gs://{BUCKET_NAME}/{FOLDER}/{SPORTS_FILE}\"\n",
- ")"
+ "load_files_table_from_uri(files_table=FILES_TABLE, uri=f\"{GCS_PREFIX}/{SPORTS_FILE}\")"
]
},
{
@@ -333,8 +340,7 @@
"outputs": [],
"source": [
"ground_truth_df = bpd.read_gbq(FILES_TABLE)\n",
- "images_uri = [row[\"path\"] for _, row in ground_truth_df.iterrows()]\n",
- "print(images_uri[:2])"
+ "ground_truth_df[\"path\"][:2]"
]
},
{
@@ -357,7 +363,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 9,
"metadata": {
"id": "bk-EpTp0uJR4"
},
@@ -370,8 +376,10 @@
"* tennis\n",
"* volleyball\n",
"\n",
- "- Provide an answer in JSON format. 3. Example response:\n",
- "'{\"sport\": \"baseball\"}'\n",
+ "- Provide an answer in JSON format.\n",
+ "\n",
+ "Example response:\n",
+ "{\"sport\": \"baseball\"}\n",
"\n",
"- Image:\n",
"\"\"\""
@@ -394,36 +402,28 @@
},
"outputs": [],
"source": [
- "def download_blob_into_memory(bucket_name: str, blob_name: str) -> Any:\n",
- " \"\"\"Downloads a blob from GCS into memory.\"\"\"\n",
- " bucket = storage_client.bucket(bucket_name)\n",
- "\n",
- " blob = bucket.blob(blob_name)\n",
- " contents = blob.download_as_bytes()\n",
- " return contents\n",
- "\n",
- "\n",
- "def classify_image(model_id: str, prompt: str, bucket_name: str, blob_name: str) -> str:\n",
+ "def classify_image(model_id: str, prompt: str, gcs_prefix: str, blob_name: str) -> str:\n",
" \"\"\"Classifies an image.\"\"\"\n",
- " model = GenerativeModel(model_id)\n",
- " contents = download_blob_into_memory(bucket_name, blob_name)\n",
- " encoded_image = base64.b64encode(contents).decode(\"utf-8\")\n",
- " image_content = Part.from_data(\n",
- " data=base64.b64decode(encoded_image), mime_type=\"image/jpeg\"\n",
+ " model = GenerativeModel(\n",
+ " model_id,\n",
+ " generation_config=GenerationConfig(response_mime_type=\"application/json\"),\n",
+ " )\n",
+ " image_content = Part.from_uri(\n",
+ " uri=f\"{gcs_prefix}/{blob_name}\", mime_type=\"image/jpeg\"\n",
" )\n",
" contents = [prompt, image_content]\n",
- " return model.generate_content(contents)\n",
+ " return model.generate_content(contents).text\n",
"\n",
"\n",
"blob_name = ground_truth_df.iloc[0][\"path\"]\n",
"response = classify_image(\n",
" model_id=MODEL_ID,\n",
" prompt=prompt,\n",
- " bucket_name=BUCKET_NAME,\n",
- " blob_name=f\"{FOLDER}/{blob_name}\",\n",
+ " gcs_prefix=GCS_PREFIX,\n",
+ " blob_name=blob_name,\n",
")\n",
"print(f\"blob_name: {blob_name}\")\n",
- "print(f\"response: {response.text}\")"
+ "print(f\"response: {response}\")"
]
},
{
@@ -432,8 +432,9 @@
"id": "Q79hx4JINXZh"
},
"source": [
- "### Create a New Line JSON file applying the prompt to each of the images and upload to a BigQuery table\n",
- "In this section, also an `evaluation_id` variable is created to identify the execution run."
+ "### Create a BigQuery table applying the prompt to each of the images\n",
+ "\n",
+ "In this section, an `evaluation_id` variable is created to identify the execution run."
]
},
{
@@ -444,94 +445,57 @@
},
"outputs": [],
"source": [
- "def create_newline_json_file(\n",
- " prompt: str,\n",
- " evaluation_ts: str,\n",
- " evaluation_id: str,\n",
- " file_name: str,\n",
- " bucket_name: str,\n",
- " folder: str,\n",
- " images_uri: list[str],\n",
- ") -> None:\n",
- " \"\"\"Creates a newline delimited JSON file.\"\"\"\n",
- " with open(file_name, \"w\") as outfile:\n",
- " for image_uri in images_uri:\n",
- " contents = download_blob_into_memory(bucket_name, f\"{folder}/{image_uri}\")\n",
- " encoded_image = base64.b64encode(contents).decode(\"utf-8\")\n",
- " request = {\n",
- " \"contents\": [\n",
- " {\n",
- " \"role\": \"user\",\n",
- " \"parts\": [\n",
- " {\"text\": prompt},\n",
- " {\n",
- " \"inlineData\": {\n",
- " \"mimeType\": \"image/jpeg\",\n",
- " \"data\": encoded_image,\n",
- " }\n",
- " },\n",
- " ],\n",
- " }\n",
- " ]\n",
- " }\n",
- " line = {\n",
- " \"evaluation_ts\": evaluation_ts,\n",
- " \"evaluation_id\": evaluation_id,\n",
- " \"prompt_text\": prompt,\n",
- " \"gcs_uri\": image_uri,\n",
- " \"request\": request,\n",
- " }\n",
- "\n",
- " outfile.write(json.dumps(line))\n",
- " outfile.write(\"\\n\")\n",
- "\n",
- "\n",
- "def upload_newline_json_file(json_file_name: str, prompts_table: str) -> None:\n",
- " \"\"\"Uploads a newline delimited JSON file to BigQuery.\"\"\"\n",
- " job_config = bigquery.LoadJobConfig(\n",
- " schema=[\n",
- " bigquery.SchemaField(\"evaluation_ts\", \"STRING\"),\n",
- " bigquery.SchemaField(\"evaluation_id\", \"STRING\"),\n",
- " bigquery.SchemaField(\"prompt_text\", \"STRING\"),\n",
- " bigquery.SchemaField(\"gcs_uri\", \"STRING\"),\n",
- " bigquery.SchemaField(\"request\", \"JSON\"),\n",
- " ],\n",
- " source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,\n",
- " write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,\n",
- " )\n",
- "\n",
- " with open(json_file_name, \"rb\") as source_file:\n",
- " job = bq_client.load_table_from_file(\n",
- " source_file, PROMPTS_TABLE, job_config=job_config\n",
- " )\n",
- "\n",
- " job.result()\n",
- " table = bq_client.get_table(prompts_table)\n",
- " print(\n",
- " f\"Loaded {table.num_rows} rows and {len(table.schema)} columns to \"\n",
- " f\"{prompts_table}\"\n",
- " )\n",
- "\n",
- "\n",
"# Use current time as identifier of the evaluation.\n",
"now = datetime.datetime.now()\n",
+ "evaluation_ts = str(now)\n",
"evaluation_id = f\"{now.year}_{now.month}_{now.day}_{now.hour}_{now.minute}\"\n",
"json_file_name = f\"/tmp/{evaluation_id}.json\"\n",
"\n",
"# Get URIs of the images from the ground truth table in BigQuery.\n",
"ground_truth_df = bpd.read_gbq(FILES_TABLE)\n",
- "images_uri = [row[\"path\"] for _, row in ground_truth_df.iterrows()]\n",
"\n",
- "create_newline_json_file(\n",
- " prompt=prompt,\n",
- " evaluation_ts=str(now),\n",
- " evaluation_id=evaluation_id,\n",
- " file_name=json_file_name,\n",
- " bucket_name=BUCKET_NAME,\n",
- " folder=FOLDER,\n",
- " images_uri=images_uri,\n",
+ "prompts_df = pd.DataFrame(\n",
+ " [\n",
+ " {\n",
+ " \"evaluation_ts\": evaluation_ts,\n",
+ " \"evaluation_id\": evaluation_id,\n",
+ " \"prompt_text\": prompt,\n",
+ " \"gcs_uri\": image_uri,\n",
+ " \"request\": json.dumps(\n",
+ " {\n",
+ " \"contents\": [\n",
+ " {\n",
+ " \"role\": \"user\",\n",
+ " \"parts\": [\n",
+ " {\"text\": prompt},\n",
+ " {\n",
+ " \"fileData\": {\n",
+ " \"mimeType\": \"image/jpeg\",\n",
+ " \"fileUri\": f\"{GCS_PREFIX}/{image_uri}\",\n",
+ " }\n",
+ " },\n",
+ " ],\n",
+ " }\n",
+ " ],\n",
+ " \"generationConfig\": {\"responseMimeType\": \"application/json\"},\n",
+ " }\n",
+ " ),\n",
+ " }\n",
+ " for image_uri in ground_truth_df[\"path\"].values\n",
+ " ]\n",
")\n",
- "upload_newline_json_file(json_file_name=json_file_name, prompts_table=PROMPTS_TABLE)"
+ "\n",
+ "# Save JSONL file\n",
+ "prompts_df.to_json(json_file_name, lines=True)\n",
+ "\n",
+ "# Upload to BQ\n",
+ "prompts_df.to_gbq(PROMPTS_TABLE, PROJECT_ID)\n",
+ "\n",
+ "table = bq_client.get_table(PROMPTS_TABLE)\n",
+ "print(\n",
+ " f\"Loaded {table.num_rows} rows and {len(table.schema)} columns to \"\n",
+ " f\"{PROMPTS_TABLE}\"\n",
+ ")"
]
},
{
@@ -559,14 +523,39 @@
" source_model=MODEL_ID,\n",
" input_dataset=f\"bq://{PROMPTS_TABLE}\",\n",
" output_uri_prefix=f\"bq://{text_generation_table}\",\n",
- ")\n",
- "\n",
- "# Poll until the batch prediction job completes.\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "acbe70fd175d"
+ },
+ "source": [
+ "To check the status of the job, run this cell."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "06982be54458"
+ },
+ "outputs": [],
+ "source": [
+ "# Refresh the job until complete\n",
"while not batch_job.has_ended:\n",
+ " time.sleep(10)\n",
" batch_job.refresh()\n",
- " print(f\"Batch job state: {batch_job.state}\")\n",
- " time.sleep(30)\n",
- "print(f\"Batch job state: {batch_job.state}\")"
+ "\n",
+ "# Check if the job succeeds\n",
+ "if batch_job.has_succeeded:\n",
+ " print(\"Job succeeded!\")\n",
+ "else:\n",
+ " print(f\"Job failed: {batch_job.error}\")\n",
+ "\n",
+ "# Check the location of the output\n",
+ "print(f\"Job output location: {batch_job.output_location}\")"
]
},
{
@@ -588,7 +577,7 @@
"source": [
"text_generation_df = bpd.read_gbq(text_generation_table)\n",
"for row in text_generation_df[\"response\"][:5]:\n",
- " print(json.loads(row)[0][\"content\"][\"parts\"][0][\"text\"])"
+ " print(json.loads(row)[\"candidates\"][0][\"content\"][\"parts\"][0][\"text\"])"
]
},
{
@@ -597,7 +586,7 @@
"id": "q96WpoTw4Kr6"
},
"source": [
- "##Create Views in BigQuery"
+ "## Create Views in BigQuery"
]
},
{
@@ -627,69 +616,14 @@
" view = bigquery.Table(results_view)\n",
"\n",
" view.view_query = rf\"\"\"\n",
- " WITH t1 AS\n",
- " (\n",
- " SELECT\n",
- " evaluation_id,\n",
- " evaluation_ts,\n",
- " prompt_text,\n",
- " gcs_uri,\n",
- " JSON_EXTRACT(response, '$[0].content.parts[0].text') AS json_data\n",
- " FROM `{text_generation_table_prefix}_*`\n",
- " ),\n",
- " t2 AS (\n",
- " SELECT\n",
- " evaluation_id,\n",
- " evaluation_ts,\n",
- " prompt_text,\n",
- " gcs_uri,\n",
- " REGEXP_EXTRACT(json_data, r'```json(.*)```') AS f\n",
- " FROM t1\n",
- " ),\n",
- " t3 AS(\n",
- " SELECT\n",
- " evaluation_id,\n",
- " evaluation_ts,\n",
- " prompt_text,\n",
- " gcs_uri,\n",
- " REPLACE(f, '\\\\n', '') AS f\n",
- " FROM t2\n",
- " ),\n",
- " t4 AS (\n",
- " SELECT\n",
- " evaluation_id,\n",
- " evaluation_ts,\n",
- " prompt_text,\n",
- " gcs_uri,\n",
- " REPLACE(f, '\\\\\"', '\"') AS f\n",
- " FROM t3\n",
- " ),\n",
- " t5 AS (\n",
- " SELECT\n",
- " evaluation_id,\n",
- " evaluation_ts,\n",
- " prompt_text,\n",
- " gcs_uri,\n",
- " JSON_QUERY(f, '$.sport') AS f\n",
- " FROM t4\n",
- " ),\n",
- " t6 AS (\n",
- " SELECT\n",
- " evaluation_id,\n",
- " evaluation_ts,\n",
- " prompt_text,\n",
- " gcs_uri,\n",
- " REPLACE(f, '\"', '') AS f\n",
- " FROM t5\n",
- " )\n",
- "\n",
- " SELECT\n",
+ " SELECT\n",
" evaluation_id,\n",
" evaluation_ts,\n",
" prompt_text,\n",
" gcs_uri,\n",
- " f AS label\n",
- " FROM t6\"\"\"\n",
+ " JSON_VALUE(JSON_VALUE(response, '$.candidates[0].content.parts[0].text'), \"$.sport\") AS label\n",
+ " FROM `{text_generation_table_prefix}_*`\n",
+ " \"\"\"\n",
"\n",
" # Make an API request to create the view.\n",
" view = bq_client.create_table(view, exists_ok=False)\n",
diff --git a/gemini/function-calling/forced_function_calling.ipynb b/gemini/function-calling/forced_function_calling.ipynb
index a7ee6756af..1d3b83c1ac 100644
--- a/gemini/function-calling/forced_function_calling.ipynb
+++ b/gemini/function-calling/forced_function_calling.ipynb
@@ -280,7 +280,7 @@
"outputs": [],
"source": [
"model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro\",\n",
" generation_config=GenerationConfig(temperature=0),\n",
")"
]
diff --git a/gemini/function-calling/function_calling_data_structures.ipynb b/gemini/function-calling/function_calling_data_structures.ipynb
index b1516d79b5..2c142944cb 100644
--- a/gemini/function-calling/function_calling_data_structures.ipynb
+++ b/gemini/function-calling/function_calling_data_structures.ipynb
@@ -254,7 +254,7 @@
"outputs": [],
"source": [
"model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro\",\n",
" generation_config=GenerationConfig(temperature=0),\n",
")"
]
diff --git a/gemini/function-calling/function_calling_service/main.py b/gemini/function-calling/function_calling_service/main.py
index 9bdda047f8..7735b8dc8e 100644
--- a/gemini/function-calling/function_calling_service/main.py
+++ b/gemini/function-calling/function_calling_service/main.py
@@ -57,7 +57,7 @@
)
model = GenerativeModel(
- "gemini-1.5-flash-001",
+ "gemini-1.5-flash",
generation_config=GenerationConfig(temperature=0),
tools=[location_tool],
)
diff --git a/gemini/function-calling/intro_diy_react_agent.ipynb b/gemini/function-calling/intro_diy_react_agent.ipynb
index bf1c46249f..b021bece36 100644
--- a/gemini/function-calling/intro_diy_react_agent.ipynb
+++ b/gemini/function-calling/intro_diy_react_agent.ipynb
@@ -3,7 +3,6 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "5C5UHf-cgV0h",
"metadata": {
"id": "5C5UHf-cgV0h"
},
@@ -26,7 +25,6 @@
},
{
"cell_type": "markdown",
- "id": "jXgCZ0VqYfNM",
"metadata": {
"id": "jXgCZ0VqYfNM"
},
@@ -60,7 +58,6 @@
},
{
"cell_type": "markdown",
- "id": "a221cf0c-119c-4fa3-9987-3b202a51583c",
"metadata": {
"id": "f92df1bde019"
},
@@ -72,7 +69,6 @@
},
{
"cell_type": "markdown",
- "id": "tGULENpgf_Pz",
"metadata": {
"id": "tGULENpgf_Pz"
},
@@ -103,7 +99,6 @@
},
{
"cell_type": "markdown",
- "id": "3781462d-8394-4308-b669-d2d475b274ca",
"metadata": {
"id": "d295151a7c9b"
},
@@ -117,7 +112,6 @@
},
{
"cell_type": "markdown",
- "id": "0d87b108-ca63-4eab-b756-9d7680b8a3e0",
"metadata": {
"id": "6edffc043e41"
},
@@ -139,7 +133,6 @@
},
{
"cell_type": "markdown",
- "id": "6e0c0345-b7dd-465a-b65c-4504b89e630c",
"metadata": {
"id": "d4d798879063"
},
@@ -159,7 +152,6 @@
},
{
"cell_type": "markdown",
- "id": "62204b44-e3c6-4528-9172-3549aa0698d0",
"metadata": {
"id": "39992f621eb0"
},
@@ -175,7 +167,6 @@
},
{
"cell_type": "markdown",
- "id": "f2a1da62-e8a9-43fe-aef7-051b132b11fe",
"metadata": {
"id": "39fb8cb87102"
},
@@ -185,7 +176,6 @@
},
{
"cell_type": "markdown",
- "id": "6aa53e57-062a-47d9-8325-a0525dfc3449",
"metadata": {
"id": "996eae6d82d3"
},
@@ -197,7 +187,6 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "e69dbf86-7de2-44b8-8a70-16ddb8f22e9a",
"metadata": {
"id": "2969acc84135"
},
@@ -208,7 +197,6 @@
},
{
"cell_type": "markdown",
- "id": "c26e4ddd-2d61-481a-a7a6-fa274996160a",
"metadata": {
"id": "70b9a7f00179"
},
@@ -221,7 +209,6 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "8f9a0ed1-b7e0-4a8e-9548-517326e83ab8",
"metadata": {
"id": "cf71cbda01f9"
},
@@ -236,7 +223,6 @@
},
{
"cell_type": "markdown",
- "id": "449a49f5-d2a8-4a98-8575-76e07e1c5022",
"metadata": {
"id": "02d6dfc513c3"
},
@@ -248,7 +234,6 @@
},
{
"cell_type": "markdown",
- "id": "92700fe3-fe5c-4654-b68f-dc8c63ad20f5",
"metadata": {
"id": "575d71266b5b"
},
@@ -261,7 +246,6 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "ed72b998-17fb-4459-a868-79c0967dc3ed",
"metadata": {
"id": "b95a7aa9f3d8"
},
@@ -279,7 +263,6 @@
},
{
"cell_type": "markdown",
- "id": "b68d461a-c16b-47ec-8e8e-75812fce086e",
"metadata": {
"id": "4ee80c5b9d54"
},
@@ -294,7 +277,6 @@
{
"cell_type": "code",
"execution_count": 1,
- "id": "129d4be8-433d-4e15-8c48-5cb300f38d1e",
"metadata": {
"id": "a41550d555ea"
},
@@ -310,7 +292,6 @@
},
{
"cell_type": "markdown",
- "id": "528cbd2e-4a43-4642-ab53-ab1235fd5cb5",
"metadata": {
"id": "71b40692ace5"
},
@@ -321,7 +302,6 @@
{
"cell_type": "code",
"execution_count": 2,
- "id": "a9abdc7f-52e0-454a-8d65-28983e6e7792",
"metadata": {
"id": "a17e54f9eb9d"
},
@@ -339,7 +319,6 @@
},
{
"cell_type": "markdown",
- "id": "Az-OexEYJ9_I",
"metadata": {
"id": "Az-OexEYJ9_I"
},
@@ -350,14 +329,13 @@
{
"cell_type": "code",
"execution_count": 3,
- "id": "fY9wu9XUcxzy",
"metadata": {
"id": "fY9wu9XUcxzy"
},
"outputs": [],
"source": [
"model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro\",\n",
" system_instruction=[\n",
" \"You are an assistant that helps me tidy my room.\"\n",
" \"Your goal is to make sure all the books are on the shelf, all clothes are in the hamper, and the trash is empty.\",\n",
@@ -376,7 +354,6 @@
},
{
"cell_type": "markdown",
- "id": "f8253601-29f8-408e-9fa8-c3e01f2f33d7",
"metadata": {
"id": "4d292a1ed394"
},
@@ -387,7 +364,6 @@
{
"cell_type": "code",
"execution_count": 4,
- "id": "95f75c8d-b7b7-4f6b-b3f5-80815dbf04a8",
"metadata": {
"id": "b724d6f5d271"
},
@@ -399,7 +375,6 @@
{
"cell_type": "code",
"execution_count": 5,
- "id": "dnk1DurbxspA",
"metadata": {
"id": "dnk1DurbxspA"
},
@@ -444,7 +419,6 @@
},
{
"cell_type": "markdown",
- "id": "DyVd9-OALAKc",
"metadata": {
"id": "DyVd9-OALAKc"
},
@@ -456,7 +430,6 @@
{
"cell_type": "code",
"execution_count": 6,
- "id": "sho-9jxJW7Fe",
"metadata": {
"id": "sho-9jxJW7Fe"
},
@@ -505,7 +478,6 @@
},
{
"cell_type": "markdown",
- "id": "4718fe78-b61f-4643-b746-c4361ce55ef5",
"metadata": {
"id": "cdca5fc7ee78"
},
@@ -517,7 +489,6 @@
{
"cell_type": "code",
"execution_count": 7,
- "id": "91212e66-54d1-401a-9bb8-d5520685ee74",
"metadata": {
"id": "163e0a64b8b1"
},
@@ -540,7 +511,6 @@
},
{
"cell_type": "markdown",
- "id": "4e34ffb4-a2e5-405b-a028-49de82a57e78",
"metadata": {
"id": "da5935b90607"
},
@@ -551,7 +521,6 @@
{
"cell_type": "code",
"execution_count": 8,
- "id": "l5V0GIOASWm8",
"metadata": {
"id": "l5V0GIOASWm8"
},
@@ -589,7 +558,6 @@
},
{
"cell_type": "markdown",
- "id": "L2Ytt0GjKfRv",
"metadata": {
"id": "L2Ytt0GjKfRv"
},
@@ -599,7 +567,6 @@
},
{
"cell_type": "markdown",
- "id": "zbzqp2YJ3bfc",
"metadata": {
"id": "zbzqp2YJ3bfc"
},
@@ -642,7 +609,6 @@
},
{
"cell_type": "markdown",
- "id": "Obyi7GxaUXjE",
"metadata": {
"id": "Obyi7GxaUXjE"
},
@@ -659,7 +625,6 @@
{
"cell_type": "code",
"execution_count": 9,
- "id": "MzlEFdXVKAFm",
"metadata": {
"id": "MzlEFdXVKAFm"
},
@@ -721,7 +686,6 @@
{
"cell_type": "code",
"execution_count": 10,
- "id": "4GGRQo8WQvV0",
"metadata": {
"id": "4GGRQo8WQvV0"
},
@@ -791,7 +755,6 @@
},
{
"cell_type": "markdown",
- "id": "VY6cFvSvhAmt",
"metadata": {
"id": "VY6cFvSvhAmt"
},
@@ -818,7 +781,6 @@
{
"cell_type": "code",
"execution_count": 11,
- "id": "c7f1cf4c-b052-4c3b-9b39-fe9a81ba33f0",
"metadata": {
"id": "c69dbb409b30"
},
@@ -887,7 +849,6 @@
},
{
"cell_type": "markdown",
- "id": "1g8wcyWLay_8",
"metadata": {
"id": "1g8wcyWLay_8"
},
@@ -897,7 +858,6 @@
},
{
"cell_type": "markdown",
- "id": "bwpqqrqcfL6l",
"metadata": {
"id": "bwpqqrqcfL6l"
},
@@ -911,7 +871,6 @@
{
"cell_type": "code",
"execution_count": 12,
- "id": "Yk872z-Rax0l",
"metadata": {
"id": "Yk872z-Rax0l"
},
@@ -987,7 +946,6 @@
{
"cell_type": "code",
"execution_count": 13,
- "id": "I6dcLLuTduZY",
"metadata": {
"id": "I6dcLLuTduZY"
},
@@ -1079,7 +1037,6 @@
},
{
"cell_type": "markdown",
- "id": "DUTSUNDHfHS6",
"metadata": {
"id": "DUTSUNDHfHS6"
},
@@ -1090,7 +1047,6 @@
{
"cell_type": "code",
"execution_count": 14,
- "id": "VjhvHk8wfGnc",
"metadata": {
"id": "VjhvHk8wfGnc"
},
@@ -1157,7 +1113,6 @@
},
{
"cell_type": "markdown",
- "id": "Xylz5_c8foms",
"metadata": {
"id": "Xylz5_c8foms"
},
@@ -1171,7 +1126,6 @@
},
{
"cell_type": "markdown",
- "id": "bIDKrkZ1-3ke",
"metadata": {
"id": "bIDKrkZ1-3ke"
},
@@ -1183,7 +1137,6 @@
{
"cell_type": "code",
"execution_count": 15,
- "id": "I8xgYekJ-_Ks",
"metadata": {
"id": "I8xgYekJ-_Ks"
},
@@ -1238,7 +1191,6 @@
},
{
"cell_type": "markdown",
- "id": "Y9k3LX6fDlzB",
"metadata": {
"id": "Y9k3LX6fDlzB"
},
@@ -1248,7 +1200,6 @@
},
{
"cell_type": "markdown",
- "id": "f6dce86a-956d-4744-ab41-b0acc01d3fee",
"metadata": {
"id": "1Rp8IV5XDla9"
},
@@ -1257,7 +1208,7 @@
"\n",
"```\n",
"model_fc = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\", \n",
+ " \"gemini-1.5-pro\", \n",
" system_instruction=[\n",
" \"You are an assistant that helps me tidy my room.\"\n",
" \"Your goal is to make sure all the books are on the shelf, all clothes are in the hamper, and the trash is empty.\",\n",
@@ -1270,7 +1221,6 @@
},
{
"cell_type": "markdown",
- "id": "ZiqEr7OwCs4v",
"metadata": {
"id": "ZiqEr7OwCs4v"
},
@@ -1322,7 +1272,6 @@
},
{
"cell_type": "markdown",
- "id": "598e828e-b569-4ac9-810c-cdd54b0911d4",
"metadata": {
"id": "e98c7c33c936"
},
@@ -1342,7 +1291,6 @@
{
"cell_type": "code",
"execution_count": 16,
- "id": "491c50c1-72e2-4593-88cb-0cc494fe2db4",
"metadata": {
"id": "f66655a3b3a7"
},
@@ -1394,7 +1342,6 @@
{
"cell_type": "code",
"execution_count": 17,
- "id": "f3166b71-583a-4ba9-9457-77f6f30978d3",
"metadata": {
"id": "da084bc71468"
},
@@ -1418,7 +1365,6 @@
{
"cell_type": "code",
"execution_count": 18,
- "id": "cf7a6151-cd48-4218-9996-7086c4e1bce2",
"metadata": {
"id": "6fb3f986f6e7"
},
@@ -1465,7 +1411,6 @@
{
"cell_type": "code",
"execution_count": 19,
- "id": "0ekJ1kScDNc1",
"metadata": {
"id": "0ekJ1kScDNc1"
},
diff --git a/gemini/function-calling/intro_function_calling.ipynb b/gemini/function-calling/intro_function_calling.ipynb
index 50b7eee8ee..30d4eb523a 100644
--- a/gemini/function-calling/intro_function_calling.ipynb
+++ b/gemini/function-calling/intro_function_calling.ipynb
@@ -392,7 +392,7 @@
"outputs": [],
"source": [
"model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro\",\n",
" generation_config=GenerationConfig(temperature=0),\n",
" tools=[retail_tool],\n",
")\n",
diff --git a/gemini/function-calling/multimodal_function_calling.ipynb b/gemini/function-calling/multimodal_function_calling.ipynb
index e6176d9ddb..6e4779725f 100644
--- a/gemini/function-calling/multimodal_function_calling.ipynb
+++ b/gemini/function-calling/multimodal_function_calling.ipynb
@@ -153,7 +153,7 @@
},
"outputs": [],
"source": [
- "# %pip install --upgrade --user --quiet google-cloud-aiplatform wikipedia"
+ "%pip install --upgrade --user --quiet google-cloud-aiplatform wikipedia"
]
},
{
@@ -175,11 +175,11 @@
},
"outputs": [],
"source": [
- "# # Restart kernel after installs so that your environment can access the new packages\n",
- "# import IPython\n",
+ "# Restart kernel after installs so that your environment can access the new packages\n",
+ "import IPython\n",
"\n",
- "# app = IPython.Application.instance()\n",
- "# app.kernel.do_shutdown(True)"
+ "app = IPython.Application.instance()\n",
+ "app.kernel.do_shutdown(True)"
]
},
{
@@ -382,7 +382,7 @@
}
],
"source": [
- "model = GenerativeModel(\"gemini-1.5-pro-001\")\n",
+ "model = GenerativeModel(\"gemini-1.5-pro\")\n",
"generation_config = GenerationConfig(temperature=0)\n",
"\n",
"response = model.generate_content(\n",
@@ -703,7 +703,7 @@
}
],
"source": [
- "model = GenerativeModel(\"gemini-1.5-pro-001\")\n",
+ "model = GenerativeModel(\"gemini-1.5-pro\")\n",
"generation_config = GenerationConfig(temperature=0)\n",
"\n",
"response = model.generate_content(\n",
@@ -926,7 +926,7 @@
}
],
"source": [
- "model = GenerativeModel(\"gemini-1.5-pro-001\")\n",
+ "model = GenerativeModel(\"gemini-1.5-pro\")\n",
"generation_config = GenerationConfig(temperature=0)\n",
"\n",
"response = model.generate_content(\n",
@@ -1140,7 +1140,7 @@
}
],
"source": [
- "model = GenerativeModel(\"gemini-1.5-pro-001\")\n",
+ "model = GenerativeModel(\"gemini-1.5-pro\")\n",
"generation_config = GenerationConfig(temperature=0)\n",
"\n",
"response = model.generate_content(\n",
@@ -1361,7 +1361,7 @@
"outputs": [],
"source": [
"model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro\",\n",
" generation_config=GenerationConfig(temperature=0),\n",
" tools=[chat_tool],\n",
")\n",
diff --git a/gemini/function-calling/parallel_function_calling.ipynb b/gemini/function-calling/parallel_function_calling.ipynb
index ad0d762b63..9272095d26 100644
--- a/gemini/function-calling/parallel_function_calling.ipynb
+++ b/gemini/function-calling/parallel_function_calling.ipynb
@@ -350,7 +350,7 @@
"outputs": [],
"source": [
"model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro\",\n",
" generation_config=GenerationConfig(temperature=0),\n",
" tools=[wikipedia_tool],\n",
")\n",
@@ -634,7 +634,7 @@
"outputs": [],
"source": [
"model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro\",\n",
" generation_config=GenerationConfig(temperature=0),\n",
" tools=[wikipedia_tool],\n",
")\n",
diff --git a/gemini/function-calling/sql-talk-app/app.py b/gemini/function-calling/sql-talk-app/app.py
index 4661a27b0e..da3d1f4df1 100644
--- a/gemini/function-calling/sql-talk-app/app.py
+++ b/gemini/function-calling/sql-talk-app/app.py
@@ -76,7 +76,7 @@
)
model = GenerativeModel(
- "gemini-1.5-pro-001",
+ "gemini-1.5-pro",
generation_config={"temperature": 0},
tools=[sql_query_tool],
)
diff --git a/gemini/function-calling/use_case_company_news_and_insights.ipynb b/gemini/function-calling/use_case_company_news_and_insights.ipynb
index cba08a327f..c32ef8a7e4 100644
--- a/gemini/function-calling/use_case_company_news_and_insights.ipynb
+++ b/gemini/function-calling/use_case_company_news_and_insights.ipynb
@@ -3,7 +3,6 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "7331ed20-4a7e-4ee3-8454-528ac48315f0",
"metadata": {
"id": "2eec5cc39a59"
},
@@ -26,7 +25,6 @@
},
{
"cell_type": "markdown",
- "id": "e6aa9ed3-54c1-48e3-8c8f-e8f3cf152ec3",
"metadata": {
"id": "7c5e37584117"
},
@@ -59,7 +57,6 @@
},
{
"cell_type": "markdown",
- "id": "e762db26-12ed-4287-b79f-37de312c6b59",
"metadata": {
"id": "3d8f91830324"
},
@@ -71,7 +68,6 @@
},
{
"cell_type": "markdown",
- "id": "dbc3cc7d-5349-4f8f-8244-751cb51813eb",
"metadata": {
"id": "7102b39c47f0"
},
@@ -89,7 +85,6 @@
},
{
"cell_type": "markdown",
- "id": "7e82832d-fb80-4b04-9905-79bc6008d42c",
"metadata": {
"id": "be7720612149"
},
@@ -99,7 +94,6 @@
},
{
"cell_type": "markdown",
- "id": "5a0b9005-e629-4206-b7ee-635d36423760",
"metadata": {
"id": "25543ba572f7"
},
@@ -109,7 +103,6 @@
},
{
"cell_type": "markdown",
- "id": "9eee6b6e-3dc0-4f6b-854d-1d6989b95931",
"metadata": {
"id": "308fb67d0e91"
},
@@ -134,7 +127,6 @@
},
{
"cell_type": "markdown",
- "id": "e8c374a2-f30b-4a20-9643-489e11d6e4b4",
"metadata": {
"id": "0f07a65d6572"
},
@@ -161,7 +153,6 @@
},
{
"cell_type": "markdown",
- "id": "56bad415-a400-42c3-800f-5c700a6c6cd7",
"metadata": {
"id": "fcd6fe2a7b6d"
},
@@ -174,7 +165,6 @@
{
"cell_type": "code",
"execution_count": 1,
- "id": "ad3fc543-8dc7-43da-b095-0e4a965b7de4",
"metadata": {
"id": "cc681958b8f7"
},
@@ -185,7 +175,6 @@
},
{
"cell_type": "markdown",
- "id": "5c810968-005f-4776-8d2b-99e04a49b550",
"metadata": {
"id": "8ed31279f009"
},
@@ -200,7 +189,6 @@
{
"cell_type": "code",
"execution_count": 2,
- "id": "1e0edafe-5cf9-4979-87fa-79958402f9dc",
"metadata": {
"id": "567212ff53a6"
},
@@ -214,7 +202,6 @@
},
{
"cell_type": "markdown",
- "id": "b27adccc-967a-469c-a05e-5a1eabaef1a3",
"metadata": {
"id": "972161a99004"
},
@@ -226,7 +213,6 @@
},
{
"cell_type": "markdown",
- "id": "299649d4-7959-455a-8971-d0d3bb5cc474",
"metadata": {
"id": "4460fec7ce2e"
},
@@ -239,7 +225,6 @@
{
"cell_type": "code",
"execution_count": 3,
- "id": "e641fb7d-2ed9-492e-ba8e-fabc86721630",
"metadata": {
"id": "59c887aafd02"
},
@@ -255,7 +240,6 @@
},
{
"cell_type": "markdown",
- "id": "95c810db-eeea-4475-8c03-9016429272f8",
"metadata": {
"id": "56e1e9ca7ddc"
},
@@ -270,7 +254,6 @@
{
"cell_type": "code",
"execution_count": 1,
- "id": "529b3f2d-c270-4937-9b70-16651f260125",
"metadata": {
"id": "41c34a683156"
},
@@ -286,7 +269,6 @@
},
{
"cell_type": "markdown",
- "id": "a0b04f91-62a4-48cf-8a84-50d41aaf5d79",
"metadata": {
"id": "26bcfc24e720"
},
@@ -297,7 +279,6 @@
{
"cell_type": "code",
"execution_count": 2,
- "id": "51a1e99c-a72d-4f83-8c5c-f1e67592230d",
"metadata": {
"id": "6d249cfcaf52"
},
@@ -316,7 +297,6 @@
},
{
"cell_type": "markdown",
- "id": "521aa76f",
"metadata": {
"id": "a8a3a9ff0f3d"
},
@@ -330,7 +310,6 @@
},
{
"cell_type": "markdown",
- "id": "d93f45cd-3cbf-4b54-be6a-bbead953a712",
"metadata": {
"id": "1f968100a531"
},
@@ -340,7 +319,6 @@
},
{
"cell_type": "markdown",
- "id": "194f894d",
"metadata": {
"id": "bf4ba70381cd"
},
@@ -355,7 +333,6 @@
{
"cell_type": "code",
"execution_count": 3,
- "id": "fd4dbed2-35f3-40b7-8bdf-abd598d948e7",
"metadata": {
"id": "82979452ed3f"
},
@@ -426,7 +403,6 @@
},
{
"cell_type": "markdown",
- "id": "91bd0576",
"metadata": {
"id": "43373d59999e"
},
@@ -439,7 +415,6 @@
{
"cell_type": "code",
"execution_count": 4,
- "id": "6d5cd101-14d7-4562-8f5c-73a08fdc28b2",
"metadata": {
"id": "8f75dcb05e2f"
},
@@ -457,7 +432,6 @@
},
{
"cell_type": "markdown",
- "id": "120c5732-68b4-42c8-8f11-2e89f52e8b33",
"metadata": {
"id": "7d4cc72d8e6c"
},
@@ -472,7 +446,6 @@
{
"cell_type": "code",
"execution_count": 5,
- "id": "bdbbab72-a7f1-4bad-9c5a-3e69431c6309",
"metadata": {
"id": "643a1ff45db9"
},
@@ -484,7 +457,6 @@
},
{
"cell_type": "markdown",
- "id": "bf483769-25ac-40f2-bf17-7da437f2e68a",
"metadata": {
"id": "1e32e0694d45"
},
@@ -494,7 +466,6 @@
},
{
"cell_type": "markdown",
- "id": "6930cc77-577d-4dd5-998e-af0db2ca9971",
"metadata": {
"id": "706839cf3dba"
},
@@ -504,7 +475,6 @@
},
{
"cell_type": "markdown",
- "id": "6a2ad181",
"metadata": {
"id": "4ef2ec1e97eb"
},
@@ -515,7 +485,6 @@
{
"cell_type": "code",
"execution_count": 6,
- "id": "672ee652-a36f-4267-8e54-84bc3ad07bf7",
"metadata": {
"id": "81b9950967e1"
},
@@ -547,7 +516,6 @@
},
{
"cell_type": "markdown",
- "id": "16bde708-dfbb-46e9-8070-d8ef687c9a06",
"metadata": {
"id": "a9f9c6b3fb4a"
},
@@ -558,7 +526,6 @@
{
"cell_type": "code",
"execution_count": 7,
- "id": "c2594766-4f93-4dbf-982b-c50131a9fbd9",
"metadata": {
"id": "900d2529a6ac"
},
@@ -574,7 +541,6 @@
},
{
"cell_type": "markdown",
- "id": "25bf5867",
"metadata": {
"id": "87eb49d4fe0f"
},
@@ -587,14 +553,13 @@
{
"cell_type": "code",
"execution_count": 8,
- "id": "9938f4ec-edd7-4738-85bd-ed125bdd54c4",
"metadata": {
"id": "11cb17ef1096"
},
"outputs": [],
"source": [
"gemini_model = GenerativeModel(\n",
- " \"gemini-1.5-pro-001\",\n",
+ " \"gemini-1.5-pro-002\",\n",
" generation_config=GenerationConfig(temperature=0),\n",
" tools=[company_insights_tool],\n",
")"
@@ -602,7 +567,6 @@
},
{
"cell_type": "markdown",
- "id": "f558dfaf-3b8b-4306-94e1-fe13f07361f7",
"metadata": {
"id": "7cdfddffe83b"
},
@@ -613,7 +577,6 @@
{
"cell_type": "code",
"execution_count": 9,
- "id": "1bd88a23-363e-4b8f-bb42-6cb0c841d8be",
"metadata": {
"id": "5be6ada7bb71"
},
@@ -624,7 +587,6 @@
},
{
"cell_type": "markdown",
- "id": "71865ea8-7dd2-426a-b66e-eaa4df23e6d5",
"metadata": {
"id": "711af0ca6d5a"
},
@@ -638,7 +600,6 @@
},
{
"cell_type": "markdown",
- "id": "7cdfa14f-dc76-464b-ba73-e7f22634b542",
"metadata": {
"id": "7f5389c4a114"
},
@@ -648,7 +609,6 @@
},
{
"cell_type": "markdown",
- "id": "2b952425-29c4-4c81-b7ef-adf198ed2c20",
"metadata": {
"id": "0e00b8dd9c4b"
},
@@ -659,7 +619,6 @@
{
"cell_type": "code",
"execution_count": 10,
- "id": "87f92513-c96f-44f2-90f8-488bf700fd32",
"metadata": {
"id": "d9130fa4193a"
},
@@ -718,7 +677,6 @@
},
{
"cell_type": "markdown",
- "id": "7eec74fa-14f9-4693-8c1a-cce19937eb9d",
"metadata": {
"id": "5ea534893263"
},
@@ -737,7 +695,6 @@
{
"cell_type": "code",
"execution_count": 11,
- "id": "11166e3d",
"metadata": {
"id": "daf9c088624e"
},
@@ -866,7 +823,6 @@
},
{
"cell_type": "markdown",
- "id": "9a5a41a8",
"metadata": {
"id": "0fc0ec32b4fe"
},
@@ -887,7 +843,6 @@
{
"cell_type": "code",
"execution_count": 12,
- "id": "ea971e94-a838-4165-87f7-1b2d0833f92a",
"metadata": {
"id": "04f4c36f5bb5"
},
@@ -1025,7 +980,6 @@
},
{
"cell_type": "markdown",
- "id": "95571d4d-8c19-45ba-beed-17d14ef4d0ad",
"metadata": {
"id": "12b76d95eb48"
},
@@ -1042,7 +996,6 @@
{
"cell_type": "code",
"execution_count": 13,
- "id": "86e6d97d-d282-4e8d-8f1e-c325f988cedf",
"metadata": {
"id": "c377e4b2d33b"
},
@@ -1276,7 +1229,6 @@
},
{
"cell_type": "markdown",
- "id": "bd81f8c1-41d8-4de6-b83d-e5ccd34ba6c4",
"metadata": {
"id": "6989344d0365"
},
@@ -1293,7 +1245,6 @@
{
"cell_type": "code",
"execution_count": 14,
- "id": "af6681a8-7600-4492-87c6-a239725e9eae",
"metadata": {
"id": "50edfeabd1c4"
},
@@ -1424,7 +1375,6 @@
},
{
"cell_type": "markdown",
- "id": "b0be49a0",
"metadata": {
"id": "8ed746e33ba0"
},
@@ -1441,7 +1391,6 @@
{
"cell_type": "code",
"execution_count": 15,
- "id": "c5e6d187-238c-41fd-91fb-6d2eacbdb457",
"metadata": {
"id": "f912100fcddc"
},
@@ -1569,7 +1518,6 @@
},
{
"cell_type": "markdown",
- "id": "da28b7a9",
"metadata": {
"id": "a94de4cd2309"
},
diff --git a/gemini/orchestration/intro_langgraph_gemini.ipynb b/gemini/orchestration/intro_langgraph_gemini.ipynb
index 1bea27fc85..8520986fcd 100644
--- a/gemini/orchestration/intro_langgraph_gemini.ipynb
+++ b/gemini/orchestration/intro_langgraph_gemini.ipynb
@@ -330,7 +330,7 @@
},
"outputs": [],
"source": [
- "model = ChatVertexAI(model=\"gemini-1.5-pro-002\", temperature=0)"
+ "model = ChatVertexAI(model=\"gemini-1.5-pro\", temperature=0)"
]
},
{
@@ -549,7 +549,7 @@
"outputs": [],
"source": [
"# Define the system prompt instructing the agent on how to answer the user's question.\n",
- "SYSTEM_PROMPT = \"\"\"Use the tools provided to answer the user's question. \n",
+ "SYSTEM_PROMPT = \"\"\"Use the tools provided to answer the user's question.\n",
"Also look up the exchange rate for all currencies from three months prior\n",
"and compare the values.\"\"\"\n",
"\n",
diff --git a/gemini/orchestration/langgraph_gemini_podcast.ipynb b/gemini/orchestration/langgraph_gemini_podcast.ipynb
new file mode 100644
index 0000000000..dc4289441b
--- /dev/null
+++ b/gemini/orchestration/langgraph_gemini_podcast.ipynb
@@ -0,0 +1,1430 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "id": "ur8xi4C7S06n"
+ },
+ "outputs": [],
+ "source": [
+ "# Copyright 2024 Google LLC\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JAPoU8Sm5E6e"
+ },
+ "source": [
+ "# Build Your Own AI Podcasting Agent with LangGraph & Gemini\n",
+ "## AI-Powered Podcast Creation with Automated Research, Writing, and Refinement\n",
+ "\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "84f0f73a0f76"
+ },
+ "source": [
+ "| | |\n",
+ "|-|-|\n",
+ "| Author(s) | [Kristopher Overholt](https://github.com/koverholt/) |"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tvgnzT1CKxrO"
+ },
+ "source": [
+ "## Overview\n",
+ "\n",
+ "Creating a podcast can be a very involved process, requiring extensive research, writing, editing, and production. **What if there was a way to leverage the power of AI to streamline the creation of a podcast, automating many of the tasks traditionally performed by humans?** [NotebookLM](https://notebooklm.google.com/), for example, lets users easily generate [audio overviews based on documents](https://blog.google/technology/ai/notebooklm-audio-overviews/).\n",
+ "\n",
+ "
\n",
+ "\n",
+ "But what if you want to customize the length of the podcast, the voices, or the conversation flow and augment it with additional research tools? In this notebook, **you'll recreate this kind of podcast generation functionality by building an AI agent to do the heavy lifting and then customize the entire flow yourself!**\n",
+ "\n",
+ "**This notebook demonstrates how to build a [LangGraph](https://langchain-ai.github.io/langgraph/)-powered AI agent to research, write, and refine a podcast script using the [Gemini API in in Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models).** You'll use LangGraph and LangChain to orchestrate calls to Gemini along with calls to different search tools, allowing the AI to learn about a given topic before writing about it. Then, the AI will critique its work and iterate on the podcast script, improving it with each revision.\n",
+ "\n",
+ "Here's how you'll build and use our AI podcasting agent:\n",
+ "\n",
+ "- **[User]** Define the podcast topic: Provide a clear and concise topic for the podcast.\n",
+ "- **[Agent]** Generate an outline: Use Gemini to create a high-level outline, structuring the podcast's flow.\n",
+ "- **[Agent]** Conduct research: The AI agent will use search tools like arXiv, PubMed, and Wikipedia to gather relevant information.\n",
+ "- **[Agent]** Write a script: Gemini will generate an engaging podcast script, incorporating the research findings.\n",
+ "- **[Agent]** Critique and iterate: The agent will analyze its script, provide a critique, then generate a revised draft.\n",
+ "- **[Agent]** Generate audio: You'll use text-to-speech to generate audio for each line of the podcast script.\n",
+ "\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "61RBz8LLbxCR"
+ },
+ "source": [
+ "## Get started\n",
+ "\n",
+ "This section sets up the environment for the AI podcast agent. This includes:\n",
+ "\n",
+ "- **Installing Libraries:** Installing the required Python libraries\n",
+ "- **Restarting Runtime (Colab Only):** Restarting the Colab runtime\n",
+ "- **Authenticating Environment (Colab Only):** Authenticating to Google Cloud\n",
+ "- **Setting Project Information:** Setting up your Google Cloud project"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "No17Cw5hgx12"
+ },
+ "source": [
+ "### Install Vertex AI SDK and other required packages\n",
+ "\n",
+ "This code cell installs the necessary Python libraries for running the AI podcast agent."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "tFy3H3aPgx12"
+ },
+ "outputs": [],
+ "source": [
+ "%pip install -q -U \\\n",
+ "    arxiv \\\n",
+ "    google-cloud-texttospeech \\\n",
+ "    langgraph \\\n",
+ "    langchain-google-vertexai \\\n",
+ "    langchain-community \\\n",
+ "    pydub \\\n",
+ "    pymupdf \\\n",
+ "    wikipedia \\\n",
+ "    xmltodict"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "R5Xep4W9lq-Z"
+ },
+ "source": [
+ "### Restart runtime\n",
+ "\n",
+ "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
+ "\n",
+ "The restart might take a minute or longer. After it's restarted, continue to the next step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "id": "XRvKdaPDTznN"
+ },
+ "outputs": [],
+ "source": [
+ "import IPython\n",
+ "\n",
+ "app = IPython.Application.instance()\n",
+ "app.kernel.do_shutdown(True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SbmM4z7FOBpM"
+ },
+ "source": [
+ "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dmWOrTJ3gx13"
+ },
+ "source": [
+ "### Authenticate your notebook environment (Colab only)\n",
+ "\n",
+ "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "id": "NyKGtVQjgx13"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ "    from google.colab import auth\n",
+ "\n",
+ "    auth.authenticate_user()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DF4l8DTdWgPY"
+ },
+ "source": [
+ "### Set Google Cloud project information and initialize Vertex AI SDK\n",
+ "\n",
+ "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
+ "\n",
+ "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "id": "Nqwi-5ufWp_B"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "import vertexai\n",
+ "\n",
+ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\", isTemplate: true}\n",
+ "if PROJECT_ID == \"[your-project-id]\":\n",
+ "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
+ "\n",
+ "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
+ "\n",
+ "vertexai.init(project=PROJECT_ID, location=LOCATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EdvJRUWRNGHE"
+ },
+ "source": [
+ "## Building the AI podcasting agent\n",
+ "\n",
+ "This section constructs the AI agent. Key steps include:\n",
+ "\n",
+ "- **Initializing Agent Memory and State:** Setting up the agent's memory and defining its data structure\n",
+ "- **Initializing the Gemini Model:** Loading the Gemini language model from Vertex AI\n",
+ "- **Defining Search Tools:** Creating tools to access information sources like arXiv, PubMed, and Wikipedia\n",
+ "- **Defining Workflow Stages:** Defining each stage of the workflow, including prompts and functions\n",
+ "- **Compiling the Workflow:** Structuring the workflow as a graph using LangGraph"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5303c05f7aa6"
+ },
+ "source": [
+ "### Import libraries\n",
+ "\n",
+ "This section imports the necessary libraries for LangGraph, LangChain, Vertex AI, and other utilities needed for your agent's functionality.\n",
+ "\n",
+ "This includes tools for interacting with the Gemini API, defining custom tools, managing agent state, and displaying results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "id": "6fc324893334"
+ },
+ "outputs": [],
+ "source": [
+ "# Common libraries\n",
+ "import logging\n",
+ "import os\n",
+ "import re\n",
+ "\n",
+ "# Typing utilities for data validation and schema definitions\n",
+ "from typing import TypedDict\n",
+ "\n",
+ "from IPython.display import Audio, Image\n",
+ "\n",
+ "# Libraries for text-to-speech generation and audio processing\n",
+ "from google.cloud import texttospeech\n",
+ "from langchain.schema.document import Document\n",
+ "\n",
+ "# Tools\n",
+ "from langchain_community.retrievers import (\n",
+ "    ArxivRetriever,\n",
+ "    PubMedRetriever,\n",
+ "    WikipediaRetriever,\n",
+ ")\n",
+ "\n",
+ "# LangChain and LangGraph components for message handling and tool integration\n",
+ "from langchain_core.messages import HumanMessage, SystemMessage, ToolMessage\n",
+ "from langchain_core.tools import tool\n",
+ "\n",
+ "# LangChain integration for the Gemini API in Vertex AI, plus LangGraph components\n",
+ "from langchain_google_vertexai import ChatVertexAI\n",
+ "from langgraph.checkpoint.memory import MemorySaver\n",
+ "from langgraph.graph import END, StateGraph\n",
+ "from langgraph.prebuilt import ToolNode\n",
+ "from pydub import AudioSegment\n",
+ "\n",
+ "# Set logging level to ERROR to filter warnings\n",
+ "logger = logging.getLogger()\n",
+ "logger.setLevel(logging.ERROR)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e43229f3ad4f"
+ },
+ "source": [
+ "### Initialize agent memory and agent state\n",
+ "\n",
+ "Here, you initialize your [agent's memory](https://langchain-ai.github.io/langgraph/how-tos/memory/manage-conversation-history/) to store information during the workflow.\n",
+ "\n",
+ "You also define the schema for your [agent's state](https://langchain-ai.github.io/langgraph/how-tos/state-model/), which includes the podcast outline, search queries, and drafts."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "id": "cf93d5f0ce00"
+ },
+ "outputs": [],
+ "source": [
+ "# Initialize agent memory\n",
+ "memory = MemorySaver()\n",
+ "\n",
+ "\n",
+ "# Define the agent's state\n",
+ "class AgentState(TypedDict):\n",
+ "    revision_number: int\n",
+ "    max_revisions: int\n",
+ "    search_count: int\n",
+ "    max_searches: int\n",
+ "    task: str\n",
+ "    outline: str\n",
+ "    queries: list\n",
+ "    content: list\n",
+ "    draft: str\n",
+ "    critique: str\n",
+ "    tool_calls: list"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "27b61a7e7ef6"
+ },
+ "source": [
+ "### Initialize Gemini model\n",
+ "\n",
+ "Initialize the Gemini model from Vertex AI, specifying the model version and temperature settings.\n",
+ "\n",
+ "This sets up the core language model that will power your agent's actions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "id": "06877aae6673"
+ },
+ "outputs": [],
+ "source": [
+ "model = ChatVertexAI(model=\"gemini-1.5-pro\", temperature=0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d591fae74758"
+ },
+ "source": [
+ "### Define search tools\n",
+ "\n",
+ "This section defines custom tools that will be used by your AI podcast agent to gather information from various sources. These tools act as interfaces to external services and provide access to relevant data for the podcast topic.\n",
+ "\n",
+ "Each tool is implemented as a Python function decorated with the `@tool` decorator from LangChain. This decorator makes it easy to integrate these functions into LangGraph workflows.\n",
+ "\n",
+ "The following search tools are defined:\n",
+ "\n",
+ "- **search_arxiv:** Retrieves research papers from arXiv based on a keyword query.\n",
+ "- **search_pubmed:** Searches for information on PubMed, a database of biomedical literature.\n",
+ "- **search_wikipedia:** Fetches information from Wikipedia based on a keyword query.\n",
+ "\n",
+ "Your LangGraph application will use these tool nodes to call the corresponding search functions and obtain information from these external sources. This allows the AI agent to learn about the podcast topic before generating the script."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "id": "0d27ed8a91c1"
+ },
+ "outputs": [],
+ "source": [
+ "@tool\n",
+ "def search_arxiv(query: str) -> list[Document]:\n",
+ "    \"\"\"Search for relevant publications on arXiv\"\"\"\n",
+ "    retriever = ArxivRetriever(\n",
+ "        load_max_docs=2,\n",
+ "        get_full_documents=True,\n",
+ "    )\n",
+ "    docs = retriever.invoke(query)\n",
+ "    if docs:\n",
+ "        return docs\n",
+ "    else:\n",
+ "        return [\"No results found on arXiv\"]\n",
+ "\n",
+ "\n",
+ "@tool\n",
+ "def search_pubmed(query: str) -> list[Document]:\n",
+ "    \"\"\"Search for information on PubMed\"\"\"\n",
+ "    retriever = PubMedRetriever()\n",
+ "    docs = retriever.invoke(query)\n",
+ "    if docs:\n",
+ "        return docs\n",
+ "    else:\n",
+ "        return [\"No results found on PubMed\"]\n",
+ "\n",
+ "\n",
+ "@tool\n",
+ "def search_wikipedia(query: str) -> list[Document]:\n",
+ "    \"\"\"Search for information on Wikipedia\"\"\"\n",
+ "    retriever = WikipediaRetriever()\n",
+ "    docs = retriever.invoke(query)\n",
+ "    if docs:\n",
+ "        return docs\n",
+ "    else:\n",
+ "        return [\"No results found on Wikipedia\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "95fbea6aaa20"
+ },
+ "source": [
+ "### Define workflow stages along with corresponding prompts and functions\n",
+ "\n",
+ "This section defines the different stages of the AI podcast agent's workflow and the corresponding prompt templates and node functions that drive each stage.\n",
+ "\n",
+ "Each stage represents a specific task in the podcast creation process, such as generating an outline, conducting research, writing the script, and providing a critique.\n",
+ "\n",
+ "For each stage, you'll define:\n",
+ "\n",
+ "- **Prompt Template:** A carefully crafted text prompt that instructs the Gemini language model on what to do at that stage. The prompt provides context, instructions, and any necessary input data.\n",
+ "- **Node Function:** A Python function that encapsulates the logic for executing that stage. The function typically involves:\n",
+ " - Constructing the prompt with relevant information from the agent's state.\n",
+ " - Invoking the Gemini API with the prompt.\n",
+ " - Processing the model's response and updating the agent's state.\n",
+ "\n",
+ "These prompt templates and node functions are the building blocks of the LangGraph workflow that orchestrates the entire podcast creation process.\n",
+ "\n",
+ "#### Podcast outline node\n",
+ "\n",
+ "This node generates a structured outline for the podcast based on the user-provided topic:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "id": "4ce60bbc06e6"
+ },
+ "outputs": [],
+ "source": [
+ "OUTLINE_PROMPT = \"\"\"You are an expert writer tasked with writing a high level outline of an engaging 2-minute podcast.\n",
+ "Write such an outline for the user provided topic. Give an outline of the podcast along with any\n",
+ "relevant notes or instructions for the sections.\"\"\"\n",
+ "\n",
+ "\n",
+ "# Generate an outline for the podcast based on the user-provided topic\n",
+ "def podcast_outline_node(state: AgentState):\n",
+ "    messages = [\n",
+ "        SystemMessage(content=OUTLINE_PROMPT),\n",
+ "        HumanMessage(content=state[\"task\"]),\n",
+ "    ]\n",
+ "    response = model.invoke(messages)\n",
+ "    return {\"outline\": response.content}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5d7349c32d28"
+ },
+ "source": [
+ "#### Research plan node\n",
+ "\n",
+ "This node formulates a search query based on the podcast topic and previous queries:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {
+ "id": "87df19f53b95"
+ },
+ "outputs": [],
+ "source": [
+ "RESEARCH_PLAN_PROMPT = \"\"\"You are a researcher tasked with providing information that can\n",
+ "be used when writing the following podcast. Generate one search query consisting of a few\n",
+ "keywords that will be used to gather any relevant information. Do not output any information\n",
+ "other than the query consisting of a few words.\n",
+ "\n",
+ "These were the past queries, do not repeat keywords from past queries in your newly generated query:\n",
+ "---\n",
+ "{queries}\"\"\"\n",
+ "\n",
+ "\n",
+ "# Generates a search query based on the podcast topic and the past queries\n",
+ "def research_plan_node(state: AgentState):\n",
+ "    messages = [\n",
+ "        SystemMessage(content=RESEARCH_PLAN_PROMPT.format(queries=state[\"queries\"])),\n",
+ "        HumanMessage(content=state[\"task\"]),\n",
+ "    ]\n",
+ "    response = model.invoke(messages)\n",
+ "    queries = state[\"queries\"]\n",
+ "    if queries:\n",
+ "        queries.append(response.content)\n",
+ "    else:\n",
+ "        queries = [response.content]\n",
+ "    return {\"queries\": queries}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1719cf80233f"
+ },
+ "source": [
+ "#### Research task node\n",
+ "\n",
+ "This node executes a search using the selected tool and query, retrieving relevant information for the podcast:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "id": "9bcbfe53b7d9"
+ },
+ "outputs": [],
+ "source": [
+ "RESEARCH_TASK_PROMPT = \"\"\"Use the available search tools and search queries to find information\n",
+ "relevant to the podcast. Try searching different sources to obtain different articles. Try using\n",
+ "different search tools than what was used previously so that you can obtain a broader range of\n",
+ "information.\n",
+ "\n",
+ "These are the previous tool calls, so you can choose a different tool:\n",
+ "---\n",
+ "{tool_calls}\n",
+ "---\n",
+ "These are the previous search results, so you can aim for different sources and content:\n",
+ "---\n",
+ "{content}\"\"\"\n",
+ "\n",
+ "\n",
+ "# Performs searches using tools\n",
+ "def research_agent_node(state: AgentState):\n",
+ "    tool_calls = state[\"tool_calls\"]\n",
+ "    content = state[\"content\"]\n",
+ "    queries = state[\"queries\"]\n",
+ "    query = queries[-1]\n",
+ "    messages = [\n",
+ "        SystemMessage(\n",
+ "            content=RESEARCH_TASK_PROMPT.format(tool_calls=tool_calls, content=content)\n",
+ "        ),\n",
+ "        HumanMessage(content=query),\n",
+ "    ]\n",
+ "\n",
+ "    # Perform function calls\n",
+ "    tools = [search_arxiv, search_pubmed, search_wikipedia]\n",
+ "    model_with_tools = model.bind_tools(tools)\n",
+ "    response_tool_calls = model_with_tools.invoke(messages)\n",
+ "    if tool_calls:\n",
+ "        tool_calls.append(response_tool_calls)\n",
+ "    else:\n",
+ "        tool_calls = [response_tool_calls]\n",
+ "\n",
+ "    # Defines a tool node based on search functions\n",
+ "    tool_node = ToolNode(tools)\n",
+ "    response = tool_node.invoke({\"messages\": [response_tool_calls]})\n",
+ "\n",
+ "    # Add the search results to the content list in the agent state\n",
+ "    for message in response.get(\"messages\", []):\n",
+ "        if isinstance(message, ToolMessage):\n",
+ "            content.insert(0, message.content)\n",
+ "\n",
+ "    return {\n",
+ "        \"content\": content,\n",
+ "        \"tool_calls\": tool_calls,\n",
+ "        \"search_count\": state[\"search_count\"] + 1,\n",
+ "    }\n",
+ "\n",
+ "\n",
+ "# Determine whether to continue research based on the number of searches performed\n",
+ "def should_continue_tools(state: AgentState):\n",
+ "    if state[\"search_count\"] > state[\"max_searches\"]:\n",
+ "        return \"generate_script\"\n",
+ "    else:\n",
+ "        return \"research_plan\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9eaa6d0bff8d"
+ },
+ "source": [
+ "#### Podcast writing node\n",
+ "\n",
+ "This node generates a draft podcast script using the outline and research results, aiming for an engaging and informative style:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "id": "1742523735e8"
+ },
+ "outputs": [],
+ "source": [
+ "WRITER_PROMPT = \"\"\"\n",
+ "You are a writing assistant tasked with writing engaging 2-minute podcast scripts.\n",
+ "\n",
+ "- Generate the best podcast script possible for the user's request and the initial outline.\n",
+ "- The script MUST strictly alternate lines between the two hosts, separating each host's line with a newline.\n",
+ "- Add an intro phrase and outro phrase to start and end the podcast, and use a fun, random name for the podcast show.\n",
+ "- Given a critique, respond with a revised version of your previous script.\n",
+ "- Include lively back-and-forth chatter, reflections, and expressions of amazement between the hosts.\n",
+ "- Cite at least THREE pieces of research throughout the script, choosing the most relevant research for each point.\n",
+ "- DO NOT include ANY of the following:\n",
+ " - Speaker labels (e.g., \"Host 1:\", \"Host 2:\")\n",
+ " - Sound effect descriptions (e.g., \"[Sound of waves]\")\n",
+ " - Formatting instructions (e.g., \"(Emphasis)\", \"[Music fades in]\")\n",
+ " - Any other non-dialogue text.\n",
+ "- Use this format for citations, including the month and year if available:\n",
+ " \"In [Month, Year], [Organization] found that...\"\n",
+ " \"Research from [Organization] in [Month, Year] showed that...\"\n",
+ " \"Back in [Month, Year], a study by [Organization] suggested that...\"\n",
+ "---\n",
+ "Utilize all of the following search results and context as needed:\n",
+ "{content}\n",
+ "---\n",
+ "If this is a revision, the critique will be provided below:\n",
+ "{critique}\"\"\"\n",
+ "\n",
+ "\n",
+ "# Generates a draft of the script based on the content and outline\n",
+ "def generate_script_node(state: AgentState):\n",
+ "    messages = [\n",
+ "        SystemMessage(\n",
+ "            content=WRITER_PROMPT.format(\n",
+ "                content=state[\"content\"], critique=state.get(\"critique\", \"\")\n",
+ "            )\n",
+ "        ),\n",
+ "        HumanMessage(\n",
+ "            content=f\"{state['task']}\\n\\nHere is my outline:\\n\\n{state['outline']}\"\n",
+ "        ),\n",
+ "    ]\n",
+ "    response = model.invoke(messages)\n",
+ "    return {\n",
+ "        \"draft\": response.content,\n",
+ "        \"search_count\": 0,  # Reset the search count for the next revision\n",
+ "        \"revision_number\": state.get(\"revision_number\", 1) + 1,\n",
+ "    }"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d00163e72e16"
+ },
+ "source": [
+ "#### Podcast critique node\n",
+ "\n",
+ "This node provides feedback and suggestions for improvement on the generated podcast script:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "id": "de70a68caa8d"
+ },
+ "outputs": [],
+ "source": [
+ "CRITIQUE_PROMPT = \"\"\"You are a producer grading a podcast script.\n",
+ "Generate critique and recommendations for the user's submission.\n",
+ "Provide detailed recommendations, including requests for conciseness, depth, style, etc.\"\"\"\n",
+ "\n",
+ "\n",
+ "# Generates a critique with feedback on the draft podcast script\n",
+ "def perform_critique_node(state: AgentState):\n",
+ "    messages = [\n",
+ "        SystemMessage(content=CRITIQUE_PROMPT),\n",
+ "        HumanMessage(content=state[\"draft\"]),\n",
+ "    ]\n",
+ "    response = model.invoke(messages)\n",
+ "    return {\"critique\": response.content}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "46aa739f9c1e"
+ },
+ "source": [
+ "#### Research critique node\n",
+ "\n",
+ "This node generates a new search query based on the critique of the script, aiming to address weaknesses and find additional information:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "id": "48682bcbb177"
+ },
+ "outputs": [],
+ "source": [
+ "RESEARCH_CRITIQUE_PROMPT = \"\"\"You are a writing assistant tasked with providing information that can\n",
+ "be used when making any requested revisions (as outlined below).\n",
+ "Generate one search query consisting of a few keywords that will be used to gather any relevant\n",
+ "information. Do not output any information other than the query consisting of a few words.\n",
+ "\n",
+ "---\n",
+ "\n",
+ "These were the past queries, so you can vary the query that you generate:\n",
+ "\n",
+ "{queries}\n",
+ "\"\"\"\n",
+ "\n",
+ "\n",
+ "# Generates a new search query based on the critique\n",
+ "def research_critique_node(state: AgentState):\n",
+ "    messages = [\n",
+ "        SystemMessage(\n",
+ "            content=RESEARCH_CRITIQUE_PROMPT.format(queries=state[\"queries\"])\n",
+ "        ),\n",
+ "        HumanMessage(content=state[\"critique\"]),\n",
+ "    ]\n",
+ "    response = model.invoke(messages)\n",
+ "    queries = state.get(\"queries\", [])\n",
+ "    if queries:\n",
+ "        queries.append(response.content)\n",
+ "    else:\n",
+ "        queries = [response.content]\n",
+ "    return {\"queries\": queries}\n",
+ "\n",
+ "\n",
+ "# Decide whether to continue to the next revision or end the process\n",
+ "def should_continue(state: AgentState):\n",
+ "    if state[\"revision_number\"] > state[\"max_revisions\"]:\n",
+ "        return END\n",
+ "    return \"perform_critique\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "aae38c0085e3"
+ },
+ "source": [
+ "## Define and compile the LangGraph workflow\n",
+ "\n",
+ "This section defines the structure and flow of the AI podcast agent using LangGraph.\n",
+ "\n",
+ "The workflow is constructed as a graph with nodes representing each stage in the process (e.g., outlining, research, script generation) and edges defining the transitions between these stages.\n",
+ "\n",
+ "The workflow includes two main loops:\n",
+ "\n",
+ "- **Research Loop:** This loop iteratively plans and executes searches until a specified number of searches are completed.\n",
+ "- **Critique and Revision Loop:** This loop handles the script critique, additional research based on the critique, and script revision, repeating for a set number of revisions.\n",
+ "\n",
+ "The `workflow.compile()` method transforms this graph definition into an executable workflow, using the memory checkpointer to maintain the agent's state throughout the process."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "id": "f7d04cda5f36"
+ },
+ "outputs": [],
+ "source": [
+ "# Initialize the state graph\n",
+ "workflow = StateGraph(AgentState)\n",
+ "\n",
+ "# Define the nodes of the workflow, representing each stage of the process\n",
+ "workflow.add_node(\"podcast_outline\", podcast_outline_node)\n",
+ "workflow.add_node(\"research_plan\", research_plan_node)\n",
+ "workflow.add_node(\"research_agent\", research_agent_node)\n",
+ "workflow.add_node(\"generate_script\", generate_script_node)\n",
+ "workflow.add_node(\"perform_critique\", perform_critique_node)\n",
+ "workflow.add_node(\"research_critique\", research_critique_node)\n",
+ "\n",
+ "# Specify the starting node of the workflow\n",
+ "workflow.set_entry_point(\"podcast_outline\")\n",
+ "\n",
+ "# Define the flow between nodes and stages\n",
+ "workflow.add_edge(\"podcast_outline\", \"research_plan\")\n",
+ "workflow.add_edge(\"research_plan\", \"research_agent\")\n",
+ "workflow.add_edge(\"perform_critique\", \"research_critique\")\n",
+ "workflow.add_edge(\"research_critique\", \"research_agent\")\n",
+ "\n",
+ "# Define conditional edges for the research loop\n",
+ "workflow.add_conditional_edges(\n",
+ "    \"research_agent\",\n",
+ "    should_continue_tools,\n",
+ "    {\"generate_script\": \"generate_script\", \"research_plan\": \"research_plan\"},\n",
+ ")\n",
+ "\n",
+ "# Define conditional edges for the critique and revision loop\n",
+ "workflow.add_conditional_edges(\n",
+ "    \"generate_script\",\n",
+ "    should_continue,\n",
+ "    {END: END, \"perform_critique\": \"perform_critique\"},\n",
+ ")\n",
+ "\n",
+ "# Compile the workflow graph, enabling memory to track agent state\n",
+ "graph = workflow.compile(checkpointer=memory)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "18154fa6d8b4"
+ },
+ "source": [
+ "### Visualize the workflow\n",
+ "\n",
+ "This cell visualizes the compiled LangGraph workflow as a [Mermaid diagram](https://mermaid.js.org/).\n",
+ "\n",
+ "The diagram provides a clear and intuitive representation of the workflow's structure, showing the nodes, edges, and the flow of execution.\n",
+ "\n",
+ "This visualization helps to understand the overall process and the interactions between different stages of the AI podcast agent."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "id": "f97fe13cc0cf"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/jpeg": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Display a Mermaid diagram of the workflow\n",
+ "Image(graph.get_graph().draw_mermaid_png())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3076e0cd0c4e"
+ },
+ "source": [
+ "### Define the podcast topic\n",
+ "\n",
+ "This cell defines the topic of the podcast that the AI agent will create.\n",
+ "\n",
+ "The topic is assigned to the variable `PODCAST_TOPIC`. Feel free to modify this variable to explore different podcast topics! "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {
+ "id": "6627153c6715"
+ },
+ "outputs": [],
+ "source": [
+ "PODCAST_TOPIC = \"Explore the use of bio-inspired fluid dynamics in the design of underwater robots and vehicles\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a1df657ce053"
+ },
+ "source": [
+ "### Run the AI podcast agent\n",
+ "\n",
+ "This cell executes the compiled LangGraph workflow, running the AI podcast agent to generate the podcast script. \n",
+ "\n",
+ "The code performs these actions:\n",
+ "\n",
+ "- **Clean agent helper function:** This function prepares the agent's output for printing by removing unnecessary characters and formatting\n",
+ "- **Thread Configuration:** A thread configuration is defined to ensure a unique history for this workflow execution\n",
+ "- **Workflow Execution:** The `graph.stream()` method runs the workflow, iterating through each stage and updating the agent's state\n",
+ "- **Output Display:** The code prints the results of each stage, including the agent's actions and generated output"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "id": "338377bc8c25"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Agent Node: podcast_outline\n",
+ "\n",
+ "Agent Result:\n",
+ "{'outline': '## Diving Deep: How Nature Inspires Underwater Robot Design **Podcast Outline:****I. Introduction (0:00 - 0:20):*** **Hook:** Start with an engaging sound effect (gentle waves, whale song) and a captivating question: \"What if we could design underwater robots as efficient and agile as a school of fish?\"* **Introduce topic:** Briefly explain bio-inspired design and its application in underwater robotics.* **Highlight the episode's focus:** Exploring specific examples of how nature inspires fluid dynamics in underwater robots.**II. Example 1: The Grace of Jellyfish (0:20 - 0:45):*** **Introduce the jellyfish:** Describe their unique propulsion mechanism (contracting bell).* **Connect to robotics:** Explain how scientists are mimicking this mechanism to create more efficient underwater robots.* **Specific example:** Mention a real-world robot inspired by jellyfish propulsion (e.g., the \"RoboJelly\").* **Benefits:** Highlight the advantages of this design (e.g\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_plan\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 1, 'content': ['[Document(metadata={'Published': '2024-09-15', 'Title': 'A Novel Aerial-Aquatic Locomotion Robot with Variable Stiffness Propulsion Module', 'Authors': 'Junzhe Hu, Pengyu Chen, Tianxiang Feng, Yuxuan Wen, Ke Wu, Janet Dong', 'Summary': \"In recent years, the development of robots capable of operating in both\n",
+ "aerial and aquatic environments has gained significant attention. This study\n",
+ "presents the design and fabrication of a novel aerial-aquatic locomotion robot\n",
+ "(AALR). Inspired by the diving beetle, the AALR incorporates a biomimetic\n",
+ "propulsion mechanism with power and recovery strokes. The variable stiffness\n",
+ "propulsion module (VSPM) uses low melting point alloy (LMPA) and variable\n",
+ "stiffness joints (VSJ) to achieve efficient aquatic locomotion while reduce\n",
+ "harm to marine life. The AALR's innovative design integrates the VSPM into the\n",
+ "arms of a traditional quadrotor, allowing for effective aerial-aquatic\n",
+ "locomotion. The VSPM adjusts joint stiffness throug\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_plan\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ', 'bio-inspired underwater robotics design ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 2, 'content': ['[Document(metadata={'title': 'Bio-inspired robotics', 'summary': 'Bio-inspired robotic locomotion is a fairly new subcategory of bio-inspired design. It is about learning concepts from nature and applying them to the design of real-world engineered systems. More specifically, this field is about making robots that are inspired by biological systems, including Biomimicry. Biomimicry is copying from nature while bio-inspired design is learning from nature and making a mechanism that is simpler and more effective than the system observed in nature. Biomimicry has led to the development of a different branch of robotics called soft robotics. The biological systems have been optimized for specific tasks according to their habitat. However, they are multifunctional and are not designed for only one specific functionality. Bio-inspired robotics is about studying biological systems, and looking for the mechanisms that may solve a problem in the engineering fiel\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_plan\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ', 'bio-inspired underwater robotics design ', 'aquatic animal locomotion drag reduction ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 3, 'content': ['[Document(metadata={'uid': '31466767', 'Title': \"Aquatic animal colors and skin temperature: Biology's selection for reducing oceanic dolphin's skin friction drag.\", 'Published': '2019-07-15', 'Copyright Information': 'Published by Elsevier Ltd.'}, page_content=\"There is currently a growing interest in the area of drag reduction. In this work, the thermal effects of body color of some species of aquatics like Orcas and Dusky dolphins are investigated with respect to their swimming routes and geometric and behavioral characteristics. Considering the marine and atmospheric characteristics of these aquatics' routes, a thermal analysis is performed. The surrounding fluxes including the water flux, sun irradiation, and core temperature are considered in an energy balance to determine the skin temperature of the top side of the animal/organism's body. To study the effects of color on the surface temperature of the aquatic species, an experiment is carried out\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_plan\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ', 'bio-inspired underwater robotics design ', 'aquatic animal locomotion drag reduction ', 'bio-inspired underwater robotics fluid dynamics design ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 4, 'content': ['[Document(metadata={'Published': '2024-01-06', 'Title': 'Estimating the Lateral Motion States of an Underwater Robot by Propeller Wake Sensing Using an Artificial Lateral Line', 'Authors': 'Jun Wang, Dexin Zhao, Youxi Zhao, Feitian Zhang, Tongsheng Shen', 'Summary': 'An artificial lateral line (ALL) is a bioinspired flow sensing system of an\n",
+ "underwater robot that consists of distributed flow sensors. The ALL has\n",
+ "achieved great success in sensing the motion states of bioinspired underwater\n",
+ "robots, e.g., robotic fish, that are driven by body undulation and/or tail\n",
+ "flapping. However, the ALL has not been systematically tested and studied in\n",
+ "the sensing of underwater robots driven by rotating propellers due to the\n",
+ "highly dynamic and complex flow field therein. This paper makes a bold\n",
+ "hypothesis that the distributed flow measurements sampled from the propeller\n",
+ "wake flow, although infeasible to represent the entire flow dynamics, provides\n",
+ "sufficient informati\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: generate_script\n",
+ "\n",
+ "Agent Result:\n",
+ "{'revision_number': 2, 'search_count': 0, 'draft': 'Welcome to **AquaTech Insights!**What if we could design underwater robots as efficient and agile as a school of fish?That's the incredible promise of bio-inspired fluid dynamics! We're diving deep into how nature inspires underwater robot design.Let's start with the jellyfish, those masters of effortless movement.Their secret? A mesmerizing dance of contracting bells.Researchers at Virginia Tech are mimicking this to create \"RoboJelly\" robots.These robots can navigate tight spaces with incredible efficiency.Imagine the possibilities for underwater exploration and research!Now, picture a school of fish gliding through the water.It's a symphony of synchronized movement, minimizing drag and maximizing efficiency.In 2024, a team from Peking University found that fish in formation conserve energy by riding each other's wake.Inspired by this, scientists are developing AUVs that can work together in similar formations.This could revolutioni\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: perform_critique\n",
+ "\n",
+ "Agent Result:\n",
+ "{'critique': '## AquaTech Insights: Script Critique and Recommendations**Overall Impression:**This script is a solid start with a clear focus on bio-inspired underwater robotics. You effectively introduce the concept and provide interesting examples. However, it lacks depth and engagement in several areas. **Here's a detailed breakdown:****Strengths:*** **Clear Topic:** The script immediately establishes the focus on bio-inspired underwater robotics.* **Intriguing Examples:** The examples of jellyfish, fish schools, and shark skin are relevant and captivating.* **Enthusiastic Tone:** The script maintains an enthusiastic and engaging tone throughout.**Areas for Improvement:*** **Conciseness:** The script could be more concise and impactful. Avoid repetitive phrases like \"Imagine the possibilities\" and focus on delivering information more directly.* **Depth and Detail:** While the examples are interesting, they lack depth. * **Expand on the science:** Briefly explain *how* the bell \n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_critique\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ', 'bio-inspired underwater robotics design ', 'aquatic animal locomotion drag reduction ', 'bio-inspired underwater robotics fluid dynamics design ', 'bio-inspired robotics storytelling engagement ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 1, 'content': ['[Document(metadata={'title': 'Cynthia Breazeal', 'summary': 'Cynthia Breazeal is an American robotics scientist and entrepreneur. She is a former chief scientist and chief experience officer of Jibo, a company she co-founded in 2012 that developed personal assistant robots. Currently, she is a professor of media arts and sciences at the Massachusetts Institute of Technology and the director of the Personal Robots group at the MIT Media Lab. Her most recent work has focused on the theme of living everyday life in the presence of AI, and gradually gaining insight into the long-term impacts of social robots. \n",
+ "\n",
+ "', 'source': 'https://en.wikipedia.org/wiki/Cynthia_Breazeal'}, page_content='Cynthia Breazeal is an American robotics scientist and entrepreneur. She is a former chief scientist and chief experience officer of Jibo, a company she co-founded in 2012 that developed personal assistant robots. Currently, she is a professor of media arts and sciences at \n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_plan\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ', 'bio-inspired underwater robotics design ', 'aquatic animal locomotion drag reduction ', 'bio-inspired underwater robotics fluid dynamics design ', 'bio-inspired robotics storytelling engagement ', 'biomimetic underwater robots fluid dynamics applications ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Too Many Requests, waiting for 0.20 seconds...\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 2, 'content': ['[Document(metadata={'uid': '39078729', 'Title': 'Jellyfish-Inspired Soft Robot Driven by Pneumatic Bistable Actuators.', 'Published': '2024-07-30', 'Copyright Information': ''}, page_content=\"Soft actuators offer numerous potential applications; however, challenges persist in achieving a high driving force and fast response speed. In this work, we present the design, fabrication, and analysis of a soft pneumatic bistable actuator (PBA) mimicking jellyfish subumbrellar muscle motion for waterjet propulsion. Drawing inspiration from the jellyfish jet propulsion and the characteristics of bistable structure, we develop an elastic band stretch prebending PBA with a simple structure, low inflation cost, exceptional driving performance, and stable driving force output. Through a bionic analysis of jellyfish body structure and motion, we integrate the PBA into a jellyfish-like prototype, enabling it to achieve jet propulsion. To enhance the swimming performanc\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_plan\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ', 'bio-inspired underwater robotics design ', 'aquatic animal locomotion drag reduction ', 'bio-inspired underwater robotics fluid dynamics design ', 'bio-inspired robotics storytelling engagement ', 'biomimetic underwater robots fluid dynamics applications ', 'biomimetic underwater vehicles design challenges ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 3, 'content': [\"[Document(metadata={'Published': '2024-10-03', 'Title': 'Coastal Underwater Evidence Search System with Surface-Underwater Collaboration', 'Authors': 'Hin Wang Lin, Pengyu Wang, Zhaohua Yang, Ka Chun Leung, Fangming Bao, Ka Yu Kui, Jian Xiang Erik Xu, Ling Shi', 'Summary': 'The Coastal underwater evidence search system with surface-underwater\n",
+ "collaboration is designed to revolutionize the search for artificial objects in\n",
+ "coastal underwater environments, overcoming limitations associated with\n",
+ "traditional methods such as divers and tethered remotely operated vehicles. Our\n",
+ "innovative multi-robot collaborative system consists of three parts, an\n",
+ "autonomous surface vehicle as a mission control center, a towed underwater\n",
+ "vehicle for wide-area search, and a biomimetic underwater robot inspired by\n",
+ "marine organisms for detailed inspections of identified areas. We conduct\n",
+ "extensive simulations and real-world experiments in pond environments and\n",
+ "coastal fields to d\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_plan\n",
+ "\n",
+ "Agent Result:\n",
+ "{'queries': ['biomimetics underwater propulsion optimization ', 'bio-inspired underwater robotics design ', 'aquatic animal locomotion drag reduction ', 'bio-inspired underwater robotics fluid dynamics design ', 'bio-inspired robotics storytelling engagement ', 'biomimetic underwater robots fluid dynamics applications ', 'biomimetic underwater vehicles design challenges ', 'biomimetic underwater robots maneuverability efficiency ']}\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: research_agent\n",
+ "\n",
+ "Agent Result:\n",
+ "{'search_count': 4, 'content': ['[Document(metadata={'uid': '39078729', 'Title': 'Jellyfish-Inspired Soft Robot Driven by Pneumatic Bistable Actuators.', 'Published': '2024-07-30', 'Copyright Information': ''}, page_content=\"Soft actuators offer numerous potential applications; however, challenges persist in achieving a high driving force and fast response speed. In this work, we present the design, fabrication, and analysis of a soft pneumatic bistable actuator (PBA) mimicking jellyfish subumbrellar muscle motion for waterjet propulsion. Drawing inspiration from the jellyfish jet propulsion and the characteristics of bistable structure, we develop an elastic band stretch prebending PBA with a simple structure, low inflation cost, exceptional driving performance, and stable driving force output. Through a bionic analysis of jellyfish body structure and motion, we integrate the PBA into a jellyfish-like prototype, enabling it to achieve jet propulsion. To enhance the swimming performanc\n",
+ "\n",
+ "====================\n",
+ "\n",
+ "Agent Node: generate_script\n",
+ "\n",
+ "Agent Result:\n",
+ "{'revision_number': 3, 'search_count': 0, 'draft': 'Welcome to Water World Wonders! What if we told you the ocean's depths hold the secrets to building better robots? That's right, we're diving deep into the world of bio-inspired design, where engineers are turning to nature to create the next generation of underwater robots.Take the jellyfish, for example.These graceful creatures move with such ease, pulsating their bell-shaped bodies to glide through the water.In 2012, researchers at Virginia Tech drew inspiration from this to create \"RoboJelly,\" a robot that mimics the jellyfish's efficient propulsion.This design allows for incredible maneuverability in tight spaces, perfect for exploring delicate coral reefs or even shipwrecks.But nature's brilliance doesn't stop there!Have you ever wondered how fish swim in schools so effortlessly?It turns out, by swimming in formation, they create a slipstream that reduces drag and saves energy.In fact, a study by the University of Groningen in 2\n",
+ "\n",
+ "====================\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Function to clean and format agent output for display\n",
+ "\n",
+ "\n",
+ "def clean_agent_result(data):\n",
+ " agent_result = str(data)\n",
+ " agent_result = re.sub(\n",
+ " r\"[^\\x00-\\x7F]+\", \" \", agent_result\n",
+ " ) # Remove non-ASCII characters\n",
+ " agent_result = re.sub(r\"\\\\\\\\n\", \"\\n\", agent_result) # Turn double-escaped newlines into real newlines\n",
+ " agent_result = re.sub(r\"\\\\n\", \"\", agent_result) # Remove remaining escaped newlines\n",
+ " agent_result = re.sub(r\"\\\\'\", \"'\", agent_result) # Replace escaped single quotes\n",
+ " return agent_result\n",
+ "\n",
+ "\n",
+ "# Thread ID for unique history in workflow execution\n",
+ "thread = {\"configurable\": {\"thread_id\": \"1\"}}\n",
+ "\n",
+ "# Run the LangGraph workflow, passing the initial state and thread configuration\n",
+ "for state in graph.stream(\n",
+ " {\n",
+ " \"task\": PODCAST_TOPIC,\n",
+ " \"revision_number\": 1, # Current revision number\n",
+ " \"max_revisions\": 2, # Maximum number of revisions allowed\n",
+ " \"search_count\": 0, # Current search number\n",
+ " \"max_searches\": 3, # Maximum number of searches allowed per revision\n",
+ " \"content\": [],\n",
+ " \"queries\": [],\n",
+ " \"tool_calls\": [],\n",
+ " },\n",
+ " thread,\n",
+ "):\n",
+ " # Print a snippet of the results of each workflow stage\n",
+ " for k, v in state.items():\n",
+ " print(f\"Agent Node: {k}\\n\")\n",
+ " print(\"Agent Result:\")\n",
+ " print(clean_agent_result(v)[:1000])\n",
+ " print(\"\\n====================\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cd8d951762fb"
+ },
+ "source": [
+ "### Parse and display the final podcast script\n",
+ "\n",
+ "This section extracts and prepares the final podcast script generated by the AI agent.\n",
+ "\n",
+ "It displays the script for review. Each string in the list becomes one spoken line, and consecutive lines alternate between two text-to-speech voices."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "id": "599397cab03c"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['Welcome to Water World Wonders!',\n",
+ " \"What if we told you the ocean's depths hold the secrets to building better robots?\",\n",
+ " \"That's right, we're diving deep into the world of bio-inspired design, where engineers are turning to nature to create the next generation of underwater robots.\",\n",
+ " 'Take the jellyfish, for example.',\n",
+ " 'These graceful creatures move with such ease, pulsating their bell-shaped bodies to glide through the water.',\n",
+ " 'In 2012, researchers at Virginia Tech drew inspiration from this to create \"RoboJelly,\" a robot that mimics the jellyfish\\'s efficient propulsion.',\n",
+ " 'This design allows for incredible maneuverability in tight spaces, perfect for exploring delicate coral reefs or even shipwrecks.',\n",
+ " \"But nature's brilliance doesn't stop there!\",\n",
+ " 'Have you ever wondered how fish swim in schools so effortlessly?',\n",
+ " 'It turns out, by swimming in formation, they create a slipstream that reduces drag and saves energy.',\n",
+ " 'In fact, a study by the University of Groningen in 2010 found that fish in schools can reduce their energy expenditure by up to 20%!',\n",
+ " 'Researchers are now applying this principle to develop autonomous underwater vehicles, or AUVs, that can work together in similar formations.',\n",
+ " 'Imagine a fleet of these AUVs mapping the ocean floor, searching for missing objects, or monitoring underwater ecosystems with incredible efficiency.',\n",
+ " \"And that's just the tip of the iceberg!\",\n",
+ " 'Scientists are also exploring the use of shark skin, which has tiny scales that reduce drag, to design faster and more fuel-efficient underwater vehicles.',\n",
+ " 'The possibilities are truly endless!',\n",
+ " 'From the bioluminescence of deep-sea creatures to the powerful tentacles of an octopus, nature offers a treasure trove of inspiration for engineers.',\n",
+ " \"So, we've explored how the elegance of jellyfish and the teamwork of fish are shaping the future of underwater robotics.\",\n",
+ " 'The field of bio-inspired design is brimming with potential, promising to revolutionize how we explore, understand, and interact with the ocean depths.',\n",
+ " 'Be sure to check out the show notes for more information on this fascinating topic.',\n",
+ " 'Until next time, keep exploring Water World Wonders!']"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "podcast_script = state[\"generate_script\"][\"draft\"]\n",
+ "parsed_script = [\n",
+ " text for text in (line.strip() for line in podcast_script.splitlines()) if text\n",
+ "]\n",
+ "parsed_script"
+ ]
+ },
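+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see what the parsing step above does in isolation, the following sketch applies the same expression to a short sample string. `sample_draft` and `sample_lines` are hypothetical names used only for illustration; they stand in for the agent's draft and are not part of the workflow."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical sample standing in for state[\"generate_script\"][\"draft\"]\n",
+ "sample_draft = \"Welcome to the show!\\n\\n  What if robots could swim?  \\n\"\n",
+ "\n",
+ "# Split into lines, trim surrounding whitespace, and drop empty lines\n",
+ "sample_lines = [\n",
+ "    text for text in (line.strip() for line in sample_draft.splitlines()) if text\n",
+ "]\n",
+ "print(sample_lines)  # ['Welcome to the show!', 'What if robots could swim?']"
+ ]
+ },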
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e075edfe820d"
+ },
+ "source": [
+ "### Generate audio for the podcast\n",
+ "\n",
+ "This cell generates audio for each line of the parsed podcast script using Google Cloud's Text-to-Speech API.\n",
+ "\n",
+ "It creates separate audio files for each line, alternating between two different voices to simulate a conversation between two podcast hosts.\n",
+ "\n",
+ "The code:\n",
+ "\n",
+ "1. **Initializes the Text-to-Speech Client:** Sets up the connection to the API.\n",
+ "2. **Defines Audio Configuration:** Specifies the desired output audio format (MP3).\n",
+ "3. **Iterates through Script Lines:** Generates audio for each line, alternating voices.\n",
+ "4. **Saves Audio Files:** Writes the generated audio to separate MP3 files."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {
+ "id": "c74e3badfd35"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Audio content written to file part-0.mp3\n",
+ "Audio content written to file part-1.mp3\n",
+ "Audio content written to file part-2.mp3\n",
+ "Audio content written to file part-3.mp3\n",
+ "Audio content written to file part-4.mp3\n",
+ "Audio content written to file part-5.mp3\n",
+ "Audio content written to file part-6.mp3\n",
+ "Audio content written to file part-7.mp3\n",
+ "Audio content written to file part-8.mp3\n",
+ "Audio content written to file part-9.mp3\n",
+ "Audio content written to file part-10.mp3\n",
+ "Audio content written to file part-11.mp3\n",
+ "Audio content written to file part-12.mp3\n",
+ "Audio content written to file part-13.mp3\n",
+ "Audio content written to file part-14.mp3\n",
+ "Audio content written to file part-15.mp3\n",
+ "Audio content written to file part-16.mp3\n",
+ "Audio content written to file part-17.mp3\n",
+ "Audio content written to file part-18.mp3\n",
+ "Audio content written to file part-19.mp3\n",
+ "Audio content written to file part-20.mp3\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Instantiates a client\n",
+ "client = texttospeech.TextToSpeechClient()\n",
+ "\n",
+ "# Select the type of audio file you want returned\n",
+ "audio_config = texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3)\n",
+ "\n",
+ "audio_files = []\n",
+ "for count, line in enumerate(parsed_script):\n",
+ " # Set the text input to be synthesized\n",
+ " synthesis_input = texttospeech.SynthesisInput(text=line)\n",
+ "\n",
+ " # Choose the voice for the current line, alternating between hosts\n",
+ " if count % 2 == 0:\n",
+ " voice_name = \"en-US-Journey-O\"\n",
+ " else:\n",
+ " voice_name = \"en-US-Journey-D\"\n",
+ "\n",
+ " # Configure voice parameters: language and voice name\n",
+ " voice = texttospeech.VoiceSelectionParams(\n",
+ " language_code=\"en-US\",\n",
+ " name=voice_name,\n",
+ " )\n",
+ "\n",
+ " # Generate audio using the Text-to-Speech API\n",
+ " response = client.synthesize_speech(\n",
+ " input=synthesis_input, voice=voice, audio_config=audio_config\n",
+ " )\n",
+ "\n",
+ " # Save the generated audio to an MP3 file\n",
+ " filename = f\"part-{count}.mp3\"\n",
+ " audio_files.append(filename)\n",
+ " with open(filename, \"wb\") as out:\n",
+ " out.write(response.audio_content)\n",
+ " print(f\"Audio content written to file {filename}\")"
+ ]
+ },
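+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The alternating-voice logic above can be generalized to more than two hosts by indexing into a list of voice names. This is a minimal sketch, not part of the original workflow: `host_voices` and `pick_voice` are hypothetical names, and the list simply reuses the two voice names from the cell above."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Hypothetical list of host voices; extend it to add more hosts\n",
+ "host_voices = [\"en-US-Journey-O\", \"en-US-Journey-D\"]\n",
+ "\n",
+ "\n",
+ "def pick_voice(line_index: int) -> str:\n",
+ "    \"\"\"Returns the voice name for a script line, cycling through host_voices.\"\"\"\n",
+ "    return host_voices[line_index % len(host_voices)]\n",
+ "\n",
+ "\n",
+ "# Show which voice would narrate the first few lines\n",
+ "for i in range(4):\n",
+ "    print(i, pick_voice(i))"
+ ]
+ },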
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ea674bbf2628"
+ },
+ "source": [
+ "### Combine audio files and generate final podcast\n",
+ "\n",
+ "This cell combines the individual audio files generated in the previous step into a single podcast file.\n",
+ "\n",
+ "It also adds a brief silence after each line for a better listening experience.\n",
+ "\n",
+ "The final podcast is saved as `gemini-podcast.mp3`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "id": "9a4e93adc415"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Podcast content written to file gemini-podcast.mp3\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Initialize audio segment\n",
+ "full_audio = AudioSegment.silent(duration=200)\n",
+ "\n",
+ "# Concatenate audio segments with silence in between\n",
+ "for file in audio_files:\n",
+ " sound = AudioSegment.from_mp3(file)\n",
+ " silence = AudioSegment.silent(duration=200)\n",
+ " full_audio += sound + silence\n",
+ " os.remove(file) # Remove the individual part files after combining\n",
+ "\n",
+ "# Save the final audio output to a file\n",
+ "podcast_filename = \"gemini-podcast.mp3\"\n",
+ "full_audio.export(podcast_filename)\n",
+ "print(f\"Podcast content written to file {podcast_filename}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2be9b47ae8a3"
+ },
+ "source": [
+ "### Listen to your AI-generated podcast!\n",
+ "\n",
+ "This cell plays the final podcast generated by the AI agent.\n",
+ "\n",
+ "The `Audio` object from `IPython.display` is used to embed the audio player directly into the notebook. The podcast will start playing automatically.\n",
+ "\n",
+ "Enjoy your AI-created podcast!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "id": "2f50dcfe1651"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "Audio(filename=podcast_filename, rate=32000, autoplay=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "df34f0351197"
+ },
+ "source": [
+ "## Conclusion: Building AI-powered podcast agents\n",
+ "\n",
+ "This notebook showcases the exciting potential of using AI to automate the podcast creation process. By combining the power of the Gemini API with the flexibility of LangGraph, you built an intelligent agent capable of:\n",
+ "\n",
+ "- **Generating Podcast Outlines:** Structuring the flow and content of the podcast.\n",
+ "- **Conducting Research:** Gathering information from various sources like arXiv, PubMed, and Wikipedia.\n",
+ "- **Writing Engaging Scripts:** Crafting podcast scripts with natural-sounding dialogue, citations, and a conversational style.\n",
+ "- **Critiquing and Revising:** Providing feedback on the script and iteratively refining it.\n",
+ "- **Generating Audio:** Using text-to-speech technology to create the final podcast audio.\n",
+ "\n",
+ "This is just a starting point! You can customize this workflow further by:\n",
+ "\n",
+ "- **Adding New Research Tools:** Integrate additional sources of information relevant to your podcast topics.\n",
+ "- **Experimenting with Prompts:** Refine the prompts to guide the AI agent towards your desired style and content.\n",
+ "- **Exploring Different Voices:** Use a wider range of voices for the podcast hosts to create unique and engaging listening experiences.\n",
+ "\n",
+ "The possibilities are endless! As AI technology continues to advance, you can expect even more creative and innovative applications in podcasting and other content creation domains.\n",
+ "\n",
+ "You can learn more about [LangGraph](https://langchain-ai.github.io/langgraph/), the [Gemini API in Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models), or the [chat model provider for Vertex AI in LangChain](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm/) in their respective documentation pages."
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "langgraph_gemini_podcast.ipynb",
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/gemini/rag-engine/rag_engine_vector_search.ipynb b/gemini/rag-engine/rag_engine_vector_search.ipynb
new file mode 100644
index 0000000000..1147094284
--- /dev/null
+++ b/gemini/rag-engine/rag_engine_vector_search.ipynb
@@ -0,0 +1,643 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ur8xi4C7S06n"
+ },
+ "outputs": [],
+ "source": [
+ "# Copyright 2024 Google LLC\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "JAPoU8Sm5E6e"
+ },
+ "source": [
+ "# Vertex AI RAG Engine with Vertex AI Vector Search\n",
+ "\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2f4b6b36",
+ "metadata": {
+ "id": "2f4b6b36"
+ },
+ "source": [
+ "| | |\n",
+ "|-|-|\n",
+ "|Author(s) | [Zack Akil](https://github.com/zackail)|"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e65a3863",
+ "metadata": {
+ "id": "e65a3863"
+ },
+ "source": [
+ "## Overview\n",
+ "\n",
+ "AI Quick Build is a game show experience powered by Gemini and Imagen, in which contestants race against the clock to construct the best interpretation of a given object, such as a \"lighthouse\". Gemini provides live commentary and judges whose build is best. Imagen then takes the winner's creation and brings it to life.\n",
+ "\n",
+ "This notebook provides insight into how the AI Quick Build experience was built and how you can build similar features into your applications.\n",
+ "\n",
+ "The notebook is divided into three main sections:\n",
+ "\n",
+ "1. **Real-time commentary:** This section demonstrates how to generate live commentary using Gemini Flash and the Cloud Text-to-Speech API.\n",
+ "2. **Judging:** This section showcases how to use Gemini Pro for in-depth analysis and judgment of the builds, providing detailed reasoning and a winner announcement script.\n",
+ "3. **Image generation:** This section explores how to enhance the winner's creation by generating a dynamic background using Imagen, and then creating an animated GIF that transitions between the original and generated images.\n",
+ "\n",
+ "### Objective\n",
+ "\n",
+ "This notebook aims to:\n",
+ "\n",
+ "* Provide a clear understanding of how the AI Quick Build experience was developed.\n",
+ "* Enable users to build similar experiences by demonstrating the capabilities of Gemini and Imagen.\n",
+ "* Inspire users to explore new applications of these technologies in areas like education and entertainment.\n",
+ "\n",
+ "### Costs\n",
+ "\n",
+ "This tutorial uses billable components of Google Cloud:\n",
+ "\n",
+ "- Vertex AI\n",
+ "- Cloud Text-to-Speech\n",
+ "\n",
+ "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Text-to-Speech pricing](https://cloud.google.com/text-to-speech/pricing), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "951930c4",
+ "metadata": {
+ "id": "951930c4"
+ },
+ "source": [
+ "## Get Started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e31deb12",
+ "metadata": {
+ "id": "e31deb12"
+ },
+ "source": [
+ "### Install Vertex AI SDK and other required packages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "428311af",
+ "metadata": {
+ "id": "428311af"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install --upgrade --quiet google-cloud-texttospeech \\\n",
+ " google-cloud-aiplatform"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9005c308",
+ "metadata": {
+ "id": "9005c308"
+ },
+ "source": [
+ "### Restart runtime\n",
+ "\n",
+ "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.\n",
+ "\n",
+ "The restart might take a minute or longer. After it's restarted, continue to the next step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "49b60539",
+ "metadata": {
+ "id": "49b60539"
+ },
+ "outputs": [],
+ "source": [
+ "import IPython\n",
+ "\n",
+ "app = IPython.Application.instance()\n",
+ "app.kernel.do_shutdown(True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b78c686d",
+ "metadata": {
+ "id": "b78c686d"
+ },
+ "source": [
+ "
\n",
+ "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9f0de68e",
+ "metadata": {
+ "id": "9f0de68e"
+ },
+ "source": [
+ "### Authenticate your notebook environment (Colab only)\n",
+ "\n",
+ "If you're running this notebook on Google Colab, run the cell below to authenticate your environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "06a9699c",
+ "metadata": {
+ "id": "06a9699c"
+ },
+ "outputs": [],
+ "source": [
+ "PROJECT_ID = \"YOUR PROJECT ID\" # @param {type:\"string\"}\n",
+ "LOCATION = \"us-central1\" # @param {type:\"string\"}\n",
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ " from google.colab import auth\n",
+ "\n",
+ " auth.authenticate_user(project_id=PROJECT_ID)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "51ad2c80",
+ "metadata": {
+ "id": "51ad2c80"
+ },
+ "source": [
+ "### Set Google Cloud project information and initialize Vertex AI SDK\n",
+ "\n",
+ "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
+ "\n",
+ "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "80af9f03",
+ "metadata": {
+ "id": "80af9f03"
+ },
+ "outputs": [],
+ "source": [
+ "import vertexai\n",
+ "\n",
+ "vertexai.init(project=PROJECT_ID, location=LOCATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "81a6e2f3",
+ "metadata": {
+ "id": "81a6e2f3"
+ },
+ "source": [
+ "# Part 1: Real-Time Commentary\n",
+ "\n",
+ "This section demonstrates how to use Gemini Flash and the Cloud Text-to-Speech API to generate real-time commentary on the builds as they are being constructed.\n",
+ "\n",
+ "Gemini Flash's low latency allows us to provide immediate feedback and analysis, enhancing the interactive nature of the experience.\n",
+ "\n",
+ "We will also showcase the power of using Google Cloud's new Casual voices for text-to-speech, which can create a more natural and engaging commentary experience."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "87798176",
+ "metadata": {
+ "id": "87798176"
+ },
+ "source": [
+ "### Setting up the Commentator's Voice\n",
+ "\n",
+ "Here we define a helper function that utilizes the Cloud Text-to-Speech API to generate speech from a given text script. We will use this function to give our commentator a voice."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cc85c36f",
+ "metadata": {
+ "id": "cc85c36f"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "import IPython\n",
+ "from IPython import display\n",
+ "from google.cloud import texttospeech\n",
+ "\n",
+ "texttospeech_client = texttospeech.TextToSpeechClient()\n",
+ "\n",
+ "\n",
+ "def speak(script: str, voice_name: str = \"en-US-Casual-K\") -> IPython.lib.display.Audio:\n",
+ " \"\"\"\n",
+ " Generates speech from the given script using the specified voice and plays it.\n",
+ "\n",
+ " Args:\n",
+ " script: The text script to convert to speech.\n",
+ " voice_name: The name of the voice to use (default: 'en-US-Casual-K').\n",
+ " \"\"\"\n",
+ " generate_speech(script, voice_name)\n",
+ " return display.Audio(\"output.wav\")\n",
+ "\n",
+ "\n",
+ "def generate_speech(\n",
+ " script: str,\n",
+ " voice_name: str = \"en-US-Casual-K\",\n",
+ " output_file_name: str = \"output.wav\",\n",
+ ") -> str:\n",
+ " \"\"\"Generates a WAV audio file from the given script using Google Cloud Text-to-Speech.\n",
+ "\n",
+ " Args:\n",
+ " script: The text script to convert to speech.\n",
+ " voice_name: The name of the voice to use (e.g., 'en-US-Casual-K').\n",
+ " output_file_name: The name of the output WAV file.\n",
+ " \"\"\"\n",
+ " synthesis_input = texttospeech.SynthesisInput(text=script)\n",
+ "\n",
+ " language_code = extract_language_code(voice_name)\n",
+ "\n",
+ " voice = texttospeech.VoiceSelectionParams(\n",
+ " language_code=language_code, name=voice_name\n",
+ " )\n",
+ "\n",
+ " # Select the type of audio file you want returned\n",
+ " audio_config = texttospeech.AudioConfig(\n",
+ " audio_encoding=texttospeech.AudioEncoding.LINEAR16\n",
+ " )\n",
+ "\n",
+ " # Perform the text-to-speech request with the selected voice and audio configuration\n",
+ " tts_response = texttospeech_client.synthesize_speech(\n",
+ " input=synthesis_input, voice=voice, audio_config=audio_config\n",
+ " )\n",
+ "\n",
+ " # The response's audio_content is binary.\n",
+ " with open(output_file_name, \"wb\") as out:\n",
+ " # Write the response to the output file.\n",
+ " out.write(tts_response.audio_content)\n",
+ " print(f\"Audio content written to file {output_file_name}\")\n",
+ " return output_file_name\n",
+ "\n",
+ "\n",
+ "def extract_language_code(voice_name: str) -> str | None:\n",
+ " \"\"\"Extracts the language code by splitting on the second hyphen.\"\"\"\n",
+ " parts = voice_name.split(\"-\", 2) # Split into max 3 parts\n",
+ " if len(parts) >= 2:\n",
+ " return \"-\".join(parts[:2]) # Join the first two parts\n",
+ " else:\n",
+ " return None\n",
+ "\n",
+ "\n",
+ "# Demonstrate the casual voice\n",
+ "speak(\"Hello, I'm ready to speak\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c692e79c",
+ "metadata": {
+ "id": "c692e79c"
+ },
+ "source": [
+ "### Generating Live Commentary with Gemini Flash and Cloud Text-to-Speech\n",
+ "\n",
+ "In this section, we'll use Gemini Flash to generate commentary based on images of the players' builds. We'll then use the `speak` function we defined earlier to convert this commentary into speech."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "oLEJ918-y6hE",
+ "metadata": {
+ "id": "oLEJ918-y6hE"
+ },
+ "outputs": [],
+ "source": [
+ "# Import vertex ai classes\n",
+ "from vertexai.generative_models import GenerationConfig, GenerativeModel, Image, Part"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d002764c",
+ "metadata": {
+ "id": "d002764c"
+ },
+ "outputs": [],
+ "source": [
+ "# Fetch images of players' builds from Google Cloud Storage\n",
+ "!gsutil cp gs://cloud-samples-data/generative-ai/image/quick_build_* ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "5827c565",
+ "metadata": {
+ "id": "5827c565"
+ },
+ "outputs": [],
+ "source": [
+ "image_urls = [\n",
+ " \"quick_build_1.png\",\n",
+ " \"quick_build_2.png\",\n",
+ " \"quick_build_3.png\",\n",
+ " \"quick_build_4.png\",\n",
+ "]\n",
+ "\n",
+ "# Load the images using the Vertex AI SDK Image loader\n",
+ "all_images = [Image.load_from_file(url) for url in image_urls]\n",
+ "\n",
+ "# The images could also be referenced directly in Google Cloud Storage with Part.from_uri(uri, mime_type=\"image/png\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f0fbf92f",
+ "metadata": {
+ "id": "f0fbf92f"
+ },
+ "outputs": [],
+ "source": [
+ "# Define the player names\n",
+ "player_names = [\"Elara\", \"Finn\", \"Juniper\", \"Rhys\"]\n",
+ "\n",
+ "# Display the player build images\n",
+ "for i, img in enumerate(all_images):\n",
+ " print(f\"Player {i+1}, {player_names[i]}:\")\n",
+ " display.display(img)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "4d17e406",
+ "metadata": {
+ "id": "4d17e406"
+ },
+ "outputs": [],
+ "source": [
+ "# Define the design challenge\n",
+ "design = \"lighthouse\"\n",
+ "\n",
+ "# Initialize a list to store previous comments\n",
+ "previous_comments = []\n",
+ "\n",
+ "# Define the commentator's persona\n",
+ "commentator_persona = \"\"\"\n",
+ "whimsical, but can be quippy\n",
+ "\"\"\"\n",
+ "\n",
+ "# Define the prompt for generating commentary\n",
+ "commentary_prompt = f\"\"\"\n",
+ "You are a commentator at a high-stakes toy brick building contest where participants\n",
+ "have only 100 seconds to construct a {design}.\n",
+ "They will be judged on the following criteria:\n",
+ "Creative Use of Color\n",
+ "Structural Integrity\n",
+ "Adherence to Brief\n",
+ "Detail and Complexity\n",
+ "Make a short comment about one of the players' builds so far.\n",
+ "Make it specific about something visual.\n",
+ "Make sure to call out the player by name.\n",
+ "\n",
+ "Your persona is {commentator_persona}.\n",
+ "\n",
+ "Here are the previous comments said:\n",
+ "--- Start of previous comments ---\n",
+ "{previous_comments}\n",
+ "--- End of previous comments ---\n",
+ "\n",
+ "Mix up who the comments are about so that everyone is included,\n",
+ "but only talk about the players mentioned below:\n",
+ "Here are the latest photos of all of the builds:\n",
+ "\"\"\"\n",
+ "\n",
+ "# Initialize the prompt parts with the base prompt\n",
+ "prompt_parts = [commentary_prompt]\n",
+ "\n",
+ "# Add player information and images to the prompt parts\n",
+ "for i, name in enumerate(player_names):\n",
+ " prompt_player_part = f\"Player {i+1}, {name}:\"\n",
+ " prompt_parts.append(prompt_player_part)\n",
+ "\n",
+ " players_photo = all_images[i]\n",
+ " prompt_photo_part = Part.from_image(players_photo)\n",
+ "\n",
+ " prompt_parts.append(prompt_photo_part)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9d5105e1",
+ "metadata": {
+ "id": "9d5105e1"
+ },
+ "source": [
+ "### Leveraging Controlled Generation to Structure the Output\n",
+ "\n",
+ "Here, we will use **Controlled Generation** to ensure that Gemini Flash outputs the commentary in a specific JSON format. This will allow us to easily parse the response and extract the information we need, such as the commentary script, the player the commentary is about, and their number.\n",
+ "\n",
+ "Learn more about Controlled Generation: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "777f2d82",
+ "metadata": {
+ "id": "777f2d82"
+ },
+ "outputs": [],
+ "source": [
+ "# Load the Gemini Flash model\n",
+ "gemini_flash = GenerativeModel(\"gemini-1.5-flash\")\n",
+ "\n",
+ "# Define the response schema for controlled generation\n",
+ "response_schema = {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {\n",
+ " \"commentaryScript\": {\"type\": \"string\"},\n",
+ " \"playerNameCommentaryIsAbout\": {\"type\": \"string\"},\n",
+ " \"playerNumberCommentaryIsAbout\": {\"type\": \"integer\"},\n",
+ " },\n",
+ " \"required\": [\n",
+ " \"commentaryScript\",\n",
+ " \"playerNameCommentaryIsAbout\",\n",
+ " \"playerNumberCommentaryIsAbout\",\n",
+ " ],\n",
+ "}\n",
+ "\n",
+ "# Configure the generation settings, including the response MIME type and schema\n",
+ "generation_config = GenerationConfig(\n",
+ " response_mime_type=\"application/json\",\n",
+ " response_schema=response_schema,\n",
+ " temperature=0.7,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "a8953fa9",
+ "metadata": {
+ "id": "a8953fa9"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Oh, Elara, you're really going for that rainbow look, aren't you? I'm not sure if it's a lighthouse or a birthday cake, but it's certainly got my attention!\n",
+ "Elara\n",
+ "1\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Generate commentary using Gemini Flash and the defined prompt parts and configuration\n",
+ "commentary = gemini_flash.generate_content(\n",
+ " prompt_parts, generation_config=generation_config\n",
+ ")\n",
+ "\n",
+ "# Parse the JSON response\n",
+ "commentary_dict = json.loads(commentary.text)\n",
+ "commentary_script = commentary_dict.get(\"commentaryScript\")\n",
+ "player_name_commentary_is_about = commentary_dict.get(\"playerNameCommentaryIsAbout\")\n",
+ "player_number_commentary_is_about = commentary_dict.get(\"playerNumberCommentaryIsAbout\")\n",
+ "\n",
+ "# Print the extracted information\n",
+ "print(commentary_script)\n",
+ "print(player_name_commentary_is_about)\n",
+ "print(player_number_commentary_is_about)\n",
+ "\n",
+ "# Display the image of the player being commented on\n",
+ "all_images[player_number_commentary_is_about - 1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "42a7fb17",
+ "metadata": {
+ "id": "42a7fb17"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Audio content written to file output.wav\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Convert the commentary script to speech and play it\n",
+ "speak(commentary_script)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "72130a81",
+ "metadata": {
+ "id": "72130a81"
+ },
+ "source": [
+ "# Part 2: Contest Judging with Gemini Pro\n",
+ "\n",
+ "This section demonstrates how to use Gemini Pro to judge the completed builds. Gemini Pro offers more complex and in-depth reasoning capabilities, making it ideal for evaluating the builds based on specific criteria and selecting an overall winner."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "4364dbff",
+ "metadata": {
+ "id": "4364dbff"
+ },
+ "outputs": [],
+ "source": [
+ "# Define the design challenge\n",
+ "design = \"lighthouse\"\n",
+ "\n",
+ "# Define the judging prompt\n",
+ "judging_prompt = f\"\"\"You are judging a high-stakes toy brick building contest where\n",
+ "participants have only 100 seconds to construct a {design}.\n",
+ "\n",
+ " Evaluate each entry on a scale of 1-10 based on 'Creativity' & 'Execution', using the following criteria:\n",
+ " Creative Use of Color: Assess how intentionally and effectively different colored bricks have been used\n",
+ " to enhance the overall design and convey specific themes or emotions.\n",
+ " Structural Integrity: Consider the stability and durability of the build.\n",
+ " Does the structure hold together under scrutiny, and does it demonstrate a sound understanding of LEGO mechanics?\n",
+ " Adherence to Brief: Evaluate how closely the design aligns with the given prompt or theme.\n",
+ " Does it capture the essence of the concept and avoid unnecessary deviations?\n",
+ " Detail and Complexity: Assess the level of detail and complexity incorporated into the build.\n",
+ " Do the smaller elements contribute to the overall aesthetic and functionality?\n",
+ "\n",
+ " And choose an overall winner!\n",
+ " Output an \"announcement script\" that builds up to announcing the winner. Keep it concise.\n",
+ " And include a \"winner reasoning\" that goes into specific details about why that winner was chosen. Keep it concise.\n",
+ " Here are the finished builds:\n",
+ " \"\"\"\n",
+ "\n",
+ "# Initialize the prompt parts with the judging prompt\n",
+ "judging_prompt_parts = [judging_prompt]\n",
+ "\n",
+ "# Define the player names\n",
+ "player_names = [\"Elara\", \"Finn\", \"Juniper\", \"Rhys\"]\n",
+ "\n",
+ "# Add player information and images to the judging prompt parts\n",
+ "for i, name in enumerate(player_names):\n",
+ " prompt_player_part = f\"Player {i+1}, {name}:\"\n",
+ " judging_prompt_parts.append(prompt_player_part)\n",
+ "\n",
+ " players_photo = all_images[i]\n",
+ " prompt_photo_part = Part.from_image(players_photo)\n",
+ "\n",
+ " judging_prompt_parts.append(prompt_photo_part)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "90f99ae0",
+ "metadata": {
+ "id": "90f99ae0"
+ },
+ "outputs": [],
+ "source": [
+ "# Define the base response schema for judging\n",
+ "judging_response_schema_base = {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {\n",
+ " \"scores\": {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {},\n",
+ " },\n",
+ " \"overallWinner\": {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {\n",
+ " \"playerName\": {\n",
+ " \"type\": \"string\",\n",
+ " },\n",
+ " \"playerNumber\": {\n",
+ " \"type\": \"integer\",\n",
+ " },\n",
+ " \"winnerReasoning\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"specific details about why this winner was chosen.\",\n",
+ " },\n",
+ " },\n",
+ " },\n",
+ " \"winnerAnnouncementScript\": {\n",
+ " \"type\": \"string\",\n",
+ " \"description\": \"This is what the announcer will read out verbatim. Comment on some specific aspects of all of the contestants' builds, keeping the winner ambiguous to the very last word, e.g. '... and that is why the winner is... [insert player name]!'\",\n",
+ " },\n",
+ " },\n",
+ "}\n",
+ "\n",
+ "# Define the response schema for individual players' scores\n",
+ "judging_response_schema_player = {\n",
+ " \"type\": \"object\",\n",
+ " \"properties\": {\n",
+ " \"playerName\": {\n",
+ " \"type\": \"string\",\n",
+ " },\n",
+ " \"playerNumber\": {\n",
+ " \"type\": \"integer\",\n",
+ " },\n",
+ " \"score\": {\n",
+ " \"type\": \"number\",\n",
+ " \"description\": \"score between 1-10 for their overall build\",\n",
+ " },\n",
+ " \"scoringReasoning\": {\n",
+ " \"type\": \"string\",\n",
+ " },\n",
+ " },\n",
+ "}\n",
+ "\n",
+ "# Add player-specific schemas to the base schema dynamically\n",
+ "for i, name in enumerate(player_names):\n",
+ " judging_response_schema_base[\"properties\"][\"scores\"][\"properties\"][\n",
+ " f\"player_{i+1}\"\n",
+ " ] = judging_response_schema_player"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "3dfb907b",
+ "metadata": {
+ "id": "3dfb907b"
+ },
+ "outputs": [],
+ "source": [
+ "# Load the Gemini Pro model\n",
+ "gemini_pro = GenerativeModel(\"gemini-1.5-pro\")\n",
+ "\n",
+ "# Configure the generation settings for judging\n",
+ "generation_config = GenerationConfig(\n",
+ " response_mime_type=\"application/json\",\n",
+ " response_schema=judging_response_schema_base,\n",
+ " temperature=0.7,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "3a930b96",
+ "metadata": {
+ "id": "3a930b96"
+ },
+ "outputs": [],
+ "source": [
+ "# Generate judging results using Gemini Pro and the defined prompt parts and configuration\n",
+ "judging = gemini_pro.generate_content(\n",
+ " judging_prompt_parts, generation_config=generation_config\n",
+ ")\n",
+ "\n",
+ "# Parse the JSON response\n",
+ "judging_dict = json.loads(judging.text)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ffac0121",
+ "metadata": {
+ "id": "ffac0121"
+ },
+ "outputs": [],
+ "source": [
+ "# Display the judging results\n",
+ "judging_dict"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "db33a3f4",
+ "metadata": {
+ "id": "db33a3f4"
+ },
+ "outputs": [],
+ "source": [
+ "# Extract information about the winner and the announcement script\n",
+ "winning_player_name = judging_dict[\"overallWinner\"][\"playerName\"]\n",
+ "winning_player_number = judging_dict[\"overallWinner\"][\"playerNumber\"]\n",
+ "winning_player_reasoning = judging_dict[\"overallWinner\"][\"winnerReasoning\"]\n",
+ "winner_announcement_script = judging_dict[\"winnerAnnouncementScript\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "b930d0c7",
+ "metadata": {
+ "id": "b930d0c7"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "All of our contestants have crafted impressive lighthouses in just 100 seconds, demonstrating remarkable skill and creativity under pressure. Elara's vibrant design immediately catches the eye, while Finn's lighthouse stands as a beacon of classic simplicity. Juniper's build showcases a unique color palette, and Rhys's lighthouse impresses with its attention to detail and sturdy structure. And the winner is... Rhys!\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Print the winner announcement script (uncomment the last line to also speak it)\n",
+ "print(winner_announcement_script)\n",
+ "# speak(winner_announcement_script)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "7b0924f2",
+ "metadata": {
+ "id": "7b0924f2"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Winner is Rhys!\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Announce the winner and display their image\n",
+ "print(f\"Winner is {winning_player_name}!\")\n",
+ "all_images[winning_player_number - 1]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "eeb118c3",
+ "metadata": {
+ "id": "eeb118c3"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Rhys's lighthouse stands out with its well-defined base, classic lighthouse shape, and creative use of color to mimic a real lighthouse. The build is solid and well-executed, demonstrating a strong understanding of the prompt.\n",
+ "Audio content written to file output.wav\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Print and speak the winner reasoning\n",
+ "print(winning_player_reasoning)\n",
+ "speak(winning_player_reasoning)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fccb7305",
+ "metadata": {
+ "id": "fccb7305"
+ },
+ "source": [
+ "# Part 3: Bringing the Creation to Life with Imagen\n",
+ "\n",
+ "In this section, we will use Imagen to enhance the winning build by generating a dynamic and engaging background. We will then create a fun animated GIF that transitions between the original build image and the Imagen-generated image.\n",
+ "\n",
+ "This showcases how Imagen can be used to create compelling visuals that complement and enhance user-generated content."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "8ff8abf2",
+ "metadata": {
+ "id": "8ff8abf2"
+ },
+ "outputs": [],
+ "source": [
+ "from vertexai.preview.vision_models import Image as Imagen_Image\n",
+ "from vertexai.preview.vision_models import ImageGenerationModel"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "389668b0",
+ "metadata": {
+ "id": "389668b0"
+ },
+ "outputs": [],
+ "source": [
+ "# Load the Imagen model\n",
+ "imagen_model = ImageGenerationModel.from_pretrained(\"imagegeneration@006\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f8358a05",
+ "metadata": {
+ "id": "f8358a05"
+ },
+ "outputs": [],
+ "source": [
+ "# Load the winning build's image\n",
+ "selected_image = image_urls[winning_player_number - 1]\n",
+ "\n",
+ "base_img = Imagen_Image.load_from_file(location=selected_image)\n",
+ "base_img.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "21a1850a",
+ "metadata": {
+ "id": "21a1850a"
+ },
+ "outputs": [],
+ "source": [
+ "# Define the prompt for Imagen to generate a background\n",
+ "image_prompt = \"a macro lens photo of a toy brick lighthouse in a rocky stormy coastal toy brick sea scene. The sea stretches out into the distance\"\n",
+ "\n",
+ "# Generate images using Imagen's edit_image function with background masking\n",
+ "images = imagen_model.edit_image(\n",
+ " prompt=image_prompt,\n",
+ " base_image=base_img,\n",
+ " mask_mode=\"background\",\n",
+ " mask_dilation=0.003, # Use a small mask dilation to avoid erasing parts of the subject\n",
+ " number_of_images=3,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "78552a40",
+ "metadata": {
+ "id": "78552a40"
+ },
+ "outputs": [],
+ "source": [
+ "# Display the generated images\n",
+ "for i, img in enumerate(images):\n",
+ " print(\"generated image \", i)\n",
+ " img.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "id": "94c7a4b7",
+ "metadata": {
+ "id": "94c7a4b7"
+ },
+ "outputs": [],
+ "source": [
+ "# Select a favourite generated image\n",
+ "favourite_generated_image = 2\n",
+ "\n",
+ "# Save the favourite generated image\n",
+ "images[favourite_generated_image].save(\"favourite_generated_image.png\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0b5c963",
+ "metadata": {
+ "id": "b0b5c963"
+ },
+ "source": [
+ "### Creating an Animated GIF\n",
+ "\n",
+ "Now, let's create a fun GIF that fades between the original build image and the Imagen-generated image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "1e9f2bd2",
+ "metadata": {
+ "id": "1e9f2bd2"
+ },
+ "outputs": [],
+ "source": [
+ "# Import the Pillow Image module for GIF building\n",
+ "import PIL.Image"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "6065033a",
+ "metadata": {
+ "id": "6065033a"
+ },
+ "outputs": [],
+ "source": [
+ "# Load the start and end images for the GIF\n",
+ "start_image_url = image_urls[winning_player_number - 1]\n",
+ "end_image_url = \"favourite_generated_image.png\"\n",
+ "\n",
+ "start_image = PIL.Image.open(start_image_url).convert(\"RGB\")\n",
+ "end_image = PIL.Image.open(end_image_url).convert(\"RGB\")\n",
+ "\n",
+ "# Initialize a list to store the GIF frames\n",
+ "gif_frames = []\n",
+ "\n",
+ "# Define the number of steps for the fade transition\n",
+ "gif_steps = 3\n",
+ "\n",
+ "# Generate the frames for the fade-in\n",
+ "for i in range(gif_steps + 1):\n",
+ " gif_frame = PIL.Image.blend(start_image, end_image, i / gif_steps)\n",
+ " gif_frames.append(gif_frame)\n",
+ "\n",
+ "# Add the end frame again so that it pauses at the generated image.\n",
+ "gif_frames.append(end_image)\n",
+ "\n",
+ "# Add the frames for the fade-out (reverse of fade-in)\n",
+ "gif_frames += gif_frames[::-1]\n",
+ "\n",
+ "# Save the frames as a GIF\n",
+ "start_image.save(\n",
+ " \"generated.gif\",\n",
+ " format=\"GIF\",\n",
+ " append_images=gif_frames,\n",
+ " save_all=True,\n",
+ " duration=300,\n",
+ " loop=0,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "328c9afd",
+ "metadata": {
+ "id": "328c9afd"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/gif": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Display the GIF\n",
+ "display.Image(filename=\"generated.gif\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "25983d78",
+ "metadata": {
+ "id": "25983d78"
+ },
+ "source": [
+ "# Conclusion\n",
+ "\n",
+ "This notebook has demonstrated how to build the backend for a fun and interactive game show experience using Gemini and Imagen. We have explored various capabilities of these models, including:\n",
+ "\n",
+ "* **Real-time commentary generation with Gemini Flash:** Providing immediate feedback and analysis on player builds.\n",
+ "* **In-depth judging and winner selection with Gemini Pro:** Leveraging advanced reasoning capabilities for a more comprehensive evaluation.\n",
+ "* **Enhanced visuals with Imagen:** Bringing the winning build to life with a dynamically generated background and an animated GIF.\n",
+ "\n",
+ "### Potential Applications\n",
+ "\n",
+ "The techniques and concepts presented in this notebook can be applied to various applications beyond game shows, such as:\n",
+ "\n",
+ "* **Education:** Creating interactive learning experiences with personalized feedback and engaging visuals.\n",
+ "* **Content creation:** Generating dynamic content for social media, marketing campaigns, and more.\n",
+ "* **Entertainment:** Building interactive narratives and immersive experiences.\n",
+ "\n",
+ "### Learn More\n",
+ "\n",
+ "To further explore the capabilities of Gemini, Imagen, and other Generative AI technologies, check out the following resources:\n",
+ "\n",
+ "* [Gemini documentation](https://cloud.google.com/vertex-ai)\n",
+ "* [Imagen documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview)\n",
+ "* [Vertex AI Generative AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview)\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "ai_quick_build_experience_backend.ipynb",
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/genkit/README.md b/genkit/README.md
index 973178c1d8..3119834ffe 100644
--- a/genkit/README.md
+++ b/genkit/README.md
@@ -5,3 +5,4 @@ This repository contains solutions using [Firebase Genkit](https://firebase.goog
## Directory Structure
- **generate-synthetic-database/**: This directory contains the source code for a Google Cloud Function that generates a synthetic database.
+- **postcard-generator/**: This directory contains the source code for a Next.js app that generates postcard images based on a journey.
diff --git a/genkit/postcard-generator/.env.example b/genkit/postcard-generator/.env.example
new file mode 100644
index 0000000000..3daeee1d62
--- /dev/null
+++ b/genkit/postcard-generator/.env.example
@@ -0,0 +1,11 @@
+# This key should only allow JavaScript and Places
+NEXT_PUBLIC_GOOGLE_MAPS_PUBLIC_API_KEY=
+
+# This key should only allow Routes and Static Maps
+GOOGLE_MAPS_API_SERVER_KEY=
+
+# Genkit uses this to determine which Vertex AI project to use. If not set, it uses your ADC default project
+GOOGLE_CLOUD_PROJECT=
+
+# Only set to true if you have set up Firebase auth (see docs/deploy.md for details)
+AUTH_ENABLED=false
diff --git a/genkit/postcard-generator/.gitignore b/genkit/postcard-generator/.gitignore
new file mode 100644
index 0000000000..c5501a1f7d
--- /dev/null
+++ b/genkit/postcard-generator/.gitignore
@@ -0,0 +1,40 @@
+# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+
+# dependencies
+/node_modules
+/.pnp
+.pnp.js
+.yarn/install-state.gz
+
+# testing
+/coverage
+
+# next.js
+/.next/
+/out/
+
+# production
+/build
+
+# misc
+.DS_Store
+*.pem
+
+# debug
+npm-debug.log*
+yarn-debug.log*
+yarn-error.log*
+
+# local env files
+.env*.local
+
+# vercel
+.vercel
+
+# typescript
+*.tsbuildinfo
+next-env.d.ts
+
+.env
+
+*.log
\ No newline at end of file
diff --git a/genkit/postcard-generator/README.md b/genkit/postcard-generator/README.md
new file mode 100644
index 0000000000..1069847674
--- /dev/null
+++ b/genkit/postcard-generator/README.md
@@ -0,0 +1,32 @@
+# Postcard Generator
+
+| | |
+| --------- | --------------------------------------- |
+| Author(s) | [Matt Day](https://github.com/mattsday) |
+
+**Looking to get started?** Check out [the docs](docs/README.md) for setup, demo, and more!
+
+This demo showcases [Firebase Genkit](https://firebase.google.com/docs/genkit) running inside a [Next.js](https://nextjs.org) app that can be deployed onto [Firebase App Hosting](https://firebase.google.com/docs/app-hosting). It generates postcard images based on a start and destination, as well as a short script detailing the journey.
+
+![Example Postcard Image](images/example.jpg)
+
+## Overview
+
+This demo has two parts: a web app and a Genkit development environment. Both run the same codebase; they are simply different tools for interacting with it, suited to different personas and tasks.
+
+See [the documentation](docs/README.md) for more information on how to deploy this and also for an example demo script.
+
+## Using
+
+See [the docs](docs/README.md) to get started!
+
+### TL;DR
+
+```sh
+export PROJECT_ID="my-project-id"
+cd terraform
+terraform init && terraform apply -var="project_id=${PROJECT_ID}"
+cd ..
+npm install
+npx firebase-tools@latest apphosting:backends:create --project="${PROJECT_ID}"
+```
diff --git a/genkit/postcard-generator/app/components/AuthContext.tsx b/genkit/postcard-generator/app/components/AuthContext.tsx
new file mode 100644
index 0000000000..cbbb26a26f
--- /dev/null
+++ b/genkit/postcard-generator/app/components/AuthContext.tsx
@@ -0,0 +1,92 @@
+/**
+ * Copyright 2024 Google LLC
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+"use strict";
+"use client";
+
+import { ReactNode, createContext, useState, useEffect, useContext } from "react";
+import { signInWithPopup, signOut, User, onAuthStateChanged, GoogleAuthProvider, onIdTokenChanged } from "firebase/auth";
+import { firebaseAuth } from "@/libs/firebase/clientApp";
+
+interface AuthProviderProps {
+ children?: ReactNode;
+ enabled?: boolean;
+}
+
+interface AuthContext {
+ user: User | null;
+ googleSignIn: () => Promise<void>;
+ logOut: () => Promise<void>;
+ enabled: boolean;
+}
+
+export async function googleSignIn() {
+ const provider = new GoogleAuthProvider();
+ await signInWithPopup(firebaseAuth, provider);
+}
+
+export async function logOut() {
+ await signOut(firebaseAuth);
+}
+
+const AuthContext = createContext({} as AuthContext);
+
+export const AuthContextProvider = ({
+ children, enabled = false,
+}: AuthProviderProps): JSX.Element => {
+ const [user, setUser] = useState<User | null>(null);
+
+ useEffect(() => {
+ if (!enabled) {
+ return;
+ }
+ const unsubscribe = onAuthStateChanged(firebaseAuth, (currentUser) => {
+ setUser(currentUser);
+ });
+ return () => unsubscribe();
+ }, [user, enabled]);
+
+ useEffect(() => {
+ if (!enabled) {
+ return;
+ }
+ return onIdTokenChanged(firebaseAuth, async (currentUser) => {
+ if (!currentUser) {
+ setUser(null);
+ }
+ else {
+ setUser(currentUser);
+ }
+ });
+ }, [enabled]);
+
+ return (
+ <AuthContext.Provider value={{ user, googleSignIn, logOut, enabled }}>
+ {children}
+ </AuthContext.Provider>
+ );
+};
+
+export const UserAuth = () => {
+ return useContext(AuthContext);
+};
diff --git a/genkit/postcard-generator/app/components/Navigation.tsx b/genkit/postcard-generator/app/components/Navigation.tsx
new file mode 100644
index 0000000000..ac7ec550f4
--- /dev/null
+++ b/genkit/postcard-generator/app/components/Navigation.tsx
@@ -0,0 +1,40 @@
+/**
+ * Copyright 2024 Google LLC
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+"use strict";
+"use client";
+
+import * as React from "react";
+import AppBar from "@mui/material/AppBar";
+import Box from "@mui/material/Box";
+import Toolbar from "@mui/material/Toolbar";
+import Typography from "@mui/material/Typography";
+import TopLoginLogout from "./TopLoginLogout";
+
+export default function Navigation() {
+ return (
+ <Box>
+   <AppBar>
+     <Toolbar>
+       <Typography>
+         Postcard Generator
+       </Typography>
+       <TopLoginLogout />
+     </Toolbar>
+   </AppBar>
+ </Box>
+ );
+}
diff --git a/genkit/postcard-generator/app/components/PlaceAutoComplete.tsx b/genkit/postcard-generator/app/components/PlaceAutoComplete.tsx
new file mode 100644
index 0000000000..68e23a2018
--- /dev/null
+++ b/genkit/postcard-generator/app/components/PlaceAutoComplete.tsx
@@ -0,0 +1,40 @@
+/**
+ * Copyright 2024 Google LLC
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+"use strict";
+"use client";
+
+import { PlacePicker } from "@googlemaps/extended-component-library/react";
+import Stack from "@mui/material/Stack";
+import React from "react";
+
+export interface PlaceAutoCompleteProps {
+ description: string;
+ value: string;
+ id: string;
+ handleChange: (e: Event) => void;
+}
+
+export default function PlaceAutoComplete({ description, value: defaultValue, id, handleChange }: PlaceAutoCompleteProps) {
+ return (
+
+
+
+
+ );
+}
diff --git a/genkit/postcard-generator/app/components/Postcard.tsx b/genkit/postcard-generator/app/components/Postcard.tsx
new file mode 100644
index 0000000000..5735307212
--- /dev/null
+++ b/genkit/postcard-generator/app/components/Postcard.tsx
@@ -0,0 +1,169 @@
+/**
+ * Copyright 2024 Google LLC
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+"use strict";
+import LinearProgress from "@mui/material/LinearProgress";
+import Card from "@mui/material/Card";
+import CardMedia from "@mui/material/CardMedia";
+import CardContent from "@mui/material/CardContent";
+import Alert from "@mui/material/Alert";
+import ExpandMoreIcon from "@mui/icons-material/ExpandMore";
+import IconButton, { IconButtonProps } from "@mui/material/IconButton";
+import { styled } from "@mui/material/styles";
+import { useState } from "react";
+import Typography from "@mui/material/Typography";
+import Collapse from "@mui/material/Collapse";
+import CardActions from "@mui/material/CardActions";
+import Markdown from "react-markdown";
+import DownloadIcon from "@mui/icons-material/Download";
+import ShareIcon from "@mui/icons-material/Share";
+
+export interface PostcardImageProps {
+ postcardImage: string | null;
+ generating: boolean;
+ description: string;
+ start: string | null;
+ end: string | null;
+ error: string | null;
+ mapImage: string | null;
+ story: string | null;
+}
+interface ExpandMoreProps extends IconButtonProps {
+ expand: boolean;
+}
+
+const ExpandMore = styled((props: ExpandMoreProps) => {
+ // eslint-disable-next-line @typescript-eslint/no-unused-vars
+ const { expand, ...other } = props;
+ return <IconButton {...other} />;
+})(({ theme }) => ({
+ marginLeft: "auto",
+ transition: theme.transitions.create("transform", {
+ duration: theme.transitions.duration.shortest,
+ }),
+ variants: [
+ {
+ props: ({ expand }) => !expand,
+ style: {
+ transform: "rotate(0deg)",
+ },
+ },
+ {
+ props: ({ expand }) => !!expand,
+ style: {
+ transform: "rotate(180deg)",
+ },
+ },
+ ],
+}));
+
+export default function PostcardImage({ postcardImage, generating, error, mapImage, description, story }: PostcardImageProps) {
+ const [expanded, setExpanded] = useState(false);
+
+ const handleExpandClick = () => {
+ setExpanded(!expanded);
+ };
+
+ async function download() {
+ if (postcardImage) {
+ // Use fetch to convert the image data URL into a blob
+ const response = await fetch(postcardImage);
+ // Create a download link
+ const url = window.URL.createObjectURL(await response.blob());
+ const link = document.createElement("a");
+ link.href = url;
+ link.setAttribute("download", "postcard.png");
+ // Simulate a click on the link to trigger the download
+ document.body.appendChild(link);
+ link.click();
+ link.parentNode?.removeChild(link);
+ }
+ }
+
+ return (
+