prompt_optimization_tweet_generation_example.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2a6e3449",
   "metadata": {},
   "source": [
    "# Prompt Optimization with Evidently: Tweet Generation Example\n",
    "This tutorial shows how to optimize prompts for generating engaging tweets using Evidently's `PromptOptimizer` API.\n",
    "We'll iteratively improve a tweet generation prompt to maximize how engaging the LLM-generated tweets are, as judged by an LLM classifier.\n",
    "\n",
    "## What you'll learn:\n",
    "- How to define a tweet generation function with OpenAI\n",
    "- How to set up an LLM judge to classify tweet engagement\n",
    "- How to optimize a tweet generation prompt based on the judge's feedback\n",
    "- How to inspect the best optimized prompt"
   ]
  },
  {
   "cell_type": "code",
   "id": "3c9fdb12",
   "metadata": {},
   "source": [
    "# Install packages if needed\n",
    "# !pip install evidently openai pandas"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "id": "3fcd60c7",
   "metadata": {},
   "source": [
    "import pandas as pd\n",
    "import openai\n",
    "\n",
    "from evidently.descriptors import LLMEval\n",
    "from evidently.llm.templates import BinaryClassificationPromptTemplate\n",
    "from evidently.llm.optimization import PromptOptimizer"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "e62d754e",
   "metadata": {},
   "source": [
    "## Define a Tweet Generation Function\n",
    "The function below sends the candidate prompt as the system message and the topic as the user message, and returns the generated tweet."
   ]
  },
  {
   "cell_type": "code",
   "id": "6103a03b",
   "metadata": {},
   "source": [
    "def basic_tweet_generation(topic, model=\"gpt-3.5-turbo\", instructions=\"\"):\n",
    "    # `instructions` holds the prompt under optimization\n",
    "    response = openai.chat.completions.create(\n",
    "        model=model,\n",
    "        messages=[\n",
    "            {\"role\": \"system\", \"content\": instructions},\n",
    "            {\"role\": \"user\", \"content\": f\"Write a short tweet about {topic}\"}\n",
    "        ]\n",
    "    )\n",
    "    return response.choices[0].message.content"
   ],
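   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "7be31a02",
   "metadata": {},
   "source": [
    "As a quick sanity check, we can call the generator on a single topic before wiring it into the optimizer. This is a sketch: it assumes `OPENAI_API_KEY` is set in your environment, and the returned tweet will vary between runs."
   ]
  },
  {
   "cell_type": "code",
   "id": "9f2e4c11",
   "metadata": {},
   "source": [
    "# Smoke-test the generator on one topic (requires OPENAI_API_KEY)\n",
    "sample_tweet = basic_tweet_generation(\n",
    "    \"LLM testing\",\n",
    "    instructions=\"You are a tweet generator\"\n",
    ")\n",
    "print(sample_tweet)"
   ],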
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "4d4cf710",
   "metadata": {},
   "source": [
    "## Define a Tweet Quality Judge\n",
    "The judge is an LLM-based binary classifier that labels each tweet as ENGAGING or NEUTRAL; its verdicts are the feedback signal for the optimizer."
   ]
  },
  {
   "cell_type": "code",
   "id": "2d7f0b76",
   "metadata": {},
   "source": [
    "tweet_quality = BinaryClassificationPromptTemplate(\n",
    "    pre_messages=[(\"system\", \"You are evaluating the quality of tweets\")],\n",
    "    criteria=\"\"\"\n",
    "Text is ENGAGING if it meets at least one of the following:\n",
    "  • Strong hook (question, surprise, bold statement)\n",
    "  • Uses emotion, humor, or opinion\n",
    "  • Encourages interaction\n",
    "  • Shows personality or a distinct tone\n",
    "  • Includes vivid language or emojis\n",
    "  • Sparks curiosity or insight\n",
    "\n",
    "Text is NEUTRAL if it lacks these qualities.\n",
    "\"\"\",\n",
    "    target_category=\"ENGAGING\",\n",
    "    non_target_category=\"NEUTRAL\",\n",
    "    uncertainty=\"non_target\",\n",
    "    include_reasoning=True,\n",
    ")\n",
    "\n",
    "# Score the generated tweets in the \"basic_tweet_generation.result\" column\n",
    "judge = LLMEval(\"basic_tweet_generation.result\", template=tweet_quality,\n",
    "                provider=\"openai\", model=\"gpt-4o-mini\", alias=\"Tweet quality\")"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "a9b49b50",
   "metadata": {},
   "source": [
    "## Define a Prompt Execution Function"
   ]
  },
  {
   "cell_type": "code",
   "id": "78245abf",
   "metadata": {},
   "source": [
    "def run_prompt(generation_prompt: str, context) -> pd.Series:\n",
    "    \"\"\"Generate tweets for a fixed set of topics using the candidate prompt.\"\"\"\n",
    "    my_topics = [\n",
    "        \"testing in AI engineering is as important as in development\",\n",
    "        \"CI/CD is applicable in AI\",\n",
    "        \"Collaboration of subject matter experts and AI engineers improves product\",\n",
    "        \"Start LLM apps development from test cases generation\",\n",
    "        \"evidently is a great tool for LLM testing\"\n",
    "    ]\n",
    "    # Repeat the topic list 3 times to smooth out generation randomness\n",
    "    tweets = [basic_tweet_generation(topic, model=\"gpt-3.5-turbo\", instructions=generation_prompt)\n",
    "              for topic in my_topics * 3]\n",
    "    return pd.Series(tweets)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "53c2281d",
   "metadata": {},
   "source": [
    "## Run the Prompt Optimizer\n",
    "With the \"feedback\" strategy, the optimizer repeatedly generates tweets with the current prompt, scores them with the judge, and revises the prompt based on the feedback."
   ]
  },
  {
   "cell_type": "code",
   "id": "6cbb4971",
   "metadata": {},
   "source": [
    "optimizer = PromptOptimizer(\"tweet_gen_example\", strategy=\"feedback\", verbose=True)\n",
    "await optimizer.arun(run_prompt, scorer=judge, base_prompt=\"You are a tweet generator\", repetitions=5)\n",
    "# sync version:\n",
    "# optimizer.run(run_prompt, scorer=judge, base_prompt=\"You are a tweet generator\", repetitions=5)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "markdown",
   "id": "ced639e2",
   "metadata": {},
   "source": [
    "## View the Best Optimized Prompt"
   ]
  },
  {
   "cell_type": "code",
   "id": "2cff870f",
   "metadata": {},
   "source": [
    "print(optimizer.best_prompt())"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "id": "b865c5caa2f1cb4c",
   "metadata": {},
   "source": [
    "# Inspect statistics from the optimization run\n",
    "optimizer.print_stats()"
   ],
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}