examples/cookbook/prompt_optimization_tweet_generation_example.ipynb
  1  {
  2   "cells": [
  3    {
  4     "cell_type": "markdown",
  5     "id": "2a6e3449",
  6     "metadata": {},
  7     "source": [
  8      "# Prompt Optimization with Evidently: Tweet Generation Example\n",
  9      "This tutorial shows how to optimize prompts for generating engaging tweets using Evidently's `PromptOptimizer` API. \n",
 10      "We'll iteratively improve a tweet generation prompt to maximize how engaging LLM-generated tweets are, according to a classifier.\n",
 11      "\n",
 12      "## What you'll learn:\n",
 13      "- How to define a tweet generation function with OpenAI\n",
 14      "- How to set up an LLM judge to classify tweet engagement\n",
 15      "- How to optimize a tweet generation prompt based on feedback\n",
 16      "- How to inspect the best optimized prompt"
 17     ]
 18    },
 19    {
 20     "cell_type": "code",
 21     "id": "3c9fdb12",
 22     "metadata": {},
 23     "source": [
 24      "# Install packages if needed\n",
 25      "# !pip install evidently openai pandas"
 26     ],
 27     "outputs": [],
 28     "execution_count": null
 29    },
 30    {
 31     "cell_type": "code",
 32     "id": "3fcd60c7",
 33     "metadata": {},
 34     "source": [
 35      "import pandas as pd\n",
 36      "import openai\n",
 37      "\n",
 38      "from evidently.descriptors import LLMEval\n",
 39      "from evidently.llm.templates import BinaryClassificationPromptTemplate\n",
  40      "from evidently.llm.optimization import PromptOptimizer"
 41     ],
 42     "outputs": [],
 43     "execution_count": null
 44    },
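   {
    "cell_type": "markdown",
    "id": "a1f3e7b2",
    "metadata": {},
    "source": [
     "Both the tweet generator and the judge call the OpenAI API, so an `OPENAI_API_KEY` must be available in the environment before running the cells below. One way to check (a convenience snippet, not part of the original flow):"
    ]
   },
   {
    "cell_type": "code",
    "id": "a1f3e7b3",
    "metadata": {},
    "source": [
     "import os\n",
     "\n",
     "# os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"  # or export it in your shell\n",
     "assert os.environ.get(\"OPENAI_API_KEY\"), \"Set OPENAI_API_KEY before running this notebook\""
    ],
    "outputs": [],
    "execution_count": null
   },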
 45    {
 46     "cell_type": "markdown",
 47     "id": "e62d754e",
 48     "metadata": {},
 49     "source": [
 50      "## Define a Tweet Generation Function"
 51     ]
 52    },
 53    {
 54     "cell_type": "code",
 55     "id": "6103a03b",
 56     "metadata": {},
 57     "source": [
 58      "def basic_tweet_generation(topic, model=\"gpt-3.5-turbo\", instructions=\"\"):\n",
 59      "    response = openai.chat.completions.create(\n",
 60      "        model=model,\n",
 61      "        messages=[\n",
 62      "            {\"role\": \"system\", \"content\": instructions},\n",
  63      "            {\"role\": \"user\", \"content\": f\"Write a tweet about {topic}\"}\n",
 64      "        ]\n",
 65      "    )\n",
 66      "    return response.choices[0].message.content"
 67     ],
 68     "outputs": [],
 69     "execution_count": null
 70    },
 71    {
 72     "cell_type": "markdown",
 73     "id": "4d4cf710",
 74     "metadata": {},
 75     "source": [
 76      "## Define a Tweet Quality Judge"
 77     ]
 78    },
 79    {
 80     "cell_type": "code",
 81     "id": "2d7f0b76",
 82     "metadata": {},
 83     "source": [
 84      "tweet_quality = BinaryClassificationPromptTemplate(\n",
 85      "    pre_messages=[(\"system\", \"You are evaluating the quality of tweets\")],\n",
 86      "    criteria=\"\"\"\n",
 87      "Text is ENGAGING if it meets at least one of the following:\n",
 88      "  • Strong hook (question, surprise, bold statement)\n",
 89      "  • Uses emotion, humor, or opinion\n",
 90      "  • Encourages interaction\n",
 91      "  • Shows personality or distinct tone\n",
 92      "  • Includes vivid language or emojis\n",
 93      "  • Sparks curiosity or insight\n",
 94      "\n",
 95      "Text is NEUTRAL if it lacks these qualities.\n",
 96      "\"\"\",\n",
 97      "    target_category=\"ENGAGING\",\n",
 98      "    non_target_category=\"NEUTRAL\",\n",
 99      "    uncertainty=\"non_target\",\n",
100      "    include_reasoning=True,\n",
101      ")\n",
102      "\n",
103      "judge = LLMEval(\"basic_tweet_generation.result\", template=tweet_quality,\n",
104      "                provider=\"openai\", model=\"gpt-4o-mini\", alias=\"Tweet quality\")\n"
105     ],
106     "outputs": [],
107     "execution_count": null
108    },
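   {
    "cell_type": "markdown",
    "id": "b7c2d4e1",
    "metadata": {},
    "source": [
     "Optionally, spot-check the judge on hand-written examples before optimizing. The sketch below assumes the `Dataset.from_pandas(..., descriptors=[...])` API from recent Evidently releases; adjust to your installed version:"
    ]
   },
   {
    "cell_type": "code",
    "id": "b7c2d4e2",
    "metadata": {},
    "source": [
     "from evidently import Dataset, DataDefinition\n",
     "\n",
     "sample = pd.DataFrame({\"basic_tweet_generation.result\": [\n",
     "    \"Shipping an LLM app without tests? Bold strategy. 🔥 What's in your eval stack?\",\n",
     "    \"This text describes software testing practices.\"\n",
     "]})\n",
     "checked = Dataset.from_pandas(\n",
     "    sample,\n",
     "    data_definition=DataDefinition(text_columns=[\"basic_tweet_generation.result\"]),\n",
     "    descriptors=[judge],\n",
     ")\n",
     "checked.as_dataframe()"
    ],
    "outputs": [],
    "execution_count": null
   },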
109    {
110     "cell_type": "markdown",
111     "id": "a9b49b50",
112     "metadata": {},
113     "source": [
114      "## Define a Prompt Execution Function"
115     ]
116    },
117    {
118     "cell_type": "code",
119     "id": "78245abf",
120     "metadata": {},
121     "source": [
122      "def run_prompt(generation_prompt: str, context) -> pd.Series:\n",
  123      "    \"\"\"Generate one tweet per topic with the candidate prompt.\"\"\"\n",
124      "    my_topics = [\n",
125      "        \"testing in AI engineering is as important as in development\",\n",
126      "        \"CI/CD is applicable in AI\",\n",
127      "        \"Collaboration of subject matter experts and AI engineers improves product\",\n",
128      "        \"Start LLM apps development from test cases generation\",\n",
129      "        \"evidently is a great tool for LLM testing\"\n",
130      "    ]\n",
  131      "    tweets = [basic_tweet_generation(topic, model=\"gpt-3.5-turbo\", instructions=generation_prompt) for topic in my_topics * 3]  # 3 generations per topic for more stable scores\n",
132      "    return pd.Series(tweets)"
133     ],
134     "outputs": [],
135     "execution_count": null
136    },
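   {
    "cell_type": "markdown",
    "id": "c9e5f1a7",
    "metadata": {},
    "source": [
     "Each call to `run_prompt` issues 15 OpenAI requests (5 topics × 3 repetitions). To verify the output shape without spending tokens, you can temporarily stub the generator (a quick sketch, not part of the optimization flow):"
    ]
   },
   {
    "cell_type": "code",
    "id": "c9e5f1a8",
    "metadata": {},
    "source": [
     "# Swap in a stub, run once, then restore the real generator\n",
     "_real_generator = basic_tweet_generation\n",
     "basic_tweet_generation = lambda topic, model=None, instructions=\"\": f\"[stub] {topic}\"\n",
     "try:\n",
     "    preview = run_prompt(\"You are a tweet generator\", context=None)\n",
     "finally:\n",
     "    basic_tweet_generation = _real_generator\n",
     "len(preview)  # 15"
    ],
    "outputs": [],
    "execution_count": null
   },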
137    {
138     "cell_type": "markdown",
139     "id": "53c2281d",
140     "metadata": {},
141     "source": [
142      "## Run the Prompt Optimizer"
143     ]
144    },
145    {
146     "cell_type": "code",
147     "id": "6cbb4971",
148     "metadata": {},
149     "source": [
150      "optimizer = PromptOptimizer(\"tweet_gen_example\", strategy=\"feedback\", verbose=True)\n",
  151      "await optimizer.arun(run_prompt, scorer=judge, base_prompt=\"You are a tweet generator\", repetitions=5)\n",
  152      "# sync version (for plain Python scripts):\n",
  153      "# optimizer.run(run_prompt, scorer=judge, base_prompt=\"You are a tweet generator\", repetitions=5)"
154     ],
155     "outputs": [],
156     "execution_count": null
157    },
158    {
159     "cell_type": "markdown",
160     "id": "ced639e2",
161     "metadata": {},
162     "source": [
163      "## View the Best Optimized Prompt"
164     ]
165    },
166    {
167     "cell_type": "code",
168     "id": "2cff870f",
169     "metadata": {},
170     "source": [
171      "print(optimizer.best_prompt())"
172     ],
173     "outputs": [],
174     "execution_count": null
175    },
176    {
177     "metadata": {},
178     "cell_type": "code",
179     "source": "optimizer.print_stats()",
180     "id": "b865c5caa2f1cb4c",
181     "outputs": [],
182     "execution_count": null
183    }
184   ],
185   "metadata": {
186    "kernelspec": {
187     "display_name": "Python 3 (ipykernel)",
188     "language": "python",
189     "name": "python3"
190    },
191    "language_info": {
192     "codemirror_mode": {
193      "name": "ipython",
194      "version": 3
195     },
196     "file_extension": ".py",
197     "mimetype": "text/x-python",
198     "name": "python",
199     "nbconvert_exporter": "python",
200     "pygments_lexer": "ipython3",
201     "version": "3.11.11"
202    }
203   },
204   "nbformat": 4,
205   "nbformat_minor": 5
206  }