---
title: "Audio"
id: audio-api
description: "Transcribes audio files."
slug: "/audio-api"
---

<a id="whisper_local"></a>

## Module whisper\_local

<a id="whisper_local.LocalWhisperTranscriber"></a>

### LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

### Usage example

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

<a id="whisper_local.LocalWhisperTranscriber.__init__"></a>

#### LocalWhisperTranscriber.\_\_init\_\_

```python
def __init__(model: WhisperLocalModel = "large",
             device: ComponentDevice | None = None,
             whisper_params: dict[str, Any] | None = None)
```

Creates an instance of the LocalWhisperTranscriber component.

**Arguments**:

- `model`: The name of the model to use. Set to one of the following models:
"tiny", "base", "small", "medium", "large" (default).
For details on the models and their modifications, see the
[Whisper documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).
- `device`: The device for loading the model. If `None`, automatically selects the default device.

<a id="whisper_local.LocalWhisperTranscriber.warm_up"></a>

#### LocalWhisperTranscriber.warm\_up

```python
def warm_up() -> None
```

Loads the model into memory.

<a id="whisper_local.LocalWhisperTranscriber.to_dict"></a>

#### LocalWhisperTranscriber.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="whisper_local.LocalWhisperTranscriber.from_dict"></a>

#### LocalWhisperTranscriber.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "LocalWhisperTranscriber"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="whisper_local.LocalWhisperTranscriber.run"></a>

#### LocalWhisperTranscriber.run

```python
@component.output_types(documents=list[Document])
def run(sources: list[str | Path | ByteStream],
        whisper_params: dict[str, Any] | None = None)
```

Transcribes a list of audio files into a list of documents.

**Arguments**:

- `sources`: A list of paths or binary streams to transcribe.
- `whisper_params`: For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents where each document is a transcribed audio file.
The content of the document is the transcription text, and the document's metadata contains the values returned by
the Whisper model, such as the alignment data and the path to the audio file used for the transcription.
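As an illustration of the return value described above, here is a minimal sketch of reading the transcription and its metadata (the audio path is a placeholder, and the exact metadata keys depend on the Whisper model output):

```python
from haystack.components.audio import LocalWhisperTranscriber

# "audio/answer.wav" is a placeholder path to a local audio file.
whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()  # load the model before the first run() call
result = whisper.run(sources=["audio/answer.wav"])

for doc in result["documents"]:
    print(doc.content)  # the transcription text
    print(doc.meta)     # values returned by Whisper, such as alignment data and the audio file path
```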
<a id="whisper_local.LocalWhisperTranscriber.transcribe"></a>

#### LocalWhisperTranscriber.transcribe

```python
def transcribe(sources: list[str | Path | ByteStream],
               **kwargs) -> list[Document]
```

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Arguments**:

- `sources`: A list of paths or binary streams to transcribe.

**Returns**:

A list of Documents, one for each file.

<a id="whisper_remote"></a>

## Module whisper\_remote

<a id="whisper_remote.RemoteWhisperTranscriber"></a>

### RemoteWhisperTranscriber

Transcribes audio files using OpenAI's Whisper API.

The component requires an OpenAI API key. See the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/authentication) for more details.
For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text).

### Usage example

```python
from haystack.components.audio import RemoteWhisperTranscriber

whisper = RemoteWhisperTranscriber(model="whisper-1")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

<a id="whisper_remote.RemoteWhisperTranscriber.__init__"></a>

#### RemoteWhisperTranscriber.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "whisper-1",
             api_base_url: str | None = None,
             organization: str | None = None,
             http_client_kwargs: dict[str, Any] | None = None,
             **kwargs)
```

Creates an instance of the RemoteWhisperTranscriber component.

**Arguments**:

- `api_key`: OpenAI API key.
You can set it with the environment variable `OPENAI_API_KEY`, or pass it with this parameter
during initialization.
- `model`: Name of the model to use. Currently accepts only `whisper-1`.
- `organization`: Your OpenAI organization ID. See OpenAI's documentation on
[Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `api_base_url`: An optional URL to use as the API base. For details, see the
OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio).
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- `kwargs`: Other optional parameters for the model. These are sent directly to the OpenAI
endpoint. See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio) for more details.
Some of the supported parameters are (a usage sketch follows this list):
  - `language`: The language of the input audio.
  Provide the input language in ISO-639-1 format
  to improve transcription accuracy and latency.
  - `prompt`: An optional text to guide the model's
  style or continue a previous audio segment.
  The prompt should match the audio language.
  - `response_format`: The format of the transcript
  output. This component only supports `json`.
  - `temperature`: The sampling temperature, between 0
  and 1. Higher values like 0.8 make the output more
  random, while lower values like 0.2 make it more
  focused and deterministic. If set to 0, the model
  uses log probability to automatically increase the
  temperature until certain thresholds are hit.
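For illustration, a minimal sketch of passing such optional parameters, assuming the `OPENAI_API_KEY` environment variable is set and using a placeholder file name `meeting.mp3`:

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Extra keyword arguments given at initialization are forwarded to the OpenAI endpoint.
whisper = RemoteWhisperTranscriber(
    model="whisper-1",
    language="en",                       # ISO-639-1 code of the input audio
    prompt="Haystack, Whisper, OpenAI",  # optional vocabulary/style hint for the model
)
result = whisper.run(sources=["meeting.mp3"])  # "meeting.mp3" is a placeholder path
print(result["documents"][0].content)          # the transcribed text
```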
<a id="whisper_remote.RemoteWhisperTranscriber.to_dict"></a>

#### RemoteWhisperTranscriber.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="whisper_remote.RemoteWhisperTranscriber.from_dict"></a>

#### RemoteWhisperTranscriber.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "RemoteWhisperTranscriber"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="whisper_remote.RemoteWhisperTranscriber.run"></a>

#### RemoteWhisperTranscriber.run

```python
@component.output_types(documents=list[Document])
def run(sources: list[str | Path | ByteStream])
```

Transcribes the list of audio files into a list of documents.

**Arguments**:

- `sources`: A list of file paths or `ByteStream` objects containing the audio files to transcribe.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents, one document for each file.
The content of each document is the transcribed text.
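To illustrate the serialization methods above, a minimal round-trip sketch; it assumes the API key is supplied through the `OPENAI_API_KEY` environment variable, so the dictionary stores a reference to that variable rather than the key itself:

```python
from haystack.components.audio import RemoteWhisperTranscriber

whisper = RemoteWhisperTranscriber(model="whisper-1", language="en")

# Round-trip the component through its dictionary representation.
data = whisper.to_dict()
restored = RemoteWhisperTranscriber.from_dict(data)
```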