audio_api.md
---
title: "Audio"
id: audio-api
description: "Transcribes audio files."
slug: "/audio-api"
---

## whisper_local

### LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

### Usage example

<!-- test-ignore -->

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    model: WhisperLocalModel = "large",
    device: ComponentDevice | None = None,
    whisper_params: dict[str, Any] | None = None,
) -> None
```

Creates an instance of the LocalWhisperTranscriber component.

**Parameters:**

- **model** (<code>WhisperLocalModel</code>) – The name of the model to use. Set to one of the following models:
  "tiny", "base", "small", "medium", "large" (default).
  For details on the models and their modifications, see the
  [Whisper documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – Additional parameters for the Whisper model. For the supported
  audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
  [GitHub repository](https://github.com/openai/whisper).

#### warm_up

```python
warm_up() -> None
```

Loads the model into memory.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LocalWhisperTranscriber
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>LocalWhisperTranscriber</code> – The deserialized component.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    whisper_params: dict[str, Any] | None = None,
) -> dict[str, Any]
```

Transcribes a list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – Parameters to pass to the Whisper model for this call, as shown in
  the sketch after this section. For the supported audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
  [GitHub repository](https://github.com/openai/whisper).

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents where each document is a transcribed audio file. The content of
    the document is the transcription text, and the document's metadata contains the values returned by
    the Whisper model, such as the alignment data and the path to the audio file used
    for the transcription.
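As a hedged illustration of the `run` parameters above, the sketch below warms up the component and forwards Whisper's `language` option through `whisper_params`; the file path is hypothetical:

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()  # load the Whisper model into memory before the first call

# "audio/meeting.wav" is a hypothetical path; replace it with your own file.
result = whisper.run(
    sources=["audio/meeting.wav"],
    whisper_params={"language": "en"},  # forwarded to the Whisper model as-is
)
for doc in result["documents"]:
    print(doc.content)  # the transcription text
    print(doc.meta)     # Whisper's return values, e.g. alignment data and file path
```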
#### transcribe

```python
transcribe(
    sources: list[str | Path | ByteStream], **kwargs: Any
) -> list[Document]
```

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.

**Returns:**

- <code>list\[Document\]</code> – A list of Documents, one for each file.

## whisper_remote

### RemoteWhisperTranscriber

Transcribes audio files using OpenAI's Whisper API.

The component requires an OpenAI API key; see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/authentication) for more details.
For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text).

### Usage example

```python
from haystack.components.audio import RemoteWhisperTranscriber

whisper = RemoteWhisperTranscriber(model="whisper-1")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "whisper-1",
    api_base_url: str | None = None,
    organization: str | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    **kwargs: Any
) -> None
```

Creates an instance of the RemoteWhisperTranscriber component.

**Parameters:**

- **api_key** (<code>Secret</code>) – OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – Name of the model to use. Currently accepts only `whisper-1`.
- **api_base_url** (<code>str | None</code>) – An optional URL to use as the API base. For details, see the
  OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio).
- **organization** (<code>str | None</code>) – Your OpenAI organization ID. See OpenAI's documentation on
  [Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **kwargs** (<code>Any</code>) – Other optional parameters for the model. These are sent directly to the OpenAI
  endpoint (see the sketch after this list). See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio) for more details.
  Some of the supported parameters are:
  - `language`: The language of the input audio.
    Provide the input language in ISO-639-1 format
    to improve transcription accuracy and latency.
  - `prompt`: An optional text to guide the model's
    style or continue a previous audio segment.
    The prompt should match the audio language.
  - `response_format`: The format of the transcript
    output. This component only supports `json`.
  - `temperature`: The sampling temperature, between 0
    and 1. Higher values like 0.8 make the output more
    random, while lower values like 0.2 make it more
    focused and deterministic. If set to 0, the model
    uses log probability to automatically increase the
    temperature until certain thresholds are hit.
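As a hedged sketch of the `**kwargs` pass-through described above, the snippet below sets `language` and `prompt` at initialization; both values are illustrative, not defaults:

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Assumes OPENAI_API_KEY is set in the environment.
# `language` and `prompt` are illustrative values forwarded to the OpenAI endpoint.
whisper = RemoteWhisperTranscriber(
    model="whisper-1",
    language="en",                       # ISO-639-1 code of the input audio
    prompt="Haystack, Whisper, OpenAI",  # hints for domain-specific terms
)
result = whisper.run(sources=["test/test_files/audio/answer.wav"])
print(result["documents"][0].content)  # the transcribed text
```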
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> RemoteWhisperTranscriber
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>RemoteWhisperTranscriber</code> – The deserialized component.

#### run

```python
run(sources: list[str | Path | ByteStream]) -> dict[str, Any]
```

Transcribes the list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of file paths or `ByteStream` objects containing the audio files to transcribe.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents, one document for each file.
    The content of each document is the transcribed text.
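Because `run` also accepts `ByteStream` inputs, here is a minimal sketch for audio that is already in memory; `ByteStream.from_file_path` is used only to build the example input, and the path is the one from the usage example above:

```python
from pathlib import Path

from haystack.components.audio import RemoteWhisperTranscriber
from haystack.dataclasses import ByteStream

# Build an in-memory ByteStream from a file; the path is illustrative.
stream = ByteStream.from_file_path(Path("test/test_files/audio/answer.wav"))

whisper = RemoteWhisperTranscriber(model="whisper-1")  # assumes OPENAI_API_KEY is set
result = whisper.run(sources=[stream])
print(result["documents"][0].content)
```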