---
title: "Audio"
id: audio-api
description: "Transcribes audio files."
slug: "/audio-api"
---

## whisper_local

### LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

### Usage example

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    model: WhisperLocalModel = "large",
    device: ComponentDevice | None = None,
    whisper_params: dict[str, Any] | None = None,
)
```

Creates an instance of the LocalWhisperTranscriber component.

**Parameters:**

- **model** (<code>WhisperLocalModel</code>) – The name of the model to use. Set to one of the following models:
  "tiny", "base", "small", "medium", "large" (default).
  For details on the models and their modifications, see the
  [Whisper documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – Additional parameters for the Whisper model. For the supported
  audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text).

#### warm_up

```python
warm_up() -> None
```

Loads the model into memory.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LocalWhisperTranscriber
```

Deserializes the component from a dictionary.
**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>LocalWhisperTranscriber</code> – The deserialized component.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    whisper_params: dict[str, Any] | None = None,
)
```

Transcribes a list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – Additional parameters for the Whisper model.
  For the supported audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
  [GitHub repository](https://github.com/openai/whisper).

**Returns:**

- A dictionary with the following keys:
  - `documents`: A list of documents where each document is a transcribed audio file. The content of
    the document is the transcription text, and the document's metadata contains the values returned by
    the Whisper model, such as the alignment data and the path to the audio file used
    for the transcription.

#### transcribe

```python
transcribe(
    sources: list[str | Path | ByteStream],
    **kwargs: Any
) -> list[Document]
```

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.
**Returns:**

- <code>list\[Document\]</code> – A list of Documents, one for each file.

## whisper_remote

### RemoteWhisperTranscriber

Transcribes audio files using OpenAI's Whisper API.

The component requires an OpenAI API key; see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/authentication) for more details.
For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text).

### Usage example

```python
from haystack.components.audio import RemoteWhisperTranscriber

whisper = RemoteWhisperTranscriber(model="whisper-1")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "whisper-1",
    api_base_url: str | None = None,
    organization: str | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    **kwargs: Any
)
```

Creates an instance of the RemoteWhisperTranscriber component.

**Parameters:**

- **api_key** (<code>Secret</code>) – OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – Name of the model to use. Currently accepts only `whisper-1`.
- **api_base_url** (<code>str | None</code>) – An optional URL to use as the API base. For details, see the
  OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio).
- **organization** (<code>str | None</code>) – Your OpenAI organization ID. See OpenAI's documentation on
  [Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **kwargs** – Other optional parameters for the model. These are sent directly to the OpenAI
  endpoint. See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio) for more details.
  Some of the supported parameters are:
  - `language`: The language of the input audio.
    Provide the input language in ISO-639-1 format
    to improve transcription accuracy and latency.
  - `prompt`: An optional text to guide the model's
    style or continue a previous audio segment.
    The prompt should match the audio language.
  - `response_format`: The format of the transcript
    output. This component only supports `json`.
  - `temperature`: The sampling temperature, between 0
    and 1. Higher values like 0.8 make the output more
    random, while lower values like 0.2 make it more
    focused and deterministic. If set to 0, the model
    uses log probability to automatically increase the
    temperature until certain thresholds are hit.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> RemoteWhisperTranscriber
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>RemoteWhisperTranscriber</code> – The deserialized component.
#### run

```python
run(sources: list[str | Path | ByteStream])
```

Transcribes the list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of file paths or `ByteStream` objects containing the audio files to transcribe.

**Returns:**

- A dictionary with the following keys:
  - `documents`: A list of documents, one document for each file.
    The content of each document is the transcribed text.