audio_api.md
---
title: "Audio"
id: audio-api
description: "Transcribes audio files."
slug: "/audio-api"
---

## whisper_local

### LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

### Usage example

<!-- test-ignore -->

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    model: WhisperLocalModel = "large",
    device: ComponentDevice | None = None,
    whisper_params: dict[str, Any] | None = None,
) -> None
```

Creates an instance of the LocalWhisperTranscriber component.

**Parameters:**

- **model** (<code>WhisperLocalModel</code>) – The name of the model to use. Set to one of the following models:
  "tiny", "base", "small", "medium", "large" (default).
  For details on the models and their modifications, see the
  [Whisper documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – Additional parameters for the Whisper model. For the supported
  audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
  [GitHub repository](https://github.com/openai/whisper).

#### warm_up

```python
warm_up() -> None
```

Loads the model into memory.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LocalWhisperTranscriber
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>LocalWhisperTranscriber</code> – The deserialized component.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    whisper_params: dict[str, Any] | None = None,
) -> dict[str, Any]
```

Transcribes a list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – Parameters to pass to the Whisper model for this call, as shown in
  the sketch after this section. For the supported audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
  [GitHub repository](https://github.com/openai/whisper).

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents where each document is a transcribed audio file. The content of
    the document is the transcription text, and the document's metadata contains the values returned by
    the Whisper model, such as the alignment data and the path to the audio file used
    for the transcription.
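As a hedged illustration of the `run` parameters above, the sketch below warms up the component and forwards Whisper's `language` option through `whisper_params`; the file path is hypothetical:

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()  # load the Whisper model into memory before the first call

# "audio/meeting.wav" is a hypothetical path; replace it with your own file.
result = whisper.run(
    sources=["audio/meeting.wav"],
    whisper_params={"language": "en"},  # forwarded to the Whisper model as-is
)
for doc in result["documents"]:
    print(doc.content)  # the transcription text
    print(doc.meta)     # Whisper's return values, e.g. alignment data and file path
```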
#### transcribe

```python
transcribe(
    sources: list[str | Path | ByteStream], **kwargs: Any
) -> list[Document]
```

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.

**Returns:**

- <code>list\[Document\]</code> – A list of Documents, one for each file.

## whisper_remote

### RemoteWhisperTranscriber

Transcribes audio files using OpenAI's Whisper API.

The component requires an OpenAI API key; see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/authentication) for more details.
For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text).

### Usage example

```python
from haystack.components.audio import RemoteWhisperTranscriber

whisper = RemoteWhisperTranscriber(model="whisper-1")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "whisper-1",
    api_base_url: str | None = None,
    organization: str | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    **kwargs: Any
) -> None
```

Creates an instance of the RemoteWhisperTranscriber component.

**Parameters:**

- **api_key** (<code>Secret</code>) – OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – Name of the model to use. Currently accepts only `whisper-1`.
- **api_base_url** (<code>str | None</code>) – An optional URL to use as the API base. For details, see the
  OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio).
- **organization** (<code>str | None</code>) – Your OpenAI organization ID. See OpenAI's documentation on
  [Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **kwargs** (<code>Any</code>) – Other optional parameters for the model. These are sent directly to the OpenAI
  endpoint (see the sketch after this list). See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio) for more details.
  Some of the supported parameters are:
  - `language`: The language of the input audio.
    Provide the input language in ISO-639-1 format
    to improve transcription accuracy and latency.
  - `prompt`: An optional text to guide the model's
    style or continue a previous audio segment.
    The prompt should match the audio language.
  - `response_format`: The format of the transcript
    output. This component only supports `json`.
  - `temperature`: The sampling temperature, between 0
    and 1. Higher values like 0.8 make the output more
    random, while lower values like 0.2 make it more
    focused and deterministic. If set to 0, the model
    uses log probability to automatically increase the
    temperature until certain thresholds are hit.
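As a hedged sketch of the `**kwargs` pass-through described above, the snippet below sets `language` and `prompt` at initialization; both values are illustrative, not defaults:

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Assumes OPENAI_API_KEY is set in the environment.
# `language` and `prompt` are illustrative values forwarded to the OpenAI endpoint.
whisper = RemoteWhisperTranscriber(
    model="whisper-1",
    language="en",                       # ISO-639-1 code of the input audio
    prompt="Haystack, Whisper, OpenAI",  # hints for domain-specific terms
)
result = whisper.run(sources=["test/test_files/audio/answer.wav"])
print(result["documents"][0].content)  # the transcribed text
```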
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> RemoteWhisperTranscriber
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>RemoteWhisperTranscriber</code> – The deserialized component.

#### run

```python
run(sources: list[str | Path | ByteStream]) -> dict[str, Any]
```

Transcribes the list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of file paths or `ByteStream` objects containing the audio files to transcribe.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents, one document for each file.
    The content of each document is the transcribed text.
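Because `run` also accepts `ByteStream` inputs, here is a minimal sketch for audio that is already in memory; `ByteStream.from_file_path` is used only to build the example input, and the path is the one from the usage example above:

```python
from pathlib import Path

from haystack.components.audio import RemoteWhisperTranscriber
from haystack.dataclasses import ByteStream

# Build an in-memory ByteStream from a file; the path is illustrative.
stream = ByteStream.from_file_path(Path("test/test_files/audio/answer.wav"))

whisper = RemoteWhisperTranscriber(model="whisper-1")  # assumes OPENAI_API_KEY is set
result = whisper.run(sources=[stream])
print(result["documents"][0].content)
```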