audio_api.md
---
title: "Audio"
id: audio-api
description: "Transcribes audio files."
slug: "/audio-api"
---

## whisper_local

### LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

### Usage example

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    model: WhisperLocalModel = "large",
    device: ComponentDevice | None = None,
    whisper_params: dict[str, Any] | None = None,
) -> None
```

Creates an instance of the LocalWhisperTranscriber component.

**Parameters:**

- **model** (<code>WhisperLocalModel</code>) – The name of the model to use. Set to one of the following models:
  "tiny", "base", "small", "medium", "large" (default).
  For details on the models and their modifications, see the
  [Whisper documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).
- **device** (<code>ComponentDevice | None</code>) – The device for loading the model. If `None`, automatically selects the default device.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – Additional parameters to pass to the Whisper model.
  For the supported audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
  [GitHub repository](https://github.com/openai/whisper).

#### warm_up

```python
warm_up() -> None
```

Loads the model in memory.

#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> LocalWhisperTranscriber
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>LocalWhisperTranscriber</code> – The deserialized component.

#### run

```python
run(
    sources: list[str | Path | ByteStream],
    whisper_params: dict[str, Any] | None = None,
) -> dict[str, Any]
```

Transcribes a list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.
- **whisper_params** (<code>dict\[str, Any\] | None</code>) – For the supported audio formats, languages, and other parameters, see the
  [Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
  [GitHub repository](https://github.com/openai/whisper).

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents where each document is a transcribed audio file. The content of
    the document is the transcription text, and the document's metadata contains the values returned by
    the Whisper model, such as the alignment data and the path to the audio file used
    for the transcription.

#### transcribe

```python
transcribe(
    sources: list[str | Path | ByteStream], **kwargs: Any
) -> list[Document]
```

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of paths or binary streams to transcribe.

**Returns:**

- <code>list\[Document\]</code> – A list of Documents, one for each file.
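Putting the methods above together, here is a minimal sketch of a local transcription flow. It is illustrative only: it assumes the `openai-whisper` dependency is installed and that the audio path from the usage example exists, and the `language` entry in `whisper_params` is just one example of a parameter forwarded to the local Whisper model.

```python
from haystack.components.audio import LocalWhisperTranscriber

# Minimal sketch: load a local Whisper model and transcribe one file.
whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()  # loads the model into memory before the first run

result = whisper.run(
    sources=["test/test_files/audio/answer.wav"],  # path taken from the usage example above
    whisper_params={"language": "en"},             # assumed example of a Whisper parameter
)

for doc in result["documents"]:
    print(doc.content)  # the transcription text
    print(doc.meta)     # Whisper output such as alignment data and the source file path
```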
## whisper_remote

### RemoteWhisperTranscriber

Transcribes audio files using OpenAI's Whisper API.

The component requires an OpenAI API key. See the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/authentication) for more details.
For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text).

### Usage example

```python
from haystack.components.audio import RemoteWhisperTranscriber

whisper = RemoteWhisperTranscriber(model="whisper-1")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

#### __init__

```python
__init__(
    api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
    model: str = "whisper-1",
    api_base_url: str | None = None,
    organization: str | None = None,
    http_client_kwargs: dict[str, Any] | None = None,
    **kwargs: Any
) -> None
```

Creates an instance of the RemoteWhisperTranscriber component.

**Parameters:**

- **api_key** (<code>Secret</code>) – OpenAI API key.
  You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
  during initialization.
- **model** (<code>str</code>) – Name of the model to use. Currently accepts only `whisper-1`.
- **organization** (<code>str | None</code>) – Your OpenAI organization ID. See OpenAI's documentation on
  [Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- **api_base_url** (<code>str | None</code>) – An optional URL to use as the API base. For details, see the
  OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio).
- **http_client_kwargs** (<code>dict\[str, Any\] | None</code>) – A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
  For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- **kwargs** (<code>Any</code>) – Other optional parameters for the model. These are sent directly to the OpenAI
  endpoint. See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio) for more details.
  Some of the supported parameters, illustrated in the sketch after this list, are:
  - `language`: The language of the input audio.
    Provide the input language in ISO-639-1 format
    to improve transcription accuracy and latency.
  - `prompt`: An optional text to guide the model's
    style or continue a previous audio segment.
    The prompt should match the audio language.
  - `response_format`: The format of the transcript
    output. This component only supports `json`.
  - `temperature`: The sampling temperature, between 0
    and 1. Higher values like 0.8 make the output more
    random, while lower values like 0.2 make it more
    focused and deterministic. If set to 0, the model
    uses log probability to automatically increase the
    temperature until certain thresholds are hit.
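A hedged sketch of passing these keyword arguments at initialization is shown below. It assumes the `OPENAI_API_KEY` environment variable is set; the specific `language`, `prompt`, and `temperature` values are placeholders, and the audio path is taken from the usage example above.

```python
from haystack.components.audio import RemoteWhisperTranscriber
from haystack.utils import Secret

# Sketch: forward optional Whisper parameters to the OpenAI endpoint.
# Assumes OPENAI_API_KEY is set; the parameter values are placeholders.
whisper = RemoteWhisperTranscriber(
    api_key=Secret.from_env_var("OPENAI_API_KEY"),
    model="whisper-1",
    language="en",           # ISO-639-1 language hint
    prompt="Haystack demo",  # optional text to guide the model's style
    temperature=0.0,
)

result = whisper.run(sources=["test/test_files/audio/answer.wav"])
print(result["documents"][0].content)
```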
#### to_dict

```python
to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns:**

- <code>dict\[str, Any\]</code> – Dictionary with serialized data.

#### from_dict

```python
from_dict(data: dict[str, Any]) -> RemoteWhisperTranscriber
```

Deserializes the component from a dictionary.

**Parameters:**

- **data** (<code>dict\[str, Any\]</code>) – The dictionary to deserialize from.

**Returns:**

- <code>RemoteWhisperTranscriber</code> – The deserialized component.

#### run

```python
run(sources: list[str | Path | ByteStream]) -> dict[str, Any]
```

Transcribes the list of audio files into a list of documents.

**Parameters:**

- **sources** (<code>list\[str | Path | ByteStream\]</code>) – A list of file paths or `ByteStream` objects containing the audio files to transcribe.

**Returns:**

- <code>dict\[str, Any\]</code> – A dictionary with the following keys:
  - `documents`: A list of documents, one document for each file.
    The content of each document is the transcribed text.
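To round out the `to_dict` / `from_dict` reference above, here is a small serialization round-trip sketch. It is illustrative only: it assumes `OPENAI_API_KEY` is available in the environment, since an API key configured with `Secret.from_env_var` is serialized as an environment-variable reference rather than as the raw key and must be resolvable again when the component is rebuilt.

```python
from haystack.components.audio import RemoteWhisperTranscriber

# Sketch of a serialization round trip; assumes OPENAI_API_KEY is set in the environment.
whisper = RemoteWhisperTranscriber(model="whisper-1")

data = whisper.to_dict()                             # dict[str, Any] with the init parameters
restored = RemoteWhisperTranscriber.from_dict(data)  # rebuilds an equivalent component

assert isinstance(restored, RemoteWhisperTranscriber)
```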