---
title: "Audio"
id: audio-api
description: "Transcribes audio files."
slug: "/audio-api"
---

<a id="whisper_local"></a>

## Module whisper\_local

<a id="whisper_local.LocalWhisperTranscriber"></a>

### LocalWhisperTranscriber

Transcribes audio files using OpenAI's Whisper model on your local machine.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

### Usage example

```python
from haystack.components.audio import LocalWhisperTranscriber

whisper = LocalWhisperTranscriber(model="small")
whisper.warm_up()
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

<a id="whisper_local.LocalWhisperTranscriber.__init__"></a>

#### LocalWhisperTranscriber.\_\_init\_\_

```python
def __init__(model: WhisperLocalModel = "large",
             device: ComponentDevice | None = None,
             whisper_params: dict[str, Any] | None = None)
```

Creates an instance of the LocalWhisperTranscriber component.

**Arguments**:

- `model`: The name of the model to use. Set to one of the following models:
"tiny", "base", "small", "medium", "large" (default).
For details on the models and their modifications, see the
[Whisper documentation](https://github.com/openai/whisper?tab=readme-ov-file#available-models-and-languages).
- `device`: The device for loading the model. If `None`, automatically selects the default device.
- `whisper_params`: Additional parameters for the Whisper model. For the supported audio formats,
languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).
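As an illustration, a `whisper_params` dictionary might look like the following sketch. The keys shown are decoding options from the openai-whisper `transcribe()` API; treat the exact set as an assumption and verify it against the Whisper documentation for your installed version.

```python
# Hypothetical whisper_params forwarded to Whisper's transcribe() call.
# The keys are openai-whisper decoding options; verify against your version.
whisper_params = {
    "language": "en",      # ISO-639-1 code; skips automatic language detection
    "task": "transcribe",  # or "translate" to translate the audio into English
    "temperature": 0.0,    # 0.0 gives deterministic decoding
}
```

Such a dictionary would then be passed as `LocalWhisperTranscriber(model="small", whisper_params=whisper_params)`.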

<a id="whisper_local.LocalWhisperTranscriber.warm_up"></a>

#### LocalWhisperTranscriber.warm\_up

```python
def warm_up() -> None
```

Loads the model into memory.

<a id="whisper_local.LocalWhisperTranscriber.to_dict"></a>

#### LocalWhisperTranscriber.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="whisper_local.LocalWhisperTranscriber.from_dict"></a>

#### LocalWhisperTranscriber.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "LocalWhisperTranscriber"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.
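The `to_dict`/`from_dict` pair follows the usual round-trip contract: `from_dict(to_dict())` reconstructs an equivalent component. A minimal stand-in class sketching that contract (the `type`/`init_parameters` layout is an assumption based on Haystack's common serialization format, not this component's exact output):

```python
from typing import Any


class ToyTranscriber:
    """Minimal stand-in illustrating the to_dict/from_dict round trip."""

    def __init__(self, model: str = "small"):
        self.model = model

    def to_dict(self) -> dict[str, Any]:
        # Haystack components commonly serialize as {"type": ..., "init_parameters": ...}.
        return {"type": "ToyTranscriber", "init_parameters": {"model": self.model}}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "ToyTranscriber":
        return cls(**data["init_parameters"])


restored = ToyTranscriber.from_dict(ToyTranscriber(model="medium").to_dict())
```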

<a id="whisper_local.LocalWhisperTranscriber.run"></a>

#### LocalWhisperTranscriber.run

```python
@component.output_types(documents=list[Document])
def run(sources: list[str | Path | ByteStream],
        whisper_params: dict[str, Any] | None = None)
```

Transcribes a list of audio files into a list of documents.

**Arguments**:

- `sources`: A list of paths or binary streams to transcribe.
- `whisper_params`: Additional parameters for the Whisper model. For the supported audio formats,
languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents where each document is a transcribed audio file. The content of
the document is the transcription text, and the document's metadata contains the values returned by
the Whisper model, such as the alignment data and the path to the audio file used
for the transcription.
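A sketch of consuming the `run()` output. The `Document` class here is a simplified stand-in for Haystack's, and the `audio_file` metadata key is an assumption used for illustration:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:  # simplified stand-in for haystack.Document
    content: str
    meta: dict[str, Any] = field(default_factory=dict)


# Hypothetical shape of a run() result: {"documents": [...]}
result = {"documents": [Document("the answer is 42", {"audio_file": "answer.wav"})]}

# The transcription text lives in each document's content field.
texts = [doc.content for doc in result["documents"]]
```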

<a id="whisper_local.LocalWhisperTranscriber.transcribe"></a>

#### LocalWhisperTranscriber.transcribe

```python
def transcribe(sources: list[str | Path | ByteStream],
               **kwargs) -> list[Document]
```

Transcribes the audio files into a list of Documents, one for each input file.

For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text) and the official Whisper
[GitHub repository](https://github.com/openai/whisper).

**Arguments**:

- `sources`: A list of paths or binary streams to transcribe.

**Returns**:

A list of Documents, one for each file.

<a id="whisper_remote"></a>

## Module whisper\_remote

<a id="whisper_remote.RemoteWhisperTranscriber"></a>

### RemoteWhisperTranscriber

Transcribes audio files using OpenAI's Whisper API.

The component requires an OpenAI API key; see the
[OpenAI documentation](https://platform.openai.com/docs/api-reference/authentication) for details.
For the supported audio formats, languages, and other parameters, see the
[Whisper API documentation](https://platform.openai.com/docs/guides/speech-to-text).

### Usage example

```python
from haystack.components.audio import RemoteWhisperTranscriber

whisper = RemoteWhisperTranscriber(model="whisper-1")
transcription = whisper.run(sources=["test/test_files/audio/answer.wav"])
```

<a id="whisper_remote.RemoteWhisperTranscriber.__init__"></a>

#### RemoteWhisperTranscriber.\_\_init\_\_

```python
def __init__(api_key: Secret = Secret.from_env_var("OPENAI_API_KEY"),
             model: str = "whisper-1",
             api_base_url: str | None = None,
             organization: str | None = None,
             http_client_kwargs: dict[str, Any] | None = None,
             **kwargs)
```

Creates an instance of the RemoteWhisperTranscriber component.

**Arguments**:

- `api_key`: OpenAI API key.
You can set it with the `OPENAI_API_KEY` environment variable or pass it with this parameter
during initialization.
- `model`: Name of the model to use. Currently accepts only `whisper-1`.
- `api_base_url`: An optional URL to use as the API base. For details, see the
OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio).
- `organization`: Your OpenAI organization ID. See OpenAI's documentation on
[Setting Up Your Organization](https://platform.openai.com/docs/guides/production-best-practices/setting-up-your-organization).
- `http_client_kwargs`: A dictionary of keyword arguments to configure a custom `httpx.Client` or `httpx.AsyncClient`.
For more information, see the [HTTPX documentation](https://www.python-httpx.org/api/#client).
- `kwargs`: Other optional parameters for the model. These are sent directly to the OpenAI
endpoint. See the OpenAI [documentation](https://platform.openai.com/docs/api-reference/audio) for more details.
Some of the supported parameters are:
- `language`: The language of the input audio.
  Provide the input language in ISO-639-1 format
  to improve transcription accuracy and latency.
- `prompt`: An optional text to guide the model's
  style or continue a previous audio segment.
  The prompt should match the audio language.
- `response_format`: The format of the transcript
  output. This component only supports `json`.
- `temperature`: The sampling temperature, between 0
  and 1. Higher values like 0.8 make the output more
  random, while lower values like 0.2 make it more
  focused and deterministic. If set to 0, the model
  uses log probability to automatically increase the
  temperature until certain thresholds are hit.
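For illustration, the optional parameters above could be collected like this. The `openai_kwargs` keys mirror the OpenAI transcription parameters listed above, while the `http_client_kwargs` keys (`proxy`, `timeout`) are assumptions about an `httpx` setup; verify them against your httpx version:

```python
# Hypothetical keyword arguments forwarded to the OpenAI transcription endpoint.
openai_kwargs = {
    "language": "en",                            # ISO-639-1 code of the input audio
    "prompt": "Vocabulary: Haystack, Whisper.",  # style / continuation hint
    "response_format": "json",                   # the only format this component supports
    "temperature": 0.2,                          # lower = more focused and deterministic
}

# Hypothetical httpx client options (key names are an assumption).
http_client_kwargs = {"proxy": "http://localhost:8080", "timeout": 30.0}
```

These would be passed as `RemoteWhisperTranscriber(http_client_kwargs=http_client_kwargs, **openai_kwargs)`.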

<a id="whisper_remote.RemoteWhisperTranscriber.to_dict"></a>

#### RemoteWhisperTranscriber.to\_dict

```python
def to_dict() -> dict[str, Any]
```

Serializes the component to a dictionary.

**Returns**:

Dictionary with serialized data.

<a id="whisper_remote.RemoteWhisperTranscriber.from_dict"></a>

#### RemoteWhisperTranscriber.from\_dict

```python
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "RemoteWhisperTranscriber"
```

Deserializes the component from a dictionary.

**Arguments**:

- `data`: The dictionary to deserialize from.

**Returns**:

The deserialized component.

<a id="whisper_remote.RemoteWhisperTranscriber.run"></a>

#### RemoteWhisperTranscriber.run

```python
@component.output_types(documents=list[Document])
def run(sources: list[str | Path | ByteStream])
```

Transcribes the list of audio files into a list of documents.

**Arguments**:

- `sources`: A list of file paths or `ByteStream` objects containing the audio files to transcribe.

**Returns**:

A dictionary with the following keys:
- `documents`: A list of documents, one document for each file.
The content of each document is the transcribed text.
267