Upload API
Push your own audio into Audioscrape — interviews, conference talks, internal recordings, customer
calls — and have it transcribed, diarised, entity-extracted, and indexed alongside every podcast on
the platform. One endpoint, two intake modes: send the file directly as multipart/form-data,
or hand us a URL with application/json and we'll fetch it server-side.
Machine-readable spec: For the always-current OpenAPI definition, see the interactive Swagger UI or download /api/openapi.json.
The endpoint
All uploads target a single route. The Content-Type header decides the intake mode:
multipart/form-data for raw bytes, application/json for URL-fetch.
POST /api/datasets/{id}/items
{id} is the numeric ID of an upload-type dataset you own (or are a member of
via its parent workspace). Datasets created for podcast feeds will reject uploads — create a dataset
with content_type: "uploads" first. The authenticated caller must be a workspace member;
otherwise the API returns 404 (we deliberately don't disclose whether the dataset exists).
Multipart upload
Use multipart when the audio lives on the machine making the request — browser file picker, a CLI
script streaming a local file, a desktop client. The file bytes travel in the audio form
field.
curl -X POST "https://www.audioscrape.com/api/datasets/123/items" \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "audio=@/path/to/interview.mp3" \
-F "title=Q3 customer interview — ACME Corp" \
-F "description=30-min discovery call with the ACME product team"
{
"success": true,
"item_id": 84217,
"message": "Audio file queued for transcription with priority 100"
}
Form fields
| Field | Type | Required | Description |
|---|---|---|---|
| audio | file | yes | The audio file. Default formats: mp3, wav, m4a, ogg, flac (configurable per dataset). |
| title | text | optional | Display title. Defaults to the filename without extension if omitted. |
| description | text | optional | Free-form description, surfaced in the item record and in search snippets. |
Datasets configured with require_metadata = true will reject uploads that omit either
title or description.
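For clients that cannot shell out to curl, the same multipart request can be assembled by hand. The following is a minimal sketch using only the Python standard library. The helper name build_multipart_upload is hypothetical; only the route, the Bearer header, and the audio, title, and description field names come from this page. The request is built but not sent.

```python
import mimetypes
import urllib.request
import uuid

API_BASE = "https://www.audioscrape.com/api"  # host taken from the curl examples


def build_multipart_upload(dataset_id, api_key, filename, file_bytes,
                           title=None, description=None):
    """Build (but do not send) a multipart POST for the upload endpoint.

    Hypothetical helper: only the route, the Bearer header, and the field
    names audio/title/description come from the documentation.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Optional text fields are emitted only when provided.
    for name, value in (("title", title), ("description", description)):
        if value is not None:
            parts.append(
                f'--{boundary}\r\n'
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f'{value}\r\n'.encode("utf-8")
            )

    # The file bytes travel in the "audio" field.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/items",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Holding the whole file in memory keeps the sketch short; a streaming client is a better fit for files near the size cap.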
JSON URL upload
Use the JSON variant when the audio is already hosted somewhere on the public internet — an S3
presigned URL, a CDN object, your own static server. Audioscrape downloads it, stores a copy on its own
object storage, and queues the same transcription pipeline. This mirrors what the
transcribe_audio MCP tool does under the hood, so the two surfaces stay at parity.
curl -X POST "https://www.audioscrape.com/api/datasets/123/items" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"audio_url": "https://files.example.com/recording.mp3",
"title": "Founder fireside — March 2026",
"description": "Recording from our internal all-hands"
}'
{
"success": true,
"item_id": 84218,
"message": "Audio fetched from URL and queued for transcription"
}
JSON body
| Field | Type | Required | Description |
|---|---|---|---|
| audio_url | string | yes | Publicly reachable https:// URL. See SSRF rules below. |
| title | string | optional | Display title. Defaults to the last path segment of the URL if omitted. |
| description | string | optional | Free-form description stored on the item. |
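The JSON variant is straightforward to issue from a script. Below is a minimal sketch with the Python standard library. The helper names build_url_upload and send are hypothetical; the route, headers, and body fields are the ones documented above.

```python
import json
import urllib.request

API_BASE = "https://www.audioscrape.com/api"  # host taken from the curl examples


def build_url_upload(dataset_id, api_key, audio_url, title=None, description=None):
    """Build (but do not send) a JSON URL-fetch upload request.

    Hypothetical helper; only the route, headers, and body field names
    come from the documentation.
    """
    body = {"audio_url": audio_url}
    # Optional fields are omitted rather than sent as null.
    if title is not None:
        body["title"] = title
    if description is not None:
        body["description"] = description

    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/items",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling send(build_url_upload(123, "YOUR_API_KEY", "https://files.example.com/recording.mp3")) should return a dictionary with the success, item_id, and message keys shown in the sample response. Note that urlopen raises urllib.error.HTTPError for non-2xx statuses.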
SSRF guards on audio_url
Because the server fetches whatever URL you provide, we apply strict filters to prevent the endpoint from being used as a reflector into internal infrastructure:
- Scheme must be https; plain http and other schemes are rejected.
- The hostname is resolved server-side; if any resolved IP is loopback, link-local, RFC 1918 private, broadcast, documentation, unspecified, or IPv6 unique-local / link-local, the request is rejected.
- Fetch has a 120-second timeout and is capped at the dataset's effective file-size limit. A Content-Length over the cap is rejected before the body is read.
- HTTP status must be 2xx; non-success responses surface as an error to the caller.
- The file extension (derived from the URL path) must match the dataset's allowed-formats list, same as multipart.
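A client can catch most of these rejections before making the request. The sketch below approximates the scheme, extension, and address checks in Python. It is a hypothetical pre-flight helper, not the server's implementation: the exact network list the server blocks is not published, and case-insensitive extension matching is an assumption.

```python
import ipaddress
from pathlib import PurePosixPath
from urllib.parse import urlparse

DEFAULT_FORMATS = {"mp3", "wav", "m4a", "ogg", "flac"}  # documented defaults

# Ranges matching the categories named in the list above. The exact set the
# server uses is not published, so this list is an assumption.
FORBIDDEN_NETWORKS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/32", "::/128",                            # unspecified
    "127.0.0.0/8", "::1/128",                          # loopback
    "169.254.0.0/16", "fe80::/10",                     # link-local
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",   # RFC 1918 private
    "255.255.255.255/32",                              # broadcast
    "192.0.2.0/24", "198.51.100.0/24",                 # documentation
    "203.0.113.0/24", "2001:db8::/32",                 # documentation
    "fc00::/7",                                        # IPv6 unique-local
)]


def is_forbidden_ip(ip_text):
    """True if the address falls in one of the blocked ranges above."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip.version == net.version and ip in net
               for net in FORBIDDEN_NETWORKS)


def url_problems(url, allowed_formats=DEFAULT_FORMATS):
    """Return reasons the URL would likely be rejected (empty list if none)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme must be https")
    if not parsed.hostname:
        problems.append("URL has no hostname")
    # The extension is derived from the URL path, not the query string.
    extension = PurePosixPath(parsed.path).suffix.lstrip(".").lower()
    if extension not in allowed_formats:
        problems.append(f"extension {extension!r} is not an allowed format")
    return problems
```

A fuller pre-flight would resolve the hostname (for example with socket.getaddrinfo) and pass every returned address to is_forbidden_ip; that step is left out so the sketch has no network dependency. The server's own checks remain authoritative either way.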
Heads up
The URL only needs to be reachable at the moment of upload. We store a copy on our own object storage immediately, so transcription, playback, and search keep working even if the source URL later expires or 404s.
File size and format limits
The effective size cap is the smaller of the dataset's max_file_size_mb setting and the
account-plan cap. Free-tier accounts are hard-capped at 200 MB per upload regardless
of dataset config; paid plans default to the dataset setting (500 MB out of the box, configurable
higher). Files exceeding the cap return HTTP 413.
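The cap rule can be written out as a small worked example. This Python sketch is illustrative only: the function names are hypothetical, a paid plan is modelled as having no plan cap of its own, and 1 MB is assumed to mean 1024 × 1024 bytes, which this page does not specify.

```python
FREE_TIER_CAP_MB = 200        # documented hard cap for free-tier accounts
DEFAULT_DATASET_CAP_MB = 500  # documented out-of-the-box dataset setting


def effective_cap_mb(max_file_size_mb=DEFAULT_DATASET_CAP_MB, plan_cap_mb=None):
    """Smaller of the dataset setting and the plan cap.

    plan_cap_mb=None models a paid plan that defers to the dataset setting.
    """
    if plan_cap_mb is None:
        return max_file_size_mb
    return min(max_file_size_mb, plan_cap_mb)


def fits(size_bytes, cap_mb):
    """True if the file is within the cap (assumes 1 MB = 1024 * 1024 bytes)."""
    return size_bytes <= cap_mb * 1024 * 1024
```

For a free-tier account uploading to a dataset left at the default setting, effective_cap_mb(500, FREE_TIER_CAP_MB) gives 200, so a 250 MB file would be rejected with HTTP 413 even though the dataset itself allows it.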
Default accepted extensions: mp3, wav, m4a, ogg,
flac. Datasets can override this list in their settings.
Transcription minutes consumed by uploads count against your plan's monthly quota. See /pricing for current plan limits.
After upload
A successful upload returns immediately with an item_id. The actual processing happens
asynchronously in the background:
- Storage. For URL uploads, the server fetches the audio (SSRF-checked) and stores it on S3-compatible object storage. Multipart uploads go straight to storage.
- Pipeline enqueue. An items row is created with status queued and a pipeline run is added to pipeline_queue. Datasets with auto_transcribe = false stop here.
- Transcription. WhisperX (large-v3 with diarisation) runs on a dedicated GPU pool. Wall-clock time is roughly 1–3× real-time depending on queue depth.
- Speaker identification, entity extraction, search indexing. The item then runs through the same downstream stages as podcast episodes: speaker labels, named-entity extraction (people, organisations, places, products), and indexing into the search backends.
Poll the item to see status and grab the transcript once it's ready:
curl "https://www.audioscrape.com/api/items/84217" \
-H "Authorization: Bearer YOUR_API_KEY"
The item moves from queued through processing into the published state. Once
published, the transcript and segments are available via the standard transcript and search endpoints,
and the item appears in dataset listings under
GET /api/datasets/{id}/items.
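A polling loop for this might look like the Python sketch below. It assumes the item JSON exposes its state under a status key, which this page does not show, and it takes the HTTP call as a caller-supplied fetch_item function so the loop itself stays independent of any HTTP library. All names are hypothetical.

```python
import time


def wait_until_published(fetch_item, item_id, interval_s=10.0, timeout_s=3600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll until the item reports the published state, then return it.

    fetch_item(item_id) must return the decoded JSON of GET /api/items/{id}.
    The "status" key is an assumption; the response schema is not shown here.
    """
    deadline = clock() + timeout_s
    while True:
        item = fetch_item(item_id)
        status = item.get("status")
        if status == "published":
            return item
        # Only the documented intermediate states are treated as "keep waiting".
        if status not in ("queued", "processing"):
            raise RuntimeError(f"item {item_id} entered unexpected state {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"item {item_id} still {status!r} after {timeout_s} s")
        sleep(interval_s)
```

The timeout matters because items in datasets with auto_transcribe = false stay queued and would otherwise be polled forever. Given the roughly 1–3× real-time transcription speed, an interval of several seconds is a reasonable starting point for short recordings.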
Response fields
| Field | Type | Description |
|---|---|---|
| success | boolean | Always true on a 200 response. |
| item_id | integer | ID of the newly-created item. Use it with GET /api/items/{id}. |
| message | string | Human-readable status note (e.g. "queued for transcription"). |
Error responses
| Status | Meaning |
|---|---|
| 400 | Invalid request body or URL (bad JSON, non-https scheme, unparseable URL). |
| 401 | Missing or invalid API key. |
| 403 | Caller is not a member of the dataset's parent library, or transcription quota exhausted. |
| 404 | Dataset not found (also returned when caller lacks access; we don't disclose existence). |
| 413 | File exceeds the per-upload size cap on this plan. |
| 415 | Audio format not in the dataset's allowed-formats list. |
| 422 | Audio URL unreachable, returned non-2xx, or resolved to a forbidden internal address. |