Libraries, Datasets, and Items API
Audioscrape organizes every piece of audio under a single canonical hierarchy: Library → Dataset → Item. A library is a curated workspace; a dataset is a grouping of related content inside that library; an item is one piece of audio — a podcast episode, an uploaded talk, a hearing recording, or any future format.
Items are universal. Each one has a source_type field (podcast_episode, upload, youtube, ...) so the same endpoints work across formats. The older /api/podcasts and /api/episodes routes remain as convenience aliases for the most common case, but new integrations should prefer /api/items and /api/libraries, which scale across content types.
Machine-readable spec: For the always-current OpenAPI definition, see the interactive Swagger UI or download /api/openapi.json.
List Libraries
Returns every public library on Audioscrape, ordered by subscriber count. The is_subscribed field on each entry tells you whether the calling user is already a member.
curl "https://www.audioscrape.com/api/libraries" \
-H "Authorization: Bearer YOUR_API_KEY"
{
"libraries": [
{
"slug": "lukas-library",
"name": "Lukas’ Library",
"description": "Tech, AI, and startup podcasts I follow.",
"curator": "Lukas Schmyrczyk",
"subscriber_count": 142,
"dataset_count": 7,
"episode_count": 1834,
"is_subscribed": true
}
],
"total": 1
}
Response Fields
| Field | Type | Description |
|---|---|---|
| slug | string | Stable URL slug, e.g. lukas-library |
| name | string | Human-readable library name |
| description | string or null | Optional library description |
| curator | string or null | Display name of the library owner |
| subscriber_count | integer | Number of users subscribed to the library |
| dataset_count | integer | Number of public datasets in the library |
| episode_count | integer | Sum of episodes across all public datasets |
| is_subscribed | boolean | Whether the calling user is a member of this library |
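As a rough illustration of consuming this response, the sketch below builds the authenticated request shown in the curl example and picks out the libraries the caller is subscribed to. The helper names (`build_request`, `fetch_json`, `subscribed_slugs`) are hypothetical and not part of any official client; only the URL, the bearer header, and the field names come from this page.

```python
import json
import urllib.request

BASE_URL = "https://www.audioscrape.com"


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build an authenticated GET request. Nothing is sent here."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_json(path: str, api_key: str) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(build_request(path, api_key)) as resp:
        return json.load(resp)


def subscribed_slugs(payload: dict) -> list[str]:
    """Return the slugs of libraries where is_subscribed is true."""
    return [
        lib["slug"]
        for lib in payload.get("libraries", [])
        if lib.get("is_subscribed")
    ]


# Trimmed copy of the example response above, used instead of a live call.
sample = {
    "libraries": [
        {"slug": "lukas-library", "subscriber_count": 142, "is_subscribed": True}
    ],
    "total": 1,
}
print(subscribed_slugs(sample))
```

Against the live API, `subscribed_slugs(fetch_json("/api/libraries", "YOUR_API_KEY"))` would apply the same filter to the real listing.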
Get Library
Returns one library plus the datasets the calling user can see. Public datasets are always returned; workspace-scoped datasets are returned only to members. Private libraries return 404 to non-members.
curl "https://www.audioscrape.com/api/libraries/lukas-library" \
-H "Authorization: Bearer YOUR_API_KEY"
{
"slug": "lukas-library",
"name": "Lukas’ Library",
"description": "Tech, AI, and startup podcasts I follow.",
"curator": "Lukas Schmyrczyk",
"subscriber_count": 142,
"dataset_count": 7,
"episode_count": 1834,
"is_subscribed": true,
"datasets": [
{
"id": 12,
"slug": "ai-research",
"name": "AI Research Podcasts",
"description": "Long-form interviews with AI researchers.",
"visibility": "public",
"podcast_count": 14,
"episode_count": 412
}
]
}
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| slug | string | Library slug (e.g. lukas-library) |
Dataset Fields
| Field | Type | Description |
|---|---|---|
| id | integer | Numeric dataset ID (stable, used by /api/datasets/{id} and /api/items?dataset_id=) |
| slug | string | URL slug for the dataset |
| name | string | Human-readable dataset name |
| description | string or null | Optional dataset description |
| visibility | string | One of public, workspace, or private |
| podcast_count | integer | Number of podcasts represented in the dataset |
| episode_count | integer | Total items in the dataset |
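Because a private library answers 404 to non-members, a client may want to treat that status as "not visible" rather than as a hard failure. The following is a minimal sketch under that assumption; `fetch_library` and `datasets_by_visibility` are illustrative names, and a 404 could equally mean the slug does not exist.

```python
import json
from collections import defaultdict
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://www.audioscrape.com"


def fetch_library(slug: str, api_key: str) -> Optional[dict]:
    """Fetch one library; return None on 404 (private or unknown slug)."""
    req = Request(
        f"{BASE_URL}/api/libraries/{quote(slug)}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urlopen(req) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:
            return None
        raise  # other statuses are left for the caller to handle


def datasets_by_visibility(library: dict) -> dict[str, list[str]]:
    """Group dataset slugs by their visibility value."""
    groups: dict[str, list[str]] = defaultdict(list)
    for dataset in library.get("datasets", []):
        groups[dataset["visibility"]].append(dataset["slug"])
    return dict(groups)


# Trimmed copy of the example response above.
sample_library = {
    "slug": "lukas-library",
    "datasets": [{"id": 12, "slug": "ai-research", "visibility": "public"}],
}
print(datasets_by_visibility(sample_library))
```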
Get Dataset
Returns a dataset plus a reference to its parent library, in a single response. Saves a round-trip when you have a dataset ID from a search result or item record and need the surrounding context.
curl "https://www.audioscrape.com/api/datasets/12" \
-H "Authorization: Bearer YOUR_API_KEY"
{
"id": 12,
"slug": "ai-research",
"name": "AI Research Podcasts",
"description": "Long-form interviews with AI researchers.",
"visibility": "public",
"podcast_count": 14,
"episode_count": 412,
"library_slug": "lukas-library",
"library_name": "Lukas’ Library"
}
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| id | integer | Numeric dataset ID |
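Since the dataset record already carries library_slug, follow-up URLs can be derived from it without another lookup. The sketch below shows one way to do that; `follow_up_urls` is a hypothetical helper, and the paths it builds are the ones documented on this page.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.audioscrape.com"


def follow_up_urls(dataset: dict) -> dict[str, str]:
    """Derive the parent-library URL and the item-listing URL for a dataset."""
    return {
        "library": f"{BASE_URL}/api/libraries/{quote(dataset['library_slug'])}",
        "items": f"{BASE_URL}/api/items?{urlencode({'dataset_id': dataset['id']})}",
    }


# Trimmed copy of the example response above.
sample_dataset = {"id": 12, "slug": "ai-research", "library_slug": "lukas-library"}
for name, url in follow_up_urls(sample_dataset).items():
    print(name, url)
```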
List Items
Universal listing across content types: podcast episodes, uploads, and future formats. Filter by source_type, dataset_id, or library (slug). Results are newest-first. Use this in place of /api/episodes when your integration needs to work with multiple content types.
curl "https://www.audioscrape.com/api/items?library=lukas-library&source_type=podcast_episode&limit=20" \
-H "Authorization: Bearer YOUR_API_KEY"
{
"items": [
{
"id": 98765,
"title": "Scaling Laws for Language Models",
"slug": "scaling-laws-for-language-models",
"source_type": "podcast_episode",
"publish_date": "2026-05-14",
"duration_seconds": 4280,
"has_transcript": true,
"podcast": {
"id": 321,
"title": "AI Research Today",
"slug": "ai-research-today"
},
"dataset_id": 12
}
],
"total": 1,
"limit": 20,
"offset": 0
}
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| source_type | string | Filter by source type (podcast_episode, upload, youtube, ...). Omit for all types. |
| dataset_id | integer | Restrict to items in a specific dataset |
| library | string | Restrict to items in a specific library (by slug) |
| limit | integer | Max results (default 20, max 100) |
| offset | integer | Result offset for pagination |
Item Fields
| Field | Type | Description |
|---|---|---|
| id | integer | Stable numeric item ID |
| title | string | Item title |
| slug | string | URL slug |
| source_type | string | podcast_episode, upload, youtube, etc. |
| publish_date | string or null | Publish date in YYYY-MM-DD format |
| duration_seconds | integer or null | Item duration in seconds, when known |
| has_transcript | boolean | Whether a transcript is available for this item |
| podcast | object or null | Reference to the parent podcast (id, title, slug) when source_type = podcast_episode |
| dataset_id | integer or null | ID of the dataset this item belongs to, if any |
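The limit and offset parameters suggest a simple paging loop. The sketch below is one way to walk a full listing. It assumes a short or empty page marks the end of the results, which this page does not state explicitly, and it takes the page fetcher as a callable so the loop can be exercised without network access. `items_url`, `iter_items`, and `fetch_page` are hypothetical names.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://www.audioscrape.com"
MAX_LIMIT = 100  # documented maximum for the limit parameter


def items_url(limit: int = 20, offset: int = 0, **filters) -> str:
    """Build a /api/items URL; filters are source_type, dataset_id, library."""
    params = {key: value for key, value in filters.items() if value is not None}
    params["limit"] = min(max(limit, 1), MAX_LIMIT)
    params["offset"] = offset
    return f"{BASE_URL}/api/items?{urlencode(params)}"


def iter_items(
    fetch_page: Callable[[int, int], dict], limit: int = MAX_LIMIT
) -> Iterator[dict]:
    """Yield every item, calling fetch_page(limit, offset) once per page."""
    limit = min(max(limit, 1), MAX_LIMIT)
    offset = 0
    while True:
        items = fetch_page(limit, offset).get("items", [])
        yield from items
        # Assumption: a page shorter than the limit is the last one.
        if len(items) < limit:
            break
        offset += len(items)


# Stand-in for the API: serves slices of an in-memory list.
_fake_items = [{"id": n} for n in range(250)]


def fake_fetch_page(limit: int, offset: int) -> dict:
    return {"items": _fake_items[offset:offset + limit], "limit": limit, "offset": offset}


print(sum(1 for _ in iter_items(fake_fetch_page)))
```

In real use, `fetch_page` would request `items_url(limit, offset, library="lukas-library")` with the bearer header and return the decoded JSON. Every page fetched counts against the data-call quota, so a larger limit means fewer calls.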
Quota Note
Every Libraries / Datasets / Items request counts against your data-call quota. See the pricing page for per-plan limits.
Get Item
Returns one item plus a reference to its parent podcast (when applicable). Heavy fields are opt-in via ?include=: pass transcript, entities, or both to load them in the same response. This avoids paying the cost of a 500 KB+ transcript when you only need metadata. Equivalent to GET /api/episodes/{id} for podcast episodes, but works uniformly across all source types.
curl "https://www.audioscrape.com/api/items/98765?include=transcript,entities" \
-H "Authorization: Bearer YOUR_API_KEY"
{
"episode": {
"id": 98765,
"title": "Scaling Laws for Language Models",
"slug": "scaling-laws-for-language-models",
"description": "A deep dive into compute-optimal training...",
"publish_date": "2026-05-14",
"duration_seconds": 4280,
"enclosure_url": "https://traffic.example.com/episode-98765.mp3"
},
"podcast": {
"id": 321,
"title": "AI Research Today",
"slug": "ai-research-today",
"image_url": "https://images.audioscrape.com/podcasts/321.jpg"
},
"transcript": {
"segments": [
{ "text": "Welcome back to the show.", "start": 0.0, "end": 2.4, "speaker": "SPEAKER_00" }
],
"speakers": ["SPEAKER_00", "SPEAKER_01"],
"total_segments": 1842
},
"entities": [
{ "name": "OpenAI", "entity_type": "organization", "slug": "openai", "mention_count": 12 }
]
}
Path & Query Parameters
| Parameter | In | Type | Description |
|---|---|---|---|
| item_id | path | string | Numeric item ID (works for any source_type) |
| include | query | string | Comma-separated optional fields: transcript, entities. Omit for metadata-only. |
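A small helper can keep the include parameter restricted to the two documented values and omit it entirely for metadata-only requests. This is a sketch only; `item_url` is a hypothetical name, and rejecting unknown values on the client side is a design choice made here, not documented API behavior.

```python
from typing import Iterable
from urllib.parse import urlencode

BASE_URL = "https://www.audioscrape.com"
ALLOWED_INCLUDES = ("transcript", "entities")  # values documented above


def item_url(item_id: int, include: Iterable[str] = ()) -> str:
    """Build a /api/items/{item_id} URL with an optional include list."""
    requested = list(include)
    unknown = [name for name in requested if name not in ALLOWED_INCLUDES]
    if unknown:
        raise ValueError(f"unsupported include values: {unknown}")
    url = f"{BASE_URL}/api/items/{item_id}"
    if requested:
        # safe="," keeps the comma separator readable instead of %2C.
        url += "?" + urlencode({"include": ",".join(requested)}, safe=",")
    return url


print(item_url(98765))
print(item_url(98765, ["transcript", "entities"]))
```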
Get Item Transcript
Returns the transcript on its own — segments plus speaker list — for any item, regardless of source type. Use this when you already have item metadata and only need the transcript payload.
curl "https://www.audioscrape.com/api/items/98765/transcript" \
-H "Authorization: Bearer YOUR_API_KEY"
{
"segments": [
{ "text": "Welcome back to the show.", "start": 0.0, "end": 2.4, "speaker": "SPEAKER_00" },
{ "text": "Today we’re talking about scaling laws.", "start": 2.5, "end": 5.1, "speaker": "SPEAKER_00" }
],
"speakers": ["SPEAKER_00", "SPEAKER_01"],
"total_segments": 1842
}
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| item_id | string | Numeric item ID |
Segment Fields
| Field | Type | Description |
|---|---|---|
| text | string | Transcribed text for the segment |
| start | number | Segment start time in seconds |
| end | number | Segment end time in seconds |
| speaker | string or null | Diarized speaker label, when available |
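To illustrate working with the segment fields, the sketch below totals speaking time per speaker from start and end. Segments whose speaker is null are grouped under an "UNKNOWN" label, a choice made for this example only. `speaking_time` is a hypothetical helper.

```python
from collections import defaultdict


def speaking_time(segments: list[dict]) -> dict[str, float]:
    """Sum (end - start) in seconds for each speaker label."""
    totals: dict[str, float] = defaultdict(float)
    for segment in segments:
        # speaker may be null when diarization is unavailable.
        speaker = segment.get("speaker") or "UNKNOWN"
        totals[speaker] += segment["end"] - segment["start"]
    return dict(totals)


# The two segments from the example response above, plus one without a speaker.
sample_segments = [
    {"text": "Welcome back to the show.", "start": 0.0, "end": 2.4, "speaker": "SPEAKER_00"},
    {"text": "Today we're talking about scaling laws.", "start": 2.5, "end": 5.1, "speaker": "SPEAKER_00"},
    {"text": "(music)", "start": 5.1, "end": 6.1, "speaker": None},
]
for speaker, seconds in speaking_time(sample_segments).items():
    print(f"{speaker}: {seconds:.1f}s")
```

Note that the response lists only the returned segments alongside total_segments, so totals computed this way cover just the segments actually present in the payload.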