Ingestion Routes

Use these endpoints to ingest papers from preprint servers and publishers into the Drylab database.

INGESTION ROUTES
POST /work/ingest // Ingest a single work from external sources
POST /works/ingest // Batch ingest multiple works

Ingest a single work

Ingests a single work from external sources. Requires an idempotency key.

POST https://api.drylab.bio/v1/work/ingest
curl -X POST https://api.drylab.bio/v1/work/ingest \
     -H "Authorization: Bearer <YOUR_API_KEY>" \
     -H "Content-Type: application/json" \
     -H "Idempotency-Key: ingest-medrxiv-2025.08.07.25333034" \
     -d '{"externalIdType": "medrxivId", "id": "2025.08.07.25333034"}'
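The same request can be assembled from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_ingest_request` is hypothetical and not part of any Drylab SDK. It builds the request without sending it, so the headers and body can be inspected first.

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://api.drylab.bio/v1"


def build_ingest_request(api_key, external_id_type, work_id, idempotency_key):
    """Build (but do not send) a POST /work/ingest request.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"externalIdType": external_id_type, "id": work_id})
    return urllib.request.Request(
        url=f"{API_BASE}/work/ingest",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Idempotency-Key": idempotency_key,
        },
    )


request = build_ingest_request(
    api_key="<YOUR_API_KEY>",
    external_id_type="medrxivId",
    work_id="2025.08.07.25333034",
    idempotency_key="ingest-medrxiv-2025.08.07.25333034",
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Deriving the idempotency key from the identifier, as the curl example does, means a retried call for the same work reuses the same key.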

Headers

| Header | Type | Description |
| --- | --- | --- |
| Authorization | string | Bearer <YOUR_API_KEY> |
| Content-Type | string | Must be application/json |
| Idempotency-Key | string | Required. Unique key for request deduplication |

Request body

| Field | Type | Description |
| --- | --- | --- |
| externalIdType | string | One of the supported IDs (updated list here) |
| id | string | Identifier value matching the chosen type |

Query parameters

Content expansion

| Parameter | Type | Description |
| --- | --- | --- |
| expand | string | Comma-separated list: sections, blocks, assets, citations, all. |
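As an illustration of the comma-separated format, here is a small sketch that builds the `expand` query string and rejects names outside the documented list. The function name `expand_query` is an assumption made for this example; any server-side validation may behave differently.

```python
from urllib.parse import urlencode

# Values listed in the table above.
ALLOWED_EXPANSIONS = ("sections", "blocks", "assets", "citations", "all")


def expand_query(fields):
    """Return an 'expand=...' query string for the given expansion names."""
    unknown = [name for name in fields if name not in ALLOWED_EXPANSIONS]
    if unknown:
        raise ValueError(f"unsupported expand values: {unknown}")
    # dict.fromkeys drops duplicates while preserving the caller's order.
    unique = list(dict.fromkeys(fields))
    # safe="," keeps the separator readable instead of encoding it as %2C.
    return urlencode({"expand": ",".join(unique)}, safe=",")


query = expand_query(["sections", "citations", "sections"])
```

The resulting string can be appended to the endpoint URL after a `?`.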

Artifact download

These options let you download the text artifact for the Work and Version that were just ingested.

| Parameter | Type | Description |
| --- | --- | --- |
| download | enum | Optional. Artifact kind to retrieve: raw, minxml, plain. |
| saveTo | string (relative) | Optional. Save the artifact to this relative path instead of returning a URL. Requires download. |
| downloadExpiresIn | integer (30–3600) | Optional. Override pre-signed URL TTL (seconds). Applies only when download is set and no saveTo. |

Validation rules for artifact download (correct use of query parameters):
  1. saveTo and downloadExpiresIn both require download to be present.

  2. downloadExpiresIn is rejected when combined with saveTo (because no pre-signed URL is issued in that case).

  3. saveTo must be a relative path with no .. segments.
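The rules above can be checked on the client before a request is sent. The sketch below is one possible reading of them, not the server's actual validator: the function name and error messages are hypothetical, and the 30 to 3600 range is assumed to be inclusive.

```python
from pathlib import PurePosixPath

# Artifact kinds listed in the parameter table.
ARTIFACT_KINDS = {"raw", "minxml", "plain"}


def validate_download_params(download=None, save_to=None, download_expires_in=None):
    """Return a list of rule violations; an empty list means the combination is valid."""
    errors = []
    if download is not None and download not in ARTIFACT_KINDS:
        errors.append("download must be one of raw, minxml, plain")
    # Rule 1: saveTo and downloadExpiresIn both require download.
    if download is None and (save_to is not None or download_expires_in is not None):
        errors.append("saveTo and downloadExpiresIn require download")
    # Rule 2: downloadExpiresIn cannot be combined with saveTo.
    if save_to is not None and download_expires_in is not None:
        errors.append("downloadExpiresIn cannot be combined with saveTo")
    # Range from the parameter table (bounds assumed inclusive).
    if download_expires_in is not None and not 30 <= download_expires_in <= 3600:
        errors.append("downloadExpiresIn must be between 30 and 3600 seconds")
    # Rule 3: saveTo must be relative and contain no '..' segments.
    if save_to is not None:
        path = PurePosixPath(save_to)
        if path.is_absolute() or ".." in path.parts:
            errors.append("saveTo must be a relative path with no .. segments")
    return errors
```

Returning every violation at once, rather than raising on the first, lets a caller report all problems with a query string in a single pass.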

Examples

# Ingest and presign raw artifact (default 300 s TTL)
curl -X POST "https://api.drylab.bio/v1/work/ingest?download=raw" \
     -H "Authorization: Bearer <YOUR_API_KEY>" \
     -H "Content-Type: application/json" \
     -H "Idempotency-Key: ingest-medrxiv-2025.08.07.25333034" \
     -d '{"externalIdType": "medrxivId", "id": "2025.08.07.25333034"}'

# Ingest and presign raw artifact with 15-minute expiry
curl -X POST "https://api.drylab.bio/v1/work/ingest?download=raw&downloadExpiresIn=900" \
     -H "Authorization: Bearer <YOUR_API_KEY>" \
     -H "Content-Type: application/json" \
     -H "Idempotency-Key: ingest-medrxiv-2025.08.07.25333034" \
     -d '{"externalIdType": "medrxivId", "id": "2025.08.07.25333034"}'

# Ingest and save minimal XML locally
curl -X POST "https://api.drylab.bio/v1/work/ingest?download=minxml&saveTo=artifacts/2025-08-07.min.xml.gz" \
     -H "Authorization: Bearer <YOUR_API_KEY>" \
     -H "Content-Type: application/json" \
     -H "Idempotency-Key: ingest-medrxiv-2025.08.07.25333034" \
     -d '{"externalIdType": "medrxivId", "id": "2025.08.07.25333034"}'

Response

SUCCESS (201) WITH PRE-SIGNED DOWNLOAD
{
  "status": "completed",
  "work": WorkCoreSchema,
  "data": {
    "sections": [
      /* present only if expand=sections */
    ],
    "blocks": [
      /* present only if expand=blocks */
    ],
    "assets": [
      /* present only if expand=assets */
    ],
    "citations": [
      /* present only if expand=citations */
    ]
  },
  "download": {
    "mode": "presigned",
    "url": "https://s3.drylab...X-Amz-Signature=...",
    "expires_in": 300
  }
}
SUCCESS (201), SAVED ARTIFACT
{
  "status": "completed",
  "work": WorkCoreSchema,
  "data": {
    "sections": [
      /* present only if expand=sections */
    ],
    "blocks": [
      /* present only if expand=blocks */
    ],
    "assets": [
      /* present only if expand=assets */
    ],
    "citations": [
      /* present only if expand=citations */
    ]
  },
  "download": {
    "mode": "saved",
    "path": "artifacts/2025-08-07.min.xml.gz",
    "size": 123456,
    "content_type": "application/gzip"
  }
}
PROCESSING (202)
{
    "status":"processing",
    "idempotency_key":"batch-4",
    "job_id":"9c56e23d-6e51-41b4-bed0-29ef1711aedd",
    "created_at":"2025-10-20T00:54:30.915Z"
}
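A client has to branch on the three response shapes shown above. The following sketch does that with a hypothetical helper, `describe_ingest_response`; it assumes the `download` block is absent when no `download` parameter was passed, and it treats any other status/body combination as unexpected.

```python
def describe_ingest_response(http_status, body):
    """Summarize a single-ingest response as a short human-readable string."""
    if http_status == 202 and body.get("status") == "processing":
        # The job is still running; job_id identifies it.
        return f"processing as job {body['job_id']}"
    if http_status == 201 and body.get("status") == "completed":
        download = body.get("download")
        if download is None:
            return "completed, no artifact requested"
        if download["mode"] == "presigned":
            return f"completed, artifact URL valid for {download['expires_in']} s"
        if download["mode"] == "saved":
            return f"completed, artifact saved to {download['path']}"
    raise ValueError(f"unexpected response: {http_status} {body.get('status')!r}")


processing = describe_ingest_response(
    202,
    {"status": "processing", "job_id": "9c56e23d-6e51-41b4-bed0-29ef1711aedd"},
)
```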

Batch ingest works

Ingests multiple works in parallel. Requires an idempotency key.

POST https://api.drylab.bio/v1/works/ingest
curl -X POST https://api.drylab.bio/v1/works/ingest \
     -H "Authorization: Bearer <YOUR_API_KEY>" \
     -H "Content-Type: application/json" \
     -H "Idempotency-Key: batch-ingest-medrxiv-2025.08.07.25333034" \
     -d '{
           "works": [
             {"externalIdType": "medrxivId", "id": "2025.08.07.25333034"},
             {"externalIdType": "biorxivId", "id": "2025.07.30.25330817"}
           ]
         }'

Headers

| Header | Type | Description |
| --- | --- | --- |
| Idempotency-Key | string | Required. Unique key for request deduplication |
| Content-Type | string | Must be application/json |

Request body

| Field | Type | Description |
| --- | --- | --- |
| works | array | Array of work ingestion requests, "work objects" (1-100 items) |

Each "work object":

| Field | Type | Description |
| --- | --- | --- |
| externalIdType | string | Identifier type. A list of those can be found here. |
| id | string | Identifier value matching the given type |
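Because `works` accepts between 1 and 100 items, a longer list has to be split across several requests. A minimal sketch of that chunking follows; `batch_payloads` is a hypothetical helper, and giving each chunk its own idempotency key is an assumption rather than documented behavior.

```python
import json

# Upper bound on the works array, from the request body table.
MAX_BATCH_SIZE = 100


def batch_payloads(works, batch_size=MAX_BATCH_SIZE):
    """Yield JSON request bodies, each holding at most batch_size work objects."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError("batch_size must be between 1 and 100")
    for start in range(0, len(works), batch_size):
        yield json.dumps({"works": works[start:start + batch_size]})


# 250 placeholder work objects, purely for illustration.
works = [{"externalIdType": "medrxivId", "id": f"example-{n}"} for n in range(250)]
payloads = list(batch_payloads(works))
sizes = [len(json.loads(p)["works"]) for p in payloads]
```

An empty input list yields no payloads, so no request with an empty `works` array is ever produced.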

Query parameters

Content expansion

| Parameter | Type | Description |
| --- | --- | --- |
| expand | string | Comma-separated fields to expand (sections, blocks, assets, citations, all) |

Response

status reflects the overall batch outcome:

| Status | Description |
| --- | --- |
| completed | All works processed successfully |
| partial | Some works succeeded, some failed |
| failed | All works failed to process |
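The mapping in this table can be written as a short function. This sketch assumes per-entry statuses are the strings "success" and "error", as in the response examples on this page; `overall_status` is a hypothetical name, and the server computes the real value.

```python
def overall_status(entry_statuses):
    """Derive the batch-level status from the per-entry statuses."""
    if not entry_statuses:
        raise ValueError("a batch contains at least one work")
    successes = sum(1 for status in entry_statuses if status == "success")
    if successes == len(entry_statuses):
        return "completed"
    if successes == 0:
        return "failed"
    return "partial"
```

A client could use this to cross-check a response, or to predict the status to expect after retrying only the failed entries.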

Each entry in results keeps its own status:

  • success entries carry the canonical work, optional data, and optional download.

  • error entries drop the work fields and include an error object instead.

  • The download block appears only when you pass the download query parameter; its mode is "saved" (with path, size, and content_type) when you use saveTo, or "presigned" (with url and expires_in) otherwise.

  • The data block appears only when expansions were requested using the expand query parameter.

SUCCESS (201)
{
  "status": "completed",
  "results": [
    {
      "index": 0,
      "status": "success",
      "work": {
        /* canonical WorkCoreSchema payload */
      },
      "data": {
        "sections": [ /* present only if expand=sections */ ],
        "blocks":   [ /* present only if expand=blocks   */ ],
        "assets":   [ /* present only if expand=assets   */ ],
        "citations":[ /* present only if expand=citations*/ ]
      },
      "download": {
        "mode": "saved",
        "path": "tmp/ingest/work-7334329443_v1_raw.xml.gz",
        "size": 123456,
        "content_type": "application/gzip"
      }
      // ...or, when presigned: { "mode": "presigned", "url": "...", "expires_in": 900 }
    }
  ]
}
PARTIAL SUCCESS (201)
{
  "status": "partial",
  "results": [
    {
      "index": 0,
      "status": "success",
      "work": {
        /* canonical WorkCoreSchema payload */
      },
      "data": {
        "sections": [ /* present only if expand=sections */ ],
        "blocks":   [ /* present only if expand=blocks   */ ],
        "assets":   [ /* present only if expand=assets   */ ],
        "citations":[ /* present only if expand=citations*/ ]
      },
      "download": {
        "mode": "saved",
        "path": "tmp/ingest/work-7334329443_v1_raw.xml.gz",
        "size": 123456,
        "content_type": "application/gzip"
      }
      // ...or, when presigned: { "mode": "presigned", "url": "...", "expires_in": 900 }
    },
    {
      "index": 1,
      "status": "error",
      "error": {
        "code": "work_not_found",
        "message": "Unable to locate work for external identifier.",
        "details": {
          "externalIdType": "doi",
          "id": "10.0000/example"
        }
      }
    }
  ]
}
PROCESSING (202)
{
    "status":"processing",
    "idempotency_key":"batch-4",
    "job_id":"9c56e23d-6e51-41b4-bed0-29ef1711aedd",
    "created_at":"2025-10-20T00:54:30.915Z"
}
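For a partial batch, a client usually wants the failed entries separated out so they can be retried or logged. The sketch below assumes that `index` refers to the position of the work in the request's `works` array, which the examples suggest but do not state; both function names are hypothetical.

```python
def split_batch_results(body):
    """Return (succeeded, failed) result entries from a batch response body."""
    succeeded = [r for r in body["results"] if r["status"] == "success"]
    failed = [r for r in body["results"] if r["status"] == "error"]
    return succeeded, failed


def retry_candidates(request_works, body):
    """Map failed entries back to the work objects that were submitted."""
    _, failed = split_batch_results(body)
    return [request_works[entry["index"]] for entry in failed]


# Abbreviated request and response, shaped like the PARTIAL SUCCESS example.
submitted = [
    {"externalIdType": "medrxivId", "id": "2025.08.07.25333034"},
    {"externalIdType": "doi", "id": "10.0000/example"},
]
response_body = {
    "status": "partial",
    "results": [
        {"index": 0, "status": "success", "work": {}},
        {"index": 1, "status": "error", "error": {"code": "work_not_found"}},
    ],
}
to_retry = retry_candidates(submitted, response_body)
```

Whether a retry makes sense depends on the error code; a `work_not_found` entry, for example, is unlikely to succeed on a second attempt with the same identifier.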
