Building a Zero-Code DICOM Ingestion Pipeline with Orthanc

The Real Problem: Ingestion Is Not the Product

In medical imaging platforms, ingestion is often treated as a prerequisite rather than a core problem. Yet teams repeatedly invest months building custom pipelines just to move DICOM files from point A to point B.

A typical from-scratch ingestion pipeline ends up requiring filesystem watchers or DICOM SCP services, custom parsing logic that can tolerate malformed objects, vendor-specific edge-case handling, and explicit metadata extraction and normalization. On top of that, teams must design and migrate database schemas, implement idempotency and retry mechanisms, and continuously evolve the system as new modalities and workflows are introduced.

The result is a growing body of glue code that delivers no direct research or clinical value but demands constant attention.

The fundamental mistake is assuming ingestion must be implemented, when in reality it should be configured.

Reframing Ingestion as Configuration Rather Than Code

Orthanc changes the ingestion problem by shifting it out of the application layer and into configuration. Instead of writing software that understands DICOM, teams rely on a mature ingestion engine that already handles validation, parsing, normalization, and persistence.

Orthanc is an open-source, lightweight DICOM server and imaging middleware widely used in clinical, research, and vendor-neutral imaging workflows to receive, store, index, and expose medical imaging data.

The engineering task becomes one of describing how data should flow, not implementing how it flows.

When Orthanc is configured as the front door for imaging data, it assumes responsibility for receiving DICOM objects, organizing them according to the DICOM hierarchy, extracting metadata, and maintaining a consistent internal representation. This removes the need for custom parsing logic and shrinks the surface area for ingestion-related failures. Because behavior is defined by configuration rather than by per-source code paths, the same rules keep applying as data volumes grow or upstream sources change.

This approach is particularly valuable in research and clinical environments, where data heterogeneity is the norm rather than the exception.

Proposed Pipeline Architecture

In practice, the ingestion pipeline built around Orthanc is intentionally simple. DICOM data is acquired from upstream systems through whatever mechanisms are appropriate to the environment, such as PACS exports, treatment planning systems, or vendor deliveries. Rather than streaming directly into a custom service, these files are placed into a designated filesystem staging area that serves as the ingestion boundary.

Orthanc is configured to continuously scan this staging directory. As new files appear, Orthanc automatically reads them, validates them as DICOM objects, and parses their contents. Each object is classified into its appropriate position in the DICOM hierarchy, ensuring that patients, studies, series, and instances are consistently represented. Once parsed, metadata is written directly into the database backend.
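Upstream jobs that want a cheap sanity check before dropping files into the staging area could test for the DICOM file marker. The following is a minimal sketch, assuming files follow the DICOM Part 10 file format (a 128-byte preamble followed by the ASCII marker DICM). The function name is illustrative, and this is not a description of how Orthanc itself validates objects:

```python
PREAMBLE_LENGTH = 128   # Part 10 files start with a 128-byte preamble
MAGIC = b"DICM"         # followed by this four-byte marker


def looks_like_dicom_part10(path):
    """Return True if the file carries the Part 10 preamble and DICM marker.

    This only inspects the first 132 bytes; it does not parse the dataset,
    so a True result means "plausibly DICOM", not "valid DICOM".
    """
    with open(path, "rb") as handle:
        header = handle.read(PREAMBLE_LENGTH + len(MAGIC))
    if len(header) < PREAMBLE_LENGTH + len(MAGIC):
        return False  # too short to contain the marker
    return header[PREAMBLE_LENGTH:] == MAGIC
```

A check like this is optional. Its only purpose is to keep obviously unrelated files (reports, thumbnails, archives) out of the staging directory; the actual validation still happens inside Orthanc.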

From an architectural standpoint, this flow is powerful precisely because it is unremarkable. There are no bespoke ingestion workers, no custom queues, and no application-specific assumptions about data structure. The filesystem acts as a stable interface between acquisition and ingestion, while Orthanc handles everything downstream.

{
  "PostgreSQL": {
    "EnableIndex": true,
    "EnableStorage": false,
    "Host": "postgres",
    "Port": 5432,
    "Database": "DICOMDB",
    "Username": "orthanc",
    "EnableSsl": false
  },

  "Indexer": {
    "Enable": true,
    "Threads": 2,
    "Folders": [
      "/data/incoming"
    ],
    "Interval": 60,
    "Extensions": [ "dcm", "dicom" ]
  },

  "ExtraMainDicomTags": {
    "Patient": [
      "0010,1001",
      "0010,1010"
    ],
    "Study": [
      "0008,1048",
      "0032,4000"
    ],
    "Series": [
      "0018,1030",
      "0020,0052"
    ],
    "Instance": [
      "300A,0002",
      "300A,000A",
      "3004,000E"
    ]
  }
}

What These Configuration Sections Control (and Why They Matter)

The configuration excerpt above is intentionally small, but each section plays a critical role in enabling a zero-code ingestion pipeline. Together, they define where data comes from, how it is indexed, and what metadata is preserved—all without introducing application logic.

The Indexer section tells Orthanc where ingestion begins. Rather than receiving data exclusively over the DICOM network protocol, Orthanc is configured to continuously scan a filesystem directory and treat it as an ingestion source.

By specifying a fixed folder, scan interval, and allowed file extensions, the indexer establishes a clean boundary between data acquisition and ingestion. Any upstream system—whether a PACS export, treatment planning system, or batch delivery process—can populate this directory using standard file transfer mechanisms. Orthanc periodically scans the directory, detects new DICOM objects, validates them, and ingests them exactly once.

This approach avoids tight coupling between upstream systems and the ingestion engine. It also makes re-ingestion trivial: files can be reintroduced into the staging directory without modifying ingestion logic or database state. From an operational standpoint, this is significantly safer and easier to reason about than streaming files directly into a custom service.
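Because the indexer scans on an interval, an upstream job that copies a large file straight into the staging directory risks having a half-written file picked up. One common way to avoid this is to copy under a temporary name and rename once the copy is complete. The sketch below assumes the temporary file and the final file live on the same filesystem (so the rename is atomic on POSIX systems) and that the indexer skips the ".part" suffix, as the Extensions list in the configuration above intends. The helper name and the suffix are illustrative:

```python
import os
import shutil
import uuid
from pathlib import Path


def stage_file(source, staging_dir, final_name):
    """Copy `source` into `staging_dir` so it appears under `final_name` in one step.

    The data is first written to a hidden ".part" file in the same directory,
    then renamed. On the same filesystem, os.replace swaps the name atomically,
    so a scanner never sees a partially written `final_name`.
    """
    staging = Path(staging_dir)
    final_path = staging / final_name
    temp_path = staging / f".{uuid.uuid4().hex}.part"
    try:
        shutil.copyfile(source, temp_path)
        os.replace(temp_path, final_path)
    finally:
        # Clean up the temporary file if the copy or rename failed.
        if temp_path.exists():
            temp_path.unlink()
    return final_path
```

For example, stage_file("/exports/scan001.bin", "/data/incoming", "scan001.dcm") would make scan001.dcm visible only after the full copy has finished; both paths here are hypothetical.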

The PostgreSQL section configures Orthanc to persist its metadata directly into a relational database. When enabled, PostgreSQL becomes the authoritative record of ingestion, rather than a downstream replica or reporting store.

Orthanc manages its own schema, relationships, and consistency guarantees internally. Each ingested DICOM object is mapped into the standard DICOM hierarchy—patients, studies, series, and instances—and written transactionally into the database. This eliminates the need for custom loaders, ORMs, or ETL pipelines whose sole purpose is to translate parsed metadata into relational form.

Importantly, Orthanc’s PostgreSQL backend mirrors the structure of the DICOM standard itself. This alignment ensures that downstream analytics, ETL processes, and research workflows operate on a faithful representation of the original imaging data, rather than a flattened or lossy abstraction.

The ExtraMainDicomTags section addresses one of the most common long-term problems in ingestion systems: deciding which metadata to keep.

By default, Orthanc extracts a core set of DICOM attributes. The ExtraMainDicomTags configuration allows additional tags to be explicitly declared at each level of the DICOM hierarchy—patient, study, series, and instance—without changing ingestion logic or database code.

This is especially important in research and clinical environments, where metadata requirements evolve over time. New modalities, treatment planning fields, or vendor-specific attributes often become relevant months or years after data is first ingested. Because these tags are declared in configuration, metadata capture becomes additive rather than destructive.

The result is a system that preserves flexibility without sacrificing structure. Metadata decisions can evolve independently of ingestion mechanics, and the ingestion pipeline itself remains stable even as research questions change.
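Since the tag list is edited by hand, a typo in a tag is easy to introduce and hard to spot. A small pre-deployment check could catch the obvious cases. This is a sketch under stated assumptions: it only understands the "GGGG,EEEE" hexadecimal form used in the excerpt above, so any other notation (such as symbolic tag names) would be flagged, and the function name is illustrative:

```python
import re

# Group and element as four hex digits each, separated by a comma.
TAG_PATTERN = re.compile(r"[0-9A-Fa-f]{4},[0-9A-Fa-f]{4}")
LEVELS = ("Patient", "Study", "Series", "Instance")


def find_invalid_tags(extra_main_dicom_tags):
    """Return a list of (level, tag) pairs that look malformed.

    An unknown level is reported as (level, None). The check is purely
    syntactic; it does not verify that a tag exists in the DICOM dictionary.
    """
    problems = []
    for level, tags in extra_main_dicom_tags.items():
        if level not in LEVELS:
            problems.append((level, None))
            continue
        for tag in tags:
            if not isinstance(tag, str) or not TAG_PATTERN.fullmatch(tag):
                problems.append((level, tag))
    return problems
```

Running this against the parsed ExtraMainDicomTags section in a CI step keeps configuration changes reviewable without touching the ingestion path itself.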

This design cleanly decouples ingestion from data acquisition. Failures remain localized and observable, and raw data is never modified in place, which preserves auditability.

Direct Persistence into PostgreSQL Without Loader Code

One of the most significant advantages of using Orthanc as an ingestion engine is its native support for PostgreSQL as a metadata backend. With this configuration, Orthanc writes canonical records directly into PostgreSQL, creating a durable and queryable master index of all ingested data.

This eliminates the need for intermediary loaders, ORMs, or ETL scripts whose sole purpose is to translate parsed DICOM metadata into relational form. Orthanc manages schema creation, referential integrity, and consistency internally, and it does so in a way that mirrors the structure of the DICOM standard itself. Patients, studies, series, and instances are preserved as first-class entities rather than being flattened prematurely.

As a result, PostgreSQL becomes more than a storage target; it becomes the authoritative record of ingestion. Downstream systems can rely on it without worrying about partial writes, duplicate records, or mismatches between file state and database state.

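Because the index follows the DICOM hierarchy, downstream analytics and ETL can work with patients, studies, series, and instances without first reconstructing those relationships.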
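One way a downstream consumer might read the index without depending on table layouts is Orthanc's REST API, which exposes a /tools/find endpoint that accepts a resource level and a tag query. The sketch below uses only the Python standard library. The base URL is an assumption (it depends on how the HTTP port is configured and published), authentication is omitted, and the helper names are illustrative:

```python
import json
import urllib.request

LEVELS = ("Patient", "Study", "Series", "Instance")


def build_find_request(level, query, expand=True):
    """Build the JSON body for a POST to /tools/find.

    `level` selects which tier of the DICOM hierarchy to return;
    `query` maps DICOM tag names to the values to match.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    return {"Level": level, "Query": dict(query), "Expand": expand}


def find_resources(base_url, payload, timeout=10.0):
    """Send the query to Orthanc and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/tools/find",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical usage, assuming Orthanc is reachable at this address:
# studies = find_resources(
#     "http://localhost:9110",
#     build_find_request("Study", {"PatientID": "12345"}),
# )
```

Keeping the payload construction separate from the HTTP call makes the query logic easy to test without a running server.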

Capturing Rich Metadata Without Committing to a Rigid Schema

A common failure mode in custom ingestion systems is committing too early to a fixed metadata schema. Teams decide in advance which tags are important, only to discover months later that critical information was discarded during ingestion. Recovering that data often requires reprocessing entire datasets, assuming the raw files are still available.

Orthanc avoids this problem by allowing additional DICOM tags to be declared explicitly in configuration. These tags are extracted and stored alongside the core metadata without requiring any changes to ingestion logic. This makes metadata capture an additive, low-risk operation rather than a disruptive engineering task.

In practice, this enables ingestion pipelines to evolve alongside research needs. New modalities, new treatment planning fields, or new clinical attributes can be incorporated simply by updating configuration, while the ingestion engine itself remains unchanged. The result is a system that preserves flexibility without sacrificing structure.

Operational Stability and Performance by Design

Beyond metadata handling, Orthanc’s configuration addresses operational concerns that are often underestimated in early pipeline designs. Concurrency limits, background job scheduling, indexing behavior, and storage management are all handled within the ingestion engine itself. This reduces the likelihood of subtle race conditions, partial ingestion states, or performance regressions under load.

Because these concerns are solved once within Orthanc, they do not need to be rediscovered and reimplemented in every environment. The ingestion pipeline behaves predictably even as data volume increases, which is essential in clinical and research settings where reliability matters more than raw throughput.

Containerized Deployment with Clear Boundaries

Deploying Orthanc as a containerized service further reinforces the separation between ingestion infrastructure and downstream systems. Orthanc runs as a self-contained service with well-defined network exposure, persistent storage, and access to the staging directory. The staging directory itself can be mounted read-only, ensuring that ingestion never modifies raw data.

This deployment model supports reproducibility across environments, simplifies security reviews, and makes operational responsibilities explicit. Most importantly, it ensures that ingestion behavior is controlled by configuration rather than environment-specific assumptions baked into code.

services:
  orthanc_dicom_ingestion_pipeline:
    image: jodogne/orthanc-plugins:latest
    container_name: orthanc_dicom_ingestion_pipeline_container
    command: Orthanc /etc/orthanc/orthanc.json
    ports:
      # Assumes orthanc.json sets HttpPort to 9110 (Orthanc's default HTTP port is 8042)
      - "9110:9110"
    volumes:
      - ./config.json:/etc/orthanc/orthanc.json   # Orthanc configuration
      - ./db:/var/lib/orthanc/db                   # Orthanc working data
      - /mydata:/data/incoming:ro                  # staging directory, mounted read-only
    networks:
      - my_data_lake

networks:
  my_data_lake:
    external: true
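Once the container is up, a deployment script may want to confirm that Orthanc is actually answering before upstream jobs start populating the staging directory. The sketch below polls Orthanc's /system endpoint, which returns a JSON document that includes a Version field. It assumes the HTTP interface is reachable on the port published in the Compose file and that no credentials are required; if authentication is enabled the probe will simply report "not ready". The function names are illustrative:

```python
import json
import time
import urllib.error
import urllib.request


def orthanc_is_up(base_url, timeout=2.0):
    """Return True if GET /system answers with a JSON body containing "Version"."""
    try:
        with urllib.request.urlopen(f"{base_url}/system", timeout=timeout) as response:
            return response.status == 200 and "Version" in json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return False


def wait_until(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `probe` up to `attempts` times, pausing `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe():
            return True
        sleep(delay)
    return False


# Hypothetical usage after `docker compose up -d`:
# ready = wait_until(lambda: orthanc_is_up("http://localhost:9110"))
```

The probe is passed in as a callable so the retry loop stays independent of Orthanc and can be exercised without a running container.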

When and Why to Introduce Custom Logic

Orthanc supports Lua scripting for cases where ingestion behavior truly needs to be customized, such as conditional routing, anonymization, or advanced metadata handling. However, this capability should be viewed as an extension mechanism rather than a starting point. The ingestion pipeline described here functions entirely without custom code, and that is a feature rather than a limitation.

Custom logic should be introduced only when there is a clearly articulated need, and even then, it can be isolated and reasoned about independently from the core ingestion path. This keeps the system understandable and maintainable over time.

Closing Thoughts

Building a DICOM ingestion pipeline from scratch is an expensive and unnecessary detour. Orthanc provides a mature, scalable ingestion engine that replaces months of custom development with a small number of well-considered configuration decisions.

When ingestion is declarative, reliable, and boring, engineering teams can focus on the problems that actually matter.

Vinith Raj

Engineering reliable systems that help people do meaningful work.