What you'll do
Build and operate the data pipelines that pull data from Cape.io’s departmental databases, product platforms, and third-party systems (NetSuite, HubSpot, Zendesk) into a unified platform.
Establish the data quality and deduplication patterns that make the platform trustworthy: entity resolution across overlapping source systems, and clear provenance for every dataset.
Set up lineage and cataloguing so that any dataset consumed downstream is traceable and documented.
Partner with our AI/ML engineers to shape how data is structured, versioned, and served to AI agents and models.
Partner with product engineers to model and integrate data from Cape.io’s product platforms.
Use AI-assisted tooling (LLM code generation, schema mapping, validation) as part of how you work. We expect this, not just permit it.
What you'll bring
3–5 years building production data pipelines. It doesn't need to have been at massive scale, but it does need to have been on systems real people depended on.
Hands-on experience across the full modern data stack: orchestration (Airflow, Dagster, or Prefect), transformation (dbt), and warehousing (Snowflake, BigQuery, or Databricks). You don’t need to be an expert in all three, but you need to have worked with each.
Solid Python and SQL. You write code other people will need to read.
Experience pulling data from SaaS and business systems (NetSuite, HubSpot, Zendesk, Salesforce, or equivalent) via APIs, connectors, or event streams.
Hands-on experience with AWS or GCP. We use both.
A demonstrable habit of using AI tools (Cursor, Claude, Copilot, or similar) in your day-to-day engineering work.
A visible learning footprint we can look at: GitHub, a blog, a talk, or projects you can point us to.
Eligible to work in the UK or the Netherlands. We do not sponsor visas.
Strong written and verbal communication. You’ll be holding the central data layer together, which means explaining it to people who don’t think in pipelines.
(bonus) Media, advertising, or adtech experience, especially with the fragmented data typical of media operations across markets.
(bonus) Experience with MDM, data lineage, or cataloguing tooling (OpenLineage, DataHub, Atlan, or similar) in a serious production role.
(bonus) Understanding of how AI systems consume data: vector databases, embeddings, feature stores, and RAG patterns.
(bonus) Real-time or event-driven architectures (Kafka, Pub/Sub, streaming pipelines).
(bonus) Familiarity with media-specific systems (traffic, clearance and compliance, creative asset management, campaign management).
About the data platform
The challenge: Cape.io operates in 100+ countries. Our data lives across departmental databases, product platforms (creative automation, compliance-and-clearance, distribution), and third-party systems like NetSuite, HubSpot, and Zendesk. Right now, nobody owns unifying it.
Our solution: We’re building the first proper data platform at Cape.io from the ground up — one that ingests data from dozens of sources, deduplicates and governs it, keeps a clear provenance chain, and makes the result accessible to the AI agents, analytics workflows, and downstream teams that depend on it.
Your impact: You’ll be the only person on the central data layer, partnering with our AI/ML engineers on how the data is consumed and with product engineers on the product-side data. This is an individual contributor role today, with a credible path to a lead position if the team grows. The media and advertising industry is characterised by exceptionally noisy, fragmented, and inconsistent data across systems and markets; experience with that complexity is a bonus.
Location
London, Amsterdam, or Tilburg
Employment
Full-time
Team
Product