Lead Data Engineer Job at Life Science Talent (København og omegn)

Lead Data Engineer

Life Science Talent

Location:
Denmark, København og omegn (Greater Copenhagen area)

Category:
IT - Software Development

Contract Type:
Not provided

Salary:

Not provided

Job Description:

We're building the expert intelligence layer for scientific research: a knowledge graph that connects the world to leading experts based on publications & clinical trials in precise ontologies. You'll design pipelines that ingest millions of life-science records, shaping how scientific knowledge is modelled, enriched, & served in the graph. This is true greenfield work. Your decisions will lay the data foundations for our entire expert intelligence platform.

Job Responsibilities:

Own data end-to-end: design & run data pipelines turning millions of scientific records into a knowledge graph
Implement precision entity resolution & enrichment: disambiguate & enrich experts from noisy data sources
Utilise LLM workflows where they make sense: for entity extraction, relationship inference & quality validation
Develop vector embeddings & semantic search capabilities to power expert discovery & similarity matching
Model life-science entities & relationships: ontologies, author networks, publication & clinical trial metadata
Build graph & vector data access that is performant, accessible, reliable, observable & testable
Move fast & ship value incrementally: done-and-iterating beats perfect-and-pending
Radiate intent & document your thinking openly, collaborating async-first in a hybrid environment
Lead when you're the expert, follow when someone else is, and challenge assumptions when necessary
Use AI as a daily force multiplier across coding, schema design, debugging, optimisation & validation

Requirements:

Graph Databases: Neo4j, ArangoDB, Neptune; schema design, relationship modelling, query optimisation
Python Data Engineering: ETL development; pandas/polars; distributed processing with Spark or Dask
Entity Resolution: Deduplication, merging, enrichment across heterogeneous scientific data sources
AI-Assisted Data Extraction: LLM entity extraction, schema generation & quality validation
Vector Search: Experience with Pinecone, FAISS, Qdrant, or Weaviate; embeddings, hybrid retrieval
Workflow Orchestration: Robust, observable pipelines using Airflow or Dagster
Data Formats & Standards: Parquet, JSONL, RDF/Turtle; selecting formats for graph & semantic use cases
Embedding Models: Understanding of HuggingFace/OpenAI models, dimensionality tradeoffs & cost
Ownership mindset: Treat data & schemas as products powering multiple domains
Strategic evaluation: Choose tech aligned with our scale, latency expectations, & roadmap needs
Process engineering: Build reliable, repeatable & maintainable workflows
Cross-functional communication: Bridge product engineers & scientific domain teams
Comfort with scientific data realities: Deep rabbit holes of sprawling complexity

Nice to have:

Life Sciences familiarity: Publication, clinical trial & institutional data; ontologies (MeSH, SNOMED, Gene Ontology)
Hands-on with scientific datasets: OpenAlex, PubMed/MEDLINE, ORCID, Semantic Scholar, ClinicalTrials.gov

Additional Information:

Job Posted:
January 05, 2026

Employment Type:

Full-time

Work Type:

Hybrid work
