Design and implement a knowledge graph-backed pipeline for extracting insights from unstructured documents. Develop and maintain a scalable and reliable system for turning messy inputs into structured knowledge. Collaborate with a senior full-stack engineer to integrate the knowledge graph with the product's read path.
Job Description
Pinnipedia is a new Berlin startup building a cloud platform that automates and assists the creation of audit-ready IT-security concepts (e.g., BSI-Grundschutz, C5). We’re IGP-funded (2025/26) and co-develop with FU Berlin and pilot users from industry and security consulting.
We’re hiring an AI Engineer to turn messy inputs into structured knowledge and reliable answers.
Your Mission: Own the end-to-end pipeline that turns unstructured documents into a validated, queryable knowledge graph. You are accountable for extraction quality, graph integrity, and the data layer that backs the product's read path.
Tasks
• LLM extraction pipelines: document chunking, property and relationship extraction, cross-chunk reconciliation, gap detection. Built with structured-output LLM agents orchestrated by durable workflows.
• Knowledge graph: schema design as typed Pydantic models, Cypher access patterns and indexing strategy, graph operations, schema evolution and migration. Scope ends at the graph boundary: API contracts and query abstractions exposed to consumers belong to the full-stack engineer.
• Deterministic rule engines: table-driven evaluators for cases where code beats LLM judgment; clear contracts between deterministic and probabilistic components.
• Data validation & quality: schema enforcement, required-property contracts, audit trails, eval harnesses (expert review, unsupervised checks, synthetic fixtures, LLM-as-judge).
• Live data ops: backfills, coordinated migrations across relational + graph stores, observability on extraction throughput and quality, incident response.
Existing team boundaries
A senior full-stack engineer already owns FastAPI architecture, infrastructure, auth, the public API surface, and the query abstractions / repositories exposed to product code.
This role owns everything from “document arrives” to “validated facts in the graph,” including the Cypher / graph access patterns and indexes those repositories sit on top of. It does not own:
• HTTP APIs, request/response schemas, or API versioning
• Application-level repository or query-builder abstractions
• Auth, infrastructure, deployment, or the FastAPI surface
Where the two roles meet (e.g. a new graph capability needs a new repository method), they collaborate: the AI/KG engineer specifies the graph contract and access pattern, and the full-stack engineer owns how it’s exposed.
Requirements
Must-have
- 5+ years shipping data/AI systems to production with real customers; you have been on call for live pipelines and know what breaks at 2am.
- Strong Python (typed, modern) and SQL. Comfortable with PostgreSQL under load.
- Production experience with at least one graph database (Neo4j preferred; Neptune, ArangoDB, TigerGraph acceptable): schema design and query tuning, not toy use.
- Production LLM pipeline experience: structured output, agent orchestration, prompt and version management, evaluation frameworks. PydanticAI, LangChain, DSPy, or Instructor all welcome.
- Durable workflow orchestration in production (DBOS, Temporal, Airflow, Prefect, Dagster).
- Test-first discipline: integration tests against real datastores (Testcontainers or equivalent), not mock-heavy unit tests.
- Fluent English skills.
Nice-to-have
- Experience with regulated, compliance-driven, or standards-heavy extraction domains (legal, medical, financial, security/audit).
- Designed deterministic evaluators alongside LLM components and knows when to reach for which.
- Contributions to data contracts, schema governance, or ontology work.
- German language skills.
Benefits
Remote, full-time with flexible scheduling. CET (Berlin) timezone availability expected.
Relocation is possible once a successful working relationship has been established over a period of time.
Competitive salary: 32.000–42.000 € base (premium for exceptional senior profiles).
Small, focused team; direct collaboration with the Product Owner and Full-Stack Engineer.
Modern tooling, real ownership, and a learning budget for role-relevant training.
Impact: help SMEs meet rising security requirements with less friction.
Apply on JOIN with your CV (PDF) and a short note (max 200 words) describing how you would design a KG-backed RAG pipeline (ontology scope, indexing, retrieval, and evaluation you’d use).
Process: 20-min intro → 90-min practical (graph modeling + retrieval evaluation) → 45-min team chat → references. We review applications within 5 business days.