We are seeking a Principal Data Engineer with deep Spark expertise to architect and scale the data backbone behind cutting-edge AI-driven systems. The successful candidate will design and evolve distributed, cloud-based data infrastructure and build high-performance data pipelines. This is a remote opportunity with a VC-backed conversational AI scale-up.
Job Description
Principal Data Engineer
Remote, USA
A VC-backed conversational AI scale-up is expanding its engineering team and is looking for a Principal Data Engineer with deep Spark expertise to help architect and scale the data backbone behind cutting-edge AI-driven systems.
What You'll Do
- Design and evolve distributed, cloud-based data infrastructure that supports both real-time and batch processing at scale.
- Build high-performance data pipelines that power analytics, AI/ML workloads, and integrations with third-party platforms.
- Champion data reliability, quality, and observability, introducing automation and monitoring across pipelines.
- Collaborate closely with engineering, product, and AI teams to deliver data solutions for business-critical initiatives.
What We're Looking For
- 5+ years of experience in software development and data engineering, including ownership of production-grade systems.
- Proven expertise in Spark/PySpark.
- Strong knowledge of distributed computing and modern data modeling approaches.
- Solid programming skills in Python, with an emphasis on clean, maintainable code.
- Hands-on experience with SQL and NoSQL databases (e.g., PostgreSQL, DynamoDB, Cassandra).
- Excellent communicator who can influence and partner across teams.
Bonus Points
- Experience in high-growth, early-stage environments.
- Familiarity with MLOps and deploying ML models into production data workflows.
- A problem-solver at heart, excited by innovation and complex challenges.
This role is fully remote, with great equity and the chance to join a rocketship. If you'd like to find out more, don't hesitate to apply!