AI SDET (Software Development Engineer in Test) - Cloud-Native SaaS Platform

Caseware Colombia
Benefits & Perks
Fully remote position
Relocation package provided
Visa sponsorship available

Job Description


We are at the forefront of AI adoption in our cloud-native SaaS platform, building intelligent, agentic features that transform how users interact with our product. As an AI SDET, you'll pioneer and scale AI-driven testing practices from the ground up, fast-tracking reliable, safe, and high-performing AI capabilities across the organization. You will help reduce deployment risks, minimize hallucinations and drift, ensure ethical AI, and drive faster releases (targeting 20-40% velocity gains through automated validations). This is a high-impact, foundational role in Platform Engineering's Quality function, where your work will directly influence product trust, compliance, and innovation for our end users.


📍 Location: This is a fully remote position located in Colombia.


You will be reporting to:

Jai Joshi


Contact:

Maira Russo - Senior Talent Acquisition Partner

What You’ll Be Doing

Quality & AI-First Mindset

  • Evolve a modern, AI-first quality strategy for our fast-scaling SaaS architecture, including foundational infrastructure and emerging agentic/intelligent systems.
  • Integrate AI enhancements into CI/CD pipelines (e.g., predictive flakiness detection, automated test generation, self-healing scripts) to improve isolation, data setup, and execution reliability, using existing tools or suggesting new ones.
  • Establish scalable testing practices that support hyper-growth and petabyte-scale AI data pipelines.
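As a toy illustration of the predictive-flakiness idea mentioned above, a test can be flagged as flaky by how often its outcome flips across recent CI runs. The function names, data shape, and 0.3 threshold below are illustrative assumptions, not an existing tool's API:

```python
def flake_score(history):
    """Fraction of consecutive runs where the outcome flipped.
    history: list of booleans (True = pass), oldest run first."""
    if len(history) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return flips / (len(history) - 1)

def flaky_tests(runs, threshold=0.3):
    """Return names of tests whose flake score meets the threshold.
    runs: mapping of test name -> pass/fail history."""
    return sorted(name for name, hist in runs.items()
                  if flake_score(hist) >= threshold)

# Simulated CI history: stable tests score 0.0, the alternating one scores 1.0
runs = {
    "test_login":    [True, True, True, True, True],       # stable pass
    "test_checkout": [True, False, True, False, True],     # flips every run
    "test_search":   [False, False, False, False, False],  # stable fail (not flaky)
}
print(flaky_tests(runs))  # -> ['test_checkout']
```

A real implementation would weight recent runs more heavily and account for code changes between runs; this sketch only captures the core signal of outcome instability.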


AI-Focused Test Strategy, Automation & Evaluation

  • Design deterministic and statistical testing approaches for non-deterministic LLM-based and agentic systems, addressing hallucinations, prompt injection, bias, drift, and safety risks.
  • Build automated evaluation pipelines and harnesses for correctness, faithfulness, retrieval quality, generation accuracy, tool-calling, planning sequences, and multi-agent flows.
  • Develop and execute test frameworks for the full AI lifecycle: prompts, datasets, embeddings, model versions, RAG pipelines (end-to-end validation), and guardrails.
  • Implement red-teaming, bias/fairness checks, and compliance mechanisms; leverage current industry frameworks for metrics and observability.
  • Integrate AI-specific quality signals into CI/CD for automated gating and continuous monitoring.
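To make the statistical-gating idea above concrete, here is a minimal, hypothetical sketch: sample a prompt several times, score each output with a placeholder keyword check (a real pipeline would use a faithfulness or correctness metric from a framework such as Ragas or DeepEval), and gate the release on the aggregate pass rate. All names, sample data, and thresholds are illustrative assumptions:

```python
def evaluate_samples(outputs, reference_keywords, pass_threshold=0.8):
    """Score repeated samples of a non-deterministic model.
    A sample 'passes' if it contains every expected keyword (a stand-in
    for a real faithfulness metric). Returns (pass_rate, gate_ok)."""
    passes = [all(kw.lower() in out.lower() for kw in reference_keywords)
              for out in outputs]
    pass_rate = sum(passes) / len(passes)
    return pass_rate, pass_rate >= pass_threshold

# Simulated outputs from sampling the same prompt three times
samples = [
    "The invoice total is $120, due March 1.",
    "Total due: $120 by March 1st.",
    "The invoice total is $120.",  # missing the due date -> fails the check
]
rate, gate_ok = evaluate_samples(samples, ["$120", "March 1"])
print(rate, gate_ok)  # -> 0.666..., False (below the 0.8 threshold)
```

The point of the statistical framing is that a single passing sample proves little for a non-deterministic system; gating on a pass rate over many samples turns flaky model behavior into a measurable, enforceable CI signal.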


Cross-Functional & End-to-End Testing

  • Partner closely with product, data science, AI engineering, and dev teams to test AI features, conduct multi-agent simulations, and ensure high-quality roadmap delivery.
  • Facilitate knowledge sharing and upskilling on AI testing best practices across the Quality Function.


Metrics, Observability & Continuous Improvement

  • Drive core metrics (DORA, test coverage/effectiveness) plus AI-specific indicators (e.g., hallucination rate, context precision, drift detection).
  • Build real-time dashboards and support A/B testing of models with post-deployment monitoring.


Culture, Mentorship & Innovation

  • Champion a quality-first, ethical AI mindset organization-wide.
  • Mentor SDETs, lead workshops on AI risks/validation, and influence design/deploy/incident processes.
  • As a foundational hire, define roadmaps and best practices for sustainable AI quality assurance.


Challenges You'll Tackle

  • Ensuring reliability in agentic systems amid data drift and non-deterministic behavior.
  • Scaling tests for global SaaS while maintaining low hallucination rates and strong safety guardrails.
  • Building evaluation from scratch in a rapidly evolving landscape (e.g., multi-modal, agentic flows).


Success in the First 6 Months

  • Launch foundational AI test frameworks and pipelines, achieving 80-90% coverage for key AI components.
  • Reduce AI-related defect escapes by 30-40% and integrate automated safety/compliance checks into all releases.
  • Establish metrics dashboards and evaluation loops that enable data-driven iteration on intelligent features.


What You Will Bring

  • 7+ years in Quality Engineering/SDET roles within cloud-native SaaS environments, including 2+ years hands-on with AI/ML/LLM systems.
  • Expertise in automated testing infrastructure, CI/CD (Jenkins/GitHub Actions), and test pyramid strategies (unit → E2E).
  • Strong full-stack testing experience (frontend/backend/API) and collaboration with dev teams.
  • Proven experience testing LLMs, AI agents, RAG pipelines, and related risks (hallucinations, prompt injection, bias, drift).
  • Proficiency in JS/TS and working knowledge of Python or Java; experience with AI evaluation frameworks (e.g., Ragas, DeepEval, LangChain/LangSmith/LangFuse) or comparable tools.
  • Knowledge of performance, stress, and load testing tools such as K6, JMeter, or BlazeMeter is nice to have.
  • Knowledge of observability (New Relic), statistical testing methods, red-teaming, and ethical AI practices.
  • Excellent communication and coaching skills; ability to thrive in ambiguity and drive innovation.
  • Bachelor's/Master's in Computer Science, AI, or related; certifications (e.g., ISTQB AI Testing) a plus.
  • Strong English language communication and collaboration skills


We value adaptability in this fast-moving field—equivalent experience and a strong portfolio (e.g., open-source contributions, case studies) are highly regarded.

