Data Engineer for a fast-growing Legal AI startup building intelligent systems to process millions of court records. Owns the caselaw and docket data layer end-to-end. Requires 4–8 years of experience building production data pipelines.
Job Description
Role: Data Engineer | Legal AI
Location: Remote (US), working Eastern Time hours
Compensation: $180K–$220K + Equity
I'm partnering with a fast-growing Legal AI startup on a genuinely exciting hire — they're building intelligent systems that process millions of court records to power AI-driven legal strategy, and they need a talented Data Engineer to own the foundation it all runs on.
The Role
This is a high-ownership, high-impact position. You'll own the caselaw and docket data layer end-to-end — building and maintaining production pipelines that ingest, normalize, and enrich data from PACER, NYSCEF, state court systems, and published opinions. Every downstream product surface depends on what you build.
What You'll Be Doing
- Own production pipelines ingesting from PACER, NYSCEF, and state courts at scale across millions of court records
- Wrangle messy semi-structured inputs (PDFs, scanned filings, XML/HTML) into clean, queryable structures
- Design LLM-assisted extraction workflows using Claude, Gemini, or OpenAI to turn unstructured legal text into reliable structured signal
- Collaborate closely with full-stack engineers feeding judicial behavioral intelligence and AI strategy layers
- Run statistical analyses, optimize SQL queries, and leverage RAG and semantic search tooling (Pinecone, Voyage AI)
- Operate and optimize the AWS data stack (S3, RDS) within SOC 2 architecture guardrails
What They're Looking For
- 4–8 years of experience building production data pipelines
- Prior fluency with legal data (PACER, NYSCEF, court records) is a significant advantage, as it will save months of ramp-up time
- Strong command of Python, SQL, PostgreSQL, MongoDB, dbt, and NLP/RAG tooling
- Comfortable working in an AWS environment (S3, RDS)
- Remote US-based with strong preference for NYC / Eastern Time (4+ hours daily overlap required)
Compensation
- $180,000 – $220,000 base salary
- 0.15% – 0.25% equity
- Remote (US) — NYC preferred
- H1B transfers and TNs only — no new visa sponsorship