Senior Software Engineer (Python) - Large-Scale Data Ingestion and Processing

Jobgether • United Arab Emirates
Remote
AI Summary

Jobgether is seeking a Senior Software Engineer (Python) to design and maintain large-scale data ingestion and processing systems. The ideal candidate will have 5+ years of experience in Python development, particularly in web scraping and data pipeline systems at scale. Strong experience working with REST APIs, search and data technologies, and distributed processing frameworks is required.

Key Responsibilities
Design and maintain large-scale data ingestion and processing systems
Develop robust data pipelines to extract, process, and normalize data
Implement preprocessing and transformation logic to support ML/NLP models
Collaborate with ML and data science teams to integrate classification models into production pipelines
Automate workflows using tools such as Apache Airflow and deploy scalable systems using Kubernetes and AWS
Technical Skills Required
Python, REST APIs, ElasticSearch/OpenSearch, Apache Airflow, Spark (EMR), Kubernetes, AWS, Tesseract, PyMuPDF, spaCy, Hugging Face, TensorFlow, PyTorch
Benefits & Perks
Competitive salary
Fully remote flexibility
Comprehensive health coverage
Paid time off
Parental leave
Medical leave
Retirement savings plan
Flexible spending accounts
Health savings accounts
Wellness support
Home office setup support
Continuous learning and professional development support
Nice to Have
Exposure to ML/NLP concepts, LLMs, or frameworks such as spaCy, Hugging Face, TensorFlow, or PyTorch

Job Description


This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Software Engineer (Python) in the United Arab Emirates.

This role is a high-impact opportunity for an experienced backend/data-focused engineer who enjoys building large-scale distributed systems that power data extraction, processing, and search. You will work on complex crawling and ingestion pipelines that collect and structure data from diverse sources such as web pages, APIs, PDFs, and documents. The environment is fast-moving, collaborative, and engineering-driven, with a strong emphasis on ownership, scalability, and problem-solving. You will contribute to the design of robust data architectures that support downstream analytics, classification, and machine learning use cases. The role combines Python engineering, distributed systems, and search technologies in a highly production-oriented setting. You will work closely with ML/NLP teams and infrastructure engineers to deliver reliable, high-performance systems in cloud environments. This is an ideal position for someone who enjoys tackling challenging data problems and building systems that evolve continuously.

Accountabilities

In this role, you will be responsible for designing and maintaining large-scale data ingestion and processing systems that support search and classification capabilities.

  • Design and build distributed web crawling and data extraction systems capable of operating at scale in complex environments.
  • Develop robust data pipelines to extract, process, and normalize data from web pages, APIs, PDFs, and other document formats.
  • Build and maintain systems for unifying heterogeneous data into structured, consistent schemas for downstream use.
  • Implement preprocessing and transformation logic to support ML/NLP models, classification systems, and search indexing.
  • Develop APIs and services that expose structured data through ElasticSearch/OpenSearch.
  • Collaborate with ML and data science teams to integrate classification models into production pipelines.
  • Automate workflows using tools such as Apache Airflow and deploy scalable systems using Kubernetes and AWS.
  • Optimize and scale data processing pipelines using distributed computing frameworks such as Spark (EMR).

Requirements

This role requires strong backend engineering expertise, with deep experience in Python and large-scale data systems.

  • 5+ years of professional experience in Python development, particularly in web scraping and data pipeline systems at scale.
  • Strong experience working with REST APIs and processing structured and unstructured data formats (including PDFs, with extraction and OCR tools like PyMuPDF or Tesseract).
  • Solid understanding of search and data technologies such as ElasticSearch/OpenSearch and relational or NoSQL databases.
  • Hands-on experience with workflow orchestration and distributed processing frameworks such as Apache Airflow and Spark (EMR or equivalent).
  • Strong problem-solving skills, especially in handling anti-scraping mechanisms, scaling challenges, and data complexity.
  • Experience working in cloud environments such as AWS or GCP.
  • Good understanding of system design principles for scalable and resilient backend systems.
  • Familiarity with Kubernetes and containerized deployments is a plus.
  • Exposure to ML/NLP concepts, LLMs, or frameworks such as spaCy, Hugging Face, TensorFlow, or PyTorch is an advantage.

Benefits

  • Competitive salary aligned with senior-level engineering expertise.
  • Fully remote flexibility with distributed team collaboration.
  • Comprehensive health coverage including medical, dental, vision, and prescription plans.
  • Paid time off, parental leave, and medical leave for employees and family care.
  • Retirement savings plan with employer matching contributions.
  • Flexible spending accounts (FSA) and health savings accounts (HSA).
  • Wellness support, including mental health resources and monthly wellness allowances.
  • Home office setup support after one year of employment.
  • Continuous learning and professional development support.
  • Inclusive, collaborative work culture with strong engineering ownership and impact.

How Jobgether Works

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Our system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company. The final decision and next steps (interviews, assessments) are managed by their internal team.

We appreciate your interest and wish you the best!

Why Apply Through Jobgether?

Data Privacy Notice: By submitting your application, you acknowledge that Jobgether will process your personal data to evaluate your candidacy and share relevant information with the hiring employer. This processing is based on legitimate interest and pre-contractual measures under applicable data protection laws (including GDPR). You may exercise your rights (access, rectification, erasure, objection) at any time.

We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.

