Seamless Integration, Exceptional Results

Python Developer

Andela

About the job

About Andela

Andela exists to connect brilliance and opportunity. Since 2014, we have been dedicated to breaking down global barriers and accelerating the future of work for both technologists and organizations around the world.

For technologists, Andela offers competitive long-term career opportunities with leading organizations, access to a global community of professionals, and education opportunities with leading technology providers.

For companies, Andela provides access to a global network of fully integrated team members who unlock their business innovation and growth potential.

At Andela, we are deeply passionate about creating long-lasting and transformative growth opportunities for all and doing it in an E.P.I.C. way (andela.com/careers).

We are excited to continue building our remote-first team with incredible people like you!

About the role

The role focuses on building scalable data ingestion pipelines and extracting structured content from complex, often unstructured documents, especially PDF reports, scanned documents, and technical drawings. You will play a key part in enabling the GenAI application to access and reason over new data sources.

This is a backend-focused role, with responsibilities centered on content extraction and processing. While exposure to GenAI technologies is beneficial, the primary requirement is deep hands-on experience with PDF/document processing.

Responsibilities

  • Design and implement robust data extraction pipelines to process diverse document types, especially PDFs with both text and scanned content.
  • Customize extraction logic per data source, including metadata extraction (e.g., machine IDs, customer information).
  • Work with document processing tools like Tesseract, Unstructured IO, or similar.
  • Integrate with AWS-based infrastructure, including Lambda and ECS for deployment.
  • Collaborate with a cross-functional team to onboard and validate new data sources.
  • Ensure the high accuracy and quality of extracted data to support downstream GenAI use.

Qualifications

  • 5–10 years of professional experience with Python, especially in backend or data engineering roles.
  • Strong hands-on experience with document content extraction, particularly from PDFs with complex formats (e.g., scanned images, drawings).
  • Familiarity with OCR tools (e.g., Tesseract) and content extraction libraries (e.g., Unstructured IO, pdfminer).
  • Proficient in building modular, production-grade Python code with data models and validation (e.g., Pydantic).
  • Working knowledge of AWS services, especially Lambda, ECS, and containerization with Docker.
  • Ability to quickly understand new data structures and design custom ingestion strategies.

Preferred Qualifications

  • Prior experience working on GenAI or LLM-powered applications, especially in document understanding or search contexts.
  • Experience with AWS Textract or Azure Document Intelligence for cloud-based content extraction.
  • Familiarity with chunking strategies and data preparation for vector databases (e.g., for retrieval-augmented generation).
  • Experience in fast-paced, deadline-driven projects and ability to deliver with minimal supervision.
  • Comfortable working in globally distributed teams, with flexibility to align with European time zones.

Overlap Hours: 5-8 hours with CET (UTC+2)

At Andela, we outcompete through diversity. We know that our strengths lie in the multiplicity of talents, perspectives, backgrounds, and orientations of residents in our community and we take pride in that. Andela is committed to a work environment in which all individuals are treated with respect and dignity. Each individual has the right to work in a professional atmosphere that promotes equal employment opportunities and prohibits discriminatory practices. Andela provides equal employment opportunities and workplace to all employees and applicants without regard to factors including but not limited to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, pregnancy (including breastfeeding), genetic information, HIV/AIDS or any other medical status, family or parental status, marital status, amnesty or status as a covered veteran in accordance with applicable federal, state and local laws. This commitment applies to all terms and conditions of employment, including but not limited to hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training. Our policies expressly prohibit any form of harassment and/or discrimination as stated above.

Andela is home for all, come as you are.
