Seamless Integration, Exceptional Results

Python Developer

Andela

About the job

About Andela

Andela exists to connect brilliance and opportunity. Since 2014, we have been dedicated to breaking down global barriers and accelerating the future of work for both technologists and organizations around the world.

For technologists, Andela offers competitive long-term career opportunities with leading organizations, access to a global community of professionals, and education opportunities with leading technology providers.

For companies, Andela provides access to a global network of fully integrated team members who unlock their business innovation and growth potential.

At Andela, we are deeply passionate about creating long-lasting and transformative growth opportunities for all and doing it in an E.P.I.C. way (andela.com/careers).

We are excited to continue building our remote-first team with incredible people like you!

About the role

The role focuses on building scalable data ingestion pipelines and extracting structured content from complex, often unstructured documents, especially PDF reports, scanned documents, and technical drawings. You will play a key part in enabling the GenAI application to access and reason over new data sources.

This is a backend-focused role, with responsibilities centered on content extraction and processing. While exposure to GenAI technologies is beneficial, the primary requirement is deep hands-on experience with PDF/document processing.

Responsibilities

  • Design and implement robust data extraction pipelines to process diverse document types, especially PDFs with both text and scanned content.
  • Customize extraction logic per data source, including metadata extraction (e.g., machine IDs, customer information).
  • Work with document processing tools like Tesseract, Unstructured IO, or similar.
  • Integrate with AWS-based infrastructure, including Lambda and ECS for deployment.
  • Collaborate with a cross-functional team to onboard and validate new data sources.
  • Ensure high accuracy and quality of extracted data to support downstream GenAI use.

Qualifications

  • 5–10 years of professional experience with Python, especially in backend or data engineering roles.
  • Strong hands-on experience with document content extraction, particularly from PDFs with complex formats (e.g., scanned images, drawings).
  • Familiarity with OCR tools (e.g., Tesseract) and content extraction libraries (e.g., Unstructured IO, pdfminer).
  • Proficient in building modular, production-grade Python code with data models and validation (e.g., Pydantic).
  • Working knowledge of AWS services, especially Lambda, ECS, and containerization with Docker.
  • Ability to quickly understand new data structures and design custom ingestion strategies.

Preferred Qualifications

  • Prior experience working on GenAI or LLM-powered applications, especially in document understanding or search contexts.
  • Experience with AWS Textract or Azure Document Intelligence for cloud-based content extraction.
  • Familiarity with chunking strategies and data preparation for vector databases (e.g., for retrieval-augmented generation).
  • Experience in fast-paced, deadline-driven projects and ability to deliver with minimal supervision.
  • Comfortable working in globally distributed teams, with flexibility to align with European time zones.

Overlap Hours: 5-8 hours with Central European Time (CET, UTC+1; CEST, UTC+2 in summer)

At Andela, we outcompete through diversity. We know that our strengths lie in the multiplicity of talents, perspectives, backgrounds, and orientations of residents in our community and we take pride in that. Andela is committed to a work environment in which all individuals are treated with respect and dignity. Each individual has the right to work in a professional atmosphere that promotes equal employment opportunities and prohibits discriminatory practices. Andela provides equal employment opportunities and workplace to all employees and applicants without regard to factors including but not limited to race, color, religion, gender, sexual orientation, gender identity, national origin, age, disability, pregnancy (including breastfeeding), genetic information, HIV/AIDS or any other medical status, family or parental status, marital status, amnesty or status as a covered veteran in accordance with applicable federal, state and local laws. This commitment applies to all terms and conditions of employment, including but not limited to hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training. Our policies expressly prohibit any form of harassment and/or discrimination as stated above.

Andela is home for all, come as you are.
