
Software Engineer II

H1


At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit h1.co to learn more about us.
 
Data Engineering at H1 is responsible for the development and delivery of our most important asset, our data. With thousands of data sources from around the world, the team ensures that data is accurate, normalized, and delivered at a velocity that keeps up with real-world changes. As we expand our markets and the scope of data we provide to our customers, our team must scale to meet that demand.
 
 
WHAT YOU'LL DO AT H1
 
We are hiring a Backend Software Engineer II (Data Harvesting) to help build and scale the systems that power how we collect and process data from the web. This role is ideal for an engineer who has hands-on experience building data pipelines and working with web data, and is looking to grow into deeper ownership of distributed systems and data platforms. You will work closely with senior engineers and cross-functional partners to design, build, and improve systems that capture, process, and deliver high-quality data at scale.

You will: 
- Contribute to building systems and frameworks that capture web data at scale, including working with structured and unstructured data sources
- Design and develop data extraction components using tools such as APIs, scraping frameworks, and parsing logic
- Build and maintain ETL/ELT pipelines using technologies like Apache Spark and cloud platforms (preferably AWS)
- Write clean, efficient Python code to support data ingestion, transformation, and processing workflows
- Help improve the reliability and performance of data pipelines through monitoring, debugging, and optimization
- Work with senior engineers to enhance systems that handle data quality and normalization, large-scale data ingestion, and pipeline scalability
- Troubleshoot issues related to data inconsistencies, pipeline failures, and source data changes (e.g., website structure updates)
- Collaborate with product, data, and engineering teams to ensure data is usable and aligned with business needs
- Contribute to documentation and participate in code reviews to support engineering best practices

ABOUT YOU
 
- You are an early-to-mid-level engineer who has built data pipelines or backend systems and is eager to deepen your expertise in large-scale data systems and web data processing
- You enjoy solving complex data problems and working with real-world, messy datasets
- You are comfortable writing production-level code and debugging systems
- You are collaborative and open to feedback, with a strong desire to learn from senior engineers
- You have a strong foundation in data structures, system design fundamentals, and backend development
- You are interested in working on systems that interact with external data sources (e.g., APIs, web data)
 
REQUIREMENTS 
 
- 3–5 years of professional experience in backend or data engineering
- Strong proficiency in Python
- Experience working with large-scale data ingestion systems
- Experience building and maintaining data pipelines or backend services
- Familiarity with web data extraction concepts, such as APIs, web scraping (Selenium, Playwright, or similar), and handling structured and unstructured data
- Strong SQL skills (PostgreSQL or similar databases)
- Experience with Apache Spark or similar data processing frameworks
- Experience working in an AWS cloud environment
- Familiarity with Docker or other containerization tools

Nice to Have: 

- Exposure to web scraping at scale, including challenges like rate limiting or dynamic content
- Familiarity with Airflow, Argo, or other orchestration tools
- Basic understanding of HTTP/HTTPS and web protocols
- Exposure to LLMs or NLP-based data extraction workflows

Working Hours

- This role is fully integrated with our global team and requires daily collaboration with US-based engineers.
- Working hours: 1:00 PM – 9:00 PM IST (Monday–Friday)
- These hours ensure strong overlap with our US teams and enable real-time collaboration.

Don't meet all the requirements but still feel you'd be a great fit? Tell us how you can contribute to our team in a cover letter!

H1 OFFERS
- Full suite of health insurance options, in addition to generous paid time off
- Pre-planned company-wide wellness holidays
- Retirement options
- Health & charitable donation stipends
- Impactful Business Resource Groups
- Flexible work hours & the opportunity to work from anywhere
- The opportunity to work with leading biotech and life sciences companies in an innovative industry with a mission to improve healthcare around the globe
 

Job details

Workplace

Hybrid

Location

India Remote


H1

Software Development

About

H1 is on a mission to connect the world with the right doctors. H1’s AI-powered platform leverages a continuously learning Doctor Graph to identify and engage the right doctor for critical needs in healthcare. It is built on one of the world’s largest and most comprehensive global datasets of healthcare professionals. By combining deep healthcare data with agentic AI workflows, H1 powers clinical research, medical exchange, patients finding doctors, health plan network analytics, and a system of record for provider data, helping create a healthier future for all. Learn more at h1.co.

Company Details

Employees: 545
Industry: Software Development
Headquarters: New York, NY
Founded: 2017
Company location: New York, NY
Specialties: healthcare, provider data, healthcare professional information, diversity data, medical affairs, pharmaceutical, data science, medical science liaison, MSL education, key opinion leaders, KOL mapping, biotechnology, life sciences, healthcare data, HCPs, SaaS, healthcare content, clinical trials, clinical operations, and global drug development
