Foresite Labs (Stealth Co)

Staff Engineer, CI/CD & Cloud Infrastructure

Posted about 2 months ago

Office | San Diego | SE | 175k - 185k USD

Location: San Diego, CA

Job Type: Full-Time

Salary Range: $175,000 - $185,000

Position Overview

We are looking for a Staff CI/CD & Cloud Infrastructure Engineer to own and evolve our build pipelines, deployment workflows, and cloud infrastructure. You will be responsible for ensuring that software — spanning Python, C/C++, and CUDA on Linux — is built, tested, versioned, and deployed reliably across both AWS cloud environments and a fleet of complex embedded instruments operated in our central lab facility.

This is a senior hands-on role for an engineer who thrives at the intersection of DevOps automation, cloud infrastructure management, and release engineering. You will design and maintain CI/CD pipelines, manage complex AWS infrastructure as code, and ensure full traceability from source commits through builds, tests, artifacts, and deployments. You will work cross-functionally with firmware, application, and HPC engineers to keep the entire delivery pipeline fast, reliable, and observable.

Key Responsibilities

CI/CD & Build Engineering

  • Design, build, and maintain CI/CD pipelines using GitHub Actions or similar platforms

  • Manage build systems for Python, C/C++, and CUDA codebases on Linux

  • Integrate build tools (CMake, Make, pip, setuptools) into automated pipelines

  • Implement robust versioning, tagging, and artifact management strategies

  • Ensure full traceability of builds, test results, and artifacts from commit to deployment

  • Manage Docker-based build environments including base images, caching, and reproducibility

  • Maintain and optimize build performance, parallelism, and reliability

Cloud Infrastructure (AWS)

  • Architect and manage complex AWS infrastructure including:

    • IAM roles, policies, and access management

    • Storage services (S3, EBS, EFS) with tiered lifecycle policies

    • Databases (RDS, DynamoDB, or similar) with backup and failover strategies

    • Data workflow and pipeline engines (Step Functions, Airflow, or similar)

    • Compute services (EC2, ECS, EKS, Lambda) scaled to workload requirements

  • Implement infrastructure as code using Terraform

  • Manage Kubernetes clusters and Helm charts for containerized workloads

  • Design for scalability, high availability, and disaster recovery

  • Manage cost optimization, resource tagging, and infrastructure governance

  • Support multi-account and multi-region strategies as needed

  • Familiarity with Azure and GCP for secondary or hybrid requirements

On-Premises HPC & Hybrid Infrastructure

  • Provision, configure, and manage on-premises Linux HPC nodes used for secondary and tertiary data processing

  • Define infrastructure-as-code (Terraform, Ansible, or similar) for reproducible HPC node provisioning and configuration

  • Manage high-speed networking infrastructure between instruments, HPC nodes, and storage (configuration, monitoring, troubleshooting)

  • Implement and manage shared storage systems (NFS, parallel filesystems, or similar) accessible to both local HPC and cloud compute

  • Design and operate hybrid burst-to-cloud infrastructure — provision and manage AWS compute resources that extend local HPC capacity on demand

  • Collaborate with the data pipeline team to ensure infrastructure meets throughput, latency, and reliability requirements

  • Manage OS patching, driver updates, and GPU runtime environments across HPC nodes

  • Monitor HPC cluster health, utilization, and capacity to inform scaling decisions

Experiment Data Management & Pipelines

  • Design and operate data ingestion pipelines for high-volume experiment data from lab instruments

  • Implement tiered storage strategies (hot/warm/cold) to balance accessibility, performance, and cost

  • Deploy and manage search infrastructure (Elasticsearch/OpenSearch) to make experiment data universally discoverable and queryable

  • Build data cataloging and metadata tagging systems so datasets are well-organized and self-describing

  • Integrate visualization tools (Grafana, Kibana, or similar) to enable engineers and scientists to explore and analyze experiment data

  • Design data lifecycle policies including retention, archival, and compliance requirements

  • Ensure data pipelines are reliable, idempotent, and observable with clear error handling and retry logic

  • Work with engineering and science teams to define data schemas, access patterns, and query requirements

Deployment & Release Engineering

  • Own deployment workflows for software delivered to embedded instruments in our central lab

  • Manage release processes for a small number of complex, high-value lab-operated instruments

  • Design deployment strategies that account for rollback, validation, and minimal downtime

  • Coordinate versioned releases across multiple software components and dependencies

  • Maintain parity across development, staging, and production environments

Logging, Observability & Traceability

  • Implement centralized log collection and aggregation across cloud and on-site systems

  • Deploy and manage observability tooling (Prometheus, Grafana, Loki, CloudWatch, or similar)

  • Ensure structured, searchable logging with clear correlation across services

  • Build dashboards and alerting for infrastructure health, pipeline status, and deployment state

  • Establish traceability standards linking builds, tests, artifacts, and deployments

  • Support diagnostics and post-mortem analysis for production incidents

AI-Augmented DevOps

  • Integrate agentic AI tools into CI/CD workflows to automate code review, test generation, and pipeline troubleshooting

  • Evaluate and deploy AI-powered assistants for infrastructure management, incident response, and operational tasks

  • Design guardrails and human-in-the-loop controls for AI-driven automation in production environments

  • Stay current with the rapidly evolving landscape of AI-augmented development and DevOps tooling

  • Champion adoption of agentic AI across engineering workflows to accelerate delivery and improve reliability

Qualifications

Education:

BS/MS in Computer Science or Engineering

Required:

Experience & Technical Skills

  • 7+ years of experience in DevOps, CI/CD, or cloud infrastructure roles

  • Strong, hands-on Linux expertise (administration, debugging, performance tuning)

  • Deep experience designing and operating CI/CD pipelines (GitHub Actions preferred)

  • Proven experience managing complex AWS infrastructure at scale

  • Strong knowledge of Docker including multi-stage builds, registries, and orchestration

  • Experience with infrastructure as code using Terraform

  • Experience with Kubernetes and Helm for container orchestration

  • Solid understanding of versioning strategies, artifact management, and release engineering

  • Experience integrating agentic AI into DevOps workflows and CI/CD pipelines

Programming & Build Systems

  • Proficiency in Python and shell scripting for automation and tooling

  • Ability to read, debug, and build C/C++ and CUDA applications on Linux

  • Experience integrating build systems (CMake, Make) into CI pipelines

  • Familiarity with package management and dependency resolution across languages

Cloud & Infrastructure

  • Deep AWS experience across IAM, networking (VPC, security groups), storage, compute, and database services

  • Experience managing on-premises Linux HPC infrastructure alongside cloud resources

  • Experience designing for high availability, failover, and disaster recovery

  • Experience with data pipeline and workflow orchestration tools (Step Functions, Airflow, or similar)

  • Experience with search and indexing platforms (Elasticsearch, OpenSearch, or similar)

  • Understanding of tiered storage strategies and data lifecycle management

  • Knowledge of cost management, tagging strategies, and infrastructure governance

Observability & Traceability

  • Experience with logging and monitoring stacks (Prometheus, Grafana, Loki, ELK, or CloudWatch)

  • Understanding of build and artifact traceability practices

  • Experience with structured logging and distributed tracing concepts

Preferred:

  • Experience deploying software to embedded or lab-operated instruments

  • Experience with high-speed networking (InfiniBand, RDMA, or 10/25/100GbE) in HPC environments

  • Experience with CUDA build toolchains and GPU-accelerated workloads

  • Familiarity with Azure or GCP in addition to AWS

  • Experience in regulated or reliability-sensitive environments

  • Experience with GitOps workflows and progressive delivery strategies

  • Familiarity with secrets management (Vault, AWS Secrets Manager)


We are an equal opportunity employer. We thrive on diversity and collaboration.

Job details

Workplace: Office
Location: San Diego
Experience: SE
Salary: 175k - 185k USD per year
Foresite Labs (Stealth Co)

Foresite Labs creates companies at the intersection of AI/machine learning and science. We believe AI, generative AI, and data science, when applied with scientific rigor, can accelerate discovery and drive innovations that benefit humanity. We provide the foundation for bold ideas to take shape and accelerate, shaping a better future for all. We offer competitive salaries, excellent benefits, and a flexible work environment where employees learn from top thinkers across multiple disciplines. With headquarters in San Francisco and Boston, we’re building a culture where scientific rigor meets entrepreneurial ambition.

Foresite Labs Values

  • Truth over progression: We follow the science, pursuing ideas that are grounded in data and abandoning them when not supported by the evidence.

  • Take good risks: Our culture values informed risk-taking: good decisions are celebrated even when they result in bad outcomes. Everyone feels safe to contribute ideas and to learn from failure.

  • Single accountable person: The project team lead is accountable for all decisions and for maintaining transparency and information flow within the team; we trust the project teams. The Review Committee unlocks capital and sets directions.

  • Simplicity and Focus: “Companies die from indigestion, not starvation” (Bill Hewlett). We will focus on a few ideas aggressively and minimize all other distractions. Everyone will have a few key goals that have measurable outcomes.

  • Respect and Community: Our employees are our greatest asset; everyone invests in creating an environment of collaboration and respect. We support their careers and career development whether they stay, go to a Labs company, or end up somewhere else.

Employees: 34
Industry: Biotechnology
Headquarters: San Francisco, California
Founded: 2019
Company location: 601 California St, Suite 600, San Francisco, California 94108, US

Key team members

Alex Aravanis MD PhD

Damien Soghoian

Christopher Baldwin

Kylie Reynolds
