Senior Platform Engineer - (NPW)
Milestone Technologies, Inc.
Hybrid
Remote, IN
Full Time
Job Overview
We’re hiring a Platform Developer to build core services and integrations for an AI-first, cloud-native platform. You’ll implement orchestration services (plan/route/trace), develop typed connector SDKs to external systems and models, and instrument robust observability. The ideal candidate is a hands-on engineer who writes clean, reliable code, understands distributed-systems trade-offs, and can optimize for latency, resiliency, and cost.
Key Responsibilities
- Orchestration Services: Build and harden services for planning, routing, retries/timeouts, idempotency, and call-graph tracing; ensure graceful degradation and fault isolation.
- Connector/Adapter Engineering: Develop typed adapters (REST/gRPC, MCP-style or similar) for AI engines, vector/graph/SQL stores, and in-silico tools; handle auth, pagination, rate limits, and backoff.
- Integration Delivery: Stand up at least one AI engine integration and one in-silico model runner; capture run provenance and metrics; provide mocks for offline testing.
- Observability: Instrument tracing/metrics/logging (OpenTelemetry or equivalent), correlation IDs, and structured logs; expose health/readiness endpoints and SLO dashboards.
- Performance & Caching: Optimize P50/P95 latency, concurrency, and throughput; implement response/result caches, connection pooling, and backpressure strategies.
- Quality & CI/CD: Write unit/integration/contract tests; maintain API schemas; participate in PR reviews; automate builds and environment promotion via CI/CD and IaC.
- Security & Compliance: Implement least-privilege access, secrets management, and audit logging; follow secure coding standards and dependency scanning.
- Documentation & Support: Produce developer docs and runbooks; support incident triage, root-cause analysis, and post-mortems.
Required Skills
- Backend Engineering: 4-8+ years building production services (microservices/event-driven) in Python and/or TypeScript/Node.js.
- API & Contracts: Strong REST/gRPC design, OpenAPI/JSON Schema, idempotency keys, error taxonomies, pagination, and versioning.
- Distributed Systems: Retries with jitter, exponential backoff, circuit breakers, bulkheads, DLQs/queues (Kafka, SQS, Pub/Sub, RabbitMQ), and concurrency control.
- Data & Integrations: Experience integrating with vector DBs (e.g., pgvector, Pinecone, Weaviate, Milvus), graph DBs (RDF/SPARQL or property graph/Gremlin), SQL/warehouse, and search (OpenSearch/Elasticsearch).
- Observability: OpenTelemetry (traces/metrics/logs), log aggregation, alerting, and performance profiling.
- Cloud & DevOps: Containers/Kubernetes, Terraform/CloudFormation, CI/CD (GitHub Actions/Azure DevOps/Jenkins), secrets management (Vault/KMS), and environment promotion practices.
- Security by Design: OAuth2/OIDC, SSO integration, RBAC/ABAC, secure coding, and audit logging fundamentals.
- Testing Discipline: Unit/integration/e2e/contract testing; mocks/stubs; test data management.
Good To Have Skills
- AI/LLM Integrations: Function/tool-calling patterns, model routing, embedding services, and safe tool-use guardrails.
- Performance Tuning: Async I/O, connection reuse, profiling (pprof/py-spy), and cache strategy design.
- Resilience & Networking: Service mesh (Istio/Linkerd), API gateways, rate-limit governance, and zero-trust networking basics.
- Data Quality & Lineage: Basic familiarity with data validation, provenance, and lineage capture for pipelines.
- FinOps Awareness: Token/run-time cost tracking, cost-per-request dashboards, and budget enforcement hooks.
- Docs & DX: Developer portal contributions, code examples/SDKs, and CLI tooling to improve developer experience.
This role is ideal for a pragmatic engineer who enjoys building reliable platform primitives and integrations that other teams can depend on: secure, observable, and fast.
October 8, 2025