Principal Data Engineer
Full-time | Hybrid | Sydney
Reports to: VP of Engineering
About Us
Checkbox is a Series A technology company building AI-native SaaS for in-house legal teams. Our platform helps legal teams manage how work is raised, understood, routed, actioned and resolved across the business.
We are now transforming our SaaS ecosystem into agentic-first products, using AI agents and intelligent workflow automation to change how legal work gets done. To support this, we are investing deeply in the data, context and retrieval foundations that power reliable AI product experiences.
This is a critical next chapter for Checkbox. Our data platform needs to become more than infrastructure. It needs to become a competitive moat.
The Role
We are looking for a Principal Data Engineer / Data Architect to own Checkbox’s data strategy and the architecture that delivers it.
This is a senior technical leadership role reporting directly to the VP of Engineering. You will lead our tier-one Data team, starting with one direct report, while remaining deeply hands-on in the design and delivery of the systems you define.
The Data team exists to serve the product engineering streams building AI agents, so close day-to-day partnership with Product, Engineering, AI and Platform teams is central to the role.
You will own the data and AI reference architecture that underpins our agentic product direction. This includes base data sources, intelligence services, context and retrieval layers, API and MCP surfaces, and the secure data access patterns required to give AI agents compliant, tenant-isolated context.
This is not a role for someone who only wants to design from a distance. We need someone who has built data and context foundations for AI products in production, can make pragmatic architecture decisions, and can lead by building.
What You’ll Own
Data and AI Reference Architecture
Own the data and AI reference architecture across Checkbox’s product ecosystem
Define how base data sources, intelligence services, APIs, MCP surfaces, context engines and retrieval layers fit together
Design patterns that allow product engineering streams to build AI features on shared foundations rather than reinventing data access each time
Establish clear architectural principles for storage, retrieval, serving, tenancy, observability and compliance
Work hand in glove with the Principal Engineer on architecture that spans application, platform, eventing, data and AI systems
Context, Retrieval and AI Data Foundations
Architect and build the context and retrieval layers that power AI agents and GenAI product experiences
Design secure, tenant-isolated data access patterns for AI systems
Define how customer, matter, workflow, document, event and operational data should be modelled, retrieved and served
Evaluate and select the right data patterns for each problem, including transactional stores, operational data stores, warehouses, vector stores, semantic layers, knowledge graphs, event streams and APIs
Make buy-vs-build decisions layer by layer rather than defaulting everything to in-house development
Ensure AI agents receive the right context at the right time, with appropriate controls around access, relevance, latency, cost and compliance
Data Platform, Integrations and Eventing
Own architecture across integrations, eventing, operational data, transactional data and AI-ready data
Define data contracts, event models, ingestion patterns and transformation approaches that scale across multiple products
Partner with product engineering teams to ensure new product capabilities generate usable, reliable and well-governed data
Build shared data capabilities that support analytics, AI agents, workflow automation and customer-facing product experiences
Improve data quality, lineage, observability and reliability across the data lifecycle
Security, Tenancy and Compliance
Treat security, tenancy and data isolation as first-order architectural concerns
Ensure data access patterns are compliant, auditable and appropriate for enterprise customers
Design systems that support tenant isolation, data segregation, least privilege access and secure retrieval
Partner with Platform, Security and Engineering teams on encryption, decryption, access control, auditability and production readiness
Ensure data and context systems can support customer trust, compliance requirements and future audit needs
Team Leadership and Technical Direction
Lead the Data team as a tier-one engineering function reporting to the VP of Engineering
Manage and grow one direct report initially, with scope to shape the team as the function expands
Set technical direction while staying close to implementation
Mentor engineers on data architecture, AI data foundations, retrieval patterns and pragmatic systems design
Build operating rhythms, standards and documentation that help the Data team scale
Act as the senior technical voice for data architecture across engineering leadership discussions
What Success Looks Like
Product engineering streams can build AI features on a dependable, shared data and context layer
The data and AI reference architecture is real, documented, actively used and continuously improved
Storage, retrieval and serving choices genuinely fit the problem rather than forcing every use case through one pattern
Data access is compliant, secure and tenant-isolated by default
AI agents can access relevant context reliably, with clear controls around quality, latency, cost and permissions
Data contracts, event models and retrieval patterns reduce duplication across product teams
The Data team becomes a strategic enabler for AI product development rather than a bottleneck
Checkbox’s data and context capability becomes visibly stronger as a competitive moat
About You
Significant experience as a senior data engineer, principal data engineer, data architect or staff engineer, or in a similar technical leadership role
Proven experience building data and context foundations that power AI products in production
Strong experience designing data architectures across transactional, operational, analytical and AI-ready systems
Deep understanding of modern data platform patterns, including data contracts, event-driven architecture, ingestion, transformation, observability, lineage and governance
Strong understanding of AI data patterns such as retrieval systems, embeddings, semantic modelling, vector search, knowledge graphs, context engineering or agentic workflows
Experience working with multi-tenant SaaS systems where data segregation, tenancy and access control matter
Ability to make pragmatic architecture decisions and select the right tool or pattern for each problem
Strong judgement around buy-vs-build decisions across data infrastructure, retrieval, orchestration and AI platform layers
Comfortable leading technical direction while still building, reviewing and shipping
Experience mentoring engineers or leading small technical teams
Strong communication skills and ability to work closely with product engineering streams, platform teams and senior engineering leadership
Comfortable operating in a fast-moving environment where systems are being built, scaled and refined at the same time
Bonus Points
Experience working on AI agents, agentic workflows, GenAI platforms or AI-native SaaS products
Experience designing MCP or API surfaces for data and context access
Experience with legal tech, workflow automation, enterprise SaaS or document-heavy products
Experience with AWS-based data and platform infrastructure
Experience with event-driven systems, queues, Pub/Sub patterns or streaming architectures
Experience with data security, compliance, auditability and enterprise customer requirements
Experience growing a small data function from early foundations into a scalable team
What We Offer
Competitive salary
Hybrid working with team days in our Sydney CBD office
Direct reporting line to the VP of Engineering
High ownership over a tier-one engineering function
Opportunity to shape the data and AI architecture behind agentic-first products
Personal learning and development budget
Flexible leave policy
Expense policy and salary sacrifice options
CBD start-up hub with snacks, drinks, premium coffee and team socials
Company-wide social events and annual off-sites
Transparent, flat culture where questions and feedback are welcomed