Software Engineer (Networking & Telemetry Systems)
Microsoft.com
98k - 209k USD/year
Office
Mountain View, California, United States
Full Time
Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft’s cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy?
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.
Join the Strategic Planning and Architecture (SPARC) team within Microsoft’s Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft’s expanding cloud infrastructure and responsible for powering Microsoft’s “Intelligent Cloud” mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide. We deliver the core infrastructure and foundational technologies for Microsoft's cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams, and Xbox Live.
We are seeking a Software Engineer (Networking & Telemetry Systems) to lead the design and development of scalable networking systems, transports, and telemetry frameworks that support Azure’s AI and data center infrastructure. This role requires deep expertise in systems and network architecture, performance diagnostics, and telemetry engineering, with a focus on building robust observability and debugging capabilities.
Responsibilities
Coding
Learns to review code and helps to review code of others to ensure it meets team standards. Participates in code review processes for self-development, gathers feedback, and learns about coding standards and the team's features. Applies coding patterns and best practices. Learns how and begins to use automated source code analysis tools that are incorporated into the build/development process with minimal supervision. Develops and applies knowledge of debugging tools, tests, logs, telemetry, and other methods to begin supporting efforts to proactively flag issues before they occur for product features in production. Learns to conduct incident retrospectives to identify root causes of problems, and begins to implement repair actions with direct supervision. Grows understanding of and begins to apply least-access principles and uses logging, telemetry, and other appropriate mechanisms with direct supervision to investigate issues while retaining privacy and security. With guidance, learns how and creates and implements code for a product, service, or feature reusing code as applicable. Writes and learns to create code that is extensible and maintainable. Learns about and applies diagnosability, reliability, and maintainability, and understands when the code is ready to be shared and delivered. Applies coding patterns and best practices to write code (e.g., leveraging state-of-the-art generative artificial intelligence [GenAI], approaches to source code organization, naming conventions). With guidance from more experienced colleagues, identifies and escalates blockers or unknowns during the development process, and communicates how they will impact timelines.
Design
Understands proposals and develops an understanding of how to apply them under the technical leadership of others. With managerial guidance, tests and explores various design options for a product/solution feature, outlining strengths and weaknesses of each option.
Produces code to test hypotheses for technical solutions and assists with technical validation efforts. Helps with and participates in the development of design documents that support simple user stories with oversight. Develops an awareness of the current technology landscape. Escalates findings from investigations to team members for design decisions. Learns about the implications of performance, scalability, resiliency, cost of goods sold (COGS), and other requirements and expectations in systems architecture. Begins to uphold Microsoft standards of security, privacy, and other compliance requirements in systems architecture. Develops an understanding of the importance of building solutions that expand upon the work of others. Contributes to the refinement and integration of feedback in product features by escalating findings from analyses to inform decisions regarding the engineering of products. Supports the identification of dependencies, and their incorporation into the development of design documents for a product feature with oversight. Learns and helps to actively identify other teams and technologies to leverage, how they interact, and where their own system or team can support others. Learns about downstream interactions between systems. Collaborates with others to understand and execute a defined test strategy that ensures solution quality and prevents regressions from being introduced into existing code. Assists with executing test plans that incorporate security testing to validate security invariants (including negative cases) as assigned. Builds testable code for a feature under guidance from more experienced peers. Understands the most common types of tests and test strategies that can be done for the code for their feature, and begins to develop an understanding of testing architectures used both across Microsoft and across the industry. Leverages artificial intelligence (AI) tools for test automation with direct managerial oversight.
Engineering Excellence
Learns about and helps to ensure the correct processes are followed to achieve a high degree of security, privacy, safety, and accessibility. Contributes to efforts to check for visible evidence (e.g., audit trail) to demonstrate compliance for product features. Develops understanding of the implications of onboarding new technologies following expectations of compliance at Microsoft. Develops an understanding of global and local regulations for technologies and system applications.
Develops an understanding of and applies security best practices and establishes code invariants to model "security as code," ensuring each layer is independently secure, and minimizing risk with direct supervision. Begins to adopt security standards for clear security code review practices for a product feature that align with design and engineering principles to raise the security hardening for both protections and detections. Supports efforts to incorporate deployment gates on security controls, and scanners for a product feature to prevent regressions and/or vulnerabilities that would have customer impact. Includes required security monitoring to ensure detection of violations with direct guidance.
With direct supervision, contributes to working with relevant security partners to define security promises and security invariants while factoring in attacker/investigator personas for security monitoring and telemetry needs, ensure threat models and premortems validate upstream and downstream assumptions and security invariants, establish security breach drills and security incident response processes (e.g., impact analysis, containment), and ensure that artificial intelligence (AI) safety features are implemented for the AI production systems tied to a product feature.
Works with partner teams to ensure a product feature works well with the components of the partner team with direct supervision, supporting team efforts to ensure proper end-to-end testing, live-site coverage, scalability, performance, and DRI escalation pathways are established before going live. Learns to develop and contribute to automation within production and deployment of a product feature. Runs code in simulated, or other non-production environments to confirm functionality and error-free runtime for products with oversight.
Develops knowledge of and learns to apply best practices to build code based on well-established methods and secure design principles. Learns about customer scaling requirements and application of best practices for meeting scaling needs and performance expectations and security promises. Reviews current developments and proactively seeks new knowledge that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale.
Develops collateral materials for learning and literary sessions used to raise awareness on relevant engineering design principles (e.g., security, testability, performance, scalability, accessibility, product knowledge), with some oversight.
Learns about, shares new ideas, and leverages software developer tools to create, debug, and maintain code for features. Identifies whether open source or internal code is available for addressing coding needs for a set of product features, and reuses it in a responsible manner where applicable with some guidance.
Implement
Reviews work items to increase knowledge of product features in partnership with appropriate stakeholders (e.g., technical program managers) with guidance from more experienced peers. Supports team efforts on breaking down work items into tasks and providing estimation. Escalates issues that might cause a delay. Assists with ensuring required security protections and detection processes are accounted for in planning. Supports team efforts for ensuring project plans adhere to security, privacy, and compliance requirements with direct supervision. Ensures assigned code for a product feature is properly flighted for quicker mitigation of production incidents with managerial oversight. Contributes to calculating capacity for planning, accounting for appropriate failover and backup/restore mechanisms for disaster recovery for a product feature with guidance from more experienced peers. Learns about and begins to make considerations for efficient operation of a product feature after it is live with direct managerial oversight. Supports efforts to establish a rollback plan for a feature as instructed.
Learns about and supports deployment to customers by following the correct measures to push features out to customers. Follows safe change deployment practices (e.g., ensuring that flights are set correctly) for their team to minimize adverse impact to users and other services with managerial guidance.
Learns about and applies best practices for the deployment of features safely with managerial oversight and/or guidance from more experienced peers. Contributes to monitoring dependency status and ensuring that only the latest, secure versions are deployed. Identifies when rollback plans should be enacted for a product feature with direct supervision. Contributes to building deployment infrastructure to allow developers' private builds for a product feature to be tested in a production-like environment.
Reliability and Supportability
Acts as a designated responsible individual (DRI) in monitoring a system/product feature/service for degradation, downtime, or interruptions for simple problems, and recommends actions to restore system/product/service by following the playbook. Escalates more complex problems to other DRIs as to status. Responds within service level agreement (SLA) timeframe. Escalates issues to appropriate owners. Contributes to operations of live site service with direct supervision, following security best practices when responding quickly to mitigate issues while using the minimum required permissions to do so that arise on a rotational, on-call basis. Identifies solutions and mitigations to simple issues impacting performance or functionality of live site services and escalates appropriately. With managerial oversight, leverages and provides recommendations for improvements to troubleshooting guides (TSGs), wikis, test-coverage, and telemetry to make on-call better. Supports team efforts to enable secure operations, security monitoring, and integration with live site investigation activities with direct supervision. Contributes to identifying opportunities (e.g., lunch talks, automation, practices, tools) that can be leveraged to improve the live site experience with guidance from more experienced colleagues.
Learns to contribute to efforts to integrate logging and instrumentation for gathering telemetry data on system behavior such as performance, reliability, availability, usage, and safety mechanisms, and for allowing monitoring and investigating security-related concerns and scenarios for both live and A/B experiments for products, services, and offerings. Begins to support integrating logging and telemetry into the product to allow for monitoring and investigating security-related concerns and scenarios with direct oversight. Learns to contribute to efforts to classify, and analyze data with some oversight on a range of metrics (e.g., health of the system, where bugs might be occurring). Learns how to and begins to leverage telemetry feedback and effectiveness to support the improvement of subsequent monitoring designs with direct guidance from more experienced colleagues. Considers the privacy implications of telemetry code changes, and adding new data points with guidance from more experienced colleagues.
Understand User Requirements
Contributes in partnership with appropriate internal stakeholders (e.g., product manager, privacy/security subject matter expert, technical lead) to understand customer/user requirements for a feature. Contributes to incorporating customer insights into solution fixes with oversight from more experienced peers. Begins to incorporate unwritten requirements, such as appropriate continuous feedback loops that measure actionable, quantitative (e.g., customer value, usage patterns, solution performance) and qualitative (e.g., accessibility, globalization) indicators of value. Develops an understanding of the security and privacy needs of the customer who will be using the feature.
Qualifications
Required/Minimum Qualifications:
- Bachelor's Degree in Computer Science, or related technical discipline with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- One year of experience coding in C, C++, C#, Java, JavaScript, or Python.
- OR equivalent experience.
Other Requirements
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
- Master's Degree in Computer Science or related technical field with proven experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications for the role until September 25, 2025.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
September 19, 2025