Operations Engineer, HPC Network

Related keywords: network engineer remote job, hybrid remote job, mental health remote job


Introduction to CoreWeave

CoreWeave describes itself as the AI Hyperscaler™, providing a cloud platform built to support artificial intelligence workloads. The company serves enterprises and AI labs, particularly through high-performance computing (HPC) systems. Founded in 2017, CoreWeave now operates data centers across the United States and Europe. In 2024, it earned a place on the TIME100 list of most influential companies.

Job Overview

CoreWeave is looking for an Operations Engineer to join its HPC Networking Team. This role focuses on the deployment, monitoring, troubleshooting, and maintenance of large-scale InfiniBand fabrics, which are crucial for running industry-leading AI workloads. The role is likely to suit candidates who thrive in dynamic environments and enjoy solving complex problems.

Responsibilities

The Operations Engineer will take on several key responsibilities, including:

  • Supporting the setup and operational oversight of extensive InfiniBand networks.
  • Regularly monitoring the performance and health of the fabrics, including critical components such as switches and host adapters.
  • Investigating operational issues such as network connectivity problems and performance bottlenecks.
  • Collaborating closely with onsite staff and customer teams for the installation and deployment of InfiniBand systems.
  • Performing routine maintenance and upgrades on essential network components.

These responsibilities reflect the critical nature of the Operations Engineer role, emphasizing the need for a strong operations mindset and the ability to handle complex issues with effective collaboration.

Required Skills

Minimum Qualifications

  • At least 1 year of experience with InfiniBand or similar networking technologies.
  • Solid understanding of fundamental networking concepts, including architectures and troubleshooting methodologies.
  • Familiarity with Linux system administration.
  • Proficiency in at least one scripting language.

Preferred Qualifications

  • Experience with NVIDIA UFM or similar fabric management tools.
  • Knowledge of the SLURM job scheduler in HPC environments.
  • Exposure to monitoring and visualization tools like Grafana or Prometheus.
  • Experience with automation frameworks such as Ansible.
  • Understanding of data center operations, including server racks and cabling.
  • Scripting skills in Python or Bash.

Salary and Benefits

CoreWeave offers a salary range of $90,000 to $110,000 for this Operations Engineer position. The compensation varies based on several factors, including the candidate's experience and performance.

In addition to the salary, CoreWeave provides a comprehensive benefits package designed to support the well-being of its employees. Notable benefits include:

  • Medical, dental, and vision insurance fully covered by CoreWeave.
  • Company-paid life insurance and voluntary supplemental options.
  • Short-term and long-term disability insurance.
  • Health Savings Accounts and Flexible Spending Accounts.
  • Tuition reimbursement and mental wellness resources.
  • Paid parental leave and childcare support.
  • 401(k) plans with employer matching.

CoreWeave also fosters a creative workplace culture that prioritizes innovation and provides a hybrid working environment. This flexibility allows employees to adjust their in-office and remote work schedules based on personal preferences.

Company Culture

CoreWeave promotes a hybrid work culture, encouraging a mix of onsite collaboration and remote flexibility. The company values the creativity and connections fostered within its spaces, emphasizing their importance to business success. CoreWeave also commits to ensuring comprehensive onboarding for remote employees to integrate them effectively into their teams.

CoreWeave’s culture prioritizes inclusivity and equal opportunity, ensuring all qualified candidates receive fair consideration regardless of background or identity. The company actively seeks to accommodate applicants with disabilities, committing to providing necessary support throughout the hiring process.

Closing Thoughts

For job seekers in the tech field, especially those specializing in HPC and networking, this position offers a chance to work on large-scale infrastructure for AI workloads. The role lets candidates tackle significant challenges, contribute to essential projects, and be part of a rapidly evolving team. With its stated commitment to a supportive environment, CoreWeave may be a strong choice for those looking to advance their careers in operations engineering.



This job offer was originally published on himalayas.app

CoreWeave

United States

Operations

Full-time

February 10, 2025

This job offer summary has been generated using automated technology. While we strive for accuracy, it may not always fully capture the nuances and details of the original job posting. We recommend reviewing the complete job listing before making any decisions or applications.