Senior Site Reliability Engineer, AI Infrastructure

SambaNova Systems

Software Engineering, Other Engineering, Data Science
Bengaluru, Karnataka, India
Posted on Tuesday, April 25, 2023

The third era of AI has arrived, powered by Generative AI. Generative AI is achieving step-function increases in scale, versatility, and accuracy compared to legacy AI technologies, presenting an opportunity for organizations to fundamentally transform their business and operations.

SambaNova Suite™ is enabling organizations and enterprises to achieve the transformative promise of these new AI technologies with a fully integrated hardware-software system that delivers innovation across the full AI stack, including the most accurate generative AI models, optimized for enterprise and government. This creates the AI backbone for the next 10 years and beyond.

Working at SambaNova

This role presents a unique opportunity to shape the future of AI and the value it can unlock across every aspect of an organization’s business and operations. The Service Operations team at SambaNova Systems is responsible for building and operating the platform and infrastructure that enables us to deliver our groundbreaking capabilities to enterprise customers.

Job Description

SambaNova is hiring a Senior Site Reliability Engineer, AI Infrastructure, who will lead key system engineering and automation functions, enhancing our ability to provide a reliable and scalable service for customers in a hybrid deployment pattern.

This individual will:

  • Assume broad responsibility for the successful delivery of SambaNova services in a hybrid model, including but not limited to deployment, configuration, integrations, and ongoing operations
  • Administer systems and applications for multiple customer-facing production environments (hosted and on-premises), with a continued focus on improving efficiency, availability, and supportability
  • Take ownership of ongoing updates, upgrades, and patches on customer environments
  • Lead efforts to triage, debug, and fix issues related to network, storage, scheduling, applications, and systems, covering proactive and reactive incident resolution and root cause analysis
  • Augment ongoing efforts to design and develop automation for deployments, updates, and upgrades of the entire SambaNova software stack
  • Build the systems and tools for centralized command and control of distributed environments
  • Partner with product and engineering teams to improve the security posture and operational readiness of our systems, with the flexibility to integrate into unique customer environments
  • Participate in on-call rotation responsibilities

Basic Qualifications

  • Bachelor's and/or Master's degree in CS/EE or a related field
  • 5+ years of hands-on experience as an SRE with a focus on systems and infrastructure for cloud/SaaS production requirements
  • Extensive experience building, configuring, securing, and administering Linux systems in large-scale production environments
  • Strong scripting/programming skills (Python preferred) and experience with automated deployment systems, e.g. Ansible, Terraform, etc.
  • Systematic problem-solving approach to troubleshooting, and the desire to solve the root cause of common problems in 24x7 environments

Additional Required Qualifications

  • Deep understanding of DNS, DHCP, LDAP, NFS, Kerberos, PAM, PXE, SNMP, SSH, HTTP/S, and NTP, and of troubleshooting network performance issues
  • Knowledge of software development processes and methods, CI/CD pipelines, and experience with common version control software
  • Knowledge of virtualization, multiple hypervisor technologies, and Kubernetes cluster administration and management
  • Past experience deploying and managing systems and infrastructure in data centers, with the ability to debug and resolve recurring hardware issues
  • Experience deploying applications and managing infrastructure in public clouds (AWS, Azure, GCP)
  • Experience with monitoring and logging systems and the ability to identify new technologies as appropriate
  • Configuration and maintenance of web servers, load balancers, databases, storage systems, and messaging systems
  • A passion to design for high availability and scale, with the discipline and desire for extensive automation
  • Strong communication skills, with the ability and willingness to work with diverse teams and customers across multiple time zones

Preferred Qualifications

  • Experience working in a high-growth startup
  • A team player who demonstrates humility
  • Action-oriented with a focus on speed & results
  • Ability to thrive in a no-boundaries culture & make an impact on innovation

Submission Guidelines

Please note that in order to be considered an applicant for any position at SambaNova Systems you must submit an application form for each position for which you believe you are qualified.

If you are a new, recent (within the last two years), or upcoming college graduate and are interested in opportunities with SambaNova Systems, please apply through our University job listings.

EEO Policy

SambaNova Systems is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to age (40 and over), color, disability, gender identity, genetic information, marital status, military or veteran status, national origin/ancestry, race, religion, creed, sex (including pregnancy, childbirth, breastfeeding), sexual orientation, or any other applicable status protected by federal, state, or local laws.

Customers turn to SambaNova to quickly deploy state-of-the-art AI capabilities to meet the demands of the AI-enabled world. Our purpose-built enterprise-scale AI platform is the technology backbone for the next generation of AI computing. We enable customers to unlock the valuable business insights trapped in their data. Our flagship offering, SambaNova Suite™, provides the most accurate generative AI models, optimized for enterprise and government organizations, deployed on-premises or in the cloud, and adapted with an organization’s data for greater accuracy.