Barracuda

Cloud Site Reliability Staff Developer

Ottawa, ON
October 6, 2025
Application ends: October 31, 2026

Job Description

What You Will Be Working On

  • Application Infrastructure Design: Engage with internal customers to understand application design and cloud infrastructure needs, focusing on scalability, security, and reliability
  • Infrastructure Automation: Create and design templates, tools, and accelerators for deployment infrastructure to support development teams
  • Architectural Leadership: Lead architectural decisions and approve major system design changes, implementing contemporary architectural patterns
  • Platform Development: Design and develop self-service platforms for Product Engineering teams
  • Service Level Management: Define, implement, and track SLIs, SLOs, and SLAs across services
  • Incident Management: Lead incident response processes and conduct post-incident learning reviews
  • Disaster Recovery: Develop and maintain disaster recovery and business continuity plans
  • Technical Design: Plan and implement non-functional requirements, including security, performance, deployment frequency, and monitoring
  • Solution Architecture: Oversee architecture snapshots, solution design, prototyping, and code reviews
  • Technology Stack Implementation: Drive modern solutions using AWS, Kubernetes, GitHub Actions, Jenkins, Terraform, Pulumi, and other current technologies
  • Data Infrastructure: Build support infrastructure for global data pipeline and storage using Databricks, Spark, and ELK stack
  • Deployment Automation: Lead initiatives to convert manual deployments to automated processes
  • Observability Systems: Build and enhance monitoring and reliability systems
  • On-Call Duties: Participate in on-call rotation to ensure 24/7 system reliability

What You Bring To The Role

  • 10+ years of hands-on infrastructure design experience, including 5+ years in cloud development and 3+ years in SRE/DevOps roles
  • Cloud Infrastructure: Deep expertise in AWS cloud infrastructure development, security, and operations with proven success in large-scale production environments
  • Infrastructure as Code: Extensive experience with Terraform, CloudFormation, Pulumi, and Crossplane for cloud infrastructure automation
  • CI/CD & Automation: Strong background with GitHub, GitHub Actions, Jenkins, Packer, Ansible, and Puppet
  • Deployment Patterns: Expertise in blue/green, canary, rolling deployments, and draining strategies
  • Container Orchestration: Comprehensive experience with Docker, Kubernetes, and EKS in AWS environments
  • Programming: Strong coding abilities in languages such as Python, Go, and Ruby
  • Operating Systems: Advanced Linux knowledge including system internals
  • Observability: Extensive experience with New Relic, Elastic APM, CloudWatch, Prometheus, and Grafana…
  • Data Engineering: Experience with Databricks, Apache Spark, Kafka, and DataStage
  • Problem Solving: Strong systematic debugging and troubleshooting capabilities
  • Certifications: AWS certifications (Solutions Architect, DevOps) and Kubernetes certifications (CKA, CKAD, CKS) are a plus