Site Reliability Engineering - Berlin

Berlin, Germany

17 hours ago

Full-time, €65,000 - €95,000 (EUR) per year *
* This salary range is an estimate by beBee
Job description

About Us
1GLOBAL is a technology-driven global mobile communications provider, delivering global connectivity solutions to enterprises and consumers. Powered by a best-in-class telecom platform – including a global mobile core network that it owns and operates, fully fledged eSIM technology developed in-house, and an extensive portfolio of telecom licenses – 1GLOBAL operates as a fully regulated telecommunications provider across 40 countries worldwide.

We serve many of the world's leading banks, enterprises, and digital-first businesses, including neo-banks, global fast moving consumer goods companies, travel leaders, and payment service providers. Today, 1GLOBAL connects more than 70 million people and devices globally, enabling our customers to launch, scale, and innovate with confidence in the mobile ecosystem.

1GLOBAL is a profitable, fast-growing business. With full-year revenues in 2025 exceeding US$200 million and profits of over US$25 million, we generate strong cash flows to fund our growth, allowing us to continuously invest in infrastructure, platform innovation, and global expansion. Recent years have marked a defining phase in our journey, with major enterprise and mass-consumer client wins accelerating our evolution into a global mobile connectivity powerhouse, purpose-built to enable consumer brands to launch and succeed in offering telecommunications services to their own clients.

Founded in 2022 by experienced technology entrepreneurs, Hakan Koç and Pyrros Koussios, 1GLOBAL has rapidly emerged as a European technology leader shaping the future of global telecommunications. We operate as a fully regulated Mobile Virtual Network Operator (MVNO) in 12 countries and as a regulated telecom operator in an additional 28 markets. Headquartered in the Netherlands, with world-class R&D hubs in Lisbon, Berlin, and São Paulo, our team of close to 500 experts across 15 countries is united by a single ambition: to redefine global mobile connectivity through technology, scale, and execution excellence.

About the Team

We are looking for a talented Site Reliability Engineering (SRE) Team Lead to join our Technology Department.

As the SRE Team Lead, you will be responsible for ensuring the stability, scalability, and reliability of our global infrastructure and services across both cloud and on-prem environments. You will lead a team of SREs focused on service availability, resilience, and operational excellence, fostering a data-driven reliability culture based on SLIs, SLOs, and error budgets. Your mission will be to proactively identify weaknesses across systems and improve reliability through redundancy testing, automation, and observability. You will build tools and processes to automatically detect, prevent, and recover from incidents, ensuring our services remain reliable and performant for customers around the world.

This role collaborates closely with DevOps, Infrastructure, IP Network, and Security teams to maintain carrier-grade reliability standards across all layers of our platform.

About the Role

  • Lead and mentor a team of Site Reliability Engineers, setting clear priorities, goals, and reliability metrics
  • Define, measure, and maintain SLIs and SLOs for core infrastructure and customer-facing services
  • Plan and execute redundancy and resilience testing across service, infrastructure, and networking layers — validating failover, HA configurations, and disaster recovery readiness
  • Design and implement automated recovery mechanisms, self-healing workflows, and intelligent alerting systems
  • Drive incident response, root-cause analysis, and blameless post-mortems, and ensure the resulting corrective and preventive actions are implemented and tracked for continuous improvement
  • Develop and enhance observability (metrics, logs, traces) using Prometheus, Grafana, Loki, and OpenTelemetry
  • Collaborate with Infrastructure and DevOps teams to ensure deployment safety, rollback policies, and configuration consistency
  • Proactively identify weaknesses through fault-injection, load, and chaos testing
  • Continuously reduce operational toil through automation and reliability tooling
  • Establish on-call practices and improve alert quality, runbooks, escalation procedures, and incident management processes
  • Conduct capacity planning, performance benchmarking, and resilience audits across systems
  • Ensure compliance with security, reliability, and availability standards
  • Create and maintain internal documentation, playbooks, and operational guidelines for peers and users
  • Build and manage cloud cost-optimization frameworks, including reserved capacity planning, autoscaling design, storage tiering, workload right-sizing, and continuous anomaly detection

Requirements

About You

Must-haves

  • A minimum of 7 years of experience in Site Reliability, Systems, or Infrastructure Engineering (including 2+ years in an SRE role and 2+ years in a leadership role)
  • Strong expertise in Linux systems engineering, distributed systems, and networking
  • Proven experience building and running high-availability, mission-critical production systems
  • Hands-on experience with redundancy and failover testing, disaster recovery, and high-availability architecture validation
  • Deep understanding of monitoring, observability, and incident management principles
  • Experience with Prometheus, Grafana, Loki, Thanos, and OpenTelemetry or similar tools
  • Proficiency in Python, Go, and Bash for automation and reliability tooling
  • Strong knowledge of Kubernetes, container orchestration, and service mesh architectures
  • Experience with AWS (EKS, EC2, VPC) and on-premises infrastructure integration
  • Proficiency in Infrastructure as Code tools such as Terraform
  • Understanding of networking fundamentals (routing, load balancing, BGP, DNS, VXLAN, etc.)
  • Excellent analytical and problem-solving skills, capable of leading under pressure
  • Strong communication and collaboration skills across distributed and cross-functional teams


Nice-to-haves

  • Experience in telecom, carrier-grade, or large-scale distributed systems environments
  • Hands-on experience with chaos engineering and automated failure-scenario validation (e.g., simulating link or node failures)
  • Strong understanding of high-availability networking concepts
  • Background in capacity planning, traffic engineering, and multi-region failover
  • Experience building reliability dashboards and integrating SRE metrics into business KPIs or compliance reports
  • Familiarity with security and resilience standards (ISO 27001, NIST SP

Benefits

Why 1GLOBAL?

  • Growth Opportunities: Advance your career in one of the fastest growing telecommunications companies, expanding over 100% year-on-year under the leadership of successful tech entrepreneurs.
  • Major Transaction Exposure: Be in the driver's seat for transactions that will have an impact on the future of the telco industry.
  • Work with a Talented Team: From the Board and the Founders to the Senior Management Team, you will collaborate daily with the most capable and renowned external advisors and be constantly exposed to talented and driven individuals.
  • Dynamic Work Environment: Thrive in a collaborative, fast-paced workplace where innovation is encouraged, and every contribution counts.
  • Professional Development: Work alongside industry experts to enhance your skills and knowledge in a cutting-edge field.
  • International Experience: Gain opportunities to work in different 1GLOBAL offices around the world as you grow within the company.
  • Open Communication Culture: Join a team where your ideas are heard, and open dialogue is encouraged, fostering a supportive and transparent work environment.
  • Get Things Done Attitude: Be part of a results-driven team that values efficiency, creativity, and the drive to make a tangible impact in the industry.


1GLOBAL is an equal opportunity employer. We value your character as much as your talent. Diversity drives our innovation, and we offer a collaborative, dynamic, and international work environment. We are excited for you to join our mission to revolutionise connectivity globally.


