Posted November 24, 2020

Senior Site Reliability Engineer

Episerver

USA Remote Full Time

This job is being posted to Silicon Florist because it is potentially open to remote candidates. Feel free to contact me if you’d like to learn about Episerver before applying. I am a PDX-based remote Episerver employee. [email protected]

We start the new chapter of Episerver as we proudly join forces with Optimizely to create a new wave of digital leaders through transforming digital experience creation and optimization. Episerver is consistently ranked as a market leader in digital experience creation, supporting the digital journeys of 9,000+ global brands, while Optimizely is the world’s leader in experience optimization. Combined, these two powerhouses create the most advanced digital experience platform in the industry. The combination of creation and optimization will enable companies across all segments and industries to take advantage of what content, commerce, personalization and experimentation can bring to their business and to their customers.

The scale of our product has created tremendous potential for growth with Episerver + Optimizely – growth of teams, growth of influence, and growth of personal careers. If you are looking to work on the next generation of digital technologies in a fast-paced, hyper-growth environment, apply! We’re just getting started...

Episerver Engineering Operations is a rapidly growing part of the organization. We are in the process of building our teams, tools and systems as part of our mission to build the leading digital experience platform.

We enable Episerver to go fast by providing real-time feedback on production systems. We work side by side with the product family and platform developers to maintain and improve services and performance. We live the company values (Dependable, Collaborative and Simple) with a strong customer focus and a healthy sense of urgency. We are a heavily data-driven team, using a variety of data collection, enrichment, analytics and visualisation techniques to learn about our complex systems.

We also live the 'Play, as a team' value by having a strong focus on sharing learning experiences from the front line with the development teams. So, the options for people in the team are vast. If you like mastering a domain and going deep, we need you. If you can juggle three tasks and coordinate multiple people in the heat of an incident, we need you. If you love the benefits of process and methodical improvement, you will love it here. If you want to keep your head down, headphones on and bash out code to support the team, we have a spot for you too.

As an SRE in one of our teams, you will work to enhance the availability, performance and stability of Episerver services and to automate away repetitive work.

You'll also respond to pings, pages and alerts to investigate issues in our products that you can really sink your teeth into. You'll be working on non-production and production environments, monitoring, data collection and configuration management, as well as disaster recovery planning, capacity engineering, reliability improvement initiatives and platform automation.

As a Senior Site Reliability Engineer you will:

Serve as a level 3 support resource for the systems you are responsible for
Troubleshoot and resolve end-user issues independently and efficiently
Build knowledge base around common production support issues
Troubleshoot and fix the system when it breaks
Reduce the impact of errors and automate repetitive tasks
Maintain services once they are live by measuring and monitoring availability, latency and overall system health
Author and maintain documentation for related processes, procedures and system events
Identify areas of improvement within our systems and perform enhancements
Share the responsibility of being on-call
Engage in the entire lifecycle of services, from inception through operation and continuous integration
Lead incident triage, analysis, and resolution
Drive root cause analysis and the completion of corrective actions to help eliminate service disruptions and improve the day-to-day operations of the organization

About you:

Expert-level troubleshooting skills across different levels of the stack
Scripting and software development in one or more programming languages (PowerShell / Bash / Python)
Good understanding of cloud architecture in both Windows- and Linux-based systems
At least 2 years of hands-on experience with cloud infrastructure such as Azure or AWS
Deep expertise in monitoring distributed systems and application architectures
Exposure to and maintenance of CI/CD and orchestration tools at scale (Azure Automation, Octopus Deploy, Salt, Puppet, Chef, etc.)
Diagnosing and troubleshooting user-facing service outages
Exposure to system- and application-level telemetry for large distributed cloud architectures
Diagnosing and resolving problems in high-throughput web applications and network services

We would be very excited if you have experience with:

Elasticsearch
Understanding of ITIL terminology for incident and problem management
Git
Kubernetes
Azure DevOps
Azure Active Directory

About us

Agile Product and Software Development
Consistently demonstrate our security values by being ISO 27001 certified and GDPR compliant
International work environment, collaborating with colleagues in the USA, UK, Germany, Vietnam and more countries
Permanent contract
Very bright and generous office space in the heart of Stockholm
The ability to shape your work through your curiosity and expertise
Regular team/company events and lunch gatherings to celebrate work anniversaries, acknowledgement awards, and to cheer the wins
Monthly hack days when you work on your own project, whatever that is...

This listing expired on Jan 08. Applications are no longer accepted.
