Posted Mar 17

You love solving hard problems involving scale, performance, and availability. You are passionate about designing and building elegant, innovative solutions, and are excited about working in the cloud computing space. You thrive on making major contributions alongside a top-notch team to create products that have a positive impact on everyone who uses them.

Metal Toad is looking for a talented and keen technologist to join our Managed Services team. You'll take on an eclectic mixture of tasks, software, and systems, and will help design, build, and maintain the core infrastructure. The ideal candidate is self-motivated, learns new skills while improving existing ones, and maintains a positive, "can do" attitude. This candidate will be passionate about open source software, agile practices, and DevOps values.

We work primarily on AWS, but are vendor-neutral – Metal Toad manages installations on a variety of public and private clouds.

Responsibilities

Analyzes customer requirements for software components, system availability, security, and performance.

Designs and documents complete cloud hosting systems, including capacity planning, software and instance type selection, allocation, and network design.

Estimates costs of recommended system design.

Builds systems by executing installation, configuration, and testing of cloud resources.

Uses automation and configuration management to ensure repeatability and traceability of changes.

Protects integrity and security of systems by developing access controls and monitoring tools, and provides written evaluations and recommendations for ongoing improvement.

Troubleshoots system hardware, software, networks, and operating systems.

Contributes to definition of best practices, operational policies, and procedures.

Maintains system performance through system monitoring and analysis, performance tuning, and planning for future growth.

Designs and runs load and stress tests, documents outcomes, debugs infrastructure issues, and escalates documented application problems to the development team.

Establishes, documents, and tests disaster recovery procedures, documents outcomes, and makes recommendations for ongoing improvement.

Maintains internal systems and customer deployment documentation.

Partners with project managers, technical consultants, software architects, and developers to ensure infrastructure deliverables are validated against the requirements, and all technical hand-offs are documented.

Participates in a 24x7 on-call rotation (approximately one week per month).

Responds to support tickets and incidents in a timely manner, corresponding to SLA commitments.

Updates job knowledge by participating in educational opportunities, reading professional publications, maintaining personal networks, and participating in professional organizations.

Technologies we use

Linux (CentOS/RedHat, Ubuntu)

Amazon Web Services

EC2 Auto Scaling, RDS, ElastiCache, ELB, S3, EFS, Route 53, CloudFront, CloudFormation, CodeDeploy, Lambda

Secondary cloud providers: Azure, Rackspace, KVM on-premise

Varnish, CloudFlare, Akamai

Puppet and Chef

Python, Ruby, Bash

Git

Vagrant

Capistrano

Jenkins

JMeter

Monitoring tools (Stackdriver, Pingdom, New Relic, AppNeta, PagerDuty, Kibana, logentries)

Jira

We host projects utilizing LAMP, Solr, Tomcat, Python, Node.js, and are open-minded about others.

Qualifications

Experience with the majority of in-use technologies above

Configuration management experience (Puppet or Chef)

Expert Unix administration knowledge

Scripting language experience (Python, Ruby, Bash)

Deep knowledge of TCP/IP and HTTP protocols, and browser dev tools

Experience with web accelerators, load balancers, and reverse proxies

Problem solver and willing to work in an agile/fast-paced environment

Good communication skills and a customer-oriented approach

Willing to respond to system problems during off hours and on weekends if necessary

Work Environment



The noise level in the work environment is usually moderate.