Senior Service Reliability Engineer (SRE)

Sunday, 27 July 2014

Senior Service Reliability Engineer (SRE) | Twitter, Inc. | New York City, NY

Infrastructure Operations | New York City, NY

Senior Service Reliability Engineer (SRE)
Twitter is looking for a very well-rounded, experienced Reliability Engineer to join a team of senior SREs dedicated to improving the reliability of our end-to-end platform. We work on some of the world’s largest distributed systems: our core infrastructure receives hundreds of millions of tweets per day and serves tens of billions of API requests. Our other systems serve over 2 billion search queries per day, render hundreds of millions of ad impressions, and process hundreds of terabytes of log and interaction data daily. This person will dive deep into gnarly operational issues from the programming, systems, automation, and process perspectives. He/she will understand the challenges of rapidly creating, scaling, and managing distributed applications and services, and will be able to work with talented engineers across multiple disciplines to address those challenges.

Responsibilities

Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes
Troubleshoot issues across the entire stack: hardware, software, application and network
Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization
Mentor SREs across the organization on best practices for everything from monitoring to troubleshooting complex code issues
Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services
Participate in code reviews for projects primarily written in Java and Scala, built on open source libraries such as Finagle, and running on both physical and virtualized platforms
Represent the SRE organization in design reviews and operational readiness exercises for new and existing services

Requirements

Solid understanding of systems and application design, including the operational trade-offs of various designs
Strong practical expertise building and supporting event-driven frontend and/or backend systems on JVM (Java and/or Scala)
Practical knowledge of various aspects of service design, including messaging protocols & behavior, caching strategies and software design practices
Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures
Must work well with and be able to influence myriad personalities at all levels
Practical, solid knowledge of shell scripting and at least one scripting language (Python preferred)
Minimum of 7 years managing services in an internet-scale *nix environment

Ability to prioritize tasks and work independently
Must be adaptable and able to focus on the simplest, most efficient & reliable solutions
Track record of successful practical problem solving, excellent written and interpersonal communication, and documentation skills

Desired

Ability to lead technical teams through design and implementation across an organization
Experience with existing open source projects such as Scribe, ZooKeeper, and Apache Mesos
B.S. in computer science or a similar field

https://about.twitter.com/careers/positions?jvi=oMjUYfwH,Job
