Sunday 27 July 2014

Senior Service Reliability Engineer (SRE) | Twitter, Inc. | New York City, NY

Infrastructure Operations | New York City, NY

Twitter is looking for a well-rounded, experienced Reliability Engineer to join a team of senior SREs dedicated to improving the reliability of our end-to-end platform. We work on some of the world's largest distributed systems: our core infrastructure receives hundreds of millions of tweets per day and serves tens of billions of API requests. Our other systems serve over 2 billion search queries per day, render hundreds of millions of ad impressions, and process hundreds of terabytes of log and interaction data daily. This person will dive deep into gnarly operational issues from the programming, systems, automation, and process perspectives. He/she will understand the challenges of rapidly creating, scaling, and managing distributed applications and services, and will be able to work with talented engineers across multiple disciplines to address those challenges.

Responsibilities

  • Perform deep dives into both systemic and latent reliability issues; partner with software and systems engineers across the organization to produce and roll out fixes
  • Troubleshoot issues across the entire stack: hardware, software, application and network
  • Drive standardization efforts across multiple disciplines and services in conjunction with embedded SREs throughout the organization
  • Mentor SREs across the organization on best practices for everything from monitoring to troubleshooting complex code issues
  • Identify and drive opportunities to improve automation for the company; scope and create automation for deployment, management and visibility of our services
  • Participate in code reviews for projects primarily written in Java and Scala, built on open source libraries such as Finagle, and running on both physical and virtualized platforms
  • Represent the SRE organization in design reviews and operational readiness exercises for new and existing services

Requirements

  • Solid understanding of systems and application design, including the operational trade-offs of various designs
  • Strong practical expertise building and supporting event-driven frontend and/or backend systems on the JVM (Java and/or Scala)
  • Practical knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies, and software design practices
  • Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures
  • Must work well with and be able to influence myriad personalities at all levels
  • Practical, solid knowledge of shell scripting and at least one scripting language (Python preferred)
  • Minimum 7 years of managing services in an internet-scale *nix environment
  • Ability to prioritize tasks and work independently
  • Must be adaptable and able to focus on the simplest, most efficient & reliable solutions
  • Track record of successful practical problem solving, excellent written and interpersonal communication, and documentation skills

Desired

  • Ability to lead technical teams through design and implementation across an organization
  • Experience with existing open source projects such as Scribe, ZooKeeper, and Apache Mesos
  • B.S. in computer science or similar field