How does monitoring work at scale? Monitoring tools, components and mentality at Facebook

At enough scale, everything becomes a problem; at Facebook's scale, everything requires extremely careful design to meet enormous production demands.

In this talk I'll try to cover how Facebook handles one of its biggest challenges: monitoring an environment of enormous scale.

  • What tools & systems are at play
  • How we leverage our monitoring infrastructure to solve production issues
  • How developers and DevOps engineers share the on-call rotation
  • How we treat monitoring as data
  • What we choose to be paged on, and what we don't
  • How Facebook's infrastructure compares to other open-source tools

About the speaker - Ran Leibman

My name is Ran and I'm a Production Engineer in Facebook's Tel Aviv office. I work on multiple world-scale projects, including Internet.org, Onavo, Messenger and more.

In the past I've worked as:

  • DevOps Engineer at Watchdox
  • Linux system administrator at Altair-Semiconductor
  • System Admin at IDF
