Elizabeth A. Nichols
John Willis is Director of Ecosystem Development at Docker, which he joined after the company he co-founded (SocketPlane, which focused on SDN for containers) was acquired by Docker in March 2015. Before founding SocketPlane in the fall of 2014, John was VP of Customer Enablement at Stateless Networks, and prior to that he was Chief DevOps Evangelist at Dell, which he joined following the Enstratius acquisition in May 2013. He has also held executive roles at Opscode/Chef and Gulf Breeze Software.
This presentation surveys a collection of techniques for detecting anomalies in a DevOps environment. Each technique has strengths and weaknesses that are illustrated with real-world (anonymized) customer data. Techniques discussed include deterministic and statistical models as well as univariate and multivariate analytics. Examples give concrete evidence of where each can succeed and where each can fail. This presentation is about concepts and how to think about alternative anomaly detection techniques; it is not an academic discourse on math, statistics, or probability theory.
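To make the deterministic-versus-statistical distinction concrete (this is an illustrative sketch, not the presenters' actual models — the data and the 3-sigma threshold are assumptions), compare a fixed-threshold check with a simple univariate z-score detector:

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Univariate statistical detector: flag points whose z-score
    exceeds the threshold. Assumes the data is roughly normal --
    exactly the kind of assumption that can fail on real operational
    metrics (e.g. multi-modal or seasonal workloads)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

def threshold_anomalies(values, upper):
    """Deterministic detector: flag anything above a fixed bound.
    Simple and predictable, but blind to context -- a metric can be
    'anomalous' well below the bound, or normal well above it."""
    return [i for i, v in enumerate(values) if v > upper]
```

On a flat series with one spike the two agree; they diverge as soon as the baseline drifts, which is where the trade-offs discussed in the talk begin.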
Elizabeth A. Nichols (Betsy) is Chief Data Scientist at Netuitive, Inc. In this role she is responsible for leading the company's vision and technologies for analytics, modeling, and algorithms.
Betsy has applied mathematics and computer technologies to create systems for war gaming, spacecraft mission optimization, industrial process control, supply chain logistics, electronic trading, advertising networks, IT security and risk models, and network and systems management. She has co-founded three companies, all of which delivered analytics to commercial and government enterprises. Betsy graduated with an A.B. from Vassar College and a Ph.D. in Mathematics from Duke University. Check her out on LinkedIn (https://www.linkedin.com/in/elizabethanichols) for more information.
This is the tale of how we iterate on infrastructure, tooling, and product continuously at $startup. It arose from a need for an infrastructure that would allow us to continuously deliver applications; that would be appropriate for distributed systems and a services-oriented architecture; and that facilitated rapidly prototyping, building, and scaling new features. We went from a couple of Ubuntu instances with services running in screen to continuous deployments using Docker, CoreOS, Fleet, and custom tooling (soon to be open sourced). It stars our intrepid heroine, $name, in zir first role as employee #1 at a startup and the entire corpus of Kelsey Hightower's open source contributions as zir trusty sidekick.
It's a tale of experimentation, failure, breaking things, rebuilding things, and finally just running things (with scissors). It runs the gamut of emotions in your typical buddy flick as our stars learn how to run software together. You'll laugh as they try to run Kubernetes in production from day one. You'll cry when they manually rebuild an etcd2 cluster because they have to learn how. You'll be on the edge of your seat when they rebuild an entire AWS infrastructure in 15 minutes. You'll smile when the complete development stack spins up in Docker and no developer ever has to hit production databases again.
The moral of the story: you don't go from zero to Kelsey Hightower overnight. Understand your needs, and use that understanding to describe the ideal tool for the job. Evaluate multiple tools and find the tool or tools with the closest resemblance that allow you to do the most by doing the least. Then experiment until you get it right.
Greg Poirier is the Factotum, Chief Architect, and herder of engineering at Opsee.com. He has done research in network security and artificial intelligence and has extensive career history in systems engineering. When he's not in front of a computer, you can find him in front of a piano or singing.
Our configuration management tools are a key part of building our automated infrastructure and keeping it running. So when caring for the heart of automation operations, it's important to make sure they aren't exposed to dangerous elements. How do we do that? And how does that work in practice, especially when preparing new installs and migrations? My aim is to answer those questions by exploring the setup of a configuration management system with an eye toward security and its principles, including stories about CloudPassage's setup with Enterprise Chef Server and the hurdles faced along the way.
Jamesha Fisher has worked in the tech industry for over 10 years, with a special interest in security. After graduating with a degree in Information Assurance and Security Engineering, she has lent that experience to her career in Operations and Systems Engineering, including at companies like Google and her present company, CloudPassage. In her spare time she's a maker of things musical, things delicious, and objects that use binary numbers.
When you have a slow system, getting meta-information about its contents can be quite frustrating. Whether the system is an API, a database, or a source code management (SCM) system, caching that information can help immensely with performance and with common tasks such as diffing, logging, and retrieving or finding data.
She has used git to create an easy-to-access cache, along with tools that wrap the git functions so users can find what they want and easily see differences between versions. While keeping information in two places can seem inefficient, metadata about "committed" items rarely changes, and when your SCM, API, or database is slow, users can find it very frustrating to locate the information they're looking for. Additionally, these systems aren't usually tuned for quick retrieval and comparison of metadata.
She has also created a website for accessing this data, making it easy for users to see the information in a convenient form. Having the metadata easy to access also means she can run metrics and show how specific pieces of the metadata have changed over time.
Come discover how you can help your users find the information they need without frustration.
Kirsten Hunter is an unapologetic hacker and passionate advocate for the development community. Her technical interests range from graph databases to cloud services, and her experience supporting and evangelizing REST APIs has given her a unique perspective on developer success. In her copious free time she's a gamer, fantasy reader, and all around rabble-rouser. She is currently an API Evangelist at Akamai - best job ever! Code samples, recipes, and philosophical musings can be found on her blog at http://www.princesspolymath.com
Even the best designed systems can and will have outages. No matter how well you've hardened your infrastructure and put in place failover or self-healing automation, something you didn't see coming will wreak havoc in your special snowflake of a system. In many cases a human is likely to be a contributing factor. In fact, Gartner has predicted that in 2015, 80% of outages will be caused by people and process issues.
Are you considering the human element when revisiting incidents and outages in your infrastructure? If so, are you approaching it with a blameless mindset focused on removing the many forms of bias and searching for absolute truth? Do you believe that there is always a single root cause for an outage, or is it more accurate to seek out the additional factors that may have contributed to the incident, especially with regard to people and processes?
Regardless of your approach, the point of a postmortem is to accurately tell the "story" of what took place in as much detail as possible: the good, the bad, those involved, conversations had, actions taken, related timestamps, who was on-call, and so on. You want to know absolutely everything related in some degree to the incident so that you can review the data and learn from it.
How do we ensure that we are asking the right questions and seeking out relevant and important information that will help us understand what took place and ultimately how to become a better team, company, and product as a result?
Blameless culture (specifically blameless postmortems) is a topic of interest to many in the middle of a DevOps transformation within their organization. Jason will outline important best practices for conducting effective postmortems and demonstrate methods to measure the benefits of adopting postmortems, especially those of a "blameless" nature.
Jason Hand is a DevOps Evangelist at VictorOps, co-organizer of DevOpsDays - Rockies, author of ChatOps for Dummies, and co-host of Community Pulse, a podcast on building community within tech.
Jason has spent the last year presenting and giving workshops on a number of DevOps topics such as blameless postmortems, ChatOps, alerting, and the value of context within incident management.
Despite the best of intentions, we sometimes find ourselves working on a team of size one. Groups shrink for many reasons: attrition, mergers and acquisitions, transfers, and financial distress. It's never comfortable being a Single Point of Failure, but how can you survive this state of non-redundancy? Are there any benefits to being a team of "me, myself, and I", or is it all a pit of despair? What kind of red flags should you be on the lookout for? And, most importantly, what compelling leverage can you try to use to encourage team growth back to a reasonable size?
In this talk, I will share the series of unfortunate events that led to my current status as the Human SPoF, and discuss some of the tactics I've used to survive. Automation, tools, and code-as-infrastructure are a force multiplier when applied correctly, allowing one engineer to do the work of many. However, these wonders come with a price tag. I will also share strategies for growing a team, and ways to maintain sanity while keeping the lights blinking and the disks spinning in a 24x7 real-time environment with over 2000 servers.
When you create or manage a complex system you tend to learn all the "gotchas", accumulate arcane knowledge and become the system wizard. The desire to become an expert is natural. However, monopolizing knowledge can be unhealthy for you and dangerous for your employer. In this talk, Sasha will help you learn how to recognize the early signs and stop yourself and your peers from becoming a single point of failure.
Sasha Rosenbaum is a senior consultant at 10th Magnitude, a Chicago-based cloud consulting company. Sasha specializes in Azure and is particularly passionate about the DevOps movement. She devotes time to the community and helps companies build a culture of collaboration within their software delivery teams. Sasha is also a co-organizer of the DevOpsDays Chicago conference and the Chicago Azure meetup.
Ignites to be announced shortly!