Abstract:

Many software-as-a-service providers face exploding data volumes on a relatively limited budget. A new class of so-called NoSQL databases may help address this problem, but these databases bring significant challenges of their own.

The social media revolution has caused a content explosion. The networking aspect of social applications means that users are interested in understanding and communicating not only with their own friends, contacts or customers, but with the entire social network. The amount of data generated by that network's tweets, wall posts and other communications has grown exponentially, out-stripping our ability to handle it in conventional ways.

NoSQL is the buzzword used to describe a number of different non-relational databases that attempt to deal with some of these issues. As you might expect from a term that defines these products by what they are not, many diverse products are lumped under the NoSQL moniker.

Cassandra is a highly scalable second-generation distributed database. It is in use at Digg, Facebook, Twitter, Reddit, and other companies that have large, active data sets. It is fault tolerant, scales linearly, and runs well on commodity hardware. Because Cassandra is very different from traditional relational databases like MySQL, DB2 and Oracle, it creates unique development and operational challenges. In addition, because it is new and rapidly evolving, there isn't the same reservoir of experience and best practices to draw on. These challenges mean that close DevOps collaboration and flexibility are vital for success.

From the development standpoint, Cassandra features a data model quite different from a traditional RDBMS. There are no joins, referential integrity or fixed schema. Application developers have more control and more responsibility for the details of how their data is stored and accessed--issues that are typically the concern of DBAs in the relational world. They must also cope with relatively rapid change in the tools themselves.
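To make the difference concrete, here is a minimal Python sketch of that style of data modeling. It does not use Cassandra or any client library: the two dictionaries are hypothetical in-memory stand-ins for column families, and the names (`users`, `posts_by_user`, `add_post`, `timeline`) are invented for illustration. It shows rows with differing columns and data copied at write time in place of a join.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families:
# row key -> {column name -> value}. No schema is enforced.
users = defaultdict(dict)
posts_by_user = defaultdict(dict)


def add_user(user_id, **columns):
    # Rows in the same column family may carry different columns.
    users[user_id].update(columns)


def add_post(user_id, timestamp, text):
    # There is no join at read time, so the author's name is copied
    # into the post row when it is written.
    author = users[user_id].get("name", "unknown")
    posts_by_user[user_id][timestamp] = {"author": author, "text": text}


def timeline(user_id):
    # Return one user's posts ordered by column name (the timestamp).
    row = posts_by_user[user_id]
    return [row[ts] for ts in sorted(row)]


add_user("u1", name="Alice", city="Boston")
add_user("u2", name="Bob")  # no "city" column; nothing requires one
add_post("u1", 2, "second")
add_post("u1", 1, "first")
```

The trade-off is the one described above: reads become a single lookup by row key, but the application, not the database, is responsible for keeping the copied data consistent.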

On the Ops side, Cassandra brings different challenges: instead of one massive server and its associated storage, a Cassandra cluster typically consists of dozens of servers, often spanning multiple data centers.

Ops challenges

  • Dozens of servers
  • Lots of knobs
  • Need for flexibility

How we use Puppet to manage NoSQL

  • Why do you need Puppet for it?
  • Cassandra as a newer technology, and how Puppet makes it easier to get there
  • Difficult to manage 72 nodes, but we can do it in 3 hours
  • Ability to add new monitoring; advantages of applying 1 thing to 72 nodes
  • Config example
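
The idea of applying one change to every node can be sketched in a few lines of Python. This is not Puppet and not the configuration used in the talk; Puppet would express the same thing with manifests and templates. The addressing scheme, the `SHARED` values, and the function names are assumptions made for the example, and the template is only a fragment, not a complete Cassandra configuration.

```python
from string import Template

# Illustrative fragment only; not a complete Cassandra configuration.
CONFIG_TEMPLATE = Template(
    "cluster_name: $cluster_name\n"
    "listen_address: $listen_address\n"
    "concurrent_writes: $concurrent_writes\n"
)

# Settings shared by every node; a value is changed here once.
SHARED = {"cluster_name": "demo", "concurrent_writes": 32}


def node_addresses(count):
    # Hypothetical addressing scheme for the example.
    return ["10.0.0.%d" % (i + 1) for i in range(count)]


def render_all(count):
    # One rendered config per node: shared settings plus its own address.
    return {
        addr: CONFIG_TEMPLATE.substitute(SHARED, listen_address=addr)
        for addr in node_addresses(count)
    }


configs = render_all(72)
```

Changing `concurrent_writes` in `SHARED` and rendering again updates all 72 configurations at once, which is the kind of knob-turning that is impractical to do by hand on each server.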

Speaker: Dave Connors - Constant Constant
