Riptano Training for Apache Cassandra
Riptano's training program takes you from 0 to 60 building scalable applications on Apache Cassandra. This session is appropriate for developers and DBAs looking to understand the design principles involved in modeling against Cassandra, as well as best practices for deploying and maintaining a Cassandra cluster.
This training will include hands-on exercises. Attendees must bring their own laptop, and should pre-install VMware Player (Fusion on OS X) to run the virtual machine provided by Riptano. Training will run from 9 to 5:30 with a one-hour break for lunch (provided by Riptano) and two shorter breaks in the morning and afternoon.
Installation and configuration
Your VM will come with a single-node Cassandra instance installed. We'll extend that to three nodes locally, and explain the configuration options available, including multi-datacenter replication. We'll show how to do simple benchmarks with the py_stress tool.
Cassandra data modeling is not like relational schema design. We will cover why denormalization is your friend and how to think in ColumnFamilies, as well as the Thrift API. As concrete examples, we will explain the data model behind the Twissandra application and CloudKick's time series data.
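To give a flavor of what "denormalization is your friend" means, here is a minimal sketch in plain Python, loosely in the spirit of a Twitter-like application. It is not Twissandra's actual schema and it does not use the Thrift API: the dictionaries stand in for ColumnFamilies (row key to columns), and every name in it (`tweets`, `followers`, `timeline`, `post_tweet`, and so on) is hypothetical. The point it illustrates is that each reader's timeline is written at post time, so reading a timeline is a single-row lookup rather than a join.

```python
from collections import defaultdict

# Each stand-in "ColumnFamily" maps a row key to a dict of {column name: value}.
tweets = {}                       # tweet_id -> {"author": ..., "body": ...}
followers = defaultdict(dict)     # username -> {follower name: ""}
timeline = defaultdict(dict)      # username -> {timestamp: tweet_id}


def follow(follower, followee):
    """Record that `follower` wants to see `followee`'s tweets."""
    followers[followee][follower] = ""


def post_tweet(tweet_id, author, body, timestamp):
    """Store the tweet once, then copy a reference into every reader's row."""
    tweets[tweet_id] = {"author": author, "body": body}
    # Denormalize: one timeline column per reader, written at post time.
    for reader in [author, *followers[author]]:
        timeline[reader][timestamp] = tweet_id


def read_timeline(username, limit=10):
    """Read one row and return the newest tweet bodies first."""
    columns = timeline[username]
    newest_first = sorted(columns, reverse=True)[:limit]
    return [tweets[columns[ts]]["body"] for ts in newest_first]


follow("bob", "alice")
post_tweet("t1", "alice", "hello", timestamp=1)
post_tweet("t2", "alice", "again", timestamp=2)
bobs_view = read_timeline("bob")
```

The trade-off is more work and more storage per write in exchange for cheap, predictable reads, which is the opposite of the usual relational instinct.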
Basics of Cassandra Internals
To understand Cassandra performance, you need to know a little about how it was designed, just as you need to understand query plans to get good performance out of a relational database. We'll explain memtables and sstables, Cassandra's SEDA design, and how to use the JMX metrics it exports to infer its internal state.
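As a rough mental model of memtables and sstables, the sketch below buffers writes in an in-memory table and flushes it to an immutable sorted table once it reaches a size limit. It is a toy under stated assumptions, not Cassandra's implementation: the class name `TinyStore` and the `memtable_limit` parameter are hypothetical, the "sstables" live in memory rather than on disk, and the commit log and compaction are left out entirely.

```python
import bisect


class TinyStore:
    """Toy write path: mutable memtable in front of immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}          # recent writes, mutable
        self.sstables = []          # flushed tables, oldest first, never modified
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Freeze the memtable into a sorted list of (key, value) pairs."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # The memtable holds the newest data, so check it first.
        if key in self.memtable:
            return self.memtable[key]
        # Then search the flushed tables from newest to oldest.
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


store = TinyStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # reaches the limit, so the memtable is flushed
store.put("a", 3)   # newer value for "a" stays in the memtable for now
```

Even in this toy, a read may have to consult several tables, which is one reason the number of sstables per read is worth watching.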
We'll cover how Cassandra replication works with no single points of failure, and what this means for adding, load-balancing, and replacing machines safely and efficiently. We'll explain gossip and failure detection, as well as ColumnFamily modification, snapshots, and data import and export.
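One simplified way to picture replication with no single point of failure is a token ring: each node owns a token, a key is hashed onto the ring, and its replicas are the next few nodes clockwise. The sketch below illustrates that idea only. The function names, node names, and the choice of MD5 for hashing are assumptions made for the example, and it ignores racks, datacenters, gossip, and failure detection.

```python
import bisect
import hashlib


def token(value):
    """Hash a string onto the ring as a large integer."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, ring, replication_factor=3):
    """Return the nodes holding `key`; `ring` is a sorted list of (token, node)."""
    tokens = [t for t, _ in ring]
    # The first node whose token is >= the key's token owns the key;
    # the modulo wraps around when the key hashes past the last token.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node cluster; each node's token is derived from its name.
nodes = ["node1", "node2", "node3", "node4"]
ring = sorted((token(name), name) for name in nodes)

owners = replicas_for("user:42", ring)
```

With a replication factor of three, any single node can be lost or replaced while two copies of every key remain reachable, and no node plays a special coordinating role.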
Tuning and troubleshooting
There are many factors that affect Cassandra performance. We'll cover OS- and machine-level factors such as the OS buffer cache and disk utilization, JVM factors such as garbage collector settings, and Cassandra tunables such as cache sizes. We'll also cover how to use the metrics covered previously to recognize warning signs that you need to add capacity to your cluster. Yes, there will be war stories.
About Apache Cassandra
Cassandra is the "hands down winner for transaction processing performance at scale." Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, and many other companies with large, active data sets. Cassandra's fully-distributed design with no single points of failure allows exceptional reliability.
About your instructor
Jonathan Ellis is project chair of Apache Cassandra and co-founder of Riptano.
When & Where
Riptano is the leading expert for Apache Cassandra, providing software, support, and training for all things Cassandra. Riptano is obsessed with providing great customer service.
Our mission is to help you with all of your Cassandra needs so you can focus on your core business. Contact us with your questions.