AWS ElastiCache Redis

Notes from the AWS "ElastiCache Deep Dive" talk (2017)


Why ElastiCache

# ElastiCache provides a scalable in-memory cache (designed for short-term storage of information that needs to be accessed very quickly)
# Supports Memcached and Redis engines.
# Reduces load on databases.
# Manages application session state.

Amazon ElastiCache overview

# In-memory key-value store supporting
      Redis
      Memcached
# High-performance
# Fully managed; zero admin
# Highly available and reliable
# Hardened by Amazon

Redis overview

# In-memory data structure server
# Powerful (~200 commands + Lua scripting)
# Versatile data structures (strings, lists, hashes, sets, sorted sets, bitmaps, and HyperLogLogs)
# Simple
# Atomic operations (supports transactions)
# Highly available (replication)
# Persistence
# Open source
# Ridiculously fast! (<1 ms latency for most commands)
    Other advantages
# Runs Lua scripts server-side
# Geospatial queries
# Pub/Sub messaging

ElastiCache features

1. ElastiCache offers a variety of ways to deploy and monitor your Redis cluster.
    # AWS CloudFormation (infrastructure as code: a versioned template describing how the Redis cluster should look, including node types, slots, and other cluster details)
    # AWS CLI and SDK (full control over every ElastiCache operation)
    # AWS Management Console (build a Redis cluster in a couple of clicks)
    # AWS CloudTrail (monitoring: logs every interaction with the service, e.g. when it happened and who did it)
    # AWS Config (compliance: define how you want your cluster to look)
    # Amazon CloudWatch (pairs very nicely with Redis; be proactive by raising an alarm and consuming it via an SNS notification)

2. Enhanced Redis Engine
    # Optimized Swap Memory (Mitigate the risk of increased swap usage during syncs and snapshots)
    # Dynamic writes throttling (Improved output buffer management when the node's memory is close to being exhausted)
    # Smoother failovers (clusters recover faster because replicas avoid flushing their data to do a full re-sync with the primary)

Redis topologies

1. Vertically scaled (Cluster Mode Disabled)


    # One primary; the entire keyspace (all 16384 hash slots) lives in a single node, so all the data must fit on one (possibly large) node. Replicas hold a full copy of all the data.
    # A 'Primary' endpoint is provided, along with a 'Replica' endpoint. If there is a failover, one of the replicas takes over the primary role (DNS swap).

2. Horizontally scaled (Cluster Mode Enabled)


    # Up to 15 shards (a shard is made up of a primary and 0-5 replicas); each shard owns a portion of the keyspace.
    # The default distribution divides the 16384 hash slots evenly across the number of shards.
    # A configuration endpoint is provided; clients communicate with the cluster via that endpoint.

# Redis has 16384 hash slots
    @ Slot for a key is CRC16(key) mod 16384
# Slots are distributed across the cluster into shards
# Developers must use a Redis cluster-aware client
    @ Clients are redirected to the correct shard
    @ Smart clients cache a slot-to-shard map

 Shard S1 = slots 0 - 3276
 Shard S2 = slots 3277 - 6553
 Shard S3 = slots 6554 - 9829
 Shard S4 = slots 9830 - 13106
 Shard S5 = slots 13107 - 16383
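The slot math above can be sketched in pure Python. This is a minimal sketch assuming Redis Cluster's documented CRC16 (XMODEM) variant; it also includes the standard `{hash tag}` rule, which lets related keys share a slot:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16, XMODEM variant (polynomial 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Slot for a key: CRC16(key) mod 16384.
    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so e.g. user:{42}:profile and user:{42}:cart land on the same slot."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:  # tag must be non-empty
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

# Standard CRC16/XMODEM check value: crc16_xmodem(b"123456789") == 0x31C3,
# so hash_slot("123456789") == 12739
```

With the five-shard layout above, slot 12739 would fall in shard S4 (slots 9830-13106); a smart client keeps exactly this kind of slot-to-shard map.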

Redis cluster - mode enabled vs disabled


Redis cluster - Architecture


# (Diagram) A 3-shard cluster; each shard's slot range is shown in a different colour
# The grey border marks the primary
# Replicas cover the same hash slot range as their primary

Redis Migration (Zero-downtime Online Re-sharding)

Command : aws elasticache modify-replication-group-configuration --replication-group-id rep-group-id --apply-immediately --node-group-count 5

Scale-Out

Without changing the behaviour of the application, that is, while Redis stays in normal use (no downtime), slots are split uniformly across the shards. Migration proceeds slot by slot in a very reliable and robust way.

A few limitations apply while re-sharding is in progress:
# Some commands are restricted (e.g. certain Lua capabilities)
# If a failure happens mid-migration, it is harder to recover from
# Small performance impact, but no downtime

Scale In


ElastiCache security


Amazon ElastiCache Encryption and Compliance

Encryption
# In-Transit: encrypt all communications between clients and Redis server as well as between nodes
# At-Rest: encrypt backups on disk and in Amazon S3
# Fully managed: setup via API or console, automatic issuance and renewal

Compliance
# HIPAA eligibility for ElastiCache for Redis
# Included in the AWS Business Associate Addendum
# Requires Redis 3.2.6 (the version that introduced encryption support)

Common usage patterns

1. Session Management
2. Database caching
3. APIs (HTTP responses)
4. IoT
5. Streaming data analytics (Filtering/aggregation)
6. Pub/Sub
7. Social media (Sentiment analysis)
8. Standalone database (Metadata store)
9. Leaderboards

Caching

Caching NoSQL
# Smaller NoSQL DB cluster needed = lower costs
# Faster data retrieval = better performance





Streaming data enrichment/processing
Because Redis is very fast and rich in data structures (hashes, sets, sorted sets, and lists), it is well suited to capturing fast-moving data, so it can be used for streaming.



Big data architectures using Redis



IoT powered by ElastiCache


Mobile apps powered by ElastiCache



Ad tech powered by ElastiCache


Chat apps powered by ElastiCache
 

Gaming-real-time leaderboards
Using a sorted set, Redis gives you a very easy way to rank and retrieve scores across many users.
# very popular for gaming apps that need uniqueness and ordering
# Easy with Redis sorted sets
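The sorted-set pattern can be sketched without a server. The dict-plus-sort below mimics what `ZADD` and `ZREVRANGE ... WITHSCORES` would do on Redis (in a real sorted set the ranking is maintained incrementally, not re-sorted per query):

```python
scores = {}  # member -> score, standing in for a Redis sorted set

def zadd(member: str, score: float) -> None:
    """Like ZADD leaderboard score member: upsert a player's score."""
    scores[member] = score

def zrevrange_withscores(top_n: int):
    """Like ZREVRANGE leaderboard 0 top_n-1 WITHSCORES: highest scores first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

zadd("alice", 3200)
zadd("bob", 4100)
zadd("carol", 2800)
# zrevrange_withscores(2) -> [("bob", 4100), ("alice", 3200)]
```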

Rate limiting
Example: throttling requests to an API using Redis counters.
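A common shape for this is a fixed-window counter: INCR a per-client, per-window key and EXPIRE it after the window. A minimal sketch, with a dict standing in for the Redis counters (the key naming and window size are illustrative assumptions):

```python
import time

class FixedWindowLimiter:
    """Fixed-window rate limiter mirroring the Redis counter pattern:
    INCR rl:{client}:{window} and EXPIRE it after window_seconds."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (client_id, window_index) -> count

    def allow(self, client_id: str, now=None) -> bool:
        now = time.time() if now is None else now
        key = (client_id, int(now // self.window))      # Redis: the window key
        self.counters[key] = self.counters.get(key, 0) + 1  # Redis: INCR
        return self.counters[key] <= self.limit
```

On Redis the INCR is atomic, so this works safely across many application servers; old window keys simply expire.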

Amazon ElastiCache: Best practices

1. Cluster sizing best practices
# Storage (clusters should have adequate memory)
    @ Recommended: memory needed + 25% reserved memory (for Redis) + some room for growth (about 10%)
    @ Optimize using eviction policies and TTLs (Better to have an idea of the frequency of change of underlying data)
    @ Scale up or out before reaching max-memory using CloudWatch alarms
    @ Use memory optimized nodes for cost-effectiveness (R4 support)
# Performance (Performance should not be compromised)
    @ Benchmark operations using Redis Benchmark tool
        for more READIOPS - add replicas
        for more WRITEIOPS - add shards (scale out)
        for more network IO - use network optimized instances and scale out
    @ Use pipelining for bulk reads/writes
    @ Consider Big(O) time complexity for data structure commands
# Cluster isolation (apps sharing key space - choose a strategy that works for your workload)
    @ Identify what kind of isolation is needed based on the workload and environment.
    @ Isolation: No isolation $ | Isolation by Purpose $$ | Full isolation $$$

2. Redis benchmark tool
# Open source utility to benchmark performance

3. Redis max-memory policies
# Select max-memory policy based on your workload needs.
    @ noeviction: return errors when the memory limit has been reached and the client is trying to execute commands that might result in more memory to be used.
    @ allkeys-lru: evict keys trying to remove the least recently used (LRU) keys first
    @ volatile-lru: evict keys trying to remove the least recently used (LRU) keys first, but only among keys that have an expire set.
    @ allkeys-random: evict random keys to make space for the new data added.
    @ volatile-random: evict random keys to make space for the new data added, but only evict keys with an expire set.
    @ volatile-ttl: evict only keys with an expire set, and try to evict keys with a shorter time to live (TTL) first.

4. Key ElastiCache CloudWatch metrics
    @ CPUUtilization
        Memcached (multi-threaded: up to 90% is OK)
        Redis (single-threaded: divide the threshold by the number of cores. Ex: 90% / 4 = 22.5%)
    @ SwapUsage low
    @ CacheMisses / CacheHits Ratio low/stable
    @ Evictions (near zero)
        Exception: Russian-doll caching
    @ CurrConnections (stable)
    @ Setup alarms with CloudWatch metrics

5. ElastiCache modifiable parameters
    @ Maxclients: 65000 (unchangeable)
        Use connection pooling
        timeout (close a connection after it has been idle for a given interval)
        tcp-keepalive (detects dead peers given an interval)
    @ Databases: 16 (default) for non-clustered mode
        Logical partition
    @ Reserved-memory: 25% (default)
        Recommended
            50% of maxmemory to use before 2.8.22
            25% after 2.8.22 (ElastiCache)
    @ Maxmemory-policy:
        The eviction policy for keys when maximum memory usage is reached
        Possible values: volatile-lru, allkeys-lru, volatile-random, allkeys-random, volatile-ttl, noeviction.
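The connection-pooling advice above (for staying well under maxclients) can be sketched generically. This is a toy, single-process pool for illustration only; real Redis client libraries such as redis-py ship their own pool implementations:

```python
import queue

class ConnectionPool:
    """Toy pool: a bounded set of reusable connections, so the app never
    holds more than max_size open connections to the server."""

    def __init__(self, max_size, connect):
        self._connect = connect           # factory, e.g. opens a TCP socket
        self._idle = queue.LifoQueue()    # most recently used first
        self._created = 0
        self._max = max_size

    def acquire(self, timeout=5.0):
        try:
            return self._idle.get_nowait()          # reuse an idle connection
        except queue.Empty:
            pass
        if self._created < self._max:               # grow up to the cap
            self._created += 1
            return self._connect()
        return self._idle.get(timeout=timeout)      # wait for a release

    def release(self, conn):
        self._idle.put(conn)                        # return to the idle pool
```

Pairing a pool like this with the `timeout` and `tcp-keepalive` parameters keeps idle and dead connections from accumulating against the maxclients limit.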

Caching tips

# Understand the frequency of change of underlying data
# Set appropriate TTLs on keys that match that frequency
# Choose appropriate eviction policies that are aligned with application requirements.
# Isolate your cluster by purpose (for example, cache cluster, queue, standalone database and so on)
# Maintain cache freshness with write-throughs
# Performance test and size your cluster appropriately
# Monitor Cache HIT/MISS ratio and alarm on poor performance
# Use failover API to test application resiliency.
