Analyzing Streaming Data in Real Time with Amazon Kinesis (Quick note - 2017)


Batch Processing            | Stream Processing
Hourly server logs          | Real-time metrics
Weekly or monthly bills     | Real-time spending alerts/caps
Daily website clickstream   | Real-time clickstream analytics
Daily fraud reports         | Real-time detection

Simple Pattern for Stream Data


Data Producer (Mobile Client)
# Continuously creates data
# Continuously writes data to a stream
# Can be almost anything

Streaming Service (Amazon Kinesis)
# Durably stores data
# Provides a temporary buffer that preps data
# Supports very high throughput

Data Consumer (Amazon Kinesis app)
# Continuously processes data
# Cleans, prepares and aggregates
# Transforms data into information
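
The producer side of this pattern can be sketched in Python. This is only a minimal sketch and not code from the talk: the stream name "clickstream", the event fields and the helper names (build_record, send_event, FakeKinesisClient) are hypothetical. The put_record call with StreamName, Data and PartitionKey follows the boto3 Kinesis client, which is passed in as a parameter so the example runs without AWS credentials.

```python
import json
import time


def build_record(user_id, action):
    """Serialize one event as JSON bytes, stamped with client-side event time."""
    event = {"user_id": user_id, "action": action, "event_time": time.time()}
    return json.dumps(event).encode("utf-8")


def send_event(client, stream_name, user_id, action):
    """Write one event to the stream.

    `client` is assumed to behave like boto3.client("kinesis"); the partition
    key decides which shard receives the record.
    """
    return client.put_record(
        StreamName=stream_name,
        Data=build_record(user_id, action),
        PartitionKey=str(user_id),
    )


class FakeKinesisClient:
    """Hypothetical stand-in for local runs; collects records instead of calling AWS."""

    def __init__(self):
        self.records = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.records.append((StreamName, Data, PartitionKey))
        # The real service returns more fields; this fake returns only a counter.
        return {"SequenceNumber": str(len(self.records))}


client = FakeKinesisClient()
response = send_event(client, "clickstream", 42, "click")
```

Against a real stream, the same send_event function would presumably be called with boto3.client("kinesis") in place of the fake client.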

Amazon Kinesis (made up of 3 services)

1. Amazon Kinesis Data Streams
Build custom applications that process and analyze streaming data

2. Amazon Kinesis Data Analytics
Easily process and analyze streaming data with standard SQL

3. Amazon Kinesis Data Firehose
Easily load streaming data into AWS

Amazon Kinesis Data Streams

1. Capture and send data to Kinesis Streams
2. Build custom real-time applications using 'Kinesis Analytics', stream processing frameworks like 'Apache Spark' or your code running on 'Amazon EC2' or 'AWS Lambda'
3. Load processed data to any data store, send real-time alerts, feed live dashboards and more

# Easy administration and low cost
# Build real-time applications with frameworks of choice
# Secure, durable storage

Amazon Kinesis Data Analytics

1. Capture streaming data with 'Kinesis Streams' or 'Kinesis Firehose'
2. Run standard 'SQL' queries against data streams
3. Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time

# Powerful real-time application
# Easy to use, fully managed
# Automatic elasticity

Amazon Kinesis Data Firehose

1. Capture and send data to Kinesis Firehose
2. 'Kinesis Firehose' prepares and loads the data continuously to the destinations you chose from among 'S3', 'Redshift', 'Amazon Elasticsearch Service', and 'Kinesis Analytics'
3. Analyse streaming data using your favourite BI tools

# Zero administration and seamless elasticity
# Direct-to-data store integration
# Serverless, continuous data transformations

Amazon Kinesis Data Analytics Applications


1. Connect to streaming source
2. Easily write SQL code to process streaming data
3. Continuously deliver SQL results

Common use cases


Three Common Scenarios
1. Streaming Ingest-Transform-Load: Deliver data to analytics tools faster and cheaper
2. Continuous Metric Generation: Compute analytics as the data is generated
3. Actionable Insights: React to analytics based on insights

Web Analytics and Leaderboards

Monitoring IoT Devices


Analyzing CloudTrail Event Logs


1. Ingest and deliver raw log data
# CloudTrail provides continuous account activity logging
# Events are sent in real time (or near real time) to Kinesis Data Firehose or Kinesis Data Streams
# Each event includes a timestamp, IAM user, AWS service name, API call, response and more.
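
To make the event contents concrete, here is a small Python sketch that pulls those fields out of one CloudTrail record. The field names (eventTime, userIdentity, eventSource, eventName, sourceIPAddress, errorCode) are the ones CloudTrail uses in its JSON records; the helper name summarize_event, the output keys and the sample values are made up for illustration.

```python
import json


def summarize_event(raw):
    """Pull the fields used for metrics out of one CloudTrail record (JSON text)."""
    record = json.loads(raw)
    identity = record.get("userIdentity", {})
    return {
        "timestamp": record.get("eventTime"),
        "user": identity.get("userName") or identity.get("arn"),
        "service": record.get("eventSource"),
        "api_call": record.get("eventName"),
        "source_ip": record.get("sourceIPAddress"),
        # errorCode is only present when the call failed
        "failed": "errorCode" in record,
    }


# Sample record with invented values, trimmed to the fields used above.
sample = json.dumps({
    "eventTime": "2017-11-28T10:15:00Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "RunInstances",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"type": "IAMUser", "userName": "alice"},
    "errorCode": "Client.UnauthorizedOperation",
})
summary = summarize_event(sample)
```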

Stream Data to Amazon Kinesis


Just a sample. There are many more ways to stream data to Amazon Kinesis

2. Compute operational metrics in real time
Compute metrics using SQL in real time like:
# Total calls by IP, service, API call, IAM user
# Amazon EC2 API failures (or any other service)
# Anomalous behaviour of Amazon EC2 API (or any other service)
# Top 10 API calls across all services
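
The notes compute these metrics with SQL in Kinesis Data Analytics. As a rough illustration of the same logic, the Python sketch below counts calls by a chosen field, counts failures for one service and ranks the most frequent API calls. The event shape (dicts with service, api_call, source_ip and failed keys), the function names and the sample data are assumptions made for this example.

```python
from collections import Counter


def total_calls_by(events, field):
    """Count events grouped by one field, e.g. 'source_ip' or 'api_call'."""
    return Counter(e[field] for e in events)


def top_api_calls(events, n=10):
    """Return the n most frequent (service, api_call) pairs across all services."""
    counts = Counter((e["service"], e["api_call"]) for e in events)
    return counts.most_common(n)


def failures_for(events, service):
    """Count failed calls for one service."""
    return sum(1 for e in events if e["service"] == service and e["failed"])


# Invented sample events for illustration only.
events = [
    {"service": "ec2.amazonaws.com", "api_call": "RunInstances",
     "source_ip": "203.0.113.10", "failed": True},
    {"service": "ec2.amazonaws.com", "api_call": "DescribeInstances",
     "source_ip": "203.0.113.10", "failed": False},
    {"service": "s3.amazonaws.com", "api_call": "ListBuckets",
     "source_ip": "198.51.100.7", "failed": False},
    {"service": "ec2.amazonaws.com", "api_call": "DescribeInstances",
     "source_ip": "198.51.100.7", "failed": False},
]
calls_by_ip = total_calls_by(events, "source_ip")
top_call = top_api_calls(events, n=1)
ec2_failures = failures_for(events, "ec2.amazonaws.com")
```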

How do we aggregate streaming data? (answer: Using windows)
# Aggregations (count, sum, min...) take granular real-time data and turn it into insights
# Data is continuously processed so you need to tell the application when you want results.

Window Types
# Sliding, tumbling, session and custom windows
# Tumbling windows are fixed size and grouped keys do not overlap
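
A tumbling window can be sketched in a few lines of Python. This is an illustration of the idea, not how Kinesis Data Analytics implements it: the function name, the (timestamp, key) event shape and the 60-second window are assumptions. It also runs over a finished list, whereas a streaming application would emit each window's result as that window closes.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window start, key) using non-overlapping fixed windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    """
    counts = defaultdict(int)
    for timestamp, key in events:
        # Every timestamp maps to exactly one window, so windows never overlap.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Invented sample: timestamps in seconds, grouped by service name.
events = [(0, "ec2"), (12, "ec2"), (59, "s3"), (60, "ec2"), (61, "ec2"), (125, "s3")]
result = tumbling_window_counts(events, 60)
```

With these sample events, the window starting at 0 holds two "ec2" calls and one "s3" call, the window starting at 60 holds two "ec2" calls, and the window starting at 120 holds one "s3" call.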


Event, ingest and processing time
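# Event time is the timestamp assigned when the event occurs, also called client-side time.
# Ingest time is the timestamp assigned when the record is added to the stream, also called server-side time.
# Processing time is when your application reads and analyzes the data (ROWTIME).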

3. Persist data for real-time dashboards
# Use Kinesis Data Firehose to archive processed data in S3
# Use AWS Lambda to deliver data to DynamoDB (or another database)
# Open source or other tools to visualize the data

Late results
# An event is late if it arrives after the computation to which it logically belongs has been completed
# Your Kinesis Analytics application will produce an amendment
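
The notes do not say how the amendment is produced, so the following Python sketch only illustrates the idea under stated assumptions: results are counts per tumbling window, a window counts as completed once close() has been called for it, and an event arriving for a completed window yields an updated count. The class name WindowedCounter and the tuple formats are hypothetical.

```python
class WindowedCounter:
    """Counts events per tumbling window and amends results for late events."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.counts = {}      # window start -> running count
        self.emitted = set()  # windows whose result was already published

    def add(self, event_time):
        """Record one event; return an amendment if its window was already emitted."""
        start = int(event_time // self.window_seconds) * self.window_seconds
        self.counts[start] = self.counts.get(start, 0) + 1
        if start in self.emitted:
            return ("amendment", start, self.counts[start])
        return None

    def close(self, window_start):
        """Publish the result for a window and mark it as completed."""
        self.emitted.add(window_start)
        return ("result", window_start, self.counts.get(window_start, 0))


counter = WindowedCounter(60)
counter.add(5)
counter.add(30)
first = counter.close(0)   # result for window 0 with a count of 2
late = counter.add(45)     # late event: amended count of 3 for window 0
on_time = counter.add(70)  # window 60 is still open, so no amendment
```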

Updating a database
# Perform inserts, but update the existing row on a duplicate key
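
One way to picture this insert-or-update pattern is the sketch below, which uses Python's built-in sqlite3 module as a stand-in for whichever database backs the dashboard. The table name api_metrics, its columns and the helper upsert_metric are hypothetical. SQLite spells the pattern ON CONFLICT ... DO UPDATE (available from SQLite 3.24), while other databases use different syntax for the same thing.

```python
import sqlite3


def upsert_metric(conn, window_start, api_call, call_count):
    """Insert a metric row, or overwrite the count if the key already exists."""
    conn.execute(
        """
        INSERT INTO api_metrics (window_start, api_call, call_count)
        VALUES (?, ?, ?)
        ON CONFLICT (window_start, api_call)
        DO UPDATE SET call_count = excluded.call_count
        """,
        (window_start, api_call, call_count),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE api_metrics (
        window_start INTEGER,
        api_call TEXT,
        call_count INTEGER,
        PRIMARY KEY (window_start, api_call)
    )
    """
)
upsert_metric(conn, 0, "RunInstances", 2)  # first result for the window
upsert_metric(conn, 0, "RunInstances", 3)  # amended result replaces it
rows = conn.execute("SELECT * FROM api_metrics").fetchall()
```

Because the second call hits the same key, the table ends up with a single row carrying the amended count rather than two rows.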

What does all this cost?

# All services used in the solution are pay-as-you-go
# All services used are serverless and have lower devops expense
# This solution will cost the "average" customer less than $100 per month

Try it out yourself

Go to aws.amazon.com/kinesis/

Some good examples:
# Get started in minutes with a clickthrough template for AWS CloudTrail Event Log Analytics - <Link> (Friendly URL)
# Tinyurl.com/rt-dashboard
# Great blog posts with example use cases
