Analyzing Streaming Data in Real Time with Amazon Kinesis (Quick note - 2017)
| Batch Processing | Stream Processing |
|---|---|
| Hourly server logs | Real-time metrics |
| Weekly or monthly bills | Real-time spending alerts/caps |
| Daily website clickstream | Real-time clickstream analytics |
| Daily fraud reports | Real-time detection |
Simple Pattern for Stream Data
| Data Producer (Mobile Client) | Streaming Service (Amazon Kinesis) | Data Consumer (Amazon Kinesis app) |
|---|---|---|
| Continuously creates data | Durably stores data | Continuously processes data |
| Continuously writes data to a stream | Provides a temporary buffer that preps data | Cleans, prepares & aggregates data |
| Can be almost anything | Supports very high throughput | Transforms data into information |
Amazon Kinesis (made up of three services)
1. Amazon Kinesis Data Streams: build custom applications that process and analyze streaming data
2. Amazon Kinesis Data Analytics: easily process and analyze streaming data with standard SQL
3. Amazon Kinesis Data Firehose: easily load streaming data into AWS
Amazon Kinesis Data Streams
1. Capture and send data to Kinesis Data Streams (see the producer sketch after this list)
2. Build custom real-time applications using 'Kinesis Analytics', stream-processing frameworks like 'Apache Spark', or your own code running on 'Amazon EC2' or 'AWS Lambda'
3. Load processed data into any data store, send real-time alerts, feed live dashboards and more
# Easy administration and low cost
# Build real-time applications with frameworks of choice
# Secure, durable storage
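A minimal producer sketch in Python using boto3. The stream name and payload fields are assumptions for illustration, and the stream itself must already exist.

```python
import json
import time

import boto3

kinesis = boto3.client("kinesis")

STREAM_NAME = "clickstream"  # assumed stream name; create the stream beforehand


def send_click(user_id: str, page: str) -> None:
    """Write one clickstream event to the stream."""
    record = {
        "user_id": user_id,
        "page": page,
        "event_time": int(time.time() * 1000),  # client-side (event) time
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=user_id,  # records with the same key land on the same shard
    )


send_click("user-42", "/pricing")
```

On the consuming side you would typically use the Kinesis Client Library, AWS Lambda, or a framework like Apache Spark, as noted above.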
Amazon Kinesis Data Analytics
1. Capture streaming data with 'Kinesis Streams' or 'Kinesis Firehose'
2. Run standard 'SQL' queries against data streams
3. Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
# Powerful real-time applications
# Easy to use, fully managed
# Automatic elasticity
Amazon Kinesis Data Firehose
1. Capture and send data to Kinesis Firehose (see the sketch after this list)
2. 'Kinesis Firehose' prepares and loads the data continuously to the destinations you choose from among 'S3', 'Redshift', 'Amazon Elasticsearch Service', and 'Kinesis Analytics'
3. Analyse streaming data using your favourite BI tools
# Zero administration and seamless elasticity
# Direct-to-data store integration
# Serverless, continuous data transformations
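A minimal Firehose producer sketch in Python with boto3. The delivery stream name is an assumption, and the stream is assumed to be configured elsewhere with its destination (S3, Redshift, etc.).

```python
import json

import boto3

firehose = boto3.client("firehose")

DELIVERY_STREAM = "web-logs-to-s3"  # assumed; configured to deliver to S3 (or Redshift, etc.)


def ship_log_line(log_line: dict) -> None:
    # Firehose buffers records and flushes them to the destination by size or time.
    firehose.put_record(
        DeliveryStreamName=DELIVERY_STREAM,
        Record={"Data": (json.dumps(log_line) + "\n").encode("utf-8")},
    )


ship_log_line({"status": 200, "path": "/index.html", "latency_ms": 12})
```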
Amazon Kinesis Data Analytics Applications
1. Connect to a streaming source
2. Easily write SQL code to process streaming data
3. Continuously deliver SQL results (see the output sketch below)
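One way to wire up step 3 with boto3 in Python, attaching a Firehose destination to an existing application. The application name, version, and ARNs are placeholders, and the parameter shapes should be checked against the SQL-based (v1) Kinesis Data Analytics API.

```python
import boto3

kda = boto3.client("kinesisanalytics")  # SQL-based (v1) Kinesis Data Analytics API

# Placeholder names and ARNs; CurrentApplicationVersionId must match the
# application's current version (see describe_application).
kda.add_application_output(
    ApplicationName="my-analytics-app",
    CurrentApplicationVersionId=1,
    Output={
        "Name": "DESTINATION_SQL_STREAM",  # the in-application stream your SQL writes to
        "KinesisFirehoseOutput": {
            "ResourceARN": "arn:aws:firehose:us-east-1:123456789012:deliverystream/results",
            "RoleARN": "arn:aws:iam::123456789012:role/kda-output-role",
        },
        "DestinationSchema": {"RecordFormatType": "JSON"},
    },
)
```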
Common use cases
Three Common Scenarios
1. Streaming Ingest-Transform-Load: Deliver data to analytics tools faster and cheaper
2. Continuous Metric Generation: Compute analytics as the data is generated
3. Actionable Insights: react in real time to the insights your analytics produce
Web Analytics and Leaderboards
Monitoring IoT Devices
Analyzing CloudTrail Event Logs
1. Ingest and deliver raw log data
# CloudTrail provides continuous account activity logging
# Events are sent in real time (or near real time) to Kinesis Data Firehose or Kinesis Data Streams
# Each event includes a timestamp, IAM user, AWS service name, API call, response and more.
Stream Data to Amazon Kinesis
Just a sample; there are many more ways to stream data into Amazon Kinesis (one is sketched below)
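As one illustration of the point above, here is a sketch (an assumption, not the only route) of a Lambda function that receives CloudTrail API-call events via a CloudWatch Events rule and forwards them to a Kinesis stream; the stream name is a placeholder.

```python
import json
import os

import boto3

kinesis = boto3.client("kinesis")

# Assumed stream name; the function would be triggered by a CloudWatch Events rule
# matching "AWS API Call via CloudTrail" events.
STREAM_NAME = os.environ.get("STREAM_NAME", "cloudtrail-events")


def handler(event, context):
    detail = event.get("detail", {})  # the CloudTrail record (timestamp, user, API call, ...)
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(detail).encode("utf-8"),
        PartitionKey=detail.get("eventSource", "unknown"),  # keep a service's calls on one shard
    )
```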
2. Compute operational metrics in real time
Compute metrics in real time using SQL, for example (see the application sketch after this list):
# Total calls by IP, service, API call, IAM user
# Amazon EC2 API failures (or any other service)
# Anomalous behaviour of Amazon EC2 API (or any other service)
# Top 10 API calls across all services
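A hedged sketch of the first metric above (calls per service) as a SQL-based Kinesis Data Analytics application, created here with boto3. The in-application stream and column names ('SOURCE_SQL_STREAM_001', 'eventSource') are assumptions that must match the input schema you configure, and the STEP call gives a one-minute tumbling window (windows are explained in the next section).

```python
import boto3

kda = boto3.client("kinesisanalytics")  # SQL-based (v1) Kinesis Data Analytics API

# Illustrative application code; "SOURCE_SQL_STREAM_001" is the default name of the
# first in-application input stream, and "eventSource" is an assumed column mapped
# from the CloudTrail events.
METRICS_SQL = """
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    event_source VARCHAR(64),
    call_count   INTEGER
);
CREATE OR REPLACE PUMP "METRICS_PUMP" AS
    INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "eventSource", COUNT(*)
    FROM "SOURCE_SQL_STREAM_001"
    -- one-minute tumbling window, keyed on the service name
    GROUP BY "eventSource",
             STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
"""

kda.create_application(
    ApplicationName="cloudtrail-metrics",  # assumed name
    ApplicationDescription="Per-service API call counts over 1-minute windows",
    ApplicationCode=METRICS_SQL,
)
```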
How do we aggregate streaming data? (answer: Using windows)
# Aggregations (count, sum, min...) take granular real-time data and turn it into insights
# Data is continuously processed so you need to tell the application when you want results.
Window Types
# Sliding, tumbling, session and custom windows
# Tumbling windows are fixed in size and do not overlap, so each record falls into exactly one window
Event, ingest and processing time
# Event time is the timestamp assigned when the event occurs, also called client-side time.
# Processing time is when your application reads and analyzes the data (ROWTIME).
3. Persist data for real-time dashboards
# Use Kinesis Data Firehose to archive processed data to S3 (see the delivery-stream sketch after this list)
# Use AWS Lambda to deliver data to DynamoDB (or another database)
# Use open-source or other tools to visualize the data
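A sketch of the S3 archiving bullet above: creating a Firehose delivery stream that lands records in a bucket. The stream name, bucket, and role ARNs are placeholders; the role must allow Firehose to write to the bucket.

```python
import boto3

firehose = boto3.client("firehose")

# Placeholder names and ARNs.
firehose.create_delivery_stream(
    DeliveryStreamName="processed-metrics-archive",
    S3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-to-s3",
        "BucketARN": "arn:aws:s3:::my-metrics-archive",
        "Prefix": "kinesis-analytics/",  # optional key prefix for delivered objects
    },
)
```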
Late results
# An event is late if it arrives after the computation to which it logically belongs has been completed
# Your Kinesis Analytics application will produce an amendment (an updated result for that window)
Updating a database
# Perform inserts, but on a duplicate key perform an update (an upsert), so a late amendment overwrites the earlier result (see the sketch below)
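A sketch of that upsert idea using DynamoDB from Python; the table name and key attributes are assumptions. With a table keyed on (event_source, window_start), writing the same key again simply replaces the item, which gives the same effect as SQL's INSERT ... ON DUPLICATE KEY UPDATE.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("api_call_counts")  # assumed table, keyed on event_source + window_start


def upsert_metric(event_source: str, window_start: str, call_count: int) -> None:
    # put_item on an existing key replaces the item, so a late amendment for the
    # same window overwrites the earlier, incomplete result.
    table.put_item(
        Item={
            "event_source": event_source,
            "window_start": window_start,
            "call_count": call_count,
        }
    )


upsert_metric("ec2.amazonaws.com", "2017-06-01T12:00:00Z", 42)
```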
What does all this cost?
# All services used in the solution are pay as you go
# All services used are serverless and have lower DevOps expense
# This solution will cost the "average" customer less than $100 per month
Try it out yourself
Go to aws.amazon.com/kinesis/
Some good examples:
# Get started in minutes with a clickthrough template for AWS CloudTrail Event Log Analytics - <Link> (Friendly URL)
# Tinyurl.com/rt-dashboard
# Great blog posts with example use cases