
Auto-Scaling DynamoDB Streams Applications on Kubernetes

This blog post demonstrates how to auto-scale your DynamoDB Streams consumer applications on Kubernetes. You will work with a Java application that uses the DynamoDB Streams Kinesis adapter library to consume change data events from a DynamoDB table. It will be deployed to an Amazon EKS cluster and will be scaled automatically using KEDA.

The application includes an implementation of com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor that processes data from the DynamoDB stream and replicates it to another (target) DynamoDB table; this is just used as an example. We will use the AWS CLI to produce data to the DynamoDB stream and observe the scaling of the application.
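For illustration, here is a minimal sketch of what such a record processor can look like with the KCL 1.x v2 interfaces and the DynamoDB Streams adapter types. The class name is invented and this is not the repository's exact code; it also stamps each replicated item with the Pod name via a processed_by attribute, which this post inspects later:

import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.streamsadapter.model.RecordAdapter;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessor;
import com.amazonaws.services.kinesis.clientlibrary.types.InitializationInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ProcessRecordsInput;
import com.amazonaws.services.kinesis.clientlibrary.types.ShutdownInput;
import com.amazonaws.services.kinesis.model.Record;

import java.util.HashMap;
import java.util.Map;

public class ReplicatingRecordProcessor implements IRecordProcessor {

    private final AmazonDynamoDB dynamoDB = AmazonDynamoDBClientBuilder.defaultClient();
    private final String targetTable = System.getenv("TARGET_TABLE_NAME");
    private final String instanceName = System.getenv("INSTANCE_NAME");

    @Override
    public void initialize(InitializationInput input) {
        System.out.println("Initialized record processor for shard: " + input.getShardId());
    }

    @Override
    public void processRecords(ProcessRecordsInput input) {
        for (Record record : input.getRecords()) {
            // The DynamoDB Streams adapter delivers records wrapped in a RecordAdapter
            if (!(record instanceof RecordAdapter)) {
                continue;
            }
            com.amazonaws.services.dynamodbv2.model.Record streamRecord =
                    ((RecordAdapter) record).getInternalObject();
            if ("INSERT".equals(streamRecord.getEventName())) {
                // Copy the new item image and stamp it with this Pod's name
                Map<String, AttributeValue> item =
                        new HashMap<>(streamRecord.getDynamodb().getNewImage());
                item.put("processed_by", new AttributeValue(instanceName));
                dynamoDB.putItem(targetTable, item);
            }
        }
        try {
            // Checkpoint so a restarted or replacement worker resumes from the right position
            input.getCheckpointer().checkpoint();
        } catch (Exception e) {
            throw new RuntimeException("checkpoint failed", e);
        }
    }

    @Override
    public void shutdown(ShutdownInput input) {
        // The shard lease will be picked up by another worker instance after shutdown
    }
}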

The code is available in this GitHub repository.

What’s Covered?

  • Introduction
    • Horizontal scalability with the Kinesis Client Library
  • What is KEDA?
  • Prerequisites
  • Set up and configure KEDA on EKS
  • Configure IAM Roles
  • Deploy the DynamoDB Streams consumer application to EKS
  • DynamoDB Streams consumer app autoscaling in action with KEDA
  • Delete resources
  • Conclusion

Introduction

Amazon DynamoDB is a fully managed database service that provides fast and predictable performance with seamless scalability. With DynamoDB Streams, you can leverage Change Data Capture (CDC) to get notified about changes to DynamoDB table data in real time. This makes it possible to easily build applications that react to changes in the underlying database without the need for complex polling or querying.

DynamoDB offers two streaming models for change data capture:

  • Kinesis Data Streams for DynamoDB
  • DynamoDB Streams

With Kinesis Data Streams, you can capture item-level modifications in any DynamoDB table and replicate them to a Kinesis data stream. DynamoDB Streams, on the other hand, captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours.

We will make use of the native DynamoDB Streams capability. Even with DynamoDB Streams, there are multiple options to choose from when it comes to consuming the change data events.

Our application will leverage DynamoDB Streams along with the Kinesis Client Library (KCL) 1.x adapter library to consume change data events from a DynamoDB table.

Horizontal Scalability With Kinesis Client Library

The Kinesis Client Library ensures that for every shard there is a record processor running and processing that shard. KCL helps take care of many of the complex tasks associated with distributed computing and scalability. It connects to the data stream, enumerates the shards within the data stream, and uses leases to coordinate shard associations with its consumer applications.

A record processor is instantiated for every shard it manages. KCL pulls data records from the data stream, pushes the records to the corresponding record processor, and checkpoints processed records. More importantly, it balances shard-worker associations (leases) when the worker instance count changes or when the data stream is re-sharded (shards are split or merged). This means that you are able to scale your DynamoDB Streams application by simply adding more instances since KCL will automatically balance the shards across the instances.
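To make this concrete, here is a minimal sketch of how such a worker could be bootstrapped with the DynamoDB Streams Kinesis adapter 1.x. The class names (including the ReplicatingRecordProcessor from the earlier sketch) are illustrative, not necessarily the repository's exact wiring:

import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreams;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder;
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient;
import com.amazonaws.services.dynamodbv2.streamsadapter.StreamsWorkerFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;

public class ConsumerWorker {

    public static void main(String[] args) {
        // Environment variables mirror the ones set in consumer.yaml
        String appName = System.getenv("APPLICATION_NAME");      // also the name of the KCL lease table
        String streamArn = System.getenv("SOURCE_TABLE_STREAM_ARN");
        String workerId = System.getenv("INSTANCE_NAME");        // unique per Pod

        AmazonDynamoDBStreams streams = AmazonDynamoDBStreamsClientBuilder.defaultClient();
        AmazonDynamoDBStreamsAdapterClient adapterClient = new AmazonDynamoDBStreamsAdapterClient(streams);
        AmazonDynamoDB dynamoDB = AmazonDynamoDBClientBuilder.defaultClient();
        AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();

        KinesisClientLibConfiguration config = new KinesisClientLibConfiguration(
                appName, streamArn, new DefaultAWSCredentialsProviderChain(), workerId)
                .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON);

        // The adapter-aware worker enumerates the stream's shards and coordinates
        // leases with every other worker that shares the same application name
        Worker worker = StreamsWorkerFactory.createDynamoDbStreamsWorker(
                ReplicatingRecordProcessor::new, config, adapterClient, dynamoDB, cloudWatch);
        worker.run();
    }
}

Every instance started this way registers itself as a worker for the same application name, which is exactly why adding more Pods is enough for KCL to redistribute the shard leases.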

But, you still need a way to scale your applications when the load increases. Of course, you could do it manually or build a custom solution to get this done.

This is where KEDA comes in.

What Is KEDA?

KEDA is a Kubernetes-based event-driven autoscaling component that can monitor event sources like DynamoDB Streams and scale the underlying Deployments (and Pods) based on the number of events needing to be processed. It’s built on top of native Kubernetes primitives such as the Horizontal Pod Autoscaler that can be added to any Kubernetes cluster. Here is a high-level overview of its key components (you can refer to the KEDA documentation for a deep dive):

[Image: High-level overview of KEDA’s key components – from the KEDA Concepts documentation]

  1. The keda-operator-metrics-apiserver component in KEDA acts as a Kubernetes metrics server that exposes metrics for the Horizontal Pod Autoscaler.
  2. A KEDA Scaler integrates with an external system (such as Redis) to fetch these metrics (e.g., the length of a list) and drive the auto-scaling of any container in Kubernetes based on the number of events needing to be processed.
  3. The role of the keda-operator component is to activate and deactivate Deployments, i.e., scale to and from zero.

You will see the DynamoDB Streams scaler in action; it scales based on the shard count of a DynamoDB stream.

Now let’s move on to the practical part of this tutorial.

Prerequisites

In addition to an AWS account, you will need to have the AWS CLI, kubectl, and Docker installed.

Set Up an EKS Cluster and Create a DynamoDB Table

There are a number of ways in which you can create an Amazon EKS cluster. I prefer using the eksctl CLI because of the convenience it offers. Creating an EKS cluster using eksctl can be as simple as this:

eksctl create cluster --name <cluster name> --region <region e.g. us-east-1>

For details, refer to Getting Started with Amazon EKS – eksctl.

Create a DynamoDB table with streams enabled to persist application data and access the change data feed. You can use the AWS CLI to create the table with the following command:

aws dynamodb create-table \
    --table-name users \
    --attribute-definitions AttributeName=email,AttributeType=S \
    --key-schema AttributeName=email,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST \
    --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES

We will need to create another table that will serve as a replica of the first table:

aws dynamodb create-table \
    --table-name users_replica \
    --attribute-definitions AttributeName=email,AttributeType=S \
    --key-schema AttributeName=email,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

Clone this GitHub repository and change to the right directory:

git clone <GitHub repository URL>
cd dynamodb-streams-keda-autoscale

Okay, let’s get started!

Set Up and Configure KEDA on EKS

For the purposes of this tutorial, you will use YAML files to deploy KEDA, but you could also use Helm charts.

Install KEDA:

# update version 2.8.2 if required

kubectl apply -f https://github.com/kedacore/keda/releases/download/v2.8.2/keda-2.8.2.yaml

Verify the installation:

# check Custom Resource Definitions
kubectl get crd

# check KEDA Deployments
kubectl get deployment -n keda

# check KEDA operator logs
kubectl logs -f $(kubectl get pod -l=app=keda-operator -o jsonpath='{.items[0].metadata.name}' -n keda) -n keda

Configure IAM Roles

The KEDA operator, as well as the DynamoDB Streams consumer application, needs to invoke AWS APIs. Since both will run as Deployments in EKS, we will use IAM Roles for Service Accounts (IRSA) to provide the required permissions.

In our specific scenario:

  • The KEDA operator needs to be able to get information about the DynamoDB table and stream.
  • The application (the KCL 1.x library, to be specific) needs to interact with Kinesis and DynamoDB; it needs a bunch of IAM permissions to do so.

Configure IRSA for the KEDA Operator

Set your AWS Account ID and OIDC identity provider as environment variables:

ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

# update the cluster name and region as required
export EKS_CLUSTER_NAME=demo-eks-cluster
export AWS_REGION=us-east-1

OIDC_PROVIDER=$(aws eks describe-cluster --name $EKS_CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")

Create a JSON file with the trusted entities for the role:

read -r -d '' TRUST_RELATIONSHIP <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$ACCOUNT_ID:oidc-provider/$OIDC_PROVIDER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "$OIDC_PROVIDER:aud": "sts.amazonaws.com",
          "$OIDC_PROVIDER:sub": "system:serviceaccount:keda:keda-operator"
        }
      }
    }
  ]
}
EOF
echo "$TRUST_RELATIONSHIP" > trust_keda.json

Now, create the IAM role and attach the policy (take a look at the policy_dynamodb_streams_keda.json file for details):

export ROLE_NAME=keda-operator-dynamodb-streams-role

aws iam create-role --role-name $ROLE_NAME --assume-role-policy-document file://trust_keda.json --description "IRSA for DynamoDB streams KEDA scaler on EKS"

aws iam create-policy --policy-name keda-dynamodb-streams-policy --policy-document file://policy_dynamodb_streams_keda.json
aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn=arn:aws:iam::$ACCOUNT_ID:policy/keda-dynamodb-streams-policy

Associate the IAM role and Service Account:

kubectl annotate serviceaccount -n keda keda-operator eks.amazonaws.com/role-arn=arn:aws:iam::$ACCOUNT_ID:role/$ROLE_NAME

# verify the annotation
kubectl describe serviceaccount/keda-operator -n keda

You will need to restart the KEDA operator Deployment for this to take effect:

kubectl rollout restart deployment.apps/keda-operator -n keda

# to verify, confirm that the KEDA operator has the right environment variables
kubectl describe pod -n keda $(kubectl get po -l=app=keda-operator -n keda --output=jsonpath='{.items..metadata.name}') | grep "^\s*AWS_"

# expected output

AWS_STS_REGIONAL_ENDPOINTS:   regional
AWS_DEFAULT_REGION:           us-east-1
AWS_REGION:                   us-east-1
AWS_ROLE_ARN:                 arn:aws:iam::<AWS_ACCOUNT_ID>:role/keda-operator-dynamodb-streams-role
AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token

Configure IRSA for the DynamoDB Streams Consumer Application

Begin by creating a Kubernetes Service Account:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dynamodb-streams-consumer-app-sa
EOF

Create a JSON file with the trusted entities for the role:

read -r -d '' TRUST_RELATIONSHIP <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$ACCOUNT_ID:oidc-provider/$OIDC_PROVIDER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "$OIDC_PROVIDER:aud": "sts.amazonaws.com",
          "$OIDC_PROVIDER:sub": "system:serviceaccount:default:dynamodb-streams-consumer-app-sa"
        }
      }
    }
  ]
}
EOF
echo "$TRUST_RELATIONSHIP" > trust.json

Now, create the IAM role and attach the policy. Update the policy.json file and enter the region and AWS account details.

export ROLE_NAME=dynamodb-streams-consumer-app-role

aws iam create-role --role-name $ROLE_NAME --assume-role-policy-document file://trust.json --description "IRSA for DynamoDB Streams consumer app on EKS"

aws iam create-policy --policy-name dynamodb-streams-consumer-app-policy --policy-document file://policy.json

aws iam attach-role-policy --role-name $ROLE_NAME --policy-arn=arn:aws:iam::$ACCOUNT_ID:policy/dynamodb-streams-consumer-app-policy

Associate the IAM role and Service Account:

kubectl annotate serviceaccount -n default dynamodb-streams-consumer-app-sa eks.amazonaws.com/role-arn=arn:aws:iam::$ACCOUNT_ID:role/$ROLE_NAME

# verify the annotation
kubectl describe serviceaccount/dynamodb-streams-consumer-app-sa

The core infrastructure is now ready. Let’s prepare and deploy the consumer application.

Deploy DynamoDB Streams Consumer Application to EKS

We first need to build the Docker image and push it to ECR (you can refer to the Dockerfile for details).

Build and Push the Docker Image to ECR

# create runnable JAR file
mvn clean compile assembly:single

# build docker image
docker build -t dynamodb-streams-consumer-app .

AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)

# log in to the private ECR registry
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

# create a private ECR repo
aws ecr create-repository --repository-name dynamodb-streams-consumer-app --region us-east-1

# tag and push the image
docker tag dynamodb-streams-consumer-app:latest $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/dynamodb-streams-consumer-app:latest
docker push $AWS_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/dynamodb-streams-consumer-app:latest

Deploy the Consumer Application

Update consumer.yaml with the Docker image you just pushed to ECR and the ARN of the DynamoDB stream for the source table. The rest of the manifest remains the same.

To retrieve the ARN for the stream, run the following command:

aws dynamodb describe-table --table-name users | jq -r '.Table.LatestStreamArn'

The consumer.yaml Deployment manifest looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dynamodb-streams-kcl-consumer-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dynamodb-streams-kcl-consumer-app
  template:
    metadata:
      labels:
        app: dynamodb-streams-kcl-consumer-app
    spec:
      serviceAccountName: dynamodb-streams-consumer-app-sa
      containers:
        - name: dynamodb-streams-kcl-consumer-app
          image: <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/dynamodb-streams-consumer-app:latest
          imagePullPolicy: Always
          env:
            - name: TARGET_TABLE_NAME
              value: users_replica
            - name: APPLICATION_NAME
              value: dynamodb-streams-kcl-app-demo
            - name: SOURCE_TABLE_STREAM_ARN
              value: <enter ARN>
            - name: AWS_REGION
              value: us-east-1
            - name: INSTANCE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name

Create the Deployment:

kubectl apply -f consumer.yaml

# verify Pod transition to Running state
kubectl get pods -w

DynamoDB Streams Consumer App Autoscaling in Action With KEDA

Now that you’ve deployed the consumer application, the KCL adapter library should jump into action. The first thing it will do is create a “control table” in DynamoDB; its name will be the same as that of the application (which in this case is dynamodb-streams-kcl-app-demo).

It might take a few minutes for the initial coordination to happen and for the table to get created. You can check the logs of the consumer application to follow the progress.

kubectl logs -f $(kubectl get po -l=app=dynamodb-streams-kcl-consumer-app --output=jsonpath='{.items..metadata.name}')

Once the lease allocation is complete, check the table and note the leaseOwner attribute:

aws dynamodb scan --table-name dynamodb-streams-kcl-app-demo

[Screenshot: items in the dynamodb-streams-kcl-app-demo control table, showing the leaseOwner attribute]

Add Data to the DynamoDB Table

Now that you’ve deployed the consumer application, let’s add data to the source DynamoDB table (users).

You can use the producer.sh script for this.

export TABLE_NAME=users
./producer.sh

Check consumer logs to see the messages being processed:

kubectl logs -f $(kubectl get po -l=app=dynamodb-streams-kcl-consumer-app --output=jsonpath='{.items..metadata.name}')

Check the target table (users_replica) to confirm that the DynamoDB streams consumer application has indeed replicated the data.

aws dynamodb scan --table-name users_replica

Notice the value of the processed_by attribute? It is the same as the name of the consumer application Pod. This will make it easier for us to verify the end-to-end autoscaling process.

Create the KEDA Scaler

Use the scaler definition:

kubectl apply -f keda-dynamodb-streams-scaler.yaml

Here is the ScaledObject definition. Notice that it targets the dynamodb-streams-kcl-consumer-app Deployment (the one we just created) and that shardCount is set to 2:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aws-dynamodb-streams-scaledobject
spec:
  scaleTargetRef:
    name: dynamodb-streams-kcl-consumer-app
  triggers:
  - type: aws-dynamodb-streams
    metadata:
      awsRegion: us-east-1
      tableName: users
      shardCount: "2"
      identityOwner: "operator"

Note on the shardCount Attribute:

We are using a shardCount value of 2. This is very important to note since we are using the DynamoDB Streams Kinesis adapter library with KCL 1.x, which supports “up to 2 simultaneous consumers per shard.” This means that you cannot have more than two consumer application instances processing the same DynamoDB stream shard.

However, this KEDA scaler configuration will ensure that there will be one Pod for every two shards. So, for example, if there are four shards, the application will be scaled out to two Pods. If there are six shards, there will be three Pods, and so on. Of course, you can choose to have one Pod for every shard by setting the shardCount to 1.
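If you want to sanity-check that math, the Horizontal Pod Autoscaler behind the scaler effectively performs a ceiling division of the observed shard count by the shardCount target. A tiny illustrative snippet (not KEDA’s actual source) that reproduces the examples above:

public class ScalerMath {

    // desired replicas = ceil(observed shard count / shardCount target)
    static long desiredReplicas(long shards, long shardCountTarget) {
        return (shards + shardCountTarget - 1) / shardCountTarget; // integer ceiling division
    }

    public static void main(String[] args) {
        System.out.println(desiredReplicas(4, 2)); // 4 shards -> 2 Pods
        System.out.println(desiredReplicas(6, 2)); // 6 shards -> 3 Pods
        System.out.println(desiredReplicas(4, 1)); // one Pod per shard -> 4 Pods
    }
}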

To keep track of the number of shards in the DynamoDB stream, you can run the following command:

aws dynamodbstreams describe-stream --stream-arn $(aws dynamodb describe-table --table-name users | jq -r '.Table.LatestStreamArn') | jq -r '.StreamDescription.Shards | length'

The command above uses a utility called jq. If you want the shard details themselves:

aws dynamodbstreams describe-stream --stream-arn $(aws dynamodb describe-table --table-name users | jq -r '.Table.LatestStreamArn') | jq -r '.StreamDescription.Shards'

Verify DynamoDB Streams Consumer Application Auto-Scaling

We started off with one Pod of our application. But, thanks to KEDA, we should now see additional Pods coming up automatically to match the processing requirements of the consumer application.

To confirm, check the number of Pods:

kubectl get pods -l=app=dynamodb-streams-kcl-consumer-app

Most likely, you will see four shards in the DynamoDB stream and two Pods. This can change (increase or decrease) depending on the rate at which data is produced to the DynamoDB table.

Just like before, validate the data in the DynamoDB target table (users_replica) and note the processed_by attribute. Since we have scaled out to more Pods, the value should be different for each record, as each Pod will process a subset of the messages from the DynamoDB change stream.

Also, make sure to check the dynamodb-streams-kcl-app-demo control table in DynamoDB. You should see an update to the leaseOwner attribute reflecting the fact that there are now two Pods consuming from the DynamoDB stream.

[Screenshot: updated control table items showing two Pods consuming from the DynamoDB stream]

Once you have verified the end-to-end solution, you should clean up the resources to avoid incurring any additional charges.

Delete Resources

Delete the EKS cluster and the DynamoDB tables:

eksctl delete cluster --name <enter cluster name>
aws dynamodb delete-table --table-name users
aws dynamodb delete-table --table-name users_replica

Conclusion

Use cases you should experiment with:

  • Scale further up – How can you make the DynamoDB stream increase its number of shards? What happens to the number of consumer application Pods?
  • Scale down – What happens when the shard capacity of the DynamoDB stream decreases?

In this post, we demonstrated how to use KEDA and DynamoDB Streams to combine two powerful techniques (Change Data Capture and auto-scaling) to build scalable, event-driven systems that adapt based on the data processing needs of your application.
