
AWS MSK (Managed Streaming for Apache Kafka) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data.

I have built Kafka clusters on on-premises servers as well as on EC2 instances, and AWS MSK really helped to relieve so many manual…

Amazon Simple Storage Service (Amazon S3) is a cost-efficient and highly scalable object store, usable for persistent or temporary storage, that most organizations consider for storing regular or big data.

Before Nov 2014, there was no notification system to transmit events whenever objects were created, deleted, etc. To detect those events…

If you have not worked on “ClearCase”, then you are really a fortunate soul. “ClearCase” is one of the more complex repository management tools, and I always used to face issues with rebase and other operations while committing code, with errors I was not able to resolve, and sometimes I…

To process continuous streams of data from sources such as HDFS directories, TCP sockets, Kafka, Flume, Twitter, etc., Spark has two methodologies.

Spark Streaming in general works on something called a micro-batch. …
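To make the micro-batch idea concrete, here is a minimal pure-Python sketch. It does not use the Spark API; the function name `micro_batches`, the event tuples, and the one-second interval are all assumptions made for illustration. It only shows the core idea: incoming events are bucketed by a fixed batch interval, and each bucket is then handled as one small batch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(
    events: Iterable[Tuple[float, str]], interval_s: float
) -> Dict[int, List[str]]:
    """Group (timestamp_seconds, payload) events into fixed-interval batches.

    Hypothetical helper for illustration only; not part of Spark.
    """
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        # Every event that falls in the same interval gets the same batch id.
        batches[int(ts // interval_s)].append(payload)
    return dict(batches)


# Made-up events: (arrival time in seconds, payload).
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]

for batch_id, payloads in sorted(micro_batches(events, 1.0).items()):
    # Each iteration stands in for one small batch job run on that interval's data.
    print(batch_id, payloads)
```

A smaller interval lowers latency but produces more, smaller batches; that trade-off is the main knob in a micro-batch design.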

Kafka best practices edition → How to design a Kafka message key, and why is it important in determining application performance?

What is a Kafka message: a record or unit of data within Kafka. Each message has a key and a value, and optionally headers. The key is commonly used for data about the message, and the value is the body of the message.

Message Key → Can be null or contain…
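One reason the key matters for performance is that it decides which partition a record lands in. The sketch below is a simplified illustration, not Kafka's actual partitioner: Kafka's default partitioner hashes the key bytes with murmur2, while this example swaps in `zlib.crc32` to stay dependency-free, and uses plain round-robin for null keys. The function name `choose_partition` is hypothetical.

```python
import itertools
import zlib
from typing import Optional

# Shared counter used to spread null-key records (simple round-robin, assumed here).
_round_robin = itertools.count()


def choose_partition(key: Optional[bytes], num_partitions: int) -> int:
    """Pick a partition for a record (illustrative; not Kafka's real partitioner)."""
    if key is None:
        # Null key: no ordering guarantee, so just spread records across partitions.
        return next(_round_robin) % num_partitions
    # Non-null key: a deterministic hash, so equal keys always map to the same
    # partition. crc32 stands in for Kafka's murmur2 in this sketch.
    return zlib.crc32(key) % num_partitions


# Records sharing a key go to the same partition, which preserves their order.
same = choose_partition(b"user-42", 6) == choose_partition(b"user-42", 6)
print(same)
```

Because every record with the same key goes to one partition, a key with very few distinct values (or one very hot value) can overload a single partition while the others sit idle.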

If we need to stream data from Kafka, perform some transformations, and persist the results to AWS S3, Azure Blob Storage, Google Cloud Storage, Kafka, or HDFS, we have a wide range of options to choose from, such as Spark Structured Streaming, Kafka Streams, Storm, Flink, and Akka Streams.

Background on the S3 consistency issue → Spark computations involve jobs divided into stages, which are in turn divided into tasks that use rename operations while committing intermediate data to storage systems like S3 or HDFS.

If the underlying system is POSIX compliant, actions like file rename will be atomic, even though…
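The rename-based commit described above can be sketched on a local POSIX file system as follows. This is a minimal illustration, not Spark's committer code; the function name `commit_by_rename` and the `part-00000` file name are assumptions. The output is written to a temporary file first and then renamed into place, so readers see either no file or the complete file. On an object store such as S3, a rename is typically emulated as a copy followed by a delete, so this guarantee does not carry over.

```python
import os
import tempfile


def commit_by_rename(final_path: str, data: bytes) -> None:
    """Write data to a temporary file, then rename it into place.

    Hypothetical helper for illustration; not Spark's actual commit protocol.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # Keep the temp file in the target directory so the rename stays on one
    # file system, which is what makes it atomic on POSIX.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Atomic on POSIX: the final path is either absent/old or fully written.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the partial temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


out_dir = tempfile.mkdtemp()
out_file = os.path.join(out_dir, "part-00000")
commit_by_rename(out_file, b"hello")
print(os.listdir(out_dir))
```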

Please refer to understand the difference between image and container.

Docker client and daemon → Docker uses a client-server architecture. The Docker client talks to the Docker daemon (dockerd), which does the heavy lifting of building (docker build), running (docker run). …

Docker Image → A Docker image is an immutable file that contains the source code, libraries, dependencies, tools, and other files needed for an application to run (like “my image” from the instructions below).

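# Base image (16.04 assumed; ubuntu:16.0.0 is not a published tag)
FROM ubuntu:16.04
# With wildcard sources, the destination must be a directory ending in "/"
COPY *.properties /app/properties/
COPY *.jar /app/jars/
RUN make /app
CMD python /app/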

This image is usually built…

Lambda architecture is associated with big data. It is not to be confused with AWS Lambda, which is just a function (or piece of code) invoked on an event from a source (like S3, SQS, etc.), or with lambda expressions in Java.

Big data is the most celebrated word in the last decade, thanks to…


Principal data engineer → Distributed threat hunting security platform | AWS Certified Solutions Architect | GSSP-Java
