Aditya · Multi Version Concurrency Control (MVCC) based design (hot question in distributed systems interviews) · What is concurrency → Ability of a program to do multiple things at once. · 2 min read · May 17, 2023
Aditya · Step by step guide to expose Spark JMX metrics and funnel them to Datadog. · Please read my previous article… · 2 min read · Nov 4, 2022
Aditya · How JMX metrics from Spark applications will help to configure driver/executor memory correctly… · What are JMX metrics → Java Management Extensions (JMX) is a specification for monitoring and managing Java applications. · 2 min read · Oct 31, 2022
Aditya · Spark job vs stage vs task in simple terms (with cheat sheet) · When a Spark application invokes an action, such as collect() or take(), on your DataFrame or Dataset, the action will create a job. Below is… · 2 min read · Sep 20, 2022
Aditya · What is compaction in big data applications (Hudi, Hive, Spark, Kafka, etc.) · Compaction → Process of consolidating small files into larger file(s) and cleaning up the smaller files. · 3 min read · Jun 19, 2022
Aditya · In which scenarios to use mapPartitions or foreachPartition in Spark (Simple question that… · As a data engineer, while developing Spark jobs and performing operations, you will encounter a situation where your Spark code that is… · 2 min read · May 6, 2022
Aditya · How attackers use the Log4j vulnerability (CVE-2021-44228) to access applications and how to quickly… · A vulnerability (CVE-2021-44228) in Apache Log4j, a widely used logging package for Java, has been found (first reported to Apache on… · 2 min read · Dec 17, 2021
Aditya · Concurrency vs Parallelism in simple terms (Important question in system design interviews) · One of the most important concepts in programming languages (like Go, Java, etc.) or distributed computing is the difference between… · 2 min read · Dec 16, 2021
Aditya · Structured vs Semi-structured vs Unstructured data · What is data → Data is a representation of some aspect of the real world. We can classify data as structured or unstructured or… · 2 min read · Nov 8, 2021
Aditya · My biggest issue with AWS MSK (resulting in overcharging) · AWS MSK (Managed Streaming for Kafka) is a fully managed service that enables you to build and run applications that use Apache Kafka to… · 1 min read · Aug 26, 2021