What is compaction in big data applications (Hudi, Hive, Spark, Kafka, etc.)? [Important concept for a big data engineer interview]
Compaction → The process of consolidating many small files into larger file(s) and then cleaning up the smaller files.
Generally, compaction jobs run in the background, and most big data processing applications support both manual and automatic compaction.
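The core idea can be sketched in plain Python, assuming a directory of small text files (the function name, file names, and layout here are illustrative, not any framework's API):

```python
import os
import tempfile

def compact(directory, output_name="compacted-0.txt"):
    """Merge every small file in `directory` into one large file,
    then delete the originals (the clean-up step of compaction)."""
    small_files = sorted(
        name for name in os.listdir(directory) if name != output_name
    )
    out_path = os.path.join(directory, output_name)
    with open(out_path, "w") as out:
        for name in small_files:
            path = os.path.join(directory, name)
            with open(path) as src:
                out.write(src.read())  # consolidate into the large file
            os.remove(path)            # clean up the small file
    return out_path

# Demo: many small part-files -> one consolidated file
with tempfile.TemporaryDirectory() as d:
    for i in range(5):
        with open(os.path.join(d, f"part-{i}.txt"), "w") as f:
            f.write(f"record-{i}\n")
    compact(d)
    print(os.listdir(d))  # only the compacted file remains
```

Real engines (Hudi, Hive, Kafka log compaction, Spark `coalesce`/`repartition` rewrites) do the same consolidate-then-clean-up dance, but with columnar formats, transactional commits, and background scheduling.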
What is the issue with small files → Having many small files in a data lake is a nightmare because downstream distributed…