Good news from AWS for projects reading from and writing to S3 using Spark Streaming on EMR (S3 is now strongly consistent! No need for EMRFS consistent view)

Aditya
3 min read · Feb 17, 2021

Background on the S3 consistency issue → A Spark computation is made up of jobs, which are divided into stages, which are in turn divided into tasks. When tasks commit intermediate data to a storage system such as S3 or HDFS, they rely on rename operations.

If the underlying file system is POSIX compliant, operations such as file rename are atomic. HDFS is not fully POSIX compliant, but its rename operation is still atomic.
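To make the rename-based commit concrete, here is a minimal sketch in plain Python on a local POSIX file system. It is not Spark's actual committer code; the function name `commit_task_output` and the `_temporary_` prefix are made up for illustration. The idea is only this: write the output under a temporary name, then publish it with a single atomic rename.

```python
import os
import tempfile


def commit_task_output(final_path: str, data: bytes) -> None:
    """Write data to a temporary file, then publish it with one atomic rename.

    Hypothetical helper for illustration; not part of Spark or Hadoop.
    """
    target_dir = os.path.dirname(os.path.abspath(final_path))
    # Stage the output in the same directory so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix="_temporary_")
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            tmp_file.write(data)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # os.replace is atomic on POSIX file systems: a reader sees either
        # the previous state of final_path or the complete new file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Clean up the staged file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling `commit_task_output("/some/output/dir/part-00000", b"...")` leaves either no final file or a complete one, never a half-written one. That all-or-nothing guarantee is what the rename step provides during a commit.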


Aditya

Principal data engineer → Distributed threat hunting security platform | AWS Certified Solutions Architect | GSSP-Java | Chicago, IL