Spark streaming vs Spark structured streaming

Aditya
2 min read · Aug 19, 2021

To process continuous streams of data from sources such as HDFS directories, TCP sockets, Kafka, Flume, and Twitter, Spark offers two approaches: Spark Streaming and Spark Structured Streaming.

Spark streaming in general works in what are called micro-batches. The stream pipeline is registered with a set of operations, and Spark polls the source after every batch duration (defined in the application).
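To make the micro-batch idea concrete, here is a minimal pure-Python sketch. It is not how Spark implements it internally; it only simulates the behaviour described above. The names `micro_batches`, `word_count`, and the sample events are hypothetical, and timestamps are assumed to be non-negative seconds since the stream started.

```python
from collections import Counter
from typing import Dict, List, Tuple


def micro_batches(events: List[Tuple[float, str]], batch_duration: float) -> List[List[str]]:
    """Group (timestamp_seconds, record) pairs into consecutive batches.

    Batch i holds every record whose timestamp falls in
    [i * batch_duration, (i + 1) * batch_duration).
    Intervals in which nothing arrived still produce an (empty) batch.
    """
    grouped: Dict[int, List[str]] = {}
    for timestamp, record in events:
        index = int(timestamp // batch_duration)
        grouped.setdefault(index, []).append(record)
    if not grouped:
        return []
    return [grouped.get(i, []) for i in range(max(grouped) + 1)]


def word_count(batch: List[str]) -> Dict[str, int]:
    """The 'registered operation': count words within a single batch."""
    return dict(Counter(word for line in batch for word in line.split()))


# Hypothetical input: four records arriving over roughly six seconds.
events = [(0.5, "a b"), (1.2, "b"), (2.1, "c a"), (6.3, "a")]

# With a batch duration of 2 seconds the records fall into batches 0, 0, 1 and 3.
for batch in micro_batches(events, batch_duration=2):
    print(word_count(batch))
```

Each loop iteration stands in for one batch interval: the same operation runs on whatever arrived during that interval, including the interval in which nothing arrived.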

The short answer: Spark Streaming uses the DStream API, while Spark Structured Streaming uses the Dataset/DataFrame API. :-)
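A rough PySpark sketch of the same word count in both APIs may help show the difference. The function names, the socket source, and the 5-second interval are assumptions made for illustration. The PySpark imports sit inside the functions only so the snippet can be loaded without a Spark installation.

```python
def dstream_word_count(host: str, port: int, batch_seconds: int = 5):
    """Spark Streaming: the DStream API, where each micro-batch is an RDD."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext.getOrCreate()
    ssc = StreamingContext(sc, batch_seconds)  # batch duration, in seconds
    lines = ssc.socketTextStream(host, port)
    counts = (
        lines.flatMap(lambda line: line.split(" "))
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)  # counts within each batch
    )
    counts.pprint()
    return ssc  # the caller runs ssc.start() and ssc.awaitTermination()


def structured_word_count(host: str, port: int, trigger: str = "5 seconds"):
    """Structured Streaming: the stream is an unbounded DataFrame."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, explode, split

    spark = SparkSession.builder.appName("StructuredWordCount").getOrCreate()
    lines = (
        spark.readStream.format("socket")
        .option("host", host)
        .option("port", port)
        .load()  # a DataFrame with a single string column named "value"
    )
    words = lines.select(explode(split(col("value"), " ")).alias("word"))
    counts = words.groupBy("word").count()  # running totals across the stream
    return (
        counts.writeStream.outputMode("complete")
        .format("console")
        .trigger(processingTime=trigger)  # micro-batch interval
        .start()  # returns a StreamingQuery; call .awaitTermination() on it
    )
```

To try either one locally, something like `nc -lk 9999` can feed the socket, with Spark running on at least two local threads so the DStream receiver does not starve the processing. As written, the DStream version prints counts per batch, while the Structured Streaming version in `complete` output mode prints running totals.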

Aditya

Principal data engineer → Distributed Threat hunting security platform | aws certified solutions architect | gssp-java | Chicago-IL