
In a Spark application, how does driver memory affect CPU utilization?

Aditya
1 min read · Jul 10, 2020


The Spark driver node plays a key role in the health of a given Spark job. We can submit Spark jobs in client mode or cluster mode. In client mode, the node from which we submit the job acts as the driver node; in cluster mode, the node where the driver runs is chosen at run time by the cluster manager (YARN, Spark standalone, etc.).
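
For example, assuming a YARN cluster and a hypothetical application (the class and jar names below are placeholders), the only difference between the two submissions is the --deploy-mode flag, which decides where the driver ends up:

```
# Client mode: the driver runs on the machine where spark-submit is invoked.
spark-submit --master yarn --deploy-mode client \
  --class com.example.MyApp my-app.jar

# Cluster mode: YARN picks a node inside the cluster and starts the driver there.
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyApp my-app.jar
```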

The Spark driver is the program that declares the transformations and actions on RDDs of data and submits those requests to the master.

By default, the driver memory (spark.driver.memory) is 1 GB. For most scenarios where the application performs a distributed output action (like rdd.saveAsTextFile), that is sufficient, because the results are written out by the executors and never flow back to the driver. We may need more when the driver loads large objects for cache lookups, or when the job uses operations like collect or take that pull data back to the driver.
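
As a rough sketch (the paths, object name, and data below are made up for illustration), the contrast between a driver-light job and a driver-heavy one looks something like this in Scala:

```scala
import org.apache.spark.sql.SparkSession

object DriverMemoryDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("driver-memory-demo").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input path; substitute your own data set.
    val rdd = sc.textFile("hdfs:///data/events/*.log")

    // Distributed output action: each executor writes its own partitions
    // straight to storage, so almost nothing is sent back to the driver.
    // The default 1 GB of spark.driver.memory is normally enough here.
    rdd.map(_.toUpperCase).saveAsTextFile("hdfs:///data/events_upper")

    // Driver-heavy actions: collect() ships every record into the driver JVM,
    // and a large take(n) behaves much the same. These are the cases where
    // the driver may need more than the default 1 GB.
    val allRecords = rdd.collect()
    val firstMillion = rdd.take(1000000)
    println(s"collected ${allRecords.length} records, sampled ${firstMillion.length}")

    spark.stop()
  }
}
```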

If the driver memory (spark.driver.memory) is set lower than the job needs, CPU pressure on the driver node increases, typically because the driver JVM spends more and more time in garbage collection. Once CPU utilization crosses roughly 90%, the job starts throwing confusing errors such as OutOfMemoryError, end-of-file exceptions, and connection errors to executors, all of which are hard to debug.
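
Because spark.driver.memory sizes the driver JVM heap, it has to be set before the driver process starts; setting it from inside the application via SparkConf has no effect in client mode, since the driver JVM is already running by then. A minimal sketch of raising it from the 1 GB default at submit time (class and jar names are again placeholders):

```
# Give the driver 4 GB instead of the default 1 GB.
spark-submit --master yarn --deploy-mode cluster \
  --driver-memory 4g \
  --class com.example.MyApp my-app.jar

# Or set it once for all jobs in conf/spark-defaults.conf:
# spark.driver.memory  4g
```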

So even if your driver node is hosting 10 applications, a single one of them running with less driver memory than it needs can push CPU utilization high enough that every application on that node starts showing the hard-to-debug issues mentioned above.
