How do executor cores impact a Spark application when chosen incorrectly?

Executor → An executor is a single JVM process launched for an application on a worker node. A single node in the cluster can run multiple executors, and the executors for an application can span multiple worker nodes.
Core → A core is the basic computation unit of a CPU, and a CPU may have one or more cores available to perform tasks at a given time.
An Apache Spark application needs the number of executors and the number of cores per executor as inputs when it is submitted.
Choosing the number of executors is straightforward, though it may take a trial-and-error approach to fine-tune it based on application timelines, partition counts, etc.
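For reference, here is a minimal sketch of how those inputs are typically supplied, either as spark-submit flags (--num-executors, --executor-cores, --executor-memory) or as configuration on the SparkSession. The figures below are placeholders, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: the values below are placeholders, not recommendations,
// and are normally tuned per workload.
val spark = SparkSession.builder()
  .appName("executor-core-sizing")
  .config("spark.executor.instances", "10") // number of executors
  .config("spark.executor.cores", "4")      // task threads per executor
  .config("spark.executor.memory", "8g")    // heap per executor
  .getOrCreate()
```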
But choosing the number of cores per executor is tricky. If we choose only one core, we lose parallelism within each executor. If we choose more than one core, the parallelism of the application increases at the executor level.
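To make the trade-off concrete, a rough back-of-the-envelope calculation (assumed figures only):

```scala
// With 10 executors (assumed figure):
//   1 core per executor  -> at most 10 tasks run concurrently, one per executor
//   4 cores per executor -> at most 40 tasks run concurrently, four per executor
val executors          = 10
val coresPerExecutor   = 4
val maxConcurrentTasks = executors * coresPerExecutor // 40
```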
We need to be careful regarding the multi-threaded behavior of the application when the number of cores per executor is more than 1.
Why?
Suppose the Spark application code that runs on executors uses non-thread-safe classes, such as a LinkedHashMap-based LRU cache. With multiple task threads per executor, the application will not behave as expected and can run into problems like race conditions and deadlocks, which in turn hang the Spark application.
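As an illustration only (the cache object, names and sizes below are hypothetical, not taken from any real application), here is a sketch of an executor-side LRU cache built on LinkedHashMap that is shared by every task thread in the executor JVM, along with one possible way to make it safe:

```scala
import java.util.{Collections, LinkedHashMap => JLinkedHashMap, Map => JMap}

import org.apache.spark.sql.SparkSession

// Hypothetical executor-side singleton, shown only to illustrate the hazard.
// A Scala object is initialised once per JVM, so every task thread running on
// an executor shares this single instance.
object LookupCache {

  // A LinkedHashMap in access order behaves like an LRU cache, but it is NOT
  // thread-safe: even get() reorders its internal linked list, so with
  // spark.executor.cores > 1 concurrent task threads can corrupt it.
  // Wrapping it in Collections.synchronizedMap is one possible fix
  // (a purpose-built concurrent cache is another).
  private val lru: JMap[String, String] =
    Collections.synchronizedMap(
      new JLinkedHashMap[String, String](16, 0.75f, true) {
        override def removeEldestEntry(e: JMap.Entry[String, String]): Boolean =
          size() > 10000 // evict the eldest entry once the cache exceeds 10k
      })

  def getOrCompute(key: String)(compute: => String): String = {
    val cached = lru.get(key)
    if (cached != null) cached
    else {
      val value = compute
      lru.put(key, value)
      value
    }
  }
}

object ThreadSafetyDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("thread-safety-demo").getOrCreate()
    import spark.implicits._

    // Each task thread on an executor goes through the shared cache; without
    // the synchronizedMap wrapper above, this is a classic race condition when
    // the executor runs more than one core.
    val enriched = spark.range(0, 1000000).map { id =>
      LookupCache.getOrCompute(s"key-${id % 100}")(s"value-for-${id % 100}")
    }
    println(enriched.count())
    spark.stop()
  }
}
```

Another common way to avoid the problem is to keep such state per partition (for example, created inside mapPartitions) so each task works on its own instance instead of a shared one.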
If a Spark application (batch or streaming) seems stuck without making any progress, we need to take a thread dump (for example via the Thread Dump link on the Executors tab of the Spark UI, or by running jstack against the executor JVM). The dump throws light on which threads are blocked, and most of the time the non-thread-safe classes are the ones that show up in it.