When a Spark application invokes an action, such as collect() or take() on a DataFrame or Dataset, that action creates a job. Below is the cheat sheet to remember; a short code sketch follows it.
One job results in one or more stages
One stage results in one or more tasks
One task operates on one partition
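The minimal Scala sketch below (the app name, partition count, and values are assumptions for illustration, not from the original text) ties the cheat sheet to code: the groupBy is a wide transformation that splits the job into two stages, the first stage runs one task per partition, and no job exists at all until the collect() action runs.

```scala
import org.apache.spark.sql.SparkSession

object JobStageTaskDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("job-stage-task-demo")   // hypothetical app name
      .master("local[4]")               // 4 local cores -> up to 4 tasks run in parallel
      .getOrCreate()
    import spark.implicits._

    // 8 partitions, so the first stage runs 8 tasks
    val df = spark.range(1000).toDF("n").repartition(8)

    // groupBy is a wide transformation: it forces a shuffle,
    // so this job gets a second stage after the shuffle boundary
    val counts = df.groupBy($"n" % 10).count()

    counts.collect()   // the action: only here does Spark actually create a job
    // Jobs, stages, and per-stage tasks are visible in the Spark UI (port 4040)
    spark.stop()
  }
}
```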
So what do executors run? → An executor (from an interview or conceptual perspective) is one JVM running on a physical node (each physical node can host one or more executors). One executor can run one or more tasks.
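As a rough sketch of how executor sizing is expressed (the app name and the numbers are assumptions), the standard Spark configuration keys control how many executor JVMs are launched and how many tasks each can run concurrently:

```scala
import org.apache.spark.sql.SparkSession

object ExecutorSizingDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical sizing: 4 executor JVMs with 3 cores each means up to
    // 12 tasks can run concurrently across the cluster.
    val spark = SparkSession.builder()
      .appName("executor-sizing-demo")           // assumed app name
      .config("spark.executor.instances", "4")   // executor JVMs, one or more per node
      .config("spark.executor.cores", "3")       // tasks one executor runs in parallel
      .config("spark.executor.memory", "4g")     // heap for each executor JVM
      .getOrCreate()

    println(spark.sparkContext.getConf.get("spark.executor.cores"))
    spark.stop()
  }
}
```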
A general day-to-day example to understand the above scenario:
Let's say our Spark job's objective is to go to the bank, withdraw some cash, and pay bills.
Stage 1 → Driving from home to the bank is one stage.
- Starting the car is one task
- Starting the GPS is another task
- Driving on the road is another task
Stage 2 → Going into the bank is another stage
- Getting out of the car is one task
- Walking to the bank is another task
- Going to the teller window is another task
Stage 3 → Withdrawing cash from the teller is another stage
- Giving the debit card to the teller is one task
- Requesting the withdrawal amount is another task
- Collecting the money is another task
So in the above example, stage 3 depends on stage 2, and stage 2 depends on stage 1. In Spark, this kind of dependency generally comes from a shuffle (wide transformation); each intermediate stage that writes shuffle output is called a ShuffleMapStage.
Say after withdrawing the money we use it to pay off some bills; that can be considered the final stage, which Spark calls the ResultStage. A minimal code sketch of both stage types follows.
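To connect the analogy back to code, here is a hedged Scala sketch (identifiers and values are made up for illustration): the reduceByKey is a wide transformation whose shuffle boundary ends a ShuffleMapStage, and the collect() action makes the last stage in the chain the ResultStage.

```scala
import org.apache.spark.sql.SparkSession

object StageChainDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("stage-chain-demo")   // hypothetical app name
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext

    val totals = sc.parallelize(1 to 100, numSlices = 4)
      .map(n => (n % 5, n))     // narrow transformation: stays in the same stage
      .reduceByKey(_ + _)       // wide transformation: shuffle boundary ends a ShuffleMapStage
      .map { case (k, sum) => (k, sum * 2) }

    // toDebugString prints the lineage; indentation changes mark shuffle
    // boundaries, i.e. where one ShuffleMapStage feeds the next stage.
    println(totals.toDebugString)

    totals.collect()            // the action: the last stage in the chain is the ResultStage
    spark.stop()
  }
}
```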