Step-by-step guide to exposing Spark JMX metrics and funneling them to Datadog

Aditya
2 min read · Nov 4, 2022

Please read my previous article, https://sprinkle-twinkles.medium.com/how-jmx-metrics-from-spark-applications-will-help-to-configure-driver-executor-memory-correctly-560d5863d0af, for context on the need for, and uses of, exposing Spark JMX metrics.

Below are the steps we need to follow to expose Spark JMX metrics and export them to Datadog.

Step one: Configure the Spark job to expose JMX metrics.

This can be done by adding the arguments below to spark.driver.extraJavaOptions for a given Spark job, or by adding them to spark-defaults.conf (not recommended, since that applies them to every job on the cluster):

'spark.driver.extraJavaOptions': '-Dcom.sun.management.jmxremote
  -Dcom.sun.management.jmxremote.port=8090
  -Dcom.sun.management.jmxremote.rmi.port=8090
  -Dcom.sun.management.jmxremote.authenticate=false
  -Dcom.sun.management.jmxremote.ssl=false
  -Djava.rmi.server.hostname=127.0.0.1'
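Passed on the command line, the same options look roughly like this (the application name and script path are placeholders, not from the original article):

```shell
# Sketch of a spark-submit invocation with the JMX options above.
# Pinning jmxremote.port and jmxremote.rmi.port to the same value (8090)
# keeps the RMI connection on a single, predictable port.
spark-submit \
  --name my_etl_job \
  --conf "spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=8090 \
-Dcom.sun.management.jmxremote.rmi.port=8090 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=127.0.0.1" \
  my_job.py
```

Note that authentication and SSL are disabled here for simplicity, and java.rmi.server.hostname=127.0.0.1 means the JMX port is only reachable from the driver host itself, so whatever collects the metrics needs to run on (or tunnel to) that host.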

Step two: In spark-defaults.conf, add “spark.metrics.namespace ${spark.app.name}”.

Why → When JMX metrics are emitted, they are prefixed with the application ID (e.g. application_xxx). Since the ID changes on every run, tracking a metric across application restarts requires the prefix to be the application name instead of the ID.
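For illustration (the metric names below are representative examples, not captured from a live cluster), the namespace setting changes the metric prefix roughly as follows:

```shell
# spark-defaults.conf
spark.metrics.namespace ${spark.app.name}

# Without the setting, the default namespace is the application ID,
# which changes on every submission:
#   application_1667550000000_0042.driver.jvm.heap.used
#
# With the setting, the prefix is the (stable) application name,
# so dashboards and monitors survive restarts:
#   my_etl_job.driver.jvm.heap.used
```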
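On the Datadog side, the Agent's JMX integration can then scrape the port opened in step one. A minimal sketch of the check configuration, assuming the Agent runs on the driver host (the exact conf.d path depends on your Agent installation):

```shell
# conf.d/jmx.d/conf.yaml — Datadog Agent JMX check (YAML shown as a fragment)
#
# init_config:
#   is_jmx: true
#   collect_default_metrics: true
#
# instances:
#   - host: localhost   # driver bound JMX to 127.0.0.1 in step one
#     port: 8090        # must match com.sun.management.jmxremote.port
```

After restarting the Agent, `datadog-agent status` should list the jmx check and the Spark JVM metrics should start flowing into Datadog under the namespace configured in step two.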


Principal Data Engineer → distributed threat-hunting security platform | AWS Certified Solutions Architect | GSSP-Java | Chicago, IL