Step-by-step guide to exposing Spark JMX metrics and funneling them to Datadog
Please read my previous article https://sprinkle-twinkles.medium.com/how-jmx-metrics-from-spark-applications-will-help-to-configure-driver-executor-memory-correctly-560d5863d0af for context on why exposing Spark JMX metrics is useful and how they can be applied.
Below are the steps we need to follow to expose Spark JMX metrics and export them to Datadog.
Step one: Configure the Spark job to expose JMX metrics.
This can be done by adding the arguments below to spark.driver.extraJavaOptions for a given Spark job, or by adding them to spark-defaults.conf (not recommended, since that applies them to every job on the cluster).
'spark.driver.extraJavaOptions': '-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=8090 -Dcom.sun.management.jmxremote.rmi.port=8090 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=127.0.0.1'
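A stray space or a missing flag in this string will silently break JMX, so if you submit jobs programmatically it can help to build the option value from a list instead of hand-editing one long line. A minimal sketch (the helper name and defaults are my own, not part of Spark):

```python
# Sketch: assemble the spark.driver.extraJavaOptions value for JMX exposure.
# jmx_java_options is a hypothetical helper, not a Spark API.
def jmx_java_options(port: int = 8090, hostname: str = "127.0.0.1") -> str:
    flags = [
        "-Dcom.sun.management.jmxremote",
        f"-Dcom.sun.management.jmxremote.port={port}",
        f"-Dcom.sun.management.jmxremote.rmi.port={port}",  # same port avoids extra firewall rules
        "-Dcom.sun.management.jmxremote.authenticate=false",
        "-Dcom.sun.management.jmxremote.ssl=false",
        f"-Djava.rmi.server.hostname={hostname}",
    ]
    return " ".join(flags)

# The resulting dict can be passed as --conf entries to spark-submit
# or into a SparkConf when building the session.
conf = {"spark.driver.extraJavaOptions": jmx_java_options()}
```

Note that authentication and SSL are disabled here for simplicity, which is only reasonable when the JMX port is not reachable from outside the cluster.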
Step two: In spark-defaults.conf, add “spark.metrics.namespace ${spark.app.name}”.
Why → When JMX metrics are emitted, they are prefixed with the application ID (something like application_xxx). Since the ID changes on every restart, tracking metrics across application restarts requires prefixing them with the application name instead of the ID.
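For reference, the line in spark-defaults.conf would look like the following (the before/after metric names shown in the comments are illustrative, not taken from a real cluster):

```
# spark-defaults.conf
# Default prefix (changes on restart):  application_1644889955_0001.driver.BlockManager.memory.memUsed_MB
# With the namespace set (stable):      my_app_name.driver.BlockManager.memory.memUsed_MB
spark.metrics.namespace  ${spark.app.name}
```

This keeps dashboards and monitors in Datadog pointed at one stable metric name instead of a new application ID after every restart.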