Skip to content

Latest commit

 

History

History
206 lines (144 loc) · 6.58 KB

spark-SparkListener.adoc

File metadata and controls

206 lines (144 loc) · 6.58 KB

Spark Listeners

SparkListener is a developer API for custom Spark listeners. It is an abstract class that is a SparkListenerInterface with empty no-op implementations of all the callback methods.

With all the callbacks being no-ops, you can focus on events of your liking and process a subset of events.

Tip
Developing a custom SparkListener is an excellent introduction to low-level details of Spark’s Execution Model. Check out the exercise Developing Custom SparkListener to monitor DAGScheduler in Scala.

SparkListenerEvents

Caution
FIXME Give a less code-centric description of the times for the events.

SparkListenerApplicationStart

SparkListenerApplicationStart(
  appName: String,
  appId: Option[String],
  time: Long,
  sparkUser: String,
  appAttemptId: Option[String],
  driverLogs: Option[Map[String, String]] = None)

SparkListenerApplicationStart is posted when SparkContext does postApplicationStart.

SparkListenerJobStart

SparkListenerJobStart(
  jobId: Int,
  time: Long,
  stageInfos: Seq[StageInfo],
  properties: Properties = null)

SparkListenerJobStart is posted when DAGScheduler does handleJobSubmitted and handleMapStageSubmitted.

SparkListenerStageSubmitted

SparkListenerStageSubmitted(stageInfo: StageInfo, properties: Properties = null)

SparkListenerStageSubmitted is posted when DAGScheduler does submitMissingTasks.

SparkListenerTaskStart

SparkListenerTaskStart(stageId: Int, stageAttemptId: Int, taskInfo: TaskInfo)

SparkListenerTaskStart is posted when DAGScheduler does handleBeginEvent.

SparkListenerTaskGettingResult

SparkListenerTaskGettingResult(taskInfo: TaskInfo)

SparkListenerTaskGettingResult is posted when DAGScheduler does handleGetTaskResult.

SparkListenerTaskEnd

SparkListenerTaskEnd(
  stageId: Int,
  stageAttemptId: Int,
  taskType: String,
  reason: TaskEndReason,
  taskInfo: TaskInfo,
  // may be null if the task has failed
  @Nullable taskMetrics: TaskMetrics)

SparkListenerTaskEnd is posted when DAGScheduler does handleTaskCompletion.

SparkListenerStageCompleted

SparkListenerStageCompleted(stageInfo: StageInfo)

SparkListenerStageCompleted is posted when DAGScheduler does markStageAsFinished.

SparkListenerJobEnd

SparkListenerJobEnd(
  jobId: Int,
  time: Long,
  jobResult: JobResult)

SparkListenerJobEnd is posted when DAGScheduler does cleanUpAfterSchedulerStop, handleTaskCompletion, failJobAndIndependentStages, and markMapStageJobAsFinished.

SparkListenerApplicationEnd

SparkListenerApplicationEnd(time: Long)

SparkListenerApplicationEnd is posted when SparkContext does postApplicationEnd.

SparkListenerEnvironmentUpdate

SparkListenerEnvironmentUpdate(environmentDetails: Map[String, Seq[(String, String)]])

SparkListenerEnvironmentUpdate is posted when SparkContext does postEnvironmentUpdate.

SparkListenerBlockManagerAdded

SparkListenerBlockManagerAdded(
  time: Long,
  blockManagerId: BlockManagerId,
  maxMem: Long)

SparkListenerBlockManagerAdded is posted when BlockManagerMasterEndpoint registers a BlockManager.

SparkListenerBlockManagerRemoved

SparkListenerBlockManagerRemoved(
  time: Long,
  blockManagerId: BlockManagerId)

SparkListenerBlockManagerRemoved is posted when BlockManagerMasterEndpoint removes a BlockManager.

SparkListenerBlockUpdated

SparkListenerBlockUpdated(blockUpdatedInfo: BlockUpdatedInfo)

SparkListenerBlockUpdated is posted when BlockManagerMasterEndpoint receives UpdateBlockInfo message.

SparkListenerUnpersistRDD

SparkListenerUnpersistRDD(rddId: Int)

SparkListenerUnpersistRDD is posted when SparkContext does unpersistRDD.

SparkListenerExecutorAdded

SparkListenerExecutorAdded(
  time: Long,
  executorId: String,
  executorInfo: ExecutorInfo)

SparkListenerExecutorAdded is posted when DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend) handles RegisterExecutor message, MesosFineGrainedSchedulerBackend does resourceOffers, and LocalSchedulerBackendEndpoint starts.

SparkListenerExecutorRemoved

SparkListenerExecutorRemoved(
  time: Long,
  executorId: String,
  reason: String)

SparkListenerExecutorRemoved is posted when DriverEndpoint RPC endpoint (of CoarseGrainedSchedulerBackend) does removeExecutor and MesosFineGrainedSchedulerBackend does removeExecutor.

Known Implementations

The following is the complete list of all known Spark listeners:

Caution
FIXME Make it complete.

SparkListenerInterface

SparkListenerInterface is an internal interface for listeners of events from the Spark scheduler.