Spark Catalyst Internals
Spark Catalyst is part of the secret sauce behind Spark's operations on structured data. Let's take a deep look at its internals.
TreeNode is the fundamental data type abstraction in Catalyst's internals. It provides methods (such as foreach, map, flatMap, collect, etc.) that let you manipulate the internal tree structure in Scala's functional style. The contract between TreeNode and the operators/classes that extend it is simple: each one must define its list of children.
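To make that contract concrete, here is a minimal sketch of the idea in plain Scala. It is not Spark's actual TreeNode class: MiniTreeNode, Expr, Literal, Add and TreeNodeSketch are hypothetical names made up for illustration, and only foreach, map and collect are shown. The point is that once a node says what its children are, the traversal methods come for free from the base class.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for Catalyst's TreeNode (not the real Spark class).
abstract class MiniTreeNode[T <: MiniTreeNode[T]] { self: T =>
  // The contract: every concrete node must define its children.
  def children: Seq[T]

  // Apply f to this node first, then recursively to each child (pre-order).
  def foreach(f: T => Unit): Unit = {
    f(this)
    children.foreach(_.foreach(f))
  }

  // Results of f for every node in the tree, in pre-order.
  def map[A](f: T => A): Seq[A] = {
    val buf = ArrayBuffer.empty[A]
    foreach(n => buf += f(n))
    buf.toList
  }

  // Results of pf for every node on which pf is defined, in pre-order.
  def collect[A](pf: PartialFunction[T, A]): Seq[A] = {
    val buf = ArrayBuffer.empty[A]
    foreach(n => if (pf.isDefinedAt(n)) buf += pf(n))
    buf.toList
  }
}

// Hypothetical expression nodes: each one only declares its children.
sealed abstract class Expr extends MiniTreeNode[Expr]
case class Literal(value: Int) extends Expr {
  def children: Seq[Expr] = Nil
}
case class Add(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}

object TreeNodeSketch {
  // Represents (1 + 2) + 3
  val tree: Expr = Add(Add(Literal(1), Literal(2)), Literal(3))

  def main(args: Array[String]): Unit = {
    // Gather the value of every Literal in the tree.
    println(tree.collect { case Literal(v) => v })
    // Name of each node, in the order it is visited.
    println(tree.map(_.getClass.getSimpleName))
  }
}
```

Notice that Literal and Add never implement any traversal logic themselves. Adding a new node type to this sketch only requires saying what its children are, which is the same division of labor the TreeNode contract describes.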