Apache Spark Notes
- if you’re programming in Scala, this particular page in the docs regarding the Dataset API will be your friend
- DataFrames are untyped (column names and types are only checked at runtime); Datasets are statically typed collections of JVM objects (checked at compile time)
- “Dataset APIs are all expressed as lambda (anonymous) functions and JVM typed objects”
- a Dataset is a `Dataset[T]`, a collection of strongly typed JVM objects; a DataFrame is a `Dataset[Row]`, a collection of untyped `Row` objects
- :star: dataframes vs datasets vs RDDs
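
To make the typed vs untyped distinction concrete, here is a minimal sketch in Scala. The `Person` case class, the `TypedVsUntyped` object, and the sample data are made up for illustration, and it assumes Spark SQL is on the classpath and that a local master is acceptable. The typed filter is a lambda over `Person` objects, so a misspelled field fails at compile time; the untyped filter refers to the column by name, so a misspelled name only fails when Spark analyzes the query at runtime.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type, used only for illustration.
case class Person(name: String, age: Long)

object TypedVsUntyped {
  // Typed API: the lambda receives a Person, so `p.age` is checked by the compiler.
  def typedAdults(people: Dataset[Person]): Dataset[Person] =
    people.filter(p => p.age >= 18)

  // Untyped API: a DataFrame is Dataset[Row]; "age" is resolved by name at runtime.
  def untypedAdults(people: DataFrame): DataFrame =
    people.filter(col("age") >= 18)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-vs-untyped")
      .master("local[*]") // assumption: running locally, not on a cluster
      .getOrCreate()
    import spark.implicits._ // brings in encoders, toDS(), and as[T]

    val people: Dataset[Person] = Seq(Person("Ana", 34), Person("Bo", 17)).toDS()

    typedAdults(people).show()

    // toDF() drops the static type; as[Person] restores it.
    val df: DataFrame = people.toDF()
    val typedAgain: Dataset[Person] = untypedAdults(df).as[Person]
    typedAgain.show()

    spark.stop()
  }
}
```

Both filters select the same rows; the difference is only in when a mistake would be caught. Moving between the two views is cheap: `toDF()` goes from typed to untyped, and `as[T]` goes back, provided the column names and types line up with the case class.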
- datasets are composed of