The Parser for Apache Spark parses unmodified Apache Spark History Server event logs.
Parsed logs contain metadata about your Apache Spark application's execution, such as the runtime of each task, the amount of data read and written, and the amount of memory used. They do not contain
sensitive information such as the data that your Apache Spark application processes. Below is an example of the output of the log parser:

Clone this repo to the desired directory.
If you have not already done so, complete the instructions to download the Apache Spark event log.
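Before parsing, you may want to confirm that the downloaded file is a readable event log. The sketch below is not part of the Sync parser. It assumes the log is uncompressed, because Spark can write compressed event logs when spark.eventLog.compress is enabled. It also relies on Spark's JSON event format: one JSON object per line, where each SparkListenerTaskEnd event carries a "Task Metrics" object with fields such as "Executor Run Time", "Input Metrics" > "Bytes Read", and "Output Metrics" > "Bytes Written". The function name summarize_event_log is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_event_log(path):
    """Count events by type and total a few task metrics in a raw,
    uncompressed Spark event log (one JSON object per line).

    Illustrative check only; this is not the Sync parser.
    """
    event_counts = Counter()
    totals = {"executor_run_time_ms": 0, "bytes_read": 0, "bytes_written": 0}
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            event = json.loads(line)
            event_type = event.get("Event", "<unknown>")
            event_counts[event_type] += 1
            if event_type == "SparkListenerTaskEnd":
                # Failed tasks may lack metrics, so fall back to empty dicts.
                metrics = event.get("Task Metrics") or {}
                totals["executor_run_time_ms"] += metrics.get("Executor Run Time", 0)
                totals["bytes_read"] += (metrics.get("Input Metrics") or {}).get("Bytes Read", 0)
                totals["bytes_written"] += (metrics.get("Output Metrics") or {}).get("Bytes Written", 0)
    return event_counts, totals
```

For example, summarize_event_log("path/to/eventlog") returns the per-type event counts and the summed task metrics. If decoding or json.loads raises an error, the file is probably compressed or is not an event log.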
To process a log file, run the parse.py script in the sync_parser folder and pass the location of the log file with the -d flag:

    python3 sync_parser/parse.py -d [log file location]

The parsed file, parsed-[log file name], will appear in the results directory.
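If you have several event logs, the command above can be scripted. This is a minimal sketch, not part of the repository. It assumes the working directory is the repository root (or that you pass it as repo_root), that the results directory sits at the repository root, and that [log file name] means the log's base name including any extension. The helper names build_command, expected_output, and parse_logs are hypothetical. The sketch uses sys.executable, the Python 3 interpreter running the script, in place of a literal python3.

```python
import subprocess
import sys
from pathlib import Path

PARSER = Path("sync_parser") / "parse.py"  # relative to the repository root
RESULTS_DIR = Path("results")  # assumption: results/ sits at the repository root


def build_command(log_path):
    """Argument list equivalent to: python3 sync_parser/parse.py -d <log>."""
    return [sys.executable, str(PARSER), "-d", str(log_path)]


def expected_output(log_path):
    """Where the parsed file should appear: results/parsed-<log file name>."""
    return RESULTS_DIR / f"parsed-{Path(log_path).name}"


def parse_logs(log_paths, repo_root="."):
    """Run the parser once per log; return the expected output paths."""
    outputs = []
    for log_path in log_paths:
        # Resolve first so the path stays valid when cwd is the repo root.
        log_path = Path(log_path).resolve()
        # check=True raises CalledProcessError if the parser exits non-zero.
        subprocess.run(build_command(log_path), cwd=repo_root, check=True)
        outputs.append(Path(repo_root) / expected_output(log_path))
    return outputs
```

For example, parse_logs(["logs/app-1", "logs/app-2"]) runs the parser twice and returns the paths where the parsed files should appear. Because of check=True, it stops at the first log the parser fails on instead of silently skipping it.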
Send the parsed log to Sync Computing
Email the parsed event log to Sync Computing, or upload it to the Sync Auto-tuner.