Yes, the script is suitable for a small dataset. I have added another statistics job that uses the smaller dataset; please check: https://bigcount.xyz/script-and-spark-for-small-dataset.html

Regards

David Precious wrote:
> Given that the OP is running into memory issues processing an 80+
> million line file, I don't think suggesting a CPAN module designed to
> read the entire contents of a file into memory is going to be very
> helpful.
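For a file of that size, the usual alternative to slurping is to stream it. Below is a minimal sketch, assuming plain line-oriented input: a `while (<$fh>)` loop holds only the current line in memory, so memory use stays roughly flat however long the file is. Reading everything at once, for example `my @lines = <$fh>;`, grows with the file instead. `count_lines` is a hypothetical helper for illustration only. It is not taken from any CPAN module or from the script linked above.

```perl
use strict;
use warnings;

# Hypothetical helper: count lines by reading one line at a time.
# Only the current line is held in memory, so usage stays roughly
# constant even for very large files.
sub count_lines {
    my ($fh) = @_;
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
        # Per-line statistics would be accumulated here instead of
        # storing every line in an array.
    }
    return $count;
}

# Usage: perl count_lines.pl FILE
if (@ARGV) {
    my $path = $ARGV[0];
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    print count_lines($fh), "\n";
    close $fh;
}
```

The same loop structure works for summing or averaging columns. Keep running totals in scalars or a small hash rather than accumulating the lines themselves.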