Removing duplicate lines from a file

To remove duplicate lines from a file on a Mac or Linux, we can use the following:

sort file | uniq > output
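
Here is a minimal sketch of the deduplication step on a throwaway sample file, assuming a POSIX shell. The file names and contents are made up for illustration. It uses sort -u, which keeps one copy of each line, and runs uniq -u on the same input for comparison, since uniq -u keeps only the lines that were never repeated.

```shell
# Work in a temporary directory so nothing existing is overwritten
tmpdir=$(mktemp -d)

# Hypothetical sample input: "a" and "b" appear twice, "c" once
printf 'b\na\nb\nc\na\n' > "$tmpdir/sample.txt"

# Sort the lines and keep exactly one copy of each
sort -u "$tmpdir/sample.txt" > "$tmpdir/deduped.txt"

# For comparison: uniq -u drops every line that had a duplicate
sort "$tmpdir/sample.txt" | uniq -u > "$tmpdir/never_repeated.txt"
```

With this input, deduped.txt holds a, b and c, while never_repeated.txt holds only c.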

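Note that this leaves the output in sorted order. If the file feeds a machine learning or statistics pipeline, you will usually want to shuffle the lines afterwards with the shuf command. shuf is part of GNU coreutils, so it may need to be installed separately on a Mac.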
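
As a rough sketch, deduplication and shuffling can be chained in one pipeline. This assumes GNU shuf is available on the system, and the file names and contents are placeholders for illustration.

```shell
# Work in a temporary directory so nothing existing is overwritten
tmpdir=$(mktemp -d)

# Hypothetical sample input with repeated lines
printf '3\n1\n2\n3\n1\n' > "$tmpdir/data.txt"

# Keep one copy of each line, then put the lines in random order
sort -u "$tmpdir/data.txt" | shuf > "$tmpdir/shuffled.txt"
```

The line order in shuffled.txt differs from run to run, but the file always contains each distinct line exactly once.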
