I solved this exercise as part of the Deezer recruitment process for the “Data Engineer – Fraud detection” internship position, opened in April 2016.
The objective was to find a better way to identify the fraudulent streams that the platform sometimes has to deal with.
I wrote a Python script that processes a "Daily stream" log file containing all Deezer listening logs for one day, then detects and filters out fraudulent streams to produce the cleaned "Daily Artists Top 10" list, which is later used for attribution and remuneration.
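To give an idea of the general shape of such a pipeline, here is a minimal sketch. It is not the solution I submitted. The log format (one `user_id|artist_id` record per line), the single fraud rule (a per-user, per-artist daily play cap), the threshold value, and every function name are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical threshold: maximum plays of one artist by one user in a day.
MAX_PLAYS_PER_USER_ARTIST = 100


def parse_line(line):
    """Parse an assumed 'user_id|artist_id' record; return None if malformed."""
    parts = line.strip().split("|")
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def top_artists(lines, limit=10, max_plays=MAX_PLAYS_PER_USER_ARTIST):
    """Return the top `limit` artists after dropping suspicious streams."""
    # Count plays per (user, artist) pair, skipping malformed lines.
    pair_counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            pair_counts[record] += 1

    # Illustrative fraud rule: discard every stream of a (user, artist)
    # pair whose daily play count exceeds the threshold.
    artist_counts = Counter()
    for (_user, artist), plays in pair_counts.items():
        if plays <= max_plays:
            artist_counts[artist] += plays

    return artist_counts.most_common(limit)
```

Calling `top_artists(open("daily_stream.log"))` on a file in the assumed format (the file name is hypothetical too) would return a list of `(artist_id, play_count)` tuples, highest count first. A real detector would combine several signals rather than rely on a single cap.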
As the exercise is still used for recruitment purposes, I was asked to remove my solution from the web, for obvious reasons. I really enjoyed working on this project, and I truly feel it is a good example of my working methodology, so if you are curious, please contact me directly :)