Coffee, coding, cinema and making data informed decisions (Spam Control) Part 2/2

MyMovieRack
4 min read · Nov 18, 2017

“There is nothing noble in being superior to your fellow men. True nobility lies in being superior to your former self” - Hemingway

We don’t just want to outdo our competition when it comes to recommendations; we also want the content and ratings we offer to be credible. It was high time we got serious and, to some extent, stringent.

Feature 17.4.3: Enforcing Spam and noise control

As mentioned above, our next biggest concern was ensuring credibility, which is the essence of any discovery platform. As our popularity has grown, we have repeatedly noticed PR agencies spamming our platform, rating movies and TV shows higher than they deserve.
That not only added noise to our rating system but also made the platform an uncomfortable place for authentic users. For example, a Marathi film was spammed with high ratings by a certain PR agency’s army of users. No, Sir, that may be possible on IMDb but not on MMR!!

A trained machine learning model to catch this type of organised PR spam

One glaring example was one of MSG’s films, which was spammed with high ratings by some users and reached 4.2/5 on our platform and 9.2 on IMDb.

Thus it was time to put some checks in place by considering factors such as:

  • The age of the rating
  • The value of the rating
  • The prestige of the rater
  • The authenticity of the rater’s profile
  • The likelihood of the rating being fraudulent
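
To make the idea concrete, here is a minimal sketch of how factors like these could be folded into a single score. It is not MMR’s actual algorithm: the `Rating` fields, the one-year half-life, the assumption that older ratings count for less, and the choice to simply multiply the factors together are all hypothetical, and the fraud probability is assumed to come from some separately trained model.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rating:
    """One user rating plus hypothetical trust signals (all names are illustrative)."""
    value: float                 # the rating itself, on a 0-5 scale
    age_days: float              # how long ago the rating was submitted
    rater_prestige: float        # 0..1, assumed reputation score of the rater
    profile_authenticity: float  # 0..1, assumed score for how genuine the profile looks
    fraud_probability: float     # 0..1, assumed output of a separately trained model


def rating_weight(r: Rating, half_life_days: float = 365.0) -> float:
    """Weight of a single rating; the half-life decay is an assumption, not a known rule."""
    recency = 0.5 ** (r.age_days / half_life_days)
    trust = r.rater_prestige * r.profile_authenticity * (1.0 - r.fraud_probability)
    return recency * trust


def weighted_score(ratings: Iterable[Rating]) -> Optional[float]:
    """Trust-weighted average rating, or None if no rating carries any weight."""
    total_weight = 0.0
    weighted_sum = 0.0
    for r in ratings:
        w = rating_weight(r)
        total_weight += w
        weighted_sum += w * r.value
    if total_weight == 0.0:
        return None
    return weighted_sum / total_weight


# Made-up data: two trusted users rate a film 3.0, eight suspicious profiles rate it 5.0.
genuine = [Rating(3.0, 0.0, 1.0, 1.0, 0.0)] * 2
spam = [Rating(5.0, 0.0, 0.1, 0.1, 0.9)] * 8
all_ratings = genuine + spam

plain_mean = sum(r.value for r in all_ratings) / len(all_ratings)
score = weighted_score(all_ratings)
print(f"plain mean: {plain_mean:.2f}, weighted score: {score:.2f}")
```

With this made-up data the plain mean is dragged up to 4.6 by the eight suspicious ratings, while the weighted score stays close to the 3.0 given by the trusted users. Multiplying the factors is only one possible design; it means a rating has to look good on every signal to count, which suits spam control but can be harsh on new, genuine users.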

Though the first three were easy to recognise and implement, items 4 and 5 required us to build and train a model that could predict them with high confidence. I will write a separate piece on the maths, but for now it is safe to say that our ML training and the resources we referred to were quite useful here. After testing on multiple datasets, we were able to correct the spam ratings while leaving genuine ratings unaltered. Below are the current ratings of the movie on both platforms.

Well, MSG’s actions are neither above the law nor above our algo’s rules. Earlier rating: 4.2/5
IMDb rating for Hind Ka Napak Ko Jawab

Feature 17.4.4: Support of Regional Language content and assets

Mymovierack has been accessed by users across 170 countries. Even if we look only at our country of origin, there are 29 states with 50+ major languages. India has recently been going through a huge digital transformation. A recent report confirmed that the number of Facebook users in India stands at 241 million and rising, the highest in the world.

Distribution of traffic sources on MMR on the basis of Primary Regional Language

That being said, even our Google Analytics Behaviour reports showed an appreciable number of non-English and non-Hindi users frequenting our platform. This led us to do a cost-benefit analysis of adding regional language content listings to our platform, and the results were overwhelmingly in favour of doing so. We have already started supporting major Indian regional languages such as Tamil, Telugu, Kannada, Marathi, Malayalam, Gujarati and Punjabi. In the future, as we expand globally, we will support major languages outside India such as Chinese, Korean, Japanese, Russian and French, followed by Iranian and German/Austrian.

One Step at a time…

Our energies are now focused on building the world’s first organic and humane recommendation system. We have named the project daydreaMMR, as it won’t be just a code block that spits out a bunch of films and TV shows, but a close friend who knows you, understands you and cares about you and your taste in movies and TV shows.

While writing this note I was struck by an epiphany: I should spend some time in the sun and make some effort to see the world outside. For now, I will take a break, but not before I promise that MMR will continue in its endeavour to personalize entertainment. While there are many players in the arena (from incumbents with deep pockets to hundreds of new kids like us), what separates us is what we have defined as What’s in it for us?
Adios for now!

You can read Part 1/2 of this series here

In the coming days, we will be releasing snippets of some common utilities that should be helpful to most startups. Stay tuned!
