Detecting workers spamming your HITs

ergo · Aug 28, 2009

Panos Ipeirotis of Behind the Enemy Lines, who studies Mechanical Turk from a variety of angles, has put up slides from a presentation he'll be giving at the NY Mechanical Turk Meetup on September 1st.

http://www.slideshare.net/ipeirotis/new-york-mechanical-turk-meetup

The subject is collecting statistics about the error rates of workers completing one's tasks, and using those to infer whether any workers are cheating (or perhaps, depending on the nature of the task, doing work too poor to be of use).

Looks interesting. Cheating definitely does happen and is a serious problem for requesters to contend with, especially when it comes to large data sets. He's also released free code to go along with the presentation, which can be found here:

http://code.google.com/p/get-another-label/

Personally, I'd rather see requesters use custom qualifications, filtering out cheaters or poor workers in the beginning, but I suppose this could go hand in hand with that as a double measure to control quality.

lightdark · Aug 28, 2009

Amen to the custom qualifications.

Amazon needs to make the command line tools easier to use, or fix the web interface so that a HIT can have multiple qualifications.

I believe three qualifications are needed:
Approval rate better than 95%
Approved/Submitted HITs better than 50 (or higher)*
US qualification - for reasonably paying HITs from US requesters

*I've seen this qualification used correctly; I still need to find the code.

ipeirotis · Aug 28, 2009

Unfortunately, spammers are smarter than that.

Adding a qualification test (for adult content detection, for example) is pretty much useless. The spammer passes the test and then proceeds to give random answers while rating real sites. Or marks all sites as porn. Or vice versa.

I put in all the HIT approval-rate requirements, added qualification tests, rejected spammers. Nothing. Spammers still come to the HITs. No matter the price, I get >50% spam in the porn detection HITs. I keep blocking spammer workers, and they come back. Some of them are even working in pairs and give pairs of incorrect answers that can easily fool a "best of three" system.
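To make the "best of three" failure concrete, here is a minimal sketch of how two colluding workers outvote one honest worker under plain majority voting. The site names, labels, and the `majority_label` helper are all made up for illustration; this is not taken from the released code.

```python
from collections import Counter

def majority_label(votes):
    """Return the most common label among the votes for one item."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical ground truth for three sites (assumed known for this demo).
truth = {"site_a": "porn", "site_b": "clean", "site_c": "clean"}

# One honest worker answers every item correctly.
honest = dict(truth)

# Two colluding workers submit the same wrong answer for every item.
colluder_1 = {site: ("clean" if label == "porn" else "porn")
              for site, label in truth.items()}
colluder_2 = dict(colluder_1)

# Best of three: take the majority of the three answers per item.
results = {
    site: majority_label([honest[site], colluder_1[site], colluder_2[site]])
    for site in truth
}

# Items where the majority answer disagrees with the truth.
wrong = [site for site in truth if results[site] != truth[site]]
print(f"{len(wrong)} of {len(truth)} items decided wrongly")
```

With two of the three votes controlled by the pair, the honest worker is outvoted on every item, and would also look like the one in disagreement if approvals were based on agreement with the majority.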

So, the code that I distribute makes it easier to detect real spammers and distinguish them from legitimate workers. I hope it is a first step towards a more systematic treatment of this problem. Teaching spammers that they will get no benefit from submitting spam is what will move us towards a spam-free MTurk. I hope that this will also bring more requesters in: if the work is almost guaranteed to be spam-free, the willingness to pay increases.

At the same time, it is also important to distinguish spammers from legitimate workers who are working hard but have not properly understood the instructions: I see many workers who are systematically wrong, but in a very predictable manner. These guys are not spammers and deserve to be paid for their effort, even if their answers are always incorrect. My code tries to detect such cases and reward the guys who put some honest effort into the system.
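One way to picture the difference between a spammer and a predictably wrong worker is to compare a worker's answers against a few items with known labels. The sketch below is a simplified illustration, not the method used in the released code: it assumes binary labels, a small hypothetical gold set, and a made-up `informativeness` score that asks whether the worker's answers depend on the true label at all.

```python
def accuracy(answers, gold):
    """Fraction of gold items the worker answered correctly."""
    scored = [item for item in gold if item in answers]
    return sum(answers[i] == gold[i] for i in scored) / len(scored)

def informativeness(answers, gold, positive="porn"):
    """|P(says positive | truly positive) - P(says positive | truly negative)|.

    Near 0: answers do not depend on the truth (spam-like).
    Near 1: answers track the truth, even if consistently inverted.
    """
    pos_total = pos_hits = neg_total = neg_hits = 0
    for item, label in gold.items():
        if item not in answers:
            continue
        said_positive = answers[item] == positive
        if label == positive:
            pos_total += 1
            pos_hits += said_positive
        else:
            neg_total += 1
            neg_hits += said_positive
    if pos_total == 0 or neg_total == 0:
        raise ValueError("need gold items of both classes")
    return abs(pos_hits / pos_total - neg_hits / neg_total)

# Hypothetical gold items and three hypothetical workers.
gold = {"s1": "porn", "s2": "porn", "s3": "clean", "s4": "clean"}
accurate = dict(gold)
inverted = {k: ("clean" if v == "porn" else "porn") for k, v in gold.items()}
always_porn = {k: "porn" for k in gold}

for name, worker in [("accurate", accurate), ("inverted", inverted),
                     ("always_porn", always_porn)]:
    print(name, accuracy(worker, gold), informativeness(worker, gold))
```

In this toy data the inverted worker has zero accuracy but a score of 1, because the answers can be corrected by flipping them, while the worker who marks everything as porn has 50% accuracy but a score of 0. A real system would need many more items per worker and some handling of noise before acting on such a score.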

lightdark · Aug 28, 2009

I believe a lot of these spammers, as you call them, come from countries other than the US, and to them the value of doing HITs is higher.

So what I'm trying to say is that someone who has done 1000 approved HITs with a 99% approval rate should not be a spammer. Sure, those qualifications would reduce the number of workers who could do the HIT, but the work done should be much better overall.

You may want to work with the whole population of workers. I would rather wait for better results.

Most workers with good stats would rather protect their investment. That's another reason a worker may choose not to do hundreds of tasks before knowing anything about the requester: fear of a dishonest requester. That's where fast approval helps out.

Workers need all the bad requesters blocked. No good worker wants to waste time going through garbage HITs. This reduces the number of active good workers and increases your spammer ratio, because spammers don't care.

I do see the value in what you're saying. A lot of requesters want all their work done quickly and correctly. So using the whole population of workers may be optimal.

ergo · Aug 28, 2009

ipeirotis said:

Some of them are even working in pairs and give pairs of incorrect answers that can easily fool a "best of three" system.

Even Amazon has told some requesters privately that the "best of three" approach is "unsustainable" on its own. Thanks to cheaters on the "Are these items different?" HITs, honest workers were receiving scores of undeserved rejections. Amazon offered no apology or compensation, even though some workers had their approval rates trashed, but they do seem to be operating differently now, taking longer to approve and using some other means.

At the same time, it is also important to distinguish spammers from legitimate workers who are working hard but have not properly understood the instructions: I see many workers who are systematically wrong, but in a very predictable manner. These guys are not spammers and deserve to be paid for their effort, even if their answers are always incorrect. My code tries to detect such cases and reward the guys who put some honest effort into the system.

I appreciate you clarifying this. I was concerned about honest but mistaken workers being blocked, because two blocks earn a worker an automatic suspension. One can be reinstated, but still that could be hard on someone who needs the work and can't afford to miss a few days.

ipeirotis · Aug 29, 2009

Thanks, lightdark, for the suggestions. I will indeed try a 99% approval rate for the next batch of HITs that I post. (I had the limit set to 90% for the last batch.)

Will let you know how it goes.

lightdark · Aug 29, 2009

You could try filtering the newbies too. Anyone without at least 200 approved HITs could be a dishonest worker who just got lucky.

Techlist just used the 'Total approved HITs' qualification today, so he should get a good idea of how many honest, active MTurk workers there actually are. See this thread.

This newest qualification is: Worker_NumberHITsApproved
QualificationTypeId: 00000000000000000040
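For illustration, the three requirements discussed in this thread could be written down as plain data, roughly as below. Only the Worker_NumberHITsApproved ID comes from the post above. The other two IDs, the dictionary keys, and the comparator names are assumptions modelled on later versions of the MTurk API and should be checked against the current documentation; `build_requirements` is a hypothetical helper, and nothing here talks to Amazon.

```python
# ID quoted in the post above.
NUMBER_HITS_APPROVED = "00000000000000000040"
# Assumed system qualification IDs; verify before use.
PERCENT_APPROVED = "000000000000000000L0"
LOCALE = "00000000000000000071"

def build_requirements(min_approved=200, min_percent=95, country="US"):
    """Return a list of qualification requirements as plain dictionaries.

    The key names are an assumed layout, not confirmed by the thread.
    """
    return [
        {
            "QualificationTypeId": NUMBER_HITS_APPROVED,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_approved],
        },
        {
            "QualificationTypeId": PERCENT_APPROVED,
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [min_percent],
        },
        {
            "QualificationTypeId": LOCALE,
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": country}],
        },
    ]

requirements = build_requirements(min_approved=200, min_percent=99)
for req in requirements:
    print(req["QualificationTypeId"], req["Comparator"])
```

The defaults mirror the numbers suggested in the thread (200 approved HITs, 95% approval, US only), and the usage line raises the approval threshold to the 99% that ipeirotis plans to try.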
