One thing I've thought about a lot is captchas that could collect samples of natural language, in a range of languages, accents, etc., under a free license (CC BY-SA or maybe even GPL). These could gradually build up a new speech corpus to add to those being used to train free code voice recognition software (#Voxforge etc.).
A voice sample captcha could pick a sentence at random from a long Wikipedia article (as an example of text under a free license), in the system language and character set of the user, and ask the user to read it aloud while holding a button down so the system can record it. It would need to check somehow that the sample is newly recorded and matches the text, so it's not a bot recycling samples from somewhere. Maybe that could be another captcha, so humans can check it?
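To make the idea a bit more concrete, here is a rough Python sketch of the three steps described above: picking a sentence, checking that a transcript matches it, and rejecting recycled recordings. Everything in it is hypothetical: the names `pick_prompt`, `matches_prompt` and `SampleStore`, the word-count limits and the 0.8 threshold are all made up for illustration. It assumes the transcript comes from somewhere else (a human auditor in a second captcha, or some speech recogniser), and the hash check only catches byte-identical replays, so a real system would need something stronger.

```python
import hashlib
import random
import re
from difflib import SequenceMatcher


def pick_prompt(article_text, rng=random, min_words=6, max_words=25):
    """Pick one sentence of readable length at random from a freely licensed article."""
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # real text in many languages needs a proper sentence tokenizer).
    sentences = re.split(r"(?<=[.!?])\s+", article_text.strip())
    candidates = [s for s in sentences if min_words <= len(s.split()) <= max_words]
    if not candidates:
        raise ValueError("no sentence of suitable length in article")
    return rng.choice(candidates)


def normalise(text):
    """Lowercase, strip punctuation, and split into words."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def matches_prompt(prompt, transcript, threshold=0.8):
    """Return True if the transcript is close enough to the prompt, word by word."""
    ratio = SequenceMatcher(None, normalise(prompt), normalise(transcript)).ratio()
    return ratio >= threshold


class SampleStore:
    """Remembers hashes of accepted recordings so exact replays can be rejected."""

    def __init__(self):
        self._seen = set()

    def is_new(self, audio_bytes):
        digest = hashlib.sha256(audio_bytes).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

The fuzzy word-level comparison is deliberate: the point is to accept real accents and small slips rather than demand a perfect reading, so the threshold would need tuning against real samples.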
@strypey we might need to do just that
@strypey Sue for labor law violations
@strypey unfortunately this only works for people who can speak (American) English perfectly. Will not be accessible for the members of the Deaf community. Even though I can speak fairly well, Siri doesn't understand me 60-70% of the time 😝
Ableism is never cool, not even as a hypothetical solution
@Gotterdammerung the goal of the design is to collect samples of how any real human speaks, in any and every language, in any and every accent, including deaf accents. The goal being to support the development of free code speech recognition engines that can understand you all the time 😊 It's worth noting that recording a voice sample would be much easier for blind users than most existing forms of captcha, as long as the UI is designed with accessibility in mind.
@strypey the Deaf generally do not speak as a rule. They will laugh or grunt at the most, if they make any vocal sounds at all. 😊
Those that do speak are a tiny minority. Maybe 5% maximum.
@Gotterdammerung OK, I see. Deaf users wouldn't be able to check recorded samples against the original text either. Thanks for clarifying. To be clear, the voice sample thing was just an example of a more general principle, of making the tasks required by captchas contribute something to a commons. I imagine this hypothetical "captcha commons" system including a growing range of tasks, at least some of which would work for the deaf. Something that helps OpenStreetMap, for example.
@Gotterdammerung that's a cool idea. A sign-recognition engine would be amazing for accessibility. In order to train a free code AI to recognize sign reliably, you'd need a huge corpus of freely licensed video samples of people signing. So when deaf users (and anyone else who can sign) encounter the hypothetical captcha, they could record a video sample instead of a voice sample, or audit one recorded by another user against the text it's meant to represent.
@strypey @CharredStencil the visual data of videophone chats between the community and the phone interpreters at companies like Sorenson will build a database of video samples of a wide selection of signers.
That way, when a deaf user encounters the hypothetical captcha, they can instead sign to the camera to prove that they're a human user.
> will build a database of video samples of a wide selection of signers.
But will that database be freely licensed? That's important for open source communities to develop #FreeCode sign recognition tools, as they usually can't afford to pay the license fees to use a proprietary one. This has held back the development of free code speech recognition tools for years, which is why #LibreVox was set up.
@strypey My reCaptcha policy remains:
@dredmorbius the thing is, despite my critique of Google and their reCaptcha, captchas do serve a legitimate need. Like most sites that allow open sign-ups, CoActivate.org (hosts of my Disintermedia blog and wiki) is constantly bombed by spam bots setting up projects and seeding them with linkspam. Unless we close sign-ups, or require a human to troll through them manually, we need some mechanism for distinguishing between humans and bots. At the moment our system isn't very effective :(
@strypey The question, ultimately, is trust.
ReCaptcha is one mechanism. It is not the _only_ mechanism.
It fails to stop many of the bad guys. It stops many of the good guys. And it feeds bad behaviour and privatises workfactor.
I think we can do better.
@dredmorbius preaching to the choir, brother ;) See the rest of the thread that branched off from my OP.