I wish nobody used reCaptcha. Not only does it force users to allow Goggle to run scripts in their browser, but making users do unpaid mechanical turking for Goggle is just rude. If it contributed to a data commons, like training free code AI, I wouldn't mind so much.

One thing I've thought about a lot is captchas that could collect samples of natural spoken language, in a range of languages, accents etc., under a free license (CC BY-SA or maybe even GPL). These could gradually build up a new speech corpus to add to those being used to train free code voice recognition software ( etc).

A voice sample captcha could pick a sentence at random from a long Wikipedia article (as an example of text under a free license), in the system language and character set of the user, and ask the user to read it aloud while holding a button down so the system can record it. It would need to check somehow that the sample is newly recorded and matches the text, so it's not a bot recycling samples from somewhere. Maybe that could be another captcha, so humans can check it?
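A minimal sketch of the challenge side of the voice sample idea above, assuming the server already has the article text and knows the user's language. All names here (`VoiceChallenge`, `pick_sentence`, `issue_challenge`, `ChallengeStore`) and the word-count limits and expiry time are hypothetical choices for illustration. It picks a readable-length sentence at random and ties it to a one-time token with a short expiry, so a recording has to be submitted against a fresh challenge rather than replayed.

```python
import re
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Rough heuristic: split after ., ! or ? followed by whitespace.
# This is not a proper sentence segmenter.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


@dataclass
class VoiceChallenge:
    nonce: str         # one-time token the recording must be submitted with
    sentence: str      # text the user is asked to read aloud
    language: str      # user's system language, e.g. "en" or "mi"
    expires_at: float  # Unix time after which the challenge is no longer valid


def pick_sentence(article_text: str, min_words: int = 5, max_words: int = 20) -> str:
    """Pick a random sentence of readable length from a freely licensed text."""
    sentences = [s.strip() for s in SENTENCE_SPLIT.split(article_text)]
    candidates = [s for s in sentences if min_words <= len(s.split()) <= max_words]
    if not candidates:
        raise ValueError("no sentence of suitable length in this text")
    return secrets.choice(candidates)


def issue_challenge(article_text: str, language: str,
                    ttl_seconds: float = 120.0,
                    now: Optional[float] = None) -> VoiceChallenge:
    """Create a challenge that must be answered within ttl_seconds."""
    now = time.time() if now is None else now
    return VoiceChallenge(
        nonce=secrets.token_urlsafe(16),
        sentence=pick_sentence(article_text),
        language=language,
        expires_at=now + ttl_seconds,
    )


class ChallengeStore:
    """Tracks open challenges so that each nonce can be redeemed only once."""

    def __init__(self) -> None:
        self._open = {}  # nonce -> VoiceChallenge

    def add(self, challenge: VoiceChallenge) -> None:
        self._open[challenge.nonce] = challenge

    def redeem(self, nonce: str, now: Optional[float] = None) -> Optional[VoiceChallenge]:
        now = time.time() if now is None else now
        challenge = self._open.pop(nonce, None)  # popped, so a replayed nonce fails
        if challenge is None or now > challenge.expires_at:
            return None
        return challenge
```

A fresh nonce and a short expiry only show that a recording was submitted against a new challenge. They can't show the audio came from a human rather than a speech synthesizer, or that it matches the sentence, which is where the human check suggested above would still be needed.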

@strypey unfortunately this only works for people who can speak (American) English perfectly. It will not be accessible to members of the Deaf community. Even though I can speak fairly well, Siri doesn't understand me 60-70% of the time 😝
Ableism is never cool, not even as a hypothetical solution.

@Gotterdammerung thanks for taking the time to point out how your needs are different. It's definitely important to consider in any design.

@Gotterdammerung the goal of the design is to collect samples of how any real human speaks, in any and every language, in any and every accent, including deaf accents. The goal being to support the development of free code speech recognition engines that can understand you all the time 😊 It's worth noting that recording a voice sample would be much easier for blind users than most existing forms of captcha, as long as the UI is designed with accessibility in mind.

@strypey the Deaf generally do not speak. They will laugh or grunt at the most, if they make any vocal sounds at all. 😊

Those that do speak are a tiny minority. Maybe 5% at most.

@Gotterdammerung OK, I see. Deaf users wouldn't be able to check recorded samples against the original text either. Thanks for clarifying. To be clear, the voice sample thing was just an example of a more general principle: making the tasks required by captchas contribute something to a commons. I imagine this hypothetical "captcha commons" system offering a growing range of tasks, at least some of which would work for the deaf. Something that helps OpenStreetMap, for example.
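One way to picture a "captcha commons" with a range of tasks is a registry where each task lists the abilities it needs, and a user is only offered tasks whose requirements they meet. This is a hypothetical sketch: the task names, the `contributes_to` labels and the ability strings are all made up for illustration, and deciding what each task really requires would take much more care than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonsTask:
    name: str            # hypothetical task identifier
    contributes_to: str  # which commons the result feeds
    requires: frozenset  # abilities needed to complete the task


# Illustrative entries only.
TASKS = [
    CommonsTask("record_voice_sample", "speech corpus", frozenset({"speak"})),
    CommonsTask("audit_voice_sample", "speech corpus", frozenset({"hear"})),
    CommonsTask("record_sign_video", "sign language corpus", frozenset({"sign"})),
    CommonsTask("audit_sign_video", "sign language corpus", frozenset({"see", "sign"})),
    CommonsTask("label_map_feature", "OpenStreetMap", frozenset({"see"})),
]


def tasks_for(abilities):
    """Return names of the tasks whose requirements the user meets, in registry order."""
    have = frozenset(abilities)
    return [task.name for task in TASKS if task.requires <= have]
```

For example, `tasks_for({"see", "sign"})` offers the sign language and map tasks but no voice tasks, so a deaf user is never stuck at a voice-only captcha. Adding a new kind of task is just another entry in the list.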

@strypey @Gotterdammerung The voice CAPTCHA could fall back on a visual CAPTCHA, say, clicking on traffic lights or buses.

@CharredStencil @strypey Nope. Instead of going backwards, why not actually treat the deaf user as a human being and include some software algorithm that can actually recognize sign language?

@Gotterdammerung that's a cool idea. A sign-recognition engine would be amazing for accessibility. In order to train a free code AI to recognize sign reliably, you'd need a huge corpus of freely licensed video samples of people signing. So when deaf users (and anyone else who can sign) encounter the hypothetical captcha, they could record a video sample instead of a voice sample, or audit one recorded by another user against the text it's meant to represent.
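The audit step could borrow the trick the original reCAPTCHA used for scanned words: show a sample whose verdict is already known alongside one that isn't, and only count the user's answer on the unknown one if they got the known one right. Below is a minimal sketch of that under stated assumptions. `AuditPool` and its methods are hypothetical, and the thresholds (3 votes, 75% agreement) are arbitrary. It works the same whether the samples are voice recordings or sign videos.

```python
import random
from collections import defaultdict


class AuditPool:
    def __init__(self, min_votes=3, accept_ratio=0.75):
        self.min_votes = min_votes        # audits needed before deciding on a sample
        self.accept_ratio = accept_ratio  # share of "matches" votes needed to accept
        self.known = {}                   # sample_id -> True (matches its text) / False
        self.votes = defaultdict(list)    # sample_id -> verdicts from users who passed

    def make_challenge(self, pending_ids, rng=random):
        """Pair a sample with a known verdict and one still awaiting review."""
        control = rng.choice(sorted(self.known))
        unknown = rng.choice([i for i in pending_ids if i not in self.known])
        shown = [control, unknown]
        rng.shuffle(shown)  # the user must not be able to tell which is the control
        return control, unknown, shown  # only `shown` is sent to the browser

    def submit(self, control, unknown, verdicts):
        """verdicts maps sample_id -> bool. Returns True if the user passes."""
        if verdicts.get(control) != self.known[control]:
            return False  # wrong on the known sample: treat as a failed captcha
        if unknown in verdicts:
            self.votes[unknown].append(bool(verdicts[unknown]))
            self._maybe_decide(unknown)
        return True

    def _maybe_decide(self, sample_id):
        votes = self.votes[sample_id]
        if len(votes) < self.min_votes:
            return
        share = sum(votes) / len(votes)
        if share >= self.accept_ratio:
            self.known[sample_id] = True   # accepted into the corpus
        elif share <= 1 - self.accept_ratio:
            self.known[sample_id] = False  # rejected: doesn't match its text
        # otherwise keep collecting votes
```

Once a sample has been decided, it can serve as a known sample in later challenges, so the pool of checkable samples grows along with the corpus.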

@strypey @CharredStencil the visual data of videophone chats between the community and the phone interpreters at companies like Sorenson will build up a database of video samples of a wide selection of signers.
That way, when a deaf user encounters the hypothetical captcha, they can instead sign into the camera to prove that they're human.

> will build a database of video samples of a wide selection of signers.

But will that database be freely licensed? That's important for open source communities to develop , sign recognition tools, as they usually can't afford to pay the license fees to use a proprietary one. This has held back the development of free code speech recognition tools for years, which is why was set up.

@Gotterdammerung Also, even if the databases of Sorenson etc are freely licensed, it never hurts to have a larger corpus of samples for training pattern recognition engines.

@Gotterdammerung @strypey Adding one feature based on an unsolved CS problem is a step backwards unless you add two features based on unsolved CS problems?

@dredmorbius the thing is, despite my critique of Goggle and their reCaptcha, captchas do serve a legitimate need. Like most sites that allow open sign-ups, (hosts of my Disintermedia blog and wiki) is constantly bombed by spam bots setting up projects and seeding them with linkspam. Unless we close sign-ups, or require a human to trawl through them manually, we need some mechanism for distinguishing between humans and bots. At the moment our system isn't very effective :(

@strypey The question, ultimately, is trust.

ReCaptcha is one mechanism. It is not the _only_ mechanism.

It fails to stop many of the bad guys. It stops many of the good guys. And it feeds bad behaviour and privatises workfactor.

I think we can do better.

@dredmorbius preaching to the choir, brother ;) See the rest of the thread that branched off from my OP.

Mastodon - NZOSS