“Artificial intelligence image tools have a tendency to spin up disturbing clichés: Asian women are hypersexual. Africans are primitive. Europeans are worldly. Leaders are men. Prisoners are Black.
These stereotypes don’t reflect the real world; they stem from the data that trains the technology. Grabbed from the internet, these troves can be toxic — rife with pornography, misogyny, violence and bigotry.”
“AI chatbots have exploded in popularity over the past four months, stunning the public with their awesome abilities, from writing sophisticated term papers to holding unnervingly lucid conversations. Chatbots cannot think like humans: They do not actually understand what they say. They can mimic human speech because the artificial intelligence that powers them has ingested a gargantuan amount of text, mostly scraped from the internet.
This text is the AI’s main source of information about the world as it is being built, and influences how it responds to users. If it aces the law school admissions test, for example, it’s probably because its training data included thousands of LSAT practice sites. Tech companies have grown secretive about what they feed the AI. So The Washington Post set out to analyze one of these data sets to fully reveal the types of proprietary, personal, and often offensive websites that go into an AI’s training data.”
“To try to sharpen its critical thinking, Delphi therefore spent long stretches combing the web, in particular the questions posted on the Reddit pages r/AmITheAsshole and r/Confessions, where redditors share their most shameful secrets. These situations were then submitted to the judgment of contractors hired through Amazon Mechanical Turk, the low-cost microtask tool provided by Amazon. Out of this process came a kind of “moral guide” called Commonsense Norm Bank. This database “compiles 1.7 million examples of people’s ethical judgments on a broad spectrum of everyday situations.””
“Amazon Mechanical Turk (AMT) offers a relatively low-cost alternative to traditional expensive survey samples, which likely explains its popularity among survey researchers. An important question about using such samples is whether they are representative of the larger Internet user population. Though prior research has addressed this question about demographic characteristics, little work has examined how AMT workers compare with others regarding their online activities—namely, social media experiences and online active engagement. This article analyzes survey data administered concurrently on an AMT and a national sample of U.S. adults to show that AMT workers are significantly more likely to use numerous social media, from Twitter to Pinterest and Reddit, as well as have significantly more experiences contributing their own online content, from posting videos to participating in various online forums and signing online petitions. The article discusses the implications of these findings for research that uses AMT as a sampling frame when examining questions related to social media use and active online engagement.”
“A member of the Stanford Behavioral Laboratory posted on a Prolific forum, “We have noticed a huge leap in the number of participants on the platform in the US Pool, from 40k to 80k. Which is great, however, now a lot of our studies have a gender skew where maybe 85% of participants are women. Plus the age has been averaging around 21.” Wayne State psychologist Hannah Schechter seems to have been the first person to crack the case. “This may be far-fetched,” she tweeted, linking to Frank’s video, “but given the timing, virality of the video, and the user’s follower demographics….” Long-standing Prolific survey-takers complained on Reddit that Frank had made it difficult to find paid surveys to take on the overrun platform.”
“The rapidly increasing usage of machine learning raises complicated questions: How can we tell if models are fair? Why do models make the predictions that they do? What are the privacy implications of feeding enormous amounts of data into models? This ongoing series of interactive, formula-free essays will walk you through these important concepts.”
Source: AI Explorables | PAIR
“These early results are encouraging, and we look forward to sharing more soon, but sensibleness and specificity aren’t the only qualities we’re looking for in models like LaMDA. We’re also exploring dimensions like “interestingness,” by assessing whether responses are insightful, unexpected or witty. Being Google, we also care a lot about factuality (that is, whether LaMDA sticks to facts, something language models often struggle with), and are investigating ways to ensure LaMDA’s responses aren’t just compelling but correct. But the most important question we ask ourselves when it comes to our technologies is whether they adhere to our AI Principles. Language might be one of humanity’s greatest tools, but like all tools it can be misused. Models trained on language can propagate that misuse — for instance, by internalizing biases, mirroring hateful speech, or replicating misleading information. And even when the language it’s trained on is carefully vetted, the model itself can still be put to ill use.”
“To make sure we’re building for everyone, our model accounts for factors like age, sex, race and skin types — from pale skin that does not tan to brown skin that rarely burns. We developed and fine-tuned our model with de-identified data encompassing around 65,000 images and case data of diagnosed skin conditions, millions of curated skin concern images and thousands of examples of healthy skin — all across different demographics. Recently, the AI model that powers our tool successfully passed clinical validation, and the tool has been CE marked as a Class I medical device in the EU.”
“Whichever avenues of development are chosen, one thing is clear to Florence G. Sell, professor of private law at the Université de Lorraine: “making court decisions available, combined with advances in Big Data tools, will allow a much more comprehensive and in-depth view of how the justice system operates.” In the expert’s view, the judiciary has every interest in taking up these tools to improve its quality and efficiency. And if it does not, “other players, such as lawyers or startups, will: they will then be the ones at the forefront of a change that is in any case irreversible.””
“My advice is simply to take note of your emotional reaction to each headline, sound bite or statistical claim. Is it joy, rage, triumph? Fine. But having noticed it, keep thinking. You may find clarity emerges once your emotions have been acknowledged. So what do puzzles, poker, and misinformation have in common? Some puzzles — and some poker hands — require enormous intellectual resources to navigate, and the same is true of certain subtle statistical fallacies. But much of the time we fool ourselves in simple ways and for simple reasons. Slow down, calm down, and the battle for truth is already half won.”