UC Berkeley scientists developing artificial intelligence tool to combat ‘hate speech’ on social media
DANIEL PAYNE - ASSISTANT EDITOR • DECEMBER 17, 2018
‘Ten students of diverse backgrounds’ helped develop algorithm
Scientists at the University of California, Berkeley, are
developing a tool that uses artificial intelligence to identify “hate speech”
on social media, a program that researchers hope will outperform human beings
in identifying bigoted comments on Twitter, Reddit and other online platforms.
Scientists at Berkeley’s D-Lab “are working in
cooperation with the [Anti-Defamation League] on a ‘scalable detection’
system—the Online Hate Index (OHI)—to identify hate speech,” the Cal Alumni
Association reports.
In addition to artificial intelligence, the program will
use several different techniques to detect offensive speech online, including
“machine learning, natural language processing, and good old human brains.”
Researchers aim to have “major social media platforms” one day utilizing the
technology to detect “hate speech” and eliminate it, and the users who spread
it, from their networks.
Current technology mainly involves the use of “keyword searches,” one researcher states, which are “fairly imprecise and blunt.” Such algorithms can be fooled by simply spelling words differently.
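For illustration, here is a minimal sketch of why keyword matching is brittle. This is not the D-Lab’s code; the blocklisted word and the misspelling are invented stand-ins:

```python
# Illustrative only: a toy keyword filter and the trivial misspelling that
# defeats it. "bigotword" is a hypothetical stand-in for a real slur.
BLOCKLIST = {"bigotword"}

def keyword_flag(comment: str) -> bool:
    """Flag a comment if any blocklisted word appears verbatim."""
    words = comment.lower().split()
    return any(w.strip(".,!?") in BLOCKLIST for w in words)

print(keyword_flag("you are a bigotword"))   # True: exact match is caught
print(keyword_flag("you are a b1gotword"))   # False: one swapped character slips through
```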
The OHI intends to address these deficiencies. Already,
their work has attracted the attention and financial support of the platforms
that are most bedeviled—and that draw the most criticism—for hate-laced
content: Twitter, Google, Facebook, and Reddit…
D-Lab initially enlisted ten students of diverse
backgrounds from around the country to “code” the posts, flagging those that
overtly, or subtly, conveyed hate messages. Data obtained from the original
group of students were fed into machine learning models, ultimately yielding
algorithms that could identify text that met hate speech definitions with 85
percent accuracy, missing or mislabeling offensive words and phrases only 15
percent of the time.
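The article does not describe the models used, but a common baseline for this kind of supervised text classification, shown here with invented placeholder comments and labels, is a bag-of-words pipeline along these lines:

```python
# A minimal supervised text-classification sketch, assuming a scikit-learn
# pipeline; the training comments and labels below are invented placeholders,
# not the students' actual coded data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "group X should not be allowed here",   # coded 1: hate
    "that referee made a terrible call",    # coded 0: non-hate
    "people like them ruin everything",     # coded 1: hate
    "this movie was a complete waste",      # coded 0: non-hate
]
labels = [1, 0, 1, 0]  # the human coders' judgments

# Word and word-pair features feeding a linear classifier: a standard baseline
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

print(model.predict(["those people are the problem"]))
```

In practice a figure like the reported 85 percent would come from evaluating such a model on coded comments it never saw during training.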
Though the initial ten coders were left to make their own
evaluations, they were given survey questions (e.g., “…Is the comment directed
at or about any individual or groups based on race or ethnicity?”) to help them
differentiate hate speech from merely offensive language. In general, “hate
comments” were associated with specific groups while “non-hate” language was
linked to specific individuals without reference to religion, race, gender,
etc. Under these criteria, a screed against the Jewish community would be
identified as hate speech while a rant—no matter how foul—against an
African-American celebrity might get a pass, as long as his or her race wasn’t
cited.
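Expressed as a toy rule, purely to illustrate the criteria as reported (the field names and logic here are assumptions, not the lab’s actual coding instrument), the distinction might look like:

```python
# Illustrative encoding of the reported criterion: a comment counts as "hate"
# only if it targets a group, or invokes a protected attribute such as race,
# religion, or gender. Field names are hypothetical.
from dataclasses import dataclass

@dataclass
class CodedComment:
    text: str
    targets_group: bool               # coder: is a group, not an individual, the target?
    cites_protected_attribute: bool   # coder: is race, religion, gender, etc. referenced?

def is_hate(c: CodedComment) -> bool:
    return c.targets_group or c.cites_protected_attribute

# A screed against a religious community: hate under these criteria
print(is_hate(CodedComment("…", targets_group=True, cites_protected_attribute=True)))    # True
# A foul rant at one celebrity, no reference to race: not hate under these criteria
print(is_hate(CodedComment("…", targets_group=False, cites_protected_attribute=False)))  # False
```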
One researcher warned against the possibility of
inadvertent censorship: “Unless real restraint is exercised, free speech could
be compromised by overzealous and self-appointed censors.” The lab is thus
“working to minimize bias with proper training and online protocols that
prevent operators from discussing codes or comments with each other.”
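Keeping coders from conferring makes each label an independent judgment, which lets researchers check how consistently the guidelines are applied. The article names no metric, but a standard measure of agreement between two independent coders is Cohen’s kappa, sketched here with invented labels:

```python
# Sketch: measuring agreement between two coders who labeled the same
# comments independently. Cohen's kappa corrects raw agreement for chance.
from sklearn.metrics import cohen_kappa_score

coder_a = [1, 0, 1, 1, 0, 0, 1, 0]  # coder A's hate/non-hate labels
coder_b = [1, 0, 1, 0, 0, 0, 1, 0]  # coder B's labels for the same comments

print(cohen_kappa_score(coder_a, coder_b))  # 0.75: substantial agreement
```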