To create a fairer AI dataset, Facebook urges people to share their age and gender

Facebook is sharing a new and diverse dataset with the wider AI community. In an announcement spotted by VentureBeat, the company says it wants researchers to use the collection, called Casual Conversations, to test their machine learning models for bias. The dataset features 3,011 people across 45,186 videos and takes its name from the fact that those people give unscripted answers to the company’s questions.

Importantly, Casual Conversations features paid actors whom Facebook explicitly asked to disclose their age and gender. The company also hired trained professionals to label the ambient lighting and the subjects’ skin tones according to the Fitzpatrick scale, a system dermatologists developed to classify human skin tones. Facebook says the dataset is the first of its kind.
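
Those annotations matter because they let researchers slice a model’s performance by group. As a minimal sketch, assuming a hypothetical predictions.csv holding per-video model outputs alongside the dataset’s age, gender and skin-tone labels (the file and all column names here are invented for illustration), a researcher might compare error rates across subgroups like this:

```python
import pandas as pd

# Hypothetical layout: one row per video with the model's prediction,
# the ground-truth label, and the Casual Conversations annotations
# (age bucket, gender, Fitzpatrick skin type I-VI).
df = pd.read_csv("predictions.csv")

df["correct"] = df["prediction"] == df["label"]

# Error rate per annotated subgroup; a large gap between groups
# (e.g. Fitzpatrick V-VI vs. I-II) points to bias in the model.
for attribute in ["gender", "age_bucket", "fitzpatrick_type"]:
    error_rates = 1 - df.groupby(attribute)["correct"].mean()
    print(f"\nError rate by {attribute}:")
    print(error_rates.sort_values(ascending=False))
```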

You don’t have to look far to find examples of bias in AI. A recent study found that facial recognition and analysis programs like Face++ rated the faces of Black men as angrier than those of their white counterparts, even when both men were smiling. The same flaws have made their way into consumer AI software. In 2015, Google removed a label from Photos after software engineer Jacky Alciné discovered that the app had misidentified his Black friends as “gorillas.”

Many of these problems can be traced back to the datasets organizations use to train their software, and that is where an initiative like this one can help. A recent MIT study of popular machine learning datasets found that about 3.4 percent of the data in those collections was inaccurate or mislabeled.
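
One way such mislabeling surfaces is when a model trained on the data confidently disagrees with an example’s recorded label. Below is a minimal, self-contained sketch of that idea, loosely in the spirit of the confident-learning approach behind such audits rather than any study’s actual implementation; the threshold and toy data are invented for illustration:

```python
import numpy as np

def flag_suspect_labels(pred_probs: np.ndarray, labels: np.ndarray,
                        threshold: float = 0.05) -> np.ndarray:
    """Return indices of examples whose recorded label receives very low
    predicted probability from a model trained on the data.

    pred_probs: (n_examples, n_classes) out-of-sample predicted probabilities
    labels:     (n_examples,) integer class labels as recorded in the dataset
    """
    prob_of_given_label = pred_probs[np.arange(len(labels)), labels]
    return np.where(prob_of_given_label < threshold)[0]

# Toy example: the third example is labeled class 0, but the model is
# almost certain it belongs to class 1, so it gets flagged for review.
probs = np.array([[0.90, 0.10], [0.70, 0.30], [0.02, 0.98]])
labels = np.array([0, 0, 0])
print(flag_suspect_labels(probs, labels))  # -> [2]
```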

Although Facebook describes Casual Conversations as a “good, bold first step forward,” it admits the dataset isn’t perfect. For starters, it only includes people from the United States. The company also did not ask participants about their ethnicity and, when it came to gender, the only options were “male”, “female” and “other”. However, the company plans to make the dataset more inclusive over the next year.