2 Inspiration

To become a data curator, you do not need to be a data scientist, a statistician, or a data engineer. We are looking for professionals, researchers, or citizen scientists who are interested in data and its visualisation, and in its potential to form the basis of informed business or policy decisions and to provide scientific or legal evidence. Our ideal curators share a passion for data-driven evidence or visualisations and have a strong, subjective idea of the data that would inform them in their work.

2.1 We need data

2.1.1 No Data is Available: This Scientist Stung Himself With Dozens Of Insects Because No One Else Would

The Schmidt Pain Index, as it's informally known, runs from 1 to 4. The common honey bee serves as its anchor point, a solid 2. At the top end of the scale lie the bullet ant and the tarantula hawk (which is neither a tarantula nor a hawk; it's a wasp). Watch the video with Dr. Schmidt, and listen to the whole interview here. ⏯ This Scientist Stung Himself With Dozens Of Insects Because No One Else Would.

2.1.2 Nobody Counted Them Before: Big Data Is Saving This Little Bird

“We need to improve conservation by improving wildlife monitoring. Counting plants and animals is really tricky business.” ⏯ Big Data Is Saving This Little Bird

2.2 From Datasets and Files to Living Web Resources {web-30}

2.2.1 Web 3.0

2.3 Remain Critical: Ethical Data, Trustworthy AI

Sometimes we get our hands on data that looks like a unique starting point for creating a new indicator. But our indicator will be flawed if the original dataset is flawed, and a dataset can be flawed in many ways: most often, some important aspect of the information was omitted, or the data is self-selected, for example, under-sampling women, people of colour, or observations from small or less developed countries.

2.3.1 Machine Learning from Bad Data: Weapons of Math Destruction, Algorithms of Oppression

Cathy O’Neil: ⏯ Weapons of Math Destruction. Weapons of math destruction, as O’Neil calls them, are mathematical models or algorithms that claim to quantify important traits (teacher quality, recidivism risk, creditworthiness) but have harmful outcomes and often reinforce inequality, keeping the poor poor and the rich rich. They have three things in common: opacity, scale, and damage. https://blogs.scientificamerican.com/roots-of-unity/review-weapons-of-math-destruction/

In ⏯ Algorithms of Oppression, Safiya Umoja Noble challenges the idea that search engines like Google offer an equal playing field for all forms of ideas, identities, and activities. Data discrimination is a real social problem; Noble argues that the combination of private interests in promoting certain sites, along with the monopoly status of a relatively small number of Internet search engines, leads to a biased set of search algorithms that privilege whiteness and discriminate against people of colour, especially women of colour.

2.3.2 Big Data Creates Inequalities: Data Feminism

Catherine D’Ignazio and Lauren F. Klein: ⏯ Data Feminism. This is a much-celebrated book, and with good reason. It views AI and data problems from a feminist point of view, but the examples and the toolbox can easily be applied to small-country, racial, ethnic, or small-enterprise biases. It is a very good introduction to the injustice of big data and the fight for fairer use of data, and to how bad data collection practices, through garbage in, garbage out, lead to misleading information or even misinformation.

2.3.3 Bad Data Collection Used for Modelling: Why The Bronx Burned

Between 1970 and 1980, seven census tracts in the Bronx lost more than 97 percent of their buildings to fire and abandonment. In his book ⏯ The Fires, Joe Flood blames a misguided “best and brightest” effort by New York City to increase government efficiency. With the help of the Rand Corp., the city tried to measure fire response times, identify redundancies in service, and close or re-allocate fire stations accordingly. What resulted, though, was a perfect storm of bad data: the methodology was flawed, the analysis was rife with biases, and the results were interpreted in a way that stacked the deck against poorer neighbourhoods. The slower response times allowed smaller fires to rage uncontrolled in the city’s most vulnerable communities. Listen to the podcast here.

2.3.4 Bad Incentives Are Blocking Better Science

Bad Incentives Are Blocking Better Science. “There’s a difference between an answer and a result. But all the incentives are pointing toward telling you that as soon as you get a result, you stop.” After the deluge of retractions, the stories of fraudsters, the false positives, and the high-profile failures to replicate landmark studies, some people have begun to ask: “⏯ Is science broken?” Listen to the podcast [⏯ Science Is Hard](https://podcasts.apple.com/us/podcast/10-science-is-hard/id1011406983?i=1000391467935).

2.4 Reality Check

2.4.1 Looking Behind Data: Moving to America’s Worst Place to Live

Christopher Ingraham wrote ⏯ a quick blog post for The Washington Post about an obscure USDA data set called the natural amenities index, which attempts to quantify the natural beauty of different parts of the country. He described the rankings, noted the counties at the top and bottom, hit publish, and did not think much of it. Almost immediately, he started to hear from the residents of northern Minnesota, who were not very happy that Chris had written, “The absolute worst place to live in America is (drumroll, please) … Red Lake County, Minn.” He could not have been more wrong … a year later he moved to Red Lake County with his family.