Big Data for All

We want to ensure that individual researchers, artists, and professionals, as well as NGOs and organizations small and large, can benefit equally from big data in the age of artificial intelligence.

Big data creates inequality and injustice because only big corporations, big government agencies, and the biggest, best-endowed universities can finance long-lasting, comprehensive data collection programs. Big data, in the form of large, well-processed, tidy, and accurately imputed datasets, allows them to unleash the power of machine learning and AI. These large entities are able to create the algorithms that decide the commercial success of your product or your artwork, giving them a competitive edge over smaller competitors while helping them evade regulation.

We are looking for partners to develop our technological solution in a financially sustainable way, bringing increasingly relevant curated open data to light. Our product/market fit was validated in the world's second-ranked university-backed incubator program, the Yes!Delft AI Validation Lab. We are currently developing this project with the help of the JUMP European Music Market Accelerator program.

How Do We Add Value to Your Data?

Many countries allow access to a vast array of information, such as documents released under freedom-of-information requests and national statistical figures and datasets. In the European Union, most taxpayer-financed data can be accessed and reused in domains such as government administration, transportation, and meteorology. More and more scientific output is expected to be reviewable and reproducible, which implies open access.

We create high-value key business and policy evaluation indicators.
Scientific proof requires correctly matching, formatting, and verifying controlled pieces of data. Our data comes from verified and legal sources, with information about usage rights and a complete history. We do not deal in blood diamonds.
Adding metadata exponentially increases the value of data.
Did your region add a new town within its boundaries? How do you adjust old data to conform to constantly changing geographic boundaries? What are practical ways of combining satellite sensor data with your organization's records, and do you have the right to do so? Metadata logs the history of data, provides instructions on how to reuse it, and sets the terms of use. We automate this labor-intensive process by applying the FAIR data concept.
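The boundary question above can be sketched with a tiny, hypothetical example (the region codes, the merge, and the figures are all invented for illustration): when two old regions are merged, additive indicators such as population can be re-aggregated to the new boundaries with a correspondence table.

```python
# Hypothetical correspondence table: old region code -> new region code.
# Suppose that in 2021 old regions "AB1" and "AB2" were merged into "AB1".
CORRESPONDENCE = {"AB1": "AB1", "AB2": "AB1", "AB3": "AB3"}

def to_new_boundaries(old_data: dict) -> dict:
    """Re-aggregate an additive indicator from old to new regional boundaries."""
    new_data = {}
    for old_code, value in old_data.items():
        new_code = CORRESPONDENCE[old_code]
        new_data[new_code] = new_data.get(new_code, 0) + value
    return new_data

population_2019 = {"AB1": 120_000, "AB2": 30_000, "AB3": 85_000}
print(to_new_boundaries(population_2019))  # {'AB1': 150000, 'AB3': 85000}
```

Note that simple summing only works for additive indicators; averages and rates need weighted recalculation, which is one reason such adjustments are hard to do reliably by hand.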
Data is only potential information: raw and unprocessed. How do you correctly convert between dollars and euros? How do you verify consistency in units of measurement? Some of our indicators go through more than 10,000 processing steps. If your team does this in a spreadsheet or statistical software, there is no way it will be faultless, or that senior staff can verify it.
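As a minimal sketch of why such steps belong in code rather than in a spreadsheet (the rate, date, and helper names here are illustrative, not our production pipeline): a conversion that carries its own rate, reference date, and source can be tested and audited automatically.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExchangeRate:
    base: str    # currency the amount is expressed in, e.g. "USD"
    quote: str   # currency we convert into, e.g. "EUR"
    rate: float  # quote units per 1 base unit
    date: str    # reference date of the rate
    source: str  # provenance of the rate

def convert(amount: float, currency: str, fx: ExchangeRate) -> float:
    """Convert an amount, refusing silently mismatched currencies."""
    if currency != fx.base:
        raise ValueError(f"rate is {fx.base}->{fx.quote}, got amount in {currency}")
    return amount * fx.rate

# Illustrative rate, not a real quotation.
usd_eur = ExchangeRate("USD", "EUR", 0.92, "2023-06-30", "illustrative")
print(round(convert(100.0, "USD", usd_eur), 2))  # 92.0
```

Because every conversion step names its rate, date, and source, each of the thousands of processing steps can be re-run and checked by a machine instead of a senior reviewer.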

Data sits everywhere: in every government data warehouse (you can reuse it!), scientific journals, libraries, your sales records, and in sensors, to name a few. Not having access to data due to budgetary or legal constraints is an absolute barrier, and being unable to correctly assemble it into reliable information can keep its value low.
  • In the 1. Open Data chapter we investigate why not even organizations such as the European Commission use open data in their own data dissemination practices, even though their data is at least legally available. The promise of open data is that it can reduce your material data costs, because it gives you access to data that was created, at your expense as a taxpayer, by governmental agencies or universities. The main problem with open data is that while it is legally accessible, and often cost-free, in most cases it is not discoverable, and often not even directly accessible. While the EU has approved policies on making taxpayer-funded data reusable since 2003, it has taken few technical steps to make this a reality. Reusability of governmental and scientific data is a right, but not a practical possibility for most users.

  • In the 2. FAIR Data and the Added Value of Rich Metadata chapter we introduce how we apply the concept of FAIR (findable, accessible, interoperable, and reusable digital assets) in our APIs. Metadata does not show up in material data acquisition costs, but it is even more important: it drives non-billable hours in industry and uncredited working hours in academia. Poor data documentation, the lack of reproducible processing and testing logs, inconsistent use of currencies and keywords, and messy data storage make reusability impossible. Organizations pay many times for the same, repeated work, because these boring tasks, which often comprise tens of thousands of microtasks, are neglected. Our solution creates automatic documentation and metadata for your own historical internal data or for acquisitions from data vendors. We apply the more general Dublin Core standard and the more specific mandatory and recommended properties of DataCite for datasets – these are new requirements in EU-funded research from 2021. But they are just the minimal steps, and there is a lot more to do to create a diamond ring from an uncut gem.

  • In the 4. Application: Automated Data Observatories chapter we provide further technical information about our application. We use open-source software and open data. The applications are hosted on the cloud resources of Reprex, an early-stage technology startup currently building a viable, open-source, open-data business model to create reproducible research products. Our development team works on an open collaboration basis. Our indicator R packages and our services are developed together with rOpenGov.

  • See 5. Service Design and Business Case Development to understand our ideas around finding a suitable business model for data sharing, as well as collaborative research activities that share the exponential value added from data integration across various business, policy, and academic partners.

  • In the 6. Data Curators chapter we provide information for prospective curators. See also our get inspired and your first contribution subchapters.
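The DataCite requirements mentioned in the chapter overview above can be illustrated with a small sketch (the record contents and dictionary layout are invented for illustration; the mandatory property names follow the DataCite Metadata Schema): an automated check can flag a dataset that is missing mandatory metadata before it is released.

```python
# DataCite's mandatory properties; the field names mirror the DataCite
# Metadata Schema, while the flat dictionary layout is our own choice here.
MANDATORY = ("identifier", "creator", "title", "publisher",
             "publication_year", "resource_type")

def missing_mandatory(metadata: dict) -> list:
    """Return the mandatory DataCite fields that are absent or empty."""
    return [field for field in MANDATORY if not metadata.get(field)]

record = {
    "identifier": "https://doi.org/10.5281/zenodo.0000000",  # illustrative DOI
    "creator": "Example Data Curator",
    "title": "Illustrative Indicator Dataset",
    "publisher": "Example Observatory",
    "publication_year": 2021,
    # "resource_type" deliberately omitted
}
print(missing_mandatory(record))  # ['resource_type']
```

Running such a check over every dataset turns metadata completeness from an afterthought into an automated, repeatable gate.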

Big Data for All

Machine learning and AI give a competitive edge to large companies and governments that can exploit it. But training algorithms requires large quantities of uniformly-formatted, high quality data, and the deployment of algorithms comes with many potential side effects.

Trustworthy AI
We help you deploy reliable AI that remains under human supervision: algorithms that will not turn against your organization or engage in discriminatory, unlawful, or counterproductive behavior. We automate data and metadata management, documentation, and verification, because computers are much better than humans at these laborious and often repetitive tasks; this frees humans to focus on oversight.

Open collaboration for data treasures
We use the agile open collaboration project methodology of open source software development to make sure that large universities, consultancies, citizen scientists, individual artists, and small NGOs can share research budgets, data assets, and innovation in big data to remain competitive against big tech and large organizations.
Most organizations cannot afford to build an in-house data science and data engineering team, nor do they possess in-house market research or IT capabilities. Instead of burdening your team with manual data downloads and ad hoc data manipulation, we offer a subscription to curated open and proprietary processed data. We keep all your data assets tidy, documented, and easy to use.
Automated Data Observatories
Our observatories are built around open collaborations between scientific, business, public, and NGO policy partners.


Our work is inspired by the open collaboration concept, a well-known principle in open source software development and reproducible science. Our goal is to make this agile project management methodology more inclusive, involving data curators and various institutional partners as part of a general approach. Based on our early-stage startup, Reprex, and the open-source developer community rOpenGov, we are working together with other developers, data scientists, and domain specific data experts in climate change and mitigation, antitrust and innovation policies, and various aspects of the music and film industry.

The Green Deal Data Observatory is a modern reimagining of existing 'data observatories': currently, there are over 70 permanent international data collection and dissemination points known as 'data observatories'. One of our objectives is to understand why dozens of the EU's observatories do not use open data and reproducible research. We want to show that open governmental data, open science, and reproducible research can lead to a higher-quality and faster data ecosystem that fosters growth for policy, business, and academic data users. Find it on the web and on social media (the Green Deal Data Observatory on LinkedIn and on Twitter), and join our contributor team.
The Digital Music Observatory (DMO) is a fully automated, open-source, open-data observatory that creates public datasets to provide a comprehensive view of the European music industry. It provides high-quality and timely indicators in all four pillars of the planned official European Music Observatory, serving as a modern, open-source, largely open-data-based, automated, API-supported alternative to that planned body. The insights and methodologies we are refining in the DMO are applicable and transferable to about 60 other EU-funded data observatories that do not currently employ governmental or scientific open data. Find it on the web and on social media (the Digital Music Observatory on LinkedIn and on Twitter), and join our contributor team.
The Competition Data Observatory is the first offspring of the Economy Data Observatory incubator; see further details in the 10. Competition Data Observatory chapter. We would like to create early-warning, risk, economic-effect, and impact indicators that can be used in scientific, business, and policy contexts by professionals who are working on resetting the European economy after a devastating pandemic and in the age of AI. We would like to map data between economic activities (NACE), antitrust markets, and sub-national, regional, and metropolitan-area data. See the prototype on the web.
The Economy Data Observatory now works as an incubator for economy-focused data observatories. Find it on the web and on social media (the Economy Data Observatory on LinkedIn and on Twitter), and join our contributor team!