As the amount of available research data increases, searching for it is one way of opening up new avenues of research. However, it is not without its pitfalls! This post looks at two problems facing the data searcher, and ways of overcoming them.
Research data is increasingly becoming a fundamental part of the PhD process. Funding applications and projects require data management plans that set out what data will be produced, and how that data will be collected, managed and stored. Funding bodies, including UK Research and Innovation (UKRI, whose councils include the EPSRC, BBSRC and AHRC, to name a few), require research data from funded projects to be made open access.
The rise of open access research data comes with many benefits for researchers. Searching for data can open new avenues of research – sometimes in entirely different disciplines! Data can be used to examine, replicate and challenge research methodologies. It can be used to more effectively understand and support new conclusions or challenge existing ones. Sharing research data enables future researchers to discover new lines of enquiry without the duplication of effort involved in collecting the data again.
But even as the amount of research data increases, searching for this data is not always an easy task! There are two points to keep in mind when searching for research data.
1. The location of data is not always obvious
Depending on where the data was collected, what data was collected, and how much was collected, it might be stored in a generalist repository, a discipline-specific one, or an institutional one. Many articles provide links to where the corresponding dataset is deposited, but not always. Additionally, some datasets are not linked to any specific article or publication, while others are associated with several research projects whose outputs do not link back to them.
But thankfully there is a way of finding out where this data might be stored. Re3data.org is an invaluable tool for finding data repositories – regardless of whether you’re searching by subject area or specific research topic. Personally, as a social scientist with an interest in AI and machine learning, this helped me to find repositories such as the UCI Machine Learning Repository and OpenML that I otherwise would have overlooked!
Alternatively, when looking for datasets that are associated with, but not explicitly linked to, a publication, I have always found it worth checking the academic affiliations of the authors. Sometimes the data is held not by the Principal Investigator's university, or by the university of one of the first-listed authors, but by the university of the person who collected it.
2. The landscape of repositories is highly heterogeneous
Some repositories hold only data, while others store datasets in dedicated sections or flag them with searchable tags. Some repositories don't categorise data separately at all, or simply flag it as miscellaneous. Depending on the conventions governing how and where data is stored, different terms might be useful when conducting a search.
Searching for datasets in repositories can require a delicate touch. Sometimes a dataset is listed under a different name from the paper, and on rare occasions even academics' names can be listed differently because of differing database conventions, changes in circumstance, and typos (seriously!). As with any database search, a single letter can make a massive difference to a search term. If a dataset doesn't come up when you search for the exact title or author name, it is definitely worth trying again with less information to see if that changes things.
Another thing to bear in mind is how data might be tagged in relation to the University. Although ‘University of Warwick’ seems to be the standard – and most popular – way of referencing the affiliation in generalist repositories, searching for ‘Warwick University’ can throw up different – and equally relevant – results!
With the above in mind, searching for data can be a frustrating experience. But if you can navigate the ever-expanding ocean of open access data repositories, it is definitely a rewarding one. A bit of patience with search terms (including the keyword 'data' or 'dataset' is a must) can open up entirely new lines of inquiry, which makes data searching another valuable skill for researchers to have.
What about you? How do you use different databases and repositories? Tweet us at @ResearchEx, email us at firstname.lastname@example.org, or leave a comment below.