
Research data


Access to research data has recently become an important focus for researchers. In 2018, the Norwegian Ministry of Education and Research published a national strategy on access to and sharing of research data. Its main principles are that publicly funded research data should be shared and reused more widely, and that the data must be as open as possible and as closed as necessary. Research data should be as transparent and reproducible as possible: proper data management is a precondition for reproducibility, and data sharing is important for transparency.

Data should be as open as possible, and as closed as necessary (European Commission, 2016)

In this section, and in the Data Management section, you will learn about

  • open research data
  • how to define “research data”
  • how to make data management plans (DMP)
  • how to archive and share data in open repositories
  • how to decide if your data is too sensitive to share.


Defining “data”

There is no consensus on the definition of the term “data” or “research data”; it varies by discipline and project funder. As a starting point, we can use the definition from the Research Council of Norway (NFR, 2017):

[research data is]… the registration/recording/reporting of numerical scores, textual records, images and sounds that are generated by or arise during research projects. These may, for example, be data that are generated through new analysis by combining existing secondary data, or entirely new data that are generated through new data collection. Research data are always a direct result of research activity, regardless of whether the data are based on secondary data or whether they are collected from scratch.

Even if data can be perceived as self-contained units existing in the world, the data we deal with in this section can be divided along the following lines:

  • Your data versus other people’s data
  • Personal / sensitive data versus anonymous / impersonal data
  • Raw data versus processed data

The focus in this section is on the data you collect or generate during your research, not data you have downloaded from a database. The most important issue is deciding whether your data can in any way be regarded as personal or sensitive, and treating them accordingly.

Data management stages

You are responsible for your own results, for how you store your data and, of course, for how you share them later on. When starting a new project, find out whether you need any kind of approval for handling sensitive data. Individual countries may have several officials or agencies responsible for approving the collection and use of data in new projects. Any project that will handle personal data of any kind will generally need approval from the national data protection authority (Datatilsynet in Norway), the data protection official at your institution or a board of health research ethics (REC in Norway).

If you use fully anonymized data you do not need to apply for approval by an external body. You still need to check if your institution has a research ethics board that must approve your project before you start.

By “anonymized data”, we mean data where it is impossible to identify any single individual through e.g.

  • names
  • ID numbers
  • unique characteristics
  • background variables
  • name/connection key

Note that even though a single piece of data may not be enough to identify individuals, a combination of multiple variables could very well be enough to make the connection; typical cases could be sparsely populated areas or rare diseases.
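One common way to test whether a combination of background variables singles people out is k-anonymity, a technique not described in the original text: count how many records share each combination of such variables, and treat any combination shared by fewer than k records as a re-identification risk. The Python sketch below uses made-up records and hypothetical variable names (`age_band`, `municipality`). Passing such a check does not by itself make data anonymous, so treat it as a first screening step, not as an approval.

```python
from collections import Counter

# Hypothetical, made-up records: direct identifiers (names, ID numbers)
# are already removed, but background variables remain.
records = [
    {"age_band": "30-39", "municipality": "Tromsø", "diagnosis": "asthma"},
    {"age_band": "30-39", "municipality": "Tromsø", "diagnosis": "diabetes"},
    {"age_band": "30-39", "municipality": "Tromsø", "diagnosis": "asthma"},
    {"age_band": "60-69", "municipality": "Karlsøy", "diagnosis": "rare disease X"},
]


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of the given variables."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def smallest_group(rows, quasi_identifiers):
    """Return the size of the smallest group (the 'k' in k-anonymity)."""
    return min(group_sizes(rows, quasi_identifiers).values())


def risky_combinations(rows, quasi_identifiers, k):
    """List the value combinations shared by fewer than k rows."""
    sizes = group_sizes(rows, quasi_identifiers)
    return [combo for combo, n in sizes.items() if n < k]


qi = ["age_band", "municipality"]
print(smallest_group(records, qi))           # 1
print(risky_combinations(records, qi, k=3))  # [('60-69', 'Karlsøy')]
```

Here the single record from a sparsely populated municipality forms a group of one, which is exactly the situation the text warns about.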

Open research data

Open research data is still data, as defined above. The term “open” means that there are no restrictions on access to them. Foster defines open data as “…online, free of cost, accessible data that can be used, reused and distributed provided that the data source is attributed and shared alike” (Foster, 2014). According to the Norwegian national strategy on access to and sharing of research data (2018), publicly funded research data should be shared and reused more widely. The strategy is based on the principle that research data must be as open as possible, as closed as necessary.

The openness of data is one of the key elements in the FAIR guidelines on research data. Open research data may be e.g. government data or collected data sets published in a data repository or in a data journal. As we move towards more open data, you need to know if your data can be opened up for the research community, and you need to know how to do this correctly.

Sharing data

If you want to share your research data, it is a good idea to make a data management plan at an early stage. If you document and structure your data from the beginning, you save yourself a lot of work preparing the data for sharing later. A proper data management plan ensures that you plan ahead for future use of the data, and makes potential reuse of your data as easy as possible. A possible starting point for a data management and sharing plan is the Digital Curation Centre’s working-level guide: “How to develop a data management and sharing plan”.

Remember that you need permission from all data collaborators before you share your research data, and you should sign a written agreement stating the conditions for ownership, reuse and sharing. It is common practice to follow the same guidelines for co-authorship as for publications when publishing data.

There are many reasons for sharing research data. Funders like the Research Council of Norway and the European Commission have policies on data sharing, and for some funding programmes, you are required to make your data openly available unless you are restricted by e.g. sensitivity issues. Some journals, like Nature, also require that published articles are accompanied by the underlying research data. If your journal of choice does not require the publishing of related datasets, you should still keep in mind that the data could be valuable for other researchers and should therefore be made available if possible. Also remember that your institution may have a policy on data sharing.

Although you may not be required to share your data, there are still good reasons for sharing data, both for you yourself and for other researchers (Piwowar & Vision, 2013; RECODE, 2013):

  • Studies that make data available in a public repository receive significantly more citations than similar studies for which the data are not made available
  • Sharing your data increases your visibility as a researcher in general
  • Sharing data allows for new collaborations, and seeing how other people interpret and use your data may be beneficial for your own future research
  • Sharing your data makes your research more transparent, and thus more trustworthy
  • When research communities share their data, plagiarism and fraud are more easily detected, and science in general becomes more trustworthy
  • The research community can access a wider range of data to use for re-analysis, comparison, integration and testing, building a more solid research foundation
  • Quicker access to data allows science to move on faster
  • Less duplication of data collection saves resources

How to share your data

There are several possibilities for sharing data:

  • Institutional archives
  • Interdisciplinary archives
  • Subject-specific archives
  • Data journals

There is some evidence that you are better off choosing a certified, subject-specific archive if one exists in your subject area.

Sending your data directly to another researcher or institution is of course also data sharing. Note that this kind of sharing is not considered as “open data”, even though the same rules on data safety and personal data apply.

Data repositories in Norway

A few data repositories are available nationally in Norway. The non-exhaustive overview below describes some of the Norwegian repositories, their focus, services and security level.

UiT The Arctic University of Norway provides an open data repository, UiT Open Research Data. The service is geared towards employees and students of the university, but is available to any researcher through the Dataverse network. Since UiT Open Research Data is an open repository, it is unsuitable for personal or sensitive data. UiT Open Research Data complies with the DataCite metadata schema, so deposited datasets are described with standard metadata and receive a persistent identifier (DOI) that makes them findable and citable.

Uninett runs a data storage and high-performance computing service called Sigma2. Sigma2 services are available to Norwegian researchers and to projects funded by the Norwegian government, and include high-security data storage and a tool for data management plans. Uninett’s storage facility, NorStore, offers storage, sharing and management of active datasets. Note that NorStore is not a permanent solution and should not be used as a data repository. Uninett’s repository service is called Nird, which is also compliant with the DataCite schema. Nird stores your data for up to 10 years. Nird is free, and you log in via FEIDE.

The Norwegian Centre for Research Data (NSD) is developing a framework for storing, searching and managing research data called NORDi. The current services include a tool for creating data management plans. NORDi is still in progress but is planned as a complete platform for finding, sharing and using research data. NORDi also focuses on training courses in subjects related to research data.

The Norwegian Ministry of Education and Research is launching a data repository interface (BIRD), through its service group Unit. Norwegian institutions can create repositories within this interface for their own needs. Currently, BIRD contains only one archive belonging to BI Norwegian Business School. BIRD is mainly intended as a data storage facility and should not be treated as a proper repository for sharing data openly. Sharing is possible through requests.


If you are interested in international multidisciplinary data repositories, see the section on searching for research data below.

There are a few things to consider when choosing an archive. The Digital Curation Centre has a checklist and a more detailed guide for evaluating data repositories. They are aimed at research support staff in UK higher education institutions, but the information is general and will be useful for you as well. The checklist asks:

  1. Is a reputable repository available?
  2. Will it take the data you want to deposit?
  3. Will it be safe in legal terms?
  4. Will the repository sustain the data value?
  5. Will it support analysis and track data usage?

These points involve long-term preservation, the possibility of adding sufficient metadata and the assignment of a persistent identifier such as a DOI. It is a good idea to make sure that the archive you choose complies with the FAIR guidelines on research data.
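When you check whether an archive assigns persistent identifiers, it can help to recognize a well-formed DOI. The Python sketch below is an illustration, not part of the original guidance: the helper names are hypothetical, and the regular expression is based on a pattern Crossref has published for matching most modern DOIs. It checks syntax only; it does not confirm that a DOI actually resolves.

```python
import re

# Based on a pattern Crossref has published for most modern DOIs
# (prefix "10." + registrant code, then a suffix). Syntax check only.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common ways a DOI is written in front of the bare identifier.
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalize_doi(value: str) -> str:
    """Strip whitespace and a known resolver prefix, leaving the bare DOI."""
    doi = value.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            return doi[len(prefix):]
    return doi


def looks_like_doi(value: str) -> bool:
    """True if the string is syntactically a DOI (with or without prefix)."""
    return bool(DOI_PATTERN.match(normalize_doi(value)))


def doi_url(value: str) -> str:
    """Return the https://doi.org/ form used in the reference list."""
    return "https://doi.org/" + normalize_doi(value)


print(looks_like_doi("https://doi.org/10.18710/4LABGF"))  # True
print(looks_like_doi("https://example.org/data"))         # False
```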

When not to share

Under some circumstances data cannot be shared:

  • Safety issues
  • Ethical/sensitivity/confidentiality issues, e.g. patient information
  • Commercial issues, e.g. in external collaboration with industry
  • Legal issues, e.g. the consent form does not allow sharing or ownership by a third party

These issues can sometimes be resolved if addressed at an early stage. You cannot share personal information about others, but if properly de-identified and with suitable consent forms, you may share a processed version of such data. Read more about how to treat personal data and data ethics.
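A processed version of personal data is often produced by dropping direct identifiers and coarsening background variables. The Python sketch below shows this under stated assumptions: the field names, the 10-year age bands and the truncation of postcodes to two digits are hypothetical choices for illustration. Whether the result is actually anonymous, and whether the consent form allows sharing it, still has to be assessed, for example together with your data protection official.

```python
# Hypothetical raw records containing direct identifiers (fake values).
raw = [
    {"name": "Kari Nordmann", "national_id": "00000000001", "age": 34,
     "postcode": "9011", "answer": "yes"},
    {"name": "Ola Nordmann", "national_id": "00000000002", "age": 67,
     "postcode": "9130", "answer": "no"},
]

# Hypothetical list of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "national_id"}


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(row: dict) -> dict:
    """Return a processed copy: direct identifiers dropped, exact age
    replaced by an age band, postcode truncated to its first two digits."""
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    out["age"] = age_band(row["age"])
    out["postcode"] = row["postcode"][:2]
    return out


processed = [deidentify(r) for r in raw]
print(processed[0])  # {'age': '30-39', 'postcode': '90', 'answer': 'yes'}
```

The raw file and any key linking processed records back to individuals would still need to be stored securely and kept out of the shared version.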

Data transfer

Sending your data to another person, institution or project for further analysis or normal re-use is another kind of data sharing. When you send your data directly to an identifiable unit, you are responsible for making sure that the receiver of the data will treat them according to legislation and regulations. The best way of doing this is to sign a data processor agreement with the receiver of the data. Ask your institution’s data protection official if there is a standard template for a data processor agreement, or create one using the information on the Norwegian Centre for Research Data’s topic page on data treatment.

As the data owner, you decide what is proper handling of your data, and you are also responsible for obtaining any approval needed for processing personal data. However, any receiver of research data must make sure that all approvals are in place before they start processing the data.

The information below is important if you are transferring personal data. Read more about personal data on the Ethics page.

Data transfer in Norway

If you transfer your data to another institution, you must make sure that the receiving institution has a proper information handling system. Note that the Personal Data Act also applies to data transfer, meaning that any data that need approval for research or collection also need approval for transfer. If the receiving organization is subject to current industry norms or codes of conduct, or is a certified information handler, you can probably go ahead with the transfer. If you are unsure, contact the receiver’s data protection officer; the Norwegian Centre for Research Data (NSD) acts as data protection official for many Norwegian research institutions.

Your institution is the data controller in this setting. It is your task alone to determine the purpose of any data transfer as well as the way the data are to be transferred.

The data processing agreement

The EU General Data Protection Regulation (GDPR), which also applies in Norway, is strict and detailed. Writing a GDPR-compliant data processing agreement is probably the one area where you will need help from your local data protection officer. If you need to issue a data processing agreement to a third party, even within Norway, you can use the following template as guidance.

The template is a freely available example published by DLA Piper in the UK.

The Norwegian Data Protection Authority (Datatilsynet) has published a guide to the data processing agreement explaining what must be included in a general agreement if you should choose to write one on your own.

Note that the Personal Data Act was replaced by the EU GDPR in May 2018.

International data transfer

Some kinds of data handling are regarded as transferring or sending the data abroad. If you are part of a larger project where a collaborating university needs your data transferred for further processing, you are in fact transferring your data out of the country. The regulations in the Personal Data Act apply regardless of how long the data is stored abroad; international data transfer is regulated by the same requirements for approval as your own collection and handling of the same data. If your project needs approval before collecting the data, you also need approval if you need to send the data abroad.

  • Research data may be transferred to countries within the EU and EEA
  • Personal data may be transferred to countries approved by the EU as recipients of personal data
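  • Personal data may be transferred to the USA when the recipient is certified according to the Privacy Shield Agreement (note that the Court of Justice of the EU invalidated the Privacy Shield in July 2020, so check the current rules with your data protection officer)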
  • The regulations on processing in the Personal Data Act must be met:
    – there must be a legal basis for the transfer
    – the transfer must be in accordance with the purpose stated in the data handler agreement
  • Transferring research data abroad does not require a separate notification to the data protection official, but the transfer must be described in any future application concerning the main purpose of the data processing
  • A data processing agreement is needed if the data are transferred to a data processor. See the section on the data processing agreement above.

Searching for research data

Searching for research data may help you get to know your field. Even if your field is not data-intensive, you may be surprised by what you can find, as many types of research content can be considered research data.
If there are research data in your field, there are several ways you can benefit from a good data search strategy.

  • Existing data can be a part of the foundation your research will build upon
  • You can examine the need for new data
  • If you are planning to collect or generate data, you need to make sure at an early stage that the exact same data have not already been collected
  • You can discover whether there are norms and standards you should follow for structure, labelling and documentation

There are numerous data repositories, and they cannot all be listed here. However, there are sites where you can search data sets or repositories.

Data search engines

DataCite provides services that help researchers locate, identify and cite research data.
At https://search.datacite.org/, you can search data sets that have been assigned a DOI. For each data set, the citation is shown in several reference styles and can be exported to BibTeX and RIS.
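DataCite also makes its index available through a public REST API. The Python sketch below assumes the endpoint `https://api.datacite.org/dois` and the query parameters `query`, `resource-type-id` and `page[size]`, as described in DataCite’s API documentation at the time of writing; check the current documentation before relying on them. The function names are hypothetical, and `search_datasets` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed DataCite REST API endpoint for DOI records.
DATACITE_API = "https://api.datacite.org/dois"


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a search URL restricted to records of resource type 'dataset'."""
    params = {
        "query": query,                 # free-text search
        "resource-type-id": "dataset",  # only data sets
        "page[size]": page_size,        # number of results per page
    }
    return f"{DATACITE_API}?{urlencode(params)}"


def search_datasets(query: str, page_size: int = 10) -> list[tuple[str, str]]:
    """Fetch matching records and return (DOI, first title) pairs.

    Requires network access; records are expected under the "data" key.
    """
    with urlopen(build_search_url(query, page_size)) as response:
        payload = json.load(response)
    results = []
    for record in payload.get("data", []):
        attributes = record.get("attributes", {})
        titles = attributes.get("titles") or [{}]
        results.append((attributes.get("doi", ""), titles[0].get("title", "")))
    return results


# Example (requires network access):
# for doi, title in search_datasets("Atlantic puffin"):
#     print(f"https://doi.org/{doi}  {title}")
```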

Bielefeld Academic Search Engine (BASE) is a search engine for academic web resources. BASE’s advanced search lets you limit the search to data sets.

Data archives

Zenodo and Figshare are multidisciplinary data archives. Their user interface is similar to that of literature databases, and you can use common operators like AND and OR. Figshare also provides a guide on how to use it.

Data archive registry

At http://www.re3data.org/ you cannot search data sets directly, but you can search or browse data repositories by subject, content type or country. Note that not all the repositories are open access, but you can use the filter to find those that are. The repository details also state terms of use and contact information, so if you find a repository that is not open access, you may still get access upon request.


Sources for public, free data

The Norwegian Centre for Research Data (NSD) offers data sets with personal data, regional data, institutional data and more.

The Norwegian Institute of Public Health provides data for research and analysis. Data on health records, health surveys and biobanks are available. You will also find information on how to access the data.

Statistics Norway (SSB) can supply you with research data at a personal level. The procedure for obtaining data is clearly outlined.

Citing research data

When using data collected or generated by others, you need to cite the data set, just as you cite articles, books and other sources you use in a publication. Citing data facilitates description and information retrieval, access and persistence, verification and reproducibility, and integration with other data (Altman & Crosas, 2013). FORCE11 has developed a set of data citation principles on how and why data need to be cited properly (Martone, 2014). Many repositories have citation export or clear guidelines on how their data sets should be cited. If not, the citation should include:

  • the author(s)
  • the year
  • the title of the dataset
  • the data repository or archive
  • the version
  • the persistent identifier (or url)

Most data repositories will explain which elements should be part of a proper reference to their data, and there is no need to create an exhaustive and hard-to-read reference if you have a persistent identifier leading the reader to all the relevant metadata information.

Example of how to cite a data set:

In the text:

The hypothesis is supported by observations of the Atlantic puffin (Barrett, 2016).

In the reference list:

Barrett, R. T. (2016). Atlantic puffin Fratercula arctica field data, Hornoya [Data set]. UiT Open Research Data Dataverse, V2. https://doi.org/10.18710/4LABGF.
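If a repository offers no citation export, the elements listed above can be assembled mechanically. The Python sketch below uses a hypothetical `DatasetCitation` class that reproduces the reference-list format of the Barrett (2016) example; other reference styles order the elements differently, and the in-text helper only uses the first author’s surname (it does not add “et al.”).

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements listed above."""
    authors: str     # e.g. "Barrett, R. T."
    year: int
    title: str
    repository: str
    version: str
    doi: str         # bare DOI, e.g. "10.18710/4LABGF"

    def reference(self) -> str:
        """Reference-list entry in the same format as the example above."""
        return (f"{self.authors} ({self.year}). {self.title} [Data set]. "
                f"{self.repository}, {self.version}. https://doi.org/{self.doi}.")

    def in_text(self) -> str:
        """Author-year in-text citation using the first author's surname."""
        surname = self.authors.split(",")[0]
        return f"({surname}, {self.year})"


puffins = DatasetCitation(
    authors="Barrett, R. T.",
    year=2016,
    title="Atlantic puffin Fratercula arctica field data, Hornoya",
    repository="UiT Open Research Data Dataverse",
    version="V2",
    doi="10.18710/4LABGF",
)
print(puffins.in_text())    # (Barrett, 2016)
print(puffins.reference())
```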

References

Altman, M., & Crosas, M. (2013). The evolution of data citation: From principles to implementation. IASSIST Quarterly, 37. Retrieved from http://www.iassistdata.org/sites/default/files/iqvol371_4_altman.pdf

Foster. (2014). Open Data Definition. Foster’s Open Data Taxonomy. Retrieved from https://www.fosteropenscience.eu/taxonomy/term/110

Martone, M. (2014). Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. San Diego CA: FORCE11. https://doi.org/10.25490/a97f-egyk.

Ministry of Education and Research, Norway. (2018). National strategy on access to and sharing of research data. Retrieved from https://www.regjeringen.no/en/dokumenter/national-strategy-on-access-to-and-sharing-of-research-data/id2582412/sec1

The Research Council of Norway. (2017). Open access to research data: Policy for the Research Council of Norway. Retrieved from https://www.nfr.no/PolicyOpenDataWEBrev2017.pdf

Piwowar, H. A., & Vision, T. J. (2013). Data reuse and the open data citation advantage. PeerJ 1:e175, https://doi.org/10.7717/peerj.175.

RECODE. (2013). Policy recommendations for open access to research data in Europe – Stakeholder values and ecosystems. Retrieved from http://recodeproject.eu/wp-content/uploads/2013/10/RECODE_D1-Stakeholder-values-and-ecosystems_Sept2013.pdf
