Data Security

From GRDI2020


This is a GRDI challenge; return to Main Page with all the challenges and recommendations



Introduction

There is a range of reasons for securing data and systems. Data in a specific community context and application scenario may be subject to Intellectual Property Rights (IPR) or license issues, confidentiality and privacy concerns, or other types of restrictions. The systems need to manage the data according to these restrictions while maintaining other requirements for data infrastructures (e.g. scalability, interoperability, provenance), and also the systems themselves need to be secured against potential external attacks.

"Data security" covers many different aspects of data infrastructures, both technical and organisational (procedures, policies, physical access, etc.). Data security must be implemented in all components of a data infrastructure, since a single weak link could break the secure chain. Moreover, even though the definition of what data security implies may vary with the requirements of a specific community, a generic data infrastructure needs to account for the combined security requirements of all the communities it serves.

For the purpose of this report, we divide security issues into the following three groups:

  1. Proactive security: Measures taken in order to guarantee the security of the system, such as protection against loss of data, standard operating procedures of data centres including physical access control, policies governing access etc.
  2. Retroactive security: Measures taken to remedy a security loss, such as incident handling, disaster recovery, etc.
  3. Access control: Measures taken in order to provide the access that users and administrators need, and no more.

In this section we focus on the issue of access control.

10-year vision

10 years from now, there will be coordinated authentication, authorisation and billing despite decentralised user registries, rights management and accounting mechanisms across national boundaries in Europe.

  • (from a user perspective) - inspection for security and data protection is in place (e.g. independent, cross-service audit and/or user-driven inspection)
  • (from an infrastructure perspective) - available frameworks for distributed single-sign on, rights management and billing are scalable and robust
  • (from a policy and funder perspective) -
    • a response team (at ministerial level) is established to support audits and interventions against security threats, with defined (i.e. "limited") authority as part of the European legal framework
    • policies are in place that allow the interoperation between trust domains

State of the art

Today, many research data providers offer fairly simple access policies, mostly giving write access to the data provider itself and offering only read access to (often unauthenticated) users. Where more complex access solutions were needed, individual, often community-specific solutions tended to be implemented. As a result, a large variety of security solutions is in place across different research data providers. The reasons for this outcome are:

  1. The requirements differ significantly between different communities.
  2. No common security framework for data access has been, or is today, available.
  3. Many research data providers have so far been established at national level, with pan-European institutions only recently becoming a reality. (Institutions such as CERN and EMBL have been the exception rather than the rule in the past. However, with the ESFRI projects this situation is set to change in the mid-term future.)

The use-cases of high-energy physics and EBI may exemplify this situation [use-case HEP, EBI].

Challenges and Recommendations

Data security operates within a legal and policy framework. In other words, technical steps can only be effective when the appropriate legal and policy context is in place at the European level. The following recommendations therefore expect both to go hand in hand:

Use cases

Use-case HEP:

In the context of the analysis of the data produced by the Large Hadron Collider (LHC) at CERN, the high-energy physics community has successfully established a large grid infrastructure (WLCG/EGEE), which is currently being transformed into the European Grid Infrastructure (EGI). The security mechanisms for storing and accessing these vast amounts of data are based on a non-standard extension of X.509 certificates (so-called attribute certificates). The X.509 certificates are issued by national grid certificate authorities that are coordinated by the International Grid Trust Federation (IGTF). In this framework, all users are authenticated through their certificates, and neither privacy nor anonymity is supported.

Conclusion: This community has successfully established a working security framework at both the technical and the organisational level. However, its target audience is rather small, the overall maintenance effort is high, and it cannot easily be extended to a much larger user community.
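
The authorisation pattern described in this use case (an authenticated identity from a trusted authority, plus attributes that decide what the holder may do) can be illustrated with a minimal conceptual sketch. It is not the grid middleware's actual implementation: the `Credential` class, the `authorise` function, the issuer name and the attribute string are all hypothetical, and the cryptographic signature verification that a real X.509 deployment performs is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Credential:
    """Hypothetical stand-in for a user certificate plus its attributes."""
    subject: str            # name identifying the user
    issuer: str             # authority that issued the identity
    not_after: datetime     # expiry time of the credential
    attributes: frozenset   # e.g. community membership or role strings


# Assumption: the set of authorities accepted by the trust federation.
TRUSTED_ISSUERS = {"Example National Grid CA"}


def authorise(cred, required_attribute, now=None):
    """Return True if the credential is trusted, unexpired and carries the attribute.

    A real system would first verify signatures on the certificate chain;
    this sketch only models the decision logic that follows.
    """
    now = now or datetime.now(timezone.utc)
    if cred.issuer not in TRUSTED_ISSUERS:
        return False              # identity not vouched for by a trusted authority
    if now > cred.not_after:
        return False              # expired credentials are rejected
    return required_attribute in cred.attributes
```

Note that every request carries a named subject, which reflects the observation above that anonymous access has no place in this model.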


Use-case EBI:

The European Bioinformatics Institute (EBI) serves a large user community that accesses data collections through the web. The user base is very diverse and the vast majority of users are not authenticated. Many users deem authentication unacceptable. On the other hand, there is a small but growing number of personal data sets that have to be protected. Access to each such data set is managed by a custodian, who decides on the access mechanisms.

Conclusion: The EBI serves a large user community through unauthenticated access methods. However, the management of personal data is becoming more important as its usage increases, and the current handling of this kind of data is not scalable.
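
The mixed model described in this use case (open, unauthenticated access for most data sets and custodian-managed access for the protected ones) can be sketched as follows. This is an illustrative model only, not EBI's actual system: the `DataSet` class and the `grant` and `can_read` functions are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSet:
    """Hypothetical data set record; custodian=None marks a public data set."""
    name: str
    custodian: Optional[str] = None
    granted: set = field(default_factory=set)   # users approved by the custodian


def grant(dataset, acting_user, grantee):
    """Let the custodian, and only the custodian, approve a user."""
    if dataset.custodian is None:
        raise ValueError("public data sets need no grants")
    if acting_user != dataset.custodian:
        raise PermissionError("only the custodian may grant access")
    dataset.granted.add(grantee)


def can_read(dataset, user=None):
    """Public data is readable by anyone, including anonymous users (user=None).

    Protected data requires an authenticated user who is either the
    custodian or has been approved by the custodian.
    """
    if dataset.custodian is None:
        return True
    return user is not None and (user == dataset.custodian or user in dataset.granted)
```

Because every grant is an individual decision by one custodian per data set, the effort grows with the number of protected data sets and users, which is the scalability concern raised in the conclusion above.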
