
Global Research Data Infrastructures: The GRDI2020 Vision




  1. The New Science Paradigm
    Some areas of science are currently facing a hundred- to a thousand-fold increase in volumes of data
    compared to the volumes generated only a decade ago. This data is coming from satellites, telescopes,
    high-throughput instruments, sensor networks, accelerators, supercomputers, simulations, and so on [1]. The
    availability and use of huge datasets presents both new opportunities and at the same time new challenges
    for scientific research.
    Often referred to as a data deluge, massive datasets are revolutionizing the way research is carried out and
    resulting in the emergence of a new fourth paradigm of science based on data-intensive computing
    [2]. This data-dominated science will lead to a new data-centric way of conceptualizing, organizing and
    carrying out research activities. That in turn could lead to new approaches to problems that
    were previously considered extremely hard or, in some cases, even impossible to solve, and to
    serendipitous discoveries.
    The new availability of huge amounts of data, along with advanced tools of exploratory data analysis, data
    mining/machine learning and data visualization, offers a whole new way of understanding the world. One
    view put forward is that in the new data-rich environment correlation supersedes causation, and science can
    advance even without coherent models, unified theories, or really any mechanistic explanation at all [3].
    In order to be able to exploit these huge volumes of data, new techniques and technologies are needed. A
    new type of e-infrastructure, the Research Data Infrastructure, must be developed for harnessing the
    accumulating data and knowledge produced by the communities of research, optimizing the data movement
    across scientific disciplines, enabling large increases in multi- and inter- disciplinary science while reducing
    duplication of effort and resources, and integrating research data with published literature.
    To make this happen, several breakthroughs must be achieved in the fields of research data modelling, management and tools.
  2. Research Data Infrastructures
    Research Data Infrastructures can be defined as managed networked environments for digital
    research data consisting of services and tools that support: (i) the whole research cycle, (ii) the movement of scientific data across scientific disciplines, (iii) the creation of open linked data spaces by
    connecting data sets from diverse disciplines, (iv) the management of scientific workflows, (v) the interoperation between scientific data and literature, and (vi) an Integrated Science Policy Framework.
    Research data infrastructures are not systems in the traditional sense of the term; they are networks
    that enable locally controlled and maintained digital data and library systems to interoperate more
    or less seamlessly. Genuine research data infrastructures should be ubiquitous, reliable, and widely
    shared resources operating on national and transnational scales.
    A research data infrastructure should include organizational practices, technical infrastructure and social forms that collectively provide for the smooth operation of collaborative scientific
    work across multiple geographic locations. All three should be objects of design and engineering; a
    data infrastructure will fail if any one of these three elements is ignored [4].
    Another school of thought considers a (data) infrastructure as a fundamentally relational concept.
    It becomes an infrastructure in relation to organized (research) practices [5]. The relational property
    of a (data) infrastructure refers to that which is between: between communities and data/publication collections, mediated by services and tools. According to this school of thought, the exact
    sense of the term (data) infrastructure and its “betweenness” are both theoretical and empirical.
    In Star and Ruhleder’s “Steps toward an ecology of infrastructure: Design and access for large information spaces” [6], a (data) infrastructure emerges with the following dimensions:
  • Embeddedness: the infrastructure is “sunk” into, inside of, other structures, social arrangements and technologies.
  • Transparency: the infrastructure is transparent to use, in the sense that it does not have to be
    reinvented each time or assembled for each task, but invisibly supports those tasks.
  • Reach or scope: the infrastructure reaches beyond a single event or one-site practice.
  • Learned as part of membership: The taken-for-grantedness of artefacts and organizational
    arrangements is a sine qua non of membership in a community of practice. Strangers and outsiders encounter the infrastructure as a target object to be learned about. New participants acquire a
    naturalized familiarity with its objects as they become members.
  • Links with conventions of practice: the infrastructure both shapes and is shaped by the conventions
    of a community of practice.
  • Embodiment of standards: Modified by scope and often by conflicting conventions, the infrastructure
    takes on transparency by plugging into other infrastructures and tools in a standardized fashion.
  • Builds on an installed base: the infrastructure does not grow de novo; it wrestles with the “inertia of
    the installed base” and inherits strengths and limitations from that base.
  • Becomes visible upon breakdown: The normally invisible quality of the working infrastructure becomes visible when breaks occur: the server is down, the bridge washes out, there is a power blackout.
    Even when there are back-up mechanisms or procedures, their existence further highlights the now-visible
    infrastructure.
    Research data infrastructures should be science- and engineering-driven and, when coupled with high
    performance computational systems, increase the overall capacity and scope of scientific research.
    Optimization for specific applications may be necessary to support the entire research cycle, but work in this
    area is mature in many problem domains.
    Science is a global undertaking and research data are both national and global assets. There is a need for
    a seamless infrastructure to facilitate collaborative arrangements necessary for the intellectual and practical
    challenges the world faces.
    Therefore, there is a need for global research data infrastructures to be able to interconnect the components of a distributed worldwide science ecosystem by overcoming language, policy, methodology, and social barriers. Advances in technology should enable the development of global research data infrastructures that reduce geographic, temporal, social, and national barriers to discovering, accessing, and using data. Their ultimate goal should be to enable researchers to make the best use of the world’s growing wealth of data.
    The next generation of global research data infrastructures is facing two main challenges:
  • To effectively and efficiently support data-intensive science
  • To effectively and efficiently support multidisciplinary/interdisciplinary science
    Data-Intensive Science
    By data-intensive science we mean any scientific research activity whose progress is heavily dependent
    on careful thought about how to use data. Such research activities are characterized by:
  • increasing volumes and sources of data,
  • complexity of data and data queries,
  • complexity of data processing,
  • high dynamicity of data,
  • high demand for data,
  • complexity of the interaction between researchers and data, and
  • importance of data for a large range of end-user tasks.
    Fundamentally, data-intensive disciplines face two major challenges [7]:
  • Managing and processing exponentially growing data volumes, often arriving in time-sensitive streams
    from arrays of sensors and instruments, or as outputs from simulations; and
  • Significantly reducing data analysis cycles so that researchers can make timely decisions.
    Multidisciplinary – Interdisciplinary Science
    By a multidisciplinary approach to a research problem we mean an approach that draws appropriately
    from multiple disciplines in order to redefine the problem outside of normal boundaries and reach solutions
    based on a new understanding of complex situations.
    The multidisciplinary approach faces several barriers, both behavioural and technological.
    Among the major technological barriers we identify those that must be overcome when moving data, information, and knowledge between disciplines. There is a risk that representations are interpreted in different
    ways because the interpretative context is lost. This can lead to a phenomenon called “ontological
    drift”: the intended meaning becomes distorted as the information object moves across semantic boundaries
    (semantic distortion) [8].
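    The loss of interpretative context can be illustrated with a small sketch. It is not taken from the GRDI2020 work; the Measurement type, the import_value function and the oceanography/geology "depth" scenario are hypothetical names chosen only for illustration. The sketch assumes that context can be reduced to a unit and a reference point, which is far simpler than real disciplinary semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """A value bundled with the context it was recorded in (hypothetical type)."""
    name: str
    value: float
    unit: str
    reference: str  # what the value is measured relative to


def import_value(m: Measurement, expected_unit: str, expected_reference: str) -> float:
    """Accept a measurement only if its context matches the receiving discipline's."""
    if m.unit != expected_unit or m.reference != expected_reference:
        raise ValueError(
            f"context mismatch for {m.name!r}: got ({m.unit}, {m.reference}), "
            f"expected ({expected_unit}, {expected_reference})"
        )
    return m.value


# Hypothetical example: an oceanographic "depth" moved into a geological dataset.
ocean_depth = Measurement("depth", 120.0, "m", "below sea surface")

# Moving only the bare number carries the value across the boundary, not its meaning.
bare_value = ocean_depth.value


def drift_detected() -> bool:
    """Return True if the geological importer rejects the oceanographic depth."""
    try:
        import_value(ocean_depth, "m", "below ground surface")
    except ValueError:
        return True
    return False
```

    In this sketch the bare number 120.0 would be silently reinterpreted by the receiving discipline, whereas the value that travels with its context is rejected when the reference points differ.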
    A relatively similar concept is the interdisciplinary approach to a research problem. It involves the connection and integration of expertise belonging to different disciplines for the purpose of solving a common
    research problem.
    Again, the barriers faced by an interdisciplinary approach are of two types: behavioural and technological.
    Amongst the major technological barriers we identify the need for integrating data, information, and knowledge created by different disciplines. In fact, one of the major barriers to be overcome concerns the integration of activities that are taking place on different ontological foundations.
    The requirements described above, imposed by data-intensive multidisciplinary-interdisciplinary science, are
    the motivations behind building the theoretical foundations of the next generation of data infrastructures. To
    make this happen, a considerable number of difficult data, application, system, organizational, and policy
    challenges must be successfully tackled.
    The breakthrough technologies needed to address many of the critical problems in data-intensive multidisciplinary-interdisciplinary computing will come from collaborative efforts involving many domain application
    disciplines as well as computer science, engineering and mathematics.

