Make the costs of data infrastructure transparent
From GRDI2020
This is a GRDI recommendation.
Context and challenges
Without an accurate picture of the cost structures of data infrastructures, it is hard to plan future support activities. Transparency is also crucial for estimating incremental costs (e.g. the need for a completely new power delivery structure when moving from hundreds of terabytes to several petabytes).
Another challenge is uncovering data infrastructure costs that are currently hidden. Typical examples of such hidden costs are:
- The socioeconomic cost of a Ph.D. student's graduation being delayed by responsibility for maintaining commodity services
- Cost of lost data
- Opportunity cost of researchers doing "data management on laptops"
Once costs are made transparent to them, users will use data infrastructure more consciously.
Recommendation
Make the costs vs. risks of data infrastructure transparent. This includes both (1) comparing the costs and risks of "conventional" data management (e.g. on private laptops or on floppy disks in the lab cupboard) with those of professional data infrastructure, and (2) making users aware of the operational costs of data infrastructure.
Ad 1): The direct and indirect costs of current and planned data infrastructures should be modelled carefully to understand where and how GRDI can provide the greatest economic benefit. At the same time, policies based on the analysis of cost structures should leave enough room for experimentation that can lead to innovation: the sustained successes based on 3M's 15% rule (and its emulation by Google with their "20% time") are reminders that the quest for short-term efficiency should not be the overriding concern.
Ad 2): Operational costs should be modelled, calculated and presented to the user. Only where costs are transparent will users start to be selective about what has value and use data infrastructure with the costs in mind, which is essential for the long-term sustainability of a service. Storage is not unlimited, nor is the time data centre staff can spend managing exabytes of data.
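As a rough illustration of the two points above, the following minimal Python sketch compares the risk-weighted annual cost of two ways of keeping data and splits an operational cost among users in proportion to their storage use. It is a sketch under stated assumptions, not a GRDI cost model: the class and function names, the proportional allocation rule and every figure are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    """One way of keeping research data. All figures are hypothetical."""
    name: str
    annual_direct_cost: float       # hardware, power, staff time per year
    annual_loss_probability: float  # assumed chance of losing the data in a year
    cost_of_loss: float             # assumed cost of recreating or lacking the data


def expected_annual_cost(option: StorageOption) -> float:
    """Direct cost plus the risk-weighted cost of data loss (point 1)."""
    return option.annual_direct_cost + option.annual_loss_probability * option.cost_of_loss


def cost_share(total_operational_cost: float, usage_tb: dict[str, float]) -> dict[str, float]:
    """Split an operational cost in proportion to terabytes stored (point 2)."""
    total_tb = sum(usage_tb.values())
    if total_tb <= 0:
        raise ValueError("total usage must be positive")
    return {user: total_operational_cost * tb / total_tb for user, tb in usage_tb.items()}


if __name__ == "__main__":
    # Illustrative numbers only; real inputs would come from a cost analysis.
    options = [
        StorageOption("private laptop", 200.0, 0.10, 50_000.0),
        StorageOption("professional data infrastructure", 1_500.0, 0.001, 50_000.0),
    ]
    for option in options:
        print(f"{option.name}: {expected_annual_cost(option):.2f} per year")

    for user, share in cost_share(1_000.0, {"group A": 1.0, "group B": 3.0}).items():
        print(f"{user}: {share:.2f}")
```

With inputs like these, an option that looks cheap in direct costs can become the more expensive one once the cost of lost data is weighted in. A real model would also need to capture the other hidden costs listed above, which this sketch leaves out.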
Stakeholders and Impact
Transparent cost structures should also support the specialisation of libraries and data centres and help these entities focus on their mix of activities.