
Talks

Invited keynote: Nammu and Oracc: digital humanities software in the sustainable development of Iraqi history and heritage
Eleanor Robson
(University College London)

The past few years of conflict in Syria and Iraq have drawn worldwide attention to the terrorist destruction of archaeological sites and historic buildings in the region. Large international projects are now working to protect, conserve and document these places that are so fundamental to understanding Middle Eastern and global history. But that is a solution to only one part of the problem, however important it is in itself. The GCRF-funded Nahrein Network tackles the human side of this tragedy, helping local heritage experts, academic historians and history-lovers contribute meaningfully to the long-term social and economic development of post-conflict Iraq and its neighbours. Open-access, reusable online resources are a natural solution to increasing the accessibility of academic research, teaching materials, and reliable information for interested publics, in a region where everyone has a phone but libraries are scarce and massively under-resourced. As part of this effort, UCL’s Research Software Development Group is adapting open-access tools for Arabophone ancient historians to work with. Nammu is an easy-to-use text editor for cuneiform inscriptions, the millions of witnesses to the first three millennia of Middle Eastern and world history. Oracc is a collaborative online publishing platform for these editions, and for educational websites about ancient Middle Eastern history. In this talk I will explain how Nammu and Oracc are key components of a programme designed to reduce Middle Eastern academics’ international isolation, train the next generation of experts in ancient Middle Eastern history, cultural heritage, and related fields, and help local audiences make meaning of local heritage, in local languages and local contexts. I’ll also talk through how the project is working in practice, from the involvement of the RSEs in London in the initial research design through to the implementation with researchers, teachers and students in Baghdad and beyond.

Eleanor Robson is Professor of Ancient Middle Eastern History and Head of the History Department at University College London. She has three main research interests: the social and political contexts of knowledge production in the cuneiform culture of ancient Iraq, five to two thousand years ago; the construction of knowledge about ancient Iraq in Europe, the Americas, and the Middle East over the past two centuries; and the use of open, standards-based online resources for democratising access to knowledge about the ancient Middle East. These three fields come together in the Nahrein Network, funded by the UK Arts and Humanities Research Council and the Global Challenges Research Fund, 2017-20. Working with multiple academic and non-academic partners, it aims to significantly develop the capacity of Middle Eastern universities, museums, archives and cultural heritage sites to foster cultural and economic growth in the region.

Challenges and pathways to sustainability in scientific software ecosystems
James Howison
(University of Texas at Austin)

A key challenge of science policy is to achieve sustained benefit from scientific grant making. As software has become more important to the practice of science, policy makers and scientists have become more concerned about a perceived lack of sustained benefit from software produced by grant funded projects. In this presentation I will provide a framework for thinking about sustainability in scientific software projects, based on empirical studies of development and use of software in science. The framework starts by asking: what is it that causes sustainability problems anyway? Over time, software declines in scientific usefulness, driven by five factors: a moving scientific frontier, technological change, friction in building software, friction in using software and, least appreciated, change in the software ecosystem surrounding a component. These factors drive a need for work; in response we can try to suppress the drivers, try to reduce the amount of work needed, or attract sufficient resources able to undertake the work needed to sustain scientific usefulness. I will analyze three systems by which projects obtain resources: commercial markets, grant-making, and community-based peer production. I will conclude with recent results from a study into pathways to sustainable peer production.

James Howison is an Associate Professor in the Information School of the University of Texas at Austin. His research examines technology and human collaboration, especially the organization of work. He has studied the organization of open source software development and is currently studying the organization of scientific software development. His work has been funded by the National Science Foundation, including an NSF CAREER award to study transitions from grant funding, and the Sloan Foundation to improve the citation of scientific software. Originally from Australia, he was a post-doc in Computer Science at Carnegie Mellon and received his Ph.D. in 2009 from the School of Information Studies at Syracuse University.

Building computer vision systems that really work
Andrew Fitzgibbon

Andrew Fitzgibbon has been shipping advanced computer vision systems for twenty years. In 1999, prize-winning research from Oxford University was spun out to become the Emmy-award-winning camera tracker “boujou”, which has been used to insert computer graphics into live-action footage in pretty much every movie made since its release, from the “Harry Potter” series to “Bridget Jones’s Diary”. In 2007, he was part of the team that delivered human body tracking in Kinect for Xbox 360, and in 2015 he moved from Microsoft Research to the Windows division to work on Microsoft’s HoloLens, an AR headset brimming with cutting-edge computer vision technology. In all of these projects, the academic state of the art has had to be leapfrogged in accuracy and efficiency, sometimes by several orders of magnitude. Sometimes that’s just raw engineering, sometimes it means completely new ways of looking at the research. If he had to nominate one key to success, it would be fanatical test-driven development (TDD), but what really gets it done is a focus on, well, everything: from cache misses to end-to-end experience, and on always being willing to change one’s mind.

Biography: Fitzgibbon is a partner scientist at Microsoft in Cambridge, UK. He has published numerous highly-cited papers, and received many awards for his work, including ten “best paper” prizes at various venues, the Silver medal of the Royal Academy of Engineering, and the BCS Roger Needham award. He is a fellow of the Royal Academy of Engineering, the British Computer Society, and the International Association for Pattern Recognition. He studied at University College, Cork, and then did a Masters at Heriot-Watt University, before taking up an RSE job at the University of Edinburgh, which eventually morphed into a PhD. He moved to Oxford in 1996 and drove large software projects such as the VXL project, and then spent several years as a Royal Society University Research Fellow before joining Microsoft in 2005. He loves programming, particularly in C++, and his recent work has included new numerical algorithms for Eigen, and compilation of F# to a non-garbage-collected runtime.

http://aka.ms/awf
https://github.com/awf

How Machine Learning and AI are Transforming Research
Andreas Fidjeland

How can machine learning (ML) and AI help reveal previously untapped insights to help you accelerate your research? At Google DeepMind, we’re on a scientific mission to push the boundaries of AI, developing programs that can learn to solve any complex problem without needing to be taught how.

From climate change to the need for radically improved healthcare, too many problems suffer from painfully slow progress, their complexity overwhelming our ability to find solutions. By leveraging ML and AI, those solutions will come into reach.

In this effort to solve truly global and dynamic problems, Research Software Engineers play a vital role in reaching scientific discoveries. Whether you’re new to ML or already an expert, join this session to learn how ML and AI can open up a range of new research possibilities. We’ll also highlight some of the groundbreaking work within Google DeepMind and our vision for how AI will transform future scientific breakthroughs.

Andreas Fidjeland is Head of Research Engineering at Google DeepMind.

Maximising the value of research software – a research funder perspective
David Carr

As a global research foundation dedicated to improving health for all, Wellcome is passionately committed to ensuring that the outputs of the research we support – including publications, datasets, software and materials – can be accessed and used in ways that will ensure the greatest benefit to health and society. Over the last two decades, we have been a leading champion and advocate for open access and research data sharing.

In July 2017, we extended our long-standing policy on research data management and sharing to explicitly include research software and materials. Rather than ask researchers for a data management plan, we now ask them to provide a broader outputs management plan which describes how they will manage and share all significant outputs (data, software or materials) resulting from their research.

In this talk, I will discuss the activities Wellcome is developing to support the implementation of this policy and promote open research more broadly – with a particular focus on research software. I will describe where we are focusing as a funder and some of the remaining opportunities and challenges we are seeking to address, and then invite discussion and feedback on how research funders can best support the RSE community in maximising the value of research software.

David Carr is part of the Open Research team at Wellcome, which is progressing work to maximise the availability and use of research outputs (including data, publications, code and materials), in ways that will enhance the research enterprise and accelerate health benefits. Prior to this, David worked as a Policy Adviser where he led on work to develop and communicate Wellcome’s policy in several areas, including data sharing, open access publishing, biosecurity and genetics. In 2001, he worked on secondment at the World Health Organisation in Geneva, where he assisted in the preparation of the Advisory Committee on Health Research (ACHR) report on Genomics and World Health. Prior to joining Wellcome in 1999, David worked as a project researcher at a scientific consultancy firm in Cambridge. He has undergraduate and master’s degrees in genetics from the University of Cambridge.

Software in the brave new world of UKRI
Susan Morrell, Eddie Clarke

EPSRC has refreshed its software infrastructure strategy. But how does this relate to what’s going on at the UKRI level, and in particular, the UKRI Research and Innovation Infrastructure Roadmap? The talk will give some perspectives on the strategic environment we are now in.

SAL: Modernising data access for the JET tokamak
Alex Meakins, Alys Brett
(UK Atomic Energy Authority)

The world’s largest fusion experiment, the Joint European Torus (JET), has used data systems to store and expose its data to researchers continuously for over 30 years.

There are particular software challenges in providing data access for such a long-running international experiment with many competing goals in play: accommodating evolving data production, consistency for software used in experimental operations, ease of use for a transient population of visiting researchers, robust remote access and tackling the increase in complexity and maintenance load over time. We must also plan ahead to ensure JET data remains useful for future generations.

The UKAEA Software Engineering Group has been working to modernise access to JET data and enable use of object storage solutions. The new access system, Simple Access Layer (SAL), was launched recently and will gradually pull all JET data into a single tree structure with a simple access API. It uses modern Python and web technologies and could be adapted for structured data in other domains.

In this talk, the architect of SAL will present the design decisions and trade-offs involved in engineering a system that is simple to use, cheap to maintain and agnostic to the underlying storage solution.
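
To make the idea of “a single tree structure with a simple access API” concrete, here is a hypothetical sketch of what browsing and fetching JET data through such a layer might look like from Python; the module, class and method names are illustrative assumptions, not necessarily SAL’s actual client API.

```python
# Hypothetical usage sketch only: module, class and method names are
# assumptions for illustration and may differ from SAL's real client.
from sal.client import SALClient  # assumed client module

client = SALClient("https://sal.example.jet.uk")  # hypothetical endpoint

# Browse one branch of the single JET data tree.
branch = client.list("/pulse/87737/magnetics")
print(branch.children)

# Fetch a single signal node and inspect its metadata.
signal = client.get("/pulse/87737/magnetics/ipla")
print(signal.units, signal.dimensions)
```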

The first year in the life of a Research Software Group
Andrew Edmondson
(Advanced Research Computing, University of Birmingham)

By the time of RSE18 our Research Software Group will be close to its first birthday. First we defined our goals: to enable our research community to get the best from their research software; to provide specialist advice and support to researchers and RSEs; to enhance the University’s reputation for high quality research; and to help researchers get the most from our Advanced Research Computing systems and services. Then we set about trying to achieve these goals. In this talk I will briefly illustrate several of the key lessons we’ve learned, for example the importance of version control and how to convince people to use it, or how we are trying to support RSEs across the University. I’ll explain why we try not to do coding for people (isn’t that what an RSE is supposed to do?) and cover the advice people most frequently ask us for. I might even tell you why optimising your code may be the wrong thing to do.

The Research Software Engineer Culture Shock
Andrew Williams
(University of Bristol)

One of the roles of the Research Software Engineer is to introduce good software development practices to teams who may not have any formal training in the subject, and whose programming experience may be limited to attempting to understand code written by colleagues.
From the point of view of established research staff, why do they need to change practices that have hitherto served them well when new techniques may be perceived to be an additional and unnecessary burden?
I will be talking about my experience as a new RSE within an environment without a collaborative development culture, and my attempt to balance the sometimes-conflicting priorities of research and software engineering. I’ll discuss the problems that I encountered and how I have attempted to change the attitude towards research software.

The DiRAC distributed RSE group: software from the smallest to the largest scales
Andy Turner, Peter Boyle, Guido Cossu, Mark Filipiak, Alexei Borissov, Sam Cox, Jeffrey Salmond
(University of Edinburgh, University of Durham, University of Leicester, University of Cambridge)

In late 2016 the DiRAC national HPC facility set up a team of RSEs distributed across the UK, working to improve software used on DiRAC in the fields of astrophysics, cosmology and particle physics. In this talk we present highlights of the work of the RSE team to help researchers studying the smallest and largest entities in our universe, and our experiences of working as an RSE team distributed across different institutions.

Highlights will include: getting the best performance while maintaining portability of QCD applications across different HPC architectures; improvements to the cosmological application SWIFT to implement new physics; porting an AI reduction algorithm to HPC; and moving applications from shared memory systems to new DiRAC 2.5 services. Although these projects cover a variety of different programming models and programming languages, all benefitted from the cross-institutional sharing of expertise within the DiRAC RSE team. This work also involved making the applications flexible and able to exploit the performance offered by novel and future computer architectures, thus allowing researchers to use the best hardware for their research.

Adventuring in the cloud with materials chemistry simulations
Ardita Shkurti~, David Bray~, Richard Anderson^
(~STFC SCD, ^STFC Hartree Centre)

In the last decade, the cloud has emerged as an excellent alternative to in-house High Performance Computing (HPC) architectures. There are several challenges related to the hardware and software maintenance of these in-house architectures that strongly impact time-to-solution and the feasibility of large-scale jobs, and the introduction of containers has made the transition to the cloud easier than ever.
At the STFC Scientific Computing Department and the Hartree Centre we are developing novel full-stack solutions for materials chemistry simulation workflows to be executed on the cloud.
As part of these solutions, we incorporate within container images (i) an operating system of our choice; (ii) our simulation engine source code; (iii) its software dependencies; (iv) instructions on its compilation; (v) instructions on its execution; and (vi) post-processing workflows. Then, we ship and launch our container images on the cloud and use the relevant available cloud storage solutions to back up the outputs of our workflows. We will discuss our efforts on the development of such solutions as part of a proof-of-concept project for a spin-off company.

SIP: Prototyping the Science Data Processing for the world’s largest radio telescope
Benjamin Mort
(The University of Oxford’s e-Research Centre)

The Square Kilometre Array (SKA) project is an international effort to build the world’s largest radio telescope. The scale of the SKA represents a huge leap forward in both engineering and research and development, and the detailed design and preparation is now well underway.

The SKA will be a software telescope, and the Science Data Processor (SDP) is a critical component, responsible for processing the deluge of data streaming out of the hundreds of thousands of antennas and the central signal processor into science data products.

In this talk I will describe the SDP Integration Prototype, a project to build a lightweight prototype of the major components of the SDP system, being developed on a bare-metal OpenStack system. This project aims to reflect, inform and verify the SDP architecture, which in many ways is closer to that of Google or Netflix than to a traditional HPC system.

RSE Training at DLR 2.0 – What we’ve learned from 1.0
Carina Haupt, Michael Meinel
(German Aerospace Center (DLR))

Since 2009, we have given training courses for research scientists on how to use the software engineering tools Subversion and Mantis. In this talk we present the improvements we have achieved in the past and how they contributed to our new GitLab-based concept for our new workshop series.

Over the years, our training has become a well-received two-day workshop. The main goal of the training is to enable the participants to get started with the DLR software engineering tools in the context of structured software development. Particularly, the combination of profound theory with hands-on exercises has made it a valuable experience for the participants, who are scientists from a wide variety of domains. In addition, a series of spin-off training courses was also established, covering related topics like open source licenses or agile methodologies.
However, our old tools are on the verge of being retired and a new system – GitLab – will take over. This gives us the possibility to restructure our training, using our experiences and collected feedback to create an updated version. We will present this new concept, as well as why it became what it is now.

Be prepared to be unpopular: tips on shutting services down
Catherine Jones
(Science and Technology Facilities Council)

Shutting software-based services down is unglamorous and less exciting than starting new services, and there is less advice available. However, it is important for reputations and collaborations going forward that shutdowns are planned and managed well. Thinking about how to shut services down can also help to inform how services are set up, avoiding potential pitfalls in the future.
I will share my experience of transferring and closing multiple services, both big and small. The talk will cover the importance of the following key topics:
– Being clear about the reasons for the change and what impact that may have.
– Management support
– Timelines
– Knowing your service: who uses it and why, and the rights and responsibilities for the content.
– Who you need to communicate with and what the message will be.
– Posterity and final statistics
– Decommissioning and finally closing out

Everyone will have experienced service shutdown from a user perspective; this talk will cover what it is like from a service provider’s perspective.

Easy, fast, and robust data analysis with modern C++
Corentin Schreiber
(University of Oxford)

Interactive languages such as Python are very popular choices when it comes to data processing. The fact that these languages are interactive, simple, but feature-rich makes them ideal tools for quick data visualization and analysis. In contrast, compiled languages such as C++ tend to be avoided because they offer more limited interactivity and are more complex to master, which reduces productivity. Yet they deliver the best possible performance, their stricter rules make them more maintainable in the long run, and being non-interactive they also maximize reproducibility. In this talk I will show how, without compromising on performance and correctness, modern C++ and its libraries enable a productivity and simplicity approaching that of interactive languages. The latter can then be used exclusively for what they do best: data visualization. I propose that adopting such a framework for daily coding in research can improve robustness, productivity, and replicability, and should be encouraged in teaching.

A (Long) Tale of Academic Software Development
Dominik Jochym
(STFC)

CASTEP is an academic-led materials modelling code with core support from CoSeC staff at STFC. The project spans four decades, so we ask the question “How has an academic Fortran code survived and thrived so long?” In short, one could argue that it has not! The code has been extensively rewritten, refactored and refactored again such that the original CASTEP code is not in use today. We present an ongoing tale of risk, change, success and the role of RSEs, who may have been around a bit longer than you might, at first, think.

Quantitative analysis of the quality of Python and R research software on GitHub
Dominique Hansen
(Berlin School of Library and Information Science/Humboldt University of Berlin (Student))

Writing code is now an integral part of many research areas, but many researchers lack the formal training to produce quality code that enables reproduction or reuse. This issue has been looked at from the qualitative side, illuminating the experience of researchers writing code and their skill level. But not much has been done to quantitatively measure aspects of the actual software. This poster/talk will present the results of a master’s thesis measuring aspects of the internal quality of research software projects and comparing the measurements to conventional open source software. The analyzed software projects were identified as research software through a harvest of zenodo.org, and the focus was put on projects dominated by Python and R code that are available through GitHub.

Lessons learnt from building the RSE community at Cardiff University
Ian Harvey, Unai Lopez-Novoa
(Cardiff University)

In recent years Cardiff University has been hiring many RSEs across its different schools. It has also been recognised that there are numerous researchers across the university who are RSEs in all but name. At the Data Innovation Research Institute (DIRI) we set ourselves the challenge of discovering these people and bringing them together as the Cardiff RSE community.

As a first step towards this goal, DIRI arranged the first Cardiff University RSE meetup in April 2018, which attracted ~40 attendees, many of them RSEs and RAs. A handful of them gave presentations about their jobs, careers and concerns, and there were also several round tables to discuss these and other issues.

In addition, DIRI conducts a yearly seedcorn funding call, which has proven to be an excellent way to meet researchers from a wide variety of disciplines, and to increase awareness of the role and value of RSEs among them.

In this talk, we will describe the lessons learnt and outcomes from these and other activities towards building the Cardiff RSE community.

Robot Routing with a Quantum Processor
James Clark
(STFC Hartree Centre)

Quantum computing is the next step in computing and research; however, universal quantum computers are still a number of years away from practical use. A quantum annealer is a type of quantum computer that performs a metaheuristic called Quantum Annealing (QA). QA can be used to find the global minimum of an objective function, with applications in (for example) traffic routing, fault analysis and machine learning. In this talk, I will go over the basics of quantum annealing, how to program a quantum annealer, and the results of the work that the Hartree Centre and Ocado have done on routing robots with a D-Wave 2000Q quantum processor. Knowledge of quantum mechanics is NOT required!
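
As background, the kind of problem an annealer such as the D-Wave 2000Q is given can be written as a QUBO (Quadratic Unconstrained Binary Optimisation): minimise E(x) = xᵀQx over binary variables x. A minimal classical sketch (the Q values are made up for illustration, and we brute-force instead of anneal) shows the objective such a machine minimises:

```python
# Minimal classical illustration of a QUBO objective, the problem
# format a quantum annealer minimises. The Q matrix here is made up
# for illustration; we brute-force it rather than anneal.
import itertools
import numpy as np

Q = np.array([[-1.0,  2.0,  0.0],   # hypothetical QUBO coefficients
              [ 0.0, -1.0,  2.0],
              [ 0.0,  0.0, -1.0]])

def energy(x):
    """Objective value x^T Q x for a binary assignment x."""
    x = np.asarray(x)
    return float(x @ Q @ x)

# Enumerate all 2^3 binary assignments; an annealer searches this
# space physically instead of exhaustively.
best = min(itertools.product([0, 1], repeat=3), key=energy)
print("minimum-energy assignment:", best, "energy:", energy(best))
```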

From Lab to University: Towards an Institutional RSE Career Pathway
James Smithies
(King’s College London)

This paper describes our efforts to create an RSE career pathway by scaling a model implemented in a digital humanities and social science lab (established in late 2015) towards the wider university. By offering permanent contracts and focusing on diversity, the team has grown from 6 men and 1 woman to 8 men and 5 women. A flat HR structure supports a Software Development Lifecycle (SDLC) based on an Agile methodology, routing work through a project manager, analysts, UI/UX designers, software engineers, and a systems manager. Entry-level, Senior, and Principal roles are defined in a lab career development document, aligned to Agile DSDM® and Skills Framework for the Information Age (SFIA). The lab HR policy encourages staff to produce career portfolios, including a wide variety of outputs from code to design artefacts, for assessment by a promotions panel. Conversations are under way with IT and HR to scale the lab approach to RSE careers towards the wider university, grafting it onto the emerging IT People Strategy as a defined career path that can support recruitment and retention. The model is not without issues, but provides a useful reference point for the wider community.

Using Apache Kafka to build a Modular Text Extraction Platform
Julia Damerow
(Arizona State University)

Apache Kafka is a distributed streaming platform that can be used like a traditional publish/subscribe messaging system as the communication backend for distributed systems. However, its architecture lends itself nicely to building highly scalable systems and provides features traditional messaging systems lack. In this talk, I will describe Apache Kafka and how we use it in the Giles Ecosystem, a distributed, modular text extraction system. The Giles Ecosystem (or “Giles”) has been built for digital humanities scholars who require their PDFs to be transformed into plain text in order to apply computational methods such as topic modeling or named entity recognition. Often humanities scholars lack the necessary skill set to employ command line tools or to write scripts to aid in the extraction process. Giles provides them with an easy-to-use tool to extract embedded text from PDFs and run OCR routines on images.
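
As a minimal illustration of the publish/subscribe pattern described above (not the Giles codebase itself), here is what producing and consuming an extraction request looks like with the kafka-python client; the topic name and message payload are invented for the example.

```python
# Publish/subscribe sketch with kafka-python (pip install kafka-python).
# Topic name and payload are hypothetical; Giles itself is not shown.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
# One component announces a PDF that needs text extraction.
producer.send("extraction.requests", b"/uploads/paper.pdf")
producer.flush()

# Another component, possibly on a different machine, picks it up.
consumer = KafkaConsumer(
    "extraction.requests",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)
for message in consumer:
    print("extract text from:", message.value.decode())
    break  # handle a single message in this sketch
```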

Dealing with research software: recommendations for best practices
Kaja Scheliga
(Helmholtz Association)

The increasing digitisation of education and research leads to a rising number of software-based solutions used or developed in research institutions. In many research areas, source code is at the core of the research process and software-based solutions are indispensable tools in knowledge creation.
Even though research software increasingly gains importance as well as attention, there is still a lack of standards, guidelines and best practices for dealing with research software. Moreover, support mechanisms concerning the development, publication and maintenance of research software are needed.
In this talk/poster we present general recommendations for dealing with research software. We discuss incentives and metrics, software development and documentation, accessibility, publication and transfer strategies, infrastructures, quality assurance, licensing and other legal topics, education and training, as well as policies and guidelines.
We argue that research software should be treated and acknowledged as a discrete product of the research process. Moreover, we consider research software, alongside text and data, as an essential element of open science.

How RSEs address the advanced computing facilities accessibility gap in Higher Education
Laurence Hurst
(Advanced Research Computing, University of Birmingham)

I will be talking about how we are making advanced computing facilities more accessible to researchers in higher education and, in particular, the impact that having a centrally funded team of RSEs at this University has made. I will also highlight further areas for improvement we have identified, and hope to prompt discussion and feedback within and from the audience after the talk.

Back in 2016 I gave a talk entitled “Accessible HPC” at CIUK in Manchester, highlighting the difficulties for new users trying to access advanced computing services, with some emphasis on HPC (the focus of that conference). That talk, which deliberately posed rhetorical questions about the issue and offered no answers, was intended to encourage a serious discussion within the Research Computing community about these issues.

Since then, there have been a lot of changes at this institution – amongst them the appointment of a number of centrally funded RSEs, a role that did not formally exist at the University until late 2017. This talk will reflect on how much the RSE role has improved the situation here and look at how much work we still have to do.

Measuring the financial return on investment of the ARCHER eCSE Programme
Lorna Smith, Chris Johnson, Xu Guo, Neelofer Banglawala, Alan Simpson
(EPCC, The University of Edinburgh)

The eCSE programme has allocated funding to the RSE community through a series of regular funding calls over the past four to five years. The programme has funded RSEs from across the UK to develop and enhance software for the UK’s national HPC service, ARCHER, and for the wider computational science community. The programme is funded by EPSRC and NERC, and understanding and quantifying the benefits of the eCSE programme is key to justifying the expenditure on the programme and to securing future funding. We have identified a set of benefits associated with the programme and developed procedures to measure and showcase them. One component of this has been to develop a process to measure and quantify the financial return on investment, based on performance improvements to the code and on post-project utilisation figures. This form of financial benefit is an important one, forming part of a portfolio of benefits that includes scientific improvements and achievements.
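
By way of illustration only – this is not the eCSE programme’s actual methodology, and every number below is invented – a return-on-investment figure can be derived from a measured speed-up and post-project utilisation along these lines:

```python
# Illustrative only: not the eCSE programme's actual formula, just one
# plausible way to turn a code speed-up into a cash figure.
ecse_project_cost = 80_000.0     # GBP, hypothetical project cost
node_hours_per_year = 500_000.0  # post-project utilisation, hypothetical
cost_per_node_hour = 0.50        # GBP, hypothetical machine rate
speedup = 1.3                    # code now runs 30% faster

# Node-hours no longer needed to do the same science after the speed-up.
saved_hours = node_hours_per_year * (1 - 1 / speedup)
annual_saving = saved_hours * cost_per_node_hour
roi = annual_saving / ecse_project_cost
print(f"saving: £{annual_saving:,.0f}/year, ROI: {roi:.2f}x per year")
```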

Building graphical user interfaces for HPC
Mark Dawson
(Swansea University)

Typical usage of HPC involves logging into a remote shell and invoking commands to manage jobs. Whilst convenient and powerful, this can be daunting to newcomers. We present a new, open-source tool which enables developers to quickly and easily configure powerful graphical user interfaces suitable for non-technical users. This is achieved flexibly by adapting simple Python classes. The workflow of an HPC job often requires graphical user interactions on desktop machines, for example interactive job setup or visualisation of results. This tool allows these interactions to be streamlined and easily managed by non-technical users in a few mouse clicks. Jobs can also be submitted transparently to multiple HPC systems with different queuing systems.
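
The abstract does not show the tool’s API, so the following is a purely hypothetical sketch of what “configuring a GUI by adapting simple Python classes” could look like; every name in it is invented for illustration.

```python
# Purely hypothetical sketch: all class, attribute and method names
# are invented to illustrate a declarative, class-based GUI config.
class RemeshJob:
    """One GUI form: each entry in `inputs` could be rendered as a
    widget, and `command` builds the line a batch scheduler runs."""

    title = "Mesh refinement"
    inputs = {
        "mesh_file": "path",  # could render as a file picker
        "iterations": int,    # could render as a numeric spin box
    }

    def command(self, mesh_file, iterations):
        # Command submitted to the HPC queueing system for the user.
        return f"remesh --input {mesh_file} --iters {iterations}"


# A few mouse clicks in the GUI would amount to something like:
job = RemeshJob()
print(job.command("wing.msh", 5))
```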

The basics of UI design
Mark Turner
(Newcastle University)

As RSEs we’re always writing code, testing code and writing documentation about that code. However, every so often we have to build something that has a user interface. All of a sudden you’re not just a programmer and a database engineer but a UI/UX designer too. For those who find the world of UI/UX design a little challenging, this talk will introduce the simple core concepts that form the basis of a well-designed, useful, maybe even aesthetically pleasing user interface. These concepts are transferable across programming languages, the web and even onto printed materials and slide decks. Hopefully, by picking up a few tips on what to do and what not to do, everyone’s user interfaces will be a little easier to use, and maybe even a little better to look at.

The Knowledge Makers
Matteo Cancellieri
(Open University)

The Knowledge Makers are a group of passionate people working at the Open University who try to break down the silos that usually divide the different university departments. We started in December 2017 by collecting interest from people, and we now run a bi-monthly meetup trying to connect the maker culture with academia.
We are a group of people passionate about all the shades of making, from Raspberry Pi to textiles, from origami to Lego, and researchers with any kind of background. It’s a place where people with a passion and academics with lifelong expertise can sit together, discuss and discover. In the near future, we are also planning to run some workshops based on what our makers want to learn and what they can teach. The first one will be a 3D printing workshop at the end of May.
The aim of the talk is to tell the story of the Knowledge Makers: how we managed to gather researchers who usually work in silos in the same room, what we failed to do and what we achieved. The talk will also present the concept of the Knowledge Makers and encourage other groups of RSEs to run something similar at their institutions and then share the experience.

High-performance artificial intelligence: scaling machine learning on supercomputers
Matthew Archer
(University of Cambridge)

There are currently a variety of popular tools and frameworks used for machine learning/deep learning. However, efficiently scaling these frameworks on a modern HPC system is still a challenging problem.

Currently, complex problems in the field of machine/deep learning utilise frameworks such as TensorFlow, Caffe, and PyTorch. These frameworks offer the data scientist a highly optimised environment that can leverage cutting-edge hardware – CPUs, GPUs, and FPGAs – enabling researchers to investigate problems that would have been impractical until recently.

UK researchers can make use of high-performance computing (HPC) clusters at a variety of UK universities. These clusters contain thousands of nodes connected to multi-petabyte file systems and can deliver quadrillions of operations every second. A researcher taking advantage of these resources could train more sophisticated models, on more data, faster than would otherwise be possible.

This talk explores the challenges of training a large model on a modern distributed-memory cluster. We shall present the science applications that motivate and continue to inspire interest in this work, and showcase some of the results that it has made possible.

“So, we can toss the cluster in a skip, right?”: Experiences of computational biology in the clouds
Matthew Hartley
(The John Innes Centre)

Research computing often requires large scale processing and storage infrastructure. We experience this particularly in our work in computational biology, where we handle huge genomic and image based datasets. This infrastructure is often provided at an institutional level as a HPC (High Performance Compute) cluster, with associated storage.

These resources work well most of the time, but don’t meet the needs of all researchers. Not everybody has access to them, they can be difficult to use, it’s difficult to scale them rapidly to meet demand, and adapting software and pipelines to run on traditional HPC can be painful.

Cloud resources offer the potential to solve some of these problems. They’re widely available, rapidly scalable and easy to customise. There are, however, many potential challenges to using cloud computing. Resources can be expensive, the range of options is bewildering, and data movement and security are potential problems.

We’ve experimented with a range of cloud providers and resources to supplement or replace our compute and storage needs. In this talk, I’ll explain what worked and what didn’t, and where we think cloud computing will fit into our future, both short and long term.

Harnessing AI for Research
Matthew Johnson
(Microsoft Research, Cambridge)

Artificial Intelligence is increasingly being used to both augment existing fields of research and open up new avenues of discovery. From quality control for imaging flow cytometry to computational musicology, modern AI is an exciting new tool for research and thus knowing how to engineer AI systems in a research context is a vital new skill for RSEs to acquire. In this talk, I will outline four different areas of AI: supervised learning, unsupervised learning, interactive learning, and Bayesian learning. For each of these approaches, I will discuss how they typically map to different research problems and explore best practices for RSEs via specific use cases. At the end of the talk, you will have received a high-level overview of AI technologies and their use in research, have seen some cool examples of how AI has been used in a wide range of research areas, and have a good sense of where to go to learn more.

Surviving as an Apprentice Developer
Matthew Richards
(STFC)

In this industry, software engineers have a vast array of in-depth systems knowledge, often specialist knowledge (especially in the area of research). I’ve been a degree software engineering apprentice at STFC, fresh from A-Levels, for just under a year, and have discovered how daunting the industry can be for those with little experience. It can be difficult when colleagues assume you know what they’re talking about all day but in reality you’ve understood roughly ten words they’ve said in as many minutes. This talk will be about my experiences as an apprentice and how I’ve learned to survive around my intelligent co-workers with only academic programming knowledge. It’s also a great chance for you to understand the mind of an apprentice and how you can apply that in the future to avoid scaring off colleagues who don’t have very much knowledge.

Interactive Research Data Visualisation by Drag-and-Drop
Matthew Walker
(University of Southampton)

Many researchers now choose to make their datasets openly available and may also release corresponding software tools to implement their techniques. However, enabling others to easily visualise and understand the data is often challenging.

We demonstrate a web-based results visualiser where a user can drag-and-drop raw data files (either experimental data, or the output from a software tool) into an online webpage, which proceeds to analyse the data and automatically generate interactive graphs and tables.

With this approach, anyone can visualise the data from any web-enabled device with a modern web browser; it supports every operating system, and does not require software to be installed.

We discuss the tools and techniques used to create such interactive visualisations, how they aid in the understanding of data, and their further potential. While our demonstration is custom-built for a specific purpose, we propose a framework to enable anyone to create a visualiser for their data.

How reusable is software mentioned in Open Access papers? An empirical study using code-cite
Neil Chue Hong, Robin Long, Martin O’Reilly, Naomi Penfold, Isla Staden, Alexander Struck, Shoaib Sufi, Matthew Upson, Andrew Walker, Kirstie Whitaker
(Software Sustainability Institute, University of Edinburgh)

Software is increasingly referenced in publications [1] because it has been used to produce the results being described, and because journals and funders are requiring code to be shared to improve reproducibility, encourage reuse and reduce duplication. This software may have been written to enable the work described in the publication, may be being cited to credit the original authors, or the main function of the publication could be to describe the software.

However, it is hard both to identify software which is referenced in publications and to assess its reusability. To address this we mined [2] the full text of papers available from EuropePMC to identify links to software repositories (here, “GitHub.com”). We investigated link persistence and queried the software repository to extract attributes including license information, documentation and update frequency, from which we inferred the likely reusability and sustainability of the software.
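
A minimal sketch of this kind of mining (not the study’s actual pipeline [2]): find github.com repository links in a paper’s full text, then query the GitHub REST API for attributes such as the licence. The input filename is hypothetical, and unauthenticated API calls are heavily rate-limited.

```python
# Minimal mining sketch, not the study's pipeline [2]: extract
# github.com repository links from full text, then query the GitHub
# REST API for licence and activity metadata.
import re
import requests

text = open("paper_fulltext.txt").read()  # hypothetical input file
repos = set(re.findall(r"github\.com/([\w.-]+/[\w.-]+)", text))

for repo in sorted(repos):
    r = requests.get(f"https://api.github.com/repos/{repo}")
    if r.ok:
        info = r.json()
        licence = (info.get("license") or {}).get("spdx_id", "none")
        print(repo, "licence:", licence, "last push:", info["pushed_at"])
```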

Our results show that there are clear differences in the reusability of software referenced in the research literature.

[1] Bullard and Howison (2015) https://doi.org/10.1002/asi.23538
[2] Watson et al. (2018) https://doi.org/10.5281/zenodo.1209311

Fast code with just enough effort
Pashmina Cameron
(Microsoft Research)

A few years ago, writing fast parallel code meant one of two things – using CPU threads for running multiple tasks in parallel, or getting your hands dirty to write some intrinsics or assembly to address specific bottlenecks.
New classes of hardware such as GPUs, wide SIMD, FPGAs and custom AI processors/DSPs are becoming available. Programming each of these platforms requires specialised skills. We will briefly discuss choices for space-constrained, compute-constrained or power-constrained applications.
Software toolkits fall into two broad classes. The first provides a level of abstraction to reduce the complexity of multi-platform programming, but this comes at a cost. The second is focused on a single platform but is able to offer better performance. This vast array of choices makes choosing the right tool for a problem tricky.
We will look at example CPU/SIMD and GPU implementations of routines typically found in computer vision and AI applications. We will compare the best performance achievable from each method, while maintaining focus on the trade-offs between the effort of programming and the resulting performance.

Life, death and resurrection: lessons from CASTEP
Phil Hasnip
(University of York)

What happens when a leading research computer program becomes unmaintainable? A cutting-edge research tool, yet living on borrowed time… do you keep the project limping on, or let it die? What is the legacy of “legacy code”? At the end of the 1990s, CASTEP was just such a code: a heady mix of leading-edge physics and materials science powered by an unsustainable spaghetti of FORTRAN, C and compiler extensions.

In this talk I will present tried and trusted philosophies and practical principles for software design and implementation. I will cover each stage of the process, from the initial design to the released software, and illustrate the principles with tales from a range of projects including CASTEP, which was completely and successfully redesigned and re-engineered in just 2 years, and continues to flourish 17 years later.

There are not enough RSEs in the world. Is community support the answer?
Simon Hettrick
(The Software Sustainability Institute)

There are 210,000 researchers in the UK and a significant number (at least a third) rely on software to conduct their research. How do we support this huge community with only a few thousand RSEs in the UK? One option is to invest effort into creating local communities that complement the work of RSEs by dealing with the straightforward questions asked by researchers.

Researchers require help with their software, but it is not always the in-depth software engineering that we have come to expect from RSEs. RSE Group leaders are inundated by requests for software engineering, but also by requests for simple help: which tool to use, what practices to follow, and general pointers on improving code. There are also requests for help with general, philosophical problems that do not fit well with the paid-for service model of many RSE Groups.

One solution, trialled by my group at the University of Southampton and others, is to create a Research Software Community comprising RSEs and “researcher-developers”. By bringing these people together, they support each other. In this talk we will discuss our successes and describe how a network of these communities could be created across academia to support researchers and RSEs.

“YOU HAVE 0 CREDIT – PLEASE INSERT C̶O̶I̶N̶ FILE”: The Citation File Format
Stephan Druskat
(Humboldt-Universität zu Berlin, Department of German Studies and Linguistics)

Getting credit for your research software is like playing a pinball machine: just insert a coin (file) to get the ball rolling. Then, several parts of the machine can increase your credit (via citation) and help you make the high score: the plunger (source code repository) provides the required metadata; the flippers (metadata tooling) transfer the metadata into other formats; the bumpers (repositories/indexers) distribute and display the metadata; the bonus holes (publishers) use the metadata to publish references. The Citation File Format is that initial coin: a human- and machine-readable format for software citation metadata. No need to slam tilt: add a CITATION.cff file to your source code, and users will have all the necessary metadata to cite your software. Required fields make sure the file is self-explanatory and includes the metadata required for credit and re-use as per the Software Citation Principles. A file in the Citation File Format can also be read to display citation information in the repository GUI, in the referenced software itself, or in the data the latter produces. And downstream it can be converted into the CodeMeta exchange format so that all actors in the software citation workflow can read the metadata.
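
For illustration, a minimal CITATION.cff is a short YAML file; the field values below are hypothetical placeholders, with the key set following the 1.0.x format.

```yaml
# Minimal hypothetical CITATION.cff; all values are placeholders.
cff-version: 1.0.3
message: "If you use this software, please cite it as below."
title: "My Research Tool"
version: 1.0.4
doi: 10.5281/zenodo.1234567   # hypothetical DOI
date-released: 2018-05-27
authors:
  - family-names: Druskat
    given-names: Stephan
```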

Surviving the vacuum: A strategy for sustaining software in the absence of RSE teams
Stephan Druskat, Thomas Krause
(Humboldt-Universität zu Berlin, Department of German Studies and Linguistics)

The continued foundation of new institutional RSE teams is great, but what if you still don’t have one? Who will ensure your new research software will be sustainable and re-usable after your project money is gone? You! Make your software extensible and use a generic data model. Let infrastructure handle the rest: most components are available for free, and are sustainable enough. Make use of a source code repository platform that can handle code, documentation, and communication between users, contributors and maintainers; pick one that is harvested by Software Heritage. Use a dependency repository to safeguard reproducible builds; make sure it’s the big, default one, and that it’s run from long-lasting funds. Use Zenodo for releases: it is designed to be there for the long run, and gives you a DOI. Student assistants are cheap: document all the things, so that a capable one will be able to take over maintainership when your funding is gone. Document all the things: for users and developers; all design decisions; all infrastructure and architecture; all workflows; all community documents and processes. Our dedicated research project will do just this, learn about requirements and best practices, and share the results for RSE and researcher training.

The challenges of creating an interactive big data visualisation platform for meteorology
Stephen Haddad
(Met Office)

The Met Office continues to strive to increase the resolution of its simulations and observations of the atmosphere and ocean. Consequently, the challenge of presenting the data to both operational and research users also continues to grow. A lack of suitable tools leaves big datasets under-utilised, with scientists spending a lot of time on technical issues or waiting for results rather than doing analysis.
This talk will discuss these challenges for a specific application for SE Asia and consider the tool’s use in other contexts. The key goal of the tool is to provide interactive access to visualisations of high-resolution data from multiple sources through a single web portal that is accessible both in the Met Office and in SE Asia.
I will discuss how we are using cloud infrastructure running widely available Python tools to create a common visualisation platform for research and operational users, facilitating better collaboration. We have been working with the Informatics Lab and using the Pangeo platform to visualise big datasets and develop code that researchers can extend. We aim to enable users to utilise the wealth of available data to provide better forecast products and do better research.

Software Engineering Guidelines – From Theory to Practice
Tobias Schlauch, Carina Haupt
(German Aerospace Center (DLR))

Research software is mainly developed by scientists who are domain experts. Most of them have no specific education in software development. To support research scientists at the German Aerospace Center, we created a set of software engineering guidelines for different fields of software development.
At RSE17, we already presented the concept of the guidelines. This time, we want to share the practical experiences we have collected over the last year.

In this talk, we want to practically introduce the guidelines and the classification scheme. In particular, we demonstrate their usage in the context of two real-world research software applications. The first example demonstrates the use of the guidelines when starting a new software development; in this case the focus is on finding the initial steps and getting an overview of future aspects. The second example concerns an existing, legacy software application; in this case the focus is on analysing what is already there and finding the next suitable steps. In this context, the accompanying checklists function as an ongoing planning document and the classification scheme helps to find a suitable starting point.

Utilising the Robot Operating System to create a reproducible platform for surgical device development
Tom Dowrick
(UCL)

A range of novel surgical tools are under development at the Wellcome/EPSRC Centre for Interventional and Surgical Sciences. To facilitate the integration of these new tools, such as concentric tube robots, with existing imaging (Slicer, NIFTK, OpenIGTLink) and control software, a common development platform and workflow is required to reduce the overheads involved in testing and deploying new devices.

A key challenge in this area of work is ensuring that software and hardware developments are not carried out in isolation. By providing a standardised set of software libraries, working environments and documentation, new hardware developments can be designed to be compatible with existing infrastructure from the outset.

This work heavily utilises the open source Robot Operating System (ROS), which consists of a set of C++/Python libraries for control of, and communication between, robotic devices, based around a simple message passing system.

An overview of ROS and the novel surgical tools being considered will be provided, followed by details of the software being implemented, which includes the creation of template projects, integration/processing of multiple data feeds, and bridges to existing software platforms.
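
By way of background, publishing data from a device to any interested subscriber takes only a few lines of rospy, following the standard ROS 1 tutorial pattern; the node and topic names below are invented, and this is generic illustration rather than the Centre’s actual code.

```python
# Generic ROS 1 publisher sketch using rospy; node and topic names
# are invented, and this is not the surgical platform's actual code.
import rospy
from std_msgs.msg import String

rospy.init_node("tool_status_publisher")  # register with the ROS master
pub = rospy.Publisher("tool/status", String, queue_size=10)

rate = rospy.Rate(1)  # publish at 1 Hz
while not rospy.is_shutdown():
    # Any node subscribed to "tool/status" receives this message.
    pub.publish(String(data="tracking OK"))
    rate.sleep()
```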

Code is Science: a manifesto for open source code in science
Yo Yehudi
(University of Cambridge)

Much of modern science relies upon computing. Despite this fact, fewer than 12% of computational papers recently surveyed in Science were found to have code and data readily accessible [1]. This is a problem for many reasons. Science has, historically, always been peer reviewed before being considered valid. If the code isn’t available online, what is the likelihood it has been peer reviewed? When the code backing the science is wrong, the science is wrong too. Furthermore, if a paper talks about computational results, there is no way to reproduce the paper without having the code available. It seems unlikely that a paper describing a mathematical proof could omit the proof itself, so why is this tolerated for computer code?

The Code is Science manifesto [2] is a call to funders, journals, researchers, and research institutions to recognise code as a fundamental part of science, making it open source and peer reviewed, and crediting code as a scientific output equivalent to papers or data. The manifesto is currently in a public consultation period and will launch formally in early June.

[1] http://www.pnas.org/content/115/11/2584
[2] https://codeisscience.github.io/manifesto/manifesto

UiOHive: a local Hub-node organization for building competence in IoT
in alphabetical order: John F. Burkhart, Ana Costa Conrado, Simon Filhol, Anne Fouilloux
(University of Oslo, Department of Geosciences)

UiOHive is an initiative from the Department of Geosciences (University of Oslo, Norway) to establish a local knowledge hub around the development of new observation devices, connecting individuals with different backgrounds and titles (technicians, lab engineers, research software engineers, students, PhDs, postdocs, researchers, etc.) who are all interested in utilizing IoT technologies, applying AI/ML to solve data challenges, and sharing knowledge across relevant interdisciplinary domains.
The need arose within the “Land-Atmosphere Interactions in Cold Environments” project, where the development of observation sensors triggered new challenges related to IoT technologies. To overcome these obstacles, a new approach is required, with a strong software and hardware co-design strategy at its center.
To establish a community and form a foundation of knowledge, UiOHive will provide the infrastructure to 1) educate students and scientists, through workshops and hackathons, about the potential of IoT technology for building new observation tools, 2) share existing facilities (electronics labs, the in-situ research station), and 3) create a collaborative platform to share solutions and problems, and simply exchange across departments at the university.

Building Nordic-RSE: why and how?
in alphabetical order: Radovan Bast, Anne Fouilloux, Bjørn Lindi, Radek Lonka, Sri Harsha Vathsavayi, Thor Wikfeldt
(University of Oslo, Department of Geosciences)

Inspired by the success of the UK Research Software Engineer (RSE) Association and emerging initiatives in the Netherlands and Germany, a handful of RSEs from Oslo, Trondheim and Tromsø (Norway), Stockholm (Sweden), and Helsinki (Finland) have decided to informally launch a Nordic-RSE Association.
The Nordic-RSE Association was born in April 2018, and we wish to present the initial results of our survey as well as our plans for growing a Nordic-RSE community. We will explain why we think it is important to combine our efforts across the Nordic countries (Denmark, Finland, Iceland, Norway, Sweden) rather than create individual RSE networks in each country.
We will present our vision, following what has been done in the UK, Germany and the Netherlands, but also showing what is specific to the Nordic countries and how these specificities can be of value at an international level.