ve_data_science

Using GLOBUS

We are using the GLOBUS system to curate the data used in the Virtual Ecosystem data science repository.

At any point, we should be able to re-run analyses by cloning the code from the GitHub repository and then populating the data directory using GLOBUS.

GLOBUS Overview

GLOBUS is a web-based system that provides access to data files.

A single data repository is called a collection: a configured connection to a particular set of files. Individual users can be given access to a collection directly, or they can be made part of a group that is given access to it.

For the VE Data Science team, we are using GLOBUS to connect to a collection of files hosted on the Imperial College London Research Data Store.

Once you have logged into the GLOBUS web application, you will land on a page with a set of tabs on the left-hand side.

!!! info "Access permissions"

    GLOBUS frequently requests extra authentication steps, usually when you are
    accessing a new part of its functionality. It will typically take you to a page
    with a prompt like "Session reauthentication required (Globus Transfer)". That page
    also shows your login ID email, which is a clickable link: clicking it starts the
    authentication for the action, which should complete and then take you to the
    page you were trying to access.

The Collections tab

The Collections tab is used to provide an overview of the data collections that you have access to.

The File Manager

The File Manager tab is used to view the files and folders within a collection and to interact with the data repository. You can access the tab from a particular collection (as above), from the tab button on the left or directly using the URL https://app.globus.org/file-manager

Once you have opened a collection in the pane, you should be able to see its files and folders and open folders to explore the data.
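
If you want to share a link that opens the File Manager directly on a particular folder, the URL above can be extended with query parameters. The sketch below assumes the web application accepts `origin_id` (the collection UUID) and `origin_path` (the folder to open) as query parameters; the UUID and the leading-slash path form used here are placeholders, not values from this project.

```python
from urllib.parse import urlencode

FILE_MANAGER_URL = "https://app.globus.org/file-manager"


def file_manager_link(collection_id: str, path: str = "/") -> str:
    """Return a File Manager URL that opens `path` in the given collection."""
    # Assumption: `origin_id` and `origin_path` are the query parameter names
    # understood by the web application.
    query = urlencode({"origin_id": collection_id, "origin_path": path})
    return f"{FILE_MANAGER_URL}?{query}"


if __name__ == "__main__":
    # Placeholder UUID: use the ID shown on the collection's overview page.
    print(file_manager_link("00000000-0000-0000-0000-000000000000",
                            "/ve_data_science/data/"))
```

The path is percent-encoded by `urlencode`, so folder names containing spaces or other special characters should still produce a valid link.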

!!! alert "Collection paths"

    When you open the VE Data Science collection, you will see that it shows a path at
    the top: `ve_data_science/data`. This is because the collection shares _all_ of the
    files in our Research Data Store. This includes a clone of the `ve_data_science`
    repo but also some other data resources. We are managing access to the files using
    a GLOBUS group ('VE Data Science team') that only has access to the files under the
    `ve_data_science/data` path, so you can't see the other data on that drive.
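
The effect of that path restriction can be illustrated with a small sketch. This is only a model of the rule described above (a path is visible if it is `ve_data_science/data` or lies beneath it), not how GLOBUS implements permissions, and the file names are hypothetical.

```python
from pathlib import PurePosixPath

# Path taken from the text: the group can only see files beneath it.
VISIBLE_ROOT = PurePosixPath("ve_data_science/data")


def is_visible(path: str, root: PurePosixPath = VISIBLE_ROOT) -> bool:
    """Return True if `path` is the visible root or lies beneath it."""
    candidate = PurePosixPath(path)
    # Pure paths are not resolved, so reject ".." rather than risk a
    # path that climbs back out of the visible root.
    if ".." in candidate.parts:
        return False
    return candidate == root or root in candidate.parents


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(is_visible("ve_data_science/data/site/plots.csv"))  # True
    print(is_visible("ve_data_science/analysis/script.py"))   # False
    print(is_visible("other_project/raw.csv"))                # False
```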

File Manager actions

The bar in the centre of the file manager provides action buttons to work with files and folders.

These tools may be all you need for day-to-day work: if you only have a few files to upload, the action buttons are probably the simplest route. However, they become impractical if you want to upload a more complex set of files or download a large number of files.

This is where the Transfer or Sync to… option comes in - it allows files and folders to be copied between two collections. To do so, you need to configure your own computer as a collection.

Globus Connect Personal

The Globus Connect Personal application (https://www.globus.org/globus-connect-personal) is a program that you install on your own computer to set it up as a GLOBUS collection.

If you now go to the web application and look at the collections administered by you, you should see a new Private Mapped Connection:

https://app.globus.org/collections?scope=administered-by-me

In the File Manager tab of the web application, you can now select your personal collection and use the File Manager action buttons to manage your files and transfer folders between the two collections.

!!! Warning "Local file access permissions"

    By default, Globus Connect Personal (GCP) has access to your home directory. Only
    you have access to the collection, but you can also configure GCP to only be able
    to access a subset of files. Under the `GCP > Preferences` settings, you can select
    the Access tab and specify which files GCP can access _and_ whether GCP is allowed
    to write to those folders.

    Within the GLOBUS web application, you can also check the visibility of your local
    collection through the Collections tab: click your local collection and then
    explore the visibility options to check if other users can see the existence of
    your collection.

The GLOBUS Transfer system

Transfer is used to copy files from a source collection to a destination collection. For example, you could upload a folder from your personal collection (source) to the RDS repo (destination), or download data from the RDS (source) to your local collection (destination) for analysis. You could also do both in turn to synchronise the two folders.
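
To make the source and destination roles concrete, here is a minimal local sketch of the idea behind a one-way sync: list the files under a source folder that are missing from, or differ from, the matching files under a destination folder. It compares two ordinary directories by checksum and is only an illustration; it is not the GLOBUS implementation, and the function and folder names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_copy(source: Path, destination: Path) -> list[str]:
    """List files under `source` that are missing or different in `destination`."""
    pending = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        # Copy when the destination lacks the file or its contents differ.
        if not dst_file.is_file() or file_digest(src_file) != file_digest(dst_file):
            pending.append(relative.as_posix())
    return pending


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for a personal collection and the RDS.
        local, remote = Path(tmp, "local"), Path(tmp, "rds")
        (local / "soil").mkdir(parents=True)
        remote.mkdir()
        (local / "soil" / "ph.csv").write_text("site,ph\nA,5.1\n")
        (local / "readme.txt").write_text("same")
        (remote / "readme.txt").write_text("same")
        print(files_to_copy(local, remote))  # upload direction
        print(files_to_copy(remote, local))  # download direction
```

Running the check in both directions, as in the usage example, mirrors the "doing both" case: each call answers what one transfer direction would need to copy.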

To transfer files or folders between collections: