
Getting Started

Testing and development have so far only been done on native Linux. macOS is supported by InvenioRDM but has not been tested with this project. Development in WSL may be possible, but working natively in Windows is not supported.

Requirements

The requirements for working with InvenioRDM are laid out in detail in the InvenioRDM System Requirements Docs. You're ready to go once you have cloned the code repository and can run invenio-cli check-requirements --development in the project directory and all requirements are met. Below are some tips and specifics for this project:

  • Start by installing invenio-cli and run the requirements check above to see what's missing.
  • Both pipenv and invenio-cli are best installed with pipx. These need to be discoverable on your path.
  • We are currently pinning to Python 3.9 for compatibility with the deployment base image, so you'll need it available. invenio-cli will be satisfied with anything 3.9 or newer, but this project needs 3.9 specifically.
  • Cairo and DejaVu are listed in the InvenioRDM Docs but are not checked for by invenio-cli. The direct impact of not having them is unclear, but you'd probably get by.
  • ImageMagick is checked for by invenio-cli, but similarly you'd probably get by without it.
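As a rough complement to the tips above, the sketch below checks two of the project-specific points from a script: that invenio-cli and pipenv are discoverable on your path, and that the interpreter is a 3.9 release. It is only an illustration and not a replacement for invenio-cli check-requirements; the function names are hypothetical and not part of this project.

```python
import shutil
import sys

# Tools this guide expects to find on PATH, and the pinned Python minor version.
REQUIRED_TOOLS = ("invenio-cli", "pipenv")
PINNED_PYTHON = (3, 9)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def is_pinned_python(version_info=sys.version_info):
    """Return True if the major/minor version matches the pinned 3.9."""
    return tuple(version_info[:2]) == PINNED_PYTHON
```

Calling missing_tools() with no arguments inspects the real path, and is_pinned_python() inspects the interpreter running the script. The optional parameters exist only so the helpers can be exercised without touching the environment.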

Tooling Overview

A combination of tools is used to manage the project. Their different roles are summarised below, but most operations use invenio-cli, which wraps the other tools as required and is covered in more detail below.

  • pipenv is used to manage Python dependencies and the virtual environment used for development.
  • node and npm are used to manage JavaScript dependencies and the build process for the frontend.
  • Docker and Docker Compose are used to manage the services required to run the application, namely the database, OpenSearch, Redis and RabbitMQ.
  • invenio is a command-line tool that can be used to interact with some Invenio components. It is installed within the virtual environment managed by pipenv, so it must be invoked via pipenv run invenio.

invenio-cli

As mentioned above, invenio-cli is the primary tool for managing the project and most operations are performed by invoking it. Its main subcommands are summarised below:

  • invenio-cli install - Installs the project and its dependencies. Creates the virtual environment if necessary, syncs the dependencies with Pipfile.lock, builds the frontend and copies/symlinks the assets to the correct location in the virtual environment.
  • invenio-cli services - Manages the Docker services required to run the application. Can be used to set up, start, stop and tear down the services.
  • invenio-cli run - Starts the Flask development server and a set of Celery workers.
  • invenio-cli packages - Wraps pipenv to manage Python dependencies. Can be used to install, uninstall and update packages.
  • invenio-cli pyshell - Starts a shell in the virtual environment with an initialised Flask app.
  • invenio-cli assets - Manages static files and frontend assets. Can be used to build the frontend, watch for changes and clean up.

Local Installation

Initial setup of the project can be done with the following commands:

invenio-cli install
invenio-cli services setup --no-demo-data

This will:

  • Create a virtual environment and install the Python dependencies. site/ic_data_repo is installed in editable mode so changes to the source code are immediately available.
  • Install the JavaScript dependencies and build the frontend assets.
  • Copy/symlink the static files and JavaScript assets to the correct location in the virtual environment.
  • Start the Docker services required to run the application and ensure they are healthy. This includes the database, OpenSearch, Redis and RabbitMQ.
  • Create the database schema, initialise the OpenSearch indices and perform various other one-off setup tasks.
  • Populate the database with some default data e.g. default user roles and permissions. The --no-demo-data flag is used to prevent the creation of demo data records. Remove it if you want the instance to be populated with example deposit data.
  • Queue a number of Celery tasks to populate the database with controlled vocabulary data. Note that there are no Celery workers running yet to process these tasks, so they just wait in the queue.

Note that the above leaves the services running. You can stop them with invenio-cli services stop. Either way you can then start the Flask server with:

invenio-cli run

This runs the Flask development server and creates a number of Celery workers in the background. If the services are not already running then they will be started. The first time this is run after setup there will be a backlog of Celery tasks that starts executing. This can be a bit resource intensive and make things a bit sluggish.

Once the Flask server has started, visit https://127.0.0.1:5000 in your browser. The development setup uses a self-signed TLS certificate, so you may need to bypass a security warning. Once finished, stop the running Flask server and use invenio-cli services stop to bring down the running services.
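If you want to confirm from a script, rather than a browser, that the development server is answering, something like the sketch below could work. It assumes the default address given above and deliberately disables certificate verification because the development certificate is self-signed. The function names are hypothetical and this should never be pointed at anything other than a local development instance.

```python
import ssl
import urllib.error
import urllib.request

# Address of the local development server as described in this guide.
DEV_URL = "https://127.0.0.1:5000"


def dev_ssl_context():
    """Build an SSL context that accepts the self-signed development certificate."""
    context = ssl.create_default_context()
    # check_hostname must be disabled before verification can be turned off.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def server_is_up(url=DEV_URL, timeout=5.0):
    """Return True if the server responds with a non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=dev_ssl_context()) as response:
            return response.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure and similar.
        return False
```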

If you want to restart the setup process from scratch, you can use invenio-cli services destroy to remove all the services and data.

Logging In

In order to log in to the application you will need to create a user account:

pipenv run invenio users create DUMMY_EMAIL --password DUMMY_PASSWORD --active

You can also optionally make this user an admin with:

pipenv run invenio access allow administration-access user DUMMY_EMAIL

Development

QA

It is strongly recommended to use pre-commit to check your individual commits meet the QA standards of the project. These are enforced via GitHub Actions and it's easiest to make sure you're compliant as you go along. Details of the QA tools can be found in .pre-commit-config.yaml.

Continuous Integration

A simple Continuous Integration setup is provided via GitHub Actions. This checks the target commit against the project QA tooling and, for commits to the main branch, builds and pushes Docker images for the web application and frontend.

Tests

A test suite is provided in the tests directory. Assuming services have already been set up, tests can be run with:

invenio-cli services start
pipenv run pytest

All development work should be supported by an appropriate set of tests. Best practices around testing are expected to evolve as the project develops.

The pytest-invenio plugin is provided to support test development. This extends pytest-flask to provide fixtures and support for testing Invenio.

Backend Development

Using invenio-cli run will start the Flask development server and a set of Celery workers. Debugging is enabled and the server will automatically reload when changes are made to the source.

Frontend Development

The frontend is built with Webpack and the assets are managed by invenio-cli. Any changes made to the CSS or JavaScript assets will require a rebuild of the assets. As a one-off operation this can be done with invenio-cli assets build. To watch for changes and rebuild automatically, use invenio-cli assets watch.

Note that the above is not required for changes to the HTML templates, which are processed by the backend.

Troubleshooting

InvenioRDM is a sophisticated application with many moving parts. If you encounter issues, the information below may help with troubleshooting:

  • invenio-cli stores some state about the project (e.g. whether setup has been performed for the services) in the file .invenio.private. The file is gitignored but avoid deleting it. If you're worried it has gotten out of sync then run invenio-cli destroy to completely remove all services, data and resources.
  • If you encounter errors about missing indexes (for OpenSearch) or database tables (for PostgreSQL) then setup may not have completed successfully. You can try invenio-cli services destroy to do a complete teardown, then set up the services again.
  • You can check the status of the services with invenio-cli services status. This will show which services are running and whether they are healthy. If a service is having issues you can use Docker Compose to check the logs e.g. docker compose logs opensearch.
  • The Celery workers started by invenio-cli run can be a bit verbose and pollute the logs in the console. You can redirect the Celery logs to a file with invenio-cli run --celery-log-file /path/to/logfile.
  • invenio-cli pyshell can be used to start a shell in the virtual environment with an initialised Flask app. This can be useful for debugging issues with the application code or inspecting config.

Configuration

This project extends the configuration approach used by InvenioRDM.

Inspired by Django, the following changes have been made:

  • Configuration is stored in the module ic_data_repo.config.
  • The module to use as settings can be specified at runtime via the environment variable INVENIO_SETTINGS_MODULE. This defaults to ic_data_repo.config.
  • The standard InvenioRDM config file (invenio.cfg) now contains only the necessary import machinery to facilitate the above.

Note that overriding settings by environment variable still works.
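To illustrate the idea, here is a minimal sketch of how a settings module named by INVENIO_SETTINGS_MODULE could be resolved and its values collected. This is not the project's actual invenio.cfg; the function name load_settings is hypothetical, and the choice to collect only upper-case names is an assumption that mirrors how Flask loads config from objects.

```python
import importlib
import os

# Default taken from this guide; used when the environment variable is unset.
DEFAULT_SETTINGS_MODULE = "ic_data_repo.config"


def load_settings(environ=os.environ):
    """Import the configured settings module and return its upper-case names."""
    module_name = environ.get("INVENIO_SETTINGS_MODULE", DEFAULT_SETTINGS_MODULE)
    module = importlib.import_module(module_name)
    # Only upper-case attributes are treated as settings (assumption, as in Flask).
    return {
        name: getattr(module, name)
        for name in dir(module)
        if name.isupper()
    }
```

Under these assumptions, selecting the production settings would amount to setting INVENIO_SETTINGS_MODULE=ic_data_repo.config.production in the environment before the application starts.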

The default configuration is suitable for development. A production-oriented settings file is also provided in ic_data_repo.config.production.

Test Data

Note

This functionality is not currently working.

Instructions for accessing and working with realistic test data records are provided in the test_data directory.