Invenio-Files-REST module.

This guide will show you how to get started with Invenio-Files-REST. It assumes that you already have knowledge of Flask applications and Invenio modules.

It will then explain key topics and concepts of this module.

Getting started

You will learn how to create a new Location, a Bucket and an ObjectVersion using the programmatic APIs of Invenio-Files-REST.

First, you will have to set up your virtualenv environment and install this module along with all its dependencies.

After that, start a Python shell and execute the following commands:

>>> from flask import Flask
>>> app = Flask('myapp')

This is the initial configuration needed to have things running:

>>> app.config['BROKER_URL'] = 'redis://'
>>> app.config['CELERY_RESULT_BACKEND'] = 'redis://'
>>> app.config['DATADIR'] = 'data'
>>> app.config['REST_ENABLE_CORS'] = True
>>> app.config['SECRET_KEY'] = 'CHANGEME'
>>> app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite://'
>>> allow_all = lambda *args, **kwargs: \
... type('Allow', (), {'can': lambda self: True})()
>>> app.config['FILES_REST_PERMISSION_FACTORY'] = allow_all

Relevant configuration variables will be explained later on. Now let’s initialize all required Invenio extensions:

>>> import shutil
>>> from os import makedirs
>>> from os.path import dirname, exists, join
>>> from pprint import pprint
>>> import json
>>> from flask_babelex import Babel
>>> from flask_menu import Menu
>>> from invenio_db import InvenioDB, db
>>> from invenio_rest import InvenioREST
>>> from invenio_admin import InvenioAdmin
>>> from invenio_accounts import InvenioAccounts
>>> from invenio_access import InvenioAccess
>>> from invenio_accounts.views import blueprint as accounts_blueprint
>>> from invenio_celery import InvenioCelery
>>> from invenio_files_rest import InvenioFilesREST
>>> from invenio_files_rest.views import blueprint
>>> ext_babel = Babel(app)
>>> ext_menu = Menu(app)
>>> ext_db = InvenioDB(app)
>>> ext_rest = InvenioREST(app)
>>> ext_admin = InvenioAdmin(app)
>>> ext_accounts = InvenioAccounts(app)
>>> ext_access = InvenioAccess(app)

You can now initialize Invenio-Files-REST. When using Invenio-Files-REST as a dependency of an Invenio application, the REST views are automatically registered via entry points. For this example, you will have to register them manually and push a Flask application context:

>>> ext_rest = InvenioFilesREST(app)
>>> app.register_blueprint(accounts_blueprint)
>>> app.register_blueprint(blueprint)
>>> app.app_context().push()

Let’s create the database and tables, using an in-memory SQLite database:

>>> db.create_all()

When you set up Invenio-Files-REST for the first time, you will have to define a default Location. It can be local or remote and it will be accessed via its URI.

Create a location

For this example, you will use a temporary directory:

>>> from invenio_files_rest.models import Location
>>> d = app.config['DATADIR']  # folder `data`
>>> if exists(d): shutil.rmtree(d)
>>> makedirs(d)
>>> loc = Location(name='local', uri=d, default=True)
>>> db.session.add(loc)
>>> db.session.commit()

Create a bucket

In order to create, modify or delete files, you have to create a files container first, the Bucket.

>>> from invenio_files_rest.models import Bucket
>>> b1 = Bucket.create(loc)
>>> db.session.commit()

Create objects

Files are represented by ObjectVersions. After creating a bucket, you can now add files to it, for example:

>>> from io import BytesIO
>>> from invenio_files_rest.models import ObjectVersion
>>> a_file = BytesIO(b"my file contents")
>>> f = ObjectVersion.create(b1, "thesis.pdf", stream=a_file)
>>> db.session.commit()

Retrieve objects

You can now retrieve objects. Retrieve the bucket object:

>>> b = Bucket.get(

Retrieve all ObjectVersions contained in a bucket:

>>> file_names = [ov.key for ov in ObjectVersion.get_by_bucket(]

Retrieve a specific ObjectVersion by filename:

>>> f = ObjectVersion.get(, "thesis.pdf")

Data model

This is a more in-depth explanation of the concepts introduced in the Overview section.


Buckets

A bucket is a container of objects. It is uniquely identified by an ID. Buckets have a default Location and Storage class. Individual objects in the bucket can however have different Locations and Storage classes.

The size field stores the current size of the bucket. When a new object is added or completely removed, its size is updated.

Buckets can have constraints on the size of the objects that they contain. These are controlled by the function invenio_files_rest.limiters.file_size_limiters(): by default, a new object can be added to the bucket if the size of the file is lower than invenio_files_rest.config.FILES_REST_DEFAULT_MAX_FILE_SIZE and if the total quota (the sum of sizes of all files) is lower than invenio_files_rest.config.FILES_REST_DEFAULT_QUOTA_SIZE.
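As a rough illustration of the two limits described above, the following sketch checks a candidate upload against a per-file maximum and a bucket quota. The helper can_add_file and its parameters are hypothetical and are not part of Invenio-Files-REST; treating None as "no limit" and the exact boundary behaviour are assumptions.

```python
def can_add_file(file_size, bucket_size, max_file_size=None, quota_size=None):
    """Return True if a file of ``file_size`` bytes fits both limits.

    Hypothetical helper: ``None`` is assumed to mean "no limit".
    """
    # Per-file limit: the new file alone must not exceed the maximum.
    if max_file_size is not None and file_size > max_file_size:
        return False
    # Quota: the bucket size after the upload must stay within the quota.
    if quota_size is not None and bucket_size + file_size > quota_size:
        return False
    return True


MB = 1024 * 1024
# A 4 MB file fits in a bucket holding 8 MB (10 MB file limit, 16 MB quota).
print(can_add_file(4 * MB, 8 * MB, max_file_size=10 * MB, quota_size=16 * MB))
# A 9 MB file would push the same bucket past its quota.
print(can_add_file(9 * MB, 8 * MB, max_file_size=10 * MB, quota_size=16 * MB))
```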

Buckets can be marked as locked. When a bucket is locked, objects can be retrieved but no object can be added or deleted.

Similarly to objects, a bucket can be logically marked as deleted without affecting its actual content. When a bucket is deleted, it simply means that no objects can be retrieved or added via APIs.

Finally, buckets provide ways to create or synchronize copies: the snapshot operation creates a new copy of a bucket with all the latest versions of the objects it contains, without duplicating files on disk. The sync operation mirrors objects contained in the source bucket to the destination bucket.


Object versions

ObjectVersions are objects that represent a specific version of a file at a given point in time. Each ObjectVersion is uniquely identified by its ID. It is always contained in an existing Bucket, which it references through the bucket_id attribute.

An ObjectVersion describes the file (FileInstance) that it references through the file_id attribute. It also stores some metadata of the file: the file name, stored in the key attribute, and the version, stored in the version_id attribute. The triplet (bucket_id, key, version_id) is unique.

For a given key in a Bucket, normally the latest version in history is marked as the head.

The key has a maximum length defined via invenio_files_rest.config.FILES_REST_OBJECT_KEY_MAX_LEN.

An ObjectVersion can be marked as deleted by removing its reference to the file it represents: from the user's perspective, deleting a file normally means adding a new ObjectVersion without file_id, which becomes the new head.
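The relationship between versions, the head and delete markers can be pictured with a small in-memory model. This is only a sketch: the class VersionHistory and its methods are hypothetical and do not mirror the actual ObjectVersion model or its database schema.

```python
import uuid


class VersionHistory:
    """Toy model of the versions of one key inside one bucket."""

    def __init__(self, key):
        self.key = key
        self.versions = []  # oldest first; the last entry is the head

    def _append(self, file_id):
        version = {"version_id": str(uuid.uuid4()), "file_id": file_id}
        self.versions.append(version)
        return version

    def upload(self, file_id):
        """Add a new version pointing at a file; it becomes the head."""
        return self._append(file_id)

    def soft_delete(self):
        """Add a delete marker: a new head without file_id."""
        return self._append(None)

    @property
    def head(self):
        return self.versions[-1] if self.versions else None

    @property
    def is_deleted(self):
        return self.head is not None and self.head["file_id"] is None


history = VersionHistory("thesis.pdf")
history.upload("file-1")
history.upload("file-2")
history.soft_delete()
# The head is now a delete marker, but earlier versions remain in the history.
print(history.is_deleted, len(history.versions))
```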


File instances

A file instance represents a file on disk. A file instance may be linked from many objects, while an object can have one and only one file instance.

The file on disk can be retrieved by the file instance uri, which is an absolute path/URI generated when adding the file: the base path is retrieved from the Location used for this file, and the relative path is assigned by the file’s Storage. It is the responsibility of the Storage, which is aware of the file system that it manages, to generate a unique final path for the file. You can modify how the path is generated with the default storage by changing invenio_files_rest.config.FILES_REST_STORAGE_PATH_SPLIT_LENGTH or invenio_files_rest.config.FILES_REST_STORAGE_PATH_DIMENSIONS.
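To give an intuition of how a split length and a number of dimensions can shape a path, here is a minimal sketch. The function make_storage_path, its default values and the final file name "data" are assumptions made for illustration; the default storage may build its paths differently.

```python
import posixpath


def make_storage_path(base_uri, file_id, dimensions=2, split_length=2,
                      filename="data"):
    """Build a nested path by slicing the start of ``file_id`` into directories."""
    if len(file_id) <= dimensions * split_length:
        raise ValueError("file_id is too short for the requested layout")
    parts = []
    rest = file_id
    for _ in range(dimensions):
        # Each directory level takes the next ``split_length`` characters.
        parts.append(rest[:split_length])
        rest = rest[split_length:]
    # The remaining characters form the last directory, holding the file.
    return posixpath.join(base_uri, *parts, rest, filename)


print(make_storage_path("data", "cb8d0fa7-2349-484b-89cb-16573d57f09e"))
```

Spreading files over nested directories like this keeps the number of entries per directory small, which is the usual motivation for such a layout.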

A file instance may not be ready to be accessed, for example in case of multipart uploads: this is indicated by the readable attribute. It can also be marked as not writable if it cannot be deleted or replaced, for safety reasons.

checksum, last_check_at and last_check are attributes used to store information about integrity checks.
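The idea behind an integrity check can be sketched with the standard library alone. The helpers compute_checksum and verify below are hypothetical; the "algorithm:hexdigest" format is taken from the checksum values shown in the REST responses in this guide, while everything else is an assumption.

```python
import hashlib
from io import BytesIO


def compute_checksum(stream, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a stream in chunks and return "<algorithm>:<hexdigest>"."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "{0}:{1}".format(algorithm, digest.hexdigest())


def verify(stream, stored_checksum):
    """Recompute the checksum with the stored algorithm and compare."""
    algorithm = stored_checksum.split(":", 1)[0]
    return compute_checksum(stream, algorithm) == stored_checksum


stored = compute_checksum(BytesIO(b"my file contents"))
# Unchanged content passes the check, altered content does not.
print(verify(BytesIO(b"my file contents"), stored))
print(verify(BytesIO(b"corrupted contents"), stored))
```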

You can find the documentation of each API in the API Docs.


REST APIs

REST APIs allow you to perform most of the operations needed when manipulating files.

By design, Locations cannot be created using REST APIs. This is because they depend on your physical file storage infrastructure. You will have to create them in advance when setting up your Invenio instance.

To be able to run each of the next steps, you can instantiate and start an Invenio instance as described here.

Create a bucket

A bucket can be created by a POST request to /files. The response will contain the unique ID of the bucket.

$ curl -X POST http://localhost:5000/api/files
{
    "max_file_size": null,
    "updated": "2019-05-16T13:07:21.595398+00:00",
    "locked": false,
    "links": {
        "self": "http://localhost:5000/api/files/
        "uploads": "http://localhost:5000/api/files/
        "versions": "http://localhost:5000/api/files/
    },
    "created": "2019-05-16T13:07:21.595391+00:00",
    "quota_size": null,
    "id": "cb8d0fa7-2349-484b-89cb-16573d57f09e",
    "size": 0
}

Uploading Files

You can upload, download and modify single files via REST APIs. A file is uniquely identified within a bucket by its name and version. Each file can have multiple versions.

Let’s upload a file called my_file.txt inside the bucket that was just created.

$ BUCKET=cb8d0fa7-2349-484b-89cb-16573d57f09e

$ echo "my file content" > my_file.txt

$ curl -i -X PUT --data-binary @my_file.txt \
{
    "mimetype": "text/plain",
    "updated": "2019-05-16T13:10:22.621533+00:00",
    "links": {
        "self": "http://localhost:5000/api/files/

        "version": "http://localhost:5000/api/files/
        "uploads": "http://localhost:5000/api/files/
    },
    "is_head": true,
    "tags": {},
    "checksum": "md5:d7d02c7125bdcdd857eb70cb5f19aecc",
    "created": "2019-05-16T13:10:22.617714+00:00",
    "version_id": "7f62676d-0b8e-4d77-9687-8465dc506ca8",
    "delete_marker": false,
    "key": "my_file.txt",
    "size": 14
}

If you have a new version of the file, you can upload it to the same bucket using the same filename. In this case, a new ObjectVersion will be created.

$ echo "my file content version 2" > my_filev2.txt

$ curl -i -X PUT --data-binary @my_filev2.txt \
{
    "mimetype": "text/plain",
    "updated": "2019-05-16T13:11:22.621533+00:00",
    "links": {
        "self": "http://localhost:5000/api/files/

        "version": "http://localhost:5000/api/files/
        "uploads": "http://localhost:5000/api/files/
    },
    "is_head": true,
    "tags": {},
    "checksum": "md5:fe76512703258a894e56bac89d2e8dec",
    "created": "2019-05-16T13:11:22.617714+00:00",
    "version_id": "24bf075f-09f4-42f8-9fbe-3f00b8aac3e8",
    "delete_marker": false,
    "key": "my_file.txt",
    "size": 13
}

When integrating the REST APIs to upload files via a web application, you might use JavaScript to improve user experience. Invenio-Files-REST provides out of the box integration with JavaScript uploaders. See the JS Uploaders section for more information.

Invenio-Files-REST also provides different ways to upload large files. See the Multipart Upload and Large Files sections for more information.

Serving files

To serve and allow download of files, you can perform a GET request specifying the bucket and the filename used to upload the file.

$ curl -i -X GET "http://localhost:5000/api/files/$BUCKET/my_file.txt"

You can also list files or download specific versions of files. See the REST APIs reference documentation below for more information.

Be aware that there are security implications to take into account when serving files. See the Security section for more information.

Invenio-Files-REST also provides the functionality to serve your files directly from your external storage. This is achieved by attaching the X-Accel-Redirect header to the response, which will then be redirected by your web proxy (e.g. NGINX, Apache) to your external storage, finally streaming the file directly to the user. To use this feature, you will need to configure your web proxy accordingly and then enable the configuration variable invenio_files_rest.config.FILES_REST_XSENDFILE_ENABLED.

API Reference

Default Location

Create a bucket:

POST /files/


Check if bucket exists, returning either a 200 or 404:

HEAD /files/<bucket_id>

Retrieve the latest version of all objects in bucket:

GET /files/<bucket_id>

Retrieve all versions of files in a bucket:

GET /files/<bucket_id>?versions

Return list of multipart uploads:

GET /files/<bucket_id>?uploads


Initiate multipart upload (see Multipart Upload):

POST /files/<bucket_id>/<file_name>?

Finalize multipart upload:

POST /files/<bucket_id>/<file_name>?uploadId=<upload_id>

Upload a file to a bucket:

PUT /files/<bucket_id>/<file_name>

Upload part of in-progress multipart upload to a bucket:

PUT /files/<bucket_id>/<file_name>?uploadId=<upload_id>&part=<part_number>

Retrieve the latest version of a given file. By default, the file is returned with the header 'Content-Disposition': 'inline'. Be aware that the browser will try to preview it.

GET /files/<bucket_id>/<file_name>

Download the latest version of a given file. It will return the same response as the request above but with the response header 'Content-Disposition': 'attachment' to instruct the browser to trigger a download.

GET /files/<bucket_id>/<file_name>?download

Retrieve a specific version of a given file:

GET /files/<bucket_id>/<file_name>?versionId=<version_id>

Retrieve the list of parts of a multipart upload:

GET /files/<bucket_id>/<file_name>?uploadId=<id_number>

Mark an object as deleted (see Deleting files):

DELETE /files/<bucket_id>/<file_name>

Permanently erase an object and the physical file on disk:

DELETE /files/<bucket_id>/<file_name>?versionId=<version_id>

Abort multipart upload:

DELETE /files/<bucket_id>/<file_name>?uploadId=<upload_id>
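When calling these endpoints from a client, the object URLs can be assembled with a small helper. The function object_url below is hypothetical; it only covers the versionId and download query arguments listed above, and combining both in a single request is an assumption.

```python
from urllib.parse import quote, urlencode


def object_url(base, bucket_id, key, version_id=None, download=False):
    """Return the URL of an object, optionally pinned to a version."""
    url = "{0}/files/{1}/{2}".format(base.rstrip("/"), bucket_id, quote(key))
    params = []
    if version_id is not None:
        params.append(urlencode({"versionId": version_id}))
    if download:
        # ``download`` is a bare flag without a value.
        params.append("download")
    return url + ("?" + "&".join(params) if params else "")


base = "http://localhost:5000/api"
bucket = "cb8d0fa7-2349-484b-89cb-16573d57f09e"
print(object_url(base, bucket, "my_file.txt"))
print(object_url(base, bucket, "my_file.txt", download=True))
```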

Deleting files

A delete operation can be of two types:

  1. mark an object as deleted, allowing the possibility of restoring a deleted file (also called delete marker or soft deletion).

  2. permanently remove any trace of an object and referenced file on disk (also called hard deletion).

Soft deletion

Technically, it creates a new ObjectVersion, that becomes the new head, with no reference to a FileInstance. It is possible to revert it by getting the previous version.

This operation will not access the file on disk and will leave it untouched.

You can soft delete using REST APIs:

DELETE /files/<bucket_id>/<file_name>

Hard deletion

Given a specific object version, it will delete the ObjectVersion, the referenced FileInstance and the file on disk. If the deleted version was the head, it will then set the previous object as the new head.

The deletion of files on disk will not happen immediately. This is because it is done via an asynchronous task to ensure that the FileInstance is safely removed from the database in case the low level operation of file removal on disk fails for any unexpected reason.

You can hard delete a file using REST APIs:

DELETE /files/<bucket_id>/<file_name>?versionId=<version_id>

REST APIs do not allow you to perform delete operations that affect multiple objects at the same time. For advanced use cases, you will have to use the Invenio-Files-REST APIs programmatically.


For safety reasons, the deletion will fail if the file that you want to delete is referenced by multiple ObjectVersions, for example in case of Bucket snapshots.


Access control

Invenio-Files-REST relies on Invenio-Access to implement files authorization. The following documentation assumes that you already have knowledge of how authorization works on Invenio.

Invenio-Files-REST defines a set of actions for operations on Bucket and ObjectVersions that can be used to implement authorization as you need:

  • files-rest-location-update

  • files-rest-bucket-read

  • files-rest-bucket-read-versions

  • files-rest-bucket-update

  • files-rest-bucket-listmultiparts

  • files-rest-object-read

  • files-rest-object-read-version

  • files-rest-object-delete

  • files-rest-object-delete-version

  • files-rest-multipart-read

  • files-rest-multipart-delete

Response codes

If the authorization for an action fails, Invenio-Files-REST normally returns a 403 response code for authenticated users, 401 otherwise. For security reasons, when trying to retrieve an unauthorized file, it will return a 404 instead to hide the existence or non-existence of the file.
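The rule above can be summarised as a small decision function. This is a sketch of the described behaviour only; the function denied_status and its hides_existence flag are hypothetical names, not part of the module.

```python
def denied_status(is_authenticated, hides_existence=False):
    """Pick the status code for a request that failed authorization."""
    if hides_existence:
        # Retrieving a file: do not reveal whether it exists.
        return 404
    # Otherwise distinguish logged-in users from anonymous ones.
    return 403 if is_authenticated else 401


print(denied_status(is_authenticated=True))
print(denied_status(is_authenticated=False))
print(denied_status(is_authenticated=True, hides_existence=True))
```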

Authorization definition

The default permission factory invenio_files_rest.permissions.permission_factory will authorize users that have Needs that fulfill the actions listed above. This means that by default no user will be authorized (with the exception of any superuser).

Depending on how you are planning to integrate Invenio-Files-REST in your Invenio application, you might want to decide how to give permissions for operations on files.

If you plan to give authorization to specific users or roles, you can use the default permission factory and assign users or roles to the actions listed above as described in the Invenio-Access documentation.

If instead you want to define permissions based on other objects, for example on the records to which the files are attached, then you will have to define your own permission factory and set it via the configuration variable invenio_files_rest.config.FILES_REST_PERMISSION_FACTORY.
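A custom permission factory could look roughly like the sketch below. It follows the same pattern as the allow_all factory from the Getting started section, which returns an object exposing a can() method. The (obj, action) signature, the format of the action string, the RECORD_ACCESS_BY_BUCKET lookup and all class and function names are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical lookup: bucket id -> access level of the record owning it.
RECORD_ACCESS_BY_BUCKET = {"bucket-1": "open", "bucket-2": "restricted"}
# Action names are assumed to match the list of actions given earlier.
READ_ACTIONS = {"files-rest-bucket-read", "files-rest-object-read"}


class RecordFilesPermission:
    """Allow read actions on files attached to open records only."""

    def __init__(self, bucket_id, action):
        self.bucket_id = bucket_id
        self.action = action

    def can(self):
        is_open = RECORD_ACCESS_BY_BUCKET.get(self.bucket_id) == "open"
        return is_open and self.action in READ_ACTIONS


def record_permission_factory(obj, action):
    """Return a permission object exposing can()."""
    # ObjectVersions carry bucket_id; a Bucket is identified by its id.
    bucket_id = getattr(obj, "bucket_id", None) or getattr(obj, "id", None)
    return RecordFilesPermission(str(bucket_id), action)


FakeObject = namedtuple("FakeObject", ["bucket_id", "key"])
open_file = FakeObject("bucket-1", "a.txt")
restricted_file = FakeObject("bucket-2", "a.txt")
print(record_permission_factory(open_file, "files-rest-object-read").can())
print(record_permission_factory(restricted_file, "files-rest-object-read").can())
```

Such a factory would then be assigned to app.config['FILES_REST_PERMISSION_FACTORY'], in the same way as allow_all in the Getting started section.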

See invenio_files_rest.permissions for more documentation.


Security

When serving files, you will have to take into account any security implications. Here you can find some recommendations to mitigate possible vulnerabilities, such as Cross-Site Scripting (XSS):

  1. If possible, serve user uploaded files from a separate domain (not a subdomain).

  2. By default, Invenio-Files-REST sets some response headers to prevent the browser from rendering and executing HTML files. See invenio_files_rest.helpers.send_stream() for more information.

  3. Prefer file download instead of allowing the browser to preview any file, by adding the ?download URL query argument.


Signals

Invenio-Files-REST supports signals that can be used to react to events.

Events are sent whenever a file is downloaded, uploaded or deleted.

As an example, let’s listen to the file download event:

from invenio_files_rest.signals import file_downloaded

def after_file_downloaded(sender_app, obj=None, **kwargs):
    print("File downloaded {0}".format(obj))

listener = file_downloaded.connect(after_file_downloaded)
# Request to download a file for the event to trigger

See invenio_files_rest.signals for more documentation.


Integrity

Invenio-Files-REST computes and stores checksums when files are uploaded, and it allows you to set up periodic tasks to regularly re-validate file integrity.

By default, it uses MD5 to compute checksums. You can override this by subclassing

You can use the tasks invenio_files_rest.tasks.verify_checksum() and invenio_files_rest.tasks.schedule_checksum_verification() to set up periodic tasks to perform checksum verifications on single files or batches and provide reports.

Let’s create a periodic task to compute checksums:

    'file-checks': {
        'task': 'invenio_files_rest.tasks.schedule_checksum_verification',
        'schedule': timedelta(hours=1),
    },

By default, invenio_files_rest.tasks.schedule_checksum_verification() will generate batches of files to check using some predefined constraints, in order to throttle the execution rate of the checks. It will then spawn a Celery task invenio_files_rest.tasks.verify_checksum() for each file in the set.
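The throttling idea can be illustrated with a hypothetical batch selector. The function pick_batch, its max_count and max_size parameters and the ordering by last check time are assumptions chosen for this sketch; they are not the constraints actually applied by the task.

```python
def pick_batch(files, max_count, max_size):
    """Select files to verify, oldest check first, within both limits.

    ``files`` is a list of (file_id, size, last_check_at) tuples, where
    ``last_check_at`` is a timestamp or None if the file was never checked.
    """
    # Never-checked files come first, then the least recently checked.
    ordered = sorted(files, key=lambda f: (f[2] is not None, f[2] or 0))
    batch, total = [], 0
    for file_id, size, _ in ordered:
        if len(batch) >= max_count or total + size > max_size:
            break
        batch.append(file_id)
        total += size
    return batch


files = [("a", 10, 300), ("b", 20, None), ("c", 30, 100), ("d", 50, 200)]
# "b" was never checked, "c" has the oldest check; "d" would exceed max_size.
print(pick_batch(files, max_count=3, max_size=60))
```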

You can customize most of these parameters by passing the method arguments to the schedule definition.

Keep in mind that you need to have celerybeat running.

Storage Backends

Invenio-Files-REST provides a default implementation of the storage factory, used when performing operations on files in the defined locations. The PyFSFileStorage class uses PyFilesystem to access the file system.

Build your own Storage Backend

In order to use a different storage backend, you can implement the interface.

Mandatory methods to implement:

  • initialize

  • open

  • save

  • update

  • delete

Optional methods to implement:

  • send_file

  • checksum

  • copy

  • _init_hash

  • _compute_checksum

  • _write_stream

Then, you will have to re-implement a storage factory in a similar way to the default one and set the configuration variable invenio_files_rest.config.FILES_REST_STORAGE_FACTORY.
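To make the list of mandatory methods more concrete, here is a toy in-memory backend. It is a standalone sketch that does not subclass any Invenio-Files-REST class; the method signatures and return values are assumptions and may differ from the real storage interface.

```python
import hashlib
from io import BytesIO


class InMemoryStorage:
    """Toy backend keeping the bytes of a single file in memory."""

    def __init__(self):
        self._data = None

    def initialize(self, size=0):
        """Reserve space for a file of ``size`` bytes."""
        self._data = bytearray(size)

    def open(self):
        """Return a readable stream over the stored bytes."""
        return BytesIO(bytes(self._data))

    def save(self, stream):
        """Replace the content with the stream; return (size, checksum)."""
        self._data = bytearray(stream.read())
        checksum = "md5:" + hashlib.md5(bytes(self._data)).hexdigest()
        return len(self._data), checksum

    def update(self, stream, seek=0):
        """Overwrite bytes starting at offset ``seek``; return bytes written."""
        chunk = stream.read()
        self._data[seek:seek + len(chunk)] = chunk
        return len(chunk)

    def delete(self):
        """Drop the stored content."""
        self._data = None
        return True


storage = InMemoryStorage()
size, checksum = storage.save(BytesIO(b"my file contents"))
storage.update(BytesIO(b"MY"), seek=0)
print(size, storage.open().read())
```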

JS Uploaders

Some JS uploaders do not allow you to customize the HTTP request that is sent to the REST APIs when uploading a file. If the default implementation provided by Invenio-Files-REST is not compatible, you will have to implement your own custom factory to adapt the JS uploader request to Invenio-Files-REST.

When using the AngularJS uploader ng-file-upload, Invenio-Files-REST already provides a compatible factory, invenio_files_rest.views.ngfileupload_uploadfactory().

If you have to create a new custom factory, you have to:

  1. Create your own factory similar to invenio_files_rest.views.ngfileupload_uploadfactory().

  2. Instruct Invenio-Files-REST to use it by setting the configuration variables invenio_files_rest.config.FILES_REST_MULTIPART_PART_FACTORIES and invenio_files_rest.config.FILES_REST_UPLOAD_FACTORIES.

Multipart Upload

You might want to optimize uploads in case of large files. Invenio-Files-REST allows you to upload parts of the same file in parallel via multipart uploads.

A multipart upload requires that each part of the file has the same size, except for the last one that can be smaller. Each part can be uploaded at the same time and at the end of the process all parts are merged into one single file.

In case of failure when uploading one of the parts, the operation is completely aborted and all parts are deleted.

With Invenio-Files-REST, the multipart upload consists of 3 actions:

  • An initial request to initiate the upload and obtain an id to be used for each part upload.

  • A series of requests to upload each part, specifying the part number so that the file can be merged correctly at the end.

  • A final request to merge all parts together.
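Before issuing these requests, a client needs to know how many parts there are and how large each one is. The sketch below computes that plan under the constraint that all parts have the same size except the last one. The function plan_parts is hypothetical, and numbering parts from 0 is an assumption.

```python
def plan_parts(total_size, part_size):
    """Return (part_number, offset, length) for each part of a file."""
    if total_size <= 0 or part_size <= 0:
        raise ValueError("sizes must be positive")
    parts = []
    offset = 0
    while offset < total_size:
        # Every part has ``part_size`` bytes except possibly the last one.
        length = min(part_size, total_size - offset)
        parts.append((len(parts), offset, length))
        offset += length
    return parts


# An 11 MiB file split into parts of 6 MiB gives one full and one smaller part.
for part in plan_parts(11 * 1048576, 6291456):
    print(part)
```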

Let’s see an example. Let’s create an 11 MB file which will then be split into 2 chunks using the Linux split command:

$ dd if=/dev/urandom of=my_file.txt bs=1048576 count=11

$ split -b6291456 my_file.txt segment_

Create a new bucket:

$ curl -X POST http://localhost:5000/api/files



Now, let’s initiate the multipart upload. Notice the URL query arguments that specify the total size and the size of each part:

$ B=c896d17b-0e7d-44b3-beba-7e43b0b1a7a4

$ curl -i -X POST \

Notice the upload id in the response:




Now, let’s upload each part in parallel. Notice the uploadId and partNumber URL query arguments:

$ U=a85b1cbd-4080-4c81-a95c-b4df5d1b615f

$ curl -i -X PUT --data-binary @segment_aa \

$ curl -i -X PUT --data-binary @segment_ab \

Complete the multipart upload:

$ curl -i -X POST \

You can also abort a multipart upload (and delete all uploaded parts):

$ curl -i -X DELETE \

Multipart upload limits can be controlled via configuration variables:

Large Files

By default, Flask and your web server have a limit on the maximum size of the upload files. Normally, when the max size is exceeded, the server will return a response code 413 (Request Entity Too Large).

You can adjust these configurations according to your needs.

For Flask, specify the MAX_CONTENT_LENGTH configuration variable. Be aware that if the request does not specify a CONTENT_LENGTH, no data will be read. To change the max size, you can for example set:

>>> app.config['MAX_CONTENT_LENGTH'] = 25 * 1024 * 1024

Here is an example for Nginx web server. If you are using another web server, please check the related documentation.

http {
    client_max_body_size 25M;
}

Data Migration

When you already have an instance running with a certain amount of uploaded data, you might have the need to migrate the data to a different, larger or more efficient physical location. It can involve your entire set of files or just a part of it.

Note that files migration can be performed with no downtime and in a completely transparent way for the user.

The steps to perform a complete migration are as follows:

  1. Create the new Location in the database with the URI of your new location and set it to default = True. In this way, new Buckets will use the new default location.

  2. Change all existing buckets locations in the database to the new one. By doing this, any new file uploaded to the existing bucket will be stored in the new location.

  3. For each FileInstance, run the asynchronous task invenio_files_rest.tasks.migrate_file() passing the new location.

The asynchronous task invenio_files_rest.tasks.migrate_file() will create a new FileInstance and copy the file content to the new location. It will then change each ObjectVersion that references the old FileInstance to reference the new FileInstance and finally run an integrity check.
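The sequence of copy, re-pointing and integrity check can be mimicked with plain files and dictionaries. This is a conceptual sketch only: the function migrate, the dictionaries standing in for ObjectVersions and the MD5 comparison are assumptions, not the implementation of migrate_file().

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a file."""
    with open(path, "rb") as fp:
        return hashlib.md5(fp.read()).hexdigest()


def migrate(old_path, new_dir, object_versions, old_file_id, new_file_id):
    """Copy a file, repoint references to the copy, then verify it."""
    new_path = os.path.join(new_dir, os.path.basename(old_path))
    shutil.copyfile(old_path, new_path)
    for version in object_versions:
        if version["file_id"] == old_file_id:
            version["file_id"] = new_file_id
    # Integrity check: the copy must match the source.
    return md5_of(new_path) == md5_of(old_path)


versions = [
    {"key": "thesis.pdf", "file_id": "old"},
    {"key": "notes.txt", "file_id": "other"},
]
with tempfile.TemporaryDirectory() as old_dir, \
        tempfile.TemporaryDirectory() as new_dir:
    source = os.path.join(old_dir, "data")
    with open(source, "wb") as fp:
        fp.write(b"my file contents")
    ok = migrate(source, new_dir, versions, "old", "new")

# Only the version that referenced the old file is updated.
print(ok, versions[0]["file_id"], versions[1]["file_id"])
```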