API Docs

Files download/upload REST API similar to S3 for Invenio.

class invenio_files_rest.ext.InvenioFilesREST(app=None)[source]

Invenio-Files-REST extension.

Extension initialization.

init_app(app)[source]

Flask application initialization.

init_config(app)[source]

Initialize configuration.

Models

Models for Invenio-Files-REST.

The entities of this module consist of:

  • Buckets - Identified by UUIDs, and contain objects.
  • Bucket tags - Identified uniquely within a bucket by a key. Used to store extra metadata for a bucket.
  • Objects - Identified uniquely within a bucket by string keys. Each object can have multiple object versions (note: Objects do not have their own database table).
  • Object versions - Identified by UUIDs and belong to one specific object in one bucket. Each object version has zero or one file instance. If the object version has no file instance, it is considered a delete marker.
  • File instance - Identified by UUIDs. Represents a physical file on disk. The location of the file is specified via a URI. A file instance can be referenced by many object versions.
  • Locations - A bucket belongs to a specific location. Locations can be used to represent e.g. different storage systems.
  • Multipart Objects - Identified by UUIDs and belong to a specific bucket and key.
  • Part object - Identified by their multipart object and a part number.

The actual file access is handled by a storage interface. Also, objects do not have their own model, but are represented via the ObjectVersion model.

class invenio_files_rest.models.Bucket(**kwargs)[source]

Model for storing buckets.

A bucket is a container of objects. Buckets have a default location and storage class. Individual objects in the bucket can however have different locations and storage classes.

A bucket can be marked as deleted. A bucket can also be marked as locked to prevent operations on the bucket.

Each bucket can also define a quota. The size of a bucket is the size of all objects in the bucket (including all versions).

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

classmethod all()[source]

Return query of all buckets (excluding deleted).

classmethod create(location=None, storage_class=None, **kwargs)[source]

Create a bucket.

Parameters:
  • location – Location of a bucket (instance or name). Default: Default location.
  • storage_class – Storage class of a bucket. Default: Default storage class.
  • **kwargs – Keyword arguments are forwarded to the class constructor.
Returns:

Created bucket.

default_location

Default location.

default_storage_class

Default storage class.

classmethod delete(bucket_id)[source]

Delete a bucket.

Does not actually delete the Bucket, just marks it as deleted.

deleted

Delete state of bucket.

classmethod get(bucket_id)[source]

Get a bucket object (excluding deleted).

Parameters: bucket_id – Bucket identifier.
Returns: Bucket instance.

get_tags()[source]

Get tags for bucket as dictionary.

id

Bucket identifier.

location

Location associated with this bucket.

locked

Lock state of bucket.

Modifications are not allowed on a locked bucket.

max_file_size

Maximum size of a single file in the bucket.

Usage of this property depends on which file size limiters are installed.

quota_left

Get how much space is left in the bucket.

quota_size

Quota size of bucket.

Usage of this property depends on which file size limiters are installed.

remove()[source]

Permanently remove a bucket and all objects (including versions).

Warning

This bypasses the normal versioning and should only be used when you want to permanently delete a bucket and its objects. Otherwise use Bucket.delete().

Note that the method does not remove the associated file instances, which must be garbage collected.

Returns: self.

size

Size of bucket.

This is a computed property which can be rebuilt at any time from the objects inside the bucket.

size_limit

Get size limit for this bucket.

The limit is based on the minimum output of the file size limiters.
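
The sketch below illustrates what "the minimum output of the file size limiters" means. It is a plain-Python model, not the library's code: SizeLimit, BucketInfo, quota_limiter and max_file_size_limiter are hypothetical names, and each limiter is assumed to return either a limit or None when it does not apply.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class SizeLimit:
    """Hypothetical limit value plus a human-readable reason."""
    limit: int
    reason: str


@dataclass
class BucketInfo:
    """Hypothetical stand-in for the bucket attributes a limiter reads."""
    size: int
    quota_size: Optional[int] = None
    max_file_size: Optional[int] = None


Limiter = Callable[[BucketInfo], Optional[SizeLimit]]


def quota_limiter(bucket: BucketInfo) -> Optional[SizeLimit]:
    # Space left in the bucket, only if a quota is configured.
    if bucket.quota_size is None:
        return None
    return SizeLimit(bucket.quota_size - bucket.size, "Bucket quota exceeded.")


def max_file_size_limiter(bucket: BucketInfo) -> Optional[SizeLimit]:
    # Per-file maximum, only if one is configured.
    if bucket.max_file_size is None:
        return None
    return SizeLimit(bucket.max_file_size, "Maximum file size exceeded.")


def size_limit(bucket: BucketInfo, limiters: Iterable[Limiter]) -> Optional[SizeLimit]:
    """Return the smallest limit reported by any limiter, or None if none apply."""
    limits = [lim for lim in (f(bucket) for f in limiters) if lim is not None]
    return min(limits, key=lambda lim: lim.limit, default=None)
```

With a 1000-byte quota, 700 bytes already used and a 500-byte per-file maximum, the quota limiter wins in this model because only 300 bytes are left.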

snapshot(lock=False)[source]

Create a snapshot of the latest objects in the bucket.

Parameters: lock – Create the new bucket in a locked state.
Returns: Newly created bucket containing copies of the ObjectVersions.

sync(bucket, delete_extras=False)[source]

Sync this bucket's ObjectVersions to the destination bucket.

The bucket is fully mirrored with the destination bucket following this logic:

  • same ObjectVersions are not touched
  • new ObjectVersions are added to destination
  • deleted ObjectVersions are deleted in destination
  • extra ObjectVersions in the destination are deleted if the delete_extras parameter is True
Parameters:
  • bucket – The destination bucket.
  • delete_extras – Delete extra ObjectVersions in destination if True.
Returns:

The bucket with an exact copy of ObjectVersions in self.
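
The four rules above can be modeled with plain dictionaries. In this sketch each bucket is reduced to a hypothetical mapping from object key to the file instance id of its head version (None for a delete marker), and plan_sync only returns the actions instead of performing them. It illustrates the listed rules; it is not the method's actual code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical view of a bucket: object key -> file instance id of the head
# version, with None standing for a head that is a delete marker.
Heads = Dict[str, Optional[str]]


def plan_sync(src: Heads, dest: Heads, delete_extras: bool = False) -> List[Tuple[str, str]]:
    """Return the (action, key) pairs needed to mirror src into dest."""
    actions: List[Tuple[str, str]] = []
    for key, file_id in sorted(src.items()):
        if file_id is None:
            # Deleted in the source: delete in the destination if still live there.
            if dest.get(key) is not None:
                actions.append(("delete", key))
        elif dest.get(key) != file_id:
            # New or changed in the source: copy it to the destination.
            actions.append(("copy", key))
        # Otherwise the versions are the same and are left untouched.
    if delete_extras:
        # Keys that exist only in the destination.
        for key in sorted(set(dest) - set(src)):
            if dest[key] is not None:
                actions.append(("delete", key))
    return actions
```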

validate_storage_class(key, default_storage_class)[source]

Validate storage class.

class invenio_files_rest.models.BucketTag(**kwargs)[source]

Model for storing tags associated with buckets.

This is useful to store extra information for a bucket.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

bucket

Relationship to buckets.

classmethod create(bucket, key, value)[source]

Create a new tag for a bucket.

classmethod create_or_update(bucket, key, value)[source]

Create a new tag for a bucket, or update it if it already exists.

classmethod delete(bucket, key)[source]

Delete a tag.

classmethod get(bucket, key)[source]

Get tag object.

classmethod get_value(bucket, key)[source]

Get tag value.

key

Tag key.

value

Tag value.

class invenio_files_rest.models.FileInstance(**kwargs)[source]

Model for storing files.

A file instance represents a file on disk. A file instance may be linked from many object versions, while an object version can have at most one file instance.

A file instance also records the storage class, size and checksum of the file on disk.

Additionally, a file instance can be read-only in case the storage layer is not capable of writing to the file (e.g. this is typically used to link to files on externally controlled storage).

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

checksum

String representing the checksum of the object.

clear_last_check()[source]

Clear the checksum of the file.

copy_contents(fileinstance, progress_callback=None, chunk_size=None, **kwargs)[source]

Copy this file instance into another file instance.

classmethod create()[source]

Create a file instance.

Note that the object is only added to the database session.

delete()[source]

Delete a file instance.

The file instance can be deleted if it has no references from other objects. The caller is responsible for testing whether the file instance is writable and whether the disk file can actually be removed.

Note

Normally you should use the Celery task to delete a file instance, as this method will not remove the file on disk.

classmethod get(file_id)[source]

Get a file instance.

classmethod get_by_uri(uri)[source]

Get a file instance by URI.

id

Identifier of file.

init_contents(size=0, **kwargs)[source]

Initialize file.

last_check

Result of last fixity check.

last_check_at

Timestamp of last fixity check.

readable

Defines if the file is readable.

send_file(filename, restricted=True, mimetype=None, trusted=False, chunk_size=None, as_attachment=False, **kwargs)[source]

Send file to client.

set_contents(stream, chunk_size=None, size=None, size_limit=None, progress_callback=None, **kwargs)[source]

Save contents of stream to this file.

Parameters: stream – File-like stream.

set_uri(uri, size, checksum, readable=True, writable=False, storage_class=None)[source]

Set the location of a file.

size

Size of file.

storage(**kwargs)[source]

Get storage interface for object.

Uses the application's storage factory to create a storage interface that can be used for this particular file instance.

Returns: Storage interface.

storage_class

Storage class of file.

update_checksum(progress_callback=None, chunk_size=None, checksum_kwargs=None, **kwargs)[source]

Update checksum based on file.

update_contents(stream, seek=0, size=None, chunk_size=None, progress_callback=None, **kwargs)[source]

Save contents of stream to this file.

Parameters: stream – File-like stream.

uri

Location of file.

validate_uri(key, uri)[source]

Validate uri.

verify_checksum(progress_callback=None, chunk_size=None, throws=True, checksum_kwargs=None, **kwargs)[source]

Verify checksum of file instance.

Parameters:
  • throws (bool) – If True, exceptions raised during checksum calculation will be re-raised after logging. If set to False, and an exception occurs, the last_check field is set to None (last_check_at of course is updated), since no check actually was performed.
  • checksum_kwargs (dict) – Passed as **kwargs to storage().checksum.

writable

Defines if file is writable.

This property is used to create a file instance prior to having the actual file at the given URI. This is useful when e.g. copying a file instance.

class invenio_files_rest.models.Location(**kwargs)[source]

Model defining base locations.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

classmethod all()[source]

Return query that fetches all locations.

default

True if the location is the default location.

At least one location should be the default location.

classmethod get_by_name(name)[source]

Fetch a specific location object.

classmethod get_default()[source]

Fetch the default location object.

id

Internal identifier for locations.

The internal identifier is only used as a foreign key for buckets in order to decrease storage requirements per row for buckets.

name

External identifier of the location.

uri

URI of the location.

validate_name(key, name)[source]

Validate name.

class invenio_files_rest.models.MultipartObject(**kwargs)[source]

Model for storing files in chunks.

A multipart object belongs to a specific bucket and key and is identified by an upload id. You can have multiple multipart uploads for the same bucket and key. Once all parts of a multipart object are uploaded, the state is changed to completed. Afterwards it is not possible to upload new parts. Once completed, the multipart object is merged and added as a new version in the current object/bucket.

All parts for a multipart upload must be of the same size, except for the last part.
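
Because every part except the last has exactly chunk_size bytes, part sizes and byte offsets follow from size and chunk_size alone. The sketch below assumes part numbers start at 0 and that end offsets are exclusive; these are assumptions for illustration, and the functions are hypothetical helpers rather than the model's own properties.

```python
def part_count(size: int, chunk_size: int) -> int:
    """Number of parts needed for size bytes (ceiling division)."""
    if size <= 0 or chunk_size <= 0:
        raise ValueError("size and chunk_size must be positive")
    return -(-size // chunk_size)


def last_part_number(size: int, chunk_size: int) -> int:
    # Assumes part numbers are 0-based.
    return part_count(size, chunk_size) - 1


def last_part_size(size: int, chunk_size: int) -> int:
    # Whatever remains after the full-size parts (a full chunk if size divides evenly).
    return size - last_part_number(size, chunk_size) * chunk_size


def expected_part_size(part_number: int, size: int, chunk_size: int) -> int:
    last = last_part_number(size, chunk_size)
    if not 0 <= part_number <= last:
        raise ValueError("invalid part number")
    return last_part_size(size, chunk_size) if part_number == last else chunk_size


def byte_range(part_number: int, size: int, chunk_size: int) -> tuple:
    """Return (start, end) offsets of a part; end is exclusive."""
    start = part_number * chunk_size
    return start, start + expected_part_size(part_number, size, chunk_size)
```

For a 25-byte upload with chunk_size=10, this model gives three parts of 10, 10 and 5 bytes, the last one covering offsets 20 to 25.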

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

bucket

Relationship to buckets.

bucket_id

Bucket identifier.

chunk_size

Size of chunks for file.

complete()[source]

Mark a multipart object as complete.

completed

Defines if the multipart object is completed.

classmethod create(bucket, key, size, chunk_size)[source]

Create a new object in a bucket.

delete()[source]

Delete a multipart object.

expected_part_size(part_number)[source]

Get expected part size for a particular part number.

file

Relationship to file instance.

file_id

File instance for this multipart object.

classmethod get(bucket, key, upload_id, with_completed=False)[source]

Fetch a specific multipart object.

static is_valid_chunksize(chunk_size)[source]

Check if size is valid.

static is_valid_size(size, chunk_size)[source]

Validate max theoretical size.

key

Key identifying the object.

last_part_number

Get last part number.

last_part_size

Get size of last part.

merge_parts(version_id=None, **kwargs)[source]

Merge parts into object version.

classmethod query_by_bucket(bucket)[source]

Query all uncompleted multipart uploads.

classmethod query_expired(dt, bucket=None)[source]

Query all uncompleted multipart uploads.

size

Size of file.

upload_id

Identifier of the multipart upload.

validate_key(key, key_)[source]

Validate key.

class invenio_files_rest.models.ObjectVersion(**kwargs)[source]

Model for storing versions of objects.

A bucket stores one or more objects identified by a key. Each object is versioned where each version is represented by an ObjectVersion.

An object version can either be 1) a normal version which is linked to a file instance, or 2) a delete marker, which is not linked to a file instance.

A normal object version is linked to a physical file on disk via a file instance. This allows for multiple object versions to point to the same file on disk, to optimize storage efficiency (e.g. useful for snapshotting an entire bucket without duplicating the files).

A delete marker object version represents that the object at hand was deleted.

The latest version of an object is marked using the is_head property. If the latest object version is a delete marker the object will not be shown in the bucket.
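
The versioning rules above (head versions, delete markers, fetching an older version, restoring it) can be modeled in memory as follows. Version and ToyBucket are hypothetical classes for illustration; in particular, the assumption that restore() appends a new head pointing at the same file is this sketch's choice, not a statement about the library's implementation.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    """Hypothetical stand-in for one ObjectVersion row."""
    key: str
    file_id: Optional[str]  # None means this version is a delete marker
    version_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    is_head: bool = True

    @property
    def deleted(self) -> bool:
        return self.file_id is None


class ToyBucket:
    """Hypothetical in-memory bucket that keeps every version of every key."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Version]] = {}

    def _append(self, key: str, file_id: Optional[str]) -> Version:
        versions = self.history.setdefault(key, [])
        for v in versions:
            v.is_head = False  # only the newest version is the head
        new = Version(key, file_id)
        versions.append(new)
        return new

    def put(self, key: str, file_id: str) -> Version:
        return self._append(key, file_id)

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[Version]:
        """Specific version if version_id is given, else the live head (if any)."""
        for v in self.history.get(key, []):
            if version_id is not None:
                if v.version_id == version_id:
                    return v
            elif v.is_head and not v.deleted:
                return v
        return None

    def delete(self, key: str) -> Optional[Version]:
        # Deleting appends a delete marker; older versions are kept.
        if self.get(key) is None:
            return None
        return self._append(key, None)

    def restore(self, version: Version) -> Version:
        # Assumption: restoring appends a new head pointing at the same file.
        if version.is_head:
            raise ValueError("version is already the latest version")
        return self._append(version.key, version.file_id)

    def keys(self) -> List[str]:
        """Keys shown in the bucket: those whose head is not a delete marker."""
        return sorted(k for k in self.history if self.get(k) is not None)
```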

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

basename

Return filename of the object.

bucket

Relationship to buckets.

bucket_id

Bucket identifier.

copy(bucket=None, key=None)[source]

Copy an object version to a given bucket + object key.

The copy operation is handled completely at the metadata level. The actual data on disk is not copied. Instead, the two object versions will point to the same physical file (via the same FileInstance).

All the tags associated with the current object version are copied over to the new instance.

Warning

If the destination object exists, it will be replaced by the new object version which will become the latest version.

Parameters:
  • bucket – The bucket (instance or id) to copy the object to. Default: current bucket.
  • key – Key name of destination object. Default: current object key.
Returns:

The copied object version.
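
The point that a copy shares the underlying file can be shown with two small hypothetical classes, FileRecord and ObjectRecord, standing in for FileInstance and ObjectVersion. This is a sketch of the described behavior only; database sessions, versioning of the destination key and the tag models are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for a FileInstance (one physical file)."""
    uri: str
    size: int


@dataclass
class ObjectRecord:
    """Hypothetical stand-in for an ObjectVersion and its tags."""
    bucket: str
    key: str
    file: FileRecord
    tags: Dict[str, str] = field(default_factory=dict)

    def copy(self, bucket: Optional[str] = None, key: Optional[str] = None) -> "ObjectRecord":
        # Only metadata is duplicated: the new record references the same
        # FileRecord, so no file content is read or written.
        return ObjectRecord(
            bucket=self.bucket if bucket is None else bucket,
            key=self.key if key is None else key,
            file=self.file,
            tags=dict(self.tags),  # tags are copied, not shared
        )
```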

classmethod create(bucket, key, _file_id=None, stream=None, mimetype=None, version_id=None, **kwargs)[source]

Create a new object in a bucket.

The created object is by default created as a delete marker. You must use set_contents() or set_location() in order to change this.

Parameters:
  • bucket – The bucket (instance or id) to create the object in.
  • key – Key of object.
  • _file_id – For internal use.
  • stream – File-like stream object. Used to set content of object immediately after being created.
  • mimetype – MIME type of the file object if it is known.
  • kwargs – Keyword arguments passed to ObjectVersion.set_contents().

classmethod delete(bucket, key)[source]

Delete an object.

Technically, this works by creating a new version that acts as a delete marker.

Parameters:
  • bucket – The bucket (instance or id) to delete the object from.
  • key – Key of object.
Returns:

Created delete marker object if key exists else None.

deleted

Determine if object version is a delete marker.

file

Relationship to file instance.

file_id

File instance for this object version.

A null value in this column indicates that the object has been deleted.

classmethod get(bucket, key, version_id=None)[source]

Fetch a specific object.

If version_id is not set, the latest object version is returned.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – Key of object.
  • version_id – Specific version of an object.
classmethod get_by_bucket(bucket, versions=False, with_deleted=False)[source]

Return query that fetches all the objects in a bucket.

Parameters:
  • bucket – The bucket (instance or id) to query.
  • versions – Select all versions if True, only heads otherwise.
  • with_deleted – Select also deleted objects if True.
Returns:

The query to retrieve filtered objects in the given bucket.

get_tags()[source]

Get tags for object version as dictionary.

classmethod get_versions(bucket, key, desc=True)[source]

Fetch all versions of a specific object.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – Key of object.
  • desc – Sort results desc if True, asc otherwise.
Returns:

The query to execute to fetch all versions.

is_head

Defines if object is the latest version.

key

Key identifying the object.

mimetype

Get MIME type of object.

Relink all object versions (for a given file) to a new file.

Warning

Use this method with great care.

remove()[source]

Permanently remove a specific object version from the database.

Warning

This bypasses the normal versioning and should only be used when you want to permanently delete a specific object version. Otherwise use ObjectVersion.delete().

Note that the method does not remove the associated file instance, which must be garbage collected.

Returns: self.

restore()[source]

Restore this object version to become the latest version.

Raises an exception if the object is the latest version.

send_file(restricted=True, trusted=False, **kwargs)[source]

Wrapper around FileInstance.send_file().

set_contents(stream, chunk_size=None, size=None, size_limit=None, progress_callback=None)[source]

Save contents of stream to file instance.

If a file instance has already been set, this method raises a FileInstanceAlreadySetError exception.

Parameters:
  • stream – File-like stream.
  • size – Size of stream if known.
  • chunk_size – Desired chunk size to read stream in. It is up to the storage interface if it respects this value.
set_file(fileinstance)[source]

Set a file instance.

set_location(uri, size, checksum, storage_class=None)[source]

Set only the URI location for the object.

Useful to link files on externally controlled storage. If a file instance has already been set, this method raises a FileInstanceAlreadySetError exception.

Parameters:
  • uri – Full URI to object (which can be interpreted by the storage interface).
  • size – Size of file.
  • checksum – Checksum of file.
  • storage_class – Storage class where the file is stored.

validate_key(key, key_)[source]

Validate key.

version_id

Identifier for the specific version of an object.

class invenio_files_rest.models.ObjectVersionTag(**kwargs)[source]

Model for storing tags associated with object versions.

Used for storing extra technical information for an object version.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

copy(object_version=None, key=None)[source]

Copy a tag to a given object version.

Parameters:
  • object_version – The object version instance to copy the tag to. Default: current object version.
  • key – Key of destination tag. Default: current tag key.
Returns:

The copied object version tag.

classmethod create(object_version, key, value)[source]

Create a new tag for a given object version.

classmethod create_or_update(object_version, key, value)[source]

Create a new tag for a given object version, or update it if it already exists.

classmethod delete(object_version, key=None)[source]

Delete tags.

Parameters:
  • object_version – The object version instance or id.
  • key – Key of the tag to delete. Default: delete all tags.
classmethod get(object_version, key)[source]

Get the tag object.

classmethod get_value(object_version, key)[source]

Get the tag value.

key

Tag key.

object_version

Relationship to object versions.

value

Tag value.

version_id

Object version id.

class invenio_files_rest.models.Part(**kwargs)[source]

Part object.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

checksum

String representing the checksum of the part.

classmethod count(mp)[source]

Count number of parts for a given multipart object.

classmethod create(mp, part_number, stream=None, **kwargs)[source]

Create a new part object in a multipart object.

classmethod delete(mp, part_number)[source]

Delete a part.

end_byte

Get end byte in file for this part.

classmethod get_or_create(mp, part_number)[source]

Get or create a part.

classmethod get_or_none(mp, part_number)[source]

Get a part, or None if it does not exist.

multipart

Relationship to multipart objects.

part_number

Part number.

part_size

Get size of this part.

classmethod query_by_multipart(multipart)[source]

Get all parts for a specific multipart upload.

Parameters: multipart – An invenio_files_rest.models.MultipartObject instance.
Returns: A query of invenio_files_rest.models.Part instances.

set_contents(stream, progress_callback=None)[source]

Save contents of stream to part of file instance.

If the MultipartObject is completed, this method raises a MultipartAlreadyCompleted exception.

Parameters:
  • stream – File-like stream.
  • size – Size of stream if known.
  • chunk_size – Desired chunk size to read stream in. It is up to the storage interface if it respects this value.
start_byte

Get start byte in file of this part.

upload_id

Multipart object identifier.

Storage

File storage interface.

class invenio_files_rest.storage.FileStorage(size=None, modified=None)[source]

Base class for storage interface to a single file.

Initialize storage object.

checksum(chunk_size=None, progress_callback=None, **kwargs)[source]

Compute checksum of file.

copy(src, chunk_size=None, progress_callback=None)[source]

Copy data from another file instance.

Parameters:
  • src – Source stream.
  • chunk_size – Chunk size to read from source stream.
delete()[source]

Delete the file.

initialize(size=0)[source]

Initialize the file on the storage and truncate it to the given size.

open(mode=None)[source]

Open the file.

The caller is responsible for closing the file.

save(incoming_stream, size_limit=None, size=None, chunk_size=None, progress_callback=None)[source]

Save incoming stream to file storage.
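
A minimal sketch of the kind of chunked copy the save() signature suggests, assuming that size_limit caps the total number of bytes, that progress_callback receives (bytes written so far, declared size), and that an MD5 checksum is computed on the fly. save_stream and SizeLimitExceeded are hypothetical names; the actual storage backends may behave differently.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Optional, Tuple


class SizeLimitExceeded(Exception):
    """Hypothetical error for this sketch (the library defines its own errors)."""


def save_stream(incoming: BinaryIO, out: BinaryIO, size_limit: Optional[int] = None,
                size: Optional[int] = None, chunk_size: int = 64 * 1024,
                progress_callback: Optional[Callable[[int, Optional[int]], None]] = None,
                ) -> Tuple[int, str]:
    """Copy incoming to out chunk by chunk; return (bytes_written, 'md5:<hex>')."""
    if size is not None and size_limit is not None and size > size_limit:
        # Reject early when the declared size is already too large.
        raise SizeLimitExceeded("declared size exceeds the size limit")
    digest = hashlib.md5()
    written = 0
    while True:
        chunk = incoming.read(chunk_size)
        if not chunk:
            break
        written += len(chunk)
        if size_limit is not None and written > size_limit:
            # The declared size may be missing or wrong, so also check while reading.
            raise SizeLimitExceeded("stream exceeds the size limit")
        out.write(chunk)
        digest.update(chunk)
        if progress_callback is not None:
            progress_callback(written, size)
    if size is not None and written != size:
        raise ValueError("stream ended after %d of %d bytes" % (written, size))
    return written, "md5:" + digest.hexdigest()


# Example: copy 150 bytes in 64-byte chunks and record progress.
progress = []
target = io.BytesIO()
result = save_stream(io.BytesIO(b"x" * 150), target, size=150, chunk_size=64,
                     progress_callback=lambda done, total: progress.append(done))
```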

send_file(filename, mimetype=None, restricted=True, checksum=None, trusted=False, chunk_size=None, as_attachment=False)[source]

Send the file to the client.

update(incoming_stream, seek=0, size=None, chunk_size=None, progress_callback=None)[source]

Update part of file with incoming stream.

invenio_files_rest.storage.pyfs_storage_factory(fileinstance=None, default_location=None, default_storage_class=None, filestorage_class=<class 'invenio_files_rest.storage.pyfs.PyFSFileStorage'>, fileurl=None, size=None, modified=None, clean_dir=True)[source]

Get factory function for creating a PyFS file storage instance.

class invenio_files_rest.storage.PyFSFileStorage(fileurl, size=None, modified=None, clean_dir=True)[source]

File system storage using PyFilesystem to access the file.

This storage class will store files according to the following pattern: <base_uri>/<file instance uuid>/data.

Warning

File operations are not atomic. For example, if an error happens while updating part of a file, the file will be left in an inconsistent state. The storage class tries as best it can to handle errors and leave the system in a consistent state.

Storage initialization.

delete()[source]

Delete a file.

The base directory is also removed, as it is assumed that only one file exists in the directory.

initialize(size=0)[source]

Initialize file on storage and truncate to given size.

open(mode='rb')[source]

Open file.

The caller is responsible for closing the file.

save(incoming_stream, size_limit=None, size=None, chunk_size=None, progress_callback=None)[source]

Save file in the file system.

update(incoming_stream, seek=0, size=None, chunk_size=None, progress_callback=None)[source]

Update a file in the file system.

Signals

Signals for Invenio-Files-REST.

invenio_files_rest.signals.file_deleted = <blinker.base.NamedSignal 'file-deleted'>

File deleted signal.

Sent when a file is deleted.

invenio_files_rest.signals.file_downloaded = <blinker.base.NamedSignal 'file-downloaded'>

File downloaded signal.

Sent when a file is downloaded.

invenio_files_rest.signals.file_uploaded = <blinker.base.NamedSignal 'file-uploaded'>

File uploaded signal.

Sent when a file is uploaded.

File streaming

File serving helpers for Files REST API.

invenio_files_rest.helpers.MIMETYPE_WHITELIST = {'audio/mpeg', 'audio/ogg', 'audio/wav', 'audio/webm', 'image/gif', 'image/jpeg', 'image/png', 'image/tiff', 'text/plain'}

List of whitelisted MIME types.

Warning

Do not add new types to this list unless you know what you are doing. You could potentially open up for XSS attacks.

invenio_files_rest.helpers.chunk_size_or_default(chunk_size)[source]

Use default chunksize if not configured.

invenio_files_rest.helpers.compute_checksum(stream, algo, message_digest, chunk_size=None, progress_callback=None)[source]

Helper method to compute a checksum from a stream.

Parameters:
  • stream – File-like object.
  • algo – Identifier for checksum algorithm.
  • message_digest – A message digest instance.
  • chunk_size – Read at most chunk_size bytes from the file at a time.
  • progress_callback – Function accepting one argument with number of bytes read. (Default: None)
Returns:

The checksum.
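
The checksum helpers above can be approximated with hashlib. This sketch mirrors the documented parameters but is not the library's code: the sketch_* names are hypothetical, the "<algo>:<hexdigest>" return format is an assumption, and the progress callback is assumed to receive the running total of bytes read.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Optional


def sketch_compute_checksum(stream: BinaryIO, algo: str, message_digest,
                            chunk_size: int = 64 * 1024,
                            progress_callback: Optional[Callable[[int], None]] = None) -> str:
    """Return '<algo>:<hexdigest>' over the remaining bytes of stream."""
    bytes_read = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        message_digest.update(chunk)
        bytes_read += len(chunk)
        if progress_callback is not None:
            # Assumption: the callback receives the running total of bytes read.
            progress_callback(bytes_read)
    return "{0}:{1}".format(algo, message_digest.hexdigest())


def sketch_compute_md5_checksum(stream: BinaryIO, **kwargs) -> str:
    return sketch_compute_checksum(stream, "md5", hashlib.md5(), **kwargs)


# Example: checksum of an in-memory stream, read in 2-byte chunks.
progress = []
example = sketch_compute_md5_checksum(io.BytesIO(b"hello"), chunk_size=2,
                                      progress_callback=progress.append)
```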

invenio_files_rest.helpers.compute_md5_checksum(stream, **kwargs)[source]

Helper method to compute an MD5 checksum from a stream.

Parameters: stream – The input stream.
Returns: The MD5 checksum.

invenio_files_rest.helpers.create_file_streaming_redirect_response(obj)[source]

Redirect response generating function.

invenio_files_rest.helpers.make_path(base_uri, path, filename, path_dimensions, split_length)[source]

Generate a path as base location for file instance.

Parameters:
  • base_uri – The base URI.
  • path – The relative path.
  • filename – The file name.
  • path_dimensions – Number of chunks the path should be split into.
  • split_length – The length of any chunk.
Returns:

A string representing the full path.
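
One plausible reading of the parameters above is sketched below: the first path_dimensions chunks of split_length characters each become directories, followed by the remainder of path and then the filename. This is an illustrative reimplementation under that assumption, not code copied from the library; sketch_make_path is a hypothetical name, and posixpath is used so the separator is always "/". Spreading files over a few directory levels presumably keeps any single directory from growing too large.

```python
import posixpath


def sketch_make_path(base_uri: str, path: str, filename: str,
                     path_dimensions: int, split_length: int) -> str:
    """Split path into path_dimensions chunks, then append the rest and filename."""
    if len(path) <= path_dimensions * split_length:
        raise ValueError("path is too short for the requested split")
    parts = [path[i * split_length:(i + 1) * split_length]
             for i in range(path_dimensions)]
    parts.append(path[path_dimensions * split_length:])  # remainder of the path
    parts.append(filename)
    return posixpath.join(base_uri, *parts)
```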

invenio_files_rest.helpers.populate_from_path(bucket, source, checksum=True, key_prefix='', chunk_size=None)[source]

Populate a bucket from all files in path.

Parameters:
  • bucket – The bucket (instance or id) to create the object in.
  • source – The file or directory path.
  • checksum – If True then a MD5 checksum will be computed for each file. (Default: True)
  • key_prefix – The key prefix for the bucket.
  • chunk_size – Chunk size to read from file.
Returns:

An iterator over all invenio_files_rest.models.ObjectVersion instances.

invenio_files_rest.helpers.sanitize_mimetype(mimetype, filename=None)[source]

Sanitize a MIME type so the browser does not render the file.
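
Combined with MIMETYPE_WHITELIST, the sanitizing idea can be sketched as a simple membership test. The fallback to "application/octet-stream" for anything not whitelisted is an assumption of this sketch, and sketch_sanitize_mimetype ignores filename, which the real helper may take into account.

```python
# Copy of the whitelist documented above.
MIMETYPE_WHITELIST = {
    "audio/mpeg", "audio/ogg", "audio/wav", "audio/webm",
    "image/gif", "image/jpeg", "image/png", "image/tiff", "text/plain",
}


def sketch_sanitize_mimetype(mimetype, filename=None):
    """Return mimetype if whitelisted, otherwise a type browsers will not render."""
    # filename is accepted for signature parity but ignored in this sketch.
    if mimetype in MIMETYPE_WHITELIST:
        return mimetype
    # Assumption: a generic binary type makes the browser download, not render.
    return "application/octet-stream"
```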

invenio_files_rest.helpers.send_stream(stream, filename, size, mtime, mimetype=None, restricted=True, as_attachment=False, etag=None, content_md5=None, chunk_size=None, conditional=True, trusted=False)[source]

Send the contents of a file to the client.

Warning

It is very easy to be exposed to Cross-Site Scripting (XSS) attacks if you serve user uploaded files. Here are some recommendations:

  1. Serve user uploaded files from a separate domain (not a subdomain). This way a malicious file can only attack other user uploaded files.
  2. Prevent the browser from rendering and executing HTML files (by setting trusted=False).
  3. Force the browser to download the file as an attachment (as_attachment=True).
Parameters:
  • stream – The file stream to send.
  • filename – The file name.
  • size – The file size.
  • mtime – A Unix timestamp that represents last modified time (UTC).
  • mimetype – The file mimetype. If None, the module will try to guess. (Default: None)
  • restricted – If the file is not restricted, the module will set the cache-control. (Default: True)
  • as_attachment – If the file is an attachment. (Default: False)
  • etag – If defined, it will be set as the HTTP ETag.
  • content_md5 – If defined, a HTTP Content-MD5 header will be set.
  • chunk_size – The chunk size.
  • conditional – Make the response conditional to the request. (Default: True)
  • trusted – Do not enable this option unless you know what you are doing. By default this function will send HTTP headers and MIME types that prevent your browser from rendering, e.g., an HTML file which could contain a malicious script tag. (Default: False)
Returns:

A Flask response instance.

Tasks

Celery tasks for Invenio-Files-REST.

invenio_files_rest.tasks.default_checksum_verification_files_query()[source]

Return a query of valid FileInstances for checksum verification.

(task)invenio_files_rest.tasks.merge_multipartobject(upload_id, version_id=None)[source]

Merge multipart object.

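Parameters:
  • upload_id – The invenio_files_rest.models.MultipartObject upload ID.
  • version_id – Optional version ID for the resulting object version. (Default: None)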
Returns:

The invenio_files_rest.models.ObjectVersion version ID.

(task)invenio_files_rest.tasks.migrate_file(src_id, location_name, post_fixity_check=False)[source]

Task to migrate a file instance to a new location.

Note

If something goes wrong during the content copy, the destination file instance is removed.

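Parameters:
  • src_id – The invenio_files_rest.models.FileInstance ID of the file to migrate.
  • location_name – Name of the location to migrate the file to.
  • post_fixity_check – Verify the checksum after migration. (Default: False)

invenio_files_rest.tasks.progress_updater(size, total)[source]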

Progress reporter for checksum verification.

(task)invenio_files_rest.tasks.remove_expired_multipartobjects[source]

Remove expired multipart objects.

(task)invenio_files_rest.tasks.remove_file_data(file_id, silent=True)[source]

Remove file instance and associated data.

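Parameters:
  • file_id – The invenio_files_rest.models.FileInstance ID.
  • silent – If True, errors during the database removal are suppressed. (Default: True)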
Raises:

sqlalchemy.exc.IntegrityError – Raised if the database removal goes wrong and silent is set to False.

(task)invenio_files_rest.tasks.schedule_checksum_verification(frequency=None, batch_interval=None, max_count=None, max_size=None, files_query=None, checksum_kwargs=None)[source]

Schedule a batch of files for checksum verification.

The purpose of this task is to be periodically called through celerybeat, in order to achieve a repeated verification cycle of all file checksums, while following a set of constraints in order to throttle the execution rate of the checks.

Parameters:
  • frequency (dict) – Time period over which a full check of all files should be performed. The argument is a dictionary that will be passed as arguments to the datetime.timedelta class. Defaults to a month (30 days).
  • batch_interval (dict) – How often a batch is sent. If not supplied, this information will be extracted, if possible, from the celery.conf['CELERYBEAT_SCHEDULE'] entry of this task. The argument is a dictionary that will be passed as arguments to the datetime.timedelta class.
  • max_count (int) – Maximum number of files in a single batch. When set to 0, it is automatically calculated so that files are distributed equally over the total number of batches.
  • max_size (int) – Maximum size of a single batch in bytes. When set to 0, it is automatically calculated so that the data is distributed equally over the total number of batches.
  • files_query (str) – Import path for a function returning a FileInstance query for files that should be checked.
  • checksum_kwargs (dict) – Passed to FileInstance.verify_checksum.
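As a rough worked example, assume the automatic max_count is the total number of files divided by the number of batches per cycle, with that number taken as frequency / batch_interval. The beat entry name "file-checks", the helper `batch_max_count` and all values below are hypothetical. They only mirror the documented parameters.

```python
import math
from datetime import timedelta

# Hypothetical celerybeat entry; name and values are illustrative only.
CELERYBEAT_SCHEDULE = {
    "file-checks": {
        "task": "invenio_files_rest.tasks.schedule_checksum_verification",
        "schedule": timedelta(hours=1),
        "kwargs": {
            "batch_interval": {"hours": 1},
            "frequency": {"days": 30},
            "max_count": 0,  # 0 = derive the batch size automatically
        },
    },
}


def batch_max_count(total_files, frequency, batch_interval):
    """Files per batch so that every file is checked once per frequency."""
    total_batches = int(timedelta(**frequency) / timedelta(**batch_interval))
    return int(math.ceil(total_files / total_batches))


kwargs = CELERYBEAT_SCHEDULE["file-checks"]["kwargs"]
print(batch_max_count(10000, kwargs["frequency"], kwargs["batch_interval"]))  # → 14
```

With a 30-day cycle and hourly batches there are 720 batches, so 10,000 files need at most 14 files per batch. Under the same assumption, max_size would be derived the same way from the total size in bytes.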
(task) invenio_files_rest.tasks.verify_checksum(file_id, pessimistic=False, chunk_size=None, throws=True, checksum_kwargs=None)[source]

Verify checksum of a file instance.

Parameters: file_id – The file ID.
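A checksum verification of this kind re-reads the file in chunks (compare the chunk_size parameter). It can report progress through a callback with the same (size, total) shape as progress_updater. The sketch below hashes an in-memory stream with MD5. `compute_checksum` and `verify` are hypothetical helpers, and the "md5:<hexdigest>" storage format is an assumption.

```python
import hashlib
import io


def compute_checksum(stream, chunk_size=1024 * 1024, progress_callback=None):
    """Hash a seekable stream chunk by chunk, reporting progress as it goes."""
    stream.seek(0, io.SEEK_END)
    total = stream.tell()
    stream.seek(0)
    digest = hashlib.md5()
    done = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        done += len(chunk)
        if progress_callback is not None:
            progress_callback(done, total)  # (size, total), like progress_updater
    # Assumption: checksums are stored as "<algorithm>:<hexdigest>".
    return "md5:" + digest.hexdigest()


def verify(stream, stored_checksum, **kwargs):
    """Return True if the recomputed checksum matches the stored one."""
    return compute_checksum(stream, **kwargs) == stored_checksum


payload = b"x" * 2500
stored = "md5:" + hashlib.md5(payload).hexdigest()
seen = []
ok = verify(io.BytesIO(payload), stored, chunk_size=1000,
            progress_callback=lambda size, total: seen.append((size, total)))
```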

Exceptions

Errors for Invenio-Files-REST.

exception invenio_files_rest.errors.BucketLockedError(errors=None, **kwargs)[source]

Exception raised when a bucket is locked.

Initialize RESTException.

exception invenio_files_rest.errors.DuplicateTagError(errors=None, **kwargs)[source]

Exception raised when the same tag key is specified more than once.

Initialize RESTException.

exception invenio_files_rest.errors.ExhaustedStreamError(errors=None, **kwargs)[source]

The incoming file stream has already been consumed.

Initialize RESTException.

exception invenio_files_rest.errors.FileInstanceAlreadySetError(errors=None, **kwargs)[source]

Exception raised when a file instance is already set on the object.

Initialize RESTException.

exception invenio_files_rest.errors.FileInstanceUnreadableError(errors=None, **kwargs)[source]

Exception raised when trying to get an unreadable file.

Initialize RESTException.

exception invenio_files_rest.errors.FileSizeError(errors=None, **kwargs)[source]

Exception raised when a file is larger than allowed.

Initialize RESTException.

exception invenio_files_rest.errors.FilesException(errors=None, **kwargs)[source]

Base exception for all Invenio-Files-REST errors.

Initialize RESTException.

exception invenio_files_rest.errors.InvalidKeyError(errors=None, **kwargs)[source]

Invalid key.

Initialize RESTException.

exception invenio_files_rest.errors.InvalidOperationError(errors=None, **kwargs)[source]

Exception raised when an invalid operation is performed.

Initialize RESTException.

exception invenio_files_rest.errors.InvalidTagError(errors=None, **kwargs)[source]

Invalid tag key and/or value.

Initialize RESTException.

exception invenio_files_rest.errors.MissingQueryParameter(arg_name, **kwargs)[source]

Exception raised when missing a query parameter.

Initialize RESTException.

get_description(environ=None)[source]

Get the description.

exception invenio_files_rest.errors.MultipartAlreadyCompleted(errors=None, **kwargs)[source]

Exception raised when multipart object is already completed.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartException(errors=None, **kwargs)[source]

Exception for multipart objects.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartInvalidChunkSize(errors=None, **kwargs)[source]

Exception raised when the part size of a multipart upload is invalid.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartInvalidPartNumber(errors=None, **kwargs)[source]

Exception raised when the part number of a multipart upload is invalid.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartInvalidSize(errors=None, **kwargs)[source]

Exception raised when the total size of a multipart upload is invalid.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartMissingParts(errors=None, **kwargs)[source]

Exception raised when not all parts of a multipart upload have been uploaded.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartNoPart(errors=None, **kwargs)[source]

Exception raised by part factories when no part was detected.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartNotCompleted(errors=None, **kwargs)[source]

Exception raised when a multipart object has not been completed yet.

Initialize RESTException.

exception invenio_files_rest.errors.StorageError(errors=None, **kwargs)[source]

Exception raised when a storage operation fails.

Initialize RESTException.

get_errors()[source]

Get errors.

Returns: A string with the error message.

exception invenio_files_rest.errors.UnexpectedFileSizeError(errors=None, **kwargs)[source]

Exception raised when a file does not match its expected size.

Initialize RESTException.

Limiters

File size limiting functionality for Invenio-Files-REST.

class invenio_files_rest.limiters.FileSizeLimit(limit, reason)[source]

File size limiter.

Instantiate a new file size limit.

Parameters:
  • limit – The imposed limit.
  • reason – The limit description.
invenio_files_rest.limiters.file_size_limiters(bucket)[source]

Get default file size limiters.

Parameters: bucket – The invenio_files_rest.models.Bucket instance.
Returns: A list of two invenio_files_rest.limiters.FileSizeLimit instances: one for the bucket's remaining quota and one for the maximum file size, each with its limit value and description.
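An upload has to fit within every limiter that applies, so the effective limit is the smallest one that is actually set. The minimal sketch below assumes that a limit of None means "no limit". A namedtuple stands in for FileSizeLimit. `effective_limit`, `check_upload` and the reason strings are hypothetical.

```python
from collections import namedtuple

# Stand-in for invenio_files_rest.limiters.FileSizeLimit(limit, reason).
FileSizeLimit = namedtuple("FileSizeLimit", ["limit", "reason"])


def effective_limit(limiters):
    """Return the most restrictive limiter, ignoring unset (None) limits."""
    active = [lim for lim in limiters if lim.limit is not None]
    return min(active, key=lambda lim: lim.limit) if active else None


def check_upload(content_length, limiters):
    """Raise ValueError if content_length exceeds the effective limit."""
    lim = effective_limit(limiters)
    if lim is not None and content_length > lim.limit:
        raise ValueError(lim.reason)


limiters = [
    FileSizeLimit(5000, "Bucket quota exceeded."),       # quota left, in bytes
    FileSizeLimit(None, "Maximum file size exceeded."),  # no max file size set
]
check_upload(4000, limiters)  # fits: 4000 <= 5000
```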

Permissions

Permissions for files using Invenio-Access.

invenio_files_rest.permissions.BucketListMultiparts = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-listmultiparts')

Action needed: list multipart uploads in bucket.

invenio_files_rest.permissions.BucketRead = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-read')

Action needed: list objects in bucket.

invenio_files_rest.permissions.BucketReadVersions = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-read-versions')

Action needed: list object versions in bucket.

invenio_files_rest.permissions.BucketUpdate = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-update')

Action needed: create objects and multipart uploads in bucket.

invenio_files_rest.permissions.LocationUpdate = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-location-update')

Action needed: location update.

invenio_files_rest.permissions.MultipartDelete = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-multipart-delete')

Action needed: abort a multipart upload.

invenio_files_rest.permissions.MultipartRead = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-multipart-read')

Action needed: list parts of a multipart upload in a bucket.

invenio_files_rest.permissions.ObjectDelete = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-delete')

Action needed: delete object in bucket.

invenio_files_rest.permissions.ObjectDeleteVersion = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-delete-version')

Action needed: permanently delete specific object version in bucket.

invenio_files_rest.permissions.ObjectRead = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-read')

Action needed: get object in bucket.

invenio_files_rest.permissions.ObjectReadVersion = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-read-version')

Action needed: get object version in bucket.

invenio_files_rest.permissions.bucket_listmultiparts_all = Need(method='action', value='files-rest-bucket-listmultiparts', argument=None)

Action needed: list multipart uploads in all buckets.

invenio_files_rest.permissions.bucket_read_all = Need(method='action', value='files-rest-bucket-read', argument=None)

Action needed: read all buckets.

invenio_files_rest.permissions.bucket_read_versions_all = Need(method='action', value='files-rest-bucket-read-versions', argument=None)

Action needed: read object versions in all buckets.

invenio_files_rest.permissions.bucket_update_all = Need(method='action', value='files-rest-bucket-update', argument=None)

Action needed: update all buckets.

invenio_files_rest.permissions.location_update_all = Need(method='action', value='files-rest-location-update', argument=None)

Action needed: update all locations.

invenio_files_rest.permissions.multipart_delete_all = Need(method='action', value='files-rest-multipart-delete', argument=None)

Action needed: delete all multiparts.

invenio_files_rest.permissions.multipart_read_all = Need(method='action', value='files-rest-multipart-read', argument=None)

Action needed: read all multiparts.

invenio_files_rest.permissions.object_delete_all = Need(method='action', value='files-rest-object-delete', argument=None)

Action needed: delete all objects.

invenio_files_rest.permissions.object_delete_version_all = Need(method='action', value='files-rest-object-delete-version', argument=None)

Action needed: delete all object versions.

invenio_files_rest.permissions.object_read_all = Need(method='action', value='files-rest-object-read', argument=None)

Action needed: read all objects.

invenio_files_rest.permissions.object_read_version_all = Need(method='action', value='files-rest-object-read-version', argument=None)

Action needed: read all object versions.
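The action needs above are nested functools.partial objects over Need. Calling one with a single argument therefore produces a concrete need, and each *_all constant equals the corresponding partial called with None. The sketch below reproduces this with a namedtuple stand-in for Need whose fields match the reprs shown above. It is an illustration, not the Invenio-Access class, and the bucket ID is hypothetical.

```python
from collections import namedtuple
from functools import partial

# Stand-in for invenio_access.permissions.Need (fields as in the reprs above).
Need = namedtuple("Need", ["method", "value", "argument"])

ObjectRead = partial(partial(Need, "action"), "files-rest-object-read")

object_read_all = ObjectRead(None)          # global need, no argument
bucket_id = "hypothetical-bucket-id"
object_read_one = ObjectRead(bucket_id)     # need scoped to one argument

print(object_read_all)
# → Need(method='action', value='files-rest-object-read', argument=None)
```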

invenio_files_rest.permissions.permission_factory(obj, action)[source]

Get default permission factory.

Parameters:
  • obj – The object for which the permission is requested.
  • action – The action to check.
Raises:

RuntimeError – If the object is unknown.

Returns:

An invenio_access.permissions.Permission instance.

Serializers

REST API serializers.

class invenio_files_rest.serializer.BaseSchema(*, only: Union[Sequence[str], Set[str]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Dict[KT, VT] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: str = None)[source]

Base schema for all serializations.

Get base links.

class invenio_files_rest.serializer.BucketSchema(*, only: Union[Sequence[str], Set[str]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Dict[KT, VT] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: str = None)[source]

Schema for bucket.

Dump links.

class invenio_files_rest.serializer.MultipartObjectSchema(*, only: Union[Sequence[str], Set[str]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Dict[KT, VT] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: str = None)[source]

Schema for MultipartObjects.

Dump links.

class invenio_files_rest.serializer.ObjectVersionSchema(*, only: Union[Sequence[str], Set[str]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Dict[KT, VT] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: str = None)[source]

Schema for ObjectVersions.

Dump links.

dump_tags(o)[source]

Dump tags.

wrap(data, many)[source]

Wrap response in envelope.

class invenio_files_rest.serializer.PartSchema(*, only: Union[Sequence[str], Set[str]] = None, exclude: Union[Sequence[str], Set[str]] = (), many: bool = False, context: Dict[KT, VT] = None, load_only: Union[Sequence[str], Set[str]] = (), dump_only: Union[Sequence[str], Set[str]] = (), partial: Union[bool, Sequence[str], Set[str]] = False, unknown: str = None)[source]

Schema for parts.

wrap(data, many)[source]

Wrap response in envelope.

invenio_files_rest.serializer.json_serializer(data=None, code=200, headers=None, context=None, etag=None, task_result=None, serializer_mapping={<class 'invenio_files_rest.models.Bucket'>: <class 'invenio_files_rest.serializer.BucketSchema'>, <class 'invenio_files_rest.models.ObjectVersion'>: <class 'invenio_files_rest.serializer.ObjectVersionSchema'>, <class 'invenio_files_rest.models.MultipartObject'>: <class 'invenio_files_rest.serializer.MultipartObjectSchema'>, <class 'invenio_files_rest.models.Part'>: <class 'invenio_files_rest.serializer.PartSchema'>}, view_name=None)[source]

Build a JSON Flask response from the given data.

Parameters:
  • data – The data to serialize. (Default: None)
  • code – The HTTP status code. (Default: 200)
  • headers – The HTTP headers to include. (Default: None)
  • context – The schema class context. (Default: None)
  • etag – The ETag header. (Default: None)
  • task_result – An optional async task to wait for. (Default: None)
  • serializer_mapping – An optional mapping from model classes to schema classes that replaces the default one.
  • view_name – An optional view name prefix pushed to the marshmallow context; useful when multiple routes point to the same blueprint. (Default: None)
Returns:

A Flask response with json data.

Return type:

flask.Response

invenio_files_rest.serializer.schema_from_context(context, serializer_mapping)[source]

Determine which schema to use.

invenio_files_rest.serializer.wait_for_taskresult(task_result, content, interval, max_rounds)[source]

Helper to wait for an async task result to finish.

While waiting, the helper periodically sends whitespace to prevent the connection from being closed.

Parameters:
  • task_result – The async task to wait for.
  • content – The content to return when the task is ready.
  • interval – The sleep duration between two checks of whether the task is ready.
  • max_rounds – The maximum number of intervals to wait before returning an exception.
Returns:

An iterator over the content, or an invenio_files_rest.errors.FilesException if the task timed out or failed.
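The whitespace keep-alive pattern can be sketched as a generator. It yields a single space after each sleep interval while the task is not ready, then yields the content. `FakeTask` is a hypothetical stand-in exposing the ready()/successful() methods of a Celery AsyncResult. Where the documented helper reports a FilesException, the sketch raises a RuntimeError instead.

```python
import time


class FakeTask:
    """Hypothetical stand-in for a Celery AsyncResult."""

    def __init__(self, ready_after, succeeds=True):
        self.polls = 0
        self.ready_after = ready_after
        self.succeeds = succeeds

    def ready(self):
        self.polls += 1
        return self.polls > self.ready_after

    def successful(self):
        return self.succeeds


def wait_with_whitespace(task_result, content, interval, max_rounds):
    """Yield b" " while the task runs, then the content."""
    for _ in range(max_rounds):
        if task_result.ready():
            if not task_result.successful():
                raise RuntimeError("Job failed.")  # stand-in for FilesException
            yield content
            return
        time.sleep(interval)
        yield b" "  # whitespace keeps the connection from being closed
    raise RuntimeError("Job timed out.")  # stand-in for FilesException


body = b"".join(
    wait_with_whitespace(FakeTask(ready_after=2), b'{"ok": true}', 0.01, 5)
)
```

Leading whitespace is allowed before a JSON document, so clients that parse the final body are not affected by the padding.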

Views

Files download/upload REST API similar to S3 for Invenio.

class invenio_files_rest.views.BucketResource(*args, **kwargs)[source]

Bucket item resource.

Instantiate content negotiated view.

get(bucket=None, versions=<marshmallow.missing>, uploads=<marshmallow.missing>)[source]

Get list of objects in the bucket.

Parameters: bucket – An invenio_files_rest.models.Bucket instance.
Returns: The Flask response.

head(bucket=None, **kwargs)[source]

Check the existence of the bucket.

listobjects(bucket, versions)[source]

List objects in a bucket.

Parameters: bucket – An invenio_files_rest.models.Bucket instance.
Returns: The Flask response.

multipart_listuploads(bucket)[source]

List multipart uploads in a bucket.

Parameters: bucket – An invenio_files_rest.models.Bucket instance.
Returns: The Flask response.

class invenio_files_rest.views.LocationResource(*args, **kwargs)[source]

Service resource.

Instantiate content negotiated view.

post()[source]

Create bucket.

class invenio_files_rest.views.ObjectResource(*args, **kwargs)[source]

Object item resource.

Instantiate content negotiated view.

static check_object_permission(obj)[source]

Check that the current user is allowed to read the object, and abort if not.

create_object(bucket, key)[source]

Create a new object.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – The file key.
Returns:

A Flask response.

delete(bucket=None, key=None, version_id=None, upload_id=None, uploads=None)[source]

Delete an object or abort a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • version_id – The version ID. (Default: None)
  • upload_id – The upload ID. (Default: None)
Returns:

A Flask response.

delete_object(bucket, obj, version_id)[source]

Delete an existing object.

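Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • obj – An invenio_files_rest.models.ObjectVersion instance.
  • version_id – The version ID.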
Returns:

A Flask response.

get(bucket=None, key=None, version_id=None, upload_id=None, uploads=None, download=None)[source]

Get object or list parts of a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • version_id – The version ID. (Default: None)
  • upload_id – The upload ID. (Default: None)
  • download – The download flag. (Default: None)
Returns:

A Flask response.

classmethod get_object(bucket, key, version_id)[source]

Retrieve object and abort if it doesn’t exist.

If the file is not found, the connection is aborted and a 404 error is returned.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – The file key.
  • version_id – The version ID.
Returns:

An invenio_files_rest.models.ObjectVersion instance.

multipart_complete(multipart)[source]

Complete a multipart upload.

Parameters: multipart – An invenio_files_rest.models.MultipartObject instance.
Returns: A Flask response.

multipart_delete(multipart)[source]

Abort a multipart upload.

Parameters: multipart – An invenio_files_rest.models.MultipartObject instance.
Returns: A Flask response.

multipart_init(bucket, key, size=None, part_size=None)[source]

Initialize a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – The file key.
  • size – The total size.
  • part_size – The part size.
Raises:

invenio_files_rest.errors.MissingQueryParameter – If size or part_size are not defined.

Returns:

A Flask response.
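Given size and part_size, the number of parts follows from a ceiling division. The sketch assumes that every part except the last is exactly part_size bytes. `plan_parts` is a hypothetical helper. It does not model the library's own bounds on part sizes or part counts, nor whether part numbers start at 0 or 1.

```python
def plan_parts(size, part_size):
    """Return (number_of_parts, last_part_size) for a multipart upload."""
    if size <= 0 or part_size <= 0:
        raise ValueError("size and part_size must be positive")
    count = (size + part_size - 1) // part_size  # ceiling division
    last_part_size = size - (count - 1) * part_size
    return count, last_part_size


MiB = 1024 * 1024
print(plan_parts(25 * MiB, 10 * MiB))  # → (3, 5242880)
```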

multipart_listparts(multipart)[source]

Get parts of a multipart upload.

Parameters: multipart – An invenio_files_rest.models.MultipartObject instance.
Returns: A Flask response.

multipart_uploadpart(multipart)[source]

Upload a part.

Parameters: multipart – An invenio_files_rest.models.MultipartObject instance.
Returns: A Flask response.

post(bucket=None, key=None, uploads=<marshmallow.missing>, upload_id=None)[source]

Upload a new object or start/complete a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • upload_id – The upload ID. (Default: None)
Returns:

A Flask response.

put(bucket=None, key=None, upload_id=None)[source]

Create or update an object, or upload a part of a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • upload_id – The upload ID. (Default: None)
Returns:

A Flask response.

static send_object(bucket, obj, expected_chksum=None, logger_data=None, restricted=True, as_attachment=False)[source]

Send an object for a given bucket.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • obj – An invenio_files_rest.models.ObjectVersion instance.
  • expected_chksum – The expected checksum.
  • logger_data – The python logger.
  • kwargs – Keyword arguments passed to Object.send_file()
Returns:

A Flask response.

invenio_files_rest.views.as_uuid(value)[source]

Convert value to UUID.

invenio_files_rest.views.bucket_view(*args, **kwargs)

Bucket item resource.

invenio_files_rest.views.check_permission(permission, hidden=True)[source]

Check if permission is allowed.

If permission fails then the connection is aborted.

Parameters:
  • permission – The permission to check.
  • hidden – Determine if a 404 error (True) or 401/403 error (False) should be returned if the permission is rejected (i.e. hide or reveal the existence of a particular object).
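The hidden flag only decides which status code a rejected check produces. The sketch assumes the usual convention: 401 for anonymous requests and 403 for authenticated users without permission. `denied_status` is a hypothetical helper; the view itself aborts the request rather than returning a code.

```python
def denied_status(hidden, is_authenticated):
    """HTTP status for a rejected permission check (sketch)."""
    if hidden:
        return 404  # pretend the resource does not exist
    # Reveal the refusal: 401 asks anonymous users to log in,
    # 403 tells authenticated users they lack permission.
    return 403 if is_authenticated else 401
```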
invenio_files_rest.views.default_partfactory(part_number=None, content_length=None, content_type=None, content_md5=None)[source]

Get default part factory.

Parameters:
  • part_number – The part number. (Default: None)
  • content_length – The content length. (Default: None)
  • content_type – The HTTP Content-Type. (Default: None)
  • content_md5 – The content MD5. (Default: None)
Returns:

The content length, the part number, the stream, the content type, MD5 of the content.

invenio_files_rest.views.ensure_input_stream_is_not_exhausted(f)[source]

Make sure that the input stream has not been read already.

invenio_files_rest.views.invalid_subresource_validator(value)[source]

Ensure subresource.

invenio_files_rest.views.location_view(*args, **kwargs)

Service resource.

invenio_files_rest.views.minsize_validator(value)[source]

Validate Content-Length header.

Raises: invenio_files_rest.errors.FileSizeError – If the value is less than invenio_files_rest.config.FILES_REST_MIN_FILE_SIZE.

invenio_files_rest.views.need_bucket_permission(action, hidden=True)

Get permission for buckets or abort.

Parameters:
  • object_getter – The function used to retrieve the object and pass it to the permission factory.
  • action – The action needed.
  • hidden – Determine which kind of error to return. (Default: True)
invenio_files_rest.views.need_location_permission(action, hidden=True)

Get permission for locations or abort.

Parameters:
  • object_getter – The function used to retrieve the object and pass it to the permission factory.
  • action – The action needed.
  • hidden – Determine which kind of error to return. (Default: True)
invenio_files_rest.views.need_permissions(object_getter, action, hidden=True)[source]

Get permission for buckets or abort.

Parameters:
  • object_getter – The function used to retrieve the object and pass it to the permission factory.
  • action – The action needed.
  • hidden – Determine which kind of error to return. (Default: True)
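need_permissions works as a decorator factory. It fetches the object with object_getter, checks the permission for action, and aborts if the check fails. need_bucket_permission and need_location_permission take only action and hidden, which suggests they are need_permissions with object_getter already bound. The sketch below mimics that flow without Flask. `Abort`, the injected `can()` check and the "bucket-read" action name are hypothetical.

```python
from functools import partial, wraps


class Abort(Exception):
    """Hypothetical stand-in for flask.abort(): carries an HTTP status."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


def need_permissions(object_getter, action, can, hidden=True):
    """Decorator factory: require can(obj, action) before calling the view."""

    def decorator(view):
        @wraps(view)
        def wrapper(*args, **kwargs):
            obj = object_getter(*args, **kwargs)
            if not can(obj, action):
                # Simplified: 404 hides the object, 403 reveals the refusal.
                raise Abort(404 if hidden else 403)
            return view(*args, **kwargs)

        return wrapper

    return decorator


def can(obj, action):
    """Hypothetical policy: only the 'public' bucket may be read."""
    return action == "bucket-read" and obj == "public"


# Bind object_getter in advance, as need_bucket_permission appears to do.
need_bucket = partial(need_permissions, lambda *args, **kwargs: kwargs.get("bucket"))


@need_bucket("bucket-read", can)
def list_objects(bucket=None):
    return "objects in " + bucket
```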
invenio_files_rest.views.ngfileupload_partfactory(part_number=None, content_length=None, uploaded_file=None)[source]

Part factory for ng-file-upload.

Parameters:
  • part_number – The part number. (Default: None)
  • content_length – The content length. (Default: None)
  • uploaded_file – The upload request. (Default: None)
Returns:

The content length, part number, stream, HTTP Content-Type header.

invenio_files_rest.views.ngfileupload_uploadfactory(content_length=None, content_type=None, uploaded_file=None)[source]

Put factory for ng-file-upload.

If Content-Type is not 'multipart/form-data', the request is aborted.

Parameters:
  • content_length – The content length. (Default: None)
  • content_type – The HTTP Content-Type. (Default: None)
  • uploaded_file – The upload request. (Default: None)
  • file_tags_header – The file tags. (Default: None)
Returns:

A tuple containing stream, content length, and empty header.

invenio_files_rest.views.object_view(*args, **kwargs)

Object item resource.

invenio_files_rest.views.parse_header_tags()[source]

Parse tags specified in the HTTP request header.

invenio_files_rest.views.pass_bucket(f)[source]

Decorate to retrieve a bucket.

invenio_files_rest.views.pass_multipart(with_completed=False)[source]

Decorate to retrieve a multipart object.

invenio_files_rest.views.stream_uploadfactory(content_md5=None, content_length=None, content_type=None)[source]

Get default put factory.

If Content-Type is 'multipart/form-data' then the stream is aborted.

Parameters:
  • content_md5 – The content MD5. (Default: None)
  • content_length – The content length. (Default: None)
  • content_type – The HTTP Content-Type. (Default: None)
Returns:

The stream, content length, MD5 of the content.

invenio_files_rest.views.validate_tag(key, value)[source]

Validate a tag.

Keys must be less than 128 chars and values must be less than 256 chars.
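The length rule can be combined with the tag-header parsing done by parse_header_tags. This sketch assumes tags arrive as a URL-encoded key=value list, as in a query string. That wire format and the helper names `tag_is_valid` and `parse_tags` are illustrative, not the library's API. ValueError stands in for InvalidTagError and DuplicateTagError.

```python
from urllib.parse import parse_qsl


def tag_is_valid(key, value):
    """Documented rule: keys under 128 chars, values under 256 chars."""
    return len(key) < 128 and len(value) < 256


def parse_tags(header_value):
    """Parse 'k1=v1&k2=v2' into a dict, rejecting duplicates and bad tags."""
    tags = {}
    for key, value in parse_qsl(header_value):
        if key in tags:
            raise ValueError("duplicate tag key: " + key)  # cf. DuplicateTagError
        if not tag_is_valid(key, value):
            raise ValueError("invalid tag: " + key)  # cf. InvalidTagError
        tags[key] = value
    return tags


print(parse_tags("license=CC-BY-4.0&lang=en"))
# → {'license': 'CC-BY-4.0', 'lang': 'en'}
```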

Form parser

Werkzeug form data parser customization.

class invenio_files_rest.formparser.FormDataParser(stream_factory=None, charset='utf-8', errors='replace', max_form_memory_size=None, max_content_length=None, cls=None, silent=True)[source]

Custom form data parser.

parse(stream, mimetype, content_length, options=None)[source]

Parse the information from the given request.

Parameters:
  • stream – An input stream.
  • mimetype – The mimetype of the data.
  • content_length – The content length of the incoming data.
  • options – Optional mimetype parameters (used for the multipart boundary for instance).
Returns:

A tuple in the form (stream, form, files).