API Docs

Files download/upload REST API similar to S3 for Invenio.

class invenio_files_rest.ext.InvenioFilesREST(app=None)[source]

Invenio-Files-REST extension.

Extension initialization.

init_app(app)[source]

Flask application initialization.

init_config(app)[source]

Initialize configuration.

Models

Models for Invenio-Files-REST.

The entities of this module consist of:

  • Buckets - Identified by UUIDs, and contain objects.
  • Bucket tags - Identified uniquely within a bucket by a key. Used to store extra metadata for a bucket.
  • Objects - Identified uniquely within a bucket by string keys. Each object can have multiple object versions (note: Objects do not have their own database table).
  • Object versions - Identified by UUIDs and belong to one specific object in one bucket. Each object version has zero or one file instance. If the object version has no file instance, it is considered a delete marker.
  • File instances - Identified by UUIDs. Each represents a physical file on disk. The location of the file is specified via a URI. A file instance can have many object versions.
  • Locations - A bucket belongs to a specific location. Locations can be used to represent e.g. different storage systems and/or geographical locations.
  • Multipart objects - Identified by UUIDs and belong to a specific bucket and key.
  • Part objects - Identified by their multipart object and a part number.

The actual file access is handled by a storage interface. Also, objects do not have their own model, but are represented via the ObjectVersion model.

class invenio_files_rest.models.Bucket(**kwargs)[source]

Model for storing buckets.

A bucket is a container of objects. Buckets have a default location and storage class. Individual objects in the bucket can however have different locations and storage classes.

A bucket can be marked as deleted. A bucket can also be marked as locked to prevent operations on the bucket.

Each bucket can also define a quota. The size of a bucket is the size of all objects in the bucket (including all versions).

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

classmethod all()[source]

Return query of all buckets (excluding deleted).

classmethod create(location=None, storage_class=None, **kwargs)[source]

Create a bucket.

Parameters:
  • location – Location of bucket (instance or name). Default: Default location.
  • storage_class – Storage class of bucket. Default: Default storage class.
  • **kwargs – Keyword arguments are forwarded to the class constructor.
Returns:

Created bucket.

default_location

Default location.

default_storage_class

Default storage class.

classmethod delete(bucket_id)[source]

Delete a bucket.

Does not actually delete the Bucket, just marks it as deleted.

deleted

Delete state of bucket.

classmethod get(bucket_id)[source]

Get bucket object (excluding deleted).

Parameters: bucket_id – Bucket identifier.
Returns: Bucket instance.

get_tags()[source]

Get tags for bucket as dictionary.

id

Bucket identifier.

location

Location associated with this bucket.

locked

Lock state of bucket.

Modifications are not allowed on a locked bucket.

max_file_size

Maximum size of a single file in the bucket.

Usage of this property depends on which file size limiters are installed.

quota_left

Get how much space is left in the bucket.

quota_size

Quota size of bucket.

Usage of this property depends on which file size limiters are installed.

remove(*args, **kwargs)[source]

Permanently remove a bucket and all objects (including versions).

Warning

This by-passes the normal versioning and should only be used when you want to permanently delete a bucket and its objects. Otherwise use Bucket.delete().

Note the method does not remove the associated file instances which must be garbage collected.

Returns: self.

size

Size of bucket.

This is a computed property which can be rebuilt at any time from the objects inside the bucket.

size_limit

Get size limit for this bucket.

The limit is based on the minimum output of the file size limiters.

snapshot(*args, **kwargs)[source]

Create a snapshot of latest objects in bucket.

Parameters: lock – Create the new bucket in a locked state.
Returns: Newly created bucket with the snapshot.

validate_storage_class(key, default_storage_class)[source]

Validate storage class.

class invenio_files_rest.models.FileInstance(**kwargs)[source]

Model for storing files.

A file instance represents a file on disk. A file instance may be linked from many objects, while an object can have one and only one file instance.

A file instance also records the storage class, size and checksum of the file on disk.

Additionally, a file instance can be read-only when the storage layer is not capable of writing to the file (this is typically used to link to files on externally controlled storage).

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

checksum

String representing the checksum of the object.

clear_last_check()[source]

Clear the checksum of the file.

copy_contents(*args, **kwargs)[source]

Copy this file instance into another file instance.

classmethod create()[source]

Create a file instance.

Note, object is only added to the database session.

delete()[source]

Delete a file instance.

The file instance can be deleted if it has no references from other objects. The caller is responsible to test if the file instance is writable and that the disk file can actually be removed.

Note

Normally you should use the Celery task to delete a file instance, as this method will not remove the file on disk.

classmethod get(file_id)[source]

Get a file instance.

classmethod get_by_uri(uri)[source]

Get a file instance by URI.

id

Identifier of file.

init_contents(*args, **kwargs)[source]

Initialize file.

last_check

Result of last fixity check.

last_check_at

Timestamp of last fixity check.

readable

Defines if the file can be read.

send_file(*args, **kwargs)[source]

Send file to client.

set_contents(*args, **kwargs)[source]

Save contents of stream to this file.

Parameters:
  • obj – ObjectVersion instance from which this file is accessed.
  • stream – File-like stream.

set_uri(uri, size, checksum, readable=True, writable=False, storage_class=None)[source]

Set a location of a file.

size

Size of file.

storage(**kwargs)[source]

Get storage interface for object.

Uses the application's storage factory to create a storage interface that can be used for this particular file instance.

Returns: Storage interface.

storage_class

Storage class of file.

update_checksum(*args, **kwargs)[source]

Update checksum based on file.

update_contents(*args, **kwargs)[source]

Save contents of stream to this file.

Parameters:
  • obj – ObjectVersion instance from which this file is accessed.
  • stream – File-like stream.

uri

Location of file.

validate_uri(key, uri)[source]

Validate uri.

verify_checksum(progress_callback=None, throws=True, **kwargs)[source]

Verify checksum of file instance.

Parameters: throws (bool) – If True, exceptions raised during checksum calculation will be re-raised after logging. If set to False and an exception occurs, the last_check field is set to None (last_check_at is still updated), since no check was actually performed.

writable

Defines if file is writable.

This property is used to create a file instance prior to having the actual file at the given URI. This is useful when e.g. copying a file instance.

class invenio_files_rest.models.Location(**kwargs)[source]

Model defining base locations.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

classmethod all()[source]

Return query that fetches all locations.

default

True if the location is the default location.

At least one location should be the default location.

classmethod get_by_name(name)[source]

Fetch a specific location object.

classmethod get_default()[source]

Fetch the default location object.

id

Internal identifier for locations.

The internal identifier is used only as a foreign key for buckets in order to decrease storage requirements per row for buckets.

name

External identifier of the location.

uri

URI of the location.

validate_name(key, name)[source]

Validate name.

class invenio_files_rest.models.MultipartObject(**kwargs)[source]

Model for storing files in chunks.

A multipart object belongs to a specific bucket and key and is identified by an upload id. You can have multiple multipart uploads for the same bucket and key. Once all parts of a multipart object are uploaded, the state is changed to completed. Afterwards it is not possible to upload new parts. Once completed, the multipart object is merged and added as a new version in the current object/bucket.

All parts for a multipart upload must be of the same size, except for the last part.
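
The part layout described above can be worked through with plain arithmetic. The following is a minimal sketch, not the library's implementation: the helper names and the zero-based part numbering are assumptions made for illustration.

```python
def last_part_number(size, chunk_size):
    """Index of the final part, assuming parts are numbered from 0."""
    return (size - 1) // chunk_size


def expected_part_size(size, chunk_size, part_number):
    """Size in bytes that a given part must have."""
    last = last_part_number(size, chunk_size)
    if part_number < 0 or part_number > last:
        raise ValueError("invalid part number")
    if part_number < last:
        return chunk_size
    # The last part holds whatever remains after the full-size parts.
    return size - last * chunk_size


def byte_range(size, chunk_size, part_number):
    """Half-open (start, end) byte offsets of a part within the file."""
    start = part_number * chunk_size
    return start, start + expected_part_size(size, chunk_size, part_number)


# A 12 MiB upload with 5 MiB chunks: two full parts and one 2 MiB tail.
MiB = 1024 * 1024
sizes = [expected_part_size(12 * MiB, 5 * MiB, n) for n in range(3)]
```

Only the final part may be shorter than the chunk size, which is why the sketch derives every size from the total size and the chunk size alone.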

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

bucket

Relationship to buckets.

bucket_id

Bucket identifier.

chunk_size

Size of chunks for file.

complete(*args, **kwargs)[source]

Mark a multipart object as complete.

completed

Defines if the object is completed.

classmethod create(bucket, key, size, chunk_size)[source]

Create a new object in a bucket.

delete()[source]

Delete a multipart object.

expected_part_size(part_number)[source]

Get expected part size for a particular part number.

file

Relationship to file instance.

file_id

File instance for this multipart object.

classmethod get(bucket, key, upload_id, with_completed=False)[source]

Fetch a specific multipart object.

static is_valid_chunksize(chunk_size)[source]

Check if size is valid.

static is_valid_size(size, chunk_size)[source]

Validate max theoretical size.

key

Key identifying the object.

last_part_number

Get last part number.

last_part_size

Get size of last part.

merge_parts(*args, **kwargs)[source]

Merge parts into object version.

classmethod query_by_bucket(bucket)[source]

Query all uncompleted multipart uploads.

classmethod query_expired(dt, bucket=None)[source]

Query all uncompleted multipart uploads that have expired relative to dt, optionally restricted to a bucket.

size

Size of file.

upload_id

Identifier for the multipart upload.

validate_key(key, key_)[source]

Validate key.

class invenio_files_rest.models.ObjectVersion(**kwargs)[source]

Model for storing versions of objects.

A bucket stores one or more objects identified by a key. Each object is versioned where each version is represented by an ObjectVersion.

An object version can either be 1) a normal version which is linked to a file instance, or 2) a delete marker, which is not linked to a file instance.

A normal object version is linked to a physical file on disk via a file instance. This allows multiple object versions to point to the same file on disk, to optimize storage efficiency (e.g. useful for snapshotting an entire bucket without duplicating the files).

A delete marker object version represents that the object at hand was deleted.

The latest version of an object is marked using the is_head property. If the latest object version is a delete marker the object will not be shown in the bucket.
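
The versioning rules above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the ObjectVersion model itself: `Version` and `ToyBucket` are hypothetical stand-ins, and a `file_id` of `None` plays the role of a delete marker.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    key: str
    file_id: Optional[str]  # None marks a delete marker
    is_head: bool = True

    @property
    def deleted(self):
        return self.file_id is None


class ToyBucket:
    """In-memory stand-in for a bucket; not the real model."""

    def __init__(self):
        self.versions: List[Version] = []

    def _add(self, key, file_id):
        # Only one version per key may be the head.
        for version in self.versions:
            if version.key == key:
                version.is_head = False
        new_version = Version(key, file_id)
        self.versions.append(new_version)
        return new_version

    def put(self, key, file_id):
        return self._add(key, file_id)

    def delete(self, key):
        # Deleting appends a delete marker instead of removing history.
        return self._add(key, None)

    def listing(self):
        # Keys whose head is a delete marker are hidden.
        return [v.key for v in self.versions if v.is_head and not v.deleted]


bucket = ToyBucket()
bucket.put("report.pdf", "file-1")
bucket.put("report.pdf", "file-2")
bucket.put("notes.txt", "file-3")
bucket.delete("notes.txt")
```

After these calls the toy bucket lists only `report.pdf`, while all four versions, including the delete marker for `notes.txt`, remain in its history.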

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

basename

Return filename of the object.

bucket

Relationship to buckets.

bucket_id

Bucket identifier.

copy(*args, **kwargs)[source]

Copy an object version to a given bucket + object key.

The copy operation is handled completely at the metadata level. The actual data on disk is not copied. Instead, the two object versions will point to the same physical file (via the same FileInstance).

Warning

If the destination object exists, it will be replaced by the new object version which will become the latest version.

Parameters:
  • bucket – The bucket (instance or id) to copy the object to. Default: current bucket.
  • key – Key name of destination object. Default: current object key.
Returns:

The copied object version.
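
As a rough illustration of a metadata-level copy, the sketch below creates a second version record that reuses the same file reference. `ToyVersion` and `copy_version` are hypothetical names; the real method operates on database models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyVersion:
    bucket: str
    key: str
    file_id: str  # reference to the physical file, not the bytes themselves


def copy_version(version, bucket=None, key=None):
    """Return a new version record that reuses the source's file reference."""
    return ToyVersion(
        bucket=bucket if bucket is not None else version.bucket,
        key=key if key is not None else version.key,
        file_id=version.file_id,
    )


original = ToyVersion("bucket-a", "data.csv", "file-42")
duplicate = copy_version(original, bucket="bucket-b")
```

No file content is touched: both records point at the same `file_id`, mirroring how two object versions can share one FileInstance.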

classmethod create(bucket, key, _file_id=None, stream=None, mimetype=None, version_id=None, **kwargs)[source]

Create a new object in a bucket.

The created object is by default created as a delete marker. You must use set_contents() or set_location() in order to change this.

Parameters:
  • bucket – The bucket (instance or id) to create the object in.
  • key – Key of object.
  • _file_id – For internal use.
  • stream – File-like stream object. Used to set content of object immediately after being created.
  • mimetype – MIME type of the file object if it is known.
  • kwargs – Keyword arguments passed to ObjectVersion.set_contents().

classmethod delete(bucket, key)[source]

Delete an object.

Technically works by creating a new version which works as a delete marker.

Parameters:
  • bucket – The bucket (instance or id) to delete the object from.
  • key – Key of object.
  • version_id – Specific version to delete.
Returns:

Created delete marker object if key exists else None.

deleted

Determine if object version is a delete marker.

file

Relationship to file instance.

file_id

File instance for this object version.

A null value in this column defines that the object has been deleted.

classmethod get(bucket, key, version_id=None)[source]

Fetch a specific object.

By default the latest object version is returned, if version_id is not set.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – Key of object.
  • version_id – Specific version of an object.

classmethod get_by_bucket(bucket, versions=False)[source]

Return query that fetches all the objects in a bucket.

classmethod get_versions(bucket, key)[source]

Fetch all versions of a specific object.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – Key of object.

is_head

Defines if object is the latest version.

key

Key identifying the object.

mimetype

Get MIME type of object.

Relink all object versions (for a given file) to a new file.

Warning

Use this method with great care.

remove(*args, **kwargs)[source]

Permanently remove a specific object version from the database.

Warning

This by-passes the normal versioning and should only be used when you want to permanently delete a specific object version. Otherwise use ObjectVersion.delete().

Note the method does not remove the associated file instance which must be garbage collected.

Returns: self.

restore(*args, **kwargs)[source]

Restore this object version to become the latest version.

Raises an exception if the object is the latest version.

send_file(restricted=True, trusted=False, **kwargs)[source]

Wrap around FileInstance’s send file.

set_contents(*args, **kwargs)[source]

Save contents of stream to file instance.

If a file instance has already been set, this method raises a FileInstanceAlreadySetError exception.

Parameters:
  • stream – File-like stream.
  • size – Size of stream if known.
  • chunk_size – Desired chunk size to read stream in. It is up to the storage interface if it respects this value.

set_file(*args, **kwargs)[source]

Set a file instance.

set_location(*args, **kwargs)[source]

Set only the URI location for the object.

Useful to link files on externally controlled storage. If a file instance has already been set, this method raises a FileInstanceAlreadySetError exception.

Parameters:
  • uri – Full URI to object (which can be interpreted by the storage interface).
  • size – Size of file.
  • checksum – Checksum of file.
  • storage_class – Storage class where file is stored.

validate_key(key, key_)[source]

Validate key.

version_id

Identifier for the specific version of an object.

class invenio_files_rest.models.Part(**kwargs)[source]

Part object.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

checksum

String representing the checksum of the part.

classmethod count(mp)[source]

Count number of parts for a given multipart object.

classmethod create(mp, part_number, stream=None, **kwargs)[source]

Create a new part object in a multipart object.

classmethod delete(mp, part_number)[source]

Delete a part.

end_byte

Get end byte in file for this part.

classmethod get_or_create(mp, part_number)[source]

Get or create a part.

classmethod get_or_none(mp, part_number)[source]

Get part number.

multipart

Relationship to multipart objects.

part_number

Part number.

part_size

Get size of this part.

classmethod query_by_multipart(multipart)[source]

Get all parts for a specific multipart upload.

Parameters: multipart – An invenio_files_rest.models.MultipartObject instance.
Returns: An invenio_files_rest.models.Part instance.

set_contents(*args, **kwargs)[source]

Save contents of stream to part of file instance.

If the MultipartObject is completed, this method raises a MultipartAlreadyCompleted exception.

Parameters:
  • stream – File-like stream.
  • size – Size of stream if known.
  • chunk_size – Desired chunk size to read stream in. It is up to the storage interface if it respects this value.

start_byte

Get start byte in file of this part.

upload_id

Multipart object identifier.

Storage

File storage interface.

class invenio_files_rest.storage.FileStorage(size=None, modified=None)[source]

Base class for storage interface to a single file.

Initialize storage object.

checksum(chunk_size=None, progress_callback=None)[source]

Compute checksum of file.

copy(src, chunk_size=None, progress_callback=None)[source]

Copy data from another file instance.

Parameters:
  • src – Source stream.
  • chunk_size – Chunk size to read from source stream.

delete()[source]

Delete the file.

initialize(size=0)[source]

Initialize the file on the storage + truncate to the given size.

open(mode=None)[source]

Open the file.

The caller is responsible for closing the file.

save(incoming_stream, size_limit=None, size=None, chunk_size=None, progress_callback=None)[source]

Save incoming stream to file storage.

send_file(filename, mimetype=None, restricted=True, checksum=None, trusted=False)[source]

Send the file to the client.

update(incoming_stream, seek=0, size=None, chunk_size=None, progress_callback=None)[source]

Update part of file with incoming stream.

invenio_files_rest.storage.pyfs_storage_factory(fileinstance=None, default_location=None, default_storage_class=None, filestorage_class=<class 'invenio_files_rest.storage.pyfs.PyFSFileStorage'>, fileurl=None, size=None, modified=None, clean_dir=True)[source]

Get factory function for creating a PyFS file storage instance.

class invenio_files_rest.storage.PyFSFileStorage(fileurl, size=None, modified=None, clean_dir=True)[source]

File system storage using PyFilesystem to access the file.

This storage class will store files according to the following pattern: <base_uri>/<file instance uuid>/data.

Warning

File operations are not atomic. For example, if an error happens while updating part of a file, the file will be left in an inconsistent state. The storage class tries as best as possible to handle errors and leave the system in a consistent state.

Storage initialization.

delete()[source]

Delete a file.

The base directory is also removed, as it is assumed that only one file exists in the directory.

initialize(size=0)[source]

Initialize file on storage and truncate to given size.

open(mode='rb')[source]

Open file.

The caller is responsible for closing the file.

save(incoming_stream, size_limit=None, size=None, chunk_size=None, progress_callback=None)[source]

Save file in the file system.

update(incoming_stream, seek=0, size=None, chunk_size=None, progress_callback=None)[source]

Update a file in the file system.

Signals

Signals for Invenio-Files-REST.

invenio_files_rest.signals.file_downloaded = <blinker.base.NamedSignal object at 0x7fc11fc48210; 'file-downloaded'>

File downloaded signal.

Sent when a file is downloaded.

File streaming

File serving helpers for Files REST API.

invenio_files_rest.helpers.MIMETYPE_WHITELIST = set(['image/jpeg', 'audio/mpeg', 'image/png', 'audio/ogg', 'image/gif', 'audio/wav', 'audio/webm', 'image/tiff', 'text/plain'])

List of whitelisted MIME types.

Warning

Do not add new types to this list unless you know what you are doing. You could potentially open up for XSS attacks.

invenio_files_rest.helpers.compute_checksum(stream, algo, message_digest, chunk_size=None, progress_callback=None)[source]

Get helper method to compute checksum from a stream.

Parameters:
  • stream – File-like object.
  • algo – Identifier for checksum algorithm.
  • message_digest – A message digest instance.
  • chunk_size – Read at most size bytes from the file. (Default: None)
  • progress_callback – Function accepting one argument with number of bytes read. (Default: None)
Returns:

The checksum.
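
A chunked checksum helper of this shape could look roughly like the sketch below. It is not the library's code: the function name `checksum_sketch`, the fallback chunk size, the cumulative byte count passed to the callback, and the `<algo>:<hexdigest>` result format are all assumptions made for illustration.

```python
import hashlib
import io


def checksum_sketch(stream, algo, message_digest, chunk_size=None,
                    progress_callback=None):
    """Feed a stream into a digest chunk by chunk and format the result."""
    chunk_size = chunk_size or 1024 * 1024  # assumed default for this sketch
    bytes_read = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        message_digest.update(chunk)
        bytes_read += len(chunk)
        if progress_callback:
            # Report the running total of bytes consumed so far.
            progress_callback(bytes_read)
    # The "<algo>:<hexdigest>" layout is an assumption of this sketch.
    return "{0}:{1}".format(algo, message_digest.hexdigest())


progress = []
result = checksum_sketch(io.BytesIO(b"hello world"), "md5", hashlib.md5(),
                         chunk_size=4, progress_callback=progress.append)
```

Reading in fixed-size chunks keeps memory use bounded regardless of file size, and the callback gives a caller a hook for reporting progress on long-running verifications.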

invenio_files_rest.helpers.compute_md5_checksum(stream, **kwargs)[source]

Get helper method to compute MD5 checksum from a stream.

Parameters: stream – The input stream.
Returns: The MD5 checksum.

invenio_files_rest.helpers.make_path(base_uri, path, filename, path_dimensions, split_length)[source]

Generate a path as base location for file instance.

Parameters:
  • base_uri – The base URI.
  • path – The relative path.
  • path_dimensions – Number of chunks the path should be split into.
  • split_length – The length of any chunk.
Returns:

A string representing the full path.
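
One plausible reading of these parameters is sketched below: the first `path_dimensions` chunks of `split_length` characters become nested directories, followed by the remainder and the file name. The function name `make_path_sketch`, the length check, and the use of POSIX-style joining are assumptions, not confirmed library behavior.

```python
import posixpath


def make_path_sketch(base_uri, path, filename, path_dimensions, split_length):
    """Split ``path`` into nested directories under ``base_uri``."""
    if len(path) <= path_dimensions * split_length:
        raise ValueError("path is too short to split")
    parts = []
    for _ in range(path_dimensions):
        parts.append(path[:split_length])
        path = path[split_length:]
    parts.append(path)      # the unsplit remainder becomes the last directory
    parts.append(filename)
    return posixpath.join(base_uri, *parts)


full = make_path_sketch("/data", "0123456789abcdef", "data", 2, 2)
```

Splitting an identifier this way spreads files over many directories, which avoids very large flat directories on file systems that handle them poorly.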

invenio_files_rest.helpers.populate_from_path(bucket, source, checksum=True, key_prefix='')[source]

Populate a bucket from all files in path.

Parameters:
  • bucket – The bucket (instance or id) to create the object in.
  • source – The file or directory path.
  • checksum – If True then a MD5 checksum will be computed for each file. (Default: True)
  • key_prefix – The key prefix for the bucket.
Returns:

An iterator for all invenio_files_rest.models.ObjectVersion instances.
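
The key derivation for a directory import might be sketched as follows. This only computes object keys and does not create any ObjectVersion; `iter_object_keys` is a hypothetical helper, and prepending `key_prefix` by plain string concatenation is an assumption.

```python
import os
import tempfile


def iter_object_keys(source, key_prefix=""):
    """Yield (key, path) pairs for a file or for every file under a directory."""
    if os.path.isfile(source):
        yield key_prefix + os.path.basename(source), source
        return
    for root, _dirs, files in os.walk(source):
        for name in files:
            path = os.path.join(root, name)
            relative = os.path.relpath(path, source)
            # Use "/" in keys regardless of the platform's path separator.
            yield key_prefix + relative.replace(os.sep, "/"), path


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as handle:
            handle.write("x")
    keys = sorted(key for key, _path in iter_object_keys(tmp, "import/"))
```

Each yielded key could then be passed, together with an open stream for the path, to whatever call creates the object in the bucket.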

invenio_files_rest.helpers.sanitize_mimetype(mimetype, filename=None)[source]

Sanitize a MIME type so the browser does not render the file.
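
The idea of sanitizing can be sketched with a whitelist lookup and a safe fallback. The whitelist below copies the MIMETYPE_WHITELIST value listed above; the set of types downgraded to `text/plain` and the `application/octet-stream` fallback are assumptions of this sketch, not confirmed behavior of the helper.

```python
WHITELIST = {
    "image/jpeg", "audio/mpeg", "image/png", "audio/ogg", "image/gif",
    "audio/wav", "audio/webm", "image/tiff", "text/plain",
}

# Hypothetical set of types downgraded to plain text in this sketch.
RENDERABLE_AS_TEXT = {"text/html", "text/css", "text/javascript",
                      "application/javascript", "text/xml"}


def sanitize_sketch(mimetype):
    """Return a MIME type that a browser will not render as active content."""
    if mimetype in WHITELIST:
        return mimetype
    if mimetype in RENDERABLE_AS_TEXT:
        return "text/plain"
    # Unknown types are served as opaque binary data.
    return "application/octet-stream"
```

With this approach an uploaded HTML file is shown as text rather than executed, and anything unrecognized is offered as a download.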

invenio_files_rest.helpers.send_stream(stream, filename, size, mtime, mimetype=None, restricted=True, as_attachment=False, etag=None, content_md5=None, chunk_size=8192, conditional=True, trusted=False)[source]

Send the contents of a file to the client.

Warning

It is very easy to be exposed to Cross-Site Scripting (XSS) attacks if you serve user uploaded files. Here are some recommendations:

  1. Serve user uploaded files from a separate domain (not a subdomain). This way a malicious file can only attack other user uploaded files.
  2. Prevent the browser from rendering and executing HTML files (by setting trusted=False).
  3. Force the browser to download the file as an attachment (as_attachment=True).
Parameters:
  • stream – The file stream to send.
  • filename – The file name.
  • size – The file size.
  • mtime – A Unix timestamp that represents last modified time (UTC).
  • mimetype – The file mimetype. If None, the module will try to guess. (Default: None)
  • restricted – If the file is not restricted, the module will set the cache-control. (Default: True)
  • as_attachment – If the file is an attachment. (Default: False)
  • etag – If defined, it will be set as HTTP E-Tag.
  • content_md5 – If defined, an HTTP Content-MD5 header will be set.
  • chunk_size – The chunk size. (Default: 8192)
  • conditional – Make the response conditional to the request. (Default: True)
  • trusted – Do not enable this option unless you know what you are doing. By default this function will send HTTP headers and MIME types that prevent your browser from rendering e.g. an HTML file which could contain a malicious script tag. (Default: False)
Returns:

A Flask response instance.

Tasks

Celery tasks for Invenio-Files-REST.

invenio_files_rest.tasks.default_checksum_verification_files_query()[source]

Return a query of valid FileInstances for checksum verification.

invenio_files_rest.tasks.progress_updater(size, total)[source]

Progress reporter for checksum verification.

Exceptions

Errors for Invenio-Files-REST.

exception invenio_files_rest.errors.BucketLockedError(errors=None, **kwargs)[source]

Exception raised when a bucket is locked.

Initialize RESTException.

exception invenio_files_rest.errors.FileInstanceAlreadySetError(errors=None, **kwargs)[source]

Exception raised when file instance already set on object.

Initialize RESTException.

exception invenio_files_rest.errors.FileInstanceUnreadableError(errors=None, **kwargs)[source]

Exception raised when trying to get an unreadable file.

Initialize RESTException.

exception invenio_files_rest.errors.FileSizeError(errors=None, **kwargs)[source]

Exception raised when a file is larger than allowed.

Initialize RESTException.

exception invenio_files_rest.errors.FilesException(errors=None, **kwargs)[source]

Base exception for all errors.

Initialize RESTException.

exception invenio_files_rest.errors.InvalidKeyError(errors=None, **kwargs)[source]

Invalid key.

Initialize RESTException.

exception invenio_files_rest.errors.InvalidOperationError(errors=None, **kwargs)[source]

Exception raised when an invalid operation is performed.

Initialize RESTException.

exception invenio_files_rest.errors.MissingQueryParameter(arg_name, **kwargs)[source]

Exception raised when missing a query parameter.

Initialize RESTException.

get_description(environ=None)[source]

Get the description.

exception invenio_files_rest.errors.MultipartAlreadyCompleted(errors=None, **kwargs)[source]

Exception raised when multipart object is already completed.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartException(errors=None, **kwargs)[source]

Exception for multipart objects.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartInvalidChunkSize(errors=None, **kwargs)[source]

Exception raised when a multipart upload has an invalid chunk size.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartInvalidPartNumber(errors=None, **kwargs)[source]

Exception raised when a part number is invalid.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartInvalidSize(errors=None, **kwargs)[source]

Exception raised when a multipart upload has an invalid size.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartMissingParts(errors=None, **kwargs)[source]

Exception raised when a multipart upload is missing parts.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartNoPart(errors=None, **kwargs)[source]

Exception raised by part factories when no part was detected.

Initialize RESTException.

exception invenio_files_rest.errors.MultipartNotCompleted(errors=None, **kwargs)[source]

Exception raised when multipart object is not yet completed.

Initialize RESTException.

exception invenio_files_rest.errors.StorageError(errors=None, **kwargs)[source]

Exception raised when a storage operation fails.

Initialize RESTException.

exception invenio_files_rest.errors.UnexpectedFileSizeError(errors=None, **kwargs)[source]

Exception raised when a file does not match its expected size.

Initialize RESTException.

Configuration

Invenio Files Rest module configuration file.

invenio_files_rest.config.FILES_REST_CHECKSUM_VERIFICATION_FILES_QUERY = 'invenio_files_rest.tasks.default_checksum_verification_files_query'

Function returning a FileInstance query for files that should be checked.

invenio_files_rest.config.FILES_REST_DEFAULT_MAX_FILE_SIZE = None

Default maximum file size for a bucket in bytes.

invenio_files_rest.config.FILES_REST_DEFAULT_QUOTA_SIZE = None

Default quota size for a bucket in bytes.

invenio_files_rest.config.FILES_REST_DEFAULT_STORAGE_CLASS = 'S'

Default storage class.

invenio_files_rest.config.FILES_REST_FILE_URI_MAX_LEN = 255

Maximum length of the FileInstance.uri field.

Warning

Setting this variable to anything higher than 255 is only supported with PostgreSQL database.

invenio_files_rest.config.FILES_REST_MIN_FILE_SIZE = 1

Minimum file size for uploads (i.e. do not allow empty files).

invenio_files_rest.config.FILES_REST_MULTIPART_CHUNKSIZE_MAX = 5368709120

Maximum chunk size of multipart objects.

invenio_files_rest.config.FILES_REST_MULTIPART_CHUNKSIZE_MIN = 5242880

Minimum chunk size of multipart objects.

invenio_files_rest.config.FILES_REST_MULTIPART_EXPIRES = datetime.timedelta(4)

Time delta after which a multipart upload is considered expired.

invenio_files_rest.config.FILES_REST_MULTIPART_MAX_PARTS = 10000

Maximum number of parts.
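
Taken together, the three multipart settings above bound how large an upload can be for a given chunk size. The sketch below uses the default values listed in this section; the validation functions themselves are hypothetical and only illustrate one way the limits could be combined.

```python
CHUNKSIZE_MIN = 5242880        # FILES_REST_MULTIPART_CHUNKSIZE_MIN (5 MiB)
CHUNKSIZE_MAX = 5368709120     # FILES_REST_MULTIPART_CHUNKSIZE_MAX (5 GiB)
MAX_PARTS = 10000              # FILES_REST_MULTIPART_MAX_PARTS


def chunk_size_ok(chunk_size):
    """Check that a chunk size lies within the configured bounds."""
    return CHUNKSIZE_MIN <= chunk_size <= CHUNKSIZE_MAX


def total_size_ok(size, chunk_size):
    """Check that an upload fits into MAX_PARTS parts of this chunk size."""
    return 0 < size <= chunk_size * MAX_PARTS


# Largest upload possible when using the smallest allowed chunk size.
largest_with_min_chunks = CHUNKSIZE_MIN * MAX_PARTS
```

A client that needs to upload more than this would have to choose a larger chunk size rather than more parts.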

invenio_files_rest.config.FILES_REST_MULTIPART_PART_FACTORIES = ['invenio_files_rest.views:default_partfactory', 'invenio_files_rest.views:ngfileupload_partfactory']

Import path of factory used to parse chunked upload parameters.

invenio_files_rest.config.FILES_REST_OBJECT_KEY_MAX_LEN = 255

Maximum length of the ObjectVersion.key field.

Warning

Setting this variable to anything higher than 255 is only supported with PostgreSQL database.

invenio_files_rest.config.FILES_REST_PERMISSION_FACTORY = 'invenio_files_rest.permissions.permission_factory'

Permission factory to control the files access from the REST interface.

invenio_files_rest.config.FILES_REST_SIZE_LIMITERS = 'invenio_files_rest.limiters.file_size_limiters'

Import path of file size limiters factory.

invenio_files_rest.config.FILES_REST_STORAGE_CLASS_LIST = {'A': 'Archive', 'S': 'Standard'}

Storage class list defines the system's storage classes.

Storage classes are useful for e.g. defining the type of storage an object is located on (e.g. offline/online), so that the system knows if it can serve the file and/or what the reliability is.

invenio_files_rest.config.FILES_REST_STORAGE_FACTORY = 'invenio_files_rest.storage.pyfs_storage_factory'

Import path of factory used to create a storage instance.

invenio_files_rest.config.FILES_REST_STORAGE_PATH_DIMENSIONS = 2

Number of directory levels created for the storage.

invenio_files_rest.config.FILES_REST_STORAGE_PATH_SPLIT_LENGTH = 2

Length of the filename that should be taken to create its root dir.

invenio_files_rest.config.FILES_REST_TASK_WAIT_INTERVAL = 2

Interval in seconds between whitespace characters sent to keep the connection open.

invenio_files_rest.config.FILES_REST_TASK_WAIT_MAX_SECONDS = 600

Maximum number of seconds to wait for a task to finish.
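
How the two wait settings interact can be sketched with a generator that emits a whitespace character at each interval until a task is done or the maximum wait is reached. This is an illustration only: `keepalive_sketch`, the `is_ready` callable, and the injectable `sleep` are hypothetical and do not reflect the module's actual view code.

```python
import time


def keepalive_sketch(is_ready, interval=2, max_seconds=600, sleep=time.sleep):
    """Yield a space every ``interval`` seconds until ``is_ready()`` is true."""
    waited = 0
    while not is_ready():
        if waited >= max_seconds:
            raise TimeoutError("task did not finish in time")
        yield " "
        sleep(interval)
        waited += interval


# Simulate a task that is ready on the third poll; skip real sleeping.
polls = iter([False, False, True])
chunks = list(keepalive_sketch(lambda: next(polls), sleep=lambda s: None))
```

Streaming such a generator as a response body keeps proxies and clients from timing out while a long-running task completes.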

invenio_files_rest.config.FILES_REST_UPLOAD_FACTORIES = ['invenio_files_rest.views:stream_uploadfactory', 'invenio_files_rest.views:ngfileupload_uploadfactory']

Import path of factory used to parse file uploads.

Note

Factories that read request.stream directly must be first in the list, otherwise Werkzeug’s form-data parser will read the stream.

invenio_files_rest.config.MAX_CONTENT_LENGTH = 16777216

Maximum allowed content length for form data.

This value limits the maximum file upload size via multipart/form-data. It is a Flask configuration variable that is unlimited by default. The value must be larger than the maximum part size you want to accept via multipart/form-data (used by e.g. ng-file-upload). It only limits uploads sent as multipart/form-data; in particular, it does not restrict the maximum file size when streaming a file in the body of a PUT request.

Flask, by default, saves any file bigger than 500 kB to a temporary file on disk, so do not set this value too large or you may run out of disk space on your nodes.
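The default corresponds to 16 MiB. A minimal sketch of how an instance configuration might spell this out is shown here; PART_SIZE_MAX is a hypothetical name used only for the comparison and is not a setting of this module.

```python
# 16 MiB expressed in bytes; equals the documented default of 16777216.
MAX_CONTENT_LENGTH = 16 * 1024 * 1024

# Hypothetical largest part you intend to accept via multipart/form-data.
PART_SIZE_MAX = 8 * 1024 * 1024

# The form-data limit must be larger than the largest accepted part.
limit_is_sufficient = MAX_CONTENT_LENGTH > PART_SIZE_MAX
```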

Limiters

File size limiting functionality for Invenio-Files-REST.

class invenio_files_rest.limiters.FileSizeLimit(limit, reason)[source]

File size limiter.

Instantiate a new file size limit.

Parameters:
  • limit – The imposed limit.
  • reason – The limit description.
invenio_files_rest.limiters.file_size_limiters(bucket)[source]

Get default file size limiters.

Parameters:bucket – The invenio_files_rest.models.Bucket instance.
Returns:A list containing an instance of invenio_files_rest.limiters.FileSizeLimit with quota left value and description and another one with max file size value and description.
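To make the return value more concrete, here is a sketch of a limiter factory built on stand-in classes. SizeLimit and FakeBucket are hypothetical substitutes for FileSizeLimit and Bucket, the attribute names and reason strings are assumptions, and the real function may treat unset quotas differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SizeLimit:
    """Stand-in for a file size limit: a byte count plus a description."""
    limit: int
    reason: str


@dataclass
class FakeBucket:
    """Stand-in bucket; attribute names are assumptions for this sketch."""
    size: int
    quota_size: Optional[int]
    max_file_size: Optional[int]


def size_limiters(bucket: FakeBucket) -> List[SizeLimit]:
    """Collect the limits that apply to a new upload into `bucket`."""
    limits = []
    if bucket.quota_size is not None:
        # Remaining quota is the quota minus what the bucket already holds.
        quota_left = bucket.quota_size - bucket.size
        limits.append(SizeLimit(quota_left, "Bucket quota exceeded."))
    if bucket.max_file_size is not None:
        limits.append(SizeLimit(bucket.max_file_size, "Maximum file size exceeded."))
    return limits


limits = size_limiters(FakeBucket(size=30, quota_size=100, max_file_size=50))
# The tightest limit decides whether an upload of a given size is accepted.
tightest = min(limits, key=lambda item: item.limit)
```

A caller would typically compare the incoming content length against the tightest limit and report that limit's reason on failure.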

Permissions

Permissions for files using Invenio-Access.

invenio_files_rest.permissions.BucketListMultiparts = <functools.partial object>

Action needed: list multipart uploads in bucket.

invenio_files_rest.permissions.BucketRead = <functools.partial object>

Action needed: list objects in bucket.

invenio_files_rest.permissions.BucketReadVersions = <functools.partial object>

Action needed: list object versions in bucket.

invenio_files_rest.permissions.BucketUpdate = <functools.partial object>

Action needed: create objects and multipart uploads in bucket.

invenio_files_rest.permissions.LocationUpdate = <functools.partial object>

Action needed: location update.

invenio_files_rest.permissions.MultipartDelete = <functools.partial object>

Action needed: abort a multipart upload.

invenio_files_rest.permissions.MultipartRead = <functools.partial object>

Action needed: list parts of a multipart upload in a bucket.

invenio_files_rest.permissions.ObjectDelete = <functools.partial object>

Action needed: delete object in bucket.

invenio_files_rest.permissions.ObjectDeleteVersion = <functools.partial object>

Action needed: permanently delete specific object version in bucket.

invenio_files_rest.permissions.ObjectRead = <functools.partial object>

Action needed: get object in bucket.

invenio_files_rest.permissions.ObjectReadVersion = <functools.partial object>

Action needed: get object version in bucket.

invenio_files_rest.permissions.bucket_listmultiparts_all = Need(method='action', value='files-rest-bucket-listmultiparts', argument=None)

Action needed: list all buckets multiparts.

invenio_files_rest.permissions.bucket_read_all = Need(method='action', value='files-rest-bucket-read', argument=None)

Action needed: read all buckets.

invenio_files_rest.permissions.bucket_read_versions_all = Need(method='action', value='files-rest-bucket-read-versions', argument=None)

Action needed: read all buckets versions.

invenio_files_rest.permissions.bucket_update_all = Need(method='action', value='files-rest-bucket-update', argument=None)

Action needed: update all buckets.

invenio_files_rest.permissions.location_update_all = Need(method='action', value='files-rest-location-update', argument=None)

Action needed: update all locations.

invenio_files_rest.permissions.multipart_delete_all = Need(method='action', value='files-rest-multipart-delete', argument=None)

Action needed: delete all multiparts.

invenio_files_rest.permissions.multipart_read_all = Need(method='action', value='files-rest-multipart-read', argument=None)

Action needed: read all multiparts.

invenio_files_rest.permissions.object_delete_all = Need(method='action', value='files-rest-object-delete', argument=None)

Action needed: delete all objects.

invenio_files_rest.permissions.object_delete_version_all = Need(method='action', value='files-rest-object-delete-version', argument=None)

Action needed: delete all objects versions.

invenio_files_rest.permissions.object_read_all = Need(method='action', value='files-rest-object-read', argument=None)

Action needed: read all objects.

invenio_files_rest.permissions.object_read_version_all = Need(method='action', value='files-rest-object-read-version', argument=None)

Action needed: read all objects versions.
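One plausible reading of the listing above is that each capitalised name is a partial application that fixes the action value and leaves the argument open, while each *_all need is the same action with argument=None. The sketch below models that with standard-library pieces only; Need is a local namedtuple stand-in and action_need is a hypothetical constructor, not the classes used by Invenio-Access.

```python
from collections import namedtuple
from functools import partial

# Local stand-in with the same three fields shown in the reprs above.
Need = namedtuple("Need", ["method", "value", "argument"])


def action_need(value, argument):
    """Hypothetical constructor for an action need."""
    return Need(method="action", value=value, argument=argument)


# Fix the action name; the argument (e.g. a bucket id) is supplied later.
BucketRead = partial(action_need, "files-rest-bucket-read")

# No argument: the need applies to all buckets.
bucket_read_all = BucketRead(None)

# With an argument: the need applies to one bucket (id is made up).
bucket_read_one = BucketRead("hypothetical-bucket-id")
```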

invenio_files_rest.permissions.permission_factory(obj, action)[source]

Get default permission factory.

Parameters:
Raises:

RuntimeError – If the object is unknown.

Returns:

An invenio_access.permissions.DynamicPermission instance.

Serializers

REST API serializers.

class invenio_files_rest.serializer.BaseSchema(extra=None, only=(), exclude=(), prefix=u'', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)[source]

Base schema for all serializations.

Get base links.

class invenio_files_rest.serializer.BucketSchema(extra=None, only=(), exclude=(), prefix=u'', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)[source]

Schema for bucket.

Dump links.

class invenio_files_rest.serializer.MultipartObjectSchema(extra=None, only=(), exclude=(), prefix=u'', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)[source]

Schema for multipart objects.

Dump links.

class invenio_files_rest.serializer.ObjectVersionSchema(extra=None, only=(), exclude=(), prefix=u'', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)[source]

Schema for ObjectVersions.

Dump links.

wrap(data, many)[source]

Wrap response in envelope.

class invenio_files_rest.serializer.PartSchema(extra=None, only=(), exclude=(), prefix=u'', strict=None, many=False, context=None, load_only=(), dump_only=(), partial=False)[source]

Schema for parts.

wrap(data, many)[source]

Wrap response in envelope.

invenio_files_rest.serializer.json_serializer(data=None, code=200, headers=None, context=None, etag=None, task_result=None)[source]

Build a JSON Flask response using the given data.

Parameters:
  • data – The data to serialize. (Default: None)
  • code – The HTTP status code. (Default: 200)
  • headers – The HTTP headers to include. (Default: None)
  • context – The schema class context. (Default: None)
  • etag – The ETag header. (Default: None)
  • task_result – Optionally you can pass async task to wait for. (Default: None)
Returns:

A Flask response with json data.

Return type:

flask.Response

invenio_files_rest.serializer.schema_from_context(context)[source]

Determine which schema to use.

invenio_files_rest.serializer.wait_for_taskresult(task_result, content, interval, max_rounds)[source]

Get helper to wait for async task result to finish.

The task will periodically send whitespace to prevent the connection from being closed.

Parameters:
  • task_result – The async task to wait for.
  • content – The content to return when the task is ready.
  • interval – The duration of a sleep period before checking again whether the task is ready.
  • max_rounds – The maximum number of intervals the function checks before raising an exception.
Returns:

An iterator over the content, or an invenio_files_rest.errors.FilesException if the timeout was reached or the job failed.
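The polling behaviour described here can be sketched with a generator. Everything below is a simplified stand-in: FakeTaskResult imitates an async result with ready() and successful() methods, TaskFailed and TaskTimeout are hypothetical exceptions used in place of FilesException, and the assumption that max_rounds is derived as FILES_REST_TASK_WAIT_MAX_SECONDS // FILES_REST_TASK_WAIT_INTERVAL is not confirmed by this page.

```python
import time


class TaskFailed(Exception):
    """Hypothetical error raised when the task finished unsuccessfully."""


class TaskTimeout(Exception):
    """Hypothetical error raised when the task did not finish in time."""


class FakeTaskResult:
    """Stand-in async result that becomes ready after a number of polls."""

    def __init__(self, polls_until_ready, ok=True):
        self._remaining = polls_until_ready
        self._ok = ok

    def ready(self):
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True

    def successful(self):
        return self._ok


def wait_and_stream(task_result, content, interval, max_rounds):
    """Yield whitespace while waiting, then the content once the task is done."""
    for _ in range(max_rounds):
        if task_result.ready():
            if not task_result.successful():
                raise TaskFailed()
            yield content
            return
        # Keep the connection alive, then wait before polling again.
        yield b" "
        time.sleep(interval)
    raise TaskTimeout()


# Assumed derivation of the round count from the documented defaults.
max_rounds = 600 // 2

# interval=0 keeps this demonstration from actually sleeping.
chunks = list(wait_and_stream(FakeTaskResult(2), b"{}", 0, max_rounds))
```

Leading whitespace is harmless for JSON parsers, which is why it can be streamed ahead of the real body.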

Views

Files download/upload REST API similar to S3 for Invenio.

class invenio_files_rest.views.BucketResource(*args, **kwargs)[source]

Bucket item resource.

Instantiate content negotiated view.

get(*args, **kwargs)[source]

Get list of objects in the bucket.

Parameters:bucket – An invenio_files_rest.models.Bucket instance.
Returns:The Flask response.
head(*args, **kwargs)[source]

Check the existence of the bucket.

listobjects(*args, **kwargs)[source]

List objects in a bucket.

Parameters:bucket – An invenio_files_rest.models.Bucket instance.
Returns:The Flask response.
multipart_listuploads(*args, **kwargs)[source]

List multipart uploads in a bucket.

Parameters:bucket – An invenio_files_rest.models.Bucket instance.
Returns:The Flask response.
class invenio_files_rest.views.LocationResource(*args, **kwargs)[source]

Service resource.

Instantiate content negotiated view.

post(*args, **kwargs)[source]

Create bucket.

class invenio_files_rest.views.ObjectResource(*args, **kwargs)[source]

Object item resource.

Instantiate content negotiated view.

static check_object_permission(obj)[source]

Retrieve object and abort if it doesn’t exist.

create_object(bucket, key)[source]

Create a new object.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – The file key.
Returns:

A Flask response.

delete(*args, **kwargs)[source]

Delete an object or abort a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • version_id – The version ID. (Default: None)
  • upload_id – The upload ID. (Default: None)
Returns:

A Flask response.

delete_object(*args, **kwargs)[source]

Delete an existing object.

Parameters:
Returns:

A Flask response.

get(*args, **kwargs)[source]

Get object or list parts of a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • version_id – The version ID. (Default: None)
  • upload_id – The upload ID. (Default: None)
Returns:

A Flask response.

classmethod get_object(bucket, key, version_id)[source]

Retrieve object and abort if it doesn’t exist.

If the file is not found, the connection is aborted and the 404 error is returned.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – The file key.
  • version_id – The version ID.
Returns:

An invenio_files_rest.models.ObjectVersion instance.

multipart_complete(bucket, key, upload_id, *args, **kwargs)[source]

Complete a multipart upload.

Parameters:multipart – An invenio_files_rest.models.MultipartObject instance.
Returns:A Flask response.
multipart_delete(bucket, key, upload_id, *args, **kwargs)[source]

Abort a multipart upload.

Parameters:multipart – An invenio_files_rest.models.MultipartObject instance.
Returns:A Flask response.
multipart_init(*args, **kwargs)[source]

Initialize a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from.
  • key – The file key.
  • size – The total size.
  • part_size – The part size.
Raises:

invenio_files_rest.errors.MissingQueryParameter – If size or part_size are not defined.

Returns:

A Flask response.
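Because a multipart upload is initialized with both a total size and a part size, the number of parts and the size of the final part follow by simple arithmetic. The helper below is a hypothetical sketch of that calculation; it ignores any minimum or maximum part size the service may enforce.

```python
def part_layout(size, part_size):
    """Return (number_of_parts, last_part_size) for a multipart upload."""
    if size <= 0 or part_size <= 0:
        raise ValueError("size and part_size must be positive")
    full_parts, remainder = divmod(size, part_size)
    if remainder:
        # A trailing, smaller part carries the leftover bytes.
        return full_parts + 1, remainder
    # The size divides evenly, so the last part is a full part.
    return full_parts, part_size


# Sizes chosen only for illustration.
parts, last_size = part_layout(size=25, part_size=10)
```

Here a 25-byte upload with 10-byte parts needs three parts, the last of which holds 5 bytes.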

multipart_listparts(bucket, key, upload_id, *args, **kwargs)[source]

Get parts of a multipart upload.

Parameters:multipart – An invenio_files_rest.models.MultipartObject instance.
Returns:A Flask response.
multipart_uploadpart(bucket, key, upload_id, *args, **kwargs)[source]

Upload a part.

Parameters:multipart – An invenio_files_rest.models.MultipartObject instance.
Returns:A Flask response.
post(*args, **kwargs)[source]

Upload a new object or start/complete a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • upload_id – The upload ID. (Default: None)
Returns:

A Flask response.

put(*args, **kwargs)[source]

Update a new object or upload a part of a multipart upload.

Parameters:
  • bucket – The bucket (instance or id) to get the object from. (Default: None)
  • key – The file key. (Default: None)
  • upload_id – The upload ID. (Default: None)
Returns:

A Flask response.

static send_object(bucket, obj, expected_chksum=None, logger_data=None, restricted=True)[source]

Send an object for a given bucket.

Parameters:
  • expected_chksum – Expected checksum.

Returns:

A Flask response.

invenio_files_rest.views.as_uuid(value)[source]

Convert value to UUID.

invenio_files_rest.views.bucket_view(*args, **kwargs)

Bucket item resource.

invenio_files_rest.views.check_permission(permission, hidden=True)[source]

Check if permission is allowed.

If permission fails then the connection is aborted.

Parameters:
  • permission – The permission to check.
  • hidden – Determine if a 404 error (True) or 401/403 error (False) should be returned if the permission is rejected (i.e. hide or reveal the existence of a particular object).
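The effect of the hidden flag can be summarised as a small decision function. This is a sketch of the documented behaviour only: denied_status is a hypothetical helper, and the split between 401 and 403 based on whether the user is authenticated is an assumption drawn from common HTTP practice rather than from this page.

```python
def denied_status(hidden, authenticated):
    """Pick the HTTP status to return when a permission check is rejected."""
    if hidden:
        # Pretend the object does not exist at all.
        return 404
    # Reveal existence: ask anonymous users to log in, refuse known users.
    return 403 if authenticated else 401


status_hidden = denied_status(hidden=True, authenticated=False)
status_anonymous = denied_status(hidden=False, authenticated=False)
status_logged_in = denied_status(hidden=False, authenticated=True)
```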
invenio_files_rest.views.default_partfactory(*args, **kwargs)[source]

Get default part factory.

Parameters:
  • part_number – The part number. (Default: None)
  • content_length – The content length. (Default: None)
  • content_type – The HTTP Content-Type. (Default: None)
  • content_md5 – The content MD5. (Default: None)
Returns:

The content length, the part number, the stream, the content type, MD5 of the content.

invenio_files_rest.views.invalid_subresource_validator(value)[source]

Ensure subresource.

invenio_files_rest.views.location_view(*args, **kwargs)

Service resource.

invenio_files_rest.views.minsize_validator(value)[source]

Validate Content-Length header.

Raises:invenio_files_rest.errors.FileSizeError – If the value is less than invenio_files_rest.config.FILES_REST_MIN_FILE_SIZE size.
invenio_files_rest.views.need_permissions(object_getter, action, hidden=True)[source]

Get permission for buckets or abort.

Parameters:
  • object_getter – The function used to retrieve the object and pass it to the permission factory.
  • action – The action needed.
  • hidden – Determine which kind of error to return. (Default: True)
invenio_files_rest.views.ngfileupload_partfactory(*args, **kwargs)[source]

Part factory for ng-file-upload.

Parameters:
  • part_number – The part number. (Default: None)
  • content_length – The content length. (Default: None)
  • uploaded_file – The upload request. (Default: None)
Returns:

The content length, part number, stream, HTTP Content-Type header.

invenio_files_rest.views.ngfileupload_uploadfactory(*args, **kwargs)[source]

Get default put factory.

If Content-Type is 'multipart/form-data' then the stream is aborted.

Parameters:
  • content_length – The content length. (Default: None)
  • content_type – The HTTP Content-Type. (Default: None)
  • uploaded_file – The upload request. (Default: None)
Returns:

A tuple containing stream, content length, and empty header.

invenio_files_rest.views.object_view(*args, **kwargs)

Object item resource.

invenio_files_rest.views.pass_bucket(f)[source]

Decorate to retrieve a bucket.

invenio_files_rest.views.pass_multipart(with_completed=False)[source]

Decorate to retrieve an object.

invenio_files_rest.views.stream_uploadfactory(*args, **kwargs)[source]

Get default put factory.

If Content-Type is 'multipart/form-data' then the stream is aborted.

Parameters:
  • content_md5 – The content MD5. (Default: None)
  • content_length – The content length. (Default: None)
  • content_type – The HTTP Content-Type. (Default: None)
Returns:

The stream, content length, MD5 of the content.

Form parser

Werkzeug form data parser customization.

class invenio_files_rest.formparser.FormDataParser(stream_factory=None, charset='utf-8', errors='replace', max_form_memory_size=None, max_content_length=None, cls=None, silent=True)[source]

Custom form data parser.

parse(stream, mimetype, content_length, options=None)[source]

Parse the information from the given request.

Parameters:
  • stream – An input stream.
  • mimetype – The mimetype of the data.
  • content_length – The content length of the incoming data.
  • options – Optional mimetype parameters (used for the multipart boundary for instance).
Returns:

A tuple in the form (stream, form, files).