API Docs¶
Files download/upload REST API similar to S3 for Invenio.
- class invenio_files_rest.ext.InvenioFilesREST(app=None)[source]¶
Invenio-Files-REST extension.
Extension initialization.
Models¶
Models for Invenio-Files-REST.
The entities of this module consist of:
Buckets - Identified by UUIDs, and contain objects.
Bucket tags - Identified uniquely within a bucket by a key. Used to store extra metadata for a bucket.
Objects - Identified uniquely within a bucket by string keys. Each object can have multiple object versions (note: Objects do not have their own database table).
Object versions - Identified by UUIDs and belong to one specific object in one bucket. Each object version has zero or one file instance. If the object version has no file instance, it is considered a delete marker.
File instances - Identified by UUIDs. Each represents a physical file on disk. The location of the file is specified via a URI. A file instance can have many object versions.
Locations - A bucket belongs to a specific location. Locations can be used to represent e.g. different storage systems.
Multipart objects - Identified by UUIDs and belong to a specific bucket and key.
Part objects - Identified by their multipart object and a part number.
The actual file access is handled by a storage interface. Also, objects do not
have their own model, but are represented via the ObjectVersion
model.
- class invenio_files_rest.models.Bucket(**kwargs)[source]¶
Model for storing buckets.
A bucket is a container of objects. Buckets have a default location and storage class. Individual objects in the bucket can however have different locations and storage classes.
A bucket can be marked as deleted. A bucket can also be marked as locked to prevent operations on the bucket.
Each bucket can also define a quota. The size of a bucket is the size of all objects in the bucket (including all versions).
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- classmethod create(location=None, storage_class=None, **kwargs)[source]¶
Create a bucket.
- Parameters
location – Location of a bucket (instance or name). Default: Default location.
storage_class – Storage class of a bucket. Default: Default storage class.
**kwargs – Keyword arguments are forwarded to the class constructor.
- Returns
Created bucket.
- created¶
Creation timestamp.
- default_location¶
Default location.
- default_storage_class¶
Default storage class.
- classmethod delete(bucket_id)[source]¶
Delete a bucket.
Does not actually delete the Bucket, just marks it as deleted.
- deleted¶
Delete state of bucket.
- classmethod get(bucket_id)[source]¶
Get a bucket object (excluding deleted).
- Parameters
bucket_id – Bucket identifier.
- Returns
Bucket instance.
- id¶
Bucket identifier.
- location¶
Location associated with this bucket.
- locked¶
Lock state of bucket.
Modifications are not allowed on a locked bucket.
- max_file_size¶
Maximum size of a single file in the bucket.
Usage of this property depends on which file size limiters are installed.
- property quota_left¶
Get how much space is left in the bucket.
- quota_size¶
Quota size of bucket.
Usage of this property depends on which file size limiters are installed.
- remove()[source]¶
Permanently remove a bucket and all objects (including versions).
Warning
This by-passes the normal versioning and should only be used when you want to permanently delete a bucket and its objects. Otherwise use Bucket.delete().
Note the method does not remove the associated file instances which must be garbage collected.
- Returns
self.
- size¶
Size of bucket.
This is a computed property which can be rebuilt at any time from the objects inside the bucket.
- property size_limit¶
Get size limit for this bucket.
The limit is based on the minimum output of the file size limiters.
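The relationship between quota_size, size, quota_left and size_limit can be pictured with a small stand-alone model. This is only a sketch: the QuotaBucketSketch class, the size_limit_sketch function and the two limiter callables are hypothetical and do not reproduce the library's code. The only behaviour taken from the text above is that the limit is the minimum output of the installed limiters.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QuotaBucketSketch:
    """Hypothetical stand-in for a bucket (not the real model)."""

    size: int = 0                         # bytes used by all object versions
    quota_size: Optional[int] = None      # None means "no quota"
    max_file_size: Optional[int] = None   # None means "no per-file limit"

    @property
    def quota_left(self) -> Optional[int]:
        # Space left is the quota minus what is already stored.
        if self.quota_size is None:
            return None
        return max(self.quota_size - self.size, 0)


Limiter = Callable[[QuotaBucketSketch], Optional[int]]


def size_limit_sketch(bucket: QuotaBucketSketch,
                      limiters: List[Limiter]) -> Optional[int]:
    """Return the smallest limit reported by any limiter, or None."""
    limits = [f(bucket) for f in limiters]
    limits = [limit for limit in limits if limit is not None]
    return min(limits) if limits else None


bucket = QuotaBucketSketch(size=300, quota_size=1000, max_file_size=500)
limiters = [lambda b: b.quota_left, lambda b: b.max_file_size]
print(size_limit_sketch(bucket, limiters))  # 500, because 500 < 700
```

With no quota and no per-file maximum configured, every limiter in this sketch returns None and the bucket is treated as unlimited.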
- snapshot(lock=False)[source]¶
Create a snapshot of latest objects in bucket.
- Parameters
lock – Create the new bucket in a locked state.
- Returns
Newly created bucket containing copied ObjectVersion.
- sync(bucket, delete_extras=False)[source]¶
Sync self bucket ObjectVersions to the destination bucket.
The bucket is fully mirrored with the destination bucket following the logic:
same ObjectVersions are not touched
new ObjectVersions are added to destination
deleted ObjectVersions are deleted in destination
extra ObjectVersions in dest are deleted if delete_extras param is True
- Parameters
bucket – The destination bucket.
delete_extras – Delete extra ObjectVersions in destination if True.
- Returns
The bucket with an exact copy of ObjectVersions in self.
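The four mirroring rules of sync() can be illustrated with plain dictionaries. The sketch below is not the library's implementation: it assumes each bucket is reduced to a mapping from object key to the file id of its latest version, uses None to stand for a deleted object, and models deletion in the destination by dropping the key. The function name sync_sketch is hypothetical.

```python
from typing import Dict, Optional

# key -> file id of the latest version; None marks a deleted object
Heads = Dict[str, Optional[str]]


def sync_sketch(source: Heads, dest: Heads, delete_extras: bool = False) -> Heads:
    """Mirror ``source`` into ``dest`` following the four rules listed above."""
    for key, file_id in source.items():
        if file_id is None:
            # Deleted in source: delete in destination as well.
            dest.pop(key, None)
        elif dest.get(key) != file_id:
            # New (or changed) in source: add to destination.
            dest[key] = file_id
        # Otherwise both sides already agree and nothing is touched.
    if delete_extras:
        # Keys that exist only in the destination are extras.
        for key in [k for k in dest if k not in source]:
            del dest[key]
    return dest


src = {"a.txt": "f1", "b.txt": "f2", "c.txt": None}
dst = {"a.txt": "f1", "c.txt": "f3", "extra.txt": "f9"}
print(sync_sketch(src, dict(dst)))
print(sync_sketch(src, dict(dst), delete_extras=True))
```

In the first call extra.txt survives because delete_extras defaults to False; in the second call the destination ends up holding only a.txt and b.txt.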
- updated¶
Modification timestamp.
- class invenio_files_rest.models.BucketTag(**kwargs)[source]¶
Model for storing tags associated to buckets.
This is useful to store extra information for a bucket.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- bucket¶
Relationship to buckets.
- key¶
Tag key.
- value¶
Tag value.
- class invenio_files_rest.models.FileInstance(**kwargs)[source]¶
Model for storing files.
A file instance represents a file on disk. A file instance may be linked from many objects, while an object can have one and only one file instance.
A file instance also records the storage class, size and checksum of the file on disk.
Additionally, a file instance can be read-only if the storage layer is not capable of writing to the file (this is typically used to link to files on externally controlled storage).
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- checksum¶
String representing the checksum of the object.
- copy_contents(fileinstance, progress_callback=None, chunk_size=None, **kwargs)[source]¶
Copy this file instance into another file instance.
- classmethod create()[source]¶
Create a file instance.
Note, object is only added to the database session.
- created¶
Creation timestamp.
- delete()[source]¶
Delete a file instance.
The file instance can be deleted if it has no references from other objects. The caller is responsible to test if the file instance is writable and that the disk file can actually be removed.
Note
Normally you should use the Celery task to delete a file instance, as this method will not remove the file on disk.
- id¶
Identifier of file.
- last_check¶
Result of last fixity check.
- last_check_at¶
Timestamp of last fixity check.
- readable¶
Defines if the file is readable.
- send_file(filename, restricted=True, mimetype=None, trusted=False, chunk_size=None, as_attachment=False, **kwargs)[source]¶
Send file to client.
- set_contents(stream, chunk_size=None, size=None, size_limit=None, progress_callback=None, **kwargs)[source]¶
Save contents of stream to this file.
- Parameters
obj – ObjectVersion instance from which this file is accessed.
stream – File-like stream.
- set_uri(uri, size, checksum, readable=True, writable=False, storage_class=None)[source]¶
Set a location of a file.
- size¶
Size of file.
- storage(**kwargs)[source]¶
Get storage interface for object.
Uses the application's storage factory to create a storage interface that can be used for this particular file instance.
- Returns
Storage interface.
- storage_class¶
Storage class of file.
- update_checksum(progress_callback=None, chunk_size=None, checksum_kwargs=None, **kwargs)[source]¶
Update checksum based on file.
- update_contents(stream, seek=0, size=None, chunk_size=None, progress_callback=None, **kwargs)[source]¶
Save contents of stream to this file.
- Parameters
obj – ObjectVersion instance from which this file is accessed.
stream – File-like stream.
- updated¶
Modification timestamp.
- uri¶
Location of file.
- verify_checksum(progress_callback=None, chunk_size=None, throws=True, checksum_kwargs=None, **kwargs)[source]¶
Verify checksum of file instance.
- Parameters
throws (bool) – If True, exceptions raised during checksum calculation will be re-raised after logging. If set to False, and an exception occurs, the last_check field is set to None (last_check_at of course is updated), since no check actually was performed.
checksum_kwargs (dict) – Passed as **kwargs to storage().checksum.
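The effect of the throws flag can be shown with a self-contained model. The sketch below is an assumption-laden illustration, not the method's source: FileRecordSketch and verify_sketch are hypothetical names, SHA-256 is picked arbitrarily as the algorithm, the "algo:hexdigest" checksum format is assumed, and logging is omitted. It only mirrors what the parameter description states: with throws=True the exception is re-raised, and with throws=False the last_check field is set to None while last_check_at is still updated.

```python
import hashlib
import io
from datetime import datetime, timezone
from typing import BinaryIO, Callable, Optional


class FileRecordSketch:
    """Hypothetical record holding the fields named in the text above."""

    def __init__(self, checksum: str):
        self.checksum = checksum
        self.last_check: Optional[bool] = True
        self.last_check_at: Optional[datetime] = None


def verify_sketch(record: FileRecordSketch,
                  open_stream: Callable[[], BinaryIO],
                  throws: bool = True) -> Optional[bool]:
    try:
        with open_stream() as stream:
            actual = "sha256:" + hashlib.sha256(stream.read()).hexdigest()
    except Exception:
        if throws:
            raise          # propagate the storage error to the caller
        actual = None      # no check was actually performed
    record.last_check = None if actual is None else actual == record.checksum
    record.last_check_at = datetime.now(timezone.utc)
    return record.last_check


def unreachable_storage() -> BinaryIO:
    raise OSError("storage is unreachable")


data = b"hello"
record = FileRecordSketch("sha256:" + hashlib.sha256(data).hexdigest())
print(verify_sketch(record, lambda: io.BytesIO(data)))            # True
print(verify_sketch(record, unreachable_storage, throws=False))   # None
```

Returning None rather than False in the second call keeps "the check could not run" distinguishable from "the check ran and the checksum did not match".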
- writable¶
Defines if file is writable.
This property is used to create a file instance prior to having the actual file at the given URI. This is useful when e.g. copying a file instance.
- class invenio_files_rest.models.Location(**kwargs)[source]¶
Model defining base locations.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- created¶
Creation timestamp.
- default¶
True if the location is the default location.
At least one location should be the default location.
- id¶
Internal identifier for locations.
The internal identifier is used only as a foreign key for buckets in order to decrease storage requirements per row for buckets.
- name¶
External identifier of the location.
- updated¶
Modification timestamp.
- uri¶
URI of the location.
- class invenio_files_rest.models.MultipartObject(**kwargs)[source]¶
Model for storing files in chunks.
A multipart object belongs to a specific bucket and key and is identified by an upload id. You can have multiple multipart uploads for the same bucket and key. Once all parts of a multipart object are uploaded, the state is changed to completed. Afterwards it is not possible to upload new parts. Once completed, the multipart object is merged, and added as a new version in the current object/bucket.
All parts for a multipart upload must be of the same size, except for the last part.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
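The rule that all parts share one size except the last can be turned into simple byte-range arithmetic. The following sketch is illustrative only: the part_ranges function is hypothetical, and it assumes zero-based part numbers and an exclusive end byte, neither of which is stated in this reference.

```python
from typing import List, Tuple


def part_ranges(size: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, start_byte, end_byte) for every part.

    Part numbers are zero-based and end_byte is exclusive; both are
    assumptions of this sketch.
    """
    if size <= 0 or chunk_size <= 0:
        raise ValueError("size and chunk_size must be positive")
    count = -(-size // chunk_size)  # integer ceiling division
    return [
        (n, n * chunk_size, min((n + 1) * chunk_size, size))
        for n in range(count)
    ]


parts = part_ranges(size=25, chunk_size=10)
last_part_number, last_start, last_end = parts[-1]
print(parts)                  # [(0, 0, 10), (1, 10, 20), (2, 20, 25)]
print(last_end - last_start)  # 5, the size of the last part
```

When the total size is an exact multiple of the chunk size, the last part in this model is simply a full-sized part.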
- bucket¶
Relationship to buckets.
- bucket_id¶
Bucket identifier.
- chunk_size¶
Size of chunks for file.
- completed¶
Defines if the object is completed.
- created¶
Creation timestamp.
- file¶
Relationship to file instance.
- file_id¶
File instance for this multipart object.
- classmethod get(bucket, key, upload_id, with_completed=False)[source]¶
Fetch a specific multipart object.
- key¶
Key identifying the object.
- property last_part_number¶
Get last part number.
- property last_part_size¶
Get size of last part.
- size¶
Size of file.
- updated¶
Modification timestamp.
- upload_id¶
Identifier of the multipart upload.
- class invenio_files_rest.models.ObjectVersion(**kwargs)[source]¶
Model for storing versions of objects.
A bucket stores one or more objects identified by a key. Each object is versioned where each version is represented by an ObjectVersion.
An object version can either be 1) a normal version which is linked to a file instance, or 2) a delete marker, which is not linked to a file instance.
A normal object version is linked to a physical file on disk via a file instance. This allows for multiple object versions to point to the same file on disk, to optimize storage efficiency (e.g. useful for snapshotting an entire bucket without duplicating the files).
A delete marker object version represents that the object at hand was deleted.
The latest version of an object is marked using the is_head property. If the latest object version is a delete marker the object will not be shown in the bucket.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
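The interplay of versions, delete markers and the head flag can be modelled in a few lines of plain Python. This is a conceptual sketch, not the SQLAlchemy model: VersionSketch and VersionedBucketSketch are hypothetical classes, and a version history is kept as an in-memory list instead of database rows.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionSketch:
    key: str
    file_id: Optional[str]   # None makes this version a delete marker
    is_head: bool = True

    @property
    def deleted(self) -> bool:
        return self.file_id is None


@dataclass
class VersionedBucketSketch:
    versions: Dict[str, List[VersionSketch]] = field(default_factory=dict)

    def add_version(self, key: str, file_id: Optional[str]) -> VersionSketch:
        history = self.versions.setdefault(key, [])
        for old in history:
            old.is_head = False   # only the newest version is the head
        version = VersionSketch(key, file_id)
        history.append(version)
        return version

    def delete(self, key: str) -> Optional[VersionSketch]:
        # Deleting appends a delete marker; no earlier version is removed.
        if key not in self.versions:
            return None
        return self.add_version(key, None)

    def visible_keys(self) -> List[str]:
        # An object is shown only if its head is not a delete marker.
        return [k for k, h in self.versions.items() if not h[-1].deleted]


bucket = VersionedBucketSketch()
bucket.add_version("report.pdf", "file-1")
bucket.add_version("report.pdf", "file-2")
bucket.delete("report.pdf")
print(len(bucket.versions["report.pdf"]))  # 3
print(bucket.visible_keys())               # []
```

Because the two earlier versions are still present after the delete, any of them could later be promoted to head again, which is the idea behind restoring a version.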
- property basename¶
Return filename of the object.
- bucket¶
Relationship to buckets.
- bucket_id¶
Bucket identifier.
- copy(bucket=None, key=None)[source]¶
Copy an object version to a given bucket + object key.
The copy operation is handled completely at the metadata level. The actual data on disk is not copied. Instead, the two object versions will point to the same physical file (via the same FileInstance).
All the tags associated with the current object version are copied over to the new instance.
Warning
If the destination object exists, it will be replaced by the new object version which will become the latest version.
- Parameters
bucket – The bucket (instance or id) to copy the object to. Default: current bucket.
key – Key name of destination object. Default: current object key.
- Returns
The copied object version.
- classmethod create(bucket, key, _file_id=None, stream=None, mimetype=None, version_id=None, **kwargs)[source]¶
Create a new object in a bucket.
The created object is by default created as a delete marker. You must use set_contents() or set_location() in order to change this.
- Parameters
bucket – The bucket (instance or id) to create the object in.
key – Key of object.
_file_id – For internal use.
stream – File-like stream object. Used to set content of object immediately after being created.
mimetype – MIME type of the file object if it is known.
kwargs – Keyword arguments passed to Object.set_contents().
- created¶
Creation timestamp.
- classmethod delete(bucket, key)[source]¶
Delete an object.
Technically works by creating a new version which works as a delete marker.
- Parameters
bucket – The bucket (instance or id) to delete the object from.
key – Key of object.
- Returns
Created delete marker object if key exists else None.
- property deleted¶
Determine if object version is a delete marker.
- file¶
Relationship to file instance.
- file_id¶
File instance for this object version.
A null value in this column defines that the object has been deleted.
- classmethod get(bucket, key, version_id=None)[source]¶
Fetch a specific object.
By default the latest object version is returned, if version_id is not set.
- Parameters
bucket – The bucket (instance or id) to get the object from.
key – Key of object.
version_id – Specific version of an object.
- classmethod get_by_bucket(bucket, versions=False, with_deleted=False)[source]¶
Return query that fetches all the objects in a bucket.
- Parameters
bucket – The bucket (instance or id) to query.
versions – Select all versions if True, only heads otherwise.
with_deleted – Select also deleted objects if True.
- Returns
The query to retrieve filtered objects in the given bucket.
- classmethod get_versions(bucket, key, desc=True)[source]¶
Fetch all versions of a specific object.
- Parameters
bucket – The bucket (instance or id) to get the object from.
key – Key of object.
desc – Sort results desc if True, asc otherwise.
- Returns
The query to execute to fetch all versions.
- is_head¶
Defines if object is the latest version.
- classmethod ix_uq_partial_files_object_is_head_dll()[source]¶
Return DDL instruction for ix_uq_partial_files_object_is_head.
- key¶
Key identifying the object.
- mimetype¶
Get MIME type of object.
- classmethod relink_all(old_file, new_file)[source]¶
Relink all object versions (for a given file) to a new file.
Warning
Use this method with great care.
- remove()[source]¶
Permanently remove a specific object version from the database.
Warning
This by-passes the normal versioning and should only be used when you want to permanently delete a specific object version. Otherwise use ObjectVersion.delete().
Note the method does not remove the associated file instance which must be garbage collected.
- Returns
self.
- restore()[source]¶
Restore this object version to become the latest version.
Raises an exception if the object is the latest version.
- set_contents(stream, chunk_size=None, size=None, size_limit=None, progress_callback=None)[source]¶
Save contents of stream to file instance.
If a file instance has already been set, this method raises a FileInstanceAlreadySetError exception.
- Parameters
stream – File-like stream.
size – Size of stream if known.
chunk_size – Desired chunk size to read stream in. It is up to the storage interface if it respects this value.
- set_location(uri, size, checksum, storage_class=None)[source]¶
Set only the URI location for the object.
Useful to link files on externally controlled storage. If a file instance has already been set, this method raises a FileInstanceAlreadySetError exception.
- Parameters
uri – Full URI to object (which can be interpreted by the storage interface).
size – Size of file.
checksum – Checksum of file.
storage_class – Storage class where file is stored.
- updated¶
Modification timestamp.
- version_id¶
Identifier for the specific version of an object.
- class invenio_files_rest.models.ObjectVersionTag(**kwargs)[source]¶
Model for storing tags associated to object versions.
Used for storing extra technical information for an object version.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- copy(object_version=None, key=None)[source]¶
Copy a tag to a given object version.
- Parameters
object_version – The object version instance to copy the tag to. Default: current object version.
key – Key of destination tag. Default: current tag key.
- Returns
The copied object version tag.
- classmethod create(object_version, key, value)[source]¶
Create a new tag for a given object version.
- classmethod create_or_update(object_version, key, value)[source]¶
Create or update a new tag for a given object version.
- classmethod delete(object_version, key=None)[source]¶
Delete tags.
- Parameters
object_version – The object version instance or id.
key – Key of the tag to delete. Default: delete all tags.
- key¶
Tag key.
- object_version¶
Relationship to object versions.
- value¶
Tag value.
- version_id¶
Object version id.
- class invenio_files_rest.models.Part(**kwargs)[source]¶
Part object.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- checksum¶
String representing the checksum of the part.
- classmethod create(mp, part_number, stream=None, **kwargs)[source]¶
Create a new part object in a multipart object.
- created¶
Creation timestamp.
- property end_byte¶
Get end byte in file for this part.
- multipart¶
Relationship to multipart objects.
- part_number¶
Part number.
- property part_size¶
Get size of this part.
- classmethod query_by_multipart(multipart)[source]¶
Get all parts for a specific multipart upload.
- Parameters
multipart – A invenio_files_rest.models.MultipartObject instance.
- Returns
A invenio_files_rest.models.Part instance.
- set_contents(stream, progress_callback=None)[source]¶
Save contents of stream to part of file instance.
If the MultipartObject is completed this method raises a MultipartAlreadyCompleted exception.
- Parameters
stream – File-like stream.
size – Size of stream if known.
chunk_size – Desired chunk size to read stream in. It is up to the storage interface if it respects this value.
- property start_byte¶
Get start byte in file of this part.
- updated¶
Modification timestamp.
- upload_id¶
Multipart object identifier.
Storage¶
File storage interface.
- class invenio_files_rest.storage.FileStorage(size=None, modified=None)[source]¶
Base class for storage interface to a single file.
Initialize storage object.
- copy(src, chunk_size=None, progress_callback=None)[source]¶
Copy data from another file instance.
- Parameters
src – Source stream.
chunk_size – Chunk size to read from source stream.
- save(incoming_stream, size_limit=None, size=None, chunk_size=None, progress_callback=None)[source]¶
Save incoming stream to file storage.
- class invenio_files_rest.storage.PyFSFileStorage(fileurl, size=None, modified=None, clean_dir=True)[source]¶
File system storage using PyFilesystem to access the file.
This storage class will store files according to the following pattern: <base_uri>/<file instance uuid>/data.
Warning
File operations are not atomic. E.g. if an error happens while updating part of a file, the file is left in an inconsistent state. The storage class tries as best as possible to handle errors and leave the system in a consistent state.
Storage initialization.
- delete()[source]¶
Delete a file.
The base directory is also removed, as it is assumed that only one file exists in the directory.
- invenio_files_rest.storage.pyfs_storage_factory(fileinstance=None, default_location=None, default_storage_class=None, filestorage_class=<class 'invenio_files_rest.storage.pyfs.PyFSFileStorage'>, fileurl=None, size=None, modified=None, clean_dir=True)[source]¶
Get factory function for creating a PyFS file storage instance.
Signals¶
Signals for Invenio-Files-REST.
- invenio_files_rest.signals.file_deleted = <blinker.base.NamedSignal object at 0x7f72e174bc50; 'file-deleted'>¶
File deleted signal.
Sent when a file is deleted.
- invenio_files_rest.signals.file_downloaded = <blinker.base.NamedSignal object at 0x7f72e174ba90; 'file-downloaded'>¶
File downloaded signal.
Sent when a file is downloaded.
- invenio_files_rest.signals.file_uploaded = <blinker.base.NamedSignal object at 0x7f72e174bc10; 'file-uploaded'>¶
File uploaded signal.
Sent when a file is uploaded.
File streaming¶
File serving helpers for Files REST API.
- invenio_files_rest.helpers.MIMETYPE_WHITELIST = {'audio/mpeg', 'audio/ogg', 'audio/wav', 'audio/webm', 'image/gif', 'image/jpeg', 'image/png', 'image/tiff', 'text/plain'}¶
List of whitelisted MIME types.
Warning
Do not add new types to this list unless you know what you are doing. You could potentially open up for XSS attacks.
- invenio_files_rest.helpers.chunk_size_or_default(chunk_size)[source]¶
Use default chunksize if not configured.
- invenio_files_rest.helpers.compute_checksum(stream, algo, message_digest, chunk_size=None, progress_callback=None)[source]¶
Get helper method to compute checksum from a stream.
- Parameters
stream – File-like object.
algo – Identifier for checksum algorithm.
message_digest – A message digest instance.
chunk_size – Read at most size bytes from the file at a time.
progress_callback – Function accepting one argument with number of bytes read. (Default: None)
- Returns
The checksum.
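A chunked checksum computation along the lines of the parameters above might look like the following sketch. It is not the helper's source code: checksum_sketch is a hypothetical name, the "algo:hexdigest" return format and the 64 KiB default chunk size are assumptions, and the callback is assumed to receive the cumulative number of bytes read.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Optional


def checksum_sketch(stream: BinaryIO, algo: str, message_digest,
                    chunk_size: int = 64 * 1024,
                    progress_callback: Optional[Callable[[int], None]] = None) -> str:
    """Feed ``stream`` to ``message_digest`` in chunks; return ``algo:hexdigest``."""
    bytes_read = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        message_digest.update(chunk)
        bytes_read += len(chunk)
        if progress_callback is not None:
            progress_callback(bytes_read)   # cumulative bytes read so far
    return f"{algo}:{message_digest.hexdigest()}"


seen = []
result = checksum_sketch(io.BytesIO(b"a" * 10), "sha256", hashlib.sha256(),
                         chunk_size=4, progress_callback=seen.append)
print(seen)  # [4, 8, 10]
```

Reading in fixed-size chunks keeps memory use bounded regardless of file size, which is why a digest object is passed in and updated incrementally instead of hashing the whole payload at once.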
- invenio_files_rest.helpers.compute_md5_checksum(stream, **kwargs)[source]¶
Get helper method to compute MD5 checksum from a stream.
- Parameters
stream – The input stream.
- Returns
The MD5 checksum.
- invenio_files_rest.helpers.create_file_streaming_redirect_response(obj)[source]¶
Redirect response generating function.
- invenio_files_rest.helpers.make_path(base_uri, path, filename, path_dimensions, split_length)[source]¶
Generate a path as base location for file instance.
- Parameters
base_uri – The base URI.
path – The relative path.
path_dimensions – Number of chunks the path should be split into.
split_length – The length of any chunk.
- Returns
A string representing the full path.
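The role of path_dimensions and split_length is easiest to see on a concrete identifier. The sketch below is a guess at the general technique rather than the helper's actual code: make_path_sketch is a hypothetical function, and it assumes that whatever remains of the path after splitting becomes one final directory before the filename.

```python
import posixpath


def make_path_sketch(base_uri: str, path: str, filename: str,
                     path_dimensions: int, split_length: int) -> str:
    """Split ``path`` into ``path_dimensions`` chunks of ``split_length``
    characters, keep the remainder as a final directory, then append
    ``filename``."""
    if len(path) <= path_dimensions * split_length:
        raise ValueError("path is too short to be split as requested")
    parts = []
    for _ in range(path_dimensions):
        parts.append(path[:split_length])
        path = path[split_length:]
    parts.append(path)        # whatever is left after splitting
    parts.append(filename)
    return posixpath.join(base_uri, *parts)


print(make_path_sketch("/data", "0a1b2c3d4e", "data", 2, 2))
# /data/0a/1b/2c3d4e/data
```

Spreading files over nested directories in this way avoids placing a very large number of entries in a single directory.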
- invenio_files_rest.helpers.populate_from_path(bucket, source, checksum=True, key_prefix='', chunk_size=None)[source]¶
Populate a bucket from all files in path.
- Parameters
bucket – The bucket (instance or id) to create the object in.
source – The file or directory path.
checksum – If True then an MD5 checksum will be computed for each file. (Default: True)
key_prefix – The key prefix for the bucket.
chunk_size – Chunk size to read from file.
- Returns
An iterator for all invenio_files_rest.models.ObjectVersion instances.
- invenio_files_rest.helpers.sanitize_mimetype(mimetype, filename=None)[source]¶
Sanitize a MIME type so the browser does not render the file.
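The idea of whitelisting MIME types can be sketched as follows. This is an illustration of the general approach, not the function's real logic: sanitize_sketch is a hypothetical name, the whitelist is copied from MIMETYPE_WHITELIST above, and both fallbacks (text/plain for other text types, application/octet-stream for everything else) are assumptions.

```python
from typing import Optional

# Copied from MIMETYPE_WHITELIST as listed in this reference.
WHITELIST = {
    "audio/mpeg", "audio/ogg", "audio/wav", "audio/webm",
    "image/gif", "image/jpeg", "image/png", "image/tiff",
    "text/plain",
}


def sanitize_sketch(mimetype: Optional[str]) -> str:
    """Return a MIME type that a browser will not render as active content."""
    if mimetype in WHITELIST:
        return mimetype
    if mimetype and mimetype.startswith("text/"):
        # Assumption: other text types are shown as plain text.
        return "text/plain"
    # Assumption: anything else is served as an opaque download.
    return "application/octet-stream"


print(sanitize_sketch("image/png"))   # image/png
print(sanitize_sketch("text/html"))   # text/plain
```

Downgrading text/html to text/plain means an uploaded HTML page is displayed as source text instead of being executed in the visitor's browser.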
- invenio_files_rest.helpers.send_stream(stream, filename, size, mtime, mimetype=None, restricted=True, as_attachment=False, etag=None, content_md5=None, chunk_size=None, conditional=True, trusted=False)[source]¶
Send the contents of a file to the client.
Warning
It is very easy to be exposed to Cross-Site Scripting (XSS) attacks if you serve user uploaded files. Here are some recommendations:
Serve user uploaded files from a separate domain (not a subdomain). This way a malicious file can only attack other user uploaded files.
Prevent the browser from rendering and executing HTML files (by setting trusted=False).
Force the browser to download the file as an attachment (as_attachment=True).
- Parameters
stream – The file stream to send.
filename – The file name.
size – The file size.
mtime – A Unix timestamp that represents last modified time (UTC).
mimetype – The file mimetype. If None, the module will try to guess. (Default: None)
restricted – If the file is not restricted, the module will set the cache-control. (Default: True)
as_attachment – If the file is an attachment. (Default: False)
etag – If defined, it will be set as HTTP E-Tag.
content_md5 – If defined, a HTTP Content-MD5 header will be set.
chunk_size – The chunk size.
conditional – Make the response conditional to the request. (Default: True)
trusted – Do not enable this option unless you know what you are doing. By default this function will send HTTP headers and MIME types that prevent your browser from rendering e.g. a HTML file which could contain a malicious script tag. (Default: False)
- Returns
A Flask response instance.
Tasks¶
Celery tasks for Invenio-Files-REST.
- invenio_files_rest.tasks.clear_orphaned_files(force_delete_check=<function <lambda> at 0x7f72e16d7440>, limit=1000)[source]¶
Delete orphaned files from DB and storage.
Note
Orphan files are files (invenio_files_rest.models.FileInstance objects and their on-disk counterparts) that do not have any invenio_files_rest.models.ObjectVersion objects associated with them (anymore).
The celery beat configuration for scheduling this task may set values for this task’s parameters:
    "clear-orphan-files": {
        "task": "invenio_files_rest.tasks.clear_orphaned_files",
        "schedule": 60 * 60 * 24,
        "kwargs": {
            "force_delete_check": lambda file: False,
            "limit": 500,
        }
    }
- Parameters
force_delete_check – A function to be called on each orphan file instance to check if its deletion should be forced (bypass the check of its writable flag). For example, this function can be used to force-delete files only if they are located on the local file system. Signature: The function should accept a invenio_files_rest.models.FileInstance object and return a boolean value. Default: Never force-delete any orphan files (lambda file_instance: False).
limit – Limit for the number of orphan files considered for deletion in each task execution (and thus the number of generated celery tasks). A value of zero (0) or lower disables the limit.
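A force_delete_check callable that only forces deletion of files on the local file system, as suggested in the parameter description, could be sketched like this. The function name force_delete_local_files is hypothetical, and treating a URI with no scheme or a file scheme as "local" is an assumption of this example; the only thing relied upon is that the object passed in exposes a uri attribute, as FileInstance does.

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def force_delete_local_files(file_instance) -> bool:
    """Force deletion only for files that live on the local file system.

    Assumption: a URI without a scheme, or with the ``file`` scheme,
    points at local storage.
    """
    scheme = urlparse(file_instance.uri or "").scheme
    return scheme in ("", "file")


# Stand-ins for FileInstance objects; only the ``uri`` attribute is used.
local = SimpleNamespace(uri="/var/data/ab/cd/data")
remote = SimpleNamespace(uri="root://eos.example.org//eos/data")
print(force_delete_local_files(local), force_delete_local_files(remote))
# True False
```

Such a function would be passed as the force_delete_check keyword argument of the task in place of the default, which never forces deletion.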
- invenio_files_rest.tasks.default_checksum_verification_files_query()[source]¶
Return a query of valid FileInstances for checksum verification.
- invenio_files_rest.tasks.merge_multipartobject(upload_id, version_id=None)[source]¶
Merge multipart object.
- Parameters
upload_id – The invenio_files_rest.models.MultipartObject upload ID.
version_id – Optionally you can define which file version. (Default: None)
- Returns
The invenio_files_rest.models.ObjectVersion version ID.
- invenio_files_rest.tasks.migrate_file(src_id, location_name, post_fixity_check=False)[source]¶
Task to migrate a file instance to a new location.
Note
If something goes wrong during the content copy, the destination file instance is removed.
- Parameters
src_id – The invenio_files_rest.models.FileInstance ID.
location_name – Where to migrate the file.
post_fixity_check – Verify checksum after migration. (Default: False)
- invenio_files_rest.tasks.progress_updater(size, total)[source]¶
Progress reporter for checksum verification.
- invenio_files_rest.tasks.remove_expired_multipartobjects()[source]¶
Remove expired multipart objects.
- invenio_files_rest.tasks.remove_file_data(file_id, silent=True, force=False)[source]¶
Remove file instance and associated data.
- Parameters
file_id – The invenio_files_rest.models.FileInstance ID.
silent – It stops propagation of a possible raised IntegrityError exception. (Default: True)
force – Whether to delete the file even if the file instance is not marked as writable.
- Raises
sqlalchemy.exc.IntegrityError – Raised if the database removal goes wrong and silent is set to False.
- invenio_files_rest.tasks.schedule_checksum_verification(frequency=None, batch_interval=None, max_count=None, max_size=None, files_query=None, checksum_kwargs=None)[source]¶
Schedule a batch of files for checksum verification.
The purpose of this task is to be periodically called through celerybeat, in order to achieve a repeated verification cycle of all file checksums, while following a set of constraints in order to throttle the execution rate of the checks.
- Parameters
frequency (dict) – Time period over which a full check of all files should be performed. The argument is a dictionary that will be passed as arguments to the datetime.timedelta class. Defaults to a month (30 days).
batch_interval (dict) – How often a batch is sent. If not supplied, this information will be extracted, if possible, from the celery.conf['CELERYBEAT_SCHEDULE'] entry of this task. The argument is a dictionary that will be passed as arguments to the datetime.timedelta class.
max_count (int) – Max count of files of a single batch. When set to 0 it’s automatically calculated to be distributed equally through the number of total batches.
max_size (int) – Max size of a single batch in bytes. When set to 0 it’s automatically calculated to be distributed equally through the number of total batches.
files_query (str) – Import path for a function returning a FileInstance query for files that should be checked.
checksum_kwargs (dict) – Passed to FileInstance.verify_checksum.
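The way frequency and batch_interval could translate into per-batch limits is a matter of simple arithmetic. The sketch below shows one plausible reading of "distributed equally through the number of total batches"; batch_limits is a hypothetical function and the rounding choices are assumptions, not the task's documented behaviour.

```python
import math
from datetime import timedelta


def batch_limits(total_files: int, total_size: int,
                 frequency: dict, batch_interval: dict):
    """Spread one full verification cycle evenly over all batches."""
    cycle = timedelta(**frequency)            # e.g. {"days": 30}
    interval = timedelta(**batch_interval)    # e.g. {"hours": 1}
    batches = max(1, math.ceil(cycle / interval))
    max_count = math.ceil(total_files / batches)   # files per batch
    max_size = math.ceil(total_size / batches)     # bytes per batch
    return batches, max_count, max_size


print(batch_limits(72_000, 7_200_000_000, {"days": 30}, {"hours": 1}))
# (720, 100, 10000000)
```

With an hourly batch over a 30-day cycle there are 720 batches, so 72,000 files work out to 100 files and roughly 10 MB per batch in this example.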
Exceptions¶
Errors for Invenio-Files-REST.
- exception invenio_files_rest.errors.BucketLockedError(errors=None, **kwargs)[source]¶
Exception raised when a bucket is locked.
Initialize RESTException.
- exception invenio_files_rest.errors.DuplicateTagError(errors=None, **kwargs)[source]¶
Duplicate tag key.
Initialize RESTException.
- exception invenio_files_rest.errors.ExhaustedStreamError(errors=None, **kwargs)[source]¶
The incoming file stream has been already consumed.
Initialize RESTException.
- exception invenio_files_rest.errors.FileInstanceAlreadySetError(errors=None, **kwargs)[source]¶
Exception raised when file instance already set on object.
Initialize RESTException.
- exception invenio_files_rest.errors.FileInstanceUnreadableError(errors=None, **kwargs)[source]¶
Exception raised when trying to get an unreadable file.
Initialize RESTException.
- exception invenio_files_rest.errors.FileSizeError(errors=None, **kwargs)[source]¶
Exception raised when a file is larger than allowed.
Initialize RESTException.
- exception invenio_files_rest.errors.FilesException(errors=None, **kwargs)[source]¶
Base exception for all errors.
Initialize RESTException.
- exception invenio_files_rest.errors.InvalidKeyError(errors=None, **kwargs)[source]¶
Invalid key.
Initialize RESTException.
- exception invenio_files_rest.errors.InvalidOperationError(errors=None, **kwargs)[source]¶
Exception raised when an invalid operation is performed.
Initialize RESTException.
- exception invenio_files_rest.errors.InvalidTagError(errors=None, **kwargs)[source]¶
Invalid tag key and/or value.
Initialize RESTException.
- exception invenio_files_rest.errors.MissingQueryParameter(arg_name, **kwargs)[source]¶
Exception raised when missing a query parameter.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartAlreadyCompleted(errors=None, **kwargs)[source]¶
Exception raised when multipart object is already completed.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartException(errors=None, **kwargs)[source]¶
Exception for multipart objects.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartInvalidChunkSize(errors=None, **kwargs)[source]¶
Exception raised when the chunk size of a multipart object is invalid.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartInvalidPartNumber(errors=None, **kwargs)[source]¶
Exception raised when the part number of a multipart object is invalid.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartInvalidSize(errors=None, **kwargs)[source]¶
Exception raised when the size of a multipart object is invalid.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartMissingParts(errors=None, **kwargs)[source]¶
Exception raised when a multipart object is missing parts.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartNoPart(errors=None, **kwargs)[source]¶
Exception raised by part factories when no part was detected.
Initialize RESTException.
- exception invenio_files_rest.errors.MultipartNotCompleted(errors=None, **kwargs)[source]¶
Exception raised when multipart object is not yet completed.
Initialize RESTException.
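Because FilesException is documented as the base exception for all errors, calling code can catch it to handle any of the errors above in one place. The classes below are local stand-ins that only reuse the documented names; the assumption that specific multipart errors derive from MultipartException is made for illustration.

```python
class FilesException(Exception):
    """Stand-in for the documented base exception."""


class MultipartException(FilesException):
    """Stand-in for the multipart exception (assumed to extend the base)."""


class MultipartMissingParts(MultipartException):
    """Stand-in for one specific multipart error."""


def classify(exc):
    """Return a coarse label for an error, checking the most specific class first."""
    if isinstance(exc, MultipartException):
        return "multipart"
    if isinstance(exc, FilesException):
        return "files"
    return "other"


labels = [classify(e) for e in (MultipartMissingParts(), FilesException(), KeyError())]
```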
Limiters¶
File size limiting functionality for Invenio-Files-REST.
- class invenio_files_rest.limiters.FileSizeLimit(limit, reason)[source]¶
File size limiter.
Instantiate a new file size limit.
- Parameters
limit – The imposed limit.
reason – The limit description.
- invenio_files_rest.limiters.file_size_limiters(bucket)[source]¶
Get default file size limiters.
- Parameters
bucket – The invenio_files_rest.models.Bucket instance.
- Returns
A list containing an instance of invenio_files_rest.limiters.FileSizeLimit with quota left value and description and another one with max file size value and description.
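The idea of pairing each limit with a reason can be sketched with plain Python. SizeLimit, make_limiters and first_violation are hypothetical names for this example and the reason strings are invented; the sketch only mirrors the description above (one limit for the remaining quota, one for the maximum file size).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeLimit:
    """Stand-in for a file size limit: a byte limit plus a description."""
    limit: int
    reason: str


def make_limiters(quota_left, max_file_size):
    """Hypothetical analogue of file_size_limiters(): one limit per constraint."""
    return [
        SizeLimit(quota_left, "Bucket quota exceeded."),
        SizeLimit(max_file_size, "Maximum file size exceeded."),
    ]


def first_violation(size, limiters):
    """Return the reason of the first limit that ``size`` exceeds, or None."""
    for limiter in limiters:
        if size > limiter.limit:
            return limiter.reason
    return None


limiters = make_limiters(quota_left=100, max_file_size=50)
```

Checking an upload of 60 bytes against these limiters would report the maximum file size as the violated constraint, while 10 bytes passes both.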
Permissions¶
Permissions for files using Invenio-Access.
- invenio_files_rest.permissions.BucketListMultiparts = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-listmultiparts')¶
Action needed: list multipart uploads in bucket.
- invenio_files_rest.permissions.BucketRead = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-read')¶
Action needed: list objects in bucket.
- invenio_files_rest.permissions.BucketReadVersions = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-read-versions')¶
Action needed: list object versions in bucket.
- invenio_files_rest.permissions.BucketUpdate = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-bucket-update')¶
Action needed: create objects and multipart uploads in bucket.
- invenio_files_rest.permissions.LocationUpdate = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-location-update')¶
Action needed: location update.
- invenio_files_rest.permissions.MultipartDelete = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-multipart-delete')¶
Action needed: abort a multipart upload.
- invenio_files_rest.permissions.MultipartRead = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-multipart-read')¶
Action needed: list parts of a multipart upload in a bucket.
- invenio_files_rest.permissions.ObjectDelete = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-delete')¶
Action needed: delete object in bucket.
- invenio_files_rest.permissions.ObjectDeleteVersion = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-delete-version')¶
Action needed: permanently delete specific object version in bucket.
- invenio_files_rest.permissions.ObjectRead = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-read')¶
Action needed: get object in bucket.
- invenio_files_rest.permissions.ObjectReadVersion = functools.partial(functools.partial(<class 'invenio_access.permissions.Need'>, 'action'), 'files-rest-object-read-version')¶
Action needed: get object version in bucket.
- invenio_files_rest.permissions.bucket_listmultiparts_all = Need(method='action', value='files-rest-bucket-listmultiparts', argument=None)¶
Action needed: list all buckets multiparts.
- invenio_files_rest.permissions.bucket_read_all = Need(method='action', value='files-rest-bucket-read', argument=None)¶
Action needed: read all buckets.
- invenio_files_rest.permissions.bucket_read_versions_all = Need(method='action', value='files-rest-bucket-read-versions', argument=None)¶
Action needed: read all buckets versions.
- invenio_files_rest.permissions.bucket_update_all = Need(method='action', value='files-rest-bucket-update', argument=None)¶
Action needed: update all buckets.
- invenio_files_rest.permissions.location_update_all = Need(method='action', value='files-rest-location-update', argument=None)¶
Action needed: update all locations.
- invenio_files_rest.permissions.multipart_delete_all = Need(method='action', value='files-rest-multipart-delete', argument=None)¶
Action needed: delete all multiparts.
- invenio_files_rest.permissions.multipart_read_all = Need(method='action', value='files-rest-multipart-read', argument=None)¶
Action needed: read all multiparts.
- invenio_files_rest.permissions.object_delete_all = Need(method='action', value='files-rest-object-delete', argument=None)¶
Action needed: delete all objects.
- invenio_files_rest.permissions.object_delete_version_all = Need(method='action', value='files-rest-object-delete-version', argument=None)¶
Action needed: delete all objects versions.
- invenio_files_rest.permissions.object_read_all = Need(method='action', value='files-rest-object-read', argument=None)¶
Action needed: read all objects.
- invenio_files_rest.permissions.object_read_version_all = Need(method='action', value='files-rest-object-read-version', argument=None)¶
Action needed: read all objects versions.
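The relationship between the capitalized partials and the lowercase *_all needs can be reproduced with the standard library alone. The Need namedtuple below is a stand-in with the same three fields as the reprs shown above, not the invenio_access class itself, and the bucket id is a made-up value.

```python
from collections import namedtuple
from functools import partial

# Stand-in with the same three fields as the Need reprs shown above.
Need = namedtuple("Need", ["method", "value", "argument"])

# Parameterized action: the argument (for example a bucket id) is supplied later.
BucketRead = partial(partial(Need, "action"), "files-rest-bucket-read")

# Global variant: the same action with no argument.
bucket_read_all = BucketRead(None)

# Need scoped to a single bucket; the id is a made-up example value.
one_bucket = BucketRead("11111111-1111-1111-1111-111111111111")
```

Under this reading, each *_all value is simply the corresponding partial called with None as its argument.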
- invenio_files_rest.permissions.permission_factory(obj, action)[source]¶
Get default permission factory.
- Parameters
obj – An instance of invenio_files_rest.models.Bucket or invenio_files_rest.models.ObjectVersion or invenio_files_rest.models.MultipartObject or None if the action is global.
action – The required action.
- Raises
RuntimeError – If the object is unknown.
- Returns
A invenio_access.permissions.Permission instance.
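A minimal sketch of the dispatch described above, using stand-in classes instead of the real models. It assumes, for illustration only, that a need is scoped by the id of the bucket the object belongs to; the documented behaviour it mirrors is that None means a global action and an unknown object raises RuntimeError.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Stand-in for the bucket model."""
    id: str


@dataclass
class ObjectVersion:
    """Stand-in for the object version model."""
    bucket_id: str


def need_argument(obj):
    """Return the bucket id to scope a need to, or None for a global action."""
    if obj is None:
        return None
    if isinstance(obj, Bucket):
        return str(obj.id)
    if isinstance(obj, ObjectVersion):
        return str(obj.bucket_id)
    # Mirrors the documented RuntimeError for unknown objects.
    raise RuntimeError("Unknown object")
```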
Views¶
Files download/upload REST API similar to S3 for Invenio.
- class invenio_files_rest.views.BucketResource(*args, **kwargs)[source]¶
Bucket item resource.
Instantiate content negotiated view.
- get(bucket=None, versions=<marshmallow.missing>, uploads=<marshmallow.missing>)[source]¶
Get list of objects in the bucket.
- Parameters
bucket – A invenio_files_rest.models.Bucket instance.
- Returns
The Flask response.
- listobjects(bucket, versions)[source]¶
List objects in a bucket.
- Parameters
bucket – A invenio_files_rest.models.Bucket instance.
- Returns
The Flask response.
- methods: ClassVar[Optional[Collection[str]]] = {'GET', 'HEAD'}¶
The methods this view is registered for. Uses the same default (["GET", "HEAD", "OPTIONS"]) as route and add_url_rule by default.
- multipart_listuploads(bucket)[source]¶
List multipart uploads in a bucket.
- Parameters
bucket – A invenio_files_rest.models.Bucket instance.
- Returns
The Flask response.
- class invenio_files_rest.views.LocationResource(*args, **kwargs)[source]¶
Service resource.
Instantiate content negotiated view.
- methods: ClassVar[Optional[Collection[str]]] = {'POST'}¶
The methods this view is registered for. Uses the same default (["GET", "HEAD", "OPTIONS"]) as route and add_url_rule by default.
- class invenio_files_rest.views.ObjectResource(*args, **kwargs)[source]¶
Object item resource.
Instantiate content negotiated view.
- create_object(bucket, key)[source]¶
Create a new object.
- Parameters
bucket – The bucket (instance or id) to get the object from.
key – The file key.
- Returns
A Flask response.
- delete(bucket=None, key=None, version_id=None, upload_id=None, uploads=None)[source]¶
Delete an object or abort a multipart upload.
- Parameters
bucket – The bucket (instance or id) to get the object from. (Default: None)
key – The file key. (Default: None)
version_id – The version ID. (Default: None)
upload_id – The upload ID. (Default: None)
- Returns
A Flask response.
- delete_object(bucket, obj, version_id)[source]¶
Delete an existing object.
- Parameters
bucket – The bucket (instance or id) to get the object from.
obj – A invenio_files_rest.models.ObjectVersion instance.
version_id – The version ID.
- Returns
A Flask response.
- get(bucket=None, key=None, version_id=None, upload_id=None, uploads=None, download=None)[source]¶
Get object or list parts of a multipart upload.
- Parameters
bucket – The bucket (instance or id) to get the object from. (Default: None)
key – The file key. (Default: None)
version_id – The version ID. (Default: None)
upload_id – The upload ID. (Default: None)
download – The download flag. (Default: None)
- Returns
A Flask response.
- classmethod get_object(bucket, key, version_id)[source]¶
Retrieve object and abort if it doesn’t exist.
If the file is not found, the connection is aborted and the 404 error is returned.
- Parameters
bucket – The bucket (instance or id) to get the object from.
key – The file key.
version_id – The version ID.
- Returns
A invenio_files_rest.models.ObjectVersion instance.
- methods: ClassVar[Optional[Collection[str]]] = {'DELETE', 'GET', 'POST', 'PUT'}¶
The methods this view is registered for. Uses the same default (["GET", "HEAD", "OPTIONS"]) as route and add_url_rule by default.
- multipart_complete(multipart)[source]¶
Complete a multipart upload.
- Parameters
multipart – A invenio_files_rest.models.MultipartObject instance.
- Returns
A Flask response.
- multipart_delete(multipart)[source]¶
Abort a multipart upload.
- Parameters
multipart – A invenio_files_rest.models.MultipartObject instance.
- Returns
A Flask response.
- multipart_init(bucket, key, size=None, part_size=None)[source]¶
Initialize a multipart upload.
- Parameters
bucket – The bucket (instance or id) to get the object from.
key – The file key.
size – The total size.
part_size – The part size.
- Raises
invenio_files_rest.errors.MissingQueryParameter – If size or part_size are not defined.
- Returns
A Flask response.
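To make the relationship between size and part_size concrete, the sketch below computes how many parts an upload would be split into and how large the last part is. plan_parts is a hypothetical helper written for this example, and raising ValueError is a simplification of the documented MissingQueryParameter.

```python
def plan_parts(size, part_size):
    """Hypothetical helper: return (number_of_parts, size_of_last_part)."""
    if size is None or part_size is None:
        # Mirrors the documented requirement that both values be given.
        raise ValueError("size and part_size are required")
    if size <= 0 or part_size <= 0:
        raise ValueError("size and part_size must be positive")
    # Integer ceiling division avoids floating point rounding on large files.
    n_parts = (size + part_size - 1) // part_size
    last_part = size - (n_parts - 1) * part_size
    return n_parts, last_part


# A 25-byte upload in 10-byte parts needs three parts; the last holds 5 bytes.
parts, last = plan_parts(25, 10)
```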
- multipart_listparts(multipart)[source]¶
Get parts of a multipart upload.
- Parameters
multipart – A invenio_files_rest.models.MultipartObject instance.
- Returns
A Flask response.
- multipart_uploadpart(multipart)[source]¶
Upload a part.
- Parameters
multipart – A invenio_files_rest.models.MultipartObject instance.
- Returns
A Flask response.
- post(bucket=None, key=None, uploads=<marshmallow.missing>, upload_id=None)[source]¶
Upload a new object or start/complete a multipart upload.
- Parameters
bucket – The bucket (instance or id) to get the object from. (Default: None)
key – The file key. (Default: None)
upload_id – The upload ID. (Default: None)
- Returns
A Flask response.
- put(bucket=None, key=None, upload_id=None)[source]¶
Update a new object or upload a part of a multipart upload.
- Parameters
bucket – The bucket (instance or id) to get the object from. (Default: None)
key – The file key. (Default: None)
upload_id – The upload ID. (Default: None)
- Returns
A Flask response.
- static send_object(bucket, obj, expected_chksum=None, logger_data=None, restricted=True, as_attachment=False)[source]¶
Send an object for a given bucket.
- Parameters
bucket – The bucket (instance or id) to get the object from.
obj – A invenio_files_rest.models.ObjectVersion instance.
logger_data – The Python logger.
kwargs – Keyword arguments passed to Object.send_file()
expected_chksum – Expected checksum.
- Returns
A Flask response.
- invenio_files_rest.views.bucket_view(**kwargs)¶
Bucket item resource.
- Parameters
kwargs (Any) –
- Return type
Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], Union[Headers, Mapping[str, Union[str, List[str], Tuple[str, …]]], Sequence[Tuple[str, Union[str, List[str], Tuple[str, …]]]]]], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], int], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], int, Union[Headers, Mapping[str, Union[str, List[str], Tuple[str, …]]], Sequence[Tuple[str, Union[str, List[str], Tuple[str, …]]]]]], WSGIApplication]
- invenio_files_rest.views.check_permission(permission, hidden=True)[source]¶
Check if permission is allowed.
If permission fails then the connection is aborted.
- Parameters
permission – The permission to check.
hidden – Determine if a 404 error (True) or 401/403 error (False) should be returned if the permission is rejected (i.e. hide or reveal the existence of a particular object).
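The effect of the hidden flag can be summarized as a small decision function. denial_status is a hypothetical name, and the split between 401 for anonymous users and 403 for authenticated users is an assumption based on common HTTP practice rather than something stated above.

```python
def denial_status(hidden, authenticated):
    """Pick the HTTP status for a rejected permission check."""
    if hidden:
        # Hide the existence of the object entirely.
        return 404
    # Reveal that the object exists but access is denied.
    return 403 if authenticated else 401


statuses = [
    denial_status(hidden=True, authenticated=False),
    denial_status(hidden=False, authenticated=False),
    denial_status(hidden=False, authenticated=True),
]
```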
- invenio_files_rest.views.default_partfactory(part_number=None, content_length=None, content_type=None, content_md5=None)[source]¶
Get default part factory.
- Parameters
part_number – The part number. (Default: None)
content_length – The content length. (Default: None)
content_type – The HTTP Content-Type. (Default: None)
content_md5 – The content MD5. (Default: None)
- Returns
The content length, the part number, the stream, the content type, MD5 of the content.
- invenio_files_rest.views.ensure_input_stream_is_not_exhausted(f)[source]¶
Make sure that the input stream has not been read already.
- invenio_files_rest.views.location_view(**kwargs)¶
Service resource.
- Parameters
kwargs (Any) –
- Return type
Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], Union[Headers, Mapping[str, Union[str, List[str], Tuple[str, …]]], Sequence[Tuple[str, Union[str, List[str], Tuple[str, …]]]]]], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], int], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], int, Union[Headers, Mapping[str, Union[str, List[str], Tuple[str, …]]], Sequence[Tuple[str, Union[str, List[str], Tuple[str, …]]]]]], WSGIApplication]
- invenio_files_rest.views.minsize_validator(value)[source]¶
Validate Content-Length header.
- Raises
invenio_files_rest.errors.FileSizeError – If the value is less than invenio_files_rest.config.FILES_REST_MIN_FILE_SIZE size.
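A minimal sketch of such a validator, written without Flask or the real error class. MIN_FILE_SIZE, FileTooSmall and check_min_size are stand-ins invented for this example; returning the value on success is a convenience of the sketch, not documented behaviour.

```python
MIN_FILE_SIZE = 1  # hypothetical stand-in for FILES_REST_MIN_FILE_SIZE


class FileTooSmall(ValueError):
    """Stand-in for the FileSizeError raised by the real validator."""


def check_min_size(value, minimum=MIN_FILE_SIZE):
    """Reject a Content-Length below the configured minimum."""
    if value < minimum:
        raise FileTooSmall(f"File size must be at least {minimum} bytes.")
    return value


accepted = check_min_size(10)
```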
- invenio_files_rest.views.need_bucket_permission(action, hidden=True)¶
Get permission for buckets or abort.
- Parameters
object_getter – The function used to retrieve the object and pass it to the permission factory.
action – The action needed.
hidden – Determine which kind of error to return. (Default: True)
- invenio_files_rest.views.need_location_permission(action, hidden=True)¶
Get permission for buckets or abort.
- Parameters
object_getter – The function used to retrieve the object and pass it to the permission factory.
action – The action needed.
hidden – Determine which kind of error to return. (Default: True)
- invenio_files_rest.views.need_permissions(object_getter, action, hidden=True)[source]¶
Get permission for buckets or abort.
- Parameters
object_getter – The function used to retrieve the object and pass it to the permission factory.
action – The action needed.
hidden – Determine which kind of error to return. (Default: True)
- invenio_files_rest.views.ngfileupload_partfactory(part_number=None, content_length=None, uploaded_file=None)[source]¶
Part factory for ng-file-upload.
- Parameters
part_number – The part number. (Default: None)
content_length – The content length. (Default: None)
uploaded_file – The upload request. (Default: None)
- Returns
The content length, part number, stream, HTTP Content-Type header.
- invenio_files_rest.views.ngfileupload_uploadfactory(content_length=None, content_type=None, uploaded_file=None)[source]¶
Get default put factory.
If Content-Type is 'multipart/form-data' then the stream is aborted.
- Parameters
content_length – The content length. (Default: None)
content_type – The HTTP Content-Type. (Default: None)
uploaded_file – The upload request. (Default: None)
file_tags_header – The file tags. (Default: None)
- Returns
A tuple containing stream, content length, and empty header.
- invenio_files_rest.views.object_view(**kwargs)¶
Object item resource.
- Parameters
kwargs (Any) –
- Return type
Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], Union[Headers, Mapping[str, Union[str, List[str], Tuple[str, …]]], Sequence[Tuple[str, Union[str, List[str], Tuple[str, …]]]]]], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], int], Tuple[Union[Response, str, bytes, List[Any], Mapping[str, Any], Iterator[str], Iterator[bytes]], int, Union[Headers, Mapping[str, Union[str, List[str], Tuple[str, …]]], Sequence[Tuple[str, Union[str, List[str], Tuple[str, …]]]]]], WSGIApplication]
- invenio_files_rest.views.parse_header_tags()[source]¶
Parse tags specified in the HTTP request header.
- invenio_files_rest.views.pass_multipart(with_completed=False)[source]¶
Decorate to retrieve an object.
- invenio_files_rest.views.stream_uploadfactory(content_md5=None, content_length=None, content_type=None)[source]¶
Get default put factory.
If Content-Type is 'multipart/form-data' then the stream is aborted.
- Parameters
content_md5 – The content MD5. (Default: None)
content_length – The content length. (Default: None)
content_type – The HTTP Content-Type. (Default: None)
- Returns
The stream, content length, MD5 of the content.
Form parser¶
Werkzeug form data parser customization.
- class invenio_files_rest.formparser.FormDataParser(stream_factory=None, charset='utf-8', errors='replace', max_form_memory_size=None, max_content_length=None, cls=None, silent=True, *, max_form_parts=None)[source]¶
Custom form data parser.
- parse(stream, mimetype, content_length, options=None)[source]¶
Parse the information from the given request.
- Parameters
stream – An input stream.
mimetype – The mimetype of the data.
content_length – The content length of the incoming data.
options – Optional mimetype parameters (used for the multipart boundary for instance).
- Returns
A tuple in the form (stream, form, files).