Special Digital Objects
Files And Directories
Suppose your app can manage files on the machine HELIPORT is running on. Exposing files like this is not a good idea in practice, but it makes for a simple example that uses only Python’s built-in functionality.
You don’t have to do all the steps to get something working.
You can keep all the code in models.py of your app.
Implementing a view that allows adding or editing the objects is not part of this section.
Look at How to Integrate a New Module for that.
A Django model to represent the files or directories
from heliport.core.models import DigitalObject
from django.db import models


class DBLocalFile(DigitalObject):
    file_id = models.AutoField(primary_key=True)
    path_str = models.TextField()


class DBLocalDirectory(DigitalObject):
    directory_id = models.AutoField(primary_key=True)
    path_str = models.TextField()
This might already be sufficient in some cases, but is not yet specific to files or directories.
A single class in HELIPORT can represent “a file and a directory” but cannot represent “either a file or a directory”; this restriction keeps search simple and efficient. Therefore, two classes are needed. Feel free to implement a base class for shared functionality.
Implement file-specific functionality
A DigitalObject represents a file if it inherits from heliport.core.digital_object_aspects.FileObj. You have to implement the as_file method:
from heliport.core.digital_object_aspects import File, FileObj


class DBLocalFile(DigitalObject, FileObj):
    ...

    def as_file(self, context: Context) -> File:
        self.access(context).assert_read_permission()
        return LocalFile(self.path_str)
context can be used for optimizations later and to get the current user or project.
Now you have to implement your LocalFile class. It should inherit from File:
from heliport.core.digital_object_aspects import File
from heliport.core.utils.string_tools import format_byte_count
from os import stat


class LocalFile(File):
    def __init__(self, path):
        self.file = None
        self.path = path

    def mimetype(self) -> str:
        return "application/octet-stream"  # just binary data

    def open(self):
        self.close()  # release a previously opened handle, if any
        self.file = open(self.path, "rb")  # binary mode, matching the octet-stream mimetype

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None

    def read(self, number_of_bytes=None):
        return self.file.read(number_of_bytes)

    def size(self):
        return stat(self.path).st_size

    def get_file_info(self):
        return {"size": format_byte_count(self.size())}
A heliport.core.digital_object_aspects.File needs to implement some methods:
mimetype() - you could use Python’s built-in mimetypes.guess_type(file_name) to get a more specific type
open() / close() - the context manager is implemented in File, so you could use with LocalFile("a.txt") as f:
read() - you don’t necessarily need to return the exact number_of_bytes if it is easier to implement
size() - you can return None if the size is unknown
get_file_info() - dict of human-readable values and arbitrary keys, to provide more details to the user (can be an empty dict)
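The methods listed above can be tried out without HELIPORT. The following is a minimal sketch, not HELIPORT’s implementation: SketchFile is a hypothetical stand-in that mirrors the method names above, defines its own context manager (assumed to simply call open() and close(), whereas in HELIPORT the File base class provides it), and uses mimetypes.guess_type with a fallback to binary data.

```python
import mimetypes
import os


class SketchFile:
    """Hypothetical stand-in for a File implementation; not part of HELIPORT."""

    def __init__(self, path):
        self.path = path
        self.file = None

    def mimetype(self) -> str:
        # guess_type returns (type, encoding); type is None for unknown names
        guessed, _encoding = mimetypes.guess_type(self.path)
        return guessed or "application/octet-stream"

    def open(self):
        self.close()
        self.file = open(self.path, "rb")

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None

    def read(self, number_of_bytes=None):
        return self.file.read(number_of_bytes)

    def size(self):
        return os.stat(self.path).st_size

    # Assumption: the context manager only wraps open() and close().
    def __enter__(self):
        self.open()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False
```

Used as `with SketchFile("a.txt") as f: data = f.read()`, the handle is closed again when the block ends, even if reading raises an exception.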
The separation of DBLocalFile and LocalFile into two classes gives you flexibility.
You don’t have to worry about interfering with Django things, e.g. you are free to choose the signature of __init__ or inherit from an abstract base class. Another benefit is that the permission check lives in only one place.
Implement directory-specific functionality
Directories work very similarly to files, but are even simpler. A DigitalObject represents a directory if it inherits from DirectoryObj. You have to implement the as_directory method:
from heliport.core.digital_object_aspects import DirectoryObj


class DBLocalDirectory(DigitalObject, DirectoryObj):
    ...

    # quoted annotation: LocalDirectory is defined further below
    def as_directory(self, context: Context) -> "LocalDirectory":
        self.access(context).assert_read_permission()
        return LocalDirectory(self.path_str)
Now you have to implement the LocalDirectory class. It should inherit from Directory:
from heliport.core.digital_object_aspects import Directory
import os


class LocalDirectory(Directory):
    def __init__(self, path):
        self.path = path

    def get_parts(self):
        result = []
        for name in os.listdir(self.path):
            # os.listdir returns bare names, so build the full path first
            child_path = os.path.join(self.path, name)
            if os.path.isfile(child_path):
                child, is_new = DBLocalFile.objects.get_or_create(path_str=child_path, label=child_path)
            else:
                child, is_new = DBLocalDirectory.objects.get_or_create(path_str=child_path, label=child_path)
            result.append(child)
        return result
get_parts could return any iterable or be a generator. Because the caller of get_parts should be able to use all the features of Django and HELIPORT to work with the result, get_parts has to return DigitalObject or GeneralDigitalObject instances. In our example it cannot return LocalDirectory or LocalFile directly.
When creating new objects, you should set additional attributes like the label in the example above.
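Since get_parts may be a generator, the directory listing can also be produced lazily. The sketch below illustrates only that idea and is not HELIPORT code: iter_children is a hypothetical helper that yields plain (kind, path) tuples, where a real get_parts would yield DigitalObject or GeneralDigitalObject instances instead. It uses os.scandir, which exposes both the full path and the entry type.

```python
import os


def iter_children(directory_path):
    """Lazily yield ("file" | "directory", full_path) for each child.

    Hypothetical helper for illustration; a real get_parts would wrap each
    path in a DigitalObject or GeneralDigitalObject instead of a tuple.
    """
    with os.scandir(directory_path) as entries:
        for entry in entries:
            kind = "file" if entry.is_file() else "directory"
            # entry.path is already joined with directory_path
            yield kind, entry.path
```

Nothing is listed until the caller starts iterating, which can help for very large directories.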
Using General Digital Objects
The previous example would store all the files in the HELIPORT database the first time a user looks at them.
To avoid this, you could return GeneralDigitalObject instances instead of a DBLocalFile or DBLocalDirectory.
These classes could look like this:
from heliport.core.digital_object_aspects import FileObj, DirectoryObj
from heliport.core.digital_object_interface import GeneralDigitalObject
from heliport.core.utils.collections import RootedRDFGraph
from abc import ABC


class PathObj(GeneralDigitalObject, ABC):
    def __init__(self, path):
        self.path = path

    def as_text(self):
        return self.path

    def as_rdf(self):
        return RootedRDFGraph.from_atomic(self.as_text())

    def as_html(self):
        return self.as_text()

    def __hash__(self):
        return hash(self.path)

    def __eq__(self, other):
        return isinstance(other, PathObj) and self.path == other.path


class LocalFileObj(PathObj, FileObj):
    def as_digital_object(self, context):
        file, is_new = DBLocalFile.objects.get_or_create(path_str=self.path, label=self.path)
        return file

    def as_file(self, context):
        return LocalFile(self.path)

    def type_id(self):
        return "LocalFile"

    def get_identifying_params(self):
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context):
        return LocalFileObj(params["path"])


class LocalDirectoryObj(PathObj, DirectoryObj):
    def as_digital_object(self, context):
        directory, is_new = DBLocalDirectory.objects.get_or_create(path_str=self.path, label=self.path)
        return directory

    def as_directory(self, context):
        return LocalDirectory(self.path)

    def type_id(self):
        return "LocalDirectory"

    def get_identifying_params(self):
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context):
        return LocalDirectoryObj(params["path"])
This is an extremely basic implementation of all required methods. The documentation of GeneralDigitalObject and its base classes provides more information.
Also, you need to register these objects as described in the documentation for digital_object_resolution.py.
Some common functionality has been extracted to the base class PathObj.
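To see how get_identifying_params, resolve, __hash__ and __eq__ fit together, the round trip can be sketched without HELIPORT. Everything here is a hypothetical stand-in: SketchPathObj, SketchFileObj, the RESOLVERS dict and resolve_params are illustration names only, and the dict lookup merely imitates the idea of registration, not the actual mechanism in digital_object_resolution.py.

```python
class SketchPathObj:
    """Hypothetical stand-in for PathObj; identity is defined by the path."""

    def __init__(self, path):
        self.path = path

    def __hash__(self):
        return hash(self.path)

    def __eq__(self, other):
        return isinstance(other, SketchPathObj) and self.path == other.path


class SketchFileObj(SketchPathObj):
    def type_id(self):
        return "LocalFile"

    def get_identifying_params(self):
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context=None):
        return SketchFileObj(params["path"])


# Assumption: a plain dict imitates registration by type id.
RESOLVERS = {"LocalFile": SketchFileObj.resolve}


def resolve_params(params, context=None):
    """Rebuild an object from the dict produced by get_identifying_params."""
    return RESOLVERS[params["type"]](params, context)
```

An object rebuilt with `resolve_params(obj.get_identifying_params())` is a different instance, but it compares equal to the original and has the same hash, so both count as the same element in sets and dict keys.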
With LocalFileObj and LocalDirectoryObj, you could update the get_parts function:
from heliport.core.digital_object_aspects import Directory
import os


class LocalDirectory(Directory):
    def __init__(self, path):
        self.path = path

    def get_parts(self):
        result = []
        for name in os.listdir(self.path):
            # os.listdir returns bare names, so build the full path first
            child_path = os.path.join(self.path, name)
            if os.path.isfile(child_path):
                child = LocalFileObj(child_path)
            else:
                child = LocalDirectoryObj(child_path)
            result.append(child)
        return result
In total, you could have the following classes for local files and directories:
For Files
LocalFile: the actual File; has read()
DBLocalFile: represents a file as a DigitalObject in the database; has as_file()
LocalFileObj: represents a file as a GeneralDigitalObject not in the database; has as_file()
For Directories
LocalDirectory: the actual Directory; has get_parts()
DBLocalDirectory: represents a directory as a DigitalObject in the database; has as_directory()
LocalDirectoryObj: represents a directory as a GeneralDigitalObject not in the database; has as_directory()
Optimize by using Context
If you need some kind of connection to open a file, it can be slow to reestablish the connection for each file:
from heliport.core.digital_object_aspects import File


def open_connection(some_parameter):
    # this part takes some time
    return "the connection"


class LocalFile(File):
    ...

    def open(self):
        connection = open_connection("some_value")
        ...  # use connection to open file
Instead of calling the open_connection function every time, you can describe the call using a dataclass that has a generate method:
from dataclasses import dataclass


@dataclass(frozen=True)
class MyConnection:
    some_parameter: str

    def generate(self, context):
        return open_connection(self.some_parameter)
If the generate function returns a context manager, it is entered.
When the context is closed, all context managers from generate functions are closed automatically.
The file classes can now look like this:
class DBLocalFile(DigitalObject, FileObj):
    ...

    def as_file(self, context: Context) -> File:
        self.access(context).assert_read_permission()
        return LocalFile(self.path_str, context)


class LocalFile(File):
    def __init__(self, path, context):
        ...
        self.context = context
        ...

    def open(self):
        connection = self.context[MyConnection("some_value")]
        print(self.context.user_or_none)  # example how to use context to get user
        ...

    ...
This does the following:
Context calls generate the first time a MyConnection object with "some_value" is needed and stores the result.
Context returns the cached connection if another MyConnection object with "some_value" is later requested.
Context is user-specific, so it is more secure than using a global cache.
When the context is closed, the connection is closed (if it is a context manager).
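The caching behaviour described above can be imitated in a few lines. This is a sketch under stated assumptions, not HELIPORT’s Context: SketchContext, FakeConnection and SketchConnection are hypothetical names, and the cache is a plain dict keyed by the frozen dataclass, with contextlib.ExitStack closing entered context managers at the end.

```python
from contextlib import ExitStack
from dataclasses import dataclass


class SketchContext:
    """Hypothetical cache keyed by hashable descriptions; not HELIPORT's Context."""

    def __init__(self):
        self._cache = {}
        self._stack = ExitStack()

    def __getitem__(self, description):
        if description not in self._cache:
            value = description.generate(self)
            # Enter context managers so they are closed together with the context.
            if hasattr(value, "__enter__") and hasattr(value, "__exit__"):
                value = self._stack.enter_context(value)
            self._cache[description] = value
        return self._cache[description]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._stack.close()
        return False


class FakeConnection:
    """Stand-in for a slow-to-open connection that records its state."""

    opened_count = 0

    def __init__(self, parameter):
        self.parameter = parameter
        self.is_open = False

    def __enter__(self):
        FakeConnection.opened_count += 1
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_open = False
        return False


@dataclass(frozen=True)
class SketchConnection:
    some_parameter: str

    def generate(self, context):
        return FakeConnection(self.some_parameter)
```

Two SketchConnection("some_value") instances share one cached connection because a frozen dataclass compares and hashes by its field values; a different parameter value produces a separate connection.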