Special Digital Objects

Files And Directories

Suppose your app can manage files on the machine HELIPORT is running on. Exposing files like this is not a good idea in practice, but it makes for a simple example that uses only Python’s built-in functionality.

You don’t have to do all the steps to get something working. You can keep all the code in models.py of your app. Implementing a view that allows adding or editing the objects is not part of this section; see How to Integrate a new Module for that.

A Django model to represent the files or directories

from heliport.core.models import DigitalObject
from django.db import models

class DBLocalFile(DigitalObject):
    file_id = models.AutoField(primary_key=True)
    path_str = models.TextField()

class DBLocalDirectory(DigitalObject):
    directory_id = models.AutoField(primary_key=True)
    path_str = models.TextField()

This might already be sufficient in some cases, but is not yet specific to files or directories.

A single class in HELIPORT can represent “a file and a directory”, but it cannot represent “either a file or a directory”. This restriction keeps search simple and efficient. Therefore two classes are needed. Feel free to implement a base class for shared functionality.

Implement file-specific functionality

A DigitalObject represents a file if it inherits from heliport.core.digital_object_aspects.FileObj. You have to implement the as_file method:

from heliport.core.digital_object_aspects import FileObj

class DBLocalFile(DigitalObject, FileObj):
    ...
    def as_file(self, context: Context) -> File:
        self.access(context).assert_read_permission()
        return LocalFile(self.path_str)

context can be used for optimizations later and to get the current user or project.

Now you have to implement your LocalFile class. It should inherit from File:

from heliport.core.digital_object_aspects import File
from heliport.core.utils.string_tools import format_byte_count
from os import stat


class LocalFile(File):
    def __init__(self, path):
        self.file = None
        self.path = path

    def mimetype(self) -> str:
        return "application/octet-stream"  # just binary data

    def open(self):
        self.close()
        self.file = open(self.path, "rb")  # binary mode, so read() returns bytes

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None

    def read(self, number_of_bytes=None):
        return self.file.read(number_of_bytes)

    def size(self):
        return stat(self.path).st_size

    def get_file_info(self):
        return {"size": format_byte_count(self.size())}

A heliport.core.digital_object_aspects.File needs to implement some methods:

  • mimetype() - to be more specific, you could use the first element of the tuple returned by Python’s built-in mimetypes.guess_type(file_name)

  • open()/close() - the context manager protocol is implemented in File, so you can write with LocalFile("a.txt") as f:

  • read() - you don’t necessarily need to return the exact number_of_bytes if it is easier to implement

  • size() - you can return None if size is unknown

  • get_file_info() - dict of human-readable values and arbitrary keys, to provide more details to the user. (can be an empty dict)
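The mimetype() hint above can be sketched with the standard library alone. The helper name guess_mimetype and its default argument are made up for this illustration and are not part of HELIPORT; a mimetype() method could call something similar with self.path.

```python
import mimetypes


def guess_mimetype(path: str, default: str = "application/octet-stream") -> str:
    """Return a MIME type guessed from the file name, or a generic default.

    Hypothetical helper for illustration; not part of HELIPORT.
    """
    # guess_type returns a (type, encoding) tuple; type is None if unknown
    guessed_type, _encoding = mimetypes.guess_type(path)
    return guessed_type if guessed_type is not None else default


print(guess_mimetype("notes.txt"))
print(guess_mimetype("file_without_extension"))
```

The guess is based only on the file name, not on the file content, so falling back to generic binary data for unknown names is a reasonable default.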

Separating DBLocalFile and LocalFile into two classes gives you flexibility. You don’t have to worry about interfering with Django internals; for example, you are free to choose the signature of __init__ or to inherit from an abstract base class. Another benefit is that the permission check lives in only one place.

Implement directory-specific functionality

Directories work very similarly to files, but are even simpler. A DigitalObject represents a directory if it inherits from DirectoryObj. You have to implement the as_directory method:

from heliport.core.digital_object_aspects import DirectoryObj

class DBLocalDirectory(DigitalObject, DirectoryObj):
    ...
    def as_directory(self, context: Context) -> LocalDirectory:
        self.access(context).assert_read_permission()
        return LocalDirectory(self.path_str)

Now you have to implement the LocalDirectory class. It should inherit from Directory:

from heliport.core.digital_object_aspects import Directory
import os

class LocalDirectory(Directory):
    def __init__(self, path):
        self.path = path

    def get_parts(self):
        result = []
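        for name in os.listdir(self.path):
            # os.listdir returns bare names; join with self.path to get a usable path
            child_path = os.path.join(self.path, name)
            if os.path.isfile(child_path):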
                child, is_new = DBLocalFile.objects.get_or_create(path_str=child_path, label=child_path)
            else:
                child, is_new = DBLocalDirectory.objects.get_or_create(path_str=child_path, label=child_path)
            result.append(child)

        return result

get_parts can return any iterable or be a generator. Because the caller of get_parts should be able to use all the features of Django and HELIPORT to work with the result, get_parts has to return DigitalObject or GeneralDigitalObject instances. In our example, it cannot return LocalDirectory or LocalFile directly.

When creating new objects, you should set additional attributes like the label in the example above.
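When listing a directory, keep in mind that os.listdir returns bare entry names rather than paths, so each name has to be joined with the directory before it is passed to os.path.isfile. The following standalone sketch shows only this path handling; the function name list_children is an assumption for illustration and no HELIPORT or Django classes are involved.

```python
import os
import tempfile


def list_children(directory):
    """Return (full_path, is_file) pairs for the direct children of directory.

    Hypothetical helper for illustration; not part of HELIPORT.
    """
    children = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)  # listdir yields names only
        children.append((full_path, os.path.isfile(full_path)))
    return children


# Build a throwaway directory with one file and one subdirectory to try it out
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("hello")
    for path, is_file in list_children(root):
        print(os.path.basename(path), "file" if is_file else "directory")
```

In a get_parts implementation, the is_file flag would decide whether a file object or a directory object is created for the child.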

Using General Digital Objects

The previous example stores every file in the HELIPORT database the first time a user looks at it. To avoid this, you can return GeneralDigitalObject instances instead of DBLocalFile or DBLocalDirectory instances. These classes could look like this:

from heliport.core.digital_object_aspects import FileObj, DirectoryObj
from heliport.core.digital_object_interface import GeneralDigitalObject
from heliport.core.utils.collections import RootedRDFGraph
from abc import ABC


class PathObj(GeneralDigitalObject, ABC):
    def __init__(self, path):
        self.path = path

    def as_text(self):
        return self.path

    def as_rdf(self):
        return RootedRDFGraph.from_atomic(self.as_text())

    def as_html(self):
        return self.as_text()

    def __hash__(self):
        return hash(self.path)

    def __eq__(self, other):
        return isinstance(other, PathObj) and self.path == other.path


class LocalFileObj(PathObj, FileObj):
    def as_digital_object(self, context):
        file, is_new = DBLocalFile.objects.get_or_create(path_str=self.path, label=self.path)
        return file

    def as_file(self, context):
        return LocalFile(self.path)

    def type_id(self):
        return "LocalFile"

    def get_identifying_params(self):
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context):
        return LocalFileObj(params["path"])


class LocalDirectoryObj(PathObj, DirectoryObj):
    def as_digital_object(self, context):
        directory, is_new = DBLocalDirectory.objects.get_or_create(path_str=self.path, label=self.path)
        return directory

    def as_directory(self, context):
        return LocalDirectory(self.path)

    def type_id(self):
        return "LocalDirectory"

    def get_identifying_params(self):
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context):
        return LocalDirectoryObj(params["path"])

This is an extremely basic implementation of all required methods. The documentation of GeneralDigitalObject and its base classes provides more information. You also need to register these objects as described in the documentation for digital_object_resolution.py. Some common functionality has been extracted into the base class PathObj.
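The interplay of __hash__, __eq__, get_identifying_params and resolve can be tried out without HELIPORT. The class PathRef below is a hypothetical stand-in that copies only these four methods from the example; it does not inherit from GeneralDigitalObject, and its resolve ignores the context argument. It is a sketch of the round trip, not of how HELIPORT resolves objects internally.

```python
class PathRef:
    """Stand-in for a PathObj-like class; not a HELIPORT class."""

    def __init__(self, path):
        self.path = path

    def __hash__(self):
        return hash(self.path)

    def __eq__(self, other):
        return isinstance(other, PathRef) and self.path == other.path

    def type_id(self):
        return "LocalFile"

    def get_identifying_params(self):
        # Plain dict that is sufficient to rebuild an equal object later
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context=None):
        # context is unused in this sketch
        return PathRef(params["path"])


original = PathRef("/tmp/data.csv")
params = original.get_identifying_params()
restored = PathRef.resolve(params)

# The restored object is a different instance but compares and hashes equal
assert restored is not original
assert restored == original
assert len({original, restored}) == 1
```

Because equality and hashing depend only on the path, two objects created independently for the same path behave as the same key in sets and dicts.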

With this, you could update the get_parts function:

from heliport.core.digital_object_aspects import Directory
import os

class LocalDirectory(Directory):
    def __init__(self, path):
        self.path = path

    def get_parts(self):
        result = []
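        for name in os.listdir(self.path):
            # os.listdir returns bare names; join with self.path to get a usable path
            child_path = os.path.join(self.path, name)
            if os.path.isfile(child_path):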
                child = LocalFileObj(child_path)
            else:
                child = LocalDirectoryObj(child_path)
            result.append(child)

        return result

In total, you could have the following classes for local files and directories:

  • For Files

    • LocalFile:

      • The actual File

      • has read()

    • DBLocalFile:

      • Represent a file as DigitalObject

      • in database

      • has as_file()

    • LocalFileObj:

      • Represent a file as GeneralDigitalObject

      • not in database

      • has as_file()

  • For Directories

    • LocalDirectory:

      • The actual Directory

      • has get_parts()

    • DBLocalDirectory:

      • Represent a directory as DigitalObject

      • in database

      • has as_directory()

    • LocalDirectoryObj:

      • Represent a directory as GeneralDigitalObject

      • not in database

      • has as_directory()

Optimize by using Context

If you need some kind of connection to open a file, it can be slow to reestablish the connection for each file:

from heliport.core.digital_object_aspects import File

def open_connection(some_parameter):
    # this part takes some time
    return "the connection"

class LocalFile(File):
    ...
    def open(self):
        connection = open_connection("some_value")
        ...  # use connection to open file

Instead of calling the open_connection function every time, you can describe the call using a dataclass that has a generate method:

from dataclasses import dataclass

@dataclass(frozen=True)
class MyConnection:
    some_parameter: str

    def generate(self, context):
        return open_connection(self.some_parameter)

If the generate function returns a context manager, it is entered. When the context is closed, all context managers from generate functions are closed automatically.

The file classes can now look like this:

class DBLocalFile(DigitalObject, FileObj):
    ...
    def as_file(self, context: Context) -> File:
        self.access(context).assert_read_permission()
        return LocalFile(self.path_str, context)

class LocalFile(File):
    def __init__(self, path, context):
        ...
        self.context = context
    ...
    def open(self):
        connection = self.context[MyConnection("some_value")]
        print(self.context.user_or_none)  # example how to use context to get user
        ...
    ...

This does the following:

  • Context calls generate the first time a MyConnection object with "some_value" is needed and stores the result

  • Context returns the cached connection if another MyConnection object with "some_value" is requested later

  • Context is user-specific, so it is more secure than a global cache

  • When the context is closed the connection is closed (if it is a context manager)
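The behaviour listed above can be imitated in a few lines of plain Python. SketchContext and FakeConnection below are hypothetical stand-ins written for this illustration; they are not HELIPORT's Context or a real connection, and the real implementation may differ. The sketch relies on the fact that a frozen dataclass is hashable and compares by field values, so two MyConnection("some_value") objects act as the same cache key.

```python
from contextlib import ExitStack
from dataclasses import dataclass


class SketchContext:
    """Simplified stand-in for a caching context; not HELIPORT's Context."""

    def __init__(self):
        self._cache = {}
        self._exit_stack = ExitStack()

    def __getitem__(self, key):
        if key not in self._cache:
            value = key.generate(self)
            # Enter context managers so they are closed together with this context
            if hasattr(value, "__enter__") and hasattr(value, "__exit__"):
                value = self._exit_stack.enter_context(value)
            self._cache[key] = value
        return self._cache[key]

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._exit_stack.close()
        return False


class FakeConnection:
    """Pretend connection that counts how often it was opened."""

    opened = 0

    def __init__(self, target):
        self.target = target
        self.closed = False

    def __enter__(self):
        FakeConnection.opened += 1
        return self

    def __exit__(self, *exc_info):
        self.closed = True
        return False


@dataclass(frozen=True)
class MyConnection:
    some_parameter: str

    def generate(self, context):
        return FakeConnection(self.some_parameter)


with SketchContext() as context:
    first = context[MyConnection("some_value")]
    second = context[MyConnection("some_value")]  # served from the cache
    other = context[MyConnection("other_value")]  # different key, new connection
    assert first is second
    assert first is not other
    assert FakeConnection.opened == 2

# Leaving the with block closed every connection that was entered
assert first.closed and other.closed
```

Keeping the cache on the context object rather than in a module-level dict is what ties the lifetime of the connections to a single context.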