# Special Digital Objects

## Files And Directories

Suppose your app can manage files on the machine HELIPORT is running on.
Exposing files like this is not a good idea in practice, but it makes for a simple example that uses only Python's built-in functionality.
You don't have to complete every step to get something working.
You can keep all the code in the `models.py` of your app.
Implementing a view for adding or editing the objects is not part of this section; see {doc}`/development/integrate-a-new-module` for that.

### A Django model to represent the files or directories

```python
from heliport.core.models import DigitalObject
from django.db import models


class DBLocalFile(DigitalObject):
    file_id = models.AutoField(primary_key=True)
    path_str = models.TextField()


class DBLocalDirectory(DigitalObject):
    directory_id = models.AutoField(primary_key=True)
    path_str = models.TextField()
```

This might already be sufficient in some cases, but it is not yet specific to files or directories.
A single class in HELIPORT can represent "a file and a directory" but cannot represent "either a file or a directory"; this restriction keeps search simple and efficient.
Therefore two classes are needed.
Feel free to implement a base class for shared functionality.

### Implement file-specific functionality

A `DigitalObject` represents a file if it inherits from {class}`heliport.core.digital_object_aspects.FileObj`.
In that case you have to implement the `as_file` method:

```python
from heliport.core.digital_object_aspects import File, FileObj


class DBLocalFile(DigitalObject, FileObj):
    ...

    def as_file(self, context: Context) -> File:
        # the permission check lives here, not in LocalFile
        self.access(context).assert_read_permission()
        return LocalFile(self.path_str)
```

`context` can be used for optimizations {ref}`later <optimize>` and to get the current user or project.

Now you have to implement your `LocalFile` class.
It should inherit from `File`:

```python
from heliport.core.digital_object_aspects import File
from heliport.core.utils.string_tools import format_byte_count
from os import stat


class LocalFile(File):
    def __init__(self, path):
        self.file = None
        self.path = path

    def mimetype(self) -> str:
        return "application/octet-stream"  # just binary data

    def open(self):
        self.close()
        # binary mode, matching the generic binary mimetype above
        self.file = open(self.path, "rb")

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None

    def read(self, number_of_bytes=None):
        return self.file.read(number_of_bytes)

    def size(self):
        return stat(self.path).st_size

    def get_file_info(self):
        return {"size": format_byte_count(self.size())}
```

A {class}`heliport.core.digital_object_aspects.File` needs to implement some methods:

- `mimetype()`: you could use Python's built-in `mimetypes.guess_type(file_name)` to return something more specific
- `open()`/`close()`: the context manager protocol is implemented in `File`, so you can write `with LocalFile("a.txt") as f:`
- `read()`: you don't necessarily need to return exactly `number_of_bytes` bytes if that is easier to implement
- `size()`: you can return `None` if the size is unknown
- `get_file_info()`: a dict with arbitrary keys and human-readable values that gives the user more details (can be an empty dict)

Separating `DBLocalFile` and `LocalFile` into two classes gives you flexibility.
You don't have to worry about interfering with Django; for example, you are free to choose the signature of `__init__` or to inherit from an abstract base class.
Another benefit is that the permission check lives in only one place.

### Implement directory-specific functionality

Directories work very similarly to files, but are even simpler.
A `DigitalObject` represents a directory if it inherits from `DirectoryObj`.
In that case you have to implement the `as_directory` method:

```python
from heliport.core.digital_object_aspects import Directory, DirectoryObj


class DBLocalDirectory(DigitalObject, DirectoryObj):
    ...

    def as_directory(self, context: Context) -> Directory:
        self.access(context).assert_read_permission()
        return LocalDirectory(self.path_str)
```

Now you have to implement the `LocalDirectory` class.
It should inherit from `Directory`:

```python
from heliport.core.digital_object_aspects import Directory
import os


class LocalDirectory(Directory):
    def __init__(self, path):
        self.path = path

    def get_parts(self):
        result = []
        for name in os.listdir(self.path):
            # os.listdir returns bare names, so build the full path first
            child_path = os.path.join(self.path, name)
            if os.path.isfile(child_path):
                child, is_new = DBLocalFile.objects.get_or_create(path_str=child_path, label=child_path)
            else:
                child, is_new = DBLocalDirectory.objects.get_or_create(path_str=child_path, label=child_path)
            result.append(child)
        return result
```

`get_parts` could return any iterable or be a generator.
Because the caller of `get_parts` should be able to use all the features of Django and HELIPORT to work with the result, `get_parts` has to return `DigitalObject` or `GeneralDigitalObject` instances.
In our example it cannot return `LocalDirectory` or `LocalFile` directly.
When creating new objects, you should set additional attributes such as the label in the example above.

### Using General Digital Objects

The previous example stores every file in the HELIPORT database the first time a user looks at it.
To avoid this, you could return `GeneralDigitalObject` instances instead of a `DBLocalFile` or `DBLocalDirectory`.
These classes could look like this:

```python
from heliport.core.digital_object_aspects import FileObj, DirectoryObj
from heliport.core.digital_object_interface import GeneralDigitalObject
from heliport.core.utils.collections import RootedRDFGraph
from abc import ABC


class PathObj(GeneralDigitalObject, ABC):
    def __init__(self, path):
        self.path = path

    def as_text(self):
        return self.path

    def as_rdf(self):
        return RootedRDFGraph.from_atomic(self.as_text())

    def as_html(self):
        return self.as_text()

    def __hash__(self):
        return hash(self.path)

    def __eq__(self, other):
        return isinstance(other, PathObj) and self.path == other.path


class LocalFileObj(PathObj, FileObj):
    def as_digital_object(self, context):
        # only now is the file stored in the database
        file, is_new = DBLocalFile.objects.get_or_create(path_str=self.path, label=self.path)
        return file

    def as_file(self, context):
        return LocalFile(self.path)

    def type_id(self):
        return "LocalFile"

    def get_identifying_params(self):
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context):
        return LocalFileObj(params["path"])


class LocalDirectoryObj(PathObj, DirectoryObj):
    def as_digital_object(self, context):
        directory, is_new = DBLocalDirectory.objects.get_or_create(path_str=self.path, label=self.path)
        return directory

    def as_directory(self, context):
        return LocalDirectory(self.path)

    def type_id(self):
        return "LocalDirectory"

    def get_identifying_params(self):
        return {"type": self.type_id(), "path": self.path}

    @staticmethod
    def resolve(params, context):
        return LocalDirectoryObj(params["path"])
```

This is an extremely basic implementation of all required methods.
The documentation of `GeneralDigitalObject` and its base classes provides more information.
You also need to register these objects as described in the documentation for `digital_object_resolution.py`.
Some common functionality has been extracted to the base class `PathObj`.
With this, you could update the `get_parts` function:

```python
from heliport.core.digital_object_aspects import Directory
import os


class LocalDirectory(Directory):
    def __init__(self, path):
        self.path = path

    def get_parts(self):
        result = []
        for name in os.listdir(self.path):
            # os.listdir returns bare names, so build the full path first
            child_path = os.path.join(self.path, name)
            if os.path.isfile(child_path):
                child = LocalFileObj(child_path)
            else:
                child = LocalDirectoryObj(child_path)
            result.append(child)
        return result
```

In total, you could have the following classes for local files and directories:

- For Files
  - **LocalFile**:
    - The actual `File`
    - has `read()`
  - **DBLocalFile**:
    - Represent a file as `DigitalObject`
    - in database
    - has `as_file()`
  - **LocalFileObj**:
    - Represent a file as `GeneralDigitalObject`
    - not in database
    - has `as_file()`
- For Directories
  - **LocalDirectory**:
    - The actual `Directory`
    - has `get_parts()`
  - **DBLocalDirectory**:
    - Represent a directory as `DigitalObject`
    - in database
    - has `as_directory()`
  - **LocalDirectoryObj**:
    - Represent a directory as `GeneralDigitalObject`
    - not in database
    - has `as_directory()`

### Optimize by using Context

(optimize)=

If you need some kind of connection to open a file, reestablishing that connection for each file can be slow:

```python
from heliport.core.digital_object_aspects import File


def open_connection(some_parameter):
    # this part takes some time
    return "the connection"


class LocalFile(File):
    ...

    def open(self):
        connection = open_connection("some_value")
        ...  # use connection to open file
```

Instead of calling the `open_connection` function every time, you can describe the call using a dataclass that has a `generate` method:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MyConnection:
    some_parameter: str

    def generate(self, context):
        return open_connection(self.some_parameter)
```

If the `generate` function returns a context manager, it is entered.
When the context is closed, all context managers returned by `generate` functions are closed automatically.
The file classes can now look like this:

```python
class DBLocalFile(DigitalObject, FileObj):
    ...

    def as_file(self, context: Context) -> File:
        self.access(context).assert_read_permission()
        return LocalFile(self.path_str, context)


class LocalFile(File):
    def __init__(self, path, context):
        ...
        self.context = context

    ...

    def open(self):
        connection = self.context[MyConnection("some_value")]
        print(self.context.user_or_none)  # example of how to use context to get the user
        ...

    ...
```

This does the following:

- The context calls `generate` the first time a `MyConnection` object with `"some_value"` is needed and stores the result
- The context returns the cached connection if another `MyConnection` object with `"some_value"` is requested later
- The context is user-specific, so it is more secure than a global cache
- When the context is closed, the connection is closed (if it is a context manager)
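
The caching behaviour listed above can be illustrated with a small, self-contained sketch. `ToyContext` is a hypothetical stand-in written only for this example; it is not HELIPORT's `Context` class and leaves out users, projects, and everything else the real class may provide. `open_connection` is likewise a dummy that only records when it is opened and closed, and the `events` list and `demo` function exist purely for illustration.

```python
from contextlib import ExitStack, contextmanager
from dataclasses import dataclass

events = []  # records what happens, for illustration only


@contextmanager
def open_connection(some_parameter):
    # dummy connection: logs when it is opened and closed
    events.append(f"open {some_parameter}")
    try:
        yield f"connection to {some_parameter}"
    finally:
        events.append(f"close {some_parameter}")


@dataclass(frozen=True)  # frozen dataclasses are hashable, so they can be cache keys
class MyConnection:
    some_parameter: str

    def generate(self, context):
        return open_connection(self.some_parameter)


class ToyContext:
    """Hypothetical stand-in for a context; NOT the real HELIPORT class."""

    def __init__(self):
        self._cache = {}
        self._stack = ExitStack()

    def __getitem__(self, key):
        if key not in self._cache:
            value = key.generate(self)
            # enter context managers so they are closed together with the context
            if hasattr(value, "__enter__") and hasattr(value, "__exit__"):
                value = self._stack.enter_context(value)
            self._cache[key] = value
        return self._cache[key]

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._stack.close()
        return False


def demo():
    events.clear()
    with ToyContext() as context:
        first = context[MyConnection("some_value")]
        second = context[MyConnection("some_value")]  # served from the cache
        assert first is second
        events.append("using connection")
    return list(events)


print(demo())
```

In this sketch the connection is opened once, even though it is requested twice, and it is closed when the `with` block ends. Equal `MyConnection` instances map to the same cache entry because a frozen dataclass compares and hashes by its field values.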