Index overview

Types of indices

DBOO uses a few different types of indices:

The core index that maps object id to object data,

Reference index,

Type index,

Field index.

Core index

The core index is mandatory and created automatically. It maps object ids to object data. The core index is an extendible hashmap on disk and uses a Bloom filter to check quickly whether an object id exists.
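The role of the Bloom filter can be illustrated with a minimal sketch. This is not DBOO's implementation: the class names `BloomFilter` and `CoreIndexSketch`, the filter size, the number of hash functions and the use of SHA-256 are all assumptions made for illustration, and a plain dictionary stands in for the on-disk extendible hashmap.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class CoreIndexSketch:
    """Hypothetical stand-in for the core index: object id -> object data."""

    def __init__(self):
        self.filter = BloomFilter()
        self.table = {}         # stands in for the on-disk extendible hashmap
        self.table_lookups = 0  # how often the (expensive) table was consulted

    def put(self, object_id, data):
        self.filter.add(object_id)
        self.table[object_id] = data

    def get(self, object_id):
        if not self.filter.might_contain(object_id):
            return None         # definitely absent: skip the table entirely
        self.table_lookups += 1
        return self.table.get(object_id)
```

A negative answer from the filter is always correct, so the table only has to be consulted for ids that might exist. A positive answer can be wrong, which is why the table lookup remains the final authority.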

Reference index

The reference index keeps track of all objects that an object points to. There is one reference index per object, stored with the object data. These indices make it fast to look up all other objects an object points to.

The reference index is mandatory and automatically generated.

Type index

By default type indices are generated for all classes. There is one index per class. These indices are basically collections of object ids and are primarily used when selecting all objects of a class. They are arranged in hierarchies matching the class hierarchy of the classes they represent.

Type indices are not mandatory. However, you will not be able to look up objects by class name or generate field indices if they do not exist.
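A hierarchy of type indices can be pictured as follows. This is a sketch under assumptions, not DBOO's API: `TypeIndexSketch` is a hypothetical name, and the sketch assumes that selecting all objects of a base class also returns instances of its subclasses, which the hierarchical arrangement suggests but the text does not state.

```python
class TypeIndexSketch:
    """Hypothetical type index: a collection of object ids for one class."""

    def __init__(self, class_name, parent=None):
        self.class_name = class_name
        self.object_ids = set()
        self.children = []      # type indices of direct subclasses
        if parent is not None:
            parent.children.append(self)

    def add(self, object_id):
        self.object_ids.add(object_id)

    def select_all(self):
        """Return ids for this class and, recursively, all subclasses."""
        result = set(self.object_ids)
        for child in self.children:
            result |= child.select_all()
        return result


# The index hierarchy mirrors the class hierarchy Shape <- Circle.
shape_index = TypeIndexSketch("Shape")
circle_index = TypeIndexSketch("Circle", parent=shape_index)
shape_index.add(1)
circle_index.add(2)
```

Here `shape_index.select_all()` walks the subtree and collects both ids, while `circle_index.select_all()` returns only the id stored for the subclass.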

Field index

Field indices map the values of member fields to object ids. The key can be any primitive type or a string, but not a class type. Field indices can be generated automatically upon the first select query on a field, or created explicitly using the create index API.

Field indices are not mandatory. However, you will not be able to look up objects by field value if an index for that field does not exist.

Field indices can be B-trees or extendible hashmaps. B-trees allow range queries, but hashmaps are faster when range lookups are not required. With hashmaps, a Bloom filter can also be used to speed up lookups further.
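The trade-off between the two kinds of field index can be sketched in a few lines. Neither class below is DBOO code: `OrderedFieldIndex` uses a sorted list as a simplified stand-in for a B-tree, `HashedFieldIndex` uses a dictionary as a stand-in for an extendible hashmap, and both names are hypothetical.

```python
import bisect


class OrderedFieldIndex:
    """Sorted (value, object_id) pairs; a simplified stand-in for a B-tree."""

    def __init__(self):
        self.entries = []

    def insert(self, value, object_id):
        bisect.insort(self.entries, (value, object_id))

    def range(self, low, high):
        """Return object ids whose field value satisfies low <= value <= high."""
        # (low,) sorts before every (low, object_id), so this finds the
        # first entry with value >= low.
        start = bisect.bisect_left(self.entries, (low,))
        ids = []
        for value, object_id in self.entries[start:]:
            if value > high:
                break
            ids.append(object_id)
        return ids


class HashedFieldIndex:
    """Value -> object ids; supports equality lookups only."""

    def __init__(self):
        self.buckets = {}

    def insert(self, value, object_id):
        self.buckets.setdefault(value, []).append(object_id)

    def equal(self, value):
        return self.buckets.get(value, [])
```

Because the ordered index keeps keys sorted, a range query is one search followed by a sequential scan. The hashed index has no ordering to exploit, so it can only answer exact-match queries, but it does so without any search through sorted keys.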

Index caching

All indices have some form of caching.

The cache for the core index is maintained as objects are requested, updated or inserted. This cache holds the disk locations of objects. When the cache reaches a certain size it is cleaned up, and the least accessed objects are removed from it. The core index cache also holds the cached reference indices of the objects.
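A size-bounded cache of disk locations might look like the sketch below. It is an illustration only: `LocationCacheSketch` is a hypothetical name, "least accessed" is interpreted here as "lowest access count", and the sketch evicts one entry at a time, whereas the real clean-up policy and thresholds are not described in this text.

```python
class LocationCacheSketch:
    """Hypothetical cache: object id -> disk offset, bounded in size."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.locations = {}  # object id -> disk offset
        self.hits = {}       # object id -> number of accesses

    def put(self, object_id, offset):
        is_new = object_id not in self.locations
        if is_new and len(self.locations) >= self.max_entries:
            # Evict the entry with the lowest access count.
            victim = min(self.hits, key=self.hits.get)
            del self.locations[victim]
            del self.hits[victim]
        self.locations[object_id] = offset
        self.hits.setdefault(object_id, 0)

    def get(self, object_id):
        if object_id not in self.locations:
            return None      # cache miss: caller falls back to the disk index
        self.hits[object_id] += 1
        return self.locations[object_id]
```

A miss in this cache is not an error. It only means the disk location has to be fetched from the core index on disk again.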

The core index cache maintains the disk locations of the latest transactions of an object. When an object is updated, a new transaction record is created while the old transaction record remains available. This allows concurrent reads and writes of all objects without locking the index. The transaction records for all reads that have begun and are still ongoing are always kept in memory. As reads finish, their transaction records are put in a clean-up queue and later deallocated.
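The lifecycle of transaction records described above can be sketched as follows. This is a single-threaded illustration of the bookkeeping only, with no real synchronization. `VersionedEntry` and its methods are hypothetical names, and a "location" is just an integer standing in for a transaction record's disk location.

```python
from collections import deque


class VersionedEntry:
    """Hypothetical cache entry holding several transaction records."""

    def __init__(self, location):
        self.versions = [location]    # disk locations, newest last
        self.readers = {}             # location -> number of ongoing reads
        self.cleanup_queue = deque()  # records awaiting deallocation

    def begin_read(self):
        """Pin the newest record for the duration of a read."""
        location = self.versions[-1]
        self.readers[location] = self.readers.get(location, 0) + 1
        return location

    def write(self, new_location):
        """Add a new record; the old one stays while reads still use it."""
        old = self.versions[-1]
        self.versions.append(new_location)
        if self.readers.get(old, 0) == 0:
            self._retire(old)

    def end_read(self, location):
        self.readers[location] -= 1
        if self.readers[location] == 0 and location != self.versions[-1]:
            self._retire(location)

    def _retire(self, location):
        self.versions.remove(location)
        self.readers.pop(location, None)
        self.cleanup_queue.append(location)
```

A reader that started before an update keeps seeing its old record, while a reader that starts afterwards sees the new one. The old record only moves to the clean-up queue once its last reader has finished.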

The type and field indices are cached per disk block. The default is a cache of 10000 disk blocks (~40 MB per index, or about 1.6 million index entries for 64-bit data types in B-trees).
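The figures above can be checked with a short calculation. The block size is not stated in the text; the 4096-byte value below is an assumption inferred from 10000 blocks amounting to roughly 40 MB. The bytes-per-entry result is simply the quotient of the two quoted numbers, not a documented on-disk entry size.

```python
BLOCKS_PER_INDEX = 10_000      # default cache size, from the text
BLOCK_SIZE_BYTES = 4096        # assumption: inferred from "~40 MB"
ENTRIES_PER_CACHE = 1_600_000  # "about 1.6 million index entries"

cache_bytes = BLOCKS_PER_INDEX * BLOCK_SIZE_BYTES
cache_mb = cache_bytes / 1_000_000
bytes_per_entry = cache_bytes / ENTRIES_PER_CACHE
entries_per_block = ENTRIES_PER_CACHE / BLOCKS_PER_INDEX

print(f"cache size:        {cache_mb:.2f} MB")
print(f"bytes per entry:   {bytes_per_entry:.1f}")
print(f"entries per block: {entries_per_block:.0f}")
```

Under these assumptions the cache is about 40.96 MB, which averages about 25.6 bytes of block space per entry, or 160 entries per block. An 8-byte key accounts for only part of that; the rest would be the stored object id plus whatever B-tree node overhead and free space the blocks carry.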

Update buffer

All indices have update buffers to avoid updating the indices on disk for small transactions.
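The idea of an update buffer can be sketched as below. This is an assumption-laden illustration, not DBOO's mechanism: `BufferedIndexSketch` is a hypothetical name, a dictionary stands in for the on-disk index, and the flush trigger (a fixed number of pending entries) is invented here because the text does not say when buffers are written out.

```python
class BufferedIndexSketch:
    """Hypothetical index with an in-memory update buffer."""

    def __init__(self, flush_threshold=4):
        self.on_disk = {}      # stands in for the index structure on disk
        self.buffer = {}       # pending updates not yet written to disk
        self.flush_threshold = flush_threshold
        self.disk_writes = 0   # counts batched writes to the "disk" index

    def update(self, key, object_id):
        self.buffer[key] = object_id
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def lookup(self, key):
        # Pending updates take precedence over what is on disk.
        if key in self.buffer:
            return self.buffer[key]
        return self.on_disk.get(key)

    def flush(self):
        if self.buffer:
            self.on_disk.update(self.buffer)
            self.buffer.clear()
            self.disk_writes += 1
```

Small transactions touch only the buffer, so several of them are folded into one write to the on-disk index. Lookups must consult the buffer first so that they still see the latest values.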

All indices can ultimately be recreated from the object data file and the transaction log. If the transaction log is destroyed, the transaction information is lost, but the indices can still be recreated from the object data.