1 mike 1.1
2 <h1>Synopsis</h1>
3
4 This file describes the algorithms and architecture of the instances
5 repository (that part of the repository which deals with instances).
6
7 <h1>Layout</h1>
8
9 All files belonging the instance repository are stored under the
10 repository/instances directory. For each class two files are maintained:
11 and index file (with a ".idx" extension) and an instance file which bears
12 the name of the class whose instances it contains. For example, suppose
13 there is a class called "Zebra". Then two files are used to managed
14 its instances:
15
16 <pre>
17 repository/instances/Zerbra.idx (called the index file)
18 repository/instances/Zerbra (called the instance file)
19 </pre>
20
21 The first line of the index file indicates how many instances have been
22 mike 1.1 modified or deleted since the last reorganization of the instance file.
23 When this number--called a dirty count--reaches a configurable limit, the
24 index file and instance file are reorganized to reclaim unused space;
25 unused space is left by delete and modify operations (discussed later).
26 The dirty count is expressed as eight hex digits (the number had to have a
27 fixed size so that it could be updated in place without having to rewrite
28 the index file).
29
30 Each subsequent line of the index file corresponds to an instance contained
31 in the instance file. Each line has the following fields:
32
33 <ul>
34
35 <li>
36 deleted-flag - '1' if the corresponding instance was deleted, '0' otherwise.
37 When instances are deleted, the corresponding entry in the index file
38 is marked as deleted and a gap is left in the instance file until
39 reorganization time.
40 </li>
41
42 <li>
43 mike 1.1 hash-code - hash code for the key field below. This field is provided to
44 speed lookup of index entries. When looking up an entry, compute the
45 hash code of the key and look for entries with the same hash code.
46 It is still necessary to compare the keys when the hash codes are
47 the same (since collisions are possible), but only when they are the
48 same which is statistically rare.
49 </li>
50
51 <li>
52 offset - offset within the instance file where the instance begins.
53 </li>
54
55 <li>
56 size - size in bytes of the instance itself (as it appears in the instance
57 file).
58 </li>
59
60 <li>
61 key - the compound key of the instance (including all key binding pairs).
62 This key is in standard form which we define as follows: all property
63 names are shifted to lower case and the key-bindings are sorted by
64 mike 1.1 property names (in this way it suffices to compare key expressions to
65 determine if two compound keys refer to the same instance.
66 </li>
67
68 </ul>
69
70 Here is a sample index file:
71
72 <pre>
73 00000002
74 0 02FB6B6C 0 1427 Y.key1=1001,key2="Hello World 1"
75 0 50699A66 1427 1433 Y.key1=1002,key2="Hello World 2"
76 1 EB45F85A 2860 1433 Y.key1=1004,key2="Hello World 4"
77 1 38B42754 4293 1431 Y.key1=1005,key2="Hello World 5"
78 0 F79B5B0A 8583 1427 Y.key1=1666,key2="Hello World N"
79 0 38B42754 4293 1431 Y.key1=1005,key2="Hello World 5"
80 </pre>
81
82 First notice that the dirty count is equal to two and that two entries are
83 marked as deleted (these quantities will always be equal). This indicates
84 that the instance file has two instances which are no longer used. The space
85 mike 1.1 used by these instances will be reclaimed during reorganization.
86
87 The layout of the instance file is trivial. Instances are always appended to
88 to the instance file. The instances are kept end-to-end in the file.
89
90 <h1>Operations</h1>
91
92 There are three operations which may be performed on the instance repository.
93 create, modify, and delete. This section describes how these operations affect
94 the index and instance file. Note that the process described below for
95 performing the three operations actually contains extra steps described in
96 thd "Recovery" section.
97
98 Creation. During creation, the instance is appended to the instance file and
99 an entry is appended to the index file. If an instance with the same key is
100 found, then the operation is disallowed.
101
102 Deletion. To delete an instance, the corresponding entry in the index file is
103 marked as deleted (by changing the first column from '0' to '1'). And then the
104 dirty count is incremented and updated. If the dirty count has reached the
105 configured threshold, then the index and instance files are reogranized (see
106 mike 1.1 the section entitled "Reorganization" for an explanation of how this is
107 done).
108
109 Modification. To modify an instance, the new modified instance is appended to
110 the instance file. Next the old entry with the same key is marked as deleted.
111 Finally, a new entry is inserted into the index file.
112
113 <h1>Reorganization</h1>
114
115 To improve performance, reclamation of unused space in the instance file
116 (called gaps) is postponed until there are N gaps. Deletion and modification
117 operations create gaps. A dirty-count is maintained in the index file as
118 described above. When this dirty count reaches N, available space is reclaimed.
119 Reorganization is expensive: the entire file must be rewritten. The time
120 complexity is O(n) where n is the number of instances in the file. By
121 postponing reorganization, the time complexity may be reduced to O(1).
122
123 Reorganization requires rewriting the instance file and the index file. All
124 gaps are deleted in the instance file. For each gap there is an entry in the
125 index file which is marked as deleted. This entry indicates where the gap
126 starts in the instance file and how long it is. The offsets in the index file
127 mike 1.1 are adjusted accordingly. The index file is also updated: all entries marked
128 as deleted are removed from the index file.
129
130 <h1>Recovery</h1>
131
132 To avoid corruption of the instance repository, a simple recovery scheme is
133 provided. For all operations (described in the Operations section), the
134 following algorithm is used to ensure recoverability.
135
136 <ul>
137
138 <li>Check to see if any rollback files exist for instances of the given
139 class. If so then perform rollback (see recoverability algorithm
140 for details).
141 </li>
142
143 <li>Create a rollback file for the instance file. The rollback file
144 contains the original size of the instance file.
145 </li>
146
147 <li>Create a rollback file for the index file. The rollback file is a
148 mike 1.1 copy of the instance file (this can be optimized later).
149 </li>
150
151 <li>Modify the instance file as described in the operations section.
152 </li>
153
154 <li>Modify the index file as described in the operations section.
155 </li>
156
157 <li>Perform reorganization as described in the reorganization section
158 if the dirty-count has reached the threshold. Otherwise, increment
159 the dirty count.
160 </li>
161
162 <li>Delete the rollback files.
163 </li>
164
165 </ul>
166
167 The recoverability algorithm itself works as follows:
168
169 mike 1.1 <ul>
170
171 <li>Delete the index file.
172 </li>
173
174 <li>Rename the index rollback file to have the same name as the
175 index file.
176 </li>
177
178 <li>Truncate the instance file to have the same number of bytes as
179 indicated in the instance rollback file.
180 </li>
181
182 <li>Delete the rollback files.
183 </li>
184
185 </ul>
186
|