1 mike 1.1
|
2 mike 1.2 /** Instance Repository
3
|
4 mike 1.1 <h1>Synopsis</h1>
|
5 mike 1.2 Describes the algorithms and architecture of the instances repository
6 (that part of the repository which deals with instances).
|
7 mike 1.1
8 <h1>Layout</h1>
9
|
10 mike 1.2 All files belonging to the instance repository are stored under the
|
11 mike 1.3 repository/instances directory. For each class, two files are maintained:
12 an index file (with a ".idx" extension) and an instance file which bears
|
13 mike 1.1 the name of the class whose instances it contains. For example, suppose
|
14 mike 1.2 there is a class called "Zebra". Then two files are used to manage its
15 instances:
|
16 mike 1.1
17 <pre>
|
18 mike 1.2 repository/instances/Zebra.idx (called the index file)
19 repository/instances/Zebra (called the instance file)
|
20 mike 1.1 </pre>
21
22 The first line of the index file indicates how many instances have been
23 modified or deleted since the last reorganization of the instance file.
24 When this number--called a dirty count--reaches a configurable limit, the
25 index file and instance file are reorganized to reclaim unused space;
|
26 mike 1.3 unused space (or gaps) is left by delete and modify operations (discussed
27 later). The dirty count is expressed as eight hex digits (the number had to
28 have a fixed size so that it could be updated in place without having to
29 rewrite the index file).
|
30 mike 1.1
31 Each subsequent line of the index file corresponds to an instance contained
32 in the instance file. Each line has the following fields:
33
34 <ul>
35
36 <li>
37 deleted-flag - '1' if the corresponding instance was deleted, '0' otherwise.
38 When instances are deleted, the corresponding entry in the index file
39 is marked as deleted and a gap is left in the instance file until
40 reorganization time.
41 </li>
42
43 <li>
44 hash-code - hash code for the key field below. This field is provided to
45 speed lookup of index entries. When looking up an entry, compute the
46 hash code of the key and look for entries with the same hash code.
47 It is still necessary to compare the keys when the hash codes are
48 the same (since collisions are possible), but only when they are the
|
49 mike 1.3 same which is rare and hence this scheme saves many comparisons.
|
50 mike 1.1 </li>
51
52 <li>
|
53 mike 1.3 offset - offset within the instance file to where the instance begins.
|
54 mike 1.1 </li>
55
56 <li>
57 size - size in bytes of the instance itself (as it appears in the instance
58 file).
59 </li>
60
61 <li>
62 key - the compound key of the instance (including all key binding pairs).
63 This key is in standard form which we define as follows: all property
64 names are shifted to lower case and the key-bindings are sorted by
65 property names (in this way it suffices to compare key expressions to
|
66 mike 1.3 determine if two compound keys refer to the same instance).
|
67 mike 1.1 </li>
68
69 </ul>
70
71 Here is a sample index file:
72
73 <pre>
74 00000002
75 0 02FB6B6C 0 1427 Y.key1=1001,key2="Hello World 1"
76 0 50699A66 1427 1433 Y.key1=1002,key2="Hello World 2"
77 1 EB45F85A 2860 1433 Y.key1=1004,key2="Hello World 4"
78 1 38B42754 4293 1431 Y.key1=1005,key2="Hello World 5"
79 0 F79B5B0A 8583 1427 Y.key1=1666,key2="Hello World N"
80 0 38B42754 4293 1431 Y.key1=1005,key2="Hello World 5"
81 </pre>
82
|
83 mike 1.3 Notice that the dirty count is equal to two and that two entries are marked
84 as deleted (these quantities must be equal). This indicates that the instance
85 file has two instances which are no longer used. The space used by these
86 instances will be reclaimed during reorganization.
|
87 mike 1.1
88 The layout of the instance file is trivial. Instances are always appended to
|
89 mike 1.3 the the instance file. The instances are kept end-to-end in the file.
|
90 mike 1.1
91 <h1>Operations</h1>
92
93 There are three operations which may be performed on the instance repository.
94 create, modify, and delete. This section describes how these operations affect
95 the index and instance file. Note that the process described below for
96 performing the three operations actually contains extra steps described in
|
97 mike 1.3 the "Recovery" section.
|
98 mike 1.1
99 Creation. During creation, the instance is appended to the instance file and
100 an entry is appended to the index file. If an instance with the same key is
101 found, then the operation is disallowed.
102
103 Deletion. To delete an instance, the corresponding entry in the index file is
104 marked as deleted (by changing the first column from '0' to '1'). And then the
105 dirty count is incremented and updated. If the dirty count has reached the
|
106 mike 1.3 configured threshold, the index and instance files are reogranized (see
|
107 mike 1.1 the section entitled "Reorganization" for an explanation of how this is
108 done).
109
110 Modification. To modify an instance, the new modified instance is appended to
111 the instance file. Next the old entry with the same key is marked as deleted.
112 Finally, a new entry is inserted into the index file.
113
114 <h1>Reorganization</h1>
115
116 To improve performance, reclamation of unused space in the instance file
|
117 mike 1.2 (called gaps) is postponed until there are m gaps. Deletion and modification
|
118 mike 1.1 operations create gaps. A dirty-count is maintained in the index file as
|
119 mike 1.2 described above. When this dirty count reaches m, available space is reclaimed.
|
120 mike 1.1 Reorganization is expensive: the entire file must be rewritten. The time
121 complexity is O(n) where n is the number of instances in the file. By
122 postponing reorganization, the time complexity may be reduced to O(1).
123
124 Reorganization requires rewriting the instance file and the index file. All
125 gaps are deleted in the instance file. For each gap there is an entry in the
126 index file which is marked as deleted. This entry indicates where the gap
127 starts in the instance file and how long it is. The offsets in the index file
128 are adjusted accordingly. The index file is also updated: all entries marked
129 as deleted are removed from the index file.
130
131 <h1>Recovery</h1>
132
133 To avoid corruption of the instance repository, a simple recovery scheme is
134 provided. For all operations (described in the Operations section), the
135 following algorithm is used to ensure recoverability.
136
137 <ul>
138
139 <li>Check to see if any rollback files exist for instances of the given
140 class. If so then perform rollback (see recoverability algorithm
141 mike 1.1 for details).
142 </li>
143
144 <li>Create a rollback file for the instance file. The rollback file
145 contains the original size of the instance file.
146 </li>
147
148 <li>Create a rollback file for the index file. The rollback file is a
149 copy of the instance file (this can be optimized later).
150 </li>
151
152 <li>Modify the instance file as described in the operations section.
153 </li>
154
155 <li>Modify the index file as described in the operations section.
156 </li>
157
158 <li>Perform reorganization as described in the reorganization section
159 if the dirty-count has reached the threshold. Otherwise, increment
160 the dirty count.
161 </li>
162 mike 1.1
163 <li>Delete the rollback files.
164 </li>
165
166 </ul>
167
168 The recoverability algorithm itself works as follows:
169
170 <ul>
171
172 <li>Delete the index file.
173 </li>
174
175 <li>Rename the index rollback file to have the same name as the
176 index file.
177 </li>
178
179 <li>Truncate the instance file to have the same number of bytes as
180 indicated in the instance rollback file.
181 </li>
182
183 mike 1.1 <li>Delete the rollback files.
184 </li>
185
186 </ul>
187
|
188 mike 1.2 */
|