pegasus/src/Pegasus/Repository/instances.txt - annotate

Return to instances.txt CVS log
Up to [Pegasus] / pegasus / src / Pegasus / Repository
  1 mike  1.1 
  2           <h1>Synopsis</h1>
  3           
  4           This file describes the algorithms and architecture of the instances 
  5           repository (that part of the repository which deals with instances).
  6           
  7           <h1>Layout</h1>
  8           
  9           All files belonging the instance repository are stored under the
 10           repository/instances directory. For each class two files are maintained:
 11           and index file (with a ".idx" extension) and an instance file which bears
 12           the name of the class whose instances it contains. For example, suppose 
 13           there is a class called "Zebra". Then two files are used to managed
 14           its instances:
 15           
 16               <pre>
 17               repository/instances/Zerbra.idx (called the index file)
 18               repository/instances/Zerbra (called the instance file)
 19               </pre>
 20           
 21           The first line of the index file indicates how many instances have been 
 22 mike  1.1 modified or deleted since the last reorganization of the instance file.
 23           When this number--called a dirty count--reaches a configurable limit, the 
 24           index file and instance file are reorganized to reclaim unused space; 
 25           unused space is left by delete and modify operations (discussed later). 
 26           The dirty count is expressed as eight hex digits (the number had to have a 
 27           fixed size so that it could be updated in place without having to rewrite 
 28           the index file).
 29           
 30           Each subsequent line of the index file corresponds to an instance contained 
 31           in the instance file. Each line has the following fields:
 32           
 33               <ul>
 34           
 35               <li>
 36               deleted-flag - '1' if the corresponding instance was deleted, '0' otherwise.
 37           	When instances are deleted, the corresponding entry in the index file
 38           	is marked as deleted and a gap is left in the instance file until
 39           	reorganization time.
 40               </li>
 41           
 42               <li>
 43 mike  1.1     hash-code - hash code for the key field below. This field is provided to
 44           	speed lookup of index entries. When looking up an entry, compute the 
 45           	hash code of the key and look for entries with the same hash code.
 46           	It is still necessary to compare the keys when the hash codes are
 47           	the same (since collisions are possible), but only when they are the
 48           	same which is statistically rare.
 49               </li>
 50           
 51               <li>
 52               offset - offset within the instance file where the instance begins.
 53               </li>
 54           
 55               <li>
 56               size - size in bytes of the instance itself (as it appears in the instance
 57           	file).
 58               </li>
 59           
 60               <li>
 61               key - the compound key of the instance (including all key binding pairs). 
 62           	This key is in standard form which we define as follows: all property 
 63           	names are shifted to lower case and the key-bindings are sorted by
 64 mike  1.1 	property names (in this way it suffices to compare key expressions to
 65           	determine if two compound keys refer to the same instance.
 66               </li>
 67           
 68               </ul>
 69           
 70           Here is a sample index file:
 71           
 72               <pre>
 73               00000002
 74               0 02FB6B6C 0 1427 Y.key1=1001,key2="Hello World 1"
 75               0 50699A66 1427 1433 Y.key1=1002,key2="Hello World 2"
 76               1 EB45F85A 2860 1433 Y.key1=1004,key2="Hello World 4"
 77               1 38B42754 4293 1431 Y.key1=1005,key2="Hello World 5"
 78               0 F79B5B0A 8583 1427 Y.key1=1666,key2="Hello World N"
 79               0 38B42754 4293 1431 Y.key1=1005,key2="Hello World 5"
 80               </pre>
 81           
 82           First notice that the dirty count is equal to two and that two entries are 
 83           marked as deleted (these quantities will always be equal). This indicates 
 84           that the instance file has two instances which are no longer used. The space 
 85 mike  1.1 used by these instances will be reclaimed during reorganization.
 86           
 87           The layout of the instance file is trivial. Instances are always appended to
 88           to the instance file. The instances are kept end-to-end in the file.
 89           
 90           <h1>Operations</h1>
 91           
 92           There are three operations which may be performed on the instance repository.
 93           create, modify, and delete. This section describes how these operations affect 
 94           the index and instance file. Note that the process described below for 
 95           performing the three operations actually contains extra steps described in
 96           thd "Recovery" section.
 97           
 98           Creation. During creation, the instance is appended to the instance file and
 99           an entry is appended to the index file. If an instance with the same key is
100           found, then the operation is disallowed.
101           
102           Deletion. To delete an instance, the corresponding entry in the index file is 
103           marked as deleted (by changing the first column from '0' to '1'). And then the 
104           dirty count is incremented and updated. If the dirty count has reached the 
105           configured threshold, then the index and instance files are reogranized (see
106 mike  1.1 the section entitled "Reorganization" for an explanation of how this is 
107           done).
108           
109           Modification. To modify an instance, the new modified instance is appended to
110           the instance file. Next the old entry with the same key is marked as deleted.
111           Finally, a new entry is inserted into the index file.
112           
113           <h1>Reorganization</h1>
114           
115           To improve performance, reclamation of unused space in the instance file 
116           (called gaps) is postponed until there are N gaps. Deletion and modification
117           operations create gaps. A dirty-count is maintained in the index file as
118           described above. When this dirty count reaches N, available space is reclaimed.
119           Reorganization is expensive: the entire file must be rewritten. The time
120           complexity is O(n) where n is the number of instances in the file. By 
121           postponing reorganization, the time complexity may be reduced to O(1).
122           
123           Reorganization requires rewriting the instance file and the index file. All
124           gaps are deleted in the instance file. For each gap there is an entry in the 
125           index file which is marked as deleted. This entry indicates where the gap 
126           starts in the instance file and how long it is. The offsets in the index file
127 mike  1.1 are adjusted accordingly. The index file is also updated: all entries marked
128           as deleted are removed from the index file.
129           
130           <h1>Recovery</h1>
131           
132           To avoid corruption of the instance repository, a simple recovery scheme is
133           provided. For all operations (described in the Operations section), the 
134           following algorithm is used to ensure recoverability.
135           
136               <ul>
137           
138               <li>Check to see if any rollback files exist for instances of the given
139           	class. If so then perform rollback (see recoverability algorithm
140           	for details).
141               </li>
142           
143               <li>Create a rollback file for the instance file. The rollback file 
144           	contains the original size of the instance file.
145               </li>
146           
147               <li>Create a rollback file for the index file. The rollback file is a
148 mike  1.1 	copy of the instance file (this can be optimized later).
149               </li>
150           
151               <li>Modify the instance file as described in the operations section.
152               </li>
153           
154               <li>Modify the index file as described in the operations section.
155               </li>
156           
157               <li>Perform reorganization as described in the reorganization section
158           	if the dirty-count has reached the threshold. Otherwise, increment
159           	the dirty count.
160               </li>
161           
162               <li>Delete the rollback files.
163               </li>
164           
165               </ul>
166           
167           The recoverability algorithm itself works as follows:
168           
169 mike  1.1     <ul>
170           
171               <li>Delete the index file.
172               </li>
173           
174               <li>Rename the index rollback file to have the same name as the
175           	index file.
176               </li>
177           
178               <li>Truncate the instance file to have the same number of bytes as
179           	indicated in the instance rollback file.
180               </li>
181           
182               <li>Delete the rollback files.
183               </li>
184           
185               </ul>
186
No CVS admin address has been configured