|
Home |
Training |
Services |
Free Stuff |
Company |
Customers |
Forum |
Contact |
Careers |
|
|
Using Inner Recursion Published in the Open Systems Database Association Newsletter, Winter 2000 Here in the Multivalue industry, we tend to take complex data structures for granted. Specifically, one of the more common complex structures found in a variety of Multivalue applications is that of the unbalanced tree, commonly used for representing a genealogy or manufacturers bill of materials. Using the bill of materials as an example, an unbalanced tree structure could be used to represent the relationship between a parent part number and any number of child part numbers and the relationship between each child part number and its children, ad infinitum. Such a structure might appear as follows: 12345: Finished Product 45678: Component #1 45679: Subassembly #1.1 45680: Subassembly #1.2 34254: Component #2 34255: Subassembly #2.1 34256: Subassembly #2.2 34257: Subassembly #2.3 54341: Component #3 54342: Subassembly #3.1 54344: Subassembly #3.2 This type of structure can be represented very simply in the Multivalue world, simply by having one item file with a multivalued list of component items in one of the record attributes. A record in such a file might appear something like this: Key: 12345 01 Finished Product ... 20 45678]34254]54341 In this file, attribute 20 nominates the components (shown above as Component #1 through Component #3) which are used in the construction or definition of this item. Each of these items would then in turn have components defining the individual subassemblies. Obviously, by implementing this structure using Multivalue technology, any number of levels can be defined for the structure without going outside the boundary of a single file. As long as a component has components, this structure can implicitly link the records together without indices or multidimensional inverted lists. As long as one parent keeps track of their children, the whole structure remains intact. (This rule, incidentally, applies just as easily to a supermarket as it does programming...) Storing the structure is only half the battle, however. Eventually someone will want to retrieve the information on a report or inquiry. And if the information is stored in a tree structure, why not follow the tree when printing the report? While this is a reasonable expectation, I'm often amazed at the heinous array of code fragments littering the path to the objective. Probably the most obvious technique for tree traversal is simple recursion. With this technique, a routine is written for handling a single item. Once the item has been processed, the routine extracts any descendant data and calls itself for each descendant. Such a routine might appear something like this: SUBROUTINE PROCESS.ITEM(INV.ID) * ...the code to process the item goes here * CHILDREN = ITEM.INV<20> CHILDREN.CNT = DCOUNT(CHILDREN,@VM) FOR CHILDREN.LOOP = 1 TO CHILDREN.CNT CHILD = CHILDREN<1,CHILDREN.LOOP> CALL PROCESS.ITEM(CHILD) NEXT CHILDREN.LOOP In this example, the ID of an inventory record is passed as the parameter. Once the item has been processed, the children are extracted and each child ID passed to the PROCESS.ITEM routine. This will cause the entire tree to be traversed with a single call to the routine, as the routine will repeatedly call itself for all descendant data. Having said this, and having shown it to be so simple, it should be noted that this particular technique is not generally a good idea for Multivalue systems. While I can't fully explain the reasons behind the problems, two common problems with this technique include 1) inordinate quantities of memory are required simply to support the recursion, and 2) it's dog slow. (Apologies to the dogs.) To overcome these problems, a technique called “inner recursion” can be used. Like recursion, inner recursion is a technique whereby a routine calls itself for processing descendant data. Unlike recursion, however, this technique does not have hefty resource requirements or speed problems. In fact, this technique uses very little additional memory to support the recursion, and traverses a tree quite remarkably fast. The trick - if there is one - is that the routine uses GOSUB to call itself, not CALL. The astute among us will instantly see that there is a serious problem with this approach: Subroutines do not have private variables. In other words, if a subroutine calls itself, the called instance does not have its own copy of the variables, and any variables changed by that instance will remain changed after that instance RETURNs to its caller. Consider the following example: ... INV.ID = 12345 GOSUB 1000 ;* Process this item STOP * 1000 * Process this item * ...do the work for the child... * CHILDREN = ITEM.INV<20> CHILDREN.CNT = DCOUNT(CHILDREN,@VM) FOR CHILDREN.LOOP = 1 TO CHILDREN.CNT INV.ID = CHILDREN<1,CHILDREN.LOOP> GOSUB 1000 NEXT CHILDREN.LOOP * RETURN This code fragment will work fine if there's only one level of components. However, if there's an additional level of components - as shown in the subassembly example earlier - the CHILDREN, CHILDREN.LOOP, and CHILDREN.CNT variables for the first level of components will be changed when the routine calls itself, thus destroying their original contents and causing the end result to be incorrect. Fortunately, this problem is easily remedied. To do so, we simply need to identify those variables whose value is critical to a given instance, and convert them to arrays as in this example: DIM CHILDREN(50),CHILDREN.CNT(50),CHILDREN.LOOP(50) SPTR = 0 ;* Initialize the stack pointer ... INV.ID = 12345 ;* The starting inventory item GOSUB 1000 ;* Process this item STOP * 1000 * Process this item * SPTR = SPTR + 1 ;* Increment the stack pointer * ...do the work for the child... * CHILDREN(SPTR) = ITEM.INV<20> CHILDREN.CNT(SPTR) = DCOUNT(CHILDREN(SPTR),@VM) CHILDREN.LOOP(SPTR) = 0 * LOOP CHILDREN.LOOP(SPTR) = CHILDREN.LOOP(SPTR) + 1 WHILE (CHILDREN.LOOP(SPTR) LE CHILDREN.CNT(SPTR)) DO INV.ID = CHILDREN(SPTR)<1,CHILDREN.LOOP(SPTR)> GOSUB 1000 REPEAT * SPTR = SPTR - 1 ;* Decrement the stack pointer * RETURN Note the DIM for the dimensioned arrays at the top of this fragment. If the maximum size of the structure is known and the structure is not too big, dimensioned arrays are always the best performance option. If the maximum size of the structure is unknown, or the structure could grow to an inordinate size, dynamic arrays could be used instead, though notably with a performance penalty. For each iteration of the 1000 subroutine, note how the stack pointer SPTR is incremented at the top and decremented at the bottom. This is used to maintain autonomy for each instance of the CHILDREN, CHILDREN.CNT and CHILDREN.LOOP variables. The first time through, SPTR is incremented to 1. As 1000 calls itself, SPTR is incremented to 2, 3, etc. As each level is completed, SPTR is decremented to its previous value, so when all is said and done, SPTR is returned to the calling routine with its original value of zero. Also note that the FOR/NEXT loop from the previous (erroneous) examples has been replaced by a LOOP/WHILE. While a FOR/NEXT may very well work for this technique, thanks to a bad syntax experience on an old Mentor I've used LOOP/UNTIL or LOOP/WHILE ever since and have never encountered any portability issues. (And in this age of mergers and acquisitions, one should never discount the value of portability.) My point in writing this is simple: To those who have written heinous code to achieve tree traversal (you know who you are), hopefully this has impressed on you that it doesn't have to be that difficult. To those who need to do this kind of a thing and just didn't know where to start, perhaps this gives you some food for thought. And to those for whom this is old hat, I just wanted to let y'all know I'm still alive and well... Kevin King is the President and Chief Technologist with Precision Solutions, Inc., a leading technology solutions provider in Longmont, Colorado. He can be reached by email at Kevin@PrecisOnline.com or by voice at 303/651-7050. |
|