Concurrent Search Trees: An Algorithm for Parallel BSTs

Components of the Algorithm

Question:

Discuss lock-free binary search algorithms.

Recent research has focused on concurrent search trees. A binary search tree (BST) is a node-based structure in which each node holds a key and has a left and a right subtree. Existing concurrent search trees either rely on locking parts of the data structure or exhibit suboptimal memory use. Trees are traditionally non-trivial to parallelize because each node has multiple mutable fields, but their faster searches, compared with simpler structures such as linked lists, create demand for them. This report describes a concurrent BST built only from single-word reads, single-word writes and compare-and-swap (CAS). With this algorithm, operations contend only when concurrent updates affect the same node or nodes. Updates are non-blocking, because threads can help one another complete their operations, and every operation is linearisable. Experimental evidence shows the algorithm to be faster than alternative solutions and to scale better as the number of concurrently executing threads grows. In most of the scenarios tested it outperforms a concurrent skip list, delivering 65% more throughput when the performance difference is averaged across all experiments. Its memory footprint is also smaller than that of the other structures examined.

The single-threaded performance of current CPUs has largely stagnated, with each new generation of chips delivering only small increments. Hardware designers have instead increased the number of processing cores per CPU. This, however, creates a need for algorithms that can make efficient use of the additional processing resources (Howley, 2012).

Multiple threads must communicate and synchronize with one another through shared data structures in memory. The efficiency of these data structures is crucial to overall performance, yet they are difficult to design (Ellen, 2014): the large number of possible thread interleavings makes such algorithms hard to reason about.

Performance Evaluation

A lock-free (non-blocking) implementation of a shared object allows the object to be accessed concurrently while guaranteeing that the system as a whole makes progress, even if one or more threads fail. Because the shared object is never protected by locks, such an implementation is immune to lock-related problems, including priority inversion, deadlock and convoying (Crain, 2013).
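To make the progress guarantee concrete, the following C++ sketch (a hypothetical helper, not part of the implementation described in this report) increments a shared counter with a CAS retry loop. A thread delayed between reading the value and attempting its CAS holds no lock, so other threads' CAS attempts still succeed; the delayed thread simply retries.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Lock-free increment: read the current value, then try to install value + 1 with CAS.
// If another thread changed the counter in between, the CAS fails and we retry.
// A thread that stalls inside this loop holds no lock, so other threads still progress.
long lockFreeIncrement(std::atomic<long>& counter) {
    long observed = counter.load();
    // On failure, compare_exchange_weak reloads 'observed' with the current value.
    while (!counter.compare_exchange_weak(observed, observed + 1)) {
    }
    return observed + 1;  // the value this thread installed
}

// Runs 'threads' workers, each performing 'perThread' increments, and returns the total.
long runIncrements(int threads, int perThread) {
    assert(threads > 0 && perThread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&counter, perThread] {
            for (int i = 0; i < perThread; ++i) lockFreeIncrement(counter);
        });
    for (auto& w : workers) w.join();
    return counter.load();
}
```

In practice std::atomic::fetch_add does the same job in one call; the explicit loop is shown because the same read, CAS, retry pattern underlies the tree operations described later.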

This report presents an algorithm for a lock-free binary search tree (BST). The algorithm is portable because it relies only on widely available operations: single-word reads, single-word writes and compare-and-swap (CAS) (Crain, 2013). Traversing the tree is read-only and does not interfere with concurrent updates, and most concurrent updates to the tree's internal structure can be tolerated by a traversal without forcing it to restart.

The following sections describe the algorithm and its operations, followed by a detailed evaluation of a C++ implementation. The report then discusses memory management issues and their solutions, and sketches a proof that the operations are non-blocking and linearisable. The algorithm is reported to deliver good throughput compared with other concurrent algorithms and to scale better as the number of threads increases (Park, 2012).

The algorithm implements an abstract set data structure. A set holds unique keys and supports three operations: add(k) adds the key k if it is not already a member of the set; remove(k) removes k if it is a member; and contains(k) reports whether k is in the set. The underlying data structure is a BST, a binary tree in which, for every node N, the left subtree contains only keys smaller than N's key and the right subtree contains only keys larger than it. BST operations take O(log n) time on average and O(n) in the worst case. The algorithm is non-blocking, which guarantees system-wide throughput even when individual threads starve. It is also linearisable: every operation appears to take effect atomically at a single point, its linearisation point, between its invocation and its response. For any concurrent execution, the order of these linearisation points yields a sequential history that matches the expected behaviour of the implemented object (Braginsky, 2012).
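As a point of reference, the set interface and BST ordering described above can be sketched as a single-threaded C++ class. This is a sequential baseline only, assuming integer keys and no concurrency control; the class name BstSet is hypothetical. Its remove follows the textbook approach of copying in the next largest key when a node has two children, which is exactly the step that becomes hard to parallelize.

```cpp
#include <cassert>
#include <memory>

// Minimal single-threaded BST set: add/remove/contains with no concurrency control.
class BstSet {
    struct Node {
        int key;
        std::unique_ptr<Node> left, right;
        explicit Node(int k) : key(k) {}
    };
    std::unique_ptr<Node> root_;

    // Returns the owning slot that holds k, or the empty slot where k would be inserted.
    std::unique_ptr<Node>* slotFor(int k) {
        std::unique_ptr<Node>* slot = &root_;
        while (*slot && (*slot)->key != k)
            slot = k < (*slot)->key ? &(*slot)->left : &(*slot)->right;
        return slot;
    }

public:
    bool contains(int k) { return *slotFor(k) != nullptr; }

    bool add(int k) {
        std::unique_ptr<Node>* slot = slotFor(k);
        if (*slot) return false;                  // key already present
        *slot = std::make_unique<Node>(k);
        return true;
    }

    bool remove(int k) {
        std::unique_ptr<Node>* slot = slotFor(k);
        if (!*slot) return false;                 // key absent
        Node* n = slot->get();
        if (n->left && n->right) {
            // Two children: copy in the next largest key (leftmost node of the
            // right subtree), then unlink that node instead.
            std::unique_ptr<Node>* succ = &n->right;
            while ((*succ)->left) succ = &(*succ)->left;
            n->key = (*succ)->key;
            *succ = std::move((*succ)->right);
        } else {
            // Fewer than two children: splice the node out of the tree.
            *slot = std::move(n->left ? n->left : n->right);
        }
        return true;
    }
};
```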

Challenges in Implementing a Tree Infrastructure with CAS

A well-known difficulty in implementing a tree with CAS is that a CAS cannot atomically change more than one child pointer (Cao, 2016). Multiple child pointers also make Harris's pointer-marking technique hard to apply, because one child pointer of a node could be marked while a new node is being attached through its other child pointer (Cao, 2016). The tree used here mirrors a sequential BST, with one additional field in each node: an operation (op) field. This field stores information about an update in progress on the node and prevents other operations from updating the node until that update is complete. An insertion first creates an object holding the information another thread would need to complete the operation, then copies a pointer to this object into the op field of the node that is to receive the new child. Once the op field has been updated, the operation can no longer fail, so the key can be considered logically inserted. The op field is only ever updated by CAS, which succeeds only if the field still holds the value observed earlier; this prevents operations from interfering with one another (Drachsler, 2014).

Removing a node with fewer than two children is straightforward: the node is marked by setting its op field accordingly, which removes its key. This logical removal is irreversible. To complete the physical removal, a child-pointer update similar to the one used for insertion is then applied to the parent node.

The main obstacle to parallelization is the removal of nodes with two children. When such a node must be removed, the next largest (or next smallest) key in the tree is located; that key replaces the key of the node being removed, and the node that held it is then removed from the tree. This raises two problems: first, the updates to both nodes must appear atomic; second, searches for keys higher in the tree that the move has invalidated must be detected (Drachsler, 2014).

Removing Nodes with Two Children

The information needed to update the node whose key is being removed is copied into an operation object, and a reference to that object is placed in the op field of the node holding the replacement key. This prevents any other operation from interfering with that node. Once the key has been relocated successfully, the operation is cleared from the first node's op field and the replacement node is marked for removal.

Replacing a node's key has a direct consequence: when a key is removed in this way, the range of keys that each of the node's subtrees may contain changes. Replacing the key with a larger key expands the range of the left subtree and shrinks the range of the right subtree; replacing it with a smaller key has the opposite effect. This matters for concurrent searches traversing the node. Expanding a range has no effect on a search, because the key being searched for still lies within the subtree's range, but shrinking a range may invalidate a search that has already entered that subtree, which must then restart (Mariano, 2015).
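The effect of a replacement on subtree ranges can be written out explicitly. As an illustration, assume the node holding key k lies within an interval (a, b) imposed by its ancestors, and that k is replaced by the next largest key k', so that k < k' < b; L and R denote the key ranges of the left and right subtrees:

```latex
\begin{aligned}
\text{before:}\quad & L = (a,\ k), && R = (k,\ b) \\
\text{after } k \mapsto k'\ (k < k' < b):\quad & L' = (a,\ k') \supset L, && R' = (k',\ b) \subset R
\end{aligned}
```

A search for a key x with k < x ≤ k' that has already descended into the right subtree is now outside R': it can miss k' itself, which has moved up, or choose an insertion point that breaks the ordering, so it must restart.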

A given tree instance can support only one kind of replacement, smaller or larger, not both. Supporting both would require further modifications, including an ABA-prevention mechanism to protect the key field of each node. The implementation described here always replaces a removed key with the next largest key, so a search must check the last node at which it followed a right child link to detect any such change (Lowe, 2016).

This section details the components of the algorithm's design, including the operations placed in the op fields of the nodes. It is organized into subsections that describe each structure and its main components.

Figure 1 presents the classes used in the C++ implementation. The node class closely resembles that of a typical sequential BST; the only addition is the op field, which is used to detect changes being made to a node (Miller, 2015). On a 32-bit system, memory allocation returns addresses aligned on a 4-byte boundary, so the two least significant bits of every pointer are zero and can be used to store auxiliary data. This is a common technique for combining extra state with pointer values updated by CAS. Here it is used to store one of four node states:

NONE indicates that no operation is in progress on the node; MARK indicates that the node's key has been removed and the node should be physically removed; CHILDCAS indicates that one of the node's child pointers is being modified; and RELOCATE indicates that the node is affected by a key relocation.

The algorithm in the figure uses a few macros to manipulate these values. FLAG(ptr, state) sets the state bits of an operation pointer to one of the states listed above; GETFLAG(ptr) strips the pointer bits and returns the state; and UNFLAG(ptr) clears the state bits, leaving the pointer intact.
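A minimal C++ sketch of these three macros, written as inline functions, is shown below. The numeric values chosen for the four states and the names flag, getFlag and unflag are assumptions for illustration; the sketch relies on operation objects being at least 4-byte aligned and checks this with an assertion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding of the four states in the two low-order bits of a pointer.
enum OpState : std::uintptr_t { NONE = 0, MARK = 1, CHILDCAS = 2, RELOCATE = 3 };

constexpr std::uintptr_t STATE_MASK = 0x3;

// FLAG(ptr, state): combine an aligned operation pointer with a state.
inline std::uintptr_t flag(const void* ptr, OpState state) {
    auto bits = reinterpret_cast<std::uintptr_t>(ptr);
    assert((bits & STATE_MASK) == 0 && "pointer must be 4-byte aligned");
    return bits | state;
}

// GETFLAG(ptr): keep only the state bits.
inline OpState getFlag(std::uintptr_t tagged) {
    return static_cast<OpState>(tagged & STATE_MASK);
}

// UNFLAG(ptr): clear the state bits, recovering the original pointer.
template <typename T>
inline T* unflag(std::uintptr_t tagged) {
    return reinterpret_cast<T*>(tagged & ~STATE_MASK);
}
```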

The left and right child pointers use special values to represent null, because plain null pointers can cause an ABA problem: a child pointer may be set to a valid node, the node removed and the pointer reverted to null, after which a delayed CAS expecting the original null value could wrongly succeed (Silver, 2016). To prevent this, a pointer that reverts to null retains its previous address, and a low-order bit is set to indicate that it is null. This makes every null pointer unique. Two further macros, SETNULL(ptr) and ISNULL(ptr), respectively set the null bit of a child pointer and test whether a pointer is null. A new node's left and right pointers are initialized to such null values. All of a node's fields are mutable, but key, left and right may only be modified while a corresponding operation is set in its op field (David, 2015).
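The two null-pointer macros can be sketched the same way. This illustration rests on assumptions: it uses bit 0 of a child pointer as the null bit, and the stand-in Node type and the names setNull and isNull are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Node { int key; };  // stand-in node type for this sketch

// Hypothetical encoding: bit 0 marks a child pointer as "null", while the remaining
// bits keep the address it previously held, so every null value is distinct.
constexpr std::uintptr_t NULL_BIT = 0x1;

// SETNULL(ptr): turn a node address into a unique null child pointer.
inline Node* setNull(Node* ptr) {
    return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(ptr) | NULL_BIT);
}

// ISNULL(ptr): true for a tagged null value (or a plain nullptr).
inline bool isNull(Node* ptr) {
    return ptr == nullptr || (reinterpret_cast<std::uintptr_t>(ptr) & NULL_BIT) != 0;
}
```

Because setNull(a) and setNull(b) differ for different nodes a and b, a CAS that expects one null value cannot succeed against another.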

A ChildCASOp object contains enough information for any other thread to complete an operation that modifies one of a node's child pointers: which pointer is to change, the value it is expected to hold, and the new value to install. While such an operation is active, the node's op field is flagged with the CHILDCAS state (Howley, 2012).

Similarly, a RelocateOp object holds enough information to remove the key of a node with two children and replace it with the next largest key, including the node and key to be removed. While such an operation is active, the node's op field is flagged with the RELOCATE state, for the purpose described earlier. The tree class is not listed, but it contains a single node object, the root, whose right child field holds the logical root of the tree, or a null value when the tree is empty. This sentinel simplifies the implementation (Liu, 2012).
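The classes described above might be declared in C++ roughly as follows. This is a sketch, not the implementation from the figure: apart from key, left, right and op, the field names (isLeft, expected, update, state, dest, destOp, removeKey, replaceKey) and the Operation base class are assumptions, and the unique-null encoding of child pointers is left out for brevity.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Common base for operation objects stored (with state bits) in a node's op field.
struct Operation {
    virtual ~Operation() = default;
};

struct Node {
    std::atomic<int> key;                   // rewritten by a relocation, hence atomic
    std::atomic<std::uintptr_t> op{0};      // operation pointer | 2-bit state (0 = NONE)
    std::atomic<Node*> left{nullptr};       // child links (null encoding omitted here)
    std::atomic<Node*> right{nullptr};
    explicit Node(int k) : key(k) {}
};

// Pending change to one child pointer of a node (CHILDCAS state).
struct ChildCASOp : Operation {
    bool isLeft;      // which child pointer to change
    Node* expected;   // value the child pointer must still hold
    Node* update;     // value to install
    ChildCASOp(bool l, Node* e, Node* u) : isLeft(l), expected(e), update(u) {}
};

// Moving the next largest key into a node whose key is removed (RELOCATE state).
struct RelocateOp : Operation {
    enum State { ONGOING, SUCCESSFUL, FAILED };
    std::atomic<State> state{ONGOING};
    Node* dest;                  // node whose key is being removed
    std::uintptr_t destOp;       // dest's op value observed by the remover
    int removeKey;               // key being removed
    int replaceKey;              // next largest key, copied into dest
    RelocateOp(Node* d, std::uintptr_t dOp, int rk, int rep)
        : dest(d), destOp(dOp), removeKey(rk), replaceKey(rep) {}
};

// The tree holds a sentinel root whose right child is the logical root (null if empty).
struct Tree {
    Node root{0};  // sentinel key; never compared as a real element
    bool empty() const { return root.right.load() == nullptr; }
};
```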

The CAS operation takes three parameters: a memory location, the value expected at that location, and the new value to write. It writes the new value only if the location holds the expected value, and returns a boolean indicating whether it succeeded. A variant, VCAS, instead returns the value held in memory before the attempted operation. Where VCAS is not available, CAS can be used instead (Ramachandran, 2015).
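In C++, std::atomic::compare_exchange_strong provides both behaviours in one call: it returns a boolean, and on failure it writes the value it found back into its expected argument. The hypothetical wrappers cas and vcas below use it to express the two variants with the three parameters described above.

```cpp
#include <atomic>
#include <cassert>

// CAS(address, expected, new): true if *address held 'expected' and now holds 'desired'.
template <typename T>
bool cas(std::atomic<T>& address, T expected, T desired) {
    return address.compare_exchange_strong(expected, desired);
}

// VCAS(address, expected, new): same attempt, but returns the value held before it.
// The caller detects success by comparing the result with 'expected'.
template <typename T>
T vcas(std::atomic<T>& address, T expected, T desired) {
    // On failure, compare_exchange_strong overwrites 'expected' with the current value;
    // on success, 'expected' already equals the previous value.
    address.compare_exchange_strong(expected, desired);
    return expected;
}
```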

The main operations of the tree are described below.

Contains (k): this operation uses a find method to locate a key, starting from a given node, here auxRoot. find records the nodes it reaches and their op values in variables such as predOp and currOp, and returns one of four results: FOUND, NOTFOUND_L, NOTFOUND_R or ABORT. These indicate, respectively, that the key was found, that it was not found and would belong in the left or the right child position of the last node reached, or that the search was aborted (Silver, 2016). contains(k) then reports whether the result is FOUND.

A search begins by initializing the variables curr, currOp, next, lastRight and lastRightOp before entering its loop. next holds the next node on the search path, while lastRight and lastRightOp record the last node at which the search followed a right child link, together with that node's op value (Timnat, 2014). The loop visits one node at a time until the key is found or a null pointer is reached. If the search encounters an operation in progress, it first helps complete that operation and then restarts. This reduces complexity by avoiding many special cases (Zhang, 2016).
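The shape of the search loop can be illustrated with a single-threaded C++ skeleton. It is a simplification under stated assumptions: it omits reading op fields, helping and restarting, so it never returns ABORT, and the FindOut structure is hypothetical. It does show how the result codes arise and how lastRight tracks the last node where the search went right.

```cpp
#include <cassert>

enum FindResult { FOUND, NOTFOUND_L, NOTFOUND_R, ABORT };  // ABORT: concurrent version only

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
};

struct FindOut {
    Node* pred = nullptr;       // parent of curr
    Node* curr = nullptr;       // node holding the key, or the last node visited
    Node* lastRight = nullptr;  // last node whose right child link was followed
};

FindResult find(int k, Node* auxRoot, FindOut& out) {
    FindResult result = NOTFOUND_R;
    out.pred = nullptr;
    out.curr = auxRoot;                // the search starts at the sentinel root...
    out.lastRight = auxRoot;
    Node* next = auxRoot->right;       // ...whose right child is the logical root
    while (next != nullptr) {
        out.pred = out.curr;
        out.curr = next;
        if (k == next->key) { result = FOUND; break; }
        if (k < next->key) {
            result = NOTFOUND_L;
            next = next->left;
        } else {
            result = NOTFOUND_R;
            out.lastRight = next;      // the concurrent version would also save its op
            next = next->right;
        }
    }
    return result;
}
```

In a concurrent version, find would also save lastRightOp and, before returning, compare it with lastRight's current op field, restarting if it has changed.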

Add (k): this operation adds a key to the set. If find confirms that the key is not already in the tree and locates its insertion point, a new node and a ChildCASOp describing the insertion are created. A CAS then installs this operation in curr's op field, expecting the value currOp read during the search; once this CAS succeeds, the key can be considered logically inserted. A successful CAS also shows that curr's op field has not changed since it was read, and therefore that none of curr's other fields has changed either; if the CAS fails, the add retries. helpChildCAS then carries out the actual insertion of the new node into the tree. Any thread that encounters the operation in curr's op field can call it.
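The add path can be sketched in C++ for a simplified, insert-only tree. The assumptions are significant: there is no remove, so plain nullptr children are safe from ABA; nothing that has been published is ever freed; and the names InsertOnlyTree, locate and addConcurrently are hypothetical. What the sketch does show is the pattern described above: a ChildCASOp is published in curr's op field with a single CAS, and helpChildCAS, which any thread may run, performs the child-pointer update and then clears the state.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

constexpr std::uintptr_t NONE = 0, CHILDCAS = 2, MASK = 3;

struct Node;
struct ChildCASOp { bool isLeft; Node* expected; Node* update; };

struct Node {
    int key;
    std::atomic<std::uintptr_t> op{NONE};                // operation pointer | state
    std::atomic<Node*> left{nullptr}, right{nullptr};
    explicit Node(int k) : key(k) {}
};

inline std::uintptr_t flag(ChildCASOp* p, std::uintptr_t state) {
    return reinterpret_cast<std::uintptr_t>(p) | state;
}
inline ChildCASOp* unflag(std::uintptr_t v) { return reinterpret_cast<ChildCASOp*>(v & ~MASK); }

// Completes a published ChildCASOp; safe to call from any thread, any number of times.
void helpChildCAS(ChildCASOp* op, Node* dest) {
    std::atomic<Node*>& child = op->isLeft ? dest->left : dest->right;
    Node* expected = op->expected;
    child.compare_exchange_strong(expected, op->update);  // fails harmlessly if done
    std::uintptr_t flagged = flag(op, CHILDCAS);
    // Clear the state but keep the pointer, so op never returns to an earlier value.
    dest->op.compare_exchange_strong(flagged, flag(op, NONE));
}

struct InsertOnlyTree {
    Node root{0};  // sentinel: its right child is the logical root

    // True if k is present; otherwise reports the insertion point: curr, the side,
    // and curr's op value (read before the child link it guards).
    bool locate(int k, Node*& curr, bool& goLeft, std::uintptr_t& currOp) {
        for (;;) {                                   // restart point
            curr = &root;
            goLeft = false;                          // always go right from the sentinel
            for (;;) {
                currOp = curr->op.load();
                if ((currOp & MASK) == CHILDCAS) {   // pending update: help, then restart
                    helpChildCAS(unflag(currOp), curr);
                    break;
                }
                Node* next = (goLeft ? curr->left : curr->right).load();
                if (next == nullptr) return false;   // insertion point found
                if (next->key == k) return true;     // key already present
                curr = next;
                goLeft = k < curr->key;
            }
        }
    }

    bool contains(int k) {
        Node* curr; bool goLeft; std::uintptr_t currOp;
        return locate(k, curr, goLeft, currOp);
    }

    bool add(int k) {
        Node* curr; bool goLeft; std::uintptr_t currOp;
        while (!locate(k, curr, goLeft, currOp)) {
            auto* node = new Node(k);
            auto* casOp = new ChildCASOp{goLeft, nullptr, node};
            // Succeeds only if curr's op is unchanged since it was read, i.e. no other
            // update touched curr; from here on the insertion cannot fail.
            if (curr->op.compare_exchange_strong(currOp, flag(casOp, CHILDCAS))) {
                helpChildCAS(casOp, curr);
                return true;
            }
            delete casOp;   // never published, so no other thread can reference it
            delete node;
        }
        return false;
    }
};

// Usage helper: 'threads' workers add interleaved keys 0 .. threads * perThread - 1.
void addConcurrently(InsertOnlyTree& t, int threads, int perThread) {
    std::vector<std::thread> workers;
    for (int w = 0; w < threads; ++w)
        workers.emplace_back([&t, w, threads, perThread] {
            for (int i = 0; i < perThread; ++i) t.add(i * threads + w);
        });
    for (auto& th : workers) th.join();
}
```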

Remove (k): this operation removes a key once the node holding it has been located. How it proceeds depends on the number of children the node has, and the case of fewer than two children is the simplest. In that case a CAS changes curr's op field to the MARK state, which removes the key from the set (Natarajan, 2014). A ChildCASOp is then used to replace pred's child pointer to curr with either a null pointer or curr's single child. If the node must be physically removed before remove returns, a further call to find(k) can be made when this CAS fails. Once the marking CAS has succeeded, the remove operation cannot fail, so the key can be considered removed from the structure.
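The fewer-than-two-children case can be sketched as follows. This is a partial illustration under stated assumptions: pred, predOp, curr and currOp are taken as given (as if returned by find), the two-children relocation path is only detected, not implemented, nothing is freed, and the names helpMarked, removeFewerThanTwo and RemoveResult are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t NONE = 0, MARK = 1, CHILDCAS = 2, MASK = 3;

struct Node;
struct ChildCASOp { bool isLeft; Node* expected; Node* update; };

struct Node {
    int key;
    std::atomic<std::uintptr_t> op{NONE};
    std::atomic<Node*> left{nullptr}, right{nullptr};
    explicit Node(int k) : key(k) {}
};

// Unique null child values (bit 0 set) and the FLAG macro on raw op bits.
inline Node* setNull(Node* p) { return reinterpret_cast<Node*>(reinterpret_cast<std::uintptr_t>(p) | 1); }
inline bool isNull(Node* p) { return p == nullptr || (reinterpret_cast<std::uintptr_t>(p) & 1); }
inline std::uintptr_t flag(std::uintptr_t bits, std::uintptr_t state) { return (bits & ~MASK) | state; }

void helpChildCAS(ChildCASOp* op, Node* dest) {
    std::atomic<Node*>& child = op->isLeft ? dest->left : dest->right;
    Node* expected = op->expected;
    child.compare_exchange_strong(expected, op->update);
    std::uintptr_t flagged = flag(reinterpret_cast<std::uintptr_t>(op), CHILDCAS);
    dest->op.compare_exchange_strong(flagged, flag(reinterpret_cast<std::uintptr_t>(op), NONE));
}

// Physical removal: replace pred's link to curr with curr's only child, or a unique null.
bool helpMarked(Node* pred, std::uintptr_t predOp, Node* curr) {
    Node* l = curr->left.load();
    Node* newRef = isNull(l) ? curr->right.load() : l;
    if (isNull(newRef)) newRef = setNull(curr);
    auto* casOp = new ChildCASOp{curr == pred->left.load(), curr, newRef};
    std::uintptr_t tagged = flag(reinterpret_cast<std::uintptr_t>(casOp), CHILDCAS);
    if (pred->op.compare_exchange_strong(predOp, tagged)) {
        helpChildCAS(casOp, pred);
        return true;
    }
    delete casOp;  // not published; the marked node stays until a later unlink attempt
    return false;
}

enum RemoveResult { REMOVED, NEEDS_RELOCATE, RETRY };

RemoveResult removeFewerThanTwo(Node* pred, std::uintptr_t predOp, Node* curr, std::uintptr_t currOp) {
    if (!isNull(curr->left.load()) && !isNull(curr->right.load()))
        return NEEDS_RELOCATE;                      // two children: relocation path
    // Logical removal: once this CAS succeeds the key is gone and remove cannot fail.
    if (!curr->op.compare_exchange_strong(currOp, flag(currOp, MARK)))
        return RETRY;                               // curr changed: search again
    helpMarked(pred, predOp, curr);                 // physical removal (best effort)
    return REMOVED;
}
```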

Optimizing traversal: an optimized version of find ignores operations in progress on the nodes it passes during the search, but it must still ensure that the node it returns has no operation attached. Operations in progress are detected through the help-and-retry mechanism (Lowe, 2016). The information needed to backtrack is recorded during traversal. During this phase a chain of logically deleted nodes may be traversed, depending on how the memory of physically removed nodes is managed, so a check is needed to detect a logically deleted node before any further operation. However, this optimization breaks the guarantee that pred is free of operations when find returns, which affects the removal of nodes. Traversing a marked node or a node with a ChildCASOp is naturally safe, since neither can invalidate the result of a search (Miller, 2015).

Updates in the algorithm are non-blocking, because threads can help complete one another's operations whenever necessary, and every operation is linearisable. Linearisability is established by defining precise linearisation points for the add, remove and contains operations. A deadlock is a situation in which two or more threads are blocked permanently, each waiting for another (Drachsler, 2014).

Unlike this algorithm, which cannot deadlock, lock-based code must avoid deadlocks by always acquiring locks in a consistent order.
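The lock-ordering rule can be illustrated with a small, hypothetical C++ example unrelated to the tree: two accounts, each protected by its own mutex, with transfers running in opposite directions. Ordering the acquisitions by address prevents the circular wait that causes deadlock; the Account type and function names are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical account type used only to illustrate lock ordering.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Moves 'amount' between two distinct accounts. Both mutexes are always acquired in
// the same global order (by address), so two opposite transfers can never each hold
// one lock while waiting for the other.
void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);
    Account* first = &from;
    Account* second = &to;
    if (std::less<Account*>{}(second, first)) std::swap(first, second);
    std::lock_guard<std::mutex> lockFirst(first->m);
    std::lock_guard<std::mutex> lockSecond(second->m);
    from.balance -= amount;
    to.balance += amount;
}

// Runs n transfers in each direction at the same time: the classic pattern that can
// deadlock when each thread locks its own 'from' account first.
void opposingTransfers(Account& a, Account& b, int n) {
    std::thread t1([&] { for (int i = 0; i < n; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < n; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
}
```

Since C++17, std::scoped_lock can acquire several mutexes at once using a deadlock-avoidance algorithm, which removes the need for a hand-written ordering.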

Thread starvation is a situation in which several threads compete for a single mutex and only one of them repeatedly acquires it (Mariano, 2015). Because the algorithm described in this report is non-blocking, it guarantees system-wide throughput even when individual threads starve. It is also linearisable, so any concurrent execution corresponds to a sequential history of the implemented object. Lock-based code offers no such guarantee: if a thread leaves a lock free for less than a nanosecond before reacquiring it, other threads have practically no chance of acquiring the lock.

To prevent this, a thread can call a yield function after releasing the lock, which lets the scheduler reorder thread execution and gives the threads waiting on the mutex a fairer chance to acquire it. The effect, however, depends on the scheduler's implementation (Cao, 2016).
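A hedged C++ illustration of the yield approach follows. The function contend is hypothetical: its workers repeatedly take one mutex for a very short critical section and, optionally, call std::this_thread::yield() after unlocking. The per-thread counts it returns depend on the scheduler, so only their total is predictable.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Workers compete for one mutex until 'total' critical sections have run; returns how
// many each thread completed. Without the yield, a thread that unlocks and immediately
// relocks may win most of the time; with it, counts tend to be more even, but the
// split is scheduler-dependent and not guaranteed.
std::vector<long> contend(int threads, long total, bool yieldAfterUnlock) {
    std::mutex m;
    long done = 0;
    std::vector<long> wins(static_cast<std::size_t>(threads), 0L);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([&, t] {
            for (;;) {
                {
                    std::lock_guard<std::mutex> guard(m);
                    if (done == total) return;
                    ++done;
                    ++wins[t];
                }
                if (yieldAfterUnlock) std::this_thread::yield();  // hint: let a waiter run
            }
        });
    for (auto& w : workers) w.join();
    return wins;
}
```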

Existing concurrent search trees provide concurrency either by locking parts of the structure or at the cost of suboptimal memory usage. Search trees are traditionally non-trivial to parallelize because each node has multiple mutable fields, yet they offer faster searches than simpler structures such as linked lists (David, 2015).

Updates to these search trees are non-blocking, because threads help complete one another's operations, and linearisability is maintained. Experimental evidence shows the tree to be faster than alternative solutions and to scale better with large numbers of concurrently executing threads. In most scenarios it outperforms the concurrent skip list, with 65% more throughput on average. The memory management implementation requires no extra space per node, although with hazard pointers five protected object pointers are needed per thread (David, 2015).

Most list-based data structures that have been successfully adapted to a concurrent, non-blocking form rely on Harris's two-stage node deletion, which uses compare-and-swap to mark the relevant pointer before the node is unlinked (Crain, 2013). Nodes in a search tree, however, have multiple links, so Harris's method cannot be applied to them directly.
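Harris's two-stage deletion can be sketched for a single list node in C++. The sketch assumes the caller has already located pred and victim, uses bit 0 of the next pointer as the deletion mark, and never frees nodes; ListNode and the function names are hypothetical. Because a tree node has two child pointers, marking one of them would not stop updates through the other, which is why the scheme does not carry over directly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct ListNode {
    int key;
    std::atomic<std::uintptr_t> next{0};   // successor address; bit 0 = "deleted" mark
    explicit ListNode(int k) : key(k) {}
};

inline std::uintptr_t bits(ListNode* n) { return reinterpret_cast<std::uintptr_t>(n); }
inline ListNode* ptr(std::uintptr_t v) { return reinterpret_cast<ListNode*>(v & ~std::uintptr_t{1}); }
inline bool marked(std::uintptr_t v) { return (v & 1) != 0; }

// Stage 1 (logical deletion): mark the victim's own next pointer, so no node can be
// inserted after it. Returns true if this call performed the marking.
bool logicallyDelete(ListNode* victim) {
    std::uintptr_t succ = victim->next.load();
    while (!marked(succ)) {
        if (victim->next.compare_exchange_weak(succ, succ | 1)) return true;
    }
    return false;  // another thread already marked it
}

// Stage 2 (physical deletion): swing pred's next pointer past the marked victim.
// Fails if pred no longer points (unmarked) at the victim.
bool physicallyDelete(ListNode* pred, ListNode* victim) {
    assert(marked(victim->next.load()));                     // stage 1 must come first
    std::uintptr_t expected = bits(victim);
    std::uintptr_t succ = bits(ptr(victim->next.load()));    // successor, mark stripped
    return pred->next.compare_exchange_strong(expected, succ);
}
```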

Software Transactional Memory (STM) is a useful tool here: it can parallelize a data structure by allowing updates to multiple memory locations to be applied atomically (Brown, 2014). STM does, however, require contention management to decide the order in which conflicting transactions proceed. STM has been shown to outperform simple locks protecting the whole data structure (Brown, 2014).

In his thesis, Fraser presented a lock-free implementation of an internal BST built on a multi-word CAS that uses descriptors to specify the memory locations to be updated. Removing a node with two children requires eight memory locations to be updated atomically, which adds considerable overhead to the BST algorithm (Park, 2012).

Another work presented the first practical non-blocking external BST (Howley, 2012). It applies Barnes's method of cooperative updates to changes of routing nodes: a record containing all the details of an update is copied into each node to be modified, so that any thread that encounters it can help complete the update before proceeding (Chatterjee, 2014).

Another approach uses a relaxed-balance AVL tree that manages updates at the level of individual nodes, avoiding the need to lock large sections of the tree during removal and replacement. Rebalancing adjustments are made after updates, and experiments show that the approach leads to only a negligible increase in the number of nodes (Ellen, 2014).

Pugh introduced skip lists as an alternative to balanced trees. The bottom level of a skip list is an ordinary sorted linked list, and each node has some probability of also carrying links at additional levels (Natarajan, 2014). This structure gives an expected logarithmic search time, which is why skip lists are sometimes used in place of BSTs.
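For comparison, a minimal single-threaded skip list in C++ is sketched below. It follows Pugh's structure as described above; the class name, the maximum level of 16, the promotion probability of 1/2 and the fixed random seed are assumptions chosen for the sketch.

```cpp
#include <cassert>
#include <random>
#include <vector>

class SkipList {
    static constexpr int MaxLevel = 16;
    struct Node {
        int key;
        std::vector<Node*> next;  // next[i] = successor at level i
        Node(int k, int levels) : key(k), next(levels, nullptr) {}
    };
    Node head_{0, MaxLevel};      // sentinel; its key is never compared
    int levels_ = 1;              // number of levels currently in use
    std::mt19937 rng_{12345};

    int randomLevel() {
        int lvl = 1;
        while (lvl < MaxLevel && (rng_() & 1)) ++lvl;  // promote with probability 1/2
        return lvl;
    }

public:
    SkipList() = default;
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;
    ~SkipList() {
        for (Node* n = head_.next[0]; n != nullptr;) {  // level 0 links every node
            Node* following = n->next[0];
            delete n;
            n = following;
        }
    }

    bool contains(int k) const {
        const Node* x = &head_;
        for (int i = levels_ - 1; i >= 0; --i)          // descend level by level
            while (x->next[i] && x->next[i]->key < k) x = x->next[i];
        x = x->next[0];
        return x != nullptr && x->key == k;
    }

    bool add(int k) {
        Node* update[MaxLevel];                         // last node before k per level
        Node* x = &head_;
        for (int i = levels_ - 1; i >= 0; --i) {
            while (x->next[i] && x->next[i]->key < k) x = x->next[i];
            update[i] = x;
        }
        if (x->next[0] && x->next[0]->key == k) return false;  // already present
        int lvl = randomLevel();
        for (int i = levels_; i < lvl; ++i) update[i] = &head_;
        if (lvl > levels_) levels_ = lvl;
        Node* n = new Node(k, lvl);
        for (int i = 0; i < lvl; ++i) {                 // splice in on each level
            n->next[i] = update[i]->next[i];
            update[i]->next[i] = n;
        }
        return true;
    }
};
```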

Conclusion

This report is based on the findings and proposals of the reviewed article, which presents a non-blocking internal binary search tree and gives an informal proof that the data structure is linearisable. Replacing the key of an internal node does not block access to any nodes other than those involved in the removal or replacement. Experimental results show good concurrent performance compared with other ordered-set implementations, and the proposed algorithm is highly memory efficient for large data structures compared with the other algorithms considered. In comparisons with a lock-free external BST, AVL trees and a lock-free skip list, which proved less competitive, the proposed algorithm performed best in most scenarios, providing 65% more throughput than the skip list on average. The experiments used randomly generated keys (Braginsky, 2012), which tend to keep the tree reasonably balanced. In the worst case, when keys are inserted in ascending order, the tree degenerates into a linked list, and balanced structures can be expected to perform better. The article also presents an efficient lock-free memory management scheme for the tree, based on single-word CAS, that requires no extra memory per node.
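The worst case mentioned above is easy to demonstrate. The hypothetical C++ functions below build an unbalanced BST from a given insertion order and report its height; ascending keys produce a tree shaped like a linked list, whereas a random permutation of the same keys produces a much shallower tree.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

struct TreeNode {
    int key;
    std::unique_ptr<TreeNode> left, right;
    explicit TreeNode(int k) : key(k) {}
};

// Inserts the keys in the given order into an unbalanced BST and returns its height.
int heightAfterInserting(const std::vector<int>& keys) {
    std::unique_ptr<TreeNode> root;
    int height = 0;
    for (int k : keys) {
        std::unique_ptr<TreeNode>* slot = &root;
        int depth = 1;
        while (*slot) {                       // walk down to an empty slot
            slot = k < (*slot)->key ? &(*slot)->left : &(*slot)->right;
            ++depth;
        }
        *slot = std::make_unique<TreeNode>(k);
        height = std::max(height, depth);
    }
    return height;
}

// Returns {height for ascending insertion, height for shuffled insertion} of 0..n-1.
std::pair<int, int> compareOrders(int n, unsigned seed) {
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 0);
    int sortedHeight = heightAfterInserting(keys);
    std::shuffle(keys.begin(), keys.end(), std::mt19937{seed});
    return {sortedHeight, heightAfterInserting(keys)};
}
```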

Braginsky, A., & Petrank, E. (2012). In Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures.

Brown, T. E. (2014). A general technique for non-blocking trees (Vol. 49).

Cao, H. G. (2016). MagicDetector: A precise and scalable static deadlock detector for C/C++ programs. Arabian Journal for Science and Engineering, 5149-5167.

Chatterjee, B. N. (2014). Efficient lock-free binary search trees. In Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing (pp. 322-331).

Crain, T. G. (2013). A contention-friendly binary search tree (pp. 229-240).

David, T. G. (2015). Asynchronized concurrency: The secret to scaling concurrent search data structures (Vol. 43).

Drachsler, D. V. (2014). Practical concurrent binary search trees via logical ordering (Vol. 49).

Ellen, F. F. (2014). The amortized complexity of non-blocking binary search trees.

Howley, S. V. (2012). In Proceedings of the Twenty-Fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures.

Liu, W. (2012). Research on cloud computing security problem and strategy. In 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet) (pp. 1216-1219).

Lowe, G. (2016). Concurrent depth-first search algorithms based on Tarjan's algorithm. International Journal on Software Tools for Technology Transfer, 129-147.

Mariano, A. B. (2015). Parallel (probable) lock-free hash sieve: A practical sieving algorithm for the SVP. In Parallel Processing.

Miller, G. L. (2015). Improved parallel algorithms for spanners and hopsets. In Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures (pp. 192-201).

Natarajan, A., & Mittal, N. (2014). Fast concurrent lock-free binary search trees (Vol. 49).

Park, S. V. (2012). A unified approach for localizing non-deadlock concurrency bugs. In 2012 IEEE Fifth International Conference on Software Testing, Verification and Validation (ICST) (pp. 51-60).

Ramachandran, A., & Mittal, N. (2015, January). A fast lock-free internal binary search tree. In Proceedings of the 2015 International Conference on Distributed Computing and Networking (p. 37). ACM.

Silver, D. H. (2016). Mastering the game of Go with deep neural networks and tree search. Nature.

Timnat, S., & Petrank, E. (2014, February). A practical wait-free simulation for lock-free data structures. In ACM SIGPLAN Notices (Vol. 49, No. 8, pp. 357-368). ACM.

Zhang, D., & Dechev, D. (2016). A lock-free priority queue design based on multi-dimensional linked lists. IEEE Transactions on Parallel and Distributed Systems, 27(3), 613-626.

Cite This Work

My Assignment Help. (2018). Concurrent Search Trees: An Algorithm For Parallel Binary Search Trees. Retrieved from https://myassignmenthelp.com/free-samples/lock-free-binary-search-algorithms.