Discuss the BCO5501 Business Process Engineering System.

Drift Detection Methods

The purpose of this paper is to define how business process drift can be identified and detected accurately. The review presents the different ways in which a business process is prone to change and the techniques for identifying those changes. A worker might, for example, begin carrying out a transaction differently in order to adapt to changes in workload, timing, or the rules and regulations governing the business. With early detection based on the event log, commonly referred to as business process drift detection, strategists are able to act on these changes (Bevilacqua, Ciarapica & Giacchetta, 2009). Existing drift detection methods are mostly based on the idea of extracting patterns from traces. Where a business process behaves differently in two periods of time, analysts can reach a suitable level of accuracy by tracing the behavior and frequency of occurrence of a particular pattern, and then adjust the process accordingly.

The review looks at several important aspects of business process drift: the detection methods, the disadvantages of those methods, and their relevance. It also draws on related articles that are relevant to this topic. The review is structured as follows: a brief description of business processes, the different methods of analyzing and synthesizing these processes, their relevance, and how to perform a statistical analysis (Harmon & Trends, 2010).

The main aim of business process drift detection is to formulate techniques that help identify a drift in a business process. It involves different methodologies that are able to reveal a drift before or after it has occurred. A drift may be caused by the competitive environment, by supply and demand, or by the regulatory environment. As stated before, existing methods are based mainly on extracting patterns from traces; for example, one event S occurs more frequently than another event R, or R occurs more than once in a pattern (Houy, Fettke & Loos, 2010).

The reviewed article proposes an automated and scalable technique for detecting drift in business processes. Drift detection rests mainly on the following aspects. First, it identifies the point at which there is a significant difference between the processes observed before and after that point; in effect, it addresses the question of when two processes are the same, that is, when there is trace equivalence. Second, it presents a method for identifying drifts in a stream of runs: it monitors for any statistically significant change in the distribution of runs, which are split into a reference population and a detection population. It also covers the use of an adaptive window and, finally, an evaluation on synthetic logs (Röglinger, Pöppelbuß & Becker 2012).

Statistical Testing Over Runs

These drift detection methods take a mainly statistical viewpoint. They work by turning the event log into a stream of partially ordered runs. An event log is essentially a collection of traces, each of which captures a sequence of events. The approach relies on the following definitions:

a) Event log and trace: let B be an event log, C the collection of event occurrences, and µ: C → L a labelling function. A trace Φ ∈ B is expressed in terms of an order f ∈ [0, n-1] over its n events. Since a trace imposes a total order on its events, it is then related to a concurrent, partially ordered run (Hallerbach, Bauer & Reichert, 2010).

b) Alpha concurrency: a symmetric relation defined over the labels of event occurrences. As stated before, this concurrency relation is used to relax the total order of a trace into a partial order over its events.

However, this method is heavily criticized because it does not capture event concurrency.

Bose et al., in “Dealing with concept drifts in process mining” (2015), detect process drift using statistical testing over feature vectors. This makes the method insufficient for identifying specific drift types, such as the insertion of a conditional move.

Statistical testing over runs is used mainly to identify drifts in a stream of runs. The test is carried out on a pair of populations of the same size, built from the most recent runs. These runs are divided into a reference population and a detection population, which gives two adjacent windows called the reference window and the detection window. For every new run observed, both windows slide forward by one position to take in the new run, and a new statistical test is performed.
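The sliding-window procedure described above can be sketched as follows. This is only an illustrative sketch under several assumptions that do not come from the reviewed text: each run is represented by a hashable value (for example a string), the statistical test is a chi-square test of homogeneity, its p-value is approximated with the Wilson-Hilferty formula so that only the standard library is needed, and the names `chi_square_p_value`, `detect_drifts`, `window_size` and `alpha` are hypothetical.

```python
import math
from collections import Counter, deque


def chi_square_p_value(reference, detection):
    """Approximate p-value of a chi-square homogeneity test on two samples."""
    ref_counts, det_counts = Counter(reference), Counter(detection)
    categories = set(ref_counts) | set(det_counts)
    if len(categories) < 2:
        return 1.0  # both windows hold the same single run: no evidence of change
    n_ref, n_det = len(reference), len(detection)
    total = n_ref + n_det
    stat = 0.0
    for cat in categories:
        column_total = ref_counts[cat] + det_counts[cat]
        for observed, n in ((ref_counts[cat], n_ref), (det_counts[cat], n_det)):
            expected = n * column_total / total
            stat += (observed - expected) ** 2 / expected
    dof = len(categories) - 1
    # Wilson-Hilferty normal approximation of the chi-square tail (assumption:
    # an approximation is acceptable for illustration purposes).
    z = ((stat / dof) ** (1 / 3) - (1 - 2 / (9 * dof))) / math.sqrt(2 / (9 * dof))
    return 0.5 * math.erfc(z / math.sqrt(2))


def detect_drifts(runs, window_size=50, alpha=0.05):
    """Return the positions at which the two adjacent windows differ significantly."""
    drifts = []
    window = deque(maxlen=2 * window_size)  # reference half + detection half
    for index, run in enumerate(runs):
        window.append(run)  # slide both windows by one run
        if len(window) < 2 * window_size:
            continue
        items = list(window)
        p_value = chi_square_p_value(items[:window_size], items[window_size:])
        if p_value < alpha:
            drifts.append(index)
            window.clear()  # restart so that one drift is reported only once
    return drifts
```

Calling `detect_drifts` on a stream that switches from one dominant run to another should report a single position shortly after the switch. Clearing the window after a detection is a simplification chosen for this sketch, not a step taken from the reviewed method.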

The main disadvantage of the method from Accorsi (“Discovering workflow changes with time-based trace clustering.” In: Data-Driven Process Discovery and Analysis. Springer (2012) 154–168), in which drift detection is based on trace clustering, is that it depends heavily on the window size selected for the test. A small window causes false positives, while a large window causes false negatives, since drifts that occur inside the window go undetected (Weske 2012).

Evaluation on synthetic logs: a tool that can read a complete event log is built for use in the statistical test. To evaluate drift detection accurately, both recall and delay are measured, with the delay expressed as an average number of log traces; the detected drifts are then read from these results. The quality of the method is judged by how accurate and scalable it is under various configurations. The tool reads the traces one by one, dynamically updates the alpha-based concurrency relation for each pair of events, and transforms every trace into a partially ordered run, which produces a stream of runs (Van Der Aalst 2013).

Evaluation on Synthetic Logs

To assess accuracy, two measurements are used: the F-score, which is the harmonic mean of precision and recall, and the mean delay. The mean delay measures how late a drift is detected after it has occurred. The evaluation also examines how well the method scales with the number of traces in the log.
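The two measurements can be illustrated with a short sketch. The matching rule is an assumption made here for illustration: a detected drift counts as a true positive when it falls at most `max_delay` traces after an actual drift, and each detection is matched at most once. The function name and its parameters are hypothetical.

```python
def f_score_and_mean_delay(actual_drifts, detected_drifts, max_delay):
    """Return (F-score, mean delay in traces) for a list of detected drifts."""
    remaining = sorted(detected_drifts)
    delays = []
    for actual in sorted(actual_drifts):
        # First unmatched detection that follows the actual drift closely enough.
        match = next((d for d in remaining if 0 <= d - actual <= max_delay), None)
        if match is not None:
            remaining.remove(match)
            delays.append(match - actual)
    true_positives = len(delays)
    precision = true_positives / len(detected_drifts) if detected_drifts else 0.0
    recall = true_positives / len(actual_drifts) if actual_drifts else 0.0
    if precision + recall == 0:
        return 0.0, None
    f_score = 2 * precision * recall / (precision + recall)
    mean_delay = sum(delays) / true_positives
    return f_score, mean_delay
```

For example, with actual drifts at traces 1000, 2000 and 3000 and detections at 1040, 2030 and 2500, two detections are matched, so precision and recall are both 2/3 and the mean delay is 35 traces.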

Impact of window size on accuracy: in this experiment, fixed window sizes ranging from 30 to 160 traces, in increments of 30, are run against each of the 72 logs. The F-score is obtained for four log sizes (2500 to 10000 traces), and for each log size it is averaged over the 18 change patterns. The F-score rises as the window size increases until it reaches a plateau at a window size of 150, as shown in the line graphs below.

These results show that the more data points are included in the reference and detection windows, the more accurate the detection becomes, to the point where all concept drifts are detected with few or no false positives (Zur Muehlen & Recker, 2013).

An interesting feature of the findings is that, after an initially high mean delay, the mean delay grows very slowly as the window size increases. This suggests that the mean delay of the method is resilient to increases in window size, with a relatively low delay of around 40 traces when the window size is 50 or above.

The conclusion from these findings is that the accuracy obtained with traces is consistently slightly lower than the accuracy obtained with runs, which confirms the intuition behind using runs.

How accuracy is affected by the window size.

The effect of the adaptive window strategy on F-score and mean delay was evaluated by comparing, averaged over the three log sizes of 5000, 7500 and 10000 traces, the results obtained with a fixed window against those obtained with an adaptive window; for instance, a fixed window of 25 traces against an adaptive window initialized to 25 traces. The log of 2500 traces was not used, in order to avoid an interaction between the window size and the number of drifts in the log.

The results make it clear that the adaptive window outperforms the fixed window in terms of both F-score and mean delay. The ability to change the window size dynamically in response to the variation in the runs means that the reference and detection windows hold a number of runs that is neither too large nor too small to perform the statistical test.
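The idea of resizing the window according to variation can be sketched as below. The scaling rule is purely an assumption for illustration: the window grows or shrinks in proportion to the number of distinct runs currently observed, within hypothetical bounds `min_size` and `max_size`. It is not presented as the formula used in the reviewed method.

```python
def count_distinct_runs(window):
    """Variation is approximated here by the number of distinct runs in a window."""
    return len(set(window))


def adapt_window_size(current_size, previous_distinct, current_distinct,
                      min_size=10, max_size=500):
    """Scale the window size with the change in variation (hypothetical rule)."""
    if previous_distinct == 0:
        return current_size  # nothing to compare against yet
    scaled = round(current_size * current_distinct / previous_distinct)
    # Keep the window within bounds so that the statistical test stays feasible.
    return max(min_size, min(max_size, scaled))
```

Under this rule a window of 100 traces shrinks to 50 when the number of distinct runs halves, and grows to 200 when it doubles, which reflects the behavior described above: fewer runs are needed when variation is low.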

Impact of Window Size on Accuracy

In summary, the adaptive window leads to a lower mean delay, because the window size is reduced when the variation is low; in such cases a smaller number of runs is sufficient to carry out the statistical test. The main advantage is that the adaptive window technique overcomes the low accuracy obtained when the window size is 25 traces or fewer, with a mean count of 28. It also follows that the mean delay should be kept as low as possible in order to detect as many drifts as possible.

To verify accuracy further, the average F-score and mean delay for each of the 12 simple change patterns and 6 composite change patterns are checked. The window size is fixed at 100 traces, which gives the best trade-off in terms of F-score and mean delay, and the results are compared with the averages obtained from adaptive windows initialized with 100 traces on log sizes of 5000, 7500 and 10000 traces.

The method also shows a noticeably lower F-score, for both fixed and adaptive windows, on the frequency change pattern (fr). This pattern changes the frequency of certain event relations in the log. The low F-score is due to a low accuracy (Rao, Mansingh and Osei-Bryson 2012).

Data set generation

For data set generation, a data set of 18 logs is created, one for each change pattern, by combining the base log with each of the 18 modified logs in an interleaving manner. To simulate gradual changes, the base log is combined with a modified log in a different way. Sampling begins with traces from the base log only; as the number of traces increases, the probability of sampling from the modified log increases until only traces from the modified log are sampled. This is also known as probabilistic gradual drift.

To blend the behavior of two logs, a linear probability function with a slope of 0.2% is used. In other words, the probability of sampling from the base log starts at 1 and decreases by 0.002 each time another trace is sampled, reaching zero after 500 traces (the probability of sampling from the modified log increases correspondingly). Every gradual drift interval includes 25% of the behavior of each log, so the exact position of each drift is known (Trkman 2010).
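The blending of two logs by a linear probability function can be sketched as follows. The sketch assumes that each log is a non-empty list of traces and that traces are drawn at random with replacement; the function name `blend_logs` and the parameters `interval` and `rng` are hypothetical. With an interval of 500 traces the step per trace is 1/500, which corresponds to the slope of 0.2% mentioned above.

```python
import random


def blend_logs(base_log, modified_log, interval=500, rng=None):
    """Sample a gradual drift interval that moves from base_log to modified_log."""
    rng = rng or random.Random()
    step = 1.0 / interval  # 0.002 per trace when the interval is 500 traces
    blended = []
    for position in range(interval):
        # Probability of drawing from the base log falls linearly from 1 towards 0.
        p_base = max(0.0, 1.0 - step * position)
        source = base_log if rng.random() < p_base else modified_log
        blended.append(rng.choice(source))
    return blended
```

The first sampled trace always comes from the base log, since the probability starts at 1, and traces from the modified log become steadily more likely towards the end of the interval.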

To assess the accuracy of the gradual drift detection method, the F-score and the mean delay are defined in a slightly different way. To compute precision and recall for the F-score, a detected drift is considered a true positive if it includes the central point of the interval of the actual drift; otherwise it is considered a false positive. For example, if the actual drift occurs between trace numbers 751 and 1250, the central point is trace number 1000, so a detected gradual drift counts as a true positive only if its interval contains trace 1000 (McCormack and Johnson, 2016).
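The central-point rule can be written as a small check. The sketch assumes that drift intervals are given as inclusive pairs of trace numbers and that the central point is obtained by integer division; the function name is hypothetical.

```python
def is_true_positive(detected_interval, actual_interval):
    """True if the detected interval contains the central point of the actual one."""
    actual_start, actual_end = actual_interval
    center = (actual_start + actual_end) // 2  # 1000 for the interval 751..1250
    detected_start, detected_end = detected_interval
    return detected_start <= center <= detected_end
```

With an actual drift between traces 751 and 1250, a detected interval of 900 to 1100 is a true positive, whereas a detected interval of 1001 to 1300 is a false positive because it misses trace 1000.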

Data Set Generation

Nonetheless, compared with sudden drift detection, the method achieves an F-score below 0.7 and a relatively higher mean delay for three patterns. In particular, the lowest accuracy is obtained with the fr pattern, which is in line with the results for sudden drift detection (Scheer 2012).


To conclude, an automated technique for detecting sudden and gradual drift in business processes using traces has been discussed and found to be effective. The evaluation over logs further shows that the method accurately detects typical process changes (Neiger, Rotaru and Churilov, 2009). The gradual drift detection technique relies primarily on the assumption that a gradual drift is delimited by two consecutive sudden drifts, such that the distribution of runs between them is a linear mixture of the distributions of runs on either side of the two drifts. The accuracy achieved indicates that this assumption generally holds. Developing a better and more modern method is an avenue for future research and analysis (Papageorgiou 2009).

It also holds that the accuracy for any data trace should be evaluated in order to detect drift in that trace. The different methods outlined in the review provide the relevant techniques for determining drift in business processes. This analysis is important in business process management because it allows one to predict changes in a business process and to determine the legitimacy and outcome of the business process environment. Such changes could be in terms of supply, demand, or the regulatory environment.


