If you compare it to the basic math formula for percentage, you will notice that Excel's percentage formula lacks the `*100` part. When calculating a percentage in Excel, you do not have to multiply the resulting fraction by 100, since Excel does this automatically when the Percentage format is applied to a cell.

And now, let's see how you can use the Excel percentage formula on real-life data. Suppose you have the number of "Ordered items" in column B and "Delivered items" in column C.

For your use case, at least as I understand it, it seems to be incorrect.

As I understand from your question and comment, you are trying to do anomaly detection: is the number of missed transactions within what could be considered "normal", or does it deviate so much that it should be considered anomalous? There is no clear-cut answer to that question. The best you can do is to calculate the probability:

Assuming a known probability of a transaction being a "miss", how probable is it to have the given number of misses in a month?

If the observed count is very improbable (say, probability below 0.05), you may consider it to be anomalous. So the question remains how to calculate this probability.

If your percentages were normally distributed, you could easily derive it from the mean and the standard deviation: values that are more than 2 SDs away from the mean appear with probability below 0.05. That's presumably the reason why you were asked to compute these values. However, your percentages are not normally distributed! As Richard Hardy pointed out in his comment, a value two SDs above the mean is already impossible to achieve, as it would be above 100%. You need to use a different, more appropriate probability distribution. Without further domain knowledge of your data, the best you can do is to use the binomial distribution:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

with $n$ being the number of transactions and $k$ the number of misses in the month in question. You can estimate $p$ from historical data as the ratio of the total number of misses to the total number of transactions in the past months. Having all this, you can calculate the cumulative probability of observing at least as many misses as you actually had in the month in question, $P(X \ge k) = \sum_{j=k}^{n} \binom{n}{j} p^j (1-p)^{n-j}$. If that probability is below some pre-defined level (for example the above-mentioned 0.05), you'd consider it an anomaly.

For completeness: if you want to be even more precise (which I doubt, considering that you were given a wrong task in the first place), you can get a confidence interval of $p$ by modelling it with the beta distribution, and use the extreme but still plausible $p$ in the above binomial distribution. The parameters of the beta distribution would be, e.g., $\alpha = $ (the number of misses in the historical data) and $\beta = $ (the total number of transactions $-$ the number of misses).
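Below is a minimal Python sketch of the procedure above. It uses only the standard library. The function names (`binom_log_pmf`, `binom_upper_tail`, `beta_quantile_int`, `check_month`) are hypothetical, and so is the choice of the upper 97.5% beta quantile as the "extreme but still plausible" $p$; neither comes from the original answer. The beta CDF uses the identity $I_x(a, b) = P(Y \ge a)$ with $Y \sim \mathrm{Binomial}(a+b-1, x)$, which holds only for integer $a, b \ge 1$. That matches the count-based parameters above.

```python
import math


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k): probability of at least k misses among n transactions."""
    if k <= 0:
        return 1.0
    total = sum(math.exp(binom_log_pmf(j, n, p)) for j in range(k, n + 1))
    return min(1.0, total)  # guard against rounding slightly above 1


def beta_quantile_int(q: float, a: int, b: int, tol: float = 1e-10) -> float:
    """Inverse CDF of Beta(a, b) for integer a, b >= 1, found by bisection.

    Uses the identity CDF_Beta(a, b)(x) = P(Y >= a) with Y ~ Binomial(a + b - 1, x),
    which is increasing in x, so bisection converges to the quantile.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_upper_tail(a, a + b - 1, mid) < q:
            lo = mid  # CDF still below q: the quantile lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def check_month(k, n, past_misses, past_totals, threshold=0.05, upper_q=None):
    """Return (p_used, p_value, is_anomaly) for k misses out of n transactions.

    past_misses / past_totals: hypothetical per-month historical counts.
    upper_q: if given (e.g. 0.975, an assumed choice), use that quantile of
    Beta(misses, total - misses) as a conservative p instead of the point estimate.
    """
    misses, total = sum(past_misses), sum(past_totals)
    if upper_q is None:
        p = misses / total  # point estimate: historical misses / historical transactions
    else:
        if misses < 1 or total - misses < 1:
            raise ValueError("beta parameters must both be at least 1")
        p = beta_quantile_int(upper_q, misses, total - misses)
    p_value = binom_upper_tail(k, n, p)  # P(at least k misses | p)
    return p, p_value, p_value < threshold


if __name__ == "__main__":
    # Hypothetical history: 10 misses out of 200 transactions over two months.
    print(check_month(20, 100, [5, 5], [100, 100]))
    print(check_month(20, 100, [5, 5], [100, 100], upper_q=0.975))
```

Taking the upper end of the interval for $p$ makes the test more conservative. A larger $p$ makes many misses more plausible, so fewer months get flagged. If SciPy is available, `scipy.stats.binom.sf(k - 1, n, p)` gives the same tail probability, and `scipy.stats.beta.ppf(q, a, b)` gives the beta quantile.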