Probabilistic actions to extract artists from a text

Author: Raimon Bosch

Abstract

One of the main problems of software technologies is that they often provide only one possible solution to a problem. This forces us to build human validation systems to cover the cases that are less general, those exceptions that a computer cannot detect. But what if we provided several solutions for the same problem? What if, at some point on the way to a solution, we could take several decisions at the same time? In this blog post we will see how to design a programming model that builds a probabilistic solution at execution time, so that a problem is decomposed into several solutions ordered from the most probable to the least probable.

Use case

Our use case is artist detection in content about concerts. Over the years we have developed a series of regular grammars that we apply to try to guess the artists in this content. But when we tried to make these rules very specific, we reached a point where accuracy could no longer improve. The system was limited to a set of very general rules that worked in most cases, and we could not develop rules to solve edge cases because they would ruin other cases that were already calculated correctly. See (1) and (2) to get an idea:

“Dizraeli & The Small Gods + Cate Ferris” (1)

“Andrew McMahon, Junior Prom & Hunter Hunted” (2)

In example (1) the correct solution would be (artist1: “Dizraeli & The Small Gods”; artist2: “Cate Ferris”). If we apply these same rules to example (2) we get (artist1: “Andrew McMahon”; artist2: “Junior Prom & Hunter Hunted”), but this is not the correct solution. The correct solution for (2) would be (artist1: “Andrew McMahon”; artist2: “Junior Prom”; artist3: “Hunter Hunted”).

With a linear approach, ‘&’ cannot always be a separator, because that creates more errors than matches. Furthermore, we need a human reader to check these ambiguities manually. So what we do to solve this ambiguity is create a probabilistic fork: when we find an ‘&’ we consider it a separator in 60% of cases, and not a separator in the other 40% of cases. With this method, examples (1) and (2) keep all possible solutions, treating ‘&’ as a separator and as part of the artist name at the same time. We still need a human reader to reach a final solution, because sometimes we will need to choose the less probable solution manually. But by accumulating information about the cases where a secondary option had to be picked, we can adapt these probabilities to features such as the kind of music, the venue where the band is playing, the source of information, or the city. At the same time, for the human reader it is always faster to pick between several solutions than to correct them with the keyboard.

Algorithm

For this algorithm (Fig. 1) we use a very simple data structure: an array where the key corresponds to a candidate solution and the value to the probability of that solution. We still want to keep probabilistic operations with 100% probability for things such as erasing multiple spaces or erasing venue names. When we create a transition with 100% probability, the data structure may change but not the number of elements in it, because we apply the same operation to all elements.

But when we apply an operation with probability 60%, we create one fork with probability 60% where the operation is applied and a second one with probability 40% where it is not. So the data structure multiplies its number of elements by 2. To start the algorithm we need to provide a correct initialization such as (3) or (4). With this starting point the algorithm has something to iterate through, and it will start to create forks when it needs to:

[“Dizraeli & The Small Gods + Cate Ferris” => 1.0] (3)

[“Andrew McMahon, Junior Prom & Hunter Hunted” => 1.0] (4)
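To make the fork mechanics concrete, here is a minimal Python sketch of a single probabilistic fork applied to initialization (4). The dict representation and the helper name fork_replace are illustrative assumptions, not part of the original pseudocode:

```python
# Candidate solutions map: key = candidate string, value = probability.
results = {"Andrew McMahon, Junior Prom & Hunter Hunted": 1.0}

def fork_replace(results, old, new, prob):
    """Apply a string replacement with the given probability.

    Each candidate that the replacement actually changes splits into
    an 'applied' branch (weight prob) and an 'unapplied' branch
    (weight 1 - prob); unchanged candidates keep their weight."""
    out = {}
    for text, p in results.items():
        replaced = text.replace(old, new)
        if replaced != text:
            out[replaced] = out.get(replaced, 0.0) + p * prob
            out[text] = out.get(text, 0.0) + p * (1 - prob)
        else:
            out[text] = out.get(text, 0.0) + p
    return out

# Treat ' & ' as a separator in 60% of cases: one candidate becomes two.
results = fork_replace(results, " & ", " + ", 0.6)
# 0.6: "Andrew McMahon, Junior Prom + Hunter Hunted"
# 0.4: "Andrew McMahon, Junior Prom & Hunter Hunted"
```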

function prob_action(action, prob, aResults, param1 = "", param2 = "")
    newResults = []
    foreach(aResults as result => result_prob)
        switch(action)
            case "regex_replace":
                result_new = regex_replace(param1, param2, result)
                result_fork = result
                break
            case "string_replace":
                result_new = string_replace(param1, param2, result)
                result_fork = result
                break
        endswitch
        isset_new = isset(result_new) && result_new != "" && result_new != result
        isset_fork = isset(result_fork) && result_fork != ""
        if(isset_new)
            // We erase this result to fork it into one or two options
            erase(aResults[result])
            if(isset_fork && prob < 1)
                newResults[result_fork] = (1 - prob) * result_prob
            endif
            newResults[result_new] = prob * result_prob
        endif
        erase(result_new)
        erase(result_fork)
    endforeach
    aResults = merge(aResults, newResults)
    sort(aResults)
endfunction

Figure 1. Algorithm to perform probabilistic actions

The second part of the algorithm is our own “linear” implementation, calling prob_action as many times as we need with our specific rules. As we said before, we have some operations with 100% probability and others below 100%, such as string_replace(&). You can use this approach to transform any linear algorithm into a probabilistic model just by replacing your calls to regex_replace with prob_action(“regex_replace”, prob, aResults, /REGEX/, replacement), or, if you want to use string_replace, with prob_action(“string_replace”, prob, aResults, replace, replacement). Obviously, if you perform all your operations with probability 100%, the model remains linear. The kinds of operations you can apply are endless: in our use case regex_replace and string_replace are enough, but you can define any other action that fits your use case.
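As a sketch, here is the pseudocode of Fig. 1 translated to Python, together with a possible linear pipeline for example (2). The specific rules and probabilities in the pipeline are illustrative assumptions, not the production rule set:

```python
import re

def prob_action(action, prob, results, param1="", param2=""):
    """Apply a replacement to every candidate; fork with weights (prob, 1 - prob)."""
    new_results = {}
    for text, p in list(results.items()):
        if action == "regex_replace":
            replaced = re.sub(param1, param2, text)
        elif action == "string_replace":
            replaced = text.replace(param1, param2)
        else:
            continue
        if replaced and replaced != text:
            del results[text]
            if prob < 1:
                # Keep the unmodified branch with the remaining weight.
                new_results[text] = new_results.get(text, 0.0) + (1 - prob) * p
            new_results[replaced] = new_results.get(replaced, 0.0) + prob * p
    results.update(new_results)
    # Return candidates ordered from most to least probable.
    return dict(sorted(results.items(), key=lambda kv: -kv[1]))

# Illustrative pipeline for example (2).
results = {"Andrew McMahon, Junior Prom & Hunter Hunted": 1.0}
results = prob_action("string_replace", 1.0, results, ", ", " + ")   # ',' always separates
results = prob_action("string_replace", 0.6, results, " & ", " + ")  # '&' separates in 60% of cases
# Split each candidate on the unified separator to list its artists:
solutions = [(candidate.split(" + "), p) for candidate, p in results.items()]
```

The most probable solution is (artist1: “Andrew McMahon”; artist2: “Junior Prom”; artist3: “Hunter Hunted”) with weight 0.6, while the 0.4 branch keeps “Junior Prom & Hunter Hunted” as a single artist; a human reader only has to pick between the two.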

Conclusion

We have reviewed a very short algorithm that helps you create a probabilistic model for a problem. With this kind of model we can offer several solutions for the same problem, making human validation steps quicker. At the same time, this kind of modeling opens the door to machine learning approaches that adapt the probabilistic weights of each operation to the internal features we might have.
