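It is common to have different data sources that share a character variable as a key (for example, city names), yet the values in these columns often fail to match exactly because of typos. In this post I will use the Levenshtein distance, the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, as a tool to pair strings from two different data sources.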
This post grew out of a request I received last week, when I was handed two datasets and a simple task: “Merge it!”
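Before going further, here is a minimal sketch of the pairing idea in Python. It is an illustration only, not necessarily how the rest of this post does it. It computes the Levenshtein distance with the standard dynamic-programming recurrence and, for each string in one source, picks the closest string in the other. The helper names `levenshtein` and `best_matches` and the city names are hypothetical examples, not real data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j];
    # initially the prefix of a is "", so the distance is just j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to "" is i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def best_matches(left, right):
    """Hypothetical helper: pair each string in `left` with the string in
    `right` that has the smallest Levenshtein distance to it.
    Ties go to the candidate that appears first in `right`."""
    pairs = {}
    for name in left:
        best = min(right, key=lambda candidate: levenshtein(name, candidate))
        pairs[name] = (best, levenshtein(name, best))
    return pairs


# Hypothetical example: keys from two sources, one with typos.
left = ["New Yrok", "Los Angeles", "Chicgo"]
right = ["New York", "Los Angeles", "Chicago", "Houston"]

for name, (match, dist) in best_matches(left, right).items():
    print(f"{name!r} -> {match!r} (distance {dist})")
# 'New Yrok' -> 'New York' (distance 2)
# 'Los Angeles' -> 'Los Angeles' (distance 0)
# 'Chicgo' -> 'Chicago' (distance 1)
```

Note that this sketch always returns the nearest candidate, even when no sensible match exists, and it treats upper and lower case as different characters. In practice you would likely normalize case and whitespace first and reject pairs whose distance exceeds some threshold.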
