Controlling the production of the Unified Health System (Sistema Único de Saúde, SUS) is a vitally important challenge for its managers, and this control is central to the mission of the Department of Regulation, Evaluation and Control (DRAC) of the Secretariat of Health Care (SAS) of the Ministry of Health (MS). To improve its control tools, DRAC asked UFMG to develop a system that applies machine learning and data mining techniques to automatically detect deviations from historical series and other anomalies.

Main aim

The main aim is to detect anomalous production using statistical and/or machine learning methods. Here, production refers to the number of procedures performed by health service providers or to the amounts they charge. In addition, the strangeness of each establishment is expressed as a score between 0 and 1: the higher the score, the more anomalous (and possibly fraudulent) the establishment's production.
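The text does not say how the 0 to 1 score was computed, so the following is only a minimal sketch of one common approach, not the project's method: a robust z-score of an establishment's latest value against its own historical series (median and MAD), mapped into [0, 1] with `1 - exp(-|z| / k)`. The function names and the constant `k` are hypothetical, and Python is used purely for illustration (the project itself was written in R).

```python
import math
from statistics import median


def robust_z(history, value):
    """Robust z-score of `value` relative to `history`, using median and MAD."""
    med = median(history)
    mad = median(abs(x - med) for x in history)
    # 1.4826 makes the MAD consistent with the standard deviation for normal data.
    scale = 1.4826 * mad
    if scale == 0:
        # Flat history: identical values are typical, anything else is extreme.
        return 0.0 if value == med else math.inf
    return (value - med) / scale


def anomaly_score(history, value, k=3.0):
    """Map |z| into [0, 1]: 0 means typical, values near 1 mean highly unusual.

    `k` (hypothetical) controls how quickly the score saturates.
    """
    z = abs(robust_z(history, value))
    if math.isinf(z):
        return 1.0
    return 1.0 - math.exp(-z / k)
```

For a stable history such as `[100, 98, 103, 101, 99, 102]`, a new value of 101 scores close to 0, while a value of 180 scores very close to 1. Median and MAD are used instead of mean and standard deviation so that a single past spike does not mask later anomalies.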

Another requirement was to produce customized reports for every possible combination of time window, hospital and procedure (approximately … possible reports). That is, for each available time window (3, 6, 12 and 24 months), the user can choose a procedure (for example, tomography) and assess the strangeness of a given establishment based on a set of charts and its respective score.
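To make the time-window idea concrete, here is a minimal sketch of how production for one procedure could be aggregated per establishment over the last 3, 6, 12 or 24 months. The record layout `((year, month), establishment, procedure, quantity)` and the function names are hypothetical assumptions. The real data lived in a database and was processed in R.

```python
from collections import defaultdict

WINDOWS = (3, 6, 12, 24)  # time windows offered in the reports, in months


def month_index(year, month):
    """Convert (year, month) to one integer so a window is a simple range."""
    return year * 12 + (month - 1)


def window_totals(records, procedure, end, months):
    """Sum quantities per establishment for `procedure` over the `months`
    months ending at (and including) `end`, given as (year, month).

    `records` is an iterable of ((year, month), establishment, procedure, qty).
    """
    if months not in WINDOWS:
        raise ValueError(f"window must be one of {WINDOWS}")
    hi = month_index(*end)
    lo = hi - months + 1
    totals = defaultdict(int)
    for (year, month), establishment, proc, qty in records:
        if proc == procedure and lo <= month_index(year, month) <= hi:
            totals[establishment] += qty
    return dict(totals)
```

Encoding each month as a single integer keeps window boundaries correct across year changes (for example, a 6-month window ending in February 2014 starts in September 2013).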

Because of the large number of report combinations, generating every document in advance is impractical. Instead, routines were created in which the user specifies the desired settings and the algorithm generates the corresponding report automatically. Since reports are requested online, computationally efficient functions were needed to reduce the user's waiting time.
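The on-demand strategy can be sketched as follows: a report is built only when a combination is first requested, and repeated requests for the same settings are served from an in-memory cache. Whether the original R system cached results is not stated; the caching, the `MONTHLY` toy data, the `Report` fields and `build_report` are all hypothetical assumptions for illustration.

```python
from functools import lru_cache
from typing import NamedTuple

# Hypothetical toy data: monthly quantities per (establishment, procedure).
MONTHLY = {
    ("H1", "tomography"): [12, 15, 11, 14, 13, 40],
    ("H2", "tomography"): [8, 9, 7, 10, 9, 8],
}


class Report(NamedTuple):
    """Immutable report summary, so cached results cannot be altered by callers."""
    establishment: str
    procedure: str
    window: int
    total: int
    monthly_mean: float


@lru_cache(maxsize=1024)
def build_report(window, establishment, procedure):
    """Build the report for one combination of settings on first request.

    Later requests with the same (window, establishment, procedure)
    return the cached Report without recomputing it.
    """
    series = MONTHLY[(establishment, procedure)][-window:]
    total = sum(series)
    return Report(establishment, procedure, window, total, total / len(series))
```

Calling `build_report(3, "H1", "tomography")` twice computes the report once; `build_report.cache_info()` then shows one miss and one hit. Only combinations that users actually request consume computation or memory, which is the point of generating reports on demand.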

Below are images of some reports created with R packages.


Several technologies were used to handle the approximately 2 TB of data: management tools, databases and business intelligence (BI) tools. All data analysis and presentation of results were implemented in R. The packages used most during the project were:

  • stringr: String manipulation
  • dplyr: Data handling
  • data.table: Data handling
  • ggplot2: Graphics
  • Cairo: Report generation

My tasks

  • Supporting anomaly detection algorithms
  • Development of new methods for anomaly detection
  • Report maintenance
  • Development of other graphical visualizations required by the project


From June 2013 to June 2015.