This WP starts after both WP 1 and WP 2, because it requires some of the resources they produce. Specifically, it takes advantage of the dataset extracted during WP 1, and more particularly its associated ground truth, and it relies on the methods developed in WP 2. It is application-oriented, with three objectives. First, properly assess the performance of the proposed methods relative to our ground truth (Task 3.1). Second, integrate these methods into a mainstream tool that makes it convenient to visualize the results produced (Task 3.3). Third, analyze and discuss the results obtained with our tool, in order to identify which criteria could be used to improve real-life public procurement practice (also Task 3.3).

Leader: R. Figueiredo (LIA)

Task 3.1: Qualitative evaluation and analysis

In WP 2, we propose new methods for automatic fraud detection and quantitatively evaluate their performance against our ground truth. Task 3.1 directly follows this work: it consists in performing a qualitative evaluation of the results obtained. We will thoroughly examine these results from the perspective of Law and Economics. The WP 2 methods make it possible to determine red flags endogenously, to prioritize them, and to study their interdependence. This in turn makes it possible to propose a normative analytical grid highlighting the main factors that public authorities should identify and pay attention to. This study will complement the descriptive analysis of our dataset performed in WP 1. In particular, one of our goals here is to identify exactly which information, among the many data initially collected, is the most relevant for predicting fraud and corruption. We will also compare our results to the experts’ opinions collected during Task 1.3, in order to shed light on the experts’ priors. Finally, we will compare the characteristics of fraud and corruption in France with those observed in other countries.
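As an illustration of how a model-derived prioritization of red flags could be compared with the experts' priors, the sketch below computes a Spearman rank correlation between two rankings. It is only one possible way to quantify agreement and is not the method fixed by the project. The red-flag names and both rankings are hypothetical, and the formula used assumes complete rankings (1 to n) without ties.

```python
from typing import Dict


def spearman_rho(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Spearman rank correlation between two tie-free rankings of the same items.

    Each dictionary maps a red flag to its rank (1 = most important).
    The closed-form formula below is only valid when there are no ties.
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("both rankings must cover the same red flags")
    n = len(rank_a)
    if n < 2:
        raise ValueError("at least two red flags are needed")
    # Sum of squared rank differences, red flag by red flag.
    d2 = sum((rank_a[flag] - rank_b[flag]) ** 2 for flag in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical rankings, invented for illustration (not project data).
MODEL_RANKS = {
    "single_bidder": 1,
    "short_deadline": 2,
    "contract_amendments": 3,
    "no_publication": 4,
}
EXPERT_RANKS = {
    "single_bidder": 2,
    "short_deadline": 1,
    "contract_amendments": 3,
    "no_publication": 4,
}

if __name__ == "__main__":
    print(spearman_rho(MODEL_RANKS, EXPERT_RANKS))
```

A value close to 1 would suggest that the experts' priors and the endogenously determined red flags agree. A low or negative value would point to the discrepancies that the qualitative analysis should examine.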

Moreover, this qualitative evaluation is also part of an iterative cycle aimed at improving the classification methods proposed in WP 2. This application-based feedback will allow us to determine whether our results “make sense” from the perspective of Law and Economics, and will help us identify potential limitations in our methods. If limitations are found, we will go back to WP 2 to propose improvements that address them, then return to WP 3 to assess the modified methods and study the resulting changes. The process will be repeated until the results are satisfactory. This interpretation work will be carried out mainly by Datactivist and LBNC.

  • Deliverables: mid-term and final reports.
  • Success indicators: scientific publications.
  • Partners involved: LIA, LBNC, CRA, Datactivist.

Task 3.2: GDPR, open data and predictive automation

By the very nature of the issues addressed in DeCoMaP and of the data involved, the project is subject to conflicting requirements. On the one hand, the ground truth dataset of corrupt French procurement has its own scientific interest and will therefore be made freely available to the scientific community. On the other hand, the European General Data Protection Regulation (GDPR) calls for caution in the use of sensitive data (in particular personal and judicial data). Thus, this task will consist in studying the performance of the detection tool when the data processed are deliberately limited. This limitation is addressed in two ways, bearing in mind that data that can be used for scientific research purposes are not the same as data that can be used for scientific dissemination purposes. First, we will conduct a field experiment to compare how individuals with preconceived notions about fraud in public procurement judge whether or not a contract is fraudulent when assisted by the non-degraded tool, which uses all available data, with the judgment of individuals assisted by a deliberately restricted tool and informed of its restrictions. Second, we will identify and prioritize the data essential to the identification of fraudulent cases, in order to provide normative recommendations on open data formats.
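The effect of deliberately limiting the data can be sketched as a simple ablation: the same detector is scored once with all features and once without the sensitive ones. The example below is a minimal illustration under strong assumptions. The feature names, the toy contracts, and the threshold rule standing in for the detection tool are all hypothetical and do not represent the WP 2 methods.

```python
from typing import Dict, List, Sequence

# A contract is described by binary red-flag features plus a ground-truth label.
Contract = Dict[str, int]

# Hypothetical toy data, invented for illustration (not project data).
CONTRACTS: List[Contract] = [
    {"single_bidder": 1, "short_deadline": 1, "prior_conviction": 0, "fraud": 1},
    {"single_bidder": 1, "short_deadline": 0, "prior_conviction": 1, "fraud": 1},
    {"single_bidder": 0, "short_deadline": 1, "prior_conviction": 1, "fraud": 1},
    {"single_bidder": 1, "short_deadline": 0, "prior_conviction": 0, "fraud": 0},
    {"single_bidder": 0, "short_deadline": 0, "prior_conviction": 0, "fraud": 0},
    {"single_bidder": 0, "short_deadline": 1, "prior_conviction": 0, "fraud": 0},
]

# Full feature set versus a set with the (assumed) sensitive judicial item removed.
FULL = ["single_bidder", "short_deadline", "prior_conviction"]
RESTRICTED = ["single_bidder", "short_deadline"]


def predict(contract: Contract, features: Sequence[str], threshold: int = 2) -> int:
    """Stand-in detector: flag a contract when enough selected red flags are raised."""
    return int(sum(contract[f] for f in features) >= threshold)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score of the positive (fraud) class; 0.0 when there is no true positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(contracts: Sequence[Contract], features: Sequence[str]) -> float:
    """Score the stand-in detector on the given contracts using only `features`."""
    y_true = [c["fraud"] for c in contracts]
    y_pred = [predict(c, features) for c in contracts]
    return f1_score(y_true, y_pred)


if __name__ == "__main__":
    print("full:", evaluate(CONTRACTS, FULL))
    print("restricted:", evaluate(CONTRACTS, RESTRICTED))
```

Repeating the comparison while removing one feature group at a time gives a rough ordering of which data matter most for detection, which is the kind of evidence the recommendations on open data formats would rely on.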

  • Deliverables: final reports, prototype of the classifier.
  • Success indicators: scientific publications.
  • Partners involved: LIA, LBNC, CRA, Datactivist.

Task 3.3: Visualization and good practices

This task can be considered the final step of the whole project, as it builds on all the work performed before. We will develop dedicated software including the most relevant methods, as evaluated during Tasks 3.1 and 3.2. It will include a fully graphical interface and will be designed for a mainstream audience. It will be published under an open source license and will be freely available to the public. Our goal is also to provide a tool to all the professionals concerned with the transparency of public procurement: journalists, academics (especially from Economics, Law and Political Science), investigators, and NGOs. In particular, the tool will be used by Transparency International France. This part of the work will be conducted by the engineer to be recruited by the LBNC, in collaboration with Transparency International France.

In addition to the software, we will provide two guides. The first will present best practices for publishing data, covering both legal procedures and recommended standards. The second will be a guide to purchasing best practices for public buyers and regulators. It will summarize the methods for fraud detection and detail the purchasing procedures that are less susceptible to corruption. It also aims to provide valuable insight into how reliably the data reflect the level of transparency.

  • Deliverables: final reports, complete software with GUI.
  • Success indicators: scientific publications, software performance.
  • Partners involved: LIA, LBNC, CRA, Datactivist.