One work package is dedicated to the management of the project itself.
The first work package (WP 1) will be mainly devoted to the construction and descriptive analysis of our database, denoted DB in the rest of this document. It will be built by combining two main sources: 1) the BOAMP (Open Data of French Procurement), which will provide us with a description of public procurement tenders; and 2) Legifrance (Open Data of Court Decisions), which will be used to identify the cases of corruption among them. Cross-referencing these two sources will allow us to constitute our ground truth. We will also leverage secondary sources, in particular newspaper reports by Transparency International France (TIF), a civil society organization leading the fight against corruption that supports DeCoMaP (cf. Appendix).
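To make the cross-referencing step concrete, it can be pictured as a join between tender records and court decisions. The following is only a minimal sketch under strong assumptions: the field names (`tender_id`, `buyer`, `supplier`), the sample records, and the idea that a court decision can be linked to a tender through a shared identifier are all hypothetical. In practice, linking BOAMP notices to Legifrance decisions will probably require matching on agent names and dates rather than on a common key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tender:
    # Hypothetical minimal description of a tender notice.
    tender_id: str
    buyer: str
    supplier: str


def label_tenders(tenders, convicted_ids):
    """Map each tender id to True if a court decision refers to it.

    A False label only means that no decision was found, not that
    the tender is known to be clean.
    """
    convicted = set(convicted_ids)
    return {t.tender_id: t.tender_id in convicted for t in tenders}


# Illustrative records (invented for this sketch).
tenders = [
    Tender("T1", "city_a", "firm_x"),
    Tender("T2", "city_a", "firm_y"),
    Tender("T3", "city_b", "firm_x"),
]
labels = label_tenders(tenders, ["T2"])
```

The resulting `labels` dictionary plays the role of the ground truth in this toy setting.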
WP 1 will provide us with a single, integrated DB containing all the relevant information extracted from the above sources. It will contain attributes describing both types of economic agents (suppliers and buyers) as well as tenders. Inside this DB, we distinguish two types of information, which we will later leverage in different ways. The first is individual information: all attributes describing an agent or tender independently of the rest of the system. The second is relational information: attributes connecting agents or tenders (e.g. the identifier of the winning supplier).
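The distinction between individual and relational information can be illustrated on a single record. In this sketch, the record layout and the choice of `buyer_id` and `winner_id` as relational fields are assumptions made for the example, not the actual DB schema.

```python
# Hypothetical set of fields that connect a tender to agents.
RELATIONAL_FIELDS = frozenset({"buyer_id", "winner_id"})


def split_record(record, relational_fields=RELATIONAL_FIELDS):
    """Split a flat record into (individual, relational) attribute dicts."""
    individual = {k: v for k, v in record.items() if k not in relational_fields}
    relational = {k: v for k, v in record.items() if k in relational_fields}
    return individual, relational


# Invented tender record, for illustration only.
tender = {
    "tender_id": "T1",
    "procedure": "open",
    "amount_eur": 120000,
    "buyer_id": "B7",
    "winner_id": "S3",
}
individual, relational = split_record(tender)
```

Here `individual` holds what describes the tender on its own, while `relational` holds the links from which graphs can later be extracted.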
The second work package (WP 2) will deal with the deployment of existing automatic fraud detection methods, and with the development of new ones. Both will leverage the data obtained in WP 1, through supervised and unsupervised approaches. Among the existing methods, one can distinguish the regression-based ones already used for fraud detection in the literature from more recent machine learning methods, which have just started spreading in economics (notably in the closely related problem of collusion detection in public procurement by D. Imhof [40, 42–44]) and have not been used for fraud detection yet. We plan to use the regression-based methods as baselines when assessing the performance of the other methods. Both types of existing methods focus on individual information, i.e. features describing each economic agent or tender separately.
The main challenge will be the development of new graph-based methods, able to take into account the relational information contained in the DB. The first issue is to extract graphs modeling the relationships between suppliers and public buyers, and the second is to analyze them in order to identify fraudulent situations. In addition, we plan to explore a number of complex network features that will allow us to incorporate as much information as possible in our models, including the individual information used by existing classification approaches. This will allow us to leverage both types of information simultaneously to predict fraud. More precisely, we will combine signed and attributed graphs (cf. Section b2): to the best of our knowledge, this is new, and it therefore requires the development of specific methods, both for extraction and analysis.
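As a rough illustration of the extraction step, the sketch below builds a small buyer-supplier graph with node attributes and signed edge weights from a list of tenders. The sign convention (each won bid adds +1 to the edge, each lost bid adds -1), the record layout, and the function name `build_signed_graph` are assumptions made purely for this example; they are not the graph model of the project, which is described in Section b2.

```python
from collections import defaultdict


def build_signed_graph(tenders, node_attrs):
    """Return (nodes, edges) for a buyer-supplier graph.

    edges maps (buyer, supplier) to a signed weight: +1 per won bid,
    -1 per lost bid (hypothetical convention).
    nodes maps each node id to its individual attributes, if any.
    """
    edges = defaultdict(int)
    for t in tenders:
        for supplier in t["bidders"]:
            edges[(t["buyer"], supplier)] += 1 if supplier == t["winner"] else -1
    # Attach individual attributes to every node that appears in an edge.
    nodes = {n: dict(node_attrs.get(n, {})) for edge in edges for n in edge}
    return nodes, dict(edges)


# Invented tenders and attributes, for illustration only.
tenders = [
    {"buyer": "B1", "bidders": ["S1", "S2"], "winner": "S1"},
    {"buyer": "B1", "bidders": ["S1", "S3"], "winner": "S1"},
    {"buyer": "B2", "bidders": ["S2"], "winner": "S2"},
]
attrs = {"B1": {"type": "buyer"}, "S1": {"type": "supplier"}}
nodes, edges = build_signed_graph(tenders, attrs)
```

Keeping the attributes on the nodes and the signs on the edges lets a single structure carry both the individual and the relational information.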
Finally, in the third work package (WP 3), the consortium will calibrate the classification tools developed in WP 2, based on the ground truth dataset of corrupt procurement practices obtained at the end of WP 1. By applying these tools over the periods for which both market data and court decisions are known, we will be able to identify problematic situations and legally vulnerable environments (types of procedures, nature of clauses, characteristics of actors, data formats, etc.). A legal and economic analysis of the latter will make it possible to produce public policy recommendations regarding public procurement rules and data openness. The production of a functional data-visualization tool for the public at large and the publication of good practice guides will conclude the project.
The success of the project relies on its interdisciplinary approach. To achieve the objectives of the three WPs, we have identified and brought together three domains of expertise: Data Mining, Economics of Procurement and Administrative Law. In the description of the WPs, we precisely describe the various technical objectives we want to achieve and how we plan to develop synergy between the three fields of expertise to reach the general goals of the project.