Estimating the correctness of award values

Colin Maudry

Apr 25, 2026, 5:47:02 AM
to standard...@open-contracting.org

Hello everyone,

I have been working for a while on French open contracting data (not OCDS...) to get it aggregated, cleaned, enriched and, most importantly, actionable. The details of the project are here: https://decp.info/a-propos.

One of the main issues I face is anomalous award values. Since the field is defined as the "fixed or estimated maximum value", some buyers enter 10 billion euros just to be sure "it will fit". I'm therefore looking for a methodology:

- to detect aberrant award values
- to flag them (a true/false "aberrant" flag? levels of correctness such as suspect/aberrant?)
- to suggest a reasonable value that enables aggregations (sums, averages, medians), or would that mislead users?
- to explain why the value seems aberrant (e.g. the value is 7x higher than the median value of awards with the same CPV, city population, winner size and duration)
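As a rough illustration of the last point, here is a minimal Python sketch. It assumes award records are plain dicts with an `id`, a `value` and peer-group fields. These field names are hypothetical, not the actual schema_base.json names. It also assumes continuous variables such as city population or duration have already been binned into bands. Each award is compared with the median value of its peers, and a human-readable explanation is produced when the ratio reaches an arbitrary factor (5 here).

```python
from collections import defaultdict
from statistics import median


def explain_outliers(awards, fields=("cpv",), factor=5.0, min_peers=5):
    """Map award id -> explanation for awards worth >= `factor` times their peers' median.

    `fields` defines the peer group, e.g. ("cpv", "population_band", "winner_size",
    "duration_band"); all of these field names are hypothetical.
    """
    groups = defaultdict(list)
    for award in awards:
        groups[tuple(award[f] for f in fields)].append(award)

    explanations = {}
    for award in awards:
        # Peers: awards in the same group, excluding the award itself.
        peers = [p["value"] for p in groups[tuple(award[f] for f in fields)] if p is not award]
        if len(peers) < min_peers:
            continue  # too few comparable awards for a meaningful median
        m = median(peers)
        if m > 0 and award["value"] / m >= factor:
            ratio = award["value"] / m
            explanations[award["id"]] = (
                f"value is {ratio:.0f}x higher than the median ({m:,.0f} EUR) "
                f"of {len(peers)} awards with the same {', '.join(fields)}"
            )
    return explanations
```

For example, take five awards around 100,000 EUR and one at 10 billion EUR, all with the same CPV code. Only the last one is returned, with the explanation "value is 100000x higher than the median (100,000 EUR) of 5 awards with the same cpv". The median is used rather than the mean so that the aberrant values themselves barely move the reference point.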

To give you an idea of the scale of the issue, I collect data on 1.6 million awards (= lots), and only ~750 are above 1 billion euros (some of which are perfectly legitimate). But I have seen 10¹² euros and even more!

The list of available fields is here: schema_base.json (English field names and definitions). It's essentially a full OCDS award. I can also get:

- the list of subcontractors and values
- city populations

Thanks!

Colin Maudry

Camila Salazar

3:48 AM
to Colin Maudry, standard...@open-contracting.org

Hi Colin! 


A couple of ideas that could be helpful:

  1. Before applying any statistical method, you can apply hard upper-bound rules to catch the most obvious errors (e.g. the 10¹² example). Choose a meaningful ceiling to compare against (for example, the official total value awarded over a period); any value that exceeds that ceiling, or comes very close to it (e.g. 70% of that total), should be flagged.

  2. You could apply the methodology of Flag R016 (Tender value is higher than average for this item category), which detects outlier values within an item category using the IQR (interquartile range). In practice, you calculate the distribution of award values grouped by CPV code and procurement method, and define outliers as values greater than the upper fence (conventionally Q3 + 1.5 × IQR). You can refine the "market" definition by grouping on the additional variables you mention, such as duration, city population and winner size. I would start with CPV and method and, depending on the results, test additional variables. Each group needs enough observations for the calculation to be meaningful. I would then add a variable to the dataset with this "outlier value" flag and explain the methodology to users.

  3. I would expect most of these outliers to occur in above-threshold procedures. However, for procedures that are not open and are subject to a lower threshold ("Procédure non négociée ouverte", "Procédure négociée restreinte", "Procédure non négociée restreinte"), a value above that threshold would be a red flag in itself.

  4. Regarding subcontractors, do you have the subcontracted percentage? If an award value is flagged, but we know that 20% was subcontracted and we have the values of those subcontracts, we can estimate the full award amount (total subcontracted value ÷ 0.20) and compare it with the declared value.

  5. You could also try a machine learning method to detect outliers, although its results may be harder to explain to users.

  6. Regarding replacing values to enable aggregations, there is no "right value" to use, and I think doing so can be misleading. My advice would be to disclose the issue to users and flag the potentially problematic values, but not correct them. Users can then choose what to do with those observations depending on their analysis.
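The rule-based checks in points 1 to 4 could be combined roughly as follows. This is a minimal Python sketch, not the R016 reference implementation. Several details are assumptions to be tuned against the real data:

- the field names (`cpv`, `procedure`, `value`)
- the 1.5 fence multiplier
- the minimum group size
- the 70% ceiling share
- the caller-supplied per-procedure threshold mapping

In line with point 6, it only returns a flag and never alters the value.

```python
from collections import defaultdict
from statistics import quantiles

GROUP_KEYS = ("cpv", "procedure")  # hypothetical field names defining the "market"


def group_key(award, keys=GROUP_KEYS):
    return tuple(award[k] for k in keys)


def upper_fences(awards, keys=GROUP_KEYS, k=1.5, min_group=20):
    """Tukey upper fence Q3 + k * IQR for each group with enough observations."""
    groups = defaultdict(list)
    for award in awards:
        groups[group_key(award, keys)].append(award["value"])
    fences = {}
    for key, values in groups.items():
        if len(values) < min_group:
            continue  # too few awards to estimate quartiles meaningfully
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        fences[key] = q3 + k * (q3 - q1)
    return fences


def flag_award(award, fences, ceiling, thresholds=None, keys=GROUP_KEYS, ceiling_share=0.7):
    """Return 'aberrant', 'suspect' or None; the award value itself is never changed."""
    value = award["value"]
    # Point 1: hard upper bound relative to a reference total (e.g. total awarded over a period).
    if value >= ceiling_share * ceiling:
        return "aberrant"
    # Point 3: procedures with a value threshold (hypothetical mapping supplied by the caller).
    limit = (thresholds or {}).get(award["procedure"])
    if limit is not None and value > limit:
        return "suspect"
    # Point 2: statistical outlier within its CPV/procedure group.
    fence = fences.get(group_key(award, keys))
    if fence is not None and value > fence:
        return "suspect"
    return None


def implied_total(subcontract_values, subcontracted_share):
    """Point 4: estimate the full award amount from known subcontracts (20% -> 0.2)."""
    return sum(subcontract_values) / subcontracted_share
```

For instance, take a single group with the values 100, 110, 120, 130 and 10¹⁰, and call `upper_fences(..., min_group=5)`. The inclusive quartiles are 110 and 130, so the fence is 160. The 10¹⁰ award is therefore flagged as "suspect", unless it also reaches 70% of the ceiling, in which case it is "aberrant".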


Hope this is helpful, and please share the final output of the project!



Camila



--

Camila Salazar

Head of Data Analytics and Learning

@milamila07


www.open-contracting.org | follow us @opencontracting
