[cfp] SynDAiTE II@ECML - Workshop on Synthetic Data for AI Trustworthiness and Evolution (SynDAiTE 2026)

MARCO Piangerelli

Apr 10, 2026, 3:03:20 AM

to AIxIA mailing list

***************************************************

Apologies for multiple posting.

Please distribute this call to interested parties.

****************************************************

-----------------------------------------------------

Call For Papers

-----------------------------------------------------

Workshop on Synthetic Data for AI Trustworthiness and Evolution (SynDAiTE 2026)

to be held as part of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2026)

Workshop adjunct proceedings published by Springer-Verlag.

Date: September 11th, 2026 - Napoli (Italy)

Web: TO BE DEFINED

-----------------------------------------------------

Important Dates

-----------------------------------------------------

Paper Submissions: May 31st, 2026

Notifications: June 6th, 2026

Camera-Ready Contributions: July 10th, 2026

Workshop: September 11th, 2026

All deadlines are 11:59 pm Pacific Time

------------------------------------------------------

Workshop Aims and Scope

------------------------------------------------------

The rapid advancement of artificial intelligence (AI) relies heavily on access to large, diverse, and high-quality datasets for training and evaluation. However, the increasing scarcity of data, strict privacy regulations, and the high costs associated with collection and annotation are creating significant barriers to progress. Projections suggest that by 2050, we may face a shortage of fresh text data, and by 2060, image data may become similarly limited. These challenges make it imperative to explore alternatives that can sustain AI’s growth and effectiveness. Synthetic data offers a compelling solution to these issues, with the advantages of scalability, customisation, and inherent anonymisation. It allows for the generation of large volumes of tailored datasets without the same privacy and cost concerns as real data.

--------------------------------------------------------

Workshop Keywords

--------------------------------------------------------

Synthetic data · Trustworthiness · Explainability · Data drift · Red Teaming · Evolving Systems

-------------------------------------------------------

Workshop Topics

-------------------------------------------------------

SynDAiTE welcomes contributions on the use of synthetic data on all topics below, independent of the application domain (e.g., finance, health, business, basic sciences, construction, computational advertising, IoT, etc.) and of data types (e.g., networks, graphs, logs, spatiotemporal, multimedia, time series, genomic sequences, and streaming data):

Synthetic Data Generation:
- Techniques for high-fidelity, domain-specific synthetic data generation.
- Customisation for anomaly and rare-event detection.
- Scalability and adaptability to various applications.
- Ethical aspects of synthetic data.

Responsible AI:
- Meta-learning for understanding models and algorithms.
- Privacy-preserving ML.
- Methodologies to support evaluation according to Responsible AI pillars.

AI Auditing and Red Teaming:
- Stress testing models and algorithms.
- Interpretability and explainability for auditing ML systems.
- Adversarial ML.

Challenges in Synthetic Data Use:
- Fidelity and accuracy concerns in real-world applications.
- Bias detection and mitigation strategies.
- Validation frameworks to ensure reliability and generalisation.

Dynamic and Temporal Contexts:
- Generating data for streaming environments and micro-batch processing.
- Incorporating temporal complexity and drift phenomena.

Learning Frameworks:
- Online continual learning with synthetic datasets.
- Data stream mining with synthetic drifts and anomalies.
- Supervised and unsupervised learning with synthetic augmentation.
- Evaluating synthetic data’s impact on model performance and robustness.

Anomaly, Novelty and Drift Detection:
- Leveraging synthetic data for rare-event detection in evolving datasets.
- Mitigating challenges associated with concept drift and changing data distributions.
- Using synthetic data to test and refine anomaly detection frameworks.
- Simulating anomalies in stream/dynamic environments.
- Applications of Data Synthesis for AI training (Healthcare, Finance, Transportation, Cybersecurity).

Explainable Synthetic Data Generation:
- Explainable synthesis methods, including intrinsically interpretable generators (e.g., structured/constrained models, probabilistic graphical models) and explainability-enhanced approaches.
- Frameworks and tools for tracing, auditing, and evaluating explainability in synthetic data.

LLM- and Agent-Based Synthetic Data Generation:
- Synthetic data generation using large language models and foundation models.
- Agent-based approaches for synthetic data generation.
- Hybrid methods combining LLMs with probabilistic, causal, or structured models.
- Evaluation of reliability, bias, controllability, and explainability in LLM-generated synthetic data.

-------------------------------------------------------

Submission and Publication

-------------------------------------------------------
Papers must be written in English and formatted in LaTeX, following the outline of our author kit: https://ecmlpkdd-storage.s3.eu-central-1.amazonaws.com/2025/ECML_PKDD_2025_Author_Kit.zip. The kit includes a README document, a LaTeX file template containing author instructions, and style files. The maximum length of papers is 16 pages (including references), except for short papers, where the limit is 10 pages in this format. The program chairs reserve the right to reject any over-length papers without review. Papers that “cheat” the page limit, for example by using smaller than specified margins or font sizes, will also be treated as over-length. Note that negative spaces, for example, are also not allowed by the formatting guidelines; further details can be found in the author kit.

Up to 10 MB of additional materials (e.g., proofs, audio, images, video, data, or source code) can be uploaded with your submission. If there is an appendix, ensure it is submitted separately from your paper, which, combined with the main matter, must adhere to the page limit. The reviewers and the program committee reserve the right to judge the paper solely on the basis of the 16 pages; consulting any additional material is at their discretion and not required.

The submission must also be anonymized; authors must omit their names and affiliations from submissions and avoid obvious identifying statements (e.g., citations to the author’s own prior work should be made in the third person). Finally, the submission must not be under review at any other publication venue. Failure to adhere to these policies will result in desk rejection.

We strongly recommend using the template above and providing paper code and data in an Anonymous GitHub repository https://anonymous.4open.science/.

We welcome four types of submissions, each with specific expectations regarding length and content. Reviewers will assess whether the size is appropriate for the type of contribution. Accepted papers may be selected for oral or poster presentations; short papers will be presented as posters.

- Full papers (up to 16 pages) should be clearly placed in the context of the state of the art and state the proposal's contribution in the application domain, even if presenting preliminary results. In particular, research papers should describe the methodology in detail, experiments should be repeatable, and comparisons with existing approaches in the literature should be made.
- Reproducibility/Replicability papers (up to 16 pages) should either repeat prior experiments using the original source code and datasets to show how, why, and when the methods work or not (replicability papers), or repeat prior experiments, preferably using the original source code, in new contexts (e.g., different domains and datasets, different evaluation and metrics) to generalize further and validate or not previous work (reproducibility papers).
- Case Study papers (up to 16 pages) should provide in-depth analyses of real-world applications in which synthetic data plays a crucial role in addressing challenges in anomaly detection, streaming environments, and machine learning adaptation. These papers should present practical implementations, industry experiences, or applied research findings, offering insights into the effectiveness, limitations, and lessons learned from deploying synthetic data-driven solutions in dynamic AI systems.
- Short or position papers (up to 10 pages) should introduce new perspectives on the workshop topics or summarize a group's experience in the field.

Generative AI Usage Policy. Generative AI models, including ChatGPT, BARD, LLaMA, or similar LLMs, do not meet the criteria for authorship of papers accepted for the workshop. If authors use an LLM in any part of the paper-writing process, they assume full responsibility for all content, including checking for plagiarism and the correctness of all text.

Originality and Concurrent Submissions. Papers submitted should report original work. Papers that are identical or substantially similar to papers that have been published or submitted elsewhere may not be submitted to ECML PKDD, and the organizers will reject such papers without review. Authors are also NOT allowed to submit or have submitted their papers elsewhere during the review period. Submitting unpublished technical reports available online (such as on arXiv), or papers presented in workshops without formal proceedings, is allowed, but, to preserve anonymity, such reports or presentations should not be cited.

Submissions that do not follow these guidelines or do not view or print properly will be desk-rejected.

Post-Proceedings. The accepted papers and the material generated during the meeting will be available on the workshop website. As per ECML-PKDD’s guidelines, the Workshops and Tutorials will be included in joint Post-Workshop proceedings published by Springer Communications in Computer and Information Science (indexed on Google Scholar, DBLP, and Scopus), in 1-2 volumes, organized by focused scope. Authors of accepted papers will have the option to opt in or out.

----------------------------------------------------------

Registration and Presentation Policy

----------------------------------------------------------

Each accepted paper must have at least one author registered for the full conference by the early registration deadline and must be presented at the workshop, even if the authors opt out of the post-proceedings. We expect the authors, the program committee, and the organizing committee to adhere to the ECML-PKDD Code of Conduct.

The Main Conference organization team will manage the registration: https://ecmlpkdd.org/2026/attending-registration/

---------------------------------------------------------

Workshop Chairs

---------------------------------------------------------

Dr. Marco Piangerelli

University of Camerino, Camerino (Italy) and Vici & C.

Email: marco.pi...@unicam.it

Dr. Bardh Prenkaj

Technical University of Munich (Germany)

Email: bardh....@tum.de

Ylenia Rotalinti

Brunel University London, London (UK) / Medicines and Healthcare Products Regulatory Agency (MHRA), London (UK)