Curated Eiffel AI

Liberty Lover

Dec 16, 2025, 9:44:55 PM
to Eiffel Users
Would it be possible for me to create my own AI model specialized for Eiffel?

It might be!

Allow me to elaborate ...

Liberty Lover

Dec 16, 2025, 9:46:41 PM
to eiffel...@googlegroups.com
  Full Eiffel Training Data Inventory

  Available Eiffel Source Code

  | Source                                  | Files          | Est. Lines | Est. Tokens | Quality                  | Access       |
  |-----------------------------------------|----------------|------------|-------------|--------------------------|--------------|
  | Simple Eiffel                           | 796            | 90,000     | ~1.2M       | Excellent (DBC, tested)  | ✅ Local     |
  | ISE EiffelStudio library                | 3,957          | ~800,000   | ~10M        | Excellent (production)   | ✅ Local     |
  | ISE EiffelStudio contrib                | 6,274          | ~940,000   | ~12M        | Good (mixed)             | ✅ Local     |
  | https://github.com/gobo-eiffel/gobo     | ~1,500         | ~300,000   | ~4M         | Excellent (Eric Bezault) | ✅ GitHub    |
  | https://github.com/finnianr/eiffel-loop | 4,100+ classes | ~615,000   | ~8M         | Good (Finnian Reilly)    | ✅ GitHub    |
  | Reference docs                          | -              | 116,000    | ~1.5M       | High (curated)           | ✅ Local     |
  | OOSC2 examples                          | -              | ~20,000    | ~0.3M       | Canonical                | ⚠️ Scattered |
  | ECMA-367 spec                           | 1              | ~15,000    | ~0.2M       | Authoritative            | ✅ Local     |

  Total Available Training Corpus

  | Category           | Lines       | Tokens      |
  |--------------------|-------------|-------------|
  | Pure Eiffel source | ~2,750,000  | ~35M        |
  | Documentation      | ~150,000    | ~2M         |
  | Grand Total        | ~2.9M lines | ~37M tokens |
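
  The line and token figures above are estimates. A small sketch like the one below can reproduce the inventory from local checkouts; the paths and the tokens-per-line factor are assumptions (measure with your target model's tokenizer for real numbers):

  # corpus_inventory.py - rough sketch; paths and the 13 tokens/line factor are assumptions
  from pathlib import Path

  TOKENS_PER_LINE = 13  # ballpark for code tokenizers; verify with the actual tokenizer

  def inventory(root: str, pattern: str = "*.e") -> tuple[int, int, int]:
      """Count files, lines, and estimated tokens under a source tree."""
      files = list(Path(root).expanduser().rglob(pattern))
      lines = sum(len(f.read_text(errors="ignore").splitlines()) for f in files)
      return len(files), lines, lines * TOKENS_PER_LINE

  # hypothetical local paths - adjust to wherever the corpora live
  for name, root in [("simple_eiffel", "~/dev/simple_eiffel"),
                     ("ise_library", "~/Eiffel/library")]:
      n_files, n_lines, n_tokens = inventory(root)
      print(f"{name}: {n_files} files, {n_lines:,} lines, ~{n_tokens / 1e6:.1f}M tokens")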

  ---
  Training Feasibility Assessment

  For Different Approaches

  | Approach                     | Min Tokens Needed | We Have | Feasibility      |
  |------------------------------|-------------------|---------|------------------|
  | Train from scratch           | 1-10 trillion     | 37M     | ❌ Not feasible  |
  | Full fine-tune (all weights) | 100M+             | 37M     | ⚠️ Marginal      |
  | LoRA fine-tune               | 1-10M             | 37M     | ✅ Excellent     |
  | QLoRA fine-tune              | 1-10M             | 37M     | ✅ Excellent     |
  | RAG + prompting              | 100K+             | 37M     | ✅ Already doing |

  Key insight: We have 37M tokens of high-quality Eiffel code. For LoRA/QLoRA fine-tuning, this is 3-30x more than needed. The corpus is excellent.

  ---
  Your Hardware Path (5070 Ti 16GB)

  Given your setup (Windows 11 + WSL2 + 5070 Ti 16GB), here's the practical path:

  What Will Work

  | Model Size                  | Method      | Seq Length | Status                    |
  |-----------------------------|-------------|------------|---------------------------|
  | 7B (Qwen2.5-Coder, Mistral) | QLoRA 4-bit | 2048       | ✅ Comfortable            |
  | 7B                          | LoRA 16-bit | 1024       | ⚠️ Tight but works        |
  | 13B                         | QLoRA 4-bit | 1024       | ⚠️ Possible, conservative |
  | 30-34B                      | Any         | Any        | ❌ Rent cloud             |
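
  For intuition, here is a rough back-of-envelope on why 7B QLoRA fits comfortably in 16 GB; all figures are approximations, not measurements:

  # rough VRAM budget for 7B QLoRA at seq length 2048 (approximate)
  params = 7e9
  base_weights_gb = params * 0.5 / 1e9          # ~3.5 GB of 4-bit (NF4) base weights
  lora_params = 40e6                            # r=16 adapters are tens of millions of params
  lora_train_state_gb = lora_params * 16 / 1e9  # ~0.6 GB for adapter weights, grads, Adam states
  activations_gb = 4.0                          # a few GB; grows with sequence length and batch size
  print(base_weights_gb + lora_train_state_gb + activations_gb)  # roughly 8 GB, leaving headroom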

  Recommended Stack

  Base Model:     Qwen2.5-Coder-7B (best code model in 7B class)
  Method:         QLoRA 4-bit
  Seq Length:     2048 tokens (covers most Eiffel classes)
  LoRA Rank:      16 (r=16, alpha=32)
  Trainer:        LLaMA-Factory (easiest UI)
  Hardware:       Your 5070 Ti 16GB via WSL2

  ---
  The Practical Plan

  Phase 1: Dataset Creation (1-2 weeks)

  Convert your Eiffel corpus to instruction pairs (a conversion sketch follows the source list below):

  {"instruction": "Write a void-safe Eiffel feature that parses JSON",
   "input": "",
   "output": "<actual simple_json code>"}

  {"instruction": "Add Design by Contract to this feature",
   "input": "set_name (n: STRING) do name := n end",
   "output": "set_name (n: STRING)\n  require\n    n_not_void: n /= Void\n  do\n    name := n\n  ensure\n    name_set: name = n\n  end"}

  {"instruction": "Fix VJAR void safety error",
   "input": "x := detachable_value\nx.do_something",
   "output": "if attached detachable_value as l_x then\n  l_x.do_something\nend"}

  Target: 10,000-50,000 instruction pairs from:
  - Simple Eiffel (yours - highest quality)
  - ISE stdlib (patterns)
  - Gobo (portable patterns)
  - EiffelLoop (real-world usage)
  - Your reference docs (gotchas, patterns)
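
  A minimal sketch of the conversion step, assuming one "write this class" pair per Eiffel source file; the prompt template, paths, and size filter are placeholders to adapt:

  # make_instructions.py - sketch; paths, prompt wording, and filtering are assumptions
  import json
  from pathlib import Path

  def class_to_pair(source: str) -> dict:
      """Turn one Eiffel class into an instruction pair for supervised fine-tuning."""
      # naive: use the first non-blank line (e.g. "class JSON_PARSER") as the task hint
      header = source.strip().splitlines()[0]
      return {"instruction": f"Write a complete, void-safe Eiffel class: {header}",
              "input": "",
              "output": source}

  with open("eiffel_instructions.jsonl", "w") as out:
      for path in Path("~/dev/simple_eiffel").expanduser().rglob("*.e"):  # hypothetical root
          text = path.read_text(errors="ignore")
          if 10 < len(text.splitlines()) < 400:  # skip stubs and oversized classes
              out.write(json.dumps(class_to_pair(text)) + "\n")

  Contract-addition and void-safety-repair pairs (like the examples above) need a bit more work - e.g. stripping require/ensure clauses from a feature to form the "input" - but they follow the same JSONL shape.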

  Phase 2: First Training Run (1 day)

  On your 5070 Ti:

  # WSL2 Ubuntu
  conda activate qlora
  cd LLaMA-Factory

  # Run training (4-12 hours depending on dataset size).
  # Flag names follow LLaMA-Factory's HF-style CLI; check the README of the version you
  # install. The dataset must first be registered by name in data/dataset_info.json.
  python src/train.py \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen2.5-Coder-7B \
    --finetuning_type lora \
    --lora_rank 16 \
    --quantization_bit 4 \
    --template qwen \
    --dataset eiffel_instructions \
    --cutoff_len 2048 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --num_train_epochs 3 \
    --output_dir saves/eiffel-qlora

  Cost: $0 (your hardware)
  Time: 4-12 hours
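
  Before wiring anything into Ollama, a quick smoke test with transformers + peft confirms the adapter loads and generates; the adapter path below is whatever --output_dir you trained into:

  # smoke_test.py - sketch; assumes the adapter was saved to saves/eiffel-qlora
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
  from peft import PeftModel

  base = "Qwen/Qwen2.5-Coder-7B"
  bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
  tokenizer = AutoTokenizer.from_pretrained(base)
  model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
  model = PeftModel.from_pretrained(model, "saves/eiffel-qlora")  # the LoRA adapter from training

  prompt = "Write an Eiffel feature that reverses a LINKED_LIST [INTEGER]."
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  output = model.generate(**inputs, max_new_tokens=256)
  print(tokenizer.decode(output[0], skip_special_tokens=True))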

  Phase 3: Deployment

  Export the LoRA adapter → convert/merge it into a form Ollama can load → create the model → use locally:

  # After training
  ollama create eiffel-expert -f Modelfile

  # Usage
  ollama run eiffel-expert "Write a SCOOP-compatible cache class in Eiffel"
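
  A Modelfile along these lines wires the adapter in; this is a sketch, and it assumes the adapter has been converted to a format Ollama accepts (e.g. GGUF via llama.cpp's conversion scripts) and that the FROM base matches the model the adapter was trained on:

  # Modelfile (sketch) - base and adapter path are placeholders
  FROM qwen2.5-coder:7b
  ADAPTER ./eiffel-lora-adapter
  SYSTEM "You are an expert Eiffel programmer. Produce void-safe code with Design by Contract."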

  Phase 4: Dogfood Loop

  You + Claude + Eiffel-Expert-LoRA
                ↓
         Generate Eiffel code
                ↓
         You verify/correct
                ↓
         Corrections → New training pairs
                ↓
         Retrain monthly (on your GPU)
                ↓
         Model improves each cycle
                ↓
           (repeat)
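
  The "corrections → new training pairs" step can be as lightweight as appending a record whenever you fix the model's output; a sketch, with the file name and fields as arbitrary choices:

  # capture_correction.py - sketch; merge this file into the dataset before each retrain
  import json

  def log_correction(prompt: str, corrected_eiffel: str,
                     path: str = "eiffel_corrections.jsonl") -> None:
      """Append a verified correction as a new instruction pair for the next training run."""
      pair = {"instruction": prompt, "input": "", "output": corrected_eiffel}
      with open(path, "a") as f:
          f.write(json.dumps(pair) + "\n")

  log_correction("Write a SCOOP-compatible cache class in Eiffel",
                 "-- the hand-corrected class text goes here")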

  ---
  Cost Comparison

  | Approach               | One-Time Cost | Monthly Cost | Quality     |
  |------------------------|---------------|--------------|-------------|
  | Claude API only        | $0            | $20-100      | Good        |
  | Claude + RAG/Oracle    | $0            | $20-100      | Better      |
  | Your GPU + QLoRA       | $0            | $0           | Specialized |
  | Cloud fine-tune (A100) | $50-200       | $0           | Specialized |

  ---
  Bottom Line

  | Question                               | Answer                                    |
  |----------------------------------------|-------------------------------------------|
  | Do we have enough data?                | Yes - 37M tokens is 3-30x what LoRA needs |
  | Can you train locally?                 | Yes - 5070 Ti 16GB handles 7B QLoRA fine  |
  | Time to first model?                   | 2-3 weeks (dataset prep + training)       |
  | Cost?                                  | $0 (your hardware)                        |
  | Will it be better than generic Claude? | Yes for Eiffel-specific tasks             |

  The Eiffel ecosystem has enough code. Your GPU is capable. The only work is converting existing code to instruction format - and that can be semi-automated.

  My recommendation: Start with Simple Eiffel only (highest quality, you control it). 10K instruction pairs from your 71 libraries. Train on your GPU. If it works, expand to include ISE/Gobo/EiffelLoop.

Liberty Lover

Dec 16, 2025, 9:51:35 PM
to eiffel...@googlegroups.com
Picture this: a Simple Eiffel LLM running under Ollama, accessible from either VS Code or EiffelStudio, handling all of your Eiffel questions and happily assisting you in writing high-quality Eiffel code extremely fast.

That's where I have wanted this to go for about the last two years. Claude Code finally showed me a pathway to get there.

You're welcome!