Curated Eiffel AI

Liberty Lover

Dec 16, 2025, 9:44:55 PM
to Eiffel Users
Would it be possible for me to create my own AI model specialized for Eiffel?

It might be!

Allow me to elaborate ...

Liberty Lover

Dec 16, 2025, 9:46:41 PM
to eiffel...@googlegroups.com
  Full Eiffel Training Data Inventory

  Available Eiffel Source Code

  | Source                                  | Files          | Est. Lines | Est. Tokens | Quality                  | Access       |
  |-----------------------------------------|----------------|------------|-------------|--------------------------|--------------|
  | Simple Eiffel                           | 796            | 90,000     | ~1.2M       | Excellent (DBC, tested)  | ✅ Local     |
  | ISE EiffelStudio library                | 3,957          | ~800,000   | ~10M        | Excellent (production)   | ✅ Local     |
  | ISE EiffelStudio contrib                | 6,274          | ~940,000   | ~12M        | Good (mixed)             | ✅ Local     |
  | https://github.com/gobo-eiffel/gobo     | ~1,500         | ~300,000   | ~4M         | Excellent (Eric Bezault) | ✅ GitHub    |
  | https://github.com/finnianr/eiffel-loop | 4,100+ classes | ~615,000   | ~8M         | Good (Finnian Reilly)    | ✅ GitHub    |
  | Reference docs                          | -              | 116,000    | ~1.5M       | High (curated)           | ✅ Local     |
  | OOSC2 examples                          | -              | ~20,000    | ~0.3M       | Canonical                | ⚠️ Scattered |
  | ECMA-367 spec                           | 1              | ~15,000    | ~0.2M       | Authoritative            | ✅ Local     |

  Total Available Training Corpus

  | Category           | Lines       | Tokens      |
  |--------------------|-------------|-------------|
  | Pure Eiffel source | ~2,750,000  | ~35M        |
  | Documentation      | ~150,000    | ~2M         |
  | Grand Total        | ~2.9M lines | ~37M tokens |
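
  The line and token figures above are estimates. A small sketch like the one below can reproduce the inventory from local checkouts; the paths and the tokens-per-line factor are assumptions (measure with your target model's tokenizer for real numbers):

  # corpus_inventory.py - rough sketch; paths and the 13 tokens/line factor are assumptions
  from pathlib import Path

  TOKENS_PER_LINE = 13  # ballpark for code tokenizers; verify with the actual tokenizer

  def inventory(root: str, pattern: str = "*.e") -> tuple[int, int, int]:
      """Count files, lines, and estimated tokens under a source tree."""
      files = list(Path(root).expanduser().rglob(pattern))
      lines = sum(len(f.read_text(errors="ignore").splitlines()) for f in files)
      return len(files), lines, lines * TOKENS_PER_LINE

  # hypothetical local paths - adjust to wherever the corpora live
  for name, root in [("simple_eiffel", "~/dev/simple_eiffel"),
                     ("ise_library", "~/Eiffel/library")]:
      n_files, n_lines, n_tokens = inventory(root)
      print(f"{name}: {n_files} files, {n_lines:,} lines, ~{n_tokens / 1e6:.1f}M tokens")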

  ---
  Training Feasibility Assessment

  For Different Approaches

  | Approach                     | Min Tokens Needed | We Have | Feasibility      |
  |------------------------------|-------------------|---------|------------------|
  | Train from scratch           | 1-10 trillion     | 37M     | ❌ Not feasible  |
  | Full fine-tune (all weights) | 100M+             | 37M     | ⚠️ Marginal      |
  | LoRA fine-tune               | 1-10M             | 37M     | ✅ Excellent     |
  | QLoRA fine-tune              | 1-10M             | 37M     | ✅ Excellent     |
  | RAG + prompting              | 100K+             | 37M     | ✅ Already doing |

  Key insight: We have 37M tokens of high-quality Eiffel code. For LoRA/QLoRA fine-tuning, this is 3-30x more than needed. The corpus is excellent.

  ---
  Your Hardware Path (5070 Ti 16GB)

  Given your setup (Windows 11 + WSL2 + 5070 Ti 16GB), here's the practical path:

  What Will Work

  | Model Size                  | Method      | Seq Length | Status                    |
  |-----------------------------|-------------|------------|---------------------------|
  | 7B (Qwen2.5-Coder, Mistral) | QLoRA 4-bit | 2048       | ✅ Comfortable            |
  | 7B                          | LoRA 16-bit | 1024       | ⚠️ Tight but works        |
  | 13B                         | QLoRA 4-bit | 1024       | ⚠️ Possible, conservative |
  | 30-34B                      | Any         | Any        | ❌ Rent cloud             |
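
  For intuition, here is a rough back-of-envelope on why 7B QLoRA fits comfortably in 16 GB; all figures are approximations, not measurements:

  # rough VRAM budget for 7B QLoRA at seq length 2048 (approximate)
  params = 7e9
  base_weights_gb = params * 0.5 / 1e9          # ~3.5 GB of 4-bit (NF4) base weights
  lora_params = 40e6                            # r=16 adapters are tens of millions of params
  lora_train_state_gb = lora_params * 16 / 1e9  # ~0.6 GB for adapter weights, grads, Adam states
  activations_gb = 4.0                          # a few GB; grows with sequence length and batch size
  print(base_weights_gb + lora_train_state_gb + activations_gb)  # roughly 8 GB, leaving headroom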

  Recommended Stack

  Base Model:     Qwen2.5-Coder-7B (best code model in 7B class)
  Method:         QLoRA 4-bit
  Seq Length:     2048 tokens (covers most Eiffel classes)
  LoRA Rank:      16 (r=16, alpha=32)
  Trainer:        LLaMA-Factory (easiest UI)
  Hardware:       Your 5070 Ti 16GB via WSL2

  ---
  The Practical Plan

  Phase 1: Dataset Creation (1-2 weeks)

  Convert your Eiffel corpus to instruction pairs (a conversion sketch follows the source list below):

  {"instruction": "Write a void-safe Eiffel feature that parses JSON",
   "input": "",
   "output": "<actual simple_json code>"}

  {"instruction": "Add Design by Contract to this feature",
   "input": "set_name (n: STRING) do name := n end",
   "output": "set_name (n: STRING)\n  require\n    n_not_void: n /= Void\n  do\n    name := n\n  ensure\n    name_set: name = n\n  end"}

  {"instruction": "Fix VJAR void safety error",
   "input": "x := detachable_value\nx.do_something",
   "output": "if attached detachable_value as l_x then\n  l_x.do_something\nend"}

  Target: 10,000-50,000 instruction pairs from:
  - Simple Eiffel (yours - highest quality)
  - ISE stdlib (patterns)
  - Gobo (portable patterns)
  - EiffelLoop (real-world usage)
  - Your reference docs (gotchas, patterns)
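
  A minimal sketch of the conversion step, assuming one "write this class" pair per Eiffel source file; the prompt template, paths, and size filter are placeholders to adapt:

  # make_instructions.py - sketch; paths, prompt wording, and filtering are assumptions
  import json
  from pathlib import Path

  def class_to_pair(source: str) -> dict:
      """Turn one Eiffel class into an instruction pair for supervised fine-tuning."""
      # naive: use the first non-blank line (e.g. "class JSON_PARSER") as the task hint
      header = source.strip().splitlines()[0]
      return {"instruction": f"Write a complete, void-safe Eiffel class: {header}",
              "input": "",
              "output": source}

  with open("eiffel_instructions.jsonl", "w") as out:
      for path in Path("~/dev/simple_eiffel").expanduser().rglob("*.e"):  # hypothetical root
          text = path.read_text(errors="ignore")
          if 10 < len(text.splitlines()) < 400:  # skip stubs and oversized classes
              out.write(json.dumps(class_to_pair(text)) + "\n")

  Contract-addition and void-safety-repair pairs (like the examples above) need a bit more work - e.g. stripping require/ensure clauses from a feature to form the "input" - but they follow the same JSONL shape.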

  Phase 2: First Training Run (1 day)

  On your 5070 Ti:

  # WSL2 Ubuntu
  conda activate qlora
  cd LLaMA-Factory

  # Run training (4-12 hours depending on dataset size).
  # Flag names follow LLaMA-Factory's HF-style CLI; check the README of the version you
  # install. The dataset must first be registered by name in data/dataset_info.json.
  python src/train.py \
    --stage sft \
    --do_train \
    --model_name_or_path Qwen/Qwen2.5-Coder-7B \
    --finetuning_type lora \
    --lora_rank 16 \
    --quantization_bit 4 \
    --template qwen \
    --dataset eiffel_instructions \
    --cutoff_len 2048 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --num_train_epochs 3 \
    --output_dir saves/eiffel-qlora

  Cost: $0 (your hardware)
  Time: 4-12 hours
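
  Before wiring anything into Ollama, a quick smoke test with transformers + peft confirms the adapter loads and generates; the adapter path below is whatever --output_dir you trained into:

  # smoke_test.py - sketch; assumes the adapter was saved to saves/eiffel-qlora
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
  from peft import PeftModel

  base = "Qwen/Qwen2.5-Coder-7B"
  bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
  tokenizer = AutoTokenizer.from_pretrained(base)
  model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
  model = PeftModel.from_pretrained(model, "saves/eiffel-qlora")  # the LoRA adapter from training

  prompt = "Write an Eiffel feature that reverses a LINKED_LIST [INTEGER]."
  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  output = model.generate(**inputs, max_new_tokens=256)
  print(tokenizer.decode(output[0], skip_special_tokens=True))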

  Phase 3: Deployment

  Export the LoRA adapter → convert/merge it into a form Ollama can load → create the model → use locally:

  # After training
  ollama create eiffel-expert -f Modelfile

  # Usage
  ollama run eiffel-expert "Write a SCOOP-compatible cache class in Eiffel"
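
  A Modelfile along these lines wires the adapter in; this is a sketch, and it assumes the adapter has been converted to a format Ollama accepts (e.g. GGUF via llama.cpp's conversion scripts) and that the FROM base matches the model the adapter was trained on:

  # Modelfile (sketch) - base and adapter path are placeholders
  FROM qwen2.5-coder:7b
  ADAPTER ./eiffel-lora-adapter
  SYSTEM "You are an expert Eiffel programmer. Produce void-safe code with Design by Contract."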

  Phase 4: Dogfood Loop

  You + Claude + Eiffel-Expert-LoRA
                ↓
         Generate Eiffel code
                ↓
         You verify/correct
                ↓
         Corrections → New training pairs
                ↓
         Retrain monthly (on your GPU)
                ↓
         Model improves each cycle
                ↓
           (repeat)
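
  The "corrections → new training pairs" step can be as lightweight as appending a record whenever you fix the model's output; a sketch, with the file name and fields as arbitrary choices:

  # capture_correction.py - sketch; merge this file into the dataset before each retrain
  import json

  def log_correction(prompt: str, corrected_eiffel: str,
                     path: str = "eiffel_corrections.jsonl") -> None:
      """Append a verified correction as a new instruction pair for the next training run."""
      pair = {"instruction": prompt, "input": "", "output": corrected_eiffel}
      with open(path, "a") as f:
          f.write(json.dumps(pair) + "\n")

  log_correction("Write a SCOOP-compatible cache class in Eiffel",
                 "-- the hand-corrected class text goes here")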

  ---
  Cost Comparison

  | Approach               | One-Time Cost | Monthly Cost | Quality     |
  |------------------------|---------------|--------------|-------------|
  | Claude API only        | $0            | $20-100      | Good        |
  | Claude + RAG/Oracle    | $0            | $20-100      | Better      |
  | Your GPU + QLoRA       | $0            | $0           | Specialized |
  | Cloud fine-tune (A100) | $50-200       | $0           | Specialized |

  ---
  Bottom Line

  | Question                               | Answer                                    |
  |----------------------------------------|-------------------------------------------|
  | Do we have enough data?                | Yes - 37M tokens is 3-30x what LoRA needs |
  | Can you train locally?                 | Yes - 5070 Ti 16GB handles 7B QLoRA fine  |
  | Time to first model?                   | 2-3 weeks (dataset prep + training)       |
  | Cost?                                  | $0 (your hardware)                        |
  | Will it be better than generic Claude? | Yes for Eiffel-specific tasks             |

  The Eiffel ecosystem has enough code. Your GPU is capable. The only work is converting existing code to instruction format - and that can be semi-automated.

  My recommendation: Start with Simple Eiffel only (highest quality, you control it). 10K instruction pairs from your 71 libraries. Train on your GPU. If it works, expand to include ISE/Gobo/EiffelLoop.

Liberty Lover

Dec 16, 2025, 9:51:35 PM
to eiffel...@googlegroups.com
Picture this: a Simple Eiffel LLM running under Ollama, accessible from either VS Code or EiffelStudio, handling all of your Eiffel questions and happily assisting you in writing high-quality Eiffel code extremely fast.

That's where I have wanted this to go for about the last two years. Claude Code finally showed me a pathway to get there.

You're welcome!