Do not version the LoRA
This commit is contained in:
parent
8d2e5ac021
commit
c5d372e98d
1
Finetunning/.gitignore
vendored
Normal file
@ -0,0 +1 @@
qwen2.5*/
68
Finetunning/mergeLora.py
Normal file
@ -0,0 +1,68 @@
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# ----------------------------
# Configuration
# ----------------------------
BASE_MODEL = "Qwen/Qwen2.5-7B-Instruct"
LORA_DIR = "./qwen2.5-7b-uk-fr-lora"      # directory produced by fine-tuning
OUTPUT_DIR = "./qwen2.5-7b-uk-fr-merged"  # final merged model

DTYPE = torch.float16  # GGUF-friendly
DEVICE = "cpu"         # merge on CPU (stable and safe)

print("=== LoRA merge script started ===")

# ----------------------------
# Load base model
# ----------------------------
print("[1/4] Loading base model...")
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=DTYPE,
    device_map=DEVICE,
    trust_remote_code=True,
)
print("Base model loaded.")

# ----------------------------
# Load tokenizer
# ----------------------------
print("[2/4] Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL,
    trust_remote_code=True,
)
tokenizer.pad_token = tokenizer.eos_token
print("Tokenizer loaded.")

# ----------------------------
# Load LoRA adapter
# ----------------------------
print("[3/4] Loading LoRA adapter...")
model = PeftModel.from_pretrained(
    base_model,
    LORA_DIR,
)
print("LoRA adapter loaded.")

# ----------------------------
# Merge LoRA into base model
# ----------------------------
print("[4/4] Merging LoRA into base model...")
model = model.merge_and_unload()
print("LoRA successfully merged.")

# ----------------------------
# Save merged model
# ----------------------------
print("Saving merged model...")
model.save_pretrained(
    OUTPUT_DIR,
    safe_serialization=True,
)
tokenizer.save_pretrained(OUTPUT_DIR)

print("=== Merge completed successfully ===")
print(f"Merged model saved in: {OUTPUT_DIR}")
24
README.md
@ -80,3 +80,27 @@ You can modify the following parameters in `main.py`:
- `OUTPUT_PDF_PATH`: path and name of the output PDF file (generated automatically)

---

## Fine-tuning

Fine-tuning yields a better translation. It is computationally expensive, but produces more precise results.

The process is as follows:

```
1️⃣ Training dataset (pairs.json)
        ↓
2️⃣ LoRA fine-tuning (finetuning.py)
        ↓
3️⃣ Validation / evaluation (validation.py)
        ↓
4️⃣ Merge LoRA + base model (mergeLora.py)
        ↓
5️⃣ GGUF conversion
        ↓
6️⃣ Ollama (final inference)
```

### Validation

The script tests several prompts and returns the one with the best BLEU score.

This prompt must then be copied into the Modelfile.
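The prompt-selection idea behind the validation step (score each candidate prompt by BLEU against reference translations, keep the best) can be sketched as follows. This is a minimal illustration, not the actual `validation.py`: the smoothed sentence-level BLEU, the `best_prompt` helper, and the `translate` callback (a stand-in for a call to the fine-tuned model) are all assumptions for the sketch.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Smoothed sentence-level BLEU (add-1 smoothing on each n-gram precision)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(len(cand) - n + 1, 0)
        # add-1 smoothing avoids log(0) for short or non-overlapping candidates
        log_precision += math.log((overlap + 1) / (total + 1))
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity_penalty * math.exp(log_precision / max_n)

def best_prompt(prompts, translate, pairs):
    """Return the prompt whose translations get the highest mean BLEU.

    `translate(prompt, source)` produces a translation; `pairs` is a list
    of (source_text, reference_translation) tuples.
    """
    return max(
        prompts,
        key=lambda p: sum(bleu(translate(p, src), ref) for src, ref in pairs) / len(pairs),
    )
```

In practice the real script would call the LoRA-tuned model inside `translate` and likely use a library implementation of BLEU; the winning prompt is then pasted into the Modelfile's `SYSTEM` block.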
@ -3,14 +3,15 @@ PARAMETER temperature 0.2
PARAMETER num_ctx 8192

SYSTEM """

Tu es un traducteur spécialisé dans les mémoires ukrainiennes des années 1910.
- Utilise le glossaire fourni pour les noms de lieux et termes historiques.
- Garde le style narratif et les tournures orales de l'auteur.

Règles strictes :
1. **Conserve tous les noms de lieux** dans leur forme originale (ex. : Львів → Lviv, mais ajoute une note si nécessaire entre [ ]).
2. **Respecte le style narratif** : garde les tournures orales et les expressions propres à l'auteur.
3. **Pour les termes historiques** (ex. : "powiat"), utilise le terme français standard et ajoute une note explicative.
4. **Conserve les citations** russe/allemand/polonais intégrées au texte (mais ajoute une note de fin de paragraphe entre [ ] en la traduisant et en précisant la langue d'origine).
5. **Structure** : Garde les sauts de ligne et la mise en page originale.
6. **Notes du traducteur** : Ajoute entre crochets [ ] les explications contextuelles si un contexte historique existe.
"""
1674
Traduction/TaniaBorecMemoir(Ukr) (FR) V1.pdf
Normal file
File diff suppressed because it is too large
1169
Traduction/TaniaBorecMemoir(Ukr) (FR).txt
Normal file
File diff suppressed because it is too large
1637
Traduction/TaniaBorecMemoir(Ukr) (FR)_V1.pdf
Normal file
File diff suppressed because it is too large
1637
Traduction/TaniaBorecMemoir(Ukr) (FR)_V2.pdf
Normal file
File diff suppressed because it is too large
1623
Traduction/TaniaBorecMemoir(Ukr) (FR)_V3.pdf
Normal file
File diff suppressed because it is too large
1032
Traduction/TaniaBorecMemoir(Ukr)(FR)_V1.txt
Normal file
File diff suppressed because it is too large
1038
Traduction/TaniaBorecMemoir(Ukr)(FR)_V2.txt
Normal file
File diff suppressed because it is too large
1030
Traduction/TaniaBorecMemoir(Ukr)(FR)_V3.txt
Normal file
File diff suppressed because it is too large
1664
Traduction/TaniaBorecMemoir(Ukr)(FR)_V4.pdf
Normal file
File diff suppressed because it is too large
1119
Traduction/TaniaBorecMemoir(Ukr)(FR)_V4.txt
Normal file
File diff suppressed because it is too large
1628
Traduction/TaniaBorecMemoir(Ukr)(FR)_V5.pdf
Normal file
File diff suppressed because it is too large
1107
Traduction/TaniaBorecMemoir(Ukr)(FR)_V5.txt
Normal file
File diff suppressed because it is too large
1704
Traduction/TaniaBorecMemoir(Ukr)(FR)_V6.pdf
Normal file
File diff suppressed because it is too large
1209
Traduction/TaniaBorecMemoir(Ukr)(FR)_V6.txt
Normal file
File diff suppressed because it is too large
1702
Traduction/TaniaBorecMemoir(Ukr)(FR)_V7.pdf
Normal file
File diff suppressed because it is too large
1214
Traduction/TaniaBorecMemoir(Ukr)(FR)_V7.txt
Normal file
File diff suppressed because it is too large
1702
Traduction/TaniaBorecMemoir(Ukr)(FR)_V8.pdf
Normal file
File diff suppressed because it is too large
1214
Traduction/TaniaBorecMemoir(Ukr)(FR)_V8.txt
Normal file
File diff suppressed because it is too large