Sovereign RAG Platform

System overview

Overview of the orchestration between the k3s cluster, the Spring Boot application layer, local Ollama inference, and sovereign Synology NFS storage.

Compute: k3s cluster that runs the application and LLM services.

Storage: NFS persistent volumes for documents, models, and PostgreSQL data.

Sovereignty: no mandatory dependency on an external LLM API.

graph TD
    subgraph K3S [k3s Cluster - AI Infrastructure]
        direction TB
        API{{Spring Boot API}}:::app
        PG[(PostgreSQL + pgvector)]:::service
        OL[Ollama Inference]:::service
    end
    subgraph NAS [Synology - Sovereign Data Control]
        N1[(NFS: Postgres Data)]:::storage
        N2[(NFS: LLM Models)]:::storage
        N3[(NFS: PDF Documents)]:::storage
    end
    API <--> PG
    API <--> OL
    PG -.-> N1
    OL -.-> N2
    API -.-> N3
    classDef app fill:#f37021,stroke:#fff,color:#fff,font-weight:bold;
    classDef service fill:#333,stroke:#f37021,color:#eee;
    classDef storage fill:#1a1a1a,stroke:#58a6ff,color:#58a6ff;
    style K3S fill:#0d1117,stroke:#30363d
    style NAS fill:#0d1117,stroke:#30363d

Global roadmap: 7-phase plan

The flow below shows the logical progression of the project, from infrastructure through evaluation and industrialization.

graph LR
    P1(Phase 1: Infra):::done --> P2(Phase 2: Foundation)
    P2 --> P3(Phase 3: Ingestion)
    P3 --> P4(Phase 4: RAG)
    P4 --> P5(Phase 5: Security)
    P5 --> P6(Phase 6: Eval)
    P6 --> P7(Phase 7: Final)
    click P1 "#phase1" "Go to Phase 1"
    click P2 "#phase2" "Go to Phase 2"
    click P3 "#phase3" "Go to Phase 3"
    click P4 "#phase4" "Go to Phase 4"
    click P5 "#phase5" "Go to Phase 5"
    click P6 "#phase6" "Go to Phase 6"
    click P7 "#phase7" "Go to Phase 7"
    classDef done stroke:#f37021,stroke-width:2px,color:#f37021;
    classDef default fill:#161b22,stroke:#30363d,color:#8b949e;

Current status: Phase 1.1, NFS storage configuration

Phase 1: Infrastructure & NFS Storage

IN PRODUCTION

Complete mapping between Windows 11, Synology DSM, the k3s cluster, and the NFSv4.1 persistent volumes.

graph TD
    WIN["Windows 11 - Docker Desktop"]:::client
    PUSH["Docker push to Registry 30500"]:::flow
    DNS["rag.ouertani.fr - Wildcard SSL"]:::dns
    subgraph NAS["Synology DS920+ - 192.168.1.30"]
        direction TB
        RP["DSM Reverse Proxy - HTTPS to HTTP"]:::syno
        NFS["NFS Service - NFSv4.1"]:::syno
        subgraph STORAGE["Physical storage"]
            D1["/volume1/rag-postgres"]:::folder
            D2["/volume1/rag-ollama"]:::folder
            D3["/volume1/rag-documents"]:::folder
        end
        NFS --> D1
        NFS --> D2
        NFS --> D3
    end
    subgraph K3S["k3s Ubuntu VM - 192.168.1.50"]
        direction TB
        NS["Namespace sovereign-rag"]:::namespace
        subgraph WORKLOADS["Workloads / Pods"]
            ING["Pod Nginx Ingress Controller"]:::pod
            PG["Pod PostgreSQL + pgvector"]:::pod
            OL["Pod Ollama - mistral / gemma2 / nomic-embed-text"]:::pod
            REG["Pod Registry 30500"]:::pod
            API["Pod Spring API - Phase 2"]:::pod
        end
        subgraph PVC["Persistent Volumes - PVC"]
            PVC1["pvc-postgres"]:::k8s
            PVC2["pvc-ollama"]:::k8s
            PVC3["pvc-documents"]:::k8s
        end
        NS --> ING
        NS --> PG
        NS --> OL
        NS --> REG
        NS --> API
        PG -.-> PVC1
        OL -.-> PVC2
        API -.-> PVC3
    end
    WIN --> PUSH
    PUSH --> REG
    DNS --> RP
    RP --> ING
    PVC1 === D1
    PVC2 === D2
    PVC3 === D3
    classDef client fill:#161b22,stroke:#58a6ff,color:#e6edf3;
    classDef flow fill:#1b2330,stroke:#f37021,color:#f37021,font-weight:bold;
    classDef dns fill:#1a1a1a,stroke:#58a6ff,color:#58a6ff,font-weight:bold;
    classDef syno fill:#1a1a1a,stroke:#58a6ff,stroke-width:2px,color:#58a6ff;
    classDef folder fill:#0d1117,stroke:#58a6ff,color:#eee;
    classDef namespace fill:#1d2a1d,stroke:#2ea043,color:#7ee787,font-weight:bold;
    classDef k8s fill:#161b22,stroke:#f37021,color:#f37021;
    classDef pod fill:#f37021,stroke:#fff,color:#fff,font-weight:bold;
    style NAS fill:#0d1117,stroke:#30363d
    style K3S fill:#0d1117,stroke:#30363d
    style STORAGE fill:#0d1117,stroke:#30363d
    style WORKLOADS fill:#0d1117,stroke:#30363d
    style PVC fill:#0d1117,stroke:#30363d

Technical note: the Synology NFS shares are exported over NFSv4.1 with the squash option set to No mapping. With no squashing, the UID/GID a container writes with is preserved on the share, which is the expected behavior for letting containers manage permissions correctly on the persistent volumes.

Phase 2: Application Foundation & CI/CD

Spring Boot build flow, publication to a private registry, and deployment to k3s.

graph LR
    subgraph DEV [Development Workstation]
        CODE[Code: Spring Boot]:::code
        DOCKER[Docker Build]:::code
    end
    subgraph K3S [k3s Cluster - Runtime]
        direction TB
        REG[Private Docker Registry :30500]:::registry
        APP[Pod: Spring AI API]:::app
        SVC_OL[Service: Ollama]:::service
        SVC_DB[Service: PostgreSQL]:::service
        ING[Ingress: Nginx]:::service
    end
    CODE --> DOCKER
    DOCKER -- Docker Push --> REG
    REG -- Image Pull --> APP
    APP -- Spring AI --> SVC_OL
    APP -- JPA / JDBC --> SVC_DB
    ING --> APP
    classDef code fill:#333,stroke:#ccc,color:#eee;
    classDef registry fill:#1a1a1a,stroke:#58a6ff,stroke-width:2px,color:#58a6ff;
    classDef app fill:#f37021,stroke:#fff,color:#fff,font-weight:bold;
    classDef service fill:#161b22,stroke:#f37021,color:#f37021;
    style DEV fill:#0d1117,stroke:#30363d
    style K3S fill:#0d1117,stroke:#30363d

Key components: Spring AI, containerized images, a private registry, and exposure through an Ingress controller (Nginx in the diagrams; Traefik, which k3s bundles by default, or an equivalent controller would also work).

Phase 3: Ingestion & Local Inference

Transforms PDF documents into vector chunks usable for semantic search.

graph TD
    PDF[PDF Document / NFS]:::folder --> EXT[Extraction: PDFBox]:::app
    subgraph APP [Spring Boot Application - ETL Pipeline]
        EXT --> CHUNK{Chunking Strategy}:::logic
        subgraph STRAT [Chunking Strategies]
            S1[Fixed Size]:::strat
            S2[Sliding Window]:::strat
            S3[Paragraph]:::strat
        end
        CHUNK -.-> S1
        CHUNK -.-> S2
        CHUNK -.-> S3
        S2 --> EMB[Spring AI: Embedding Request]:::app
    end
    EMB -- Local API Call --> OLLAMA[Ollama: nomic-embed-text]:::service
    OLLAMA -- Vector --> EMB
    EMB --> PG[(PostgreSQL + pgvector)]:::db
    classDef folder fill:#1a1a1a,stroke:#58a6ff,color:#58a6ff;
    classDef app fill:#f37021,stroke:#fff,color:#fff,font-weight:bold;
    classDef logic fill:#333,stroke:#f37021,color:#f37021;
    classDef strat fill:#161b22,stroke:#ccc,color:#eee,font-style:italic;
    classDef service fill:#161b22,stroke:#f37021,color:#f37021;
    classDef db fill:#0d1117,stroke:#f37021,stroke-width:2px,color:#eee;
    style APP fill:#0d1117,stroke:#30363d
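
As an illustration of the Sliding Window strategy named in the diagram, here is a minimal sketch in plain Java. The class name `SlidingWindowChunker`, the character-based window, and the `size` and `overlap` parameters are assumptions made for this example; the project may chunk by tokens or sentences instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits text into overlapping fixed-size character windows. */
final class SlidingWindowChunker {

    private SlidingWindowChunker() {}

    /**
     * @param text    extracted document text
     * @param size    maximum characters per chunk (must be > 0)
     * @param overlap characters shared by consecutive chunks (0 <= overlap < size)
     */
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("require size > 0 and 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

With `overlap` set to 0 the same method behaves as the Fixed Size strategy, which makes the two easy to compare under identical settings.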

Process: extraction, chunking, embedding generation, then vector storage in pgvector.
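
To make the embedding step concrete at the wire level, the sketch below builds the HTTP request that a client could send to Ollama's `/api/embeddings` endpoint, using only the JDK HTTP client. In the project this call is expected to go through Spring AI, so treat this as an illustration of what happens underneath rather than the project's code. The class name and the base URL `http://ollama:11434` used in the usage note are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds (but does not send) an Ollama embedding request. */
final class OllamaEmbeddingRequest {

    private OllamaEmbeddingRequest() {}

    /** baseUrl is expected without a trailing slash, e.g. the in-cluster service URL. */
    static HttpRequest build(String baseUrl, String model, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/embeddings"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody(model, text)))
                .build();
    }

    /** JSON payload with the "model" and "prompt" fields. */
    static String jsonBody(String model, String text) {
        return "{\"model\":\"" + escape(model) + "\",\"prompt\":\"" + escape(text) + "\"}";
    }

    /** Minimal JSON string escaping: quotes, backslashes, and control characters. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
                }
            }
        }
        return sb.toString();
    }
}
```

A call such as `OllamaEmbeddingRequest.build("http://ollama:11434", "nomic-embed-text", chunkText)` would then be passed to `HttpClient.send`, and the returned vector stored alongside the chunk in pgvector.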

Phase 4: RAG & Anti-Hallucination

Full pipeline: from the user's question to the sourced answer.

graph TD
    USER([User: asks a question]):::user --> Q_EMB[Question Embedding]:::app
    subgraph RETRIEVAL [Step 4.1: Retrieval Service]
        Q_EMB -- Vector Search --> PG[(pgvector)]:::db
        PG -- Top-K Chunks + Scores --> EVAL{Similarity Threshold}:::logic
    end
    subgraph GUARD [Step 4.2: Anti-Hallucination]
        EVAL -- Score too low --> REFUS[Refusal: information not found]:::hallu
        EVAL -- Score sufficient --> PROMPT[Enriched Prompt Construction]:::app
    end
    subgraph GEN [Step 4.3: Generation]
        PROMPT -- Prompt + Context --> LLM[Ollama: Mistral / Llama 3]:::service
        LLM -- Generated Answer --> FINAL[Final Answer + Sources]:::app
    end
    FINAL --> USER
    classDef user fill:#1a1a1a,stroke:#ccc,color:#eee;
    classDef app fill:#f37021,stroke:#fff,color:#fff,font-weight:bold;
    classDef db fill:#0d1117,stroke:#58a6ff,color:#58a6ff;
    classDef logic fill:#333,stroke:#f37021,color:#f37021;
    classDef hallu fill:#800,stroke:#f00,color:#fff;
    classDef service fill:#161b22,stroke:#f37021,color:#f37021;
    style RETRIEVAL fill:#0d1117,stroke:#30363d,stroke-dasharray: 5 5
    style GUARD fill:#0d1117,stroke:#30363d,stroke-dasharray: 5 5
    style GEN fill:#0d1117,stroke:#30363d,stroke-dasharray: 5 5
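
The scores that feed the similarity threshold depend on the distance metric, which the document does not specify. Assuming cosine similarity, the sketch below shows how the score is computed and how a cosine distance, such as the value returned by pgvector's `<=>` operator, converts to a similarity. The class and method names are hypothetical.

```java
/** Hypothetical helper for the similarity score used by the threshold check. */
final class Similarity {

    private Similarity() {}

    /** Cosine similarity of two equal-length vectors, in [-1, 1]; 0 if either is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // similarity is undefined for a zero vector; treat as no match
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Cosine distance is 1 minus cosine similarity, so the conversion is its complement. */
    static double fromCosineDistance(double distance) {
        return 1.0 - distance;
    }
}
```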

Guardrail: if similarity is insufficient, the system refuses cleanly rather than inventing an answer.
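
A minimal sketch of this guardrail follows, assuming the threshold is applied per chunk and that a refusal is triggered when no chunk passes. The `ScoredChunk` record, the `RetrievalGuard` class, the refusal message, and the prompt wording are all hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** A retrieved chunk with its similarity score (hypothetical shape). */
record ScoredChunk(String source, String text, double score) {}

final class RetrievalGuard {

    static final String REFUSAL = "Information not found in the indexed documents.";

    private RetrievalGuard() {}

    /** Returns an enriched prompt, or empty when no chunk reaches the threshold. */
    static Optional<String> buildPrompt(String question, List<ScoredChunk> topK, double minScore) {
        List<ScoredChunk> kept = topK.stream()
                .filter(c -> c.score() >= minScore)
                .collect(Collectors.toList());
        if (kept.isEmpty()) {
            return Optional.empty(); // caller replies with REFUSAL and never calls the LLM
        }
        // Prefix each chunk with its source so the model can cite it.
        String context = kept.stream()
                .map(c -> "[" + c.source() + "] " + c.text())
                .collect(Collectors.joining("\n"));
        return Optional.of(
                "Answer using only the context below. Cite sources in brackets.\n"
                + "Context:\n" + context + "\n"
                + "Question: " + question);
    }
}
```

The caller would use `buildPrompt(...).map(llm::generate).orElse(RetrievalGuard.REFUSAL)`, where `llm::generate` stands for whatever client sends the prompt to Ollama. Refusing before generation keeps the decision deterministic and avoids spending inference time on questions the corpus cannot answer.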

Phase 5: Security & Traceability

Role-based access control, confidentiality filtering, and request history logging.

graph TD
    USER([User / Admin]) --> AUTH{Spring Security}:::logic
    subgraph SECU [Step 5.2: Access Control]
        AUTH -- ROLE_USER --> FILTER[Confidentiality Filter]:::logic
        AUTH -- ROLE_ADMIN --> ADMIN[Full access / Audit API]:::logic
        FILTER -- User ID + Classification --> SEARCH[Filtered search in pgvector]:::app
    end
    subgraph AUDIT_PROC [Step 5.1: Audit Pipeline]
        SEARCH --> INTERCEPT[Request Interceptor]:::app
        INTERCEPT -- Auto Log --> TABLE_A[(Table: chat_request)]:::db
        INTERCEPT -- Log Sources --> TABLE_S[(Table: chat_source)]:::db
    end
    SEARCH -.-> RAG_ENGINE[RAG Engine]:::service
    classDef app fill:#f37021,stroke:#fff,color:#fff,font-weight:bold;
    classDef db fill:#0d1117,stroke:#58a6ff,color:#58a6ff;
    classDef logic fill:#333,stroke:#f37021,color:#f37021;
    classDef service fill:#161b22,stroke:#f37021,color:#f37021;
    style SECU fill:#0d1117,stroke:#30363d,stroke-dasharray: 5 5
    style AUDIT_PROC fill:#0d1117,stroke:#30363d,stroke-dasharray: 5 5
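
The confidentiality filter can be pictured as follows. This is a sketch under assumptions: the three classification levels, the notion of a per-user clearance, and the `DocChunk` and `ConfidentialityFilter` names are invented for the example; only the `ROLE_USER` and `ROLE_ADMIN` role names come from the diagram.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical classification levels, ordered from least to most sensitive. */
enum Classification { PUBLIC, INTERNAL, CONFIDENTIAL }

/** A stored chunk tagged with the classification of its source document. */
record DocChunk(String id, Classification classification) {}

final class ConfidentialityFilter {

    private ConfidentialityFilter() {}

    /** ROLE_ADMIN sees everything; other roles see chunks at or below their clearance. */
    static List<DocChunk> visibleTo(Set<String> roles, Classification clearance,
                                    List<DocChunk> chunks) {
        if (roles.contains("ROLE_ADMIN")) {
            return List.copyOf(chunks);
        }
        return chunks.stream()
                // Enum comparison follows declaration order, so PUBLIC < INTERNAL < CONFIDENTIAL.
                .filter(c -> c.classification().compareTo(clearance) <= 0)
                .collect(Collectors.toList());
    }
}
```

The filter is applied in memory here to keep the example self-contained. Since the diagram places filtering inside the pgvector search, the same rule would more likely be expressed as a condition in the SQL query, so that restricted chunks never leave the database.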

Phase 6: RAG Evaluation & Quality

Measures accuracy, latency, and faithfulness.

graph TD
    subgraph DATA [Step 6.1: Ground Truth]
        GT[JSON dataset: 20 Q/A pairs]:::strat
        PDF_TEST[Reference PDFs]:::folder
        PDF_TEST --> GT
    end
    subgraph EVAL [Step 6.2: Evaluation Engine]
        RUN[POST /eval/run]:::logic
        subgraph TESTS [Compared Configurations]
            C1[Mistral vs Llama3]:::test
            C2[Chunk: Fixed vs Sliding]:::test
            C3[Top-K: 3 vs 5]:::test
        end
        GT --> RUN
        TESTS --> RUN
    end
    subgraph METRICS [Performance Indicators]
        M1[Context Recall]:::metric
        M2[Answer Relevance]:::metric
        M3[Latency ms]:::metric
        M4[Refusal Rate]:::metric
    end
    RUN --> M1
    RUN --> M2
    RUN --> M3
    RUN --> M4
    classDef folder fill:#1a1a1a,stroke:#58a6ff,color:#58a6ff;
    classDef strat fill:#161b22,stroke:#ccc,color:#eee,font-style:italic;
    classDef logic fill:#f37021,stroke:#fff,color:#fff,font-weight:bold;
    classDef test fill:#333,stroke:#f37021,color:#f37021;
    classDef metric fill:#0d1117,stroke:#58a6ff,stroke-width:2px,color:#eee;
    style DATA fill:#0d1117,stroke:#30363d,stroke-dasharray: 5 5
    style EVAL fill:#0d1117,stroke:#30363d,stroke-dasharray: 5 5

Goal: compare chunking strategies, models, and retrieval parameters rigorously, by running each configuration against the same ground-truth dataset and the same metrics.
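
Three of the indicators (context recall, refusal rate, latency) can be aggregated with simple arithmetic once each test question has been run. The sketch below assumes a simplified, source-level definition of context recall: the fraction of expected source documents that appear among the retrieved ones. The `EvalResult` record and `EvalMetrics` class are hypothetical; answer relevance is left out because it usually needs a judge model or human grading.

```java
import java.util.List;
import java.util.Set;

/** Outcome of one ground-truth question under one configuration (hypothetical shape). */
record EvalResult(Set<String> expectedSources, Set<String> retrievedSources,
                  boolean refused, long latencyMs) {}

final class EvalMetrics {

    private EvalMetrics() {}

    /** Fraction of expected sources that were actually retrieved. */
    static double contextRecall(EvalResult r) {
        if (r.expectedSources().isEmpty()) {
            return 1.0; // nothing to find, so nothing was missed
        }
        long hits = r.expectedSources().stream()
                .filter(r.retrievedSources()::contains)
                .count();
        return (double) hits / r.expectedSources().size();
    }

    static double meanContextRecall(List<EvalResult> results) {
        return results.stream().mapToDouble(EvalMetrics::contextRecall).average().orElse(0.0);
    }

    /** Share of questions for which the guardrail refused to answer. */
    static double refusalRate(List<EvalResult> results) {
        if (results.isEmpty()) {
            return 0.0;
        }
        long refused = results.stream().filter(EvalResult::refused).count();
        return (double) refused / results.size();
    }

    static double meanLatencyMs(List<EvalResult> results) {
        return results.stream().mapToLong(EvalResult::latencyMs).average().orElse(0.0);
    }
}
```

Computing these per configuration (for example Top-K 3 versus Top-K 5) over the same 20 Q/A pairs gives directly comparable numbers.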

Phase 7: Finalization & Industrialization

Final step: packaging, observability, backups, supervision, security hardening, and operations documentation.

Planned: monitoring, alerting, backup strategy, log rotation, restore tests, model versioning, and runbook documentation.