· 9 min read · Wingston Sharon

Open Source AI Infrastructure in Europe: Who's Building What

By Wingston Sharon | February 2025


There's a tendency in European tech coverage to frame the open source AI question in terms of catching up to OpenAI or Anthropic. I think that's the wrong frame. The more interesting question is what Europe is actually building, what problems it's designed to solve, and where the genuine gaps are. Let me try to give a reasonably accurate picture of where things stand in early 2025.

Foundation Models: The Actual Landscape

Mistral AI is the most prominent European AI company, and the most discussed. Based in Paris and founded in 2023 by former Meta and DeepMind researchers, Mistral has released a sequence of models that are genuinely competitive for their size class: Mistral 7B (released September 2023, Apache 2.0), Mixtral 8x7B (December 2023, Apache 2.0), and Mistral Large (a closed commercial model). The Apache 2.0 releases are notable: they allow commercial use without field-of-use or scale restrictions, which is materially different from Meta's Llama licensing (which requires a separate licence from Meta above 700 million monthly active users).

Mistral's fundraising has been substantial by European standards: a €105 million seed round in June 2023, followed by a €600 million Series B in June 2024 at a reported valuation of roughly €6 billion. This makes them the best-capitalised European AI company focused on foundation models.

The important context: Mistral is excellent relative to its parameter count and training budget. Mistral 7B outperforms comparably sized models from most other providers. But Mistral Large competes in the tier occupied by GPT-3.5, not GPT-4 or Claude 3 Opus. The resource gap between Mistral and the US frontier labs is roughly an order of magnitude in compute investment. That's a real constraint, not a commentary on the quality of the team.

Aleph Alpha, headquartered in Heidelberg, operates differently. Where Mistral has pursued open-weights releases as a strategy, Aleph Alpha has focused on the enterprise and government sovereign AI market. Their Luminous model family has been deployed by German federal agencies and regulated industry customers who require data residency and operational control guarantees. In 2024, Aleph Alpha rebranded toward their "PHARIA" stack, a broader platform framing that includes the models, an orchestration layer, and sovereignty-specific compliance tooling.

Aleph Alpha's multilingual capability has been a specific focus: their models handle German, French, Spanish, Italian, and other European languages better than most equivalents trained predominantly on English data. For European enterprise use cases, this matters.

BLOOM (BigScience workshop, 2022) deserves mention because it predates the current wave and its governance model is instructive. BLOOM was a 176-billion parameter model trained through a large international collaboration organised primarily through Hugging Face, with significant French government and CNRS (French National Centre for Scientific Research) involvement. The training ran on the Jean Zay supercomputer at IDRIS in France. BLOOM was notable for its multilingual scope (46 natural languages plus 13 programming languages) and its genuinely open release under the RAIL licence, which attempts to prevent harmful use cases while remaining open for research and most commercial applications.

BLOOM is not a state-of-the-art model by 2025 standards (the AI landscape has moved fast), but it demonstrated that large-scale model training could be organised as a collaborative public research project with EU institutional infrastructure, and that the output could be made genuinely open.

Training Infrastructure: EuroHPC and the Cluster Picture

Foundation model training requires significant compute. Europe's public research compute infrastructure has expanded substantially through the EuroHPC Joint Undertaking, an EU initiative that funds shared high-performance computing systems across member states.

The flagship EuroHPC systems relevant to AI training as of early 2025:

  • LUMI (Kajaani, Finland): One of the most powerful supercomputers in Europe. Operated by CSC (Finnish IT Centre for Science) on behalf of a consortium of ten European countries. GPU partition: ~10,000 AMD Instinct MI250X GPUs. LUMI has been used for large language model training by European research groups, including work related to BLOOM follow-ons.

  • Leonardo (Bologna, Italy): Operated by CINECA. Includes a GPU-accelerated booster partition with roughly 13,800 NVIDIA A100 GPUs (3,456 nodes with four GPUs each). One of the systems with the highest peak AI performance in the EuroHPC network.

  • MareNostrum 5 (Barcelona, Spain): Operated by BSC (Barcelona Supercomputing Center). The general-purpose partition includes 6,408 nodes; the AI-optimised partition uses NVIDIA H100 GPUs.

These systems are real and large. The key constraint is access: EuroHPC compute is primarily allocated to academic research projects through competitive calls. Commercial organisations can apply for access under specific schemes (there are industrial access tracks), but the process is slower and more complex than provisioning cloud compute. For a company that needs to train quickly and iterate, EuroHPC is not a drop-in substitute for cloud GPU clusters.
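
To make the scale concrete, here is a rough back-of-envelope sketch in Python. It uses the widely quoted approximation of about 6 FLOPs per parameter per training token for dense transformers. The model size (7 billion parameters), token count (1 trillion), and 40% sustained utilisation are my illustrative assumptions, not figures published by any lab, and the function names are hypothetical. Treat it as a way to reason about orders of magnitude, not as a planning tool.

```python
# Back-of-envelope training cost using the common ~6 * N * D FLOPs rule of thumb.
# All inputs below are illustrative assumptions, not figures from any lab.

SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer (6 * N * D)."""
    return 6.0 * params * tokens


def gpu_days(total_flops: float, peak_flops_per_gpu: float, utilisation: float) -> float:
    """GPU-days needed when each GPU sustains `utilisation` of its peak rate."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    sustained = peak_flops_per_gpu * utilisation
    return total_flops / sustained / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 7e9 parameters trained on 1e12 tokens.
    flops = training_flops(params=7e9, tokens=1e12)
    # 312e12 FLOP/s is the A100's peak dense BF16 rate; 40% utilisation is assumed.
    single_gpu_days = gpu_days(flops, peak_flops_per_gpu=312e12, utilisation=0.40)
    for n_gpus in (64, 512, 4096):
        # Assumes perfect scaling across GPUs, which real runs do not achieve.
        print(f"{n_gpus:>5} GPUs: ~{single_gpu_days / n_gpus:,.1f} days")
```

Under these assumptions the hypothetical run needs a little under 4,000 GPU-days, or about a week on 512 GPUs. The arithmetic is the easy part; the practical question for a commercial team is whether it can get that many GPUs at once, on short notice, and again for the next iteration.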

OVHcloud has invested in AI-specific infrastructure and offers NVIDIA H100 and A100 GPU instances from European data centres. Their AI training services are used by companies who need EU data residency for training data combined with meaningful GPU access.

The Tooling Layer: Where US Origin Dominates

This is where the sovereignty picture gets more complicated.

The dominant deep learning frameworks, PyTorch (developed and open-sourced by Meta) and JAX (Google), are US-origin. European AI projects, including Mistral and BLOOM, train on these frameworks. The compute substrate is predominantly NVIDIA GPUs running CUDA (also US-origin). The transformers library, which is central to European model development and deployment, is maintained by Hugging Face, a US-incorporated company (though founded by French engineers and with significant European operations).

This is not a criticism: these are excellent tools, and open source contributions are genuinely open regardless of corporate origin. But it does mean that "European AI" at the tooling layer often means European organisations building on US-developed infrastructure. The value creation (data curation, fine-tuning, safety research, deployment) happens in Europe; the foundational layer is global.

There are European contributions to the tooling layer. The ELIXIR bioinformatics infrastructure has produced tools for life sciences AI. Flower (a federated learning framework, originally from Oxford/Cambridge researchers) is a genuinely European contribution to privacy-preserving machine learning tooling. But the core deep learning stack is not European-origin.

What Is Genuinely Distinct About European AI Development

Despite the gaps, there are areas where European AI development has distinct characteristics that reflect genuine structural advantages rather than just regulatory constraints.

Multilingual capability. European model developers, particularly Aleph Alpha and the BLOOM collaboration, have invested disproportionately in non-English language performance. This reflects the European market reality: a product that doesn't work well in German, French, Spanish, Dutch, and Polish is not a product for the European market. US frontier models are catching up on multilingual capability, but European developers have had a head start on the data curation and evaluation infrastructure for European languages.

Compliance-first design. Working under GDPR since 2018 has forced European AI developers to build data governance, consent management, and auditability features that US companies are now scrambling to retrofit. This is a real capability advantage in regulated markets globally.

EU AI Act implementation. The EU AI Act, which entered into force in August 2024 and whose obligations apply in stages from February 2025, creates both a constraint and an opportunity. Companies that can demonstrate compliance with high-risk AI system requirements (including documentation, transparency, and human oversight provisions) have a market advantage in the growing number of procurement processes that require it. European developers have had more time to think about this.

The Honest Assessment

Mistral is an excellent company with excellent models. Aleph Alpha is solving a real problem for a real market. EuroHPC gives European researchers access to serious compute. BLOOM demonstrated that open multilingual models can be built at scale with European institutional support.

The gap relative to US frontier labs is real and large. Anthropic and OpenAI have raised multiple billions of dollars each and are training models at compute scales that current European commercial AI companies cannot match. This is not a failure: it reflects different investment timelines, different market contexts, and a different relationship between the tech sector and capital markets. It does mean that European organisations buying AI capabilities for mission-critical applications face a genuine trade-off between capability and sovereignty.

That trade-off is worth understanding clearly rather than papering over with optimistic framing. The European AI ecosystem is building real things that serve real needs. It is not, in 2025, a complete substitute for US frontier models at the top of the capability curve.


If you're working on European AI deployment, open source model evaluation, or EU AI Act compliance infrastructure, I'd be glad to compare notes. Reach me at hello@agentosaurus.com.
