TL;DR
Portugal’s AMÁLIA language model is now operational and outperforms earlier open models on European Portuguese benchmarks. However, key questions about its openness, native data, and objectives remain unanswered, highlighting broader issues in European sovereign-LLM development.
Portugal’s €5.5 million investment in the AMÁLIA language model has resulted in a functional system that outperforms previous open models on European Portuguese benchmarks, but critical questions about its openness, native data, and strategic goals remain unanswered, raising concerns about the broader European sovereign-LLM effort.
The AMÁLIA project, involving around 60 researchers from Portugal’s top research institutions, was announced in December 2024 and released its base version in September 2025. The model continues training from the EuroLLM multilingual foundation rather than starting from scratch, and it is currently accessible to 450,000 academic users across Portugal.
Technical evaluation shows AMÁLIA surpasses most open models on European Portuguese tasks and outperforms Qwen 3-8B on many benchmarks, though it still trails Qwen on some specific tests, including ALBA, its primary benchmark for European Portuguese. The model’s knowledge is current up to the end of 2023, and a final version is expected by June 2026.
Critics, notably Duarte O.Carmo, have raised questions about the model’s openness, the sufficiency of native-language data, and the strategic goals guiding its development, emphasizing that these are structural questions relevant to all European sovereign-LLM projects.
AMÁLIA: The three hard questions.
Portugal spent €5.5M to build a European Portuguese LLM. The base version is operational, and it beats Qwen 3-8B on most pt-PT benchmarks. So why are the most important questions still unanswered?
Last month, Duarte O.Carmo published the sharpest public analysis of AMÁLIA — Portugal’s state-funded European Portuguese large language model. He prefaces his critique with the necessary diplomatic apparatus before doing what almost nobody else in the European-sovereign-LLM discourse has been willing to do publicly: asking hard questions about whether the work, as released, actually does what it set out to do. This piece is a structural extension of his analysis. The AMÁLIA case study exposes three hard questions every national LLM effort needs to answer publicly — and the broader European sovereign-LLM movement has been operating without explicit answers to any of them.
Three questions every national LLM effort needs to answer publicly.
Duarte O.Carmo’s framing maps cleanly onto the structural argument, and each question applies concretely to AMÁLIA.
The three questions form a structural feedback loop. Q3 (optimization target) determines Q2 (data volume needed), which in turn conditions Q1 (openness sufficient for community contribution). The European sovereign-LLM movement collectively benefits if these questions become standard methodology disclosure rather than exceptional critique.
As an affiliate, we earn on qualifying purchases.
107 billion tokens. 5.8 billion clearly pt-PT.
This is the most tractable of the three questions, and its answer is surprising. For a model whose entire stated purpose is to prioritize European Portuguese, the native-language share of extended pre-training is about 5.5%: 5.8 billion clearly pt-PT tokens out of 107 billion. The implications cascade into every other question.
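The share can be checked directly from the two headline figures. The following is a minimal sketch that assumes the rounded token counts quoted above; the function name `native_share` is illustrative and not taken from the AMÁLIA project. With these rounded inputs the result comes out at roughly 5.4%, which agrees with the stated 5.5% once rounding of the token counts is taken into account.

```python
# Headline figures quoted in the article (both are rounded).
TOTAL_TOKENS_B = 107.0  # extended pre-training tokens, in billions
PT_PT_TOKENS_B = 5.8    # tokens clearly identified as pt-PT, in billions


def native_share(native_b: float, total_b: float) -> float:
    """Return the native-language share of the corpus as a percentage."""
    if total_b <= 0:
        raise ValueError("total token count must be positive")
    return 100.0 * native_b / total_b


share = native_share(PT_PT_TOKENS_B, TOTAL_TOKENS_B)
# Ratio of all other tokens to each clearly pt-PT token.
others_per_native = (TOTAL_TOKENS_B - PT_PT_TOKENS_B) / PT_PT_TOKENS_B

print(f"pt-PT share: {share:.1f}%")
print(f"other tokens per pt-PT token: {others_per_native:.1f}")
```

Put differently, for every token that is clearly European Portuguese, the extended pre-training mix contains roughly seventeen tokens that are not.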
The Olmo standard. AMÁLIA’s current state.
The Allen Institute for AI’s Olmo project defines what “fully open” requires in operational terms. Olmo doesn’t lead frontier benchmarks, and that is not the point: its role is to be the structural reference for openness. AMÁLIA’s “fully open source” claim should be measured against that operational standard.
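One way to make "operational" concrete is a per-artifact checklist. The sketch below is illustrative only: the artifact list is a summary of what the Olmo releases make public (weights, training data, training and evaluation code, intermediate checkpoints, training logs), not an official definition, and the function names are hypothetical. Every AMÁLIA entry is deliberately left as `None` because this article does not state which artifacts are published.

```python
from typing import Optional

# Artifact classes that the Olmo releases make public. This list is a
# summary used for illustration, not an official definition.
OPENNESS_ARTIFACTS = [
    "model weights",
    "training data",
    "training code",
    "intermediate checkpoints",
    "training logs",
    "evaluation code",
]


def openness_report(status: dict[str, Optional[bool]]) -> dict[str, list[str]]:
    """Group artifacts into released, withheld, and undisclosed."""
    report: dict[str, list[str]] = {"released": [], "withheld": [], "undisclosed": []}
    for artifact in OPENNESS_ARTIFACTS:
        value = status.get(artifact)  # a missing key counts as undisclosed
        if value is True:
            report["released"].append(artifact)
        elif value is False:
            report["withheld"].append(artifact)
        else:
            report["undisclosed"].append(artifact)
    return report


def is_fully_open(status: dict[str, Optional[bool]]) -> bool:
    """True only if every artifact class is confirmed as released."""
    return all(status.get(artifact) is True for artifact in OPENNESS_ARTIFACTS)


# Placeholder input: every entry is None because the article does not
# say which AMÁLIA artifacts are published.
example_status = {artifact: None for artifact in OPENNESS_ARTIFACTS}
report = openness_report(example_status)
print(report["undisclosed"])
print(is_fully_open(example_status))
```

The design choice worth noting is the three-way status: "undisclosed" is kept separate from "withheld", so a project cannot pass the check simply by not saying anything about an artifact.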
Four strategic positions. AMÁLIA between two and three.
Publicly disclosed European sovereign-LLM funding across the major initiatives exceeds roughly €100M. The structural question every project faces is what competitive position it is actually staking out. There are four options, none mutually exclusive, but each requires different commitments.
Three standards. For AMÁLIA and the movement.
The structural critique generalizes beyond AMÁLIA. Italy, France, Germany, Switzerland, the OpenEuroLLM consortium, and every subsequent national project benefit from public discourse holding national LLM efforts to operational standards on openness, data accounting, and strategic positioning.
The European sovereign-AI agenda is a serious strategic project that deserves serious public discourse. O.Carmo’s analysis is what serious public discourse looks like. Appropriately diplomatic. Structurally rigorous. Willing to ask the hard questions in public when the public investment justifies it. More of this is needed — across every European sovereign-LLM project, not just AMÁLIA.
Implications for European Sovereign-LLM Strategies
The development of AMÁLIA highlights critical issues facing European countries pursuing independent large language models, including transparency about data, openness of models, and alignment with national priorities. These questions impact policy, research integrity, and the future of AI sovereignty across Europe.
As European nations invest heavily in similar projects, the answers to these questions will determine whether these models can truly serve national interests or remain limited by technical and strategic constraints. The ongoing debate influences funding, regulation, and international competitiveness in AI.
European Sovereign Language Model Initiatives and Challenges
Across Europe, countries such as Italy, Germany, France, and Norway are investing in sovereign language models, with approaches ranging from training from scratch to building on existing foundation models. The common challenge is balancing openness, use of native data, and strategic goals with limited resources and data access.
Portugal’s AMÁLIA exemplifies a strategic choice to build on existing multilingual foundations, contrasting with Italy’s from-scratch approach with Minerva. These decisions reflect broader debates about technical feasibility and national AI sovereignty.
“The three hard questions about AMÁLIA are about openness, native data, and strategic goals, and they expose broader issues across European sovereign-LLM efforts.”
— Duarte O.Carmo
Unanswered Questions About AMÁLIA’s Strategic and Technical Aspects
It remains unclear how open AMÁLIA truly is, especially regarding the accessibility of its underlying code and data. Whether the native Portuguese data is sufficient for long-term performance, and what strategic goals lie beyond benchmark results, are both still under debate. The final version’s capabilities and strategic alignment will also remain uncertain until its release, expected by June 2026.
Upcoming Milestones and Evaluation of AMÁLIA’s Impact
The final version of AMÁLIA is expected by June 2026, which will provide a clearer picture of its capabilities and strategic positioning. Further evaluations, transparency disclosures, and policy discussions are anticipated in the coming months, shaping the future of European sovereign-LLMs.
Researchers and policymakers will closely monitor whether the model addresses current gaps and how it influences broader European efforts to develop independent AI systems.
Key Questions
What is the main purpose of AMÁLIA?
AMÁLIA aims to develop a high-performing European Portuguese language model to support academic, governmental, and industrial applications within Portugal and Europe.
How open is AMÁLIA really?
It is not yet clear how accessible the model’s code and data are, or whether it qualifies as ‘fully open’ by an operational standard such as Olmo’s. Transparency details are still emerging.
Why are questions about native data important?
Native data quality and quantity directly impact the model’s performance and cultural relevance. Insufficient native data can limit the model’s effectiveness in Portuguese contexts.
What are the broader implications for Europe?
The questions raised about AMÁLIA reflect larger issues faced by European countries in building independent AI systems, including transparency, data sovereignty, and strategic objectives.
Source: ThorstenMeyerAI.com