Releases

Mistral 3

Mistral, the EU’s largest LLM trainer, is back after another sizable hiatus, releasing four new models this week.

The models range widely in size: three dense models named Ministral 3 with 3B, 8B, and 14B parameters, plus Mistral Large 3, a 675B MoE model with 41B active parameters. That makes it one of the largest frontier open source models right now, behind only Kimi K2’s one trillion parameters.

The Ministral models all come in base, instruct, and reasoning variants, while the Large model is instruct-only for now, though Mistral says it will release a reasoning variant in the future. All of the models also come with multimodal vision capabilities, but they are not very good at vision tasks, which is highlighted by the fact that Mistral did not even bother to release any vision benchmarks.

Output tokens per intelligence

Mistral Large score

LMArena score, basically how friendly and sycophantic the model is

The general sentiment from people using these models is almost universally negative from what I have seen, with almost everyone saying that the models perform worse than any of their similarly sized counterparts across every task type.

One very glaring fault of the models is their poor agentic performance, and agentic work is one of the main uses of LLMs today. My guess is that Mistral is not training on synthetic tool calling and coding data (a stage called mid-training), which really helps models with reasoning, tool use, and agentic tasks. Because of this, I would advise you not to use any of the Mistral models at this point, since there are universally better options available.

Based on these models, I would say Mistral is a bit behind the curve right now and will need to play some catch-up if it wants to remain relevant in the current AI ecosystem.

Poor AA score

On Artificial Analysis’s LLM benchmark, Mistral Large 3 (675B parameters) scored similarly to Qwen3 30B, which is frankly abysmal.

Kling O1

We have many image editing models and many video generation models, but there has been a surprising lack of video editing models, with Wan VACE and Runway Aleph being the only models in the space.

We now have a new entrant in this sparsely populated space, with the release of Kling O1. The Kling team has somewhat quietly been improving their video generation models, with their current flagship Kling 2.5 Turbo being in the top 3 for text to video and the best for image to video according to Artificial Analysis.

Their O1 model does a good job of maintaining the characteristics of the original reference video, and it can fill in the gaps fairly well for anything that may be missing (such as when removing objects from a video). For video editing, it seems to be the best model out there right now.

Prompt: Remove the train — from Nucleus on Twitter

Another example from a thread on Twitter

Comparison across multiple video editing models

Pricing is $0.17 per second of video edited (price taken from Fal). The model should be available wherever you get your video generation models, including Replicate and Fal, as well as directly on the Kling AI website.
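To get a feel for what that rate means in practice, here is a minimal sketch that estimates the cost of an edit from the clip length. It assumes billing is strictly linear in seconds at the $0.17 rate quoted above; the function name and the rounding to cents are my own choices, and actual provider billing may differ (minimum durations, rounding rules, and so on).

```python
# Rate quoted in this post (taken from Fal); assumed to be billed linearly per second.
PRICE_PER_SECOND_USD = 0.17


def edit_cost_usd(duration_seconds: float,
                  price_per_second: float = PRICE_PER_SECOND_USD) -> float:
    """Estimate the cost in USD of editing a clip, rounded to the nearest cent."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return round(duration_seconds * price_per_second, 2)


if __name__ == "__main__":
    # Print estimates for a few hypothetical clip lengths.
    for seconds in (5, 10, 60):
        print(f"{seconds:>3}s clip: ${edit_cost_usd(seconds):.2f}")
```

At this rate, a one-minute edit works out to roughly ten dollars, so costs add up quickly for longer clips.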

Arcee Trinity Models

Arcee AI, an American lab with many past open source contributions, has released its trained-from-scratch Trinity family of models in partnership with Datology.

They have released two models: Trinity Nano, an 8 billion parameter MoE model with 1 billion active parameters, and Trinity Mini, a 26 billion parameter model with 3 billion active parameters. They will also release Trinity Large, a 420B parameter model with 13B active, in January.
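In an MoE model, only the active parameters are used for each token, so the ratio of active to total parameters is a rough indicator of how sparse a model is. The sketch below computes that ratio for the MoE models mentioned in this post. The parameter counts are the ones quoted here and are not independently verified, and the helper name is just for illustration.

```python
# (total, active) parameter counts in billions, as quoted in this post.
MODELS = {
    "Trinity Nano": (8, 1),
    "Trinity Mini": (26, 3),
    "Trinity Large": (420, 13),
    "Mistral Large 3": (675, 41),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active per token."""
    if total_b <= 0 or not 0 <= active_b <= total_b:
        raise ValueError("expected 0 <= active_b <= total_b and total_b > 0")
    return active_b / total_b


if __name__ == "__main__":
    for name, (total_b, active_b) in MODELS.items():
        share = active_fraction(total_b, active_b)
        print(f"{name}: {active_b}B of {total_b}B active ({share:.1%})")
```

Going by these numbers, Trinity Large would be noticeably sparser than the two smaller Trinity models, activating a much smaller share of its weights per token.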

The Nano model is only a preview release, and neither Arcee nor anyone else has published benchmarks for it. Arcee says that it will get an updated version and exit preview status when the Large model is released. Given its size and architecture, I expect it to compete with the Liquid 8B MoE model we covered a while ago. It is also an instruct (non-reasoning) model, which means that its response times should be very fast.

The Mini model seems to be more polished, being a reasoning model that is competitive with the Qwen 30B MoE model.

Trinity Mini model

Teal: Arcee Mini; purple: Qwen 3 30B; blue: gpt-oss 20B; orange: Mistral Small; pink: Olmo 3 32B

The model passed my preliminary vibe check. Arcee has also previously made models that are stronger than their benchmarks suggest (because they do not benchmaxx), so expect this model to punch a bit above where the benchmarks put it.

I would recommend reading the initial report they published to learn more about the models and how they were able to train them in only 6 months. I am excited to see more from this team and will definitely be keeping an eye on what they release next.

Rnj-1

Another American AI lab, Essential AI, has released its first model, a dense 8B parameter instruct model called Rnj-1.

The company is young, but its founders have been in the AI space for a while. The team is headed by Ashish Vaswani, the first author of the Attention Is All You Need paper, which introduced the transformer architecture LLMs use today.

Released scores

Impressive scores, especially when taking training FLOPs into account

The model was trained with a specific focus on tool use and coding performance, which is why they benchmark it on those tasks specifically. It does well on these despite having had no reinforcement learning and only a small supervised fine-tuning pass. This either means the team is on to something big, or they are overfitting on benchmarks. I have yet to try it myself since it came out less than a day ago at the time of writing, so we will have to wait and see.

Either way, similar to the Arcee team, I am excited to see where this team goes in the future.

Runway 4.5

There is a new top text-to-video model: Runway 4.5.

AA Benchmark Scores

Scores from Artificial Analysis

The Runway team is rolling out access over the coming days/weeks on their platform, which appears to be the only way to use the model.

Some samples from the Runway Team

No real step change in terms of capabilities versus other models, just another incremental improvement in the video generation space.

Finish

I hope you enjoyed the news this week. If you want to get the news every week, be sure to join our mailing list below.

Black cat on orange background

Claude-inspired style in Midjourney — from Twitter

Mistral Is Falling Behind
