What are your favorite open-source AI models? Help shape the future of local AI in Nextcloud!


Hi everyone, we’d like to gauge everyone’s interest here regarding our llm2 app which, in case you haven’t heard, enables the use of self-hosted/local LLMs in Nextcloud. Currently, we use Llama 3.1 (8B) as the default LLM for this app, but with so many recent developments in the AI space, we’d like to know if anyone now has a preference for something else. Regardless of whether or not you currently run llm2, your input would help us ensure that the app continues to satisfy the needs of the self-hosting community.

To vote, just select any number of models below that you would approve of as the default model of choice for llm2. If I missed any options, feel free to post them in this thread and I will add them to the poll. We don’t have a strict deadline for this poll yet, but we plan to keep it open for at least a week. Thanks!

  • Llama
  • GPT-OSS
  • Mistral
  • Qwen
  • Deepseek
  • Olmo
  • Granite

Gemma is a good option too.

Hmm, it seems I can’t edit the poll anymore, but I’ll keep Gemma in mind too.

To anyone else who approves of Gemma, please like destripador’s post in addition to your normal votes.

Sharing observations from running LocalAI alongside the Nextcloud Assistant + Context Agent stack on consumer hardware (RTX 3090, 24GB VRAM, shared with an Immich photo library).

Setup: Nextcloud AIO with context_agent, context_chat, and integration_openai pointed at a self-hosted LocalAI instance. Approximately 86 tools enabled across the integrations (a significant context-budget consideration on its own — more on that below).

What’s working well for me: Qwen3-30B-A3B-Instruct-2507 (Q4_K_M, 48K context). MoE architecture means 3B active parameters per token, so inference speed is comparable to a 3B dense model despite 30B total weights. Tool-calling format (ChatML with <tool_call> JSON blocks) parses cleanly through LocalAI’s existing tool-call parser without custom regex. Comfortable VRAM headroom alongside Immich. This has been my daily driver and handles all of context_agent’s task types including the tool-heavy ones.
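To illustrate why the ChatML format is so painless to integrate: the model wraps a JSON payload in `<tool_call>` tags, so extraction is a single regex plus `json.loads`. The sketch below is illustrative of the kind of parsing LocalAI does internally, not LocalAI's actual code; the function name and sample payload are made up.

```python
import json
import re

# ChatML-style tool calls, as emitted by Qwen3: a JSON object wrapped in
# <tool_call>...</tool_call> tags, possibly surrounded by normal prose.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(completion: str) -> list[dict]:
    """Return the JSON payload of every <tool_call> block in a completion."""
    calls = []
    for match in TOOL_CALL_RE.finditer(completion):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # leave malformed blocks to the caller's error handling
    return calls

sample = (
    "Let me check the weather.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Berlin"}}\n'
    "</tool_call>"
)
print(extract_tool_calls(sample))
# → [{'name': 'get_weather', 'arguments': {'city': 'Berlin'}}]
```

Because the payload is plain JSON, there is nothing model-specific to normalize, which is exactly why this format "parses cleanly without custom regex" in practice.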

Gemma 3 27B QAT (Q4_0). Better conversational tone for drafting and free-prompt tasks than Qwen3-30B. No native tool-calling support that LocalAI’s parser recognizes, so not viable as a Context Agent backend. Useful as a routed alternative for content tasks (Free Prompt, Summarize, Headline, Reformulation) where tool execution isn’t needed.

Hermes-3-Llama-3.2-3B. Fast slot for short-response tasks where latency matters more than depth. CPU-runnable for the llm2 sidecar.

What I tried and would caution others about (at least for now): Gemma 4 26B-A4B (Mudler’s APEX quant). Loaded after a llama.cpp backend update added gemma4 architecture support. Inference works fine, but the model emits tool calls in its native format (<|tool_call>call:NAME{key:value}<tool_call|> with <|"|> string delimiters), which LocalAI’s existing tool-call parsers don’t recognize. End result: raw model tokens leak through to the user instead of executed tool calls. The format is documented in the model’s chat template, so a custom regex parser is theoretically possible, but I haven’t found existing LocalAI support for it.
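For anyone curious what that custom parser might involve, here is a rough sketch based purely on my reading of the format described above (`<|tool_call>call:NAME{key:value}<tool_call|>` with `<|"|>` string delimiters). This is hypothetical code, not anything that exists in LocalAI, and the payload-to-JSON normalization is naive (it would mishandle nested braces or colons inside strings).

```python
import json
import re

# Hypothetical parser for the Gemma 4 native tool-call syntax described above.
# Assumed format: <|tool_call>call:NAME{key:value}<tool_call|>,
# with <|"|> acting as the string delimiter inside the payload.
CALL_RE = re.compile(r"<\|tool_call>call:(\w+)\{(.*?)\}<tool_call\|>", re.DOTALL)

def parse_gemma4_calls(completion: str) -> list[dict]:
    calls = []
    for name, body in CALL_RE.findall(completion):
        # Normalize the model's string delimiter to JSON double quotes,
        # then quote the bare keys so json.loads can take it from there.
        body = body.replace('<|"|>', '"')
        body = re.sub(r"(\w+):", r'"\1":', body)
        calls.append({"name": name, "arguments": json.loads("{" + body + "}")})
    return calls

sample = '<|tool_call>call:get_weather{city:<|"|>Berlin<|"|>}<tool_call|>'
print(parse_gemma4_calls(sample))
# → [{'name': 'get_weather', 'arguments': {'city': 'Berlin'}}]
```

Even if something like this worked, it would need to live inside LocalAI's tool-call handling to stop the raw tokens from leaking to the user, which is why I'm not pursuing it for now.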

For now I’m routing tool tasks to Qwen and content tasks to Gemma 4 instead.

Reasoning-distilled models (Qwen3.5-27B-Claude-Distilled, Qwen3.6-35B-distilled). The trace overhead made these unsuitable for routine Assistant tasks even when the underlying answer quality was high. They’re better reserved for explicit “deep think” usage, not as defaults.

Mistral-Small-3.2-24B-Instruct-2506. Works well technically, but the prose style felt clipped/cold for general-purpose family use. Subjective preference, not a capability gap.

A few practical notes that might help others:

LocalAI’s context_size YAML key only takes effect at the top level of the model config — placed under parameters: it’s silently ignored. Cost me a few hours.
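For reference, this is the placement that worked for me. The model name and parameter values below are illustrative, not a drop-in config:

```yaml
# LocalAI model config: context_size must sit at the top level.
name: qwen3-30b-a3b
context_size: 49152          # takes effect here (48K)
parameters:
  model: Qwen3-30B-A3B-Instruct-2507-Q4_K_M.gguf
  temperature: 0.7
  # context_size: 49152     # here it is silently ignored
```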

The Context Agent ships ~11K tokens of tool definitions per request when all integrations are enabled (Cookbook, Forms, OpenStreetMap, Weather, YouTube, LibreSign, Tables, Analytics, etc.). Pruning unused tool integrations from Nextcloud’s AI settings significantly reduces prompt overhead and noise. Worth doing regardless of model choice.

For tool-calling workloads specifically, sticking with models whose tool-call format matches an existing LocalAI parser (Mistral’s [TOOL_CALLS], ChatML’s <tool_call>, Hermes’ bracketed syntax) saves a lot of integration pain. Qwen3-30B’s ChatML format is the cleanest fit I’ve found.