Small experiment with LLMR

Overview

Comparative studies ask whether responses change across models, conditions, and repeated draws. This example covers three model configurations and two task conditions, runs the factorial design in parallel, and compares unstructured with schema-structured results.

Broader designs are supported by several facilities that this example does not cover: repetitions for repeated runs, call_llm_compare() for direct comparisons, call_llm_sweep() for parameter sweeps, llm_par_resume() for recovering interrupted runs, llm_failures() for diagnostics, and llm_log_enable() for call logging. Live execution requires DEEPSEEK_API_KEY, GROQ_API_KEY, and LLMR_RUN_VIGNETTES=true in the R environment.
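The environment requirement above can be checked before any live call is made. The following is a minimal sketch in base R; the helper name can_run_live() is hypothetical and not part of LLMR, and the case-insensitive comparison against "true" is an assumption about how the flag is read.

```r
# Hypothetical helper (not an LLMR function): TRUE only when both API keys
# are non-empty and live runs have been switched on.
can_run_live <- function() {
  keys <- Sys.getenv(c("DEEPSEEK_API_KEY", "GROQ_API_KEY"))
  keys_set <- all(nzchar(keys))
  # Assumption: the flag is compared case-insensitively against "true".
  opted_in <- identical(tolower(Sys.getenv("LLMR_RUN_VIGNETTES")), "true")
  keys_set && opted_in
}

if (!can_run_live()) {
  message("Live calls are disabled; set the API keys and LLMR_RUN_VIGNETTES=true.")
}
```

A guard like this lets the rest of a script skip network calls cleanly instead of failing on the first request.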

library(LLMR)
library(dplyr)

# One configuration per model under comparison: (provider, model name)
cfg_ds     <- llm_config("deepseek", "deepseek-chat")
cfg_groq1  <- llm_config("groq",     "llama-3.1-8b-instant")
cfg_groq   <- llm_config("groq",     "openai/gpt-oss-20b")

Build a factorial design

experiments <- build_factorial_experiments(
  configs       = list(cfg_ds, cfg_groq1, cfg_groq),
  user_prompts  = c("Summarize in one sentence: The Apollo program.",
                    "List two benefits of green tea."),
  system_prompts = c("Be concise.")
)
experiments
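To make the size of the design concrete, the crossing can be reproduced with base R. This is only an illustration of what a full factorial design means, not the internals of build_factorial_experiments(); the column names and the use of plain model-name strings in place of config objects are assumptions made for the sketch.

```r
# Plain strings stand in for the three config objects (illustrative only).
models <- c("deepseek-chat", "llama-3.1-8b-instant", "openai/gpt-oss-20b")
user_prompts <- c("Summarize in one sentence: The Apollo program.",
                  "List two benefits of green tea.")
system_prompts <- c("Be concise.")

# expand.grid() forms every combination of the supplied factors.
design <- expand.grid(
  model         = models,
  user_prompt   = user_prompts,
  system_prompt = system_prompts,
  stringsAsFactors = FALSE
)

# 3 models x 2 user prompts x 1 system prompt
n_cells <- nrow(design)
print(n_cells)
```

Each row corresponds to one call, so the parallel run in the next step issues one request per row of the design.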

Run unstructured

# Start the parallel workers, run every row of the design, then release the workers
setup_llm_parallel(workers = 10)
res_unstructured <- call_llm_par(experiments, progress = TRUE)
reset_llm_parallel()
res_unstructured |>
  select(provider, model, user_prompt_label, response_text, finish_reason) |>
  head()

Understanding the results:

The finish_reason column shows why each response ended:

"stop": the model completed normally
"length": the token limit was reached (increase max_tokens)
"filter": a content filter was triggered

The user_prompt_label column identifies which experimental condition produced each response.
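A quick way to act on finish_reason is to tabulate it by model and pull out the truncated rows. The sketch below uses a small mock data frame in place of res_unstructured; its values are invented for illustration and are not results from any actual run.

```r
# Mock stand-in for res_unstructured (invented values, illustration only).
mock_res <- data.frame(
  model = rep(c("deepseek-chat", "llama-3.1-8b-instant", "openai/gpt-oss-20b"),
              each = 2),
  finish_reason = c("stop", "stop", "stop", "length", "stop", "stop"),
  stringsAsFactors = FALSE
)

# Cross-tabulate models against the reason each response ended.
reason_counts <- table(mock_res$model, mock_res$finish_reason)
print(reason_counts)

# Rows that hit the token limit are candidates for a rerun with a larger max_tokens.
truncated <- mock_res[mock_res$finish_reason == "length", ]
print(truncated)
```

The same two steps apply unchanged to the real results table, since they rely only on the model and finish_reason columns selected above.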

Structured version

# JSON Schema: an object with a string `answer` and a string array `keywords`
schema <- list(
  type = "object",
  properties = list(
    answer   = list(type = "string"),
    keywords = list(type = "array", items = list(type = "string"))
  ),
  required = list("answer", "keywords"),
  additionalProperties = FALSE
)

experiments2 <- experiments
experiments2$config <- lapply(experiments2$config, enable_structured_output, schema = schema)

setup_llm_parallel(workers = 10)
res_structured <- call_llm_par_structured(experiments2, .fields = c("answer", "keywords"))
reset_llm_parallel()

res_structured |>
  select(provider, model, user_prompt_label, structured_ok, answer) |>
  head()
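One way to compare models on the structured run is the share of responses that parsed successfully. The sketch below assumes that structured_ok is a logical column marking rows whose response parsed into the requested fields; it uses a mock data frame with invented values rather than res_structured itself.

```r
# Mock stand-in for res_structured (invented values, illustration only).
mock_structured <- data.frame(
  model = rep(c("deepseek-chat", "llama-3.1-8b-instant", "openai/gpt-oss-20b"),
              each = 2),
  structured_ok = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the proportion of TRUE values,
# so this gives the parse success rate per model.
ok_rate <- tapply(mock_structured$structured_ok, mock_structured$model, mean)
print(ok_rate)
```

With only two prompts per model the rates are coarse; more prompts or repeated runs would be needed before reading much into differences between models.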