Stable Latent-State Fitting for Long Symbolic Sequences

Task ID: computer_science.stable_latent_state_fitting
Domain: computer_science
Subdomain: machine_learning_sequence_models
Status: test
Tags: latent_state_models, sequence_modeling, numerical_stability, categorical_data, expectation_maximization, posterior_diagnostics

Public Summary

This page is generated from task metadata and selected public-safe excerpts.

Example B1 Prompt Excerpt

You are given long categorical observation sequences in `data/sequences.csv` and schema information in `data/metadata.json`.
The data were generated from a discrete-emission hidden Markov model with an unknown number of hidden states and `{{ n_symbols }}` possible observed symbols. Write `analysis.py` that fits this model family, estimates the latent-state count from the observations, computes stable posterior quantities for these long streams, and regenerates all outputs under `results/`.
Use scaled forward-backward recursions. The same scaling constants must be used consistently for alpha, beta, gamma, xi, and the log-likelihood. A stable convention is:
- forward: normalize each alpha row by `c_t = sum_j alpha_raw[t, j]`;
- log-likelihood: `sum_t log(c_t)` for each sequence;
- backward: divide the beta recursion at step `t` by the next forward scaling constant `c_{t+1}`;
- posterior marginals: `gamma_t(i) proportional to alpha_t(i) beta_t(i)`;
- adjacent-pair posteriors: `xi_t(i,j) proportional to alpha_t(i) A[i,j] B[j, x_{t+1}] beta_{t+1}(j)`.
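
The convention above can be sketched for a single sequence as follows. This is a minimal illustration, not the reference solution for the task: the function name `scaled_forward_backward`, the argument names `pi`, `A`, `B`, `x`, and the array layouts (`A[i, j]` for transitions, `B[j, symbol]` for emissions) are assumptions made for this example. It also assumes every observed symbol has nonzero probability under at least one reachable state, so that no `c_t` is zero.

```python
import numpy as np


def scaled_forward_backward(pi, A, B, x):
    """Scaled forward-backward pass for one discrete-emission HMM sequence.

    pi: (K,) initial state distribution (hypothetical argument name)
    A:  (K, K) transition matrix, A[i, j] = P(z_{t+1}=j | z_t=i)
    B:  (K, M) emission matrix, B[j, s] = P(x_t=s | z_t=j)
    x:  (T,) integer symbol indices in [0, M)
    Returns alpha, beta, gamma, xi, and the sequence log-likelihood.
    """
    x = np.asarray(x)
    T, K = len(x), len(pi)
    alpha = np.empty((T, K))
    beta = np.empty((T, K))
    c = np.empty(T)

    # Forward: normalize each alpha row by c_t = sum_j alpha_raw[t, j].
    raw = pi * B[:, x[0]]
    c[0] = raw.sum()
    alpha[0] = raw / c[0]
    for t in range(1, T):
        raw = (alpha[t - 1] @ A) * B[:, x[t]]
        c[t] = raw.sum()
        alpha[t] = raw / c[t]

    # Backward: divide the step-t recursion by the next forward constant c_{t+1}.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, x[t + 1]] * beta[t + 1])) / c[t + 1]

    # Posterior marginals: gamma_t(i) proportional to alpha_t(i) beta_t(i).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Pair posteriors: xi_t(i, j) proportional to
    # alpha_t(i) A[i, j] B[j, x_{t+1}] beta_{t+1}(j).
    emit_beta = B[:, x[1:]].T * beta[1:]            # shape (T-1, K)
    xi = alpha[:-1, :, None] * A[None, :, :] * emit_beta[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)

    # Log-likelihood from the same scaling constants: sum_t log(c_t).
    loglik = float(np.log(c).sum())
    return alpha, beta, gamma, xi, loglik
```

Because `beta` is divided by the same `c_{t+1}` that scaled `alpha`, the product `alpha * beta` already sums to one per time step in exact arithmetic; the explicit renormalization only absorbs floating-point drift. A useful consistency check is that `xi.sum(axis=2)` matches `gamma[:-1]` and `xi.sum(axis=1)` matches `gamma[1:]`.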

Notes

This page is a generated site artifact.
Higher-level prompt details and internal benchmark specifics may remain intentionally undisclosed.