Stable Latent-State Fitting for Long Symbolic Sequences
- Task ID: computer_science.stable_latent_state_fitting
- Domain: computer_science
- Subdomain: machine_learning_sequence_models
- Status: test
- Tags: latent_state_models, sequence_modeling, numerical_stability, categorical_data, expectation_maximization, posterior_diagnostics
Public Summary
This page is generated from task metadata and selected public-safe excerpts.
Example B1 Prompt Excerpt
You are given long categorical observation sequences in `data/sequences.csv` and schema information in `data/metadata.json`.
The data were generated from a discrete-emission hidden Markov model with an unknown number of hidden states and `{{ n_symbols }}` possible observed symbols. Write `analysis.py` that fits this model family, estimates the latent-state count from the observations, computes stable posterior quantities for these long streams, and regenerates all outputs under `results/`.
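The excerpt does not say how the latent-state count should be estimated. One common approach, assumed here and not stated by the task, is to fit one model per candidate K (for example with EM) and keep the K that minimizes the Bayesian information criterion, BIC = -2 log L + p log N. The sketch below covers only that bookkeeping. `hmm_free_parameters`, `bic`, and `select_n_states` are hypothetical helper names, the initial-state distribution is assumed to be shared across sequences, and N is assumed to be the total number of observed symbols.

```python
import math
from typing import Dict


def hmm_free_parameters(n_states: int, n_symbols: int) -> int:
    """Free parameters of a discrete-emission HMM with K states and M symbols."""
    K, M = n_states, n_symbols
    initial = K - 1            # initial distribution pi sums to 1
    transitions = K * (K - 1)  # each row of the transition matrix A sums to 1
    emissions = K * (M - 1)    # each row of the emission matrix B sums to 1
    return initial + transitions + emissions


def bic(log_likelihood: float, n_states: int, n_symbols: int,
        n_observations: int) -> float:
    """BIC = -2 log L + p log N; lower values are preferred."""
    p = hmm_free_parameters(n_states, n_symbols)
    return -2.0 * log_likelihood + p * math.log(n_observations)


def select_n_states(fitted_loglik: Dict[int, float], n_symbols: int,
                    n_observations: int) -> int:
    """Return the candidate K with the smallest BIC.

    fitted_loglik maps each candidate K to the total log-likelihood of the
    best fit found with K states (assumed to come from a separate EM run).
    """
    return min(
        fitted_loglik,
        key=lambda k: bic(fitted_loglik[k], k, n_symbols, n_observations),
    )
```

For example, candidate log-likelihoods `{1: -1000.0, 2: -900.0, 3: -898.0}` over 1,000 symbols from a 3-symbol alphabet select K = 2: the third state gains only 2 nats of likelihood but adds 7 free parameters. AIC or held-out likelihood would be equally reasonable criteria.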
Use scaled forward-backward recursions. The same scaling constants must be used consistently for alpha, beta, gamma, xi, and the log-likelihood. A stable convention is:
- forward: normalize each alpha row by `c_t = sum_j alpha_raw[t, j]`;
- log-likelihood: `sum_t log(c_t)` for each sequence;
- backward: divide the beta recursion at step `t` by the next forward scaling constant `c_{t+1}`;
- posterior marginals: `gamma_t(i) proportional to alpha_t(i) beta_t(i)`;
- adjacent-pair posteriors: `xi_t(i,j) proportional to alpha_t(i) A[i,j] B[j, x_{t+1}] beta_{t+1}(j)`.
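The convention in the list above can be sketched in pure Python for a single sequence (standard library only, so it runs without NumPy). The function name `scaled_forward_backward` and the argument layout (`pi`, `A`, `B`, and `obs` as integer symbol indices) are illustrative assumptions, not part of the task. The sketch also assumes every observed symbol has nonzero probability under the model, so each `c_t` is positive.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def scaled_forward_backward(
    pi: Sequence[float], A: Matrix, B: Matrix, obs: Sequence[int]
) -> Tuple[Matrix, Matrix, List[float], Matrix, List[Matrix], float]:
    """Scaled forward-backward for one sequence.

    pi[i]: initial probability of state i; A[i][j]: transition i -> j;
    B[j][k]: probability that state j emits symbol k.
    """
    T, K = len(obs), len(pi)
    if T == 0:
        raise ValueError("empty sequence")
    alpha = [[0.0] * K for _ in range(T)]
    beta = [[0.0] * K for _ in range(T)]
    c = [0.0] * T

    # Forward pass: build alpha_raw[t], then normalize the row by c_t.
    for t in range(T):
        for j in range(K):
            if t == 0:
                prev = pi[j]
            else:
                prev = sum(alpha[t - 1][i] * A[i][j] for i in range(K))
            alpha[t][j] = prev * B[j][obs[t]]
        c[t] = sum(alpha[t])
        alpha[t] = [a / c[t] for a in alpha[t]]

    # Log-likelihood is the sum of the log scaling constants.
    log_lik = sum(math.log(ct) for ct in c)

    # Backward pass: beta_{T-1} = 1; step t is divided by c_{t+1}.
    beta[T - 1] = [1.0] * K
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(
                A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(K)
            ) / c[t + 1]

    # With these scalings, alpha_t * beta_t is already normalized.
    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]

    # xi_t(i, j) = alpha_t(i) A[i][j] B[j][x_{t+1}] beta_{t+1}(j) / c_{t+1}.
    xi = [
        [
            [alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / c[t + 1]
             for j in range(K)]
            for i in range(K)
        ]
        for t in range(T - 1)
    ]
    return alpha, beta, c, gamma, xi, log_lik
```

Under this convention each `gamma_t` sums to one without renormalization, each `xi_t` sums to one, and summing `xi_t(i, j)` over `j` recovers `gamma_t(i)`; these identities make cheap posterior diagnostics. The `gamma` and `xi` arrays are the expected counts a Baum-Welch M-step would aggregate across sequences, and a real `analysis.py` would likely vectorize these loops.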
Notes
- This page is a generated site artifact.
- Higher-level prompt details and internal benchmark specifics may remain intentionally undisclosed.