Bug Raising Guide¶
How to raise clear, actionable bugs in GitHub Projects for OTS gyms.
Where to Raise Bugs¶
All bugs must be raised as issues in a GitHub Project. Each gym has its own project board.
Don't know which project?
Ask your QA Lead for the correct GitHub project for the gym you are testing. They will also tell you who the development lead is so you can assign the ticket.
Setting Up the Ticket¶
When creating a new issue, configure the following fields:
| Field | Value |
|---|---|
| Type | Bug |
| Status | Backlog |
| Priority | P0, P1, or P2 (see Priority Definitions) |
| Assignee | The development lead for the project (ask your QA Lead if unsure) |
| Label | One of: App, Data, or Task & Verifier (see Bug Labels) |
Your ticket fields should look like this (note: set Status to Backlog, not "Ready"):
*(screenshot of the configured ticket fields)*
Required Information¶
Every bug must include the following.
1. Clear Title¶
Write a concise title that describes the problem. A good title lets someone understand the issue without opening it.
Good titles:
- "Verifier passes when item is added to wrong cart category"
- "Contact phone number shows +1 prefix twice on profile page"
- "Send Email button unresponsive after closing compose modal"

Bad titles:
- "Verifier issue"
- "Data bug"
- "Button broken"
2. Description¶
Summarize the problem in a short paragraph. Explain what is happening, where it happens, and why it matters. The reader should understand the impact without looking at the steps yet.
Sample description
Task "Send email to John Smith with subject Meeting Tomorrow" passes verification even when the email is sent with the subject "meeting tomorrow" (lowercase). The assertion `email_subject_matches` does not perform a case-sensitive comparison, so an incorrect subject is accepted as correct. This means the verifier would not catch an agent that produces the wrong casing.
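As a minimal illustration, a case-sensitive check rejects the lowercase subject while a naive case-insensitive one accepts it. The assertion name `email_subject_matches` is taken from the sample above; the implementation below is a hypothetical sketch, not the gym's actual verifier code:

```python
def email_subject_matches(actual: str, expected: str) -> bool:
    """Hypothetical sketch of a subject assertion (not the real verifier).

    A correct check compares subjects case-sensitively, so a subject that
    differs only in casing is rejected rather than accepted.
    """
    return actual == expected


# "meeting tomorrow" differs from "Meeting Tomorrow" only by case,
# so a case-sensitive assertion must fail it.
print(email_subject_matches("meeting tomorrow", "Meeting Tomorrow"))  # False
print(email_subject_matches("Meeting Tomorrow", "Meeting Tomorrow"))  # True
```

A buggy verifier that lowercases both sides before comparing would return True in both cases, which is exactly the behavior the sample bug report describes.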
3. Steps to Reproduce¶
Provide 100% clear steps starting from the initial state of the app (after a state reset). Anyone reading the steps should be able to reproduce the issue without guessing.
Sample steps
- Go to `/verify_raw` and click Reset State
- Refresh the main app
- Navigate to Contacts → John Smith
- Click Send Email
- Enter subject: "meeting tomorrow" (all lowercase, intentionally wrong)
- Enter body: "Looking forward to our meeting."
- Click Send
- Go to `/verify_raw` → select task "Send email to John Smith"
- Run the verifier
Start from initial state
Always write steps starting from a fresh state reset. Do not assume the reader has performed any prior actions.
4. Expected Outcome vs Actual Outcome¶
Clearly state what should happen and what actually happens.
**Expected:** The assertion `email_subject_matches` should fail because the subject "meeting tomorrow" does not match the expected value "Meeting Tomorrow".

**Actual:** The assertion `email_subject_matches` passes. The verifier accepts "meeting tomorrow" as a match for "Meeting Tomorrow", ignoring the case difference.
5. Screenshots and Videos¶
- Screenshots are highly encouraged — they help developers understand the issue faster.
- Videos are especially useful for bugs that involve a sequence of interactions (e.g. multi-step flows, drag-and-drop, timing-dependent behavior).
Record your testing session
If you are testing a complex flow, consider recording your screen from the start. This makes it easy to attach evidence when something goes wrong.
6. Environment Details¶
Include the URL you tested on, the browser, and the branch (e.g. ots_dev, ots_release).
https://ots-dev.forcesales.rlgym.turing.com/ · Chrome 130 · ots_dev
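Putting the six items together, an issue body following this guide might look like the sketch below. The values are taken from the sample bug above; this is a suggested layout, not a mandated template — adapt it to your gym:

```markdown
## Description
Task "Send email to John Smith with subject Meeting Tomorrow" passes
verification even when the subject is sent as "meeting tomorrow" (lowercase).

## Steps to Reproduce
1. Go to /verify_raw and click Reset State
2. Refresh the main app
3. Navigate to Contacts → John Smith, click Send Email
4. Enter subject "meeting tomorrow" (intentionally wrong) and send
5. Go to /verify_raw, select task "Send email to John Smith", run the verifier

## Expected Outcome
The assertion email_subject_matches fails on the lowercase subject.

## Actual Outcome
The assertion email_subject_matches passes, ignoring the case difference.

## Environment
https://ots-dev.forcesales.rlgym.turing.com/ · Chrome 130 · ots_dev
```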
Priority Definitions¶
App¶
P0 — Critical
Severe UI/system issues that block workflows, cause crashes, introduce security risks, or significantly degrade user experience.
Examples
- Landing page UI issues
- Inability to perform tasks due to UI
- Security vulnerabilities (unauthorized access)
- Data loss after submission
- Broken primary features
- Timezone inconsistencies
- Time-based feature failures
- App crashes
- Dead ends/errors (404, 500, 403, etc.)
- Severe UI overlap issues
- Slow response (>3 seconds)
- Local storage persistence issues
- UI blocking user progress
P1 — High
UI issues that affect non-critical features but do not block main workflows.
Examples
- Broken UI not affecting primary workflows
P2 — Medium/Low
Minor UI issues with low impact on usability.
Examples
- Any other UI issues
Task¶
P0 — Critical
Issues that make tasks unusable, unclear, logically incorrect, or time-sensitive to failure.
Examples
- Duplicate tasks
- Incorrect or illogical prompts
- Missing required data
- Ambiguous prompts
- Tasks becoming invalid within a month
P1 — High
Redundancy and lack of diversity in tasks that reduce overall effectiveness.
Examples
- Repetitive/similar tasks using same flows
P2 — Medium/Low
Minor task-related issues with limited impact on execution.
Examples
- Any other task-related issues
Verifier¶
P0 — Critical
Issues that make verification unreliable, incorrect, or inconsistent, leading to invalid results.
Examples
- Verifier failing correct tasks
- Overly generic assertions
- Missing/incorrect assertions
- Incorrect initial state handling (localStorage)
- No negative assertions
- Missing RD operators or incorrect thresholds
- Invalid spec objects
- Verifier instability over time
- Inconsistent results across runs
P1 — High
Issues that reduce the strength and precision of verification logic.
Examples
- Assertion ordering issues
- Weak negative assertions
P2 — Medium/Low
Minor verifier issues with limited impact on overall accuracy.
Examples
- Any other verifier issues
Data¶
P0 — Critical
Critical data issues that break functionality, cause crashes, create confusion, or prevent proper usage.
Examples
- Issues in landing page data
- Duplicate records
- Malformed data causing crashes
- Images not loading
- Data mismatch with context
- Lack of data diversity
- Insufficient data volume
P1 — High
Data quality issues affecting realism and freshness but not immediately breaking the system.
Examples
- Timeliness issues
- Unrealistic data (e.g., incorrect brand-category mapping)
P2 — Medium/Low
Lower-impact or edge-case data issues not covered in higher priorities.
Examples
- Any other data-related issues
Bug Labels¶
Each bug must have exactly one of the following labels. This determines which team triages the issue.
**App**: Use for problems in the gym application itself — the UI, functionality, or behavior.
- Broken layout, missing elements, wrong styling
- Buttons or links that do not work
- Features that behave differently than expected
- Performance issues
**Data**: Use for problems with the data displayed in the app — wrong values, missing records, or incorrect content.
- Wrong phone numbers, emails, or names
- Missing records that should be present
- Incorrect prices, dates, or quantities
- Data that does not match the prompt requirements
**Task & Verifier**: Use for problems with the task definition or the verifier logic, not the app itself.
- An assertion fails when the task was performed correctly
- An assertion passes when the task was performed incorrectly
- A task is tagged with the wrong type (e.g. NRD instead of Hybrid)
- The prompt description is ambiguous or contradicts the assertions
One Issue, One Label¶
Keep issues atomic
Each bug ticket should describe one problem and carry one label. Do not combine multiple unrelated issues into a single ticket.
**Good:**

| Ticket | Label |
|---|---|
| "Cart total wrong" | Data |
| "Send button misaligned" | App |
| "Task 5 verifier passes on wrong state" | Task & Verifier |
**Bad:**

| Ticket | Labels |
|---|---|
| "Cart total wrong + send button misaligned + verifier fails on task 5" | Data, App, Task & Verifier |
If you discover multiple issues during a testing session, create a separate ticket for each one. This makes triage faster and ensures nothing gets lost.