Bug Raising Guide

How to raise clear, actionable bugs in GitHub Projects for OTS gyms.


Where to Raise Bugs

All bugs must be raised as issues in a GitHub Project. Each gym has its own project board.

Don't know which project?

Ask your QA Lead for the correct GitHub project for the gym you are testing. They will also tell you who the development lead is so you can assign the ticket.


Setting Up the Ticket

When creating a new issue, configure the following fields:

  • Type: Bug
  • Status: Backlog
  • Priority: P0, P1, or P2 (see Priority Definitions)
  • Assignee: the development lead for the project (ask your QA Lead if unsure)
  • Label: one of App, Data, or Task & Verifier (see Bug Labels)

Your ticket fields should look like this (note: set Status to Backlog, not "Ready"):

(Screenshot: GitHub issue fields)


Required Information

Every bug must include the following.

1. Clear Title

Write a concise title that describes the problem. A good title lets someone understand the issue without opening it.

  • "Verifier passes when item is added to wrong cart category"
  • "Contact phone number shows +1 prefix twice on profile page"
  • "Send Email button unresponsive after closing compose modal"
  • "Verifier issue"
  • "Data bug"
  • "Button broken"

2. Description

Summarize the problem in a short paragraph. Explain what is happening, where it happens, and why it matters. The reader should understand the impact without looking at the steps yet.

Sample description

Task "Send email to John Smith with subject Meeting Tomorrow" passes verification even when the email is sent with the subject "meeting tomorrow" (lowercase). The assertion email_subject_matches does not perform a case-sensitive comparison, so an incorrect subject is accepted as correct. This means the verifier would not catch an agent that produces the wrong casing.

3. Steps to Reproduce

Provide complete, unambiguous steps starting from the initial state of the app (after a state reset). Anyone reading the steps should be able to reproduce the issue without guessing.

Sample steps

  1. Go to /verify_raw and click Reset State
  2. Refresh the main app
  3. Navigate to Contacts → John Smith
  4. Click Send Email
  5. Enter subject: "meeting tomorrow" (all lowercase, intentionally wrong)
  6. Enter body: "Looking forward to our meeting."
  7. Click Send
  8. Go to /verify_raw → select task "Send email to John Smith"
  9. Run the verifier

Start from initial state

Always write steps starting from a fresh state reset. Do not assume the reader has performed any prior actions.

4. Expected Outcome vs Actual Outcome

Clearly state what should happen and what actually happens.

Expected: The assertion email_subject_matches should fail because the subject "meeting tomorrow" does not match the expected value "Meeting Tomorrow".

Actual: The assertion email_subject_matches passes. The verifier accepts "meeting tomorrow" as a match for "Meeting Tomorrow", ignoring the case difference.

5. Screenshots and Videos

  • Screenshots are highly encouraged — they help developers understand the issue faster.
  • Videos are especially useful for bugs that involve a sequence of interactions (e.g. multi-step flows, drag-and-drop, timing-dependent behavior).

Record your testing session

If you are testing a complex flow, consider recording your screen from the start. This makes it easy to attach evidence when something goes wrong.

6. Environment Details

Include the URL you tested on, the browser, and the branch (e.g. ots_dev, ots_release).

https://ots-dev.forcesales.rlgym.turing.com/ · Chrome 130 · ots_dev


Priority Definitions

Priorities are defined separately for each bug area. The sections below cover App, Task, Verifier, and Data issues in turn.

App Bugs

P0 — Critical

Severe UI/system issues that block workflows, cause crashes, introduce security risks, or significantly degrade user experience.

Examples
  • Landing page UI issues
  • Inability to perform tasks due to UI
  • Security vulnerabilities (unauthorized access)
  • Data loss after submission
  • Broken primary features
  • Timezone inconsistencies
  • Time-based feature failures
  • App crashes
  • Dead ends/errors (404, 500, 403, etc.)
  • Severe UI overlap issues
  • Slow response (>3 seconds)
  • Local storage persistence issues
  • UI blocking user progress

P1 — High

UI issues that affect non-critical features but do not block main workflows.

Examples
  • Broken UI not affecting primary workflows

P2 — Medium/Low

Minor UI issues with low impact on usability.

Examples
  • Any other UI issues

Task Bugs

P0 — Critical

Issues that make tasks unusable, unclear, logically incorrect, or time-sensitive to failure.

Examples
  • Duplicate tasks
  • Incorrect or illogical prompts
  • Missing required data
  • Ambiguous prompts
  • Tasks becoming invalid within a month

P1 — High

Redundancy and lack of diversity in tasks that reduce overall effectiveness.

Examples
  • Repetitive/similar tasks using same flows

P2 — Medium/Low

Minor task-related issues with limited impact on execution.

Examples
  • Any other task-related issues

Verifier Bugs

P0 — Critical

Issues that make verification unreliable, incorrect, or inconsistent, leading to invalid results.

Examples
  • Verifier failing correct tasks
  • Overly generic assertions
  • Missing/incorrect assertions
  • Incorrect initial state handling (localStorage)
  • No negative assertions
  • Missing RD operators or incorrect thresholds
  • Invalid spec objects
  • Verifier instability over time
  • Inconsistent results across runs
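One P0 item above, "No negative assertions", benefits from a concrete illustration. A negative assertion checks that a forbidden state did not occur, so the verifier cannot pass on the wrong outcome. The sketch below is hypothetical Python; the state shape, cart names, and verify helper are invented for illustration:

```python
# Hypothetical sketch: positive vs. negative assertions for a task
# like "add item X to the Groceries cart". All names are illustrative.

def verify(state: dict) -> bool:
    carts = state["carts"]
    # Positive assertion: the item ended up in the correct cart.
    in_groceries = "item X" in carts.get("Groceries", [])
    # Negative assertion: the item was NOT also added anywhere else.
    # Without this check, an agent that adds the item to every cart
    # would still pass verification.
    elsewhere = any(
        "item X" in items
        for cart, items in carts.items()
        if cart != "Groceries"
    )
    return in_groceries and not elsewhere

correct = {"carts": {"Groceries": ["item X"], "Electronics": []}}
sloppy = {"carts": {"Groceries": ["item X"], "Electronics": ["item X"]}}
print(verify(correct))  # True
print(verify(sloppy))   # False — caught only by the negative assertion
```

A verifier with only the positive check would mark both states as passing, which is exactly the class of P0 issue this category covers.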

P1 — High

Issues that reduce the strength and precision of verification logic.

Examples
  • Assertion ordering issues
  • Weak negative assertions

P2 — Medium/Low

Minor verifier issues with limited impact on overall accuracy.

Examples
  • Any other verifier issues

Data Bugs

P0 — Critical

Critical data issues that break functionality, cause crashes, create confusion, or prevent proper usage.

Examples
  • Issues in landing page data
  • Duplicate records
  • Malformed data causing crashes
  • Images not loading
  • Data mismatch with context
  • Lack of data diversity
  • Insufficient data volume

P1 — High

Data quality issues affecting realism and freshness but not immediately breaking the system.

Examples
  • Timeliness issues
  • Unrealistic data (e.g., incorrect brand-category mapping)

P2 — Medium/Low

Lower-impact or edge-case data issues not covered in higher priorities.

Examples
  • Any other data-related issues

Bug Labels

Each bug must have exactly one of the following labels. This determines which team triages the issue.

App

Use for problems in the gym application itself — the UI, functionality, or behavior.

  • Broken layout, missing elements, wrong styling
  • Buttons or links that do not work
  • Features that behave differently than expected
  • Performance issues

Data

Use for problems with the data displayed in the app — wrong values, missing records, or incorrect content.

  • Wrong phone numbers, emails, or names
  • Missing records that should be present
  • Incorrect prices, dates, or quantities
  • Data that does not match the prompt requirements

Task & Verifier

Use for problems with the task definition or the verifier logic, not the app itself.

  • An assertion fails when the task was performed correctly
  • An assertion passes when the task was performed incorrectly
  • A task is tagged with the wrong type (e.g. NRD instead of Hybrid)
  • The prompt description is ambiguous or contradicts the assertions

One Issue, One Label

Keep issues atomic

Each bug ticket should describe one problem and carry one label. Do not combine multiple unrelated issues into a single ticket.

Good — one problem, one label:

  • "Cart total wrong" → Data
  • "Send button misaligned" → App
  • "Task 5 verifier passes on wrong state" → Task & Verifier

Avoid — several problems crammed into one ticket:

  • "Cart total wrong + send button misaligned + verifier fails on task 5" → Data, App, Task & Verifier

If you discover multiple issues during a testing session, create a separate ticket for each one. This makes triage faster and ensures nothing gets lost.