Commit graph

13 commits

Author SHA1 Message Date
buzz-lightsnack-2007
39516acf9a add: output results for various models and datasets 2024-12-07 23:45:28 +08:00
buzz-lightsnack-2007
a028413832 update answer format prompt for clarity and brevity 2024-12-07 23:45:12 +08:00
buzz-lightsnack-2007
27e6dd6653 add: implement results analysis and grading functionality 2024-12-07 23:43:42 +08:00
buzz-lightsnack-2007
9fd769e5b9 add: testing results 2024-12-07 21:39:16 +08:00
    These are large files and will need further processing.
buzz-lightsnack-2007
f5c6380b77 add: testing program 2024-12-07 21:37:30 +08:00
    This script contains prompt generation and LLM testing.
buzz-lightsnack-2007
c9efe75b77 update prompts to make clear the usage of chain-of-thought 2024-12-05 23:52:41 +08:00
buzz-lightsnack-2007
1c03a7eeaf add outputs for descriptions tests 2024-12-04 13:40:09 +08:00
buzz-lightsnack-2007
b5970cac26 move models and prompts to testing config folder 2024-12-04 13:08:55 +08:00
buzz-lightsnack-2007
3b94f13adc add testing for description 2024-12-04 13:03:02 +08:00
buzz-lightsnack-2007
05b9abe3a6 force LLM to output the results properly 2024-08-30 23:32:27 +08:00
buzz-lightsnack-2007
ab77659f14 make "answer format" prompt more specific 2024-08-25 12:50:17 +00:00
buzz-lightsnack-2007
67d11cd9cc add answer format prompt 2024-08-24 06:11:48 +00:00
buzz-lightsnack-2007
a0b15a48de add LLM prompts 2024-08-24 05:38:52 +00:00