RLHF_API_KEY from your environment. After buying Pro, you receive this key via email. For local testing, set RLHF_API_KEY in your .env file.
Direct Preference Optimization (DPO) is a technique for fine-tuning LLMs on human preference data. Your 👍/👎 feedback is converted into training pairs of preferred and rejected responses.
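As a rough sketch of that conversion, the snippet below groups feedback records by prompt and emits pairs in the `prompt`/`chosen`/`rejected` format that common DPO trainers expect. The record schema (`prompt`, `response`, `rating`) and the function name are assumptions for illustration, not the product's actual API:

```python
import json

def feedback_to_dpo_pairs(records):
    """Turn 👍/👎 feedback into DPO preference pairs.

    Assumed (hypothetical) record schema:
    {"prompt": str, "response": str, "rating": "up" | "down"}
    """
    # Bucket responses for each prompt by rating.
    by_prompt = {}
    for r in records:
        buckets = by_prompt.setdefault(r["prompt"], {"up": [], "down": []})
        buckets[r["rating"]].append(r["response"])

    # A pair needs one preferred and one rejected response to the same prompt.
    pairs = []
    for prompt, buckets in by_prompt.items():
        for chosen in buckets["up"]:
            for rejected in buckets["down"]:
                pairs.append(
                    {"prompt": prompt, "chosen": chosen, "rejected": rejected}
                )
    return pairs

records = [
    {"prompt": "Summarize the ticket.", "response": "Concise summary.", "rating": "up"},
    {"prompt": "Summarize the ticket.", "response": "Rambling answer.", "rating": "down"},
]
for pair in feedback_to_dpo_pairs(records):
    print(json.dumps(pair))
```

Writing the pairs out as JSONL like this keeps them loadable by most fine-tuning toolchains without further conversion.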
Use these pairs to fine-tune any model (OpenAI, Llama, Mistral) so it actually learns from your corrections: rather than merely having its mistakes blocked, it stops making them.