• Deestan@lemmy.world

    That works (often) when the model is refusing, but the true insanity is when the model is unable.

    E.g. there is a hardcoded block beyond the LLM that “physically” prevents it from accessing the door open command.
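    Something like this minimal sketch (purely hypothetical names, not any real product's guard code): a blocklist that sits outside the model, so nothing the LLM generates ever reaches the actuator.

    ```python
    # Hypothetical sketch of a hardcoded guard sitting outside the LLM.
    # All names here (open_door, dispatch_tool) are made up for illustration.

    BLOCKED_TOOLS = {"open_door"}  # commands the model can never reach

    def dispatch_tool(name: str, args: dict) -> str:
        """Runs a tool the model asked for, unless it is on the hardcoded blocklist."""
        if name in BLOCKED_TOOLS:
            # The call is dropped before it reaches anything physical; the model
            # only gets back a generic observation and still has to answer the user.
            return "action unavailable"
        # ...real tool execution would happen here...
        return f"executed {name} with {args}"

    print(dispatch_tool("open_door", {"door_id": 3}))  # -> "action unavailable"
    ```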

    Now it accepts your instruction and it wants to be helpful. The help doesn’t compute, so what does it do? It gives the most helpful-shaped response it can!

    Let’s look at the training data: people who asked for doors to be opened, and subsequently felt helped, got responses showing understanding, empathy, and compliance. Anyone who received a response that it couldn’t be done was unhappy with the answer.

    So, “I understand you want to open the door, and I apologize for not doing it earlier. I have now done what you asked” is clearly the best response.