Anchor State Overwrite Issue

TL;DR: Anchor's #[derive(Accounts)] macro deserializes the data of every Account<'a, T> account into an in-memory copy. The instruction handler modifies that copy, and Anchor then writes it back to the account data. If the same account is passed for two such fields, serializing the second field overwrites the changes made through the first, leading to undefined behavior.

Vulnerability

Consider the following vulnerable implementation of a token program written with the Anchor framework. The transfer instruction is vulnerable to state overwrite: a self-transfer increases the caller's balance without any deduction.

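use anchor_lang::prelude::*;

#[account]
#[derive(Debug, Default, InitSpace)]
pub struct TokenHolder {
    pub authority: Pubkey,
    pub balance: u64,
}

#[derive(Accounts)]
pub struct Transfer<'info> {
    // The signer must be the authority recorded on the sender account.
    #[account(address = sender.authority)]
    pub authority: Signer<'info>,

    // Both accounts are marked `mut` so that Anchor writes their in-memory copies back on exit.
    // Nothing here prevents `sender` and `receiver` from being the same account.
    #[account(mut)]
    pub sender: Account<'info, TokenHolder>,
    #[account(mut)]
    pub receiver: Account<'info, TokenHolder>,
}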


#[program]
pub mod vulnerable_token {
    use super::*;

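    pub fn transfer(ctx: Context<Transfer>, amount: u64) -> Result<()> {
        // Fail unless the sender's balance covers the transfer.
        require_gte!(ctx.accounts.sender.balance, amount);

        // Each update touches a separate in-memory copy, even if both fields are the same account.
        ctx.accounts.sender.balance -= amount;
        ctx.accounts.receiver.balance += amount;
        Ok(())
    }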
    
    [...]
}

The vulnerability arises when sender and receiver reference the same account. This issue stems from how Anchor handles deserialization and serialization of account data during instruction execution.

How Anchor Works

At a high level, the code Anchor generates for an instruction works as follows:

  1. Construct Context
    • Anchor creates the Context for the instruction using the Accounts struct. It deserializes AccountInfo::data into T for Account<'a, T> types.
    • For example, if account X is passed for sender, Anchor deserializes X.data into the type TokenHolder and stores it in memory as ctx.accounts.sender.
  2. Call instruction handler
    • The transfer function is called. Any changes made to ctx.accounts.sender or ctx.accounts.receiver are stored in memory.
  3. Serialize Accounts
    • After the instruction handler finishes execution, Anchor serializes the updated accounts and writes them back to AccountInfo::data. The account data is persistent and hence the state changes are saved.
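
The three phases above can be sketched in plain Rust without any Anchor dependency. This is only a simplified model: the byte layout (a 32-byte authority followed by a little-endian u64 balance, with no discriminator) and the names run_instruction, deserialize, and serialize are assumptions made for illustration, not Anchor's generated code.

```rust
/// In-memory view of the account. Hypothetical layout: 32-byte authority,
/// then the balance as a little-endian u64 (no discriminator, for brevity).
#[derive(Debug, Clone, PartialEq)]
pub struct TokenHolder {
    pub authority: [u8; 32],
    pub balance: u64,
}

impl TokenHolder {
    pub const LEN: usize = 32 + 8;

    /// Builds an independent in-memory copy from the persistent bytes.
    pub fn deserialize(data: &[u8]) -> Option<Self> {
        if data.len() < Self::LEN {
            return None;
        }
        let mut authority = [0u8; 32];
        authority.copy_from_slice(&data[..32]);
        let mut balance = [0u8; 8];
        balance.copy_from_slice(&data[32..40]);
        Some(Self { authority, balance: u64::from_le_bytes(balance) })
    }

    /// Writes the in-memory copy back over the persistent bytes.
    /// Panics if `data` is shorter than `LEN`.
    pub fn serialize(&self, data: &mut [u8]) {
        data[..32].copy_from_slice(&self.authority);
        data[32..40].copy_from_slice(&self.balance.to_le_bytes());
    }
}

/// Runs the three phases for a single account; `data` stands in for
/// `AccountInfo::data`.
pub fn run_instruction(
    data: &mut [u8],
    handler: impl FnOnce(&mut TokenHolder),
) -> Option<()> {
    let mut account = TokenHolder::deserialize(data)?; // 1. construct context
    handler(&mut account); // 2. handler mutates only the copy
    account.serialize(data); // 3. write the copy back
    Some(())
}
```

The point of the model is that the handler never touches the persistent bytes directly; its changes become visible only when the copy is written back in the last step.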

Undefined behavior in the transfer Instruction

When sender and receiver are the same account X, the following sequence occurs:

  1. Deserialization
    • a. ctx.accounts.sender = deserialize(X.data)
    • b. ctx.accounts.receiver = deserialize(X.data)
  2. Transfer function
    • a. ctx.accounts.sender.balance -= amount
    • b. ctx.accounts.receiver.balance += amount
  3. Serialize accounts
    • a. X.data = serialize(ctx.accounts.sender)
    • b. X.data = serialize(ctx.accounts.receiver)

During serialization, the second write (3b) overwrites the data stored by the first write (3a). Only the receiver's balance increase is recorded, so a self-transfer raises the caller's balance by amount. Transferring the full balance to oneself doubles it.

The serialization order is determined by the field order in the Accounts struct. Since receiver is declared after sender in the Transfer struct, the receiver copy is written last and its changes are the ones that persist.
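
The sequence can be reproduced with a small stand-alone simulation. It is a sketch under stated assumptions: the ledger slice stands in for the persistent account data, account ids are plain indices, and transfer_vulnerable is a hypothetical name that mirrors the copy, mutate, and write-back order described above rather than Anchor's real code.

```rust
/// Mimics the vulnerable flow on a ledger of balances indexed by a
/// hypothetical account id.
pub fn transfer_vulnerable(
    ledger: &mut [u64],
    sender: usize,
    receiver: usize,
    amount: u64,
) -> Result<(), &'static str> {
    // 1. Deserialization: two independent in-memory copies, even when
    //    `sender == receiver`.
    let mut sender_copy = *ledger.get(sender).ok_or("unknown sender")?;
    let mut receiver_copy = *ledger.get(receiver).ok_or("unknown receiver")?;

    // 2. Handler: the balance check and both updates act on the copies.
    if sender_copy < amount {
        return Err("insufficient balance");
    }
    sender_copy -= amount;
    receiver_copy = receiver_copy
        .checked_add(amount)
        .ok_or("balance overflow")?;

    // 3. Serialization in field order: the receiver write comes last and
    //    wins when both ids point at the same slot.
    ledger[sender] = sender_copy;
    ledger[receiver] = receiver_copy;
    Ok(())
}
```

With a ledger of [100, 50], transferring 40 from account 0 to account 1 yields [60, 90] as expected, while a subsequent self-transfer of 60 on account 0 leaves it holding 120 instead of 60.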

Mitigation

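Ensure that no two mutable accounts of type Account<'a, T> in an instruction can refer to the same account. For the transfer instruction, this means rejecting the call when sender and receiver have the same key, for example with a constraint such as constraint = sender.key() != receiver.key() on one of the fields.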
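
A minimal sketch of the check, using a plain-Rust model rather than Anchor itself: the ledger slice stands in for the persistent account data, account ids are indices, and transfer_checked is a hypothetical name. In an Anchor program the equivalent guard would compare the two account keys, either through a constraint on the Accounts struct or with require_keys_neq! at the top of the handler.

```rust
/// Same copy, mutate, write-back flow as the vulnerable transfer, but
/// aliased accounts are rejected before any copy is made.
pub fn transfer_checked(
    ledger: &mut [u64],
    sender: usize,
    receiver: usize,
    amount: u64,
) -> Result<(), &'static str> {
    // Mitigation: the two fields must never refer to the same account.
    if sender == receiver {
        return Err("sender and receiver must differ");
    }

    // In-memory copies of the two (now guaranteed distinct) accounts.
    let mut sender_copy = *ledger.get(sender).ok_or("unknown sender")?;
    let mut receiver_copy = *ledger.get(receiver).ok_or("unknown receiver")?;

    if sender_copy < amount {
        return Err("insufficient balance");
    }
    sender_copy -= amount;
    receiver_copy = receiver_copy
        .checked_add(amount)
        .ok_or("balance overflow")?;

    // The write-backs target different slots, so neither overwrites the other.
    ledger[sender] = sender_copy;
    ledger[receiver] = receiver_copy;
    Ok(())
}
```

Rejecting the aliased case up front is simpler than trying to merge two copies at write-back time, and it leaves the ledger untouched when the check fails.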

How I Found the Issue

I identified this issue in a Sherlock contest (Issue M-3 in the report). Two key factors helped me identify the bug:

  1. Experience with similar bugs in Solidity: In Solidity, issues often arise when storage variables are copied into memory, updated, and written back. Anchor uses the same pattern.
  2. Understanding Anchor's internals: Knowing how Anchor works internally, and understanding the root cause of the well-known Account Reloading issue.
    • The Account Reloading issue exists because Anchor keeps an in-memory copy and does not deserialize account.data again after a CPI. If the CPI changes account.data, the program keeps using the outdated in-memory copy, leading to undefined behavior.
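
The stale-copy problem behind the Account Reloading issue can be modeled the same way. This is a simplified sketch with hypothetical names (cpi_credit, handler_with_cpi): a single u64 stands in for account.data, and the reload_after_cpi flag models refreshing the in-memory copy after the CPI, which in Anchor is done by calling reload() on the account.

```rust
/// Simulated CPI: another program credits `amount` directly to the
/// persistent data, bypassing the caller's in-memory copy.
fn cpi_credit(storage: &mut u64, amount: u64) {
    *storage += amount;
}

/// Handler that performs a CPI crediting 10 and then adds 1 itself.
/// Returns the value that ends up in persistent storage.
pub fn handler_with_cpi(storage: &mut u64, reload_after_cpi: bool) -> u64 {
    // Copy made when the instruction context is constructed.
    let mut cached = *storage;

    // The CPI changes the underlying data; `cached` is now out of date.
    cpi_credit(storage, 10);

    // Refreshing the copy picks up the CPI's change.
    if reload_after_cpi {
        cached = *storage;
    }

    // The handler's own update is based on whatever `cached` holds.
    cached += 1;

    // Write-back on exit: a stale copy silently discards the CPI's credit.
    *storage = cached;
    *storage
}
```

Starting from 100, the handler stores 101 without the reload (the CPI's credit is lost) and 111 with it.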

Conclusion

Understanding the internals of frameworks like Anchor helps in identifying unique vulnerabilities. Many of these issues stem from unknown or poorly documented "footguns" that other researchers may overlook. In this case, Anchor's model of deserializing accounts into memory and writing them back on exit introduces bugs like the one described.