You have 1 DAI.
Using a wallet's UI (like Metamask), you click enough buttons and fill enough text inputs to say that you're sending 1 DAI to 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045 (that's vitalik.eth).
And hit send.
After some time the wallet says the transaction's been confirmed. All of a sudden, Vitalik is now 1 DAI richer. WTF just happened?
Let's rewind. And replay in slow motion.
Ready?
Building the transaction
Wallets are pieces of software that facilitate sending transactions to the Ethereum network.
A transaction is just a way to tell the Ethereum network that you, as a user, want to execute an action. In this case that'd be sending 1 DAI to Vitalik. And a wallet (e.g., Metamask) helps build such a transaction in a relatively beginner-friendly way.
Let's first go over the transaction that a wallet would build. It can be represented as an object with fields and their corresponding values.
Ours will start looking like this:
{
"to": "0x6b175474e89094c44da98b954eedeac495271d0f",
// [...]
}
Where the field to states the target address. In this case, 0x6b175474e89094c44da98b954eedeac495271d0f is the address of the DAI smart contract.
Wait, what?
Weren't we supposed to be sending 1 DAI to Vitalik? Shouldn't the to field be Vitalik's address?
Well, no. To send DAI, one must essentially craft a transaction that executes a piece of code stored in the blockchain (the fancy name for Ethereum's database) that will update the recorded balances of DAI. Both the logic and the related storage needed to execute such an update are held in an immutable and public computer program stored in Ethereum's database: the DAI smart contract.
Hence, you want to build a transaction that tells the contract "hey buddy, update your internal balances, taking 1 DAI out of my balance, and adding 1 DAI to Vitalik's balance". In Ethereum jargon, the phrase "hey buddy" translates to setting DAI's address in the to field of the transaction.
However, the to field is not enough. From the information you provide in your favorite wallet's UI, the wallet fills up several other fields to build a well-formatted transaction.
{
"to": "0x6b175474e89094c44da98b954eedeac495271d0f",
"amount": 0,
"chainId": 31337,
"nonce": 0,
// [...]
}
It fills the amount field with a 0. So you're sending 1 DAI to Vitalik, and you neither use Vitalik's address nor put a 1 in the amount field. That's how tough life is (and we're just warming up). The amount field is actually included in a transaction to specify how much ETH (the native currency of Ethereum) you're sending along with your transaction. Since you don't want to send ETH right now, the wallet correctly sets that field to 0.
As for the chainId, it is a field that specifies the chain where the transaction is to be executed. For Ethereum Mainnet, that is 1. However, since I will be running this experiment on a local copy of mainnet, I will use its chain ID: 31337. Other chains have other identifiers.
What about the nonce field? That's a number that should be increased every time you send a transaction to the network. It acts as a defense mechanism against replay attacks. Wallets usually set it for you. To do so, they query the network for the latest nonce your account used, and then set the current transaction's nonce accordingly. In the example above it's set to 0, though in reality it will depend on the number of transactions your account has executed.
I just said that the wallet "queries the network". What I mean is that the wallet executes a read-only call to an Ethereum node, and the node answers with the requested data. There are multiple ways to read data from an Ethereum node, depending on the node's location, and what kind of APIs it exposes.
Let's imagine the wallet has direct network access to an Ethereum node. More commonly, wallets interact with third-party providers (like Infura, Alchemy, QuickNode and many others). Requests to interact with the node follow a special protocol to execute remote calls. This protocol is called JSON-RPC.
A request for a wallet that is attempting to fetch an account's nonce would resemble something like this:
POST / HTTP/1.1
connection: keep-alive
Content-Type: application/json
content-length: 124
{
"jsonrpc":"2.0",
"method":"eth_getTransactionCount",
"params":["0x6fC27A75d76d8563840691DDE7a947d7f3F179ba","latest"],
"id":6
}
---
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 42
{"jsonrpc":"2.0","id":6,"result":"0x0"}
Where 0x6fC27A75d76d8563840691DDE7a947d7f3F179ba would be the sender's account. From the response you can see that its nonce is 0.
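As a rough sketch of what the wallet is doing here, the request body can be built and the hex-encoded result decoded with a few lines of JavaScript. The helper names (buildNonceRequest, parseQuantity) are made up for illustration; only the method name, the parameters and the sample response come from the exchange above. Nothing is actually sent over the network in this snippet.

```javascript
// Hypothetical helpers, not taken from any real wallet's code.

// Build the JSON-RPC body for an eth_getTransactionCount call.
function buildNonceRequest(address, id) {
  return {
    jsonrpc: "2.0",
    method: "eth_getTransactionCount",
    params: [address, "latest"],
    id,
  };
}

// JSON-RPC quantities come back as hex strings ("0x0", "0x1a", ...).
function parseQuantity(hex) {
  return Number(BigInt(hex));
}

const request = buildNonceRequest("0x6fC27A75d76d8563840691DDE7a947d7f3F179ba", 6);

// Response body copied from the exchange above.
const response = { jsonrpc: "2.0", id: 6, result: "0x0" };

console.log(JSON.stringify(request));
console.log("nonce:", parseQuantity(response.result));
```

In a real wallet the request object would be sent as the body of an HTTP POST to the node's JSON-RPC endpoint, and the nonce would be read from the parsed response.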
Wallets fetch data using network requests (in this case, via HTTP) to hit JSON-RPC endpoints exposed by nodes. Above I included just one, but in practice a wallet can query whatever data it needs to build a transaction. Don't be surprised if in real-life cases you notice more network requests to look up other stuff. For instance, following is a snippet of Metamask traffic hitting a local test node in a couple of minutes:
The transaction's data field
DAI is a smart contract. Its main logic is implemented at address 0x6b175474e89094c44da98b954eedeac495271d0f in Ethereum mainnet.
More specifically, DAI is an ERC20-compliant fungible token - quite a special type of contract. This means that DAI should implement at least the interface detailed in the ERC20 specification. In (somewhat stretched) web2 jargon, DAI is an immutable open-source web service running on Ethereum. Given it follows the ERC20 spec, it's possible to know in advance (without necessarily looking at the source code) the exact exposed endpoints to interact with it.
Short side note: not all ERC20 tokens are created equal. Implementing a certain interface (which facilitates interactions and integrations) certainly does not guarantee behavior. Still, for this exercise we can safely assume that DAI is quite a standard ERC20 token in terms of behavior.
There are a bunch of functions in the DAI smart contract (source code available here), many of them directly taken from the ERC20 spec. Of particular interest is the external transfer function.
contract Dai is LibNote {
...
function transfer(address dst, uint wad) external returns (bool) {
...
}
}
This function allows anyone holding DAI tokens to transfer some of them to another Ethereum account. Its signature is transfer(address,uint256), where the first parameter is the address of the receiver account, and the second is an unsigned integer representing the amount of tokens to be transferred.
For now let's not focus on the specifics of the function's behavior. Just trust me when I tell you that in its happy path, the function reduces the sender's balance by the passed amount, and then increases the receiver's accordingly.
This is important because when building a transaction to interact with a smart contract, one should know which function of the contract is to be executed, and what parameters are to be passed. It's as if in web2 you wanted to send a POST request to a web API. You'd most likely need to specify the exact URL with its parameters in the request. This is the same. We want to transfer 1 DAI, so we must know how to specify in a transaction that it is supposed to execute the transfer function on the DAI smart contract.
Luckily, it's SO straightforward and intuitive.
Joking. It's not. This is what you must include in your transaction to send 1 DAI to Vitalik (remember, address 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045):
{
// [...]
"data": "0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000"
}
Let me explain.
Aiming to ease integrations and have a standardized way to interact with smart contracts, the Ethereum ecosystem has (kind of) settled into adopting what's called the "Contract ABI specification" (ABI stands for Application Binary Interface). In common use cases, and I stress, IN COMMON USE CASES, in order to execute a smart contract function you must first encode the call following the Contract ABI specification. More advanced use cases may not follow this spec, but we're definitely not going down that rabbit hole. Suffice it to say that regular smart contracts programmed with Solidity, such as DAI, usually follow the Contract ABI spec.
What you can see above are the resulting bytes of ABI-encoding a call to transfer 1 DAI to address 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045 with DAI's transfer(address,uint256) function.
There are a number of tools out there to ABI-encode stuff, and in some way or another most wallets are implementing ABI-encoding to interact with contracts. For the sake of the example, we can just verify that the above sequence of bytes is correct using a command-line utility called cast, which is able to ABI-encode the call with the specific arguments:
$ cast calldata "transfer(address,uint256)" 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045 1000000000000000000
0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000
Anything bugging you? What's wrong?
Ooooh, sorry, yeah. That 1000000000000000000. Honestly I would really love to have a more robust argument for you here. The thing is: lots of ERC20 tokens are represented with 18 decimals. Such as DAI. Yet we can only use unsigned integers. So 1 DAI is actually stored as 1 * 10^18 - which is 1000000000000000000. Deal with it.
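To make the encoding a bit less magical, here's a minimal sketch that rebuilds those bytes by hand. It assumes the simple case we have here (two static arguments, an address and a uint256), where the ABI encoding is just the 4-byte function selector followed by each argument left-padded to 32 bytes. The selector a9059cbb is copied from the data above rather than computed: it's the first 4 bytes of the keccak256 hash of the function signature, and Node doesn't ship a keccak256 out of the box. The function names are mine.

```javascript
const DAI_DECIMALS = 18n;
const TRANSFER_SELECTOR = "a9059cbb"; // copied from the calldata shown above

// Left-pad a hex string (without 0x prefix) with zeros up to 32 bytes (64 hex chars).
function padTo32Bytes(hex) {
  return hex.padStart(64, "0");
}

// Encode a call to transfer(address,uint256). Only valid for this exact signature.
function encodeTransfer(to, amount) {
  const addressWord = padTo32Bytes(to.toLowerCase().replace(/^0x/, ""));
  const amountWord = padTo32Bytes(amount.toString(16));
  return "0x" + TRANSFER_SELECTOR + addressWord + amountWord;
}

// 1 DAI expressed in the token's smallest unit: 1 * 10^18.
const oneDai = 1n * 10n ** DAI_DECIMALS;
const data = encodeTransfer("0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045", oneDai);

console.log(oneDai.toString());
console.log(data);
```

The BigInt arithmetic is deliberate: 10^18 doesn't fit safely in a regular JavaScript number.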
We have a beautiful ABI-encoded sequence of bytes to be included in the data field of the transaction, which by now looks like:
{
"to": "0x6b175474e89094c44da98b954eedeac495271d0f",
"amount": 0,
"chainId": 31337,
"nonce": 0,
"data": "0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000"
}
We will revisit the contents of this data field once we get to the actual execution of the transaction.
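In the meantime, a quick sanity check doesn't hurt. The sketch below splits the data field back into its parts. It only handles this exact layout (a 4-byte selector followed by two 32-byte words), and the function name is hypothetical.

```javascript
// Split transfer(address,uint256) calldata into selector, receiver and amount.
// Assumes the fixed layout used in this article; it is not a general ABI decoder.
function decodeTransferCalldata(data) {
  const hex = data.replace(/^0x/, "");
  const selector = "0x" + hex.slice(0, 8); // first 4 bytes
  const addressWord = hex.slice(8, 72); // next 32 bytes
  const amountWord = hex.slice(72, 136); // last 32 bytes
  return {
    selector,
    to: "0x" + addressWord.slice(24), // an address is the last 20 bytes of its word
    amount: BigInt("0x" + amountWord),
  };
}

const decoded = decodeTransferCalldata(
  "0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000"
);

console.log(decoded.selector);
console.log(decoded.to);
console.log(decoded.amount.toString());
```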
Gas wizardry
Next step is deciding how much to pay for the transaction. Remember that all transactions must pay a fee to the network of nodes that takes the time and resources to execute and validate them.
The cost of executing a transaction is paid in ETH. And the final amount of ETH will depend on how much net gas your transaction consumes (that is, how computationally expensive it is), how much you're willing to pay for each gas unit spent, and how much the network is willing to accept at a minimum.
From a user perspective, the bottom line usually is that the more one pays, the faster transactions are included. So if you want to pay Vitalik 1 DAI in the next block, you'll probably need to set a higher fee than if you're willing to wait a couple of minutes (or longer, sometimes way longer), until gas is cheaper.
Different wallets may take different approaches to deciding how much to pay for gas. I'm not aware of a single bullet-proof mechanism used by everyone. Strategies to determine the right fees may involve querying gas-related information from nodes (such as the minimum base fee accepted by the network).
For example, in the following requests you can see the Metamask browser extension sending a request to a local test node for gas fee data when building a transaction:
And the simplified request-response looks like:
POST / HTTP/1.1
Content-Type: application/json
Content-Length: 99
{
"id":3951089899794639,
"jsonrpc":"2.0",
"method":"eth_feeHistory",
"params":["0x1","0x1",[10,20,30]]
}
---
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 190
{
"jsonrpc":"2.0",
"id":3951089899794639,
"result":{
"oldestBlock":"0x1",
"baseFeePerGas":["0x342770c0","0x2da4d8cd"],
"gasUsedRatio":[0.0007],
"reward":[["0x59682f00","0x59682f00","0x59682f00"]]
}
}
The eth_feeHistory endpoint is exposed by some nodes to allow querying transaction fee data. If you're curious, read here or play with it here, or see the spec here.
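Those hex values are amounts in wei, which are hard to eyeball. As a small sketch, here's one way to turn the response above into gwei. The weiToGwei helper is made up for this example. Note that baseFeePerGas carries one more entry than the number of requested blocks: the last one is the base fee for the next block.

```javascript
const GWEI = 1_000_000_000n; // 1 gwei = 10^9 wei

// Format a wei amount (BigInt) as a gwei string, trimming trailing zeros.
function weiToGwei(wei) {
  const whole = wei / GWEI;
  const fraction = (wei % GWEI).toString().padStart(9, "0").replace(/0+$/, "");
  return fraction ? `${whole}.${fraction}` : `${whole}`;
}

// "result" object copied from the eth_feeHistory response above.
const result = {
  oldestBlock: "0x1",
  baseFeePerGas: ["0x342770c0", "0x2da4d8cd"],
  gasUsedRatio: [0.0007],
  reward: [["0x59682f00", "0x59682f00", "0x59682f00"]],
};

const baseFees = result.baseFeePerGas.map((hex) => BigInt(hex));
const rewards = result.reward[0].map((hex) => BigInt(hex));

console.log("base fee of block 1 (gwei):", weiToGwei(baseFees[0]));
console.log("base fee of next block (gwei):", weiToGwei(baseFees[1]));
console.log("priority fees at percentiles 10/20/30 (gwei):", rewards.map(weiToGwei));
```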
Popular wallets also use more sophisticated off-chain services to fetch gas price estimations and suggest sensible values to their users. Here's one example of a wallet hitting a public endpoint of a web service, and receiving a bunch of useful gas-related data:
Take a look at a snippet of the response:
Cool, right?
Anyway, hopefully you're getting familiar with the idea that setting the gas fee prices is not straightforward, and it is a fundamental step for building a successful transaction. Even if all you want to do is send 1 DAI. Here is an interesting introductory guide to dig deeper into some of the mechanisms involved to set more accurate fees in transactions.
After some initial context, let's go back to the actual transaction now. There are three gas-related fields that need to be set:
{
"maxPriorityFeePerGas": ...,
"maxFeePerGas": ...,
"gasLimit": ...,
}
Wallets will use some of the mentioned mechanisms to fill the first two fields for you. Interestingly, whenever a wallet UI lets you choose between some version of "slow", "regular" or "fast" transactions, it's actually trying to decide on what values are the most appropriate for those exact parameters. Now you can better understand the contents of the JSON-formatted response received by a wallet that I showed you a couple of paragraphs back.
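To give a rough idea of how such "slow", "regular" and "fast" options could map to those two fields, here's a sketch. Everything in it is an assumption on my side: the tips per tier are invented numbers, and the rule of setting maxFeePerGas to twice the next block's base fee plus the tip is just a common heuristic, not what any particular wallet is guaranteed to do. The base fee is the next-block value from the eth_feeHistory response shown earlier.

```javascript
// Hypothetical fee suggestion: one { maxPriorityFeePerGas, maxFeePerGas } pair per tier.
// All amounts are in wei, as BigInt.
function suggestFees(nextBaseFee, tipsByTier) {
  const fees = {};
  for (const [tier, tip] of Object.entries(tipsByTier)) {
    fees[tier] = {
      maxPriorityFeePerGas: tip,
      // Heuristic (assumption): leave room for the base fee to double before inclusion.
      maxFeePerGas: 2n * nextBaseFee + tip,
    };
  }
  return fees;
}

const nextBaseFee = BigInt("0x2da4d8cd"); // from the eth_feeHistory response above

// Made-up tips for illustration: 1, 1.5 and 2 gwei.
const fees = suggestFees(nextBaseFee, {
  slow: 1_000_000_000n,
  regular: 1_500_000_000n,
  fast: 2_000_000_000n,
});

console.log(fees);
```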
To determine the third field's value, the gas limit, there's a handy mechanism that wallets may use to simulate a transaction before it is really submitted. It allows them to closely estimate how much gas a transaction would consume, and therefore set a reasonable gas limit. On top of providing you with an estimate on the final USD cost of the transaction.
Why not just set an absurdly large gas limit? To defend your funds, of course. Smart contracts may have arbitrary logic, and you're the one paying for its execution. By choosing a sensible gas limit right from the start in your transaction, you protect yourself against ugly scenarios that may drain all your account's ETH funds in gas fees.
Gas estimations can be done using a node's endpoint called eth_estimateGas. Before sending 1 DAI, a wallet can leverage this mechanism to simulate your transaction, and determine what's the right gas limit for your DAI transfer. This is what a request-response from a wallet might look like.
POST / HTTP/1.1
Content-Type: application/json
{
"id":2697097754525,
"jsonrpc":"2.0",
"method":"eth_estimateGas",
"params":[
{
"from":"0x6fC27A75d76d8563840691DDE7a947d7f3F179ba",
"value":"0x0",
"data":"0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000",
"to":"0x6b175474e89094c44da98b954eedeac495271d0f"
}
]
}
---
HTTP/1.1 200 OK
Content-Type: application/json
{"jsonrpc":"2.0","id":2697097754525,"result":"0x8792"}
In the response you can see that the transfer would take approximately 34706 gas units.
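Since an estimate is just that, an estimate, a wallet may want to leave some headroom when turning it into a gas limit. A minimal sketch of the idea follows. The 15% margin is an arbitrary number I picked for the example, and the function name is hypothetical.

```javascript
// Decode the hex estimate returned by eth_estimateGas and add a safety margin.
// marginPercent is an assumption for illustration, not a recommended value.
function gasLimitWithMargin(estimateHex, marginPercent = 15n) {
  const estimate = BigInt(estimateHex);
  return estimate + (estimate * marginPercent) / 100n;
}

const estimate = BigInt("0x8792"); // result from the response above
const gasLimit = gasLimitWithMargin("0x8792");

console.log("estimate:", estimate.toString());
console.log("gas limit with margin:", gasLimit.toString());
```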
Let's incorporate this information to the transaction payload:
{
"to": "0x6b175474e89094c44da98b954eedeac495271d0f",
"amount": 0,
"chainId": 31337,
"nonce": 0,
"data": "0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000",
"maxPriorityFeePerGas": 2000000000,
"maxFeePerGas": 120000000000,
"gasLimit": 40000
}
Remember that the maxPriorityFeePerGas and maxFeePerGas will ultimately depend on the network conditions at the moment of sending the transaction. Above I'm just setting somewhat arbitrary values for the sake of this example. As for the value set for the gas limit, I just incremented the estimate a bit to fall on the safe side.
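With these three numbers in place, it's possible to sketch what the transfer might cost. For this type of transaction, the price actually paid per gas unit is the block's base fee plus your priority fee, capped at maxFeePerGas. The snippet below assumes the base fee and gas usage seen earlier in the article (the next-block base fee from eth_feeHistory and the eth_estimateGas result), so treat the output as an illustration rather than a prediction.

```javascript
// Gas-related fields from the transaction above, in wei (BigInt).
const tx = {
  maxPriorityFeePerGas: 2_000_000_000n,
  maxFeePerGas: 120_000_000_000n,
  gasLimit: 40_000n,
};

// Price per gas unit: base fee + tip, never more than maxFeePerGas.
function effectiveGasPrice(baseFee, tx) {
  const withTip = baseFee + tx.maxPriorityFeePerGas;
  return withTip < tx.maxFeePerGas ? withTip : tx.maxFeePerGas;
}

// Worst case: the whole gas limit is consumed at the maximum price.
const worstCaseWei = tx.gasLimit * tx.maxFeePerGas;

// Assumed conditions, taken from the earlier responses in this article.
const baseFee = 765_778_125n; // next-block base fee from eth_feeHistory
const gasUsed = 34_706n; // estimate from eth_estimateGas
const expectedWei = gasUsed * effectiveGasPrice(baseFee, tx);

console.log("worst case (wei):", worstCaseWei.toString());
console.log("expected (wei):", expectedWei.toString());
```

The worst case works out to 0.0048 ETH, while the expected cost under the assumed base fee is far lower, which is why a generous maxFeePerGas doesn't necessarily mean overpaying.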
Access list and transaction type
Let's briefly comment on two additional fields that are set in your transaction.
First, the accessList field. Advanced use cases or edge scenarios may require the transaction to specify in advance the account addresses and contracts' storage slots to be accessed, thus making it somewhat cheaper.
However, it may not be straightforward to build such a list in advance, and currently the gas savings may not be so significant. Particularly for simple transactions like sending 1 DAI. Therefore, we can just set it to an empty list. Although remember that it does exist for a reason, and it may become more relevant in the future.
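Just so you know what a non-empty one would look like: each entry pairs an address with the storage slots of that address to be accessed. The sketch below builds one entry for the DAI contract and checks the sizes (20 bytes for the address, 32 bytes per slot). The slot is a made-up placeholder, NOT the real location of anyone's DAI balance, and the helper name is hypothetical.

```javascript
// Build a single access list entry, validating the sizes of its parts.
function accessListEntry(address, storageKeys) {
  if (!/^0x[0-9a-fA-F]{40}$/.test(address)) {
    throw new Error("address must be 20 bytes");
  }
  for (const key of storageKeys) {
    if (!/^0x[0-9a-fA-F]{64}$/.test(key)) {
      throw new Error("storage key must be 32 bytes");
    }
  }
  return { address, storageKeys };
}

// Placeholder slot for illustration only (not a real DAI balance slot).
const PLACEHOLDER_SLOT = "0x" + "00".repeat(31) + "01";

const accessList = [
  accessListEntry("0x6b175474e89094c44da98b954eedeac495271d0f", [PLACEHOLDER_SLOT]),
];

console.log(JSON.stringify(accessList, null, 2));
```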
Second, the transaction type. It is specified in the type field. The type is an indicator of what's inside the transaction. Ours will be a type 2 transaction, because it's following the format specified in EIP-1559.
{
"to": "0x6b175474e89094c44da98b954eedeac495271d0f",
"amount": 0,
"chainId": 31337,
"nonce": 0,
"data": "0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000",
"maxPriorityFeePerGas": 2000000000,
"maxFeePerGas": 120000000000,
"gasLimit": 40000,
"accessList": [],
"type": 2
}
Signing the transaction
How can nodes know that it is your account, and not somebody else's, who is sending a transaction?
We've come to the critical step of building a valid transaction: signing it.
Once a wallet has collected enough information to build the transaction, and you hit SEND, it will digitally sign your transaction. How? Using your account's private key (that your wallet has access to), and a cryptographic algorithm involving curvy shapes called ECDSA.
For the curious, what's actually being signed is the keccak256 hash of the concatenation between the transaction's type and the RLP encoded content of the transaction.
keccak256(0x02 || rlp([chainId, nonce, maxPriorityFeePerGas, maxFeePerGas, gasLimit, to, amount, data, accessList]))
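If you'd like to see the bytes that go into that hash, here's a bare-bones sketch of the RLP encoding step applied to our transaction. It's a toy encoder written for this article's inputs (integers, hex strings and lists), not a replacement for a proper library. It also stops right before hashing: Node's built-in crypto module doesn't include keccak256 (its "sha3-256" is a different function), so the final hash would need an external package.

```javascript
// Convert an integer (number or BigInt) or a 0x-prefixed hex string to raw bytes.
// Integers are big-endian with no leading zeros; zero becomes the empty byte string.
function toBytes(value) {
  if (typeof value === "bigint" || typeof value === "number") {
    const v = BigInt(value);
    if (v === 0n) return Buffer.alloc(0);
    let hex = v.toString(16);
    if (hex.length % 2) hex = "0" + hex;
    return Buffer.from(hex, "hex");
  }
  return Buffer.from(value.slice(2), "hex");
}

// RLP length prefix: short form below 56 bytes, long form otherwise.
function encodeLength(len, offset) {
  if (len < 56) return Buffer.from([offset + len]);
  const lenBytes = toBytes(len);
  return Buffer.concat([Buffer.from([offset + 55 + lenBytes.length]), lenBytes]);
}

// Toy RLP encoder: arrays are lists, everything else is a byte string.
function rlp(item) {
  if (Array.isArray(item)) {
    const body = Buffer.concat(item.map(rlp));
    return Buffer.concat([encodeLength(body.length, 0xc0), body]);
  }
  const bytes = toBytes(item);
  if (bytes.length === 1 && bytes[0] < 0x80) return bytes; // single small byte encodes as itself
  return Buffer.concat([encodeLength(bytes.length, 0x80), bytes]);
}

// Fields in the order given by the formula above.
const fields = [
  31337n, // chainId
  0n, // nonce
  2000000000n, // maxPriorityFeePerGas
  120000000000n, // maxFeePerGas
  40000n, // gasLimit
  "0x6b175474e89094c44da98b954eedeac495271d0f", // to
  0n, // amount
  "0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000", // data
  [], // accessList
];

// 0x02 || rlp([...]): this is the payload that gets keccak256-hashed and signed.
const payload = Buffer.concat([Buffer.from([0x02]), rlp(fields)]);
console.log("0x" + payload.toString("hex"));
```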
You don't need to be that knowledgeable in cryptography to understand this though. Put simply, this process seals the transaction. It makes it tamper-proof by putting a smart-ass stamp on it that only your private key could have produced. And from now on anyone with access to that signed transaction (for example, Ethereum nodes) can cryptographically verify that it was your account that produced it.
Just in case: signing is not encrypting. Your transactions are always in plaintext. Once they go public, anyone can make sense out of their contents.
The process of signing the transaction produces, no surprise, a signature. In practice: a bunch of weird unreadable values. These travel along with the transaction, and you'll usually find them referred to as v, r and s. If you want to dig deeper on what these actually represent, and their importance to recover your account's address, the Internet is your friend.
You can get a better idea of what signing looks like when implemented by checking out the @ethereumjs/tx package, with the ethers package helping out with some utilities. As an extremely simplified example, signing the transaction to send 1 DAI could look like this:
const { FeeMarketEIP1559Transaction } = require("@ethereumjs/tx");
const { ethers } = require("ethers"); // only used for the parseUnits helper

const txData = {
  to: "0x6b175474e89094c44da98b954eedeac495271d0f",
  value: 0, // the "amount" field of this article; @ethereumjs/tx calls it "value"
  chainId: 31337,
  nonce: 0,
  data: "0xa9059cbb000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa960450000000000000000000000000000000000000000000000000de0b6b3a7640000",
  maxPriorityFeePerGas: ethers.utils.parseUnits('2', 'gwei').toNumber(),
  maxFeePerGas: ethers.utils.parseUnits('120', 'gwei').toNumber(),
  gasLimit: 40000,
  accessList: [],
  type: 2,
};

// Build the type 2 transaction and sign it with the account's private key
// (PRIVATE_KEY is expected as a hex string without the 0x prefix).
const tx = FeeMarketEIP1559Transaction.fromTxData(txData);
const signedTx = tx.sign(Buffer.from(process.env.PRIVATE_KEY, 'hex'));

console.log(signedTx.v.toString('hex'));
// 1
console.log(signedTx.r.toString('hex'));
// 57d733933b12238a2aeb0069b67c6bc58ca8eb6827547274b3bcf4efdad620a
console.log(signedTx.s.toString('hex'));
// e49937ec81db89ce70ebec5e51b839c0949234d8aad8f8b55a877bd78cc293