For the submission, create a single zip archive containing:
1. A single PDF file with your answers, typeset in LaTeX using the provided template. Only PDF submissions that are typeset with LaTeX, e.g., via https://www.overleaf.com/edu/ucl, will be accepted. Submissions must not include screenshots, e.g., of handwritten solutions or of code, unless explicitly permitted. Students with disability accommodations are exempt from this requirement.
2. A single Python 3 code file named bitcoin.py containing your code for Q1 and Q2. Do not submit files in Jupyter/IPython notebook format!
3. A single Solidity code file named TicTacToe.sol containing your code for Q3.
Data description for questions Q1 and Q2:
You are given a truncated version of the Bitcoin blockchain, starting from the genesis block and ending at block height 100017. This is real Bitcoin data, but the transaction data has been simplified and some transactions have been removed or modified. Consequently, you cannot use an external copy of the Bitcoin blockchain (e.g., one parsed from an online block explorer), and the code you write for this project would need to be adapted before it could work on the real Bitcoin blockchain. The data is split across four CSV files, transactions.csv, inputs.csv, outputs.csv, and tags.csv, which contain different types of identifiers. In the real Bitcoin data the on-chain identifiers would all be 256-bit hashes, but for the sake of this assignment the identifiers are numeric.
The transactions.csv file contains two columns:
id: A unique transaction identifier.
block_id: The block in which the transaction appeared.
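A minimal sketch of loading transactions.csv with the standard csv module, assuming the header row uses the column names id and block_id and that every value is an integer. The inline sample rows are made up for illustration only.

```python
import csv
import io

# Hypothetical sample in the assumed transactions.csv layout (header: id,block_id).
SAMPLE_TRANSACTIONS = """id,block_id
1,0
2,1
3,1
"""


def load_transactions(handle):
    """Return a dict mapping each transaction id to the block it appeared in."""
    reader = csv.DictReader(handle)
    return {int(row["id"]): int(row["block_id"]) for row in reader}


tx_to_block = load_transactions(io.StringIO(SAMPLE_TRANSACTIONS))
print(len(tx_to_block))  # number of transactions in the sample
```

On the real file, the same function would be called on `open("transactions.csv", newline="")` instead of the in-memory sample.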
The inputs.csv file contains four columns:
id: The identifier of a transaction input.
tx_id: The identifier of the transaction.
sig_id: The identifier of the public key associated with this input (i.e., the public key used in the scriptSig). This value is 0 if tx_id is a coin generation transaction and -1 if the transaction uses a non-standard script that does not involve a public key.
output_id: The identifier of the UTXO that this input is spending. This value is -1 if tx_id is a coin generation transaction.
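The sentinel values for inputs.csv can be turned into a small classifier. This is a sketch assuming the header names id, tx_id, sig_id and output_id, and that a coin generation input is recognised by output_id being -1; the sample rows are invented.

```python
import csv
import io

# Hypothetical rows in the assumed inputs.csv layout (id,tx_id,sig_id,output_id).
SAMPLE_INPUTS = """id,tx_id,sig_id,output_id
1,1,0,-1
2,2,7,1
3,3,-1,2
"""


def classify_input(sig_id, output_id):
    """Label an input using the sentinel values described for inputs.csv."""
    if output_id == -1:
        # Coin generation (coinbase) input: it spends no earlier output.
        return "coinbase"
    if sig_id == -1:
        # Non-standard script with no public key attached.
        return "non-standard"
    return "standard"


rows = csv.DictReader(io.StringIO(SAMPLE_INPUTS))
labels = {int(r["id"]): classify_input(int(r["sig_id"]), int(r["output_id"])) for r in rows}
print(labels)  # {1: 'coinbase', 2: 'standard', 3: 'non-standard'}
```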
The outputs.csv file contains four columns:
id: The identifier of a transaction output.
tx_id: The identifier of the transaction.
pk_id: The identifier of the public key associated with this output (i.e., the public key used in the scriptPubKey). This value is -10 if the transaction uses a non-standard script that does not involve a public key.
value: The amount sent to this output (in satoshis).
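Because each input names the output it spends via output_id, one way to obtain the unspent outputs (UTXOs) is to keep the rows of outputs.csv whose id no input references. The following is a minimal sketch under that definition, assuming the column names id, tx_id, pk_id, value and output_id, with made-up rows. It does not exclude spends made by invalid transactions, which could change the result on the real data. Values are converted with 1 BTC = 100,000,000 satoshis.

```python
from collections import namedtuple

# Minimal stand-ins for rows of outputs.csv and inputs.csv (assumed column names).
Output = namedtuple("Output", "id tx_id pk_id value")
Input = namedtuple("Input", "id tx_id sig_id output_id")

SATOSHIS_PER_BTC = 100_000_000


def find_utxos(outputs, inputs):
    """Return the outputs whose id is never referenced by any input's output_id."""
    spent = {inp.output_id for inp in inputs if inp.output_id != -1}
    return [out for out in outputs if out.id not in spent]


outputs = [
    Output(1, 1, 10, 5_000_000_000),  # coinbase reward of 50 BTC
    Output(2, 2, 11, 3_000_000_000),
    Output(3, 2, 12, 2_000_000_000),
]
inputs = [
    Input(1, 1, 0, -1),  # coinbase input spends nothing
    Input(2, 2, 10, 1),  # spends output 1
]

utxos = find_utxos(outputs, inputs)
print([u.id for u in utxos])  # [2, 3]
largest = max(utxos, key=lambda u: u.value)
print(largest.id, largest.value / SATOSHIS_PER_BTC)  # 2 30.0
```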
The tags.csv file contains three columns:
type: The type of the service (one of Exchange, DarkMarket, Vendor, or Wallet).
name: The name of the (fictional) service.
pk_id: The identifier of the public key that belongs to this service.
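Since a single service can own several keys, it helps to index tags.csv in both directions. This sketch assumes the header names type, name and pk_id, and treats (type, name) as the service's identity. The service names below are invented placeholders, not entries from the real file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tags.csv rows; the service names are invented placeholders.
SAMPLE_TAGS = """type,name,pk_id
Exchange,ExampleExchange,10
Exchange,ExampleExchange,11
Vendor,ExampleVendor,12
"""


def load_tags(handle):
    """Map each tagged pk_id to a (type, name) pair and group keys by service."""
    key_to_service = {}
    service_to_keys = defaultdict(set)
    for row in csv.DictReader(handle):
        service = (row["type"], row["name"])
        pk_id = int(row["pk_id"])
        key_to_service[pk_id] = service
        service_to_keys[service].add(pk_id)
    return key_to_service, service_to_keys


key_to_service, service_to_keys = load_tags(io.StringIO(SAMPLE_TAGS))
print(key_to_service[12])  # ('Vendor', 'ExampleVendor')
print(sorted(service_to_keys[("Exchange", "ExampleExchange")]))  # [10, 11]
```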
Note: To answer questions Q1 and Q2 you are allowed to use the Python 3 Standard Library and the data analysis module pandas.
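Questions (b) and (c) reduce to counting, for each transaction, how many rows of inputs.csv and outputs.csv carry its tx_id. The sketch below uses only collections.Counter from the standard library. The lists stand in for the id column of transactions.csv and the tx_id columns of the other two files, and their values are made up.

```python
from collections import Counter

# Illustrative values only: ids from transactions.csv and the tx_id
# columns of inputs.csv and outputs.csv for a tiny made-up dataset.
tx_ids = [1, 2, 3]
input_tx_ids = [1, 2, 3, 3]   # tx 3 has two inputs
output_tx_ids = [1, 2, 2, 3]  # tx 2 has two outputs

inputs_per_tx = Counter(input_tx_ids)
outputs_per_tx = Counter(output_tx_ids)


def count_shape(tx_ids, inputs_per_tx, outputs_per_tx, n_in, n_out):
    """Count transactions with exactly n_in inputs and n_out outputs.

    Counter returns 0 for missing keys, so a transaction absent from a file
    is treated as having no rows in it.
    """
    return sum(
        1
        for tx in tx_ids
        if inputs_per_tx[tx] == n_in and outputs_per_tx[tx] == n_out
    )


print(count_shape(tx_ids, inputs_per_tx, outputs_per_tx, 1, 1))  # 1 (tx 1)
print(count_shape(tx_ids, inputs_per_tx, outputs_per_tx, 1, 2))  # 1 (tx 2)
```

Iterating over the ids from transactions.csv, rather than over the keys of the two Counters, also covers any transaction that has no inputs or no outputs. With pandas, `inputs.groupby("tx_id").size()` would give the same per-transaction counts as a Series.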
Answer the following questions with respect to the given dataset:
(a) How many transactions are there in total? [1 mark]
(b) How many transactions have one input and one output? [1 mark]
(c) How many transactions have one input and two outputs? [1 mark]
(d) How many UTXOs are there in total? [1 mark]
(e) Which UTXO has the highest associated value? [1 mark]
(f) How many distinct public keys were used across all blocks? [1 mark]
(g) Which public key received the largest amount of bitcoin, and how many bitcoins did it receive? [2 marks]
(h) Which public key appeared in an output the greatest number of times, and how many times did it appear? [2 marks]
(i) Several of the transactions in the dataset are invalid. List five of these transactions, identified by their transaction identifier (tx_id), along with the reason why each is invalid. [10 marks]
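Question (i) does not say which validity rules apply. The following sketch assumes a few standard Bitcoin rules and flags any transaction that breaks them:

- It must not spend a nonexistent or already-spent output.
- It must not spend an output created in a later block.
- An input's sig_id must match the pk_id of the output it spends.
- A non-coinbase transaction must not create more value than it consumes.

The sketch also assumes that (block_id, id) order matches chain order and that positive identifiers are real keys (0 and negative values are sentinels). It does not check coinbase outputs against the 50 BTC block subsidy (plus fees) that applied at these heights. The sample data is invented.

```python
from collections import defaultdict, namedtuple

# Assumed row shapes mirroring the CSV columns (header names are assumptions).
Tx = namedtuple("Tx", "id block_id")
Output = namedtuple("Output", "id tx_id pk_id value")
Input = namedtuple("Input", "id tx_id sig_id output_id")


def find_invalid(txs, inputs, outputs):
    """Return {tx_id: [reasons]} for a few common validity checks (not exhaustive)."""
    out_by_id = {o.id: o for o in outputs}
    block_of = {t.id: t.block_id for t in txs}
    ins_by_tx, outs_by_tx = defaultdict(list), defaultdict(list)
    for i in inputs:
        ins_by_tx[i.tx_id].append(i)
    for o in outputs:
        outs_by_tx[o.tx_id].append(o)

    problems = defaultdict(list)
    spent = set()  # output ids already consumed by an earlier input
    # Assumption: sorting by (block_id, id) approximates on-chain order.
    for tx in sorted(txs, key=lambda t: (t.block_id, t.id)):
        in_value, is_coinbase = 0, False
        for i in ins_by_tx[tx.id]:
            if i.output_id == -1:  # coin generation input spends nothing
                is_coinbase = True
                continue
            prev = out_by_id.get(i.output_id)
            if prev is None:
                problems[tx.id].append(f"spends unknown output {i.output_id}")
                continue
            if i.output_id in spent:
                problems[tx.id].append(f"double-spends output {i.output_id}")
            spent.add(i.output_id)
            prev_block = block_of.get(prev.tx_id)
            if prev_block is not None and prev_block > tx.block_id:
                problems[tx.id].append(f"spends output {i.output_id} from a later block")
            # Assumption: positive ids are real keys; 0 and negatives are sentinels.
            if i.sig_id > 0 and prev.pk_id > 0 and i.sig_id != prev.pk_id:
                problems[tx.id].append(
                    f"key {i.sig_id} does not own output {i.output_id} (key {prev.pk_id})"
                )
            in_value += prev.value
        out_value = sum(o.value for o in outs_by_tx[tx.id])
        if not is_coinbase and out_value > in_value:
            problems[tx.id].append(f"outputs ({out_value}) exceed inputs ({in_value})")
    return dict(problems)


txs = [Tx(1, 0), Tx(2, 1), Tx(3, 1), Tx(4, 2)]
outputs = [
    Output(1, 1, 10, 5_000),  # coinbase output to key 10
    Output(2, 2, 11, 4_000),
    Output(3, 3, 12, 9_000),
    Output(4, 4, 13, 1_000),
]
inputs = [
    Input(1, 1, 0, -1),  # coinbase
    Input(2, 2, 10, 1),  # valid spend of output 1
    Input(3, 3, 10, 1),  # double-spend of output 1, also overspends
    Input(4, 4, 99, 2),  # key 99 does not own output 2
]

invalid = find_invalid(txs, inputs, outputs)
for tx_id, reasons in sorted(invalid.items()):
    print(tx_id, "; ".join(reasons))
```

The checks are independent, so a single transaction can collect several reasons. Because the sketch keeps the first spend of an output and flags later ones, its result depends on the assumed ordering.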