How to use
The bundestag CLI
A tool that assists with data processing.
For an overview of the available commands, run
bundestag --help
To get the preprocessed data, run
bundestag download huggingface
To download data from abgeordnetenwatch for a specific legislature id (132 in this example), run
bundestag download abgeordnetenwatch 132
To transform the downloaded data, run
bundestag transform abgeordnetenwatch 132
To find out the legislature id for the current Bundestag, visit abgeordnetenwatch.de and click on the "Open Data" button at the bottom of the page.
To download data from bundestag.de, run
bundestag download bundestag-sheets --do-create-xlsx-uris-json
Note that this command uses Selenium and therefore starts a browser. The browser currently does not run in headless mode, so it is easy to see when the scraping breaks.
To transform the downloaded data, run
bundestag transform bundestag-sheet --sheet-source=json_file
Note: If you run the transformation for the legislature with id 132, i.e.
bundestag transform abgeordnetenwatch 132
the resulting data is damaged (the cause is unclear). To fix it, run
uv run python scripts/fix_empty_fraction.py
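The abgeordnetenwatch steps above can be chained in a small script. The following is a minimal sketch and not part of the project: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes that `bundestag` and `uv` are on the PATH and that the script is started from the repository root. It only issues the commands shown above.

```python
import subprocess

# Legislature id for which the text above says the transformed data needs a fix.
LEGISLATURE_NEEDING_FIX = 132


def build_commands(legislature_id: int) -> list[list[str]]:
    """Return the CLI calls for one legislature, in execution order."""
    lid = str(legislature_id)
    commands = [
        ["bundestag", "download", "abgeordnetenwatch", lid],
        ["bundestag", "transform", "abgeordnetenwatch", lid],
    ]
    if legislature_id == LEGISLATURE_NEEDING_FIX:
        # Repair step taken from the note above; only applied for id 132.
        commands.append(["uv", "run", "python", "scripts/fix_empty_fraction.py"])
    return commands


def run_pipeline(legislature_id: int, dry_run: bool = True) -> list[list[str]]:
    """Print each command and, unless dry_run is set, execute it."""
    commands = build_commands(legislature_id)
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
    return commands
```

Calling `run_pipeline(132)` only prints the three commands, while `run_pipeline(132, dry_run=False)` executes them one after another.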
download commands store artefacts in ./data/raw, and transform commands transform that data and store artefacts in ./data/preprocessed.
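To check what a download or transform run produced, the two directories can be listed from Python. This is a minimal sketch using only the standard library; the function name `list_artefacts` is hypothetical, and only the paths ./data/raw and ./data/preprocessed come from the text above.

```python
from pathlib import Path


def list_artefacts(data_dir: Path = Path("data")) -> dict[str, list[str]]:
    """Map each stage directory to the sorted relative paths of its files."""
    result = {}
    for stage in ("raw", "preprocessed"):
        stage_dir = data_dir / stage
        if stage_dir.is_dir():
            files = sorted(
                p.relative_to(stage_dir).as_posix()
                for p in stage_dir.rglob("*")
                if p.is_file()
            )
        else:
            # A missing directory means the stage has not been run yet.
            files = []
        result[stage] = files
    return result
```

When run from the repository root, `list_artefacts()` returns an empty list for a stage whose directory does not exist yet, which makes it easy to see whether the transform step still has to be run.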
The get_xlsx_uris CLI
Pre-processing CLI for the bundestag CLI.
uv run get_xlsx_uris run --help
This module collects and stores XLSX URIs from Bundestag data sources. The same step is also performed by
bundestag download bundestag_sheet --do-create-xlsx-uris-json