API reference

The public API of eencijferho is available via direct imports:

from eencijferho import run_turbo_convert_pipeline
from eencijferho import process_txt_folder, write_variable_metadata

All public functions are listed in __all__.


Pipeline

eencijferho.core.pipeline.run_turbo_convert_pipeline(storage, input_dir='data/01-input', dec_metadata_json=None, output_dir='data/02-output', metadata_dir=None, progress_callback=None, status_callback=None, output_config=None)

Run the full turbo-convert pipeline.

Parameters:

  input_dir (str, default 'data/01-input'):
      Folder containing input fixed-width / CSV files.
  dec_metadata_json (str | None, default None):
      Path to the DEC bestandsbeschrijving JSON. When None it is derived
      automatically from metadata_dir.
  output_dir (str, default 'data/02-output'):
      Folder where converted files are written.
  metadata_dir (str | None, default None):
      Folder used for metadata (JSON, Excel, logs). When None it defaults
      to data/00-metadata (legacy behaviour).
  progress_callback (Callable[[int], None] | None, default None):
      Optional callable(int) for progress 0-100.
  status_callback (Callable[[str], None] | None, default None):
      Optional callable(str) for status messages.
  output_config (OutputConfig | None, default None):
      Controls which output variants are produced. When None the defaults
      from OutputConfig are used (decoded + enriched + parquet + encrypt +
      snake_case).
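The two callbacks are plain callables; the sketch below shows their expected shape. The pipeline call itself is commented out because it needs a configured storage backend and input data:

```python
# Sketch of progress/status callbacks the pipeline can invoke.
progress_log: list[int] = []
status_log: list[str] = []

def on_progress(pct: int) -> None:
    progress_log.append(pct)  # pipeline reports values in the range 0-100

def on_status(msg: str) -> None:
    status_log.append(msg)    # human-readable stage messages

# run_turbo_convert_pipeline(storage,
#                            progress_callback=on_progress,
#                            status_callback=on_status)

# Simulate what the pipeline would do at the end of a run:
on_progress(100)
on_status("done")
```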

Extractor

eencijferho.core.extractor.process_txt_folder(storage, input_folder, json_output_folder='data/00-metadata/json')

Finds all .txt files containing 'Bestandsbeschrijving' in the root directory and extracts tables from them. Also processes all .asc files in the root directory.

Parameters:

  input_folder (str, required):
      Folder to search for .txt and .asc files.
  json_output_folder (str, default 'data/00-metadata/json'):
      Output folder for JSON files.

Returns:

  list[str]: Paths to all extracted JSON files.

Edge Cases
  • Removes any existing JSON files in the output folder.
  • Handles missing input folder gracefully.
Example

process_txt_folder(storage, 'data/01-input')
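The discovery step described above (root directory only, graceful on a missing folder) can be sketched with pathlib. find_inputs is a hypothetical helper for illustration, not part of the package:

```python
from pathlib import Path

def find_inputs(folder: str) -> list[Path]:
    # Hypothetical sketch: Bestandsbeschrijving .txt files plus all .asc
    # files, non-recursive; a missing input folder yields an empty list.
    root = Path(folder)
    if not root.is_dir():
        return []
    txts = [p for p in root.glob("*.txt") if "Bestandsbeschrijving" in p.name]
    return sorted(txts) + sorted(root.glob("*.asc"))
```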

eencijferho.core.extractor.write_variable_metadata(storage, input_dir='data/01-input', json_folder='data/00-metadata/json', output_filename='variable_metadata.json')

Scans all Bestandsbeschrijving*.txt files (recursively) from input_dir and writes a consolidated variable metadata JSON.

Parameters:

  input_dir (str, default 'data/01-input'):
      Folder to search for input text files.
  json_folder (str, default 'data/00-metadata/json'):
      Output folder for the JSON file.
  output_filename (str, default 'variable_metadata.json'):
      Name of the output file.

Returns:

  None

Edge Cases
  • Handles missing input files or parser errors gracefully.
  • Avoids duplicate variables by name across all files.
Example

write_variable_metadata(storage)
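The by-name deduplication from the edge cases can be sketched as follows. Both the record shape and the first-occurrence-wins rule are assumptions for illustration:

```python
# Assumed record shape; the real parser output may differ.
parsed = [
    {"name": "BRIN-nummer", "source": "file_a.txt"},
    {"name": "BRIN-nummer", "source": "file_b.txt"},  # duplicate name, skipped
    {"name": "Opleidingscode (CROHO)", "source": "file_b.txt"},
]

seen: set[str] = set()
variables = []
for var in parsed:
    if var["name"] not in seen:  # keep only the first occurrence per name
        seen.add(var["name"])
        variables.append(var)

print([v["source"] for v in variables])  # ['file_a.txt', 'file_b.txt']
```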

eencijferho.core.extractor.process_json_folder(storage, json_input_folder='data/00-metadata/json', excel_output_folder='data/00-metadata')

Processes all JSON files in a folder, converting tables to Excel files.

Parameters:

  json_input_folder (str, default 'data/00-metadata/json'):
      Folder containing JSON files.
  excel_output_folder (str, default 'data/00-metadata'):
      Output folder for Excel files.

Returns:

  None

Edge Cases
  • Removes any existing Excel files in the output folder.
  • Handles missing or empty input folder gracefully.
Example

process_json_folder(storage, 'data/00-metadata/json', 'data/00-metadata')

eencijferho.core.extractor.get_fwf_params(txt_file, table_index=0)

Extract field names and column specs from a DUO bestandsbeschrijving .txt file.

Returns a dict that can be passed directly to pandas.read_fwf() as keyword arguments, so no manual parsing of the fixed-width metadata is needed:

import pandas as pd
from eencijferho import get_fwf_params

params = get_fwf_params("Bestandsbeschrijving_1cyferho_2023_v1.1.txt")
df = pd.read_fwf("1cyferho_2023.asc", encoding="latin-1", header=None, **params)

The colspecs values follow the pandas convention: 0-based, half-open intervals [start, end). Startpositie from DUO is 1-based, so start = Startpositie - 1 and end = Startpositie - 1 + Aantal posities.
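The 1-based-to-half-open conversion can be checked by hand. The field layout below is illustrative (widths chosen to reproduce the example output shown further down):

```python
# Illustrative field layout: (name, Startpositie, Aantal posities).
fields = [
    ("Onderwijstype HO", 1, 3),
    ("BRIN-nummer", 4, 4),
    ("Opleidingscode (CROHO)", 8, 5),
]

names = [name for name, _, _ in fields]
# pandas convention: 0-based, half-open intervals [start, end)
colspecs = [(start - 1, start - 1 + width) for _, start, width in fields]
print(colspecs)  # [(0, 3), (3, 7), (7, 12)]
```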

Parameters:

Name Type Description Default
txt_file str

Path to the DUO bestandsbeschrijving .txt (or .asc) file.

required
table_index int

Which table to use when the file contains multiple tables. Defaults to 0 (the first / main table).

0

Returns:

  dict: {"names": list[str], "colspecs": list[tuple[int, int]]}

Raises:

  ValueError: When the file cannot be parsed, no tables are found, or
  table_index is out of range.

Example:

>>> params = get_fwf_params("Bestandsbeschrijving_1cyferho_2023_v1.1.txt")
>>> params["names"][:3]
['Onderwijstype HO', 'BRIN-nummer', 'Opleidingscode (CROHO)']
>>> params["colspecs"][:3]
[(0, 3), (3, 7), (7, 12)]

eencijferho.core.extractor.list_fwf_tables(txt_file)

List the table names found in a DUO bestandsbeschrijving .txt file.

Useful for discovering which tables are available before calling get_fwf_params with a specific table_index:

>>> list_fwf_tables("Bestandsbeschrijving_1cyferho_2023_v1.1.txt")
['1CijferHO 2023']

Parameters:

Name Type Description Default
txt_file str

Path to the DUO bestandsbeschrijving .txt (or .asc) file.

required

Returns:

  list[str]: List of table titles, in order. Returns an empty list when
  the file cannot be read or contains no tables.


Validation

eencijferho.utils.extractor_validation.validate_metadata_folder(storage, metadata_folder='data/00-metadata', return_dict=False)

Validates all Excel files in a metadata folder and returns a summary of results.

Parameters:

  metadata_folder (str, default 'data/00-metadata'):
      Path to the folder containing Excel metadata files.
  return_dict (bool, default False):
      If True, returns a dictionary with validation results for each
      file. If False, returns None.

Returns:

  dict[str, dict[str, Any]] | None: Dictionary mapping file names to
  their validation results if return_dict is True, otherwise None.

Edge Cases
  • If no Excel files are found, prints a warning and returns empty dict or None.
  • Handles and logs errors for individual files without stopping the batch process.
  • Saves both timestamped and latest logs in the log folder.
Example

results = validate_metadata_folder(storage, 'data/00-metadata', return_dict=True)
for fname, res in results.items():
    print(fname, res)

eencijferho.utils.converter_validation.converter_validation(storage, conversion_log_path='data/00-metadata/logs/(5)_conversion_log_latest.json', matching_log_path='data/00-metadata/logs/(4)_file_matching_log_latest.json', output_log_path='data/00-metadata/logs/(6)_conversion_validation_log_latest.json')

Validates that row counts in the matching log match the total lines in the conversion log for each processed file.

Parameters:

  conversion_log_path (str, default 'data/00-metadata/logs/(5)_conversion_log_latest.json'):
      Path to the conversion log JSON file.
  matching_log_path (str, default 'data/00-metadata/logs/(4)_file_matching_log_latest.json'):
      Path to the file matching log JSON file.
  output_log_path (str, default 'data/00-metadata/logs/(6)_conversion_validation_log_latest.json'):
      Path to save the output validation log.

Returns:

  dict[str, Any]: Dictionary containing the validation summary, including
  total files, successful and failed conversions, and per-file details.

Edge Cases
  • If a file is present in the matching log but not in the conversion log, it is ignored.
  • Only files with status 'success' in the conversion log are validated.
  • If row counts do not match, the file is marked as failed with an error message.
  • Handles missing or empty details gracefully.
Example

results = converter_validation(storage)
print(results["successful_conversions"])
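The row-count comparison can be sketched with in-memory stand-ins for the two logs. The log schema shown here is assumed for illustration, not the library's actual format:

```python
# Assumed log schema, for illustration only.
matching_log = {"EV2023.asc": {"row_count": 1000},
                "VAKHAVW.asc": {"row_count": 50}}
conversion_log = {"EV2023.asc": {"status": "success", "total_lines": 1000},
                  "VAKHAVW.asc": {"status": "success", "total_lines": 49}}

details = {}
for fname, match in matching_log.items():
    conv = conversion_log.get(fname)
    if conv is None or conv["status"] != "success":
        continue  # ignored / not validated, per the edge cases above
    if conv["total_lines"] == match["row_count"]:
        details[fname] = "success"
    else:
        details[fname] = "failed: row count mismatch"

print(details["VAKHAVW.asc"])  # failed: row count mismatch
```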

eencijferho.utils.converter_match.match_files(storage, input_folder, log_path='data/00-metadata/logs/(3)_xlsx_validation_log_latest.json')

Match input files with metadata files and log the results.

Special matching rules:
  • Files starting with "EV" match metadata files containing "1cyferho".
  • Files containing "VAKHAVW" match metadata files containing "Vakgegevens".
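The two special rules can be expressed as a small predicate. This is an illustrative version only; the real matcher in converter_match may apply further normalisation:

```python
def matches(data_file: str, metadata_file: str) -> bool:
    # "EV" data files pair with "1cyferho" metadata,
    # "VAKHAVW" data files pair with "Vakgegevens" metadata.
    if data_file.startswith("EV"):
        return "1cyferho" in metadata_file
    if "VAKHAVW" in data_file:
        return "Vakgegevens" in metadata_file
    return False

print(matches("EV2023.asc", "Bestandsbeschrijving_1cyferho_2023.txt"))  # True
```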


Output helpers

eencijferho.utils.compressor.convert_csv_to_parquet(storage, input_dir=None)

Compresses each CSV file in the input directory to a Parquet file.

eencijferho.utils.encryptor.encryptor(storage, input_dir=None, output_dir=None)

Encrypts sensitive columns (e.g. BSN) in the output files.

eencijferho.utils.converter_headers.convert_csv_headers_to_snake_case(storage, input_dir=None, delimiter=';', encoding='utf-8', quote_char='"', infer_schema_length=0)

Convert all CSV file headers in the input directory to snake_case.

Parameters:

  input_dir (str | None, default None):
      Path to the directory containing CSV files.
  delimiter (str, default ';'):
      CSV delimiter.
  encoding (str, default 'utf-8'):
      File encoding.
  quote_char (str, default '"'):
      Quote character to use; an empty string ("") disables quoting.
  infer_schema_length (int | None, default 0):
      Number of rows to scan for schema inference.
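A plausible header normalisation, sketched with the standard re module. The library's exact conversion rules are not documented here and may differ:

```python
import re

def to_snake_case(header: str) -> str:
    # Hypothetical sketch: replace non-alphanumeric runs with "_",
    # split camelCase boundaries, collapse underscores, lowercase.
    s = re.sub(r"[^0-9A-Za-z]+", "_", header.strip())
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", s)
    return re.sub(r"_+", "_", s).strip("_").lower()

print(to_snake_case("Opleidingscode (CROHO)"))  # opleidingscode_croho
```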

Configuration

eencijferho.config.OutputConfig dataclass

Controls which output variants the pipeline produces.

Attributes:

  variants (list[str]):
      Which decoded variants to create. Supported values: "decoded"
      (Dec-only substitution) and "enriched" (Dec + variable_metadata
      label substitution).
  formats (list[str]):
      Extra output formats. "parquet" compresses each CSV to a Parquet
      file.
  encrypt (bool):
      When True, sensitive columns (e.g. BSN) are encrypted.
  column_casing (str):
      Header style applied to all output CSV/Parquet files. "snake_case"
      converts headers to snake_case; "none" leaves headers unchanged.
  convert_ev (bool):
      When True (default), EV main data files are converted from
      fixed-width to CSV.
  convert_vakhavw (bool):
      When True (default), VAKHAVW main data files are converted from
      fixed-width to CSV.
  decode_columns (list[str] | None):
      Column names to decode via Dec_* lookup tables. None decodes all
      available columns.
  enrich_variables (list[str] | None):
      Variable names to enrich via variable_metadata labels. None
      enriches all available variables.

Example (CSV-only, no encryption, no header rename):

OutputConfig(variants=["decoded"], formats=[], encrypt=False, column_casing="none")
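
The documented defaults can be illustrated with a minimal stand-in dataclass, so the example runs without the package installed. Field types are assumed; the real class is eencijferho.config.OutputConfig:

```python
from dataclasses import dataclass, field

# Minimal stand-in mirroring the documented fields and defaults
# (decoded + enriched + parquet + encrypt + snake_case).
@dataclass
class OutputConfig:
    variants: list = field(default_factory=lambda: ["decoded", "enriched"])
    formats: list = field(default_factory=lambda: ["parquet"])
    encrypt: bool = True
    column_casing: str = "snake_case"

# CSV-only output, no encryption, headers left unchanged:
cfg = OutputConfig(variants=["decoded"], formats=[], encrypt=False,
                   column_casing="none")
print(cfg)
```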