datapunt_processing.extract package

Submodules

datapunt_processing.extract.download_all_resources_from_dcatd_to_csv module

datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.create_dir_if_not_exists(directory)

Create the directory if it does not yet exist.

Args:
  directory: name of the directory to create, for example: dir/anotherdir
Returns:
Creates the directory if it does not exist, or returns an error message.
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.getDatasets(url, dcatd_url)

Parse each dataset JSON response into a non-nested dict structure.

datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.getPage(url, access_token=None)

Get data from a url. Checks whether employee credentials are present in the environment variables (export DATAPUNT_EMAIL=*******, export DATAPUNT_PASSWORD=******). If present, an access token is requested first.

Args:
  url: the url to retrieve data from
Returns:
response data
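
A minimal, hedged usage sketch (the endpoint url below is illustrative and the credentials must already be present in your environment):

    import os

    from datapunt_processing.extract.download_all_resources_from_dcatd_to_csv import getPage

    # Credentials are picked up from the environment, as described above.
    assert "DATAPUNT_EMAIL" in os.environ and "DATAPUNT_PASSWORD" in os.environ

    # Hypothetical dcatd endpoint; replace with the url you actually need.
    data = getPage("https://api.data.amsterdam.nl/dcatd/datasets")
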
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.main()
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.save_file(data, output_folder, filename)

save_file currently works with: csv, txt, geojson and json as suffixes. It reads the filename suffix and saves the file as the appropriate type.

Args:
  1. data: list of flattened dictionary objects, for example: [{id:1, attr:value, attr2:value}, {id:2, attr:value, attr2:value}]
  2. output_folder: dir/anotherdir
  3. filename: data_output.csv or data_output.json
Returns:
Saves the list of objects to a file of the given type (csv, txt, geojson or json).
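
A short, hedged example of calling save_file with made-up records (values are illustrative only):

    from datapunt_processing.extract.download_all_resources_from_dcatd_to_csv import save_file

    # Two flattened example records.
    records = [
        {"id": 1, "title": "dataset A", "owner": "datapunt"},
        {"id": 2, "title": "dataset B", "owner": "datapunt"},
    ]

    # The filename suffix determines the output type; this writes output/data_output.csv.
    save_file(records, "output", "data_output.csv")
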

datapunt_processing.extract.download_bbga_by_variable__area_year module

datapunt_processing.extract.download_bbga_by_variable__area_year.main()

Example using total citizens by district in 2017. Written to schema 'bi_afval' and table 'd_bbga_cd'.

datapunt_processing.extract.download_bbga_by_variable__area_year.statisticsByAreaByYear(variableName, AreaType, Year)

Area options: stadsdeel, gebiedsgerichtwerken, buurtcombinatie, buurt. Year options: e.g., 2015, 2016, 2017. Variable names can be found here: https://api.datapunt.amsterdam.nl/bbga/variabelen/
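
A hedged call sketch; the variable name below is illustrative, check the variabelen endpoint for valid names:

    from datapunt_processing.extract.download_bbga_by_variable__area_year import statisticsByAreaByYear

    # 'BEVTOTAAL' is an assumed variable name; verify it against
    # https://api.datapunt.amsterdam.nl/bbga/variabelen/ before using it.
    stats = statisticsByAreaByYear("BEVTOTAAL", "stadsdeel", 2017)
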

datapunt_processing.extract.download_bbga_by_variable__area_year.writeStatisticsTable2PGTable(schema, tableName, df_std)

Change the database connection parameters to your own login credentials and make sure that the schema exists.

datapunt_processing.extract.download_from_api_brk module

datapunt_processing.extract.download_from_api_brk.getJsonData(url, accessToken)

Get a JSON response from a url with an access token.

Args:
  1. url: api endpoint
  2. accessToken: access token generated using the auth helper: GetAccessToken().getAccessToken(usertype='employee_plus', scopes='BRK/RS,BRK/RSN/,BRK/RO')
Returns:
parsed json or error message
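
A hedged usage sketch; the import path of the auth helper and the BRK endpoint are assumptions and may differ in your setup:

    from datapunt_processing.extract.download_from_api_brk import getJsonData
    # Assumed location of the auth helper mentioned in the docstring.
    from datapunt_processing.helpers.getaccesstoken import GetAccessToken

    token = GetAccessToken().getAccessToken(
        usertype="employee_plus", scopes="BRK/RS,BRK/RSN/,BRK/RO"
    )
    # Hypothetical BRK endpoint; replace with the resource you need.
    data = getJsonData("https://api.data.amsterdam.nl/brk/object/", token)
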
datapunt_processing.extract.download_from_api_brk.main()
datapunt_processing.extract.download_from_api_brk.parser()

Parser function to run arguments from commandline and to add description to sphinx docs.

datapunt_processing.extract.download_from_api_kvk module

datapunt_processing.extract.download_from_api_kvk.get_kvk_json(url, params, api_key=None)

Get a json response from a url, provided params + api_key.

Args:
  url: api endpoint
  params: kvkNumber, branchNumber, rsin, street, houseNumber, postalCode, city, tradeName, or provide lists/dicts of values
  api_key: kvk api_key; add KVK_API_KEY to your ENV variables
Returns:
parsed json or error message
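
A hedged call sketch; the endpoint url and search parameters are illustrative, and the api key is read from the KVK_API_KEY environment variable as described above:

    import os

    from datapunt_processing.extract.download_from_api_kvk import get_kvk_json

    url = "https://api.kvk.nl/api/v2/search/companies"    # hypothetical endpoint
    params = {"postalCode": "1011PN", "houseNumber": "1"}  # illustrative filters
    data = get_kvk_json(url, params, api_key=os.environ.get("KVK_API_KEY"))
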
datapunt_processing.extract.download_from_api_kvk.main()
datapunt_processing.extract.download_from_api_kvk.parser()

Parser function to run arguments from commandline and to add description to sphinx docs.

datapunt_processing.extract.download_from_api_kvk.response_to_json(response)

datapunt_processing.extract.download_from_api_tellus module

datapunt_processing.extract.download_from_api_tellus.conversionListCvalues(metadata)

Create a conversion dictionary for the values in the tellus api, which consist of 60 speed + length values named c1 to c60.

datapunt_processing.extract.download_from_api_tellus.getJsonData(url, accessToken)

Get a JSON response from a url with an access token.

Args:
  1. url: api endpoint
  2. accessToken: access token generated using the auth helper: GetAccessToken().getAccessToken(usertype='employee', scopes='TLLS/R')
Returns:
parsed json or error message
datapunt_processing.extract.download_from_api_tellus.get_data(url_api, endpoint, metadata, accessToken, limit)

Get and flatten all the data from the api.

Args:
  1. url_api: the main api url:

    https://api.data.amsterdam.nl/tellus

  2. endpoint: the endpoint to retrieve, in this case:

    tellus

  3. metadata: a list of dictionaries from the other endpoints, in this case for tellus location, speed and length.

  4. accessToken: access token generated using the auth helper: GetAccessToken().getAccessToken()

  5. limit: set the number of pages you want to retrieve, ideal for testing first:

    10

Returns:
A list containing multiple items which are all reformatted to a flattened json with added metadata.
datapunt_processing.extract.download_from_api_tellus.main()
datapunt_processing.extract.download_from_api_tellus.parser()

Parser function to run arguments from commandline and to add description to sphinx docs.

datapunt_processing.extract.download_from_api_tellus.reformatData(item, tellus_metadata, cvalues)

Reformat the data from a matrix to a flattened dict with label and tellus names.

Args:
  1. item: one recorded hour which contains 60 types of registrations c1-c60.
  2. tellus_metadata: list of description values for each tellus.
  3. cvalues: the 60 converted values used to add the proper labels to each c1 to c60 counted record.
Returns:
60 rows by c-value with metadata and label descriptions.

datapunt_processing.extract.download_from_api_with_authentication module

datapunt_processing.extract.download_from_api_with_authentication.getJsonData(url, access_token)

Get a JSON response from a url with an access token.

Args:
  1. url: api endpoint
  2. accessToken: access token generated using the auth helper: GetAccessToken().getAccessToken(usertype='employee_plus', scopes='BRK/RS,BRK/RSN/,BRK/RO')
Returns:
parsed json or error message
datapunt_processing.extract.download_from_api_with_authentication.main()
datapunt_processing.extract.download_from_api_with_authentication.parser()

Parser function to run arguments from commandline and to add description to sphinx docs.

datapunt_processing.extract.download_from_api_with_authentication.retrywithtrailingslash(url, access_token)

datapunt_processing.extract.download_from_catalog module

datapunt_processing.extract.download_from_catalog.download_all_files(metadata, download_directory)

Download all files from metadata resources list.

Args:
  1. metadata: json dictionary from ckan with all the metadata, including the resources list of all files.
  2. download_directory: path where to store the downloaded files, for example data.
Result:
Unzipped and created dir filled with all data in the download_directory, if it does not yet exist.
datapunt_processing.extract.download_from_catalog.download_file(file_location, target)
datapunt_processing.extract.download_from_catalog.download_metadata(url)

Download metadata from the data catalog using the dcatd identifier.

Args:
url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=dcatd%2Fdatasets%2Finzameldagen-grofvuil-huisvuil&dtfs=T&mpb=topografie&mpz=11&mpv=52.3731081:4.8932945
Result:
All the metadata of this dataset as a json dictionary, with the owner, refresh date, resource urls to the desired files, etc.
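
The two functions above are typically chained; a hedged sketch using the example url and download directory from the docstrings:

    from datapunt_processing.extract.download_from_catalog import (
        download_all_files,
        download_metadata,
    )

    url = (
        "https://data.amsterdam.nl/#?dte=dcatd%2Fdatasets%2Finzameldagen-grofvuil-huisvuil"
        "&dtfs=T&mpb=topografie&mpz=11&mpv=52.3731081:4.8932945"
    )
    metadata = download_metadata(url)
    download_all_files(metadata, "data")   # downloads and unzips into ./data
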
datapunt_processing.extract.download_from_catalog.get_catalog_package_id(url)

Retrieve dcatd URI from full url from data.amsterdam.nl, for example: dcatd/datasets/inzameldagen-grofvuil-huisvuil

Args:
url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=dcatd%2Fdatasets%2Finzameldagen-grofvuil-huisvuil&dtfs=T&mpb=topografie&mpz=11&mpv=52.3731081:4.8932945
Result:
Unique id number of package.
datapunt_processing.extract.download_from_catalog.main()
datapunt_processing.extract.download_from_catalog.parser()

Parser function to run arguments from commandline and to add description to sphinx.

datapunt_processing.extract.download_from_ckan module

datapunt_processing.extract.download_from_ckan.download_all_files(metadata, download_directory)

Download all files from metadata resources list.

Args:
  1. metadata: json dictionary from ckan with all the metadata, including the resources list of all files.
  2. download_directory: path where to store the downloaded files, for example data.
Result:
Unzipped and created dir filled with all data in the download_directory, if it does not yet exist.
datapunt_processing.extract.download_from_ckan.download_file(file_location, target)
datapunt_processing.extract.download_from_ckan.download_metadata(url)

Download metadata from the data catalog using the package id.

Args:
url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=catalogus%2Fapi%2F3%2Faction%2Fpackage_show%3Fid%3D5d84c216-b826-4406-8297-292678dee13c
Result:
All the metadata of this dataset as a json dictionary, with the owner, refresh date, resource urls to the desired files, etc.
datapunt_processing.extract.download_from_ckan.get_catalog_package_id(url)

Retrieve package id from full url from data.amsterdam.nl, for example: catalogus/api/3/action/package_show?id=c1f04a62-8b69-4775-ad83-ce2647a076ef

Args:
url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=catalogus%2Fapi%2F3%2Faction%2Fpackage_show%3Fid%3D5d84c216-b826-4406-8297-292678dee13c
Result:
Unique id number of package.
datapunt_processing.extract.download_from_ckan.main()
datapunt_processing.extract.download_from_ckan.parser()

Parser function to run arguments from commandline and to add description to sphinx.

datapunt_processing.extract.download_from_objectstore module

datapunt_processing.extract.download_from_objectstore.download_container(connection, container, prefix, output_folder)

Download files from the objectstore.

Args:
  1. connection: connection session using the objectstore_connection function from the helpers.connections
  2. prefix: tag or folder name of file, for example subfolder/subsubfolder
  3. output_folder: '/{folder}/'
Returns:
Written file /{folder}/{prefix}/{file}
datapunt_processing.extract.download_from_objectstore.download_containers(config_path, config_name, prefixes, output_folder)

Download multiple files from the objectstore.

Args:
  1. connection: connection session using the objectstore_connection function from the helpers.connections
  2. prefixes: multiple folders where the files are located, for example aanvalsplan_schoon/crow,aanvalsplan_schoon/mora
  3. output_folder: local folder to write files into, for example app/data for a docker setup
Result:
Loops through download_container function for each prefix (=folder)
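
A hedged sketch of download_containers; the config file path and section name are assumptions, while the prefixes and output folder follow the docstring examples:

    from datapunt_processing.extract.download_from_objectstore import download_containers

    download_containers(
        "auth/config.ini",                                    # config_path (assumed)
        "objectstore",                                        # config_name (assumed)
        "aanvalsplan_schoon/crow,aanvalsplan_schoon/mora",    # prefixes, as in the docstring
        "app/data",                                           # output_folder, as in the docstring
    )
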
datapunt_processing.extract.download_from_objectstore.get_full_container_list(connection, container, **kwargs)

Get all files stored in container (incl. sub-containers)

Args:
  1. connection: connection session using the objectstore_connection function from the helpers.connections
  2. container: name of the root container/folder in the objectstore
Returns:
Generator object with all containers.
datapunt_processing.extract.download_from_objectstore.main()
datapunt_processing.extract.download_from_objectstore.parser()

Parser function to run arguments from commandline and to add description to sphinx docs.

datapunt_processing.extract.download_from_signals_api module

datapunt_processing.extract.download_from_signals_api.get_sia_json(url, scope, params, acc=False, page_limit=0)

First: add SIGNALS_USER and SIGNALS_PASSWORD to your environment variables (!)

Args:
  url: sia api endpoint
  params: created_at, main_cat, sub_cat, text, address, pc, bc, sd, geometry, status, or provide lists/dicts of values
  bearer_token: bearer_token
Returns:
parsed json or error message
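
A hedged call sketch; the endpoint, scope and filter values are illustrative, and the credentials must be set in the environment first:

    import os

    from datapunt_processing.extract.download_from_signals_api import get_sia_json

    assert "SIGNALS_USER" in os.environ and "SIGNALS_PASSWORD" in os.environ

    data = get_sia_json(
        "https://api.data.amsterdam.nl/signals/v1/private/signals/",  # hypothetical endpoint
        "SIG/ALL",                                                    # hypothetical scope
        {"created_at": "2019-01-01", "main_cat": "afval"},            # illustrative filters
        acc=False,
        page_limit=1,
    )
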
datapunt_processing.extract.download_from_signals_api.main()
datapunt_processing.extract.download_from_signals_api.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.download_from_signals_api.process_address(location)

Extract

datapunt_processing.extract.download_from_wfs module

datapunt_processing.extract.download_from_wfs.get_layer_from_wfs(url_wfs, layer_name, srs, outputformat, retry_count=3)

Get a layer from a WFS service.

Args:
  1. url_wfs: full url of the WFS including https, excluding /?:

    https://map.data.amsterdam.nl/maps/gebieden
    
  2. layer_name: Title of the layer:

    stadsdeel
    
  3. srs: coordinate system number, excluding EPSG:

    28992
    
  4. outputformat: leave empty to return standard GML, else define json, geojson, txt, shapezip:

    geojson
    
Returns:
The layer in the specified output format.
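
A minimal sketch using the example values from the argument list above:

    from datapunt_processing.extract.download_from_wfs import get_layer_from_wfs

    layer = get_layer_from_wfs(
        "https://map.data.amsterdam.nl/maps/gebieden",  # url_wfs
        "stadsdeel",                                    # layer_name
        28992,                                          # srs (may also be passed as a string)
        "geojson",                                      # outputformat
    )
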
datapunt_processing.extract.download_from_wfs.get_layers_from_wfs(url_wfs)

Get all layer names in WFS service, print and return them in a list.

datapunt_processing.extract.download_from_wfs.get_multiple_geojson_from_wfs(url_wfs, layer_names, srs, output_folder)

Get all layers and save each of them as a geojson file.

Args:
  1. url_wfs: full url of the WFS including https, excluding /?:

    https://map.data.amsterdam.nl/maps/gebieden
    
  2. layer_names: single or multiple titles of the layers, separated by a comma without spaces:

    stadsdeel,buurtcombinatie,gebiedsgerichtwerken,buurt
    
  3. srs: coordinate system number, excluding EPSG:

    28992
    
  4. output_folder: define the folder to save the files:

    path_to_folder/another_folder
    
datapunt_processing.extract.download_from_wfs.main()
datapunt_processing.extract.download_from_wfs.parser()

Parser function to run arguments from the command line and to add description to sphinx.

datapunt_processing.extract.download_tables_from_dokuwiki_to_json module

datapunt_processing.extract.download_tables_from_dokuwiki_to_json.create_dir_if_not_exists(directory)

Create the directory if it does not yet exist.

Args:
  directory: name of the directory to create, for example: dir/anotherdir
Returns:
Creates the directory if it does not exist, or returns an error message.
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.getHeaders(row)
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.getPage(url)

Get parsed text data from urls. Waits 1 second for slow networks and retries 5 times.

datapunt_processing.extract.download_tables_from_dokuwiki_to_json.getRows(url, headers, row)

Get all rows from tables, add them into a dict and add host url to wiki urls.

datapunt_processing.extract.download_tables_from_dokuwiki_to_json.getTableValues(url, table)
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.main()
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.parseHtmlTable(url, html_doc, header_name_urls, cluster_headertype, table_headertype='h3')

Retrieve one html page to parse tables and H3 names from.

Args:
  • htmldoc: wiki url
  • name: name of the page
  • headertype: h1, h2, or h3 type of the titles used above each table. h3 is not used.
Result:
{table_title: h3 text, [{name: value}, ..]} if no name is specified: [{cluster: title of the page},{name: value}, …]
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.download_tables_from_dokuwiki_to_json.saveFile(data, folder, name)

Save file as json and return the full path.

datapunt_processing.extract.write_csv_to_dataframe module

datapunt_processing.extract.write_csv_to_dataframe.is_valid_file(parser, arg)
datapunt_processing.extract.write_csv_to_dataframe.main()
datapunt_processing.extract.write_csv_to_dataframe.parser()

Parser function to run arguments from the command line and to add description to sphinx.

datapunt_processing.extract.write_csv_to_dataframe.read_crow_file(file, datecol)

Parses the CROW afvaldata.

Args:
  file (xls/xlsx): containing at least a date column
  datecol: 'datum', format %Y-%m-%d %H:%M:%S
Returns:
  • pd.DataFrame: cleaned data frame with datum and time column added
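
A hedged usage sketch; the file path is illustrative, the column name follows the docstring:

    from datapunt_processing.extract.write_csv_to_dataframe import read_crow_file

    df = read_crow_file("data/crow_afvaldata.xlsx", "datum")
    print(df.head())
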
datapunt_processing.extract.write_csv_to_dataframe.read_mora_file(file, datecol)

Parses the MORA csv and transforms it into a clean Pandas DataFrame.

Args:
  file (csv/xls/xlsx): containing at least a date column
  datecol: 'aa_adwh_datum_melding', format %Y-%m-%d %H:%M:%S
Returns:
pd.DataFrame: cleaned data frame with datum and time column added
datapunt_processing.extract.write_csv_to_dataframe.strip_cols(df)

simple utility function to clean dataframe columns

datapunt_processing.extract.write_csv_to_dataframe.valid_date(s)

datapunt_processing.extract.write_mdb_to_csv module

datapunt_processing.extract.write_mdb_to_csv.dump_mdb_tables_to_csv(mdb_file, output_folder, table_names)

Dump each table as a CSV file using "mdb-export", converting " " (spaces) in table names to "_" for the CSV filenames.

datapunt_processing.extract.write_mdb_to_csv.get_tables_mdb(mdb_file)

Get the list of table names with “mdb-tables” for a *.mdb file using latin1 as encoding.
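
The two functions above are typically combined; a hedged sketch assuming the mdbtools command line utilities (mdb-tables, mdb-export) are installed, with an illustrative file path:

    from datapunt_processing.extract.write_mdb_to_csv import dump_mdb_tables_to_csv, get_tables_mdb

    mdb_file = "data/source.mdb"                  # illustrative path
    tables = get_tables_mdb(mdb_file)
    dump_mdb_tables_to_csv(mdb_file, "data/csv", tables)
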

datapunt_processing.extract.write_mdb_to_csv.main()
datapunt_processing.extract.write_mdb_to_csv.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.write_table_to_csv module

datapunt_processing.extract.write_table_to_csv.export_table_to_csv(config_path, db_config, table_name, output_folder)

Export table to CSV file.

Args:
  1. pg_str: psycopg2 connection string, for example: host=localhost port=5432 user=your_username dbname=your_database_name password=very_secret
  2. table_name: for example my_tablename
  3. output_folder: define output folder, for example: /app/data
Result:
Exported csv file to output_folder/table_name_2018-12-31.csv
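
A hedged call sketch; the config file path and section name are assumptions, the table name and output folder follow the docstring examples:

    from datapunt_processing.extract.write_table_to_csv import export_table_to_csv

    export_table_to_csv(
        "auth/config.ini",   # config_path (assumed)
        "dev",               # db_config section (assumed)
        "my_tablename",      # table_name, as in the docstring example
        "/app/data",         # output_folder, as in the docstring example
    )
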
datapunt_processing.extract.write_table_to_csv.main()
datapunt_processing.extract.write_table_to_csv.parser()

datapunt_processing.extract.write_table_to_geojson module

datapunt_processing.extract.write_table_to_geojson.main()
datapunt_processing.extract.write_table_to_geojson.parser()
datapunt_processing.extract.write_table_to_geojson.write_table_to_geojson(config_path, db_config, table_name, output_folder)

Export table to a GeoJson file.

Args:
  1. pg_str: psycopg2 connection string, for example: host=localhost port=5432 user=your_username dbname=your_database_name password=very_secret
  2. table_name: for example my_tablename
  3. output_folder: define output folder, for example: /app/data
Result:
Exported file to output_folder/table_name_2018-12-31.geojson

datapunt_processing.extract.write_xml_to_df_to_csv module

class datapunt_processing.extract.write_xml_to_df_to_csv.XML2DataFrame(xml_data)

Bases: object

Class for parsing an XML to a DataFrame.

normalize(name)

Remove the schema name from keys/values. input:

returns:
percelen
parse_element(element, parsed=None)
parse_root(root)
process_data()
datapunt_processing.extract.write_xml_to_df_to_csv.main()
datapunt_processing.extract.write_xml_to_df_to_csv.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.write_xml_to_df_to_csv.xml_to_df(file)

Function to parse an XML file to a Pandas dataframe.

Args:
  file: filename of the XML
Result:
df of the xml
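
A minimal, hedged usage sketch with an illustrative file name:

    from datapunt_processing.extract.write_xml_to_df_to_csv import xml_to_df

    df = xml_to_df("data/percelen.xml")   # any well-formed XML file
    print(df.head())
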

datapunt_processing.extract.write_xml_to_df_to_csv_fout module

class datapunt_processing.extract.write_xml_to_df_to_csv_fout.XML2DataFrame(xml_data)

Bases: object

Class for parsing an XML to a DataFrame.

normalize(name)

Remove the schema name from keys/values. input:

returns:
percelen
parse_element(element, key_name=None, parsed=None)
parse_root(root)
process_data()
datapunt_processing.extract.write_xml_to_df_to_csv_fout.main()
datapunt_processing.extract.write_xml_to_df_to_csv_fout.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.write_xml_to_df_to_csv_fout.xml_to_df(file)

Function to parse an XML file to a Pandas dataframe.

Args:
  file: filename of the XML
Result:
df of the xml

datapunt_processing.extract.write_xml_to_json module

datapunt_processing.extract.write_xml_to_json.main()
datapunt_processing.extract.write_xml_to_json.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.write_xml_to_json.xml2json(file_input, output_name)

Args:
  file_input: full path name
  output_name: full output file name

datapunt_processing.extract.xml_to_dict module

datapunt_processing.extract.xml_to_dict.main()
datapunt_processing.extract.xml_to_dict.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.extract.xml_to_dict.xml2json(file_input, output_name)

Module contents