datapunt_processing.extract package¶
Submodules¶
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv module¶
-
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.
create_dir_if_not_exists
(directory)¶ Create directory if it does not yet exist.
- Args:
- directory: specify the name of the directory, for example: dir/anotherdir
- Returns:
- Creates the directory if it does not exist, or returns the error message.
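The behavior described above can be sketched with the standard library (a minimal sketch under assumptions; the package's actual implementation may differ):

```python
import os

def create_dir_if_not_exists(directory):
    """Create the directory (including parents) if it does not yet exist.

    Returns the directory on success, or the error message on failure,
    as the docstring above describes.
    """
    try:
        os.makedirs(directory, exist_ok=True)
        return directory
    except OSError as err:
        # Return the error message instead of raising.
        return str(err)
```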
-
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.
getDatasets
(url, dcatd_url)¶ Parse each dataset json response into a non-nested dict structure.
-
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.
getPage
(url, access_token=None)¶ Get data from a url. Checks whether employee credentials are present in the environment variables: export DATAPUNT_EMAIL=******* and export DATAPUNT_PASSWORD=******. If present, an access token is requested first.
- Args:
- url: api endpoint
- Returns:
- response data
-
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.
main
()¶
-
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.extract.download_all_resources_from_dcatd_to_csv.
save_file
(data, output_folder, filename)¶ save_file currently works with: csv, txt, geojson and json as suffixes. It reads the filename suffix and saves the file as the appropriate type.
- Args:
- data: list of flattened dictionary objects for example: [{id:1, attr:value, attr2:value}, {id:2, attr:value, attr2:value}]
- filename: data_output.csv or data_output.json
- output_folder: dir/anotherdir
- Returns:
- Saves the list of objects to a file of the given geojson or csv type.
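The suffix-based dispatch described above can be sketched as follows, here simplified to csv and json only (the real function also handles txt and geojson):

```python
import csv
import json
import os

def save_file(data, output_folder, filename):
    """Save a list of flat dicts as csv or json, based on the filename suffix.

    A simplified sketch of the behavior described above.
    """
    os.makedirs(output_folder, exist_ok=True)
    path = os.path.join(output_folder, filename)
    suffix = filename.rsplit(".", 1)[-1].lower()
    if suffix == "csv":
        with open(path, "w", newline="") as f:
            # Take the column names from the first flattened dict.
            writer = csv.DictWriter(f, fieldnames=list(data[0].keys()))
            writer.writeheader()
            writer.writerows(data)
    elif suffix == "json":
        with open(path, "w") as f:
            json.dump(data, f)
    else:
        raise ValueError("unsupported suffix: " + suffix)
    return path
```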
datapunt_processing.extract.download_bbga_by_variable__area_year module¶
-
datapunt_processing.extract.download_bbga_by_variable__area_year.
main
()¶ Example using total citizens by city district in 2017. Written to schema ‘bi_afval’ and table ‘d_bbga_cd’.
-
datapunt_processing.extract.download_bbga_by_variable__area_year.
statisticsByAreaByYear
(variableName, AreaType, Year)¶ Area options: stadsdeel, gebiedsgerichtwerken, buurtcombinatie, buurt. Year options: e.g., 2015, 2016, 2017. variableNames can be found here: https://api.datapunt.amsterdam.nl/bbga/variabelen/
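Assembling such a request could look like the sketch below. Note that the query parameter names (`variabele`, `gebiedstype`, `jaar`) are illustrative assumptions, not the documented query interface of the BBGA API:

```python
from urllib.parse import urlencode

BBGA_URL = "https://api.datapunt.amsterdam.nl/bbga/cijfers/"

def build_bbga_query(variableName, areaType, year):
    """Build a query url for one variable/area/year combination.

    The parameter names below are illustrative assumptions; check the
    BBGA endpoint for the real ones.
    """
    params = {"variabele": variableName, "gebiedstype": areaType, "jaar": year}
    return BBGA_URL + "?" + urlencode(params)
```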
-
datapunt_processing.extract.download_bbga_by_variable__area_year.
writeStatisticsTable2PGTable
(schema, tableName, df_std)¶ Change the database connection parameters to your own login credentials and make sure that the schema exists.
datapunt_processing.extract.download_from_api_brk module¶
-
datapunt_processing.extract.download_from_api_brk.
getJsonData
(url, accessToken)¶ Get a json response from a url with an access token.
- Args:
- url: api endpoint
- accessToken: access token generated using the auth helper: GetAccessToken().getAccessToken(usertype=’employee_plus’, scopes=’BRK/RS,BRK/RSN/,BRK/RO’)
- Returns:
- parsed json or error message
-
datapunt_processing.extract.download_from_api_brk.
main
()¶
-
datapunt_processing.extract.download_from_api_brk.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs.
datapunt_processing.extract.download_from_api_kvk module¶
-
datapunt_processing.extract.download_from_api_kvk.
get_kvk_json
(url, params, api_key=None)¶ Get a json response from a url, provided params + api_key.
- Args:
- url: api endpoint
- params: kvkNumber, branchNumber, rsin, street, houseNumber, postalCode, city, tradeName, or provide lists/dicts of values
- api_key: kvk api_key. Add KVK_API_KEY to your ENV variables
- Returns:
- parsed json or error message
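The request assembly can be sketched as below. How the real get_kvk_json passes the key (header versus query parameter) is an assumption here, and both the url and the `user_key` parameter name are hypothetical:

```python
from urllib.parse import urlencode

def build_kvk_request(url, params, api_key=None):
    """Assemble the request url for a KvK search call.

    Sketch only: drops params that are None and appends the api key as a
    hypothetical 'user_key' query parameter.
    """
    query = {k: v for k, v in params.items() if v is not None}
    if api_key is not None:
        query["user_key"] = api_key  # hypothetical parameter name
    return url + "?" + urlencode(query)
```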
-
datapunt_processing.extract.download_from_api_kvk.
main
()¶
-
datapunt_processing.extract.download_from_api_kvk.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs.
-
datapunt_processing.extract.download_from_api_kvk.
response_to_json
(response)¶
datapunt_processing.extract.download_from_api_tellus module¶
-
datapunt_processing.extract.download_from_api_tellus.
conversionListCvalues
(metadata)¶ Create a conversion dictionary for values in the tellus api, which consists of 60 speed + length values named c1 to c60.
-
datapunt_processing.extract.download_from_api_tellus.
getJsonData
(url, accessToken)¶ Get a json response from a url with an access token.
- Args:
- url: api endpoint
- accessToken: access token generated using the auth helper: GetAccessToken().getAccessToken(usertype=’employee’, scopes=’TLLS/R’)
- Returns:
- parsed json or error message
-
datapunt_processing.extract.download_from_api_tellus.
get_data
(url_api, endpoint, metadata, accessToken, limit)¶ Get and flatten all the data from the api.
- Args:
- url_api: the main api url, for example: https://api.data.amsterdam.nl/tellus
- endpoint: one endpoint, for example: tellus
- metadata: a list of dictionaries from other endpoints, in this case for tellus location, speed and length.
- accessToken: access token generated using the auth helper: GetAccessToken().getAccessToken()
- limit: the number of pages you want to retrieve, ideal for testing first, for example: 10
- Returns:
- A list containing multiple items which are all reformatted to a flattened json with added metadata.
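The flattening step that such functions apply to nested api responses can be sketched generically (this is not the package's own code):

```python
def flatten(item, parent_key="", sep="_"):
    """Flatten a nested dict into a single-level dict by joining keys.

    A generic sketch of the reformatting idea described above.
    """
    flat = {}
    for key, value in item.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing their keys.
            flat.update(flatten(value, new_key, sep=sep))
        else:
            flat[new_key] = value
    return flat
```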
-
datapunt_processing.extract.download_from_api_tellus.
main
()¶
-
datapunt_processing.extract.download_from_api_tellus.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs.
-
datapunt_processing.extract.download_from_api_tellus.
reformatData
(item, tellus_metadata, cvalues)¶ Reformat the data from a matrix to a flattened dict with label and tellus names.
- Args:
- item: one recorded hour which contains 60 types of registrations c1-c60.
- tellus_metadata: list of description values for each tellus.
- cvalues: the 60 converted values used to add the proper labels to each c1 to c60 counted record.
- Returns:
- 60 rows by c-value with metadata and label descriptions
datapunt_processing.extract.download_from_api_with_authentication module¶
-
datapunt_processing.extract.download_from_api_with_authentication.
getJsonData
(url, access_token)¶ Get a json response from a url with an access token.
- Args:
- url: api endpoint
- access_token: access token generated using the auth helper: GetAccessToken().getAccessToken(usertype=’employee_plus’, scopes=’BRK/RS,BRK/RSN/,BRK/RO’)
- Returns:
- parsed json or error message
-
datapunt_processing.extract.download_from_api_with_authentication.
main
()¶
-
datapunt_processing.extract.download_from_api_with_authentication.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs.
-
datapunt_processing.extract.download_from_api_with_authentication.
retrywithtrailingslash
(url, access_token)¶
datapunt_processing.extract.download_from_catalog module¶
-
datapunt_processing.extract.download_from_catalog.
download_all_files
(metadata, download_directory)¶ Download all files from metadata resources list.
- Args:
- metadata: json dictionary from ckan with all the metadata, including the resources list of all files.
- download_directory: path where to store the files, for example: data.
- Result:
- A created dir in the download_directory (if it does not yet exist), filled with all the unzipped data.
-
datapunt_processing.extract.download_from_catalog.
download_file
(file_location, target)¶
-
datapunt_processing.extract.download_from_catalog.
download_metadata
(url)¶ Download files from data catalog using the dcatd identifier.
- Args:
- url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=dcatd%2Fdatasets%2Finzameldagen-grofvuil-huisvuil&dtfs=T&mpb=topografie&mpz=11&mpv=52.3731081:4.8932945
- Result:
- All the metadata from this dataset as a json dictionary, with the owner, refresh date, resource urls to the desired files, etc.
-
datapunt_processing.extract.download_from_catalog.
get_catalog_package_id
(url)¶ Retrieve dcatd URI from full url from data.amsterdam.nl, for example: dcatd/datasets/inzameldagen-grofvuil-huisvuil
- Args:
- url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=dcatd%2Fdatasets%2Finzameldagen-grofvuil-huisvuil&dtfs=T&mpb=topografie&mpz=11&mpv=52.3731081:4.8932945
- Result:
- Unique id number of package.
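Extracting the dcatd identifier from the percent-encoded `dte` fragment parameter can be sketched with the standard library (a sketch of the idea; the real function may parse the url differently):

```python
from urllib.parse import unquote

def get_dcatd_uri(url):
    """Pull the dcatd identifier out of a data.amsterdam.nl url."""
    fragment = url.split("dte=", 1)[1]    # part after the dte= parameter
    fragment = fragment.split("&", 1)[0]  # drop the remaining parameters
    return unquote(fragment)              # decode %2F back to /
```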
-
datapunt_processing.extract.download_from_catalog.
main
()¶
-
datapunt_processing.extract.download_from_catalog.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx.
datapunt_processing.extract.download_from_ckan module¶
-
datapunt_processing.extract.download_from_ckan.
download_all_files
(metadata, download_directory)¶ Download all files from metadata resources list.
- Args:
- metadata: json dictionary from ckan with all the metadata, including the resources list of all files.
- download_directory: path where to store the files, for example: data.
- Result:
- A created dir in the download_directory (if it does not yet exist), filled with all the unzipped data.
-
datapunt_processing.extract.download_from_ckan.
download_file
(file_location, target)¶
-
datapunt_processing.extract.download_from_ckan.
download_metadata
(url)¶ Download metadata from the data catalog using the response id.
- Args:
- url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=catalogus%2Fapi%2F3%2Faction%2Fpackage_show%3Fid%3D5d84c216-b826-4406-8297-292678dee13c
- Result:
- All the metadata from this dataset as a json dictionary, with the owner, refresh date, resource urls to the desired files, etc.
-
datapunt_processing.extract.download_from_ckan.
get_catalog_package_id
(url)¶ Retrieve package id from full url from data.amsterdam.nl, for example: catalogus/api/3/action/package_show?id=c1f04a62-8b69-4775-ad83-ce2647a076ef
- Args:
- url: full data.amsterdam.nl url of the desired dataset, for example: https://data.amsterdam.nl/#?dte=catalogus%2Fapi%2F3%2Faction%2Fpackage_show%3Fid%3D5d84c216-b826-4406-8297-292678dee13c
- Result:
- Unique id number of package.
-
datapunt_processing.extract.download_from_ckan.
main
()¶
-
datapunt_processing.extract.download_from_ckan.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx.
datapunt_processing.extract.download_from_objectstore module¶
-
datapunt_processing.extract.download_from_objectstore.
download_container
(connection, container, prefix, output_folder)¶ Download file from objectstore.
- Args:
- connection: connection session using the objectstore_connection function from the helpers.connections
- prefix: tag or folder name of file, for example subfolder/subsubfolder
- output_folder: ‘/{folder}/’
- Returns:
- Written file /{folder}/{prefix}/{file}
-
datapunt_processing.extract.download_from_objectstore.
download_containers
(config_path, config_name, prefixes, output_folder)¶ Download multiple files from the objectstore.
- Args:
- connection: connection session using the objectstore_connection function from the helpers.connections
- prefixes: multiple folders where the files are located, for example aanvalsplan_schoon/crow,aanvalsplan_schoon/mora
- output_folder: local folder to write files into, for example app/data for a docker setup
- Result:
- Loops through download_container function for each prefix (=folder)
-
datapunt_processing.extract.download_from_objectstore.
get_full_container_list
(connection, container, **kwargs)¶ Get all files stored in the container (incl. sub-containers).
- Args:
- connection: connection session using the objectstore_connection function from the helpers.connections
- container: “name of the root container/folder in objectstore”
- Returns:
- Generator object with all containers.
-
datapunt_processing.extract.download_from_objectstore.
main
()¶
-
datapunt_processing.extract.download_from_objectstore.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs.
datapunt_processing.extract.download_from_signals_api module¶
-
datapunt_processing.extract.download_from_signals_api.
get_sia_json
(url, scope, params, acc=False, page_limit=0)¶ First: put SIGNALS_USER and SIGNALS_PASSWORD in your env variables (!)
- Args:
- url: sia api endpoint
- params: created_at, main_cat, sub_cat, text, address, pc, bc, sd, geometry, status, or provide lists/dicts of values
- bearer_token: bearer_token
- Returns:
- parsed json or error message
-
datapunt_processing.extract.download_from_signals_api.
main
()¶
-
datapunt_processing.extract.download_from_signals_api.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.extract.download_from_signals_api.
process_address
(location)¶ Extract
datapunt_processing.extract.download_from_wfs module¶
-
datapunt_processing.extract.download_from_wfs.
get_layer_from_wfs
(url_wfs, layer_name, srs, outputformat, retry_count=3)¶ Get layer from a wfs service.
- Args:
- url_wfs: full url of the WFS including https, excluding /?, for example: https://map.data.amsterdam.nl/maps/gebieden
- layer_name: title of the layer, for example: stadsdeel
- srs: coordinate system number, excluding EPSG, for example: 28992
- outputformat: leave empty to return standard GML, else define json, geojson, txt or shapezip, for example: geojson
- Returns:
- The layer in the specified output format.
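Building the GetFeature request from those arguments can be sketched as follows (a sketch only; the real function also retries and handles the GML default, and the WFS version used here is an assumption):

```python
from urllib.parse import urlencode

def build_wfs_request(url_wfs, layer_name, srs, outputformat):
    """Build a WFS GetFeature request url for the arguments described above."""
    params = {
        "REQUEST": "GetFeature",
        "SERVICE": "WFS",
        "VERSION": "1.1.0",  # assumed version
        "TYPENAME": layer_name,
        "SRSNAME": "EPSG:{}".format(srs),
    }
    if outputformat:
        params["OUTPUTFORMAT"] = outputformat
    return url_wfs + "?" + urlencode(params)
```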
-
datapunt_processing.extract.download_from_wfs.
get_layers_from_wfs
(url_wfs)¶ Get all layer names in WFS service, print and return them in a list.
-
datapunt_processing.extract.download_from_wfs.
get_multiple_geojson_from_wfs
(url_wfs, layer_names, srs, output_folder)¶ Get all layers and save them as geojson files.
- Args:
- url_wfs: full url of the WFS including https, excluding /?, for example: https://map.data.amsterdam.nl/maps/gebieden
- layer_names: single or multiple titles of the layers, separated by a comma without spaces, for example: stadsdeel,buurtcombinatie,gebiedsgerichtwerken,buurt
- srs: coordinate system number, excluding EPSG, for example: 28992
- output_folder: the folder to save the files to, for example: path_to_folder/another_folder
-
datapunt_processing.extract.download_from_wfs.
main
()¶
-
datapunt_processing.extract.download_from_wfs.
parser
()¶ Parser function to run arguments from the command line and to add description to sphinx.
datapunt_processing.extract.download_tables_from_dokuwiki_to_json module¶
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
create_dir_if_not_exists
(directory)¶ Create directory if it does not yet exist.
- Args:
- directory: specify the name of the directory, for example: dir/anotherdir
- Returns:
- Creates the directory if it does not exist, or returns the error message.
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
getHeaders
(row)¶
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
getPage
(url)¶ Get parsed text data from urls. Wait 1 second for slow networks. Retry 5 times.
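The wait-and-retry pattern described here can be sketched generically; `fetch` below stands in for the real http call, so this is not the package's own code:

```python
import time

def get_page(fetch, url, retries=5, wait=1):
    """Call fetch(url), waiting between attempts and retrying up to
    `retries` times, as the docstring above describes."""
    last_error = None
    for _ in range(retries):
        try:
            return fetch(url)
        except Exception as err:
            last_error = err
            time.sleep(wait)  # wait for slow networks before retrying
    raise last_error
```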
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
getRows
(url, headers, row)¶ Get all rows from tables, add them into a dict and add host url to wiki urls.
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
getTableValues
(url, table)¶
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
main
()¶
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
parseHtmlTable
(url, html_doc, header_name_urls, cluster_headertype, table_headertype='h3')¶ Retrieve one html page to parse tables and H3 names from.
- Args:
- htmldoc: wiki url
- name: name of the page
- headertype: h1, h2, or h3 type of the titles used above each table. h3 is not used.
- Result:
- {table_title: h3 text, [{name: value}, ..]} if no name is specified: [{cluster: title of the page},{name: value}, …]
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.extract.download_tables_from_dokuwiki_to_json.
saveFile
(data, folder, name)¶ Save file as json and return the full path.
datapunt_processing.extract.write_csv_to_dataframe module¶
-
datapunt_processing.extract.write_csv_to_dataframe.
is_valid_file
(parser, arg)¶
-
datapunt_processing.extract.write_csv_to_dataframe.
main
()¶
-
datapunt_processing.extract.write_csv_to_dataframe.
parser
()¶ Parser function to run arguments from the command line and to add description to sphinx.
-
datapunt_processing.extract.write_csv_to_dataframe.
read_crow_file
(file, datecol)¶ Parses the CROW afvaldata.
- Args:
- file (xls/xlsx): containing at least a date column
- datecol: ‘datum’, format %Y-%m-%d %H:%M:%S
- Returns:
- pd.DataFrame: cleaned data frame with datum and time column added
-
datapunt_processing.extract.write_csv_to_dataframe.
read_mora_file
(file, datecol)¶ Parses the MORA csv and transforms it into a clean Pandas DataFrame.
- Args:
- file (csv/xls/xlsx): containing at least a date column
- datecol: ‘aa_adwh_datum_melding’, format %Y-%m-%d %H:%M:%S
- Returns:
- pd.DataFrame: cleaned data frame with datum and time column added
-
datapunt_processing.extract.write_csv_to_dataframe.
strip_cols
(df)¶ Simple utility function to clean dataframe columns.
-
datapunt_processing.extract.write_csv_to_dataframe.
valid_date
(s)¶
datapunt_processing.extract.write_mdb_to_csv module¶
-
datapunt_processing.extract.write_mdb_to_csv.
dump_mdb_tables_to_csv
(mdb_file, output_folder, table_names)¶ Dump each table as a CSV file using “mdb-export” and converting ” ” in table names to “_” for the CSV filenames.
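Building the per-table command and filename can be sketched as below; in practice the command would be run with subprocess and its stdout written to the csv file:

```python
def mdb_export_command(mdb_file, table_name, output_folder):
    """Build the mdb-export invocation and the csv filename for one table,
    replacing spaces in the table name with underscores, as described above."""
    csv_name = "{}/{}.csv".format(output_folder, table_name.replace(" ", "_"))
    command = ["mdb-export", mdb_file, table_name]
    return command, csv_name
```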
-
datapunt_processing.extract.write_mdb_to_csv.
get_tables_mdb
(mdb_file)¶ Get the list of table names with “mdb-tables” for a *.mdb file using latin1 as encoding.
-
datapunt_processing.extract.write_mdb_to_csv.
main
()¶
-
datapunt_processing.extract.write_mdb_to_csv.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
datapunt_processing.extract.write_table_to_csv module¶
-
datapunt_processing.extract.write_table_to_csv.
export_table_to_csv
(config_path, db_config, table_name, output_folder)¶ Export table to CSV file.
- Args:
- pg_str: psycopg2 connection string, for example: host=localhost port=5432 user=your_username dbname=your_database_name password=very_secret
- table_name: for example my_tablename
- output_folder: define output folder, for example: /app/data
- Result:
- Exported csv file to output_folder/table_name_2018-12-31.csv
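The dated output naming described above can be sketched as follows (naming only; the database export itself is omitted):

```python
from datetime import date

def csv_output_path(output_folder, table_name, day=None):
    """Build the dated output path, e.g. output_folder/table_name_2018-12-31.csv."""
    day = day or date.today()
    return "{}/{}_{}.csv".format(output_folder, table_name, day.isoformat())
```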
-
datapunt_processing.extract.write_table_to_csv.
main
()¶
-
datapunt_processing.extract.write_table_to_csv.
parser
()¶
datapunt_processing.extract.write_table_to_geojson module¶
-
datapunt_processing.extract.write_table_to_geojson.
main
()¶
-
datapunt_processing.extract.write_table_to_geojson.
parser
()¶
-
datapunt_processing.extract.write_table_to_geojson.
write_table_to_geojson
(config_path, db_config, table_name, output_folder)¶ Export table to a GeoJson file.
- Args:
- pg_str: psycopg2 connection string, for example: host=localhost port=5432 user=your_username dbname=your_database_name password=very_secret
- table_name: for example my_tablename
- output_folder: define output folder, for example: /app/data
- Result:
- Exported file to output_folder/table_name_2018-12-31.geojson
datapunt_processing.extract.write_xml_to_df_to_csv module¶
-
class
datapunt_processing.extract.write_xml_to_df_to_csv.
XML2DataFrame
(xml_data)¶ Bases:
object
Class for parsing an XML to a DataFrame.
-
normalize
(name)¶ Remove the schemaname from keys/values. input:
- returns:
- percelen
-
parse_element
(element, parsed=None)¶
-
parse_root
(root)¶
-
process_data
()¶
-
-
datapunt_processing.extract.write_xml_to_df_to_csv.
main
()¶
-
datapunt_processing.extract.write_xml_to_df_to_csv.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.extract.write_xml_to_df_to_csv.
xml_to_df
(file)¶ Function to parse an XML file to a Pandas dataframe.
- Args:
- file: filename of the XML
- Result:
- df of the xml
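The parsing step can be sketched with the standard library; this is a simplified sketch of the XML2DataFrame idea, and in the real function the resulting records end up in a Pandas dataframe (e.g. via pandas.DataFrame(records)):

```python
import xml.etree.ElementTree as ET

def xml_to_records(xml_data):
    """Parse an XML string into a list of flat dicts, one per child
    element of the root."""
    root = ET.fromstring(xml_data)
    return [
        {child.tag: child.text for child in element}
        for element in root
    ]
```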
datapunt_processing.extract.write_xml_to_df_to_csv_fout module¶
-
class
datapunt_processing.extract.write_xml_to_df_to_csv_fout.
XML2DataFrame
(xml_data)¶ Bases:
object
Class for parsing an XML to a DataFrame.
-
normalize
(name)¶ Remove the schemaname from keys/values. input:
- returns:
- percelen
-
parse_element
(element, key_name=None, parsed=None)¶
-
parse_root
(root)¶
-
process_data
()¶
-
-
datapunt_processing.extract.write_xml_to_df_to_csv_fout.
main
()¶
-
datapunt_processing.extract.write_xml_to_df_to_csv_fout.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.extract.write_xml_to_df_to_csv_fout.
xml_to_df
(file)¶ Function to parse an XML file to a Pandas dataframe.
- Args:
- file: filename of the XML
- Result:
- df of the xml
datapunt_processing.extract.write_xml_to_json module¶
-
datapunt_processing.extract.write_xml_to_json.
main
()¶
-
datapunt_processing.extract.write_xml_to_json.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.extract.write_xml_to_json.
xml2json
(file_input, output_name)¶
- Args:
- file_input: full path name
- output_name: full output file name
datapunt_processing.extract.xml_to_dict module¶
-
datapunt_processing.extract.xml_to_dict.
main
()¶
-
datapunt_processing.extract.xml_to_dict.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.extract.xml_to_dict.
xml2json
(file_input, output_name)¶