datapunt_processing.helpers package

Submodules

datapunt_processing.helpers.connections module

datapunt_processing.helpers.connections.execute_sql(pg_str, sql)

Execute an SQL query with psycopg2.

Args:
  1. pg_str: connection string created with the helper function psycopg_connection_string, returning: host= port= user= dbname= password=

  2. sql: SQL string in triple quotes:

    ```CREATE TABLE foo (bar text)```
    
Returns:
Executed sql with conn.cursor().execute(sql)
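
Usage (a minimal sketch; the config path and the dev section name follow the examples on this page):

```
from datapunt_processing.helpers.connections import psycopg_connection_string, execute_sql

# Build the connection string from the config file, then run a DDL statement.
pg_str = psycopg_connection_string('authentication/config.ini', 'dev')
execute_sql(pg_str, """CREATE TABLE foo (bar text)""")
```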
datapunt_processing.helpers.connections.get_config(full_path)

Get config file with all login credentials, port numbers, etc.

Args:
full_path: provide the full path to the config.ini file, for example authentication/config.ini
Returns:
The entire configuration file, to be used as config.get(config_name, 'AUTHURL')
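
Usage (a minimal sketch; the path and the section/key names are the examples from this page):

```
from datapunt_processing.helpers.connections import get_config

# Read the config file and pick one value out of the objectstore section.
config = get_config('authentication/config.ini')
auth_url = config.get('objectstore', 'AUTHURL')
```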
datapunt_processing.helpers.connections.objectstore_connection(config_full_path, config_name, print_config_vars=None)

Get an objectstore connection.

Args:
  1. config_full_path: /path_to_config/config.ini or config.ini if in root.
  2. config_name: objectstore
  3. print_config_vars: if set to True, print all variables from the config file
Returns:
An objectstore connection session.
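
Usage (a minimal sketch, assuming a config.ini with an objectstore section as described above):

```
from datapunt_processing.helpers.connections import objectstore_connection

# Open an objectstore session; print_config_vars=True echoes the config values.
connection = objectstore_connection('config.ini', 'objectstore', print_config_vars=True)
```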
datapunt_processing.helpers.connections.postgres_engine_pandas(config_full_path, db_config_name)

Pandas uses SQLAlchemy; this is the config wrapper that inserts the config parameters into to_sql queries.

Args:
  1. config_full_path: location of the config.ini file including the name of the file, for example authentication/config.ini
  2. db_config_name: dev or docker, to get the ip, user/password and port values.
Returns:
The postgres pandas engine to do sql queries with.
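
Usage (a minimal sketch; the table name my_table is hypothetical):

```
import pandas as pd

from datapunt_processing.helpers.connections import postgres_engine_pandas

# Create the engine from the config file and write a DataFrame to Postgres.
engine = postgres_engine_pandas('authentication/config.ini', 'dev')
df = pd.DataFrame([{'id': 1, 'attr': 'value'}])
df.to_sql('my_table', engine, if_exists='replace', index=False)
```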
datapunt_processing.helpers.connections.psycopg_connection_string(config_full_path, db_config_name)

Postgres connection string for psycopg2.

Args:
  1. config_full_path: location of the config.ini file including the name of the file, for example authentication/config.ini
  2. db_config_name: dev or docker, to get the ip, user/password and port values.
Returns:
The connection string required by psycopg2: 'PG:host= port= user= dbname= password='

datapunt_processing.helpers.demo_asyncio module

datapunt_processing.helpers.demo_asyncio.custom_sleep()
datapunt_processing.helpers.demo_asyncio.factorial(name, number)

datapunt_processing.helpers.files module

datapunt_processing.helpers.files.create_dir_if_not_exists(directory)

Create a directory if it does not yet exist.

Args:
directory: the name of the directory, for example: dir/anotherdir
Returns:
Creates the directory if it does not exist, or returns the error message.
datapunt_processing.helpers.files.save_file(data, output_folder, filename)

save_file currently works with the suffixes csv, txt, geojson and json. It reads the filename suffix and saves the file as the appropriate type.

Args:
  1. data: list of flattened dictionary objects, for example: [{id:1, attr:value, attr2:value}, {id:2, attr:value, attr2:value}]
  2. output_folder: dir/anotherdir
  3. filename: data_output.csv or data_output.json
Returns:
Saves the list of objects to a file of the given geojson or csv type.
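
Usage (a minimal sketch following the example data above):

```
from datapunt_processing.helpers.files import save_file

# A list of flat dicts; the suffix of the filename picks the output type.
data = [{'id': 1, 'attr': 'value', 'attr2': 'value'},
        {'id': 2, 'attr': 'value', 'attr2': 'value'}]
save_file(data, 'dir/anotherdir', 'data_output.csv')
```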
datapunt_processing.helpers.files.unzip(path, filename_as_folder=False)

Find all zip files and unzip them in the root folder.

Args:
  1. path: set the folder to check for zip files.
  2. filename_as_folder: set to True to unzip into subfolders named after each zip file instead of into the root folder.
Returns:
Unzipped files in the path directory, or in a subfolder named after each zip file.
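
Usage (a minimal sketch; the data/ folder is a hypothetical example):

```
from datapunt_processing.helpers.files import unzip

# Unzip every archive found in data/ into a subfolder named after the zip file.
unzip('data/', filename_as_folder=True)
```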

datapunt_processing.helpers.getaccesstoken module

class datapunt_processing.helpers.getaccesstoken.GetAccessToken

Bases: object

Get an authentication header item with an access token for using the internal APIs, by logging in with email and password credentials and authenticated scopes, or as type 'employee'. To see the available scopes and types, see this file:

Usage:

from datapunt_processing.helpers.getaccesstoken import GetAccessToken

accessToken = GetAccessToken().getAccessToken(usertype='employee_plus', scopes='BRK/RS,BRK/RSN,BRK/RO')
requests.get(url, headers=accessToken)

Args:
  • scopes: Add scopes as a comma-separated list.
  • usertype: Add the usertype.
  • email: Set and get environment variable: export DATAPUNT_EMAIL=*****
  • password: Set and get environment variable: export DATAPUNT_PASSWORD=*****
Returns:
accesstoken
getAccessToken(usertype='employee', scopes='TLLS/R', acc=False)
datapunt_processing.helpers.getaccesstoken.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.helpers.json_dict_handlers module

datapunt_processing.helpers.json_dict_handlers.clean_dict(dictionary, key_name)

Remove a field from a dict based on its key name.

Args:
  1. dictionary: {id:1, dates:2018-12-02}
  2. key_name: 'dates'
Returns:
{id:1}
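
Usage (a minimal sketch following the example above):

```
from datapunt_processing.helpers.json_dict_handlers import clean_dict

record = {'id': 1, 'dates': '2018-12-02'}
cleaned = clean_dict(record, 'dates')
# Expected result, per the docstring: {'id': 1}
```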
datapunt_processing.helpers.json_dict_handlers.flatten_json(json_object)

Flatten a nested JSON object.

Args:
  1. json_object, for example: {"key": {"subkey": {"subsubkey": "value"}}}
Returns:
{"key.subkey.subsubkey": "value"}
Source:
https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
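
Usage (a minimal sketch following the example above):

```
from datapunt_processing.helpers.json_dict_handlers import flatten_json

nested = {"key": {"subkey": {"subsubkey": "value"}}}
flat = flatten_json(nested)
# Expected result, per the docstring: {"key.subkey.subsubkey": "value"}
```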
datapunt_processing.helpers.json_dict_handlers.joinByKeyNames(geojson, dataset, key1, key2)

Insert data from dataset into geojson where key1 from dataset matches key2 from geojson.

datapunt_processing.helpers.json_dict_handlers.jsonPoints2geojson(df, latColumn, lonColumn)

Convert JSON with lat/lon columns to geojson.

Source:
https://gis.stackexchange.com/questions/220997/pandas-to-geojson-multiples-points-features-with-python
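
Usage (a minimal sketch, assuming df is a pandas DataFrame; the column names and values are hypothetical):

```
import pandas as pd

from datapunt_processing.helpers.json_dict_handlers import jsonPoints2geojson

# One row per point; the lat/lon column names are passed explicitly.
df = pd.DataFrame([{'name': 'example', 'lat': 52.37, 'lon': 4.89}])
points = jsonPoints2geojson(df, 'lat', 'lon')
```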

datapunt_processing.helpers.json_dict_handlers.openJsonArrayKeyDict2FlattenedJson(fileName)

Open a json file and return the array of objects without the object value name. For example: [{'container': {…}}, {'container': {…}}] now returns as [{…}, {…}]
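
Usage (a minimal sketch; containers.json is a hypothetical file name):

```
from datapunt_processing.helpers.json_dict_handlers import openJsonArrayKeyDict2FlattenedJson

# [{'container': {...}}, {'container': {...}}] comes back as [{...}, {...}]
records = openJsonArrayKeyDict2FlattenedJson('containers.json')
```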

datapunt_processing.helpers.logging module

datapunt_processing.helpers.logging.logger()

Set up basic logging for the console.

Usage:
Initialize the logger by adding the code at the top of your script: logger = logger()
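
A minimal sketch of that usage, assuming the returned object behaves like a standard logging.Logger:

```
from datapunt_processing.helpers.logging import logger

logger = logger()
logger.info('Processing started')  # assumes standard logging.Logger methods
```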

TODO: add log file export

datapunt_processing.helpers.split_file_by_date module

datapunt_processing.helpers.split_file_by_date.load_csv(csvfile)
datapunt_processing.helpers.split_file_by_date.main()
datapunt_processing.helpers.split_file_by_date.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.helpers.split_file_by_date.split_file_by_date(data, column_name, date_splitter)

Split a file into time series.

Args:
  1. filename: full path of the file
  2. column_name: the date column to split on
  3. date_splitter: year, month or day
Returns:
Multiple files split by year/month/day
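
Usage (a minimal sketch; the file path and the 'date' column name are hypothetical):

```
from datapunt_processing.helpers.split_file_by_date import load_csv, split_file_by_date

# Load the CSV, then write one output file per year found in the date column.
data = load_csv('data/input.csv')
split_file_by_date(data, 'date', 'year')
```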

datapunt_processing.helpers.xml_handlers module

datapunt_processing.helpers.xml_handlers.parse_and_remove(filename, path)

Incremental XML parsing.

Args:
  1. filename: xml file name
  2. path: path to the xml elements (per the linked cookbook example, a slash-separated element path such as row/pothole)
Source:
https://github.com/dabeaz/python-cookbook/blob/master/src/6/incremental_parsing_of_huge_xml_files/example.py
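
Usage (a minimal sketch, assuming this helper mirrors the linked cookbook generator; the file and element names below come from that example):

```
from datapunt_processing.helpers.xml_handlers import parse_and_remove

# Stream matching elements without loading the whole file into memory.
for element in parse_and_remove('potholes.xml', 'row/pothole'):
    print(element.findtext('zip'))
```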

Module contents