datapunt_processing.helpers package

Submodules

datapunt_processing.helpers.connections module

datapunt_processing.helpers.connections.execute_sql(pg_str, sql)

Execute an SQL query with psycopg2.

Args:
  1. pg_str: connection string created with the helper function psycopg_connection_string, returning: host= port= user= dbname= password=

  2. sql: SQL string in triple quotes:

    ```CREATE TABLE foo (bar text)```
    
Returns:
Executed sql with conn.cursor().execute(sql)
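
Usage (a minimal sketch; the config path and the dev section name follow the examples on this page):

```
from datapunt_processing.helpers.connections import psycopg_connection_string, execute_sql

# Build the connection string from the config file, then run a DDL statement.
pg_str = psycopg_connection_string('authentication/config.ini', 'dev')
execute_sql(pg_str, """CREATE TABLE foo (bar text)""")
```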
datapunt_processing.helpers.connections.get_config(full_path)

Get config file with all login credentials, port numbers, etc.

Args:
full_path: provide the full path to the config.ini file, for example authentication/config.ini
Returns:
The entire configuration file, to be used as config.get(config_name, 'AUTHURL')
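
Usage (a minimal sketch; the path and the section/key names are the examples from this page):

```
from datapunt_processing.helpers.connections import get_config

# Read the config file and pick one value out of the objectstore section.
config = get_config('authentication/config.ini')
auth_url = config.get('objectstore', 'AUTHURL')
```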
datapunt_processing.helpers.connections.objectstore_connection(config_full_path, config_name, print_config_vars=None)

Get an objectstore connection.

Args:
  1. config_full_path: /path_to_config/config.ini or config.ini if in root.
  2. config_name: objectstore
  3. print_config_vars: if set to True, print all variables from the config file
Returns:
An objectstore connection session.
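
Usage (a minimal sketch, assuming a config.ini with an objectstore section as described above):

```
from datapunt_processing.helpers.connections import objectstore_connection

# Open an objectstore session; print_config_vars=True echoes the config values.
connection = objectstore_connection('config.ini', 'objectstore', print_config_vars=True)
```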
datapunt_processing.helpers.connections.postgres_engine_pandas(config_full_path, db_config_name)

Pandas uses SQLAlchemy; this is the config wrapper that inserts the config parameters into to_sql queries.

Args:
  1. config_full_path: location of the config.ini file including the name of the file, for example authentication/config.ini
  2. db_config_name: dev or docker, to get the ip, user/password and port values.
Returns:
The postgres pandas engine to do sql queries with.
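
Usage (a minimal sketch; the table name my_table is hypothetical):

```
import pandas as pd

from datapunt_processing.helpers.connections import postgres_engine_pandas

# Create the engine from the config file and write a DataFrame to Postgres.
engine = postgres_engine_pandas('authentication/config.ini', 'dev')
df = pd.DataFrame([{'id': 1, 'attr': 'value'}])
df.to_sql('my_table', engine, if_exists='replace', index=False)
```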
datapunt_processing.helpers.connections.psycopg_connection_string(config_full_path, db_config_name)

Postgres connection string for psycopg2.

Args:
  1. config_full_path: location of the config.ini file including the name of the file, for example authentication/config.ini
  2. db_config_name: dev or docker, to get the ip, user/password and port values.
Returns:
The connection string required by psycopg2: 'PG:host= port= user= dbname= password='

datapunt_processing.helpers.demo_asyncio module

datapunt_processing.helpers.demo_asyncio.custom_sleep()
datapunt_processing.helpers.demo_asyncio.factorial(name, number)

datapunt_processing.helpers.files module

datapunt_processing.helpers.files.create_dir_if_not_exists(directory)

Create a directory if it does not yet exist.

Args:
directory: the name of the directory, for example: dir/anotherdir
Returns:
Creates the directory if it does not exist, or returns the error message.
datapunt_processing.helpers.files.save_file(data, output_folder, filename)

save_file currently works with the suffixes csv, txt, geojson and json. It reads the filename suffix and saves the file as the appropriate type.

Args:
  1. data: list of flattened dictionary objects, for example: [{id:1, attr:value, attr2:value}, {id:2, attr:value, attr2:value}]
  2. output_folder: dir/anotherdir
  3. filename: data_output.csv or data_output.json
Returns:
Saves the list of objects to a file of the given geojson or csv type.
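
Usage (a minimal sketch following the example data above):

```
from datapunt_processing.helpers.files import save_file

# A list of flat dicts; the suffix of the filename picks the output type.
data = [{'id': 1, 'attr': 'value', 'attr2': 'value'},
        {'id': 2, 'attr': 'value', 'attr2': 'value'}]
save_file(data, 'dir/anotherdir', 'data_output.csv')
```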
datapunt_processing.helpers.files.unzip(path, filename_as_folder=False)

Find all zip files and unzip them in the root folder.

Args:
  1. path: set the folder to check for zip files.
  2. filename_as_folder: set to True to unzip into subfolders named after each zip file instead of into the root folder.
Returns:
Unzipped files in the path directory, or in a subfolder named after each zip file.
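
Usage (a minimal sketch; the data/ folder is a hypothetical example):

```
from datapunt_processing.helpers.files import unzip

# Unzip every archive found in data/ into a subfolder named after the zip file.
unzip('data/', filename_as_folder=True)
```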

datapunt_processing.helpers.getaccesstoken module

class datapunt_processing.helpers.getaccesstoken.GetAccessToken

Bases: object

Get an authentication header item with an access token for using the internal APIs, by logging in with email and password credentials and authenticated scopes, or as type 'employee'. To see the available scopes and types, see this file:

Usage:

from datapunt_processing.helpers.getaccesstoken import GetAccessToken

accessToken = GetAccessToken().getAccessToken(usertype='employee_plus', scopes='BRK/RS,BRK/RSN,BRK/RO')
requests.get(url, headers=accessToken)

Args:
  • scopes: Add scopes as a comma-separated list.
  • usertype: Add the usertype.
  • email: Set and get environment variable: export DATAPUNT_EMAIL=*****
  • password: Set and get environment variable: export DATAPUNT_PASSWORD=*****
Returns:
accesstoken
getAccessToken(usertype='employee', scopes='TLLS/R', acc=False)
datapunt_processing.helpers.getaccesstoken.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.helpers.json_dict_handlers module

datapunt_processing.helpers.json_dict_handlers.clean_dict(dictionary, key_name)

Remove a field from a dict based on its key name.

Args:
  1. dictionary: {id:1, dates:2018-12-02}
  2. key_name: 'dates'
Returns:
{id:1}
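
Usage (a minimal sketch following the example above):

```
from datapunt_processing.helpers.json_dict_handlers import clean_dict

record = {'id': 1, 'dates': '2018-12-02'}
cleaned = clean_dict(record, 'dates')
# Expected result, per the docstring: {'id': 1}
```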
datapunt_processing.helpers.json_dict_handlers.flatten_json(json_object)

Flatten a nested JSON object.

Args:
  1. json_object, for example: {"key": {"subkey": {"subsubkey": "value"}}}
Returns:
{"key.subkey.subsubkey": "value"}
Source:
https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
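
Usage (a minimal sketch following the example above):

```
from datapunt_processing.helpers.json_dict_handlers import flatten_json

nested = {"key": {"subkey": {"subsubkey": "value"}}}
flat = flatten_json(nested)
# Expected result, per the docstring: {"key.subkey.subsubkey": "value"}
```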
datapunt_processing.helpers.json_dict_handlers.joinByKeyNames(geojson, dataset, key1, key2)

Insert data from dataset into geojson where key1 from dataset matches key2 from geojson.

datapunt_processing.helpers.json_dict_handlers.jsonPoints2geojson(df, latColumn, lonColumn)

Convert JSON with lat/lon columns to geojson.

Source:
https://gis.stackexchange.com/questions/220997/pandas-to-geojson-multiples-points-features-with-python
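
Usage (a minimal sketch, assuming df is a pandas DataFrame; the column names and values are hypothetical):

```
import pandas as pd

from datapunt_processing.helpers.json_dict_handlers import jsonPoints2geojson

# One row per point; the lat/lon column names are passed explicitly.
df = pd.DataFrame([{'name': 'example', 'lat': 52.37, 'lon': 4.89}])
points = jsonPoints2geojson(df, 'lat', 'lon')
```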

datapunt_processing.helpers.json_dict_handlers.openJsonArrayKeyDict2FlattenedJson(fileName)

Open a json file and return the array of objects without the object value name. For example: [{'container': {…}}, {'container': {…}}] now returns as [{…}, {…}]
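
Usage (a minimal sketch; containers.json is a hypothetical file name):

```
from datapunt_processing.helpers.json_dict_handlers import openJsonArrayKeyDict2FlattenedJson

# [{'container': {...}}, {'container': {...}}] comes back as [{...}, {...}]
records = openJsonArrayKeyDict2FlattenedJson('containers.json')
```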

datapunt_processing.helpers.logging module

datapunt_processing.helpers.logging.logger()

Set up basic logging for the console.

Usage:
Initialize the logger by adding the code at the top of your script: logger = logger()
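
A minimal sketch of that usage, assuming the returned object behaves like a standard logging.Logger:

```
from datapunt_processing.helpers.logging import logger

logger = logger()
logger.info('Processing started')  # assumes standard logging.Logger methods
```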

TODO: add log file export

datapunt_processing.helpers.split_file_by_date module

datapunt_processing.helpers.split_file_by_date.load_csv(csvfile)
datapunt_processing.helpers.split_file_by_date.main()
datapunt_processing.helpers.split_file_by_date.parser()

Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html

datapunt_processing.helpers.split_file_by_date.split_file_by_date(data, column_name, date_splitter)

Split a file into time series.

Args:
  1. filename: full path of the file
  2. column_name: the date column to split on
  3. date_splitter: year, month or day
Returns:
Multiple files split by year/month/day
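
Usage (a minimal sketch; the file path and the 'date' column name are hypothetical):

```
from datapunt_processing.helpers.split_file_by_date import load_csv, split_file_by_date

# Load the CSV, then write one output file per year found in the date column.
data = load_csv('data/input.csv')
split_file_by_date(data, 'date', 'year')
```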

datapunt_processing.helpers.xml_handlers module

datapunt_processing.helpers.xml_handlers.parse_and_remove(filename, path)

Incremental XML parsing.

Args:
  1. filename: xml file name
  2. path: path to the xml elements (per the linked cookbook example, a slash-separated element path such as row/pothole)
Source:
https://github.com/dabeaz/python-cookbook/blob/master/src/6/incremental_parsing_of_huge_xml_files/example.py
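
Usage (a minimal sketch, assuming this helper mirrors the linked cookbook generator; the file and element names below come from that example):

```
from datapunt_processing.helpers.xml_handlers import parse_and_remove

# Stream matching elements without loading the whole file into memory.
for element in parse_and_remove('potholes.xml', 'row/pothole'):
    print(element.findtext('zip'))
```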

Module contents