datapunt_processing.helpers package¶
Submodules¶
datapunt_processing.helpers.connections module¶
-
datapunt_processing.helpers.connections.
execute_sql
(pg_str, sql)¶ Execute an SQL query with psycopg2.
- Args:
pg_str: connection string using helper function psycopg_connection_string, returning:
host= port= user= dbname= password=
sql: SQL string in triple quotes:
```CREATE TABLE foo (bar text)```
- Returns:
- Executed sql with conn.cursor().execute(sql)
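Since psycopg2 may not be importable here, the call pattern can be sketched with the stdlib sqlite3 module, which follows the same Python DB-API shape (connect, cursor, execute, commit); the helper itself would pass pg_str to psycopg2.connect instead:

```python
import sqlite3  # stand-in for psycopg2; both follow the Python DB-API

def execute_sql(conn_target, sql):
    # A sketch: with psycopg2 this would be psycopg2.connect(pg_str).
    with sqlite3.connect(conn_target) as conn:
        conn.cursor().execute(sql)
        conn.commit()

execute_sql(":memory:", "CREATE TABLE foo (bar text)")
```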
-
datapunt_processing.helpers.connections.
get_config
(full_path)¶ Get config file with all login credentials, port numbers, etc.
- Args:
- full_path: provide the full path to the config.ini file, for example authentication/config.ini
- Returns:
The entire configuration file, to be used with:
config.get(config_name, 'AUTHURL')
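A minimal sketch of what get_config might look like with the stdlib configparser module; the [objectstore] section and its option values below are illustrative, not the real Datapunt configuration:

```python
import configparser
import tempfile

def get_config(full_path):
    # A sketch: read an .ini file and return the parsed configuration.
    config = configparser.RawConfigParser()
    config.read(full_path)
    return config

# Write an illustrative config.ini to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".ini", delete=False) as f:
    f.write("[objectstore]\nAUTHURL = https://example.org/auth\nPORT = 5432\n")
    path = f.name

config = get_config(path)
config.get("objectstore", "AUTHURL")
```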
-
datapunt_processing.helpers.connections.
objectstore_connection
(config_full_path, config_name, print_config_vars=None)¶ Get an objectstore connection.
- Args:
- config_full_path: /path_to_config/config.ini or config.ini if in root.
- config_name: objectstore
- print_config_vars: if set to True: print all variables from the config file
- Returns:
- An objectstore connection session.
-
datapunt_processing.helpers.connections.
postgres_engine_pandas
(config_full_path, db_config_name)¶ Pandas uses SQLAlchemy; this config wrapper inserts the config parameters into the engine used for to_sql queries.
- Args:
- config_full_path: location of the config.ini file including the name of the file, for example authentication/config.ini
- db_config_name: dev or docker, to get the IP, user/password and port values.
- Returns:
- The postgres pandas engine to do sql queries with.
-
datapunt_processing.helpers.connections.
psycopg_connection_string
(config_full_path, db_config_name)¶ Postgres connection string for psycopg2.
- Args:
- config_full_path: location of the config.ini file including the name of the file, for example authentication/config.ini
- db_config_name: dev or docker, to get the IP, user/password and port values.
- Returns:
- Returns the connection string required by psycopg2: 'PG:host= port= user= dbname= password='
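The string assembly can be sketched with configparser; the option names (host, port, user, dbname, password) and the [dev] section values are assumptions:

```python
import configparser

def psycopg_connection_string(config, db_config_name):
    # A sketch: assemble the keyword/value string psycopg2 expects
    # from one section of the parsed config file.
    section = config[db_config_name]
    return "host={} port={} user={} dbname={} password={}".format(
        section["host"], section["port"], section["user"],
        section["dbname"], section["password"])

config = configparser.ConfigParser()
config.read_string("""
[dev]
host = localhost
port = 5432
user = postgres
dbname = various
password = insecure
""")
pg_str = psycopg_connection_string(config, "dev")
```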
datapunt_processing.helpers.demo_asyncio module¶
-
datapunt_processing.helpers.demo_asyncio.
custom_sleep
()¶
-
datapunt_processing.helpers.demo_asyncio.
factorial
(name, number)¶
datapunt_processing.helpers.files module¶
-
datapunt_processing.helpers.files.
create_dir_if_not_exists
(directory)¶ Create directory if it does not yet exist.
- Args:
- directory: name of the directory, for example: dir/anotherdir
- Returns:
- Creates the directory if it does not exist, or returns the error message.
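A minimal sketch assuming os.makedirs, returning rather than raising the error as described:

```python
import os

def create_dir_if_not_exists(directory):
    # A sketch: create the directory (and any parents) if missing;
    # surface the error instead of raising it.
    try:
        os.makedirs(directory, exist_ok=True)
    except OSError as err:
        return err
```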
-
datapunt_processing.helpers.files.
save_file
(data, output_folder, filename)¶ save_file currently works with: csv, txt, geojson and json as suffixes. It reads the filename suffix and saves the file as the appropriate type.
- Args:
- data: list of flattened dictionary objects for example: [{id:1, attr:value, attr2:value}, {id:2, attr:value, attr2:value}]
- filename: data_output.csv or data_output.json
- output_folder: dir/anotherdir
- Returns:
- Saves the list of objects to a file of the given type.
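The suffix dispatch can be sketched as follows (csv and json only; txt and geojson handling omitted, and taking the field names from the first object is an assumption):

```python
import csv
import json
import os

def save_file(data, output_folder, filename):
    # A sketch: pick the writer based on the filename suffix.
    os.makedirs(output_folder, exist_ok=True)
    full_path = os.path.join(output_folder, filename)
    suffix = filename.rsplit(".", 1)[-1]
    if suffix == "csv":
        with open(full_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(data[0].keys()))
            writer.writeheader()
            writer.writerows(data)
    elif suffix == "json":
        with open(full_path, "w") as f:
            json.dump(data, f)
    else:
        raise ValueError("unsupported suffix: " + suffix)
    return full_path
```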
-
datapunt_processing.helpers.files.
unzip
(path, filename_as_folder=False)¶ Find all zip files and unzip in root.
- Args:
- path: set the folder to check for zip files.
- filename_as_folder: set to True to unzip each archive into a subfolder named after the zip file instead of into the root folder.
- Returns:
- Unzipped files in the path directory or in the path/name of the zip file.
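A sketch of the described behaviour using the stdlib zipfile module:

```python
import os
import zipfile

def unzip(path, filename_as_folder=False):
    # A sketch: unzip every *.zip in `path`, either into `path` itself
    # or into a subfolder named after the archive.
    for entry in os.listdir(path):
        if entry.endswith(".zip"):
            archive = os.path.join(path, entry)
            target = path
            if filename_as_folder:
                target = os.path.join(path, entry[:-len(".zip")])
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
```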
datapunt_processing.helpers.getaccesstoken module¶
-
class
datapunt_processing.helpers.getaccesstoken.
GetAccessToken
¶ Bases:
object
Get a header authentication item (an access token) for using the internal APIs, by logging in with email and password credentials and authenticated scopes, or as type 'employee'. To see the available scopes and types, see this file:
- Usage:
from authentication.getaccesstoken import GetAccessToken
accessToken = GetAccessToken().getAccessToken(usertype='employee_plus', scopes='BRK/RS,BRK/RSN,BRK/RO')
requests.get(url, headers=accessToken)
- Args:
- scopes: Add scopes as a comma separated list.
- usertype: Add the usertype
- email: Set and get environment variable: export DATAPUNT_EMAIL=*****
- password: Set and get environment variable: export DATAPUNT_PASSWORD=*****
- Returns:
- accesstoken
-
getAccessToken
(usertype='employee', scopes='TLLS/R', acc=False)¶
-
datapunt_processing.helpers.getaccesstoken.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
datapunt_processing.helpers.json_dict_handlers module¶
-
datapunt_processing.helpers.json_dict_handlers.
clean_dict
(dictionary, key_name)¶ Remove a field from a dict based on key name.
- Args:
- dictionary: {id:1, dates:2018-12-02}
- key_name: ‘dates’
- Returns:
- {id:1}
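A one-line sketch of the removal (the original helper may mutate the dict in place instead of copying):

```python
def clean_dict(dictionary, key_name):
    # A sketch: return the dict without the given key.
    return {key: value for key, value in dictionary.items() if key != key_name}
```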
-
datapunt_processing.helpers.json_dict_handlers.
flatten_json
(json_object)¶ Flatten a nested json object.
- Args:
- json_object, for example: {"key": {"subkey": {"subsubkey": "value"}}}
- Returns:
- {"key.subkey.subsubkey": "value"}
- Source:
- https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10
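A sketch of the flattening approach described in the linked article (dot-separated keys; list indices become key parts):

```python
def flatten_json(json_object):
    # A sketch: walk the nested structure and collect leaf values
    # under dot-separated key paths.
    out = {}

    def flatten(value, name=""):
        if isinstance(value, dict):
            for key in value:
                flatten(value[key], name + key + ".")
        elif isinstance(value, list):
            for i, item in enumerate(value):
                flatten(item, name + str(i) + ".")
        else:
            out[name[:-1]] = value

    flatten(json_object)
    return out
```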
-
datapunt_processing.helpers.json_dict_handlers.
joinByKeyNames
(geojson, dataset, key1, key2)¶ Insert data from dataset to geojson where key1 from dataset matches key2 from geojson.
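The join can be sketched as follows; the exact merge semantics (updating each matching feature's properties with the dataset row) are an assumption:

```python
def joinByKeyNames(geojson, dataset, key1, key2):
    # A sketch: index the dataset rows by key1, then update each
    # geojson feature whose key2 property has a matching row.
    lookup = {row[key1]: row for row in dataset}
    for feature in geojson["features"]:
        match = lookup.get(feature["properties"].get(key2))
        if match is not None:
            feature["properties"].update(match)
    return geojson
```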
-
datapunt_processing.helpers.json_dict_handlers.
jsonPoints2geojson
(df, latColumn, lonColumn)¶ Convert JSON with lat/lon columns to geojson. https://gis.stackexchange.com/questions/220997/pandas-to-geojson-multiples-points-features-with-python
-
datapunt_processing.helpers.json_dict_handlers.
openJsonArrayKeyDict2FlattenedJson
(fileName)¶ Open json and return an array of objects without the wrapping object value name. For example: [{'container':{…}}, {'container':{…}}] now returns as [{…},{…}]
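A sketch that unwraps the single key of each object in the array:

```python
import json

def openJsonArrayKeyDict2FlattenedJson(fileName):
    # A sketch: for each object in the json array, keep only the value
    # behind its single wrapping key.
    with open(fileName) as f:
        rows = json.load(f)
    return [next(iter(row.values())) for row in rows]
```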
datapunt_processing.helpers.logging module¶
-
datapunt_processing.helpers.logging.
logger
()¶ Set up basic logging for the console.
- Usage:
- Initialize the logger by adding the code at the top of your script:
logger = logger()
TODO: add log file export
datapunt_processing.helpers.split_file_by_date module¶
-
datapunt_processing.helpers.split_file_by_date.
load_csv
(csvfile)¶
-
datapunt_processing.helpers.split_file_by_date.
main
()¶
-
datapunt_processing.helpers.split_file_by_date.
parser
()¶ Parser function to run arguments from commandline and to add description to sphinx docs. To see possible styling options: https://pythonhosted.org/an_example_pypi_project/sphinx.html
-
datapunt_processing.helpers.split_file_by_date.
split_file_by_date
(data, column_name, date_splitter)¶ Split file into time series.
- Args:
- filename: full path to the file
- date_splitter: year, month or day
- Returns:
- Multiple files, split by year, month or day.
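The grouping step can be sketched as follows (file writing omitted; assuming ISO-formatted dates such as 2018-12-02):

```python
def split_file_by_date(data, column_name, date_splitter):
    # A sketch: bucket rows by the year/month/day prefix of an ISO date
    # column; writing each bucket to its own file is left out.
    parts = {"year": 1, "month": 2, "day": 3}
    groups = {}
    for row in data:
        key = "-".join(row[column_name].split("-")[: parts[date_splitter]])
        groups.setdefault(key, []).append(row)
    return groups
```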
datapunt_processing.helpers.xml_handlers module¶
-
datapunt_processing.helpers.xml_handlers.
parse_and_remove
(filename, path)¶ Incremental XML parsing.
- Args:
- filename: xml file name
- path: path to xml file
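Incremental parsing with xml.etree.ElementTree.iterparse keeps memory flat by clearing each element once handled; how the path argument is combined with the filename here is an assumption:

```python
import os
import xml.etree.ElementTree as ET

def parse_and_remove(filename, path):
    # A sketch: stream elements from the file one at a time, clearing
    # each subtree after it has been yielded so memory stays flat.
    full_path = os.path.join(path, filename)
    for event, elem in ET.iterparse(full_path, events=("end",)):
        yield elem.tag, elem.text
        elem.clear()
```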