Core of the Data Catalog Project¶
A microservice with API to store, manage and search through meta data of data sets.
The latest documentation can always be found at https://amsterdam.github.io/dcatd/.
API¶
The API spec can be found at /openapi on a running instance with default settings. The API of the current running instance at Amsterdam can be browsed using this Swagger UI.
How to run locally¶
Requires Python 3.6.1 or above
Default configuration uses a PostgreSQL database which can be spun up in a container: Requires Docker and a free port 5433 (this is deliberately another port than PG’s default one, to preempt a collision)
This means you can also change the port to 5432 (and other connection parameters)
in /examples/running/config.yml
and use a locally running instance of PostgreSQL
Create a virtual environment and install all the dependencies:
make alldeps
Example server¶
docker-compose up -d database
make example
See: http://localhost:8000/openapi and http://localhost:8000/datasets
You could even go one further and also spin up the Amsterdam Swagger UI:
docker-compose up -d database swagger-ui
make example
Apart from the urls mentioned above you also can access http://localhost:8686/swagger-ui/?url=http://localhost:8000/openapi
How to run in docker¶
Requires Docker (duh)
Example server¶
docker-compose up -d
That’s it.
See: http://localhost:8001/openapi , http://localhost:8001/datasets and http://localhost:8686/swagger-ui/?url=http://localhost:8001/openapi
(Example server in docker is accessable through port 8001, while the locally running example runs on port 8000)
Bootstrap your setup with data¶
You can import CKAN data into the DCAT-API to bootstrap your install with data
PUT-ting and DELETE-ing data via the API require authorisation.
In the context of Amsterdam City Data can obtiain a JWT from the swagger-ui (http://localhost:8686/swagger-ui/?url=http://localhost:8000/openapi); export it to JWT: (For more information see: https://hub.docker.com/r/amsterdam/oauth2swaggerui/ )
export JWT='<JWT>'
Define your local API, and the source CKAN (point to the root of the API of CKAN):
export DCATD='http://localhost:8000/' # or :8001 , see above
export CKAN='https://demo.ckan.org/api' # or for instance https://api.data.amsterdam.nl/catalogus/api
Then use the scripts in the utils directory to import data
Remark: this is highly localized for the Amsterdam CKAN instance and will fail beyond the first step when using CKAN demo data.
Currently this also will fail on the resources2distributions
step, but you will end up with at least a
somewhat filled database
cd utils
python dumpckan.py "${CKAN}"
python ckan2dcat.py "${DCATD}"
python resources2distributions.py "${DCATD}files" "${JWT}"
for d in dcatdata/*.json; do
b=`basename "${d}" '.json'`
echo -n "${b}..."
STATUS=$(
curl --header "Authorization: Bearer ${JWT}" \
--header "If-None-Match: *" --upload-file "${d}" \
--silent --output /dev/stderr --write-out "%{http_code}" \
"${DCATD}datasets/${b}"
)
[ "$STATUS" -eq 201 ] && echo "OK" && rm "${d}" || echo "FAILED: $STATUS"
done
Load production data¶
If you need to load production data in development you can do the following commands The first two commands should not be required if database backups would correctly create the dcatd_latest.gz link. However this is currently not the case. That is why we need to copy manually with:
scp admin.datapunt.amsterdam.nl:/mnt/backup_postgres/dcatd_2018-09-03.gz /tmp/
docker cp /tmp/dcatd_2018-09-03.gz dcatd_database_1:/tmp/dcatd_latest.gz
Here 81adf880164c is the docker container.
Then we can load it in Postgres with :
docker-compose exec database update-db.sh dcatd <yourname>
Update documentation¶
Requires Sphinx plus extras:
pip install -e .[docs]
Run the following command to push latest version to github:
make -C sphinx gh-pages