CKAN Tricks
Latest revision as of 18:22, 21 November 2023
Manage datasets
Undelete deleted datasets
If you know the URL of the deleted dataset AND you are logged in as an administrator, you can go to that URL in your web browser and see the deleted dataset. "[Deleted]" will have been appended to the title of the dataset, and the value of the `state` metadata field will be "deleted".
To undelete such a dataset, use either the CKAN API or the set_package_parameters_to_values() function to set the package's `state` metadata value to "active". For the latter option, invoke the function like this:
set_package_parameters_to_values("https://data.wprdc.org", deleted_package_id, ['state'], ['active'], API_key)
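For the first option (calling the CKAN API directly), the following is a minimal sketch rather than the site's own tooling. It assumes that CKAN's `package_patch` action accepts a `state` value for a deleted package and that the API key is passed in the `Authorization` header. The helper name `build_undelete_request` and the package ID and key values are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def build_undelete_request(site_url, package_id, api_key):
    """Build (but do not send) a package_patch request setting state to 'active'."""
    payload = json.dumps({"id": package_id, "state": "active"}).encode("utf-8")
    return urllib.request.Request(
        url=site_url.rstrip("/") + "/api/3/action/package_patch",
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )


# Hypothetical placeholder values; substitute a real package ID and API key.
request = build_undelete_request(
    "https://data.wprdc.org", "hypothetical-package-id", "hypothetical-api-key"
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching a live dataset.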
Queries
Make queries faster
The datastore_search API call lets you search a given datastore by column values and return subsets of the records. Speed up datastore_search queries by including the include_total=False parameter, which skips calculation of the total number of rows (this can reduce response time by a factor of 2). There's more on benchmarking CKAN performance here.
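As an illustration, a datastore_search URL with the row count disabled could be assembled as in the sketch below. The helper name `build_search_url`, the resource ID, and the filter column are hypothetical. The sketch only builds the URL and does not send the request.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def build_search_url(site_url, resource_id, filters=None, limit=100):
    """Build a datastore_search URL that skips the total-row count."""
    params = {
        "resource_id": resource_id,
        "limit": limit,
        "include_total": "False",  # skip calculating the total number of rows
    }
    if filters:
        # datastore_search expects the filters as a JSON-encoded dictionary
        params["filters"] = json.dumps(filters)
    return site_url.rstrip("/") + "/api/3/action/datastore_search?" + urlencode(params)


# Hypothetical resource ID and filter column, for illustration only.
url = build_search_url(
    "https://data.wprdc.org",
    "hypothetical-resource-id",
    filters={"municipality": "Pittsburgh"},
    limit=5,
)
query = parse_qs(urlparse(url).query)
```

With include_total turned off, the response carries no row total, so any code that paginates on that value needs another stopping condition (for example, an empty page of records).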
Another way to speed up datastore_search queries is to index the fields used for filtering. Note that (at least when the primary key is a combination of fields) each primary key field must be listed as a separate field to index. Otherwise, those fields don't get indexed and queries take much longer.
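One way to express this is sketched below, assuming the indexes are declared through the `primary_key` and `indexes` parameters of CKAN's `datastore_create` action. The helper name `build_index_payload` and all field names are hypothetical. The function only builds the payload that would be posted to the API.

```python
def build_index_payload(resource_id, primary_key, filter_fields):
    """Build a datastore_create payload listing every field to index.

    Each primary key field is listed individually in 'indexes', followed by
    any other fields used for filtering (duplicates removed, order preserved).
    """
    indexes = list(dict.fromkeys(list(primary_key) + list(filter_fields)))
    return {
        "resource_id": resource_id,
        "primary_key": list(primary_key),
        "indexes": indexes,
    }


# Hypothetical resource ID and field names, for illustration only.
payload = build_index_payload(
    "hypothetical-resource-id",
    primary_key=["parcel_id", "sale_date"],
    filter_fields=["municipality", "sale_date"],
)
```

Here `payload["indexes"]` names both parts of the composite key separately, plus the extra filter field.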
Avoiding stale query caches
Queries/API responses can be cached, depending on the nginx settings. If you find that your SQL query is getting a stale response, try changing your query slightly. For instance, instead of `SELECT MAX(some_field) AS biggest FROM <resource-id>`, you could change the column alias (`SELECT MAX(some_field) AS biggest0413 FROM <resource-id>`) or add another field that you ignore (`SELECT MAX(some_field) AS biggest, MAX(some_field) AS whatever FROM <resource-id>`).
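The alias trick can be automated by generating a fresh suffix for each request. The sketch below assumes the query is sent to CKAN's `datastore_search_sql` action and that the resource ID and field name need double quotes in the SQL. The helper name `build_cache_busting_sql_url` is hypothetical, and the suffix defaults to the current Unix timestamp.

```python
import time
from urllib.parse import parse_qs, urlencode, urlparse


def build_cache_busting_sql_url(site_url, resource_id, field, suffix=None):
    """Build a datastore_search_sql URL whose column alias changes per call.

    Returns (url, alias) so the caller knows which key to read in each record.
    """
    if suffix is None:
        suffix = str(int(time.time()))  # differs on every call a second apart
    alias = "biggest" + suffix
    sql = f'SELECT MAX("{field}") AS {alias} FROM "{resource_id}"'
    url = site_url.rstrip("/") + "/api/3/action/datastore_search_sql?" + urlencode({"sql": sql})
    return url, alias


# Hypothetical resource ID; the fixed suffix mirrors the example in the text.
url, alias = build_cache_busting_sql_url(
    "https://data.wprdc.org", "hypothetical-resource-id", "some_field", suffix="0413"
)
sql = parse_qs(urlparse(url).query)["sql"][0]
```

Returning the alias alongside the URL matters because the result column is named after the alias, which now changes on every request.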
Scripts that interact with CKAN through the API
Run CKAN-monitoring/modifying scripts from multiple servers by centralizing data
To avoid keeping local databases about datasets, store as much of that information as possible (such as the last time an ETL job was run on a given package) in the 'extras' metadata field of the CKAN package. This keeps the information in a centralized location, so ETL jobs can be run from multiple computers without any other coordination. The extras metadata fields are currently cataloged on the CKAN_Metadata page.
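A minimal sketch of recording a last-run timestamp in the extras follows. It assumes that extras are a list of key/value dictionaries and that an update through `package_patch` replaces the whole extras list, which is why the existing entries are merged first. The key name `last_etl_update`, the helper names, and the sample values are hypothetical and are not taken from the CKAN_Metadata catalog.

```python
from datetime import datetime, timezone


def with_extra(existing_extras, key, value):
    """Return a copy of the extras list with `key` set to `value`."""
    updated = [dict(extra) for extra in existing_extras if extra.get("key") != key]
    updated.append({"key": key, "value": value})
    return updated


def build_extras_patch(package_id, existing_extras, last_run=None):
    """Build a package_patch payload that records when the ETL job last ran."""
    if last_run is None:
        last_run = datetime.now(timezone.utc)
    extras = with_extra(existing_extras, "last_etl_update", last_run.isoformat())
    return {"id": package_id, "extras": extras}


# Hypothetical existing extras, as they might come back from package_show.
existing = [
    {"key": "time_field", "value": "sale_date"},
    {"key": "last_etl_update", "value": "2023-01-01T00:00:00+00:00"},
]
payload = build_extras_patch(
    "hypothetical-package-id",
    existing,
    last_run=datetime(2023, 11, 21, 18, 22, tzinfo=timezone.utc),
)
```

Merging before patching keeps unrelated extras intact, so several scripts can each maintain their own keys on the same package.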