-
Notifications
You must be signed in to change notification settings - Fork 0
PowerQuery Functions
This page documents a number of PowerQuery functions that perform common data processing tasks for HPC data, and can be used in PowerBI or Excel.
the below code combines the two columns indicatorCustomRef and indicatorCustomRef.1 taking the values of the indicatorCustomRef column if not null otherwise take the values from indicatorCustomRef.1.
Merge_indicatorCustomRef= Table.CombineColumns(Merge_adminLevel, {"indicatorCustomRef", "indicatorCustomRef.1"}, each if _{0} is null then _{1} else _{0}, "indicatorCustomRef"),
the below code replaces the long org type names with shorten names.
ReplaceORgTypes = Table.ReplaceValue(AddOrgTypeDetails, each[organizationSubtype],
each if List.Contains({"Local NGOs/CSOs", "National NGOs/CSOs"},[organizationSubtype]) then "NNGO"
else if List.Contains({"International NGOs", "Internationally Affiliated Organizations"},[organizationSubtype]) then "INGO"
else if List.Contains({"National Governments","Local Governments"},[organizationSubtype]) then "GOV"
else if List.Contains({"UN Agencies"},[organizationSubtype]) then "UN"
else if List.Contains({"Country-based UN Pooled Funds","Global UN Pooled Funds", "Regional UN Pooled Funds"},[organizationSubtype]) then "PooledFund"
else if List.Contains({"International Private Organizations", "Local/National Private Organizations"},[organizationSubtype]) then "Private"
else if List.Contains({"International Red Cross/Red Crescent Movement", "Red Cross/Red Crescent National Societies"},[organizationSubtype]) then "Red C/M"
else if List.Contains({"Other Multilateral Organizations", "Uncategorized NGOs", "Uncategorized Private Organizations"},[organizationSubtype]) then "OTHER"
else [organizationSubtype], Replacer.ReplaceText,{"organizationSubtype"})
the below code create a new column called result which retrieve the value calculated based on the values stored in the column [calculationMethod] i.e. Sum, Maximum value, ...etc. for the columns which there names are stored in the column openMPs and table table Open_MPs
Calculated_indicators_measure = Table.AddColumn(Pivot_MPs, "Result", each
try Function.Invoke(
Record.Field(
[Sum=List.Sum, Maximum value=List.Max, Minimum value=List.Min, Average=List.Average, Latest value=List.Last, Manual value=List.Last],
[calculationMethod]??"Manual value"),
{List.RemoveNulls(Record.FieldValues(Record.SelectFields(_,Open_MPs[openMPs])))})
otherwise null)
This function takes a Service URL (e.g. https://api.hpc.tools
) and list of location IDs, and produces a detailed table with all locations in HPC up to admin level3 (including deactivated locations), including the admin0, admin1, admin2 and admin3 IDs, names and Pcodes for all of a location's parents, and a name
, id
and pcode
.
This is useful for example to lookup locations that are used in disaggregated caseloads or indicators throughout HPC.
(Service as text, locationIDs as list) => let
Data = List.Transform(locationIDs, (locationId) => Json.Document(Web.Contents(Service & "/v2/public/location/" & Number.ToText(locationId) & "?maxLevel=3&status=all"))[data]),
Countries = Table.FromList(Data, Splitter.SplitByNothing(), {"Admin0"}),
ProcessAdminLevel = (
baseTable as table,
adminLevel as number,
calculateName as function
) => let
namespace = "Admin" & Number.ToText(adminLevel),
// Add columns that will be used across all descendant admin levels
addId = Table.AddColumn(baseTable, namespace & "Id", each Record.Field(_, namespace)[id]),
addName = Table.AddColumn(addId, namespace & "Name", each Record.Field(_, namespace)[name]),
addPcode = Table.Buffer(Table.AddColumn(addName, namespace & "Pcode", each Record.Field(_, namespace)[pcode])),
// Add columns that are specifically for locations that are the current admin level
addFinalId = Table.AddColumn(addPcode, "id", each Record.Field(_, namespace)[id]),
addFinalName = Table.AddColumn(addFinalId, "name", calculateName),
addFinalPCode = Table.AddColumn(addFinalName, "pcode", each Record.Field(_, namespace)[pcode]),
addFinalAdminLevel = Table.AddColumn(addFinalPCode, "AdminLevel", each adminLevel),
// Used in next base
childrenColumn = "Admin" & Number.ToText(adminLevel + 1),
addChildren = Table.Buffer(Table.AddColumn(addPcode, childrenColumn, each Record.Field(_, namespace)[children])),
expandChildren = Table.SelectRows(Table.ExpandListColumn(addChildren, childrenColumn), each Record.Field(_, childrenColumn) <> null)
in
[
nextBase = expandChildren,
final = addFinalAdminLevel
],
Admin0 = ProcessAdminLevel(Countries, 0, each [Admin0][name]),
Admin1 = ProcessAdminLevel(Admin0[nextBase], 1, each [Admin0][name] & " -> " & [Admin1][name]),
Admin2 = ProcessAdminLevel(Admin1[nextBase], 2, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name]),
Admin3 = ProcessAdminLevel(Admin2[nextBase], 3, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name] & " -> " & [Admin3][name]),
Result = Table.Combine({
Admin0[final],
Admin1[final],
Admin2[final],
Admin3[final]
})
in
Result
This function accepts a list of ISO country codes, and will return all the locations that existin in HPC up to admin level 3 for the countries listed. This function makes use of the function GetLocationsById
(Service as text, locationISOs as list) => let
AllLocations = Json.Document(Web.Contents(Service & "/v2/public/location"))[data],
FilteredLocations = List.Select(AllLocations, each List.Contains(locationISOs, [iso3])),
FilteredLocationIDs = List.Transform(FilteredLocations, each [id])
in
GetLocationsById(Service, FilteredLocationIDs)
This function can take a caseload or indicator attachment JSON Record, and parse the disaggregated data, producing a table with a separate row for each (location, category, metric) combination.
It requires a table of locations (produced by either GetLocationsById
or GetLocationsByISO
) to use to lookup the location name
and pcode
based on ID.
(attachment as record, locations as table) => let
Values = attachment[attachmentVersion][value][metrics][values],
Metrics = Values[totals],
Disaggregated = if Record.HasFields(Values, "disaggregated") then Values[disaggregated] else null
in
if Disaggregated = null then null else (
let
Categories = Disaggregated[categories],
Locations =
if List.Count(Disaggregated[dataMatrix]) = List.Count(Disaggregated[locations]) + 1 then
Disaggregated[locations]
else
error Error.Record("Mismatch of rows and location length"),
RowIndexes = {0..(List.Count(Disaggregated[dataMatrix]) - 1)},
InitialTable = Table.FromList(RowIndexes, Splitter.SplitByNothing(), { "RowIndex" }),
AddRow = Table.AddColumn(InitialTable, "row", each Disaggregated[dataMatrix]{[RowIndex]}),
AddColIndex = Table.AddColumn(AddRow, "ColIndex", each {0..(List.Count([row]) - 1)}),
ExpandColIndex = Table.ExpandListColumn(AddColIndex, "ColIndex"),
AddCellValue = Table.AddColumn(ExpandColIndex, "Value", each [row]{[ColIndex]}),
FilterNullValues = Table.SelectRows(AddCellValue, each ([Value] <> null and [Value] <> 0 and [Value] <> "")),
AddMetricIndex = Table.AddColumn(FilterNullValues, "MetricIndex", each Number.Mod([ColIndex], List.Count(Metrics))),
AddCategoryIndex = Table.AddColumn(AddMetricIndex, "CategoryIndex", each Number.RoundDown([ColIndex] / List.Count(Metrics))),
AddLocationRef = Table.AddColumn(AddCategoryIndex, "LocationRef", each Locations{[RowIndex] - 1}),
AddCategory = Table.AddColumn(AddLocationRef, "Category", each if [CategoryIndex] < List.Count(Categories) then Categories{[CategoryIndex]}[label] else "Total"),
AddMetric = Table.AddColumn(AddCategory, "Metric", each Metrics{[MetricIndex]}),
AddLocationId = Table.AddColumn(AddMetric, "LocationId", each [LocationRef][id]),
AddLocation = Table.AddColumn(AddLocationId, "Location", each Table.SelectRows(locations, (loc) => loc[id] = [LocationId]){0}),
AddLocationName = Table.AddColumn(AddLocation, "LocationName", each try [Location][name] otherwise "Broken Location: " & [LocationRef][name] ),
AddLocationPcode = Table.AddColumn(AddLocationName, "LocationPcode", each try [Location][pcode] otherwise "Broken Location: " & [LocationRef][name]),
AddLocationAdminLevel = Table.AddColumn(AddLocationPcode, "LocationAdminLevel", each try [Location][AdminLevel] otherwise "Broken Location: " & [LocationRef][name]),
AddMetricType = Table.AddColumn(AddLocationAdminLevel, "MetricType", each [Metric][type]),
AddMetricName = Table.AddColumn(AddMetricType, "MetricName", each [Metric][name][en]),
Result = Table.SelectColumns(AddMetricName, {"Location", "LocationName", "LocationPcode", "LocationAdminLevel", "Category", "MetricType", "MetricName", "Value"})
in
Result
)
The HPC API makes use of pagination for many of its endpoints (such as searching for flows or projects). What this means is that to get a complete set of certain parts of our data, you need to make multiple requests to our API to get the individual pages of data.
This is quite a common practice in API design to prevent requests becoming too large for unbounded data-sets, or prevent requests taking too long to serve.
You can create queries in PowerQuery that dynamically make the appropriate number of requests to get a complete data-set by using recursion.
We use this utility function to help us create such queries:
(Get as function, args) => let
data = Get(args),
nextResults = if data[nextArgs] <> null then @GetPaginatedData(Get, data[nextArgs]) else {}
in
List.Combine({data[items], nextResults})
The first argument (Get
) needs to be a function that:
- accepts an argument that provides enough data for the function to craft an API request for a single page (e.g. the URL, or page ID)
- returns a
Record
with the properties:-
items
: aList
of the items for the requested page -
nextArgs
: either:-
null
(when the current page is the last page, and no more requests are needed) - a value that will be passed into the
Get
function to get the next page
-
-
The second argument (args
) needs to be a value that represents the first page
you would like to fetch, and is passed to Get
for the first request.
Note: when fetching data like this, you ALMOST ALWAYS want to use List.Buffer
to prevent making an unnecessary number of queries to the server.
(see examples below)
This function makes use of GetPaginatedData
to get all of the flows for a
particular query.
Note the use of limit=1000
to increase the number of results per page
(1000
is the maximum for this endpoint).
This will reduce the number requests that need to be made overall.
(query as text) => let
GetFlowPage = (url as text) => let
data = Binary.Buffer(Web.Contents(url)),
json = Json.Document(data)
in
[
items = json[data][flows],
nextArgs = if Record.HasFields(json[meta], "nextLink") then json[meta][nextLink] else null
]
in
List.Buffer(GetPaginatedData(
GetFlowPage,
"https://api.hpc.tools/v1/public/fts/flow?limit=1000&" & query
))
Notes:
- The
query
argument takes the parameters that you would put after the?
in the URL. You can add multiple arguments using&
as normal. - You may be surprised that we pass in a URL rather than a page ID for the arguments of this function. The reason we do this is because the API handily provides the URL that we need to use to get the next page of data, so we can use that instead, and simply provide the first URL of the sequence.
So for example, to get all flows for 2019
, you would use it like so:
GetAllFlows("year=2019")
We have recently announced a number of upcoming changes to the HPC API,
including the development of v4
of the API that will use GraphQL at its core.
These changes are at the very early planning and development stages at the
moment, and are not yet ready to be tested or used, so for the time being we
recommend continuing to use the v2
endpoints that are documented in the
Swagger documentation at https://api.hpc.tools/docs/v2/ and in this wiki.
If you'd like to keep updated as to when these new endpoints will become available for early user feedback or use, please see the section below.
We have introduced some mailing lists that we will now be using to keep API users up-to-date, and engage with users for early testing and feedback of new API features.
If you would like to join either of these mailing lists, please send an email to api-support@hpc.tools with your request:
-
api-announce@hpc.tools
This mailing list will be used for official announcements from the HPC.tools team regarding changes in the API, including new features and deprecation notices.
-
api-discuss@hpc.tools
This mailing list allows for open discussion between different users of the HPC API, as well as the development team. Feel free to use this mailing list to ask for advice on how to use the API.
We will be using this mailing list to solicit early feedback for the
v4
API as we continue to plan and develop it.