Skip to content

PowerQuery Functions

Bassil Aleter edited this page Oct 13, 2023 · 9 revisions

This page documents a number of PowerQuery functions that perform common data processing tasks for HPC data, and can be used in PowerBI or Excel.

Merge two columns in new one based on condition

the below code combines the two columns indicatorCustomRef and indicatorCustomRef.1 taking the values of the indicatorCustomRef column if not null otherwise take the values from indicatorCustomRef.1.

Merge_indicatorCustomRef= Table.CombineColumns(Merge_adminLevel, {"indicatorCustomRef", "indicatorCustomRef.1"}, each if _{0} is null then _{1} else _{0}, "indicatorCustomRef"),

Replace multiple values in same column in one step

the below code replaces the long org type names with shorten names.

ReplaceORgTypes = Table.ReplaceValue(AddOrgTypeDetails, each[organizationSubtype], 
each if List.Contains({"Local NGOs/CSOs", "National NGOs/CSOs"},[organizationSubtype]) then "NNGO"
else if List.Contains({"International NGOs", "Internationally Affiliated Organizations"},[organizationSubtype]) then "INGO" 
else if List.Contains({"National Governments","Local Governments"},[organizationSubtype]) then "GOV"
else if List.Contains({"UN Agencies"},[organizationSubtype]) then "UN"
else if List.Contains({"Country-based UN Pooled Funds","Global UN Pooled Funds", "Regional UN Pooled Funds"},[organizationSubtype]) then "PooledFund"
else if List.Contains({"International Private Organizations", "Local/National Private Organizations"},[organizationSubtype]) then "Private"
else if List.Contains({"International Red Cross/Red Crescent Movement", "Red Cross/Red Crescent National Societies"},[organizationSubtype]) then "Red C/M"
else if List.Contains({"Other Multilateral Organizations", "Uncategorized NGOs", "Uncategorized Private Organizations"},[organizationSubtype]) then "OTHER"
else [organizationSubtype], Replacer.ReplaceText,{"organizationSubtype"})

Create calculated column based on values stored in pivoted columns that change dynamically

the below code create a new column called result which retrieve the value calculated based on the values stored in the column [calculationMethod] i.e. Sum, Maximum value, ...etc. for the columns which there names are stored in the column openMPs and table table Open_MPs

    Calculated_indicators_measure = Table.AddColumn(Pivot_MPs, "Result", each 
    try Function.Invoke(
        Record.Field(
        [Sum=List.Sum, Maximum value=List.Max, Minimum value=List.Min, Average=List.Average, Latest value=List.Last, Manual value=List.Last],
        [calculationMethod]??"Manual value"),
        {List.RemoveNulls(Record.FieldValues(Record.SelectFields(_,Open_MPs[openMPs])))})
    otherwise null)

GetLocationsById

This function takes a Service URL (e.g. https://api.hpc.tools) and list of location IDs, and produces a detailed table with all locations in HPC up to admin level3 (including deactivated locations), including the admin0, admin1, admin2 and admin3 IDs, names and Pcodes for all of a location's parents, and a name, id and pcode.

This is useful for example to lookup locations that are used in disaggregated caseloads or indicators throughout HPC.

(Service as text, locationIDs as list) => let
    Data = List.Transform(locationIDs, (locationId) => Json.Document(Web.Contents(Service & "/v2/public/location/" & Number.ToText(locationId) & "?maxLevel=3&status=all"))[data]),
    Countries = Table.FromList(Data, Splitter.SplitByNothing(), {"Admin0"}),

    ProcessAdminLevel = (
        baseTable as table,
        adminLevel as number,
        calculateName as function
    ) => let
      namespace = "Admin" & Number.ToText(adminLevel),
      // Add columns that will be used across all descendant admin levels
      addId = Table.AddColumn(baseTable, namespace & "Id", each Record.Field(_, namespace)[id]),
      addName = Table.AddColumn(addId, namespace & "Name", each Record.Field(_, namespace)[name]),
      addPcode = Table.Buffer(Table.AddColumn(addName, namespace & "Pcode", each Record.Field(_, namespace)[pcode])),
      // Add columns that are specifically for locations that are the current admin level
      addFinalId = Table.AddColumn(addPcode, "id", each Record.Field(_, namespace)[id]),
      addFinalName = Table.AddColumn(addFinalId, "name", calculateName),
      addFinalPCode = Table.AddColumn(addFinalName, "pcode", each Record.Field(_, namespace)[pcode]),
      addFinalAdminLevel = Table.AddColumn(addFinalPCode, "AdminLevel", each adminLevel),
      // Used in next base
      childrenColumn = "Admin" & Number.ToText(adminLevel + 1),
      addChildren = Table.Buffer(Table.AddColumn(addPcode, childrenColumn, each Record.Field(_, namespace)[children])),
      expandChildren = Table.SelectRows(Table.ExpandListColumn(addChildren, childrenColumn), each Record.Field(_, childrenColumn) <> null)
    in
      [
          nextBase = expandChildren,
          final = addFinalAdminLevel
      ],

    Admin0 = ProcessAdminLevel(Countries, 0, each [Admin0][name]),
    Admin1 = ProcessAdminLevel(Admin0[nextBase], 1, each [Admin0][name] & " -> " & [Admin1][name]),
    Admin2 = ProcessAdminLevel(Admin1[nextBase], 2, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name]),
    Admin3 = ProcessAdminLevel(Admin2[nextBase], 3, each [Admin0][name] & " -> " & [Admin1][name] & " -> " & [Admin2][name] & " -> " & [Admin3][name]),

    Result = Table.Combine({
        Admin0[final],
        Admin1[final],
        Admin2[final],
        Admin3[final]
    })
in
    Result

GetLocationsByISO

This function accepts a list of ISO country codes, and will return all the locations that existin in HPC up to admin level 3 for the countries listed. This function makes use of the function GetLocationsById

(Service as text, locationISOs as list) => let
    AllLocations = Json.Document(Web.Contents(Service & "/v2/public/location"))[data],
    FilteredLocations = List.Select(AllLocations, each List.Contains(locationISOs, [iso3])),
    FilteredLocationIDs = List.Transform(FilteredLocations, each [id])
in
    GetLocationsById(Service, FilteredLocationIDs)

ProcessDisaggregatedAttachment

This function can take a caseload or indicator attachment JSON Record, and parse the disaggregated data, producing a table with a separate row for each (location, category, metric) combination.

It requires a table of locations (produced by either GetLocationsById or GetLocationsByISO) to use to lookup the location name and pcode based on ID.

(attachment as record, locations as table) => let
  Values = attachment[attachmentVersion][value][metrics][values],
  Metrics = Values[totals],
  Disaggregated = if Record.HasFields(Values, "disaggregated") then Values[disaggregated] else null
in
  if Disaggregated = null then null else (
    let
      Categories = Disaggregated[categories],
      Locations =
        if List.Count(Disaggregated[dataMatrix]) = List.Count(Disaggregated[locations]) + 1 then
          Disaggregated[locations]
        else
          error Error.Record("Mismatch of rows and location length"),
      RowIndexes = {0..(List.Count(Disaggregated[dataMatrix]) - 1)},
      InitialTable = Table.FromList(RowIndexes, Splitter.SplitByNothing(), { "RowIndex" }),
      AddRow = Table.AddColumn(InitialTable, "row", each Disaggregated[dataMatrix]{[RowIndex]}),
      AddColIndex = Table.AddColumn(AddRow, "ColIndex", each {0..(List.Count([row]) - 1)}),
      ExpandColIndex = Table.ExpandListColumn(AddColIndex, "ColIndex"),
      AddCellValue = Table.AddColumn(ExpandColIndex, "Value", each [row]{[ColIndex]}),
      FilterNullValues = Table.SelectRows(AddCellValue, each ([Value] <> null and [Value] <> 0 and [Value] <> "")),
      AddMetricIndex = Table.AddColumn(FilterNullValues, "MetricIndex", each Number.Mod([ColIndex], List.Count(Metrics))),
      AddCategoryIndex = Table.AddColumn(AddMetricIndex, "CategoryIndex", each Number.RoundDown([ColIndex] / List.Count(Metrics))),
      AddLocationRef = Table.AddColumn(AddCategoryIndex, "LocationRef", each Locations{[RowIndex] - 1}),
      AddCategory = Table.AddColumn(AddLocationRef, "Category", each if [CategoryIndex] < List.Count(Categories) then Categories{[CategoryIndex]}[label] else "Total"),
      AddMetric = Table.AddColumn(AddCategory, "Metric", each Metrics{[MetricIndex]}),
      AddLocationId = Table.AddColumn(AddMetric, "LocationId", each [LocationRef][id]),
      AddLocation = Table.AddColumn(AddLocationId, "Location", each Table.SelectRows(locations, (loc) => loc[id] = [LocationId]){0}),
      AddLocationName = Table.AddColumn(AddLocation, "LocationName", each try [Location][name] otherwise "Broken Location: " & [LocationRef][name] ),
      AddLocationPcode = Table.AddColumn(AddLocationName, "LocationPcode", each try [Location][pcode] otherwise "Broken Location: " & [LocationRef][name]),
      AddLocationAdminLevel = Table.AddColumn(AddLocationPcode, "LocationAdminLevel", each try [Location][AdminLevel] otherwise "Broken Location: " & [LocationRef][name]),
      AddMetricType = Table.AddColumn(AddLocationAdminLevel, "MetricType", each [Metric][type]),
      AddMetricName = Table.AddColumn(AddMetricType, "MetricName", each [Metric][name][en]),
      Result = Table.SelectColumns(AddMetricName, {"Location", "LocationName", "LocationPcode", "LocationAdminLevel", "Category", "MetricType", "MetricName", "Value"})
    in
      Result
  )

GetPaginatedData

The HPC API makes use of pagination for many of its endpoints (such as searching for flows or projects). What this means is that to get a complete set of certain parts of our data, you need to make multiple requests to our API to get the individual pages of data.

This is quite a common practice in API design to prevent requests becoming too large for unbounded data-sets, or prevent requests taking too long to serve.

You can create queries in PowerQuery that dynamically make the appropriate number of requests to get a complete data-set by using recursion.

We use this utility function to help us create such queries:

(Get as function, args) => let
  data = Get(args),
  nextResults = if data[nextArgs] <> null then @GetPaginatedData(Get, data[nextArgs]) else {}
in
  List.Combine({data[items], nextResults})

The first argument (Get) needs to be a function that:

  • accepts an argument that provides enough data for the function to craft an API request for a single page (e.g. the URL, or page ID)
  • returns a Record with the properties:
    • items: a List of the items for the requested page
    • nextArgs: either:
      • null (when the current page is the last page, and no more requests are needed)
      • a value that will be passed into the Get function to get the next page

The second argument (args) needs to be a value that represents the first page you would like to fetch, and is passed to Get for the first request.

Note: when fetching data like this, you ALMOST ALWAYS want to use List.Buffer to prevent making an unnecessary number of queries to the server. (see examples below)

GetAllFlows

This function makes use of GetPaginatedData to get all of the flows for a particular query.

Note the use of limit=1000 to increase the number of results per page (1000 is the maximum for this endpoint). This will reduce the number requests that need to be made overall.

(query as text) => let
  GetFlowPage = (url as text) => let
    data = Binary.Buffer(Web.Contents(url)),
    json = Json.Document(data)
  in
    [
      items = json[data][flows],
      nextArgs = if Record.HasFields(json[meta], "nextLink") then json[meta][nextLink] else null
    ]
in
  List.Buffer(GetPaginatedData(
    GetFlowPage,
    "https://api.hpc.tools/v1/public/fts/flow?limit=1000&" & query
  ))

Notes:

  • The query argument takes the parameters that you would put after the ? in the URL. You can add multiple arguments using & as normal.
  • You may be surprised that we pass in a URL rather than a page ID for the arguments of this function. The reason we do this is because the API handily provides the URL that we need to use to get the next page of data, so we can use that instead, and simply provide the first URL of the sequence.

So for example, to get all flows for 2019, you would use it like so:

GetAllFlows("year=2019")