Request - wrapdims - fill missing cells with missing #154

Open
Lincoln-Hannah opened this issue Mar 8, 2024 · 4 comments

Comments

@Lincoln-Hannah

Currently wrapdims fills cells that have no matching row with some very large number (example below).
Could these be filled with missing, or with some specified value, instead?

using DataFrames, AxisKeys

df = DataFrame(x=[1,1,2], y=[1,2,1], data=[10,11,12])

ka = wrapdims(df, :data, :x, :y)

ka(2,2)   # => 2800920006512 (no row has x=2, y=2, so this cell is undefined)

Use Case
I would then use coalesce to fill missings with some value (unique to each cell). Then convert back to a DataFrame.
This could all be done with DataFrame functions (groupby etc) but a KeyedArray is more elegant.

@mcabbott
Owner

mcabbott commented Mar 8, 2024

I think you want this:

julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
   x  2-element Vector{Int64}
   y  2-element Vector{Int64}
And data, 2×2 Matrix{Union{Missing, Int64}}:
      (1)  (2)
 (1)   10   11
 (2)   12     missing

I don't recall why this isn't the default. Or, if the default is meant to avoid Union{Missing, Int}, why it doesn't throw an error when some entries are undefined.
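To make the difference concrete, here is a small sketch using the DataFrame from the original post. It relies only on the default keyword shown above; the fill value 0 is an arbitrary choice for illustration, not something the package recommends.

```julia
using DataFrames, AxisKeys

df = DataFrame(x=[1,1,2], y=[1,2,1], data=[10,11,12])

# With default=missing, the absent (x=2, y=2) cell becomes an explicit missing
ka = wrapdims(df, :data, :x, :y; default=missing)
ismissing(ka(2, 2))    # true

# Another fill value can be passed the same way (0 is an arbitrary choice here)
ka0 = wrapdims(df, :data, :x, :y; default=0)
ka0(2, 2)              # 0
ka0(1, 2)              # 11, cells present in the table are unaffected
```

Round brackets look up by key rather than by index, so ka(2, 2) means x == 2 and y == 2.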

@lincolnhannah

Thanks :)
Where you have "docstrings" on the website, most other packages have "docs".

I didn't realise there was documentation for the package (which contains the above).
Maybe change it to "docs"?

@mcabbott
Owner

mcabbott commented Mar 9, 2024

At present there is no website, but docstrings are accessible at the REPL prompt by pressing ?:

help?> wrapdims(df, :data, :x, :y )
  wrapdims(table, value, names...; default=undef, sort=false, force=false)

  Construct KeyedArray(NamedDimsArray(A,names),keys) from a table matching the Tables.jl
  (https://github.com/JuliaData/Tables.jl) API. (It must support both Tables.columns and
  Tables.rows.)

  The contents of the array is taken from the column value::Symbol of the table. Each symbol in
  names specifies a column whose unique entries become the keys along a dimension of the array.

  If there is no row in the table matching a possible set of keys, then this element of the array
  is undefined, unless you provide the default keyword. If several rows share the same set of
  keys, then by default an ArgumentError is thrown. Keyword force=true will instead cause these
  non-unique entries to be overwritten.

  See also populate! to fill an existing array in the same manner.

  Setting AxisKeys.nameouter() = false will reverse the order of wrappers produced.

  Examples
  ≡≡≡≡≡≡≡≡

  julia> using DataFrames, AxisKeys
  
  julia> df = DataFrame("a" => 1:3, "b" => 10:12.0, "c" => ["cat", "dog", "cat"])
  3×3 DataFrame
   Row │ a      b        c      
       │ Int64  Float64  String 
  ─────┼────────────────────────
     1 │     1     10.0  cat
     2 │     2     11.0  dog
     3 │     3     12.0  cat
  
  julia> wrapdims(df, :a, :b, :c; default=missing)
  2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
  ↓   b ∈ 3-element Vector{Float64}
  →   c ∈ 2-element Vector{String}
  And data, 3×2 Matrix{Union{Missing, Int64}}:
           ("cat")    ("dog")
   (10.0)   1           missing
   (11.0)    missing   2
   (12.0)   3           missing
  
  julia> wrapdims(df, :a, :b)
  1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
  ↓   b ∈ 3-element Vector{Float64}
  And data, 3-element Vector{Union{Missing, Int64}}:
   (10.0)  1
   (11.0)  2
   (12.0)  3
  
  julia> wrapdims(df, :a, :c)
  ERROR: ArgumentError: Key ("cat",) is not unique
  
  julia> wrapdims(df, :a, :c, force=true)
  1-dimensional NamedDimsArray(KeyedArray(...)) with keys:
  ↓   c ∈ 2-element Vector{String}
  And data, 2-element Vector{Int64}:
   ("cat")  3
   ("dog")  2

julia> wrapdims(df, :data, :x, :y, default=missing)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   x ∈ 2-element Vector{Int64}
→   y ∈ 2-element Vector{Int64}
And data, 2×2 Matrix{Union{Missing, Int64}}:
      (1)  (2)
 (1)   10   11
 (2)   12     missing
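Continuing from this result, the use case in the original request (fill each missing cell with its own value, then go back to a DataFrame) might look roughly like the sketch below. The fallback matrix is hypothetical. The last line assumes that the installed AxisKeys version supports the Tables.jl interface for KeyedArrays, which is worth checking before relying on it.

```julia
using DataFrames, AxisKeys

df = DataFrame(x=[1,1,2], y=[1,2,1], data=[10,11,12])
ka = wrapdims(df, :data, :x, :y; default=missing)

# Hypothetical per-cell fallback values, same shape as ka
fallback = [100 200; 300 400]

# Broadcasting coalesce replaces only the missing cells and keeps the keys
filled = coalesce.(ka, fallback)

filled(1, 1)   # 10, taken from the table
filled(2, 2)   # 400, taken from fallback

# Assumption: KeyedArrays can be consumed as Tables.jl tables,
# giving one row per cell with the keys as columns
df2 = DataFrame(filled)
```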

@Lincoln-Hannah
Author

Would it be possible to add functionality like @rtransform from DataFramesMeta,
such that you specify a dimension and then add new elements along it (like adding columns to a DataFrame)?
Below is a model I coded using a KeyedArray, then a DataFrame, and finally pseudocode using the proposed functionality.

Using KeyedArray

X = wrapdims( data, :rate, :date, :pair, :periodName, :delta )
X = rekey(X, :date => [h1,h2,base] )
y = copy(X)
y = rekey(y, :delta => [10,25,50,75,90])

y(10) .= X(ATM)  +  X(BF10)  + .5X(RR10)
y(25) .= X(ATM)  +  X(BF25)  + .5X(RR25)
y(50) .= X(ATM) 
y(75) .= X(ATM)  +  X(BF25)  - .5X(RR25)
y(90) .= X(ATM)  +  X(BF10)  - .5X(RR10)

y(base) .*= y(h2)./ y(h1)

y = y(base)
X = copy(y)
X = rekey(X, :delta => [ATM,BF10,BF25,RR10,RR25] )

X(ATM)  .=  y(50)    
X(BF10) .= -y(50)  + .5( y(10) + y(90))
X(BF25) .= -y(50)  + .5( y(25) + y(75))
X(RR10) .=  y(10)  -     y(90)
X(RR25) .=  y(25)  -     y(75) 

Using DataFrames

@chain data begin

    unstack( [:date,:pair,:periodName], :delta, :rate )

    @rtransform begin
        :Δ10 = :ATM + :BF10 + .5 * :RR10
        :Δ25 = :ATM + :BF25 + .5 * :RR25
        :Δ50 = :ATM 
        :Δ75 = :ATM + :BF25 - .5 * :RR25
        :Δ90 = :ATM + :BF10 - .5 * :RR10
    end

    @select :pair :date :periodName :Δ10 :Δ25 :Δ50 :Δ75 :Δ90
    stack( Not([:pair,:date,:periodName]), value_name = :vol, variable_name=:delta )
                                
    unstack( [:pair,:delta,:periodName], :hist_point, :vol )
    @rtransform :shocked = :Base * :H2 / :H1
    @select :pair :delta :periodName :shocked    
    unstack( [:pair,:periodName], :delta, :shocked)
    
    @rtransform begin
        :ATM    = :Δ50 
        :BF10   = -:Δ50 + .5( :Δ10 + :Δ90 )
        :BF25   = -:Δ50 + .5( :Δ25 + :Δ75 )
        :RR10   = :Δ10 - :Δ90 
        :RR25   = :Δ25 - :Δ75
    end

end

Using Proposed @rtransform on KeyedArray

@chain KA begin

    @rtransform (dimension = delta) begin
        :Δ10 = :ATM + :BF10 + .5 * :RR10
        :Δ25 = :ATM + :BF25 + .5 * :RR25
        :Δ50 = :ATM 
        :Δ75 = :ATM + :BF25 - .5 * :RR25
        :Δ90 = :ATM + :BF10 - .5 * :RR10
    end

    @rtransform (dimension = history) begin
        :shocked = :Base * :H2 / :H1
    end

    @rtransform (dimension = delta) begin
        :ATM    = :Δ50 
        :BF10   = -:Δ50 + .5( :Δ10 + :Δ90 )
        :BF25   = -:Δ50 + .5( :Δ25 + :Δ75 )
        :RR10   = :Δ10 - :Δ90 
        :RR25   = :Δ25 - :Δ75
    end

end
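For comparison, here is a rough sketch of how one such step (the first transform along delta) can be written today without any macro. The toy data, the pair names, and the variable names are all made up for illustration, and this is only one possible way to do it: each new slice is computed by key lookup, and the slices are then assembled into a new array with fresh keys along delta.

```julia
using AxisKeys

# Hypothetical toy data: 2 pairs × 5 quote types
X = wrapdims([10.0 0.5 0.2 1.0 0.4;
              12.0 0.6 0.3 1.2 0.5];
             pair  = ["EURUSD", "USDJPY"],
             delta = [:ATM, :BF10, :BF25, :RR10, :RR25])

# Look up each existing slice along :delta by its key
atm,  bf10, bf25 = X(delta=:ATM),  X(delta=:BF10), X(delta=:BF25)
rr10, rr25       = X(delta=:RR10), X(delta=:RR25)

# Compute the new slices, then drop the wrappers so they can be concatenated
d10 = Array(atm .+ bf10 .+ 0.5 .* rr10)
d25 = Array(atm .+ bf25 .+ 0.5 .* rr25)
d50 = Array(atm)
d75 = Array(atm .+ bf25 .- 0.5 .* rr25)
d90 = Array(atm .+ bf10 .- 0.5 .* rr10)

# Rebuild a keyed array whose :delta dimension carries the new keys
Y = wrapdims(hcat(d10, d25, d50, d75, d90);
             pair  = axiskeys(X, 1),
             delta = [10, 25, 50, 75, 90])

Y("EURUSD", 10)   # ATM + BF10 + 0.5 * RR10 for that pair
```

A macro like the one proposed would presumably generate something similar for an arbitrary dimension; the sketch hard-codes a 2-dimensional array with delta as the last dimension.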
