FunctionalData
is a package for fast and expressive data modification.
Built around a simple memory layout convention, it provides a small set of general purpose functional constructs as well as routines for efficient computation with dense numerical arrays.
Optionally, it supplies a syntax for clean, concise code:
wordcount(filename) = @p read filename | lines | map split | flatten | length
Indexing is simplified for dense n-dimensional arrays, which are viewed as collections of (n-1)-dimensional items.
For example, this allows to use the exact same code for 2D patches and 3D blocks:
a = [1 2 3; 4 5 6]
b = ones(2, 2, 10) # 10 2D patches
c = ones(2, 2, 2, 10) # 10 3D blocks
len(a) => 3
len(b) => 10
len(c) => 10
at(a,2) => [2 5]'
part(a,2:3) => [2 3; 5 6]
normsum(x) = x/sum(x)
map(b, normsum) => [0.25 ... ] of size 2 x 2 x 10
map(c, normsum) => [0.125 ... ] of size 2 x 2 x 2 x 10
# Result shape may change:
map(b, sum) => [4 ... ] of size 1 x 10
map(c, sum) => [8 ... ] of size 1 x 10
Using a custom View
type based on this memory layout assumption, the provided map
operations can be considerably faster than built-ins. Given our data and desired operation:
a = rand(10, 1000000) # => 80 MB
csum!(x) = for i = 2:length(x) x[i] += x[i-1] end
csumoncopy(x) = (for i = 2:length(x) x[i] += x[i-1] end; x)
we can use the following simple, general and efficient statement:
map!(a, csum!)
# elapsed time: 0.027491752 seconds (256 bytes allocated)
Built-in alternatives are either slower or require manual inlining, for a specific data layout:
mapslices(csumoncopy, a, [1])
# elapsed time: 0.85726391 seconds (404 MB allocated, 5.34% gc time)
f(a) = for i = 1:size(a,2) a[:,i] = csumoncopy(a[:,i]) end
# elapsed time: 0.110978216 seconds (144 MB allocated, 3.86% gc time)
f2(a) = for i = 1:size(a,2) csum!(sub(a,:,i)) end
# elapsed time: 0.071394038 seconds (160 MB allocated, 16.46% gc time)
function f3(a)
for n = 1:size(a,2)
for m = 2:size(a,1) a[m,n] += a[m-1,n] end
end
end
# elapsed time: 0.017072235 seconds (80 bytes allocated)
function f4(a)
for n = 1:size(a,1):length(a)
for m = 1:size(a,1)-1 a[n+m] += a[n+m-1] end
end
end
# elapsed time: 0.013347679 seconds (80 bytes allocated)
With the exact same syntax we can easily parallelize our code using the local workers via shared memory or Julia's inter-process serialization, both on the local host or all machines:
shmap!(a, csum!) # local processes, shared memory
lmap!(a, csum!) # local processes
pmap!(a, csum!) # all available processes
For each of these variants there are optimized functions available for in-place operation on the input array, in-place operation on a new output array, or fallback options for functions which do not work in-place. For details, see the section on map and Friends.
- version requirement for 0.4 build
map
andmapmap
forDict
- fix
typed
- fixed
repeat
for numeric arrays - made
test_equal
more robust - reworked
map
andview
forArray{T,1}
/ scalar return values - fix
partsoflen
,concat
- add
takelast(a)
,unequal
,sortpermrev
,filter
- fix
map
forDict
- added
localworkers
andhostpids
- added
hmap
and variants, which map tasks to the first pid of each machine - removed
makeliteral
, as the built-inrepr
does the same - sped up
matrix
- added map2, map3, map4, map5
- fixed unzip
- added flip, flipdims
- added extract, removed @getfield
Please see the overview below for one-line descriptions of each function. More details and examples can then be found in the following sections (work in progress)
- Length and size
- Data access
- Data Layout
- Pipeline syntax
- Efficient views
- Computing: map and friends
- Output
- I/O
- Helpers
- Unit tests
Length and Size [details]
len(a) # length
siz(a) # lsize, ndims x 1
siz3(a) # lsize, 3 x 1
Data Access [details]
at(a, i) # item i
setat!(a, i, value) # set item i to value
fst(a) # first item
snd(a) # second item
third(a) # third item
last(a) # last item
part(a, ind) # items at indices ind
trimmedpart(a, ind) # items at ind, no error if a is too short
take(a, n) # the first up to n elements
takelast(a,n=1) # the last up to elements
drop(a,n) # a, except for the first n elements
droplast(a,n=1) # a, except for the last n elements
partition(a, n) # partition into n parts
partsoflen(a, n) # partition into parts of length n
extract(a, field, default) # get key x of dict or field x of composite type instance
Data Layout [details]
row(a) # reshape into row vector
col(a) # reshape into column vector
reshape(a, siz) # reshape into size in ndim x 1 vector siz
split(a, x or f) # split a where item == x or f(item) == true
concat(a...) # same as flatten([a...])
subtoind(sub, a) # transform ndims x npoints sub to linear ind for a
indtosub(ind, a) # transform linear ind to ndims x len(ind) sub for a
stack(a) # concat along the n + 1st dim of the items in a
flatten(a) # reduce the nestedness of a
unstack(a) # split the dense array a into array of items
riffle(a, x) # insert x between the items of a
matrix(a) # reshape items of a to column vectors
unmatrix(a, example) # reshape the column vector items in a according to example
lines(a) # split the text a into array of lines
unlines(a) # concat a with newlines
unzip(a) # unzip items
findsub(a) # return sub for the non-zero entries
randsample(a, n) # draw n items from a with repetition
randperm(a) # randomly permute order of items
flip(a) # reverse the order of items
flipdims(a,d1,d2) # flip dims d1 and d2
Pipeline Syntax [details]
r = @p f1 a b | f2 | f3 c # pipeline macro, equals f3(f2(f1(a,b)),c)
r = @p f1 a | f2 b _ | f3 e # equals f3(f2(b,f1(a)),c)
Efficient Views [details]
view(a,i) # lightweight view of item i of a
view(a,i,v) # lightweight view of item i of a, reusing v
next!(v) # make v point to the i + 1th item of a
trytoview(a,v) # for dense array, use view, otherwise part
trytoview(a,v,i) # for dense array, use view reusing v, otherwise part
Computing: map and Friends [details]
map(a, f) # apply f to each item
map!(a, f!) # apply f! to each item in-place
map!r(a, f) # apply f to each item, overwriting a
map2!(a, f, f!) # apply f to fst(a), f! to other items
map2!(a, r, f!) # apply f!(resultitem, item) to each item
shmap(a, f) # parallel map f to shared array a, accross procs(a)
shmap!(a, f!) # inplace shmap f!, overwriting a, accross procs(a)
shmap!r(a, f) # apply f to each item, overwriting a, accross procs(a)
shmap2!(a, f, f!) # apply f to fst(a), f! to other items, accross procs(a)
shmap2!(a, r, f!) # apply f!(resultitem, item), accross procs(a)
pmap(a, f) # parallel map of f accross all workers
lmap(a, f) # parallel map of f accross local workers
mapmap(a, f) # shorthand for map(a, x->map(x,f))
map2(a,b,f), map3, map4, map5 # map over a and b invoking f(x,y)
work(a, f) # apply f to each item, no result value
pwork, lwork, shwork, workwork # like the corresponding map variants
any(a, f) # is any f(item) true
anyequal(a, x) # is any item == x
all(a, f) # are all f(item) true
allequal(a, x) # are all items == x
unequal(a,b) # shortcut for !isequal(a,b)
sort(a, f; kargs...) # sort a accorting to f(item)
uniq(a[, f]) # unique elements of a or uniq(a,map(a,f))
table(f, a...) # like [f(m,n) for m in a[1], n in a[2]], for any length of a
ptable, ltable # parallel table using all workers, local workes
tableany, ptableany, ltableany # like table, but does not flatten result
Output [details]
showinfo
tee
I/O [details]
read
write
existsfile
mkdir
filenames
filepaths
dirnames
dirpaths
readmat
writemat
@snapshot
Helpers [details]
zerossiz(s, typ) # zeros(s...), default typ is Float64
shzerossiz(s, typ) # shared zerossiz
shzeros([typ,] s...) # shared zeros
onessiz(s, typ) # ones(s...), default typ is Float64
shonessiz(s, typ) # shared onessiz
shones([typ,] s...) # shared ones
randsiz(s, typ) # rand(s...), default typ is Float64
shrandnsiz(s, typ) # shared randsiz
shrand([typ,] s...) # shared rand
randnsiz(s, typ) # randn(s...), default typ is Float64
shrandnsiz(s, typ) # shared randnsiz
shrandn([typ,] s...) # shared randn
zeroel(a) # zero(eltype(a))
oneel # one(eltype(a))
@dict a b c ... # Dict("a" => a, "b" => b, "c" => c, ...)
+
*
repeat(a, n) # repeat a n times
nop() # no-op
id(a...) # returns a...
istrue(a or f) # is a or result of f true
isfalse(a or f) # !istrue
not # alias for !
or # alias for ||
and # alias for &&
plus # alias for .+
minus # alias for .-
times # alias for .*
divby # alias for ./
Unit Tests [details]
@test_equal a b # test a and b for equality, show detailed info if not
@assert_equal a b # like test_equal, then throws error
@test_almostequal a b maxdiff # like test_equal, but allows up to maxdiff difference