Selecting and transforming columns
When selecting a column of a data frame, we prefix its name with a colon (:), which turns the name into a Symbol. For example, to select the column named x, we pass :x to select().
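As a small illustration of the colon syntax, the sketch below uses a hypothetical three-row data frame called toy (not part of the original example). It checks that :x is a Symbol and that select() also accepts the column name as a string.

```julia
using DataFrames

# Hypothetical toy data frame, used only for this illustration.
toy = DataFrame(x = [1, 2, 3], y = ["a", "b", "c"])

# :x is a Symbol literal; Symbol("x") builds the same value from a string.
@show :x isa Symbol
@show Symbol("x") == :x

# Both calls return a 3×1 data frame holding only column x.
by_symbol = select(toy, :x)
by_string = select(toy, "x")
@show by_symbol == by_string
```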
First, let's construct a 500 × 3 data frame called df with three columns named A, B, and C.
using DataFrames
df = DataFrame(
    A = 1:2:1000,                # odd numbers 1, 3, ..., 999
    B = repeat(1:10, inner=50),  # each of 1..10 repeated 50 times in a row
    C = 1:500)                   # row index 1..500
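A quick sanity check, sketched below, confirms that the three columns have the same length and that the result has the intended shape. It rebuilds df so that it can be run on its own.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

@show size(df)    # (number of rows, number of columns)
@show names(df)   # column names, returned as strings
first(df, 3)      # preview of the first three rows
```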
Each of the following expressions returns a 500 × 1 data frame that contains only column A.
select(df, :A)
df[:, [:A]]
For example, here are the first 10 rows of select(df, :A).
select(df, :A) |> (x->first(x, 10))
Output:
10×1 DataFrame
 Row │ A
     │ Int64
─────┼───────
   1 │     1
   2 │     3
   3 │     5
   4 │     7
   5 │     9
   6 │    11
   7 │    13
   8 │    15
   9 │    17
  10 │    19
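One detail worth checking when indexing is the difference between a single column name and a one-element vector of names. The sketch below rebuilds df and compares the two; it also uses Not(), which DataFrames exports, to drop a column rather than keep it. The variable names are illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# A bare column name returns the values as a vector (a copy of the column).
col_vector = df[:, :A]

# A vector of names returns a data frame, here with a single column.
col_frame = df[:, [:A]]

# Not(:A) selects every column except A.
without_A = select(df, Not(:A))

@show typeof(col_vector)
@show size(col_frame)
@show names(without_A)
```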
Rename
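The code below renames the existing columns A and B to a and b. Note that we broadcast the pair operator (.=>) so that each old name is paired with its new name. Because select() keeps only the columns it is given, the result contains a and b but not C.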
select(df, [:A, :B].=>[:a, :b])
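To make the broadcasting step concrete, the sketch below shows what the broadcasted pair expands to, and contrasts select() with rename(), which changes the names but keeps every column. The variable names are illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# Broadcasting => pairs the names element by element.
renames = [:A, :B] .=> [:a, :b]
@show renames == [:A => :a, :B => :b]

# select() keeps only the renamed columns.
selected = select(df, renames)
@show names(selected)

# rename() keeps all columns and only changes the given names.
renamed = rename(df, :A => :a, :B => :b)
@show names(renamed)
```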
Create a new column
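Below we compute a column called C that holds the element-wise sum of columns A and B. Because df already has a column named C, the result replaces that column; use a name that is not yet taken to add a fourth column instead.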
select(df, :, [:A, :B]=>((a, b)->a.+b)=>:C)
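The call above writes its result to C, a name that df already uses. As a sketch of the alternative, the block below writes the sum to a new name, D (chosen here purely for illustration), using transform(), which keeps all existing columns, and ByRow(), which applies the function to one row at a time.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# Writing to the existing name C replaces that column: still three columns.
overwritten = select(df, :, [:A, :B] => ((a, b) -> a .+ b) => :C)
@show size(overwritten)

# Writing to a new name adds a fourth column and leaves C untouched.
with_sum = transform(df, [:A, :B] => ByRow(+) => :D)
@show size(with_sum)
first(with_sum, 3)
```

Neither call modifies df itself; both return a new data frame.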
Pipe operator
The pipe operator |> is part of Julia's Base module. I was pleasantly surprised that Julia has an operator similar to the pipe operator %>% in R.
The pipe operator is a helpful tool for chaining several function calls in a concise and legible way, instead of nesting them inside one another.
For example, suppose we want to raise each element of a vector vec to the power of 3 and then sum the results. There are several ways to achieve this, and the pipe operator works with whichever one we choose.
vec = [1, 2, 3, 4, 5]
vec .^ 3 |> sum
[vec[i]^3 for i in 1:5] |> sum
vec |> x -> x .^ 3 |> x -> sum(x)
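The sketch below checks that these variants agree, and adds two more for comparison: the broadcasting pipe .|>, which applies a function to each element, and the two-argument form of sum. It uses the name v instead of vec only to avoid shadowing Base's vec function; that renaming is a choice made for this example.

```julia
v = [1, 2, 3, 4, 5]

# Element-wise power first, then pipe the resulting vector into sum.
total_broadcast = v .^ 3 |> sum

# The body of an anonymous function extends as far right as possible,
# so the second pipe is evaluated inside the function.
total_lambda = v |> x -> x .^ 3 |> sum

# .|> pipes each element through the function; the parentheses keep
# the final |> sum outside the anonymous function.
total_elementwise = v .|> (x -> x^3) |> sum

# sum also accepts a function as its first argument.
total_direct = sum(x -> x^3, v)

@show total_broadcast total_lambda total_elementwise total_direct
```

All four values are 225, since 1 + 8 + 27 + 64 + 125 = 225.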