Selecting and transforming columns
When selecting variables (columns) of a data frame, we put : in front of the column name to indicate that it is a variable name. For example, to select the column named x, we use :x.
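In Julia, :x is a Symbol literal, and DataFrames accepts Symbols as column names. The small check below uses only Base Julia. It shows that :x is the same value that Symbol("x") builds from a string.

```julia
# :x is a Symbol literal, the same value Symbol("x") constructs from a string
col = :x
println(typeof(col))             # Symbol
println(col == Symbol("x"))      # true
println(String(col))             # x
```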
First, let's construct a 500×3 data frame called df with three columns named A, B, and C:

using DataFrames
# A = 1, 3, ..., 999; B = 1, ..., 10 with each value repeated 50 times; C = 1, ..., 500
df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)
Each of the following lines produces a 500×1 data frame that contains only column A:

select(df, :A)
df[:, [:A]]
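The sketch below checks that the two forms give the same 500×1 data frame. It also contrasts them with indexing by a bare :A, which returns the column as a vector rather than as a data frame. The variable names sel1, sel2 and colA are illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# Both forms return a new 500×1 DataFrame holding only column A
sel1 = select(df, :A)
sel2 = df[:, [:A]]
println(size(sel1))        # (500, 1)
println(sel1 == sel2)      # true

# Without the brackets, indexing returns the column itself as a vector
colA = df[:, :A]           # a copy of column A
println(colA isa AbstractVector)  # true
println(df.A == colA)      # true; df.A returns the stored column without copying
```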
For example, here are the first 10 rows of select(df, :A):
select(df, :A) |> (x->first(x, 10))
10×1 DataFrame
 Row │ A
     │ Int64
─────┼───────
   1 │     1
   2 │     3
   3 │     5
   4 │     7
   5 │     9
   6 │    11
   7 │    13
   8 │    15
   9 │    17
  10 │    19
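You can get the same rows without the pipe by calling first directly on the selection. last(df, n) takes rows from the other end in the same way. This is a minimal sketch, and top10 and bottom3 are just illustrative names.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# The same 10 rows without the pipe: call first directly on the selection
top10 = first(select(df, :A), 10)
# last works the same way from the other end of the data frame
bottom3 = last(select(df, :A), 3)

println(top10.A)     # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
println(bottom3.A)   # [995, 997, 999]
```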
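The code below returns a data frame in which the existing columns A and B are renamed to a and b. Note that we use the broadcasted pair operator .=>. It pairs each old name with its new name element by element, which is what makes this code work.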
select(df, [:A, :B].=>[:a, :b])
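The sketch below shows what .=> builds here, which is a vector of old-name => new-name pairs. Because select keeps only the columns it is given, the renamed result drops C. The rename function keeps every column and changes only the listed names. The variable names are illustrative.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# .=> pairs the two vectors element by element: :A => :a and :B => :b
pairs_vec = [:A, :B] .=> [:a, :b]

# select keeps only the columns it is given, so C is dropped
renamed_sel = select(df, pairs_vec)
println(names(renamed_sel))        # ["a", "b"]

# rename keeps every column and only changes the listed names
renamed_all = rename(df, :A => :a, :B => :b)
println(names(renamed_all))        # ["a", "b", "C"]
```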
Create a new column
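Below we create a column called C that holds the sum of the two columns A and B. The : keeps all existing columns in the result. The anonymous function receives the whole columns a and b, so we add them element-wise with the broadcast operator .+. Because df already has a column named C, this call replaces that column with A + B rather than adding a fourth one.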
select(df, :, [:A, :B]=>((a, b)->a.+b)=>:C)
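To keep the original C, the sketch below writes the sum to a new column D, a name chosen only for illustration. It compares the whole-column form with the row-wise ByRow form. It also uses transform, which behaves like select with a leading :.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# Whole-column form: the function receives vectors A and B, so .+ is needed
df1 = select(df, :, [:A, :B] => ((a, b) -> a .+ b) => :D)

# Row-wise form: ByRow applies + to each pair of values in turn
df2 = select(df, :, [:A, :B] => ByRow(+) => :D)

# transform(df, ...) is shorthand for select(df, :, ...)
df3 = transform(df, [:A, :B] => ByRow(+) => :D)

println(names(df1))              # ["A", "B", "C", "D"]
println(df1.D[1:3])              # [2, 4, 6]
println(df1 == df2 == df3)       # true
```

ByRow avoids having to remember the dot in .+. The whole-column form is handy when the function genuinely needs the full vectors, for example to compute a column mean.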
The |> operator is part of Julia's Base module. I was pleasantly surprised that Julia has an operator similar to R's pipe operator, %>%.
The pipe operator is a helpful tool for chaining several function calls in a concise, readable way instead of nesting them inside one another: x |> f |> g is equivalent to g(f(x)).
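Here is a minimal check of that equivalence in plain Julia. The functions square and double are hypothetical helpers defined only for this illustration.

```julia
# x |> f is just f(x), so a chain of pipes reads left to right
square(x) = x^2
double(x) = 2x

nested = double(square(3))       # inner call runs first: square, then double
piped  = 3 |> square |> double   # same order, written left to right

println(nested)    # 18
println(piped)     # 18
```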
For example, suppose we want to raise each element of a vector vec to the power of 3 and then sum the results. There are several ways to do this, and we can conveniently use the pipe operator in whichever way we choose:
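vec = [1, 2, 3, 4, 5]
vec .^ 3 |> sum                      # broadcast the power, then pipe into sum; 225
[vec[i]^3 for i in 1:5] |> sum       # comprehension, then pipe into sum; 225
vec |> (x -> x .^ 3) |> (x -> sum(x))  # pipe through two anonymous functions; 225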
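Like most Julia operators, |> also has a dotted form, .|>, which applies the function to each element. The sketch below uses it as one more way to write the same computation. Since .|> and |> share the same precedence and group left to right, the sum is applied to the vector of cubes.

```julia
vec = [1, 2, 3, 4, 5]

# .|> applies the function to each element, like vec .^ 3
cubes = vec .|> (x -> x^3)
total = vec .|> (x -> x^3) |> sum

println(cubes)   # [1, 8, 27, 64, 125]
println(total)   # 225
```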