Selecting and transforming columns

When selecting columns of a data frame, we refer to each column by a Symbol, which is written with a leading colon. For example, to select the column named x, we pass :x to select().

First, let’s construct a 500×3 data frame called df with three columns named A, B, and C.

using DataFrames
df = DataFrame(
    A = 1:2:1000,
    B = repeat(1:10, inner = 50),
    C = 1:500)
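As a quick sanity check on the construction above, the following sketch confirms the shape and column names of df. It only uses size, names, and indexing into a column.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

size(df)     # (500, 3)
names(df)    # ["A", "B", "C"]

# A holds the odd numbers 1, 3, ..., 999; B repeats each of 1..10 fifty times.
df.A[end]    # 999
df.B[51]     # 2
```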

Each of the following lines produces a 500×1 data frame that contains only column A.

select(df, :A)
df[:, [:A]]

For example, see the first 10 rows of select(df, :A).

select(df, :A) |> (x->first(x, 10))

Output:

10×1 DataFrame
 Row │ A     
     │ Int64 
─────┼───────
   1 │     1
   2 │     3
   3 │     5
   4 │     7
   5 │     9
   6 │    11
   7 │    13
   8 │    15
   9 │    17
  10 │    19
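To make the difference between these selection forms concrete, here is a minimal sketch on the same df. The names col_vec, col_df, and col_sel are introduced only for illustration.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

col_vec = df[:, :A]        # a single Symbol returns a copy of the column as a Vector
col_df  = df[:, [:A]]      # a vector of Symbols returns a DataFrame
col_sel = select(df, :A)   # select returns a DataFrame as well

size(col_vec)              # (500,)
size(col_df)               # (500, 1)
col_df == col_sel          # true

# Column names can also be passed as strings.
select(df, "A") == col_sel # true
```

Note that indexing with a bare Symbol, as in df[:, :A], gives a vector rather than a one-column data frame, which is why the example in the text wraps the name in square brackets.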

Rename

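The code below renames the existing columns A and B to a and b. Note that we use the broadcast form .=> so that each old name is paired with its new name. Because select() returns only the columns it is given, the result contains just a and b; to rename while keeping every column, use rename(df, :A => :a, :B => :b).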

select(df, [:A, :B].=>[:a, :b])
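The sketch below shows what the broadcast .=> expands to and compares renaming through select with the rename function. The names pairs, renamed_subset, and renamed_all are illustrative only.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# Broadcasting => pairs the names element by element.
pairs = [:A, :B] .=> [:a, :b]
pairs == [:A => :a, :B => :b]              # true

renamed_subset = select(df, pairs)          # keeps only the renamed columns
renamed_all = rename(df, :A => :a, :B => :b) # keeps every column

names(renamed_subset)                       # ["a", "b"]
names(renamed_all)                          # ["a", "b", "C"]
```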

Create a new column

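Below we compute a column called C as the element-wise sum of columns A and B. The leading : keeps all existing columns. Note that df already has a column named C, so here the result overwrites it; pass a different name, such as :D, to add a new column instead.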

select(df, :, [:A, :B]=>((a, b)->a.+b)=>:C)
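As a variation on the code above, this sketch writes the sum to a column with the hypothetical name :D, so that the existing column C is left untouched. It also shows an equivalent form using transform with ByRow, which applies the function to one row at a time instead of to whole columns.

```julia
using DataFrames

df = DataFrame(A = 1:2:1000, B = repeat(1:10, inner = 50), C = 1:500)

# Whole-column function: a and b are the full column vectors.
with_sum = select(df, :, [:A, :B] => ((a, b) -> a .+ b) => :D)

# Row-wise function: ByRow(+) adds one pair of elements at a time.
# transform keeps all existing columns, like select(df, :, ...).
with_sum_byrow = transform(df, [:A, :B] => ByRow(+) => :D)

with_sum == with_sum_byrow   # true
names(with_sum)              # ["A", "B", "C", "D"]
with_sum.D[1:3]              # [2, 4, 6]
```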

Pipe operator

The pipe operator |> is defined in Julia’s Base module, so it is available without loading any package. I was pleasantly surprised that Julia has an operator similar to R’s pipe operator, %>%.

The pipe operator is a helpful tool for chaining several function calls in a concise and legible way, instead of nesting them inside one another.

For example, suppose we want to raise each element of a vector vec to the power of 3 and then sum the results. There are several ways to compute the cubes, and whichever we choose, we can pass the result to sum with the pipe operator.

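vec = [1, 2, 3, 4, 5]
vec .^ 3 |> sum                 # broadcast the power, then sum: 225
[x^3 for x in vec] |> sum       # comprehension, then sum: 225
vec |> (x -> x .^ 3) |> sum     # parenthesized anonymous function, then sum: 225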
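Two related tools from Base are worth a brief sketch alongside the pipe: function composition with ∘ and the broadcast pipe .|>. The names v, total, sum_of_cubes, and cubes are illustrative; v is used instead of vec to avoid shadowing the Base function vec.

```julia
v = [1, 2, 3, 4, 5]

# Parenthesizing the anonymous function makes the chain read left to right.
total = v |> (x -> x .^ 3) |> sum

# Function composition builds the same pipeline as a reusable function.
sum_of_cubes = sum ∘ (x -> x .^ 3)

# The broadcast pipe .|> applies a function to each element separately.
cubes = v .|> (x -> x^3)

total             # 225
sum_of_cubes(v)   # 225
cubes             # [1, 8, 27, 64, 125]
```

Without the parentheses, an anonymous function body extends as far to the right as possible, so a chain such as v |> x -> x .^ 3 |> sum still evaluates to 225 but nests the second pipe inside the first function.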