Taking a Subset

We can subset a dataframe using index. For example we can subset specific columns and rows by following codes.

# column subset (subset df to only include column A and C)
df[:, [:A, :C]]

# row subset (subset df to only include 1st, 3rd, and 5th rows)
df[[1,3,5], :]

We could also use some selectors such as Not, Between, Colsand All. Following code removes columns whose names match r"x".

# add two columns x1 and x2 to df 
df[:, :x1].=0
df[:, :x2].=1

# subset columns whose names match "x"
df[:, r"x"]

# subset columns whose names do not match "x"
df[:, Not(r"x")]

# subset columns that are not A
df[:, Not(:A)][1:10, :]

Subsetting functions

We could use subset function to subset a dataframe.

# subset df where it looks at column A 
# to screen through observations whose values are less than 10
subset(df, :A=>a-> a .<10)

Reference

https://dataframes.juliadata.org/stable/man/working_with_dataframes/