Taking a Subset
We can subset a data frame by indexing it with row and column selectors. For example, the following code selects specific columns and specific rows.
# column subset (keep only columns A and C)
df[:, [:A, :C]]
# row subset (subset df to only include 1st, 3rd, and 5th rows)
df[[1,3,5], :]
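As a minimal, self-contained sketch of the indexing above, the example below builds a small data frame and applies both subsets. The sample values are made up for illustration; only the column names A and C come from the text.

```julia
using DataFrames

# Hypothetical sample data (values are assumptions for illustration)
df = DataFrame(A = [1, 2, 3, 4, 5],
               B = ["a", "b", "c", "d", "e"],
               C = [10, 20, 30, 40, 50])

# keep only columns A and C (all rows)
cols = df[:, [:A, :C]]

# keep only the 1st, 3rd, and 5th rows (all columns)
rows = df[[1, 3, 5], :]

# row and column selectors can be combined in one indexing call
both = df[[1, 3, 5], [:A, :C]]
```

Each of these indexing calls returns a new data frame, so df itself is left unchanged.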
We can also use column selectors such as Not, Between, Cols, and All. The following code first selects the columns whose names match the regular expression r"x", and then uses Not to drop them.
# add two columns x1 and x2 to df
df[:, :x1] .= 0
df[:, :x2] .= 1
# subset columns whose names match "x"
df[:, r"x"]
# subset columns whose names do not match "x"
df[:, Not(r"x")]
# subset all columns except A, keeping only the first 10 rows
df[1:10, Not(:A)]
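The Between, Cols, and All selectors mentioned above can be sketched in the same way. The data frame below is a hypothetical example whose column names mirror the ones used in this section.

```julia
using DataFrames

# Hypothetical sample data (values are assumptions for illustration)
df = DataFrame(A = [1, 2, 3], B = [4, 5, 6], C = [7, 8, 9],
               x1 = [0, 0, 0], x2 = [1, 1, 1])

# Between: every column from A through C, by position in the data frame
a_to_c = df[:, Between(:A, :C)]

# Cols: union of several selectors, here column A plus all columns matching r"x"
a_and_x = df[:, Cols(:A, r"x")]

# All: every column
everything = df[:, All()]

# Not also accepts a vector of column names
no_bc = df[:, Not([:B, :C])]
```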
Subsetting functions
We can also use the subset function to keep only the rows of a data frame that satisfy a condition.
# keep only the rows whose value in column A is less than 10;
# the function receives the whole column, so the comparison is broadcast with .<
subset(df, :A => a -> a .< 10)
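The sketch below runs the same kind of filter on a small, made-up data frame and shows two variations that may be useful: wrapping a scalar predicate in ByRow instead of broadcasting by hand, and passing several conditions, which subset combines with a logical AND. Column B and all values are assumptions for illustration.

```julia
using DataFrames

# Hypothetical sample data (values are assumptions for illustration)
df = DataFrame(A = [5, 12, 8, 20], B = ["u", "v", "w", "z"])

# vectorized predicate: the function receives the whole column A
small = subset(df, :A => a -> a .< 10)

# ByRow wraps a scalar predicate so that it is applied to each value in turn
small_byrow = subset(df, :A => ByRow(<(10)))

# several conditions: a row is kept only if all of them hold
both = subset(df, :A => ByRow(<(10)), :B => ByRow(!=("u")))
```

subset returns a new data frame; the in-place variant subset! modifies df directly.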
Reference
https://dataframes.juliadata.org/stable/man/working_with_dataframes/