问题

给定两个数据帧:

df1 = data.frame(CustomerId = c(1:6), Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 = data.frame(CustomerId = c(2, 4, 6), State = c(rep("Alabama", 2), rep("Ohio", 1)))

df1
#  CustomerId Product
#           1 Toaster
#           2 Toaster
#           3 Toaster
#           4   Radio
#           5   Radio
#           6   Radio

df2
#  CustomerId   State
#           2 Alabama
#           4 Alabama
#           6    Ohio

如何处理数据库样式,即 sql风格,加入?也就是说,我如何获得:

  • An inner join of df1 and df2:
    Return only the rows in which the left table have matching keys in the right table.
  • An outer join of df1 and df2:
    Returns all rows from both tables, join records from the left which have matching keys in the right table.
  • A left outer join (or simply left join) of df1 and df2
    Return all rows from the left table, and any rows with matching keys from the right table.
  • A right outer join of df1 and df2
    Return all rows from the right table, and any rows with matching keys from the left table.

额外功劳:

如何执行SQL风格的select语句?



解决方法

使用 merge 函数及其可选参数:

内连接: merge(df1,df2)适用于这些示例,因为R会通过常用变量名称自动连接框架,您最有可能想要指定 merge(df1,df2,by ="CustomerId"),以确保您只匹配所需的字段.如果匹配的变量在不同的数据帧中具有不同的名称,您还可以使用 by.x by.y 参数.

外部加入 merge(x = df1,y = df2,by ="CustomerId",all = TRUE) >

左外部: merge(x = df1,y = df2,by ="CustomerId",all.x = TRUE) / p>

右外部 merge(x = df1,y = df2,by ="CustomerId",all.y = TRUE) / p>

交叉连接 merge(x = df1,y = df2,by = NULL) 与内部连接一样,你可能想要将"CustomerId"显式地传递给R作为匹配变量.我认为最好明确地陈述你想要的标识符合并;如果输入数据框架意外更改,稍后阅读更方便,则更安全.




相关问题推荐