Comparing 2 lists in R

Just a simple R script that reads a CSV file made up of 2 columns and outputs which elements are only in the first column, which are only in the second column, and which are in both. Note that it doesn’t count occurrences: duplicates are removed first. It’s pretty simple actually (although I tried to vectorize the matching step a bit, so it should be reasonably fast); I’m only posting it as a backup.

Here’s the script:

myData=read.csv("MyListComparisonData.csv");
g1=as.character(unique(myData[,1])); # we'll name the first list (first column) g1. read.csv can return the column as a factor (the default before R 4.0.0), so we convert it with "as.character"; "unique" drops the repetitions, which also speeds up the loop below
g2=as.character(unique(myData[,2])); # second list: g2
gcommon=c(); # initialize vector for storing common elements
gonlyG1=c(); # initialize vector for storing elements specific to g1
gonlyG2=c(); # initialize vector for storing elements specific to g2

# we'll loop over g1 and see if the current element can be found in g2
while(length(g1)>0){ # a while loop (instead of repeat) also copes with an empty first column
  matches=which(g2==g1[1]); # vectorized way to find the elements of g2 that match the current one
  if(length(matches)>0){
    gcommon[length(gcommon)+1]=g1[1]; # if there is a match, we add it to the "common" list
    g2=g2[-matches]; # then we remove it from g2
  }
  else{gonlyG1[length(gonlyG1)+1]=g1[1];} # otherwise, we add it to the "g1 only" list
  g1=g1[-1]; # then in any case we remove it from g1
}
gonlyG2=g2; # because we removed from g2 all elements common with g1, anything left is specific to g2

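# pad the columns which are too short (the +1 adds one empty last row, so no real element gets overwritten)
nRows=max(length(gcommon),length(gonlyG1),length(gonlyG2))+1;
gcommon[nRows]="";gonlyG1[nRows]="";gonlyG2[nRows]=""; # any gap before position nRows is filled with NA
# now that all vectors have same length, we can put them into a dataframe
output=data.frame(gcommon,gonlyG1,gonlyG2);
# and write to disk; na="" writes the NA padding as empty cells
# (sep and col.names are dropped: write.csv ignores them with a warning)
write.csv(output,"comparaison_result.csv",quote=TRUE,na="",row.names=FALSE);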
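
For comparison, base R's set functions can produce the same three groups without an explicit loop. This is only a sketch of an alternative, not the script above: the helper name compareLists is made up for the illustration, and it assumes both inputs can be treated as plain character vectors.

```r
# hypothetical helper (not part of the script above): split two lists into
# common / only-in-first / only-in-second using base R set functions
compareLists=function(g1,g2){
  g1=unique(as.character(g1)); # as in the script, duplicates are dropped
  g2=unique(as.character(g2));
  list(
    common=intersect(g1,g2), # elements present in both lists
    onlyG1=setdiff(g1,g2),   # elements present only in the first list
    onlyG2=setdiff(g2,g1)    # elements present only in the second list
  );
}

res=compareLists(c("elementA","elementB"),c("elementB","elementC"));
print(res);
```

The three vectors in the returned list can then be padded and written to disk the same way as gcommon, gonlyG1 and gonlyG2.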

And here’s a sample CSV input file:

"elementA","elementB"
"elementA","elementB"
"elementB","elementC"

The result for this input should be: elementA is only in the first column, elementC is only in the second column, and elementB is in both columns.
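
A quick way to check this without creating a file is to pass the same three lines to read.csv through its text argument. This is a small sketch, assuming the first line is meant as the header row (read.csv's default); it uses %in% rather than the loop from the script, and the names csvText and sampleData are made up for the example.

```r
# the sample input, inlined as a string instead of a file on disk
csvText='"elementA","elementB"
"elementA","elementB"
"elementB","elementC"';
sampleData=read.csv(text=csvText,stringsAsFactors=FALSE); # first line becomes the header
g1=unique(sampleData[,1]);
g2=unique(sampleData[,2]);
onlyFirst=g1[!(g1 %in% g2)];  # only in the first column
onlySecond=g2[!(g2 %in% g1)]; # only in the second column
inBoth=g1[g1 %in% g2];        # in both columns
print(onlyFirst);
print(onlySecond);
print(inBoth);
```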

Posted in R (R-project).

By patheticcockroach – 2012-01-10