|
I've been asking around but I haven't got any answers. I know this community is very technically proficient and intelligent, so I come to you. I understand that this is random but please help!
I have been given the task to clean up a contact list of a survey of over 6,000 people! Some of them are duplicate names or emails so they must be removed. HOWEVER, the problem is that I cannot simply delete a duplicate name or email because the email or name might be a duplicate but the cell phone numbers are different! Thus, I am required to combine both cell phone numbers under the cell phone column FIRST before I delete the duplicate.
Here is an example to clearly convey the issue:
In this example, we have two duplicate names (yes, people can have the same names but in this case imagine that they are the same person listed twice), Mark and DJ. I need to delete one, but before I do I need to make sure I am not deleting vital information in the process. In this case they have two different cell phone numbers but they are not collected under one name. I need to keep the two different cell phone numbers and then delete the duplicate names.
The thing is I can do it manually but it would take me forever to go over the hundreds of duplicates in the Excel. I am looking for the most efficient way to tackle this problem. I hope it is okay that I ask this in my blog. I know we have a General Tech but I am pretty sure I can't ask Excel questions as that would be blasphemous in a gaming forum.
Thanks for listening!
|
|
|
I think you missed the 6000 part.
OP, I don't suppose you know any VBA?
|
Calgary25955 Posts
I use Excel a llllllllllloooooooooooooooottttttttttttttt and you should just sort and do it manually. No other method is going to be 100% accurate if you can't be sure of how your data is formatted.
|
I tried this but I was not getting anywhere because there were so many different categories to change and combine (the cell phone was just simple example) and there are over 700 duplicates.
|
Calgary25955 Posts
On December 16 2011 04:10 Glacierz wrote:I think you missed the 6000 part. OP, I don't suppose you know any VBA? I know VBA. What do you need help with specifically?
|
On December 16 2011 04:11 Chill wrote: I use Excel a llllllllllloooooooooooooooottttttttttttttt and you should just sort and do it manually. No other method is going to be 100% accurate if you can't be sure of how your data is formatted.
Thanks Chill, I guess I just better get started and chip away.
Yeah I know how to remove duplicates, that is the easy part
|
When you say duplicate, are the content in both cells exactly the same? as in if I did something like "=IF(A1=A2, 1, 0) it would return 1?
|
Calgary25955 Posts
Oh okay I just saw what you're looking for (imgur is blocked at work, saw on my phone).
Basically you want to have a unique list of names and then add the other fields together? PM me the file I'll write you a script.
The problem you're going to have is formatting consistency. If someone has John Smith and another one is john smith and lastly smith, john they're all going to be different.
If your data is formatted uniformly then I can do it. Otherwise, manually is the best.
|
United States1967 Posts
I can't see the image, since imgur is blocked at work, but I'm sure there's a way to automate it. I'll take a look when I get home.
|
On December 16 2011 04:15 Glacierz wrote: When you say duplicate, are the content in both cells exactly the same? as in if I did something like "=IF(A1=A2, 1, 0) it would return 1?
Yes, some of the content in two different cells might be exactly-exactly the same but it does not mean that the other information in corresponding row are the same as well. They can be different and so I must combine those differences into one row and match it with just one name.
|
Isn't there a way to use the A-Z row sort function such that the columns stay locked? (or vice versa) If you did that, then you could just merge all the duplicate names.
That's all that's needed, and with all the crazy functions Excel has I would be shocked if that wasn't possible.
|
With VBA, you can write a nested loop after the names are sorted in alphabetical order. You keep an expanding collection of phone numbers that are unique and check with the list with every iterative step.
If the data set is well behaved, you can even do this in MATLAB, which contains the function union(), that accomplishes what you are trying to do in 1 line of code.
|
Uh, a quick approach off the top of my head:
1) sort A->Z based on names (keep # column in the same selection so it's sorted along... you know this part right) 2) run a simple macro to check name cells and remove if same as previous (gets rid of duplicate names) 3) run a macro to check number cells and if the pair name cell is empty (duplicate), then append current cell with the previous cell. you may need to do some extra work to make sure excel treats the number cell as "general" or "text" not "number", or you may run into errors. 4) run a macro to remove rows if name cell is empty -> sorted list with no duplicate names and no loss of numbers
|
On December 16 2011 04:16 Chill wrote: Oh okay I just saw what you're looking for (imgur is blocked at work, saw on my phone).
Basically you want to have a unique list of names and then add the other fields together? PM me the file I'll write you a script.
The problem you're going to have is formatting consistency. If someone has John Smith and another one is john smith and lastly smith, john they're all going to be different.
If your data is formatted uniformly then I can do it. Otherwise, manually is the best.
If it is just capitalization that can be taken care of easily. There's a VBA function that converts all char to lower or upper case.
|
On December 16 2011 04:16 Chill wrote:
Basically you want to have a unique list of names and then add the other fields together? PM me the file I'll write you a script. .
Yes, exactly what I was looking for!!! But I have no idea how to do it.
Wow, that is very kind of you, but I doubt my prof would let me since this a list of confidential contact info! Hehe people would be pissed at me. Is there a resource or tutorial online that I could study to tackle this problem? Thanks again.
|
Calgary25955 Posts
On December 16 2011 04:20 OpticalShot wrote: Uh, a quick approach off the top of my head:
1) sort A->Z based on names (keep # column in the same selection so it's sorted along... you know this part right) 2) run a simple macro to check name cells and remove if same as previous (gets rid of duplicate names) 3) run a macro to check number cells and if the pair name cell is empty (duplicate), then append current cell with the previous cell. you may need to do some extra work to make sure excel treats the number cell as "general" or "text" not "number", or you may run into errors. 4) run a macro to remove rows if name cell is empty -> sorted list with no duplicate names and no loss of numbers Yea I have a similar solution.
Run macro that: 1) Sorts A->Z based on names 2) Loops through and checks if n name is the same as n+1 name. 2a) If it is, checks if all n+1 data is unique. If it is, appends it to the n data. If it isn't, skip. 2b) Deletes row n+1 2c) Rechecks row n vs new row n+1 2d) n++ until blank
|
On December 16 2011 04:20 OpticalShot wrote: Uh, a quick approach off the top of my head:
1) sort A->Z based on names (keep # column in the same selection so it's sorted along... you know this part right) 2) run a simple macro to check name cells and remove if same as previous (gets rid of duplicate names) 3) run a macro to check number cells and if the pair name cell is empty (duplicate), then append current cell with the previous cell. you may need to do some extra work to make sure excel treats the number cell as "general" or "text" not "number", or you may run into errors. 4) run a macro to remove rows if name cell is empty -> sorted list with no duplicate names and no loss of numbers
Yes, I know how to sort but I am lost when it comes to running macros. Are macros similar to the functions within excel?
edit: Learning about how to make a macro right now. I'll do my best.
|
for(int i = 0;i<cells.length();i++){ for(int j=1 + i;j<cells.length();j++){ if(name[i].trim() == name[j].trim() && phone[i].trim() == phone[j].trim()){ //delete the row corresponding to name[j] } } }
I don't know how to make macros in excel but I hope that helps
EDIT:Thanks Glacerz <3
|
|
|
|