9/19/2023

Pyspark uuid generator

I'm trying to find a thing in Python (or PySpark, because I may be using this on millions of rows) that can help me do the clustering of this data (or however this process is called in data science). The idea is to create a map of universal_id:unique_id. I have Snowflake and Databricks at my disposal. If you know how this is done, please help, or at least point me to a subject that I should research to be able to do this.

Here's my test dataset:

import pandas as pd
df = pd.DataFrame(data, columns=...)

Based on what you described, we can formulate an algorithm as follows, referring to the new ID as global_id:

1. Create a new global_id for every user_id that has a single occurrence (n=1).
2. Propagate values for global_id to all rows with matching universal_id, i.e. rows which don't match on user_id but match on universal_id. There is an arbitrary tie break if multiple universal_ids match on one or more user_ids, where all matching universal_ids are assigned to the same user_id.
3. Create a new global_id for every universal_id which cannot be linked to a user_id but has multiple occurrences.
4. Propagate existing values for global_id to all rows with matching session_id, i.e. rows that match on neither user_id nor universal_id but match on session_id.
5. Create a new global_id for every session_id which can be linked to neither a user_id nor a universal_id.

Update: The algorithm now features the arbitrary tie-break when multiple user_ids match multiple universal_ids.

Update: Since you were concerned about the risk of duplicates using fully randomly generated UUID4s, I coded you a little function which allows you to generate a UUID leveraging both UUID1 and/or UUID4. I personally would not be worried about clashes of UUID4 values whatsoever, but it's up to you.
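The propagation steps above can be sketched in plain pandas. This is a sketch under assumptions, not the answerer's actual code: the column names user_id, universal_id, session_id, the helper name assign_global_ids, and the simplification that every still-unassigned row simply gets a fresh UUID4 (rather than distinguishing single vs. multiple occurrences) are all mine.

```python
import uuid
import pandas as pd

def assign_global_ids(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the global_id propagation described above (assumed column names)."""
    df = df.copy()
    df["global_id"] = None

    # Step 1 (simplified): mint a new global_id for every distinct user_id.
    user_map = {u: str(uuid.uuid4()) for u in df["user_id"].dropna().unique()}
    has_user = df["user_id"].notna()
    df.loc[has_user, "global_id"] = df.loc[has_user, "user_id"].map(user_map)

    # Step 2: rows without a user_id inherit the global_id of any assigned row
    # sharing their universal_id; groupby(...).first() is the arbitrary tie break.
    uni_map = (df.dropna(subset=["universal_id", "global_id"])
                 .groupby("universal_id")["global_id"].first())
    mask = df["global_id"].isna() & df["universal_id"].notna()
    df.loc[mask, "global_id"] = df.loc[mask, "universal_id"].map(uni_map)

    # Step 3 (simplified): new global_id for universal_ids still unassigned.
    mask = df["global_id"].isna() & df["universal_id"].notna()
    new_uni = {u: str(uuid.uuid4()) for u in df.loc[mask, "universal_id"].unique()}
    df.loc[mask, "global_id"] = df.loc[mask, "universal_id"].map(new_uni)

    # Step 4: propagate via session_id for rows matching on neither ID.
    ses_map = (df.dropna(subset=["session_id", "global_id"])
                 .groupby("session_id")["global_id"].first())
    mask = df["global_id"].isna() & df["session_id"].notna()
    df.loc[mask, "global_id"] = df.loc[mask, "session_id"].map(ses_map)

    # Step 5 (simplified): anything still unassigned gets its own global_id.
    mask = df["global_id"].isna()
    df.loc[mask, "global_id"] = [str(uuid.uuid4()) for _ in range(mask.sum())]
    return df
```

For millions of rows the same logic translates to joins and window functions in Spark or Snowflake rather than dict lookups, but the propagation order stays the same.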
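The UUID function from the second update was not preserved in this page. Here is a minimal sketch of the idea it describes (the name make_uuid and its mode argument are my assumptions): splice the time-and-node half of a UUID1 with the random half of a UUID4, so identical random halves still cannot collide across different timestamps.

```python
import uuid

def make_uuid(mode: str = "uuid4") -> str:
    """Hypothetical UUID generator leveraging UUID1 and/or UUID4.

    uuid1 embeds a timestamp plus the host's node ID, so it is unique over
    time on one machine; uuid4 is purely random. "hybrid" combines the
    first 16 hex digits of a uuid1 (time fields) with the last 16 hex
    digits of a uuid4 (random fields).
    """
    if mode == "uuid1":
        return str(uuid.uuid1())
    if mode == "uuid4":
        return str(uuid.uuid4())
    if mode == "hybrid":
        t = uuid.uuid1().hex  # time- and node-based half
        r = uuid.uuid4().hex  # random half
        return str(uuid.UUID(t[:16] + r[16:]))
    raise ValueError(f"unknown mode: {mode}")
```

On Databricks you could wrap this as a Spark UDF with pyspark.sql.functions.udf(make_uuid); since the result is non-deterministic, marking it with .asNondeterministic() prevents Spark from re-evaluating it with different results on retries.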