An interesting complex programming problem

Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-05-21 15:55:19
May 21 2011 15:44 GMT
#1
Hi programmers/math people.

Okay, here's the problem.
I have a hashmap with Strings as keys and values pointing to objects (as seen below in java)

HashMap<String, SomeObject>

Each char in a string is an element of the set {0, 1, #}. # is a wildcard which can represent either a 0 or a 1.

When presented with a message, e.g. 011010111 (each char of a message is an element of the set {0, 1}), the following strings are satisfied:
01#01#111
#1101011#
011010111
#########
etc., due to their wildcards.

Which lookup/sorting method would you use to get the fastest algorithm for storing the strings and for finding the strings which are satisfied?

Bruteforce
Complexity: Finding all satisfied strings: O(n*p) with n = population of strings, p = size of string.

Bruteforce works of course:
for(all strings in hashmap)
is string satisfied? Save it
next string
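
The brute-force loop above could be sketched in Java roughly like this. It is only an illustration: the class name BruteForceMatcher and the helper names matches and findSatisfied are made up here, and the map values stand in for SomeObject.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BruteForceMatcher {
    // True if the message (only '0'/'1') satisfies the pattern ('0', '1' or '#').
    static boolean matches(String pattern, String message) {
        if (pattern.length() != message.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c != '#' && c != message.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Scans every key once: O(n * p) for n patterns of length p.
    static <V> List<V> findSatisfied(Map<String, V> patterns, String message) {
        List<V> hits = new ArrayList<>();
        for (Map.Entry<String, V> e : patterns.entrySet()) {
            if (matches(e.getKey(), message)) {
                hits.add(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> patterns = new LinkedHashMap<>();
        patterns.put("01#01#111", "a");
        patterns.put("#1101011#", "b");
        patterns.put("111111111", "c");
        System.out.println(findSatisfied(patterns, "011010111"));
    }
}
```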

Tree
Complexity: O(p*n), but very unlikely that all strings are found in ONE leaf. Constructing the tree O(2^p) (!HOLY FUCK!)

Keep a tree which branches every time a wildcard appears in a string. Each leaf in the tree has a hashset like the one above. The SomeObject of a string such as 01#01#111 could then be found in 4 leaves of the tree:
010010111
010011111
011011111
011010111

The problem is constructing the tree... if the string is long, like 20-30 chars, the tree is simply too big to construct.

How would you do it?

*****
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Cube
Profile Blog Joined February 2008
Canada777 Posts
Last Edited: 2011-05-21 16:21:33
May 21 2011 16:03 GMT
#2
what I want to do is solve the problem by "folding" the strings into unique integers somehow, but i'm not sure it can be done.

edit: I really don't think I can help you, sorry.

edit2: what about making a new hashmap with no wildcards by replicating each string/object pairing 2^(num #s) times, then sorting the strings as integers. (big setup time, subsequent searches are O(lgn)).
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-05-21 16:29:24
May 21 2011 16:23 GMT
#3
On May 22 2011 01:03 Cube wrote:
what I want to do is solve the problem by "folding" the strings into unique integers somehow, but i'm not sure it can be done.

edit: I really don't think I can help you, sorry.


Might actually be a good idea.

Then it's an experiment in how much folding is required to sort it into several small hashmaps.
Ignore what i wrote, I gotta think more about it.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Famulus
Profile Joined April 2011
United States8 Posts
May 21 2011 16:33 GMT
#4
What about bruteforcing the other way. Assuming you only care about getting the correct object and not the actual string, make an entry in the hash table for every possible message for each string with a wildcard.
pullarius1
Profile Blog Joined May 2010
United States523 Posts
Last Edited: 2011-05-21 16:35:48
May 21 2011 16:34 GMT
#5
Just to clarify, there are no limits on the sizes or types of data, eg a string could be a million characters long and the population of acceptable strings could be arbitrarily large? Also, are we assuming that all strings we're working with are of the same length?

I guess what I'm really asking is whether the problem is for an actual project in real life or just a theoretical puzzle?
@pullarius1
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 16:38 GMT
#6
On May 22 2011 01:33 Famulus wrote:
What about bruteforcing the other way. Assuming you only care about getting the correct object and not the actual string, make an entry in the hash table for every possible message for each string with a wildcard.


It would be a good idea, but every possible message is 2^(length of string), that's

length -> combinations
20 -> 1,048,576
40 -> 1,099,511,627,776 (in my case)

Not scalable :/.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 16:40 GMT
#7
On May 22 2011 01:34 pullarius1 wrote:
Just to clarify, there are no limits on the sizes or types of data, eg a string could be a million characters long and the population of acceptable strings could be arbitrarily large? Also, are we assuming that all strings we're working with are of the same length?

I guess what I'm really asking is whether the problem is for an actual project in real life or just a theoretical puzzle?


Perfectly good questions. All strings are the same length, and so is the message that needs to be satisfied. It's for an XCS engine - I'll provide a paper in a sec.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-05-21 16:49:05
May 21 2011 16:43 GMT
#8
algorithmic description of XCS

It's an AI learning technique, based on "Learning classifier systems". You don't really need to read it to understand the problem though.

A bunch of strings made of #, 1 and 0, and a message made of 1 and 0, for which you need to find the strings that are satisfied.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
arioch
Profile Joined May 2010
England403 Posts
May 21 2011 17:15 GMT
#9
I am interested to see if someone comes up with an alternative to iteration for this as I parse huge data files on a daily basis for work.

I often find myself setting up foreach loops with regular expressions to loop through hashtables in perl, and always wondered if there was a more efficient way of doing it.
Mx.DeeP
Profile Joined February 2008
China25 Posts
May 21 2011 17:18 GMT
#10
If you're not worried about memory, you can just take the initial HashMap and convert it into a new HashMap<String, ArrayList<SomeObject>> where the key is only {0,1}. You just iterate through the original HashMap and convert all '#' into '0' and '1'. This is worst case O(2^p) for a String of all '#' for storing, but gives you O(1) look-up time.
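
A rough sketch of that conversion, assuming the values are just collected into lists per concrete key. The names WildcardExpansion, expand and buildIndex are invented for the example. Each pattern with k wildcards produces 2^k keys, which is exactly where the memory blow-up comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WildcardExpansion {
    // Collects every concrete 0/1 string covered by the pattern: 2^k results for k wildcards.
    static List<String> expand(String pattern) {
        List<String> out = new ArrayList<>();
        expandInto(pattern.toCharArray(), 0, out);
        return out;
    }

    private static void expandInto(char[] chars, int pos, List<String> out) {
        if (pos == chars.length) {
            out.add(new String(chars));
            return;
        }
        if (chars[pos] == '#') {
            chars[pos] = '0';
            expandInto(chars, pos + 1, out);
            chars[pos] = '1';
            expandInto(chars, pos + 1, out);
            chars[pos] = '#'; // restore the wildcard for the caller
        } else {
            expandInto(chars, pos + 1, out);
        }
    }

    // Builds the wildcard-free index: concrete message -> values of all patterns covering it.
    static <V> Map<String, List<V>> buildIndex(Map<String, V> patterns) {
        Map<String, List<V>> index = new HashMap<>();
        for (Map.Entry<String, V> e : patterns.entrySet()) {
            for (String concrete : expand(e.getKey())) {
                index.computeIfAbsent(concrete, k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return index;
    }
}
```

A lookup is then a single index.get(message), with null meaning no pattern is satisfied.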
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-05-21 17:33:34
May 21 2011 17:32 GMT
#11
On May 22 2011 02:18 Mx.DeeP wrote:
If you're not worried about memory, you can just take the initial HashMap and convert it into a new HashMap<String, ArrayList<SomeObject>> where the key is only {0,1}. You just iterate through the original HashMap and convert all '#' into '0' and '1'. This is worst case O(2^p) for a String of all '#' for storing, but gives you O(1) look-up time.


Exactly - that's the "tree" i talked about..

My message (in my problem) has 40 bits, that's 2^40 in construction of that tree... 1,099,511,627,776 nodes in it - would take too long :/.

I'm gonna go work out, and think about it.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
pullarius1
Profile Blog Joined May 2010
United States523 Posts
Last Edited: 2011-05-21 17:34:14
May 21 2011 17:32 GMT
#12
I'm not sure how much you get to work with the lists beforehand, but obviously if you could sort the list of wild strings beforehand it would help a lot. But it would be a waste of time if the number of messages you were checking for matches was very low. That is, if you have 100 wild (0/1/#) strings, but only needed to find the matches for a few 0/1 messages, sorting would probably hurt. But if you had 100 messages to match, it would probably be worth your while.

One thing I thought of that is probably not useful at all:
For each string, rehash it into integers in the following way-
For each place number i, assign the (2i-1)-th and 2i-th primes to it. If that place holds a 1, choose the odd-numbered one (the (2i-1)-th prime); for a 0, choose the even-numbered one (the 2i-th prime); for a #, choose neither. Multiply all the chosen primes together.

For instance 10110 would be (2 or 3) (5 or 7) (11 or 13) (17 or 19) (23 or 29): 2*7*11*17*29 = 75,922

While #01#0 would be (2 or 3) (5 or 7) (11 or 13) (17 or 19) (23 or 29): 7*11*29 = 2,233

The benefit of this system would be that a wild string's number would divide exactly the numbers of the messages that satisfy it. For whatever that's worth.
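
Here is a minimal Java sketch of the prime encoding described above. The names PrimeEncoding, firstPrimes, encode and satisfies are invented for illustration. BigInteger is used because the product can overflow a long for longer strings. Note that an all-wildcard string encodes to 1 (the empty product), which divides every message's number.

```java
import java.math.BigInteger;

class PrimeEncoding {
    // Returns the first `count` primes by trial division (fine for small counts).
    static int[] firstPrimes(int count) {
        int[] primes = new int[count];
        int found = 0;
        for (int candidate = 2; found < count; candidate++) {
            boolean isPrime = true;
            for (int i = 0; i < found && (long) primes[i] * primes[i] <= candidate; i++) {
                if (candidate % primes[i] == 0) {
                    isPrime = false;
                    break;
                }
            }
            if (isPrime) {
                primes[found++] = candidate;
            }
        }
        return primes;
    }

    // Place i (1-based) owns the (2i-1)-th and 2i-th primes.
    // '1' takes the first of the pair, '0' the second, '#' contributes nothing.
    static BigInteger encode(String s) {
        int[] primes = firstPrimes(2 * s.length());
        BigInteger product = BigInteger.ONE;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '1') {
                product = product.multiply(BigInteger.valueOf(primes[2 * i]));
            } else if (c == '0') {
                product = product.multiply(BigInteger.valueOf(primes[2 * i + 1]));
            }
        }
        return product;
    }

    // The message satisfies the pattern exactly when the pattern's number divides the message's.
    static boolean satisfies(String message, String pattern) {
        return encode(message).mod(encode(pattern)).signum() == 0;
    }
}
```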

...sometimes I wish I had taken some practical programming classes in school :-(.

@pullarius1
Cube
Profile Blog Joined February 2008
Canada777 Posts
May 21 2011 18:07 GMT
#13
On May 22 2011 02:32 pullarius1 wrote:
I'm not sure how much you get to work with the lists beforehand, but obviously if you could sort the list of wild strings before hand it would help a lot. But it would be a waste of time if the number of strings you were checking for matches for were very low. That is, if you have 100 01# strings, but only needed to find the matches for a few 10 strings, sorting would probably hurt. But if you had 100 strings to match, it would probably be worth your while.

One thing I thought of that is probably not useful at all:
For each string, rehash it into integers in the following way-
For each place number i, assign the (2i-1)-th and 2i-th primes to it. If that place holds a 1, choose the odd-numbered one (the (2i-1)-th prime); for a 0, choose the even-numbered one (the 2i-th prime); for a #, choose neither. Multiply all the chosen primes together.

For instance 10110 would be (2 or 3) (5 or 7) (11 or 13) (17 or 19) (23 or 29) 2*7*11*17*29 = 75,922

While #01#0 would be (2 or 3) (5 or 7) (11 or 13) (17 or 19) (23 or29) 7*11*29 = 2,233

The benefit of this system would be that wild strings would divide precisely the strings that satisfied them. For whatever that's worth.

...sometimes I wish I had taken some practical programming classes in school :-(.



this is basically what I had in mind but as the string size grows arbitrarily large this becomes impractical. :[
Oracle
Profile Blog Joined May 2007
Canada411 Posts
Last Edited: 2011-05-21 19:20:00
May 21 2011 18:14 GMT
#14
On May 22 2011 02:32 pullarius1 wrote:
I'm not sure how much you get to work with the lists beforehand, but obviously if you could sort the list of wild strings before hand it would help a lot. But it would be a waste of time if the number of strings you were checking for matches for were very low. That is, if you have 100 01# strings, but only needed to find the matches for a few 10 strings, sorting would probably hurt. But if you had 100 strings to match, it would probably be worth your while.

One thing I thought of that is probably not useful at all:
For each string, rehash it into integers in the following way-
For each place number i, assign the (2i-1)-th and 2i-th primes to it. If that place holds a 1, choose the odd-numbered one (the (2i-1)-th prime); for a 0, choose the even-numbered one (the 2i-th prime); for a #, choose neither. Multiply all the chosen primes together.

For instance 10110 would be (2 or 3) (5 or 7) (11 or 13) (17 or 19) (23 or 29) 2*7*11*17*29 = 75,922

While #01#0 would be (2 or 3) (5 or 7) (11 or 13) (17 or 19) (23 or29) 7*11*29 = 2,233

The benefit of this system would be that wild strings would divide precisely the strings that satisfied them. For whatever that's worth.

...sometimes I wish I had taken some practical programming classes in school :-(.



well thats actually a great solution, since if message modulo hashed-key = 0 then such an index satisfies the constraint.

So if you map every key by the hash function to this form, and store it in the next slot in the database, as well as with its object pointer, then simply do a linear search for message modulo hashed key = 0 on each element of the array

this circumvents directly hashing onto an array location since that hash function increases faster than n factorial
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-05-21 19:16:33
May 21 2011 19:02 GMT
#15
Okay, I gotta re-read it all, cos I'm a bit lost on this one...

Edit: okay, I read it! I need to write a sketch of it. It might actually work to take the message's number modulo the string's number.

Your only problem is if the String has 40 wildcards in it - then it takes a long time to write all the possible prime combinations (right?) ... or if you ignore wildcards, is 40x # = 0?

I'm gonna write an algorithm for this rly quick.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Oracle
Profile Blog Joined May 2007
Canada411 Posts
May 21 2011 19:19 GMT
#16
So when you store an object by its key, create an array entry with the hashed version of the key (H_i), O(p), and its object pointer, O(1). Then store both in the next available position in the dataset, O(1).

When you're searching for keys which are satisfied by a certain message:
First hash the message, O(p), giving H_k.
Then do a linear check over all database entries and return every entry for which H_k modulo H_i = 0. O(l)

p = length of string
l = length of dataset
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-05-21 19:34:53
May 21 2011 19:34 GMT
#17
The problem is then, what if the dataset is 5,000,000 strings? :/
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Oracle
Profile Blog Joined May 2007
Canada411 Posts
May 21 2011 19:40 GMT
#18
insertion is O(p)
extraction is O(p+l) in which l will probably dominate p so O(l)

which is still acceptable by any means (l = length of array, so linear time)
5,000,000 wouldn't take an enormous amount of time (in fact 5,000,000 is actually really fast to compute)
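
For what it's worth, a linear scan like this can be sketched without the prime products at all. The version below swaps in a different encoding (two bitmasks per string) and is not the prime method from this thread. It assumes every string has the same length, at most 64 chars, so each one fits in a pair of longs. The name BitmaskScan and its members are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class BitmaskScan {
    private final long[] careMasks; // 1-bits mark positions holding '0' or '1'
    private final long[] values;    // required bit values at those positions

    // Assumes every pattern has the same length, at most 64 characters.
    BitmaskScan(List<String> patterns) {
        careMasks = new long[patterns.size()];
        values = new long[patterns.size()];
        for (int i = 0; i < patterns.size(); i++) {
            String p = patterns.get(i);
            long mask = 0;
            long value = 0;
            for (int j = 0; j < p.length(); j++) {
                char c = p.charAt(j);
                mask = (mask << 1) | (c == '#' ? 0L : 1L);
                value = (value << 1) | (c == '1' ? 1L : 0L);
            }
            careMasks[i] = mask;
            values[i] = value;
        }
    }

    // Returns the indices of all satisfied patterns: one AND and one compare per pattern.
    List<Integer> findSatisfied(String message) {
        long msg = Long.parseUnsignedLong(message, 2);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < careMasks.length; i++) {
            if ((msg & careMasks[i]) == values[i]) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

The scan is still O(l) in the number of stored strings, but each check is two machine instructions instead of a big-number modulo.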
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 19:42 GMT
#19
I'm thinking it might be possible to speed up lookup, perhaps with tree search or other sorting methods. Of course this would kill the insertion time.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
haxorz
Profile Blog Joined June 2009
United States138 Posts
May 21 2011 19:50 GMT
#20
^ Yes, it is. I've been thinking about this for the past hour or so and have coded up a working implementation in Java. I'll PM you once I write more tests.
And theres the GG.
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 19:57 GMT
#21
Sounds good. I'm trying to work out something as well.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
evanthebouncy
Profile Joined November 2004
China491 Posts
Last Edited: 2011-05-21 20:37:22
May 21 2011 20:35 GMT
#22
can I have some bearing on this problem? Are you saying your initial set, i.e. the set that's a subset of
{ {0,1,#}^n } is relatively large or small?

by big I mean is it close to the size 3^n i.e. everything?
BOINK BOINK! Recursively defined
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 20:41 GMT
#23
It's huuge, as in 1 million strings.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Oracle
Profile Blog Joined May 2007
Canada411 Posts
May 21 2011 20:47 GMT
#24
I think evan is more asking how many permutations are covered than how many strings there are in total.

Because 1 million strings is meaningless without the size of n (length of a string)
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 20:52 GMT
#25
All strings are different from eachother :O.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
evanthebouncy
Profile Joined November 2004
China491 Posts
May 21 2011 20:56 GMT
#26
What oracle said.

I'll say my idea now as I'll be going to my old apartment trying to contact a moving company to move some stuff. But my idea so far is this:

Create a DAG on the initial set of strings, where the vertices of the DAG are the strings themselves and each edge corresponds to an "implication".

The edge is defined like this:
there is an edge from vertex v to vertex u if accepting v implies we have to accept u as well.

To make it concrete, vertex (0#1) will have an edge pointing to vertex (0##) because if we accept the string 0#1 we MUST accept the string 0##.
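
The edge test itself is easy to write down. The sketch below is only one naive way to do it (the names ImplicationEdges, implies and buildEdges are invented): it compares every pair, so construction is O(n^2 * p), and it keeps all transitive edges, so the result is dense rather than the sparse DAG one would actually want.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ImplicationEdges {
    // True if every message accepted by v is also accepted by u,
    // i.e. u is at least as general as v at every position.
    static boolean implies(String v, String u) {
        if (v.length() != u.length()) {
            return false;
        }
        for (int i = 0; i < v.length(); i++) {
            char cu = u.charAt(i);
            if (cu != '#' && cu != v.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Naive pairwise construction: an edge v -> u for every distinct pair where v implies u.
    static Map<String, List<String>> buildEdges(List<String> patterns) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        for (String v : patterns) {
            List<String> targets = new ArrayList<>();
            for (String u : patterns) {
                if (!v.equals(u) && implies(v, u)) {
                    targets.add(u);
                }
            }
            edges.put(v, targets);
        }
        return edges;
    }
}
```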

So, suppose you CAN construct this DAG (i'm working on how to best construct it, you don't want the dag to be dense, for instance), the lookup will be something like this:

on input message:

ret = {}
while DAG not empty:
...for all leaf-nodes in DAG: #i.e. the nodes which have no implication pointing toward them
......if satisfy(leaf-node, message): #if we accept the leaf node as matching the msg
.........move( transitiveClosure(leaf-node), ret) #take the leaf node, and all it implies, to the return set
......else: #if the leaf does not satisfy
.........delete(leaf-node) #remove the leaf node, so some other node can potentially become a new leaf node

I don't have a bound on the runtime of lookup; however, if you look at it, I'm gaining knowledge as I traverse the graph, which is good. When I decide whether a particular string matches my message, I learn not only whether it matches, but also whether other strings match.

So yeah, gtg now, will think it through on paper, brb!!
BOINK BOINK! Recursively defined
evanthebouncy
Profile Joined November 2004
China491 Posts
May 21 2011 20:58 GMT
#27
On May 22 2011 05:52 Qzy wrote:
All strings are different from eachother :O.

no no that doesn't tell me anything.

Say you have the set {0,1,#}^3, so that's 27 total strings right?
How dense is your data set? Is it just {001, #11, 01#}, i.e. only 1/9 of all possible strings?
Or is it super dense, like 20 of the 27 possible strings?
BOINK BOINK! Recursively defined
Oracle
Profile Blog Joined May 2007
Canada411 Posts
May 21 2011 21:05 GMT
#28
On May 22 2011 05:56 evanthebouncy wrote:
What oracle said.

I'll say my idea now as I'll be going to my old apartment trying to contact a moving company to move some stuff. But my idea so far is this:

Create a DAG on the initial string structure, with the vertex in the DAG the strings themselves, and the edge correspond to an "implication".

The edge is defined as this:
vertex v implies vertex u if we accept v implies we have to accept u as well.

To make it concrete, vertex (0#1) will have an edge pointing to vertex (0##) because if we accept the string 0#1 we MUST accept the string 0##.

So, suppose you CAN construct this DAG (i'm working on how to best construct it, you don't want the dag to be dense, for instance), the lookup will be something like this:

on input message:

ret = {}
while DAG not empty:
...for all leaf-nodes in DAG: #i.e. the nodes who have no implication pointing toward them
......if satisify(leaf-node, message): #if we accept the leaf node as matching the msg
.........move( transitiveClosure(leaf-node), ret) #take the leaf node, and all it implies, to the return set
......else: #if the leaf do not satisfy
.........delete(leaf-node) #remove the leaf node, so some other node can potentially be new leaf node

I don't have bound on the runtime of lookup, however, if you look at it I'm gaining knowledge as I traverse through the graph, which is good. When I decide if I want to match a particular string to my message, not only I learned if I can match it, but I also learned if other things can match it.

So yeah, gtg now, will think it through on paper, brb!!

I played around with the idea of a DAG for a bit but I couldn't find a good way to construct it, do post if you figure out an efficient way.

In fact the lookup time will be very short, its just the construction which is the basis of your algorithm which may make or break it.
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 21:13 GMT
#29
On May 22 2011 05:58 evanthebouncy wrote:
On May 22 2011 05:52 Qzy wrote:
All strings are different from eachother :O.

no no that doesn't tell me anything.

Say you have the set {0,1,#}^3, so that's 27 total strings right?
How dense is your data set? is it just {001, #11, 01#} i.e. only 1/9 of the total string?
or is it super dense like, 20 of the total string?


I'm a bit confused by this comment (sorry, mate, I know you are trying to help).

The string can be set to any length to begin with, consisting of only 1, 0 and #.
The amount of wildcards can be set as well, i.e. a 40% chance of a wildcard being inserted.

In the end you end up with some random string:
10101010
0111110#
00#1011#, etc. There can be millions of these

Then a message is given (no wildcards, same length as the strings), e.g. 10101111, and you have to find all the strings which are satisfied by the message, given that wildcards can represent both 1 and 0.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 21 2011 21:26 GMT
#30
And yes, please do post your code here for all to see. There seem to be lots of followers of this blog post.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
pullarius1
Profile Blog Joined May 2010
United States523 Posts
May 21 2011 21:47 GMT
#31
On May 22 2011 06:13 Qzy wrote:
On May 22 2011 05:58 evanthebouncy wrote:
On May 22 2011 05:52 Qzy wrote:
All strings are different from eachother :O.

no no that doesn't tell me anything.

Say you have the set {0,1,#}^3, so that's 27 total strings right?
How dense is your data set? is it just {001, #11, 01#} i.e. only 1/9 of the total string?
or is it super dense like, 20 of the total string?


I'm a bit confused by this comment (sorry, mate, i know you are trying to help )

The string can be set to any length to begin with, consisting of only 1, 0 and #.
The amount of wildcards can be set aswell, ie 40% chance of wilcard being inserted.

In the end you end up with some random string:
10101010
0111110#
00#1011#, etc. There can be millions of these

Then a message is given: (no wildcards, same length of the strings) 10101111, and you have to find all the strings which satisfies the message, given wildcards can represent both 1 and 0.



He's essentially asking what percentage of all possible strings exist in the 01# set? You said there are 40 bits in the strings, giving 3^40 possible strings. Do you know about what fraction of those are in the reference set? I could imagine, for instance, that if the number was high enough, the complement problem could actually be easier to solve.
@pullarius1
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
Last Edited: 2011-05-21 22:12:12
May 21 2011 22:03 GMT
#32
It's possible to set a cap on the number of strings, e.g. 50,000 or 1 million. So when 1 million strings exist, it's no longer possible to insert more strings. We would probably crash even Google's servers if we allowed 3^40, hehe.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
DeLoAdEr
Profile Blog Joined July 2003
Japan527 Posts
May 21 2011 22:28 GMT
#33
Hmm, just a quick thought: maybe it helps if you sort the strings into different sets depending on their digits.

Let's call S_{n, k} the set of your strings which have char k or a wildcard at digit n. For example S_{1, 1} = { 0001, 111#, 1111, 011#, ... } is the set of all your strings that accept a 1 at the least-significant bit.

For a given string s the goal is now to calculate the intersection between S_{1, s[1]}, S_{2, s[2]}, ..., S_{p, s[p]}. The brute-force implementation of this intersection would have a runtime of O(n * p) again i think. =(

But this could be programmed efficiently with bitvectors representing the sets and logical AND for intersection.
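
One possible way to sketch the bitvector idea in Java uses java.util.BitSet, with one bit per stored string. The class name PositionIndex and its fields are invented for the example, and positions are counted from the left here rather than from the least-significant bit. The work is still n * p bits in the worst case, but BitSet.and handles 64 strings per machine word.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class PositionIndex {
    private final int length;
    private final int patternCount;
    private final BitSet[] acceptsZero; // acceptsZero[i]: patterns with '0' or '#' at position i
    private final BitSet[] acceptsOne;  // acceptsOne[i]: patterns with '1' or '#' at position i

    // Assumes every pattern has exactly `length` characters.
    PositionIndex(List<String> patterns, int length) {
        this.length = length;
        this.patternCount = patterns.size();
        acceptsZero = new BitSet[length];
        acceptsOne = new BitSet[length];
        for (int i = 0; i < length; i++) {
            acceptsZero[i] = new BitSet(patternCount);
            acceptsOne[i] = new BitSet(patternCount);
        }
        for (int n = 0; n < patternCount; n++) {
            String p = patterns.get(n);
            for (int i = 0; i < length; i++) {
                char c = p.charAt(i);
                if (c != '1') acceptsZero[i].set(n);
                if (c != '0') acceptsOne[i].set(n);
            }
        }
    }

    // Intersects the per-position sets chosen by the message's chars.
    List<Integer> findSatisfied(String message) {
        BitSet candidates = new BitSet(patternCount);
        candidates.set(0, patternCount);
        for (int i = 0; i < length && !candidates.isEmpty(); i++) {
            candidates.and(message.charAt(i) == '1' ? acceptsOne[i] : acceptsZero[i]);
        }
        List<Integer> hits = new ArrayList<>();
        for (int n = candidates.nextSetBit(0); n >= 0; n = candidates.nextSetBit(n + 1)) {
            hits.add(n);
        }
        return hits;
    }
}
```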
evanthebouncy
Profile Joined November 2004
China491 Posts
May 22 2011 01:32 GMT
#34
On May 22 2011 06:05 Oracle wrote:
On May 22 2011 05:56 evanthebouncy wrote:
What oracle said.

I'll say my idea now as I'll be going to my old apartment trying to contact a moving company to move some stuff. But my idea so far is this:

Create a DAG on the initial string structure, with the vertex in the DAG the strings themselves, and the edge correspond to an "implication".

The edge is defined as this:
vertex v implies vertex u if we accept v implies we have to accept u as well.

To make it concrete, vertex (0#1) will have an edge pointing to vertex (0##) because if we accept the string 0#1 we MUST accept the string 0##.

So, suppose you CAN construct this DAG (i'm working on how to best construct it, you don't want the dag to be dense, for instance), the lookup will be something like this:

on input message:

ret = {}
while DAG not empty:
...for all leaf-nodes in DAG: #i.e. the nodes who have no implication pointing toward them
......if satisify(leaf-node, message): #if we accept the leaf node as matching the msg
.........move( transitiveClosure(leaf-node), ret) #take the leaf node, and all it implies, to the return set
......else: #if the leaf do not satisfy
.........delete(leaf-node) #remove the leaf node, so some other node can potentially be new leaf node

I don't have bound on the runtime of lookup, however, if you look at it I'm gaining knowledge as I traverse through the graph, which is good. When I decide if I want to match a particular string to my message, not only I learned if I can match it, but I also learned if other things can match it.

So yeah, gtg now, will think it through on paper, brb!!

I played around with the idea of a DAG for a bit but I couldn't find a good way to construct it, do post if you figure out an efficient way.

In fact the lookup time will be very short, its just the construction which is the basis of your algorithm which may make or break it.


You want to make a GOOD dag, which is tricky...

You want the dag to be "deep" rather than shallow, because the deeper it is the more inference you can do...

construction is indeed tricky.

For the sake of algorithm let us abstract the problem to a higher level...

Let there be a collection of sets: F = { A_i s.t. A_i is a set }
For example, F can be F = { {1,2,3}, {1,3}, {1}, {2,3} }

Find an efficient algorithm that, given an element a, returns a collection containing all the sets inside F which contain a.
For example, take F as it is, and say we want to return all the sets containing 1. We'd return
T = { {1,2,3}, {1,3}, {1} }
Whereas if we try to say containing 2, we'd return
T = { {1,2,3}, {2,3} }

You see how these 2 problems are equivalent.
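
To make the abstract version concrete, here is a small sketch that answers it with a plain inverted index from element to containing sets. The names SetFamilyIndex and build are made up. It is not a solution to the original problem: there, each set would presumably be all the messages a wild string accepts, so writing the sets out explicitly is the same exponential expansion discussed earlier in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SetFamilyIndex {
    // Maps each element to the sets of the family that contain it, in family order.
    static Map<Integer, List<Set<Integer>>> build(List<Set<Integer>> family) {
        Map<Integer, List<Set<Integer>>> index = new HashMap<>();
        for (Set<Integer> set : family) {
            for (Integer element : set) {
                index.computeIfAbsent(element, k -> new ArrayList<>()).add(set);
            }
        }
        return index;
    }
}
```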

BOINK BOINK! Recursively defined
Qzy
Profile Blog Joined July 2010
Denmark1121 Posts
May 22 2011 20:24 GMT
#35
Someone actually rated this 1 star. Sick..

It's a good discussion I think - reading every post carefully.
TG Sambo... Intel classic! Life of lively to live to life of full life thx to shield battery
The contents of this webpage are copyright © 2025 TLnet. All Rights Reserved.