An interesting complex programming problem - Page 2
Blogs > Qzy |
Qzy
Denmark1121 Posts
| ||
evanthebouncy
China491 Posts
{ {0,1,#}^n } is relatively large or small? by big I mean is it close to the size 3^n i.e. everything? | ||
Qzy
Denmark1121 Posts
| ||
Oracle
Canada411 Posts
Because 1 million strings is meaningless without the size of n (length of a string) | ||
Qzy
Denmark1121 Posts
| ||
evanthebouncy
China491 Posts
I'll say my idea now as I'll be going to my old apartment trying to contact a moving company to move some stuff. But my idea so far is this: Create a DAG on the initial string structure, with the vertex in the DAG the strings themselves, and the edge correspond to an "implication". The edge is defined as this: vertex v implies vertex u if we accept v implies we have to accept u as well. To make it concrete, vertex (0#1) will have an edge pointing to vertex (0##) because if we accept the string 0#1 we MUST accept the string 0##. So, suppose you CAN construct this DAG (i'm working on how to best construct it, you don't want the dag to be dense, for instance), the lookup will be something like this: on input message: ret = {} while DAG not empty: ...for all leaf-nodes in DAG: #i.e. the nodes who have no implication pointing toward them ......if satisify(leaf-node, message): #if we accept the leaf node as matching the msg .........move( transitiveClosure(leaf-node), ret) #take the leaf node, and all it implies, to the return set ......else: #if the leaf do not satisfy .........delete(leaf-node) #remove the leaf node, so some other node can potentially be new leaf node I don't have bound on the runtime of lookup, however, if you look at it I'm gaining knowledge as I traverse through the graph, which is good. When I decide if I want to match a particular string to my message, not only I learned if I can match it, but I also learned if other things can match it. So yeah, gtg now, will think it through on paper, brb!! | ||
evanthebouncy
China491 Posts
On May 22 2011 05:52 Qzy wrote: All strings are different from eachother :O. no no that doesn't tell me anything. Say you have the set {0,1,#}^3, so that's 27 total strings right? How dense is your data set? is it just {001, #11, 01#} i.e. only 1/9 of the total string? or is it super dense like, 20 of the total string? | ||
Oracle
Canada411 Posts
On May 22 2011 05:56 evanthebouncy wrote: What oracle said. I'll say my idea now as I'll be going to my old apartment trying to contact a moving company to move some stuff. But my idea so far is this: Create a DAG on the initial string structure, with the vertex in the DAG the strings themselves, and the edge correspond to an "implication". The edge is defined as this: vertex v implies vertex u if we accept v implies we have to accept u as well. To make it concrete, vertex (0#1) will have an edge pointing to vertex (0##) because if we accept the string 0#1 we MUST accept the string 0##. So, suppose you CAN construct this DAG (i'm working on how to best construct it, you don't want the dag to be dense, for instance), the lookup will be something like this: on input message: ret = {} while DAG not empty: ...for all leaf-nodes in DAG: #i.e. the nodes who have no implication pointing toward them ......if satisify(leaf-node, message): #if we accept the leaf node as matching the msg .........move( transitiveClosure(leaf-node), ret) #take the leaf node, and all it implies, to the return set ......else: #if the leaf do not satisfy .........delete(leaf-node) #remove the leaf node, so some other node can potentially be new leaf node I don't have bound on the runtime of lookup, however, if you look at it I'm gaining knowledge as I traverse through the graph, which is good. When I decide if I want to match a particular string to my message, not only I learned if I can match it, but I also learned if other things can match it. So yeah, gtg now, will think it through on paper, brb!! I played around with the idea of a DAG for a bit but I couldn't find a good way to construct it, do post if you figure out an efficient way. In fact the lookup time will be very short, its just the construction which is the basis of your algorithm which may make or break it. | ||
Qzy
Denmark1121 Posts
On May 22 2011 05:58 evanthebouncy wrote: no no that doesn't tell me anything. Say you have the set {0,1,#}^3, so that's 27 total strings right? How dense is your data set? is it just {001, #11, 01#} i.e. only 1/9 of the total string? or is it super dense like, 20 of the total string? I'm a bit confused by this comment (sorry, mate, i know you are trying to help ) The string can be set to any length to begin with, consisting of only 1, 0 and #. The amount of wildcards can be set aswell, ie 40% chance of wilcard being inserted. In the end you end up with some random string: 10101010 0111110# 00#1011#, etc. There can be millions of these Then a message is given: (no wildcards, same length of the strings) 10101111, and you have to find all the strings which satisfies the message, given wildcards can represent both 1 and 0. | ||
Qzy
Denmark1121 Posts
| ||
pullarius1
United States522 Posts
On May 22 2011 06:13 Qzy wrote: I'm a bit confused by this comment (sorry, mate, i know you are trying to help ) The string can be set to any length to begin with, consisting of only 1, 0 and #. The amount of wildcards can be set aswell, ie 40% chance of wilcard being inserted. In the end you end up with some random string: 10101010 0111110# 00#1011#, etc. There can be millions of these Then a message is given: (no wildcards, same length of the strings) 10101111, and you have to find all the strings which satisfies the message, given wildcards can represent both 1 and 0. He's essentially asking what percentage of all possible strings exist in the 01# set? You said there are 40 bits in the strings, giving 3^40 possible strings. Do you know about what fraction of those are in the reference set? I could imagine, for instance, that if the number was high enough, the complement problem could actually be easier to solve. | ||
Qzy
Denmark1121 Posts
| ||
DeLoAdEr
Japan527 Posts
Lets call S_{n, k} the set of your strings which have char k at digit n. For example S_{1, 1} = { 0001, 111#, 1111, 011#, ... } is the set of all your strings containing the 1 at the least-significant bit. For a given string s the goal is now to calculate the intersection between S_{1, s[1]}, S_{2, s[2]}, ..., S_{p, s[p]}. The brute-force implementation of this intersection would have a runtime of O(n * p) again i think. =( But this could be programmed efficiently with bitvectors representing the sets and logical AND for intersection. | ||
evanthebouncy
China491 Posts
On May 22 2011 06:05 Oracle wrote: I played around with the idea of a DAG for a bit but I couldn't find a good way to construct it, do post if you figure out an efficient way. In fact the lookup time will be very short, its just the construction which is the basis of your algorithm which may make or break it. You want to make a GOOD dag, which is tricky... You want the dag to be "deep" rather than shallow, because the deeper it is the more inference you can do... construction is indeed tricky. For the sake of algorithm let us abstract the problem to a higher level... Let there be a collection of sets: F = { A_i s.t. A_i is a set } For example, F can be F = { {1,2,3}, {1,3}, {1}, {2,3} } Find an efficient algorithm that given an element a, return a collection that contains all the sets inside F which contains a. For example, take F as it is, and say we want to return all the sets containing 1. We'd return T = { {1,2,3}, {1,3}, {1} } Whereas if we try to say containing 2, we'd return T = { {1,2,3}, {2,3} } You see how these 2 problems are equivalent. | ||
Qzy
Denmark1121 Posts
It's a good discussion I think - reading every post carefully. | ||
| ||