I'm toying with a new type of level map compression (will share the details later if the results are good), and in order to make the most out of it, at one point I need to combine several strips of numbers (all strips are the same length) into one long strip. The resulting strip should be the shortest possible combination of all strips, which must be overlaid in a way that makes optimal use of any redundancy that might exist.
For example, if I need to combine the strips "AABBBBBAAAAA" and "AAAAACCAAAAA", the result would be "AABBBBBAAAAACCAAAAA". This is simple when you only have a handful of strips, but with several hundred of them it's easy to make the wrong choices. My current idea is the following:
1 - Find the longest overlap for each strip and save this information (length of the overlap, index of the strip);
2 - Find the longest overlap among all strips, using the previously calculated values (if the longest overlap is 0, go to step 7);
3 - Join the two strips that are involved in the longest overlap and mark both original slots "empty";
4 - Add the new strip to the end of the list and find the longest overlap for it;
5 - Find the longest overlap for any strips that had the removed strips as their longest overlaps;
6 - Go to step 2;
7 - There are no more overlaps, so concatenate all the strips, ignoring the empty slots;
That seemed OK to me at first, but now I have the impression that by combining the best possible pair at each step, I might be botching future combinations that would result in larger gains, but I can't seem to think of a better way to do this. This is why I'm asking here if anyone knows of an algorithm (or can come up with one on the spot!) that can solve this problem optimally.
For example, if I need to combine the strips "AABBBBBAAAAA" and "AAAAACCAAAAA", the result would be "AABBBBBAAAAACCAAAAA". This is simple when you only have a handful of strips, but with several hundred of them it's easy to make the wrong choices. My current idea is the following:
1 - Find the longest overlap for each strip and save this information (length of the overlap, index of the strip);
2 - Find the longest overlap among all strips, using the previously calculated values (if the longest overlap is 0, go to step 7);
3 - Join the two strips that are involved in the longest overlap and mark both original slots "empty";
4 - Add the new strip to the end of the list and find the longest overlap for it;
5 - Find the longest overlap for any strips that had the removed strips as their longest overlaps;
6 - Go to step 2;
7 - There are no more overlaps, so concatenate all the strips, ignoring the empty slots;
That seemed OK to me at first, but now I have the impression that by combining the best possible pair at each step, I might be botching future combinations that would result in larger gains, but I can't seem to think of a better way to do this. This is why I'm asking here if anyone knows of an algorithm (or can come up with one on the spot!) that can solve this problem optimally.