wu :: forums - Print Page


    
      
        wu :: forums
        (http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi)
      

        riddles >> hard >> Pandigital expressions
        
(Message started by: Leonid Broukhis on Sep 17^th, 2007, 9:29pm)

Title: Pandigital expressions
Post by Leonid Broukhis on Sep 17^th, 2007, 9:29pm

What is the min positive integer that cannot be represented as an expression that uses only digits 1 to 9 - each one exactly once in any order, four arithmetic operations (+, -, *, /) and parentheses?

Concatenating digits is not allowed.

E.g.

1 = (3+5+7-2-9)/8*(6-4)*1
10 = 3*5*1+6-9+8+4-7*2
100 = 4*1+(2+7)*8+6+9*(5-3)
1000 = 8*(3+7)*(6+9-2*5/4/1)
10000 = (7*(((2+3)*9)*(5-1))-6-4) * 8

Title: Re: Pandigital expressions
Post by Grimbal on Sep 18^th, 2007, 8:27am

I wouldn't try that without a computer.

Title: Re: Pandigital expressions
Post by SMQ on Sep 18^th, 2007, 8:33am

I'm working on trying it with a computer, but there are a huge number (tens of trillions) of legal expressions, with likely multiple billions of unique ~~integer~~ values (although integer values must be in the range of 1-9! to 9!). If I can find a fast way to eliminate expressions which are equivalent by commutation or associativity that might bring it into the realm of ~~computational feasibility~~ practicality on my hardware...

--SMQ

Title: Re: Pandigital expressions
Post by Grimbal on Sep 18^th, 2007, 9:25am

on 09/18/07 at 08:33:23, SMQ wrote:

(although integer values must be in the range of 1-9! to 9!).

Or maybe 9!*3/2. ;)

Title: Re: Pandigital expressions
Post by Leonid Broukhis on Sep 18^th, 2007, 10:04am

Quote:

Or maybe 9!*3/2.

Exactly!

SMQ: I have not started writing such a program, but the base data structure should be the bitmask of used digits, the value, top arith op, and two links to the operands. This way all equivalences are automatically avoided.

Title: Re: Pandigital expressions
Post by SMQ on Sep 18^th, 2007, 10:19am

Actually, after some thought, I think I'm going to use common subexpression elimination "in reverse." Two subexpressions are unique iff they have different values or have the same values but utilize different terms. Given all all unique subexpressions of lengths 1 through n-1 it is fairly straightforward to calculate all subexpressions of length n, and by eliminating duplicate values early in the process the number of calculations is greatly reduced. Together with a few other tricks, I think that can be made to run in a reasonable amount of time.

--SMQ

Title: Re: Pandigital expressions
Post by Barukh on Sep 18^th, 2007, 10:46am

I've got a program (written for another thread long time ago). It's highly non-optimized - takes a long time to run!

:(

Title: Re: Pandigital expressions
Post by Leonid Broukhis on Sep 18^th, 2007, 11:11am

Another interesting question: what is the expected value of such an expression (assuming all 4 arith ops are equiprobable, and omitting division by zero).

Title: Re: Pandigital expressions
Post by Barukh on Sep 19^th, 2007, 2:18am

Just for the record:

Digits Min non-rep
-----------------------------
1-5 76
1-6 284
1-7 1413

Title: Re: Pandigital expressions
Post by Leonid Broukhis on Sep 19^th, 2007, 2:28am

Thank you.

http://www.research.att.com/~njas/sequences/A060315

Title: Re: Pandigital expressions
Post by Grimbal on Sep 19^th, 2007, 2:37am

It is not exactly the same. The Integer Sequence Encyclopedia includes zero, which could be used to cancel some terms. It doesn't seem to affect the result much, though.

Title: Re: Pandigital expressions
Post by SMQ on Sep 19^th, 2007, 6:16am

Also, the Sloanes sequence doesn't require you to use all the digits (see the example there). My program looks like it'll be reasonably fast, but it runs out of memory, so I have to rewrite parts of it to be able to store intermediate results to disk...

--SMQ

Title: Re: Pandigital expressions
Post by Barukh on Sep 19^th, 2007, 6:51am

SMQ, I think that restriction won't change the final result.

But computing the result in a reasonable amount of time is an interesting and non-trivial task in its own right. My naive program would compute it in more than 27 years on a single machine... :o

Title: Re: Pandigital expressions
Post by SMQ on Sep 19^th, 2007, 2:44pm

I can now confirm that the smallest positive integer which cannot be represented as a pandigital expression is 38103 (matching the sequence from Sloanes). 8)

My program takes a bit over 1/2 hour (on a 3.2 GHz Pentium 4) to generate all unique integer values of pandigital expressions (there are 205901 of them). I'll work on the expected value question tomorrow. ;)

--SMQ

Title: Re: Pandigital expressions
Post by Leonid Broukhis on Sep 19^th, 2007, 3:16pm

SMQ: Disallow division, and things should be much faster :) This will let you answer the question what is the min number that really needs division, and how many of them out of 205901 do.

Title: Re: Pandigital expressions
Post by Barukh on Sep 20^th, 2007, 6:49am

Very impressive, SMQ! I would like to learn about your program more than you outlined in one the previous posts.

Title: Re: Pandigital expressions
Post by Hippo on Sep 20^th, 2007, 7:02am

Hmm, so far I count the number of expressions as 9! 4⁸(C₇¹⁴-C₆¹⁴). Which is of order around 10¹³.
So you must optimise the enumeration as you have limited time...

Title: Re: Pandigital expressions
Post by SMQ on Sep 20^th, 2007, 7:30am

on 09/20/07 at 06:49:36, Barukh wrote:

Very impressive, SMQ! I would like to learn about your program more than you outlined in one the previous posts.

Thanks. :)

The primary data structure is a 512-element array of sets of rationals in lowest terms (one set of rationals for each set of digits used). After populating the trivial sets for single-digit subexpressions, the main loop is as follows:

For each n from 2 through 9
For each set S of n digits
For each pair of (nonempty) sets of digits A, B, A http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/cup.gif B = S, A http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/cap.gif B = http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/varnothing.gif
For each rational a in the set of rationals associated with A
For each rational b in the set of rationals associated with B
Add (iff not already present) to the set of rationals associated with S, each of:
a + b,
a - b,
b - a,
a * b,
a / b (iff b _{http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/ne.gif} 0), and
b / a (iff a _{http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/ne.gif} 0).

By keeping the sets of rationals in lowest terms, duplicate subexpression values are eliminated early in the process, bringing the number of calculations down by a factor of several thousand. To reach the 1/2 hour timeframe I additionally only consider integer results for n = 9, rather than deriving the entire set of unique rational results. The rest of the details are in caching large sets of rationals to disk in order to keep the RAM requirements reasonable for my platform (under 1GB).

--SMQ

Title: Re: Pandigital expressions
Post by SMQ on Sep 20^th, 2007, 7:39am

on 09/20/07 at 07:02:59, Hippo wrote:

Hmm, so far I count the number of expressions as 9! 4⁸(C₇¹⁴-C₆¹⁴). Which is of order around 10¹³.

You must be eliminating some obvious duplicates, as I get the number of lexicographically-distinct expressions as (₁₇C₈/17) http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/cdot.gif 9! http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/cdot.gif 4⁸ = ₁₆C₈ http://www.ocf.berkeley.edu/~wwu/YaBBImages/symbols/cdot.gif 4⁸.

--SMQ

Title: Re: Pandigital expressions
Post by Hippo on Sep 21^st, 2007, 6:42am

I enumerate all expresions in polish notation.
The hardest question was to count positions where can the operators be placed. (The ₁₄C₇-₁₄C₆ term).

BTW: I have thought about the algoritmn meanwhile ... I would use another set of 4 operations:
+/- as one of them, *, / , \ the others.
I would apply less complex expression to more complex therefore two kinds of division should be used /, \.

I would not store rational results if quotient is not multiple of not used numbers.

I would store results without sign with flag denoting if both signs are reachable.

If the same result is obtained for the given set of numbers, the flags are ored. (*,/,\ or's the flag's too, +/- ... the flag rules are more complicated but easy ... it generates two results, one with flag set (-), the other is ored flag of the originals (+))

Rest will be simillar algorithm to yours, but I will generate expressions in order of growing number of used numbers.

... I haven't start codding yet (and I am not sure I will).

Title: Re: Pandigital expressions
Post by Grimbal on Sep 21^st, 2007, 7:09am

I thought that if I were to write the program, I would restrict the expressions to something like
expression = term ['+' term]* ['-' term]*
term = factor ['*' factor]* ['/' factor]*
factor = number | '(' expression ')'
where the terms and factors are in increasing order within each operation. As a result, for instance, I would not try A+B if B is already a sum or if A is the result of a subtraction. This way, I would not try the same sum in all possible permutations. But I later realized that SMQ's method of pairing intermediate results in all possible ways and checking whether the value exists already for the same set of digits is just as efficient to prune these cases. The only difference is that the result needs to be actually computed. So in the end, I don't think it would be much faster.

Title: Re: Pandigital expressions
Post by SMQ on Sep 21^st, 2007, 7:45am

on 09/21/07 at 06:42:25, Hippo wrote:

I enumerate all expresions in polish notation.
The hardest question was to count positions where can the operators be placed. (The ₁₄C₇-₁₄C₆ term).

As did I, but I'm pretty sure that term should be Cat(8) (where Cat(n) is the n^th Catalan Number) = ₁₇C₈/17 = ₁₆C₈ - ₁₆C₇ = 1430 rather than Cat(7) = ₁₅C₇/15 = ₁₄C₇ - ₁₄C₆ = 429.

My program to enumerate the complete set of result values, along with their multiplicities and an example expression, looks like it should finish sometime tomorrow (the relative slowness due mainly to my inefficient algorithm to merge partial results in memory with the 3GB+ file of results on disk). Then I'll be able to answer the expected value question. ;)

--SMQ

Title: Re: Pandigital expressions
Post by Grimbal on Sep 21^st, 2007, 8:28am

I think that a mean value has no meaning unless you tell exactly what expression are considered as equal.
For instance, do "1-(2-3)", "(1-2)+3" and (3+1)-2 count as different expressions?
Co "(1+2)+3" and "1+(2+3)" count as the same?

Title: Re: Pandigital expressions
Post by SMQ on Sep 21^st, 2007, 8:46am

I would consider all of those distinct expressions, some of which have the same value as others. My reasoning being that when written in postfix notation or graphed as an operator tree they would have distinct representations. That's the definition of multiplicity my currently-running program is using. ... or would be if I had remembered to multiply by 2 for addition and multiplication, gol durn it! >:( :-[ Now I'm going to have to start it over. Maybe I can improve on the merge algorithm while I'm at it.

(That definition is useful in this case as the total number of distinct expressions is fairly easily calculated (although Hippo and I seem do disagree on which Catalan number to use) and can serve as a sanity check on the results of the program...)

--SMQ

Title: Re: Pandigital expressions
Post by Leonid Broukhis on Sep 21^st, 2007, 9:05am

on 09/21/07 at 06:42:25, Hippo wrote:

I would not store rational results if quotient is not multiple of not used numbers.

And that would be your mistake. Note, for example, that 21 = 6 / (1 - 5/7)

Title: Re: Pandigital expressions
Post by Hippo on Sep 21^st, 2007, 10:26am

Leonid: Oops, thanks ... so this one optimisation does not work ... so I should stay just with +/- trick.

SMQ: I was not sure ... is this Catalanyan number or not. ... I have solved the recurence. May be I have made a +/-1 mistake in indexes.
I have heard about Catalanyan numbers just one small talk ... I have never realised so far that you obtain them from Pascall triangle "by subtracting its mirror image".

Yes, you are right ... I am one number off.
I would compute that ₂C₁-₂C₀=1 is the number of possibilities how to place two operators around last one operand (into 3 operands string).
So _nX_k = _n+k+1C_k+1 - _n+k+1C_k
is corrected expression for the number of possibilities how to place n operators around last k operands.

OK, I had a bit of time so I made a code.
First observation: To represent results for each set of operands separately is must. If you use one set and filter it, you spend majority of time filtering.

Second observation: Not saving results of form a/b for eight operands when remains number c and neither b|c nor a|c can be helpful.

For full set of operands just a/1 results are required.

Even with these tricks there are more than 3 000 000 results for 1,...,8 operands.

There can be one more trick applied ... as an aditive flag compresses representation of -a/b to a/b (and speeds up *,/,\ oprations by doing them in parallel).
We can use a multiplicative flag to compress representatin of +/-b/a to +/-a/b. In that case the +/- operation will have at most 8 results, but operations *,/,\ can be replaced by one operation \*/ having just 2 results. This will parallelize the process even more. (There is small problem here ... the flag is not enough ... you need 3 states a/b means a/b in the first state, it means b/a in the second state and either a/b or b/a in the third state. This allows you to store a/b such that a<b)

Oops ... how to represent 1/5 and +/-5 at the same time ... I should use distinct aditive flags.
So far the \*/ operation has 2 results, but the rules for flags are more complicated. The +/- operation generates at most 2,4 or 8 results depending on multiplicative flags set (its results have to unset multiplicative flag).

The speed-up by the parallelisation is notable, (and the size of the sets fits to my remaining HDD space now).
OK: done :) ... 38103,38201,38618,...

Now I have read the describtion of SMQ's algorithm in details ... I suppose you coded it in rather low level language. On my NOTEBOOK (1.6GHz, 512MB)it took around half a day time (But I didn't implement hash tables and I used indexed tables in FoxPro instead so the multiplicative "constant" is notable (for upto 5 operands I use cursors ... tables in memory). Just created table (corresponding to S) was indexed.).

Filtering writes on 8th level saved a lot of space, too. Oops, the filtering trick was implemented wrongly and didn't filter at all. With the filtering the file size shrinks around 10 times.

May be the trick with multiplicative and additive flags should be described better:
I store items in a/b form where a<=b and LCD(a,b)=1. With each a/b I store 4 bits. bit 0 seys -b/a is reachable, bit 1 seys b/a is reachable, bit 2 seys -a/b is reachable and bit 3 seys a/b is reachable.

When working with an item I first compare bits 32 with bits 10 and reorder them such that bits 32 are higher than bits 10. I swap a and b as well.
Now There ar 5 possible bitmaps: 1111,1110,1100,1010 and 1000.
I have precomputed matrix of corresponding bits for * and \*/ operations.

(Yes you can use larger matrices and not to swap but this was easier for me.)

... instead of bits the counts should be used when counting results and the matrices holding four flags results cannot be easily adopted for counting purposes. You must multiply corresponding multicities and sum them together. So you loose the parallelisation in this case, the trick only saves space, and as the ratio of info/key size increases, the save is smaller.

If eight bytes for a count are required, counting would consume 4 times much HDD space ... around 2GB. (And with the shrinking on 8th level ...)
... yes its feasible, but I am not sure I am interested in it.

Wow ... I have now realised that the original computation would fit into memory ... but who can expected it.

Title: Re: Pandigital expressions
Post by SMQ on Sep 26^th, 2007, 8:53pm

Nice work, Hippo!

As for my own program, after fixing the multiplicity bug and overhauling the file merge code, I can now announce, after a run of only 4.5 hours:

{1,2,3,4,5,6,7,8,9}: 115521109 unique values, ints to 38103

34,007,836,262,400 distinct expressions
33,854,394,991,592 rational expressions
153,441,270,808 divisions by 0

expected value: 151.409

:D

--SMQ

Title: Re: Pandigital expressions
Post by Leonid Broukhis on Sep 26^th, 2007, 8:59pm

SMQ,

please run it again with disabled division!

Title: Re: Pandigital expressions
Post by Hippo on Sep 27^th, 2007, 3:11am

on 09/26/07 at 20:53:25, SMQ wrote:

Nice work, Hippo!
--SMQ

Thanks, the same to you SMQ. ... I would never do such stupid experiment if not excited by your work ;).

Can you do something for me? Try to do the +/-,\*/ trick on your platform and compare the time to the original one (Not to the counting one).
It should take much less then an half of an hour I expected.

Usability of not saving the 8 opernad results when hashing and having sufficient amount of memory is discutable ...

I had other thoughts about saveing memory ... to made a count how many times you will need the resulting set of 7 or 8 operands in the future and delete them as early as possible ...

Do you hash keys or you find a fast injection to memory and you maintain a separate list of results?

Title: Re: Pandigital expressions
Post by SMQ on Sep 27^th, 2007, 6:06am

on 09/26/07 at 20:59:17, Leonid Broukhis wrote:

please run it again with disabled division!

{1,2,3,4,5,6,7,8,9} 205898 unique values, ints to 38103

3,404,623,622,400 distinct expressions

expected value: 746.466

(in under 2 minutes)

--SMQ

Title: Re: Pandigital expressions
Post by SMQ on Sep 27^th, 2007, 1:23pm

on 09/27/07 at 03:11:41, Hippo wrote:

Can you do something for me? Try to do the +/-,\*/ trick on your platform and compare the time to the original one (Not to the counting one).

I'm not sure it's strictly comparable, as implementation details of my program have changed quite a bit since that first run, but it looks like about 9 minutes.

Quote:

Do you hash keys or you find a fast injection to memory and you maintain a separate list of results?

I hash them, based on a random number derived (by table lookup) from the numerator and denominator. Checks for equality are a fair bit quicker than checks for order (two comparisons and a boolean AND vs. two signed multiplies and a comparison), so I never bother to sort the intermediate results, just the final integers.

--SMQ

Title: Re: Pandigital expressions
Post by Hippo on Sep 27^th, 2007, 2:24pm

Of course, sorting of subresults while hashing is a nonsense. The list in question was not sorted ... if the lookup table is much larger than represented set the unsorted list can be helpful.

Nice work SMQ, thanks. It seems to be closed for me now.