• Bit sequences with bounded number of continuous 1s.

    From Michael S@21:1/5 to All on Wed Dec 6 12:30:11 2023
    Bit sequences with bounded number of continuous '1' bits are useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading '1'
    bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a single
    '0' bit and then by desired number of extended words. Any gaps between
    packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for (L=2,T=2)
    there exist more than 2**16 extended words with desired properties.
    For those interested, 76735 and 83665 extended words, respectively.
    So, desired 16b-to-17b conversions certainly exist for M=4.
    The problem is that I know of no simple "algorithmic" way of
    implementing such conversion short of using big look-up tables. Neither
    in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old and
    outdated?
    Or, better, in books or articles?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Wed Dec 6 15:41:45 2023
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 11:30, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are useful
    for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading
    '1' bits, at most T continuous trailing '1' bits and at most M=L+T continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a
    single '0' bit and then by desired number of extended words. Any
    gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired properties. For those interested, 76735 and 83665 extended words, respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old
    and outdated?
    Or, better, in books or articles?



    Usually the interest is in limiting length of repeats at any point in
    the data, not just the start and end. After all, it's not going to
    help anyone if the start and end of the word are limited to no more
    than 2 consecutive ones if you have a run of 10 ones in the middle of
    the word.

    Please read again my original post. It's all covered.


    It is also common to want to combine other encoding features at the
    same time. Limiting the lengths of both zeros and ones means you
    have transitions, which are needed for clock recovery and
    synchronisation. For many transmissions, approximately 0 DC balance
    is highly desirable. And you may also want forward error correction
    or at least error detection.


    In my case of interest the bits are encoded as NRZI so mere presence of
    '1' bits in headers and gaps guarantees that there are at least M
    transitions per packet. It is sufficient for clock phase recovery
    at receiver (a.k.a Data Phase Alignment). Clock frequency recovery is
    not necessary, because clocks at both sides of the link are derived from
    the same source.
    Anyway, when somewhat higher density of transitions is desirable, say,
    one transition (i.e. logical 1) per 25 or 30 bits, it's absolutely
    trivial to add at no additional coding cost since 76735 is sufficiently
    large than 65736.

    DC balance is not required in my case. When DC balance is required at
    all, the balance requirements tend to be rather strict. Weak balance is
    rarely of interest. Typically disparity should be bounded to few
    percents over rather short intervals of few dozens of bit times. Or
    better. In order to achieve that you need rather higher coding rates
    than 17/16 or rather longer coding blocks than 16 bits. Or sometimes
    both. In other words, you have to play completely different game.

    Maybe you have some specialised use in mind?


    I do have a specialized use in mind, but for my specialized use the
    schemes that limit the number of continuous '1' bits to 4 are not
    necessary. Limit of 5 is perfectly o.k. and as I mentioned in original
    post, it's easy. In fact, if 5 was not so easy, from practical
    perspective I would be satisfied with significantly higher limit, like 7
    or 8.

    The question about bound of 4 is more of theoretical than of
    practical interest. Yesterday it surprised me to find out that such
    codes exists not just for 16b-to-17b, but for much longer words as
    well, up to 30b-to-31b. But there is a big difference between knowing
    that the code exists and finding a way to implement it in
    environment with limited resources.

    Otherwise, these links might be of interest to you, on related ideas :

    <https://en.wikipedia.org/wiki/64b/66b_encoding> <https://en.wikipedia.org/wiki/8b/10b_encoding> <https://en.wikipedia.org/wiki/Bit_stuffing>



    Don't you know me long enough to guess that I had read that much and
    more?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Wed Dec 6 13:06:29 2023
    On 06/12/2023 11:30, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading '1'
    bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a single
    '0' bit and then by desired number of extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for (L=2,T=2)
    there exist more than 2**16 extended words with desired properties.
    For those interested, 76735 and 83665 extended words, respectively.
    So, desired 16b-to-17b conversions certainly exist for M=4.
    The problem is that I know of no simple "algorithmic" way of
    implementing such conversion short of using big look-up tables. Neither
    in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old and outdated?
    Or, better, in books or articles?



    Usually the interest is in limiting length of repeats at any point in
    the data, not just the start and end. After all, it's not going to help
    anyone if the start and end of the word are limited to no more than 2 consecutive ones if you have a run of 10 ones in the middle of the word.

    It is also common to want to combine other encoding features at the same
    time. Limiting the lengths of both zeros and ones means you have
    transitions, which are needed for clock recovery and synchronisation.
    For many transmissions, approximately 0 DC balance is highly desirable.
    And you may also want forward error correction or at least error detection.

    Maybe you have some specialised use in mind?

    Otherwise, these links might be of interest to you, on related ideas :

    <https://en.wikipedia.org/wiki/64b/66b_encoding> <https://en.wikipedia.org/wiki/8b/10b_encoding> <https://en.wikipedia.org/wiki/Bit_stuffing>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Michael S on Wed Dec 6 14:43:59 2023
    Michael S wrote:

    Bit sequences with bounded number of continuous '1' bits are useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading '1'
    bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a single
    '0' bit and then by desired number of extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for (L=2,T=2)
    there exist more than 2**16 extended words with desired properties.
    For those interested, 76735 and 83665 extended words, respectively.
    So, desired 16b-to-17b conversions certainly exist for M=4.
    The problem is that I know of no simple "algorithmic" way of
    implementing such conversion short of using big look-up tables. Neither
    in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    A table with 65536 entries is no longer considered "big".

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old and outdated?
    Or, better, in books or articles?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Wed Dec 6 16:48:51 2023
    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 11:30, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are useful
    for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading
    '1' bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a
    single '0' bit and then by desired number of extended words. Any
    gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old
    and outdated?
    Or, better, in books or articles?



    Usually the interest is in limiting length of repeats at any point in
    the data, not just the start and end. After all, it's not going to
    help anyone if the start and end of the word are limited to no more
    than 2 consecutive ones if you have a run of 10 ones in the middle of
    the word.

    Please read again my original post. It's all covered.

    Not as far as I can see. Maybe I have misunderstood you, or maybe you
    have not understood me. Correct me if I am wrong below.

    For L = 2, T = 2 (as an example), your coding would not allow

    1100 1110 0101 1011 1

    because it ends in 3 ones. (The spaces are just to make it easier to
    read.) Your aim is to avoid the join of two packets having more than 4
    ones in a row, so that if the receive sees 5 or more ones in a row, it
    knows it is an inter-packet idle period.

    But you /would/ allow :

    1000 1111 1111 1000 1

    This has no more than 1 leading or trailing one. But it has 8 ones in a
    row. A receiver that is trying to synchronise would see a run of 8 ones
    and interpret it as an inter-packet idle gap.

    Additionally, how is the receiver supposed to identify the number of
    ones at the start of a packet, after a gap? If it sees a sequence of 10
    ones, then a zero (and then other values), how does it know if the
    packet is starting with 0, 10, or 110 ?

    Synchronisation for the start of a frame in communication is always done
    with specific patterns containing at least one transition. For example,
    with good-old-fashioned UART communication, break (inter-character
    pauses) are a string of at least 10 ones, and the start of a character
    frame is always a zero with the line going from high to low (either
    after a break, or after the previous character's stop bit).



    It is also common to want to combine other encoding features at the
    same time. Limiting the lengths of both zeros and ones means you
    have transitions, which are needed for clock recovery and
    synchronisation. For many transmissions, approximately 0 DC balance
    is highly desirable. And you may also want forward error correction
    or at least error detection.


    In my case of interest the bits are encoded as NRZI so mere presence of
    '1' bits in headers and gaps guarantees that there are at least M
    transitions per packet.

    OK, but you still have to make sure there are a suitable minimum number
    of ones per frame. As it stands, a run of 17 zeros would fit your requirements.

    It is sufficient for clock phase recovery
    at receiver (a.k.a Data Phase Alignment). Clock frequency recovery is
    not necessary, because clocks at both sides of the link are derived from
    the same source.

    Fair enough. It is convenient when you have that!

    Anyway, when somewhat higher density of transitions is desirable, say,
    one transition (i.e. logical 1) per 25 or 30 bits, it's absolutely
    trivial to add at no additional coding cost since 76735 is sufficiently
    large than 65736.

    Certainly you can make sure you have enough ones in the packet. (I
    would not say it is "trivial" until you have found a simple algorithm
    for coding and decoding to match your requirements - but it is unlikely
    to make the problem any harder.)


    DC balance is not required in my case. When DC balance is required at
    all, the balance requirements tend to be rather strict. Weak balance is rarely of interest.

    Weak balance - a statistical or probabilistic balance - is certainly of interest in some systems. PCIe communication, for instance, maintains
    an approximate balance by using a scrambler. Some DC "leakage" through
    the terminators is enough to deal with the remainder, if I have
    understood things correctly.

    Other systems use active DC balance tracking to keep a stricter control,
    and transmit different encodings to minimise the DC imbalance. But even
    then, it is never perfect - no line drivers are equally strong for high
    and low drives, and there are always differences in the latencies for
    high and low transitions. Even with an ideal 50% duty cycle coming in,
    there will be a small DC bias that must be leaked away. So DC balancing
    in the encoding is /always/ weak, if you look closely enough.

    Typically disparity should be bounded to few
    percents over rather short intervals of few dozens of bit times. Or
    better. In order to achieve that you need rather higher coding rates
    than 17/16 or rather longer coding blocks than 16 bits. Or sometimes
    both. In other words, you have to play completely different game.

    Maybe you have some specialised use in mind?


    I do have a specialized use in mind, but for my specialized use the
    schemes that limit the number of continuous '1' bits to 4 are not
    necessary. Limit of 5 is perfectly o.k. and as I mentioned in original
    post, it's easy. In fact, if 5 was not so easy, from practical
    perspective I would be satisfied with significantly higher limit, like 7
    or 8.

    The question about bound of 4 is more of theoretical than of
    practical interest. Yesterday it surprised me to find out that such
    codes exists not just for 16b-to-17b, but for much longer words as
    well, up to 30b-to-31b. But there is a big difference between knowing
    that the code exists and finding a way to implement it in
    environment with limited resources.

    Otherwise, these links might be of interest to you, on related ideas :

    <https://en.wikipedia.org/wiki/64b/66b_encoding>
    <https://en.wikipedia.org/wiki/8b/10b_encoding>
    <https://en.wikipedia.org/wiki/Bit_stuffing>



    Don't you know me long enough to guess that I had read that much and
    more?


    I try to answer questions as I see them, rather than attempting to keep
    track of what everyone knows.

    When I see a post that looks like someone is trying to invent a new
    technique in an existing field, and their ideas are noticeably different
    from standard solutions, there are various possible explanations. One
    is that they have a very niche use-case so the standard ideas don't
    suit. Another is that they have come up with something truly original
    and exciting. Most common, however, is that they don't know much about
    the field and are not very aware of existing methods, and are currently
    just playing about with ideas for solutions. Option 3 is the first
    assumption until demonstrated otherwise. It looks like option 1 applies
    to you here.

    I can't say that I really understand why you have the requirements you
    are asking for here - they don't make sense to me. Obviously I don't
    have a full view of your system, but I cannot currently imagine a system
    for which your requirements would be the best way to solve the problem.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Michael S on Wed Dec 6 09:02:49 2023
    Michael S <[email protected]> writes:

    Bit sequences with bounded number of continuous '1' bits are useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading '1'
    bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a single
    '0' bit and then by desired number of extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for (L=2,T=2)
    there exist more than 2**16 extended words with desired properties.
    For those interested, 76735 and 83665 extended words, respectively.
    So, desired 16b-to-17b conversions certainly exist for M=4.
    The problem is that I know of no simple "algorithmic" way of
    implementing such conversion short of using big look-up tables. Neither
    in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?

    Please excuse my not following.. I don't quite see what you're
    getting at here. I get that you want to transform an arbitrary
    16-bit value into a 17-bit value that satisfies a certain
    property (such that the 16-bit value can be reconstructed), but I
    don't get what the property is exactly or how a 17-bit value is
    transformed back into a 16-bit value. I do realize that what
    you're looking for is a nice way of turning a 16-bit value into a
    17-bit value (assuming I'm not completely confused, which is
    always a possibility). Can you spell it out for me in a bit more
    detail?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to MitchAlsup on Wed Dec 6 18:26:38 2023
    On Wed, 6 Dec 2023 14:43:59 +0000
    [email protected] (MitchAlsup) wrote:

    Michael S wrote:

    Bit sequences with bounded number of continuous '1' bits are useful
    for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading
    '1' bits, at most T continuous trailing '1' bits and at most M=L+T continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a
    single '0' bit and then by desired number of extended words. Any
    gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired properties. For those interested, 76735 and 83665 extended words, respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    A table with 65536 entries is no longer considered "big".


    "Compact hardware" in this case means ~50 LEs on each side of the link.
    LEs of sort we have in old-fashioned FPGA where each LE contains 1
    register, 1 4-input LUT, carry-in/carry-out for adder and little else.
    Use of one or two 8 Kbit SRAM blocks is acceptable on transmitter side,
    but not on receiver side.

    Pay attention that it's acceptable and even expected for implementation
    to be serial i.e. transmitting/receiving one bit per clock and
    completing the full transform every 17 clock cycles.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Tim Rentsch on Wed Dec 6 20:01:05 2023
    On Wed, 06 Dec 2023 09:02:49 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    Bit sequences with bounded number of continuous '1' bits are useful
    for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading
    '1' bits, at most T continuous trailing '1' bits and at most M=L+T continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a
    single '0' bit and then by desired number of extended words. Any
    gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired properties. For those interested, 76735 and 83665 extended words, respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?

    Please excuse my not following.. I don't quite see what you're
    getting at here. I get that you want to transform an arbitrary
    16-bit value into a 17-bit value that satisfies a certain
    property (such that the 16-bit value can be reconstructed), but I
    don't get what the property is exactly or how a 17-bit value is
    transformed back into a 16-bit value. I do realize that what
    you're looking for is a nice way of turning a 16-bit value into a
    17-bit value (assuming I'm not completely confused, which is
    always a possibility). Can you spell it out for me in a bit more
    detail?

    I thought it was rather obvious but rather obviously it was not :(
    So let's do it again.

    1. Reversibility.
    No two 16-bit input patterns correspond to same 17-bit output pattern.
    2. Leading '1' bits.
    Output patterns that have L+1 or more consecutive '1' MS bits are
    illegal.
    3. Trailing '1' bits.
    Output patterns that have T+1 or more consecutive '1' LS bits are
    illegal.
    4. Inner '1' bits.
    Output patterns that have L+T+1 or more consecutive '1' bits starting
    at any position are illegal.

    (L,T) = either (1,3) or (2,2).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Wed Dec 6 19:43:07 2023
    On Wed, 6 Dec 2023 16:48:51 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 11:30, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are
    useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words
    to unique (n+1)-bit extended words with at most L continuous
    leading '1' bits, at most T continuous trailing '1' bits and at
    most M=L+T continuous inner '1' bits then we can transfer
    multi-word packets starting with a header consisting of M+1 '1'
    bits followed by a single '0' bit and then by desired number of
    extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and
    will quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and
    L=2,T=3) where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old
    and outdated?
    Or, better, in books or articles?



    Usually the interest is in limiting length of repeats at any point
    in the data, not just the start and end. After all, it's not
    going to help anyone if the start and end of the word are limited
    to no more than 2 consecutive ones if you have a run of 10 ones in
    the middle of the word.

    Please read again my original post. It's all covered.

    Not as far as I can see. Maybe I have misunderstood you, or maybe
    you have not understood me. Correct me if I am wrong below.

    For L = 2, T = 2 (as an example), your coding would not allow

    1100 1110 0101 1011 1

    because it ends in 3 ones. (The spaces are just to make it easier to
    read.) Your aim is to avoid the join of two packets having more than
    4 ones in a row, so that if the receive sees 5 or more ones in a row,
    it knows it is an inter-packet idle period.

    But you /would/ allow :

    1000 1111 1111 1000 1

    This has no more than 1 leading or trailing one. But it has 8 ones
    in a row. A receiver that is trying to synchronise would see a run
    of 8 ones and interpret it as an inter-packet idle gap.

    Additionally, how is the receiver supposed to identify the number of
    ones at the start of a packet, after a gap? If it sees a sequence of
    10 ones, then a zero (and then other values), how does it know if the
    packet is starting with 0, 10, or 110 ?

    Synchronisation for the start of a frame in communication is always
    done with specific patterns containing at least one transition. For
    example, with good-old-fashioned UART communication, break
    (inter-character pauses) are a string of at least 10 ones, and the
    start of a character frame is always a zero with the line going from
    high to low (either after a break, or after the previous character's
    stop bit).


    You still refused to read my original post without skipping words.
    In particular you obviously did not read this part "and at most M=L+T continuous inner '1' bits".



    It is also common to want to combine other encoding features at the
    same time. Limiting the lengths of both zeros and ones means you
    have transitions, which are needed for clock recovery and
    synchronisation. For many transmissions, approximately 0 DC balance
    is highly desirable. And you may also want forward error correction
    or at least error detection.


    In my case of interest the bits are encoded as NRZI so mere
    presence of '1' bits in headers and gaps guarantees that there are
    at least M transitions per packet.

    OK, but you still have to make sure there are a suitable minimum
    number of ones per frame. As it stands, a run of 17 zeros would fit
    your requirements.

    It is sufficient for clock phase recovery
    at receiver (a.k.a Data Phase Alignment). Clock frequency recovery
    is not necessary, because clocks at both sides of the link are
    derived from the same source.

    Fair enough. It is convenient when you have that!

    Anyway, when somewhat higher density of transitions is desirable,
    say, one transition (i.e. logical 1) per 25 or 30 bits, it's
    absolutely trivial to add at no additional coding cost since 76735
    is sufficiently large than 65736.

    Certainly you can make sure you have enough ones in the packet. (I
    would not say it is "trivial" until you have found a simple algorithm
    for coding and decoding to match your requirements - but it is
    unlikely to make the problem any harder.)


    In the mean time I found an algorithm that works. Sort of.
    Modification that causes it to generate no more than 5 leading zeros
    and no more than 16 trailing zeros per extended word is indeed trivial.

    I am not fully satisfied with this algorithm for two reasons:
    A. It's sort of ad hoc. It works for 16->17, but I am not sure that
    similar algorithms are going to work for, say 24->25 or 28->29.
    B. The hardware implementation (in FPGA) is not as compact as I would
    like. There are too many non-trivial constants involved and non-trivial constants consume LEs.


    DC balance is not required in my case. When DC balance is required
    at all, the balance requirements tend to be rather strict. Weak
    balance is rarely of interest.

    Weak balance - a statistical or probabilistic balance - is certainly
    of interest in some systems. PCIe communication, for instance,
    maintains an approximate balance by using a scrambler. Some DC
    "leakage" through the terminators is enough to deal with the
    remainder, if I have understood things correctly.


    PCIe Gen1 and Gen2 apply 8b-10b encoding that guarantees rather strong
    DC balance. As observed on 10T intervals, disparity never exceeds two
    bits. It is possible due too high coding rate of 5/4.
    For later PCIe generations DC balance is indeed non-robust. I suspect
    that hardware here is designed to transmit/receive totally non-balanced streams. With more balanced streams it just works better == has higher
    noise margin.
    But I admit to not learning newer PCIe phys to the same degree I
    learned earlier generations.

    Other systems use active DC balance tracking to keep a stricter
    control, and transmit different encodings to minimise the DC
    imbalance. But even then, it is never perfect - no line drivers are
    equally strong for high and low drives, and there are always
    differences in the latencies for high and low transitions. Even with
    an ideal 50% duty cycle coming in, there will be a small DC bias that
    must be leaked away. So DC balancing in the encoding is /always/
    weak, if you look closely enough.


    N/A for differential, which happens to dominate high speed outside of
    DRAM buses.

    Typically disparity should be bounded to few
    percents over rather short intervals of few dozens of bit times. Or
    better. In order to achieve that you need rather higher coding rates
    than 17/16 or rather longer coding blocks than 16 bits. Or sometimes
    both. In other words, you have to play completely different game.

    Maybe you have some specialised use in mind?


    I do have a specialized use in mind, but for my specialized use the
    schemes that limit the number of continuous '1' bits to 4 are not necessary. Limit of 5 is perfectly o.k. and as I mentioned in
    original post, it's easy. In fact, if 5 was not so easy, from
    practical perspective I would be satisfied with significantly
    higher limit, like 7 or 8.

    The question about bound of 4 is more of theoretical than of
    practical interest. Yesterday it surprised me to find out that such
    codes exists not just for 16b-to-17b, but for much longer words as
    well, up to 30b-to-31b. But there is a big difference between
    knowing that the code exists and finding a way to implement it in environment with limited resources.

    Otherwise, these links might be of interest to you, on related
    ideas :

    <https://en.wikipedia.org/wiki/64b/66b_encoding>
    <https://en.wikipedia.org/wiki/8b/10b_encoding>
    <https://en.wikipedia.org/wiki/Bit_stuffing>



    Don't you know me long enough to guess that I had read that much and
    more?


    I try to answer questions as I see them, rather than attempting to
    keep track of what everyone knows.

    When I see a post that looks like someone is trying to invent a new
    technique in an existing field, and their ideas are noticeably
    different from standard solutions, there are various possible
    explanations. One is that they have a very niche use-case so the
    standard ideas don't suit. Another is that they have come up with
    something truly original and exciting. Most common, however, is that
    they don't know much about the field and are not very aware of
    existing methods, and are currently just playing about with ideas for solutions. Option 3 is the first assumption until demonstrated
    otherwise. It looks like option 1 applies to you here.

    I can't say that I really understand why you have the requirements
    you are asking for here - they don't make sense to me. Obviously I
    don't have a full view of your system, but I cannot currently imagine
    a system for which your requirements would be the best way to solve
    the problem.



    I tend to think about coding techniques 'for transparency' as
    intellectual assets. You want to have more of them even in the absence
    of immediate application.
    My absolute favorite is Consistent Overhead Byte Stuffing a.k.a. COBS.
    It featured in few proposals, but never made it to any major standard.
    Despite it, it's a work of beauty and I use variants of COBS at work
    rather intensely, typically for communication between parts of our
    embedded systems.

    BTW, ideas of COBS are applicable among other approaches to avoiding
    5-long bit patterns. Not so much to 4-long bit patterns.
    The difference here is that avoidance of 5-long patterns can be
    attacked by splitting the word into 3-bit and 4-bit symbols and as long
    as there are symbols COBS has a chance.
    For 4-long patterns partitioning to 4-bit symbols, even when
    interleaved with 3-bit symbols, does not work for an obvious reasons.
    And partitioning [of 17-bit extended word] to 3-bit symbols does not
    work for even more obvious reason: 7**5 * 3 < 2**16.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Michael S on Wed Dec 6 20:22:36 2023
    On Wed, 6 Dec 2023 12:30:11 +0200
    Michael S <[email protected]> wrote:

    Bit sequences with bounded number of continuous '1' bits are useful
    for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading '1'
    bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a single
    '0' bit and then by desired number of extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for (L=2,T=2)
    there exist more than 2**16 extended words with desired properties.
    For those interested, 76735 and 83665 extended words, respectively.
    So, desired 16b-to-17b conversions certainly exist for M=4.
    The problem is that I know of no simple "algorithmic" way of
    implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old and outdated?
    Or, better, in books or articles?




    Here is my current algorithm.
    As mentioned in other post, it is unsatisfactory for two reasons:
    A. It's sort of ad hoc. It works for 16->17, but I am not sure that
    similar algorithms are going to work for, say 24->25 or 28->29.
    B. The hardware implementation (in FPGA) is not as compact as I would
    like. There are too many non-trivial constants involved. Non-trivial
    constants consume LEs.

    static const unsigned bv_tab[17] = { // bv_tab[k]/bv_tab[k+i] <= 2
    42557,
    21647,
    11011,
    5601,
    2849,
    1449,
    737,
    375,
    191,
    97,
    49,
    25,
    13,
    7,
    4,
    2,
    1,
    };

    // encode 16->17
    unsigned enc(unsigned x)
    {
    // avoid long strings of zeros by mapping inputs in range
    // [0:2**13-1] to [2**16:2**16+2**13-1]
    if (x < 8*1024)
    x += 0x10000;

    // convert x to sub-binary non-constant base
    unsigned y = 0;
    for (int i = 0; i < 17; ++i) {
    y += y;
    const unsigned bv = bv_tab[i];
    if (x >= bv) {
    x -= bv;
    y += 1;
    }
    }
    return y & 0x1FFFF;
    }

    // decode 17->16
    unsigned dec(unsigned x)
    {
    // convert x from sub-binary non-constant base back to binary
    unsigned y = 0;
    for (int i = 0; i < 17; ++i) {
    if (x & 0x10000) {
    y += bv_tab[i];
    }
    x += x;
    }
    return y & 0xFFFF;
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Terje Mathisen@21:1/5 to David Brown on Wed Dec 6 22:01:31 2023
    David Brown wrote:
    On 06/12/2023 11:30, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are useful for
    serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading '1'
    bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a single
    '0' bit and then by desired number of extended words. Any gaps between
    packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for (L=2,T=2)
    there exist more than 2**16 extended words with desired properties.
    For those interested, 76735 and 83665 extended words, respectively.
    So, desired 16b-to-17b conversions certainly exist for M=4.
    The problem is that I know of no simple "algorithmic" way of
    implementing such conversion short of using big look-up tables. Neither
    in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old and
    outdated?
    Or, better, in books or articles?



    Usually the interest is in limiting length of repeats at any point in
    the data, not just the start and end.  After all, it's not going to help anyone if the start and end of the word are limited to no more than 2 consecutive ones if you have a run of 10 ones in the middle of the word.

    It is also common to want to combine other encoding features at the same time.  Limiting the lengths of both zeros and ones means you have transitions, which are needed for clock recovery and synchronisation.
    For many transmissions, approximately 0 DC balance is highly desirable.
    And you may also want forward error correction or at least error detection.

    AFAIR an 8->10 encoding is used for all gigabit ethernet links, for
    exactly those purposes.

    Terje


    --
    - <Terje.Mathisen at tmsw.no>
    "almost all programming can be viewed as an exercise in caching"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Terje Mathisen on Thu Dec 7 02:11:35 2023
    On Wed, 6 Dec 2023 22:01:31 +0100
    Terje Mathisen <[email protected]> wrote:

    David Brown wrote:
    On 06/12/2023 11:30, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are
    useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words
    to unique (n+1)-bit extended words with at most L continuous
    leading '1' bits, at most T continuous trailing '1' bits and at
    most M=L+T continuous inner '1' bits then we can transfer
    multi-word packets starting with a header consisting of M+1 '1'
    bits followed by a single '0' bit and then by desired number of
    extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and
    L=2,T=3) where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old
    and outdated?
    Or, better, in books or articles?



    Usually the interest is in limiting length of repeats at any point
    in the data, not just the start and end.� After all, it's not going
    to help anyone if the start and end of the word are limited to no
    more than 2 consecutive ones if you have a run of 10 ones in the
    middle of the word.

    It is also common to want to combine other encoding features at the
    same time.� Limiting the lengths of both zeros and ones means you
    have transitions, which are needed for clock recovery and
    synchronisation. For many transmissions, approximately 0 DC balance
    is highly desirable. And you may also want forward error correction
    or at least error detection.

    AFAIR an 8->10 encoding is used for all gigabit ethernet links, for
    exactly those purposes.

    Terje



    In 1GbE world 8B10B is used only for one of several available variants
    of MAC-to-PHY interface (SGMII) and by optical media (all sub-variants
    of 1000BaseX).

    The dominant 1GbE media standard, 1000BASE-T, does not use 8B10B. It
    applies much more complicated 8bit to 12-bit mapping with multiple
    mappings corresponding to each input octet.
    Strictly speaking, classic 8B10B also has multiple (2) mappings for
    some of input octets, but by comparison it is very simple.

    If we are going to believe Wikipedia, another extant copper 1GbE
    variant, 1000BASE-T1, uses 80->81 bit code.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Michael S on Wed Dec 6 19:28:15 2023
    Michael S <[email protected]> writes:

    On Wed, 06 Dec 2023 09:02:49 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    Bit sequences with bounded number of continuous '1' bits are useful
    for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading
    '1' bits, at most T continuous trailing '1' bits and at most M=L+T
    continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a
    single '0' bit and then by desired number of extended words. Any
    gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?

    Please excuse my not following.. I don't quite see what you're
    getting at here. I get that you want to transform an arbitrary
    16-bit value into a 17-bit value that satisfies a certain
    property (such that the 16-bit value can be reconstructed), but I
    don't get what the property is exactly or how a 17-bit value is
    transformed back into a 16-bit value. I do realize that what
    you're looking for is a nice way of turning a 16-bit value into a
    17-bit value (assuming I'm not completely confused, which is
    always a possibility). Can you spell it out for me in a bit more
    detail?

    I thought it was rather obvious but rather obviously it was not :(
    So let's do it again.

    1. Reversibility.
    No two 16-bit input patterns correspond to same 17-bit output pattern.
    2. Leading '1' bits.
    Output patterns that have L+1 or more consecutive '1' MS bits are
    illegal.
    3. Trailing '1' bits.
    Output patterns that have T+1 or more consecutive '1' LS bits are
    illegal.
    4. Inner '1' bits.
    Output patterns that have L+T+1 or more consecutive '1' bits starting
    at any position are illegal.

    (L,T) = either (1,3) or (2,2).

    Okay, I get it now.

    Here is an idea, such as it is. Clunky for software, might
    be okay for hardware.

    Subdivide the input space by looking at the top six bits. There
    are three ranges (with (L,T) = (2,2))

    000000... to 100011... maps to 0xxxxx...
    100100... to 110110... maps to 10xxxx...
    110111... to 111111... maps to 110xxx...

    which are

    36864 values of 47338 code points (77.87 percent)
    19456 values of 24079 code points (80.80 percent)
    9216 values of 12248 code points (75.24 percent)
    ===== =====
    65536 83665

    Continue subdividing recursively, choosing nice value ranges in
    binary to correspond to subdivisions in the output space. After
    the first level of dividing, subdivisions have five output ranges
    to choose from rather than just three (until we reach the end, of
    course). For example, values in the range

    [ 100000.. to 100100.. ) could be mapped to 01110xxx and 011110xxx

    (4096 values of 3169 + 1612 = 4781 code points, 85.67 percent)

    All the above just playing around by hand after the initial idea
    of subdividing based on a small number of leading bits.

    Not sure if this approach is good enough for you. I admit it's
    not pretty. My sense is it's plausible, even if rather ugly.
    And who knows, it may spark a different idea or a better approach.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Tim Rentsch on Thu Dec 7 12:11:45 2023
    On Wed, 06 Dec 2023 19:28:15 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 06 Dec 2023 09:02:49 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    Bit sequences with bounded number of continuous '1' bits are
    useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words
    to unique (n+1)-bit extended words with at most L continuous
    leading '1' bits, at most T continuous trailing '1' bits and at
    most M=L+T continuous inner '1' bits then we can transfer
    multi-word packets starting with a header consisting of M+1 '1'
    bits followed by a single '0' bit and then by desired number of
    extended words. Any gaps between packets are filled with '1'
    bits. Receiver can easily synchronize itself to incoming packets
    and will quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic"
    way of implementing such conversion short of using big look-up
    tables. Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and
    L=2,T=3) where I know many such algorithms.

    Any ideas?

    Please excuse my not following.. I don't quite see what you're
    getting at here. I get that you want to transform an arbitrary
    16-bit value into a 17-bit value that satisfies a certain
    property (such that the 16-bit value can be reconstructed), but I
    don't get what the property is exactly or how a 17-bit value is
    transformed back into a 16-bit value. I do realize that what
    you're looking for is a nice way of turning a 16-bit value into a
    17-bit value (assuming I'm not completely confused, which is
    always a possibility). Can you spell it out for me in a bit more
    detail?

    I thought it was rather obvious but rather obviously it was not :(
    So let's do it again.

    1. Reversibility.
    No two 16-bit input patterns correspond to same 17-bit output
    pattern. 2. Leading '1' bits.
    Output patterns that have L+1 or more consecutive '1' MS bits are
    illegal.
    3. Trailing '1' bits.
    Output patterns that have T+1 or more consecutive '1' LS bits are
    illegal.
    4. Inner '1' bits.
    Output patterns that have L+T+1 or more consecutive '1' bits
    starting at any position are illegal.

    (L,T) = either (1,3) or (2,2).

    Okay, I get it now.

    Here is an idea, such as it is. Clunky for software, might
    be okay for hardware.

    Subdivide the input space by looking at the top six bits. There
    are three ranges (with (L,T) = (2,2))

    000000... to 100011... maps to 0xxxxx...
    100100... to 110110... maps to 10xxxx...
    110111... to 111111... maps to 110xxx...

    which are

    36864 values of 47338 code points (77.87 percent)
    19456 values of 24079 code points (80.80 percent)
    9216 values of 12248 code points (75.24 percent)
    ===== =====
    65536 83665

    Continue subdividing recursively, choosing nice value ranges in
    binary to correspond to subdivisions in the output space. After
    the first level of dividing, subdivisions have five output ranges
    to choose from rather than just three (until we reach the end, of
    course). For example, values in the range

    [ 100000.. to 100100.. ) could be mapped to 01110xxx and 011110xxx

    (4096 values of 3169 + 1612 = 4781 code points, 85.67 percent)

    All the above just playing around by hand after the initial idea
    of subdividing based on a small number of leading bits.

    Not sure if this approach is good enough for you. I admit it's
    not pretty. My sense is it's plausible, even if rather ugly.
    And who knows, it may spark a different idea or a better approach.

    I don't understand a "recursively" part.
    May be, more detailed description of the second stage will help.

    The key question I don't understand is whether boundaries used at
    the stage n+1 depend on the output bits produced at stage n or
    boundaries applied to residue of the input word at given stage are
    always the same.
    If the former, it seems that we end up with the same 64 Kword table
    size as in direct approach.
    If the later, it can be nice. But for that to work at each stage you
    probably want to subdivide to sub-spaces with approximately equal
    numbers of code points. So, may be, 01x, 01x and 1xx at first step?
    Or just 0x and 1x, but then it degenerates into my current
    "conversion to sub-binary non-constant base" algorithm.
    And for my current algorithm I know for sure that "nice value ranges"
    as you put it, can be used only at 5 initial stages. "Non nice" ranges
    are required at stage 6 and later.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to BGB on Thu Dec 7 11:35:16 2023
    On Wed, 6 Dec 2023 16:17:41 -0600
    BGB <[email protected]> wrote:

    On 12/6/2023 4:30 AM, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are useful
    for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words to
    unique (n+1)-bit extended words with at most L continuous leading
    '1' bits, at most T continuous trailing '1' bits and at most M=L+T continuous inner '1' bits then we can transfer multi-word packets
    starting with a header consisting of M+1 '1' bits followed by a
    single '0' bit and then by desired number of extended words. Any
    gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired properties. For those interested, 76735 and 83665 extended words, respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and L=2,T=3)
    where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old
    and outdated?
    Or, better, in books or articles?


    For hardware, 8b/10b looks like a more sane option.


    8B10B is indeed simple, but an overhead is high.
    The size of my packets vary between 4 and 33 16-bit words.
    A protocol that was in use in previous generation of this product was a
    simple one - each 16-bit word is prefixed by a single '0' bit, gaps
    between packets are filled with 1s. This protocol requires at least
    17-bit gaps between packets so maximal packet rate for a given number of
    words N derived as: Rate = 1/(17*N+17).
    With 8B10B the formula is: Rate = 1/(20*N+10).
    One can see that 8B10B is strictly slower than an old simple scheme for
    all possible packet sizes. It happens to be just a little slower (1/85
    vs 1/90) for smallest packets, but quite a lot slower (1/578 vs 1/670)
    for biggest packets. It is also more complicated, esp. on receiver side. Another disadvantage of 8B10B is that after link is established packets
    can be started only on 10-bit boundaries. When the rate of packets is
    not a multiple of 10 clocks (which is most of the time) it adds jitter
    to link delay.

    Conclusion: 8B10B is not an improvement on old solution. In fact, it is
    an opposite of improvement.

    Now, you can ask why I want improvement at all?
    The answer is: mostly because I want the ability to transfer shortest
    packets (4 words) at rate of 1/84 clocks. My old protocol gives 1/85.
    From practical perspective very little is missing and rather simple
    improvement in encoding will do the job. The desire to produce encoded
    streams with at most 4 continuous 1s is rooted in the pursuit for
    theoretical perfection rather than the real-world requirements of
    moment.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Thu Dec 7 13:01:40 2023
    On 06/12/2023 18:43, Michael S wrote:
    On Wed, 6 Dec 2023 16:48:51 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:



    You still refused to read my original post without skipping words.
    In particular you obviously did not read this part "and at most M=L+T continuous inner '1' bits".

    I did not "refuse" to do anything, but now that you have pointed this
    out, I see I misread a couple of things from the first post. Sorry for
    my confusion there, leading to some unnecessary concerns in my post as
    you had already covered the points.


    Anyway, when somewhat higher density of transitions is desirable,
    say, one transition (i.e. logical 1) per 25 or 30 bits, it's
    absolutely trivial to add at no additional coding cost since 76735
    is sufficiently large than 65736.

    Certainly you can make sure you have enough ones in the packet. (I
    would not say it is "trivial" until you have found a simple algorithm
    for coding and decoding to match your requirements - but it is
    unlikely to make the problem any harder.)


    In the mean time I found an algorithm that works. Sort of.
    Modification that causes it to generate no more than 5 leading zeros
    and no more than 16 trailing zeros per extended word is indeed trivial.

    I am not fully satisfied with this algorithm for two reasons:
    A. It's sort of ad hoc. It works for 16->17, but I am not sure that
    similar algorithms are going to work for, say 24->25 or 28->29.
    B. The hardware implementation (in FPGA) is not as compact as I would
    like. There are too many non-trivial constants involved and non-trivial constants consume LEs.


    You are trying to implement this in a small number of LE's in an FPGA,
    right? I'm guessing you are looking at a linear shift register
    arrangement of some kind, since they are amenable to compact FPGA implementations (unlike a big lookup table - which would give a simple
    solution if you could afford the space).

    But it does all sound somewhat arbitrary here. Why do you pick these
    specific numbers? It seems you are not concerned about errors other
    than synchronisation (presumably you handle any required error checking
    or correcting at a higher level). Since you have a common clock source,
    are we talking about a pair of FPGAs on the same board here?

    What sort of packet sizes are you using, and how critical is the
    bandwidth utilisation? Could you not massively simplify it by picking M
    as 16? Each 16 bits of data could be transferred as a frame consisting
    of a zero followed by the original data. Inter-packet gaps will be all
    ones, and you need a minimum gap of 17 ones before a new packet.
    Compared to your earlier suggestions this would increase the minimum gap
    time, but would make the implementation trivial.


    Of course if this is all motivated more by interest and the fun and
    challenge of trying to find such algorithms, rather than practical
    necessity, then I'll stop trying to look at the practicalities and try
    to think more about the coding patterns. Fun and interest is always a
    good reason.



    DC balance is not required in my case. When DC balance is required
    at all, the balance requirements tend to be rather strict. Weak
    balance is rarely of interest.

    Weak balance - a statistical or probabilistic balance - is certainly
    of interest in some systems. PCIe communication, for instance,
    maintains an approximate balance by using a scrambler. Some DC
    "leakage" through the terminators is enough to deal with the
    remainder, if I have understood things correctly.


    PCIe Gen1 and Gen2 apply 8b-10b encoding that guarantees rather strong
    DC balance. As observed on 10T intervals, disparity never exceeds two
    bits. It is possible due too high coding rate of 5/4.

    Yes.

    For later PCIe generations DC balance is indeed non-robust. I suspect
    that hardware here is designed to transmit/receive totally non-balanced streams. With more balanced streams it just works better == has higher
    noise margin.
    But I admit to not learning newer PCIe phys to the same degree I
    learned earlier generations.

    I don't know the details either, but there are higher level error
    checking mechanisms for later PCIe versions. So if you are unlucky
    enough to have data patterns that give too high DC bias, causing errors,
    these will lead to retransmissions. And I would assume the scrambler
    phase would be different even when re-transmitting the same original
    data, avoiding a repeat of the problem. Anyway, your transmission
    always has to cope with /some/ DC bias, even if it is small, because
    it's all analogue and never entirely precise. Leaking away the bias
    through terminators is easy enough - you just don't want to have to
    handle too much, because it costs power and slows the signal.


    I can't say that I really understand why you have the requirements
    you are asking for here - they don't make sense to me. Obviously I
    don't have a full view of your system, but I cannot currently imagine
    a system for which your requirements would be the best way to solve
    the problem.



    I tend to think about coding techniques 'for transparency' as
    intellectual assets. You want to have more of them even in the absence
    of immediate application.

    OK, that makes sense, and I fully agree on that principle.

    My absolute favorite is Consistent Overhead Byte Stuffing a.k.a. COBS.
    It featured in few proposals, but never made it to any major standard. Despite it, it's a work of beauty and I use variants of COBS at work
    rather intensely, typically for communication between parts of our
    embedded systems.

    It is certainly an interesting solution to a common problem. It is a
    long time since I heard about COBS - thank you for reminding me of it!


    BTW, ideas of COBS are applicable among other approaches to avoiding
    5-long bit patterns. Not so much to 4-long bit patterns.
    The difference here is that avoidance of 5-long patterns can be
    attacked by splitting the word into 3-bit and 4-bit symbols and as long
    as there are symbols COBS has a chance.
    For 4-long patterns partitioning to 4-bit symbols, even when
    interleaved with 3-bit symbols, does not work for an obvious reasons.
    And partitioning [of 17-bit extended word] to 3-bit symbols does not
    work for even more obvious reason: 7**5 * 3 < 2**16.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Michael S on Thu Dec 7 15:23:39 2023
    Michael S <[email protected]> writes:
    On Wed, 6 Dec 2023 22:01:31 +0100
    Terje Mathisen <[email protected]> wrote:

    David Brown wrote:
    On 06/12/2023 11:30, Michael S wrote: =20
    Bit sequences with bounded number of continuous '1' bits are
    useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words
    to unique (n+1)-bit extended words with at most L continuous
    leading '1' bits, at most T continuous trailing '1' bits and at
    most M=3DL+T continuous inner '1' bits then we can transfer
    multi-word packets starting with a header consisting of M+1 '1'
    bits followed by a single '0' bit and then by desired number of
    extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=3D16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=3D1,T=3D3) and for
    (L=3D2,T=3D2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=3D4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=3D5 (both L=3D1,T=3D4 and
    L=3D2,T=3D3) where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old
    and outdated?
    Or, better, in books or articles?

    =20
    =20
    Usually the interest is in limiting length of repeats at any point
    in the data, not just the start and end.=A0 After all, it's not going
    to help anyone if the start and end of the word are limited to no
    more than 2 consecutive ones if you have a run of 10 ones in the
    middle of the word.
    =20
    It is also common to want to combine other encoding features at the
    same time.=A0 Limiting the lengths of both zeros and ones means you
    have transitions, which are needed for clock recovery and
    synchronisation. For many transmissions, approximately 0 DC balance
    is highly desirable. And you may also want forward error correction
    or at least error detection. =20
    =20
    AFAIR an 8->10 encoding is used for all gigabit ethernet links, for=20
    exactly those purposes.
    =20
    Terje
    =20
    =20

    In 1GbE world 8B10B is used only for one of several available variants
    of MAC-to-PHY interface (SGMII) and by optical media (all sub-variants
    of 1000BaseX).

    The dominant 1GbE media standard, 1000BASE-T, does not use 8B10B. It
    applies much more complicated 8bit to 12-bit mapping with multiple
    mappings corresponding to each input octet.

    And 10Gbe uses 64b/66b which just requires a two-bit prefix that
    can only have the values 0b01 or 0b10.

    PCIe 3.0 uses 128b/130b.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Thu Dec 7 17:39:01 2023
    On Thu, 7 Dec 2023 13:01:40 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 18:43, Michael S wrote:
    On Wed, 6 Dec 2023 16:48:51 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:



    You still refused to read my original post without skipping words.
    In particular you obviously did not read this part "and at most
    M=L+T continuous inner '1' bits".

    I did not "refuse" to do anything, but now that you have pointed this
    out, I see I misread a couple of things from the first post. Sorry
    for my confusion there, leading to some unnecessary concerns in my
    post as you had already covered the points.


    Anyway, when somewhat higher density of transitions is desirable,
    say, one transition (i.e. logical 1) per 25 or 30 bits, it's
    absolutely trivial to add at no additional coding cost since 76735
    is sufficiently large than 65736.

    Certainly you can make sure you have enough ones in the packet. (I
    would not say it is "trivial" until you have found a simple
    algorithm for coding and decoding to match your requirements - but
    it is unlikely to make the problem any harder.)


    In the mean time I found an algorithm that works. Sort of.
    Modification that causes it to generate no more than 5 leading zeros
    and no more than 16 trailing zeros per extended word is indeed
    trivial.

    I am not fully satisfied with this algorithm for two reasons:
    A. It's sort of ad hoc. It works for 16->17, but I am not sure that
    similar algorithms are going to work for, say 24->25 or 28->29.
    B. The hardware implementation (in FPGA) is not as compact as I
    would like. There are too many non-trivial constants involved and non-trivial constants consume LEs.


    You are trying to implement this in a small number of LE's in an
    FPGA, right? I'm guessing you are looking at a linear shift register arrangement of some kind, since they are amenable to compact FPGA implementations (unlike a big lookup table - which would give a
    simple solution if you could afford the space).


    LFSR that does the work would be great. But I don't think that it is
    even remotely possible. Or, may be, I am not enough of Galois
    algebraist to think about how is it possible.
    So far all my approaches were either logical (chained symbol
    substitution) or arithmetic (conversion to numeric systems with non-power-of-two base). In specific case of L+T=4 all algorithms that
    I tried so far were of later variety.
    FPGAs, except of *very* old ones, are actually quite good at addition
    and subtraction as long as your adder is not too wide. 16-20 bits are
    always o.k.


    But it does all sound somewhat arbitrary here. Why do you pick these specific numbers? It seems you are not concerned about errors other
    than synchronisation (presumably you handle any required error
    checking or correcting at a higher level). Since you have a common
    clock source, are we talking about a pair of FPGAs on the same board
    here?


    Different boards in the same relatively small rack.
    Transmitter size is not important, because there is only one
    transmitter per (data acquisition) board.
    Receiver size is relatively important, because there are many receivers
    per (concentrator) board.


    What sort of packet sizes are you using, and how critical is the
    bandwidth utilisation? Could you not massively simplify it by
    picking M as 16? Each 16 bits of data could be transferred as a
    frame consisting of a zero followed by the original data.
    Inter-packet gaps will be all ones, and you need a minimum gap of 17
    ones before a new packet. Compared to your earlier suggestions this
    would increase the minimum gap time, but would make the
    implementation trivial.


    Of course if this is all motivated more by interest and the fun and
    challenge of trying to find such algorithms, rather than practical
    necessity, then I'll stop trying to look at the practicalities and
    try to think more about the coding patterns. Fun and interest is
    always a good reason.


    Most of your above questions/suggestions answered above in my reply to
    BGB.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to BGB on Thu Dec 7 15:21:21 2023
    BGB <[email protected]> writes:
    On 12/6/2023 4:30 AM, Michael S wrote:

    May be, somebody had seen such methods in patents, preferably old and
    outdated?
    Or, better, in books or articles?


    For hardware, 8b/10b looks like a more sane option.

    Although that has 25% overhead.

    Using 64b/66b reduces the overhead to 3.125%.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stephen Fuld@21:1/5 to Michael S on Thu Dec 7 09:10:40 2023
    On 12/6/2023 4:11 PM, Michael S wrote:
    On Wed, 6 Dec 2023 22:01:31 +0100
    Terje Mathisen <[email protected]> wrote:

    David Brown wrote:
    On 06/12/2023 11:30, Michael S wrote:
    Bit sequences with bounded number of continuous '1' bits are
    useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words
    to unique (n+1)-bit extended words with at most L continuous
    leading '1' bits, at most T continuous trailing '1' bits and at
    most M=L+T continuous inner '1' bits then we can transfer
    multi-word packets starting with a header consisting of M+1 '1'
    bits followed by a single '0' bit and then by desired number of
    extended words. Any gaps between packets are filled with '1' bits.
    Receiver can easily synchronize itself to incoming packets and will
    quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic" way
    of implementing such conversion short of using big look-up tables.
    Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and
    L=2,T=3) where I know many such algorithms.

    Any ideas?
    May be, somebody had seen such methods in patents, preferably old
    and outdated?
    Or, better, in books or articles?



    Usually the interest is in limiting length of repeats at any point
    in the data, not just the start and end.  After all, it's not going
    to help anyone if the start and end of the word are limited to no
    more than 2 consecutive ones if you have a run of 10 ones in the
    middle of the word.

    It is also common to want to combine other encoding features at the
    same time.  Limiting the lengths of both zeros and ones means you
    have transitions, which are needed for clock recovery and
    synchronisation. For many transmissions, approximately 0 DC balance
    is highly desirable. And you may also want forward error correction
    or at least error detection.

    AFAIR an 8->10 encoding is used for all gigabit ethernet links, for
    exactly those purposes.

    Terje



    In 1GbE world 8B10B is used only for one of several available variants
    of MAC-to-PHY interface (SGMII) and by optical media (all sub-variants
    of 1000BaseX).

    The dominant 1GbE media standard, 1000BASE-T, does not use 8B10B. It
    applies much more complicated 8bit to 12-bit mapping with multiple
    mappings corresponding to each input octet.
    Strictly speaking, classic 8B10B also has multiple (2) mappings for
    some of input octets, but by comparison it is very simple.

    If we are going to believe Wikipedia, another extant copper 1GbE
    variant, 1000BASE-T1, uses 80->81 bit code.

    8B/10B is/was also used for lots of serial interconnets, including fibre channel, ESCON, Infiniband, etc.

    But also according to Wikipedia, for 10Gb Ethernet, it has been replaced
    by 64B/66B, which is obviously much more space efficient.

    https://en.wikipedia.org/wiki/64b/66b_encoding

    But the part potentially relevant to the OP is that it uses an algorithm
    called "scrambling" to address the encoding/decoding problem, as a
    look-up table is obviously impractical. Perhaps the scrambling
    algorithm could be adapted for a 16B/17B application. The article
    includes links to descriptions, etc.



    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Stephen Fuld on Thu Dec 7 20:03:56 2023
    On Thu, 7 Dec 2023 09:10:40 -0800
    Stephen Fuld <[email protected]d> wrote:

    But the part potentially relevant to the OP is that it uses an
    algorithm called "scrambling" to address the encoding/decoding
    problem, as a look-up table is obviously impractical. Perhaps the
    scrambling algorithm could be adapted for a 16B/17B application. The
    article includes links to descriptions, etc.


    I don't believe that scrambling could be useful.
    Scrambling is great statistically, much less great when you want to
    guarantee any particular property with absolute certainty.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott Lurndal@21:1/5 to Michael S on Thu Dec 7 18:24:49 2023
    Michael S <[email protected]> writes:
    On Thu, 7 Dec 2023 09:10:40 -0800
    Stephen Fuld <[email protected]d> wrote:

    But the part potentially relevant to the OP is that it uses an
    algorithm called "scrambling" to address the encoding/decoding
    problem, as a look-up table is obviously impractical. Perhaps the
    scrambling algorithm could be adapted for a 16B/17B application. The
    article includes links to descriptions, etc.


    I don't believe that scrambling could be useful.

    10/100G ethernet seems to disagree with that characterization.

    Generally scrambled using a LFSR.

    100G/200G/400G use PAM-4 (four voltage levels).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Scott Lurndal on Thu Dec 7 23:05:56 2023
    On Thu, 07 Dec 2023 18:24:49 GMT
    [email protected] (Scott Lurndal) wrote:

    Michael S <[email protected]> writes:
    On Thu, 7 Dec 2023 09:10:40 -0800
    Stephen Fuld <[email protected]d> wrote:

    But the part potentially relevant to the OP is that it uses an
    algorithm called "scrambling" to address the encoding/decoding
    problem, as a look-up table is obviously impractical. Perhaps the
    scrambling algorithm could be adapted for a 16B/17B application.
    The article includes links to descriptions, etc.


    I don't believe that scrambling could be useful.

    10/100G ethernet seems to disagree with that characterization.


    O.k. Personally for you I'd spell it differently: I don't believe that scrambling could be useful for generation of 16-bit to 17-bit code words
    with strict bound on the number of consecutive '1' bits.

    Generally scrambled using a LFSR.

    100G/200G/400G use PAM-4 (four voltage levels).


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Fri Dec 8 14:28:37 2023
    On 07/12/2023 16:39, Michael S wrote:
    On Thu, 7 Dec 2023 13:01:40 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 18:43, Michael S wrote:
    On Wed, 6 Dec 2023 16:48:51 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:



    You still refused to read my original post without skipping words.
    In particular you obviously did not read this part "and at most
    M=L+T continuous inner '1' bits".

    I did not "refuse" to do anything, but now that you have pointed this
    out, I see I misread a couple of things from the first post. Sorry
    for my confusion there, leading to some unnecessary concerns in my
    post as you had already covered the points.


    Anyway, when somewhat higher density of transitions is desirable,
    say, one transition (i.e. logical 1) per 25 or 30 bits, it's
    absolutely trivial to add at no additional coding cost since 76735
    is sufficiently large than 65736.

    Certainly you can make sure you have enough ones in the packet. (I
    would not say it is "trivial" until you have found a simple
    algorithm for coding and decoding to match your requirements - but
    it is unlikely to make the problem any harder.)


    In the mean time I found an algorithm that works. Sort of.
    Modification that causes it to generate no more than 5 leading zeros
    and no more than 16 trailing zeros per extended word is indeed
    trivial.

    I am not fully satisfied with this algorithm for two reasons:
    A. It's sort of ad hoc. It works for 16->17, but I am not sure that
    similar algorithms are going to work for, say 24->25 or 28->29.
    B. The hardware implementation (in FPGA) is not as compact as I
    would like. There are too many non-trivial constants involved and
    non-trivial constants consume LEs.


    You are trying to implement this in a small number of LE's in an
    FPGA, right? I'm guessing you are looking at a linear shift register
    arrangement of some kind, since they are amenable to compact FPGA
    implementations (unlike a big lookup table - which would give a
    simple solution if you could afford the space).


    LFSR that does the work would be great. But I don't think that it is
    even remotely possible. Or, may be, I am not enough of Galois
    algebraist to think about how is it possible.

    Most of the Galois Field stuff I have done is in connection with
    redundancy and error correction for RAID, rather than this kind of
    thing. My thought is that some kind of scrambler pattern might be able
    to generate the patterns you need, or at least something close as a
    first step. But I have no idea how to look for such encodings other
    than by trial and error.

    So far all my approaches were either logical (chained symbol
    substitution) or arithmetic (conversion to numeric systems with non-power-of-two base). In specific case of L+T=4 all algorithms that
    I tried so far were of later variety.

    I can see how that could work, but I would expect it to be too big for
    what you want in your FPGAs?

    FPGAs, except of *very* old ones, are actually quite good at addition
    and subtraction as long as your adder is not too wide. 16-20 bits are
    always o.k.


    But it does all sound somewhat arbitrary here. Why do you pick these
    specific numbers? It seems you are not concerned about errors other
    than synchronisation (presumably you handle any required error
    checking or correcting at a higher level). Since you have a common
    clock source, are we talking about a pair of FPGAs on the same board
    here?


    Different boards in the same relatively small rack.
    Transmitter size is not important, because there is only one
    transmitter per (data acquisition) board.
    Receiver size is relatively important, because there are many receivers
    per (concentrator) board.


    Interesting - that might allow for alternative ideas. So you'd be okay
    with needing something like lookup tables for the transmission encoding,
    as long as reception decoding can be small? (I had been assuming that
    you have more symmetrical hardware requirements.)


    What sort of packet sizes are you using, and how critical is the
    bandwidth utilisation? Could you not massively simplify it by
    picking M as 16? Each 16 bits of data could be transferred as a
    frame consisting of a zero followed by the original data.
    Inter-packet gaps will be all ones, and you need a minimum gap of 17
    ones before a new packet. Compared to your earlier suggestions this
    would increase the minimum gap time, but would make the
    implementation trivial.


    Of course if this is all motivated more by interest and the fun and
    challenge of trying to find such algorithms, rather than practical
    necessity, then I'll stop trying to look at the practicalities and
    try to think more about the coding patterns. Fun and interest is
    always a good reason.


    Most of your above questions/suggestions answered above in my reply to
    BGB.


    OK.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Michael S on Fri Dec 8 22:10:06 2023
    Michael S <[email protected]> writes:

    On Wed, 06 Dec 2023 19:28:15 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 06 Dec 2023 09:02:49 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    Bit sequences with bounded number of continuous '1' bits are
    useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit words
    to unique (n+1)-bit extended words with at most L continuous
    leading '1' bits, at most T continuous trailing '1' bits and at
    most M=L+T continuous inner '1' bits then we can transfer
    multi-word packets starting with a header consisting of M+1 '1'
    bits followed by a single '0' bit and then by desired number of
    extended words. Any gaps between packets are filled with '1'
    bits. Receiver can easily synchronize itself to incoming packets
    and will quickly and robustly recovers after media errors.
    Such conversions have are other application as well. They always
    have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is of
    particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with desired
    properties. For those interested, 76735 and 83665 extended words,
    respectively. So, desired 16b-to-17b conversions certainly exist
    for M=4. The problem is that I know of no simple "algorithmic"
    way of implementing such conversion short of using big look-up
    tables. Neither in software nor in simple compact hardware.
    That is in strong contrast to case of M=5 (both L=1,T=4 and
    L=2,T=3) where I know many such algorithms.

    Any ideas?

    Please excuse my not following.. I don't quite see what you're
    getting at here. I get that you want to transform an arbitrary
    16-bit value into a 17-bit value that satisfies a certain
    property (such that the 16-bit value can be reconstructed), but I
    don't get what the property is exactly or how a 17-bit value is
    transformed back into a 16-bit value. I do realize that what
    you're looking for is a nice way of turning a 16-bit value into a
    17-bit value (assuming I'm not completely confused, which is
    always a possibility). Can you spell it out for me in a bit more
    detail?

    I thought it was rather obvious but rather obviously it was not :(
    So let's do it again.

    1. Reversibility.
    No two 16-bit input patterns correspond to same 17-bit output
    pattern. 2. Leading '1' bits.
    Output patterns that have L+1 or more consecutive '1' MS bits are
    illegal.
    3. Trailing '1' bits.
    Output patterns that have T+1 or more consecutive '1' LS bits are
    illegal.
    4. Inner '1' bits.
    Output patterns that have L+T+1 or more consecutive '1' bits
    starting at any position are illegal.

    (L,T) = either (1,3) or (2,2).

    Okay, I get it now.

    Here is an idea, such as it is. Clunky for software, might
    be okay for hardware.

    Subdivide the input space by looking at the top six bits. There
    are three ranges (with (L,T) = (2,2))

    000000... to 100011... maps to 0xxxxx...
    100100... to 110110... maps to 10xxxx...
    110111... to 111111... maps to 110xxx...

    which are

    36864 values of 47338 code points (77.87 percent)
    19456 values of 24079 code points (80.80 percent)
    9216 values of 12248 code points (75.24 percent)
    ===== =====
    65536 83665

    Continue subdividing recursively, choosing nice value ranges in
    binary to correspond to subdivisions in the output space. After
    the first level of dividing, subdivisions have five output ranges
    to choose from rather than just three (until we reach the end, of
    course). For example, values in the range

    [ 100000.. to 100100.. ) could be mapped to 01110xxx and 011110xxx

    (4096 values of 3169 + 1612 = 4781 code points, 85.67 percent)

    All the above just playing around by hand after the initial idea
    of subdividing based on a small number of leading bits.

    Not sure if this approach is good enough for you. I admit it's
    not pretty. My sense is it's plausible, even if rather ugly.
    And who knows, it may spark a different idea or a better approach.

    I don't understand a "recursively" part.
    May be, more detailed description of the second stage will help.

    Probably it won't, but here goes. :)

    First level decomposition (values on left, codes on the right). In all
    cases values are the lower end of a range.

    0 000 00x xxx xxx xxx 0................
    1 001 00x xxx xxx xxx 10...............
    1 101 11x xxx xxx xxx 110..............

    We give an example of decomposing the third range. The third range has
    14 code bits remaining, which gives 12248 codes available for use.
    The 12248 codes are subdivided as follows

    0............. 6230
    10............ 3169
    110........... 1612
    1110.......... 820
    11110......... 417

    Second level decomposition of the third range above. First we use the
    last two prefixes to take care of the odd-shaped portion up to a value
    point of 1 110 000 000 000 000

    1 101 11x xxx xxx xxx 1024 values with 1237 = 820+417 codes
    0 000 xxx xxx 110 1110... 704 values of 820 codes
    1 011 xxx xxx 110 11110.. 320 values of 417 codes

    (1 110 000 000 000 000 boundary)

    After nibbling off the odd-shaped portion, we are left with 8192 values:

    1 11x xxx xxx xxx xxx

    We subdivide these 8192 values among the first three prefixes

    1 11x xxx xxx xxx xxx
    0 xxx xxx xxx xxx 110 0...... 4096+512 = 4608 : 6230 codes
    1 001 xxx xxx xxx 110 10..... 2048+512 = 2560 : 3169
    1 11x xxx xxx xxx 110 110.... 1024 = 1024 : 1622

    (10 000 000 000 000 000 boundary)

    The principle being used here is to subdivide the value range along
    lines that are determined by high order bit positions, chosen so that
    each subrange maps onto a code point range that is roughly 4/3 the size
    of the value range being mapped.

    The key question I don't understand is whether boundaries used at
    the stage n+1 depend on the output bits produced at stage n or
    boundaries applied to residue of the input word at given stage are
    always the same.
    If the former, it seems that we end up with the same 64 Kword table
    size as in direct approach.

    There is dependence but it's of a different kind. What happens
    at stage n+1 depends on the upstream value bits. The tests
    are done at fixed bit positions, and the results combined by
    logic so there is a lot of sharing. I think the growth is
    more like linear than exponential.

    If the later, it can be nice. But for that to work at each stage you probably want to subdivide to sub-spaces with approximately equal
    numbers of code points.

    My approach is to divide the value space into ranges whose sizes
    are roughly proportional to the set of code points for the code
    prefix used for that set of values.

    So, may be, 01x, 01x and 1xx at first step?
    Or just 0x and 1x, but then it degenerates into my current
    "conversion to sub-binary non-constant base" algorithm.
    And for my current algorithm I know for sure that "nice value ranges"
    as you put it, can be used only at 5 initial stages. "Non nice" ranges
    are required at stage 6 and later.

    Probably we mean different things by "nice value range". The approach
    I've been describing is not iterative, it's cascading logic.

    Now that I have investigated the code space more carefully I understand
    the problem a lot better. I think it's easy to construct a solution,
    but the challenge is to construct a "good" solution. I independently
    developed an iterative approach similar to your table-driven algorithm
    shown in another posting, and it was pretty straightforward (and also generalizes without difficulty to different numbers of bits). Although
    it was easy to produce, it's not as compact as your table-driven
    algorithm. The scheme I have outlined above takes a different approach,
    and looks much better suited to hardware then software.

    If any of the above is confusing, probably it's best to just forget
    about it. I suspect the qualities of a "good" solution depend a lot on
    aspects of the application that are outside the given problem statement.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Tim Rentsch on Sun Dec 10 14:50:01 2023
    On Fri, 08 Dec 2023 22:10:06 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 06 Dec 2023 19:28:15 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    On Wed, 06 Dec 2023 09:02:49 -0800
    Tim Rentsch <[email protected]> wrote:

    Michael S <[email protected]> writes:

    Bit sequences with bounded number of continuous '1' bits are
    useful for serialization of packetized data.

    For example, if we have a way of converting arbitrary n-bit
    words to unique (n+1)-bit extended words with at most L
    continuous leading '1' bits, at most T continuous trailing '1'
    bits and at most M=L+T continuous inner '1' bits then we can
    transfer multi-word packets starting with a header consisting
    of M+1 '1' bits followed by a single '0' bit and then by
    desired number of extended words. Any gaps between packets are
    filled with '1' bits. Receiver can easily synchronize itself
    to incoming packets and will quickly and robustly recovers
    after media errors. Such conversions have are other application
    as well. They always have.

    For me, a case of n=16 (16-bit words, 17-bit extended words) is
    of particular practical interest.

    Brute force counting proves that both for (L=1,T=3) and for
    (L=2,T=2) there exist more than 2**16 extended words with
    desired properties. For those interested, 76735 and 83665
    extended words, respectively. So, desired 16b-to-17b
    conversions certainly exist for M=4. The problem is that I
    know of no simple "algorithmic" way of implementing such
    conversion short of using big look-up tables. Neither in
    software nor in simple compact hardware. That is in strong
    contrast to case of M=5 (both L=1,T=4 and L=2,T=3) where I know
    many such algorithms.

    Any ideas?

    Please excuse my not following.. I don't quite see what you're
    getting at here. I get that you want to transform an arbitrary
    16-bit value into a 17-bit value that satisfies a certain
    property (such that the 16-bit value can be reconstructed), but I
    don't get what the property is exactly or how a 17-bit value is
    transformed back into a 16-bit value. I do realize that what
    you're looking for is a nice way of turning a 16-bit value into a
    17-bit value (assuming I'm not completely confused, which is
    always a possibility). Can you spell it out for me in a bit more
    detail?

    I thought it was rather obvious but rather obviously it was not :(
    So let's do it again.

    1. Reversibility.
    No two 16-bit input patterns correspond to same 17-bit output
    pattern. 2. Leading '1' bits.
    Output patterns that have L+1 or more consecutive '1' MS bits are
    illegal.
    3. Trailing '1' bits.
    Output patterns that have T+1 or more consecutive '1' LS bits are
    illegal.
    4. Inner '1' bits.
    Output patterns that have L+T+1 or more consecutive '1' bits
    starting at any position are illegal.

    (L,T) = either (1,3) or (2,2).

    Okay, I get it now.

    Here is an idea, such as it is. Clunky for software, might
    be okay for hardware.

    Subdivide the input space by looking at the top six bits. There
    are three ranges (with (L,T) = (2,2))

    000000... to 100011... maps to 0xxxxx...
    100100... to 110110... maps to 10xxxx...
    110111... to 111111... maps to 110xxx...

    which are

    36864 values of 47338 code points (77.87 percent)
    19456 values of 24079 code points (80.80 percent)
    9216 values of 12248 code points (75.24 percent)
    ===== =====
    65536 83665

    Continue subdividing recursively, choosing nice value ranges in
    binary to correspond to subdivisions in the output space. After
    the first level of dividing, subdivisions have five output ranges
    to choose from rather than just three (until we reach the end, of
    course). For example, values in the range

    [ 100000.. to 100100.. ) could be mapped to 01110xxx and
    011110xxx

    (4096 values of 3169 + 1612 = 4781 code points, 85.67 percent)

    All the above just playing around by hand after the initial idea
    of subdividing based on a small number of leading bits.

    Not sure if this approach is good enough for you. I admit it's
    not pretty. My sense is it's plausible, even if rather ugly.
    And who knows, it may spark a different idea or a better approach.


    I don't understand a "recursively" part.
    May be, more detailed description of the second stage will help.

    Probably it won't, but here goes. :)

    First level decomposition (values on left, codes on the right). In
    all cases values are the lower end of a range.

    0 000 00x xxx xxx xxx 0................
    1 001 00x xxx xxx xxx 10...............
    1 101 11x xxx xxx xxx 110..............

    We give an example of decomposing the third range. The third range
    has 14 code bits remaining, which gives 12248 codes available for use.
    The 12248 codes are subdivided as follows

    0............. 6230
    10............ 3169
    110........... 1612
    1110.......... 820
    11110......... 417

    Second level decomposition of the third range above. First we use the
    last two prefixes to take care of the odd-shaped portion up to a value
    point of 1 110 000 000 000 000

    1 101 11x xxx xxx xxx 1024 values with 1237 = 820+417 codes
    0 000 xxx xxx 110 1110... 704 values of 820 codes
    1 011 xxx xxx 110 11110.. 320 values of 417 codes

    (1 110 000 000 000 000 boundary)

    After nibbling off the odd-shaped portion, we are left with 8192
    values:

    1 11x xxx xxx xxx xxx

    We subdivide these 8192 values among the first three prefixes

    1 11x xxx xxx xxx xxx
    0 xxx xxx xxx xxx 110 0...... 4096+512 = 4608 : 6230 codes
    1 001 xxx xxx xxx 110 10..... 2048+512 = 2560 : 3169
    1 11x xxx xxx xxx 110 110.... 1024 = 1024 : 1622

    (10 000 000 000 000 000 boundary)

    The principle being used here is to subdivide the value range along
    lines that are determined by high order bit positions, chosen so that
    each subrange maps onto a code point range that is roughly 4/3 the
    size of the value range being mapped.

    The key question I don't understand is whether boundaries used at
    the stage n+1 depend on the output bits produced at stage n or
    boundaries applied to residue of the input word at given stage are
    always the same.
    If the former, it seems that we end up with the same 64 Kword table
    size as in direct approach.

    There is dependence but it's of a different kind. What happens
    at stage n+1 depends on the upstream value bits. The tests
    are done at fixed bit positions, and the results combined by
    logic so there is a lot of sharing. I think the growth is
    more like linear than exponential.

    If the later, it can be nice. But for that to work at each stage
    you probably want to subdivide to sub-spaces with approximately
    equal numbers of code points.

    My approach is to divide the value space into ranges whose sizes
    are roughly proportional to the set of code points for the code
    prefix used for that set of values.

    So, may be, 01x, 01x and 1xx at first step?
    Or just 0x and 1x, but then it degenerates into my current
    "conversion to sub-binary non-constant base" algorithm.
    And for my current algorithm I know for sure that "nice value
    ranges" as you put it, can be used only at 5 initial stages. "Non
    nice" ranges are required at stage 6 and later.

    Probably we mean different things by "nice value range". The approach
    I've been describing is not iterative, it's cascading logic.


    May be.
    For me "nice value range" means that all values can be represented
    as M(i)*2**E(i) where min(M(i)) is small.
    At the end I looked for such nice values with brute force search.
    It turned out that there exist multiple sets that are nice enough for
    my purposes i.e. enable implementation of de-serializer that
    together in additional logic fits in 50 LEs.
    And finding it only took ~3 minutes of compute time on a single core of
    old PC.
    That's my current and likely final algorithm.

    static const unsigned bv_tab[17] = {
    46, 47, 48, 49,
    50, 51, 52, 53,
    54, 55, 56, 56,
    56, 64, 64, 64,
    64, };

    // encode 16->17
    unsigned enc(unsigned x)
    {
    if (x < (1<<12))
    x += 0x10000; // avoid long strings of zeros

    // convert x to non-binary base
    unsigned y = 0;
    for (int i = 0; i < 17; ++i) {
    unsigned bv = bv_tab[i]*1024;
    y += y;
    if (x >= bv) {
    x -= bv;
    y += 1;
    }
    x += x;
    }
    return y & 0x1FFFF;
    }

    // decode 17->16
    unsigned dec(unsigned x)
    {
    // convert x from non-binary base to binary
    unsigned y = 0;
    for (int i = 0; i < 17; ++i) {
    y += y;
    if (x & 0x10000)
    y += bv_tab[i];
    x += x;
    }
    return (y >> 6) & 0xFFFF;
    }


    Now that I have investigated the code space more carefully I
    understand the problem a lot better. I think it's easy to construct
    a solution, but the challenge is to construct a "good" solution. I independently developed an iterative approach similar to your
    table-driven algorithm shown in another posting, and it was pretty straightforward (and also generalizes without difficulty to different
    numbers of bits). Although it was easy to produce, it's not as
    compact as your table-driven algorithm. The scheme I have outlined
    above takes a different approach, and looks much better suited to
    hardware then software.

    If any of the above is confusing, probably it's best to just forget
    about it. I suspect the qualities of a "good" solution depend a lot
    on aspects of the application that are outside the given problem
    statement.

    Yes, that sounds true.
    For example, good solution that encodes/decodes one word per clock is
    probably quite different from my algorithm above that is oriented
    toward producing/consuming one bit per clock.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Sun Dec 10 15:19:35 2023
    On Fri, 8 Dec 2023 14:28:37 +0100
    David Brown <[email protected]> wrote:

    On 07/12/2023 16:39, Michael S wrote:
    On Thu, 7 Dec 2023 13:01:40 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 18:43, Michael S wrote:
    On Wed, 6 Dec 2023 16:48:51 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:



    You still refused to read my original post without skipping words.
    In particular you obviously did not read this part "and at most
    M=L+T continuous inner '1' bits".

    I did not "refuse" to do anything, but now that you have pointed
    this out, I see I misread a couple of things from the first post.
    Sorry for my confusion there, leading to some unnecessary concerns
    in my post as you had already covered the points.


    Anyway, when somewhat higher density of transitions is
    desirable, say, one transition (i.e. logical 1) per 25 or 30
    bits, it's absolutely trivial to add at no additional coding
    cost since 76735 is sufficiently large than 65736.

    Certainly you can make sure you have enough ones in the packet.
    (I would not say it is "trivial" until you have found a simple
    algorithm for coding and decoding to match your requirements -
    but it is unlikely to make the problem any harder.)


    In the mean time I found an algorithm that works. Sort of.
    Modification that causes it to generate no more than 5 leading
    zeros and no more than 16 trailing zeros per extended word is
    indeed trivial.

    I am not fully satisfied with this algorithm for two reasons:
    A. It's sort of ad hoc. It works for 16->17, but I am not sure
    that similar algorithms are going to work for, say 24->25 or
    29. B. The hardware implementation (in FPGA) is not as
    compact as I would like. There are too many non-trivial constants
    involved and non-trivial constants consume LEs.


    You are trying to implement this in a small number of LE's in an
    FPGA, right? I'm guessing you are looking at a linear shift
    register arrangement of some kind, since they are amenable to
    compact FPGA implementations (unlike a big lookup table - which
    would give a simple solution if you could afford the space).


    LFSR that does the work would be great. But I don't think that it is
    even remotely possible. Or, may be, I am not enough of Galois
    algebraist to think about how is it possible.

    Most of the Galois Field stuff I have done is in connection with
    redundancy and error correction for RAID, rather than this kind of
    thing. My thought is that some kind of scrambler pattern might be
    able to generate the patterns you need, or at least something close
    as a first step. But I have no idea how to look for such encodings
    other than by trial and error.


    And suppose that by trial and error I found a polynomial that does a
    decent job. Now how I am going to do a reverse transform? I mean,
    without having algebraic understanding? It sounds like much harder task
    than finding a scrambler :(

    So far all my approaches were either logical (chained symbol
    substitution) or arithmetic (conversion to numeric systems with non-power-of-two base). In specific case of L+T=4 all algorithms
    that I tried so far were of later variety.

    I can see how that could work, but I would expect it to be too big
    for what you want in your FPGAs?


    A logical cores of circuits that do such transforms at rate one bit
    per clock are rather small. Converter from binary to sub-binary base
    is very similar to integer divider and reverse converter is very
    similar to integer multiplier. Both of them belong to class of
    shift-and-add algorithms.
    The difference from DIV/MUL is that in DIV/MUL one of the arguments of algorithm remains in register where in Numeric System transforms this
    argument is delivered from look-up table. Compactness of both circuits
    in FPGA depends mainly on depth and width of look up tables.

    FPGAs, except of *very* old ones, are actually quite good at
    addition and subtraction as long as your adder is not too wide.
    16-20 bits are always o.k.


    But it does all sound somewhat arbitrary here. Why do you pick
    these specific numbers? It seems you are not concerned about
    errors other than synchronisation (presumably you handle any
    required error checking or correcting at a higher level). Since
    you have a common clock source, are we talking about a pair of
    FPGAs on the same board here?


    Different boards in the same relatively small rack.
    Transmitter size is not important, because there is only one
    transmitter per (data acquisition) board.
    Receiver size is relatively important, because there are many
    receivers per (concentrator) board.


    Interesting - that might allow for alternative ideas. So you'd be
    okay with needing something like lookup tables for the transmission
    encoding, as long as reception decoding can be small? (I had been
    assuming that you have more symmetrical hardware requirements.)


    I guess I would be ok with that.
    But by now I found a solution that is sufficiently compact on both
    sides.


    What sort of packet sizes are you using, and how critical is the
    bandwidth utilisation? Could you not massively simplify it by
    picking M as 16? Each 16 bits of data could be transferred as a
    frame consisting of a zero followed by the original data.
    Inter-packet gaps will be all ones, and you need a minimum gap of
    17 ones before a new packet. Compared to your earlier suggestions
    this would increase the minimum gap time, but would make the
    implementation trivial.


    Of course if this is all motivated more by interest and the fun and
    challenge of trying to find such algorithms, rather than practical
    necessity, then I'll stop trying to look at the practicalities and
    try to think more about the coding patterns. Fun and interest is
    always a good reason.


    Most of your above questions/suggestions answered above in my reply
    to BGB.


    OK.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Michael S on Sun Dec 10 10:46:02 2023
    Michael S <[email protected]> writes:

    On Fri, 08 Dec 2023 22:10:06 -0800
    Tim Rentsch <[email protected]> wrote:

    [...]

    Probably we mean different things by "nice value range". The
    approach I've been describing is not iterative, it's cascading
    logic.

    May be.
    For me "nice value range" means that all values can be represented
    as M(i)*2**E(i) where min(M(i)) is small.
    At the end I looked for such nice values with brute force search.
    It turned out that there exist multiple sets that are nice enough
    for my purposes i.e. enable implementation of de-serializer that
    together in additional logic fits in 50 LEs.
    And finding it only took ~3 minutes of compute time on a single
    core of old PC.
    That's my current and likely final algorithm.

    static const unsigned bv_tab[17] = {
    46, 47, 48, 49,
    50, 51, 52, 53,
    54, 55, 56, 56,
    56, 64, 64, 64,
    64, };

    // encode 16->17
    unsigned enc(unsigned x)
    {
    if (x < (1<<12))
    x += 0x10000; // avoid long strings of zeros

    // convert x to non-binary base
    unsigned y = 0;
    for (int i = 0; i < 17; ++i) {
    unsigned bv = bv_tab[i]*1024;
    y += y;
    if (x >= bv) {
    x -= bv;
    y += 1;
    }
    x += x;
    }
    return y & 0x1FFFF;
    }

    // decode 17->16
    unsigned dec(unsigned x)
    {
    // convert x from non-binary base to binary
    unsigned y = 0;
    for (int i = 0; i < 17; ++i) {
    y += y;
    if (x & 0x10000)
    y += bv_tab[i];
    x += x;
    }
    return (y >> 6) & 0xFFFF;
    }

    This scheme certainly looks nicer than the idea I was describing.
    Excellent!

    The regularity of values in bv_tab[] makes it possible to get rid
    of the array entirely. Here is a version of the encode function
    that produces identical values to enc() above. Please ignore
    minor differences in coding style; the important things to focus
    on are the variables 'm' and 'd', with successive values of 'm'
    taking on values of elements from the bv_tab[] array.

    unsigned
    encode_without_bvtab( unsigned x0 ){
    unsigned x = x0 < 1<<12 ? x0 + 0x10000 : x0;
    unsigned r = 0;
    unsigned m = 46;
    unsigned d = 011777; // note that this value is in octal

    for( int i = 0; i < 17; i++, x += x, d >>= 1 ){
    r <<= 1;
    if( x >= m*1024 ) x -= m*1024, r |= 1;
    m += (d&1) + (d==1)*7;
    }

    return r;
    }

    A similar transformation can be made to the dec() decoding function.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Michael S on Mon Dec 11 08:59:49 2023
    On 10/12/2023 14:19, Michael S wrote:
    On Fri, 8 Dec 2023 14:28:37 +0100
    David Brown <[email protected]> wrote:

    On 07/12/2023 16:39, Michael S wrote:
    On Thu, 7 Dec 2023 13:01:40 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 18:43, Michael S wrote:
    On Wed, 6 Dec 2023 16:48:51 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:




    LFSR that does the work would be great. But I don't think that it is
    even remotely possible. Or, may be, I am not enough of Galois
    algebraist to think about how is it possible.

    Most of the Galois Field stuff I have done is in connection with
    redundancy and error correction for RAID, rather than this kind of
    thing. My thought is that some kind of scrambler pattern might be
    able to generate the patterns you need, or at least something close
    as a first step. But I have no idea how to look for such encodings
    other than by trial and error.


    And suppose that by trial and error I found a polynomial that does a
    decent job. Now how I am going to do a reverse transform? I mean,
    without having algebraic understanding? It sounds like much harder task
    than finding a scrambler :(

    This is not something I have done myself, so I might be getting things
    very wrong. But with a digital scrambler, you are xor'ing the data
    sequence with a pseudo-random generated sequence, giving a result that
    is statistically noise with very close to 50% one/zero ratio.

    To get the data back, you simply scramble again with the same
    pseudo-random sequence. As long as your data and pseudo-random
    sequences are synchronised, it's all easy - scrambling and unscrambling
    are the same thing.

    At least, that's the theory as far as I understand it.

    (Real data is always biased in its original form - typically far more
    zeros than ones. Though if the data is encrypted or compressed, it will
    be much more like unbiased noise.)

    None of this helps find a scrambling algorithm or polynomial that meets
    your needs, but I /think/ it means it would work well if you found such
    a scrambler.


    So far all my approaches were either logical (chained symbol
    substitution) or arithmetic (conversion to numeric systems with
    non-power-of-two base). In specific case of L+T=4 all algorithms
    that I tried so far were of later variety.

    I can see how that could work, but I would expect it to be too big
    for what you want in your FPGAs?


    A logical cores of circuits that do such transforms at rate one bit
    per clock are rather small. Converter from binary to sub-binary base
    is very similar to integer divider and reverse converter is very
    similar to integer multiplier. Both of them belong to class of
    shift-and-add algorithms.

    I keep thinking of division as being big and/or slow (in comparison to multiplication, which is often supported single-cycle in hardware blocks
    of FPGAs) but of course when you don't need more than one bit out per
    clock, it's not bad that bad.

    The difference from DIV/MUL is that in DIV/MUL one of the arguments of algorithm remains in register where in Numeric System transforms this argument is delivered from look-up table. Compactness of both circuits
    in FPGA depends mainly on depth and width of look up tables.

    FPGAs, except of *very* old ones, are actually quite good at
    addition and subtraction as long as your adder is not too wide.
    16-20 bits are always o.k.


    But it does all sound somewhat arbitrary here. Why do you pick
    these specific numbers? It seems you are not concerned about
    errors other than synchronisation (presumably you handle any
    required error checking or correcting at a higher level). Since
    you have a common clock source, are we talking about a pair of
    FPGAs on the same board here?


    Different boards in the same relatively small rack.
    Transmitter size is not important, because there is only one
    transmitter per (data acquisition) board.
    Receiver size is relatively important, because there are many
    receivers per (concentrator) board.


    Interesting - that might allow for alternative ideas. So you'd be
    okay with needing something like lookup tables for the transmission
    encoding, as long as reception decoding can be small? (I had been
    assuming that you have more symmetrical hardware requirements.)


    I guess I would be ok with that.
    But by now I found a solution that is sufficiently compact on both
    sides.

    Great! Everything beyond that is just for fun :-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to David Brown on Tue Dec 12 00:36:02 2023
    On Mon, 11 Dec 2023 08:59:49 +0100
    David Brown <[email protected]> wrote:

    On 10/12/2023 14:19, Michael S wrote:
    On Fri, 8 Dec 2023 14:28:37 +0100
    David Brown <[email protected]> wrote:

    On 07/12/2023 16:39, Michael S wrote:
    On Thu, 7 Dec 2023 13:01:40 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 18:43, Michael S wrote:
    On Wed, 6 Dec 2023 16:48:51 +0100
    David Brown <[email protected]> wrote:

    On 06/12/2023 14:41, Michael S wrote:
    On Wed, 6 Dec 2023 13:06:29 +0100
    David Brown <[email protected]> wrote:




    LFSR that does the work would be great. But I don't think that it
    is even remotely possible. Or, may be, I am not enough of Galois
    algebraist to think about how is it possible.

    Most of the Galois Field stuff I have done is in connection with
    redundancy and error correction for RAID, rather than this kind of
    thing. My thought is that some kind of scrambler pattern might be
    able to generate the patterns you need, or at least something close
    as a first step. But I have no idea how to look for such encodings
    other than by trial and error.


    And suppose that by trial and error I found a polynomial that does a
    decent job. Now how I am going to do a reverse transform? I mean,
    without having algebraic understanding? It sounds like much harder
    task than finding a scrambler :(

    This is not something I have done myself, so I might be getting
    things very wrong. But with a digital scrambler, you are xor'ing the
    data sequence with a pseudo-random generated sequence, giving a
    result that is statistically noise with very close to 50% one/zero
    ratio.

    To get the data back, you simply scramble again with the same
    pseudo-random sequence. As long as your data and pseudo-random
    sequences are synchronised, it's all easy - scrambling and
    unscrambling are the same thing.

    At least, that's the theory as far as I understand it.

    (Real data is always biased in its original form - typically far more
    zeros than ones. Though if the data is encrypted or compressed, it
    will be much more like unbiased noise.)

    None of this helps find a scrambling algorithm or polynomial that
    meets your needs, but I /think/ it means it would work well if you
    found such a scrambler.


    Scrambler of this type is guaranteed to not work at all for my purpose.

    All it achieves is making output string to look like random sequence.
    And random sequence has approximately 35% probability to contains 4
    consecutive '1' bits per 16-bit or 17-bit word. Way above 50%
    probability for my smallest 64-bit packet.
    Scrambler like thhat can not even take advantage of additional bit that
    I am willing to add per every 16 bits.
    It is believable that scramblers of this sort are great for reducing probability of *long* bursts of 1s (or zeros) down to insignificance.
    Say, bursts of 40 repetitions. Or, at least, of 25 repetitions. But not
    of 4 or 5 or 10. Not even of 16.




    So far all my approaches were either logical (chained symbol
    substitution) or arithmetic (conversion to numeric systems with
    non-power-of-two base). In specific case of L+T=4 all algorithms
    that I tried so far were of later variety.

    I can see how that could work, but I would expect it to be too big
    for what you want in your FPGAs?


    A logical cores of circuits that do such transforms at rate one bit
    per clock are rather small. Converter from binary to sub-binary base
    is very similar to integer divider and reverse converter is very
    similar to integer multiplier. Both of them belong to class of shift-and-add algorithms.

    I keep thinking of division as being big and/or slow (in comparison
    to multiplication, which is often supported single-cycle in hardware
    blocks of FPGAs) but of course when you don't need more than one bit
    out per clock, it's not bad that bad.


    In FPGA, at 1 bit per clock and without using HW multipliers (which are
    of little use for 1 bit per clock, anyway) I am not sure at all that
    multiplier would be smaller than divider. More likely, divider would be smaller, but a little (few percents) slower in terms of achievable
    clock frequency.

    The difference from DIV/MUL is that in DIV/MUL one of the arguments
    of algorithm remains in register where in Numeric System transforms
    this argument is delivered from look-up table. Compactness of both
    circuits in FPGA depends mainly on depth and width of look up
    tables.
    FPGAs, except of *very* old ones, are actually quite good at
    addition and subtraction as long as your adder is not too wide.
    16-20 bits are always o.k.


    But it does all sound somewhat arbitrary here. Why do you pick
    these specific numbers? It seems you are not concerned about
    errors other than synchronisation (presumably you handle any
    required error checking or correcting at a higher level). Since
    you have a common clock source, are we talking about a pair of
    FPGAs on the same board here?


    Different boards in the same relatively small rack.
    Transmitter size is not important, because there is only one
    transmitter per (data acquisition) board.
    Receiver size is relatively important, because there are many
    receivers per (concentrator) board.


    Interesting - that might allow for alternative ideas. So you'd be
    okay with needing something like lookup tables for the transmission
    encoding, as long as reception decoding can be small? (I had been
    assuming that you have more symmetrical hardware requirements.)


    I guess I would be ok with that.
    But by now I found a solution that is sufficiently compact on both
    sides.

    Great! Everything beyond that is just for fun :-)



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)