block by kiddphunk 6051900

word bricks: a method for visualizing n-grams

Figure 1
Displaying all/duplicated n-grams between words and words, using the default/'separate by type' layout. The n-grams are rendered with no tails/tails and not offset/offset, and use the color palette named 'pimppastel' (chroma only). Cell height represents n-gram length/count. Mutation is off/on (mutation generates progressively altered variations, one every two seconds). Reload to generate new palette colors.

   <center><b>Enter your own text</b> to visualize below:</center>
   <textarea id="content1"></textarea>
The height multiplier is <span data-var="heightScale" class="TKAdjustableNumber" data-min="1" data-max="100"></span>
 and the offset multiplier is <span data-var="yScale"      class="TKAdjustableNumber" data-min="0" data-max="100"></span>       
</div>
Displaying all/duplicated n-grams between words and words, using the default/'separate by type' layout. The n-grams are rendered with no tails/tails and not offset/offset, and use the color palette from d3/chroma named 'pimppastel' (chroma only).
Enter your own text to visualize below:

update2 -

Something about radial sets.

Abstract

Word bricks illustrate n-grams so that patterns can be matched visually, as an alternative or complement to inner arc joinings.

introduction

encodings

position, size, shape, and color

n-gram
I have a dream length


I  have  a  dream    4
I  have  a           3
   have  a  dream
I  have              2 ↑
   have  a
         a  dream
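
The rows above can be reproduced with a few lines of JavaScript. This is only a minimal sketch, not the code used by the page: the function name `extractNgrams` and the `{ start, length, words }` record shape are assumptions made for illustration, and the text is split on whitespace with no further tokenizing.

```javascript
// Hypothetical helper (not from the original source): list every n-gram
// of a text whose length is between minLen and maxLen words, longest first.
function extractNgrams(text, minLen, maxLen) {
  // Split on runs of whitespace and drop empty tokens.
  const words = text.trim().split(/\s+/).filter(Boolean);
  const ngrams = [];
  for (let n = maxLen; n >= minLen; n--) {
    // Slide a window of n words across the text.
    for (let start = 0; start + n <= words.length; start++) {
      ngrams.push({ start, length: n, words: words.slice(start, start + n) });
    }
  }
  return ngrams;
}

const ngrams = extractNgrams("I have a dream", 2, 4);
for (const g of ngrams) {
  // One line per n-gram: its length, its starting word index, and its words.
  console.log(g.length, g.start, g.words.join(" "));
}
```

The `start` field records the word position where each n-gram begins, which is what places "have a dream" one column to the right of "I have a" in the table.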

color

Color can be used to encode many possible attributes; below I focus on coloring the cells of an n-gram as a whole, based on the parts-of-speech. The part-of-speech analysis comes from the Natural Language Toolkit.

fill

While many attributes can be encoded in the color dimension, I have generally colored each word by its part-of-speech (POS), using the analysis of the Natural Language Toolkit. Assuming we are encoding by POS, one cou

In the absence of a toolkit to determine a word’s POS (as with the interactive examples on this page), a simple indexing into the chosen palette can be utilized.
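
One way to read "a simple indexing into the chosen palette" is to wrap an item's index around the palette length. The sketch below assumes exactly that rule; the function name `colorForIndex` is hypothetical, and the three sample colors are the hex values that appear in the table markup on this page, not necessarily a complete palette.

```javascript
// Sample colors copied from the table markup on this page; a real palette
// would come from whichever palette generator is selected.
const palette = ["#7affaa", "#fda9a9", "#d6ff0b"];

// Assumed rule (not confirmed by the source): item i takes palette entry i,
// wrapping around when there are more items than colors.
function colorForIndex(index, colors) {
  return colors[index % colors.length];
}

// Color six n-grams in order of appearance.
const fills = [];
for (let i = 0; i < 6; i++) {
  fills.push(colorForIndex(i, palette));
}
console.log(fills.join(" "));
```

With only three colors the fills repeat every three items, so a longer palette is needed if every n-gram should stay distinguishable.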

opacity

Opacity could encode an n-gram's count.

PRP  VBP   DT  NN       Part-of-Speech
I    have  a   dream


</table>
<table>
    <tr class='a'>
        <td style='color:black; background-color:#7affaa' class="a some">I</td>
        <td style='color:black; background-color:#7affaa' class="a some">have</td>
        <td style='color:black; background-color:#7affaa' class="a some">a</td>
        <td style='color:black; background-color:#7affaa' class='a'>dream</td>
        <td class='a none2'></td>
        <td class='a none2'><b>4</b></td>        
    </tr>    
    <tr class='a blank'></tr>
    <tr class='a'>
        <td style='color:black; background-color:#fda9a9' class="a some">I</td>
        <td style='color:black; background-color:#fda9a9' class="a some">have</td>
        <td style='color:black; background-color:#fda9a9' class='a'>a</td>
        <td class='a none2'></td>
        <td class='a none2'></td>
        <td class='a none2'><b>3</b></td>        
    </tr>
    <tr class='a spacer'></tr>
    <tr class='a'>
        <td class="a none"></td>
        <td style='color:black; background-color:#7affaa' class="a some">have</td>
        <td style='color:black; background-color:#7affaa' class="a some">a</td>
        <td style='color:black; background-color:#7affaa' class='a'>dream</td>
    </tr>    
    <tr class='a blank'></tr>
    <tr class='a'>
        <td style='color:black; background-color:#d6ff0b' class="a some">I</td>
        <td style='color:black; background-color:#d6ff0b' class='a'>have</td>
        <td class='a none2'></td>
        <td class='a none2'></td>
        <td class='a none2'></td>
        <td class='a none2'><b>2 ↑</b></td>
    </tr>
    <tr class='a spacer'></tr>
    <tr class='a'>
        <td class='a none'></td>
        <td style='color:black; background-color:#fda9a9' class="a some">have</td>
        <td style='color:black; background-color:#fda9a9' class='a'>a</td>
        <td class='a none'></td>
    </tr>
    <tr class='a spacer'></tr>
    <tr class='a'>
        <td class='a none'></td>
        <td class='a none'></td>
        <td style='color:black; background-color:#7affaa' class="a some">a</td>
        <td style='color:black; background-color:#7affaa' class='a'>dream</td>
    </tr>
</table>

position

x/y

offset every other

With this offset, overlap still happens, but only on every other word rather than on every word, unless each instance gets its own track. The number of tracks adds up quickly, however, as the text becomes more repetitive and the n-gram lengths go up, and in practice grasping patterns in this manner is not as easy as one might think.
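
The two placements compared above can be sketched as follows. This is one possible reading, not the page's actual layout code: it assumes "every other" alternates by the n-gram's starting word position, and the function names are hypothetical. `yScale` is the offset multiplier named in the page's controls.

```javascript
// Assumed reading: n-grams starting at odd word positions are shifted by
// one step, those at even positions are not, so only two rows are used.
function offsetEveryOther(startIndex, yScale) {
  return (startIndex % 2) * yScale;
}

// Alternative: every instance gets its own track, so nothing overlaps,
// but the total height grows with the number of instances.
function offsetOwnTrack(instanceIndex, yScale) {
  return instanceIndex * yScale;
}

// Compare the two placements for five consecutive n-grams.
for (let i = 0; i < 5; i++) {
  console.log(i, offsetEveryOther(i, 10), offsetOwnTrack(i, 10));
}
```

The alternating version never uses more than two rows, while the own-track version uses one row per instance, which is the growth described above.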

axis/symmetry

layer order


If the rendering algorithm one chooses places each cell of a given n-gram length on its own track, then there isn’t as much overlap, and the layer order may not contribute much to the presentation. I generally lay down cells in order of either increasing or decreasing n-gram length; laying down the smallest ones last generally allows the greatest detail to be visible. As with the previous algorithms for each of the major attributes of the graph, one could imagine other logic for the n-gram layering.
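
The layering choice can be sketched as a sort on n-gram length before drawing. This is a minimal illustration under assumptions: the function name `layerOrder` and the `{ length, words }` records are hypothetical, and drawing is taken to happen in array order so that later items end up on top.

```javascript
// Return a copy of the n-grams in drawing order. With smallestLast = true
// the longest n-grams are drawn first and the shortest end up on top.
function layerOrder(ngrams, smallestLast = true) {
  return ngrams.slice().sort((a, b) =>
    smallestLast ? b.length - a.length : a.length - b.length
  );
}

const sample = [
  { length: 2, words: "I have" },
  { length: 4, words: "I have a dream" },
  { length: 3, words: "I have a" },
];

console.log(layerOrder(sample).map((g) => g.length).join(" "));
console.log(layerOrder(sample, false).map((g) => g.length).join(" "));
```

Copying with `slice()` leaves the input order untouched, so the same list can be laid out both ways for comparison.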

I  have  a  dream    4
I  have  a           3
   have  a  dream
I  have              2 ↑
   have  a
         a  dream
I have a dream 4
I have a dream 3
I have a dream 2 ↑


I have a dream 4
have a dream 3
I have a dream 2 ↑
have a 2 ↓
I have a 3

shape

height/length

tails

index.html

chart.css

chroma.palette-gen.min.js

natural.coffee

natural.js