Hierarchical clustering


David Cabanillas

Mar 21, 2014, 5:59:02 AM
to jmotif-...@googlegroups.com
Hi,
I have a question about hierarchical clustering. The Newick tree in the attached file shows series 33 and 36 as the two closest elements. However, I have plotted series 33 against 36 and series 31 against 36, and the second pair (31-36) is closer than the first (33-36). The Newick tree does not seem to reflect this.


Here are the time series I used (as R vectors):

series30 = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 1.36, 3.88, 6.23, 8.62, 10.81, 13.25, 15.1, 16.97, 18.96, 20.58, 22.02, 23.41, 24.77, 26.15, 27.45, 28.67, 29.7, 30.93, 32.04, 33.16, 34.27, 35.39, 36.5, 37.62, 39.55, 41.36, 43.04, 44.41, 45.66, 46.9, 47.87, 48.71, 49.54, 50.43, 51.22, 52.0, 52.66, 53.38, 54.1, 54.82, 55.54, 56.26, 56.98, 57.72, 58.5, 59.28, 60.05, 60.76, 61.51, 62.27, 63.02, 63.77, 64.53, 65.28, 66.03, 66.79, 67.54, 68.29, 68.89, 69.51, 70.11, 70.71, 71.31, 71.91, 72.44, 73.0, 73.55, 74.1, 74.65, 75.21, 75.76, 76.31, 76.86, 77.41, 77.98, 78.61, 79.27, 79.94, 80.6, 81.38, 82.5, 88.05, 102.69, 113.87, 127.57, 139.59, 144.36, 146.54, 146.86, 146.53, 144.68, 141.93, 139.17, 136.73, 134.6, 132.67, 130.78, 129.18, 127.8, 126.44, 125.12, 124.0, 122.97, 121.94, 120.94, 120.09, 119.28, 118.47, 117.67, 114.66, 40.29, 12.19, 9.81, 9.59, 33.5, 71.21, 98.56, 106.98, 108.1, 107.87, 107.47, 107.07, 106.67, 106.24, 105.79, 105.33, 104.87, 104.41, 103.95, 103.49, 103.03, 102.59, 102.18, 101.77, 101.42, 101.07, 100.73, 100.39, 100.04, 99.7, 99.35, 99.01, 98.68, 98.38, 98.08, 97.84, 97.59, 97.34, 97.09, 96.85, 96.6, 96.35, 96.1, 95.86, 95.61, 95.39, 95.17, 94.99, 94.81, 94.63, 94.45, 94.26, 94.08, 93.9, 93.72, 93.54, 93.36, 93.18, 92.99, 92.81, 92.68, 92.54, 92.38, 92.24, 92.1, 91.95, 91.82, 91.68, 91.55, 91.43, 91.31, 91.2, 91.09, 90.97, 90.86, 90.75, 90.64, 90.52, 90.41, 90.3, 90.18, 90.07, 89.96, 89.86, 89.76, 89.67, 89.57, 89.48, 89.4, 89.33, 89.25, 89.18, 89.1, 89.03, 88.95, 88.88, 88.8, 88.73, 88.65, 88.58, 88.5, 88.43, 88.35, 88.28, 88.22, 88.15, 88.08, 88.03, 87.97, 87.91, 87.85, 87.8, 87.75, 87.69, 87.64, 87.58, 87.54, 87.49, 87.44, 87.39, 87.35, 87.31, 87.27, 87.22, 87.18, 87.14, 87.09, 87.06, 87.02, 86.98, 86.94, 86.9, 86.86, 86.83, 86.79, 86.76, 86.72, 86.68, 86.65, 86.63, 86.6, 86.56, 86.53, 86.5, 86.46, 86.43, 86.41, 86.38, 86.35, 86.32, 86.29, 
86.25, 86.22, 86.19, 86.18, 86.15, 86.12, 86.09, 86.06, 86.03, 86.0, 85.97, 85.95, 85.92, 85.9, 85.87, 85.84, 85.81, 85.78, 85.76, 85.73, 85.71, 85.69, 85.66, 85.63, 85.61, 85.58, 85.55, 85.53, 85.5, 85.47, 85.45, 85.43, 85.41, 85.39, 85.37, 85.35, 85.33, 85.31, 85.29, 85.27, 85.25, 85.24, 85.22, 85.2, 85.18, 85.16, 85.14, 85.12, 85.09, 85.07, 85.05, 85.03, 85.01, 84.99, 84.98, 84.96, 84.94, 84.92, 84.91, 84.89, 84.87, 84.86, 84.84, 84.82)
series31 = c(0.01, 0.02, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 1.37, 3.92, 6.29, 8.7, 10.91, 13.38, 15.25, 17.13, 19.15, 20.78, 22.24, 23.65, 25.02, 26.41, 27.73, 28.95, 30.0, 31.24, 32.37, 33.49, 34.62, 35.74, 36.87, 37.99, 39.94, 41.77, 43.47, 44.85, 46.11, 47.37, 48.35, 49.2, 50.04, 50.93, 51.73, 52.52, 53.18, 53.91, 54.64, 55.37, 56.1, 56.82, 57.01, 57.2, 57.38, 58.1, 58.85, 59.55, 60.28, 61.02, 61.76, 62.5, 63.24, 63.98, 64.71, 65.45, 66.19, 66.92, 67.51, 68.12, 68.71, 69.3, 69.88, 70.47, 70.99, 71.54, 72.08, 72.62, 73.16, 73.7, 74.24, 74.78, 75.33, 75.87, 76.42, 77.04, 77.69, 78.34, 78.99, 79.75, 80.85, 86.29, 100.63, 111.6, 125.02, 136.8, 141.47, 143.61, 143.92, 143.6, 141.78, 139.09, 136.39, 133.99, 131.91, 130.02, 128.17, 126.6, 125.24, 123.91, 122.62, 121.52, 120.51, 119.5, 118.52, 117.69, 116.9, 116.1, 115.32, 112.37, 39.48, 11.95, 9.61, 9.4, 32.83, 69.79, 96.59, 104.84, 105.94, 105.71, 105.32, 104.93, 104.54, 104.12, 103.67, 103.22, 102.77, 102.32, 101.87, 101.42, 100.97, 100.54, 100.14, 99.73, 99.39, 99.05, 98.72, 98.38, 98.04, 97.7, 97.37, 97.03, 96.71, 96.42, 96.12, 95.88, 95.64, 95.39, 95.15, 94.91, 94.67, 94.42, 94.18, 93.94, 93.7, 93.48, 93.27, 93.09, 92.91, 92.73, 92.56, 92.38, 92.2, 92.02, 91.85, 91.67, 91.49, 91.31, 91.13, 90.96, 90.82, 90.69, 90.53, 90.4, 90.25, 90.11, 89.98, 89.85, 89.72, 89.6, 89.49, 89.38, 89.27, 89.15, 89.04, 88.93, 88.82, 88.71, 88.6, 88.49, 88.38, 88.27, 88.16, 88.07, 87.97, 87.88, 87.78, 87.69, 87.61, 87.54, 87.47, 87.39, 87.32, 87.25, 87.17, 87.1, 87.03, 86.95, 86.88, 86.81, 86.73, 86.66, 86.59, 86.52, 86.45, 86.39, 86.32, 86.27, 86.21, 86.15, 86.09, 86.04, 85.99, 85.94, 85.88, 85.83, 85.79, 85.74, 85.69, 85.64, 85.61, 85.56, 85.52, 85.48, 85.44, 85.39, 85.35, 85.32, 85.28, 85.24, 85.2, 85.16, 85.12, 85.09, 85.06, 85.02, 84.99, 84.95, 84.92, 84.9, 84.87, 84.83, 84.8, 84.77, 84.74, 84.7, 84.68, 84.65, 84.62, 84.59, 84.56, 84.53, 84.5, 84.47, 
84.45, 84.43, 84.4, 84.37, 84.34, 84.31, 84.28, 84.25, 84.23, 84.21, 84.18, 84.15, 84.12, 84.1, 84.07, 84.04, 84.01, 84.0, 83.97, 83.95, 83.92, 83.89, 83.87, 83.84, 83.82, 83.79, 83.76, 83.75, 83.73, 83.71, 83.69, 83.67, 83.65, 83.63, 83.61, 83.59, 83.57, 83.55, 83.54, 83.52, 83.5, 83.48, 83.45, 83.43, 83.41, 83.39, 83.37, 83.35, 83.33, 83.31, 83.29, 83.28, 83.26, 83.24, 83.23, 83.21, 83.19, 83.18, 83.16, 83.14, 83.13, 83.11, 83.09)
series32 = c(0.01, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 1.38, 3.95, 6.34, 8.77, 11.0, 13.49, 15.37, 17.27, 19.3, 20.95, 22.41, 23.84, 25.22, 26.62, 27.94, 29.18, 30.24, 31.49, 32.62, 33.76, 34.89, 36.03, 37.16, 38.3, 40.26, 42.1, 43.82, 45.21, 46.48, 47.75, 48.74, 49.59, 50.44, 51.33, 52.14, 52.94, 53.61, 54.34, 55.07, 55.81, 56.54, 57.27, 58.01, 58.76, 59.55, 60.35, 61.13, 61.85, 62.62, 63.39, 64.16, 64.92, 65.69, 66.46, 67.22, 67.99, 68.76, 69.52, 70.13, 70.76, 71.37, 71.98, 72.59, 73.21, 73.75, 74.31, 74.87, 75.43, 76.0, 76.56, 77.12, 77.68, 78.25, 78.81, 79.38, 80.03, 80.7, 81.37, 82.05, 82.85, 83.98, 89.64, 104.53, 115.92, 129.87, 142.1, 146.96, 149.18, 149.51, 149.17, 147.28, 144.48, 141.68, 139.19, 137.03, 135.06, 133.14, 131.51, 130.1, 128.72, 127.37, 126.23, 125.18, 124.14, 123.11, 122.25, 121.43, 120.61, 119.79, 116.72, 41.01, 12.41, 9.99, 9.76, 34.1, 72.49, 100.34, 108.91, 110.05, 109.81, 109.4, 109.0, 108.59, 108.15, 107.69, 107.22, 106.76, 106.29, 105.82, 105.35, 104.89, 104.44, 104.02, 103.6, 103.24, 102.89, 102.54, 102.19, 101.84, 101.49, 101.14, 100.79, 100.46, 100.16, 99.85, 99.6, 99.35, 99.09, 98.84, 98.59, 98.34, 98.09, 97.83, 97.58, 97.33, 97.11, 96.88, 96.7, 96.51, 96.33, 96.15, 95.96, 95.78, 95.59, 95.41, 95.22, 95.04, 94.85, 94.67, 94.48, 94.34, 94.2, 94.04, 93.9, 93.75, 93.61, 93.47, 93.33, 93.19, 93.07, 92.96, 92.84, 92.73, 92.61, 92.5, 92.38, 92.27, 92.15, 92.04, 91.92, 91.81, 91.69, 91.58, 91.48, 91.38, 91.28, 91.18, 91.09, 91.01, 90.93, 90.86, 90.78, 90.71, 90.63, 90.55, 90.48, 90.4, 90.32, 90.25, 90.17, 90.1, 90.02, 89.94, 89.87, 89.8, 89.74, 89.67, 89.61, 89.55, 89.49, 89.43, 89.38, 89.32, 89.27, 89.21, 89.16, 89.11, 89.06, 89.01, 88.96, 88.92, 88.88, 88.84, 88.79, 88.75, 88.71, 88.66, 88.63, 88.58, 88.54, 88.5, 88.46, 88.42, 88.39, 88.35, 88.32, 88.28, 88.25, 88.21, 88.19, 88.16, 88.12, 88.09, 88.05, 88.02, 87.99, 87.96, 87.93, 87.9, 87.87, 87.84, 87.81, 87.78, 
87.75, 87.73, 87.7, 87.67, 87.64, 87.61, 87.58, 87.55, 87.51, 87.49, 87.47, 87.44, 87.41, 87.38, 87.36, 87.33, 87.3, 87.27, 87.26, 87.23, 87.2, 87.17, 87.15, 87.12, 87.09, 87.07, 87.04, 87.01, 86.99, 86.97, 86.95, 86.93, 86.91, 86.89, 86.87, 86.85, 86.83, 86.81, 86.79, 86.78, 86.76, 86.73, 86.71, 86.69, 86.67, 86.65, 86.63, 86.6, 86.58, 86.56, 86.54, 86.52, 86.51, 86.49, 86.47, 86.45, 86.44, 86.42, 86.4, 86.38, 86.37, 86.35, 86.33, 86.31, 86.3, 86.28, 86.26)
series33 = c(0.01, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.26, 0.29, 0.33, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 1.39, 3.98, 6.39, 8.83, 11.08, 13.58, 15.48, 17.39, 19.43, 21.09, 22.57, 24.0, 25.39, 26.8, 28.14, 29.38, 30.45, 31.7, 32.85, 33.99, 35.13, 36.27, 37.42, 38.56, 40.53, 42.39, 44.12, 45.52, 46.8, 48.08, 49.07, 49.93, 50.78, 51.69, 52.5, 53.3, 53.97, 54.71, 55.45, 56.19, 56.93, 57.67, 58.4, 59.16, 59.96, 60.76, 61.55, 62.28, 63.05, 63.82, 64.6, 65.37, 66.14, 66.91, 67.69, 68.46, 69.23, 69.99, 70.61, 71.25, 71.86, 72.48, 73.09, 73.71, 74.25, 74.82, 75.39, 75.95, 76.52, 77.09, 77.65, 78.22, 78.78, 79.35, 79.93, 80.58, 81.25, 81.93, 82.61, 83.42, 84.56, 90.25, 105.25, 116.72, 130.76, 143.08, 147.97, 150.21, 150.53, 150.19, 148.3, 145.47, 142.65, 140.15, 137.97, 135.99, 134.05, 132.41, 130.99, 129.6, 128.25, 127.1, 126.04, 124.99, 123.96, 123.09, 122.26, 121.44, 120.62, 117.53, 41.29, 12.5, 10.05, 9.83, 34.33, 72.99, 101.03, 109.66, 110.8, 110.56, 110.16, 109.75, 109.34, 108.9, 108.43, 107.96, 107.49, 107.02, 106.55, 106.08, 105.61, 105.16, 104.73, 104.31, 103.95, 103.6, 103.25, 102.9, 102.54, 102.19, 101.84, 101.48, 101.15, 100.84, 100.54, 100.28, 100.03, 99.77, 99.52, 99.27, 99.01, 98.76, 98.51, 98.25, 98.0, 97.77, 97.55, 97.36, 97.18, 96.99, 96.81, 96.62, 96.43, 96.25, 96.06, 95.88, 95.69, 95.51, 95.32, 95.13, 94.99, 94.85, 94.69, 94.55, 94.4, 94.25, 94.12, 93.98, 93.84, 93.71, 93.6, 93.48, 93.36, 93.25, 93.13, 93.02, 92.9, 92.79, 92.67, 92.55, 92.44, 92.32, 92.21, 92.11, 92.01, 91.91, 91.81, 91.72, 91.64, 91.56, 91.48, 91.41, 91.33, 91.25, 91.18, 91.1, 91.02, 90.94, 90.87, 90.79, 90.71, 90.64, 90.56, 90.49, 90.42, 90.35, 90.29, 90.23, 90.17, 90.11, 90.04, 90.0, 89.94, 89.88, 89.83, 89.77, 89.73, 89.67, 89.62, 89.57, 89.54, 89.49, 89.45, 89.4, 89.36, 89.32, 89.27, 89.24, 89.19, 89.15, 89.11, 89.07, 89.03, 89.0, 88.96, 88.93, 88.89, 88.85, 88.82, 88.8, 88.76, 88.73, 88.69, 88.66, 88.63, 88.59, 
88.57, 88.54, 88.51, 88.47, 88.44, 88.41, 88.38, 88.35, 88.33, 88.3, 88.27, 88.24, 88.21, 88.18, 88.15, 88.12, 88.09, 88.07, 88.04, 88.01, 87.99, 87.96, 87.93, 87.9, 87.87, 87.86, 87.83, 87.8, 87.77, 87.75, 87.72, 87.69, 87.66, 87.64, 87.61, 87.59, 87.57, 87.55, 87.53, 87.51, 87.49, 87.47, 87.44, 87.42, 87.4, 87.38, 87.37, 87.35, 87.33, 87.31, 87.29, 87.26, 87.24, 87.22, 87.2, 87.18, 87.16, 87.13, 87.12, 87.1, 87.08, 87.07, 87.05, 87.03, 87.01, 87.0)
series34 = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 1.34, 3.84, 6.17, 8.53, 10.7, 13.12, 14.95, 16.8, 18.77, 20.37, 21.8, 23.18, 24.52, 25.89, 27.18, 28.38, 29.41, 30.62, 31.72, 32.83, 33.93, 35.03, 36.14, 37.24, 39.15, 40.95, 42.61, 43.96, 45.2, 46.43, 47.4, 48.22, 49.05, 49.92, 50.71, 51.48, 52.13, 52.84, 53.56, 54.27, 54.99, 55.7, 55.27, 55.99, 56.74, 57.5, 58.25, 58.94, 59.67, 60.4, 61.13, 61.86, 62.59, 63.32, 64.05, 64.78, 65.52, 66.24, 66.82, 67.42, 68.01, 68.59, 69.17, 69.75, 70.27, 70.81, 71.34, 71.88, 72.41, 72.95, 73.49, 74.02, 74.56, 75.09, 75.64, 76.25, 76.89, 77.54, 78.18, 78.94, 80.02, 85.41, 99.61, 110.46, 123.74, 135.4, 140.03, 142.15, 142.46, 142.14, 140.34, 137.67, 135.0, 132.63, 130.56, 128.69, 126.86, 125.3, 123.96, 122.65, 121.37, 120.28, 119.28, 118.28, 117.31, 116.49, 115.7, 114.92, 114.14, 111.22, 39.08, 11.83, 9.52, 9.3, 32.49, 69.07, 95.61, 103.77, 104.86, 104.63, 104.24, 103.86, 103.47, 103.06, 102.61, 102.17, 101.72, 101.28, 100.83, 100.39, 99.94, 99.51, 99.11, 98.71, 98.38, 98.04, 97.71, 97.37, 97.04, 96.71, 96.37, 96.04, 95.72, 95.43, 95.14, 94.9, 94.66, 94.42, 94.18, 93.94, 93.7, 93.46, 93.22, 92.98, 92.74, 92.53, 92.32, 92.14, 91.96, 91.79, 91.61, 91.44, 91.26, 91.08, 90.91, 90.73, 90.56, 90.38, 90.2, 90.03, 89.9, 89.76, 89.61, 89.48, 89.33, 89.2, 89.07, 88.93, 88.8, 88.68, 88.57, 88.46, 88.35, 88.24, 88.14, 88.03, 87.92, 87.81, 87.7, 87.59, 87.48, 87.37, 87.26, 87.17, 87.07, 86.98, 86.88, 86.79, 86.72, 86.65, 86.57, 86.5, 86.43, 86.36, 86.28, 86.21, 86.14, 86.06, 85.99, 85.92, 85.85, 85.77, 85.7, 85.63, 85.57, 85.51, 85.44, 85.39, 85.33, 85.27, 85.21, 85.17, 85.11, 85.06, 85.01, 84.95, 84.91, 84.86, 84.81, 84.76, 84.73, 84.69, 84.65, 84.61, 84.56, 84.52, 84.48, 84.45, 84.41, 84.37, 84.33, 84.29, 84.25, 84.22, 84.19, 84.15, 84.12, 84.08, 84.05, 84.03, 84.0, 83.97, 83.93, 83.9, 83.87, 83.84, 83.82, 83.79, 83.76, 83.73, 83.7, 83.67, 
83.64, 83.61, 83.59, 83.56, 83.53, 83.51, 83.48, 83.45, 83.42, 83.39, 83.37, 83.35, 83.32, 83.29, 83.26, 83.24, 83.21, 83.18, 83.16, 83.14, 83.12, 83.09, 83.06, 83.04, 83.01, 82.99, 82.96, 82.93, 82.91, 82.89, 82.87, 82.85, 82.83, 82.81, 82.79, 82.77, 82.75, 82.73, 82.71, 82.69, 82.69, 82.67, 82.64, 82.62, 82.6, 82.58, 82.56, 82.54, 82.52, 82.5, 82.48, 82.46, 82.44, 82.43, 82.41, 82.39, 82.38, 82.36, 82.34, 82.33, 82.31, 82.29, 82.28)
series35 = c(0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 1.22, 3.8, 6.11, 8.44, 10.59, 12.99, 14.8, 16.63, 18.58, 20.17, 21.58, 22.95, 24.27, 25.62, 26.9, 28.09, 29.11, 30.31, 31.4, 32.5, 33.59, 34.68, 35.77, 36.87, 38.75, 40.53, 42.18, 43.52, 44.74, 45.97, 46.92, 47.74, 48.55, 49.42, 50.2, 50.96, 51.6, 52.31, 53.02, 53.72, 54.43, 52.89, 53.56, 54.25, 54.99, 55.73, 56.45, 57.11, 57.82, 58.53, 59.24, 59.95, 60.66, 61.36, 62.07, 62.78, 63.49, 64.19, 64.76, 65.34, 65.9, 66.47, 67.03, 67.6, 68.1, 68.62, 69.13, 69.65, 70.17, 70.69, 71.21, 71.73, 72.25, 72.77, 73.3, 73.89, 74.52, 75.14, 75.76, 76.5, 77.55, 82.77, 96.52, 107.04, 119.92, 131.21, 135.7, 137.75, 138.05, 137.74, 136.0, 133.41, 130.82, 128.53, 126.53, 124.71, 122.94, 121.43, 120.13, 118.85, 117.61, 116.56, 115.59, 114.62, 113.68, 112.89, 112.13, 111.37, 110.61, 107.78, 37.87, 11.46, 9.22, 9.01, 31.49, 66.94, 92.65, 100.57, 101.61, 101.39, 101.02, 100.65, 100.27, 99.87, 99.44, 99.01, 98.58, 98.14, 97.71, 97.28, 96.85, 96.44, 96.05, 95.66, 95.33, 95.01, 94.69, 94.36, 94.04, 93.72, 93.39, 93.07, 92.76, 92.48, 92.2, 91.97, 91.73, 91.5, 91.27, 91.04, 90.8, 90.57, 90.34, 90.11, 89.87, 89.67, 89.46, 89.29, 89.12, 88.95, 88.78, 88.61, 88.44, 88.27, 88.1, 87.93, 87.76, 87.59, 87.41, 87.24, 87.12, 86.98, 86.84, 86.71, 86.57, 86.44, 86.31, 86.18, 86.05, 85.94, 85.83, 85.73, 85.62, 85.52, 85.41, 85.3, 85.2, 85.09, 84.98, 84.88, 84.77, 84.67, 84.56, 84.47, 84.38, 84.29, 84.2, 84.11, 84.04, 83.97, 83.9, 83.83, 83.76, 83.68, 83.61, 83.54, 83.47, 83.4, 83.33, 83.26, 83.19, 83.12, 83.05, 82.98, 82.92, 82.86, 82.8, 82.75, 82.69, 82.63, 82.58, 82.53, 82.48, 82.43, 82.38, 82.33, 82.28, 82.24, 82.19, 82.14, 82.11, 82.07, 82.03, 81.99, 81.95, 81.91, 81.87, 81.84, 81.8, 81.76, 81.72, 81.68, 81.65, 81.62, 81.58, 81.55, 81.52, 81.48, 81.45, 81.43, 81.4, 81.37, 81.34, 81.31, 81.28, 81.25, 81.22, 81.19, 81.17, 81.14, 81.11, 81.08, 81.05, 81.02, 81.01, 80.98, 80.95, 80.92, 
80.89, 80.87, 80.84, 80.81, 80.79, 80.77, 80.74, 80.72, 80.69, 80.66, 80.64, 80.61, 80.58, 80.57, 80.55, 80.52, 80.49, 80.47, 80.44, 80.42, 80.39, 80.37, 80.35, 80.33, 80.31, 80.29, 80.27, 80.25, 80.23, 80.21, 80.19, 80.17, 80.15, 80.14, 80.13, 80.11, 80.09, 80.07, 80.05, 80.03, 80.01, 79.99, 79.97, 79.95, 79.93, 79.91, 79.89, 79.88, 79.86, 79.85, 79.83, 79.81, 79.8, 79.78, 79.76, 79.75, 79.73, 79.72, 79.7, 79.68, 79.67, 79.65, 79.64)
series36 = c(0.01, 0.02, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 1.38, 3.94, 6.32, 8.74, 10.97, 13.45, 15.33, 17.22, 19.24, 20.89, 22.35, 23.77, 25.14, 26.54, 27.86, 29.1, 30.15, 31.39, 32.53, 33.66, 34.79, 35.92, 37.05, 38.18, 40.14, 41.98, 43.69, 45.07, 46.34, 47.61, 48.59, 49.44, 50.29, 51.18, 51.99, 52.78, 53.45, 54.18, 54.91, 55.64, 56.37, 57.11, 56.41, 57.14, 57.91, 58.69, 59.45, 60.15, 60.9, 61.64, 62.39, 63.14, 63.88, 64.63, 65.37, 66.12, 66.87, 67.6, 68.2, 68.81, 69.41, 70.0, 70.6, 71.19, 71.72, 72.27, 72.81, 73.36, 73.91, 74.45, 75.0, 75.55, 76.09, 76.64, 77.2, 77.82, 78.48, 79.14, 79.79, 80.57, 81.67, 87.17, 101.66, 112.73, 126.29, 138.19, 142.92, 145.08, 145.39, 145.07, 143.23, 140.51, 137.78, 135.36, 133.26, 131.35, 129.47, 127.89, 126.52, 125.17, 123.87, 122.76, 121.74, 120.72, 119.73, 118.89, 118.09, 117.29, 116.5, 113.51, 39.88, 12.07, 9.71, 9.49, 33.16, 70.5, 97.58, 105.91, 107.02, 106.79, 106.39, 106.0, 105.61, 105.18, 104.73, 104.27, 103.82, 103.36, 102.91, 102.46, 102.0, 101.57, 101.16, 100.75, 100.4, 100.06, 99.72, 99.38, 99.04, 98.7, 98.36, 98.02, 97.69, 97.4, 97.1, 96.86, 96.61, 96.37, 96.12, 95.88, 95.63, 95.39, 95.14, 94.9, 94.65, 94.44, 94.22, 94.04, 93.86, 93.68, 93.5, 93.32, 93.14, 92.96, 92.78, 92.6, 92.42, 92.24, 92.06, 91.89, 91.75, 91.61, 91.46, 91.32, 91.17, 91.03, 90.9, 90.77, 90.63, 90.51, 90.4, 90.29, 90.18, 90.06, 89.95, 89.84, 89.73, 89.62, 89.51, 89.39, 89.28, 89.17, 89.06, 88.97, 88.86, 88.77, 88.67, 88.58, 88.51, 88.43, 88.36, 88.28, 88.21, 88.14, 88.06, 87.99, 87.91, 87.84, 87.77, 87.69, 87.62, 87.54, 87.47, 87.4, 87.33, 87.27, 87.2, 87.15, 87.09, 87.03, 86.97, 86.92, 86.87, 86.81, 86.76, 86.7, 86.66, 86.61, 86.56, 86.51, 86.48, 86.43, 86.39, 86.35, 86.31, 86.27, 86.22, 86.19, 86.15, 86.11, 86.07, 86.03, 85.99, 85.96, 85.92, 85.89, 85.85, 85.82, 85.78, 85.76, 85.73, 85.7, 85.67, 85.63, 85.6, 85.57, 85.54, 85.51, 85.48, 85.45, 85.42, 85.39, 85.36, 
85.33, 85.32, 85.29, 85.26, 85.23, 85.2, 85.17, 85.14, 85.11, 85.09, 85.06, 85.04, 85.01, 84.98, 84.95, 84.93, 84.9, 84.87, 84.86, 84.83, 84.8, 84.78, 84.75, 84.72, 84.7, 84.67, 84.64, 84.62, 84.6, 84.58, 84.56, 84.54, 84.52, 84.5, 84.48, 84.46, 84.44, 84.42, 84.4, 84.39, 84.37, 84.35, 84.33, 84.31, 84.29, 84.26, 84.24, 84.22, 84.2, 84.18, 84.16, 84.14, 84.13, 84.11, 84.09, 84.08, 84.06, 84.04, 84.02, 84.01, 83.99, 83.97, 83.96, 83.94)
Comparative.jpg

Pavel Senin

Mar 21, 2014, 6:12:11 AM
to jmotif-discuss
Hi David: 

Clustering time series can be misleading in many cases, and in our case it is also affected by the approximation.

As for this example: I am not sure I follow how the clustering was done, so I assume you used a sliding window, SAX, tf*idf for reduction and weighting, and cosine similarity as the distance measure for clustering.
If so, it would be misleading to look at the full series to understand why the clustering went wrong, because each sliding window was normalized and converted into a word. To see where the technique went wrong, you would first need to look at the frequencies of the words produced by the sliding window/SAX step, and then at the TF*IDF weights and the word offsets. It may be that the technique performed just fine given the information it had, i.e. some valuable information was lost to discretization.
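
To make the point about per-window normalization concrete, here is a minimal sketch of how one window becomes a SAX word. It is not JMotif code: the class and method names are made up for illustration, the alphabet is fixed at 4 letters (cut points are the quartiles of the standard normal distribution), and the window length is assumed to be divisible by the PAA size.

```java
/** Illustrative sketch only; names are hypothetical and this is not JMotif code. */
final class SaxWordSketch {

  // Cut points that split the standard normal distribution into 4 equiprobable regions.
  static final double[] CUTS_4 = {-0.6745, 0.0, 0.6745};

  /** Z-normalizes a window to zero mean and unit standard deviation. */
  static double[] zNormalize(double[] window) {
    double mean = 0.0;
    for (double v : window) mean += v;
    mean /= window.length;
    double var = 0.0;
    for (double v : window) var += (v - mean) * (v - mean);
    double sd = Math.sqrt(var / window.length);
    double[] out = new double[window.length];
    for (int i = 0; i < window.length; i++) {
      // A flat window has no shape; map it to all zeros instead of dividing by zero.
      out[i] = sd < 1e-9 ? 0.0 : (window[i] - mean) / sd;
    }
    return out;
  }

  /** Piecewise aggregate approximation; assumes series.length is divisible by segments. */
  static double[] paa(double[] series, int segments) {
    int step = series.length / segments;
    double[] out = new double[segments];
    for (int s = 0; s < segments; s++) {
      double sum = 0.0;
      for (int i = s * step; i < (s + 1) * step; i++) sum += series[i];
      out[s] = sum / step;
    }
    return out;
  }

  /** Converts one window into a SAX word over the letters a..d. */
  static String toWord(double[] window, int paaSize) {
    StringBuilder sb = new StringBuilder();
    for (double v : paa(zNormalize(window), paaSize)) {
      int letter = 0;
      while (letter < CUTS_4.length && v > CUTS_4[letter]) letter++;
      sb.append((char) ('a' + letter));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    double[] low = {1, 2, 3, 4, 5, 6, 7, 8};
    double[] high = {101, 102, 103, 104, 105, 106, 107, 108};
    // Same shape at very different levels: prints "abcd abcd".
    System.out.println(toWord(low, 4) + " " + toWord(high, 4));
  }
}
```

Two windows with the same shape but very different levels end up as the same word, which is why a plot of the raw series can disagree with a clustering built on the words.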



--
You received this message because you are subscribed to the Google Groups "jmotif-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jmotif-discus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Mahalo, Pavel.

David Cabanillas

Mar 21, 2014, 9:06:39 AM
to jmotif-...@googlegroups.com
Dear Pavel,
I have modified the SyntheticControlHClust example, i.e. sliding window, SAX, tf*idf for reduction and weighting, and cosine similarity as the distance measure for clustering.

I have been debugging the code and have got as far as the HC.getNearest function.
At that point each time series is held in tfidfData, e.g. series 35 = daaed=0, aaced=0.10.

The clustering between classes is perfect, but in my opinion the hierarchy among the series within a class is not. Perhaps I should change the SAX parameters?
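
For reference, this is roughly how I understand the comparison of two such word-weight maps. It is only a sketch of plain cosine similarity over sparse maps, not the actual JMotif implementation, and the class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; not the JMotif implementation. */
final class CosineSketch {

  /** Cosine similarity between two sparse word -> weight vectors. */
  static double cosineSimilarity(Map<String, Double> a, Map<String, Double> b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (Map.Entry<String, Double> e : a.entrySet()) {
      normA += e.getValue() * e.getValue();
      Double other = b.get(e.getKey());
      if (other != null) {
        dot += e.getValue() * other; // only words present in both vectors contribute
      }
    }
    for (double v : b.values()) normB += v * v;
    if (normA == 0.0 || normB == 0.0) {
      return 0.0; // an all-zero vector has no direction
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    Map<String, Double> s35 = new HashMap<>();
    s35.put("daaed", 0.0);  // weight 0: contributes nothing to the similarity
    s35.put("aaced", 0.10);
    Map<String, Double> other = new HashMap<>();
    other.put("aaced", 0.20); // hypothetical second series, values made up
    System.out.println(cosineSimilarity(s35, other));
  }
}
```

A word with weight 0, like daaed above, has no influence on the result, so two series are compared only on the words that carry a non-zero weight.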






--
bye
--david

sen...@gmail.com

Mar 21, 2014, 9:21:10 AM
to jmotif-...@googlegroups.com
Hi David:

Yes, I would guess that it is the loss of information due to the SAX approximation and the window choice. I've plotted the series and looked into the clustering using Euclidean and Canberra distances with the Ward criterion in R. It works very well, so you would probably be better off going that route with this data: the series are mostly smooth and similarly shaped, with minor differences.
The approach we developed is not universal, so on the type of data you have it is no better, and probably worse, than the simplest possible technique.
Sliding-window discretization and subsequence ranking are helpful when the series differ in length due to signal disruption, when the classes have some distinct features, and so on.
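
As a rough illustration of what the distance-based route compares, here is a small Java sketch of the two point-wise distances (I used R for the real thing). The class and method names are made up, and the Canberra version simply skips terms whose denominator is zero, which is a simplification of what R's dist does. The three values per series are the last three points of series 31, 33 and 36 from your first message.

```java
/** Illustrative sketch only; the actual comparison was done in R. */
final class PointwiseDistanceSketch {

  /** Euclidean distance between two equal-length series. */
  static double euclidean(double[] x, double[] y) {
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      double d = x[i] - y[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /** Canberra distance; terms with a zero denominator are skipped (simplification). */
  static double canberra(double[] x, double[] y) {
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      double denom = Math.abs(x[i]) + Math.abs(y[i]);
      if (denom > 0.0) {
        sum += Math.abs(x[i] - y[i]) / denom;
      }
    }
    return sum;
  }

  public static void main(String[] args) {
    // Last three points of each series as posted in the thread.
    double[] s31 = {83.13, 83.11, 83.09};
    double[] s33 = {87.03, 87.01, 87.0};
    double[] s36 = {83.97, 83.96, 83.94};
    System.out.println("d(31,36) = " + euclidean(s31, s36));
    System.out.println("d(33,36) = " + euclidean(s33, s36));
  }
}
```

On these tail values 31 is closer to 36 than 33 is, which matches what you saw in your plot; the full comparison would of course use all points.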

Thank you.
Sent from my BlackBerry® smartphone from Free

From: David Cabanillas <davidcab...@gmail.com>
Date: Fri, 21 Mar 2014 14:06:39 +0100
Subject: Re: Hierarchical clustering

David Cabanillas

Mar 21, 2014, 9:51:52 AM
to jmotif-...@googlegroups.com
Dear Pavel,

In SyntheticControlHClust I changed WINDOW_SIZE from 45 to 10, and PAA and ALPHABET_SIZE from 5 to 15, and now the classification within the series works properly (i.e. the most similar waves are grouped together). But I assume this has a performance cost, right?

Finally, I have two questions:
1) How can I obtain the best parameters for SAX?
2) How can I change the SAX approximation parameter?

sen...@gmail.com

Mar 21, 2014, 10:13:25 AM
to jmotif-...@googlegroups.com
Hi David:

These are very good questions.

The choice of SAX parameters is empirical, i.e. it is impossible to guess in advance which set of parameters or which numerosity reduction strategy will work best.

To solve this, one needs an optimization scheme and a criterion that measures performance.

This is easy for classification: accuracy is an excellent measure, and one can use grid search, Monte Carlo, DIRECT, or whatever one's favorite technique is.

It is not that straightforward for clustering, as far as I understand. If you have labeled series, you can define a clustering accuracy function and find the best parameter set with the help of an optimization scheme.
If your series are unlabeled, things become a bit more complicated, but with cosine similarity (which ranges from 0 to 1 for non-negative tf*idf weights) you could try to define a clustering goodness criterion based on the separation of the resulting clusters (classes). At least that is what I would do.
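
Just to sketch what I mean by an optimization scheme, here is a bare-bones grid search. It is not part of JMotif, all names are made up, and the scoring function is something you would have to supply yourself (clustering accuracy for labeled series, or a separation criterion for unlabeled ones). The score in main is a dummy placeholder and not a real quality measure.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

/** Illustrative sketch only; the scoring function is supplied by the caller. */
final class SaxGridSearchSketch {

  /**
   * Tries every {window, paa, alphabet} combination and returns the one
   * with the highest score. Larger scores are assumed to be better.
   */
  static int[] bestParams(int[] windows, int[] paas, int[] alphabets,
      ToDoubleFunction<int[]> score) {
    int[] best = null;
    double bestScore = Double.NEGATIVE_INFINITY;
    for (int w : windows) {
      for (int p : paas) {
        for (int a : alphabets) {
          int[] candidate = {w, p, a};
          double s = score.applyAsDouble(candidate);
          if (s > bestScore) {
            bestScore = s;
            best = candidate;
          }
        }
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Dummy score that peaks at {40, 10, 8}; replace it with a real criterion,
    // e.g. clustering accuracy computed from a run with these parameters.
    ToDoubleFunction<int[]> dummy =
        p -> -(Math.abs(p[0] - 40) + Math.abs(p[1] - 10) + Math.abs(p[2] - 8));
    int[] best = bestParams(new int[] {10, 40, 45}, new int[] {5, 10, 15},
        new int[] {5, 8, 15}, dummy);
    System.out.println(Arrays.toString(best));
  }
}
```

The cost grows with the product of the three grid sizes, since each candidate needs a full discretization and clustering run, so a coarse grid is a sensible starting point.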

Unless you have a specific interest in our technique, I would suggest not dismissing other approaches to clustering. You may get better results.

I'll update with some code snippets later.

From: David Cabanillas <davidcab...@gmail.com>
Date: Fri, 21 Mar 2014 14:51:52 +0100

Pavel Senin

Mar 21, 2014, 11:49:54 AM
to jmotif-discuss
These parameters (PAA 10, alphabet 8, window 40) cluster 31 and 36 together, while 33 goes to a different cluster.

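import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

// JMotif imports (UCRUtils, TextUtils, WordBag, HC, Cluster, LinkageCriterion,
// CosineDistanceMatrix, SAXCollectionStrategy, TSException) are omitted here;
// their package names depend on the JMotif version in use.

public class HClust {

  // input data file, read with UCRUtils.readUCRData
  private static final String DATA = "/home/psenin/tmp/series.csv";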

  // SAX parameters to use
  //
  private static final int PAA_SIZE = 10;
  private static final int ALPHABET_SIZE = 8;
  private static final int WINDOW_SIZE = 40;

  // processing strategy to utilize
  //
  private static final SAXCollectionStrategy STRATEGY = SAXCollectionStrategy.NOREDUCTION;

  /**
   * @param args
   * @throws TSException
   * @throws IndexOutOfBoundsException
   * @throws IOException
   */
  public static void main(String[] args) throws IndexOutOfBoundsException, TSException, IOException {

    Map<String, List<double[]>> trainData = UCRUtils.readUCRData(DATA);
    System.out.println("trainData classes: " + trainData.size() + ", series length: "
        + trainData.entrySet().iterator().next().getValue().get(0).length);
    for (Entry<String, List<double[]>> e : trainData.entrySet()) {
      System.out.println(" training class: " + e.getKey() + " series: " + e.getValue().size());
    }

    System.out.println("\nParams: WINDOW " + WINDOW_SIZE + ", PAA " + PAA_SIZE + ", ALPHABET "
        + ALPHABET_SIZE + ", Strategy " + STRATEGY + "\n\nDistance matrix:");

    // parameters
    int[] params = new int[4];
    params[0] = WINDOW_SIZE;
    params[1] = PAA_SIZE;
    params[2] = ALPHABET_SIZE;
    params[3] = STRATEGY.index();

    // making bags collection
    List<WordBag> bags = TextUtils.labeledSeries2WordBags(trainData, params);

    // create the TFIDF data structure
    HashMap<String, HashMap<String, Double>> tfidf = TextUtils.computeTFIDF(bags);
    tfidf = TextUtils.normalizeToUnitVectors(tfidf);

    // run hierarchical clustering with the complete linkage criterion
    Cluster clusters = HC.Hc(tfidf, LinkageCriterion.COMPLETE);

    System.out.println((new CosineDistanceMatrix(tfidf)).toString());
    
    System.out.println(TextUtils.tfidfToTable(tfidf));

    // write the tree in Newick format; try-with-resources closes the file even on error
    try (BufferedWriter bw = new BufferedWriter(new FileWriter("/home/psenin/tmp/test2.newick"))) {
      bw.write("(" + clusters.toNewick() + ")");
    }

  }
}
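
A note on LinkageCriterion.COMPLETE used above: under complete linkage, the distance between two clusters is the largest pairwise distance between their members. The sketch below only illustrates that rule; it is not the HC class from JMotif, the names are made up, and the distance values in main are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; not the JMotif HC implementation. */
final class CompleteLinkageSketch {

  /** Largest pairwise distance between members of cluster a and cluster b. */
  static double completeLinkage(double[][] dist, List<Integer> a, List<Integer> b) {
    double max = 0.0;
    for (int i : a) {
      for (int j : b) {
        max = Math.max(max, dist[i][j]);
      }
    }
    return max;
  }

  public static void main(String[] args) {
    // Invented symmetric distance matrix for three items.
    double[][] dist = {
      {0.0, 0.1, 0.7},
      {0.1, 0.0, 0.4},
      {0.7, 0.4, 0.0}
    };
    // Cluster {0, 1} versus cluster {2}: the larger of 0.7 and 0.4.
    System.out.println(completeLinkage(dist, Arrays.asList(0, 1), Arrays.asList(2)));
  }
}
```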

--
Mahalo, Pavel.
clusters.png
series.png
RCode.R
ClusteringJMotif.png

Pavel Senin

Mar 21, 2014, 2:43:48 PM
to jmotif-discuss
Hi David:

I just realized one thing. Did you actually look at how JMotif clustered your series?

I would say it is much better than conventional distance-based clustering: the series with the step around 70 are in one cluster, whereas those without that step are in another. So here you can actually see why it happened: the SAX words that reflect the step matter.
 
--
Mahalo, Pavel.
series_cluster.png

Pavel Senin

Mar 22, 2014, 8:24:37 AM
to jmotif-discuss
David, by the way, what is the nature of this data set? Are these measurements of some kind, or some other phenomenon?
--
Mahalo, Pavel.

David Cabanillas

Mar 22, 2014, 11:54:50 AM
to jmotif-...@googlegroups.com

To summarize, we can say that with good SAX parameters JMotif clustering obtains better results than distance-based clustering (Euclidean, Canberra, ...), at least with this data. Right? That would be great.
My last question: how did you calculate these SAX parameters?

On 22/03/2014 13:25, "Pavel Senin" <sen...@gmail.com> wrote:

Pavel Senin

Mar 22, 2014, 12:02:32 PM
to jmotif-discuss
I just picked them randomly. I didn't even think much about it: I took 40/10 for window/PAA, which means every 4 points map to a single symbol, and then just used an alphabet of size 8.

David Cabanillas

Mar 24, 2014, 9:55:09 AM
to jmotif-...@googlegroups.com
In my opinion, in this case the JMotif clustering is better than distance-based clustering.

I suppose the step around 70 (in the attachment) is what matters to the JMotif algorithm: converting a time series to PAA and then to symbols gives this step significant weight, whereas with the Euclidean distance metric this distinctive mark is masked.
series_cluster_v2.png