Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: hwbot v5.9.0
    • Labels:
      None

      Description

      I'm not sure why this is happening more often lately, but HWBOT just went down again: both the front page and the forums. It's 12:30 local time here.

      Please send me instructions on how to reboot the server in case this happens again. The site is completely unreachable. Critical problem!

        Activity

        Dennis Devriendt added a comment -
        The Linux kernel kills Tomcat because the machine runs out of memory. I'm certain it's not caused by a DDoS, but there might be a memory leak somewhere. This has also happened in the past, but it is definitely occurring more often since the latest deploy. There are no logs left from February; the earliest I could find are from March 2:

        Mar 2 11:06:46 ip-10-48-39-239 kernel: [180119.195650] Out of memory: Kill process 27596 (java) score 688 or sacrifice child
        Mar 2 11:06:46 ip-10-48-39-239 kernel: [180119.195680] Killed process 27596 (java) total-vm:2605540kB, anon-rss:1190720kB, file-rss:0kB
        Mar 7 05:49:02 ip-10-48-39-239 kernel: [593054.681094] Out of memory: Kill process 7628 (java) score 702 or sacrifice child
        Mar 7 05:49:02 ip-10-48-39-239 kernel: [593054.681142] Killed process 7628 (java) total-vm:2559180kB, anon-rss:1214984kB, file-rss:0kB
        Mar 8 23:29:17 ip-10-48-39-239 kernel: [743070.284464] Out of memory: Kill process 3551 (java) score 623 or sacrifice child
        Mar 8 23:29:17 ip-10-48-39-239 kernel: [743070.284511] Killed process 3551 (java) total-vm:2473840kB, anon-rss:1077964kB, file-rss:0kB
        Mar 9 14:19:14 ip-10-48-39-239 kernel: [796467.020092] Out of memory: Kill process 28869 (java) score 654 or sacrifice child
        Mar 9 14:19:14 ip-10-48-39-239 kernel: [796467.020126] Killed process 28869 (java) total-vm:2630076kB, anon-rss:1131924kB, file-rss:0kB
        Mar 9 14:19:14 ip-10-48-39-239 kernel: [796467.034693] Out of memory: Kill process 22749 (java) score 656 or sacrifice child
        Mar 9 14:19:14 ip-10-48-39-239 kernel: [796467.034710] Killed process 22749 (java) total-vm:2630076kB, anon-rss:1134120kB, file-rss:260kB
        Mar 9 22:41:16 ip-10-48-39-239 kernel: [826589.320497] Out of memory: Kill process 24360 (java) score 595 or sacrifice child
        Mar 9 22:41:16 ip-10-48-39-239 kernel: [826589.320536] Killed process 24360 (java) total-vm:2632896kB, anon-rss:1030260kB, file-rss:0kB
        Mar 10 18:47:37 ip-10-48-39-239 kernel: [898969.536384] Out of memory: Kill process 30495 (java) score 613 or sacrifice child
        Mar 10 18:47:37 ip-10-48-39-239 kernel: [898969.536419] Killed process 30495 (java) total-vm:2625864kB, anon-rss:1060368kB, file-rss:0kB
        Mar 16 10:23:32 ip-10-48-39-239 kernel: [1387124.841836] Out of memory: Kill process 24608 (java) score 684 or sacrifice child
        Mar 16 10:23:32 ip-10-48-39-239 kernel: [1387124.841867] Killed process 24608 (java) total-vm:2506676kB, anon-rss:1182608kB, file-rss:0kB
        Mar 19 04:05:04 ip-10-48-39-239 kernel: [1623616.800429] Out of memory: Kill process 29606 (java) score 649 or sacrifice child
        Mar 19 04:05:04 ip-10-48-39-239 kernel: [1623616.800478] Killed process 29606 (java) total-vm:2736988kB, anon-rss:1122168kB, file-rss:0kB
        Mar 21 04:04:56 ip-10-48-39-239 kernel: [1796409.049371] Out of memory: Kill process 10593 (java) score 593 or sacrifice child
        Mar 21 04:04:56 ip-10-48-39-239 kernel: [1796409.049416] Killed process 10593 (java) total-vm:2738300kB, anon-rss:1025780kB, file-rss:0kB
        Mar 22 05:43:03 ip-10-48-39-239 kernel: [1888696.393863] Out of memory: Kill process 21267 (java) score 595 or sacrifice child
        Mar 22 05:43:03 ip-10-48-39-239 kernel: [1888696.393896] Killed process 21267 (java) total-vm:2651400kB, anon-rss:1029272kB, file-rss:0kB
        Mar 24 00:06:00 ip-10-48-39-239 kernel: [2041273.566977] Out of memory: Kill process 1459 (java) score 565 or sacrifice child
        Mar 24 00:06:00 ip-10-48-39-239 kernel: [2041273.567016] Killed process 1459 (java) total-vm:2602056kB, anon-rss:977456kB, file-rss:0kB
        Mar 24 00:07:03 ip-10-48-39-239 kernel: [2041335.755727] Out of memory: Kill process 24363 (java) score 251 or sacrifice child
        Mar 24 00:07:03 ip-10-48-39-239 kernel: [2041335.755766] Killed process 24363 (java) total-vm:2553580kB, anon-rss:433832kB, file-rss:0kB
        Mar 25 04:06:00 ip-10-48-39-239 kernel: [2142072.999154] Out of memory: Kill process 2399 (java) score 614 or sacrifice child
        Mar 25 04:06:00 ip-10-48-39-239 kernel: [2142072.999205] Killed process 2399 (java) total-vm:2758608kB, anon-rss:1062896kB, file-rss:0kB
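The kill lines above are easier to compare when reduced to one line per victim. A minimal sketch, assuming the "Killed process <pid> (...) ... anon-rss:<n>kB" format shown above and a POSIX awk; the function name oom_victims is hypothetical. Converting anon-rss to MiB shows the Java process holding roughly 950 to 1190 MiB of anonymous memory in most kills, i.e. close to or above the 1 GiB JVM limit mentioned below (a JVM's resident size normally exceeds its heap limit, because thread stacks, class metadata and JIT-compiled code live outside the heap).

```shell
#!/usr/bin/env bash
# Sketch: summarise OOM-killer victims from kernel log lines read on stdin.
# Assumes the "Killed process <pid> (...) ... anon-rss:<n>kB" format seen above.
oom_victims() {
  awk '/Killed process/ {
    pid = "?"; rss = 0
    # "Killed process " is 15 characters; the PID starts right after it.
    if (match($0, /Killed process [0-9]+/))
      pid = substr($0, RSTART + 15, RLENGTH - 15)
    # Strip the 9-character "anon-rss:" prefix and the 2-character "kB" suffix.
    if (match($0, /anon-rss:[0-9]+kB/))
      rss = substr($0, RSTART + 9, RLENGTH - 11)
    # Fields 1-3 are the syslog month, day and time; kB to MiB, truncated.
    printf "%s %s %s pid=%s anon-rss=%dMiB\n", $1, $2, $3, pid, rss / 1024
  }'
}

# Example (path is an assumption: Ubuntu writes kernel messages to
# /var/log/kern.log, other distributions may use /var/log/messages):
#   oom_victims < /var/log/kern.log
```

Matching on "Killed process" skips the preceding "Out of memory: Kill process" lines, so each kill is reported once.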
        Dennis Devriendt added a comment -
        OK, so: we have 1.7 GiB of memory. The JVM is configured with a maximum memory usage of 1 GiB. It seems to consume around 850 MiB, but it's not unthinkable that this spikes to 1 GiB when recalculations happen. Apache's memory usage is very variable. I'm logging memory usage every minute (to /logs/mem_report.log): Apache uses around 200 MiB most of the time, but I've seen a spike to 500 MiB. The logging is done by a simple bash script, started with:

        nohup sudo ./mem_report.sh > /logs/mem_report.log 2>&1 &
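The contents of mem_report.sh are not shown in this issue. A hypothetical sketch of a comparable per-minute snapshot, assuming a Linux host with /proc and procps ps; the function name mem_snapshot and its default process-name pattern (java, apache2, httpd) are illustrative and not taken from the actual script.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for mem_report.sh: the real script is not shown here.

# Print one timestamped memory snapshot: system totals from /proc/meminfo (kB)
# and the summed resident memory (MiB) of processes whose command name matches
# an extended regex (default: java, apache2 or httpd).
mem_snapshot() {
  local pattern="${1:-java|apache2|httpd}"
  date '+%Y-%m-%d %H:%M:%S'
  grep -E '^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):' /proc/meminfo
  # ps reports RSS in kB; sum it per command name and convert to MiB.
  ps -eo rss=,comm= | awk -v pat="^(${pattern})\$" '
    $2 ~ pat { rss[$2] += $1 }
    END { for (c in rss) printf "%s %d MiB\n", c, rss[c] / 1024 }'
}

# Possible usage: end the script with a loop such as
#   while true; do mem_snapshot; sleep 60; done
# and start it with nohup as described above.
```

Summing RSS per command name matters for Apache, whose memory is spread over many worker processes rather than one.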

        When the kernel kills the Java process, it is indeed restarted, but (at least this morning) it died again a minute after restarting, without any warning. As a last resort, I added a 1 GiB swapfile. Processes will only be swapped out when absolutely necessary (this is controlled by the vm.swappiness kernel parameter). All changes I made will be lost on reboot; this is just a temporary solution. The swapfile is stored on local storage, not on an EBS volume.

        sudo sysctl vm.swappiness=0
        sudo dd if=/dev/zero of=/mnt/swapfile bs=1M count=1024
        sudo chmod 600 /mnt/swapfile
        sudo mkswap /mnt/swapfile
        sudo swapon /mnt/swapfile
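To confirm that the temporary settings above took effect, the kernel's view can be read back from /proc. A minimal sketch; swap_status is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read back the swap settings from the kernel after running the
# commands above.
swap_status() {
  # Expected to print vm.swappiness=0 after `sysctl vm.swappiness=0`.
  printf 'vm.swappiness=%s\n' "$(cat /proc/sys/vm/swappiness)"
  # Active swap areas; /mnt/swapfile should be listed after `swapon`.
  cat /proc/swaps
}

# Not done above (the changes are deliberately temporary). To survive a
# reboot, the usual approach would be:
#   echo 'vm.swappiness=0' | sudo tee -a /etc/sysctl.conf
#   echo '/mnt/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

If this were made permanent, note that on EC2 local instance storage is wiped when the instance is stopped (not on a plain reboot), so a swapfile there would then have to be recreated.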

          People

          • Assignee:
            Dennis Devriendt
            Reporter:
            Pieter-Jan Plaisier
          • Votes:
            0
            Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved: