This page collects fixes for some common known issues with the software used to run a CONFINE testbed.
As discussed in issue #625, there is a firmware generation bug that affects the generation of certificates used by uhttpd (the node web server). Although it is fixed in Controller version 0.11.7, operators of existing testbeds need to perform a few actions in their Controller as the system user.
First, upgrade the Controller to version 0.11.7 or later (see Upgrading Controller to a newer version):

```shell
$ sudo python ~/mytestbed/manage.py upgradecontroller \
      --controller_version=0.11.7
```
Then regenerate the server certificate:

```shell
$ # Get path of server certificate.
$ python ~/mytestbed/manage.py print_settings | grep PKI_CA_CERT_PATH
PKI_CA_CERT_PATH = '/var/lib/vct/server/pki/ca/cert'
$ # Backup current certificate.
$ mv ~/mytestbed/pki/ca/cert ~/mytestbed/pki/ca/cert.old
$ # Show current certificate information (keep it to generate the new certificate).
$ openssl x509 -in ~/mytestbed/pki/ca/cert.old -text
[...]
$ # Generate a new certificate with version 3 (0x2).
$ python ~/mytestbed/manage.py setuppki  # include your organization details
```
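To confirm what a correct version field looks like, you can inspect any certificate with openssl. This throwaway sketch (the file names under /tmp are just for illustration, not part of the procedure) generates a self-signed certificate and prints its version field, which should read 3 (0x2); the buggy firmware carried an invalid 0x3 instead:

```shell
# Generate a throwaway self-signed certificate (illustrative /tmp paths).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=example" \
    -keyout /tmp/test-key.pem -out /tmp/test-cert.pem 2>/dev/null
# A correctly generated certificate reports "Version: 3 (0x2)".
openssl x509 -in /tmp/test-cert.pem -noout -text | grep "Version:"
```

The same `openssl x509 -noout -text` invocation works on the Controller's own certificate file to verify the result of `setuppki`.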
Finally, fix the certificates of existing nodes by running the following script in a Django shell (python ~/mytestbed/manage.py shell_plus):
```python
from M2Crypto import RSA, X509

def get_node_certificate_version(node):
    """Return the X509 version of the node certificate (or False if none)."""
    if node.keys.cert is None:
        return False
    pem_string = str(node.keys.cert)
    cert = X509.load_cert_string(pem_string)
    return cert.get_version()

def is_valid_node_certificate_version(node):
    """Check if the certificate has the invalid version (0x3)."""
    if get_node_certificate_version(node) == 3:
        return False
    return True

def fix_node_certificate_version(node):
    """
    Remove the invalid stored /etc/uhttpd.crt.pem file.
    It will be regenerated on the next firmware build (node.api.cert too).
    NOTE: should be executed with a patched controller (X509 version 0x2).
    """
    assert not is_valid_node_certificate_version(node)
    cert = node.files.get(path=NodeKeys.CERT)
    assert cert.content == node.api.cert, "Node %s" % node.pk
    cert.delete()

# Get nodes with an invalid certificate version.
affected_nodes = []
for node in Node.objects.all():
    if not is_valid_node_certificate_version(node):
        affected_nodes.append(node.pk)
        #fix_node_certificate_version(node)  # UNCOMMENT to fix them all
print "Fixed %i" % len(affected_nodes)
```
If you go to Administration > Djcelery > Tasks and the list shows no objects or only old tasks (e.g. received yesterday or 2 days ago), you should check whether the Celery monitor components are running:
```shell
$ ps ax | grep celeryev
$ ps ax | grep celerybeat
```
If any of them is not running, start it with service SERVICE start (with SERVICE being, e.g., celerybeat). Otherwise you may try restarting it with service SERVICE restart.
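The two checks above can be combined into a small loop; this is a minimal sketch (the `service` commands it suggests assume a SysV-style init as on Debian):

```shell
# Report whether a daemon appears in the process list.
is_running () {
    ps ax | grep -v grep | grep -q "$1"
}

for daemon in celeryev celerybeat; do
    if is_running "$daemon"; then
        echo "$daemon is running"
    else
        echo "$daemon is NOT running, start it with: service $daemon start"
    fi
done
```

Note that grepping `ps` output is only a heuristic; if your init scripts support it, `service SERVICE status` is more reliable.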
You have probably reached NginX's maximum POST size. This limit exists to discourage the upload of big sliver templates, because they are supposed to be transferred over community networks that are not always reliable.
You should remove or increase the client_max_body_size directive in your NginX configuration (it may appear more than once).
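For example, a server block might raise the limit like this (the path and the 100 MB value are illustrative, not taken from a real Controller installation):

```nginx
# /etc/nginx/sites-enabled/controller -- illustrative path.
server {
    listen 443 ssl;
    # Allow sliver template uploads up to 100 MB; a value of 0 disables the check.
    client_max_body_size 100m;
}
```

The directive is valid in the http, server and location contexts, which is why it may appear more than once; remember to reload NginX after changing it.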
The operating system has a limit on open files per process. As the ping and state apps use one file descriptor per node and per sliver, the limit of open files can be reached with a big number of nodes and slivers (e.g. 240 nodes and 800 slivers means 1040 open files).
In this situation Celery tasks show the FAILED state with this error message:

```
OperationalError: could not create socket: Too many open files
```
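To see how many descriptors a given process actually holds, you can count its entries under /proc. A quick sketch using the current shell's PID (for the Celery worker, substitute its PID):

```shell
# Count the file descriptors currently open by a process; $$ is this shell's PID.
# For the Celery worker use its PID instead, e.g. ls /proc/<celery-pid>/fd.
ls /proc/$$/fd | wc -l
```

Comparing this count against `ulimit -Sn` shows how close the process is to the limit.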
You can check the limits by running:

```shell
$ ulimit -Hn    # hard limit
4096
$ ulimit -Sn    # soft limit
1024
```
To temporarily increase this limit you can run as root:

```shell
# ulimit -Sn 2048
```
To make this change permanent for Celery, place the previous command in Celery's initialization script and restart the daemon. See the Debian Wiki page on limits for more information.
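Note that an unprivileged process may raise its own soft limit up to (but not beyond) the hard limit, and the change only affects that process and its children, which is why the `ulimit` call must go into Celery's own init script to reach the daemon. A quick shell check of this behaviour:

```shell
# Raise the soft limit of a subshell up to its hard limit and show the result.
# The change is local to the subshell and its children.
bash -c 'hard=$(ulimit -Hn); ulimit -Sn "$hard"; echo "soft limit is now $(ulimit -Sn)"'
```

Raising the hard limit itself requires root (e.g. via /etc/security/limits.conf on Debian).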