From Drupal to Django: how to migrate contents

In a recent article I explain the motivations for an upgrade from a no longer maintained Drupal 6 installation to Django 1.8. I will now cover more in detail the migration techniques adopted in the upgrade and I’ll deepen the models and the relationships.

Structure

If you’re a drupaler, you’re familiar with the node/NID/edit and the node/add/TYPE pages:

A-New-Page-Drupal-6-Sandbox

Here we have two visible fields: Title and Body. One is an input type text and the other a texarea. The good Form API provided by Drupal calls these two types textfield and textarea. However if you use the Content type creation interface you don’t see any of these, just declare some field types and you’ll see the form populating with new fields after the addition.

It’s similar in Django but you haven’t to pass to a graphical interface to do this: structure is code-driven and the side effect is the ability to put on revision almost anything. You can choose between different field types that will be reflected in database and on the user interface.

Here what the Drupal Body and Title fields looks like in a model called Article:

# models.py
from django.db import models
from tinymce import models as tinymce_models
# Articles
class Article(models.Model):
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')

The TinyMCE part require TinyMCE app installed and configured. If you’re new to Django read and follow the great Writing your first Django app to understand the basics, e.g the difference between a project and an app or the following sections will sound pretty obscure.

After editing your projectname/appname/models.py file you can now apply the changes in your app via makemigrations (create a migration file for the changes in the database) and migrate (apply the migrations inside the migration files).

In a real world scenario these two fields alone aren’t enough neither in a Drupal 6. These information are all presented by default in any type on Drupal 6:

authoring-info

Drupal 6 treats author as entities you can search through an autocomplete field, and date as a pseudo-ISO 8601 date field. The author field is a link to the User table in Drupal. In Django a similar user model exists but if you want to unchain the access to the admin backend and the authorship it’s simpler to create a custom author model and later associate this with the real user model.

sport3_uml

E-R of our app where migrate the Drupal contents to.

from django.db import models
from tinymce import models as tinymce_models
# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')

As you can see in the Entity-Relationship diagram one Article must have one and only one Author, but many Articles can have the same Author. This is called Many-to-one relationship and it’s represented in Django as a foreign key from the destination “many” model (e.g. Article) to the “one” model (Author).

The Article.publishing_date field is where publishing date and time are stored and, clicking on the text field, a calendar popup is presented to choose the day and hour, with a useful “now” shortcut to populate the field with the current time.

calendario

How a calendar is represented in a DateTime field.

Now that the basic fields are in the right place you can makemigrations / migrate again to update your app, restarting the webserver to apply the changes.

Attachments and images

Drupal is shipped with the ability to upload files and images to nodes. Django has two different field for this: FileField and ImageField. Before continuing we have to rethink our E-R model to allow attachments.

sport3_uml

 

The model.py code is:

from django.db import models
from tinymce import models as tinymce_models
# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')
# Attachments
class Attachments(models.Model):
    description = models.CharField(max_length=255, default='', blank=True)
    list = models.BooleanField(default=True)
    file = models.FileField(upload_to='attachments_directory', max_length=255)

Images are similar: if you want to enrich your model with images you can create another model like Attachments but with an ImageField instead. Remember to use a different upload_to directory in order to keep the attachments and images separated.

We miss the last one field to complete our models: path. Django comes with an useful SlugField that as of Django 1.8 allows only ASCII characters and can be mapped to another field, the title for example.

from django.db import models
from tinymce import models as tinymce_models
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author', verbose_name='Authored by')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
    publishing_date = models.DateTimeField(auto_now=False, auto_now_add=False, verbose_name='First published on')

Keep in mind that a SlugField differs from a Drupal path field because it doesn’t allow slashes. Consider a path like this:

news/news-title

In Drupal you will have a A) view with the path news and the argument news title or B) a fake path generated by pathauto or similar modules. In years of Drupal development, I can affirm that the B option is the typical easy way that turns into a nightmare of maintainance. Django core as far as I know allows only the A choice, so if you want a news view you have to declare it in urls.py and then in views.py as stated in official documentation.

  • news/: the news root path, coupled with the view
  • news-title: the  argument passed to the view and the SlugField content for an article. It must be unique to be used as key to retreive an article but since it can be empty we cannot force it to has a value or to be unique at first. When all data are imported and fixed we can change this field to unique to improve database retrieval performance.

Categories

And what about categories? If you have a category named Section, and an article can be associated with only one Section, you have to create a Many-to-one relationship. As you see before, you have to put the foreign key in the N side of the relation, in this case Article, so the model Article will have a ForeignKey field referencing a specific section.

On the other hands if you have tags to associate to your article you have to create a Tag model with a Many-to-many relationship to the Article. Django will create an intermediate model storing the Article-Tag relationships.

Do not abuse of M2M relationships because each relation needs a separate table and the number of JOIN on database table will increase with side effects on the performance, not even perceivable on the first since Django ORM is very efficient. The event handling will be more difficult for a beginner since the many to many events occurs only when the parent models are saved and this require some experience if you need to add a custom action to a M2M event. If you design wisely your E-R model you have nothing to be scared of.

Migration techniques

Now that we have the destination models, fields and relationship we can import the content from Drupal. In the previous article I suggested to use Views Datasource module to create a JSON view to export content. Please read the Exporting the data from Drupal section inside the article before continue.

The obtained row is something like:

{
  [
    {
      {nid: '30004',
      domainsourceid: '2',
      nodepath: 'http://example.com/path/here',
      postdate: '2014-09-17T22:18:42+0200',
      nodebody: 'HTML TEXT HERE',
      nodetype: 'drupal type',
      nodetitle: 'Title here',
      nodeauthor: 'monty',
      nodetags: 'Drupal, dragonball, paintball'
      }
    },
    ...
  ]
}

If you haven’t a multi-site Drupal you can ignore domainsourceid field. The nodetags lists some Tag names of a Many-to-many relationship not covered here.

All the other value are useful for the import:

  • nid: the original content id, used for pagination and retrieval
    Destination: parsing
  • nodepath: content path
    Destination: Article.path
  • nodebody: content body
    Destination: Article.body
  • nodetype: type of the node
    Destination: parsing
  • nodetitle: title of the node
    Destination: Article.title
  • nodeauthor: author of the content
    Destination: Article.author -> Author.alias

In the previous article you find how to make the View on Drupal (source) and now you have  rough idea of the field mapping. How to fetch the data from Django?

Management command and paged view

To start a one-time import you can write a custom management command for your Django application named project/app/management/commands/myimport.py.

from __future__ import unicode_literals
from django.core.management.base import BaseCommand, CommandError
from django.core.exceptions import ValidationError, MultipleObjectsReturned, ObjectDoesNotExist
import json, urllib
import urlparse
from shutil import copyfile
from django.conf import settings
from os import sep
from django.core.files.storage import default_storage
from django.utils.text import slugify
import requests
import grequests
import time
from md5 import md5

class Command(BaseCommand):
    help = 'Import data from Drupal 6 Json view'
    def add_arguments(self, parser):
        parser.add_argument('start', nargs=1, type=int)
        parser.add_argument('importtype', nargs=1)
        # Named (optional) arguments
        # Crawl
        parser.add_argument('--crawl',
            action='store_true',
            dest='crawl',
            default=False,
            help='Crawl data.')
    def handle(self, *args, **options):
        # process data
        pass

This management command can be launched with

python manage.py myimport 0 article --crawl

Where 0 is the item to start + 1, “article” is the type of content to import (e.g. the destination model) and –crawl is the import option. Let’s add the import logic to the Command.handle method:

def handle(self, *args, **options):
    try:
        assert options['crawl'] and options['importtype']
        # start to import or store data
        sid = int(options['start'].pop())
        reading = True
        while reading:
            importazioni = []
            articoli = []
            url = 'http://www.example.com/json-path-verylongkey?nid=%d' % (sid,)
            print url
            response = urllib.urlopen(url)
            data = json.loads(response.read())
            data = data['']
            # no data received, quit
            if not data:
                reading = False
                break
            for n, record in enumerate(data):
                sid = int(record['']['nid'])
                title = record['']['nodetitle']
                # continue to process data, row after row
                # ...

    except AssertionError:
        raise CommandError('Invalid import command')

This example will fetch /json-path-verylongkey starting from nid passed from the command + 1. Then, it will process the json row after row and keep in memory the id of the last item. When no content is available, the cycle will stop. It’s a common method and it’s lightweight on the source server because only one request at time are sent and then the response is processed. Anyway, this method can be also slow because we have to sum waiting time: (request 1 + response 1 + parse 1) + (request 2 + response 2 + parse 2) etc.

Multiple, asyncronous requests

We can speed up the retrieval by using grequests. You have to check what is the last element first by cloning the Drupal data source json view and showing only the last item, then fetching the id.

def handle(self, *args, **options):
    try:
        assert options['crawl'] and options['importtype']
        # start to import or store data
        sid = int(options['start'].pop())
        # find last node id to create an url list
        url = 'http://www.example.com/json-path-verylongkey-last-nid'
        response = requests.get(url, timeout = 50)
        r = response.json()
        last_nid = int(r[''].pop()['']['nid'])

You can then create a from-to range starting from the first element passed by command line to the last.

url_pattern = "http://www.example.com/json-path-verylongkey-last-nid?fromnid=%d&tonid=%d";
urls = []
per_page = 20
# e.g. [0, 20, 40, 60]
relements       = range(0, last_nid, per_page)
if relements[-1] < last_nid:
    relements.append(last_nid + 1)
for fromx, toy in zip(relements, relements[1:]):
    u = url_pattern % (fromx, toy)
    urls.append(u)

rs = (grequests.get(u) for u in self.urls)
# blocking request: stay here until the last response is received
async_responses = grequests.map(rs)
# all responses fetched

The per_page is the number of element per page specified on Drupal json view. Instead of a single nid parameter, fromnid and tonid are the parameter “greater than” and “less or equal than” specified in the Drupal view.

The core of the asyncronous, multiple requests is grequests.map(). It take a list of urls and then request them. The response will arrive in random order but the async_responses will be populated by all of them.

At that point you can treat the response list like before, parsing the response.json() of each element of the list.

With these hints you can now create JSON views within Drupal ready to be fetched and parsed in Django. In a next article I will cover the conversion between the data and Django using the Django ORM.

How to exchange a webserver maintaining the same IP address: haproxy

server

Scenario: An obsolete server hosting a website must be put offline and a new one must take his IP address. You don’t want to change all of yours A records on DNS to put online your new website, just exchange the internal IP address of your local network.

Your provider can exchange the addresses to assign the old IP public address to the new internal IP but it takes his time. How to cover the time between the provider action and your new website online?

Haproxy is the answer.

This is the content of the /etc/haproxy/haproxy.cfg:

#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
# redirect all traffic on :80 to another server
frontend  main *:80
    default_backend             app

backend app
    balance     roundrobin
    # where 192.168.x.x is the internal address of the new server
    server  app1 192.168.x.x:80 check

And then restart haproxy (e.g. on centos 6 service haproxy restart).

Before configuring haproxy remember to stop or reconfigure the service listening to the port 80. If for example you want to reconfigure apache you have to change Listen 80 into something like Listen 8081 in /etc/httpd/conf/httpd.conf (CentOS 6 systems), if you have varnish listening to the 80 you have to change the VARNISH_LISTEN_PORT parameter in  /etc/sysconfig/varnish.

Now all the traffic arriving to the old server on port 80 will be redirected to the new server via local network. When the provider will exchange the addresses you haven’t to change anything. Meanwhile, you can test all the production settings prior to the real internal IP exchange.

 

Guide to migrate a Drupal website to Django after the release of Drupal 8

I maintain a news website written in Drupal since 2007. It is a Drupal 6, before was a 5. I made many Drupal 7 installations in these years and I went to three Drupal local conventions. This is a guide on how to abandon Drupal if you already knows some basics of Django and Python.

Drupal on LAMP: lessons learned

  • PHP is for (not so) fast development but maintainability can be a pain.
  • Drupal try to overcome PHP limits, with mixed results.
  • Apache cannot stands heavy traffic without an accelerator like Varnish and time-consuming ad-hoc configurations. If traffic increases, Apache cannot stand it at all.
  • Drupal contrib modules are a mix of high quality tools (like Webform or Views Datasource) and bad written projects. The more module are enabled, the more the project lose in maintainability. It is not so evident if you don’t see any other open source project.

This is not the only real truth, this is my experience in these 8 years. I feel a more confident Python programmer than PHP programmer having spent less than one-third of the years working on it. At the end of the article I cite a list of article written for programmers feeling the same uneasiness of mine working on PHP and Drupal after trying other tools.

Django experiences

In the last years with Drupal still paying most of my bills I used the Django MVC framework written in Python for three project: an e-mail application, a real estate catalog  and a custom-made CRM. One of this is a porting of something written in PHP on Drupal 5. In all of these three project I was very happy with the maintainability, clearness of the code and high-level, well written packages I found while exploring it like Tastypie and many python packages found on cake shop.

Even considering I’m the only developer of these, I haven’t experienced the frustration I feel on Drupal when trying to make something work as I design or trying to fix some code I write time ago. I know that a CMS is at higher level than a framework, simply some projects are not suited for Drupal and I found more comfortable with Python than PHP in these days.

At the time I write Drupal 8 is out as Release Candidate. I made migrations from 5 to 6 and from 6 to 7 on some websites in the past. Migrating to a new major it’s not a science, it’s a sort of mystical art. When the Drupal 8 will be out, Drupal 6 will be automatically unsupported after 3 months Drupal 8 is out as of Drupal announcement since only the current and previous version are supported, 8.x and 7.x when 8 is out. Keeping a Drupal 6 running after that term will be risky.

Choosing the stack

Back to the news website I maintain, the choice is between a platform I already know well and it proves stable and maintainable for small/one-person team and another I have to learn. Plus, Django will be the natural choice to avoid the problems I’ve listed above and use the solutions I used on past django projects exploring new tools in the meanwhile.

Here the choices I made:

I decided to use gunicorn because it’s very easy to run and maintain for a django project and you haven’t to make wsgi run on nginx. Nginx is in front of gunicorn, serving static files and sending right requests to it. Memcached is used inside Django and it will store cached pages from views on volatile memory avoiding to read from the database any time a page is requested. I try to avoid using Varnish even if is a very good tool because I want to keep the stack as simple as I can and I’m confident Varnish and Memcache will speed up the website enough. Now is the time to rewrite the Drupal-hosted website into a Django application.

Write the E-R model

If you are here probably you have a running Drupal website you want to port to Django. Browse it like an user, and then open your Content types list to identify the Entities and the Relationships as of the E-R model suggests. If your website is running for a long time you probably want to redesign some parts, adding, removing or fusing entities into another.

Take my news website for example. I have 15 content types + 12 vocabularies (27 entities) on Drupal. After rewriting the E-R I’ve 14 models (entities), including the core ones. On the database side it translates into a 199 tables for Drupal and 25 for Django since it usually make an entity property into a database column. I trash some entities and fuse 4 entities into one.

From entities to models: understanding relationships

When you establish a relation between your re-designed entities you can have N:1 relations, N:N relations and 1:1 relations. A Drupal node “Article” that accepts a single term for a vocabulary named “Cheese type” translates into a N:1 relationship between the model Article (N) and the model CheeseType (1). It is a simple case since you can translate it into a ForeignKey field on your model since Article will get a ForeignKey field named author referencing to the Author model.

from django.db import models
from tinymce import models as tinymce_models
# Authors
class Author(models.Model):
    alias       = models.CharField(max_length=100)
    name        = models.CharField(max_length=100, null=True, blank=True)
    surname     = models.CharField(max_length=100, null=True, blank=True)
# Articles
class Article(models.Model):
    author      = models.ForeignKey('Author')
    title       = models.CharField(max_length=250,null=False, blank=False)
    body        = tinymce_models.HTMLField(blank=True, default='')
# Attachments to an Article
class Attachment(models.Model):
    article       = models.ForeignKey('Article', blank=True, null=True)
    file          = models.FileField(upload_to='attachment_dir', max_length=255, blank=True, null=True)
    description   = models.TextField(null=True, blank=True)
    weight        = models.PositiveSmallIntegerField()

In the case of a list of attachments to Article, you have a 1:N relationship between the Article model (1) and the Attachment model (N). Since the relationship is reversed, in the usual Django admin interface you cannot see the attachments in the article as is since you have to create an Attachment and then choose an article from a dropdown where attach it to.

For this case, Django provides an handy administration interface called inline to include entities in reversed relationship. This approach fix by design something that in Drupal world costs a lot of effort, with dozen of modules like Field Collection or workaround like this I write of in the past and it keep aligned your E-R design with your models. Plus, a list of all Attachment are available for free.

Exporting the data from Drupal

JSON is a pretty good interchange format: very fast to encode and decode, very well supported. I’m fascinated with YAML format but since I’ve to export thousands of articles I need pure speed and solid import/export modules on both Django and Drupal side.

There are many export module in the Drupal world. I’m very fond of Views Datasource and here how I used it:

  1. Install Views Json (part of Views Datasource): it is available for Drupal 6 and 7 and very solid
  2. Create a new view with your published nodes with the JSON Data style
    1. Field output: Normal
    2. Without Plain text (you need HTML)
    3. Json data format: Simple
    4. Without Views API mode
    5. application/json as Mime type
    6. Remove all parent / children tag name so you will have only arrays and objects
  3. Choose a path for your view
  4. Limit the view to a large number of elements, e.g. 1000
  5. Sort by node id, ascendent
  6. Add an exposed filter “greater than” Nid with a custom Filter identifier (e.g. nid)
  7. Add any field you need to import and any filter you need to limit the results
  8. Avoid caching the view
  9. Limit the access to the view if you don’t want to expose sensible contents (optional)
  10. Install a plugin like JsonView (chrome) or JsonView (firefox) to look at the data on your browser

You will get something like that:

{
  [
    {
      {nid: "30004",
      domainsourceid: "1",
      nodepath: "http://example.com/path/here",
      postdate: "2014-09-17T22:18:42+0200",
      nodebody: "HTML TEXT HERE",
      nodetype: "drupal type",
      nodetitle: "Title here",
      nodeauthor: "monty",
      nodetags: "Drupal, basketball, paintball"
      }
    },
    ...
  ]
}

Now you can reach the view appending ?nid=0 to your path. It means that any node with id greater than 0 will be listed. With nid=0 a max of 1000 elements are listed. To get other nodes you have simply to get the nid from the last record (e.g. 2478) and use it as value for the nid parameter obtaining something like http://example.com/myview?nid=2478.

Try it on your browser simulating what a procedure will do for you: check the response size and adapt the number of elements (#4) accordingly to avoid to overload your server, hit the timeout or simply storing too much data into the memory when parsing. When the view response is empty you’ve listed all nodes matching your filters and the parsing is complete.

In this example I’ve talked about nodes but you can do the same with files, using fid as id to pass as parameter and to sort your rows. In the case of files you have to move the files as well but it’s pretty simple to import these on a custom model on Django as you will see.

Importing data to Django

Django comes with some nice export (dumpdata)  and import (loaddata) commands. I’ve used a lot the YAML format to migrate and backup data from models but Json and SQL are other supported formats you can try. However in this migration I choose custom admin command to do the job. It’s fast: in less than 10 minutes the procedure imported 15k+ articles writing on a custom model some logging information on both error and success.

All the import code in my case, comments and import included, is about 300 lines of python code. The core of the import function for nodes willing to become Articles is that:

import json, urllib
# ...
sid = int(options['start'].pop())
reading = True
while reading:
    url = "http://mydrupalwebsite.example.com/myview?nid=%d" % (sid,)
    print url
    response = urllib.urlopen(url)
    data = json.loads(response.read())
    data = data['']
    # no data received, empty view result, quit
    if not data:
        reading = False
        break
    for n, record in enumerate(data):
        sid = int(record['']['nid'])
        # ... do something with data ...

In this cycle, sid is the start argument passed to the admin command via command line. Next, sid will be set to the last read record so, when record finishes, a new request to myview starting from the last read element will be made.

All input and output is UTF-8 in my case. JSON View quotes strings and you have to decode them before saving in Django:

from myapp.models import Article
import HTMLParser
hp = HTMLParser.HTMLParser()
authors = Author.objects.all()
...
for n, record in enumerate(data):
    try:
        art = Article(
            title = hp.unescape(record['']['nodetitle']),
            body = record['']['nodebody'],
            author = authors.get(alias=record['']['nodeauthor'])
        )
        # run the same validation of an admin interface submit
        art.full_clean()
        art.save()
    except ValidationError as e:
      # cannot save the element
      # inside e all the error data you can save into
      # a custom log model or print to screen
    except:
      # any other exception
      pass

On line 9 a new article is declared. The title in Json source is named nodetitle. On line 10 the title from json is unescaped and assigned to title CharField of Article. The nodebody  is set as it is since the destination field is a TextField with HTML. On line 11 username nodeauthor from Json is used as key to associate the already imported user to the ForeignKey field author, where username is saved as Author.alias.

Conclusion

Here the very basics on how to prepare a migration from Django to Drupal using Views Datasource module and a custom admin command. I described why I choose Django after years of Drupal development for this migration suggesting some tools to do the job and introducing some basic concepts for Drupal developer who wants to try Django.

Before leaving here a list of good contributions I’ve read about Drupal enthusiasts that suffers the same uneasiness of mine after long-time Drupal / PHP development. In their words I found some confort in my day programming job and a lot of inspiration. As an half-joke, I put on parenthesis the time that specific developer have spent on Drupal.

Epilogue

Here the download time graph from Google Search Console after some months:
downloadtime

You can clearly see the results in speed, expressed in milliseconds, between 2015 (old Drupal 6 platform) and 2016 (new Django platform).

Installing Solr 5 on CentOS 6 with Java 1.7

Here the instructions for a CentOS 6 with an already-installed Java 1.7 for Solr 5 without Tomcat.

cd
yum install lsof unzip
wget http://www.eu.apache.org/dist/lucene/solr/5.3.0/solr-5.3.0.tgz
tar zxvf solr-5.3.0.tgz
cd solr-5.3.0/bin

Now run the install_solr_service script as documented on official documentation:

mkdir /usr/local/etc/apache-solr-5
./install_solr_service.sh ../../solr-5.3.0.tgz -i /usr/local/etc/apache-solr-5 -d /var/mysolr5 -u mysolr5 -s
mysolr5 -p 5448

To get the current status:

service mysolr5 status

The service is already set to autostart:

chkconfig --list | grep solr
mysolr5 0:off 1:off 2:on 3:on 4:on 5:on 6:off

If you want to secure the Solr instance running it only on localhost, you can add a custom SOLR_OPTS:

nano /var/mysolr5/solr.in.sh
# Anything you add to the SOLR_OPTS variable will be included in the java
# start command line as-is, in ADDITION to other options. If you specify the
# -a option on start script, those options will be appended as well. Examples:
# ...
# run only on localhost
SOLR_OPTS="$SOLR_OPTS -Djetty.host=127.0.0.1"

Apply the changes and then check where the service is running:

service mysolr5 restart
netstat -tulpn | grep java
tcp 0 0 ::ffff:127.0.0.1:4448 :::* LISTEN 11273/java
tcp 0 0 ::ffff:127.0.0.1:5448 :::* LISTEN 11273/java

Before was available to all clients:

tcp 0 0 :::5448 :::* LISTEN 24541/java

Using supervisord

As alternative of the standard service you can use a nice tool like supervisor using the -f option to execute the command from there: I try before without the argument and supervisord will start the service on the client but it will not stop. Not good. The -f (foreground) option can solve this issue but I haven’t tested yet.

Using Tomcat

Tomcat is another way to run solr. I’ve used it in the past for multicore solr, but I will not use it anymore because I prefer single core running on multiple instances on different port. With this approach you can have a solr 5.x and 3.x instances running on the same server, not exactly efficient for consumed resources but really really more easy to deploy and maintain than Tomcat / multicore. So I’m happy with the service right now.

How to enable gzip on proxy servers on nginx

I use often Gunicorn as web server for django applications.

Usually I use Apache but I’m starting to use nginx as webserver to serve both the static files and the proxied gunicorn response.

I need to do something like what I’ve done with Apache to compress the response after I received from django since I’ve noticed that in my case compressing it before using @gzip_page decorator is more detrimental to performance than doing it after.

Here an essential mysite.conf to put in /etc/nginx/conf.d.

server {
    listen      80;
    server_name .example.com other.domain.example.com;
    charset     utf-8;
    # max upload size
    client_max_body_size 75M;
    location /media  {
        alias /usr/local/etc/files/mysite/media_root;
    }
    location /static {
        alias /usr/local/etc/files/mysite/static_root;
    }
    location / {
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        # gzip proxy response
        gzip on;
        gzip_proxied any;
        gzip_comp_level 7;
        # Serve static files via nginx
        # Serve dynamic requests via gunicorn on custom port (e.g. 8585)
        # and gzip the response
        if (!-f $request_filename) {
            proxy_pass http://localhost:8585;
            break;
        }
    }
}

In this way, content by Gunicorn is served to nginx and before to send it to client nginx gzip it (here with a compression level of 7 of 9).

See also:

Installing and configure Memcache on CentOS 7

Memcached

Memcached is a service to speed up page caching by saving them not on file or database tables but on volatile memory.

This howto cover three configurations: memcached for use on localhost (A) and memcached for local and remote use (AB).

A: configuration for host for Memcache server.
B: configuration for client host that will use the memcached service.AB: configuration for host for the server machine AND for host that will use the memcache service (e.g. via loopback) client and server on the same machine.

I will tag the steps with these symbols to allow to do the right steps if you want an A or an AB configuration. Any of these steps has to run as root user.

Apply to: AB, A

Install memcached daemon, start it and set it to boot on system restart (enable):

yum install memcached nano
systemctl start memcached
systemctl enable memcached

And allow memcache to be contacted by the webserver if needed:

setsebool -P httpd_can_network_memcache 1

Install libraries for Memcache client

Apply to: AB, B

Install libraries needed to consume the memcached service by applications. The fundamental library is libmemcached, a very efficient library written in C and then wrapped by libraries in other languages like pylibmc.

yum install memcached python-memcached gcc python-pip libmemcached libmemcached-devel zlib-devel
pip install pylibmc

Check the configuration

Apply to: A, AB

Check if service is running:

systemctl status memcached -l

You’ll get something like:

memcached.service – Memcached
Loaded: loaded (/usr/lib/systemd/system/memcached.service; enabled)
Active: active (running) since gio 2015-09-03 09:36:18 CEST; 23h ago
Main PID: 25149 (memcached)
CGroup: /system.slice/memcached.service
└─25149 /usr/bin/memcached -u memcached -p 11211 -m 64 -c 1024

set 03 09:36:18 myhostnamehere systemd[1]: Started Memcached.

Check again via netstat:

netstat -tulpn | grep memcached

And look at the stats:

memcached-tool 127.0.0.1 stats

The default setting for memcache is to run as TCP service. If you want to use memcache as UNIX socket to remove the TCP overhead, you can.

If you are are in AB configuration and you want to use Memcache only on the same server via TCP on loopback, you’ve done. If you are on A configuration and you want to serve memcache on other machine of the same network skip the next step.

Serve Memcache on UNIX socket

Apply to: AB (optional, skip if you want Memcached to be served as regular TCP service)

nano /etc/sysconfig/memcached

change:

OPTIONS=""

to:

OPTIONS="-s '/var/run/memcached/memcached.sock' -a 0766"

Restart the service:

systemctl restart memcached

it should fail due to write permission. Check the SELinux rule that is blocking the socket writing:

cat /var/log/audit/audit.log | grep memcached  | audit2allow

You should get something like:

#============= memcached_t ==============
allow memcached_t tmp_t:dir write;
allow memcached_t var_run_t:file getattr;
allow memcached_t var_run_t:sock_file create;

Apply the rule:

cat /var/log/audit/audit.log | grep memcached  | audit2allow -M mymemcached
semodule -i mymemcached.pp

And then restart the service again:

systemctl restart memcached

Now the TCP service is not running anymore:

netstat -tulpn | grep memcached

And to check the Memcached stats you have to ask to the socket instead of IP:

memcached-tool /var/run/memcached/memcached.sock stats

Serving memcache via TCP on different host on the same network

Apply to: A

You have to run memcache not on 127.0.0.1 but on the private address of the current machine. To do this, you have to get the address of the current machine and to bind memcache on it.

nano /etc/sysconfig/memcached

change:

OPTIONS=""

to:

OPTIONS="-l 192.168.xxx.yyy"

Where 192.168.xxx.yyy is the private address of your Memcache server host. To check what argument get -l you have to check using the ifconfig command. You get something like:

interfacenamehere: flags=0000 mtu 1500
inet 192.168.xxx.yyy netmask 255.255.255.0 broadcast 192.168.zzz.zzz
inet6 xxx::xxx:xxx:xxx:xxx prefixlen 00 scopeid 0x00
ether xx:xx:xx:xx:xx:xx txqueuelen 1000 (Ethernet)
RX packets 1657561 bytes 482287070 (459.9 MiB)
RX errors 0 dropped 6355 overruns 0 frame 0
TX packets 1492103 bytes 349546801 (333.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Now if you are on the B server and you ask for the 11211 port on the 192.168.xxx.yyy, you can’t connect.

You have to add a rule to the firewall on memcache server (A) to allow connections on local network.

Serving memcache via TCP on different host: create a memcached service for firewalld

Now you have to add a service to identify memcache

python

Then type the rows without the initial hashtag #. To avoid conflicts with future services I use memcached_chirale as service name:

# @see http://forums.fedoraforum.org/showpost.php?s=ff16ee76b348a3a4af5fb9c35c6c42a1&p=1730933&postcount=5
import firewall.core.io.service as ios
#Creates a service object
s=ios.Service()
#A short description
s.short = 'Memcached chirale'

#this defines the name of the xml file
s.name = 'memcached_chirale'

#A list of ports
s.ports = [('11211', 'tcp'), ('11211', 'udp')]
ios.service_writer(s, '/etc/firewalld/services')

Ctrl+D and or exit() and the configuration file is written:

less /etc/firewalld/services/memcached_chirale.xml

You can see all the configuration just written.

firewall-cmd --reload

to apply and then

firewall-cmd --get-services | grep memcached_chirale

will highlight the new service.

Serving memcache via TCP on different host: allow connection from the B server

Apply to: A

On the B host, run ifconfig to get the private address of the machine as before.

Then go to the A server and whitelist the B machine address on the firewall on the internal zone where 192.168.bbb.bbb is the B host private address.

firewall-cmd --permanent --zone=internal --add-service=memcached_chirale
firewall-cmd --permanent --zone=internal --add-source=192.168.bbb.bbb
firewall-cmd --reload

You will receive success messages if everything is ok.

You can check the rules on the file /etc/firewalld/zones/internal.xml or using:

firewall-cmd --zone=internal --list-all

Check the service on 192.168.bbb.bbb (B host)

Use telnet to connect to 11211 port on A host:

telnet

After the connection establishment just type:

stats

And you’ll get values like:

STAT pid 55555
STAT uptime...

Then, Ctrl+D and you’re done. You can use the same command you use via memcached-tool but remember

A note about the firewalld zone

Note: I used the internal zone because it match my need. The internal zone is described like this:

For use on internal networks. You mostly trust the other computers on the networks to not harm your computer. Only selected incoming connections are accepted.

The very last sentence is important, since only IPs added via add-source on the zone are allowed to connect to the service. Use this and other rules with caution and don’t be too permissive. This howto can be very shorter avoiding firewall and selinux but disabling these tools will open to malicious attacks your systems.

Acknowledgements

Here some of the sources I’ve used to make this thing happen. Thank you for helping the community to spare time writing useful howtos!

Photo by Memcache

How to display a custom cover embedding a youtube video and when stopped display the cover again

I need to display a custom image cover in front of an embedded Youtube video.

After the video has stopped, I need to display again the clickable cover.

For a better graphical result I’ve added an over image for the cover and a fadein to the cover when the video ends. To do this I’ve used the Youtube iframe API.

This code is for jQuery 1.4.4. If you have a newer version of jQuery and live() is not working change live() to on().

Here the html:

<a id="idcover" href="#" 
style="display: block; width: 100%;">
<img src="/path/to/cover/off.jpg" alt="Video"></a>

Here the js:

// include youtube API
$.getScript("http://www.youtube.com/player_api");
var myselector = "#idcover";
// preload image displayed on over to avoid glitches: 900 with, 500 height
overimg = new Image(900,500);
overimg.src = '/path/to/cover/on/hover.jpg';
var offimg_src = overimg.src;
$(myselector).live('mouseover', function (e) {
offimg_src = $(this).find('img:first').attr('src');
$(this).find('img:first').attr('src', overimg.src);
});
$(myselector).live('mouseout', function (e) {
$(this).find('img:first').attr('src', offimg_src);
});
$(myselector).live('click', function (e) {
e.preventDefault();
// add video player container
var playerid = 'yourplayercontainerid';
$(myselector).after('&lt;div style="display: none;" id="' + playerid + '"&gt;&lt;/div&gt;');
// I suppose the framework is loaded before the click, so this is not strictly necessary
// function onYouTubeIframeAPIReady() {
window.player = new YT.Player(playerid, {
width: '100%',
height: 720,
videoId: '7W2vjTgzucA', // your youtube code here
playerVars: { 'autoplay': 1, 'controls': 1, 'rel': 0 },
events: {
'onReady': onPlayerReady,
'onStateChange': onPlayerStateChange,
// 'onError': onPlayerError
}
});
// }
function onPlayerReady(event) {
// hide cover
$(myselector).hide();
// view the player
$('#'+playerid).show();
}

function onPlayerStateChange(e) {
// se stopped (raggiunto il fondo), rimette il tappo e distrugge il video player
if (e.data == 0) {
$(myselector).fadeIn(500);
// destroy iframe player
window.player.destroy();
// destroy player container
$('#'+playerid).remove();
// now the cover is ready to another click, and all 
// this process will restart on user click on cover
}
}
Follow

Get every new post delivered to your Inbox.

%d bloggers like this: