Headless Drupal: Using Drupal’s API to Batch Script Your Drupal Site.

January 20, 2009

Whenever I work with a significant framework or off-the-shelf software, I invariably encounter situations in which I need to do “one-off” programmatic batch tasks outside the normal flow of the application.

Of course, you can look at the database structure and manipulate the data directly in a database client or through your favorite programming language, but this can actually be less convenient (and less safe) than directly using the application’s API, which encapsulates and abstracts away the underlying data structure.

And often we are already familiar with this API anyway as a result of using the framework or customizing the software. Unfortunately, it’s not always obvious how to invoke the application in an entirely programmatic way to perform these types of tasks. These methods usually exist, but they are often not well documented.

Today, I will explore how to do some programmatic manipulation of Drupal (specifically Drupal 6, although this approach is very similar in Drupal 5) showing specific examples to get you started creating your own scripts.

Invoking Drupal Programmatically

Invoking Drupal programmatically is surprisingly simple, with just a few lines of code:

<?php
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

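The drupal_bootstrap() function can also take other constants to load only some of Drupal’s bootstrap phases; DRUPAL_BOOTSTRAP_DATABASE, for example, stops once the database connection is available. I usually just fully load Drupal to ensure that I have access to the full API and can perform any task I require.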

And because we are the only ones who will see this execute, and we want to know if anything goes wrong, we may want to set PHP’s error reporting manually by adding this just before the bootstrapping process:

error_reporting(E_ALL);
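Keep in mind that error_reporting() only decides which errors PHP raises; whether they actually show up in your browser depends on the display_errors setting, which many production php.ini files turn off. A minimal sketch of the lines I would put at the very top of a batch script, assuming your host lets you override display_errors at runtime:

```php
<?php
// Report every class of error covered by E_ALL.
error_reporting(E_ALL);
// Also print those errors in the response; error_reporting() alone does not
// do this when display_errors is switched off in php.ini.
ini_set('display_errors', '1');
```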

We now have access to Drupal’s API and can use it to manipulate our Drupal site programmatically.

But before we do anything interesting with this foundation, you may be wondering how to actually execute this code. The easiest way is to just create a PHP script in your Drupal root directory alongside Drupal’s cron.php script, add the code you want, and navigate to it in your browser. So, if Drupal is installed in a subdirectory called ‘drupal_example’ and we added this code to a file in that subdirectory called ‘batch_example.php’, we would simply visit this URL to invoke it:

http://www.example.com/drupal_example/batch_example.php

This may seem like an odd way to invoke a batch processing script, especially if you are coming from another language. But as I said, this is the easiest approach, which pretty much eliminates any possibility of things like path errors, and it allows you to spit out nicely formatted HTML.
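If you would rather keep your batch scripts somewhere other than the Drupal root, the catch (as several commenters below discovered) is that bootstrap.inc loads its own files with './'-relative paths, so PHP’s working directory has to be the Drupal root before you bootstrap. Here is a minimal sketch of one way to handle that; find_drupal_root() is a hypothetical helper, not part of Drupal, and it assumes the script lives somewhere beneath the Drupal root:

```php
<?php
/**
 * Hypothetical helper (not part of Drupal): walk up from $dir until we reach a
 * directory containing includes/bootstrap.inc, i.e. the Drupal root.
 * Returns that directory's real path, or NULL if no Drupal root is found.
 */
function find_drupal_root($dir) {
  $dir = realpath($dir);  // FALSE if $dir does not exist
  while ($dir !== FALSE) {
    if (is_file($dir . '/includes/bootstrap.inc')) {
      return $dir;
    }
    $parent = dirname($dir);
    if ($parent === $dir) {
      // Reached the top of the filesystem without finding Drupal.
      return NULL;
    }
    $dir = $parent;
  }
  return NULL;
}

// Usage at the top of a batch script kept below the Drupal root:
// chdir(find_drupal_root(dirname(__FILE__)));
// require_once './includes/bootstrap.inc';
// drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
```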

If you really want to invoke this script on the command line, I would even suggest that you not do this by calling the PHP binary, but instead pass the URL to wget, which will have the same effect as loading the script in your browser:

wget http://www.example.com/drupal_example/batch_example.php

In fact, this is exactly how you typically invoke Drupal’s cron.php script from your system’s cron, and we can do the same with these batch scripts to run periodic jobs that don’t logically fit inside a hook_cron() function in one of our custom modules.

With all that out of the way, let’s start actually doing something useful.

Querying Drupal, Outputting HTML, Email

As the name suggests, the db_query() function in the Drupal API lets you send a query to the underlying database without having to manage the database connection yourself, and db_fetch_object() then returns each row of the result as an object. Table names are written in curly braces, as in {node}, so that Drupal can add any table prefix configured in settings.php. For illustration purposes, I am going to use a trivial query that will list the node types that have been created in the site:

<?php
error_reporting(E_ALL);
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$results = db_query('SELECT DISTINCT type FROM {node}');
while ($result = db_fetch_object($results)) {
  echo 'A node type: ' . $result->type  . '<br />';
}

During the bootstrapping process, Drupal gathers the database connection information from your settings.php file and uses it behind the scenes to execute the query.
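Because these scripts write HTML straight to the browser, it is worth escaping anything that comes out of the database before echoing it. Drupal 6 provides check_plain() for this; the sketch below uses PHP’s own htmlspecialchars() instead so that it runs without Drupal, and render_type_list() is a hypothetical name:

```php
<?php
/**
 * Hypothetical helper: render node type names as an HTML list, escaping each
 * name so that stray markup in the data cannot break the page.
 */
function render_type_list(array $types) {
  $html = "<ul>\n";
  foreach ($types as $type) {
    $html .= '  <li>' . htmlspecialchars($type, ENT_QUOTES, 'UTF-8') . "</li>\n";
  }
  return $html . "</ul>\n";
}

// Inside a bootstrapped script:
// $types = array();
// $results = db_query('SELECT DISTINCT type FROM {node}');
// while ($result = db_fetch_object($results)) {
//   $types[] = $result->type;
// }
// echo render_type_list($types);
```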

If you want to track something in your site using a query as I did above, you can easily extend this, for example to email yourself a report:

<?php
error_reporting(E_ALL);
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$results = db_query('SELECT DISTINCT type FROM {node}');
$message = "This is the current list of node types:\n";
while ($result = db_fetch_object($results)) {
  $message .= "A node type: " . $result->type  . "\n";
}

$to='my_email@example.com';
$subject="Available Types Report";
if (mail($to, $subject, $message)) {
  echo 'email sent';
} else {
  echo 'email not sent';
}

Again, this can be triggered periodically by a cron job, calling the URL of the script with wget.
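One detail worth calling out: PHP only turns \n into a newline inside double-quoted strings, so a header written in single quotes ends with a literal backslash and “n”. Keeping the message assembly in a plain function, as in this sketch (build_type_report() is a hypothetical name), makes that easy to check without Drupal or a mail server:

```php
<?php
/**
 * Hypothetical helper: build the plain-text body of the node type report.
 * The "\n" sequences are in double quotes so PHP emits real newlines.
 */
function build_type_report(array $types) {
  $message = "This is the current list of node types:\n";
  foreach ($types as $type) {
    $message .= 'A node type: ' . $type . "\n";
  }
  return $message;
}

// In the script above, collect $result->type into $types inside the
// db_fetch_object() loop, then:
// mail('my_email@example.com', 'Available Types Report', build_type_report($types));
```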

But this isn’t very interesting yet. Because nodes are the foundation of Drupal’s approach to managing content, we will most often want to get at node objects.

Getting Drupal Node Objects

Before we start doing anything truly useful, we need to understand node objects in Drupal.

Continuing with our examples above, where we are just querying the underlying data, we can access nodes using the node_load() function in the Drupal API like so:

<?php
error_reporting(E_ALL);
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$results = db_query("SELECT nid FROM {node} WHERE type = '%s'", 'page');
while ($result = db_fetch_object($results)) {
  $node = node_load($result->nid);
  echo 'A page title: ' . $node->title . '<br />';
  echo 'created on: ' . format_date($node->created) . '<br />';
  echo 'changed on: ' . format_date($node->changed, 'custom', 'Y-m-d H:i:s O') . '<br /><br />';
}

I simply constructed a query for the nodes I wanted to find, in this case, pages only, and returned the “nid” or node identifier field. With this, I can iterate over the results, passing the nid to node_load() to instantiate the basic node object.

Once we have an instantiated node object, say, in a variable called “$node”, we can access fields like the following that might be of interest:

  • $node->nid: the node’s ID.
  • $node->vid: the version ID for the node.
  • $node->type: basically, the content type, such as a ‘page’ or ‘blog’.
  • $node->uid: the author’s user ID.
  • $node->created: the date the node was created, stored as a UNIX timestamp.
  • $node->changed: the date the node was last updated, stored as a UNIX timestamp.
  • $node->title: the title assigned to the node.
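  • $node->body: the raw body text, as stored in the database.
  • $node->content['body']['#value']: the body as prepared for display; this is only populated once the node has been built for viewing (for example, by node_view()), not by node_load() alone.
  • $node->status: whether published/visible (= 1) or unpublished/hidden (= 0).
  • $node->sticky: whether the node stays at the top of lists: no (=0) or yes (=1).
  • $node->promote: whether the node is promoted to the front page: no (=0) or yes (=1).
  • $node->moderate: whether the node is flagged for moderation: no (=0) or yes (=1).
  • $node->comment: disabled (=0), read only (=1), or read/write (=2).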
  • $node->format: filtered HTML (=1) or full HTML (=2), and possibly others depending on your configuration.

Notice that in the example above, I used the format_date() function in the Drupal API to convert the date fields to something human readable. The two calls suggest its flexibility: the first uses the site’s default (medium) date format, while the second passes a custom format string.
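The custom format string uses the same format characters as PHP’s date() function, so you can try one out with plain PHP before putting it in a script. This sketch uses gmdate(), which works in UTC; format_date() additionally applies the site’s or user’s configured timezone, which the sketch ignores. The timestamp is just an example value:

```php
<?php
// An example UNIX timestamp (2009-01-20 00:00:00 UTC), standing in for $node->changed.
$changed = 1232409600;

// Y = 4-digit year, m = month, d = day, H:i:s = 24-hour time, O = offset from UTC.
echo gmdate('Y-m-d H:i:s O', $changed) . "\n"; // 2009-01-20 00:00:00 +0000
```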

Now that we know how to access node fields, we can easily update these fields.
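Before changing any of these fields, it can help to print them in a readable form so you can see what a batch update is about to touch. A minimal sketch, using a plain stdClass as a stand-in for a loaded node; describe_node_flags() is a hypothetical helper, and the value meanings are the ones in the field list above:

```php
<?php
/**
 * Hypothetical helper (not part of Drupal): summarise a node's publishing
 * flags in words. It only reads the status, promote, sticky and comment
 * fields, so it works on a node_load() result or on a plain stdClass.
 */
function describe_node_flags($node) {
  // Comment setting values as listed in the field list.
  $comment_labels = array(0 => 'disabled', 1 => 'read only', 2 => 'read/write');
  $comment = isset($comment_labels[$node->comment]) ? $comment_labels[$node->comment] : 'unknown';
  $parts = array(
    $node->status ? 'published' : 'unpublished',
    $node->promote ? 'promoted' : 'not promoted',
    $node->sticky ? 'sticky' : 'not sticky',
    'comments ' . $comment,
  );
  return implode(', ', $parts);
}

// A stand-in for a loaded node, so the sketch runs without Drupal.
$node = new stdClass();
$node->status = 1;
$node->promote = 0;
$node->sticky = 0;
$node->comment = 2;
echo describe_node_flags($node) . "\n"; // published, not promoted, not sticky, comments read/write
```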

Batch Updating Drupal Nodes

Now things are starting to get interesting. Once you have a node object, simple assignment can be used to change its values. In the following example, I will disable commenting on all page nodes in a site:

<?php
error_reporting(E_ALL);
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$results = db_query("SELECT nid FROM {node} WHERE type = '%s'", 'page');
while ($result = db_fetch_object($results)) {
  $node = node_load($result->nid);
  $node->comment = 0; // 0 = comments disabled
  $node = node_submit($node);
  if ($node->validated) {
    node_save($node);
  } else {
    echo 'Node: ' . $node->title . ' (' . $node->nid . ') was not saved.<br />';
  }
}

The call to node_submit() allows installed modules to act on this node before it is saved. So for example, the core Drupal node module sets the creation date of the node. The $node->validated check makes sure the node has finished this process successfully, and of course, node_save() actually saves the node back to the database.
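That creation-date behavior has a side effect worth knowing about (Tim raises it in the comments below): as far as I can tell from Drupal 6’s node_submit(), it resets $node->created to the current time unless $node->date holds a date string, in which case it runs that string through strtotime(). So an update that should keep the original creation date can fill in $node->date first. A minimal sketch; keep_created_date() is a hypothetical helper, and node_save() will, I believe, still set $node->changed to the current time:

```php
<?php
/**
 * Hypothetical helper: copy a node's creation timestamp into $node->date so
 * that node_submit() (assumed, in Drupal 6, to compute created from
 * strtotime($node->date) when date is set) keeps the original value.
 */
function keep_created_date($node) {
  // The 'O' offset makes the string unambiguous, so strtotime() parses it back exactly.
  $node->date = gmdate('Y-m-d H:i:s O', $node->created);
  return $node;
}

// In the update loop above, before calling node_submit():
// $node = keep_created_date(node_load($result->nid));
```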

Batch Creating Drupal Nodes

As long as Drupal’s machinery has something that looks and acts like a node, it will treat it as a node. So we can simply create a generic object by calling new StdClass(), make the necessary assignments and save it as we did before. In the following example, I will create ten story nodes:

<?php
error_reporting(E_ALL);
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

function add_new_node($title, $type='page', $status=1, $promote=1, $format=2) {
  $node = new StdClass();
  $node->type = $type;
  $node->status = $status;
  $node->promote = $promote;
  $node->format = $format;
  $node->title = $title;
  $node = node_submit($node);
  if ($node->validated) {
    node_save($node);
  }
}

// Create ten story nodes, titled "Story number: 1" through "Story number: 10".
for ($story_num = 1; $story_num <= 10; $story_num++) {
  add_new_node('Story number: ' . $story_num, 'story');
}

To make this easier, I moved the node creation to a function with some sensible default arguments that can be overridden as needed. This function populates a very bare-bones node, and most likely you will want to expand on what I’ve provided here.
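One way to expand it is to fill in the body as well, and to keep building the object separate from saving it, so you can inspect what a batch would create before anything touches the database. A sketch under those assumptions; build_new_node() is a hypothetical helper, and the default format of 2 is simply carried over from add_new_node():

```php
<?php
/**
 * Hypothetical helper: build (but do not save) a bare-bones node object.
 * Pass the result to node_submit() and node_save(), as add_new_node() does.
 */
function build_new_node($title, $body = '', $type = 'story', $status = 1, $promote = 1, $format = 2) {
  $node = new stdClass();
  $node->type = $type;
  $node->title = $title;
  $node->body = $body;
  $node->status = $status;   // 1 = published
  $node->promote = $promote; // 1 = promoted to the front page
  $node->format = $format;   // input format ID; depends on your site's configuration
  return $node;
}

$nodes = array();
for ($i = 1; $i <= 10; $i++) {
  $nodes[] = build_new_node('Story number: ' . $i, 'Body of story ' . $i . '.');
}
// In a bootstrapped script, after checking $nodes:
// foreach ($nodes as $node) { $node = node_submit($node); if ($node->validated) { node_save($node); } }
```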

Batch Deleting Drupal Nodes

As you may have guessed by now, to delete a node, we call the node_delete() function, passing it a node identifier. So, to clean up the batch creation script we just ran, let’s delete all story nodes created within the past hour:

<?php
error_reporting(E_ALL);
require_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);

$hour_ago = time() - (60 * 60); // 60 minutes * 60 seconds
$results = db_query("SELECT nid FROM {node} WHERE type = '%s' AND created > %d", 'story', $hour_ago);
while ($result = db_fetch_object($results)) {
  node_delete($result->nid);
  echo 'Deleted node: ' . $result->nid . '<br />'; 
}
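Because deletions cannot be undone, it can be worth running a script like this once in a “dry run” mode that only reports what it would delete. A minimal sketch; delete_nodes() is a hypothetical helper, and inside a bootstrapped script you would pass it the nids from the query above and 'node_delete' as the deleting callback:

```php
<?php
/**
 * Hypothetical helper: delete the given nids or, in dry-run mode (the
 * default), only list them. $delete is any callable that takes a nid,
 * e.g. 'node_delete' inside a bootstrapped Drupal script.
 */
function delete_nodes(array $nids, $delete, $dry_run = TRUE) {
  foreach ($nids as $nid) {
    if ($dry_run) {
      echo 'Would delete node: ' . $nid . "<br />\n";
    }
    else {
      call_user_func($delete, $nid);
      echo 'Deleted node: ' . $nid . "<br />\n";
    }
  }
}

// Inside a bootstrapped script:
// $nids = array();
// while ($result = db_fetch_object($results)) { $nids[] = $result->nid; }
// delete_nodes($nids, 'node_delete');        // dry run: only lists the nids
// delete_nodes($nids, 'node_delete', FALSE); // actually deletes them
```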

Much More Can Be Done…

I’ve provided a very basic overview of what you can do programmatically with the Drupal API, but with this foundation, you can create more useful scripts.

In a future post, I will show how to work programmatically with CCK defined node fields.

This article is translated to Serbo-Croatian language by Anja Skrba from Webhostinggeeks.com.

50 Comments

Comment by Jason
2009-01-28 10:37:09

Thanks so much for this tutorial! I’ve searched high and low for useful details and examples.

I’m going to need to programmatically create, update, and delete nodes from a cron job; unfortunately it involves several custom CCK content types, so I would be very, very interested in seeing a similar tutorial for that task.

Thanks!

 
Comment by admin
2009-01-28 10:45:07

It’s great to hear that you found this useful. I’ll try to post a CCK follow-up in the next few weeks. (I hope that fits your timeline.)

 
Comment by DaveX99
2009-02-03 12:10:50

Thanks as well. This came along at exactly the right time. I’ll keep my eye on this for updates. -dave.


 
 
Comment by Tim
2009-02-15 19:38:31

Thanks for the info. I had something similar, but you nailed down some loose ends I had not gotten around to.

But a warning – be careful of the Update script! You mention that calling node_save will call the Drupal core. It will change your created and changed dates in the node table. Suddenly, all your posts are out of order. I have not yet determined if that was a side-effect of one of the other installed optional modules that I have, or if it would do the same on a pure, clean, fresh Drupal install. I’ll let you know what I discover.

Comment by admin
2009-02-16 04:05:45

Yes, the node_submit() function described above will set these dates. If this is causing you problems, you may be able to do away with this step, and the subsequent validation check. These exist mainly to play nicely with other modules.

 
 
Comment by Patrick
2009-03-04 01:23:50

Thanks for this nice example!

I’m having to do some batch processing on several nodes. These nodes have been created on a local Drupal installation and are images, videos or audio files (which make them sometimes big). My batch work has to send these nodes to another Drupal installation, located on the web. The work would be something like this.

for each local node
1) get local node infos (attached files, taxonomy, etc)
2) send all the data (images, videos, …) by FTP to the remote Drupal files directory.
3) Call several XMLRPC services on the remote Drupal to create the node, attach the sent data to it, set taxonomy, …
4) Log result.
end for each

But the problem is that calling a PHP script across a web server often means that it will have a maximum execution time (perhaps 15 seconds).

I was thinking about using Drupal’s built-in batch functions, but the timeout problem will be the same. And even if I tried to cut the job into small jobs handled by some Ajax calls, sending a 10 MB video file would take too much time too.

Do you know if I can call the batch script directly with the PHP CGI? How will Drupal handle the domain name, as there will be no URL? Have you got an idea how to do that?

I don’t really want to disable the max execution time on the local server. I was also thinking about creating another Virtual Host with some other PHP settings, just to handle the long batch works.

Comment by admin
2009-03-04 07:26:29

Patrick,

I’m not sure I have an exact answer that will fit your situation, but I could suggest some alternatives to consider that might work well for you. It occurs to me that this response might be long enough to warrant its own blog post (and I should put something up again soon anyway!) so I might do that. Either way, look for my $ 0.02 soon.

 
 
Comment by thePanz
2009-03-05 12:25:33

Hi, your example saved my time!
I’m waiting for your next CCK tutorial ;)

Some suggestions:

- use module_load_all(); for enabling all modules features
- use cck validators before saving node

I also found very interesting this paste-code:
http://pastie.textmate.org/pastes/406996

Regards!

 
2009-03-06 10:24:44

[...] as well as CCK and Views Definitions, Between Drupal Instances March 6, 2009 In my previous Headless Drupal post, I proposed ways to work with Drupal content programmatically, particularly for bulk tasks like [...]

 
Comment by Andrew
2009-03-09 08:21:01

Thanks very much for this clear and helpful tutorial. I needed to store outputs from an application in a manageable way and Drupal as a CMS seemed ideal; the only problem was doing it programmatically. Your tutorial has made this possible for me. Thanks

Comment by admin
2009-03-09 09:18:12

Glad to hear it, and I am planning on posting a tutorial this week on handling CCK nodes programmatically….

You might also be interested in this post which links to modules that allow you to output nodes as CSV and XML, which might be easier for your purposes.

 
 
2009-03-13 06:17:53

[...] Drupal Revisited: Programmatic Manipulation of CCK Defined Nodes March 13, 2009 In my previous Headless Drupal post, I proposed ways to work with Drupal content programmatically, particularly for bulk tasks like [...]

 
Comment by Martin Baker
2009-04-30 02:58:49

Great tutorial, thanks

 
Comment by Estebandido
2009-05-20 08:43:37

Thanks for this. This info was the missing link I’ve spent days searching for. APIs are easy to find, but no one ever explained what to DO with them.

 
Comment by Madan U S
2009-06-06 02:24:48

Wow… just what I was looking for… great job, and thanks a ton!

 
2009-06-26 05:58:47

[...] i found this excellent blog post on creating drupal nodes programmatically and this useful resource on drupal node fields. i knocked up the following script that does the job. Just copy this to s9y-import.php and place it [...]

 
Comment by Michael
2009-06-26 16:29:23

Thanks! A simple, clear, and useful discussion.

 
2009-07-15 15:02:29

[...] stonemind consulting » Headless Drupal: Using Drupal’s API to Batch Script Your Drupal Site. (tags: programming reference drupal php api scripting) [...]

 
Comment by ben
2009-07-24 09:06:27

hey gr8 art !
do you know how to load uid giving username or how to load all registered users ?

 
Comment by New Technology Site
2009-07-25 06:39:12

This is great work. Very impressive, indeed. I myself am involved in developing a couple of measurement concepts and should use your work as a reference, I shall certainly cite you. Have you made any updates to these? Also, is there a way you could either possibly post a video taking us through this line by line.? Perhaps even on YouTube? Thanks and great work!

Comment by admin
2009-07-26 13:13:02

Glad you found this useful. I use this approach fairly often, and don’t really have a significant update to this post. I’ll think about the YouTube idea.

 
 
Comment by Bogdan
2009-07-26 08:33:25

Thanks for clear examples.

What about multi-site Drupal installations with no ‘default’ site? How would I specify the correct site to work with?

Comment by ai
2009-07-26 09:30:44

found an answer to my question here http://current.workingdirectory.net/posts/2009/attach-file-to-node-drupal-6/

one just has to set $_SERVER['HTTP_HOST'] = 'example.org' to specify the exact site of the multi-site Drupal to use…

 
 
Comment by pissed off
2009-07-27 22:15:19

This problem is driving me up the freaking wall…files with the two lines of code (the require_once deal and the function under it) only work in the root folder. If I move the file to some other folder and also change the require_once file path it still doesn’t work. I’m wondering what the hell’s going on. It’s only behaving correctly if the PHP file is in the root folder.

Comment by admin
2009-07-28 02:09:06

Yes, this needs to be run in the Drupal root directory, the same directory that contains the cron.php script.

 
Comment by sam c
2009-12-03 05:52:10

I use PHP’s chdir to get around this, e.g.:

chdir('path/back/to/drupal');

 
 
 
Comment by sfyn
2009-10-22 14:19:37

I am looking for a way to bootstrap Drupal outside of the root directory, any thoughts – already tried setting the include path, but since bootstrap includes files with a ‘./’ prepended, I’m getting nowhere.

 
Comment by ussher
2009-11-19 03:06:32

Awesome. Look forward to anything else you want to say on this subject. Particularly how to do everything in code rather than having to use the admin interface.

I’ve found drush, and it’s really good, but I would love to learn more about turning settings on and off through code. Doing stuff programmatically

 
Comment by Jim
2010-01-04 21:28:54

It should be noted that unless an anonymous user has access to delete, update, or create nodes, then these scripts will not work. You can use wget’s cookie handling to work around this.

From the wget manual:

This example shows how to log to a server using POST and then proceed to download the desired pages, presumably only accessible to authorized users:
# Log in to the server. This can be done only once.
wget --save-cookies cookies.txt \
--post-data 'user=foo&password=bar' \
http://server.com/auth.php

# Now grab the page or pages we care about.
wget --load-cookies cookies.txt \
-p http://server.com/interesting/article.php

If the server is using session cookies to track user authentication, the above will not work because --save-cookies will not save them (and neither will browsers) and the cookies.txt file will be empty. In that case use --keep-session-cookies along with --save-cookies to force saving of session cookies.

An example shell script can be found in this post:
http://drupal.org/node/118759#comment-664498

 
 
Comment by Andrew
2010-01-31 16:53:49

Helpful hint -

If you get the warning “Warning: Cannot modify header information – headers already sent”, it’s because you are using the “echo” or “print” command in your script.

See the note on this page for details: http://gala4th.blogspot.com/2009/09/drupal-command-line-script-template.html

 
Comment by Edsko de Vries
2010-02-16 10:43:19

Great article! This sort of stuff should be better documented on the Drupal website. Thanks for posting!

 
 
Comment by Quangdog
2010-09-24 21:15:58

Your batch delete script does not delete the nodes created in your batch create script. You are creating pages in the create script, but deleting stories in the delete script. Just a word of warning to anyone trying to use them both for some reason – make sure the items created in the batch are the same node type as those you are trying to delete.

Thanks for sharing these awesome examples!

 
 
Comment by Cor Driehuis
2010-12-15 16:52:53

What can I say… you ROCK!, THX for the article.

 
Comment by vigilon
2010-12-29 18:07:06

That’s a very useful tutorial to our current project. Thanks.

 
Comment by Scott
2012-03-27 07:04:45

I know this is an old post, but thanks a million!

You have NO idea how difficult it is to locate examples for using the Drupal API outside of Drupal.

I still don’t fully understand what I set out to locate – an example how to use the Drupal Form API in a ‘regular” (standalone) PHP page… but this gets me closer.

 
Comment by Lida Sumler
2012-04-17 03:37:58

I appreciate your wp design, where did you download it from?

Comment by admin
2012-04-17 03:53:43

Thanks! It’s a custom design I did, 6 years ago now….I just wanted something unique, clean and minimal, so I whipped it up. This is actually a revised version of it, the original can be seen here: http://www.flickr.com/photos/44846967@N00/2403843653/in/set-72157600683299545 .

 
 