Development

DatabaseFileStorageSystem

You must first sign up to be able to contribute.

Version 5 (modified by Erestar, 11 years ago)
--

How to Store Files in a Database

Plugin available

The concepts for this tutorial have been implemented in a plugin. http://www.symfony-project.com/trac/wiki/sfPropelFileStoragePlugin

Introduction

This document aims to provide a good method for storing files in a database using Symfony. The argument between flat file storage vs. database storage has been going on for quite a while, so this system tries to alleviate some of the concerns with speed. It outlines general file storage, a caching system, and means to serve the files in a way a user would expect.

Benefits

Some useful features of this method include:

  • complete separation between application code and application data
  • application portability (you don't need to worry about copying files from a live server to setup a development site or mirror)
  • no need to worry about setting application-specific file permissions (everything is stored in the symfony cache)
  • no need to worry about keeping your file system and your database relationships in check

Drawbacks

  • hard to export to SQL scripts (best way to transport is to zip the db folder and extract it to another name)
  • hard to put testing data into fixtures

Disclaimers

This article is not meant to take a side in whether it is better to store files on the file system or in a database. Like most things, you'll need to examine your situation and determine which method suits you best.

A note about uploading large files

This method is somewhat configuration dependent. For PHP, your max_post_size, max_execution_time, and memory_limit play a factor, as does the max_packet_size in the my.cnf MySQL configuration. If you don't want to muck with any config files then use smaller files.

This test was done on Linux, Apache, MySQL, and PHP5. I have tested a similar method using Windows/IIS with PHP5/MySQL, but no other systems. If you test this on other systems please post the results. This tutorial assumes you are using Propel as your DBA

This article also omits a lot of the details for working with a Symfony project, so you should be familiar with Symfony in general before using this approach.

Getting Started

The best way to follow along with this tutorial is to download the files and run the project yourself. What you find on this page deals only with the important details and explanations.

http://www.erestar.net/db_store.zip

Database Structure

The following schema file outlines two tables: One to hold the information for a file and another to hold the actual binary data. Since speed is important to us, we don't want to have to get all of the binary data for every file just when we're listing file names. We want to be sure we only pull the binary data when we're actually going to send it to the browser. Most fields below are self explanatory.

<?xml version="1.0" encoding="UTF-8"?>
<database name="propel" defaultIdMethod="native" noxsd="true">
  <table name='file_info'>
    <column name="id" type="integer" autoIncrement="true" required="true" primaryKey="true" />
    <column name="name" type="varchar" size="255" />
    <column name="size" type="integer" />
    <column name="mime" type="varchar" size="255" />
    <column name="is_cached" type="boolean" />
    
  </table>
 
  <table name='file_data'>
    <column name="id" type="integer" autoIncrement="true" required="true" primaryKey="true" />
    <column name="binary_data" type="blob" />
    <column name="file_info_id" type="integer" />
   
    <foreign-key foreignTable="file_info">
      <reference local="file_info_id" foreign="id" />
    </foreign-key>
  </table>
</database>

As you can see, we have a one-to-many relationship between the file info and the binary data. To make it easy to retrieve this information, we can put the following in our FileInfo? object:

  public function getFileData() {
    $datas = $this->getFileDatas();
    if(!count($datas)) {
      return null;
    }
    
    return $datas[0];
  }

This simply allows us to access the FileData? object with one call (we'll use this later).

The Action and Templates

The downloadable code contains a single module, called 'file', where we're going to do everything. This was initially generated using propel-generate-crud to give starting point, and I'll outline the interesting changes below.

At this point, you should have your database and model created, along with the update to the FileInfo? object. You should also have a CRUD module called 'file' generated.

Uploading a file

First, we'll need a form to upload the file, so we'll whack out everything in editSuccess.php and put in:

<?php use_helper('Object') ?>

<?php echo form_tag('file/update', array('multipart' => true)) ?>

<?php echo object_input_hidden_tag($file_info, 'getId') ?>

<table>
<tbody>
<tr>
  <th>Upload a file:</th>
  <td><?php echo input_file_tag('uploaded_file') ?></td>
</tr>
</tbody>
</table>
<hr />
<?php echo submit_tag('save') ?>
<?php if ($file_info->getId()): ?>
  &nbsp;<?php echo link_to('delete', 'file/delete?id='.$file_info->getId(), 'post=true&confirm=Are you sure?') ?>
  &nbsp;<?php echo link_to('cancel', 'file/show?id='.$file_info->getId()) ?>
<?php else: ?>
  &nbsp;<?php echo link_to('cancel', 'file/list') ?>
<?php endif; ?>
</form>

The only thing even of mild interest here is to make sure that we make the form multipart/form-data.

A bit more interesting is the update action, where we'll put our file in the database. Please read the inline comments:

  public function executeUpdate ()
  {
    if (!$this->getRequestParameter('id', 0))
    {
      $file_info = new FileInfo();
    }
    else
    {
      $file_info = FileInfoPeer::retrieveByPk($this->getRequestParameter('id'));
      $this->forward404Unless($file_info);
    }

    //Use symfony's request object to pull information from the uploaded file
    $name = $this->getRequest()->getFileName('uploaded_file');
    
    //path returns the temp file that our file was stored to
    $path = $this->getRequest()->getFilePath('uploaded_file');
    $size = $this->getRequest()->getFileSize('uploaded_file');
    $type = $this->getRequest()->getFileType('uploaded_file');    
    
    //Record information about this file in the database
    $file_info->setId($this->getRequestParameter('id'));
    $file_info->setName($name);
    $file_info->setSize($size);
    $file_info->setMime($type);

    //Make sure to save the file info so $file_info gets an ID
    $file_info->save();
    
    //Create a new object to hold our data, make sure to establish which $file_info object the data belongs to
    $file_data = new FileData();
    $file_data->setFileInfoId($file_info->getId());
    
    //Read the data off the disk from the temporary location, put it in our data object, and save it to the database.
    $data = fread(fopen($path, "r"), $size);
    $file_data->setBinaryData($data);
    $file_data->save();

    
    //redirect to our list of files
    return $this->redirect('file/list');
  }

So far, we haven't done anything that you wouldn't do with normal file access with the exception of storing the data in our blob.

Our update action shoots us back to our list, but that's fairly uninteresting. However, the listSuccess.php in the project download does offer some better formatting and some links that will be helpful later.

Pulling the file from the database and caching

Pressing on, though, we'll see how to not only serve our files from the database but also to cache them so we don't always have to query large amounts of data. We're going to do this in the show action.

  public function executeShow ()
  {
    $file_info = FileInfoPeer::retrieveByPk($this->getRequestParameter('id'));    
    $this->forward404Unless($file_info);
    
    $this->file_data = $this->serve($file_info);
  }

Not too much here yet. The bulk of the processing is going to occur in the protected 'serve' method that we're going to add. You'll understand the need for the separation later. For now, just note that serve() takes in a FileInfo? object, and returns a FileData? object.

The serve method has to do the following things for us:

  • determine if the file has been cached
  • if not:
    • then pull the file data from the database
    • write the file data to a cache
    • tell the FileInfo? object that the we cached the file
  • if it is cached:
    • load the file data from the cache
  • serve the file to the browser

So, first we need a place to cache the file. We're going to use Symfony's built in cache object for this. At the time of this writing, the Symfony docs go into great detail on how to use caching within the framework, but don't describe how the sfFileCache object works (or I've missed it completely). At any rate, its fairly simple.

You instantiate the sfFileCache object by passing it a directory. This directory becomes the root directory of this cache. From then on, you have the has($key, $ns), set($key, $val, $ns), and get($key, $ns) methods. These are fairly self explanatory, but of interest is the last parameter, which allows us to specify a namespace, which will create subdirectories under the cache root. We'll use this to keep our cache from stepping on the toes of symfony's application caching.

For our cache, we're going to store our uploaded files in the root of the current symfony cache, along side of the application caches. We're going to use the same cache for all environments, because we want to keep things consistent.

So, finally, the serve() method, with comments inline:

  protected function serve($file_info) {
    //get an instance of the file cache object. We grab the web directory, jump back one (which
    //puts us in the root of our project), then get the name of the cache folder
    
    //we don't want to use sf_cache_dir because that is application and environment specific
    
    
    $cache = new sfFileCache(sfConfig::get('sf_web_dir').DIRECTORY_SEPARATOR."..".DIRECTORY_SEPARATOR.sfConfig::get('sf_cache_dir_name'));
    
    //The last parameter for the cache methods is the name space. We're going to be putting our cached files in the 'uploaded_files' name space

    //Next we see if we can pull the file from the cache. This is dependent on both the FileInfo object
    //thinking that the file is cached, and on the file actually being there. Remember a large point of this
    //is to not care if we bring our flat files with us when we move our database!
    
    if($file_info->getIsCached() && $cache->has($file_info->getId(), 'uploaded_files')) {
      //Ok! We have a cached copy of the file! We're going to create a throw-away FileData object to
      //store our cached file in.
      
      $file_data = new FileData();
      $file_data->setBinaryData($cache->get($file_info->getId(), 'uploaded_files'));
    }
    
    else {
      //File not cached, so we have to pull from the database.
      $file_data = $file_info->getFileData();  //This is the function we added to FileInfo when we first created the model
      
      //Write the file data to the cache and make sure FileInfo knows its there
      $cache->set($file_info->getId(), 'uploaded_files', $file_data->getBinaryData()->getContents());
            //getContents is a method of the propel Blob oject that gives us exactly what we need
      $file_info->setIsCached(true);
      $file_info->save();
    }
    
    //Next we need to get our response straightened out. We can't send anything to the browser except
    //the headers and the data, so make sure that the web debugger is off
    sfConfig::set('sf_web_debug', false);
    $this->getResponse()->addHttpMeta('content-type', $file_info->getMime());
    $this->getResponse()->addHttpMeta('content-length', $file_info->getSize());
    
    return $file_data;        
  }

Ok, so there's the guts of what we need to accomplish. The bulk of it is explained in the comments, and this method should do just about everything we need. Two more things must be done before we can actually see a file though.

The Template and the View

Remember that we can't have anything in the response except headers and the raw data, so we have to make sure we're not decorating with the application's layout. So, in config/view.yml for this module, set:

showSuccess:
  has_layout: Off
  
downloadSuccess:
  has_layout: Off

Right now we're only concerned with the show action/template, but we'll use that download one later, so you might as well set it now.

And finally, all we have left is the showSuccess.php template that will contain the data we're dumping to the client.

<?php $file_data->getBinaryData()->dump(); ?>

Like the $file_data->getBinaryData()->getContents() we used in the action, dump() is a method provided by the Propel Blob object to dump our binary data to the response (Thanks Propel!).

Does it work?

Well, hopefully it does. At this point you should be able to get your file spit to the browser. Not only that, if you open /project/cache, you should see the ID of your file there. Try uploading a few more to see the several different IDs popup. For fun, try switching IDs between two cached files to see that its working.

So, at this point you should have the basis for a database file storage system that makes use of disk caching. Feel free to use the project linked to at the top for testing and other things. Also, if people are having problems with the application, I'll add a section here with common issues and their resolutions.

For now, though, we're going to move on to bigger and better things.

Nice Looking URLs

One point of annoyance with what we've got so far is that you need to access the files by ID. No one wants to view an image for /file/show/id/14! Instead we want to give our browsers what they expect: /images/drunk_and_on_fire.jpg or something similar.

There's not much to do, but serving a query string with a dot (.) in it has been a problem for a lot of people so far, so we're going to address that here as well.

Getting the File

First we need to be able to get our file by name. That's a bit of an issue because we're not validating our uploads to make sure that a file of the same name already exists. That part is being omitted because I'm a lazy bum its being left as an exercise to the reader.

So, from here we're going on the assumption that our file names are unique. First, we have to get the file by name. No big deal, just add this to the FileInfoPeer?:

  public static function retrieveByName($name) {
    $c = new Criteria();
    $c->add(FileInfoPeer::NAME, $name);
    
    return FileInfoPeer::doSelectOne($c);
    
  }

This is similar to retrieveByPK, but it isn't a PK. Again, the steps that must be taken to ensure that the file name is unique are not covered here, but in a real life application you need to account for it.

Next, we go back to our module, and we're going to create a download action:

  public function executeDownload() {
    $file_info = FileInfoPeer::retrieveByName($this->getRequestParameter('name'));    
    $this->forward404Unless($file_info);
    
    $this->file_data = $this->serve($file_info);
    
    $this->setTemplate('show');
  }

We don't have to do very much here, thanks to our protected serve() method that we wrote earlier. We just get the FileInfo? for the file we want and then let serve() do the rest. We're going to reuse our show template here to dump the actual data. Remember that you need to put serveSuccess: has_layout: Off into your view.yml file. Even though we're using the showSuccess.php file, symfony needs to have the action name in the view to prevent the layout from going across.

So at this point, we should also be able to see files via /file/download/name/drunk_and_on_fire.jpg. For the moment, this will ONLY work if you have no_script_name set to On or are in the development environment. I suggest staying in the development env until we fix this little issue.

First though lets get a cleaner URL. Remember we want something more like /download/drunk_and_on_fire.jpg, without all of the confusing module/action/param/value stuff. To accomplish this, we add a simple rule to routing.yml.

download_by_file_name:
  url:  /download/:name
  param: { module: file, action: download }

We need to put this rule at the TOP of the routing file for our application, otherwise the default symfony rules will get fired before it. With this rule in place (and in the development environment or with no_script_name set to On) we should be able to access the files that we want.

Dealing with the DOT

So our one last hurdle is to have a URL like http://mysite.com/download/my_image.jpg. The problem, however, is that mod_write on Apache or ISAPIRewrite on Windows are preventing that URL from hitting the front controller, so the routing rule that we created above never has a chance to fire.

Apache

Let's look at the .htaccess file that symfony gives us:

Options +FollowSymLinks +ExecCGI

<IfModule mod_rewrite.c>
  RewriteEngine On

  # we skip all files with .something
  RewriteCond %{REQUEST_URI} \..+$
  RewriteCond %{REQUEST_URI} !\.html$
  RewriteRule .* - [L]

  # we check if the .html version is here (caching)
  RewriteRule ^$ index.html [QSA]
  RewriteRule ^([^.]+)$ $1.html [QSA]
  RewriteCond %{REQUEST_FILENAME} !-f

  # no, so we redirect to our front web controller
  RewriteRule ^(.*)$ index.php [QSA,L]
</IfModule>

# big crash from our front web controller
ErrorDocument 500 "<h2>Application error</h2>symfony application failed to start properly"

The lines that are giving us grief are

  # we skip all files with .something
  RewriteCond %{REQUEST_URI} \..+$
  RewriteCond %{REQUEST_URI} !\.html$
  RewriteRule .* - [L]

As the comment tells us, we're skipping anything that has a dot (.) in it. I believe the intent of this was to ensure that flat files get served. However, if we comment those three lines out, we're not stopping things from going to our front controller. This could be bad because it would prevent any file in the web folder (like our CSS or JS files) to never be accessed, because we'd always hit index.php. However (and I'm no master of mod_rewrite rules) but the condition

RewriteCond %{REQUEST_FILENAME} !-f

is making sure that the file the web server thinks we're accessing exists. If it does not exist on the file server, then continue on to index.php. However, we'll only get to index if there's no file, so we don't really need those rules we commented out. **If anyone has any insight or more information about what else might they might be trying to be accomplished here please let me know. A long time ago I asked on the forum but never got an authoritative answer.**

So after commenting those three lines out, we should now have our URLs working the way we want, at least on Apache.

IIS

The IIS version of mod_rewrite is ISAPIRewrite. Unfortunately, it doesn't have the ability to detect if a file exists on the file system, so we need to go to a bit more trouble and have a less elegant solution. On Apache, we can make our 'download' folder whatever we want just by changing the routing rules within symfony. However, on IIS we're going to need to mention them in the httpd.ini file that stores the ISAPI rewrite rules.

Here's what I use:

RewriteRule /(images|download)(.*) /index.php/$1$2 [L]

The general idea is that if the URL has 'images' or 'downloads' in it, then we guarantee we go to the front controller. The $1$2 afterwards takes the parts that were matched and appends them to the end, so images/pic.jpg would match, and 'images' would be in $1 and /pic.jpg would be in $2, resulting in exactly what was sent, but prepended with /index.php. So http://mysite.com/images/pic.jpg gets translated to http://mysite.com/index.php/images/pic.jpg.

It isn't great that we need to modify our httpd.ini file every time we want to put a dot (.) in the query string in IIS, but this is the best I've come up with that can still hide the front controller. Any other ideas are MORE than welcome.

Conclusion

So, we have what we set out for: a way to store files in the database, a way to cache those files to disk to maintain performance, and a way to serve our files as we would files from the database.

Any questions can be sent to Jim Podroskey [jim@pbtg.com] or posted at the forum: http://www.symfony-project.com/forum/index.php/t/2697/ . Corrections, suggestions, and counterpoints are welcome.