I have to admit I am not much of a database guy but there is a new SQL Server 2008 feature that managed to get my attention. A lot of content management systems or even document management systems use SQL Server for storage. This is a nice concept but there are some issues. First of all is performance. It is definitely slower than holding your files on the file system. Then there was capacity. VARBINARY(MAX) is able to hold 2GB maximum. I am sure you have never seen a 2GB document, nevertheless you can never know if you will come across one. These issues are addressed in SQL Server 2008 using the FILESTREAM type. It actually holds the documents on the file system and keeps any metadata in the table itself. It is much more faster than traditional VARBINARY(MAX) columns and it has no 2GB limitation. Moreover, you can configure your server to allow remote share access to the files! In order to use FILESTREAM, you need to enable it on your server (if you haven’t done so during installation). You can achieve this with the following code:

EXEC sp_filestream_configure 
  @enable_level = 3
, @share_name = N'FILESTREAMS';

Level 3 here indicates that remote machines can access your database documents using the FILESTREAMS share name specified as @share_name. For example \MACHINENAMEFILESTREAMS should give you access to your documents in the database. Level 0 specifies disabled which is the default. Level 1 allows only T-SQL access. Level 2 allows T-SQL and local machine access only. This really reminds me the long anticipated WinFS. On the database level, we need to create a special FILESTREAM filegroup which in reality is a filegroup that points to a directory. I never managed to do this through the UI but the T-SQL is pretty easy:

ALTER DATABASE Repository
ADD FILEGROUP FGRepository CONTAINS FILESTREAM
GO
ALTER DATABASE Repository
ADD FILE
(
    NAME= 'FSRepository',
    FILENAME = 'C:repository'
)
TO FILEGROUP FGRepository
GO

This adds a new filegroup filestream to an existing database which points to the directory “c:repository”. Now for the actual table. The table must have at least one rowGuid column in order for FILESTREAM to work. Then we have a VARBINARY(MAX) column which is marked as FILESTREAM. Then we add any additional fields for our metadata if we need any:

REATE TABLE [dbo].[Documents](
    [FileID] [uniqueidentifier] ROWGUIDCOL  NOT NULL UNIQUE,
    [Filename] [nvarchar](250) NOT NULL,
    [FileContent] [varbinary](max) FILESTREAM NULL
)
GO

This creates a table with 3 columns; the FileID is RowGuid as mentioned earlier. The FileContent column is the actual FILESTREAM. As you can see there is a tiny difference than standard VARBINARY(MAX) columns, just append FILESTREAM. It is important to mention here that the ROWGUIDCOL is required for Win32 API access. Inserting data isnt as complicated; just use standard T-SQL:

INSERT INTO Documents VALUES(NEWID(),
N’SampleFile.txt’,
12,
CAST(N’Hello from FILESTREAM’ as varbinary(max)))
GO

As you can see the FILESTREAM is treated just like another column. I think the above example helps you understand the manipulation of such columns in delete or update scenarios.

When accessing files using .NET, you are actually working with the SafeFileHandle class. In order for that to work you must execute the operation using a transaction and you must also specify the full path to the file. We can retrieve the path by using the following stored procedure:

CREATE PROCEDURE GetAllFiles
AS
BEGIN
    SET NOCOUNT ON;

    SELECT FileID,[Filename],FileContent.PathName() FROM Documents
END
GO

The magic is done using .PathName() on our FILESTREAM column. This returns the full path to the file which we will use to open it.

There is only one thing left now; a code example. Once again I will ask you to read my next post since it is going to be a code example about Virtual Path Providers,  SQL Server 2008 FILESTREAM and another technology as the cherry on top.

Advertisements